Most major pre-testing research systems, including Ameritest, can provide management with a one-number report card measure, or overall performance score, which summarizes in a single statistic the potential effectiveness of an advertising execution. Like IQ scores, these ad performance indices attempt to reduce multi-dimensional research on a complex subject to one key number that is easy to think about. But just as over-reliance on IQ scores can lead to oversimplified thinking, and frequently to underestimating the full potential of a child, the overuse of ad effectiveness indices can result in sub-optimal decision-making about advertising creative.

While different pre-testing systems derive their performance measures in somewhat different ways, we would all agree that for advertising to be effective it must perform well on a number of different, independent dimensions. The theoretical constructs that most ad researchers generally agree are important predictors of an ad’s success are: (1) a measure of the ability of an ad to break through clutter to get attention, (2) a measure of brand linkage, and (3) a measure of persuasiveness or motivation. (Some systems, such as ARS, deal with the first two constructs with a single recall score, which, as Ipsos-ASI points out, is really just a measure of “branded attention.”) These different components of performance are a bit like the verbal and math scores on the SAT.

The Ameritest Performance Index combines these scores to provide a percentile ranking of the test ad’s performance, either relative to the overall database of previously tested advertising or to some relevant subset, such as ads tested in the same category of goods or services. In general, the distributions of these report card scores in virtually all the major pre-testing systems follow a “bell curve” (again, like IQ scores), which indicates the probability of an ad’s success in the marketplace.

There are three advantages of using this single number report card measure as a simple way of ranking creative executions:

1) The single measure provides clear, unambiguous feedback to the brand/agency team that created the advertising on how hard the creative idea is working, as it is currently executed.

2) From a process standpoint, a simple scoring system makes it easy for busy middle and senior managers responsible for overseeing the production of many pieces of advertising to spot trends over time in terms of the quality and productivity of the creative function of the organization.

3) It is an easy number to communicate to senior management to aid in budget allocation decisions, such as which ads should get more media support and which should get less.

There are also three limitations of using a single-number report card measure. First, an undue focus on the “score” can lead to rote, mechanical decision-making, or worse, a cynical mistrust of the research feedback on the grounds that “copytesting research can’t measure creativity.” To continue the educational analogy, this attitude can lead to creative director complaints about a “teaching to the test” mentality, or to attempts to “game the system.” In contrast, the best use of advertising research is to inform creative judgment and produce insights, so that research becomes a learning experience about what really works in advertising in addition to providing well-defined measures of success.

Second, a single-number score inherently can’t differentiate the strengths and weaknesses of an advertising execution. Just as you have to balance the strengths and weaknesses of individuals in assembling an effective work team, you need to know what each of the different creative elements contributes to an effective advertising campaign as a whole, or to a larger integrated marketing effort. For example, two ads can have identical effectiveness scores for different reasons: one because it is high in attention-getting power but only average in motivation, the other because it is high in motivation but only average in attention-getting power. Knowing this might cause you to deploy the two ads quite differently in your media plan.

How willing should you be to trade off a weak brand linkage score for a strong motivation score? A weak branding score means that your advertising is likely selling the category rather than the brand, or perhaps even a competitor’s product. Compensating for weak branding with stronger motivation simply means you are selling the category, or your competitor, even harder. But if the research tells you that your ad has weak brand linkage, in our diagnostic experience this is very often a fixable problem. So, if managers focus only on the single report card score, they might not ask the right questions to challenge the creative team to achieve excellence.

The third and strongest argument against single number decision-making is that it can lead to sub-optimization of costly advertising. The reason for this lies in the laws of statistics. Regardless of which pre-testing system you are using, report card measures all follow a bell curve distribution. What does this tell us to expect regarding the ingoing odds for getting an above average performance score? (See Exhibit 1)


By definition, above average means the top thirty percent of the distribution of scores in the database. Only three out of every ten ads submitted for testing will produce scores that management will feel good about spending money on. Another three out of ten ads will clearly score below average. There will be few regrets about throwing these executions away and moving on. But what about those four out of ten executions that will inevitably score average?

How do you feel about spending money on average advertising? As a practical matter, by the time most advertising executions get tested, time is running out. So, there is real pressure to move forward with something in order to meet deadlines, even if your best scoring execution is only scoring “at norm” on the report card measure. This is a situation that can easily lead to sub-optimal decision-making.
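The arithmetic behind these ingoing odds is easy to verify. As a minimal sketch, assuming report card scores are standardized to a normal distribution and the database is banded 30/40/30 (below average, average, above average), the implied score cutoffs can be computed with the Python standard library:

```python
from statistics import NormalDist

# Assumption: report card scores are standardized (mean 0, SD 1) and the
# database is banded 30/40/30 into below-average / average / above-average.
scores = NormalDist(mu=0, sigma=1)

upper_cut = scores.inv_cdf(0.70)  # minimum score for "above average" (top 30%)
lower_cut = scores.inv_cdf(0.30)  # maximum score for "below average" (bottom 30%)

print(f"above-average cutoff: z > {upper_cut:.2f}")
print(f"below-average cutoff: z < {lower_cut:.2f}")

# Share of tested ads expected to land in the middle "average" band
p_average = scores.cdf(upper_cut) - scores.cdf(lower_cut)
print(f"expected share scoring average: {p_average:.0%}")
```

Under these assumptions, roughly four of every ten ads submitted for testing will fall in the middle band regardless of which pre-testing system is used; that is a property of the banding, not of the ads.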

Our ingoing presumption with pre-testing research is that many of the ideas that make it as far as the quantitative testing stage have the potential to be winners, except for some flaw in the execution that’s holding them back. Sometimes the first execution of an idea produces a “diamond in the rough.” That’s when diagnostic research is used to help the ad team answer the question posed by a disappointing test score: Is this a little idea that has been well executed, or a potentially big idea with an executional flaw holding it back? In other words, is the idea worth investing additional time and money to improve?

The primary goal of diagnostic research, therefore, is to help us identify these missed opportunities. There’s a huge gain in productivity to be had if you can rework an idea that scores only average to make it into an above average performer rather than throwing out the idea and starting over. Time as well as money is lost when you only use research as a filter for advertising ideas.

The value of diagnostic pre-testing research, therefore, is optimization. For instance, there are many examples of ads that score only average on the performance measure of Attention but score highly on diagnostic factors such as likeability, originality or entertainment value. These factors are normally strongly predictive of above average attention-getting power. In those cases, we at Ameritest look for a structural flaw in the execution that might be fixed with a little re-editing.

In this case, the job of diagnostic research is to define an ad’s problem as precisely as possible so that creatives know what to fix! For that reason, we don’t just engage in copytesting. We use both verbal and non-verbal diagnostic techniques to provide insights into how an ad is performing.

To illustrate my point, a recent Harvard Business Review article written by a senior Unilever researcher described how Ameritest diagnostics helped Unilever optimize the performance of its television advertising. Looking at one hundred seventeen TV ads tested with Ameritest in a major business unit over the two prior years, he found fifty cases of commercial executions that, on the first pass through testing, scored only average on key performance measures. This is the normal rate we would expect from the bell curve. Our diagnostic research suggested that half (twenty-five) of these “average” commercials had untapped potential, so these commercials were re-edited based on the research findings.

As a quality control check, one in four was retested. All but one of the retested executions improved significantly, from average to above average, on key measures: an 87% success rate for improving commercial performance among these underachievers. (See Exhibit 2)


Looking at the hundred-plus ads produced during this review period, the Unilever researcher found that nearly half, or 43%, of the ads approved for airing were first re-edited, or optimized, based on research findings. Diagnostic research using the Ameritest Picture Sorts® (referred to as the “Spielberg” variables in the Harvard Business Review article) enabled Unilever to save a large number of creative ideas that simply needed more polishing while maintaining above-average standards for performance.

In terms of both time and money, therefore, the overall contribution to advertising productivity can be substantial if you take an optimization approach to using research, and don’t focus solely on report card measures.

As a final point, if advertising research is really being used as a learning tool, and not just as a mechanical filtering system, we would expect that your advertising performance should be getting better over time. Advertising research that helps you understand what the consumer is looking for and responding to in advertising should be a feedback mechanism that helps improve the creative judgment of brand managers.

From an organizational standpoint, that means the research helps to train young managers to become better “buyers” of advertising over time. As evidence that diagnostic copytesting can perform this function as well, I cite another study reported by Unilever in an Admap article on the research strategies they use to improve television ad productivity. In a review of over two hundred commercials tested by Ameritest for one of their business units, Unilever found a systematic improvement in overall commercial performance over a six year period. (See exhibit 3)


While a disciplined focus on the report card performance of advertising can raise the floor of advertising performance, only an emphasis on learning and understanding can produce the kind of performance growth curve shown here. The Ameritest point of view is that advertising research should be used as a learning system, not just a report card, and as part of a business process of continuous quality improvement.


Kastenholz, John, “The Spielberg Variables,” Harvard Business Review, April 2005.

Kastenholz, John, and Charles Young, “5 Learning Strategies for Improving Ad Productivity,” Admap, February 2005.