Like the children of Lake Wobegon, all advertising ideas are above average — if you believe the spin. But the statistical reality of the bell-shaped curve is that only thirty out of every one hundred ads you test will score above average. Forty out of a hundred must, at least the first time you test them, score only average.
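In fact, the 30/40/30 split is a property of the banding convention rather than of the ads themselves: if test scores are roughly normally distributed and the “average” band is defined as the middle 40% of the norm distribution, the split follows directly. A minimal sketch (the banding rule and the norm parameters below are hypothetical, for illustration only):

```python
from statistics import NormalDist

# Hypothetical norm database: scores with mean 100, standard deviation 15.
scores = NormalDist(mu=100, sigma=15)

# If "average" is defined as the middle 40% of the norm distribution,
# the band boundaries sit at the 30th and 70th percentiles.
lo = scores.inv_cdf(0.30)   # ~92 points
hi = scores.inv_cdf(0.70)   # ~108 points

p_below = scores.cdf(lo)                  # ~30% score below average
p_avg = scores.cdf(hi) - scores.cdf(lo)   # ~40% score average
p_above = 1 - scores.cdf(hi)              # ~30% score above average

print(f"below: {p_below:.0%}, average: {p_avg:.0%}, above: {p_above:.0%}")
```

Any new ad drawn from the same distribution therefore has only about a three-in-ten chance of clearing the above-average bar on its first test.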
No business wants to spend money on average advertising. Competitive advantage is gained in the marketplace only when you spend your money behind advertising that is superior to your competitor’s.
But all too often, because of the tyranny of the calendar and the need to put something on air when the deadline of an air date arrives, your advertising manager must confront the reality of spending millions of dollars behind what is no more than an average performer.
A commercials factory
Advertising is the business process by which ordinary products are made famous and turned into stars. Unilever is a master of this process. It markets over a thousand consumer brands globally and last year spent $3.3 billion advertising them. Like the Hollywood dream factories of yesteryear, Unilever manufactures movies by the hundreds — bite-sized 30-second ones — but each at a cost rapidly approaching four hundred thousand dollars. And this is only the tip of the media iceberg. In the US, the industry rule-of-thumb is that while ten percent of the advertising budget is spent producing a commercial, ninety percent of the total budget is spent buying air time, which works out to nearly a five-million dollar investment associated with each and every television commercial.
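The 10/90 rule-of-thumb can be used to back out the total commitment implied by a given production budget. A minimal sketch, using an assumed production cost of $450,000 (the exact figure is illustrative; the article gives only approximate numbers):

```python
# Industry rule-of-thumb from above: ~10% of a commercial's total budget
# goes to production, ~90% to buying air time.
PRODUCTION_SHARE = 0.10

production_cost = 450_000  # assumed, for illustration

total_budget = production_cost / PRODUCTION_SHARE
air_time_budget = total_budget - production_cost

print(f"total: ${total_budget:,.0f}  air time: ${air_time_budget:,.0f}")
```

On these assumptions the total investment behind a single commercial approaches the five-million-dollar figure cited above, with roughly $4 million of it going to air time.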
How do you manage this expensive but creative and therefore frustratingly unpredictable process to beat the averages? As shown in Figure 1, you can bring research discipline to the creative process by using five different learning strategies to repeatedly achieve productivity gains in advertising performance.
Test the creative with a valid performance standard
Who would launch a new product into a national marketplace without first testing it among consumers? It only makes sense to apply the same logic to advertising, which is simply another product of human ingenuity and craft. One way of making sure that you are not spending money behind average advertising is to set a firm hurdle rate that says an ad cannot go on air unless it first achieves an above average score in pre-testing research.
One secret for identifying superior creative performers is making sure you use valid measures — are the measurements you collect actually related to effectiveness in the marketplace? To answer that question, we have put considerable effort and expense into getting the research right.
Just last year, for example, we stopped using Day-After Recall testing in the US, a measure first popularized by P&G in the 1950s and still the most widely used commercial pre-testing measure there. The impetus for this decision came out of our research (reported in Kastenholz, Young & Kerr, 2004) comparing a sample of sixty commercials tested in three different pre-testing systems. Our results showed that Recall scores systematically rewarded boring, bland and emotionally uninvolving advertising.
Of course, this does not mean that you should stop using “hard” metrics to evaluate TV advertising. In place of Recall, we use a quantitative measure of breakthrough or attention-getting power — one that does not rely on an outdated and limited concept of how advertising enters the mind of the consumer. Modern research has shown that human memory is highly complex and involves several different systems in the brain. The type of memory that Recall testing taps into — semantic memory — is not, for example, the memory system where human emotional experience is recorded, which is known as the episodic system. So, like the joke about the drunk who looks for his lost keys under the streetlamp because that’s where the light is, we concluded that Recall testing may have persisted not because it was the right place to look for evidence of advertising effectiveness, but because that was where the research lights happened to be shining.
By changing our research performance standard to reward more experiential advertising, we also improved the process of working with our advertising agencies. As an artificial barrier to advertising creativity, Recall testing had for years been a great source of friction between Agency Creatives, Brand Managers and Researchers. By aligning the different stakeholders in the advertising development process with a common “mental model” of how advertising works, we are trying hard to continuously improve our ability to collaborate effectively with our creative partners.
Rehearse the creative in rough form first
What stand-up comedian would take their act directly to a national television audience like the Tonight Show without trying out their jokes in a small nightclub first? Yet many advertisers go directly into final production of their expensive commercials without any kind of rehearsal in front of a real audience. As Senge (1990) points out in his book on learning organizations, The Fifth Discipline, “The almost total absence of meaningful ‘practice’ or ‘rehearsal’ is probably the predominant factor that keeps most management teams from being effective learning units.”
For that reason, we test many advertising ideas at a rough stage of production first. Most television commercial concepts can be executed inexpensively in a cartoon form known as an “animatic” or with borrowed footage from other pieces of film in a form known as a “ripomatic.” These rough versions cost only about a tenth as much as the final film and are usually good enough for testing purposes.
Rough production not only makes it cheaper to screen ideas, but it can also be used to make good ideas even better. Not all creative concepts are like the goddess Athena, who sprang fully formed from the head of Zeus. Sometimes a newborn idea has to come of age before being launched into the world.
In a review of fifty recent ad pre-tests, Kastenholz, Young & Dubitsky (2004) compared the performance of commercials that went straight to final production, because of timing constraints or other reasons, with commercials that were first rehearsed in animatic form and then fully produced with the creative benefit of research feedback. Ads that had been rehearsed first in rough production were 28% stronger in terms of attention-getting power than those that skipped the rehearsal step and went straight to final film. What this means in financial terms is that we can spend less than $50,000 on the rough production and research of an advertising idea in order to make approximately a $1 million improvement in the audience impact of the final commercial. That’s the value of rehearsal.
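That back-of-envelope valuation can be reconstructed as follows. The media figure is an assumption derived from the 10/90 rule-of-thumb earlier in the article, and treating a 28% gain in attention as an equivalent gain in working media value is a deliberate simplification:

```python
# Assumed per-commercial figures (illustrative, based on the article's
# rule-of-thumb that ~90% of a roughly $4M total budget buys air time).
media_spend = 3_600_000
attention_gain = 0.28    # observed lift for ads rehearsed in rough form
rehearsal_cost = 50_000  # rough production plus research

# Simplification: value the attention gain as a proportional gain in
# working media.
value_of_gain = media_spend * attention_gain
payback = value_of_gain / rehearsal_cost

print(f"value of gain: ${value_of_gain:,.0f} (~{payback:.0f}x rehearsal cost)")
```

This is where the “spend $50,000 to gain roughly $1 million” arithmetic comes from: a modest rehearsal budget is leveraged against the much larger media buy.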
Experiment with alternatives
All babies are beautiful in their parents’ eyes. The same is true for creative concepts. To find the best solution to your advertising needs, you must avoid the suboptimal approach of falling in love with your first idea.
One of the benefits of rough production, therefore, is that it makes it cost-effective to test multiple options for an advertising campaign. Like new product development, creativity frequently boils down to a process of trial and error. Thomas Edison tried out over 3,000 different prototypes before hitting on the right way to make the electric light bulb.
According to the laws of statistics, if the first idea you test has a 30% chance of scoring above average, then testing two ideas gives you a 51% probability that one of them will score above average, testing three ideas gives you a 66% chance, and testing four ideas gives you a 76% probability that at least one of your creative concepts will score above average on the first pass through the testing system. (This is based on the binomial distribution that arises from a series of Bernoulli trials.) For a new product launch, for example, where timetables are extremely tight and where you don’t want to be forced into the position of launching with an average commercial for lack of better alternatives, it makes sense to plan in advance on testing multiple advertising concepts to gain favorable odds for success.
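These probabilities follow directly from treating each pre-test as an independent Bernoulli trial with a 30% chance of success; a quick check:

```python
def p_at_least_one_winner(n_ideas: int, p_above_average: float = 0.30) -> float:
    """Probability that at least one of n independently tested ideas
    scores above average, assuming each has the same chance."""
    return 1 - (1 - p_above_average) ** n_ideas

for n in range(1, 5):
    print(f"{n} idea(s): {p_at_least_one_winner(n):.0%}")
# 1 idea(s): 30%
# 2 idea(s): 51%
# 3 idea(s): 66%
# 4 idea(s): 76%
```

The complement rule does the work: the only way to end up with no winner is for every idea to miss, which happens with probability 0.7 raised to the number of ideas tested.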
Optimize the creative with diagnostic insights
Remember, the first time through the testing system you can expect 4 out of every 10 ideas to score at an average level simply because of the laws of probability. Sometimes, though, the first execution of an idea produces a “diamond in the rough.”
Our ingoing assumption with pre-testing research is that many of the ideas that make it as far as the quantitative testing stage have the potential to be winners — but for some flaw in the execution that’s holding them back. The primary goal of diagnostic research, therefore, is to help us identify missed opportunities.
There’s a huge gain in productivity to be realized if you can rework an idea that scores only average to make it into an above-average performer rather than throwing out the idea and starting over. Time as well as money is lost when you only use research as a filter for advertising ideas. The value of diagnostic pre-testing research, therefore, is optimization.
To begin with, diagnostic research is used to help the ad team answer the following question when confronted with a disappointing test score: Is this a little idea that has been well-executed, or is it a potentially big idea with some executional flaw holding it back? Or in business terms, is this an idea worth investing additional time and money trying to fix?
For instance, we frequently find examples of commercials that score only at average levels on the performance measure of Attention but score highly on diagnostic factors such as likeability, originality or entertainment value — factors that normally are strongly predictive of above average attention-getting power. In those cases, we look for some structural flaw in the flow of the film that might be fixed with a little re-editing. Think of it as the film equivalent of fixing a grammatical error in a sentence you are writing.
Next, the job of diagnostic research is to define the problem as precisely as possible — so that Creatives know what to fix! For that reason, we don’t just engage in copy-testing. We use both verbal and non-verbal diagnostic techniques to provide insights into how a commercial is performing. For example, the Ameritest Picture Sorts™ techniques allow us to understand how a commercial is working as a piece of film, as detailed in Young & Kastenholz (2003). These film direction diagnostics — what we call the “Spielberg variables” — are particularly important because, at the end of the day, the actionability of diagnostic advertising research can only be found in the editing room.
In a review of the hundred plus ads tested in a major business unit in the past couple of years, we found fifty cases of commercial executions that, on the first pass through testing, scored only average on key performance measures — the normal rate we would expect from the Bell Curve. Diagnostics, however, suggested that half, or twenty-five, of these “average” commercials had untapped potential. So these commercials were re-edited based on the research findings.
As a quality control check, one in four of these re-edited commercials was retested. All but one of the retested executions improved significantly, from average to above average, on key measures: an 87% success rate for improving commercial performance among these underachievers.
In terms of both time and money, therefore, the overall contribution to advertising productivity can be substantial if you take an optimization approach to using research. Looking at the hundred plus ads that this business unit produced during this review period, we found that nearly half, or 43%, of the ads that were approved for airing were first re-edited, or optimized, based on research findings. As shown in Figure 2, diagnostic research therefore provided us with a way to save a large number of creative ideas that simply needed more polishing while maintaining above average standards for performance.
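A hedged reconstruction of that funnel arithmetic, with the counts rounded to whole commercials (the exact figures in the review were slightly different; these round numbers are assumed for illustration only):

```python
above_avg_first_pass = 30    # of ~100 tested: the bell-curve expectation
flagged_by_diagnostics = 25  # half of the ~50 that scored only average
fix_success_rate = 0.87      # improvement rate seen in the retested subset

saved_by_reediting = round(flagged_by_diagnostics * fix_success_rate)
approved_for_air = above_avg_first_pass + saved_by_reediting

share_optimized = saved_by_reediting / approved_for_air
print(f"{saved_by_reediting} of {approved_for_air} approved ads "
      f"({share_optimized:.0%}) were first re-edited")
```

On these assumed counts, roughly 22 of 52 approved ads were optimized before airing, which lands within rounding distance of the 43% reported above.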
Learn from the competition
From a benchmarking standpoint, it’s not enough for your advertising to outscore the commercials averaged together in some historical norm base. The reality is that you have to beat the other guys’ advertising — right now! Your advertised share-of-mind is not just a function of your share-of-voice, or ad spending, but it’s also a function of the relative strength of your advertising creative when compared to the competition.
Some years ago, we had a run of weak test scores for one of our deodorant brands. This led to much discussion and theorizing about deodorant being a “low-involvement” category where we shouldn’t expect test scores to be as high as in, say, the more glamorous shampoo category. To some of us, however, this line of explanation had the look and feel of a self-limiting mindset.
Then a new marketing director took over, a strong believer in research. He reasoned that the competition was selling the same kinds of products to the consumer in this so-called low-involvement category, so we should do some competitive intelligence and test some of their advertising to see how it scored. Sure enough, the competition’s advertising was found to significantly outscore ours. With this new evidence, it wasn’t long before our creative was scoring at this new, higher level.
Competitive testing gives you a way of experimenting with different ways of reaching the consumer at someone else’s expense. And if you have more powerful diagnostic tools for understanding why the consumer is responding the way she does to these different approaches, you may end up actually knowing more about your competitor’s advertising than he does himself. And you learn how to beat him.
The new director was responsible for a number of brands besides deodorant. From the beginning of his tenure, he began a program for systematically testing competitive advertising in all of his categories. What he learned from this competitive research found its way into the performance of his own advertising. When we compared the average performance of our ads produced during the two years prior to the onset of competitive ad testing to the average performance of ads produced during the two years after competitive testing was done, in those same product categories, we found that the Unilever average had increased by +23%.
Test, rehearse, experiment, diagnose, learn — that’s our research mantra for improving advertising productivity. Figure 3 shows the performance of two hundred Unilever ads tested over a six-year period with this approach. Now a research system that simply filters advertising can raise or lower the hurdle for acceptable advertising performance, but it would not produce a growth curve such as this. We believe, therefore, that this improvement in our advertising performance over time provides evidence for a genuine learning effect that is produced by working systematically with these five strategies.
J Kastenholz, C Young and T Dubitsky: Rehearse your creative ideas in rough production to optimize ad effectiveness. Marketing Research, 2004, under review.
J Kastenholz, C Young and G Kerr: Does Day-After Recall Testing Produce Vanilla Advertising? Admap, June 2004.
P Senge: The fifth discipline: the art and practice of the learning organization. New York: Doubleday, 1990.
C Young and J Kastenholz: A film director’s guide to ad effectiveness. Admap, September 2003.