How To Use Holdout Testing To Figure Out Your Most Responsive Ads

Holdout testing is a great technique for measuring how well each of your ads draws responses, but there are some things you have to consider if you want to do it correctly…

Holdout testing basically means you temporarily “hold out” (withhold) one type of ad to see whether response rates drop. If your response rates drop by a significant amount, then your “held out” ad was contributing to them. If there was no significant drop, your “held out” ad might not be contributing much and may not be worth distributing.

Holdout testing is useful if you distribute more than one kind of ad and/or use different mediums such as direct mail, television, and the internet. You can only run one holdout test at a time, but if you keep testing you’ll often find that one or two of your ads get significantly better responses than the others.

Various marketing sites recommend using holdout tests. Here are some basics, as I see them, for using holdout tests correctly:

1) Use A Good Sample Size

If you “hold out” an ad for only part of your target market, make sure that part is large enough to detect a difference. Suppose you send out 5 different ads to 1000 people/households each per month, which is 5000 people/households per month in total, and you receive an average of 75 sales a month from your ads. That means your average response rate is 1.5%. Suppose your response rate typically varies by about 0.2% from month to month, so your normal range is 1.3% to 1.7% per month. You can’t withhold 50 fliers of one ad and expect a meaningful result, or even 100. You won’t be able to tell whether a dip is just normal variation within that range or a significant change. Even 500 would be too small: withholding 500 fliers at a 1.5% response rate costs about 7.5 sales, and 0.015 × 500 / 5000 = 0.15%, which is smaller than 0.2%. You need to hold out at least 667 fliers of one ad (0.002 × 5000 / 0.015 ≈ 666.7) for the expected drop to exceed the margin of error.
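The sample-size arithmetic above can be sketched in a few lines of Python. This is a rough sketch under the post’s own simplification: it treats the 0.2% month-to-month swing as the margin of error and assumes every withheld flier would have responded at the 1.5% average. The helper names `expected_drop` and `min_holdout` are hypothetical, made up for this illustration.

```python
from fractions import Fraction
import math

def expected_drop(holdout: int, total_sent: int, response_rate: Fraction) -> Fraction:
    """Drop in the overall response rate if `holdout` fliers are withheld,
    assuming each would have responded at the average rate."""
    return response_rate * holdout / total_sent

def min_holdout(total_sent: int, response_rate: Fraction, margin: Fraction) -> int:
    """Smallest holdout whose expected drop reaches the margin of error."""
    return math.ceil(margin * total_sent / response_rate)

total_sent = 5 * 1000                 # 5 ads x 1000 fliers each per month
rate = Fraction(75, total_sent)       # 75 sales / 5000 fliers = 1.5%
margin = Fraction(2, 1000)            # assumed month-to-month swing: 0.2%

for n in (50, 100, 500, 1000):
    drop = expected_drop(n, total_sent, rate)
    verdict = "detectable" if drop >= margin else "lost in the noise"
    print(f"hold out {n:>4}: drop {float(drop):.3%} -> {verdict}")

print("minimum holdout:", min_holdout(total_sent, rate, margin))
```

Using `Fraction` keeps the comparison against the 0.2% margin exact, so the 667 threshold is not affected by floating-point rounding.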

Your best bet in this case would be to “hold out” all 1000 of one ad (round numbers are easier). However, if you simply cut the total number of fliers sent out by those 1000, you’ll change your rates, which brings me to my next point…

2) Replace the ads you held out on with your other ads (equally divided)

From the example above, let’s say you decide to do a holdout test on all 1000 of your third flier; let’s call it flier C. You still have 4000 fliers going out the next month for fliers A, B, D, and F (1000 of each, of course). Suppose you leave it at 4000 total, without replacing the flier Cs you removed. Let’s say you get 70 sales that month, so your response rate is now 1.75% (70 / 4000). You think, “Wow, we really didn’t need that ad! Look at how much higher our response rate is now!” But this response rate can be misleading.

Why? It’s sort of like celebrating a 100% response rate from giving out 1 flier over a 50% response rate from giving out 100 fliers. The latter is a lot more valuable (50 sales instead of 1) even though its response rate is lower. In the same way, your total sales actually fell from an average of 75 to 70. Plus, you can’t be sure whether your results come from other factors, such as the nature of the demographic flier C was reaching. You leave too many factors uncertain.
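To see the trap in numbers, here is a tiny Python sketch using the figures from this example: the 75-sale baseline month with all five fliers, and the 70-sale month with flier C dropped and not replaced. The helper name `response_rate` is made up for illustration.

```python
def response_rate(sales: int, sent: int) -> float:
    """Fraction of fliers sent that produced a sale."""
    return sales / sent

baseline_sales, baseline_sent = 75, 5000   # normal month, all five fliers
holdout_sales, holdout_sent = 70, 4000     # flier C dropped, NOT replaced

base = response_rate(baseline_sales, baseline_sent)
held = response_rate(holdout_sales, holdout_sent)

print(f"baseline: {baseline_sales} sales, rate {base:.2%}")
print(f"holdout:  {holdout_sales} sales, rate {held:.2%}")
# The rate climbs from 1.50% to 1.75%, yet total sales fall from 75 to 70.
```

The rate alone makes the smaller mailing look better, while the sales count shows the business actually lost responses.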

If you instead distribute all 5000 fliers evenly among fliers A, B, D, and F (1250 of each), your result will be more reliable. Suppose you try this and get 70 responses again. Your response rate is 1.4%. That changes the picture, because 1.4% is within your normal range, so it only shows that flier C does not have a strong impact. If you got back 50 responses, a 1% response rate, then flier C was likely significantly useful. If you got back 90 responses, an increase to 1.8%, then flier C likely wasn’t useful and at least one of fliers A, B, D, or F is very useful.
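The way the three outcomes above are read can be written as a small decision rule. This is a minimal sketch, assuming the 1.3% to 1.7% range from the first section is your normal band and that flier C’s 1000 slots were refilled evenly; `read_holdout` is a hypothetical helper, not a standard function.

```python
from fractions import Fraction

# Assumed normal monthly range from the example: 1.3% to 1.7%.
LOW, HIGH = Fraction(13, 1000), Fraction(17, 1000)

def read_holdout(sales: int, sent: int = 5000) -> str:
    """Interpret a month in which flier C was held out and its slots
    were refilled evenly with fliers A, B, D and F."""
    rate = Fraction(sales, sent)
    if rate < LOW:
        return "below range: flier C was likely pulling its weight"
    if rate > HIGH:
        return "above range: flier C likely wasn't helping; another flier may be strong"
    return "within range: flier C has no strong measurable impact"

per_flier = 5000 // 4    # 1250 each of A, B, D and F
print("fliers per remaining ad:", per_flier)
for sales in (70, 50, 90):
    print(sales, "responses ->", read_holdout(sales))
```

Exact fractions are used so that a result sitting right on the edge of the range (65 or 85 responses) is classified consistently rather than by floating-point luck.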

But what if some of your responses this month come from flier Cs sent out 2 or more months ago? That brings me to my last point…

3) Account for the “time lag” of your fliers.

By time lag, I mean how long a flier keeps drawing responses after it goes out. Sometimes people don’t buy things right away. They like an ad and are tempted to buy, but hold off for various reasons. Then they decide to respond long after they first saw the ad. When I put up multiple tutoring fliers, I noticed I would get responses from one flier weeks after I put it up, even after I had put up new fliers. Some people responded to my older fliers three weeks after I posted them! Suppose in our example a few people respond to a two-month-old flier C during the month no flier C was sent out. That would give your fliers a 3-month time lag (the month the flier went out plus the two months after it). That means it’s a good idea to run your holdout test for 3 months to see whether your results are consistent.

One way to find and calculate your time lag is to set up a control market. Send 1000 fliers to a new test market one time. It could be any of your fliers, but with new contact info or a coupon code that shows that specific flier was used. We’ll call this flier C-2. Suppose C-2 goes out to 1000 people in a new area in one month only, and it gets 16 responses that month, a 1.6% response rate. The next month, C-2 gets 8 responses (0.8%), then 3 the third month (0.3%), then 1 the fourth month (0.1%). Remember that we sent out 1000 fliers in the first month only, and none after that. You can treat these monthly rates as representative of your full 5000-flier mailing too.

The time lag you need to account for is 3 months, because the margin of error in our example is 0.2%: the first three months’ rates (1.6%, 0.8%, and 0.3%) are all above it, while the fourth month’s 0.1% falls below it. With these rates, it is probably safe to run a 3-month holdout test on one of the fliers to make sure your results are consistent (unless your sales drop so sharply in the first month that you can’t afford to continue the test).
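Here is one way to turn the control-market counts into a time lag, sketched in Python. It assumes the rule used above: count the months, starting from the single mailing, whose response rate stays at or above the 0.2% margin, and stop at the first month that falls below it. `time_lag` is a hypothetical helper, and this simple rule ignores any stray late responses after that first drop.

```python
from fractions import Fraction

def time_lag(monthly_responses: list[int], sent: int, margin: Fraction) -> int:
    """Months, counted from the single mailing, until the monthly
    response rate first falls below the margin of error."""
    lag = 0
    for responses in monthly_responses:
        if Fraction(responses, sent) < margin:
            break          # first month below the margin ends the lag
        lag += 1
    return lag

c2_responses = [16, 8, 3, 1]      # flier C-2, months 1-4 after one mailing
margin = Fraction(2, 1000)        # 0.2% margin of error from the example

for month, r in enumerate(c2_responses, start=1):
    print(f"month {month}: {r / 1000:.1%}")
print("time lag (months):", time_lag(c2_responses, 1000, margin))
```

With these counts the lag comes out to 3 months, which is also the holdout test length suggested above.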

So there you have it: some basics of a great way to pick out your most responsive ads.

I’ll talk about holdout testing again sooner or later, depending on how long I hold out on it 🙂