There is a simple idea at the core of most mobile marketing campaigns these days – if you spend $x on some marketing activity and receive $y in return, you want y to be greater than x. This is often referred to as ROAS or campaign ROI. We have trained mobile marketers to break their activities down into small units – ad groups, ad sets, ad creatives, audiences, … – and find the ones that show positive ROAS. Doubling down on the positive-ROAS units while shutting down the negative-ROAS units is the leading campaign optimization strategy today.
Here is the problem – it only works under certain conditions.
There is a famous saying popularized by Mark Twain – “There are lies, damned lies and statistics.” It warns against using statistics the wrong way. One such way is relying on statistics when small numbers are involved. Another way statistics can deceive is called multiplicity, or the multiple comparisons problem. Let’s see how both come into play when calculating returns.
Beware of the small numbers
Most companies base their ROAS calculations only on revenue from In-App Purchases. This is the result of two things:
- Up until recently, ad based monetization and ad spend were mutually exclusive
- Until SOOMLA TRACEBACK there was no way to attribute ad monetization
The problem with In-App Purchase revenue is that it’s highly concentrated. Studies have shown that fewer than 2% of users make purchases, and among those purchasers, the top 10% generate half of the revenue. Let’s say you spent $5,000 to acquire 1,000 users and you are trying to figure out the return. Most likely you have about 20 purchasers, including 2 whale users who generated $1,500 each (in line with the studies above). Now, suppose you had 2 ad-groups in that campaign and you are trying to figure out which one was better. Here are the options:
- Group A had both whales
- Group A had one whale and B had one whale
- Group B had both whales
Since we are talking about only 2 users here, which scenario actually happened is essentially random. Even if one ad-group is genuinely better than the other, it is still very likely for the weaker group to outperform it when just 2 users can flip the outcome completely. The danger is that our UA teams will double down on the ad-group that yielded the 2 whales without realizing that it isn’t actually better than the other. If we look at sample sizes, n=1,000 is normally considered a good sample size. Had the monetization been less concentrated, a sample of 1,000 users would have been enough to make decisions. However, for the purpose of acquiring whales, the effective sample size here is n=2. We should aim for at least n=500 whales before making media-buying decisions. The problem, of course, is that attracting 500 whales can be a very expensive test – more than $100,000 based on the numbers in the example above.
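The randomness of the whale scenario above is easy to check with a quick Monte Carlo sketch. The figures here ($1,500 whales, 20 small purchases assumed at roughly $50 each, an even user split) are assumptions taken from the example, not real campaign data:

```python
import random

def simulate_winner(trials=10_000, seed=42):
    """Simulate two identical ad-groups splitting one campaign.
    Revenue: ~20 small purchases (assumed ~$50 each, split evenly)
    plus 2 whales worth $1,500 each, landing in a group at random."""
    rng = random.Random(seed)
    a_wins = ties = 0
    for _ in range(trials):
        # Each whale independently falls into group A or B with p = 0.5
        whales_in_a = sum(rng.random() < 0.5 for _ in range(2))
        revenue_a = 10 * 50 + whales_in_a * 1500
        revenue_b = 10 * 50 + (2 - whales_in_a) * 1500
        if revenue_a > revenue_b:
            a_wins += 1
        elif revenue_a == revenue_b:
            ties += 1
    return a_wins / trials, ties / trials

a_share, tie_share = simulate_winner()
# The two groups are identical by construction, yet A "wins"
# about a quarter of the time and B about a quarter of the time;
# the apparent winner is decided entirely by where the whales land.
print(f"A wins: {a_share:.2f}, ties: {tie_share:.2f}")
```

Running this shows group A looking better roughly 25% of the time even though, by construction, neither group is better – exactly the trap described above.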
On the other hand, companies that monetize with ads enjoy the fact that more users participate in generating revenue, so they can make decisions based on smaller sample sizes and smaller test budgets.
Multiplicity – the bias of multiple shots
Another bias we normally see in mobile marketing is multiplicity. The easiest way to explain it is with the game of basketball. Imagine you are shooting from the 3-point line with a 50% chance to score. What happens if you try twice? The chance of scoring at least once becomes 75%. With 3 shots, it’s 87.5%, and so forth. The more times you try, the better your chances of scoring at least once.

This is exactly what happens when you try too hard to find positive ROAS in a campaign that has a lot of parameters. You compare ad-groups – that’s 1 shot; you compare ad creatives – that’s a 2nd shot; you compare audiences – that’s a 3rd shot; and so forth. The more you slice and dice looking for a segment with positive ROAS, the more likely you are to find a false positive.
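The basketball math above generalizes to a one-line formula: with n independent tries, each succeeding with probability p, the chance of at least one success is 1 − (1 − p)ⁿ. A minimal sketch, with the 5% false-positive rate below being an illustrative assumption rather than a measured figure:

```python
def chance_of_at_least_one(p_single, attempts):
    """Probability that at least one of `attempts` independent tries
    succeeds, when each try succeeds with probability p_single."""
    return 1 - (1 - p_single) ** attempts

# Basketball analogy: a 50% shooter taking more shots
print(chance_of_at_least_one(0.5, 1))  # 0.5
print(chance_of_at_least_one(0.5, 2))  # 0.75
print(chance_of_at_least_one(0.5, 3))  # 0.875

# Same math applied to campaign slicing: even if every segment truly
# has negative ROAS, if each comparison has an (assumed) 5% chance of
# looking positive by luck, slicing 10 ways yields roughly a 40%
# chance of "finding" at least one false winner.
print(round(chance_of_at_least_one(0.05, 10), 3))  # 0.401
```

The takeaway is the same as with the 3-point shots: every extra slice of the campaign is another shot at a false positive.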