A/B testing has been an integral part of the marketer's toolbox for a good reason – it takes much of the guesswork out of marketing. In online and mobile companies it has also become a popular tool for product managers: every time a new version is released, why not A/B test it against the existing version to make sure nothing broke? In mobile app monetization, however, this tool is not readily available.
Why ad-based app monetization is so hard to A/B test
The core requirement for A/B testing is the ability to split your users into two groups, give each group a different experience, and measure the performance of each so you can compare them later. A number of tools can facilitate the split for you, including Google Staged Rollout. If you are measuring IAP monetization, it's easy enough to associate purchases with the users who made them and then sum the revenue in group A and group B. In ad monetization, however, it's impossible to associate ad revenue with individual users – most ad partners simply don't report revenue at that level of granularity.
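The split itself is the easy part, whichever measurement method you choose. A minimal sketch in Kotlin, assuming each user has a stable, persistent identifier (the function and salt names here are illustrative):

```kotlin
// A deterministic 50/50 split: hashing a stable user ID keeps each user
// in the same group across sessions. The salt is illustrative; changing
// it re-randomizes assignments for a new experiment.
fun assignGroup(userId: String, experimentSalt: String = "ad_test_01"): Char {
    val bucket = (userId + experimentSalt).hashCode().mod(2)
    return if (bucket == 0) 'A' else 'B'
}
```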
Method 1 – interval testing
One alternative that companies have been using is interval testing. In this method, the publisher already has one version of the app live and rolls out a version with the new feature to all devices. To make sure every user receives the new version, publishers will normally use a forced update that gives the user no choice. The impact of the new feature is then measured by comparing results over two different time intervals: for example, week 1 might run version 1 and week 2 version 2, so the publisher can compare the two versions by comparing the results across the two date ranges.
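To make the comparison concrete, here is a hypothetical sketch of the interval math in Kotlin – ARPDAU (average revenue per daily active user) for each week. The figures are made up for illustration, and the comparison carries the seasonality caveat discussed below:

```kotlin
// Hypothetical interval comparison; all figures are illustrative only.
data class Interval(val adRevenue: Double, val dailyActiveUserDays: Int)

// ARPDAU = total ad revenue / sum of daily active users over the period.
fun arpdau(interval: Interval): Double =
    interval.adRevenue / interval.dailyActiveUserDays

fun main() {
    val week1 = Interval(adRevenue = 12_500.0, dailyActiveUserDays = 700_000) // version 1
    val week2 = Interval(adRevenue = 13_100.0, dailyActiveUserDays = 690_000) // version 2
    println("Week 1 ARPDAU: ${"%.5f".format(arpdau(week1))}")
    println("Week 2 ARPDAU: ${"%.5f".format(arpdau(week2))}")
    // Caveat: any lift here may be seasonality, not the new feature.
}
```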
Pros
- Very simple to implement – no engineering effort
Cons
- Highly inaccurate and subject to seasonality
- The forced-update approach has a negative impact on retention
Method 2 – using placements or different app keys
This is a pretty clever workaround for the problem. Most ad providers have a concept of placements. In some cases they are called zones or areas, but all three serve the same purpose – they let you identify the different areas in your app where ads are shown, for reporting and optimization. The way to use this for A/B testing is to create a zone A and a zone B, then report zone B for users who received the new feature and zone A for the control group. If you are already using zones for their original purpose, you might already have zones 1, 2, 3, 4 and 5, so you would create 1a, 1b, 2a, 2b, and so on.
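A sketch of the suffix logic in Kotlin, reusing the assignGroup helper from the earlier sketch and assuming your ad SDK accepts a placement/zone name per ad request (the show call is a placeholder, not a real SDK API):

```kotlin
// Route each test group to its own zone so the ad network's reporting
// breaks revenue out per group. Zone names follow the 1a/1b scheme above.
fun zoneFor(baseZone: String, group: Char): String =
    baseZone + group.lowercaseChar()   // e.g. zoneFor("1", 'A') == "1a"

// At ad-show time (placeholder call, not a real SDK API):
// adSdk.showInterstitial(zone = zoneFor("1", assignGroup(userId)))
```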
Of course, if you are using multiple ad networks you will need to repeat this setup for every network, and after the test period aggregate the results back together to conclude your A/B test.
A variation of this method is to create a new app in your ad network's configuration screen. This gives you two app keys, so you can implement one app key for group A and the other for group B.
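The app-key variation follows the same pattern at SDK initialization time. A sketch, with placeholder key values and a placeholder init call:

```kotlin
// Serve each group a different app key; the ad network then reports the
// two groups as two separate "apps". Keys and init call are placeholders.
fun appKeyFor(userId: String): String = when (assignGroup(userId)) {
    'A' -> "APP_KEY_CONTROL"  // the existing app entry in the dashboard
    else -> "APP_KEY_VARIANT" // a second app entry created for the test
}
// At startup: AdNetworkSdk.init(context, appKeyFor(userId))
```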
Pros
- More accurate than the other methods
Cons
- Implementing even a single test is costly and requires engineering effort
- Makes it hard to foster a culture of testing and data-driven decisions
Method 3 – counting impressions
This method requires some engineering effort to set up: every time an impression is served, the publisher reports an event to its own servers. In addition, the publisher sets up a daily routine that queries the reporting API of each ad network and extracts the eCPM per country. This information is then merged into the publisher's database so that, for every user, the impression count for each ad network is multiplied by that network's daily average eCPM in the user's country. The result is a (highly inaccurate) estimate of that user's ad revenue for that day. Once this system is in place, you can implement A/B tests, split users into test groups, and then compare the average revenue per user in each group.
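A sketch of the estimation step in Kotlin, assuming you already log per-user impression counts and pull a daily average eCPM per network and country from each reporting API (all names and data shapes here are assumptions):

```kotlin
// eCPM is revenue per 1,000 impressions, so the estimate per row is
// impressions * eCPM / 1000. Data shapes are illustrative.
data class ImpressionCount(
    val userId: String,
    val network: String,
    val country: String,
    val impressions: Int
)

fun estimatedRevenuePerUser(
    counts: List<ImpressionCount>,
    avgEcpm: Map<Pair<String, String>, Double> // (network, country) -> daily avg eCPM
): Map<String, Double> =
    counts.groupBy { it.userId }.mapValues { (_, rows) ->
        rows.sumOf { it.impressions * (avgEcpm[it.network to it.country] ?: 0.0) / 1000.0 }
    }
```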
Pros
- After the initial setup there is no engineering effort per test
Cons
- Setting this system up is complex and requires significant engineering effort
- Highly inaccurate – it relies on average eCPM, while actual eCPM variance is very high
- Can lead to wrong decisions
Method 4 – leveraging true eCPM
This method leverages multiple data sources to triangulate the eCPM of every single impression. It requires significant engineering effort or a 3rd-party tool like SOOMLA TRACEBACK. Once the data integration into the company's database is complete, publishers can implement A/B tests and get the results directly in their own BI, or view them through the dashboard of the 3rd-party tool. Implementing A/B tests becomes easy, and a culture of testing and optimization can be established.
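Once per-impression revenue is in your database, the analysis itself is straightforward. A minimal sketch, assuming a simple record shape (this is not SOOMLA TRACEBACK's actual schema):

```kotlin
// Average revenue per user, per test group, from per-impression revenue.
data class ImpressionRevenue(val userId: String, val group: Char, val revenue: Double)

fun averageRevenuePerUser(impressions: List<ImpressionRevenue>): Map<Char, Double> =
    impressions
        .groupBy { it.group }
        .mapValues { (_, rows) ->
            // Sum revenue per user within the group, then average across users.
            val perUser = rows.groupBy { it.userId }
                .mapValues { (_, r) -> r.sumOf { it.revenue } }
            perUser.values.average()
        }
// Compare the group A and group B values to read the test result.
```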
Pros
- The most accurate method
- Low effort for testing allows for establishing a testing culture
- The improvement in revenue can run into millions of dollars
Cons
- The 3rd-party tool can be expensive, but the ROI is usually very quick