Many developers use A/B testing to optimize product performance. In this part, we will look at how to time A/B tests across your product's life cycle and how to design the variants.
One of the most common ways to optimize an app's performance is A/B testing. Yes, it is a familiar topic to most app developers, especially if you are using analytics. I hope to add a little twist to the discussion by focusing on the timing of A/B tests and the design of the variants used in them.
A/B testing should be incorporated into your app development from the start of the project to the very end of the app's life cycle. In different phases your testing needs will change, from technical testing to optimizing a specific part of the app such as monetization. The key is to think of A/B testing through phases.
Before launching the app, you should make sure that everything is technically ready for A/B testing. Make sure that the app is integrated with an analytics platform that supports testing, or with your internal tools when available. Test the different variants (content) with test users and devices. It is also important to decide which parameters should be used for testing: in-app soft currency exchange rates, starting coins, time to complete a task, etc. The list of parameters is almost endless and app specific, so spend some time designing in detail what kind of tests you plan to run for different types of optimization exercises such as retention, engagement, and monetization. You want to make sure that everything is in place before the app is in the stores.
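As a concrete sketch of this pre-launch step, the test parameters could be captured in a simple configuration that is sanity-checked before shipping. The parameter names and values below are purely illustrative assumptions, not tied to any particular analytics platform:

```python
# Hypothetical pre-launch configuration of A/B test parameters.
# All names and values are illustrative examples, not real product settings.
TEST_PARAMETERS = {
    "soft_currency_exchange_rate": [80, 100, 120],  # coins per unit of hard currency
    "starting_coins": [100, 250, 500],
    "task_time_limit_seconds": [60, 90, 120],
}

def variants_for(parameter):
    """Return the variant values configured for a given test parameter."""
    return TEST_PARAMETERS[parameter]

# Sanity check before launch: every tested parameter needs at least
# a control value and one alternative variant.
for name, values in TEST_PARAMETERS.items():
    assert len(values) >= 2, f"{name} needs at least 2 variants"
```

Keeping the whole parameter list in one place like this makes it easy to review which dimensions you plan to test for retention, engagement, and monetization before the app reaches the stores.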
During the soft launch you will start having real users whose behavior can be used to tweak the A/B tests and overall application performance. Real users now provide direct feedback by coming back to the app, or by not coming back. The most common focus areas at this stage are retention and engagement. You can also start testing for suitable value ranges to be used in later tests. Monetization is often a lesser concern, as it is typically driven by highly engaged users; still, if the soft launch period is lengthy, testing for monetization can be useful if the user base is large enough.
The global launch is the key moment for testing, and when everything needs to be in place. There should be no technical issues, as everything was tested in advance, and you know which parameter ranges to use based on the tests run during the soft launch. While testing in this phase focuses more on retention and monetization, engagement should not be left out, as there might be significant opportunities for testing with localized content. The length of an A/B test depends on the number of variants used and, of course, on the number of users. Many variants and a small user base require longer tests, whereas a few variants and a large user base might need only a very short timeframe to produce significant results. (I'll come back to this later in this blog, where I'll address variant design and sample sizes.) The goal of the global launch A/B test is to find the most optimized user experience, which will be provided to all users with the next update. Once the best-performing user experience is found, you should update the app for all users. In some cases, the A/B testing tool used might also allow you to change the app configuration from multiple variants to a single one. This can be used to speed up delivery of the optimized user experience to all users at once instead of waiting for the store release cycle.
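The relationship between variant count, user base, and test length is simple back-of-envelope arithmetic. Here is a minimal sketch, assuming traffic is split evenly across variants and that you already know how many users you need per variant (the function name and sample numbers are my own illustrative assumptions):

```python
import math

def estimated_test_days(users_per_day, num_variants, samples_per_variant):
    """Rough number of days needed to collect enough users for every
    variant, assuming traffic is split evenly across all variants."""
    users_per_variant_per_day = users_per_day / num_variants
    return math.ceil(samples_per_variant / users_per_variant_per_day)

# Few variants, large user base: the test finishes quickly.
fast = estimated_test_days(50_000, 3, 10_000)   # -> 1 day

# Many variants, small user base: the same test takes far longer.
slow = estimated_test_days(2_000, 25, 10_000)   # -> 125 days
```

This is deliberately crude (real tools account for statistical power, uneven traffic splits, and day-of-week effects), but it shows why many variants on a small user base can stretch a test from days into months.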
Testing should not end at the global launch. The app might get updates where new features can be tested and optimized. Also, it might be useful once in a while to run a repeat test to confirm that the results of earlier tests are still valid. Why? The optimal user experience might change as the app and its user base mature and more and more players become highly engaged users. For them, an older, previously poorly matching variant might now be the best-performing experience.
As you might have figured out already, there is no single approach to A/B testing, as there are several options to consider. The right choices often depend on the data available, and this brings us to variant design and sample size.
As mentioned above, one of the key aspects of successful A/B testing is to design the variants accordingly. The variants should match not only the use case at hand but also the targeted user base. For example, if you plan to test for monetization and aim to target previous buyers, you need enough spenders in your user base to have a meaningful test. As paying users typically account for only about 1% of all users, such a test should only be planned when the daily user base starts to be in the thousands, preferably in the tens of thousands. Otherwise the test won't be statistically valid. To take the size of the user base into account when designing A/B tests, you can approach the design process through a matrix that scales based on the tested dimensions and the size of the user base.
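To see why low-frequency events like purchases demand such large user bases, you can run the standard two-proportion sample-size formula (normal approximation). This is a generic statistics textbook calculation, not a feature of any specific tool, and the conversion rates below are illustrative assumptions:

```python
import math

def sample_size_per_variant(p_control, p_variant):
    """Approximate users needed per variant to detect the difference
    between two conversion rates, using the normal approximation with
    the standard z-values for 5% significance (two-sided) and 80% power."""
    z_alpha, z_beta = 1.96, 0.8416
    variance = p_control * (1 - p_control) + p_variant * (1 - p_variant)
    effect = (p_control - p_variant) ** 2
    return math.ceil((z_alpha + z_beta) ** 2 * variance / effect)

# Example: ~1% of users pay, and we want to detect an uplift to 1.5%.
n = sample_size_per_variant(0.01, 0.015)  # -> roughly 7,700 users per variant
```

With several variants each needing thousands of users, the daily user base has to reach the tens of thousands before a monetization test can conclude in a reasonable time, which is exactly the constraint described above.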
In the matrix above, you can see that the first test, with a smaller user base, focuses on progression speed, which can be altered, for example, by varying level difficulty. You might wonder what the connection is between level difficulty and retention. The progression speed needs to be optimized so that users have just the right amount of challenge in the game but don't get stuck too early, as that might lead to churn. Running this small-user-base test should reveal the optimal progression speed.
When your user base increases, it's time to add more variants. In the example matrix, the new variants are related to the amount of gems that the user has at the beginning. By changing the number of starting gems, the test focuses on players' willingness to use gems, as more frequent use might eventually lead to in-app purchases. Because the focus is on the propensity to make purchases, you need a large user base, as only a fraction of users will spend actual money in the app. With a larger user base you might have 25 different test variants instead of the 5 used with a small user base.
So how does this relate to sequential A/B testing? You can think of the examples discussed above as tests run during the soft launch phase. You first ran the progression test with a small user base, with progression speed varying between 1 and 100. You identified that the optimal speed for retention is on the faster side of the range, for example 50-100. In the next test, with a larger user base, you can split this into more granular values that eventually reveal that the optimal progression speed is in the range of 65-75. In addition to testing for progression, you can test for monetization by varying the amount of starting gems. Every test that you run afterwards can use more granular values, allowing you to find the optimal performance of the app on all the metrics that you find relevant.
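The range-narrowing step in this sequential approach can be sketched in a few lines. This is an illustrative toy, assuming evenly spaced variants and a hypothetical helper that simply re-centres the range on the winning value:

```python
# Illustrative sketch of sequential range narrowing, assuming evenly
# spaced variants. `refine_range` is a hypothetical helper, not part
# of any real A/B testing API.
def refine_range(low, high, num_variants, winner):
    """Given evenly spaced variants in [low, high] and the winning value,
    return a narrower range (one step either side of the winner) to
    subdivide in the next, larger test."""
    step = (high - low) / (num_variants - 1)
    return max(low, winner - step), min(high, winner + step)

# Soft-launch test: progression speed 1-100 split into 5 variants,
# winner around 75; the next test covers roughly the 50-100 range.
refine_range(1, 100, 5, 75)  # -> (50.25, 99.75)
```

Repeating this step with a growing user base is what lets each successive test use more granular values, eventually homing in on a narrow band such as 65-75.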
There are no limits on what to test, but bear in mind the impact of sample size and the importance of testing the "right" thing at the right time in your app's lifecycle. A final reminder: when your app gets very popular, be careful with radical A/B testing, as active player communities might not be too keen on changes in game balance.
To learn more about how Omniata can help you with A/B testing, please feel free to contact us.