A/B Tests

All Campaigns are associated with an A/B test, regardless of whether the campaign was designed as one. To view the results of a running or completed Campaign, go to Engage > Campaigns > Actions and select Results.

The Results page contains many data points, so read through all of it before diving deeper into the data exploration. The Results page indicates the following:

  • Name of the Campaign
  • Environment
  • Start date
  • End date
  • Data points
  • Campaign status
  • Confidence intervals (and their interpretation)
  • Filters
  • Results per metric

Interpreting Results

  • A Data Point is not the same as a user. A Data Point is one measurement of each user per day. For example, if a campaign lasts three days, with 3 users on day one, 2 users on day two, and 2 users on day three, there would be 7 Data Points for the experiment.
  • Experiences indicated with green are statistically significant and have a higher value than the Control Experience. Experiences in red are statistically significant, but the value is lower than the Control Experience.
  • The number of Data Points matters. Statistical comparisons require enough volume to yield reliable results. With a low number of users, the standard error and standard deviation are more volatile, which can produce high Z scores and therefore false results. This is especially true for the ARPPU metric, as it only includes paying users.
  • The provided filters allow you to drill down into the results in detail.
  • If you wish to have more in-depth results based on Campaigns and Experiences, you can always use both in the Analyzer to create custom reports. Each user that is part of a Campaign is assigned to a designated Experience, allowing you to drill down in as much detail as your data allows.
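The Data Point arithmetic above can be sketched in a couple of lines of Python (the daily user counts are the example figures from the text, not real campaign data):

```python
# Each Data Point is one measurement of one user on one day,
# so the total is simply the sum of daily active-user counts.
daily_active_users = [3, 2, 2]  # users on day one, day two, day three
data_points = sum(daily_active_users)
print(data_points)  # 7
```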

Statistical Calculations

The Results page of a Campaign shows the results of that campaign for the date range in which there are Data Points.

Omniata stores the results per day aggregated over certain dimensions. All the Campaigns and their Experiences are stored in the same table.

The dimensions include:

  • Experience name
  • Campaign name
  • Project name
  • Country code
  • Activity date

The measures include:

  • users: the number of distinct users
  • new users count
  • payers count
  • sum of revenue
  • purchase count
  • event count
  • session time: the sum of session lengths
  • days active for the last 7, 14, 30, and 60 days

All the information on the Campaigns Results page is derived from this data structure. UID is not included in the Dimensions, so reporting that would require it is currently not possible.

All the tables on the page (ARPU, ARPPU, etc.) have the same structure. The first column (Group) has the Experience name within the Campaign. All Experiences that have Data Points are listed. The second column has the count of Data Points. The third column has the value of the metric, and the rest of the columns contain different kinds of values calculated for that Campaign.

The mathematical definitions and the semantics of the metrics are below. All of them are weighted averages:


ARPU in $

  • Source data: Daily ARPU - the total revenue of the day divided by the number of active users on the day.
  • Daily weight: the number of distinct active users on the day.

ARPPU in $

  • Source data: Daily ARPPU - the total revenue of the day divided by the number of distinct paying users on the day.
  • Daily weight: the number of distinct payers on the day.

Total Time per User per Day, in minutes

  • Source data: Daily total session time - the total time spent by users on the day, calculated using Omniata's session length calculation logic, defined elsewhere.
  • Daily weight: the number of distinct active users on the day.

Sessions per User per Day, count

  • Source data: Daily total session count - the total number of sessions on the day, calculated using Omniata's session count calculation logic, defined elsewhere.
  • Daily weight: the number of distinct active users on the day.

Days Active Last 7, 14, 30, etc.

  • Source data: the number of distinct days users are active within 7, 14, 30, or any number of days since install. This is a combined retention and engagement metric; a higher value indicates that, on average, users in a specific experiment return and play more than the control group.
  • Daily weight: the number of distinct active users on the day.
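As a sketch of how such a weighted average comes together, the daily metric values can be weighted by the daily weights like this (the figures and the names `daily_arpu` and `daily_users` are illustrative, not Omniata fields):

```python
def weighted_average(daily_values, daily_weights):
    """Weight each day's metric value by that day's weight
    (e.g. the number of distinct active users on the day)."""
    total_weight = sum(daily_weights)
    return sum(v * w for v, w in zip(daily_values, daily_weights)) / total_weight

# Example: three days of daily ARPU weighted by daily active users.
daily_arpu = [0.50, 0.40, 0.60]   # revenue per user per day, in $
daily_users = [100, 80, 120]      # distinct active users per day
print(round(weighted_average(daily_arpu, daily_users), 4))
```

Weighting by daily users means high-traffic days influence the campaign-level metric more than quiet days, which is why the per-day weights above are listed alongside each metric.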

The values in the columns are calculated as follows:

Change

The percentage change of the metric compared to the control group.

Std Error - Standard Error

The standard error can be thought of as the standard deviation of the sample value. In this case, the calculation is done by dividing the variance (the standard deviation squared) of the observed metric for the A group by the number of measurements (users) who contributed to that metric, doing the same for the B group, summing the two values, and taking the square root.

The standard error is used for calculating the z value; dividing the difference in the observed metrics by the standard error yields the z value.
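The calculation described above is the standard error of a difference in means; a minimal sketch with illustrative figures:

```python
import math

def standard_error_of_difference(sd_a, n_a, sd_b, n_b):
    """Standard error of the difference in means: divide each group's
    variance (standard deviation squared) by its number of measurements,
    sum the two values, and take the square root."""
    return math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)

# Example figures (illustrative, not real campaign data):
se = standard_error_of_difference(sd_a=2.0, n_a=400, sd_b=2.5, n_b=400)
print(round(se, 4))
```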

Std Dev - Standard Deviation

Standard deviation is a number that describes how much a group of measurements varies from the mean - a higher number indicates the data is further from the mean.

To calculate the standard deviation, the inputs are each user's total measurement for the day (for instance, seconds of gameplay or revenue in cents) as well as that total squared.

For instance, if a user spends 100 cents in the morning and 100 cents in the evening on a single day, there should be a single row in the table with 200 and 40,000. It should not display 100 and 10,000 on 2 lines (breaking up purchases), or 200 and 20,000 (squaring the individual purchases and then summing).

Once you have the values, square the sum of the original values and divide it by the number of observations (users in this case). Subtract this from the sum of the squares, divide the result by one less than the number of observations, and then take the square root of that value.

This will yield the standard deviation.
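The steps above can be sketched from the sum and sum-of-squares inputs (the spend figures are illustrative):

```python
import math

def std_dev_from_sums(sum_x, sum_x_sq, n):
    """Sample standard deviation from the sum of values, the sum of
    their squares, and the number of observations n."""
    # Square the sum of values and divide by n, subtract from the sum
    # of squares, divide by n - 1, then take the square root.
    return math.sqrt((sum_x_sq - sum_x ** 2 / n) / (n - 1))

# Example: three users spent 200, 100, and 300 cents in a day.
values = [200, 100, 300]
sum_x = sum(values)                    # 600
sum_x_sq = sum(v * v for v in values)  # 140000
print(std_dev_from_sums(sum_x, sum_x_sq, len(values)))  # 100.0
```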

Z Score

A z score describes the distance from the mean of the sample to the mean of the population, using units of the standard error.

To calculate it, first sum all the measurements for the A group and divide by the number of measurements, then do the same for the B group. Subtract the value for the A group from the B group’s value to get the difference. Divide the difference by the standard error to get the z score.

The z score is then used to find the confidence interval, or p-value, by looking at a table of z scores and corresponding intervals.
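Putting the group means and the standard error together, the z score calculation is (numbers illustrative):

```python
def z_score(mean_a, mean_b, std_error):
    """Difference of the group means in units of the standard error.
    Following the text: subtract the A group's value from the B group's."""
    return (mean_b - mean_a) / std_error

print(round(z_score(mean_a=0.50, mean_b=0.56, std_error=0.03), 2))  # 2.0
```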


Confidence Interval

A confidence interval (or p-value) is a percentage that indicates how reliable a sample measurement is. In an A/B test, it indicates how reliable a specific test is, i.e. how sure we are that the measured difference in ARPU between the A group and the B group reflects a real difference.

Confidence intervals can also be determined via the normal distribution function in Excel. Pass the z value and “TRUE” into the NORM.S.DIST function to get the p-value.

Omniata uses a lookup table to assign the p-values.
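Outside Excel, the same cumulative normal value can be obtained with Python's standard library, where `NormalDist().cdf` plays the role of NORM.S.DIST with the cumulative flag set to TRUE:

```python
from statistics import NormalDist

def confidence_from_z(z):
    """Cumulative standard-normal probability for a z score,
    equivalent to Excel's NORM.S.DIST(z, TRUE)."""
    return NormalDist().cdf(z)

print(round(confidence_from_z(1.96), 4))  # 0.975
```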

Confidence intervals over 95% can be seen as statistically very reliable. Confidence intervals between 90% and 95% are best looked at directionally, meaning that if the A measurement is higher than the B measurement, but with a 90-94.9999% confidence interval, you can believe that the A group’s measurement was higher, but the exact amount may be off.

Confidence intervals under 90% should not be used as the sample data may not accurately reflect the population data.

This article was last updated on February 12, 2016.