A key part of any new project is measuring the return to evaluate its impact – and Progressive Web Apps (PWAs) are no different. That’s where an A/B test comes in.
An A/B test is an experiment where you randomly assign users to either a control group or a treatment group to test the impact of a change. It is impossible to accurately measure the effect of a change without controlling for all factors aside from the factor being tested. We’ve tried to measure the causal impact of implementing a PWA by using a Bayesian structural time series model, but ecommerce data is really noisy so it’s difficult to pinpoint the exact impact.
A/B tests can be tricky, so here’s all the information you need to run an A/B test on your new PWA.
Should I run an A/B test on my new PWA?
We recommend A/B testing your new PWA because it’s the only way to isolate and measure the impact of your PWA. Here are the pros and cons to consider.
While we do recommend A/B testing your PWA, it ultimately comes down to the cost involved. Note that a site with few transactions may need to run a test for so long that testing is not worthwhile. Always calculate the required test duration before making a decision.
How long should I run an A/B test for?
The duration of an A/B test should be determined before launching the test. It will be based on the amount of traffic to the site, how the success metric differs across users (the standard deviation), and the expected lift.
Round up to the nearest week, with a minimum of two weeks to account for day-of-the-week bias. In the case of a high-consideration purchase that takes a user three weeks, running the test for four weeks rather than two will show a greater lift, closer to the actual uptick in customer lifetime value.
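To make the duration calculation concrete, here is a minimal sketch using the standard normal-approximation sample size formula for comparing two means. It assumes a 50/50 split, a two-sided test at 5% significance with 80% power, and the same standard deviation in both groups. The function names and the input numbers are hypothetical and only for illustration.

```python
import math
from statistics import NormalDist


def users_per_group(std_dev, baseline_mean, expected_lift, alpha=0.05, power=0.8):
    """Approximate users needed in each group of a 50/50 test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    min_detectable_diff = baseline_mean * expected_lift  # absolute lift in the metric
    n = 2 * (z_alpha + z_power) ** 2 * std_dev ** 2 / min_detectable_diff ** 2
    return math.ceil(n)


def duration_in_weeks(n_per_group, weekly_users, min_weeks=2):
    """Round the duration up to whole weeks, never below the two-week minimum."""
    weeks = math.ceil(2 * n_per_group / weekly_users)
    return max(weeks, min_weeks)


# Hypothetical inputs: $2.00 revenue per user, $15 standard deviation,
# a 10% expected lift, and 60,000 eligible users per week.
n = users_per_group(std_dev=15.0, baseline_mean=2.0, expected_lift=0.10)
print(n, duration_in_weeks(n, weekly_users=60_000))
```

Notice how quickly the requirement grows: halving the expected lift quadruples the number of users needed, which is why low-traffic sites can end up with impractically long tests.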
How do I A/B test my new PWA?
- Set up your test plan.
Determine your audience, user-based success metric, and timeline. For your timeline, make sure you choose a period when you have normal traffic, and calculate how much time is required for a statistically significant test. We recommend starting with 5% of your traffic as a QA group to ensure the split and site are working as expected. Keep this live QA time in mind, as well as implementation time, when determining your timeline.
- Configure your analytics platform for the test.
If you’re relying on Google Analytics, you’ll need to set up a couple of custom dimensions to extract the results of the test.
- A custom session-level dimension to track the test group. It’s extremely important that users stay within the same group for the duration of the test and that you analyze all metrics in terms of the user. For example, you should look at user conversion rate rather than session conversion rate because it’s the users that are being randomly split into groups, not the sessions.
- A custom session-level dimension to track a unique client ID for each user. This is required so that you can calculate the standard deviation of any metrics that do not have a binomial outcome. For example, revenue per user is more than a yes/no value for each user. If you only care whether an action takes place or not, such as conversion, you can calculate the standard deviation without tracking individual users because the value for each user is either 1 or 0 (converted or didn’t).
- Implement tracking for a QA group
A/B testing a whole new site against an old one can be difficult, especially when the sites were built on different platforms. Mobify provides a simple-to-use code snippet to split your traffic between your PWA and your old experience.
- Review and debug, as necessary
We recommend launching with a 5% split to ensure that everything is tracking correctly.
- Launch the test
Launch your test according to the timeline and resist the urge to peek at the results. You’ve already debugged during the previous phase and so now it’s time to collect enough data to make a decision.
- Extract the user-level data
Analytics platforms do not provide standard deviations with their aggregates. You’ll need to extract user-level data using the client ID custom dimension you set up when configuring your analytics platform.
- Analyze the results
Calculate whether or not there is a statistically significant difference between the PWA and the old site.
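As an illustration of the final analysis step, here is a minimal sketch of a two-sided test for a difference in revenue per user, using only the Python standard library. It assumes you have already extracted one revenue value per user for each group (including zeros for users who did not buy) and that the groups are large enough for a normal approximation. The function name and the data are made up for the example.

```python
import math
from statistics import NormalDist, mean, variance


def compare_groups(control, treatment):
    """Return (relative_lift, p_value) for the difference in means."""
    mean_c, mean_t = mean(control), mean(treatment)
    # Standard error of the difference, allowing the two groups
    # to have different variances (Welch-style).
    std_err = math.sqrt(
        variance(control) / len(control) + variance(treatment) / len(treatment)
    )
    z = (mean_t - mean_c) / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided
    return (mean_t - mean_c) / mean_c, p_value


# Hypothetical per-user revenue: most users spend nothing.
control = [0.0] * 950 + [40.0] * 50      # $2.00 per user
treatment = [0.0] * 940 + [40.0] * 60    # $2.40 per user

lift, p_value = compare_groups(control, treatment)
print(f"lift: {lift:.0%}, p-value: {p_value:.2f}")
```

With only 1,000 users per group, even a 20% observed lift is not statistically significant in this made-up data, which is exactly why the test duration needs to be calculated up front.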
Can I measure the revenue impact of a PWA without an A/B test?
No. Even with an A/B test, you can only approximate the lift over a brief period – the true lift will probably be higher. You’ll be answering the following question: “Approximately how much more are users spending over the duration of the test period?” It’s a great estimate, but the effect on customer lifetime value should be even greater. For example, if your customers typically purchase from you once every two months, you will not account for the increase in loyalty if the test runs for only two weeks.
Can I A/B test just one section of the site?
Because launching a PWA is such a sweeping change, you will have a disjointed user experience if you launch it in parts and the test would not accurately evaluate the impact. Imagine rebranding all of your product pages to blue when every other page is red. It’s very possible that having everything blue would outperform red, but you won’t be able to see the result by testing half a change.
Why is it important that it’s “normal traffic” if we’re randomizing?
The users in your testing period (i.e. “the sample”) should be representative of your standard users (i.e. “the population”). If you run an A/B test on Black Friday, for example, the behavior of the sample may not match the behavior of the population and so the results will not describe the actual gains.
Within the sample itself, users are split randomly into the control and treatment groups so everything other than what you’re testing for is controlled.
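One common way to split users randomly while keeping each user in the same group on every visit is to hash a stable identifier. The sketch below is a generic illustration of that idea, not the Mobify snippet. The function name, the experiment label, and the use of a client ID string are all assumptions.

```python
import hashlib


def assign_group(client_id, experiment="pwa-test", treatment_share=0.5):
    """Deterministically map a client ID to 'pwa' or 'control'."""
    key = f"{experiment}:{client_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # Turn the first 32 bits of the hash into a value in [0, 1).
    bucket = int(digest[:8], 16) / 0x100000000
    return "pwa" if bucket < treatment_share else "control"


# The same ID always lands in the same group.
print(assign_group("client-123"), assign_group("client-123"))
```

Because the assignment depends only on the ID, a returning user sees the same experience for the whole test, as long as the identifier itself persists between visits.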
What makes a good success metric?
For an A/B test, you’ll want one metric to serve as the criterion for the winning variant. This should be derived from the hypothesis you’re testing. The most common hypothesis we test is that the PWA will increase revenue per user.
All metrics you look at for a test should be per user. Do not look at any metric that is divided by sessions because you are not splitting sessions randomly; you are splitting users randomly. It is possible that one variant will cause users to return more often, which will skew your sessions.
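Here is a small sketch of how session-based metrics can mislead. It uses made-up session records of the form (client ID, converted) and a hypothetical helper that computes both rates. In this toy data the PWA users return more often, so the session conversion rate drops even though the same share of users convert.

```python
def conversion_rates(sessions):
    """sessions: list of (client_id, converted) pairs.

    Returns (user_conversion_rate, session_conversion_rate).
    """
    users = {client_id for client_id, _ in sessions}
    converted_users = {client_id for client_id, converted in sessions if converted}
    converted_sessions = sum(1 for _, converted in sessions if converted)
    return len(converted_users) / len(users), converted_sessions / len(sessions)


# Hypothetical data: four users in each group, one purchase in each group.
control = [("a", True), ("b", False), ("c", False), ("d", False)]
pwa = [("w", False), ("w", True), ("x", False), ("x", False), ("y", False), ("z", False)]

print(conversion_rates(control))  # user rate and session rate are both 0.25
print(conversion_rates(pwa))      # user rate is still 0.25, session rate is lower
```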
Also remember that every user-based metric applies to a time interval. For example, a test running for two weeks can measure only revenue per user up to two weeks, not the full customer lifetime value.
Metrics you may consider for KPIs:
- Revenue per User
- User Conversion Rate
- Product View Rate
- Add to Cart Rate
- Checkout Rate
- Store Finder Rate
- Email Sign Up Rate
Note that the standard deviation is required to calculate statistical significance. It is simple to calculate the standard deviation for rates with binomial (success/failure) outcomes such as return rate and user conversion rate. For metrics where the outcome is a number, such as revenue per user or sessions per user, you will need to extract the raw data for each user in order to calculate the standard deviation. This requires additional tracking.
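The difference between the two kinds of metric can be sketched in a few lines. For a yes/no metric, the standard deviation follows directly from the aggregate rate as the square root of rate × (1 − rate). For a numeric metric, it has to be computed from the per-user values. The data below is invented for illustration.

```python
import math
from statistics import pstdev


def binomial_std_dev(rate):
    """Standard deviation of a 0/1 outcome, from the aggregate rate alone."""
    return math.sqrt(rate * (1 - rate))


# Hypothetical user-level data for ten users.
converted = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
revenue = [35.0, 0.0, 0.0, 0.0, 120.0, 0.0, 0.0, 0.0, 0.0, 0.0]

rate = sum(converted) / len(converted)
# For a binomial metric, the shortcut matches the user-level calculation.
print(binomial_std_dev(rate), pstdev(converted))
# For revenue per user there is no shortcut: the raw values are required.
print(pstdev(revenue))
```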
What makes a bad success metric?
There are metrics that you should be cautious of:
- “Engagement” Metrics
- Time on Site per User
- Pages per User
- Sessions per User
While these metrics are sometimes interesting, they fail to capture the actual goal of an ecommerce site. A good user experience will help shoppers find what they’re looking for faster and in fewer visits. This could decrease these metrics. A good experience will also encourage loyalty, which will cause shoppers to return and spend more time on the site, which could increase these metrics. These metrics are not the ones you want to optimize for.
Rather than Time on Site, consider Time in Checkout (minimize this).
Rather than Pages per User, consider Product View Rate (yes/no).
Rather than Sessions per User, consider a cohort analysis with the size of the cohort being the typical time between purchases. It is unlikely your A/B test will span a duration long enough to do this metric justice.
- Bounce Rate
Bounce Rate is traditionally defined as the percentage of sessions that contain only one pageview. This is not a good metric for an A/B test because the test randomizes users, not sessions. While you could look at the percentage of users who have only one pageview, this may be temporarily biased by returning users who are just not used to the new experience.
- Checkout Completion Rate
You shouldn’t analyze conditional rates when you’re testing the whole website. The Checkout Completion Rate is calculated only over the users who enter the checkout, so if the new experience brings more (and less committed) shoppers into the checkout, the completion rate can fall even while the total number of purchases rises.
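A quick worked example with invented numbers shows the trap. Both groups have 10,000 users, but the PWA gets more of them into the checkout.

```python
def funnel_rates(users, entered_checkout, purchased):
    """Compare a conditional rate with the unconditional per-user rate."""
    return {
        "checkout_completion_rate": purchased / entered_checkout,
        "user_conversion_rate": purchased / users,
    }


# Hypothetical funnels.
control = funnel_rates(users=10_000, entered_checkout=1_000, purchased=500)
pwa = funnel_rates(users=10_000, entered_checkout=1_500, purchased=600)

print(control)  # completion 50%, user conversion 5%
print(pwa)      # completion 40%, user conversion 6%
```

Judged on checkout completion, the PWA looks worse. Judged on user conversion, it wins with 100 more purchases from the same number of users.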
- Wish List Rate
Be cautious of any metric that competes with the desired primary action. The rate of shoppers using a wish list, for example, could decrease if they commit to purchasing right away.
- Average Page Load Time
First, if you plot the load times for each pageview, you will see that the distribution is highly skewed, so an average should never be used. You could look at the median, but most analytics platforms do not report this out of the box. Second, PWAs speed up the experience after the first page load. For this reason, it is crucial to separate landing page load times from subsequent page load times for analysis.
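If you can export raw pageview timings, the analysis might look something like the sketch below. The record layout (a landing-page flag plus a load time in seconds) and the numbers are assumptions for illustration only.

```python
from statistics import mean, median

# Hypothetical pageviews: (is_landing_page, load_time_in_seconds).
pageviews = [
    (True, 4.1), (False, 0.6), (False, 0.8), (True, 3.8), (False, 0.5),
    (False, 0.7), (True, 21.0), (False, 0.9), (False, 0.6), (False, 0.7),
]

# Separate landing pages from subsequent pages before summarizing.
landing = [seconds for is_landing, seconds in pageviews if is_landing]
subsequent = [seconds for is_landing, seconds in pageviews if not is_landing]

for label, times in (("landing", landing), ("subsequent", subsequent)):
    # One slow outlier drags the mean far above the typical load time,
    # while the median stays close to what most users experienced.
    print(label, "median:", median(times), "mean:", round(mean(times), 2))
```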
Should I split the traffic 50/50, or can I use a different split?
It is possible to split the traffic at a higher or lower percentage than 50%. We usually advise 50% because that split will maximize the test’s statistical power (and minimize the required test duration).
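To see why, consider the variance of the difference between the two group means. If both groups have the same standard deviation, that variance is proportional to 1 / (p × (1 − p)), where p is the share of users sent to the PWA. The sketch below, with a hypothetical function name, turns this into the extra traffic needed relative to a 50/50 split.

```python
def relative_duration(treatment_share):
    """Users needed relative to a 50/50 split, assuming equal variances."""
    p = treatment_share
    return 0.25 / (p * (1 - p))


for share in (0.5, 0.3, 0.2, 0.1):
    print(f"{share:.0%} to PWA: {relative_duration(share):.2f}x the users")
```

Under this assumption, a 20/80 split needs about 1.56 times as many users (and so roughly 1.56 times as long) as a 50/50 split to reach the same precision.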
Can I change the split during the test?
No, you will need to restart the test if you change the split, which will prolong the duration.
Can I stop the test ahead of time?
No. It is important to run the test for the period that you pre-determined. If you run the test only for the weekend, for example, it is possible that your weekday users behave differently.
How do I know if the split is working?
If you’re splitting users 50/50, the percentage of users in each group should be between 49% and 51%. Note that sessions will likely not be 50/50 because one group’s users may be more likely to return. It is important to drill into different segments of users to ensure that, within each segment, the split is working 50/50. For example, Android users should be 50/50 and iOS users should be 50/50.
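Beyond eyeballing the percentages, one way to check the split is a chi-square goodness-of-fit test on the user counts in each segment. The sketch below uses only the standard library. The segment counts and the 0.001 alert threshold are hypothetical choices for illustration.

```python
import math


def split_p_value(control_users, pwa_users, pwa_share=0.5):
    """P-value for the observed split under the intended allocation."""
    total = control_users + pwa_users
    expected_pwa = total * pwa_share
    expected_control = total - expected_pwa
    chi_square = (
        (pwa_users - expected_pwa) ** 2 / expected_pwa
        + (control_users - expected_control) ** 2 / expected_control
    )
    # Survival function of a chi-square distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(chi_square / 2))


# Hypothetical user counts per segment: (control, pwa).
segments = {"android": (50_210, 49_790), "ios": (51_400, 48_600)}

for name, (control_users, pwa_users) in segments.items():
    p = split_p_value(control_users, pwa_users)
    status = "investigate" if p < 0.001 else "ok"
    print(f"{name}: p-value {p:.3g} ({status})")
```

In this made-up data the Android split is consistent with 50/50, while the iOS split is far enough off to suggest a bug in the assignment or the tracking.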
How do site updates during the test affect the results?
Updates to the site should be applied to both the control and the PWA. For example, running a promotional banner on only one version of the site will spoil the results.
Should I look at all users?
You should analyze the group of users affected by the change, otherwise your results will be diluted.
If you’re launching a PWA just for mobile, only look at mobile. If it’s only for Android users, just look at Android users. With the Mobify Platform, you can build a PWA that works across mobile, tablet and desktop with one codebase.
What do I do if there is no winner?
If there is no winner, we cannot statistically prove that one experience yielded more revenue per user than the other. If this happens, it’s investigation time. Is the experience performing well for one segment but not another? Drill in to discover potential bugs and re-evaluate.
What are the typical lifts you’ve seen?
The increase in revenue per user for a PWA depends on your starting point and your user base. Mobify PWAs typically deliver a 20% to 40% increase in revenue per user.
If you want to learn more about the return on investment your team can expect with a PWA, book a call with one of our ecommerce experts.