In this post, we’ll walk through how to interpret A/B test results and choose a winner by analyzing an example test we set up.

But before we get started, we need to remember two things that you must do when you set up your own tests.

  1. Your test should run until there’s at least a 95% chance to beat the control.
    Sometimes it’s tempting to end a test early, but running until you reach this threshold is the only way to know that the conversion rate improvement is statistically significant. Keep in mind that the smaller the improvement, the longer the test will need to run to reach statistical significance: a test with a 5% improvement will need to run much longer than a test with a 45% improvement. (To see why ending a test early is risky, see the sketch after this list.)
  2. You should also test for a minimum of seven days.
    This takes into account traffic variations between weekdays and weekends and ensures your winning variation can stand the test of time. If it can’t win for seven days, can you really trust it to improve conversion rates on your site for the long term? Not really.
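
Here’s a minimal simulation sketch of the early-stopping problem (in Python, with illustrative numbers; this isn’t VWO’s methodology). It runs repeated A/A tests, where the control and the “variation” are identical, and counts how often a daily peek at a significance test would flag a winner at least once:

```python
# Simulate A/A tests (no real difference) and measure how often a
# daily peek shows a statistically significant "winner" at least
# once. All numbers here are illustrative assumptions.
import random
from math import sqrt
from statistics import NormalDist

def peeking_false_winner_rate(days=14, daily_visitors=100,
                              true_rate=0.25, trials=1000):
    z_cutoff = NormalDist().inv_cdf(0.975)  # two-sided 5% level
    false_wins = 0
    for _ in range(trials):
        conv_a = conv_b = n = 0
        flagged = False
        for _ in range(days):
            n += daily_visitors
            conv_a += sum(random.random() < true_rate
                          for _ in range(daily_visitors))
            conv_b += sum(random.random() < true_rate
                          for _ in range(daily_visitors))
            pooled = (conv_a + conv_b) / (2 * n)
            se = sqrt(2 * pooled * (1 - pooled) / n)
            if se and abs(conv_b / n - conv_a / n) / se > z_cutoff:
                flagged = True  # stopping here would pick a false winner
        false_wins += flagged
    return false_wins / trials

print(peeking_false_winner_rate())  # well above the nominal 5%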

Now that we have that out of the way, let’s talk about how to best interpret test results.

Interpreting Your Test Results

Most tests measure engagement (the number of clicks on a page) by default, along with any other goals that were set up when the test began. For our sample test, we set up goals for the number of people who reach the sign-up page and the number of people who reach the confirmation page, i.e. the number of people who sign up for a free trial. If this were a full-fledged test, we would also track the number of people who eventually sign up for a paid account, because revenue generated matters more than the number of free accounts created. We’ll also include engagement as a goal for the example so we can get an idea of which version increases engagement.

The first thing you’ll see when you click on “Reports” and then “Running Tests” in Visual Website Optimizer is a summary of your current test. It shows the total visitors, total conversions, the conversion rate, and the number of variations and goals being tracked. It also provides a quick summary table that indicates with a red or green bar whether the new variation has improved conversion rates or not (green means it has and red means it has not). This data is helpful for a quick update at a glance, but you’ll learn more by clicking on the “Detailed Report” tab or button.

Note: The sample data in this example is not from a test conducted on CrazyEgg.com. We’re using it for illustration purposes, not to show real results from the Crazy Egg website.

After clicking on “Detailed Report,” you’ll see an option to toggle between goals, along with a chart of the conversion rate for each variation (including the control). The chart shows how much better one version performs compared to the other over the course of the test. In the example below, the new variation (the purple line) started out performing significantly better than the control, leveled off somewhat, and continued to outperform the control over the seven-day testing period.

Below the chart is a detailed view of the results for whichever goal you’ve selected. It compares the control against the variation(s) and shows the conversion rate, the conversion rate range, the percentage improvement, the chance to beat the original, and the number of conversions and visitors for each variation. We’ll look at these results more closely in the next section.

A Detailed Look At The Results

For our example stats, we’re measuring engagement, visits to the sign-up page, and free-trial sign ups. Let’s look at each of those more closely now.

Engagement

When we look at engagement, we see that the new variation increased engagement by 2.48% with an 82% chance to beat the original. Engagement for the new version was 60.69%, and engagement for the control was 59.23%.

Takeaway: Since this isn’t a critical stat for this test, we’ll monitor it as a way to learn more about the new variation, but we won’t put too much weight on it either way. Sometimes a version that decreases engagement leads to a higher conversion rate for other goals, which means less engagement can actually be better for some variations. We’re also not worried about reaching a 95% or higher chance to beat the original for engagement, since it’s not the most important stat in our test.
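
A quick aside on where the improvement number comes from: it appears to be the relative lift between the two rates. A tiny Python sketch using the engagement rates above:

```python
# Percentage improvement as relative lift, using the engagement
# rates reported above.
control = 0.5923     # control engagement: 59.23%
variation = 0.6069   # new variation engagement: 60.69%

lift = (variation - control) / control
print(f"{lift:.2%}")  # ~2.46%; the reported 2.48% likely reflects
                      # unrounded underlying rates
```

The same arithmetic reproduces the 2.5% and 8.21% figures for the other two goals exactly.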

Sign Up Page Visits

Next, we’ll look at the sign-up page visits. Our example test increased click-throughs to the sign-up page by 2.5% with a 72% chance to beat the original. Conversions for the control are 36.41%, and conversions for the new variation are 37.32%.

Takeaway: Once again, this isn’t the most important goal for this test, so we’ll take what we can learn from the improved conversion rate but won’t base the outcome of the test on this result. It’s nice to see that click-throughs have gone up, which means the top of our funnel is converting better, but there’s no guarantee this will lead to more sign-ups (although there’s a good chance it will, so it’s an encouraging improvement). This is another stat we can learn from, but not the one we’ll use to decide the winning variation.

Free Trial Sign Ups

The next and final stat is free trial sign ups. The primary goal for our test was to get more people to sign up for a free trial, so this is the most important goal of the three we’ve looked at so far. Even though the other goals increased conversions in earlier steps of the funnel, we won’t necessarily want to implement the change unless free trial sign ups improve as well.

The stats show that the new version increased conversions by 8.21% with a 92% chance to beat the original. The conversion rate for the control is 24.86%, and the conversion rate for the new variation is 26.9%.

Takeaway: These results are very promising. The test has been running for seven days, and the new variation has a 92% likelihood of improving conversions by 8.21%. It’s still too early to declare a winner given the recommendation to wait for a 95% chance to beat the original, but since we’re already at 92%, there’s a good chance we’ll get there. We’ll let the test run a bit longer until it reaches that threshold.
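
If you’re wondering how a “chance to beat original” figure can be produced, one common approach is to compare Beta posteriors over the two conversion rates. The sketch below isn’t necessarily VWO’s method, and the visitor counts are hypothetical, chosen only to roughly reproduce the rates above:

```python
# Monte Carlo "chance to beat original" under uniform Beta(1, 1)
# priors. Visitor counts are hypothetical; the report above only
# shows rates.
import random

def chance_to_beat(conv_a, n_a, conv_b, n_b, draws=100_000):
    wins = 0
    for _ in range(draws):
        rate_a = random.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = random.betavariate(1 + conv_b, 1 + n_b - conv_b)
        wins += rate_b > rate_a
    return wins / draws

# Hypothetical: 447/1,800 control conversions (~24.8%) vs.
# 484/1,800 for the variation (~26.9%)
print(chance_to_beat(447, 1800, 484, 1800))  # ~0.92
```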

Declaring A Winner

Based on these results, we’d declare the new version the winner once it reaches a 95% or higher likelihood of winning, assuming it continues to improve conversions until then. We can continue to look at the other results and conversion improvements, but since goal number three is the most important, we’ll declare the winner based on which version converts better for that goal.

And once we do declare a winner, we’ll hand the winning version’s information over to the IT department or a web developer to implement the change. Visual Website Optimizer makes it easy to test simple changes without getting a developer involved, but the winning changes eventually need to be made permanent on your site, which is where a web developer comes in.

What Did We Learn From The Test?

Did you find these results interesting? The new variation increased engagement by 2.48%, sign-up page visits by 2.5%, and free trial sign-ups by 8.21%. One of the biggest lessons from this test is that final goal completions matter more than micro-conversion improvements on intermediate goals. What we really care about is more people signing up for a free trial, so that’s the goal that determines the winner. Visits to the sign-up page only went up by 2.5%, but free trial sign-ups went up by 8.21%. If we were only measuring click-throughs to the sign-up page, we wouldn’t have realized how beneficial this test was for the stat that really matters. This shows how important it is to measure the most important goal for your site, not just micro-conversions.

It’s also important to point out that the goal that really matters the most is eventual paid account sign-ups. If we really want this test to improve the bottom line, we also need to measure the percentage of people that eventually sign up for a paid account. It’s possible that we increase the number of free trial sign ups but decrease the number of paid account sign ups, which means we need to measure all the way to the end of the funnel in order to get the most relevant results.

Additional Tools: Heatmaps & Clickmaps

Visual Website Optimizer also provides heatmaps and clickmaps you can use to gain further insight into how people interact with your website. You’ll know which links and buttons visitors click on the most, which ones they ignore, what they think is a link but isn’t, and where they might be getting confused. This information is useful for coming up with new test ideas, and you can review it during and after a test to learn even more about how visitors use your site.

This is an example of what heatmaps and clickmaps look like in Visual Website Optimizer.

Crazy Egg also provides heatmaps and clickmaps that offer even more insights about your visitors, such as clicks broken down by traffic source and scrollmaps that show how far down the page visitors scroll. The heatmaps and clickmaps provided by Visual Website Optimizer are a great start and a useful add-on for A/B tests, but Crazy Egg is better for long-term data gathering.

A Few More Tools

Visual Website Optimizer provides four more free tools that are useful for testing purposes.

Landing Page Analyzer

The Landing Page Analyzer walks you through a series of questions about your landing page and produces a score based on elements such as relevance, motivation, call to action, removing distractions, and reducing hesitation. Once you complete the questions, it also provides detailed feedback with recommendations on ways to improve your landing page.

Test Significance Calculator

The Test Significance Calculator helps you determine how significant your test results are. Significance is already calculated for you in Google Content Experiments, Visual Website Optimizer, and Optimizely, but if you run a test manually, you can use this calculator to verify your results.
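
For a sense of what such a calculator does under the hood, here’s a generic two-proportion z-test in Python; the counts are made up, and the calculator’s exact formula may differ:

```python
# Two-sided p-value for a difference between two conversion rates,
# via a pooled two-proportion z-test. Counts are hypothetical.
from math import sqrt
from statistics import NormalDist

def p_value(conv_a, n_a, conv_b, n_b):
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (rate_b - rate_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Significant at the 95% level when p < 0.05
print(p_value(447, 1800, 520, 1800))  # ~0.006, significant
```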

A/B Ideafox

The A/B Ideafox allows you to search for case studies based on challenge (sales, downloads, signups) or website type. You can also use the search feature to find case studies based on any type of criteria you’d like to enter. The A/B Ideafox is a great place to learn more about CRO and to find studies that will reveal potential testing opportunities on your site.

Test Duration Calculator

The fourth and final tool is the Test Duration Calculator, which estimates how long you should run a test in order to get statistically significant results. You may have started a test in VWO that’s improving conversions by 8%, but not know how long it will take for that result to become significant. The calculator estimates this based on the existing conversion rate, the expected improvement, the number of combinations, the average number of daily visitors, and the percentage of visitors included in the test.
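
To get a feel for the math behind a duration estimate, here’s a rough sketch built on the standard two-proportion sample-size formula. The 5% significance level, 80% power, and 1,000 daily visitors are assumptions for illustration; this is not VWO’s actual formula or our site’s real traffic:

```python
# Rough test-duration estimate: required sample size per variation
# (standard two-proportion formula), scaled by eligible daily
# traffic. Assumes a 5% significance level and 80% power.
from math import ceil
from statistics import NormalDist

def days_to_run(baseline_rate, relative_lift, combinations,
                daily_visitors, pct_included=1.0,
                alpha=0.05, power=0.80):
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    z = NormalDist().inv_cdf(1 - alpha / 2) + NormalDist().inv_cdf(power)
    n_per_variation = (p1 * (1 - p1) + p2 * (1 - p2)) * (z / (p2 - p1)) ** 2
    total_visitors = n_per_variation * combinations
    return ceil(total_visitors / (daily_visitors * pct_included))

# Our example: 24.86% baseline, 8% expected lift, control + variation,
# 1,000 hypothetical daily visitors, all included in the test
print(days_to_run(0.2486, 0.08, 2, 1000))  # ~16 days under these assumptions
```

Drop the expected lift from 8% to 2% with the same inputs and the estimate balloons from about two weeks to the better part of a year, which is exactly the traffic point made below.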

One Final Point About Traffic

Something else to keep in mind is that the amount of traffic your site receives determines how long you’ll need to run a test in order to get statistically relevant results. If your site receives a large amount of daily traffic, then you’ll get relevant results faster than a site that receives a small amount of daily traffic.

In addition, smaller differences in conversion rates require you to run a test for longer. If you don’t have much traffic and if the difference between variations is small, you may need to run the test for a really, really long time. The good news is you can use the Test Duration Calculator mentioned above to estimate how long you’ll need to run a test in order to get significant results. You may also want to consider finding ways to drive new traffic in order to determine a winning variation in a shorter amount of time.


Conclusion

  • You should always run your tests until there’s at least a 95% likelihood that the new variation will increase conversions.
  • Your tests should last for a minimum of seven days in order to make sure the results can stand up over time.
  • The percentage improvement for the most important goal for your project matters more than anything else. If your goal is to generate more leads, then you should base the results of the test on that, not the number of people who click on your “Learn More” button.
  • Other goals can be tracked in addition to the goal that will determine a winner, but you shouldn’t base the success or failure of the test on those results.
  • If you have a small amount of traffic and a small improvement in conversions, you’ll need to run your test for a long period of time. The Test Duration Calculator will give you an idea of how long you’ll need to run your test based on the expected improvement and the amount of traffic your site receives.