<p><em>Editor&#8217;s Note: This post was originally written by Ryan Iyengar. If you&#8217;re interested in seeing more of Ryan&#8217;s content, visit </em><a href="https://ryaniyengar.com/" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)"><em>his Medium account</em></a><em>.</em></p>



<p>For a lot of e-commerce, subscription or internet companies, A/B testing is a way of life. There are many tools and software services out there for setting up and reporting on tests, but if you happen to be in the privileged position of owning your own A/B test data in a relational database, then it’s worth setting up your own custom reporting in SQL.</p>



<p>I’ll be using <a href="https://www.sisense.com/product/data-teams/">Sisense for Cloud Data Teams</a> for this exercise, and some of their SQL shorthand to speed up my query writing. Specifically, I’ll be referring to sample data loaded into SQL Views, which are referred to within brackets as [table_name].</p>



<h2 class="wp-block-heading"><strong>Example A/B Test Data</strong></h2>



<p>For this exercise, I’ve generated two tables of sample data, in a format you might expect to find it in a relational database. Check out my blog post on <a href="https://ryaniyengar.com/generating-sample-sql-data-in-excel-and-google-sheets-76d851afd306" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">how to generate sample data in Excel and Google Sheets</a> if you’re interested in more specifics. You can also check out the <a href="https://docs.google.com/spreadsheets/d/1x-jO246uH5SQQnXJgY1WXFKbzYj_zLYw63ZJ5yf6lco/edit#gid=0" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Google Sheet for A/B Test Sample Data here</a>.</p>



<h3 class="wp-block-heading"><strong>‍ab_test_entrants</strong></h3>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/AB-Test-Entrants.png" alt="AB Test Entrants" class="wp-image-74945"/></figure>



<p>This table contains the date a given user_id was entered into a test, the name of the test they entered into, and finally the version of the test they’re a member of. Tables like this should usually have a uniqueness constraint across user_id / test_name, so a business user knows that one user didn’t see multiple versions of a test. This particular sample table refers only to a single test, “new_funnel_test_2017–12–01”, but could theoretically contain more.</p>



<h3 class="wp-block-heading"><strong>ab_test_conversions</strong></h3>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/AB-Test-Conversions.png" alt="AB Test Conversions" class="wp-image-74955"/></figure>



<p>This table refers to only the user_ids that reached a conversion step, and the date when they converted. Tables like this can get hairy to validate with conversions that can happen multiple times per user (e.g. e-commerce sales) and running multiple tests for the same user (e.g. running distinct tests in subsequent months). This is a simple example that dodges a lot of those complications by maintaining only one test.</p>



<h2 class="wp-block-heading"><strong>Simple A/B Test Validations</strong></h2>



<p>Before I start digging into calculating conversion rates, I like to validate that my test data is correct. That means I don’t have orphaned users, my conversions are happening after the test started, and of course that I’m collecting multiple converting users across my expected test versions. Here’s some quick “unit testing” style queries that validate those questions.</p>



<p><strong>Check for orphaned users</strong></p>



<p>With a FULL OUTER JOIN between the entrants and converting users, how many users fall into the three possible buckets: Entered but not converted, entered and converted, and converted but not entered?</p>



<pre class="wp-block-code"><code>select
 count(
   case
     when ab_test_entrants.user_id is not null
     and ab_test_conversions.user_id is null
       then 1
     else null
   end
 )
 as users_entered_but_not_converted
 , count(
   case
     when ab_test_entrants.user_id is not null
     and ab_test_conversions.user_id is not null
       then 1
     else null
   end
 )
 as users_entered_and_converted
 , count(
   case
     when ab_test_entrants.user_id is null
     and ab_test_conversions.user_id is not null
       then 1
     else null
   end
 )
 as users_converted_and_not_entered
from
 [ab_test_entrants]
 full outer join [ab_test_conversions] using (user_id)</code></pre>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/AB-Test-Users-Entered-and-Converted.png" alt="AB Tests Users Entered and Converted" class="wp-image-74950"/></figure>



<p>So far so good, 40 converted out of 200 total, with no orphaned converted users polluting my pool for analysis.</p>



<h3 class="wp-block-heading"><strong>Check for conversions after test entries</strong></h3>



<p>With a LEFT JOIN starting from the AB test entrants, what is the distribution of days it takes to get from entry to conversion? Hopefully I see numbers greater than 0 to tell me I’m looking at the right time frame for conversion.</p>



<pre class="wp-block-code"><code>select
 datediff('day', date(ab_test_entrant_date), date(conversion_date)) as days_from_entry_to_conversion
 , count(*) as num_entrants
from
 [ab_test_entrants]
 left join [ab_test_conversions] using (user_id)
group by
 1</code></pre>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Days-from-AB-Test-Entry-to-Conversion.png" alt="Days from AB test Entry to Conversion image" class="wp-image-74961"/></figure>



<p>Also so far so good! Looks like this sample data set took at most 7 days to get from entry to conversion.</p>



<h3 class="wp-block-heading"><strong>Check for data populating in each bucket</strong></h3>



<p>Finally, do I have the expected number of versions, collecting conversions in each bucket?</p>



<pre class="wp-block-code"><code>select
 ab_test_version
 , count(ab_test_entrants.user_id) as ab_test_entrants
 , count(ab_test_conversions.user_id) as ab_test_conversions
from
 [ab_test_entrants]
 left join [ab_test_conversions] using (user_id)
group by
 1</code></pre>



<figure class="wp-block-image"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/AB-Test-Entrants-and-Conversions-by-Version-770x251.png" alt="AB Test Entrants and Conversions by Version" class="wp-image-74940"/></figure>



<p>Yep! Two versions, both starting to collect real data to analyze.</p>



<h2 class="wp-block-heading"><strong>A/B Test Conversion Rate Calculations</strong></h2>



<p>The simplest way to compare versions in an A/B test like this, is using conversion rates. So total number of unique users that converted in a bucket, divided by total number of users entered into that same bucket. From the above graph, the orange bar divided by the blue bar.</p>



<pre class="wp-block-code"><code>select
 ab_test_version
 , count(ab_test_conversions.user_id)::float /    
   count(ab_test_entrants.user_id) as conversion_rate
from
 [ab_test_entrants]
 left join [ab_test_conversions] using (user_id)
group by
 1</code></pre>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Conversion-Rate-by-Version.png" alt="Conversion Rate by Version" class="wp-image-74967"/></figure>



<p>Our test appears to be winning, 22% of users are converting vs. 18% in the control treatment! Now, how do we know if that’s a statistically significant result?</p>



<h2 class="wp-block-heading"><strong>Calculating A/B Test Confidence Intervals in SQL</strong></h2>



<p>I’m going to crib heavily from a Sisense for Cloud Data Teams blog post on the topic, and estimate a confidence interval around the initial results using just SQL. In simple terms, the more data you have, the tighter the confidence interval, and the more you should believe that the 4 percentage point difference above is a real thing, not a statistical accident.</p>



<pre class="wp-block-code"><code>with
 ab_test_conversions as (
   select
     ab_test_version
     , count(ab_test_entrants.user_id) as ab_test_entrants
     , count(ab_test_conversions.user_id) as ab_test_conversions
   from
     [ab_test_entrants]
     left join [ab_test_conversions] using (user_id)
   group by
     1
 )
 , ab_test_conversion_rates as (
   select
     ab_test_version
     , ab_test_entrants
     , ab_test_conversions
     , (ab_test_conversions + 1.92) / (ab_test_entrants + 3.84 )::float as conversion_rate
   from
     ab_test_conversions
 )
 , conversion_rate_standard_error as (
   select
     ab_test_version
     , ab_test_entrants
     , ab_test_conversions
     , conversion_rate
     , sqrt(conversion_rate * (1 - conversion_rate) / ab_test_entrants) as standard_error
   from
     ab_test_conversion_rates
 )
select
 ab_test_version
 , ab_test_entrants
 , ab_test_conversions
 , conversion_rate - standard_error * 1.96 as conversion_rate_low
 , conversion_rate
 , conversion_rate + standard_error * 1.96 as conversion_rate_high
from
 conversion_rate_standard_error</code></pre>



<p>Stepping through the above, I’m doing one operation per CTE expression:</p>



<ul><li>Counting the same conversions and entrants as the previous conversion rate chart by version</li><li>Modifying the numerator and denominator slightly to calculate the conversion rate, as per the <a href="http://www.measuringux.com/adjustedwald.htm" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Adjusted Wald Method</a></li><li>Calculating the <a href="https://en.wikipedia.org/wiki/Standard_error" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Standard Error</a> of that conversion rate</li><li>Padding my calculated conversion rate on either side by multiplying that standard error by <a href="https://en.wikipedia.org/wiki/1.96" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">1.96</a>, to estimate the area under which 95% of cases probably fall</li></ul>



<p>That will generate 3 separate conversion rate estimates per test version, which can be a lot to visualize. My preferred method in Sisense for Cloud Data Teams is the combo scatter / bar.</p>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Conversion-Rate-with-Confidence-Intervals-by-Version.png" alt="Conversion rate with confidence intervals by version" class="wp-image-74978"/></figure>



<p>So now we can see that even though our Test version is winning at the moment, there’s a decent chance it could be due to variance. If the conversion_rate_low of the Test version continues to increase over time, and eventually rises above the conversion_rate_high of the Control version, we can say: “with 95% confidence, the Test variant is a winner”.</p>



<h3 class="wp-block-heading"><strong>Cumulative A/B Test Conversion Rates Over Time</strong></h3>



<p>Another way I like to visualize the variance in these kinds of test results, is through a cumulative conversion rate over time.</p>



<pre class="wp-block-code"><code>with
 ab_test_conversions_over_time as (
   select
     date(ab_test_entrant_date) as ab_test_entrant_date
     , ab_test_version
     , count(ab_test_entrants.user_id) as ab_test_entrants
     , count(ab_test_conversions.user_id) as ab_test_conversions
   from
     [ab_test_entrants]
     left join [ab_test_conversions] using (user_id)
   group by
     1
     , 2
 )
 , ab_test_conversions_cumulative as (
   select
     ab_test_entrant_date
     , ab_test_version
     , sum(ab_test_entrants) over(partition by ab_test_version order by ab_test_entrant_date rows unbounded preceding) as cumulative_entrants
     , sum(ab_test_conversions) over(partition by ab_test_version order by ab_test_entrant_date rows unbounded preceding) as cumulative_conversions
   from
     ab_test_conversions_over_time
 )
select
 ab_test_entrant_date
 , ab_test_version
 , cumulative_conversions::float / cumulative_entrants as cumulative_conversion_rate
from
 ab_test_conversions_cumulative</code></pre>



<p>So starting from the same data set, but this time also grouping by date of entry, we can see how each day’s results affect the final outcome. Because tests like these tend to vary with different winners on different days, I prefer to track to cumulative outcome rather than daily conversion rate. It’s more clear which variant is trending towards winning, and easier to see noise and variance as well.</p>



<p>The intermediary CTE ab_test_conversions_cumulative looks like this:</p>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Cumulative-Conversion-Rates-over-Time-by-Version-table.png" alt="Cumulative Conversion Rates over Time" class="wp-image-74989"/></figure>



<p>So you can already tell that the Test variant gets a slower start out of the gate, but quickly caught up and overtook the control variant. If you take the final step in the query and divide cumulative_conversions into cumulative_entrants, you get this nice looking graph:</p>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Cumulative-Conversion-Rates-over-Time-by-Version-chart.png" alt="Cumulative Conversion Rates over Time" class="wp-image-74984"/></figure>



<p>The most variant tests tend to cross over in cumulative results quite often. If instead you launch a test with a clear winner, it might take off early and never cross back over.</p>



<p>A subtle nuance to look out for here, is that by grouping by the entry date, but conversion dates can be much later, earlier dates on this graph can end up re-stated if more conversions appear. I don’t mind looking at charts that vary in the past, but for those less comfortable with that, there are ways to modify. My favorite of which is applying a “time-limit” to your numerator, so that instead of tabulating all conversions, you instead only consider conversions that happened within 7 days of their entry. That necessitates you wait at least 7 days before judging results to be final, but that delay can sometimes be a helpful bit of friction. Imagine if we had called this particular example test within 2 days, we would have thought it was a failure!</p>



<h3 class="wp-block-heading"><strong>Parameterized SQL to make reporting on new A/B tests a breeze</strong></h3>



<p>Possibly my favorite feature of Sisense for Cloud Data Teams is their Parameterized SQL Snippets. You can take arbitrarily complex SQL that you need to run multiple times with only slight changes, and turn those slight changes into variables to be plugged in. As an example, we can take the same code from above, add a WHERE clause in the first CTE, and populate it with [ab_test_name], a dynamic parameter defined in the name of our Parameterized SQL Snippet, ab_test_confidence_intervals(ab_test_name).</p>



<pre class="wp-block-code"><code>Parameterized SQL Snippet:
ab_test_confidence_intervals(ab_test_name)
with
 ab_test_conversions as (
   select
     ab_test_version
     , count(ab_test_entrants.user_id) as ab_test_entrants
     , count(ab_test_conversions.user_id) as ab_test_conversions
   from
     [ab_test_entrants]
     left join [ab_test_conversions] using (user_id)
   where
     ab_test_name=[ab_test_name] -- dynamic SQL Snippet parameter
   group by
     1
 )
 , ab_test_conversion_rates as (
   select
     ab_test_version
     , ab_test_entrants
     , ab_test_conversions
     , (ab_test_conversions + 1.92) / (ab_test_entrants + 3.84 )::float as conversion_rate
   from
     ab_test_conversions
 )
 , conversion_rate_standard_error as (
   select
     ab_test_version
     , ab_test_entrants
     , ab_test_conversions
     , conversion_rate
     , sqrt(conversion_rate * (1 - conversion_rate) / ab_test_entrants) as standard_error
   from
     ab_test_conversion_rates
 )
select
 ab_test_version
 , ab_test_entrants
 , ab_test_conversions
 , conversion_rate - standard_error * 1.96 as conversion_rate_low
 , conversion_rate
 , conversion_rate + standard_error * 1.96 as conversion_rate_high
from
 conversion_rate_standard_error</code></pre>



<p>After saving that SQL Snippet, I can create a simple chart in Sisense for Cloud Data Teams with the below single line of code:</p>



<pre class="wp-block-code"><code>[ab_test_confidence_intervals('new_funnel_test_2017-12-01')]</code></pre>



<p>Need a new test’s results visualized? No problem! Drop a new ab_test_name into the snippet, and you’ll get those results instead.</p>



<h2 class="wp-block-heading"><strong>Final thoughts</strong></h2>



<p>Not everyone has the pleasure of working directly with A/B test data in a relational database (or possibly the headaches of maintaining that data!). But for those that do, it provides ample ground to experiment with visualizations and transformations like the above examples. I’ve found SQL to be an incredibly powerful and customizable tool for exploring this kind of data, and I hope this helps you get started with it too!</p>


A/B Test Reporting and Visualization in SQL

LinkedIn

Twitter

GitHub

curve-image-unique-image-unique

curve

3-dark-2-image-unique-image-unique

3 DARK 2

Get the latest in analytics right in your inbox.

Article