Finding the min, max, and avg price of an order in your purchases table is easy. But what about the median price? Medians are much more difficult for the database to compute, so there usually isn’t a built-in function like the following:



<pre class="wp-block-code"><code>-- not going to work!
select median(price) from purchases</code></pre>



<h2 class="wp-block-heading">Why is Median Hard?</h2>



For the standard aggregate functions (min, max, count, etc.) it’s possible to get the result in a single pass over the data, storing the current min/max/count along the way. If the column is indexed, the minimum or maximum value can be found without looking at any rows!



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/image-01-14.png" alt="scan to min" class="wp-image-78576"/></figure>



Median cannot work like that because it’s algorithmically much more complicated. The median is the middle value. The data needs to be sorted so that the middle record can be found.



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/image-02-14.png" alt="scan sort min" class="wp-image-78582"/></figure>



SQL Analytics Starter Kit: Best Practices, Tips, and Tricks:



<a class="action-btn " href="https://www.sisense.com/whitepapers/sql-analytics-best-practices-tips-and-tricks/" target="_blank" rel="noopener noreferrer">Get the Starter Kit</a>



<h2 class="wp-block-heading">Finding the Median with Different Databases</h2>



Since median is harder than min/max/count and not a standard aggregate function, we’ll need to do a little more work to calculate it. Here’s how on several different databases:



<h3 class="wp-block-heading">Median on Redshift</h3>



The Redshift team recently released a <a href="https://docs.aws.amazon.com/redshift/latest/dg/r_Examples_of_median_WF.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">median window function</a>, making it one of the easiest syntaxes to find the median with:



<pre class="wp-block-code"><code>select median(price) over () as median
from purchases
limit 1</code></pre>



Note the&nbsp;limit 1: Since&nbsp;median&nbsp;is a window function and not an aggregate function, it’ll return one value for each row in the table.



<h3 class="wp-block-heading">Median on Postgres</h3>



If you like defining your own functions in Postgres, the&nbsp;<a href="https://wiki.postgresql.org/wiki/Aggregate_Median" target="_blank" rel="noreferrer noopener">Postgres Wiki has a definition for&nbsp;median</a>. We’ll do it in SQL and get Postgres to help us find the middle value by numbering all the rows with the&nbsp;row_number()&nbsp;window function.



First, a CTE to sort and number all of the rows, with a&nbsp;count&nbsp;that’ll help later on:



<pre class="wp-block-code"><code>with ordered_purchases as (
 select
 price,
 row_number() over (order by price) as row_id,
 (select count(1) from purchases) as ct
 from purchases
)</code></pre>



 Then we find the middle one or two rows and average their values: 



<pre class="wp-block-code"><code>select avg(price) as median
from ordered_purchases
where row_id between ct/2.0 and ct/2.0 + 1</code></pre>



The&nbsp;where&nbsp;clause ensures that we’ll get the two middle values if there is an even number of values, and the single middle number if there is an odd number of values because&nbsp;between&nbsp;is inclusive of its bounds.



<h3 class="wp-block-heading">Median on MySQL</h3>



MySQL might not have window functions, but it does have variables, and we’ll use them to achieve the same result.



First we’ll set two variables, one for the row count and one to act as the&nbsp;row_id&nbsp;from before:



<pre class="wp-block-code"><code>set @ct := (select count(1) from purchases);
set @row_id := 0;</code></pre>



And just like before, we average the middle one or two values:



<pre class="wp-block-code"><code>select avg(price) as median
from (select * from purchases order by price)
where (select @row_id := @row_id + 1)
between @ct/2.0 and @ct/2.0 + 1</code></pre>



The @row_id := @row_id + 1 syntax simply increments the @row_id counter for each row. Unlike Postgres we don’t need to build up a temporary result set of rows with row_id because variables let us compute the row_id on the fly.



<h2 class="wp-block-heading">Bonus Round: Mode</h2>



No post with avg and median is complete without mode! The mode (defined as the most frequent value) of a column can be found with a simple group and count:



<pre class="wp-block-code"><code>select price as mode
from purchases
group by 1
order by count(1) desc
limit 1</code></pre>



This query gives us the&nbsp;price&nbsp;with the greatest&nbsp;count()&nbsp;in the table, also known as the mode.



And now you know how to compute median in SQL for Redshift, Postgres and MySQL databases!



SQL Analytics Starter Kit: Best Practices, Tips, and Tricks:



<a class="action-btn " href="https://www.sisense.com/whitepapers/sql-analytics-best-practices-tips-and-tricks/" target="_blank" rel="noopener noreferrer">Get the Starter Kit</a>

Medians in SQL

SQL Superstar

LinkedIn

Twitter

GitHub

curve-image-unique-image-unique

curve

3-dark-2-image-unique-image-unique

3 DARK 2

Get the latest in analytics right in your inbox.

Article