Periscope Data
The world’s fastest visualization platform for data analysts.

Calculating Proportional Values in SQL

September 7, 2016

SQL is great for aggregating: counting, summing, and averaging numbers are straightforward tasks in SQL. When looking at a distribution, it is often useful to look at them in context and compare the relative size of each part of the proverbial pie.

We want to track the number of users that we acquire from each marketing source. To do this we can write a simple query to calculate users by source:

select
  date_trunc('week',created_at) as date
  , source
  , count(1)
from
  users
group by
  date
  , source

We can plot this data, and track the growth of users by source over time:

It’s great to know how many users are coming from each source, but what if we want to know what proportion of sources are coming from a given source on any given week? Let’s take a look!

Redshift

Redshift has the handy ratio_to_report window function to help us out. The ratio_to_report function divides the current value by the sum of all values in the dictated partition. In our query, we want to divide the count of users from one source by the count of users from all sources.

select
  date_trunc('week',created_at) as date
  , source
  , count(1) as count 
  , ratio_to_report(count(1)) over (
    partition by date_trunc('week',created_at)
  ) as proportion
from
  users
group by
  date

Postgres

Postgres does not have the same ratio_to_report function, but we can use sum() over in a window function to achieve a similar result. We will first create a common table expression with the count of users:

with users_by_source as (
select
  date_trunc('week',created_at) as date
  , source
  , count(1) as count 
from
  users
group by
  date
  , source
)

We can then use our users_by_source CTE, with a sum() over function to divide by the total for each week.

select 
  date
  , source
  , 1.0 * count / sum(count) over (partition by date)
from users_by_source

MySQL

If our database is MySQL, or if we don’t want to use window functions, we can do this with a few simple subqueries. We will first want to calculate the total for each date:

select 
  yearweek(created_at) as date 
  , count(1) as count
from 
  users
group by 
  date

Next, we want the users by source:

select 
  yearweek(created_at) as date
  , source
  , count(1) as count
from 
  users
group by 
  date, source

Finally, we can join and divide:

select
  T1.date
  , t1.source
  , t1.count / t2.count
from
  (
    select
      yearweek(created_at) as date
      , source
      , count(1) as count
    from
      Users
    group by
      date
      , source
  )
  as t1
  join(
    select
      yearweek(created_at) as date
      , count(1) as count
    from
      Users
    group by
      date
  )
  as t2 on
    t1.date = t2.date

Periscope

In Periscope, you can make your bar chart proportional for a visual representation of the data:

You can use Periscope’s pivot tables to quickly calculate the percentage-of-column to show the data in tabular form.

Now you can spend less time writing queries and more time analyzing the results of your data!

Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?