Want to make data analysis fast for everyone?Join Us!
SQL is great for aggregating: counting, summing, and averaging numbers are straightforward tasks in SQL. When looking at a distribution, it is often useful to look at them in context and compare the relative size of each part of the proverbial pie.
We want to track the number of users that we acquire from each marketing source. To do this we can write a simple query to calculate users by source:
We can plot this data, and track the growth of users by source over time:
It’s great to know how many users are coming from each source, but what if we want to know what proportion of sources are coming from a given source on any given week? Let’s take a look!
Redshift has the handy ratio_to_report window function to help us out.
ratio_to_report function divides the current value by the sum of all values in the dictated partition.
In our query, we want to divide the count of users from one source by the count of users from all sources.
Postgres does not have the same ratio_to_report function, but we can use
sum() over in a window function to achieve a similar result.
We will first create a common table expression with the count of users:
We can then use our
users_by_source CTE, with a
sum() over function to divide by the total for each week.
If our database is MySQL, or if we don’t want to use window functions, we can do this with a few simple subqueries. We will first want to calculate the total for each date:
Next, we want the users by source:
Finally, we can join and divide:
In Periscope, you can make your bar chart proportional for a visual representation of the data:
You can use Periscope’s pivot tables to quickly calculate the percentage-of-column to show the data in tabular form.
Now you can spend less time writing queries and more time analyzing the results of your data!