Want to make data analysis fast for everyone?Join Us!
The best way to make your SQL queries run faster is to have them do less work, and a great way to do less work is to query a materialized view that’s already done the heavy lifting.
Materialized views are particularly nice for analytics queries, where many queries do math on the same basic atoms, data changes infrequently (often as part of hourly or nightly ETLs), and those ETL jobs provide a convenient home for view creation and maintenance logic.
Redshift doesn’t yet support materialized views out of the box, but with a few extra lines in your import script (or a tool like Periscope), creating and maintaining materialized views as tables is a breeze.
Lifetime Daily ARPU (average revenue per user) is common metric and often takes a long time to compute. Let’s speed it up with materialized views.
Calculating Lifetime Daily ARPU
This common metric shows the changes in how much money you’re making per user over the lifetime of the your product.
|Sum of purchases up to Date|
|Unique user count up to Date|
For that we’ll need a
purchases table and a
gameplays table, and the lifetime accumulated values for each date. Here’s the SQL for calculating lifetime gameplays:
with lifetime_gameplays as ( select dates.d, count(distinct gameplays.user_id) as count_users from ( select distinct date(created_at) as d from gameplays ) as dates inner join gameplays on date(gameplays.created_at) <= dates.d group by d ),
The range join in the correlated subquery lets us recalculate the distinct number of users for each date.
Here’s the SQL for lifetime purchases in the same format:
lifetime_purchases as ( select dates.d, sum(price) as sum_purchases from ( select distinct date(created_at) as d from purchases ) as dates inner join purchases on date(purchases.created_at) <= dates.d group by d )
Now that the setup is done, we can calculate lifetime daily ARPU:
with lifetime_gameplays as (...), lifetime_purchases as (...) select lifetime_gameplays.d as date, round( lifetime_purchases.sum_purchases / lifetime_gameplays.count_users , 2 ) as arpu from lifetime_purchases inner join lifetime_gameplays on lifetime_purchases.d = lifetime_gameplays.d order by lifetime_gameplays.d
That’s a monster query and it takes minutes to run on a database with 2B gameplays and 3M purchases. That’s way to slow, especially if we want to quickly slice by dimensions like what platform the game was played on. Plus, similar lifetime metrics will need to recalculate the same data over and over again!
Easy View Materialization on Redshift
Conveniently, we wrote our query in a format that makes it obvious which parts can be extracted into materialized views:
We’ll fake view materialization in Redshift by creating tables, and Redshift makes it easy to create tables from snippets of SQL:
create table lifetime_purchases as ( select dates.d, sum(price) as sum_purchases from ( select distinct date(created_at) as d from purchases ) as dates inner join purchases on date(purchases.created_at) <= dates.d group by d )
Do the same thing for
lifetime_gameplays, and and calculating Lifetime Daily ARPU now takes less than a second to complete!
Redshift is especially great for this kind of optimization because data on a cluster usually changes infrequently, often as a result of hourly or nightly ETLs.
Remember to drop and recreate these tables every time you upload data to your Redshift cluster to keep them fresh. Or create views in Periscope instead, and we’ll keep them up to date automatically!