Common Statistical Operations
Cohorted Engagement: 7-day and 30-day Metrics
When tracking user growth, it is helpful to look at multiple time periods. In a previous post, Cohorted Engagement with Comparable Time Windows in SQL, we covered creating charts of the form: “How many times does a user play our game in the first 30 days, and how has this varied for the users that started in January, February, and so on?”
Today we’ll extend our previous technique. Our goal is to create charts like this one:
The blue line is the number of users we’ve added in the first seven days after a new company account is created. For example, each week in January over a thousand new users were added within the first seven days of a company signing up. The orange line is the first 30 days, and it ends early because the full 30 days hasn’t passed yet - each day the blue, orange, and green lines will extend an extra day.
We can spot several trends in this chart. The week of March 14th and May 16th both have large downward cliffs and warrant further investigation. The d7 users trendline has been consistent throughout the year, but there’s a worrying d90 users trend at the beginning: it could be accounts aren’t growing and expanding as much, or new accounts are from smaller companies.
We won’t know how April and May did on d90 for another couple months but we can cautiously hope that it isn’t getting worse: our d30 trendline, on average, isn’t worse than the low point on March 14th.
Let’s walk through creating this chart. First, we’ll start with a table of users and a table of accounts, where each company has many users.
Let’s create a helper CTE, a with clause in postgres, to pull out some intermediate calculations and join the tables.
with new_users as (
date_diff('day', accounts.created_at, user.created_at)
date_diff('day', account.created_at, now())
left join users on accounts.id = users.account_id
group by 1,2,3
select * from new_users
Running this query builds our new_users table:
As an example of the two helper fields, if a company that signed up 20 days ago had a user sign up after 5 days, account_age_when_invited would be 5, and account_age would be 20. We’d want to count that user in 7-day metrics because they were invited in the first 7 days of the account, and the account age is more than 7.
With that, we can easily make our trend lines:
with new_users as (
sum(case when account_age >= 7 and account_age_when_invited < 7
then users else null end) as d7_users,
sum(case when account_age >= 30 and account_age_when_invited < 30
then users else null end) as d30_users,
sum(case when account_age >= 90 and account_age_when_invited < 90
then users else null end) as d90_users,
group by 1
We get our resulting table
and produce our final chart!
We aggregate the days up to weeks to smooth out the daily variance. Each time window is a separate sum statement. The account_age check is what causes the lines to end earlier for larger time windows, and the account_age_when_invited restricts the user count to just the users that fit in the N-day window.
That’s the general technique - good luck in slicing your data by multiple cohorted time windows!