No items found.
Analysis

Adding a Total Row or Column to Your Cohort

This post was originally published on the Periscope Data Community. To see more content like this and to communicate with data experts like you, join the community.

In many cases, a cohort grid can benefit from showing the rolled-up row or column totals. This can be accomplished within the query by UNIONing the appropriately grouped columns.

Let's start by taking a look at the structure of our baseline cohort grid:

We have three columns:

  1. Month
  2. Source
  3. Num_Users

Our goal will now be to add a "Total" row and column to our cohort grid that sums up the number of users for each.

Adding a Total Row:

First, we can start by building a new CTE to give us a total row (i.e. a row that shows the sum for each of our sources). Because this row will sum up the values by the source, we can use a 'Total' placeholder text for the month column. Our SQL for this CTE will look like:

, row_total AS
(
  select
    'Total'::text
    , source
    , sum(num_users)
  from
    users_by_source
  group by
   1
   , 2
)

it's important to note that we cast the 'Total' value to explicitly be text because we plan on UNIONing this back to our original CTE. In order for this to succeed, we would also want to make sure to cast our original "month" conversion in the users_by_source CTE to also be text. Now, our final query and result will look like:

Adding a Total Column:

We can now use similar logic for adding a total column. But instead of selecting and grouping by the source, we'll want to select and group by our month column:

, total_column AS
(
  select
    month
    , 'Total'::text
    , sum(num_users)
  from
    users_by_source
  group by
    1
    , 2
)

Adding this into our previous query, we'll now have:

You'll notice that the ordering doesn't come out quite the way we're aiming for. This is due to the inconsistent nature of ordering in SQL if we haven't explicitly added an ORDER BY statement. We'll now account for this in our final step.

Ordering Total Row/Column:

The simplest way to approach this is to add one more static, numeric column into each of our CTEs. We'll then use this column to define our final ordering. The important factors to keep in mind here are:

1. We want the 'Total' columns to appear at the end, so the column we add into their CTE's should have a larger value to order by
2. Accordingly, our baseline CTE ("users_by_source") should have a smaller value than the equivalent column in our total CTE's
3. Because our "Total" and now "Ordering" columns are going to be static, we don't necessarily have to include them in our grouping. This is why the CTE's in the SQL below only group by the non-static columns. But, it's also acceptable to do: "group by 1,2,4" in each CTE.

The final query with this new "Ordering" column is shown below:

with
 users_by_source as (
   select
     ([created_at:month])::text as month
     , source
     , count(1) as num_users
     , 1 as ordering
   from
     users
   group by
     1
     , 2
 )
 , total_row as (
   select
     'Total'::text
     , source
     , sum(num_users)
     , 2 as ordering
   from
     users_by_source
   group by
     2
 )
 , total_column as (
   select
     month
     , 'Total'::text
     , sum(num_users)
     , 2 as ordering
   from
     users_by_source
   group by
     1
 )
, roll_up as (
 select * from users_by_source
 union
 select * from total_row
 union
 select * from total_column
)
select * from roll_up
order by ordering, month

And voilà! Our final display now properly places our 'Total' row and column at the end of their respective groupings:

Tags: 

Want to discuss this article? Join the Periscope Data Community!

Andreas Djokic
Andreas moved to San Francisco to chase his startup dreams. Following his backend work with algorithm design, he joined Periscope Data to help others find the fun side of data.