Periscope Data
The world’s fastest visualization platform for data analysts.

Selecting Only One Row Per Group

July 18, 2014

Selecting the First Row for each Group

Sometimes you just want to graph the winners. Were there more iOS or Android users today? Grouping and counting the daily usage per platform is easy, but getting only the top platform for each day can be tough.

Unlike joining only the first row, primary and foreign keys won’t be of much use, so we’ll need a different approach to make a chart like this:

Let’s start with daily counts per platform from our gameplays table:

select date(created_at) dt, platform, count(1) ct
from gameplays
group by 1, 2

This gives us a familiar table, with one date spanning multiple rows:

     dt     | platform |   ct   
------------+----------+--------
 2014-06-30 | iOS      |  49751
 2014-06-30 | android  |  80781
 2014-06-29 | iOS      | 158909
 2014-06-29 | android  |  91380
 2014-06-28 | iOS      | 108206
 2014-06-28 | android  |  95363
 2014-06-27 | iOS      | 105756
 2014-06-27 | android  |  92316 

We want a table with one row per date with the highest count, like this:

     dt     | platform |   ct   
------------+----------+--------
 2014-06-30 | android  |  80781
 2014-06-29 | iOS      | 158909
 2014-06-28 | iOS      | 108206
 2014-06-27 | iOS      | 105756

 

Postgres and Redshift

As usual on Postgres and Redshift, window functions make this an easy task. We’ll use the row_number() function partitioned by date in an inner query, and then filter to row_num = 1 in the outer query to get just the first record per group.

This function numbers each of the rows:

row_number() over (partition by dt order by ct desc) row_num

We’ll plug it into an inner query, like so, to get the desired results:

select dt, platform, ct 
from (
  select
    date(created_at) dt, 
    platform, 
    count(1) ct,
    row_number() over
      (partition by dt order by ct desc) row_num
  from gameplays
  group by 1, 2
) t
where row_num = 1

 

MySQL

Since MySQL doesn’t have window functions, we’ll do something similar using group_concat. With group_concat we can roll up the platform column into a comma-separated string per-date, ordered by its count:

group_concat(platform order by ct desc) platform

That’ll give us all the platforms, with the highest-performing platform first.

The complete query looks like this:

select 
  dt, 
  group_concat(platform order by ct desc) platform
from (
  select date(created_at) dt, platform, count(1) ct
  from gameplays
  group by 1, 2
) t
group by 1

Which gives us results like this:

     dt     |  platform
------------+-------------
 2014-06-30 | android,iOS
 2014-06-29 | iOS,android
 2014-06-28 | iOS,android
 2014-06-27 | iOS,android
 2014-06-26 | android,iOS
 2014-06-25 | iOS,android

Perfect! The platforms with the highest counts are first in the list. Using substring_index — which grabs the first word before the comma — we can extract out only the first platform:

substring_index(
  group_concat(
    platform order by ct desc
  )
, ',',1) platform,

Once we know the highest-performing platform each day, we can use max(ct) to get the count associated with that platform.

The complete query:

select 
  dt, 
  substring_index(
    group_concat(
      platform order by ct desc
    ), ',',1
  ) platform,
  max(ct) ct
from (
  select date(created_at) dt, platform, count(1) ct
  from gameplays
  group by 1, 2
) t
group by 1

And that’s all there is to it! As always, if you spend too much time hand-crafting queries like this, consider signing up for Periscope to make it faster and easier.

Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?