Use generate_series to get continuous results

Want to make data analysis fast for everyone?

Join Us!

How many widgets did I sell?

When you’re just starting out, chances are you’ll be asking a very simple question: How many widgets did I sell each day?

Your SQL might look something like this:

select date(created_at) as day, count(1)
from sales
where created_at > now() - interval '30 day'
group by day order by day;

With that SQL, you can expect results like these:

    day     | count
------------+-------
 2014-03-12 |     5
 2014-03-13 |     1
 2014-03-19 |     6
 2014-03-20 |     4
 2014-03-21 |     1
 2014-03-24 |     1
 2014-04-08 |     2
 2014-04-09 |     3
(8 rows)

If you viewed the results in a tool like Periscope, you’d get a graph like this:

Discrete Results

Unfortunately, since you’re just starting out, you’re not selling widgets every day. And notice the zeroes aren’t even showing up! Your investors will not be pleased with such a misleading graph.

If you’re using Postgres, generate_series can help. generate_series produces a table with a given first point, last point, and interval that you specify. In this case we’ll choose 30 days ago, today, and each day in between. Here’s our revised SQL:

select date(d) as day, count(sales.id) 
from generate_series(
  current_date - interval '30 day', 
  current_date, 
  '1 day'
) d 
left join sales on date(sales.created_at) = d 
group by day order by day;

By building an explicit list of dates in our query, rather than relying on the dates, in the sales table, we made sure every date showed up in our result set.

    day     | count
------------+-------
 2014-03-12 |     5
 2014-03-13 |     1
 2014-03-14 |     0
 2014-03-15 |     0
 2014-03-16 |     0
 2014-03-17 |     0
 2014-03-18 |     0
 2014-03-19 |     6
 2014-03-20 |     4
 2014-03-21 |     1
 2014-03-22 |     0
 2014-03-23 |     0
 2014-03-24 |     1
 2014-03-25 |     0
 2014-03-26 |     0
 2014-03-27 |     0
 2014-03-28 |     0
 2014-03-29 |     0
 2014-03-30 |     0
 2014-03-31 |     0
 2014-04-01 |     0
 2014-04-02 |     0
 2014-04-03 |     0
 2014-04-04 |     0
 2014-04-05 |     0
 2014-04-06 |     0
 2014-04-07 |     0
 2014-04-08 |     2
 2014-04-09 |     3
 2014-04-10 |     0
 2014-04-11 |     0
(31 rows)

And here’s the Periscope graph:

Continuous Results

That’s more like it!

Caveats and gotchas

left join: We used generate_series to make sure we got a row for every day in the last 30 days, not just a row for every day in the sales table. So be careful not to use an inner join, which will wipe out the days on the left-hand-side that have no matching day on the right-hand-side.

count(sales.id): A snippet from our joined table looks like this:

2014-03-21 00:00:00 | 1178 | [email protected] | Dorian     | Nakamoto   | 4096 | 2014-32-21 22:19:03.36718 
2014-03-22 00:00:00 |      |                        |            |            |      |
2014-02-23 00:00:00 |      |                        |            |            |      |
2014-03-24 00:00:00 | 1179 | [email protected]     | Harry      | Glaser     | 500  | 2014-32-24 23:27:56.759234

As intended, there are blank rows for days where we had no sales. As a result, we can’t just count every row to get number of sales. count(sales.id) will count just the rows with non-null sales IDs, which is what we want.

Other databases

Not on Postgres? Here are some generate_series equivalents that are as good or better:

Oracle’s magic dual table can help:

select sysdate - level from dual connect by level <= 30;

SQL Server’s recursive with statements are helpful, and very impressive in general:

with t as (
  select 0 as d
  union all
  select d + 1
  from t
  where d < 30
) select getdate() - d from t

MySQL, unfortunately, is trickier. For some clever hacks, start with this answer on StackOverflow.

An easier way

Of course, a visualization tool like Periscope will continuous-ify the data for you! In our tool of choice, head to a chart’s settings and toggle “Continuous Axis” for the desired effect.

Thank you