## Fun with Window Functions

### Want to make data analysis fast for everyone?

Join Us!### Beyond Group And Count

Window functions are a wonderfully useful SQL technique. They make complex aggregations simple to build.

After using them to great effect for everything from selecting only one row, to computing running averages, to breaking out day-over-day changes, we thought it was high time to explain them in more detail.

### Simple Aggregations and Percentages

Let’s start with some data from a video game company. For each platform, we want to know how many times a user played a game on that platform, and what percent of all gameplays that platform has.

The window function in this query is ** sum(count(1)) over ()**.

** sum(count(1))** gives us the total number of gameplays.

**specifies to aggregate over all the rows without collapsing them. Thus this function gives us the total number of gameplays across all platforms.**

`over ()`

The ** count(1)** in the numerator is not part of the window function, and so it applies to all rows in the group, giving us a per-platform count.

Putting it all together, here are the results:

### Calculating ntiles

Quartiles or deciles can be a very useful way to split a dataset. Window are by far the easiest way to do this in SQL. Let’s look at user spend quartiles, and the min and max spend within each quartile.

The inner query gives us a table of spend per user. The middle query annotates each row with the quartile — ** ntile(4)** — of spend. Finally, the outer query aggregates the rows into just the min and max of each quartile.

### Cumulative Metrics

Say what you will about cumulative metrics — they are certainly to be used sparingly — but they are easy to calculate with window functions. Here we’ll compute a running sum of all revenue.

The inner query defines a simply daily sum of all revenue. The outer query makes it cumulative, summing all the values between the first day and the current day.

That’s accomplished with ** rows between unbounded preceding and current row**. For each row,

**begins the sum at the beginning of the table, and**

`unbounded preceding`

**halts the sum at, well, the current row.**

`current row`

Here are the results of both the inner and outer queries:

### Determining The Position of a Row

Ordering information is another useful trick window functions give us. Let’s take the previous query, and also add a ranking column for which platform has the highest number of plays:

** rank()** gives the row’s number, and

**specifies the order in which to apply the rank.**

`over (order by plays desc)`

### Multiple Windows With Partition

Often we want a separate ordering for different parts of the table. This is what the ** partition** feature enables. It splits the window function, applying it separately to each specified partition.

For example, let’s find the players with the most gameplays for each platform:

Our ** partition by platform** makes the

**function give us a separate rank for each platform.**

`rank()`

### How It All Works

Superficially, window functions are similar to your basic “group by” functionality. However, rather than subdividing tables into exclusive “groups” of rows and collapsing them, window functions can look at arbitrary “windows”, and do so without collapsing the windows into a single row.

#### Pieces of a Window Function

Dissecting our last example, ** rank() over (partition by platform order by plays desc)**, we can pull out three pieces:

— The function, which aggregates, ranks, or filters the rows in the partition.`rank()`

— The window, which specifies which rows the function applies to.`over(...)`

— Which subset of rows are considered. In this case, all rows with the same platform are in this partition.`partition by platform`

— The order of the rows in the window. This is especially useful for functions like`order by plays desc`

or`first()`

which depend on ordering.`row_number()`

Finally, the ** over()** window definition can also have a row specifier, which further restricts which rows are in the window. The Cumulative Metrics section above goes into this in more detail.

#### Evaluation Order

Window functions are evaluated after the join, group & having clauses, at the same time as other select statements.

That unfortunately means your window functions can’t refer to other fields in the select statement. To do this, you’ll need to wrap the select in a subquery and put your window function in the outer query.

#### Window Function Availability

Window functions were defined in SQL:2003 and are available in PostgreSQL, SQL Server, Redshift (which suppports a subset of Postgres’s functions) and Oracle (which calls them “analytic functions”).

Unfortunately they’re not supported on MySQL, though you can get a lot of mileage out of variables and group_concat.

### More Neat Tricks

As you can see, we’re big fans of window functions! Here are some times we’ve used them to great effect:

- Predicting Exponential Growth with SQL, in which they calculate a regression of exponential data.
- Use window functions for time-series percentages, in which they make a time series proportional.
- Generate Series in Redshift and MySQL, in which they replace Redshift’s unfortunate lack of generate_series.
- 4 Ways to Join Only The First Row in SQL, in which (spoiler alert!) one of the ways is to use a window function.
- ASCII Art Charts in the Terminal, in which they auto-scale our ascii charts.