Periscope Data
The world’s fastest visualization platform for data analysts.

The High-Performance SQL Blog

Using Row Numbering And Full Joins To Transpose Wide Tables

Using Row Numbering And Full Joins To Transpose Wide Tables
October 2, 2014

More Rows, Not More Columns

Many a MySQL or Postgres DBA, with responsibility for serving a fast and responsive website, will design a schema for a survey that looks like this:

Row-Based Table Design

This is a great schema for serving a live site. You only need to show one user her responses, and in a row-based DB like MySQL or Postgres, select * from survey_responses where user_id = 5 is very fast.

When this table is moved to the data warehouse, however, tables like this make your data analyst crazy. The analyst would much rather see a table like this:

For one thing, the analyst will tend to run queries that scan all the rows, like this one calculating the “yes” rate on Question 2:

select 
  sum(
    case 
      when response is true then 1 
      else 0 
    end
  )::float / count(1
from survey_responses 
where question = 'question 2'

For another, the analyst will be using a data warehouse, which makes looking at all the rows in a couple columns very efficient, but looking at all the columns — even only a couple hundred — quite expensive.

Making The Transition

We know we want a row-based table design for serving our website, and a columnar design for analyzing the data. So how do we get from one to the other? How do we transpose the first table into the second table?

A naive solution is a set of unions:

select user_id, 'question 1', question_1_response
from survey_responses where user_id = 1
union select user_id, 'question 2', question_2_response
from survey_responses where user_id = 1
union select user_id, 'question d', question_3_response
from survey_responses where user_id = 1
...

This approach has a number of drawbacks: First and foremost, it is a pain in the ass to write.

Moreover, it is expensive. It asks for every whole row n times if there are nquestions. And since this transformation typically takes place on the data warehouse, asking for whole rows over and over again is especially brutal.

Row Numbering And Full Joins

We’ll start with a simple list of numbers:

with numbers as (
  select row_number() over (order by true) i from survey_responses
)

Now, for every row in our row-based table, we’ll create a row per question:

expanded_responses as (
  select * from survey_responses join numbers on true
)

Notice survey_responses join numbers on true. This is a full join, meaning a row is created in the resulting table for every possible combination of rows survey_responses and numbers.

Our table so far looks like so:

Blown-Out Table

Notice the i column on the far right. We now have one row per user per question, which is about to come in very handy.

Now we will use our row numbering to select responses to question 1 from rows numbered 1, responses to question 2 from rows numbered 2, etc.:

select 
  user_id,
  'question ' || i as question,
  case
    when i = 1 then question_1_response
    when i = 2 then question_2_response
    when i = 3 then question_3_response
    when i = 4 then question_4_response
    when i = 5 then question_5_response
  end as response
from expanded_responses

This gives us our fully-transposed table!

The row numbers gave us a handy way of picking which row to use for selecting the answer to each question. And since this is running on a columnar data warehouse, blowing out the row count is not a big deal.

Putting it all together, the query looks like this:

with numbers as (
  select row_number() over (order by true) i from survey_responses
),
expanded_responses as (
  select * from survey_responses join numbers on true
)
select 
  user_id,
  'question ' || i as question
  case
    when i = 1 then question_1_response
    when i = 2 then question_2_response
    when i = 3 then question_3_response
    when i = 4 then question_4_response
    when i = 5 then question_5_response
  end as response
from expanded_responses

When transposing a 111-column, 132,000-row table on Amazon Redshift, this technique ran in 7 seconds. We’d love to be able to tell you how long the union-based approach took for comparison, but it crashed the cluster!

You Guessed It!

There’s an easier way. Using Periscope, use the Transpose Table checkbox in the Settings tab to transpose your whole table in a single click:

Transpose Checkbox

Missing out? Sign up for Periscope to get one-click transpose functionality and a whole lot more.

Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?