No items found.

$3000 Data Warehouse — Redshift vs. Postgres

You have $3,000 a month to spend on a data warehouse. Should you upgrade your read replica or switch to Redshift? If you value fast queries, the answer is Redshift.

We benchmarked Amazon Redshift against Amazon RDS Postgres and found Redshift to be 100-1000 times faster on common analytics queries.

The Specs

To make the comparison as fair as possible, we benchmarked the largest RDS Postgres box (DB.R3.8XLarge) against a similarly priced and spec’d Redshift cluster (16 DW2.Large nodes).

Both our RDS Postgres box and our Redshift cluster used default settings and each costs about $3000/month:

We ran each test query 3 times on an otherwise idle setup. The reported time is the average of the second two executions.

Each query was run against a transactions table that’s comprised of:

  • 1 billion rows
  • 50 million unique users in user_id
  • 10 thousand unique products in product_id
  • Timestamps spanning one year in created_at
  • And a dozen extra columns representing various attributes of the transaction

The RDS Postgres version of this table had indexes on created_atuser_id, and product_id.

The Redshift table used user_id as the dist key(user_id, created_at) as the sort key, and the compression encodings recommended by analyze compression.

Both tables were analyzed and vacuumed before running any queries.

Metrics Queries

Many of our customers look at metrics like Daily Revenue, Daily Active Users, and Daily ARPU. On average, Redshift was 500x faster than RDS Postgres:

Here are the metrics queries we tested:

-- Daily Revenue
select date(created_at), sum(amount)
from transactions group by 1
-- Daily Active Users
select date(created_at), count(distinct user_id)
from transactions group by 1
-- Daily ARPU
select date(created_at), sum(amount) / count(distinct user_id)
from transactions group by 1

Distinct Queries

Whether it’s 30-day retention or unique sessions, many analytics queries rely on being able to count the distinct number of elements in a set very fast. On average, Redshift was 200x faster than RDS Postgres for these queries.

Here are the distincting queries we tested:

-- Users per Product
select product_id, count(distinct user_id)
from transactions group by 1
-- Products per User
select user_id, count(distinct product_id)
from transactions group by 1
-- Products per Date
select date(created_at), count(distinct product_id)
from transactions group by 1

How is Redshift so Fast?

Redshift owes it’s speed to the following three factors:

Compressed Columnar Storage

Postgres stores data by row. This means you have to read the whole table to sum the price column.

Redshift stores its data organized by column. This allows the database to compress records because they’re all the same data type. Once they’re compressed, there’s less data to read off disk and store in RAM.

Block Storage and 100% CPU

Postgres does not use multiple cores for a single query. While this allows more queries to run in parallel, no single query can use all of the machine’s resources.

Redshift stores each table’s data in thousands of chunks called blocks. Each block can be read in parallel. This allows the Redshift cluster to use all of its resources on a single query.

Clusters make IOPS easy

 The RDS Postgres box we used had the standard 3K Input/Output Operations Per Second (IOPS). Even raising that to 10K IOPS for another $1000 a month barely moved the needle. Reading from disk is just really slow.

A Redshift cluster can achieve higher much IOPS. Each node reads from a different disk and the IOPS sums across the cluster. Our benchmark cluster achieved over 50K IOPS.

Faster Queries without the Extra Work

You’ll always have faster results querying a lot of data on Redshift versus on a large read replica such as RDS Postgres.

If you want the speed of Redshift but don’t want to spend the time ETLing your data, sign up for Periscope and we’ll give you the best of both worlds.

Your data can stay in your replica and our backend cache will make your analytics queries super fast!

Want to discuss this article? Join the Periscope Data Community!

Periscope Data
The world’s fastest visualization platform for data analysts.