Databases
$3000 Data Warehouse — Redshift vs. Postgres
You have $3,000 a month to spend on a data warehouse. Should you upgrade your read replica or switch to Redshift? If you value fast queries, the answer is Redshift.
We benchmarked Amazon Redshift against Amazon RDS Postgres and found Redshift to be 100-1000 times faster on common analytics queries.

The Specs
To make the comparison as fair as possible, we benchmarked the largest RDS Postgres box (db.r3.8xlarge) against a similarly priced and spec’d Redshift cluster (16 dw2.large nodes).
Both our RDS Postgres box and our Redshift cluster used default settings, and each cost about $3,000/month:

We ran each test query 3 times on an otherwise idle setup. The reported time is the average of the second and third runs.
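The timing methodology above can be sketched in a few lines of Python. Here `run_query` is a hypothetical callable standing in for whatever executes the SQL (a psycopg2 cursor bound to the query text, say); the first run is discarded and the remaining runs are averaged:

```python
import time

def report_time(timings):
    """Average every run except the first, per the methodology above:
    three runs, report the mean of runs 2 and 3."""
    return sum(timings[1:]) / len(timings[1:])

def benchmark(run_query, runs=3):
    """run_query is a placeholder for whatever executes one query."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()
        timings.append(time.perf_counter() - start)
    return report_time(timings)
```
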
Each query was run against a transactions table consisting of:
- 1 billion rows
- 50 million unique users in user_id
- 10 thousand unique products in product_id
- Timestamps spanning one year in created_at
- And a dozen extra columns representing various attributes of the transaction
The RDS Postgres version of this table had indexes on created_at, user_id, and product_id.
The Redshift table used user_id as the dist key, (user_id, created_at) as the sort key, and the compression encodings recommended by analyze compression.
Both tables were analyzed and vacuumed before running any queries.
Metrics Queries
Many of our customers look at metrics like Daily Revenue, Daily Active Users, and Daily ARPU. On average, Redshift was 500x faster than RDS Postgres:


Here are the metrics queries we tested:
-- Daily Revenue
select date(created_at), sum(amount)
from transactions group by 1
-- Daily Active Users
select date(created_at), count(distinct user_id)
from transactions group by 1
-- Daily ARPU
select date(created_at), sum(amount) / count(distinct user_id)
from transactions group by 1
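In plain terms, Daily ARPU is just daily revenue divided by daily distinct users. A toy Python sketch with hypothetical data shows what the query computes:

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows of (created_at, user_id, amount), purely illustrative.
transactions = [
    (date(2014, 1, 1), 1, 10.0),
    (date(2014, 1, 1), 1, 5.0),
    (date(2014, 1, 1), 2, 5.0),
    (date(2014, 1, 2), 3, 8.0),
]

revenue = defaultdict(float)   # Daily Revenue: sum(amount) per day
users = defaultdict(set)       # Daily Active Users: distinct user_ids per day
for day, user_id, amount in transactions:
    revenue[day] += amount
    users[day].add(user_id)

# Daily ARPU: sum(amount) / count(distinct user_id) per day.
# Day 1: 20.0 revenue / 2 users = 10.0; day 2: 8.0 / 1 user = 8.0.
arpu = {day: revenue[day] / len(users[day]) for day in revenue}
```
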
Distinct Queries
Whether it’s 30-day retention or unique sessions, many analytics queries rely on counting the number of distinct elements in a set very quickly. On average, Redshift was 200x faster than RDS Postgres for these queries.


Here are the distinct-counting queries we tested:
-- Users per Product
select product_id, count(distinct user_id)
from transactions group by 1
-- Products per User
select user_id, count(distinct product_id)
from transactions group by 1
-- Products per Date
select date(created_at), count(distinct product_id)
from transactions group by 1
How is Redshift so Fast?
Redshift owes its speed to three factors:
Compressed Columnar Storage
Postgres stores data by row. To sum a single column like amount, it has to read the whole table off disk.
Redshift organizes its data by column. Because every value in a column has the same data type, the data compresses well, and compressed data means less to read off disk and hold in RAM.
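As a toy illustration (not Redshift’s actual encodings), a repetitive column run-length encodes down to a handful of entries, while row storage has to keep every value:

```python
from itertools import groupby

def run_length_encode(values):
    """Collapse runs of consecutive equal values into (value, count) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

# A column of 2,000 timestamps covering only two distinct days
# compresses to just 2 runs.
column = ["2014-01-01"] * 1000 + ["2014-01-02"] * 1000
encoded = run_length_encode(column)
print(len(column), "raw values ->", len(encoded), "runs")  # 2000 raw values -> 2 runs
```
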
Block Storage and 100% CPU
Postgres does not use multiple cores for a single query. While this allows more queries to run in parallel, no single query can use all of the machine’s resources.
Redshift stores each table’s data in thousands of chunks called blocks. Each block can be read in parallel. This allows the Redshift cluster to use all of its resources on a single query.
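The idea can be sketched in Python. This is a toy model of parallel block scans, not Redshift internals: split a "table" into fixed-size blocks, sum each block concurrently, then combine the partial sums:

```python
from concurrent.futures import ThreadPoolExecutor

table = list(range(1, 101))  # stand-in for the amount column
BLOCK_SIZE = 10

# Carve the column into blocks, each of which can be scanned independently.
blocks = [table[i:i + BLOCK_SIZE] for i in range(0, len(table), BLOCK_SIZE)]

# Scan all blocks in parallel and combine the per-block partial sums.
with ThreadPoolExecutor() as pool:
    partial_sums = list(pool.map(sum, blocks))

total = sum(partial_sums)
print(total)  # 5050, the same answer a serial scan would give
```
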
Clusters Make IOPS Easy
The RDS Postgres box we used had the standard 3K Input/Output Operations Per Second (IOPS). Even raising that to 10K IOPS for another $1000 a month barely moved the needle. Reading from disk is just really slow.
A Redshift cluster can achieve much higher IOPS. Each node reads from its own disk, and the IOPS sum across the cluster: our 16-node benchmark cluster achieved over 50K IOPS.
Faster Queries without the Extra Work
Querying large amounts of data will always be faster on Redshift than on a large read replica like RDS Postgres.
If you want the speed of Redshift but don’t want to spend the time ETLing your data, sign up for Periscope and we’ll give you the best of both worlds.
Your data can stay in your replica and our backend cache will make your analytics queries super fast!