<p>Your data warehouse is a vital part of your business, so a decision like upgrading your read replica versus switching to Redshift is an important one. If you value fast queries, Redshift is the way to go.</p>



<p>When we benchmarked Amazon Redshift against Amazon RDS Postgres, Redshift was 100 to 1,000 times faster on common analytics queries.</p>



<figure class="wp-block-image size-full fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/benchmark-summary.png" alt="Benchmark summary" class="wp-image-80852"/></figure>



<h2 class="wp-block-heading">The Specs</h2>



<p>To make the comparison as fair as possible, we benchmarked the largest RDS Postgres box (DB.R3.8XLarge) against a similarly priced and spec’d Redshift cluster (16 DW2.Large nodes). Both our RDS Postgres box and our Redshift cluster used default settings.</p>



<p>We ran each test query three times on an otherwise idle setup. The reported time is the average of the second and third executions.</p>



<p>Each query was run against a transactions table that consists of:</p>



<ul><li>1 billion rows</li><li>50 million unique users in user_id</li><li>10 thousand unique products in product_id</li><li>Timestamps spanning one year in created_at</li><li>And a dozen extra columns representing various attributes of the transaction</li></ul>



<p>The RDS Postgres version of this table had indexes on created_at, user_id, and product_id.</p>
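
<p>As a rough sketch, the Postgres indexes described above could be created as follows. The index names are hypothetical, and we assume three separate single-column indexes rather than one composite index.</p>

<pre class="wp-block-code"><code>-- Hypothetical index names; one single-column index per field (an assumption)
create index transactions_created_at_idx on transactions (created_at);
create index transactions_user_id_idx on transactions (user_id);
create index transactions_product_id_idx on transactions (product_id);</code></pre>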



<p>The Redshift table used user_id as the dist key, (user_id, created_at) as the sort key, and the compression encodings recommended by analyze compression.</p>
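
<p>A minimal sketch of what that Redshift table definition might look like is shown here. The column types are assumptions, the dozen extra attribute columns are left out, and the compound sort key style is assumed because the text only names the sort key columns.</p>

<pre class="wp-block-code"><code>-- Column types are illustrative assumptions, not the benchmark's actual schema
create table transactions (
  user_id    bigint,
  product_id integer,
  amount     decimal(12,2),
  created_at timestamp
  -- the remaining attribute columns would be listed here
)
distkey (user_id)
compound sortkey (user_id, created_at);

-- Ask Redshift to suggest a compression encoding for each column
analyze compression transactions;</code></pre>

<p>Distributing on user_id keeps each user’s rows on the same slice, and sorting by (user_id, created_at) stores those rows in time order within each user.</p>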



<p>Both tables were analyzed and vacuumed before running any queries.</p>
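
<p>For reference, that maintenance step would typically look something like this on each system, assuming the table is named transactions:</p>

<pre class="wp-block-code"><code>-- RDS Postgres: reclaim dead rows and refresh planner statistics in one pass
vacuum analyze transactions;

-- Redshift: re-sort rows and reclaim space, then refresh statistics
vacuum transactions;
analyze transactions;</code></pre>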



<h2 class="wp-block-heading">Metrics Queries</h2>



<p>Many of our customers look at metrics like Daily Revenue, Daily Active Users, and Daily ARPU. On average, Redshift was 500x faster than RDS Postgres:</p>



<figure class="wp-block-image size-full fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/benchmark-metrics.png" alt="Benchmark metrics" class="wp-image-80858"/></figure>



<figure class="wp-block-image size-full fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/table-metrics.png" alt="Table metrics" class="wp-image-80864"/></figure>



<p>Here are the metrics queries we tested:</p>



<pre class="wp-block-code"><code>-- Daily Revenue
select date(created_at), sum(amount)
from transactions group by 1;

-- Daily Active Users
select date(created_at), count(distinct user_id)
from transactions group by 1;

-- Daily ARPU
select date(created_at), sum(amount) / count(distinct user_id)
from transactions group by 1;</code></pre>



<h2 class="wp-block-heading">Distinct Queries</h2>



<p>Whether it’s 30-day retention or unique sessions, many analytics queries rely on counting the number of distinct elements in a set very quickly. On average, Redshift was 200x faster than RDS Postgres for these queries.</p>



<figure class="wp-block-image size-large"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/benchmark-distinct.png" alt="Benchmark distinct" class="wp-image-80876"/></figure>



<figure class="wp-block-image size-large"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/table-distinct.png" alt="Table distinct" class="wp-image-80870"/></figure>



<p>Here are the distinct queries we tested:</p>



<pre class="wp-block-code"><code>-- Users per Product
select product_id, count(distinct user_id)
from transactions group by 1;

-- Products per User
select user_id, count(distinct product_id)
from transactions group by 1;

-- Products per Date
select date(created_at), count(distinct product_id)
from transactions group by 1;</code></pre>



<h2 class="wp-block-heading">How is Redshift so Fast?</h2>



<p>Redshift owes its speed to the following three factors:</p>



<h3 class="wp-block-heading">Compressed Columnar Storage</h3>



<p>Postgres stores data by row. This means a query has to read every column of every row, in other words the whole table, just to sum a single column such as amount.</p>



<p>Redshift stores its data <a href="https://docs.aws.amazon.com/redshift/latest/dg/c_columnar_storage_disk_mem_mgmnt.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">organized by column</a>. Because every value in a column has the same data type, the database can compress each column effectively. Once the data is compressed, there’s less of it to read off disk and store in RAM.</p>
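
<p>To check which compression encoding each column ended up with, one option is to query Redshift’s pg_table_def catalog view. This sketch assumes the table is named transactions and that its schema is on the current search path, since the view only lists tables it can find there.</p>

<pre class="wp-block-code"><code>-- List each column's type and encoding, plus the dist key and sort key flags
select "column", type, encoding, distkey, sortkey
from pg_table_def
where tablename = 'transactions';</code></pre>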



<h3 class="wp-block-heading">Block Storage and 100% CPU</h3>



<p>The Postgres version we tested does not use multiple cores for a single query (parallel query only arrived later, in Postgres 9.6). While this allows more queries to run in parallel, no single query can use all of the machine’s resources.</p>



<p>Redshift stores each table’s data in thousands of chunks called blocks. Each block can be read in parallel. This allows the Redshift cluster to use all of its resources on a single query.</p>
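
<p>One way to see how many blocks a table occupies is to count its rows in Redshift’s STV_BLOCKLIST system table, which holds one row per block. The sketch below assumes a superuser connection and a table named transactions.</p>

<pre class="wp-block-code"><code>-- Count the blocks used by each column of the table, across all slices
select b.col, count(*) as blocks
from stv_blocklist b
join stv_tbl_perm p
  on b.tbl = p.id and b.slice = p.slice
where trim(p.name) = 'transactions'
group by b.col
order by b.col;</code></pre>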



<h3 class="wp-block-heading">Clusters make IOPS easy</h3>



<p>The RDS Postgres box we used had the standard 3K Input/Output Operations Per Second (IOPS). Even raising that to 10K IOPS for another $1,000 a month barely moved the needle. Reading from a single disk is just really slow.</p>



<p>A Redshift cluster can achieve much higher IOPS. Each node reads from a different disk, and the IOPS add up across the cluster. Our 16-node benchmark cluster achieved over 50K IOPS.</p>



<h2 class="wp-block-heading">Faster Queries without the Extra Work</h2>



<p>As these benchmarks show, querying a lot of data is much faster on Redshift than on a large read replica running RDS Postgres.</p>



<p>If you want the speed of Redshift but don’t want to spend the time ETLing your data, <a href="https://www.sisense.com/get/free-trial/">Sisense</a> can give you the best of both worlds.</p>


Data Warehouse Showdown: Redshift vs. Postgres
