Amazon Redshift’s Hardware Upgrade Improves Query Speed by up to 5x
Last year we compared the query speeds of Redshift, Snowflake and BigQuery. Thanks to a massive hardware upgrade, Redshift just got a significant boost. With this in mind, let’s revisit query speeds in Redshift!
Redshift offers two different node types — dense compute (DC) and dense storage (DS) — which are both available in two sizes each (Large and 8Xlarge, and Xlarge and 8Xlarge, respectively).
The DC nodes are pricier, but run on blazing-fast solid-state disks. We’ve historically used a mixture of dc1.large and dc1.8xlarge nodes at Periscope Data to best meet customer needs.
But now, second generation dense compute nodes are available! They come in the same two sizes, but have massive hardware upgrades across the board. DC2 nodes feature Intel E5-2686 v4 (Broadwell) CPUs, DDR4 memory, and low-latency NVMe-based solid-state disks (a huge upgrade from those used on the DC1s). Compared with DC1, DC2 nodes can deliver up to 2x more disk and network throughput!
Through some additional optimizations, dc2.8xlarge also improved data storage efficiency and will be a focus in the following benchmarks.
Evaluating Redshift’s Upgrades
Needless to say, we were excited to take the new node types for a spin (though there’s nothing really spinning about them)! We turned to some of our heaviest queriers and bucketed testing into three types of commonly seen clusters:
- Small dcl - Low node count, dc1.large / dc2.large (<5 nodes)
- Small dc8xl - Low node count, dc1.8xlarge / dc2,8xlarge (<5 nodes)
- Large dcl - High node count, dc1.large / dc2.large (20–30 nodes)
After that, we carefully watched a variety of metrics, ranging from raw read/write latency to loading bar times in our product. We looked at performance improvements a few different ways:
- Read latency
- Write latency
- CPU utilization
- Loading bars in the product
- Query throughput (queries serviced per hour)
- Query execution time
As data-driven folk, we’re acutely aware of the query patterns that occur in our product. Though everyone’s use case is different, we generally see queries in four buckets:
- Near-instant queries, which run in under 5 seconds
- Fast queries, which run in under 10 seconds
- Slow queries, which take between 20 seconds and four minutes to run
- ETL queries, which can be quick, but often take up to 20 minutes or longer; these are run asynchronously in the background and produce reporting tables to power the aforementioned near-instant queries
Here’s a histogram that shows the distribution of query volume by current query run time:
Hardware’s Impact on Cluster-Level Performance
Our immediate instinct was to fire up our AWS console and take a look at what these new node types were doing.
As expected, with the new hardware, read and write latencies dropped across the board for all our cluster types: small dcl, large dcl and small dcxl.
As much as lower read and write latencies are nice, our end users don’t see them. They see loading bars and query run times, so let’s focus on those.
Shorter Loading Bars and Query Run Times With Amazon Redshift
Our small dc8xl cluster sees a nearly 2x decrease in loading bar times!
And though loading bars are frustrating, reducing frustration doesn’t equate with a return on investment.
Instead we tried to proxy customer value with the number of queries we could serve in an hour.
At peak load times, we see nearly double the number of queries returned per hour on the small dc8xl and even more on the small dcl!
What about actual query run times? This gets a little tricky. Of the queries run per day on our clusters, a plurality are brand new queries and are attributable to analysts, well, analyzing. It wouldn’t be fair to introduce a sample bias by just tracking the queries we had seen before, so let’s take a look at this by the heuristics introduced earlier: How are our near-instant, fast and ETL queries doing? The slow queries don’t stick around much anyway and are usually refactored into ETL, so we won’t dive into those.
Despite already being fast, near-instant queries speed up by almost a second. And in some cases, our ETL averages are nearly 5x the speed!
It wasn’t just outliers shifting around means: The median run times saw tons of improvement too.
We also saw improvements of up to 5x on a per-query level.
Ready to Improve Your Query Speeds?
You’ll be glad to know the Redshift team is providing these new node types at no extra cost. We know we’ll be switching all of our clusters over in just a bit.