<p>Keeping your Redshift clusters running well requires maintenance. Updating and deleting data creates dead rows that need to be vacuumed, and even append-only tables need to be re-sorted if the append order is not consistent with the sort key.</p>



<p>The best compression encodings for your tables can change as the data changes, and you’ll want to resize your cluster before it gets too full to run queries.</p>



<h2 class="wp-block-heading">When Not to Vacuum</h2>



<p>Most guidance around vacuuming says to do it as often as necessary. When in doubt, we recommend nightly. But vacuum operations can be very expensive on the cluster, greatly reducing query performance. You can skip vacuuming tables in certain situations:</p>



<h3 class="wp-block-heading">Data is loaded in sort key order</h3>



<p>When new rows are added to a Redshift table, they’re appended to the end of the table in an “unsorted region”. For most tables, this means you have a bunch of rows at the end of the table that need to be merged into the sorted region of the table by a vacuum.</p>



<p>You don’t need to <a href="https://docs.aws.amazon.com/redshift/latest/dg/vacuum-load-in-sort-key-order.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">vacuum when appending rows in sort key order</a>. If you’re adding new rows to an events table that is sorted by the event’s time, the rows are already sorted when they’re added. In this case, you don’t need to re-sort the table with a vacuum because it’s never unsorted.</p>
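
<p>One way to confirm that a table really is staying sorted is to look at its unsorted percentage in svv_table_info. This is a minimal sketch, assuming the events table used in the examples in this post; a value near zero in the unsorted column suggests the sort phase of a vacuum would have little to do:</p>

<pre class="wp-block-code"><code>-- unsorted is the percent of rows sitting in the unsorted region
select "table", unsorted, tbl_rows
from svv_table_info
where "table" = 'events'</code></pre>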



<h3 class="wp-block-heading">A lot of data is unsorted</h3>



<p>If it’s been a long time since you vacuumed the table or if you’ve appended a ton of unsorted data, it can be faster to copy the table than to vacuum it.</p>



<p>You can recreate the table with all the same columns, compression encodings, and dist and sort keys with create table like, then copy the rows across and swap the names:</p>



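<pre class="wp-block-code"><code>-- create an empty table with the same columns, encodings, and dist and sort keys
create table events_copy (like events);
-- the bulk insert sorts the rows as it writes them into the empty copy
insert into events_copy (select * from events);
-- swap the copy in for the original
drop table events;
alter table events_copy rename to events;</code></pre>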



<h3 class="wp-block-heading">A lot of data was deleted</h3>



<p>Unlike Postgres, <a href="https://docs.aws.amazon.com/redshift/latest/dg/r_VACUUM_command.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">the default vacuum operation in Redshift is vacuum full</a>. This operation reclaims the space held by dead rows and re-sorts the table.</p>



<p>If you’ve recently deleted a lot of rows from a table, you might just want to get the space back. You can use a <a href="https://docs.aws.amazon.com/redshift/latest/dg/t_Reclaiming_storage_space202.html#vacuum-types" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">delete-only vacuum</a> to compact the table without spending the time to re-sort the remaining rows:</p>



<pre class="wp-block-code"><code>vacuum delete only events</code></pre>
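
<p>For comparison, here is a sketch of two other vacuum modes, again assuming the events table. The 99 percent threshold is only an illustrative value, not a recommendation:</p>

<pre class="wp-block-code"><code>-- re-sort rows without reclaiming the space held by deleted rows
vacuum sort only events;
-- reclaim space and re-sort, skipping the sort phase if the table
-- is already at least 99 percent sorted
vacuum full events to 99 percent;</code></pre>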



<p>You can see how many rows were deleted or resorted from the most recent vacuums by querying svv_vacuum_summary:</p>



<pre class="wp-block-code"><code>select * from svv_vacuum_summary
where table_name = 'events'</code></pre>
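
<p>If the full row is more than you need, a narrower version of the same query might look like the sketch below. The column names are the ones listed in the AWS documentation for svv_vacuum_summary, and ordering by xid is only an assumed stand-in for “most recent first”, so check both against your own cluster:</p>

<pre class="wp-block-code"><code>-- row_delta: change in total rows; sortedrow_delta: change in sorted rows;
-- block_delta: change in block count, each before vs. after the vacuum
select table_name, xid, elapsed_time, row_delta, sortedrow_delta, block_delta
from svv_vacuum_summary
where table_name = 'events'
order by xid desc</code></pre>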



<p>And it’s always a good idea to analyze a table after a major change to its contents:</p>



<pre class="wp-block-code"><code>analyze events</code></pre>
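
<p>To judge whether an analyze is due, one option is the stats_off column of svv_table_info, where 0 means the statistics are current and 100 means they are fully out of date. A sketch, assuming the same events table:</p>

<pre class="wp-block-code"><code>-- a high stats_off value suggests the planner is working from stale statistics
select "table", stats_off
from svv_table_info
where "table" = 'events'</code></pre>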



<h2 class="wp-block-heading">Rechecking Compression Settings</h2>



<p>When you copy data into an empty table, <a href="https://docs.aws.amazon.com/redshift/latest/dg/c_Loading_tables_auto_compress.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">Redshift chooses the best compression encodings</a> for the loaded data. As data is added and deleted from that table, the best compression encoding for any column might change.</p>



<p>What used to make sense as a bytedict might now be better off as a delta encoding if the number of unique values in the column has grown substantially.</p>



<p>To see the current compression encodings for a table, query pg_table_def:</p>



<pre class="wp-block-code"><code>select "column", type, encoding
from pg_table_def
where tablename = 'events'</code></pre>



<p>And to see what Redshift recommends for the current data in the table, run <a href="https://docs.aws.amazon.com/redshift/latest/dg/r_ANALYZE_COMPRESSION.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">analyze compression</a>:</p>



<pre class="wp-block-code"><code>analyze compression events</code></pre>



<p>Then simply compare the results to see if any changes are recommended.</p>



<p>Older Redshift releases have no way to alter the compression encoding of a column in place. On those, you can add a new column to the table with the new encoding, copy over the data, and then drop the old column:</p>



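<pre class="wp-block-code"><code>-- add a replacement column with the new encoding
alter table events add column device_id_new integer encode delta;
-- copy the existing values across
update events set device_id_new = device_id;
-- swap the new column in for the old one
alter table events drop column device_id;
alter table events rename column device_id_new to device_id;</code></pre>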
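
<p>Newer Redshift releases also accept an in-place form of this change. A minimal sketch, assuming your cluster version supports alter column ... encode, that device_id is not already delta-encoded, and that the table does not use an interleaved sort key:</p>

<pre class="wp-block-code"><code>-- change the encoding without adding and dropping a column
alter table events alter column device_id encode delta;</code></pre>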



<h2 class="wp-block-heading">Monitoring Disk Space</h2>



<p>If your cluster gets too full, queries will start to fail because there won’t be enough space to create the temp tables used during query execution. Vacuums can also fail if there isn’t enough free space to store the intermediate data while it’s getting re-sorted.</p>



<p>To keep an eye on how much space is available in your cluster via SQL, <a href="https://docs.aws.amazon.com/redshift/latest/dg/welcome.html" target="_blank" rel="noreferrer noopener" aria-label=" (opens in a new tab)">query stv_partitions</a>:</p>



<pre class="wp-block-code"><code>select sum(used)::float / sum(capacity) as pct_full
from stv_partitions </code></pre>
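
<p>A skewed distribution key can fill one node well before the others, so it can help to break the same ratio out per node. A sketch, assuming the host column of stv_partitions identifies the node each partition is physically attached to:</p>

<pre class="wp-block-code"><code>-- the same used / capacity ratio as above, one row per node
select host, sum(used)::float / sum(capacity) as pct_full
from stv_partitions
group by host
order by pct_full desc</code></pre>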



<p>And to see individual table sizes:</p>



<pre class="wp-block-code"><code>-- each stv_blocklist row is one 1 MB block, so count / 1000 approximates GB
select t.name, count(tbl) / 1000.0 as gb
from (
  select distinct datname, id, name
  from stv_tbl_perm
    join pg_database on pg_database.oid = db_id
  ) t
join stv_blocklist on tbl = t.id
group by t.name order by gb desc</code></pre>
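
<p>svv_table_info offers a shorter route to similar numbers. A sketch, assuming its size column (reported in 1 MB blocks) is close enough for your purposes; note that this view leaves out empty tables:</p>

<pre class="wp-block-code"><code>-- size is in 1 MB blocks; pct_used is the share of available space the table uses
select "schema", "table", size as mb, pct_used, tbl_rows
from svv_table_info
order by size desc</code></pre>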



<p>And then you can either drop unnecessary tables or resize your cluster to have more capacity!</p>

