Helpful Admin Queries
You have a running Redshift cluster. You've loaded data and run queries. Now you want to know what's going on inside the cluster. Which queries are running? Which nodes are getting full? Are some queries taking far too long because they're not using the correct sort keys?
The AWS Redshift console is helpful, but it doesn't have all the answers. The system tables in your Redshift cluster have data about actively running and recently completed queries, table sizes and node capacity, and overall performance metrics.
In this section we'll show you how to query Redshift's system tables to answer these kinds of questions.
Redshift-Specific System Tables
It is easy to treat Redshift as a black box — queries go in, answers come out. When something goes wrong, though, you’ll want to open the hood and see what Redshift is actually doing.
To dig into any issues, each Redshift cluster provides virtual system tables you can query. Like Postgres, Redshift has the pg_catalog tables, but it also has plenty of Redshift-specific system tables.
All Redshift system tables are prefixed with stl_, stv_, svl_, or svv_.
The stl_ prefix denotes system table logs. stl_ tables contain logs about operations that happened on the cluster in the past few days.
The stv_ prefix denotes system table snapshots. stv_ tables contain a snapshot of the current state of the cluster.
The svl_ prefix denotes system view logs. svl_ views join some number of system tables to provide more descriptive info.
The svv_ prefix denotes system view snapshots. Like svl_ views, svv_ views join some system tables to provide more descriptive info.
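If you want to see which system tables and views exist for a given prefix, you can list them from pg_table_def. This is just a convenience query, and it assumes pg_catalog is on your search_path (it is by default):

```sql
select distinct tablename
from pg_table_def
where tablename like 'stv_%'
order by tablename;
```

Swap the pattern for 'stl_%', 'svl_%', or 'svv_%' to browse the other families.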
Current Cluster Status
One of the most common reasons to log into the Redshift console is to kill a misbehaving query. To find the queries currently in progress, check the stv_inflight table:
select userid, query, pid, starttime, left(text, 50) as text
from stv_inflight;
You'll end up with a table like this:
To kill a query, use the cancel <pid> <msg> command. Be sure to use the process id — pid in the table above — and not the query id. You can supply an optional message, which will be returned to the issuer of the query and logged.
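For example, if the stv_inflight query above turned up a runaway query — the pid here is hypothetical — you could cancel it like this:

```sql
cancel 18764 'Query cancelled by admin: blocking nightly ETL';
```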
Redshift also stores the past few days of queries in svl_qlog if you need to go back further. The stv_recents view has all recently run queries with their status, duration, and pid for currently-running queries.
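As one way to review that recent activity, you can pull status and duration straight from stv_recents. The column names are real; note that duration is reported in microseconds:

```sql
select user_name, pid, status, duration, left(query, 50) as query
from stv_recents
order by duration desc;
```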
All of these tables only store the first 200 characters of each query. The full query is stored in chunks in stl_querytext. Join this table in by query, and sort by sequence to get each 200-character chunk in order.
select query, starttime, text, sequence
from stl_query
join stl_querytext using (query)
order by query, sequence
limit 5;
If your cluster has a suspiciously long-running update, it may be in a deadlocked transaction. The stv_locks table will indicate any transactions that have locks, along with the process id of the relevant sessions. This pid can be passed to pg_terminate_backend(pid) to kill the offending session.
To inspect the locks, order them by oldest first.
select table_id, last_update, last_commit, lock_owner_pid, lock_status
from stv_locks
order by last_update asc;
To terminate the session, run select pg_terminate_backend(lock_owner_pid), using the value from the lock_owner_pid column of the previous query.
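For instance, if the locks query above returned a lock_owner_pid of 8219 (a hypothetical value), you would kill that session with:

```sql
select pg_terminate_backend(8219);
```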
Debugging connection issues is never fun. Luckily, Redshift has a few tables that make up for the lack of a network debugging tool.
The stv_sessions table lists all the current connections, similar to Postgres's pg_stat_activity. While useful, it doesn't have the actual connection information for host and port. That can be found in stl_connection_log. This table has a list of all connects, authenticates, and disconnects on your cluster. Joining these tables returns a list of sessions along with remote host information.
select distinct starttime, process, user_name, remotehost, remoteport
from stv_sessions
left join stl_connection_log
  on pid = process
  and starttime > recordtime - interval '1 second'
order by starttime desc;
You'll get a table like this:
The stl_alert_event_log table is important for optimizing queries. When the cluster executes your query, it records problems found by the query planner into stl_alert_event_log, along with suggested fixes. Some problems can be fixed by running vacuum, while others might require rewriting the query or changing your schema.
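One straightforward way to review recent planner alerts is to pull the event and solution columns, which hold the problem description and suggested fix. They are fixed-width char columns, hence the trim:

```sql
select query, trim(event) as event, trim(solution) as solution, event_time
from stl_alert_event_log
order by event_time desc
limit 20;
```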
svv_table_info returns extended information about the on-disk state of your tables. This table can help troubleshoot low-performing tables. While we recommend regular vacuuming and other maintenance, you can also use this table as a guide for when to vacuum. Here are the columns you'll see in the svv_table_info table:
empty shows how many blocks are waiting to be freed by a vacuum.
unsorted shows the percent of the table that is unsorted. The cluster will need to scan this entire section for every query. You need to vacuum to re-sort and bring this back to 0.
sortkey1_enc lists the encoding of the first sortkey. This can sometimes affect lookup performance.
skew_sortkey1 shows the ratio of the size of the first column of the sortkey to the size of the largest non-sortkey column, if a sortkey is defined. You can use this value to evaluate the effectiveness of the sortkey.
skew_rows shows the ratio of rows on the slice with the most rows to rows on the slice with the fewest. Use it to evaluate the effectiveness of the distkey.
max_varchar shows the size of the largest varchar column. While varchars compress well, they can force a temporary result which otherwise fits in RAM to be stored on disk, reducing query performance.
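Putting those columns together, a query like the following surfaces the tables most in need of attention. Ordering by unsorted is just one reasonable choice; order by empty or skew_rows to chase other problems:

```sql
select "table", empty, unsorted, sortkey1_enc, skew_sortkey1, skew_rows, max_varchar
from svv_table_info
order by unsorted desc;
```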
For help on changing the distkeys of your giant tables, check out this post.
If you want to copy or split a table, Redshift supports both create table like and create table as syntax.
create table like copies the structure, compression, distribution, and sortkey. This is great for archiving tables, as it keeps the compression settings.
create table as creates a table and fills it with the given query. You can supply optional distkeys. Note that this won't compress the table, even if the source tables are compressed. For this reason, create table as is best for small temporary tables, since compression helps with performance for an upfront cost.
create table events_201404 as (
  select * from events
  where created_at >= '2014-04-01'
    and created_at < '2014-05-01'
);

create table events_201404 like events;

insert into events_201404 (
  select * from events
  where created_at >= '2014-04-01'
    and created_at < '2014-05-01'
);
To create a compressed table from a query after using create table as, run analyze compression, create a new table with those encodings, and then copy the data into the new table. Alternatively, unload the data somewhere, and load it back with copy.
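As a sketch of the first approach, using the events_201404 table from the example above: the specific columns and encodings below are placeholders, so use whatever analyze compression actually reports for your table.

```sql
analyze compression events_201404;

-- create a new table using the suggested encodings
-- (these column definitions and encodings are hypothetical)
create table events_201404_compressed (
  id         bigint       encode delta,
  created_at timestamp    encode delta32k,
  name       varchar(255) encode lzo
);

insert into events_201404_compressed (
  select * from events_201404
);
```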
For more tips on Redshift maintenance, take a look at our Maintenance Section. We hope this gives you a solid base for troubleshooting Redshift issues!