Helpful Redshift Admin Queries

April 30, 2015

It is easy to treat Redshift as a black box — queries go in, answers come out. When something goes wrong, though, you’ll want to open the hood and see what Redshift is actually doing.

To dig into any issues, each Redshift cluster provides virtual system tables you can query.

STL, STV, SVV, OMG?

Like Postgres, Redshift has the information_schema and pg_catalog tables, but it also has plenty of Redshift-specific system tables.

All Redshift system tables are prefixed with stl_, stv_, svl_, or svv_.

  • The stl_ prefix denotes system table logs. stl_ tables contain logs about operations that happened on the cluster in the past few days.
  • The stv_ prefix denotes system table snapshots. stv_ tables contain a snapshot of the current state of the cluster.
  • The svl_ prefix denotes system view logs. svl_ views join some number of system tables to provide more descriptive info.
  • The svv_ prefix denotes system view snapshots. Like svl_, the svv_ views join some system tables to provide more descriptive info.
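
To get a feel for the difference between logs and snapshots, compare stl_query against stv_inflight (both are standard system tables, and both come up again below):

-- stl_: a log of queries from the past few days
select query, pid, starttime, endtime
from stl_query
order by starttime desc
limit 5;

-- stv_: a snapshot of queries running right now
select query, pid, starttime
from stv_inflight;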

Current Cluster Status

One of the most common reasons to log into the Redshift console is to kill a misbehaving query. To find which queries are currently in progress, check the stv_inflight table.

select
  userid
  , query
  , pid
  , starttime
  , left(text, 50) as text
from stv_inflight

You’ll get back the user id, query id, pid, start time, and the first 50 characters of each query currently in flight.

To kill a query, use the cancel <pid> <msg> command. Be sure to use the process id (the pid returned by stv_inflight) and not the query id. You can supply an optional message, which will be returned to the issuer of the query and logged.
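
For example (the pid and message here are illustrative):

cancel 18764 'Canceling long-running query';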

Redshift also stores the past few days of queries in svl_qlog if you need to go back further. The stv_recents view lists all recently run queries with their status and duration, plus the pid for currently-running queries.
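
For example, a quick view of recent activity (stv_recents includes both running and finished queries):

select
  user_name
  , status
  , duration
  , pid
  , left(query, 50) as query
from stv_recents
order by starttime desc
limit 10;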

All of these tables only store the first 200 characters of each query. The full query is stored in chunks in stl_querytext. Join this table in by query, and sort by query and sequence to get each 200 character chunk in order.

select 
  query,
  starttime, 
  text,
  sequence 
from stl_query 
join stl_querytext using (query) 
order by query, sequence
limit 5;

Deadlocks

If your cluster has a suspiciously long-running update, it may be in a deadlocked transaction. The stv_locks table will indicate any transactions that have locks along with the process id of the relevant sessions. This pid can be passed to pg_terminate_backend(pid) to kill the offending session.

To inspect the locks, order them by oldest first.

select 
  table_id, 
  last_update, 
  last_commit, 
  lock_owner_pid, 
  lock_status 
from stv_locks 
order by last_update asc

To terminate the session, run select pg_terminate_backend(lock_owner_pid), using the value from stv_locks.
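
For example (the pid below is hypothetical; substitute the lock_owner_pid returned by the query above):

select pg_terminate_backend(25746);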

Connection Issues

Debugging connection issues is never fun. Luckily, Redshift has a few tables that make up for the lack of a network debugging tool.

The stv_sessions table lists all the current connections, similar to Postgres’s pg_stat_activity. While useful, it doesn’t have the actual connection information for host and port. That can be found in stl_connection_log, which records every connection, authentication, and disconnection on your cluster. Joining these tables returns a list of sessions, along with remote host information.

select distinct
  starttime
  , process
  , user_name
  , remotehost
  , remoteport
from stv_sessions
left join stl_connection_log on pid = process
  and starttime > recordtime - interval '1 second'
order by starttime desc

You’ll get a list of current sessions, each with its remote host and port.

Query Performance

STL_ALERT_EVENT_LOG

The stl_alert_event_log table is important for optimizing queries. As the cluster executes your query, it records problems found by the query planner into stl_alert_event_log along with suggested fixes. Some problems can be fixed by running analyze or vacuum, while others might require rewriting the query or changing your schema.
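
For example, to see the most recent alerts along with the planner’s suggested fixes (truncating the long text columns for readability):

select
  query
  , substring(event, 1, 60) as event
  , substring(solution, 1, 60) as solution
  , event_time
from stl_alert_event_log
order by event_time desc
limit 10;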

SVV_TABLE_INFO

svv_table_info returns extended information about the on-disk state of your tables. It can help you troubleshoot poorly performing tables. While we recommend regular vacuuming and other maintenance, you can also use this table as a guide for when to vacuum.

Here are some of the most useful columns in svv_table_info:

  • empty shows how many blocks are waiting to be freed by a vacuum.
  • unsorted shows the percent of the table that is unsorted. The cluster will need to scan this entire section for every query. You need to vacuum to re-sort and bring this back to 0.
  • sortkey1_enc lists the encoding of the first sortkey. This can sometimes affect lookup performance.
  • skew_sortkey1 shows the ratio of the size of the first column of the sortkey to the size of the largest non-sortkey column, if a sortkey is defined. You can use this value to evaluate the effectiveness of the sortkey.
  • skew_rows shows the ratio of rows on the slice with the most rows to the slice with the fewest. Use it to evaluate your distkey.
  • max_varchar shows the size of the largest varchar columns. While varchars compress well, they can force a temporary result that would otherwise fit in RAM to be stored on disk, reducing query performance.
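
Putting these columns together, a quick health check might rank tables by how badly they need a vacuum (a sketch built on the columns described above):

select
  "table"
  , unsorted
  , empty
  , skew_sortkey1
  , skew_rows
  , max_varchar
from svv_table_info
order by unsorted desc nulls last
limit 10;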

For help on changing the sort and distkeys of your giant tables, check out this post.

Copying Tables

If you want to copy or split a table, Redshift supports both create table like and create table as syntax.

create table like copies the structure, compression, distribution and sortkey. This is great for archiving tables as it keeps the compression settings.

create table as creates a table and fills it with the given query. You can supply optional sort and distkeys. Note that this won’t compress the table, even if the source tables are compressed. create table as is best for small temporary tables; for anything bigger, compression’s upfront cost pays for itself in query performance.

-- Option 1: create table as
create table events_201404 as (
  select *
  from events
  where created_at >= '2014-04-01' and created_at < '2014-05-01'
);

-- Option 2: create table like, then fill it
create table events_201404 (like events);
insert into events_201404 (
  select *
  from events
  where created_at >= '2014-04-01' and created_at < '2014-05-01'
);

To create a compressed table from a query after using create table as, run analyze compression, create a new table with those encodings, and then copy the data into the new table. Alternatively, unload the data somewhere, and load it back with copy.
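
Here’s a minimal sketch of that deep-copy approach (the table, columns, and encodings below are hypothetical; use whatever analyze compression actually reports for your data):

-- ask Redshift to recommend an encoding for each column
analyze compression events_201404;

-- recreate the table with the recommended encodings, then copy the rows over
create table events_201404_compressed (
  user_id integer encode delta,
  event_name varchar(255) encode lzo,
  created_at timestamp encode delta32k
);
insert into events_201404_compressed (
  select * from events_201404
);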

Further Reading

For more tips on Redshift maintenance, take a look at our Redshift Maintenance 101 post. If you need tips on importing your data into Redshift from a MySQL or Postgres database, we’ve got you covered here.

We hope this gives you a solid base for troubleshooting Redshift issues. As always, shoot a tweet to @PeriscopeData or an email to [email protected] with any questions.
