Managing Query Load
Not all queries need to be treated equally. Some queries are more important than others, and some require additional resources to be successful.
Redshift allows you to configure priority queues for your queries through its Workload Management (WLM) interface. You'll be able to separate your data import jobs from users running analysis queries, and make sure both have the resources they need to complete.
In this section we'll show you how to configure and monitor WLM to make the most of your cluster's resources.
Creating ETL Job Queues
Let's start by creating some queues for our ETL jobs. For this walkthrough, we’ll assume we have two types: small, quick jobs that run frequently and a few big, heavy ones that require serious resources.
The first step is to create a group for all our ETL users:
create group etl with etl_ro, etl_rw
Next, we need to set up workload management (WLM) queues, and assign them based on our group. WLM settings are part of the Parameter Group Settings, which you can find through the "Parameter Group" link in the sidebar of the Redshift console.
Select a parameter set or create a new one, and click Edit WLM.
Let's start with two new queues for our two types of ETL jobs. Queue #1 will be the big job queue, and we'll give it one slot and 20% of memory.
Queue #2 will be for all other ETL jobs. We'll give it concurrency of two and 20% memory, meaning each job will get 10% memory. Here's what our WLM settings look like:
Notice that queries from users in the
etl user group will automatically run in Queue #2.
Assigning Queries to Queues
By default, queries run in the default queue for their user group. For users in the
etl group, that's Queue #2.
Users can also set their query queue on a per-session basis with
set query_group. For example, to run in Queue #1:
set query_group to slow_etl_queue
For our set of queues, we might have all ETL jobs use a user account in the
etl user group, and additionally have slow queries set their
query_group before running. This way, slow ETLs use the slow queue, all other ETLs use the standard ETL queue.
Non-ETL queries will run in Queue #3, which is the default, because they have not specified otherwise. Since that queue has the default concurrency of 5, we can expect each slot to get one fifth of the remaining 60% memory, or 12% of cluster memory apiece.
Checking Our Work
We've set up what we think is the ideal slot structure. Now we want to make sure it works.
First, let's get the service classes for our queues:
select id , condition , action_service_class from STV_WLM_CLASSIFICATION_CONFIG where action_service_class > 4
We get the following results:
The first condition is the special super user queue. If all query slots are taken, the super user can still run queries with
set query_group to superuser. This allows one query at a time with a reserved portion of memory.
Now that we have each queue's service class, we can match it up to number of query slots and total memory:
select name , service_class , num_query_tasks as slots , query_working_mem from STV_WLM_SERVICE_CLASS_CONFIG where service_class > 4
Here are our results:
The name "Service Class #X" maps to "Queue #X" in the Redshift console, and of course
service_class maps to our previous query and other Redshift system tables.
service_class 1-4 are reserved for internal use, and
service_class 5 is always the superuser queue.
Finally, we’ll look at the queries in the general purpose queue,
select userid , query , service_class , slot_count , total_queue_time , total_exec_time , service_class_start_time from stl_wlm_query order by service_class_start_time desc limit 5
We get these results:
Notice that some queries are using multiple slots, as shown in the
slot_count column. That's because those queries have their
wlm_slot_count set above 1, instructing them to wait for multiple slots to be open and then consume all of them.
Time is logged first for waiting in queue, and then for executing. Both of those numbers can be very helpful in debugging slowdowns.
When Not to Use WLM
WLM is great at partitioning your cluster’s resources, but be wary of creating too many queues. If one queue is often full and another empty, then you’ll be wasting cluster capacity. In general, the more constraints you give your Redshift cluster, the less flexibility it will have to optimize your workload for you.
Finally, keep an eye on the
total_queue_time field of
stl_wlm_query and consider increasing the number of query slots for queues and service class with high queue times.