Sean Cook
Sean has chased the best tech from San Diego to NYC to SF working in GIS, data science and marketing. You can find him alternatively creating dazzling new analyses and driving content strategy at Periscope Data.

The High-Performance SQL Blog

SQL Query Order of Execution

June 8, 2017

SQL Order of Execution

The SQL order of execution defines the order in which the clauses of a query are evaluated. Some of the most common query challenges I run into could be easily avoided with a clearer understanding of the SQL order of execution, sometimes called the order of operations. Understanding query order can help you diagnose why a query won't run, and even more frequently will help you optimize your queries to run faster.

In today's modern world, SQL query planners can do all sorts of tricks to make queries run more efficiently, but they must always reach the same final answer as a query that is executed per the standard order of execution. This order is:

FROM Clause

SQL's from clause selects and joins your tables and is the first executed part of a query. This means that in queries with joins, the join is the first thing to happen.

It's a good practice to limit or pre-aggregate tables before potentially large joins, which can otherwise be very memory intensive. Many modern SQL planners use logic and different types of joins to help optimize for different queries, which can be helpful but shouldn't be relied on. 

In an instance like below, the SQL planner may know to pre-filter pings. That technically violates the "correct" order, but will return the correct result.

select
 count(*)
from
 pings
join
 signups
on
 pings.cookie = signups.cookie
where
 pings.url ilike '%/blog%'

However, if you are going to use columns in a way that prevents pre-filtering, the database will have to sort and join both full tables. For example, the following query requires a column from each table and will be forced into a join before any filtering takes place.

-- || is used for concatenation
select
 count(*)
from
 first_names
join last_names
 on first_names.id = last_names.id
where
 first_names.name || last_names.name ilike '%a%'

To speed up the query, you can pre-filter names with "a" in them:

with limited_first_names as (
 select
   *
 from
   first_names
 where
   name ilike '%a%'
)
, limited_last_names as (
  select
    *
  from
    last_names
  where
     name ilike '%a%'
)
select
 count(*)
from
 limited_first_names
join
 limited_last_names
on
 limited_last_names.id = limited_first_names.id

To learn more, you can also read about how we sped up our own queries by 50x using pre-aggregation.

WHERE Clause

The where clause is used to limit the now-joined data by the values in your table's columns. This can be used with any data type, including numbers, strings, or dates.

where nmbr > 5;
where strng = 'Skywalker';
where dte = '2017-01-01';

One frequent "gotcha" in SQL is trying to use a where statement to filter aggregations, which will violate the rules of SQL's order of execution. This is because when the where statement is being evaluated, the group by statement has yet to be executed and aggregate values are unknown. Thus, the following query will fail:

select
 country
, sum(area)
from
 countries
where
 sum(area) > 1000
group by
 1

But it can be solved using the having clause, explained below.

GROUP BY Clause

Group by collapses fields of the result set into their distinct values. This clause is used with aggregations such as sum() or count() to show one value per grouped field or combination of fields.

When using group by:

Group by X means put all those with the same value for X in the same row.
Group by X, Y put all those with the same values for both X and Y in the same row.

The group by clause is worthy of its own post for many reasons, and you can find a lot more information about group by on our blog.

HAVING Clause

The having clause is used in SQL to perform the same function as the where clause, but with aggregate values. So while the final query in the above where section will fail, this will work perfectly.

select
  country
  , sum(area)
from
  countries
group by
  1
having
  sum(area) > 100000

The having clause is a bit of a latecomer to SQL, added because of the frequent need to filter aggregates. Prior to the having clause, a nested sub-query would have been used to reach the same result, for example:

select
  *
from
  (select
    country
    , sum(area) as area
  from
    countries
  group by
    1) areas
where  
  areas.area > 100000

[Window Functions]

If you are using any window functions, they will be calculated immediately after the having clause. Window functions may only be used in in the select and order by clauses because of their late stage in the execution plan. You may, however, use an aggregate function inside of a window function because they have already been computed.

SELECT Clause

The select statement is where you finally specify the values and aggregations remaining in the data set after the grouping and pruning have occurred. The execution will be almost complete at this point, with only orders and limits left to specify.

ORDER BY Clause

Order by is used to return the table sorted by a column or selection of columns. You can use asc and desc to define whether you want a low-to-high or high-to-low order. The following query will list countries from largest to smallest.

select
  country  
  , area
from  
  country_data
order by 
  2 desc

LIMIT Clause

The limit clause will let you specify how many rows you want returned. This is frequently used to speed up work and avoid returning large amounts of data. It can also be used in combination with the order by clause to find the "Top N" of a column. For instance, we could find the 10 largest countries with a query like:

select
  
country
  , area
from
  
country_data
order by
  2 desc
limit 
 
10

Get the Most out of Your Database

A good understand of the order of execution will make writing efficient SQL much easier, and you will begin to see immediate improvements as you consider how your SQL optimizer is planning your queries. We have frequently seen speed-ups of 50x or more in queries with some simple changes. Every query you improve helps other queries in your database finish faster, so learning the order of execution is definitely a worthwhile investment.


Want to discuss this article? Join the Periscope Data Community!
Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?

Subscribe to our Newsletter