SQL is one of the analyst’s most powerful tools. In SQL Superstar, we give you actionable advice to help you get the most out of this versatile language and create beautiful, effective queries.



<h2 class="wp-block-heading">One problem, many solutions</h2>



For today’s <a href="https://www.sisense.com/reporting/">daily report</a>, we need a list of users and the most recent widget each user has created. We have a users table and a widgets table, and each user has many widgets. users.id is the primary key on users, and widgets.user_id is the corresponding foreign key in widgets.
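
To make the examples concrete, here is a minimal sketch of the two tables in Postgres syntax. Only users.id and widgets.user_id come from the description above; the name columns are hypothetical, and created_at is the timestamp column that the queries in this post sort by.

<pre class="wp-block-code"><code>-- Sketch only: the name columns are assumptions for illustration.
create table users (
  id   integer primary key,
  name text
);

create table widgets (
  id         integer primary key,
  user_id    integer not null references users (id),  -- foreign key to users.id
  name       text,
  created_at timestamp not null
);</code></pre>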



To solve this problem, we need to join each user to only the first row of their widgets, that is, the most recent one. There are several ways to do this. Here are a few different techniques and when to use them.



<h2 class="wp-block-heading">Use Correlated Subqueries when the foreign key is indexed</h2>



Correlated subqueries are subqueries that depend on the outer query. They behave like a for loop in SQL, because the subquery runs once for each row in the outer query:



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Correlated-subqueries.png" alt="Correlated subqueries" class="wp-image-74411"/></figure>
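
As a sketch of what such a query can look like for our report (Postgres syntax, assuming widgets has the id and created_at columns used in the other examples), the subquery picks the newest widget id for the current user, and the join keeps only that row. The optional index and its name are assumptions added for illustration:

<pre class="wp-block-code"><code>-- Optional: an index on the foreign key lets each per-user lookup avoid a full scan.
-- The index name is hypothetical.
create index widgets_user_id_created_at_idx
  on widgets (user_id, created_at desc);

select *
from users
join widgets on widgets.id = (
  -- Runs once per users row; w is the inner copy of widgets.
  select w.id
  from widgets as w
  where w.user_id = users.id
  order by w.created_at desc
  limit 1
);</code></pre>

Because this is an inner join, users with no widgets are left out of the result.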






<h2 class="wp-block-heading">Use a Complete Subquery when you don’t have indexes</h2>



Correlated subqueries break down when the foreign key isn’t indexed, because each subquery will require a full table scan.



In that case, we can speed things up by rewriting the query to use a single subquery, only scanning the widgets table once:



<pre class="wp-block-code"><code>select * from users join (
 select distinct on (user_id) * from widgets
 order by user_id, created_at desc
) as most_recent_user_widget
on users.id = most_recent_user_widget.user_id</code></pre>



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Complete-subqueries.png" alt="Complete subqueries" class="wp-image-74416"/></figure>



We’ve used Postgres’ DISTINCT ON syntax to easily query for only one widget per user_id. If your database doesn’t support something like DISTINCT ON, you have two options:



<h2 class="wp-block-heading">Use Nested Subqueries if you have an ordered ID column</h2>



In our example, the most recent row always has the highest id value. This means that even without DISTINCT ON, we can cheat with our nested subqueries like this:



<pre class="wp-block-code"><code>select * from users join (
  select * from widgets
  where id in (
    select max(id) from widgets group by user_id
  )
) as most_recent_user_widget
on users.id = most_recent_user_widget.user_id</code></pre>



We start by selecting the list of IDs representing the most recent widget per user. Then we filter the main widgets table to those IDs. This gets us the same result as DISTINCT ON, since in this table sorting by id and sorting by created_at give the same order.



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Nested-subqueries.png" alt="Nested subqueries" class="wp-image-74421"/></figure>



<h2 class="wp-block-heading">Use Window Functions if you need more control</h2>



If your table doesn’t have an id column, or you can’t depend on its min or max to be the most recent row, use the row_number window function. It’s a little more complicated, but a lot more flexible:



<pre class="wp-block-code"><code>select * from users join (
  select * from (
    select *, row_number() over (
      partition by user_id
      order by created_at desc
    ) as row_num
    from widgets
  ) as ordered_widgets
  where ordered_widgets.row_num = 1
) as most_recent_user_widget
on users.id = most_recent_user_widget.user_id
order by users.id</code></pre>



The interesting part is here:



<pre class="wp-block-code"><code>select *, row_number() over (
 partition by user_id
 order by created_at desc
) as row_num
from widgets</code></pre>



over (partition by user_id order by created_at desc) specifies a sub-table, called a window, per user_id, and sorts the rows within each window by created_at desc. row_number() returns a row’s position within its window. Thus the most recent widget for each user_id will have a row_number of 1.
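
To see the numbering on its own, here is a small self-contained sketch that uses an inline values list as a stand-in for widgets. The three sample rows are made up for illustration:

<pre class="wp-block-code"><code>select user_id, created_at,
  row_number() over (
    partition by user_id
    order by created_at desc
  ) as row_num
from (values
  (1, date '2024-01-01'),
  (1, date '2024-02-01'),
  (2, date '2024-01-15')
) as widgets (user_id, created_at)
order by user_id, row_num;

-- user_id | created_at | row_num
--       1 | 2024-02-01 |       1
--       1 | 2024-01-01 |       2
--       2 | 2024-01-15 |       1</code></pre>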



In the outer subquery, we select only the rows with a row_number of 1. With a similar query, you could get the 2nd or 3rd or 10th rows instead.
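
For instance, a sketch of fetching each user’s second most recent widget changes only the filter on row_num. The alias second_most_recent_widget is a name chosen here for illustration:

<pre class="wp-block-code"><code>select * from users join (
  select * from (
    select *, row_number() over (
      partition by user_id
      order by created_at desc
    ) as row_num
    from widgets
  ) as ordered_widgets
  -- 2 picks the second most recent widget; "between 1 and 3" would keep the top three
  where ordered_widgets.row_num = 2
) as second_most_recent_widget
on users.id = second_most_recent_widget.user_id
order by users.id</code></pre>

Users with only one widget have no row numbered 2, so this inner join drops them from the result.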



<figure class="wp-block-image fancybox"><img decoding="async" src="https://cdn.sisense.com/wp-content/uploads/Window-functions.png" alt="Window functions" class="wp-image-74428"/></figure>



