Want to make data analysis fast for everyone?Join Us!
Many SQL implementations don’t have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs!
We’ll use them to solve the Traveling Salesman Problem and find the shortest round-trip route through several US cities.
Normal CTEs are great at helping to organize large queries. They are a simple way to make temporary tables you can access later in your query.
Recursive CTEs are more powerful - they reference themselves and allow you to explore hierarchical data. While that may sound complicated, the underlying concept is very similar to a
for loop in other programming languages.
These CTEs have two parts — an
anchor member and a
recursive member. The
anchor member selects the starting rows for the recursive steps.
recursive member generates more rows for the CTE by first joining against the
anchor rows, and then joining against rows created in previous recursions. The
recursive member comes after a
union all in the CTE definition.
Here’s a simple recursive CTE that generates the numbers 1 to 10. The
anchor member selects the value 1, and the
recursive member adds to it up to the number 10:
The first time the recursive CTE runs it generates a single row
1 using the
anchor member. In the second execution, the
recursive member joins against the
1 and outputs a second row,
2. In the third execution the
recursive step joins against both rows
2 and adds the rows
2 (a duplicate) and
Recursive CTEs also only return distinct rows. Even though our CTE above creates many rows with the same value, only a distinct set of rows will be returned.
Notice how the CTE specifies its output as the named value
prev_val. This lets us refer to the output of the previous recursive step.
And at the very end there is a termination condition to halt the recursion once the sum gets to 10. Without this condition, the CTE would enter an infinite loop!
Under the hood, the database is building up a table named after this recursive CTE using unions:
Recursive CTEs can also have many parameters. Here’s one that takes the sum, double, and square of starting values of 1, 2 and 3:
With recursive CTEs we can solve the Traveling Salesman Problem.
Finding the Shortest Path
There are many algorithms for finding the shortest round-trip path through several cities. We’ll use the simplest: brute force. Our recursive CTE will enumerate all possible routes and their total distances. We’ll then sort to find the shortest.
First, a list of cities with Periscope customers, along with their latitudes and longitudes:
And we’ll need a distance function to compute how far two lat/lons are from each other (thanks to strkol on stackoverflow.com):
Our CTE will use San Francisco as its anchor city, and then recurse from there to every other city:
The parameters in the CTE are:
places_chain: The list of places visited so far, which will be different for each instance of the recursion
last_lat and last_lon: The latitude and longitude of the last place in the
total_distance: The distance traveled going from one place to the next in the
num_places: The number of places in
places_chain— we’ll use this to tell which routes are complete because they visited all cities
recursive member, the
where clause ensures that we never repeat a place. If we’ve already visited Denver,
position(...) will return a number greater than 0, invalidating this instance of the recursion.
We can see all possible routes by selecting all 8-city chains:
We need to add in the distance from the last city back to San Francisco to complete the round-trip. We could hard code San Francisco’s lat/lon, but a join is more elegant. Once that’s done we sort by distance and show the smallest:
Even though this query is significantly more complicated than the
incrementer query earlier, the database is doing the same things behind the scenes. The top branch is the creating the CTE’s rows, the bottom branch is the final join and sort:
Run this query and you’ll see the shortest route takes 6671 miles and visits the cities in this order:
Thanks to recursive CTEs, we can solve the Traveling Salesman Problem in SQL!