###### Common Statistical Operations

# Solving the Traveling Salesman Problem with Postgres Recursive CTEs

Many SQL implementations don’t have loops, making some kinds of analysis very difficult. Postgres, SQL Server, and several others have the next best thing — recursive CTEs!

We’ll use them to solve the Traveling Salesman Problem and find the shortest round-trip route through several US cities.

### With Recursive

Normal CTEs are great at helping to organize large queries. They are a simple way to make temporary tables you can access later in your query.

Recursive CTEs are more powerful - they reference themselves and allow you to explore hierarchical data. While that may sound complicated, the underlying concept is very similar to a * for* loop in other programming languages.

These CTEs have two parts — an * anchor* member and a

*member. The*

**recursive***member selects the starting rows for the recursive steps.*

**anchor**The * recursive* member generates more rows for the CTE by first joining against the

*rows, and then joining against rows created in previous recursions. The*

**anchor***member comes after a*

**recursive***in the CTE definition.*

**union all**Here’s a simple recursive CTE that generates the numbers 1 to 10. The * anchor* member selects the value 1, and the

*member adds to it up to the number 10:*

**recursive**withrecursiveincrementer(prev_val)as(

select1-- anchor member

unionall

select-- recursive member

incrementer.prev_val +1

fromincrementer

whereprev_val<10-- termination condition

)

select*fromincrementer

The first time the recursive CTE runs it generates a single row * 1* using the

*member. In the second execution, the*

**anchor***member joins against the*

**recursive***and outputs a second row,*

**1***. In the third execution the*

**2****recursive**step joins against both rows

*and*

**1***and adds the rows*

**2***(a duplicate) and*

**2***.*

**3**Recursive CTEs also only return distinct rows. Even though our CTE above creates many rows with the same value, only a distinct set of rows will be returned.

Notice how the CTE specifies its output as the named value * prev_val*. This lets us refer to the output of the previous recursive step.

And at the very end there is a termination condition to halt the recursion once the sum gets to 10. Without this condition, the CTE would enter an infinite loop!

Under the hood, the database is building up a table named after this recursive CTE using unions:

Recursive CTEs can also have many parameters. Here’s one that takes the sum, double, and square of starting values of 1, 2 and 3:

withrecursivecruncher(inc, double, square)as(

select1,2.0,3.0-- anchor member

unionall

select-- recursive member

cruncher.inc +1,

cruncher.double *2,

cruncher.square ^2

fromcruncher

whereinc<10

)

select*fromcruncher

With recursive CTEs we can solve the Traveling Salesman Problem.

### Finding the Shortest Path

There are many algorithms for finding the shortest round-trip path through several cities. We’ll use the simplest: brute force. Our recursive CTE will enumerate all possible routes and their total distances. We’ll then sort to find the shortest.

First, a list of cities with Periscope customers, along with their latitudes and longitudes:

createtableplacesas(

select

'Seattle'asname,47.6097aslat,122.3331aslon

unionallselect,'San Francisco'37.7833,122.4167

unionallselect,'Austin'30.2500,97.7500

unionallselect,'New York'40.7127,74.0059

unionallselect,'Boston'42.3601,71.0589

unionallselect,'Chicago'41.8369,87.6847

unionallselect,'Los Angeles'34.0500,118.2500

unionallselect,'Denver'39.7392,104.9903

)

And we’ll need a distance function to compute how far two lat/lons are from each other (thanks to strkol on stackoverflow.com):

createorreplacefunctionlat_lon_distance(

lat1 float, lon1 float, lat2 float, lon2 float

)returnsfloatas$$

declare

x float=69.1*(lat2-lat1);

y float=69.1*(lon2-lon1)*cos(lat1/57.3);

begin

returnsqrt(x*x+y*y);

end

$$languageplpgsql

Our CTE will use San Francisco as its anchor city, and then recurse from there to every other city:

withrecursivetravel(places_chain, last_lat, last_lon,

total_distance, num_places)as(

select-- anchor member

name, lat, lon,0::float,1

fromplaces

wherename='San Francisco'

unionall

select-- recursive member

-- add to the current places_chain

travel.places_chain||' -> '||places.name,

places.lat,

places.lon,

-- add to the current total_distance

travel.total_distance +

lat_lon_distance(last_lat, last_lon, places.lat, places.lon),

travel.num_places +1

from

places, travel

where

position(places.nameintravel.places_chain)=0

)

The parameters in the CTE are:

: The list of places visited so far, which will be different for each instance of the recursion**places_chain**: The latitude and longitude of the last place in the**last_lat and last_lon****places_chain**: The distance traveled going from one place to the next in the**total_distance****places_chain**: The number of places in**num_places**— we’ll use this to tell which routes are complete because they visited all cities**places_chain**

In the * recursive* member, the

*clause ensures that we never repeat a place. If we’ve already visited Denver,*

**where***will return a number greater than 0, invalidating this instance of the recursion.*

**position(...)**We can see all possible routes by selecting all 8-city chains:

select*fromtravelwherenum_places=8

We need to add in the distance from the last city back to San Francisco to complete the round-trip. We could hard code San Francisco’s lat/lon, but a join is more elegant. Once that’s done we sort by distance and show the smallest:

select

travel.places_chain||' -> '||places.name,

total_distance+lat_lon_distance(

travel.last_lat, travel.last_lon,

places.lat, places.lon)asfinal_dist

fromtravel, places

where

travel.num_places=8

andplaces.name='San Francisco'

orderby2-- ascending!

limit1

Even though this query is significantly more complicated than the * incrementer* query earlier, the database is doing the same things behind the scenes. The top branch is the creating the CTE’s rows, the bottom branch is the final join and sort:

Run this query and you’ll see the shortest route takes 6671 miles and visits the cities in this order:

San Francisco -> Seattle -> Denver ->

Chicago -> Boston -> New York -> Austin ->

Los Angeles -> San Francisco

Thanks to recursive CTEs, we can solve the Traveling Salesman Problem in SQL!