Geographic Analysis in SQL: Measuring Polygon Area from Latitude and Longitude
Previously, we've demonstrated using the Haversine Formula to calculate distances on a globe in SQL. Today we'll extend that to look at areas for irregular polygons defined by latitude-longitude coordinates.
Computing the area for polygons on a globe is challenging for a number of reasons:
- A polygon drawn on the Earth's surface cannot be computed using simple Pythagorean math. To deal with this, we'll project our latitude-longitude coordinates to a plane.
- For polygons with more than three points, the drawing order is important! The correct area is calculated using the coordinates of the points in relation to their neighbors. An improperly ordered table of polygon vertices will return wildly inaccurate results.
We will use a table "points", containing several lat-long points in the Bay Area that will be the vertices of our polygon:
Which looks like this when plotted on Google Maps:
Converting Latitude and Longitude to (x, y) Coordinates
First we'll convert our latitude and longitude to Cartesian coordinates. There are a lot of options when projecting a sphere to a flat surface—each of which has trade-offs depending on whether you want to preserve area, shape or be able to plot a ship's course on a constant bearing with a straight line.
For our purpose, we want an equal-area projection. We'll use a sinusoidal projection, which can be projected from lat-long to Cartesian coordinates, and is defined by:
x = (longitude - prime_meridian) * cosine(latitude)
y = latitude
Here is a picture of a sinusoidal projection. Please don't use this for navigation:
We'll convert our map into a new unit more appropriate for measuring area in the process. I'll use kilometers, which is factored into our conversion through the number 6371, the Earth's radius in km.
When we project our points to x,y Cartesian coordinates, we'll grab a few more useful numbers:
- The geographic mean avgx, avgy of the vertices using a window function. We'll need this later to define pathing for our polygon; i.e., the order in which the vertices are connected.
- Our points normalized to the mean, so we can do some math centered around 0,0.
with sinusoidal as (
-- y-axis variables
lat * 3.14159 * 6371 / 180 as y
, avg(lat) over() * 3.14159 * 6371 / 180 as avgy
, avg(lat) over() * 3.14159 * 6371 / 180 - (lat * 3.14159 * 6371 / 180) as normy
-- x-axis variables
, lng * 3.14159 * 6371 / 180 * cos(radians(lat)) as x
, avg(lng) over() * 3.14159 * 6371 / 180 * cos(radians(avg(lat) over())) as avgx
, avg(lng) over() * 3.14159 * 6371 / 180 * cos(radians(avg(lat) over())) - (lng * 3.14159 * 6371 / 180 * cos(radians(lat))) as normx
Drawing a Polygon
We need to work out the pathing between the vertices. We are going to compute the area using the angles between points, so knowing which points come before and after the current point is critical. To think of this another way, four vertices could define both of the following shapes:
We want our path to follow the outside perimeter of our polygon, touching each vertex and never intersecting itself. One way to do this is to compute the angle between the geographic mean of our polygon and each point, and then order our table by the angle:
We can determine our angles using atan, like so:
, angles as (
, (atan(normy / normx) * 180) / 3.14159
+ ((normx / abs(normx)) * -1) * 90 + 180
The table now looks like:
Now we have a column to order our vertices and thus define our polygon!
Each vertex will need to have values calculated between itself and its neighbors in a complete loop. We can union all the table to itself so the first row can look at the last row using a lag function. An "ordr" column can be used to grab the second half of the table later, which will have fully populated data.
, unioned as (
1 as ordr
Ok, our last CTE!
We use the lag window function to append the neighboring vertex's x,y coordinates. I've slightly modified this query and my final equation from the standard form to avoid using both a lead and a lag function, instead opting to use lag(x,1) and lag(y,2). This is the step where table order is critical, and we need to order by both our ordr and angle columns to lag to the correct data.
, final as (
, lag(y, 2) over(order by ordr, angle) as lagy2
, lag(x, 1) over(order by ordr, angle) as lagx
With our table fully populated, we can now calculate our polygon's area. More information on this algorithm, as well as an implementation in C++, can be found here.
abs(sum(lagx * (y - lagy2)) / 2) as area
ordr = 2
Any projection and measurement will have errors, related to both the projection's limitations and the Earth being an oblate spheroid rather than a true sphere, so it's important to validate any geospatial algorithm you use in code.
In this case. our query returns a result of 766.93 sq. km, and Google Earth returns 767 sq. km (without the option to see decimals)—making this method very robust at the city-regional scale!