Comparing IP Addresses in SQL
It’s a common practice to store users’ IP addresses when logging actions to an events table. Most warehouses we’ve seen have chosen to store IPs as varchars.
While this makes querying with specific IP addresses easy, it can make filtering to ranges of IPs quite difficult depending on your database. Comparing IP addresses as varchars will yield incorrect results:
-- 100 is not less than 2, but is when compared as varchars!
select '22.214.171.124' < '126.96.36.199'
-- => true
Filtering to a range of IP addresses requires being able to compare any two IPs. Once you can compare, it’s trivial to filter records down to those with IPs in the range you care about. Here’s how to do just that in Postgres, MySQL, SQL Server, and Redshift.
Postgres makes working with IPs very easy. It supports an inet column type, which provides native support for IPs and IP ranges. If you’re using Postgres and didn’t use the inet column type, don’t worry, you can still cast your varchars to inets to make comparisons easy:
select '192.168.1.1'::inet < '255.1.1.1'::inet
And to check if a single IP is within a range, use the << operator:
select '192.168.1.1'::inet << '192.168.0.0/16'::inet
While MySQL doesn’t have a data type for IP addresses, it’s still easy to compare IPs in MySQL. Using inet_aton you can convert your IP addresses to integers before you compare.
-- => 3,232,235,777
IPv4 addresses are just nice ways of writing 32-bit numbers. Each octet of an IPv4 address represents 8 bits. Once the IPs have been converted to integers, they are easy to compare:
select inet_aton('192.168.1.1') < inet_aton('255.1.1.1')
With SQL Server there isn’t a built-in function to convert a varchar IP to an integer, but we can make one! We’ll use the function from Dennis Gobo over at sqlblog.com:
create function dbo.IPAddressToInteger(@ip as varchar(15))
convert(bigint, parsename(@ip, 1)) +
convert(bigint, parsename(@ip, 2)) * 256 +
convert(bigint, parsename(@ip, 3)) * 65536 +
convert(bigint, parsename(@ip, 4)) * 16777216
SQL Server doesn’t have split function, but the parsename function works great for our use case. It is normally used to extract parts of object identifiers like dbo.table1.column1. But we can use it to parse IP addresses because they are also dot-delimited:
-- => 3,232,235,777
We’re multiplying the extracted octets by large numbers to form the final 32-bit number representing the IP address. These multiplications and additions are equivalent to bit shifting and binary or’ing the octets together, which we’ll also do in the Redshift section.
Like with SQL Server, Redshift requires some gymnastics to convert a varchar IP to an integer IP. First split the varchar on the dots and convert each octet to an integer. Bitshift each octet with << to it’s position in the 32-bit version of the IP address, and finally use | to binary or the four integers together.
split_part(user_ip, '.', 1)::integer << 24 |
split_part(user_ip, '.', 2)::integer << 16 |
split_part(user_ip, '.', 3)::integer << 8 |
split_part(user_ip, '.', 4)::integer
Splitting and shifting many varchar IPs can take a while. To speed things up, you can use a CTE or a temporary table to store the results of this conversion before it’s time to compare the IP addresses.
But we can do better - this integer conversion is quite slow. Instead of converting the varchar IP to an integer, we’ll add a bunch of '0' padding to the varchar so that any two IP varchars will be comparable.
-- Octets with under 3 chars get 0-padded
lpad(split_part(user_ip, '.', 1), 3, '0') ||
lpad(split_part(user_ip, '.', 2), 3, '0') ||
lpad(split_part(user_ip, '.', 3), 3, '0') ||
lpad(split_part(user_ip, '.', 4), 3, '0')
-- '192.168.1.1' => 192168001001
The resulting values aren’t true IP addresses, but they will compare correctly. Be sure to convert the IP range’s bounds to the same format. As a bonus, this varchar version runs 15x faster than converting the IPs to integers.
You’re best option for performance is to store the IP addresses in a format that can be used natively in comparisons. For Postgres, use the inet type. For everything else, we recommend an integer.
And to simplify querying, store the IP twice - once as a varchar, and once as an integer, and use whichever is most convenient for your query.
Want to discuss this article? Join the Periscope Data Community!