Periscope Data
The world’s fastest visualization platform for data analysts.

The High-Performance SQL Blog

Getting Started with Regular Expressions

Getting Started with Regular Expressions
April 14, 2016

Last time we talked about matching strings in SQL, we covered tactics that work well for strings on the simple side. For strings with more complicated patterns, the regular expressions below are a handy tool to have in your belt.

Regular expressions are easy to get started with, so let’s jump right in. Postgres and Redshift follow the POSIX standard for regular expressions, so for this post we will focus on that.

Comparison Operators

Comparison expressions compare strings for relationships.

  • ~ matches a regular expression that is case sensitive. E.g. 'a' ~ 'A' will return false
  • ~* matches a regular expression that is case insensitive. E.g 'a' ~* 'A' will return true
  • !~ does not match a regular expression and is case sensitive. E.g. 'a' !~ 'A' will return true
  • !~* does not match a regular expression and is case insensitive. E.g. 'a' !~* 'A' will return false

OR Operator

The or operator is denoted by a pipe or vertical bar: |. This matches one of multiple alternatives, e.g. 1|2 matches the number 1 or the number 2.

Groups

With regular expressions, we say that a string matches a particular pattern. While you can match exactly one item, or a simple set of alternatives, regular expressions are most useful when we combine multiple groups.

A group is specified with parentheses. These define the scope of the operators. For example, if we wanted to match 111 or 121, the pattern 1(1|2)1 would match that.

Repetition

Being able to match very specific patterns is great, however, often we want to match things that repeat. Below are some great operators used to express how often we want the pattern to repeat.

  • ? matches zero or one occurrences
  • * matches zero or more
  • + matches one or more
  • {n} matches exactly n occurrences of the preceding pattern
  • {n,} matches n or more occurrences
  • {n,m} matches between n and m occurrences

Position Operators

Position operators signal the beginning or end of the search string.

  • ^ matches the beginning of a string. E.g. ^a matches a as the first letter of a string.
  • $ matches the end of a string.

Other Operators

Our last set of operators round out our selection.

  • . matches any character
  • [ ] Square parentheses match any single character contained in the parentheses and can contain ranges. E.g. [a-z] matches any lowercase letter.
  • [^ ] matches any character not contained in the set. E.g. [^123] matches any character that is not 1, 2 or 3.

Example

Suppose we have a campaign column, where we want to find campaigns that do not start with a number, and contain a country code somewhere in the campaign id. To do this, we would use this pattern:

^[^0-9].*[A-Z]{2}.*

Let’s break it down:

  • ^[^0-9] matches to the first character, and says that the first character cannot be a number between 0 and 9.
  • .* matches any number of any character
  • [A-Z] matches any uppercase letter between A and Z
  • {2} says there are two uppercase characters. This is because country codes are two characters.

Note: there are usually many ways to match a string or a pattern with a regular expression, so as an exercise, practice matching things multiple ways. For example, our earlier expression of 1(1|2)1 would match the same set of strings as 1[12]{1}1 and 111|121.

Now it’s time to start matching!

Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?