## Net Promoter Score and Survey Analysis in SQL

Introduced by Fred Reichheld in 2003, Net Promoter Score (NPS) is a simple method for measuring the likelihood your customers will recommend your product or service. It requires that you survey respondents and ask one simple question:

“How likely are you to recommend [Company/Product/Service] to a friend or colleague?”

Typically respondents are presented with a scale from 0 to 10 to answer this question. Here’s how the NPS question might appear on one of our own customer surveys:

After collecting results, calculating NPS is easy. A response of 0 through 6 categorizes respondents as ‘Detractors’, a 7 or 8 as ‘Passives’, and a 9 or 10 as ‘Promoters’. Your Net Promoter Score is a measure of the ratio of Detractors to Promoters. This is calculated by subtracting the number of Detractors from the number of Promoters, dividing the result by the total number of survey responses, and then multiplying by 100 to turn the result into an integer. NPS scores have a range of -100 to 100.

That’s a little wordy, here’s how we like to think about it:

This gives us a clear quantitative measure of the likelihood to recommend, but what about the “why?” Most NPS surveys also collect qualitative data to pair with the quantitative measure. Something along the lines of a question like this:

The majority of the Fortune 500 measure NPS for a very good reason: moving people from Detractors and Passives to Promoters improves business outcomes. It provides a simple mechanism for companies to bring the voice of the customer into product and business decisions, leading to reduced churn and increased customer satisfaction.

In this post we’re going to talk about how you can analyze numeric and text NPS results using SQL. The latter can be especially difficult - text analytics software is expensive, often difficult to implement in existing ETL processes, and can be challenging to understand. We’ll provide clear approaches to successfully tackling both.

### NPS at the Hotel California

We collected 2,700 hotel reviews for the Hotel California from TripAdvisor for our sample dataset. Each review rates the hotel between 1 and 5. We map these reviews into NPS terminology using 5 as a Promoter, 4 as a Passive, and 1-3 as a Detractor. A bit different from above, but the same principles apply.

The table we will work from has 4 columns: Name, Date, Verbatim, and Review. It is called hotel_reviews and one row looks like:

### Simple Charts

The first thing you need to do is partition your data by the review score to take a look. All of our further analysis will be based around our segments, but we’re analysts and we like raw data.

And voila:

We can see that our hotel has pretty good scores! Our NPS will certainly be high, but to obtain the final score we’ll need to bucket our data into the NPS segments - Promoters, Passives, and Detractors. We can see our segments with a case statement:

In order to calculate our NPS we will need to do some math. This is a simple operation that can be done in SQL.

First bucket the surveys and assign a value of 1 to Promoters, 0 to Passives and -1 to Detractors. This can appear to be an odd skew, but treating Detractors and Promoters as equal but opposite is implicit to the metric. We can now calculate NPS on the fly like so:

We can now see our hotel’s NPS is a respectable 48. NPS should be compared carefully across verticals, but a quick look at the NPS of some common companies can provide some frame of reference. Apple has an NPS of 72 and Comcast an NPS of -3. Those are reasonable boundaries for our expectations.

If we add a date column to our query we can examine our scores over time. We have also added a count column so we can see how many survey responses we have been collecting. As you can see, NPS appears to be trending slightly upwards, although 2005 and 2006 were rough years!

A similar query can show you how your relative proportions of Promoters and Detractors are changing over time. NPS changes as people change segments. You can improve NPS by moving your customers from Detractors to Passives, or Passives to Promoters. Both have value and understanding the mix-shift can help craft a complete NPS strategy.

We know that our score is reasonably good, and that it seems to be increasing. Everyone has patted themselves on the back, but now it’s time to focus on next steps. Luckily Hotel California has asked customers why they provided the scores they did.

### Text Analysis

Now that we have a grasp on our NPS numbers, it’s time to examine the wealth of text data we have collected. This is much more challenging. It requires more data preparation and more human interpretation. Survey response volume also matters - the more survey responses the better.

First, it’s important to clean your text responses. For our purposes this means:

1. Making the text lower case. This will reduce work and computation later on by allowing simpler joins across tables.

2. Removing grammar and numbers. These characters provide context for humans but for our analysis will cause errors. For instance, we want ‘food’ and ‘food!’ to be treated in the same fashion. This will allow us to provide accurate counts of word frequency in our next step.

3. Correcting spacing, so that multiple consecutive spaces become a single space. We will use spaces to count words below. Wait for it……it’s clever!

We used Redshift’s regex functionality to achieve each of these goals. The inner statement makes the verbatim lower case and removes everything but letter characters. The second takes consecutive spaces and reduces them to a single space. If you would like a refresher on regex syntax, we have a blog post to help you. The code used for this exercise was:

The iterative process of refining your selection is made much easier using SQL Snippets with Periscope Data by maintaining a standardized set of code which can be improved over time.

I put this into a SQL snippet called [verbatims_clean], so I could reuse it quickly and easily. Here is a row, pre and post-cleanup. The first is easier to read for us, but the second is easier to work with in SQL.

The first thing you can do is add the ability to filter based on words. Filters can be applied to the clean verbatim, and the original verbatim can be displayed. In the incredibly flexible Periscope Data Filters, this looks like:

This will be done differently in different visualization systems, but the core principle remains the same. By filtering based on a cleaned verbatim set, you can return comprehensive results to the user.

Once you have these cleaned verbatims, you can break them down into an even more interesting table, with one row per word per verbatim. We can now perform operations based on individual words rather than on sentences. This can help create powerful visualizations and can even be used for semantic analysis and categorization, topics we plan to cover in a future post.

The table should then look like:

The SQL to perform this operation looks like so, and will be further explained below:

– Here we will use the length function to count the number of words in each text response. After our regex clean up step, each text responses will have words equal to the number of spaces, plus one. For example, ‘live long and prosper’ has 3 spaces and 4 words.

1. In word_parse we are looking to count the number of words in each text response. We subtract the length of the cleaned verbatim with spaces removed from the length of the cleaned verbatim with spaces, plus one. This gives us the number of words in the sentence. To do this, it is important you use regex to strip all whitespace down to a single character, as double spaces would lead to inaccurate word counts.

2. We join to a table called ‘numbers’. This is a sequence of all numbers from 1 to 1 million, used in interesting cases like this. Here, it allows us to split the verbatims based on the number of words per sentence. Make sure this table is properly indexed, or sub-select a much smaller set of numbers! Read more about sequence tables here.

3. In the final part of the query I remove every word defined as a ‘stop word’. A stop word is any word that is frequent but provides little value to semantic analysis. They are essentially grammar. A list can be found here: http://www.lextek.com/manuals/onix/stopwords1.html
This list should be kept up-to-date with words related to your own business. For instance, your company name is a stop word which will be in every verbatim but does not provide any information.

Now that you have all your text broken into individual words, you can create more interesting visualizations. For example, the following query will give you the word ranked by frequency of appearance among people who give you different scores:

Keeping in mind that a low rank means a high frequency of occurrence, this chart can quickly show great insights and tell you where you need to focus attention. In the following chart, we have the frequency rank along the Y axis and the review score on the X axis. Here we can see that ‘food’ is highly ranked among low scores, whereas ‘staff’ is the opposite. The staff are doing great, but it’s time to begin working on the kitchen!

### Wrap Up

In a few minutes we calculated Hotel California’s Net Promoter Score, examined their history, and began to explore some of the deeper reasons why they aren’t beating the Ritz. This is valuable insight, and will help drive the goals of potential Circle-back Projects, Rewards Programs, Tactical Churn Groups, or Voice of Consumer Product Development Task Forces, some of which are certain to follow.

Bring this together and you will have an awesome, filterable dashboard to explore your survey data. Have fun, and Happy Periscoping!

Thank you