What You Need To Know About SQL's Group By

SQL’s group by is both common in SQL statements and just complicated enough to throw the occasional SQL writer for a loop. This means it’s critical to understand it well to write accurate SQL. In this post we will explore it’s basic use, some risks to be aware of, and some of the interesting oddities associated with it.

Read More

SQL Text Analysis with Donald Trump’s Tweets

There were many unique aspects of the 2016 U.S. presidential election, with one of the most interesting being Donald Trump’s use of Twitter. As luck would have it, tweets happen to be very easy to extract and put into relational database format. Being the data junkies that we are, we couldn’t resist downloading President Trump’s entire tweet history and loading it into Periscope Data. It’s an excellent opportunity to slice, analyze, and quantify presidential communications while also exploring methods for analyzing text in SQL at the same time.

So let’s get started.

Read More

Net Promoter Score and Survey Analysis in SQL

Introduced by Fred Reichheld in 2003, Net Promoter Score (NPS) is a simple method for measuring the likelihood your customers will recommend your product or service. It requires that you survey respondents and ask one simple question:

“How likely are you to recommend [Company/Product/Service] to a friend or colleague?”

Typically respondents are presented with a scale from 0 to 10 to answer this question. Here’s how the NPS question might appear on one of our own customer surveys:

0-10 scale

Read More

How Joins Work

The SQL join operation is one of the most powerful and commonly used SQL operations, but little attention is paid to how the internal SQL engine breaks down the tasks of join operations. This post will explore the common algorithms that databases use to compute them, including nested loop, hash, and merge joins. Our aim is to act as a resource for SQL users interested in exploring optimizations.

Read More

Interactive Analytics: Redshift vs Snowflake vs BigQuery

Every day customers ask us what the best data warehouse technology is for handling the workload of their growing analytics department: they want to run queries at interactive, real-time speeds at a price-point that fits in their budget. We’ve helped our users with everything from tried-and-true MySQL servers, day-old Postgres boxes, and fifteen year old SQL Server deploys. We help customers work with their production databases and modern cloud analytics solutions like AWS Redshift, Snowflake and Google BigQuery.

Our customers want to know where to get the fastest query times, how much data their data warehouse can handle, how much it will cost them. We want to make sure we provide the best recommendations to our users when they ask for our help navigating the data warehouse space, so we put Redshift, Snowflake, and BigQuery to the test.

After investigating Redshift, Snowflake, and BigQuery, we found that Redshift is the best choice for real-time query speeds on our customers’ typical data volumes. In this post, we will walk through our analysis of these three data warehouse solutions and the compelling use cases we found for each of the technologies.

Read More

Data Warehouse Management: Redshift, Bigquery, and Snowflake

In our post comparing Redshift, BigQuery, and Snowflake on query performance and cost for interactive analytics, we looked at the trade-offs across different data warehouses from a performance perspective.

BigQuery, Snowflake and Redshift all have web based consoles where you control your data, clusters, user management, query logging and system analytics. In this post we will visit the management console of each system and the supporting tools for managing your warehouse deployments.

Read More

Tracking KPIs using SQL for Effective Revenue Growth Analysis

Tracking revenue is crucial to understanding the health of your business. Whether your long-term growth is declining, has plateaued, or is heading upwards helps you define your long term strategy and goals. Many sales cycles are biased towards weekends, beginning-of-month, or end-of-quarter deals, and it can be hard to know day-to-day if your sales numbers are trending on target.

At Periscope we track several KPIs as they progress over the course of the month. In this post, we are going to look at monthly recurring revenue (MRR) historically across the year, and segment each month by its progress up to the current date.

Read More

SQL Symbol Cheat Sheet

When you are new to programming in SQL, you will come across a lot of hard-to-search-for character operators. If you are prone to forgetting ~ is called tilde, are wondering why there are so many %s in your strings, or have a hard time googling what the (+) symbol in where users.group_id(+) = group.id, this guide is for you.

Read More

The Best Redshift Optimization Blog Posts of 2016

At Periscope Data we are huge fans of AWS Redshift because it empowers our users to run queries blazingly fast. Beyond the success of Redshift the product is the ecosystem that has grown around the platform: dozens of companies helping you deploy and optimize your Redshift clusters and a myriad of resources for maximizing your query speed.

Here are our six favorite blog posts on optimizing Redshift performance to come out of 2016.

Read More

Calculating Proportional Values in SQL

SQL is great for aggregating: counting, summing, and averaging numbers are straightforward tasks in SQL. When looking at a distribution, it is often useful to look at them in context and compare the relative size of each part of the proverbial pie.

Read More