Unified Business Intelligence Starts With Your Data Warehouse

Your data warehouse is where all your data comes harmoniously together, breaking down the silos that crop up as a business grows. Whether it’s product usage, sales pipeline or marketing performance data it needs to be available in one secure location with the ability to provide fast, insightful analytics. That’s why we built the Periscope Data Warehouse.

Read More
Using Python to Write a Create Table Statement and Load a CSV into Redshift

For wide tables, it can be a pain to write those long create table statements and load the data into Redshift. Here is some simple Python to get the job done using the 2017 Stack Overflow survey.

Read More
SQL Query Order of Execution

The SQL order of execution defines the order in which the clauses of a query are evaluated. Let's look at how your query SQL executes, and how you can use that knowledge to write faster SQL

Read More
How to Get Started With AWS Spectrum in Minutes

AWS Redshift Spectrum is Amazon’s newest database technology, allowing exabyte scale data in S3 to be accessed through Redshift. This is not simply file access; Spectrum uses Redshift's brain, deploying workers by the thousands to order, join and aggregate your data before sending the minimum amount of data needed back to your Redshift cluster to finish your query. Learn how to get started!

Read More
How Joins Work

The SQL join operation is one of the most powerful and commonly used SQL operations, but little attention is paid to how the internal SQL engine breaks down the tasks of join operations.

Read More
Data Warehouse Management: Redshift, Bigquery, and Snowflake

In our post comparing Redshift, BigQuery, and Snowflake on query performance and cost for interactive analytics, we looked at the trade-offs across different data warehouses from a performance perspective.

Read More
Interactive Analytics: Redshift vs Snowflake vs BigQuery

Every day customers ask us what the best data warehouse technology is for handling the workload of their growing analytics department: they want to run queries at interactive, real-time speeds at a price-point that fits in their budget.

Read More
Converting MySQL Syntax and Functions into Redshift

Because our cache utilizes Redshift clusters, we often receive questions on how to convert a particular query from MySQL syntax to Redshift syntax. While Google turns up a treasure trove of results for syncing a MySQL database into Redshift, it comes up a bit short for showing how to actually translate the syntax of specific queries.

Read More
Concatenating Rows in Redshift, Postgres, & MySQL

Sometimes it’s helpful to look at an aggregated overview of many rows. With numeric columns it’s easy to sum or average many values, but for string columns we need something different.

Read More
Understanding Distributed Analytics Databases, Part 2 — Data Distribution

Last week, we talked about query strategies for distributed analytics databases. In Part 2, we’ll explain how to adjust your data storage to take advantage of distributed columnar architectures.

Read More
Understanding Distributed Analytics Databases, Part 1: Query Strategies

Distribution and columnar storage are core to high-performance analytics databases like Redshift, CitusDB, and Snowflake. Understanding these features will help you write more efficient, faster queries.

Read More
Connecting to a Redshift Cluster with a Private IP

Last week we went through how to set up your Amazon Redshift Cluster. Our instructions covered getting everything set up with a public IP. When your cluster requires a private IP for increased security, the steps for connecting are a little more complicated.

Read More
Setting up Your Redshift Cluster

You’ve done your data warehouse research and have settled on Amazon Redshift. Now you just need to get everything set up. We’re heavy users of Redshift, so this is something we have a lot of experience with.

Read More
How to ETL Data into and out of Amazon Redshift

A key question about any data warehouse, often overlooked in the purchasing process, is how the data gets in and out.

Read More
Explain and Other Tools for Query Optimization

As the datasets you work with get larger, queries that were once lightning fast can slow to a crawl. When this happens, it’s time to optimize.

Read More
4 Reasons Not To Use MySQL For Analysis

Here at Periscope, we see a lot of databases. Many of them work well for their users and were well-chosen for their specific analysis use case. But there’s one antipattern that frustrates analysts again and again: MySQL.

Read More
10 Rules for a Better SQL Schema

There are a lot of decisions to make when creating new tables and data warehouses. Some that seem inconsequential at the time end up causing you and your users pain for the life of the database.

Read More
Optimizing Query Management on Redshift

It’s happened to everyone. Your nightly ETLs didn’t run because someone’s long-running job locked the entire cluster overnight. Now your analysts don’t have the data they need to run daily reports.

Read More
Helpful Redshift Admin Queries

It is easy to treat Redshift as a black box — queries go in, answers come out. When something goes wrong, though, you’ll want to open the hood and see what Redshift is actually doing.

Read More
Redshift User Defined Functions in Python

Today the Redshift team announced support for User Defined Functions! UDFs allow you to create your own SQL functions, backed by the power and flexibility of Python. Now you can add your favorite functions from other databases, or invent new ones, to make your data analysis much easier.

Read More
The Lazy Analyst's Guide To Postgres JSON

JSON support is the most interesting new Postgres feature of the last few years. It relaxes the primary constraint of SQL databases — the rigid schema structure — by letting you store semistructured data in your tables alongside other data.

Read More
Exact Row Counts for All Tables in MySQL and Postgres

Keeping track of your row counts can be helpful for budgeting and capacity planning, yet accurate counts are surprisingly hard to get.

Read More
$3000 Data Warehouse — Redshift vs. Postgres

You have $3,000 a month to spend on a data warehouse. Should you upgrade your read replica or switch to Redshift? If you value fast queries, the answer is Redshift.

Read More
Importing Data into Redshift from MySQL and Postgres

Users are signing up in droves and your app is producing tons of data. MySQL has worked well as a production database, but your analysis queries are starting to run slowly. So you decide to test out Redshift as a data warehouse.

Read More
Counting Comma-Delimited Values in Postgres, MySQL, Amazon Redshift and MS SQL Server.

Every once in a great while, the enterprising SQL analyst is confronted with data that is not relational in nature.

Read More
Demystifying Redshift — What's Up with My Disks?

Columnar stores like Amazon Redshift achieve high speeds by reading only the columns needed to complete a query. The best speedups are achieved when these columns and the intermediate results fit in RAM. These speedups degrade if the intermediate results exceed the available RAM and get written to disk.

Read More
Redshift Maintenance 101

Keeping your Redshift clusters running well requires maintenance. Updating and deleting data creates dead rows that need to be vacuumed, and even append-only tables need to be resorted if the append order is not consistent with the sort key.

Read More
Changing Dist and Sort Keys on Giant Tables in Redshift

Keeping your Redshift fast requires maintenance. As your query patterns change you’ll want to update the dist and sort keys to improve performance.

Read More
Cross-Database Joins

It began with the best of intentions: You launched your first web app for your customers, backed by a database full of transactional data to analyze

Read More
Optimizing Your psql

While it’s tempting to use the latest tools, there’s often a powerful enough solution that already exists. For connecting to and querying a PostgreSQL database, that tool is psql.

Read More
Hashing Tables to Ensure Consistency in Postgres, Redshift and MySQL

One of the main things Periscope does to speed up queries is maintain a cache of customer data. The cache is optimized to serve certain kinds of queries, and benefits f

Read More
Faster Redshift Queries with Materialized Views — Lifetime Daily ARPU

The best way to make your SQL queries run faster is to have them do less work, and a great way to do less work is to query a materialized view that’s already done the heavy lifting.

Read More
2X Your Redshift Speed With Sortkeys and Distkeys

Like a lot of folks in the data community, we’ve been impressed with Redshift, Amazon’s new distributed database. Yet at first, we couldn’t figure out why performance was so variable on seemingly-simple queries.

Read More
Count Distinct Compared on Top 4 SQL Databases

After reading our last post, Use Subqueries to Count Distinct 50X Faster, some good folks at Hacker News and r/programming asked to see our results compared across RDBMS’s.

Read More