Periscope Data
The world’s fastest visualization platform for data analysts.

The High-Performance SQL Blog

Cross-Database Joins

November 17, 2014

In celebration of our friends at Segment launching Segment SQL last week, we’re writing on a topic of particular interest to our mutual customers: Cross-Database Joins.

All The King’s Databases

It began with the best of intentions: You launched your first web app for your customers, backed by a database full of transactional data to analyze. In time you added a read replica, and replaced Excel with an awesome visualization tool to go with it.

Now you’re launching your first mobile app. You want SQL access to the underlying data store, but building a server to receive pings is much too difficult. So you make great use of a fabulous event-tracking analytics solution.

But now your data is in two places. What if you want to know whether your iOS users are big spenders? You’d need to slice monthly iOS users in your mobile app database by payment plan information in your web app database.

Luckily, there is a solution: cross-database joins.

Cross-Database Joins

Let’s start by counting our iPhone MAUs, as measured on the Segment database:

select date_trunc('month', created_at), count(1)
from iphone_production.session_started
group by 1

We’re counting a user as active in a given month if they’ve started a session in that month. This query gives us a graph like this:

Now we just need to bring our payment plans into the chart. This is where the magic happens. We’ll join in the users table on our web database, and slice the query by users.payment_plan:

  date_trunc('month', created_at),
from segment.iphone_production.session_started
join web_prod.users
  on = session_started.user_id
group by 1, 2

Note that we now need to fully qualify the tables in the from and join clauses with their database names: segment and web_prod.

And our hard work pays off! Here we can see our iPhone MAUs sliced by payment plan:

How It Works

Cross-database joins, as well as Periscope’s query speeds, are enabled by our Postgres-based data cache. Each customer’s data is stored in the same database, with one schema per (database, schema) pair.

This architecture allows us to run exactly the query you wrote, with some simple rewrites to make the query valid. Here’s the rewritten query:

  date_trunc('month', created_at),
from db_1234_iphone_production.session_started
join db_1235_public.users
  on = session_started.user_id
group by 1, 2

In this example, your segment database’s iphone_production schema is translated to the db_1234_iphone_production schema in Periscope’s data cache. And web_prod’s (unspecified) public schema is translated to the db_1235_public schema. The rest of the query remains the same!

Start Exploring

If you want simple event tracking that gives you SQL access to the underlying data, Segment SQL is your answer. If you want to take it a step further and join that data to all your other databases, sign up for Periscope.

Want to discuss this article? Join the Periscope Data Community!
Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?

Subscribe to our Newsletter