Single Event Tables and Common Analysis Queries

April 9, 2015

Event Tables

Single denormalized events tables are increasingly common, especially for storing log data from mobile clients. These single-table implementations often use JSON blobs to store event properties.

A single table is great for quickly ingesting data. Unfortunately, it can also make real-time queries slow and cumbersome.
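
To make this concrete, here’s a sketch of the kind of events table described above, with a JSON blob for properties (PostgreSQL syntax; the table and column names are illustrative):

create table events (
  user_id    bigint,
  event_type varchar(32),
  created_at timestamp,
  properties json
);

-- Every query that needs a property has to dig it out of the blob:
select user_id
from events
where event_type = 'purchase'
  and (properties ->> 'amount')::numeric > 0;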

User-Based Analysis on Denormalized Tables

A lot of data analysis involves aggregating data by user and reasoning about those users. With a denormalized table, you have to extract these user data points separately for each query, which wastes both CPU time and analyst time.

For example, calculating paid user retention requires finding all the users who have paid. Then you join that list back to your activity table to see how those users’ behavior changes over time.

Since this is a core metric for your business, you’ll be running a query like this daily, if not hourly:

with retention_users as (
  select
    user_id,
    date(min(created_at)) as first_login
  from events
  group by 1
  having min(created_at) > now() - interval '14 day'
),
user_activity as (
  select
    user_id,
    date(created_at) as day
  from events
  where created_at > now() - interval '7 day'
  group by 1, 2
)
select
  day,
  count(user_activity.user_id)::float
    / (select count(*) from retention_users) as retention_rate
from user_activity
join retention_users using (user_id)
where
  retention_users.first_login + interval '7 day'
    > user_activity.day
group by 1

The Solution

Luckily, there’s a solution: normalizing the metadata from your events table into its own tables. This can make analysis queries much faster.

A users table is a great place to start: one row per user, with the metadata you care about as columns. The details depend on the analyses you’ll be doing, but common cohort fields include total spend, platform, marketing channel, join date, and experiment groups.

Here’s a sketch of what such a table might look like (the column names are illustrative; pick the cohorts that matter to your business):
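
create table users (
  user_id          bigint primary key,
  first_login      date,
  age              int,
  platform         varchar(32),    -- e.g. ios, android, web
  channel          varchar(32),    -- marketing channel that acquired the user
  total_spend      numeric(10, 2),
  spend_level      varchar(16),    -- e.g. free, low, high; bucketed from total_spend
  experiment_group varchar(32)
);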

With such a table, your retention query simplifies to something like this:

with user_activity as (
  select
    user_id,
    date(created_at) as day
  from events
  where created_at > now() - interval '7 day'
  group by 1, 2
)
select
  day,
  users.spend_level,
  count(user_activity.user_id)::float
    / (select count(*) from users where age between 18 and 35)
    as retention_rate
from user_activity
join users using (user_id)
where users.first_login + interval '7 day' > user_activity.day
  and users.age between 18 and 35
group by day, users.spend_level

As a bonus, this version will run quite a bit faster, since you’re no longer joining the whole events table to itself!

Creating and Updating Your Tables

Depending on your stack and preferences, you have lots of options for creating and updating these tables.

Views

Periscope’s Views feature will let you materialize a table with a simple select statement. You write the statement once, and Periscope updates the view every hour. This is an especially good option if you don’t own the database because your events are stored by a third-party service like Amplitude or Segment.

If you do own the database, many databases (including PostgreSQL and Oracle) support materialized views, which are a good option.
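
For example, in PostgreSQL you could derive the users table from the previous section as a materialized view. This is only a sketch: it assumes the JSON properties blob from earlier, and it can only fill in the columns derivable from the events themselves:

create materialized view users as
select
  user_id,
  date(min(created_at))                   as first_login,
  sum((properties ->> 'amount')::numeric) as total_spend
from events
group by user_id;

-- Re-run on a schedule to pick up new events:
refresh materialized view users;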

Frequent ETLs

If you’re already ETLing data into the database, the ETL job is a natural home for the code that builds these tables. You can also have it update the metadata tables whenever new events are loaded, keeping them perfectly up-to-date.
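
As a hypothetical sketch, assuming PostgreSQL 9.5+ and a staging table new_events holding the batch just loaded, the ETL job could run an upsert like this after each load:

insert into users (user_id, first_login, total_spend)
select
  user_id,
  date(min(created_at)),
  coalesce(sum((properties ->> 'amount')::numeric), 0)
from new_events
group by user_id
on conflict (user_id) do update
  set total_spend = users.total_spend + excluded.total_spend;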

Database Triggers

If you want the metadata tables to live in your database, but don’t own your ETL, a database trigger works well. The trigger will run when new events are inserted and update your metadata tables as well. Remember to keep them fast, or they’ll bog down your insert statements.
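
Here’s a minimal sketch of such a trigger in PostgreSQL, assuming purchase events carry an amount in their JSON properties:

create function bump_user_spend() returns trigger as $$
begin
  -- A real implementation would also insert a row for first-time users.
  update users
  set total_spend = total_spend + (new.properties ->> 'amount')::numeric
  where user_id = new.user_id;
  return new;
end;
$$ language plpgsql;

create trigger events_bump_user_spend
after insert on events
for each row
when (new.event_type = 'purchase')
execute procedure bump_user_spend();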

Wrapping Up

Denormalized log tables can be convenient when you’re logging the data. Just remember to set up normalized metadata tables on the other end!
