Periscope Data
The world’s fastest visualization platform for data analysts.

The High-Performance SQL Blog

Marketing Attribution in SQL

Marketing Attribution in SQL
November 6, 2015

One of the most difficult marketing challenges today is attribution. Give too much credit to one marketing channel and you might mask its underperformance. Don’t give enough credit to another, and you could hide your most valuable channel.

Getting The Data Organized

Let’s start with an idealized web logs table that may be familiar to a lot of folks. We’ll have each request’s cookie, URL, referrer and timestamp. Here’s an example:

The first order of business is to attribute marketing channels to individual requests. We’ll do that with a simple case statement: ​

select 
  cookie,
  case 
    when url ilike '%utm_source=adwords%' then 'paid'
    when url ilike '%/blog%' then 'content'
    when referrer ilike '%facebook%' then 'social'
  end as channel,
  timestamp
from web_logs

​ This effectively replaces the raw URL and referrer data with useful channel info:

Now, for each cookie, let’s get the channels in order for easy analysis later. We’ll use Postgres’s array_agg here. For other DBs, avail yourself of one of their alternatives. ​

select
  cookie, 
  array_agg(channel order by timestamp) as channel_list
from channel_web_logs
group by 1

This gives us a nice ordered array of channels for each cookie: ​

Now let’s put some attribution strategies into practice! ​ 

Stack-Rank Attribution

Stack-Rank Attribution credits leads to channels in descending order of the channel’s importance. This is a helpful strategy on small teams when certain channels are much more important than others, and the team needs to make sure the most important channels are performing. ​ Here’s an example traffic count query: ​

select
  case 
    when 'paid' = any(channel_list) then 'paid'
    when 'content' = any(channel_list) then 'content'
    when 'social' = any(channel_list) then 'social'
    else 'uncategorized' end
  end as channel,
  count(1
from cookie_channels 
group by 1

Don’t be confused by the Postgres any function. It just checks whether the element is in the array. Here we assign a cookie to a channel in descending order of channel importance: “paid” if it’s ever come through an ad click, “content” if it’s ever viewed a blog post, “social” if it’s ever come from Facebook. Otherwise it’s “uncategorized”. ​

This is useful for making sure to fully attribute every paid dollar, almost fully attribute every hour spent on content, and not stress too much about other channels. ​ 

First-Touch Attribution

First-Touch is a very common attribution strategy. It’s useful for attributing leads more fairly, and operates under the philosophy that the user would not be here at all were it not for that first channel that brought them in. ​ Here’s the SQL for visitors by channel, attributing via first touch: ​

select
  coalesce(channel_list[1], 'uncategorized'),
  count(1
from cookie_channels 
group by 1

This SQL is even simpler! We simply look at the first channel in our list of channels. If there’s no first channel, i.e. the channel_list is empty, we attribute “uncategorized”. ​ 

Last-Touch Attribution

Last-Touch Attribution is also fairly common, and, just like it sounds, attributes cookies to the last marketing channel that they touched. This strategy is more common in freemium businesses, where free users may come from a variety of channels, but the business value is generated by the channel that converted them to paid. ​ The SQL is very similar to our first-touch SQL: ​

select
  coalesce(
    channel_list[array_length(channel_list, 1)], 
    'uncategorized'
  ),
  count(1
from cookie_channels 
group by 1

We’re using the array_length function here to grab the last element of the channel_list array instead of the first. The rest is the same as above! ​ 

Weighted Attribution

Weighted Attribution seeks to evenly distribute the credit for each visitor to each channel that contributed. If a visitor has touched 4 marketing channels, then each channel should get credit for 0.25 visitors.

This strategy is common in larger marketing organizations, where it’s important to allocate resources fairly based on each channel’s performance. ​ Here’s the SQL: ​

with cookie_weights as (
  select 
    cookie,
    1.0 / count(1) as channel_weight
  from (
    select 
      cookie,
      unnest(channel_list)
    from cookie_channels
  )
)
select 
  channel, 
  sum(channel_weight)
from cookie_channels
join cookie_weights 
  using (channel)
group by 1

Let’s take this query step by step. In the inner select of the cookie_weights CTE, we use unnest to expand the channel array out to one row per channel. Then we assign a weight to each channel in inverse proportion to the number of channels. For example, if a cookie has 4 channels attributed, then its weight is 0.25.

Finally, we join those weights back to our original cookie_channels table and sum the weights per channel. This gives each channel its weighted sum of cookies brought into the product!

Haven't tried Periscope Data yet?
Start a trial and we’ll send you one of our famous coffee mugs.
Read More
Haven’t tried Periscope Data yet?