
Feature Analysis with Mean-Time-Between-Usage Calculation

June 24, 2016

In Periscope Data, you can apply filters to the charts on your dashboard. Filters change certain values in the underlying SQL, such as the date range. Users can also create a default set of filters for a dashboard, so that whenever they access it, the charts first display with those default filters applied.

Our engineering team refers to this default set as the User Data View (UDV). Recently, we wanted to evaluate how people use this feature and see if there are any performance optimizations we could make based on the usage.

Finding Percentage of Users Using the Feature

At Periscope, we like to use data to understand how people use our product, and we put a lot of that data in a table we call the feature log. From this log table, we are going to find out what percentage of our active users created or changed their UDV in the last 30 days.

We log the activity of creating or changing a UDV as "udv-change". We can find the number of active users by counting the unique users who had "focused_tab" set to true in our time_on_site_logs table. We obtained both numbers by querying our Redshift database:

with
  -- users who created or changed a UDV in the last 30 days
  udv_users as (
    select
      1 as id,
      count(distinct user_id) as count
    from feature_logs
    where feature = 'udv-change'
      and created_at >= getdate() - interval '30 day'
      and created_at < getdate()
  )
  -- users active in the last 30 days
  , active_users as (
    select
      1 as id,
      count(distinct user_id) as count
    from time_on_site_logs
    where created_at >= getdate() - interval '30 day'
      and created_at < getdate()
      and focused_tab = true
  )
select
  cast(udv_users.count as float)
    / cast(active_users.count as float)
from udv_users
join active_users
  on udv_users.id = active_users.id
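As a side note, the 1 as id columns exist only to give the two single-row CTEs something to join on. An explicit cross join is an equivalent way to write the final select; a minimal sketch using the same CTEs (the pct_using_udv alias is ours):

-- equivalent final select: cross join the two one-row CTEs
select
  cast(udv_users.count as float)
    / cast(active_users.count as float) as pct_using_udv
from udv_users
cross join active_users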

We also wanted to find out how often people change UDVs. To do this, we calculated the mean time between usage (MTBU), a term borrowed from mean time between failure (MTBF), a common measure of a system's resiliency in engineering.
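Concretely, MTBU here is just the average gap between consecutive UDV changes by the same user on the same dashboard:

MTBU = (total time between adjacent udv-change events) / (number of adjacent event pairs)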

We want to put two adjacent event times in the same row so that we can calculate the difference between them. Let's first join the feature_logs table with itself, calling the first copy a and the second b. We left join a with b on user_id, dashboard_id, and the condition a.created_at < b.created_at, which pairs each event with every later event for the same user and dashboard.
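For example, if one user changed a dashboard's UDV at 9:00, 9:05, and 9:20 (made-up timestamps, purely for illustration), the self-join produces:

user_id | dashboard_id | a.created_at | b.created_at
1       | 7            | 9:00         | 9:05
1       | 7            | 9:00         | 9:20
1       | 7            | 9:05         | 9:20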

If we pick the minimum b.created_at per user_id, dashboard_id, and a.created_at, each row will hold a pair of adjacent event times: 9:00 with 9:05, and 9:05 with 9:20 in the example above.

By taking the difference between the two times in each row, we get the time between every pair of adjacent UDV-change events.

The following query generates a distribution of these times:

with times as (
  select
    1 as join_col,
    a.user_id,
    a.dashboard_id,
    a.created_at as earlier_time,
    -- seconds between this UDV change and the next one
    -- for the same user and dashboard
    datediff(s, a.created_at, min(b.created_at)) as time_to_change
  from feature_logs a
  left join feature_logs b
    on a.user_id = b.user_id
    and a.dashboard_id = b.dashboard_id
    and a.feature = b.feature
    and a.created_at < b.created_at
  where a.feature = 'udv-change'
    and a.created_at >= getdate() - interval '30 day'
    and a.created_at < getdate()
    -- drop events with no later event to pair with
    and b.created_at is not null
  group by a.user_id, a.dashboard_id, earlier_time
),
total as (
  select 1 as join_col, count(1) as total_count from times
)
select
  case
    when time_to_change between 0 and 10 then '1: 10s'
    when time_to_change between 11 and 60 then '2: 1min'
    when time_to_change between 61 and 300 then '3: 5mins'
    when time_to_change between 301 and 3600 then '4: 1hour'
    else '5: >1hr'
  end as time_bucket,
  cast(count(1) as float) / cast(total.total_count as float) as pct
from times
left join total
  on times.join_col = total.join_col
group by 1, total.total_count
order by 1
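The bucketed query gives the distribution; the MTBU number itself is simply the average of time_to_change. A minimal sketch, reusing the times CTE from the query above (the mtbu_seconds alias is ours):

-- average gap between adjacent UDV changes, in seconds,
-- computed from the same times CTE as above
select avg(time_to_change) as mtbu_seconds
from times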

The resulting distribution shows that UDVs are changed within 10 seconds of each other 7.4% of the time. This probably indicates that users are just playing with the filter set rather than actually looking at the charts with the filters applied.

Users update UDVs within a minute 26% of the time. These users are also unlikely to be actively analyzing the data returned. While most dashboards in Periscope update quickly, dashboards with many charts or complicated SQL queries can take longer than a minute to update.

So by the time the user switches to a new filter set, some charts might not have finished updating.

Because Periscope keys chart updates on the underlying SQL rather than on the chart object itself, changing the filters multiple times can create many different queries on the Periscope backend, even for the same chart.

This implementation has significant performance and security advantages in the Periscope backend. But based on the data we gathered, we realized we could reduce the load on our backend servers, as well as on customers' databases, by detecting when filters are changed at such short intervals and keeping the superseded updates from executing.
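The check itself lives in our backend, but as a rough offline version of the same idea, here is a sketch that flags UDV changes superseded within 10 seconds, using the same self-join pattern as above (the 10-second threshold and the superseded alias are our own illustrative choices, not the production logic):

-- flag UDV changes that another change replaced within 10 seconds
select
  a.user_id,
  a.dashboard_id,
  a.created_at,
  coalesce(
    datediff(s, a.created_at, min(b.created_at)) <= 10,
    false
  ) as superseded
from feature_logs a
left join feature_logs b
  on a.user_id = b.user_id
  and a.dashboard_id = b.dashboard_id
  and a.feature = b.feature
  and a.created_at < b.created_at
where a.feature = 'udv-change'
group by a.user_id, a.dashboard_id, a.created_at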

We implemented a check that skips running the SQL for charts users are no longer interested in, and we ran it on our own Periscope account (yes, we use our own product to analyze data at Periscope).

So far we have seen a number of cases where this check updates the charts users expect while reducing the number of queries we run in the backend. A win-win for both our users and our servers!

