Feature Analysis with Mean-Time-Between-Usage Calculation
In Periscope Data, you can apply filters to charts in your dashboard. Filters allow you to change certain values in the underlying SQL, such as date range. Users can also create a default set of filters they want to apply to the dashboard, so that whenever they access the dashboard, the charts will first display with your default filters applied.
Our engineering team refers to this default set as the User Data View (UDV). Recently, we wanted to evaluate how people use this feature and see if there are any performance optimizations we could make based on the usage.
Finding Percentage of Users Using the Feature
At Periscope, we like to use data to see how our users are using our product. We put a lot of such data in the “feature log.” From this log table, we are going to find out what percentage of our active users created or changed their UDV in the last 30 days.
We call the activity of creating or changing a UDV “udv-change” in the log. We can find the number of active users by counting the number of unique users that had “focused_tab” as true. We obtained the data by querying this log in our redshift database:
not_udv_users as (
1 as id,
feature = 'udv-change'
and created_at >= (getdate() - interval '30 day')
and created_at < getdate()
, active_users as (
1 as id,
count(distinct user_id) as count
where created_at >= (getdate() - interval '30 day')
and created_at < getdate()
and focused_tab = true
cast(not_udv_users.count as float)
/ cast(active_users.count as float)
on not_udv_users.id = active_users.id
We also wanted to find out how often people change UDVs. In order to do this, we wanted to calculate mean time between usage (MTBU) - a term borrowed from mean time between failure (MTBF), a common measure of resiliency of a system in engineering.
We want to put the two adjacent event times in the same row so that we can calculate the difference between these times. Let’s first join the feature_log table with itself. We call the first one a, and the other b. We left join a with b on user_id, dashboard_id and when a.created_at < b.created_at. This result will give you an intermediate result such as:
If we picked the minimum of b_time per user_id, dashboard_id, each row will have adjacent times.
By calculating the difference between these two times, we can get the time between all the adjacent events of UDV change.
The following query generates a distribution of these times:
with times as
(select 1 as join_col,
a.created_at as earlier_time,
datediff(s, a.created_at, min(b.created_at)) as time_to_change
from feature_logs a
left join feature_logs b
on a.user_id = b.user_id
and a.dashboard_id = b.dashboard_id
and a.created_at < b.created_at
where a.feature = 'udv-change'
and a.created_at >= getdate() - interval '29 day'
and a.created_at < getdate()
and b.created_at is not null
group by a.user_id, a.dashboard_id, earlier_time),
(select 1 as join_col, count(1) as total_count from times)
case when time_to_change between 0 and 10
then '1: 10s'
when time_to_change between 11 and 60
then '2: 1min'
when time_to_change between 61 and 300
then '3: 5mins'
when time_to_change between 300 and 3600
then '4: 1hour'
else '5: >1hr'
cast(count(1) as float) / cast(total.total_count as float)
left join total
on times.join_col = total.join_col
group by 1, total.total_count
order by 1
The histogram above shows that UDVs are changed within 10 seconds of each other 7.4% of the time. This probably indicates that users are just playing with the filter set, rather than are actually looking at the charts applied with the filter.
Users update UDVs within a minute 26% of the time. These users are also not likely to be actively analyzing the data that returned. While most dashboards in Periscope are fast to update, some dashboards with a lot of charts or complicated SQL queries might take longer than a minute to update.
So by the time the user switches to a new filter set, some charts might not have finished updating.
Because Periscope updates charts by the underlying SQL rather than each chart object, changing the filters multiple times could create a lot of different queries on the Periscope backend, even when they are on the same chart.
This implementation has a lot of advantages related to performance and security in the Periscope backend. But based on the data we have gathered, we realized that we will be able to reduce the load on our backend servers as well as customers’ databases by detecting when filters are changed at such short intervals and keeping the earlier updates from being executed.
We have implemented a check that skips updating on SQLs for charts that users are no longer interested in and ran it on our own Periscope account (yes, we use our own product to analyze data at Periscope).
So far we have seen a number of cases where we could update the charts that users are expecting while reducing the number of queries we run in the backend with this check. A win-win for both our users and our servers!