
A Technique to Drive Data-Informed Product Management


Do you make guesses about how users are engaging with your product? Before we could analyze detailed usage information, we might have observed a few people operating the product and made anecdotally informed guesses about how the user dynamics worked. But in today’s data-rich environment, we can combine quantitative datasets with a little SQL to practice data-informed product management.

For example, when planning sales trial activities, we want to understand which features drive pre-sales interest and which resonate more after deployment. This lets product management plan the ideal demo environment to achieve the highest possible trial win rate.

At Periscope Data, we coach product managers to learn which features deliver a positive impact on trials — will users want to sign up because they’ve seen this awesome new feature? — and which will drive post-sale upgrades, capabilities a user will most likely appreciate once they’ve gotten into production.

Synthesizing product usage data into a picture of market dynamics is hard, but we can make a big impact with even a modest analysis. Let’s take a look at an example feature and see if we can build a model that answers whether it’s a pre-sale or post-sale winner.

In Periscope Data, users can fetch data via SQL and then further transform that data using Python before visualizing it. To understand the user dynamics of this feature, we can query our usage data and get a list of all charts that use Python post-processing:

select
  count(widgets.id)
  , [widgets.created_at:pst:date]
from
  vw.widgets
where
  widgets.code is not null                          -- chart has a post-processing step
  and widgets.code_language = 'python'              -- the step is written in Python
  and [widgets.created_at:pst:date] > '2017-10-01'  -- after the feature shipped
group by 2

Let’s look at what’s happening here. The widgets table contains all charts, and we want a count of how many were created each day.
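For orientation, here’s a hypothetical minimal shape of the widgets table, inferred from the columns the queries in this post use (the types are assumptions):

create table widgets (
  id            bigint,    -- one row per chart
  dashboard_id  bigint,    -- the dashboard a chart lives on
  user_id       bigint,    -- the chart's creator
  code          text,      -- post-processing source; null when there is none
  code_language text,      -- e.g. 'python'
  created_at    timestamp  -- when the chart was created
);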

The bracket notation around widgets.created_at uses a built-in Periscope Data operator that aligns the datetime to PST and reports it by day — it’s a convenience operator that expands (in AWS Redshift syntax) to:

date_trunc('day', (widgets.created_at + interval '-8 hour')::timestamp)::date

Our where clause applies constraints to identify the charts we care about: those that include a post-processing step written in Python, restricted (to reduce query run time) to records created after the Python integration feature was released. We take records that match these criteria and group them by date.

Let’s check that result in Periscope Data (with limit 10 applied to get a small sample) to make sure we’re getting what we expect.
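Concretely, the spot-check is just the same query with limit 10 appended:

select
  count(widgets.id)
  , [widgets.created_at:pst:date]
from
  vw.widgets
where
  widgets.code is not null
  and widgets.code_language = 'python'
  and [widgets.created_at:pst:date] > '2017-10-01'
group by 2
limit 10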

This list of charts is useful because it gives us a history of everything relevant that was created in Periscope Data. We’ll want to do a bit more filtering to make sure we’re only capturing charts created by paying customers. To do that, we’ll need information about the customer account associated with each chart (to filter out non-paying trial accounts) and the creator email address (to filter out charts created by Periscope Data engineers).

That data isn’t stored in the widgets table directly, but we can uncover it with a handful of joins to other data models.

select
  count(widgets.id)
  , [widgets.created_at:pst:date]
from
  widgets
  join users on
    widgets.user_id = users.id
  join dashboards on
    widgets.dashboard_id = dashboards.id
  join account_sites_mapping on
    dashboards.site_id = account_sites_mapping.site_id
  join account_details on
    account_sites_mapping.account_id = account_details.account_id
where
  widgets.code is not null
  and not users.email_address like '%periscopedata.com'  -- exclude employees
  and widgets.code_language = 'python'
  and [widgets.created_at:pst:date] > '2017-10-01'
  and account_details.mrr > 0                             -- paying customers only
group by 2

In the query above, we’ve joined widgets with the users table (where we’ll get email addresses) — this one is easy. Getting to our account records is a bit harder. To do that, we’ll need the foreign key site_id, which we have in the dashboards table, and we’ll use a foreign key lookup table called account_sites_mapping to make the leap from site_id (a product identifier) to the account_id of the account record in our CRM model.

Using these additional tables, we’ll filter out Periscope Data employees:

and not users.email_address like '%periscopedata.com'

And accounts that aren’t paying customers:

and account_details.mrr > 0

Now we’ve got a clean list of widgets that were created by real users and have Python post-processing steps, grouped by date. On its own, though, this data doesn’t tell us enough to decide whether Python support is a pre- or post-sale driver; we need to look at chart creation in the context of sales opportunities to determine what kind of dynamics this feature has.

To start including this information, we take our original query and join it with account and opportunity data.

with widgets as (
  select
    account_details.account_id
    , [widgets.created_at:pst:date] as widget_creation_date
    , case
        when account_details.last_closed_won_upsell_at is not null
          then [account_details.last_closed_won_upsell_at:pst:date]
        else [account_details.last_closed_won_at:pst:date]
      end as last_sold
    , [opportunity.close_date:pst:date] as close_date
  -- note: inside a non-recursive CTE, widgets still refers to the base table
  from widgets
    join dashboards on widgets.dashboard_id = dashboards.id
    join account_sites_mapping on
      dashboards.site_id = account_sites_mapping.site_id
    join account_details on
      account_sites_mapping.account_id = account_details.account_id
      and account_details.mrr > 0              -- paying customers only
    inner join opportunity on
      account_details.account_id = opportunity.account_id
      and opportunity.is_won is true           -- closed-won opportunities
      and opportunity.close_date > '2017-12-01'
    join users on widgets.user_id = users.id
  where
    widgets.code is not null
    and not users.email_address like '%periscopedata.com'  -- exclude employees
)

With this query, we’re pulling in the data we’ll use to associate usage with opportunities: the date of the last closed-won opportunity for each account that has usage. In the data model we use to track account details, the last_closed_won_upsell_at field is null if an account has never purchased an expansion, so we use a case when to build a last_sold field that identifies the most recently sold opportunity for each account.

Armed with this table of widget creation dates, won opportunity close dates, and accounts, we can now aggregate this information to make use of it:

agg as (
  select
    account_id
    , min(widget_creation_date) as first_widget_date
    , close_date
    -- days between first widget and close (negative = used before buying)
    , min(widget_creation_date) - close_date as separation_close_use
  from widgets
  group by account_id, close_date
)

This CTE calculates the number of days between the first widget created and the close date of each account opportunity, giving us a set of ranges. We want to associate usage of the feature with the closest opportunity, so we’ll use another CTE to select the smallest window between the first widget and a close date.

ranges as (
  select
    account_id
    , close_date
    -- keep the smallest absolute gap while restoring its sign
    , min(abs(separation_close_use)) * case
        when abs(min(separation_close_use)) > min(separation_close_use) then -1
        else 1
      end as bought_within_days_of_use
  from agg
  group by
    account_id
    , close_date
)

There’s a little fancy footwork here: the case when identifies the smallest absolute gap without losing the sign of separation_close_use, which we’ll use to differentiate between pre-close and post-close usage. Otherwise, this is a straightforward aggregation.
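As a quick sanity check with invented values, the same sign-recovery expression hands back the original signed gaps (plain values stand in for the single-row aggregates here):

select
  abs(x) * case when abs(x) > x then -1 else 1 end as signed_gap
from (select -12 as x union all select 7 as x) vals
-- returns -12 and 7: a negative gap (first use before close) keeps its sign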

Finally, we select ranges that are relevant to our analysis:

select * from ranges
where abs(bought_within_days_of_use) < 60

Think carefully about the maximum range size you use to measure your feature here; what you choose should relate to your average sales cycle. If your business uses six-month POCs, you’ll want something closer to 180 days (or, realistically, 210+) for your range rather than the 60 days I use here; if your sales cycles involve a seven-day trial, 60 days is far too long.
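For reference, here is the complete query with the three CTEs stitched together and the final select:

with widgets as (
  select
    account_details.account_id
    , [widgets.created_at:pst:date] as widget_creation_date
    , case
        when account_details.last_closed_won_upsell_at is not null
          then [account_details.last_closed_won_upsell_at:pst:date]
        else [account_details.last_closed_won_at:pst:date]
      end as last_sold
    , [opportunity.close_date:pst:date] as close_date
  from widgets
    join dashboards on widgets.dashboard_id = dashboards.id
    join account_sites_mapping on
      dashboards.site_id = account_sites_mapping.site_id
    join account_details on
      account_sites_mapping.account_id = account_details.account_id
      and account_details.mrr > 0
    inner join opportunity on
      account_details.account_id = opportunity.account_id
      and opportunity.is_won is true
      and opportunity.close_date > '2017-12-01'
    join users on widgets.user_id = users.id
  where
    widgets.code is not null
    and not users.email_address like '%periscopedata.com'
),

agg as (
  select
    account_id
    , min(widget_creation_date) as first_widget_date
    , close_date
    , min(widget_creation_date) - close_date as separation_close_use
  from widgets
  group by account_id, close_date
),

ranges as (
  select
    account_id
    , close_date
    , min(abs(separation_close_use)) * case
        when abs(min(separation_close_use)) > min(separation_close_use) then -1
        else 1
      end as bought_within_days_of_use
  from agg
  group by account_id, close_date
)

select * from ranges
where abs(bought_within_days_of_use) < 60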

Here’s our sample result dataset:

Visualizing this dataset as a bar chart with separation days on the Y axis gives us a really nice picture of our dynamics:

This chart is a clear indicator of a feature that, when experienced in trial, drives wins. Most of our opps are closing after first use, meaning users are discovering it in trial, and relatively few are buying the product before using Python.
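To put numbers behind that read, we can split the final result into pre- and post-close buckets; a minimal sketch building on the ranges CTE above (the bucket labels are mine):

select
  case
    when bought_within_days_of_use < 0 then 'closed after first use'
    else 'closed on or before first use'
  end as dynamic
  , count(*) as opportunities
from ranges
where abs(bought_within_days_of_use) < 60
group by 1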

Looking at our statistical summary of ranges, we can confirm that our feature definitely gets used in trial, before purchase, and further, that it’s one of the first things tested in most evaluations. Given this information, a product manager designing a new trial walkthrough would do well to showcase this as a key feature.
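A summary like that can be computed straight from ranges; a minimal sketch, assuming min, median, mean, and max are the statistics of interest (Redshift ships a median() aggregate):

select
  min(bought_within_days_of_use) as earliest_gap
  , median(bought_within_days_of_use) as median_gap
  , avg(bought_within_days_of_use) as mean_gap
  , max(bought_within_days_of_use) as latest_gap
from ranges
where abs(bought_within_days_of_use) < 60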

Data-informed product management is more than simply calculating growth rates and TAM; it’s the practice of applying contextual information to solve business questions. It’s powerful because it takes a lot of the guesswork (and opinion) out of product decisions, leading to consistently better outcomes.



Scott Castle
Scott is the head of product at Periscope Data. He brings over 20 years of experience in software development and product management at leading technology companies including Adobe, Electric Cloud, and FileNet.