This post originally appeared on Towards Data Science
Data Informed Rolling Forecasts
Leveraging data science tools at a start-up to drive product-led growth by collaborating with Finance.
Note: this post, originally covered on my blog here, was written while I was VP of Product at FloSports. The goal is to show how the right data science tools and an understanding of fundamental business drivers can support collaboration between product and finance to accelerate product-led growth. At the time of original publication, we were a Series B sports technology company that had nearly 25 different sports at various stages of their life cycle.
One new initiative that I spearheaded at FloSports is what I call a “Data Informed Rolling Forecast”. In this post, I walk through at a macro level how this was accomplished and how Periscope Data was invaluable in helping us achieve a pacing dashboard — with limited to no engineering time required.
Below is the list of ingredients that were necessary for me to accomplish this:
- An understanding of the key drivers of your business and economic model
- A query-able database (our reporting warehouse resided on Redshift) to conduct historical benchmarking
- Business intelligence tools which facilitate database querying and extraction
- A bottoms-up financial model template, ideally scenario and case based (e.g., only growth from existing business versus existing plus high probability to close business), which builds off of historical actuals from #2 and forecasts expected results for the next quarter
- Business intelligence tools which facilitate the creation of a pacing dashboard
Part 1: Understanding key drivers and benchmarking
When I first arrived at FloSports, one of my first action items was to understand our existing KPIs; this included everything from understanding what, why, and how metrics were tracked. We built a data warehouse from scratch and the initial data source that we would ETL into our new Redshift warehouse was our Stripe payments data. To kick this off, we sat down with a data science consultant. He was familiar with our business given his involvement in helping us prepare for our Series B round. As part of that process, and without going into too much detail, we mapped out all of our existing subscription plan offerings and the movement across plans and sports that our subscribers exhibited. Based on this, we defined our subscriber types and plan intervals. As an example, we mapped out our different subscriber types based on the type of payment made combined with where the subscriber was in his / her lifecycle with our business. For example:
- New: subscriber makes a payment for the first time for a particular sport’s subscription
- Reactivation: subscriber, who has been a subscriber prior to previously canceling, returns and makes a payment for a particular sport’s subscription
- Recurring: subscriber makes a recurring payment for his / her particular sport’s subscription
Aggregate Fact Tables. Given the range of our sports and plan offerings, queries could become rather complicated, especially for sports with multi-year histories and lots of activity from both new and existing subscribers. Given this, our most common reporting use cases were generally converted into aggregate fact tables. These often combined two, three or more different data sources. We built aggregate fact tables for key performance metrics including subscriber waterfalls (monthly grain) and payments (daily grain). Leveraging these data rich fact tables and a flexible BI tool like Periscope Data, we were able to create a historical benchmarking dashboard, filterable by sport. We provided this to relevant business stakeholders, who could request data based on select filters and download CSVs to load into financial models.
Part 2: Building the bottoms-up financial model template
Given the reality that we had 25 sports, one of my primary goals in building our bottoms-up financial forecast model for each sport was to make the model dynamic and flexible. This facilitated limited, if any, rewiring of the model for each sport as the model flexed different launch dates and historical performances, e.g., sub retention, for each sport.
Some sports had been around for several years, whereas others only had a few months of historical data. Rather than building each model to conform to an individual sport’s history, and also needing to rebuild each model every quarter, I spent a lot of upfront time figuring out how to best flex the model. Examples of what we dynamically fed through all 15+ worksheets include:
- Subscriber cohort retention waterfalls
- Weighted average retention curves across all historical cohorts with projected future churn based on decay rates
- Historical plan selection mix, e.g., % monthly / yearly
- Subscriber corkscrew build over time — beginning plus new less churned equals ending (monthly grain); (also helpful for MRR by plan offering and sport)
- Weekly forecast for the quarter based on historical cohorts and their projected recurring payments; we combined these projected recurring payments with forecasted new and reactivated subscriber payments
The model’s flexibility is generally accomplished by determining which input drivers require dynamic formulas; in this example, inputs such as vertical launch month, last month of actuals, and date quarter starts allowed me to key the model off of the unique aspects of each sport. This also allowed me to tailor the model to each sport’s history with minimal manual rewiring.
For those familiar with financial modeling, you know that you can make use of scenarios and cases to sensitize for various potential outcomes. One example could be, “what if we close a large rights deal in the 2nd month of the quarter”, while all of our key inputs, e.g., monthly price, remain the same. To account for this possibility we would create a scenario, using the offset function in Excel, that allows us to select whether or not we want the impact of this potential outcome to change certain inputs and then flow through the model. On top of this, we could combine the “Large New Event” scenario with a “Bull” case → “what if we close the large event and the % yearly mix for the quarter is 500 bps higher than we’ve historically seen, due to a more robust live events schedule for this sport?”
While each sport had an individual model, within each I added a worksheet that rolled up all of the key reporting metrics across the various worksheets as rows (weeks) and columns (metrics), which could then serve as a flat, query-able table → combining all of these models’ sheets into a single CSV served as the data source for our pacing dashboard. By doing so, we accomplished the following:
- Visualizations (with downloadable data) that reflect the quarter’s goals; much easier for business users to see comprehensive goals and the drivers for each
- A pacing dashboard that provides insight into how the sports are pacing versus forecast, including net revenue, signups, subscribers and churn.
Part 3: Leveraging the right BI tools, aka Periscope Data FTW
I’ve written about Periscope’s data cache before, and it proved to be extremely helpful in this case. We used Periscope’s data cache in several ways. As one example, we accomplished cross database joins — one source in MySQL, another in Postgres, and a third from a flat CSV file. By combining these disparate sources, we created a robust, no data engineering required, dashboard. This dashboard satisfied one of our sports’ subscriber attribution reporting requests in probably 1/20th the time it would have taken to incorporate this our proprietary ETL workflow.
Periscope’s Cache was also helpful in enabling us to quickly build proof of concepts (POCs) for new data sources and receive feedback from business stakeholders; we always built these before requesting data engineering to add new data sources into our ETL workflows. Furthermore, for the business users who are less familiar with requesting their current state data processes to become a part of our ETL and BI reporting processes, Periscope’s cache gave us a low-overhead way to prove out their reporting ideas. After iterating on POCs and then ensuring requirements were ready, this increased efficiency for engineering to understand how to model the data and incorporate the new source(s) into our overall warehouse workflow and schemas.
The Quarterly Dashboard that we built using Periscope was a highly customizable reporting tool for business users which displayed daily, weekly, monthly and quarterly pacing visualizations and tables. This provided visibility for the rights acquisition team, marketers, management and the data analyst teams in terms of where we were beating or lagging our quarterly performance goals — further, we used the dashboard’s reports to trigger areas requiring additional investigations. Some examples included:
- An event exceeded its signups forecast by 1.5x; did we see better conversions through the live event funnel? What content seemed to work well in influencing visitors to convert to paid subscribers?
- We saw higher retention than forecast; what is this sport doing well, is it transferable to other sports, and how do we do more of this?
A sanitized illustration of the Q2 2017 dashboard signups pacing:
I love this type of work because it combines my financial modeling background with data analytics. It’s also gratifying to see that hard work in defining and validating business logic and getting your data strategy right can really support product-led growth. And the folks at Periscope Data were great partners along the way and ultimately were the reason that this highly iterative, low overhead approach was possible!
If you enjoyed this post, it would be awesome if you would click the “claps” icon to let me know and to help increase circulation of my work.