In 2006, Netflix offered $1 million to anyone who could improve the quality of its recommendation engine by 10%. It took nearly three years, but a team finally won in 2009. Netflix paid the bounty, then ignored the code.
As it turned out, the enhanced algorithms “did not seem to justify the engineering effort needed to bring them into a production environment.”
Not only did the winning prediction engine fail to scale economically, it also addressed an outdated problem: The shift from mail to streaming during that same window gave Netflix all the data it needed to develop newer, better algorithms.
Predictive analytics, in other words, wasn’t a panacea. Nor, in the decade since, has it become one. But, in 2018, incremental gains no longer cost $1 million either:
- You have more data;
- Storage is cheap; and
- Cloud computing is almost infinitely scalable.
This post details those changes and shows how several businesses—and not just behemoths—have cultivated the predictive analytics landscape.
What’s changed in the last decade?
1. More data, more storage, more computing power
Massive, cloud-based repositories of customer interactions, often called data lakes, are the raw source material for predictive analytics applications.
Many companies have taken advantage of cheap cloud storage to stow away data for years—without even considering its potential use. (How many neglected data points do you have in Google Analytics, Google Ads, MailChimp, Marchex, Stripe, and similar services?)
That dual growth in scale, in both the data collected and the accessibility to it, has solved two primary challenges of predictive analytics implementation.
Historically, raw computing power has been a third. As Andrew Pearson of Intelligencia notes, “Without significant hardware investments, predictive analytics programs either weren’t possible or too slow to be useful.”
That, Pearson continued, has also changed: “Cloud-based analytics systems have added massive computer power into the mix.” Increasingly powerful systems cracked open the door for real-time predictive analytics.
2. A world of real-time predictions
We already live in a world of “real-time” predictive analytics. A simple example is your estimated arrival time in Waze. A more complex real-time prediction occurs billions of times worldwide every millisecond in matching certain types of digital advertising.
Further, companies like Mintigo and Versium now offer real-time solutions for lead scoring, showing that the transition is technically possible. Possible, however, doesn’t mean perfect. Sam Underwood, a vice president at Futurety, acknowledged the complexity of necessary integrations:
Especially in the mid-market world, the tools that gather data to turn into predictive modeling—CRM systems, social media aggregators, logistics, and purchasing systems—often do not have friendly APIs or other easy mechanisms with which to quickly gather and interpret data.
That disconnect still thwarts even the most fundamental business cases for real-time predictive analytics. David Longstreet, the chief data scientist at FanThreeSixty, offered an example:
In our world of sports and entertainment, for example, most sports teams do not know how many people are in a stadium for a game. Teams know how many tickets were distributed; however, they do not know in “real time” how many people are in the venue or stadium during the event.
That knowledge gap hampers efforts to staff and stock the stadium appropriately. It’s also why interest in predictive analytics is almost universal, even if it vastly outpaces adoption.
3. Slow adoption but soaring interest
So how many businesses are actively using predictive analytics? According to research from Dresner Advisory Services, about 23%, a figure essentially unchanged from the prior year.
Interest, however, exceeds implementation. The same research suggests that 90% of businesses “attach, at minimum, some importance to advanced and predictive analytics.”
So which questions are those 23% answering with predictive analytics? Let’s take a look.
Which questions can marketers answer with predictive analytics?
“They want to predict everything,” according to Underwood. And who wouldn’t want to know the exact foot (or web) traffic by month, day, and hour to streamline staffing (or allocate server resources)?
But, Underwood continued, he tries to focus clients on “the one thing that, if we could predict it for you, would revolutionize your business.”
In digital marketing, Vizadata’s Phillips outlined myriad use cases for predictive analytics, including the capability to predict:
- which advertising will be most effective—however you define effective.
- which marketing campaigns, channels, touches, behaviors, and demographics are contributing to a business outcome, a form of “machine learning–based attribution.”
- which segment, test, or personalization a user is most likely to respond to.
- the probability that users will click on an ad, download a whitepaper, respond to an email or an offer, or give any other customer response you define.
- which leads will convert—however you define conversion.
- which customers will buy one or more products for a cross-sell or upsell.
- the number of purchases or revenue that will occur in the future.
- which customers will have high/medium/low lifetime value.
- customer churn.
The novel opportunity of predictive analytics, then, is not what you can predict but the fact that you can predict. The historical data you currently analyze can probably become a prediction.
Just make sure you have the data.
What do you need to get started with predictive analytics?
Data, data, and data. “Priority 1A and 1B are data sources,” stated Underwood. That’s true whether you plan to license software or hire an outside organization. (Both options are detailed later.)
All uses require training data. That training data, in turn, is used to build a predictive model to apply to current data. “The only limitation we’ve run into,” Phillips noted, “is a company’s available data for training.”
How much data is enough? According to Phillips:
A few thousand records with a sufficient amount of positive and negative outcomes can be sufficient for marketing, sales, and product prediction.
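To make the train-then-apply workflow concrete, here is a minimal sketch in plain Python. It is an illustration only, not any vendor's method: the two features (scaled site visits and whether the last email was opened), the simulated outcomes, and every function name are assumptions invented for the example. It fits a simple logistic regression on a few thousand historical records with positive and negative outcomes, then scores two current leads.

```python
import math
import random


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def make_history(n=2000, seed=0):
    """Simulate past leads: [site visits scaled 0..1, opened last email 0/1]."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        visits = rng.random()
        opened = float(rng.randint(0, 1))
        # Hypothetical ground truth, used only to generate example outcomes.
        true_p = sigmoid(-3.0 + 4.0 * visits + 2.0 * opened)
        rows.append([visits, opened])
        labels.append(1 if rng.random() < true_p else 0)
    return rows, labels


def train_logistic(rows, labels, epochs=20, lr=0.1):
    """Fit logistic regression weights with stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            error = p - y  # gradient of the log loss with respect to the logit
            bias -= lr * error
            weights = [w - lr * error * v for w, v in zip(weights, x)]
    return weights, bias


def predict_proba(model, x):
    """Probability of a positive outcome for one current record."""
    weights, bias = model
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x)))


# Step 1: build the model from historical (training) data.
history_rows, history_labels = make_history()
model = train_logistic(history_rows, history_labels)

# Step 2: apply the model to current records it has never seen.
engaged_lead = predict_proba(model, [0.9, 1.0])
cold_lead = predict_proba(model, [0.1, 0.0])
print(f"engaged lead: {engaged_lead:.2f}, cold lead: {cold_lead:.2f}")
```

In practice, the historical rows would come from your own CRM or analytics exports rather than a simulator, and a library or vendor platform would likely replace the hand-written training loop.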
Not all data is created (or stored) equally
“You have to understand—I grew up tearing tickets.”
FanThreeSixty’s Longstreet has heard that same explanation from venue managers who have spent countless hours counting stacks of stubs after games. It’s a reason why vital data sources may not be easily accessible, or accessible at all.
In stadiums, Longstreet explained, point-of-sale machines and ticket scanners exist for a single purpose—to complete transactions quickly and keep lines moving. Those systems do not store data efficiently for extraction, nor can they handle incessant server requests (unless hungry fans don’t mind waiting).
For Underwood, clients tend to fall into one of two buckets, with half in each:
- “The ideal client has an internal database set up and ready to go. We pull in the data, build the model, and are off and running.”
- The other half have a mix of data sources, which inevitably include an offshore SQL database (or ten) managed by an external vendor whom no one can track down.
Stitching data sources together is a major development project that may require creating custom connectors, setting up third-party FTP drops, and other complex but thankless tasks. That work, however, is necessary: Models and their predictions are only as accurate as the data they’re built upon.
Don’t forget external data sources
Not all data comes from internal sources, either. External data sources, like weather reports, are often a critical addition to data lakes, especially for small businesses. As Underwood explained:
Restaurants may use analytics to trigger email sends; for example, we can set up the email platform to sync with National Weather Service data to send an email about iced tea when the temperature in a given metro area is above 90 degrees.
Likewise, we can trigger an email to send to customers in a given city if the system detects wind gusts of 40+ MPH. Both of these use cases reach consumers in a key moment of need, negating downstream ad spend and beating competitors to the punch.
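The two triggers Underwood describes boil down to simple threshold rules. The sketch below shows one way such rules might look. The `WeatherReading` type, the campaign names, and the choice to give wind alerts priority are assumptions for illustration; fetching the reading from National Weather Service data and sending the email are left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeatherReading:
    """One weather observation for a metro area (hypothetical structure)."""
    metro: str
    temperature_f: float
    wind_gust_mph: float


def pick_campaign(reading: WeatherReading) -> Optional[str]:
    """Return the name of the email campaign to trigger, if any."""
    # Assumption: a wind alert takes priority when both rules match.
    if reading.wind_gust_mph >= 40:
        return "high_wind_alert"
    # "Above 90 degrees" is read here as strictly greater than 90.
    if reading.temperature_f > 90:
        return "iced_tea_promo"
    return None


hot_day = WeatherReading(metro="Example City", temperature_f=95.0, wind_gust_mph=8.0)
print(pick_campaign(hot_day))
```

A real setup would run a check like this on a schedule for each metro area and hand the campaign name to the email platform.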
So you have a large, well-organized dataset. What do you do with it?
How do you turn data into predictions?
While the limitation of insufficient data has faded, another remains:
Companies require either a dedicated team of data scientists to parse through these sets, or a software suite powerful enough to do so rapidly. For most small and medium-sized businesses, this usually means settling for subpar software, or forgoing it entirely.
For businesses of all sizes, solutions branch into two options:
- Purchase software and create predictions in-house.
- Pay an outside vendor to develop models and visualizations for you.
1. Predictive analytics software
The marketplace for predictive analytics software has ballooned: G2Crowd records 92 results in the category. Pricing varies substantially based on the number of users and, in some cases, the amount of data. It generally starts around $1,000 per year but can easily scale into six figures.
G2Crowd lists both IBM’s SPSS Statistics and SAS’s Advanced Analytics as market leaders at the enterprise level. Along with RStudio, the pair are also tagged as leaders for mid-market companies; only IBM retains a place in the “Leaders” quadrant for small businesses.
Historically, however, even industry-leading predictive analytics software hasn’t been a simple, jump-right-in experience. Take IBM’s SPSS Statistics and RapidMiner, for example.
While these platforms are powerful, users must format data files, link nodes, and develop visualizations. Learning how to do this—and having the time to do it—is a specialized, full-time job. (To believe otherwise is to expect a Microsoft Word license to write your Great American Novel.)
Not surprisingly, the market is shifting. RapidMiner has rolled out a SaaS beta that, with a bit of manual adjustment, translates an Excel sheet full of, say, employment data into a prediction of employee retention.
Some in the industry, like Vizadata’s Phillips, see the user-friendly SaaS model as the future:
We are democratizing data science, so that people with limited or no data science or engineering skills can predict. You simply upload your data and click next. We do all the heavy lifting.
Our intelligence determines your dependent and independent variables and the type of analysis to run. You can go with our selections or override them—from regression, where we can do forecasting and optimization, to both binary and multiclass classification, where we can predict the probability of outcomes.
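How a platform chooses “the type of analysis to run” is not described in detail, but a simple heuristic conveys the idea. The sketch below inspects the outcome (dependent) column and suggests regression, binary classification, or multiclass classification. The function name, the `max_classes` threshold, and the rules themselves are assumptions for illustration, not Vizadata's actual logic.

```python
def infer_analysis_type(target_values, max_classes=20):
    """Suggest an analysis type from the values in the outcome column."""
    distinct = set(target_values)
    if len(distinct) < 2:
        raise ValueError("the outcome column needs at least two distinct values")
    if len(distinct) == 2:
        # Two possible outcomes, such as converted vs. not converted.
        return "binary classification"
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in distinct
    )
    if numeric and len(distinct) > max_classes:
        # Many distinct numeric values, such as revenue: forecast a number.
        return "regression"
    # A handful of categories, such as high/medium/low lifetime value.
    return "multiclass classification"


print(infer_analysis_type(["churned", "stayed", "stayed"]))
print(infer_analysis_type(["high", "medium", "low", "low"]))
print(infer_analysis_type([i * 12.5 for i in range(50)]))
```

As in the quote, a user could still override the suggestion, since a heuristic like this will misread some columns (a numeric ID, for example).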
Like Vizadata, MIT’s Endor pursues this path. The platform uses a query-builder to allow anyone to ask questions like “Where should we open our next store?” or “Who is likely to try product X?” It then mines targeted datasets to provide answers, often in a matter of minutes.
The inclusion of tangential datasets that fall outside consideration—or feasibility—for human observers is a recurring advantage of predictive analytics. Endor’s creators offer an example:
A marketing department for a bank asks, “Who is going to get a mortgage in the next six months?” Machine-learning engines may detect a pool of, say, 5,000 customers who have a bank credit card and a high credit score, and are married—many of which may be false positives.
Endor detects more specific clusters of, say, couples about to get married or going through a divorce, founders who recently sold their startups to Facebook, or customers who recently graduated from a local real-estate course.
Of course, if you want to outsource the process entirely, outside vendors can organize your data, build models, and visualize predictions for you.
2. Outside vendors
Agencies offering bespoke solutions
For most clients, Futurety starts by identifying the key business question—not a specific metric or visualization. Clients may come in for one-off projects, annual re-runs of their data, or ongoing work.
“The end result is not always clear at the beginning,” Underwood explained. “When we’re delivering to someone close to the outcome, like a marketing manager, they’re typically happy with the model, the finding, and the math behind it.”
The “end result” could be several things:
- Integration with a third-party platform, like an email client, to automate predictive messaging.
- Plain-text predictive answers to guide practitioners.
- Robust visualizations to demonstrate the process and value to the C-Suite.
At the end of each engagement, Futurety delivers the model back to the client for management and maintenance.
Predictive analytics at work
- Futurety has a small business client that helps aspiring performing arts majors gain admission to their dream college. But few high schoolers have broad knowledge of good programs. More often than not, they know only one name: Juilliard.
- Futurety trained its model on three years of placement data. Then, using new student data entered into a common portal, it predicted where students would get accepted and succeed academically.
- The predictive analytics model, which Futurety updates annually, delivers a simple list of recommended schools for students based on factors like grades and exposure to different musical or artistic styles.
- The model takes into account whether past placements graduated or won awards.
All-in-one niche providers
FanThreeSixty serves a narrow market: sporting venues. Because they work with a comparatively consistent dataset—season ticket, concession, and souvenir sales—they know the range of business questions, data outcomes, and relevant visualizations.
This consistency incentivizes niche vendors like FanThreeSixty to develop proprietary dashboards to roll out to all clients.
The interface allows Longstreet’s team to keep data science in the background: “The secret of machine learning is when you’re being prompted behind the scenes.”
Distilled fully, FanThreeSixty’s goal (and Longstreet’s explanation of his role at dinner parties) is to “help teams sell more tickets and hot dogs.”
Predictive analytics at work
- FanThreeSixty mines historical data to see which concessions are most commonly purchased with a hot dog at a Major League Soccer venue.
- If a customer purchases a hot dog, concession staff are prompted to ask whether the customer would like to add the most popular accompaniment. That recommendation, a prediction of fan desire, changes based on other variables.
- Predictions consider more than 20 datasets—everything from the home location of season ticket holders to the weather—to tailor messaging before, during, and after matches.
- During cold-weather games, for example, FanThreeSixty can automate push notifications with tailored coupons, like buy-three-get-one-free hot chocolate for a family of four.
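The hot dog example above is, at its simplest, a co-purchase count. The sketch below shows that idea under stated assumptions: the order data is made up, the single `weather` field stands in for the many variables a real system would consider, and the function name is hypothetical. It is not FanThreeSixty's implementation.

```python
from collections import Counter


def top_accompaniment(orders, anchor_item, weather=None):
    """Return the item most often bought alongside anchor_item.

    orders is a list of (weather, items) tuples. When weather is given,
    only orders placed in that weather are counted.
    """
    counts = Counter()
    for order_weather, items in orders:
        if weather is not None and order_weather != weather:
            continue
        basket = set(items)
        if anchor_item in basket:
            counts.update(basket - {anchor_item})
    if not counts:
        return None
    return counts.most_common(1)[0][0]


# Made-up historical orders for illustration.
orders = [
    ("warm", ["hot dog", "soda"]),
    ("warm", ["hot dog", "soda", "nachos"]),
    ("warm", ["hot dog", "soda"]),
    ("cold", ["hot dog", "hot chocolate"]),
    ("cold", ["hot dog", "hot chocolate", "soda"]),
    ("cold", ["pretzel", "hot chocolate"]),
]

print(top_accompaniment(orders, "hot dog"))
print(top_accompaniment(orders, "hot dog", weather="cold"))
```

Filtering by weather changes the suggestion, which mirrors how the recommendation “changes based on other variables.”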
Whether solutions are internally or externally managed, they’ve long been common in enterprise businesses.
Predictive analytics use cases at the enterprise level
Marketing departments in large organizations have used predictive analytics for years:
- AutoTrader. AutoTrader uses data from its 40 million monthly visitors to better understand the sometimes lengthy customer journey. They built propensity models based on search behavior and created high-value lookalike audiences.
- Editialis. The French publisher uses predictive analytics in its email campaigns to “anticipate engagement at an individual level.” As a result, they’ve seen click-through rates increase “dramatically.”
Predictive analytics can also coordinate offline and online interactions, with two clear use cases for marketers whose companies have physical products or storefronts:
- Improved targeting. Smartphone data registers in-store browsing habits to improve online or offline marketing targeting, approximating the advantages enjoyed by ecommerce companies.
- Inventory management. Full warehouses cost money; empty shelves cost money. Folding online data, such as search patterns, into sales data can better manage inventory, especially at a regional and local level.
In addition to external marketing campaigns, predictive analytics also supports internal project management. Large marketing campaigns have many moving parts—a new ad campaign needs new creative, new copywriting, new landing pages, etc.
Coordinating the involvement of those teams and accurately estimating the time-to-launch is complex. Many fail to get it right, sometimes at great expense.
Predictive algorithms, as McKinsey notes, use a wider lens that captures historical patterns and unique project elements in a single frame:
While every development project is unique, the underlying complexity drivers across projects are similar and can be quantified. If companies understand the complexity involved in a new project, they can estimate the effort and resources required to complete it.
Predictive analytics models “take into account not only the complexity of the project (both the functional and implementation aspects) but also the complexity of the team environment.”
Predictive analytics at work:
More accurate internal project management, in an example McKinsey offers, can have a major impact:
- A company initially planned a product update to take roughly 300 person-weeks of effort, an estimate based on the limited number of changes between the current product and a new design.
- However, that estimate failed to take into account the fact that planned updates would affect many different teams. Predictive analytics models did take it into account and estimated that the project would take three to four times as long.
- As a result, the company limited the work to the original product team, enabling them to deliver the update on time.
In addition to helping companies solve internal and external challenges, predictive analytics is also the foundation for some businesses.
Building a business on predictive analytics
Ken Lazarus, CEO of the recruiting platform Scout Exchange, has an advantage—the company has been around for only five years.
That means that the company’s data sources are already primed for extraction into its predictive models that pair companies with the right recruiter.
The single best predictor of job placement, Lazarus and his team have found, is the track record of job recruiters. In contrast, pairing the right job description with the right resume remains exceedingly difficult.
“Job specs are horrible,” he lamented. “The data isn’t on the paper. CVs are pretty horrible, too.” (Data augmentation, such as skills testing and video interview decoding, Lazarus noted, offers potential improvements.)
Nonetheless, holes remain. Candidates will never disclose negatives on their resume, and important information might forever remain “non-data,” such as whether a candidate is a good “culture fit.”
Scaling data gathering
Scout Exchange has honed its predictions by focusing on enterprise customers—its algorithms feast on hundreds or thousands of openings from Fortune 500 clients.
As a result, the platform takes in roughly 1 million data points monthly, with each new job posting yielding an additional 50 data points.
Still, human assessment by a recruiter—and their client—is necessary. Lazarus drew a parallel: “Would you let machine learning pick your wife? No. But would you let it pick the right matchmaker to help you find a spouse? Yes.”
Those who are trying to solve the most complex human issues aren’t even in the business world.
Predictive analytics with life or death consequences
The greatest challenges for predictive analytics are those that deal with complex, individualized human behavior, such as the likelihood that a patient or crisis-line texter will commit suicide.
Because success or failure is measured in human lives, these challenges are also the most urgent. And while these projects operate beyond the scope of marketing and business, they suggest the potential for predictive analytics as it evolves.
“REACH VET is not about trying to find the veteran who’s sitting in the car in a parking lot with a gun in his lap,” Aaron Eagan, Veterans Affairs deputy director for innovation, told a Washington conference.
“What we found,” Eagan continued, “is that veterans at highest risk of suicide [also have] significantly increased rates of all-cause mortality, accident mortality, overdoses, violence, [and] opioids.” Proactive alerts that trigger physician check-ins have improved primary-care appointment attendance and reduced hospital admissions for mental health issues.
The project is similar to a collaboration between Periscope Data and Crisis Text Line, a text-based suicide hotline.
Leaning on natural language processing and predictive analytics, the program analyzed conversations, forecasted trends, and trained more than 13,000 volunteers. The results?
- Wait times decreased to less than 5 minutes, an operational goal.
- Capacity increased by 10% during peak periods.
- Responses were prioritized based on machine-identified urgency.
Endor’s technology has taken on similarly serious challenges. Using 15 million data points from 50 known ISIS supporters, Endor identified 80 lookalike accounts in less than half an hour, with only 35 false positives. Expert investigation was still necessary, but it was feasible.
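The figures above translate into a precision score, the share of flagged accounts that were correct. The short calculation below assumes the 35 false positives are counted among the 80 flagged accounts; the function name is only for illustration.

```python
def precision(flagged, false_positives):
    """Share of flagged items that turned out to be true positives."""
    true_positives = flagged - false_positives
    return true_positives / flagged


# Figures from the Endor example: 80 flagged accounts, 35 false positives.
endor_precision = precision(flagged=80, false_positives=35)
print(f"{endor_precision:.2%}")
```

Under that reading, 45 of the 80 flagged accounts were correct, or a little over half, which is why expert review of a short list remained necessary.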
In a collaborative project with the U.S. Defense Advanced Research Projects Agency, the platform also analyzed mobile data to identify patterns to predict future riots.
Predictive analytics is not immune to criticism: GDPR rebuffs some of the same collection methods that swell data lakes. And not all predictions, even the most accurate, are well-received. (Famously, Target unwittingly informed a father of his teenage daughter’s pregnancy based on seemingly benign shopping habits.)
Predictive analytics experts point out that their algorithms search for patterns among values, not the values themselves. Regardless, insufficient data is unlikely to hold back the expansion of the industry—the IoT, wearables, and other data collectors already supplement traditional web and app analytics.
User-friendly SaaS platforms are still an emerging opportunity. For most businesses, creating models and predictions from historical data still requires a dedicated employee to navigate complex software solutions or the outsourcing of that work to a vendor.
If you are postponing predictive analytics projects until the SaaS options are more mature, you would be wise to keep filling your data lake.