It’s been an exciting, but complex year in the data world.
Just as last year, the data tech ecosystem has continued to “fire on all cylinders”. If nothing else, data is probably even more front and center in 2018, in both business and personal conversations. Some of the reasons, however, have changed.
On the one hand, data technologies (Big Data, data science, machine learning, AI) continue their march forward, becoming ever more efficient, and also more widely adopted in businesses around the world. It is no accident that one of the key themes in the corporate world in 2018 so far has been “digital transformation”. The term may feel quaint to some (“isn’t that what’s been happening for the last 25 years?”), but it reflects that many of the more traditional industries and companies are now fully engaged into their journey to become truly data-driven.
On the other hand, a much broader cross-section of the public has become aware of the pitfalls of data. Whether it is through the very public debate over the risks of AI, the Cambridge Analytica scandal, the massive Equifax data breach, GDPR-related privacy discussions or reports of growing government surveillance in China, the data world has started revealing some darker, scarier undertones.
Both are the flipside of the same phenomenon, which has been brewing for many years but is now in full display: just about everything (whether personal or professional) is rapidly getting digitized, and data technologies are becoming more adept than ever at processing and analyzing this massive data exhaust, increasingly in real time. From this can result both magic and abuse. The debate on how to combine this great power with a necessary sense of responsibility has become essential.
Let’s highlight some of the key trends and events of 2018.
Infrastructure & Analytics
From an industry standpoint, the data ecosystem remains as exciting and vibrant as ever, with a rich tapestry of innovative startups, mature “scale-ups”, and many aggressive public technology vendors. Most importantly, many customers large and small are deploying those technologies in production at scale, and reaping undeniable value from their efforts.
As the cycle of replacing older IT technologies with more modern data products continues, it seems that the Big Data market (infrastructure, analytics) is cycling through the early majority of buyers and transitioning into the late majority of the traditional adoption curve.
In addition, the data world continues its inexorable evolution towards the cloud. It is actually staggering to see how fast the large public cloud providers (AWS, Azure, Google Cloud Platform, IBM) are growing, considering they already each generate billions of dollars of revenues every quarter. The trend raises ongoing concerns around vendor lock-in, and this may open up opportunities for startups offering multi-cloud solutions. However, to date even companies adopting multi-cloud strategies tend to still rely on one vendor as their primary provider.
As they keep growing, large cloud providers increasingly compete with each other by offering a wide array of Big Data, data engineering and machine learning tools through their platforms (e.g., Amazon Neptune, Google AutoML, etc.) – and often with aggressive pricing, to attract more developers, as their true business model is data storage. As the scope and sophistication of such tools keep growing, this has a big impact on the data technology landscape, making it arguably harder for startups to compete, at least for broad, horizontal opportunities. A bit more every year, the list of product announcements at big annual cloud vendor conferences (see AWS re:Invent, for example) sends shockwaves in the startup industry, as they put cloud vendors in direct competition with dozens of VC-backed startups in one fell swoop. It will be interesting to see how public markets react to the upcoming Elastic IPO, an open-source software company that saw Amazon launch a direct competitor, Elasticsearch, three years ago.
Plenty of opportunities for startups remain, however, as long as they are sufficiently differentiated. Many in the space are scaling fast, and there are a number of particularly interesting, fast-growing segments in the infrastructure and analytics part of the ecosystem, including streaming/real-time, data governance, and data fabrics/virtualization. The explosion of interest in AI has also led to great opportunities (and a lot of funding) in AI chips, GPU databases, AI devops tools, and platforms enabling the deployment of data science and machine learning in the enterprise.
Machine Learning & AI
It’s certainly been a wild year in the world of AI research, with anything from the prowess of AlphaZero to the staggering pace of release of new advances – new forms of Generative Adversarial Networks, Vicarious’ new Recursive Cortical Networks, Geoff Hinton’s new Capsule Networks. AI conferences like NIPS have grown to attract 8,000 people and thousands of academic papers are being submitted every day.
At the same time, the pursuit of AGI remains elusive, perhaps thankfully so. Much of the current wave of excitement (and fear) about AI results from the impressive performance of deep learning since 2012 but, in the AI research community, there’s a growing sense of “what now?” as some question the foundations of deep learning (backpropagation) and others look to move past what they consider “brute force” approaches (lots of data, lots of computing power), perhaps in favor of more neuroscience-based approaches.
Far from fearing robot world domination, many in the AI research community are concerned that continued over-hyping of the field may eventually disappoint and lead to another AI nuclear winter.
Outside of AI research, however, we are just at the beginning of a wave of deployment and application of deep learning in the real world across a variety of problems involving speech recognition, image classification, object recognition and language, in different industries. If the infrastructure and analytics part of the ecosystem is getting to the late majority, we’re still very much in early adopter territory for enterprise and vertical AI applications.
The Cambrian explosion of deep-learning based startups that started a year or two ago has mostly continued unabated, even though the AI startup market is (arguably) showing signs of finally cooling down. Expectations, round sizes and valuations remain high, but we are certainly past the phase where big Internet companies would snap up very early AI startups at high prices just for the talent. The air is also clearing up a bit and revealing “real” AI startups, versus a number of other companies that were leveraging the hype. Some of the AI startups that were founded in the2014-2016 time frame are starting to hit early scale, and many are offering increasingly interesting products across industries and verticals including health, finance, “industry 4.0” and back office automation. Deep learning will continue bringing a lot of value in real world applications for years to come, and vertical-focused AI startups have many great opportunities ahead of them.
This continued explosion is very much a global phenomenon, with Canada, France, Germany, the U.K. and Israel being particularly active. However, China seems to be playing at a completely different level in AI, with reports of government-led pooling of data at mind-boggling scale (across Internet companies and municipalities), rapid advances in areas such as facial recognition and AI chips, and gigantic rounds of financing for its startups: according to CB Insights, China accounted for only 9% of global AI deal share but nearly 48% of global AI funding in 2017, up from 11% in 2016 (see some examples below).
In the same vein, issues of data privacy (and ownership and security) are emerging as a major concern around the world. In the early days of the Internet, data privacy was about protecting what we did online, a comparatively small portion of our activities. Correspondingly, only a small (albeit vocal and passionate) minority of people truly cared. As just about every aspect of our personal and professional lives is now connected to the Internet through an ever increasing array of connected devices, the stakes are changing. With its ability to spot anomalies in massive data sets, predict outcomes and recognize faces, AI is compounding the data privacy problem.
A separate but related concern is that a lot of this data is owned by large Internet companies (GAFA). Some, like Facebook, have proven to be a less than perfect steward of it. Nonetheless, this data provides them an unfair advantage in the race to produce ever more powerful AI.
Against those issues, an emerging theme is to think of the blockchain as a possible foil against the risks of AI, as well as a way for others, outside of GAFA, to produce great AI. Crypto economics are viewed as a way to incentivize individuals to provide their personal data and for machine learning engineers to build models by processing this data anonymously. It all remains very experimental, but some early marketplaces and networks are emerging
Quite semantic note: buzz terms come and go. Fewer people speak about “Big Data”, many more about “AI”, often to described the same reality. Consequently, we have slightly rebranded our 2018 landscape: it is now called the “Big Data & AI” landscape!