5 Trends That Will Transform Big Data in 2018
At Periscope Data, 2017 has been a year of exciting growth and development of our platform. We’re now working with more than 950 customers, and with the introduction of our Unified Data Platform we’re providing them with the world’s first software platform truly built to address the complete analytics lifecycle.
With the year almost behind us, we’re ready to look ahead and project what 2018 will bring in the world of big data analytics. Here’s what some of our leaders are keeping an eye on for the next 12 months:
1. Unsupervised Machine Learning is Big Data’s Biggest Story in 2018
We’ve just seen Google’s chess program, which had never seen a human game, beat the best chess engine in the world. Advances in neural networks and specialized hardware for machine learning have led to incredible growth in the power of deep learning. After a mere four hours of training on 5,000 of Google’s Tensor Processing Units, AlphaZero was able to beat Stockfish, a chess engine created through the collaboration of thousands of individual programmers and chess players over many years.
This may sound like a fun experiment, but it points to the new capabilities we can expect to see applied to the massive data volumes businesses generate today. While not everyone will be using machine learning to make decisions, platforms that focus on flexibility and rapid iteration will outpace those based on rigid structures and processes. Visual data discovery tools should enable the flexibility to quickly explore new types of questions — those that don’t will have serious problems in the future.
2. Anomaly-Detection Reporting Augments Your KPI Dashboards
BI measurement and reporting will be significantly refined with the growth of machine learning. We’re already familiar with the idea of an AI assistant that uses machine learning to assist in decision making. Technologies like Siri or Alexa are a long way from actually predicting our final intentions and completing them, but they can already spot anomalies and help you make an informed decision.
For humans, detecting anomalies in large volumes of data is time-consuming and difficult. When large amounts of data flow through a system, AI can make anomaly detection much easier. There’s huge value in using AI for this detection, then adding humans for the final step.
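The division of labor described here — let the machine flag the anomalies, let a human make the final call — can be sketched with a simple rolling z-score check. This is an illustrative sketch only, not Periscope’s implementation; the window size, threshold, and sample KPI values below are arbitrary assumptions.

```python
# Sketch: flag points in a KPI time series that deviate sharply from a
# rolling baseline of the preceding `window` points. Window size and
# threshold are illustrative assumptions, not tuned values.
from statistics import mean, stdev

def detect_anomalies(series, window=7, threshold=3.0):
    """Return indices whose value lies more than `threshold` standard
    deviations from the mean of the preceding `window` points."""
    anomalies = []
    for i in range(window, len(series)):
        baseline = series[i - window:i]
        mu, sigma = mean(baseline), stdev(baseline)
        if sigma > 0 and abs(series[i] - mu) > threshold * sigma:
            anomalies.append(i)
    return anomalies

# A steady daily-signups KPI with one sudden spike:
kpi = [100, 102, 98, 101, 99, 103, 100, 250, 101, 99]
print(detect_anomalies(kpi))  # -> [7]: only the spike is flagged
```

In practice the flagged indices would feed an alert or a dashboard annotation, leaving the human to decide whether the spike is a data problem or a real business event.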
For data teams, it will be increasingly important to differentiate between your standard KPI reports and analysis supported by advanced techniques including AI. These deeper analyses should be responsible for flagging when your KPIs are off, allowing you to dig into problems and quickly address them as they arise. Smart companies are already shifting their reporting priorities to reflect this, and unlocking these capabilities for data teams is a central part of what we’re working on.
3. Breakthroughs in Sophistication of Data Pipelines & Architectures
Enterprises of all sizes and in all verticals are moving 100 percent of their data to the cloud. But the transformative change isn’t about moving servers, it’s about rebuilding architectures in the cloud to take on sophisticated, complex workloads that couldn’t run anywhere else.
Smart organizations are rethinking the possibilities with their data pipelines and architectures. What pre-built tools and services make the barriers to entry far lower? What workflows can we streamline by taking advantage of new offerings from AWS, Azure and others?
We’re already seeing exciting changes like ingesting data for machine learning in real-time, or automatically connecting to thousands of tools on the fly through marketplaces of third-party software — all with increasingly elastic architecture. Users can now easily scale their resources as needs change, plan for using services when costs are lower and pay only for what they use.
4. Data Lake Accessibility Transforms How Companies Handle Big Data
While data lakes were originally adopted to reduce costs, we’ve evolved to the point where the lake is more than a simple data repository. The data lake model is now transforming the way companies access and use data. New tools are enabling data lakes to act as warehouses in their own right, serving as the source of truth for analytics platforms.
Data lakes can now be accessed from multiple tools for a wide variety of purposes, from data modeling to machine learning to KPI reporting, without the need to move data around multiple systems. This change will enable companies to consolidate complex ETL processes and unify their business intelligence around a single set of data.
In 2017, Amazon released multiple tools that are transforming how data lakes can be used. Redshift Spectrum now enables Amazon Redshift to query data stored in Amazon S3. That means no more switching tools, languages or mindsets as you work in your warehouse. Amazon Athena allows serverless, on-demand SQL against S3 without writing ETL processes or standing up a data warehouse at all. That makes accessing your big data storage a painless experience. These are transformational changes in database architecture that will speed up analytics and make data professionals’ lives easier.
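The Athena pattern above — ad-hoc SQL run directly against files in S3, with no load step — can be sketched as follows. The table and column names are hypothetical, invented for illustration; the helper only assembles the SQL string, which in practice you would submit through the Athena console or an API client.

```python
# Hypothetical sketch of the kind of query Athena runs directly against
# S3 objects at query time, skipping any ETL or warehouse load step.
# `analytics.page_views`, `event_type`, and `event_date` are invented
# names for illustration only.
def daily_events_query(table, event_type, day):
    """Build an aggregate query over one day's events in a lake table."""
    return (
        f"SELECT user_id, COUNT(*) AS events "
        f"FROM {table} "
        f"WHERE event_type = '{event_type}' "
        f"AND event_date = DATE '{day}' "
        f"GROUP BY user_id"
    )

sql = daily_events_query("analytics.page_views", "click", "2018-01-15")
print(sql)
```

The point is less the SQL itself than what is absent: no extract job, no staging schema, no warehouse cluster to provision before the first question can be asked of the data.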
5. Cloud Data Warehouse Providers Begin to Battle
The modern generation of cloud data warehouse leaders – Amazon Redshift, Snowflake and BigQuery – have reached a level of maturity where they’ll now leave the stage of greenfield growth and start slugging it out with each other for market share. The success of their businesses will be predicated on capturing significant chunks of the Fortune 500, which means it’s becoming a zero-sum game and they’ll all begin running into each other in key deals.
To boot, it’s getting increasingly hard for these warehouses to differentiate themselves. Until now, there’s been a clear divide between services that were, for example, cloud-hosted or not, or serverless or not. Now that the market has settled on columnar distributed cloud databases, these companies will have to find new ways to differentiate themselves under more intense competition.