“Big data” is dead. Vendors killed it. Well, industry leaders helped, and the media got the ball rolling, but vendors hold the most responsibility for the painful, lingering death of one of the most overhyped and poorly understood terms since the phrase “cloud computing.”
Any established vendor offering a storage or analytics product, whether for tiny or huge amounts of data, now brands it as big data, even if the underlying technology is exactly the same as it was five years ago (thank you, marketing departments!). Startups, too, lay claim to the moniker of “big data app” or “big data startup,” eager to soak up some of the big data money floating around in big data-focused VC funds.
The phrase “big data” is now utterly meaningless. For those of us who have been in the industry long enough, the mere mention of the phrase is enough to induce a big data headache. Please pass the big data Advil. (Editor’s note: We couldn’t agree more!)
If you want proof, witness the rising tide of backlash against the term:
RIP Big Data.
Now that big data is dead, we’re free to move on to the next chapter of our lives. From a data perspective, that means we can stop worrying about the volume, variety, velocity, veracity, and verisimilitude of data (just put it in Hadoop already!) and start focusing on ways to improve bottom-line metrics by leveraging the talent, tools, and technologies that are slowly making their way into the mainstream.
As the industry matures, there won’t be a single term that replaces the big data moniker. Instead, different tools and technologies will carve out different niches, each more narrowly focused and highly specialized than the universal sledgehammer that was big data.
I’m going to talk about some of the niches you’re going to hear about again and again. Alas, some of these will be spun into buzzwords that, like big data, accumulate so much “momentum” that they eventually lose their meaning. But for now, they offer a glimpse of what lies ahead for data storage, processing, and analysis.
I’ve identified six different aspects of data that you’re going to hear about more frequently in 2013. Each of these terms actually conveys useful information, and each cuts across use cases that fall under the rubric of “big data.”
Various industry leaders, writers, speakers, and pollsters have started using the term “smart data” to describe an increasingly common pattern in the big data scene: turning persistent data into products through predictive analytics.
In essence, companies are moving beyond BI, which relies on humans to interpret data, and are looking to monetize their vast troves of machine-captured data through predictive analytics, which relies on advanced techniques in statistics and machine learning to recognize and exploit patterns. These predictive analytics are often deployed as revenue-generating, intelligent features inside products, such as fraud detection, recommendations, personalization, and ad targeting. Companies leveraging smart data include Netflix, Amazon, Rich Relevance, Gravity, LinkedIn, and SailThru, among many others.
Data science is a new field that employs advanced techniques in statistics, machine learning, natural language processing, and computer science to extract meaning from large amounts of data, sometimes with the goal of creating new data products (arguably the reason data science was created). Though still meaningful, the term is starting to be abused by vendors because of its skyrocketing popularity. Metamarkets, for example, touts the benefits of its “data science platform,” but its core technology is a slice-and-dice aggregator. Similarly, many people who know SQL and MicroStrategy now claim to be data scientists. I fear this term may become a victim of its own success and suffer the same fate as big data.
NewSQL is a moniker for highly scalable, horizontally distributed SQL systems. Drawntoscale, VoltDB, SpliceMachine, SQLFire, Impala, Redshift, Clustrix, NuoDB, and Hadapt are a few of the many solutions that combine the scalability of NoSQL platforms with SQL and the strong ACID guarantees of legacy relational databases. NewSQL doesn’t mean NoSQL will die; it just means that companies that want both scalability and SQL can have their cake and eat it, too.
Many companies will continue to choose NoSQL systems, both because they support non-relational data and because forgoing ACID guarantees lets them offer higher performance.
After many years of relative obscurity, predictive analytics are coming into their own. Core to both data science and smart data, predictive analytics are the flip side of historical analytics: rather than describing what has already happened, they use historical data to predict future events. If you can predict the future, you can also change it.
Indeed, predictive analytics are behind everything from recommendation engines (which recommend the items predicted to maximize the chance of a conversion) to fraud detection to, yes, predicting which parolees are most likely to commit murder. The field draws on techniques from statistics, machine learning, modeling, and other disciplines to identify and exploit patterns.
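Production systems for recommendations or fraud detection are far more sophisticated, but the core idea of fitting a model to historical data and using it to forecast can be illustrated with the simplest possible model: a straight-line trend fitted by ordinary least squares and extrapolated forward. This is a minimal sketch under that assumption, not how any company mentioned above does it; the function names `fit_linear_trend` and `forecast` and the monthly figures are hypothetical.

```python
# Minimal sketch of predictive analytics: learn a pattern from historical
# observations, then extrapolate it to predict future values.
# Assumption: a straight-line trend is a reasonable model; data is hypothetical.

def fit_linear_trend(history):
    """Return (slope, intercept) of the least-squares line through
    the points (t, history[t]) for t = 0, 1, ..., len(history) - 1."""
    n = len(history)
    if n < 2:
        raise ValueError("need at least two observations to fit a trend")
    ts = range(n)
    mean_t = sum(ts) / n
    mean_y = sum(history) / n
    # Slope = covariance(t, y) / variance(t)
    cov = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, history))
    var = sum((t - mean_t) ** 2 for t in ts)
    slope = cov / var
    intercept = mean_y - slope * mean_t
    return slope, intercept


def forecast(history, steps_ahead=1):
    """Predict the value `steps_ahead` periods after the last observation."""
    slope, intercept = fit_linear_trend(history)
    t_future = len(history) - 1 + steps_ahead
    return intercept + slope * t_future


# Hypothetical monthly conversion counts (illustrative only, not real data).
monthly_conversions = [100, 110, 120, 130, 140]
print(forecast(monthly_conversions))     # → 150.0 (next month)
print(forecast(monthly_conversions, 3))  # → 170.0 (three months out)
```

A straight line is chosen only because it is the easiest model to verify by hand; real predictive systems swap in richer models (regression on many features, classifiers, collaborative filtering) while keeping the same fit-then-predict shape.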
Trends that didn’t make the cut but are worth watching include stream processing and streaming analytics; NLP (which seems well on its way into the mainstream, thanks in no small part to technology vendors like AlchemyAPI); image and video mining (including face, gesture, and emotion detection); machine learning; in-memory storage and computing grids; and graph databases, which offer a completely different way of solving problems in data analysis.
Big data as a term has had its heyday. While many of the challenges that gave rise to the term remain valid, storing virtually infinite amounts of multi-structured data is no longer novel or even mildly interesting.
Moreover, widespread and growing abuse of the term by vendors means it says less and less with each passing month.
Increasing sophistication in the storage, processing, and use of data means we’re probably not going to see a single term replace big data. Instead, the most common use cases will forge ahead and adopt more restrictive, more descriptive terminology.
Welcome to the post-big data era! It’s going to be one hell of a ride.