Most weather-forecasting institutions run weather models that process enormous amounts of data, yet they are not particularly good at predicting the weather. For all the hardware power, the sophistication of the models and the use of ever more and more varied data, weather forecasting has not improved much. The atmosphere is such a complex environment that it seems impossible to generate reliable, accurate predictions more than three days out. The financial world faces the same problem. It has borrowed models, technologies and methods from the physical sciences to get a better grip on financial trends and buying and selling behavior. But for all the alleged smartness of traders and their models, the results are often below market averages and also frequently below the results of apes (Gorilla Jacko) and clowns (Frenky and Milko).
The Large Hadron Collider will produce roughly 15 petabytes (15 million gigabytes) of data annually – enough to fill more than 1.7 million dual-layer DVDs a year! Source: CERN
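The DVD figure is easy to sanity-check with back-of-the-envelope arithmetic, assuming a dual-layer DVD holds roughly 8.5 GB:

```python
# Sanity check of the CERN figure: 15 PB per year expressed in
# dual-layer DVDs (assumed capacity: 8.5 GB per disc).
petabytes_per_year = 15
gigabytes_per_year = petabytes_per_year * 1_000_000  # 1 PB ~ 1,000,000 GB

dvd_capacity_gb = 8.5
dvds_per_year = gigabytes_per_year / dvd_capacity_gb

print(f"{dvds_per_year:,.0f} dual-layer DVDs per year")  # roughly 1.76 million
```

Which indeed lands just above the quoted 1.7 million discs a year.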
While meteorologists and traders are intelligent people, they deliver suboptimal results. Clearly, for understanding complex relationships, intelligence alone does not suffice, and our current data and data models don't cut it either: one needs more data and better models. With the help of more data and new data analytics, unexpected patterns may emerge that could significantly improve forecasts. The promise of big data and big analytics can be a game changer in more than one way. In our CIO roadmap we identified this phenomenon as an important market trend for 2012.
The acceptance that there is value in data is not new, but the scale at which we have been generating, collecting and storing data over the last ten years is unprecedented, and the speed at which the digital universe expands is still increasing. Structured and – increasingly – unstructured data from social media, sensor data, machine-generated data, spatial data, metadata: there is hardly any data that is not stored. More importantly, there is the realization that we are now able to tap into the value of big data.
Interestingly enough, big data is no longer the sole domain of telecom operators, banks and retailers. While traditional data warehouses, data marts and their associated OLAP tools were all about analyzing structured data, big things are now happening with semi-structured and unstructured data. Crunching structured transaction data is just one side of the intelligence equation. Finding the best customers, the best-selling products, the right moment and so on is the result of combining transaction data with context or interaction data, the latter being derived from a cocktail of spatial, sensor, machine and social data and metadata. As a result, big data is a key concept in the strategic vision of companies in other sectors as well.
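The idea of combining transaction data with interaction data can be made concrete with a small sketch. The data, field names and weights below are all hypothetical; the point is simply that a customer ranking blends a structured signal (spend) with a context signal (engagement derived from, say, social data):

```python
# Illustrative sketch with made-up data: rank customers by blending
# structured transaction totals with an interaction/engagement score.
transactions = {"alice": 1200.0, "bob": 300.0, "carol": 800.0}  # spend per customer
interactions = {"alice": 0.2, "bob": 0.9, "carol": 0.7}         # engagement, 0..1

def blended_score(customer, spend_weight=0.001, engage_weight=1.0):
    """Combine spend and engagement into one ranking score (weights are arbitrary)."""
    spend = transactions.get(customer, 0.0)
    engagement = interactions.get(customer, 0.0)
    return spend_weight * spend + engage_weight * engagement

ranked = sorted(transactions, key=blended_score, reverse=True)
print(ranked)  # ['carol', 'alice', 'bob']
```

Note how the ranking differs from what either data source would produce on its own: alice leads on spend and bob leads on engagement, but the blend favors carol.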
The realization that there is big money in big data has prompted many ICT companies to focus their attention on it, and 2011 was a key year in this respect. The release of Apache Hadoop 1.0.0 created a common framework for processing large datasets across clusters of computers. With the blessing of IBM, Microsoft, Oracle and HP, and the availability of three commercial distributions (Cloudera, Hortonworks, MapR), Hadoop can now be regarded as the de facto platform for big data. 2011 also saw a series of acquisitions and product announcements that underline the importance of the big data opportunity in the ICT industry:
- The acquisition of Greenplum by EMC
- IBM’s announcement of the Hadoop-based InfoSphere BigInsights, complementing its prior acquisitions of Cognos, SPSS and InfoSphere Warehouse
- Teradata’s acquisition of Aster Data
- SAP’s announcement of its HANA (High-Performance Analytic Appliance) roadmap
- Hewlett-Packard’s acquisitions of Autonomy and Vertica Systems
- Dell’s partnership with Cloudera, a commercial Hadoop distribution
- Oracle’s release of its Big Data Appliance and Exalogic
- Microsoft stopped its own development in favor of Hadoop
- Google’s MapReduce programming model
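The MapReduce model underlying Hadoop can be sketched in a few lines: a map step emits key-value pairs, a shuffle groups them by key, and a reduce step aggregates each group. Below is the canonical word-count illustration on a single machine – a teaching sketch, not actual Hadoop code, where the framework would distribute each phase across the cluster:

```python
from collections import defaultdict

def map_phase(documents):
    """Map: emit a (word, 1) pair for every word in every document."""
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: aggregate the values for each key."""
    return {key: sum(values) for key, values in grouped.items()}

docs = ["big data big models", "big clusters"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 3, 'data': 1, 'models': 1, 'clusters': 1}
```

The appeal of the model is that map and reduce are pure functions over key-value pairs, so the framework can parallelize them freely and rerun failed tasks without side effects.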
Perhaps the best development around big data is the effort to combine its promise with cloud services. As businesses start to realize the potential of big data, Big Data as a Service (BDaaS) is taking shape; it will democratize the monetization of big data by enabling midsize and smaller companies to benefit without massive investment. In this emerging market it is worth tracking companies such as Amazon Web Services, Accenture, IBM, Cloudera, HPCC, Microsoft, Opera Solutions, Quantivo, Hortonworks, 1010data, Think Big Analytics, Tidemark, and Tresata.
The M&A activity, product announcements and technological progress in big data make headlines these days. Big data is the new holy grail for business and science. In 2008 Chris Anderson wrote in Wired that we should throw our 20th-century scientific education overboard and instead start relying on superior data analysis, just like Google did. In a way he suggests that working from existing scientific methods and techniques distracts us from the truth hidden in the actual data. The classic research approach has its foundation in working from existing knowledge towards new theories and models in order to gain insight into phenomena; according to Karl Popper’s empirical falsification method, scientific progress is made by trying to prove theories and models wrong. But Popper’s theory was developed in a time when data was scarce and human intelligence was perceived as the driving force behind scientific progress. We now live in a time of abundant if not ubiquitous data, and the tools to analyze this data are starting to challenge human intelligence. While making progress without prior knowledge is somewhat difficult to comprehend, complex phenomena may well be better understood without the biased context of existing theories, models and current scientific paradigms. A data-only approach to research through big data tools could create disruptive innovation in (scientific) research and may well accelerate the process of understanding and progress. But the big data concept also has its downsides, such as:
- Data is not as straightforward as it seems. How do you know whether data is complete, up to date, valid, accurate, self-explanatory, unambiguous, and so on?
- Unrealistic expectations. Big data is not the answer to all problems; it will pay off faster if expectations are modest and realistic, which means applying it to areas that are clearly defined and well scoped.
- Infinite data and infinite data analysis could simply produce noise that looks like true insight. In other words, understanding the outcomes of big data analysis remains key if we want to apply them. The demand for data talent will therefore accelerate, and a shortage of skilled data scientists could inhibit delivery of the promise of big data.
- Mounting data-privacy issues. The ever-increasing collection of data will put people’s privacy under pressure, calling for some kind of data policy.
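The data-quality concern in the first bullet can be made concrete with a few simple automated checks. The sketch below is illustrative only – the field names, rules and thresholds are hypothetical – but it shows the kind of completeness, validity and freshness tests a record would need to pass before analysis:

```python
from datetime import date, timedelta

def quality_issues(record, required=("id", "amount", "updated")):
    """Return a list of basic data-quality problems found in one record."""
    issues = []
    for field in required:                                  # completeness
        if record.get(field) is None:
            issues.append(f"missing {field}")
    amount = record.get("amount")
    if isinstance(amount, (int, float)) and amount < 0:     # validity
        issues.append("negative amount")
    updated = record.get("updated")
    if isinstance(updated, date) and date.today() - updated > timedelta(days=365):
        issues.append("stale record")                       # freshness
    return issues

print(quality_issues({"id": 1, "amount": -5, "updated": date.today()}))
# ['negative amount']
```

Checks like these don't answer the harder questions (is the data accurate? unambiguous?), but they at least catch the mechanical problems before they contaminate an analysis.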
Despite the hurdles and potential pitfalls, we anticipate a fast-growing and vibrant sector that will evolve quickly over the next five years. In business, big data will generate new, effective ways of doing business as well as new business models. In science, we expect traditional research to give way more slowly to data-driven research, more like a generational shift among scientists. We do think that big data deserves and requires a deeper fundamental understanding of the consequences and results it generates for people, business and science.
Please mail us your thoughts about Big Data.