Big data is overhyped and unable to meet the expectations created by suppliers and consultants. So what exactly are the issues surrounding the big data paradigm? Below are nine examples of how data can be interpreted, used or abused.
Apple – Good data, bad app. Last year, Apple came up with the idea of making its own maps app, Apple Maps. It was a good idea but, as it turned out, poorly executed. In Apple Maps, San Francisco looked as if it had been hit by an earthquake. After some finger-pointing at data supplier TomTom, it turned out that it was Apple’s own app that was not up to scratch. And dangerous to boot: the Australian police had their hands full rescuing people who got lost in the desert while on their way to the city.
Big Bang – Signal or noise? Radio astronomers Wilson and Penzias were working overtime calibrating their radio telescope but could not get rid of a strange static noise. They initially blamed the noise on faulty equipment. Months later, however, they discovered that the noise was a signal from the past: the remnants of the Big Bang. With big data we get both noise and signal, and it is not always clear which is which. At the same time, signals may get lost in the abundance of noise.
Chris Anderson – Data is everything. In contrast to the 20th-century philosopher Karl Popper, Chris Anderson, former editor-in-chief of Wired magazine, argued in his article “The End of Theory” that theory-driven science has become superfluous. Because data is now ubiquitous, we should spend more time studying data than developing new theories. Or, as Anderson put it in his article: “Correlation supersedes causation, and science can advance even without coherent models, unified theories, or really any mechanistic explanation at all.” In this view, brute force trumps intellect and can give science a huge boost, not least because data is supposedly unbiased.
Diederik Stapel – Data is of secondary importance to the result. As a social scientist, Diederik Stapel believed very strongly in the power of data to explain social phenomena. He claimed that meat eaters are more antisocial than vegetarians. To support his conclusion, he massaged and made up research data until he got the results he desired. Data and facts are not the same thing.
Gorilla Jacko – Without data, you can still win. For a period of 12 years (!) Gorilla Jacko managed to beat the stock market professionals. The clowns Franky and Milco experienced similar success with stock picking. The question arises whether more data and more complex models actually yield better results. Winning on the stock market means doing better than average. But as professional stockbrokers allocate ever more resources to outperforming each other, there must be losers too. Are we talking about a race to the bottom here?
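One way to see why a picker without any data can still end up on top is to simulate pure luck. The sketch below is only an illustration under loud assumptions that do not come from the examples above: every picker has an independent 50% chance of beating the market in any given year, and "winning" is arbitrarily defined as beating the market in at least 9 of 12 years. The function name and all parameter values are hypothetical.

```python
import random


def count_lucky_pickers(n_pickers, n_years, min_wins, seed):
    """Count pickers who beat the market in at least `min_wins` years by chance.

    Assumption (illustrative only): each year is an independent coin flip
    with a 50% chance of beating the market.
    """
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    lucky = 0
    for _ in range(n_pickers):
        # Number of years in which this picker happened to beat the market.
        wins = sum(1 for _ in range(n_years) if rng.random() < 0.5)
        if wins >= min_wins:
            lucky += 1
    return lucky


lucky = count_lucky_pickers(n_pickers=1000, n_years=12, min_wins=9, seed=42)
print(f"{lucky} of 1000 coin-flipping pickers beat the market in at least 9 of 12 years")
```

Under these assumptions the expected share of such lucky winners is 299/4096, or roughly 7%, so in a large enough field of gorillas and clowns a few impressive track records are to be expected without any data or model at all.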
IPCC (Intergovernmental Panel on Climate Change) – Correlation versus causality. Humans are good at recognizing patterns. The problem is that they almost automatically want to explain correlated phenomena through plausible causation. Climatologists associated with the IPCC tried to explain global warming by linking it to the growing impact of human activity. Although greenhouse gases such as CO2 and methane contribute to global warming, it is virtually impossible to prove that the increasing intensity of human activity is causing increased CO2 emissions and hence global warming. The inclination towards a causal relationship caused so much commotion that serious doubts emerged as to the objectivity of the IPCC scientists. The result was that the debate on global warming lost relevance and urgency.
Minority Report – Data is unnecessary, the future is fixed. In the movie Minority Report, adapted from the short story by Philip K. Dick, it is assumed that the future is fixed. The pre-cogs (diviners) predict crimes and criminals. Data is no longer needed. Ultimately, two of the three pre-cogs turned out to be fake, and only one of them was predicting the future; the whole charade appeared to be a fear-based method to suppress crime. Stranger still, without a truckload of data the police in the same movie were unable to track down the future criminal; apparently the pre-cogs were not prescient enough to predict the criminals’ location.
Blue C.R.U.S.H. – Data never tells the full story. The introduction of the Blue C.R.U.S.H. system (Crime Reduction Utilizing Statistical History) has been a huge success. That is to say: if we believe the press release issued by the Memphis Police Department (MPD). By mapping and storing crimes with all their dirty details, and by integrating and analyzing all kinds of environmental information, CRUSH predicts where, when and under what circumstances crimes are likely to take place. With this knowledge it is simply a matter of sending an additional patrol car to the right spot at the right time in order to prevent crime. The problem is that most crimes depend on motive, means and opportunity. Taking away the opportunity leaves motive and means intact, so the crime is merely postponed: it will happen at a different time and place, under other circumstances. Also, CRUSH does nothing to prevent online crime. Crime follows the money, and 90% of money is found online nowadays. Cybercrime is simply much more rewarding and less risky than street robbery. At the same time, it will be extremely difficult for the MPD to send a patrol car to Eastern Europe.
Robbert Dijkgraaf – Random data makes the future unpredictable. During Robbert Dijkgraaf’s TV lecture on the smallest building blocks of the universe there was a question from the audience: if we knew the direction and speed of all of the smallest building blocks in our universe and unleashed powerful algorithms and computers to predict their future movement, could we then predict the future? No, said Dijkgraaf, and he used a Geiger counter to show why. He could predict that there would be static noise when the counter hovered above the radium dial of his old watch, but he could never predict what pattern would emerge. The pattern would remain random. The future is not fixed, he thus concluded.
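The difference between a predictable average and an unpredictable pattern can be sketched in a few lines of code. This is a minimal illustration, not a model of Dijkgraaf’s actual demonstration: it assumes the clicks behave like a Poisson process with exponentially distributed waiting times, and the rate of 5 clicks per second, the 1000-second duration and the function name are all hypothetical. Two different seeds stand in for two "runs of the universe"; a pseudo-random generator is of course deterministic, so it only mimics true randomness.

```python
import random


def simulate_clicks(rate_per_second, duration_seconds, seed):
    """Return the click times of a simulated Geiger counter.

    Assumption (illustrative only): clicks form a Poisson process, so the
    waiting time between two clicks is exponentially distributed.
    """
    rng = random.Random(seed)
    clicks = []
    t = rng.expovariate(rate_per_second)  # time of the first click
    while t < duration_seconds:
        clicks.append(t)
        t += rng.expovariate(rate_per_second)  # wait for the next click
    return clicks


run_a = simulate_clicks(rate_per_second=5.0, duration_seconds=1000.0, seed=1)
run_b = simulate_clicks(rate_per_second=5.0, duration_seconds=1000.0, seed=2)

# The totals are close to rate * duration = 5000 in both runs ...
print(len(run_a), len(run_b))
# ... but the moments at which the first clicks occur differ between runs.
print([round(t, 2) for t in run_a[:5]])
print([round(t, 2) for t in run_b[:5]])
```

In this sketch the total number of clicks is predictable to within a few percent, while the timing of any individual click is not, which mirrors the distinction Dijkgraaf drew between knowing that there will be noise and knowing its pattern.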
Data is undoubtedly the new holy grail. It is not so difficult to imagine that the combination of larger, more diverse and fresher datasets can create value. With the powerful computers and software currently available, we can analyze everything, however absurd, and thus unlock the hidden value of data. But in contrast to the accepted belief in data’s great potential, data science is still in its infancy. Typical human failings, such as inaccurate use and misinterpretation of data, will also cause some drawbacks. It also seems that the new data-centric paradigm decrees that data will explain everything, provided there is sufficient relevant data. Unfortunately, “sufficient” and “relevant” are not objective qualifications. Data may be the new pillar of science, but Popper cannot yet be pushed aside. On the contrary, we will have to think harder than ever before.