BBVA has announced that it is creating a team of 2,000 data scientists to develop new digital products and services. It is only one of many companies jumping on the Big Data bandwagon. But do these companies understand the dangers of Big Data analysis, and in particular of the algorithms that underlie it?
Big Data analysis seeks to take advantage of the vast quantities of digital data we (whether governments, companies or individuals) post on the internet every day. The data is collected and then analysed to identify trends and patterns which can help shape the decisions taken by governments or companies. Such decisions can range from whether to give a prisoner bail or grant a client a mortgage to designing commercial and marketing strategies. There is even discussion of introducing Big Data analysis into foreign ministries to support foreign policy analysis.
Because Big Data analysis is carried out by computers, based on the massive collection of digital data, executives and policy makers live under the illusion that it is objective and necessarily accurate. But it is an illusion. The actual analysis is carried out by algorithms designed by data scientists. The algorithms reflect the epistemological biases and prejudices of their designers. The algorithm designers are experts in computing and data, but often know little about the issues to which their algorithms are being applied. The decision makers using Big Data analysis understand little, if anything, about the algorithms on which it depends. This can result in perverse outcomes.
We have been here before, with the mathematical models used to support investment decisions. These highly complex models are designed by mathematicians and physicists who understand little about how economies function. They drive investment decisions by bank and investment fund executives who understand nothing about mathematical models. It is unlikely that the modellers and executives would even be able to communicate with each other, so different is their understanding of the world.
The common feature of these models is that they assume a normal distribution of risk or variation. A normal distribution, which readers will recall from the bell curves at school, is a good way of thinking about things like human height. The heights of individuals tend to cluster around the average heights for men and women. Some are taller than the average and some shorter. But no-one is 3m tall. The extremes can be ignored. Unfortunately, the distribution of events in complex systems (like stock markets and economies) does not follow a normal distribution, but a power law distribution. In a power law distribution, small events or variations occur frequently, but major events or variations do occur, albeit not often. If height followed a power law, 3m men and women would occur, just not very often.
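The difference between the two tails can be made concrete with a short calculation. The sketch below is illustrative only: the function names, the Pareto form chosen for the power law, and the parameters (`x_min = 1`, `alpha = 2`) are assumptions made for this example, not values taken from any real market model. It compares how quickly the probability of an extreme event shrinks under each distribution.

```python
import math


def normal_tail(k: float) -> float:
    """Probability that a normally distributed value lies more than
    k standard deviations above its mean."""
    return 0.5 * math.erfc(k / math.sqrt(2))


def pareto_tail(x: float, x_min: float = 1.0, alpha: float = 2.0) -> float:
    """Probability that a Pareto (power law) value exceeds x.

    x_min and alpha are illustrative assumptions, not fitted values.
    """
    if x < x_min:
        return 1.0
    return (x_min / x) ** alpha


if __name__ == "__main__":
    # Normal tail at k standard deviations; power law tail at k times x_min.
    for k in (2, 5, 10):
        print(f"{k:>3}  normal: {normal_tail(k):.3e}  power law: {pareto_tail(k):.3e}")
```

Under these assumptions, an event ten standard deviations from the mean is effectively impossible under the normal distribution, while an event ten times the minimum size still has a probability of about one in a hundred under the power law. The two scales are not directly comparable; the point is only how differently the tails decay.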
However, power law distributions do not lend themselves to prediction, whereas normal distributions do. So normal distributions remain at the core of economic modelling. Modellers would argue that most of the time this does not matter, as most of the time events and fluctuations in markets and the economy follow a normal distribution. Most of the time there are no 3m men or women. But when models based on normal distributions do fail, they do so spectacularly. When the option pricing model underlying the hedge fund Long-Term Capital Management failed to predict the Russian default in 1998, the fund’s collapse nearly took the global economy with it. Nevertheless, such economic models continued to be used, and played their part in the 2008 financial crisis. Fund managers’ failure to understand the flaws in the models on which their investment decisions were based left them unaware of the risks they were taking.
Similar problems could arise with Big Data analysis. And it may be about to get worse. Machine learning is removing humans from the design of the algorithms. Machine learning feeds massive quantities of data to a neural network, which then designs its own algorithm, teaching itself from the data. This approach allowed the AlphaGo programme to defeat the world's human Go champion. However, counter-intuitively, this does not increase the objectivity of the algorithm. Bias and prejudice are now introduced with the data fed to the neural network to allow the “machine to learn”. Crap data in, crap learning out. There is another problem. With a human algorithm designer, executives can at least try to interrogate them about their thinking, even if the answers may be hard to understand. With machine learning there is no-one to interrogate. Experience with machine learning algorithms playing chess and Go demonstrates that such programmes think in very different ways to humans. In other words, we have no real idea of how the computer designed its algorithm.
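How bias in the data becomes bias in the algorithm can be shown with a toy example. The sketch below is not a neural network: it is a deliberately simple frequency-based learner, and the mortgage records, group labels and function names are all hypothetical, invented for illustration. It "learns" from past decisions in which group B was approved less often than group A at the same income level.

```python
from collections import defaultdict

# Hypothetical historical mortgage decisions: (group, income_band, approved).
# The bias against group B is built into the records on purpose.
HISTORY = [
    ("A", "high", True), ("A", "high", True),
    ("A", "low", True), ("A", "low", False),
    ("B", "high", True), ("B", "high", False),
    ("B", "low", False), ("B", "low", False),
]


def train(history):
    """Learn the past approval rate for each (group, income_band) pair."""
    counts = defaultdict(lambda: [0, 0])  # key -> [approvals, total]
    for group, income, approved in history:
        entry = counts[(group, income)]
        entry[0] += int(approved)
        entry[1] += 1
    return {key: approvals / total for key, (approvals, total) in counts.items()}


def predict(model, group, income):
    """Approve only if more than half of similar past applicants were approved."""
    return model[(group, income)] > 0.5


if __name__ == "__main__":
    model = train(HISTORY)
    for group in ("A", "B"):
        print(group, "high income ->", predict(model, group, "high"))
```

Nothing in the learner mentions prejudice, yet two applicants with the same income receive different answers, because the only thing it has to learn from is the record of earlier decisions.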
Companies and governments will inevitably adopt Big Data analysis, and other artificial intelligence and machine learning tools. But in doing so, executives and decision makers must understand the limitations and dangers of the underlying algorithms, and not become over-dependent on tools they do not understand.