OPINION 25 June 2014
By applying some of the basic principles of survey research and statistics, Ipsos MORI Digital’s Claire Emes says many of the shortcomings of Big Data can be overcome.
Much has been written on the pros and cons of Big Data; in fact the White House recently published a report titled ‘Big Data: Seizing Opportunities, Preserving Values’, which examines how Big Data is changing the way we live and work.
What follows is certainly not a definitive guide. Rather, it is a brief critique of Big Data from a researcher’s perspective, and a look at how, by applying some of the basic principles of survey research and statistics, we can overcome many of its shortcomings and unlock its value.
In his book, The Signal and the Noise: Why So Many Predictions Fail but Some Don’t, Nate Silver suggests the quantity of information in the world is increasing by 2.5 quintillion bytes per day, but the amount of useful information almost certainly isn’t. He explains that most of it is just noise, and the noise is increasing faster than the signal. There are so many hypotheses to test and so many data sets to mine, but according to Silver there is only a relatively constant amount of objective truth to find.
Taking the principle of bigger isn’t always better one step further, I’d suggest it’s not only not better, it can actually be worse. A number of proponents of Big Data refer to a Big Data set as one where ‘N = All’, where we no longer have to sample as we have access to the entire background population. But is ‘N = All’ really a good description of most available data sets? Do we ever really have all of the data?
As the economist Tim Harford and Microsoft’s Kate Crawford, among others, point out, most Big Data sets contain systematic biases. It takes careful thought to identify and correct for these skews. Big Data sets can seem comprehensive, but ‘N = All’ is often a seductive illusion.
Think of social media. It is in principle possible to record and analyse every message on Twitter and use it to draw conclusions about the public mood. But even if we analyse every tweet, Twitter users are not representative of the population as a whole. According to Ipsos MORI’s tech tracker, only 15% of the UK population are on Twitter, and they are disproportionately young and from higher social grades. In most situations, we’d be better off analysing a far smaller but representative sample of the population we wish to understand.
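To make the point concrete, here is a minimal simulation sketch in Python. Only the 15% platform share echoes the tracker figure quoted above; the age split, the opinion rates and the function name `simulate` are invented purely for illustration and do not come from any real survey.

```python
import random


def simulate(population_size=200_000, sample_size=1_000, seed=0):
    """Compare a large but skewed 'found' data set with a small random sample.

    All rates below are illustrative assumptions, not survey results.
    """
    rng = random.Random(seed)
    population = []
    for _ in range(population_size):
        # Assumption: 30% of the population is "young".
        young = rng.random() < 0.30
        # Assumption: young people hold the view more often (70% vs 40%).
        holds_view = rng.random() < (0.70 if young else 0.40)
        # Assumption: young people are far more likely to be on the platform;
        # the rates are chosen so roughly 15% of everyone is a user.
        on_platform = rng.random() < (0.35 if young else 0.064)
        population.append((holds_view, on_platform))

    # The "truth": share of the whole population holding the view.
    true_share = sum(h for h, _ in population) / population_size

    # "N = All" on the platform: every user is analysed, but users skew young.
    platform_views = [h for h, on in population if on]
    platform_share = sum(platform_views) / len(platform_views)

    # A much smaller simple random sample drawn from the whole population.
    sample = rng.sample(population, sample_size)
    sample_share = sum(h for h, _ in sample) / sample_size

    return {
        "true_share": true_share,
        "platform_share": platform_share,
        "platform_fraction": len(platform_views) / population_size,
        "sample_share": sample_share,
    }


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the platform estimate is built from tens of thousands of people, yet it stays well above the true share because the skew does not shrink as the data grows. The sample of 1,000 is noisier but centred on the truth.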
Another issue is that if we rely on ‘found’ data alone, we’re constrained by what exists. As Nate Silver pointed out in his interview with Ipsos MORI’s CEO, Ben Page, “The credit rating agencies in advance of the crunch had millions of observations on individual mortgages, but all from a period when housing prices were increasing”.
It can be risky to rely entirely on past observable behaviour and algorithms. If you have no idea what is behind a correlation, you have no idea what might cause that correlation to break down. This is particularly concerning when people feel that they can be more certain about their predictions because the size of the data set means that they’ve got the numbers to back it up.
Perhaps we could go so far as to suggest that Big Data can be dangerous. Big Data can mean big errors. The data can be wrong or misleading, but more often than not the errors lie in the interpretation rather than in the data themselves. This is frightening if authorities wrongly predict a health scare (or fail to predict one) and frustrating if a company tries to sell you something you already have or simply aren’t interested in.
Further, Big Data models do not just predict; they can make things happen by creating a behavioural loop. A person feeds in data, which is collected by an algorithm that then presents the person with choices, thereby steering behaviour. This can create efficiencies, but it’s easy to see how it could result in yet more data skews or could be abused.
So Big Data can be unwieldy, misleading and possibly even hazardous but, despite this, we and many of our clients are genuinely excited about the opportunities it presents. Many of the projects we’re undertaking today leverage Big Data sources and techniques and we expect this to apply to even more of our work in the future.
Our experience suggests there are some key principles we need to consider to ensure we don’t fall into any Big Data traps.
In summary, while Big Data may not be the answer to all our questions, it can certainly provide a very useful contribution and, when combined with other sources of insight, helps us develop a deeper understanding of people’s motivations and behaviour.
Claire Emes is head of Ipsos MORI Digital
2 Comments
BigDataGuru
11 years ago
Small minds = small industry.
Matt Champagne
11 years ago
I enjoyed this article. This one is bookmarked. ;)