Data is Big, but there is no Overload

by Andreas Koller

The next chapter from my dissertation „Making Big Data Small“ deals with the much-cited ‚data overload‘, arguing that the notion of an overload or flood might be wrong: is it not simply a filter failure, and could data visualisation be part of the solution?

„It’s not information overload. It’s filter failure.“

Clay Shirky

Google and Amazon were among the first companies to harness the value of data and build a business on analysing it. Of course, long before Amazon started in 1994, states gathered data about their citizens and the church about its customers. The first census took place in Egypt around 3000 BC. In 1085, William, Duke of Normandy, commissioned a „survey to discover the resources and taxable values of all the boroughs and manors in England“. The resulting Domesday Book was regarded as the authoritative register of rightful possession for many centuries. Its 413 pages (Great Domesday) and 475 pages (Little Domesday) can be considered ‚big data‘ of the analogue age, and it was one of the first large data collections available.

As soon as the Gutenberg printing press was invented, the church rationalised its indulgence business and started printing the forms in larger numbers: a ready-made receipt with an empty space left for the name of the purchaser, arguably the first large-scale data accumulation. The British Library has one of these machine-printed indulgences from 1455 on display; it was the first mass-produced form used to collect data. So what is the recent fuss about data? Why is it suddenly in such high demand and treated as a new natural resource?

One reason is the incomprehensible scale of data made possible by the invention of digital media. The amount of aggregated bits and bytes is growing exponentially. The ‘datafication’ of the world started in the 1990s, when the shift from atoms to bits took place. Today, there is no question that the amount of generated data is exploding. During the London Olympics, around 60 GB of data was transmitted every second; at this rate, the whole text content of Wikipedia could be transferred within about half a second. The ‘Square Kilometre Array’, a radio telescope currently under construction in Australia and South Africa, will be the world’s largest and most sensitive radio telescope. It will generate many exabytes of data per year once it is fully operational.
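
To make the scale a little more concrete, here is a back-of-envelope calculation in Python. It is only a sketch: the roughly 30 GB assumed for Wikipedia’s text content is my own estimate implied by the half-second figure, not a number taken from the sources below.

```python
# Rough sanity check of the figures above. The ~30 GB assumed for
# Wikipedia's text content is an estimate, not a figure from the sources.

OLYMPICS_RATE_GB_PER_S = 60   # roughly 60 GB transmitted per second
WIKIPEDIA_TEXT_GB = 30        # assumed size of Wikipedia's text content

transfer_time_s = WIKIPEDIA_TEXT_GB / OLYMPICS_RATE_GB_PER_S
print(f"Wikipedia at Olympic rates: {transfer_time_s:.1f} s")  # about 0.5 s

# The same rate sustained for a whole year, expressed in exabytes
# (1 EB = 10^9 GB), just to put 'many exabytes per year' into perspective.
seconds_per_year = 60 * 60 * 24 * 365
yearly_exabytes = OLYMPICS_RATE_GB_PER_S * seconds_per_year / 1e9
print(f"Sustained for a year: {yearly_exabytes:.2f} EB")       # about 1.9 EB
```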

Visualising is Filtering

However, this increase can be regarded as a natural process, not as uncontrolled growth. As Weinberger says, we do not speak of taste overload just because there is an endless variety of tastes in the world. In the same way, there is no data or information overload. Yes, the amount of newly generated data can no longer be grasped by anyone, but that is a filter problem. When data is turned into information, filters are applied. Every visualisation is an abstraction of data, filtering out the essence, or rather one of the essences, of the data set. We still have to learn and develop filters that make it possible to read the world of big data, to picture it, to understand it.
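
To make the point about filtering concrete, here is a minimal sketch of my own (an illustration, not a method from the dissertation): a million raw values are reduced to about a dozen bin counts, which is all a histogram ever actually shows.

```python
# Visualising is filtering: before anything is drawn, a large stream of
# values is reduced to a handful of summary numbers. Data and bin width
# here are invented purely for illustration.
import random
from collections import Counter

random.seed(0)
values = [random.gauss(50, 15) for _ in range(1_000_000)]  # the "big" raw data

BIN_WIDTH = 10
bins = Counter(int(v // BIN_WIDTH) * BIN_WIDTH for v in values)

# One million numbers have been filtered down to roughly a dozen counts,
# one per bin -- the abstraction a histogram presents to the reader.
for edge in sorted(bins):
    print(f"{edge:>4} to {edge + BIN_WIDTH:<4}: {bins[edge]:>8}")
```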

The next chapter of my dissertation, „Order out of Chaos“, deals with the messiness of big data, how new meaning can emerge from large, unordered datasets, and how this resembles the way we organise knowledge.

Bibliography
(sources for the references in the text above only)

Newman 2011 Newman, R., and J. Tseng. “Cloud Computing and the Square Kilometre Array.” Memo 134, Oxford University, May 2011.
Mayer-Schönberger 2013 Mayer-Schönberger, Viktor, and Kenneth Cukier. Big Data: A Revolution That Will Transform How We Live, Work, and Think. London: John Murray Publishers, 2013.
Weinberger 2013 Weinberger, David. Too Big To Know: Rethinking Knowledge Now That the Facts Aren’t the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room. New York: Basic Books, 2013.
