Making Big Data Small

by Andreas Koller

I have just finished my MA thesis at the Royal College of Art, entitled “Making Big Data Small – Visualising Patterns within Big Data Sets”. Let me share some of the things I wrote.

In my dissertation, I investigate the nature of big data, its different definitions and misconceptions, and try to take a human-centered perspective on data. I connect this with the practice of visualising data by creating patterns with scatter-plots, which reveal the underlying structure of data: text visualisations, moving image analytics and visualisation of personal data. I want to emphasize the need for new tools (meta-tools) that allow for a more fluid way of interacting with data, and the importance of establishing data visualisation as a cultural technique, not only to enable insight into big data sets but also to foster cultural innovation.

[Image: Making Big Data Small, dissertation]

The following is an excerpt from the first chapter, called ‘Data’.

Data

Not one day passes when we are not reminded that we are apparently drowning in data. ‘Big Data’ has become a buzzword and has sparked a revolution. More specifically, the real revolution is “in data itself and how we use it”. In recent years it has been hailed as the solution for problems in design, science and business — sometimes even for all the problems of humanity. According to its followers, it is the source of innovation and progress, and big data is celebrated as the ‘new oil’. In these excessive discussions about data and its impact, opportunities and risks, it is necessary to step back a bit and define what we are actually talking about when we talk about data.

Data as Material

What is data? This fundamental question is often answered with the data-information-knowledge-wisdom (DIKW) hierarchy, which places data at the base of a pyramid consisting of four levels of intelligence. Russell Ackoff, who introduced this hierarchy in 1989 in his paper ‘From data to wisdom’, defines data as symbols that represent properties of entities. They are the products of observation, but useless until they are rendered into a relevant state: information. Information is inferred from data and answers questions that begin with such words as who, what, when and how many. Information is defined as the extract of data, while knowledge is said to be “actionable information”. Knowledge turns “information into instructions”, making it the foundation for individual decisions and actions.

[Figure: DIKW pyramid]

Russell Ackoff and, before him, Milan Zeleny offer the following definitions for these terms:

Data are symbols (know nothing)
Information is structured data (know what)
Knowledge is actionable information (know how)
Wisdom is the “ability to increase efficiency” (know why)

Jennifer Rowley of Bangor Business School, University of Wales, conducted an in-depth analysis of this hierarchy in 2006 (published 2007). Its origins can actually be traced back to the poem ‘The Rock’ by T.S. Eliot in 1934. This poem contains the following lines:

“Where is the Life we have lost in living?
Where is the wisdom we have lost in knowledge?
Where is the knowledge we have lost in information?”

T.S. Eliot, The Rock

The DIKW hierarchy is heavily contested. It became the de facto standard definition after Ackoff introduced it to the research community, and at first glance it makes a lot of sense: the pyramid seems logical, because “there’s obviously plenty of data in the world, but not a lot of wisdom.” However, it is generally agreed that the hierarchy oversimplifies and makes the path from data to wisdom look like a logical progression that happens automatically. The meaning of knowledge itself has changed during the last century. The pursuit of knowledge used to be the “most profound of human goals”, yet the DIKW hierarchy wrongly suggests that knowledge is generated by following a recipe that miraculously turns information into something we know. The reality is more complicated, but it can be better understood through the analogy that knowledge today has taken on the shape of the internet: connecting knowledge and creating dynamic filters which present information to us.

“Today data refers to a description of something that allows it to be recorded, analysed, and reorganised.” This broad definition sees data as everything that can be known. The word ‘data’ means ‘given’ in Latin, in the sense of a ‘fact’. Data is any ‘given’ thing that can be measured and put into numbers.


From Data to Information

“A bit of information is a difference that can make a difference.”
Gregory Bateson

Data is obviously closely connected to information. The most common definition of information was offered by Bateson, who defined it as the difference that makes a difference. This popular, self-referencing definition does not take us far here, however, and there is no general consensus on how to define the word ‘information’.

“Information is a measure of how difficult something is to describe.
Disorder has a high information content and order has a low one.”

Tor Nørretranders

Nørretranders finds a fitting metaphor by describing information as a measure of the complexity of something. Complete order has no information content and chaos has the maximum possible amount, but everything that is of interest happens in between: “living creatures, thoughts and conversations.” Therefore, the amount of information alone tells us little about how useful or interesting it is; what matters is the process behind the information. The examples he provides make this clearer: saying ‘yes’ at a wedding contains little information (one bit, actually), but the statement has depth and represents a huge amount of shared experience that led to the saying of this word. In the same way, the result of a computation holds less information than the problem itself; it is always an abstraction. He demonstrates this with the simple computation 2+2=4: the result (4) contains less information than the initial computation (2+2). This is crucial for our definition of data, because the final visualisation may contain far less information than the initial data set, yet the goal is for the reader to be able to imagine or understand the process behind it, the ‘depth’ of which Nørretranders speaks.
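
The contrast between order and chaos can be illustrated with a minimal Python sketch, which is not taken from Nørretranders' text. It rests on two assumptions: that a yes/no answer has two equally likely outcomes, and that compressed size is a usable stand-in for how difficult something is to describe. The helper names shannon_entropy_bits and compressed_size are hypothetical, and zlib only approximates description length.

```python
import math
import os
import zlib


def shannon_entropy_bits(probabilities):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def compressed_size(data: bytes) -> int:
    """Size in bytes after zlib compression: a rough proxy (an assumption
    of this sketch) for how difficult the data is to describe."""
    return len(zlib.compress(data, 9))


# A yes/no answer with two equally likely outcomes.
wedding_answer_bits = shannon_entropy_bits([0.5, 0.5])

# Complete order: one byte repeated. Chaos: random bytes.
ordered = b"a" * 10_000
chaotic = os.urandom(10_000)

ordered_size = compressed_size(ordered)
chaotic_size = compressed_size(chaotic)

print(f"yes/no answer: {wedding_answer_bits} bit(s)")
print(f"ordered data: {len(ordered)} bytes, compressed to {ordered_size}")
print(f"random data:  {len(chaotic)} bytes, compressed to {chaotic_size}")
```

The ordered sequence shrinks to a small fraction of its length, while the random one barely shrinks at all. Neither extreme is interesting in itself, which fits the point that the amount of information says little about its depth.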

These more poetic interpretations of the same data-information hierarchy seem to be more to the point, because they introduce more human elements. Clifford Stoll, American astronomer and author, notes that we actually end up with very little information from large amounts of data:

“Our networks are awash in data.
A little of it is information.
A smidgen of this shows up as knowledge.
Combined with ideas, some of that is actually useful.
Mix in experience, context, compassion, discipline, humour, tolerance, and humility, and perhaps knowledge becomes wisdom.”

Clifford Stoll

Stoll manages to express the magnitude of data around us and the tiny amount of information that can be extracted from it, while the knowledge that results from it is smaller still. Whether this process eventually creates new wisdom depends on many factors and is highly improbable.

The DIKW Hierarchy Revisited

For my purposes, I find the notion of a hierarchy from data to information useful, and I’d like to propose the following definition from the viewpoint of a designer:

Data is material (collected, measured and stored)
Information is designed data (structured, accessible and comprehensible)
Knowledge is understood information (internalised, remembered and contextualised)
Wisdom is the individual and collective ability to recombine knowledge into new insight and to enable cultural innovation.

As I see it, the designer is in charge of the first step: turning data into information. The next steps can only be taken by the reader. It is the designer’s responsibility to ensure that the next steps, the transformation from information to knowledge to wisdom, are facilitated, even though they are outside the designer’s direct control.

To sum up, data in the context of this text can be understood as a physical resource, consisting of a large number of individual entities, which are turned into information through a design process; this results in a visualisation and aims to enable the reader to turn it into individual knowledge.

The topics that follow are “Data is Big, but there is no Overload”, “Order out of Chaos”, “Data as Truth (Data in Business)” and “N = all (Data in Science)”. The first chapter concludes with “The Human Perspective” on data. These will be published in a follow-up to this blog post.

Bibliography
(sources for the references in the text above only)

Ackoff 1989 Ackoff, R. L. “From data to wisdom”. Journal of Applied Systems Analysis 15 (1989): 3-9.
Koomey 2008 Koomey, Jonathan G. Turning Numbers into Knowledge: Mastering the Art of Problem Solving. 2nd edition. 2008.
Mayer-Schönberger 2013 Mayer-Schönberger, Viktor, Cukier, Kenneth. Big Data: A Revolution That Will Transform How We Live, Work and Think. London: John Murray Publishers, 2013.
Nørretranders 2011 Nørretranders, Tor. “Depth” (2011).
Rowley 2007 Rowley, Jennifer. “The wisdom hierarchy: Representations of the DIKW hierarchy.” Journal of Information Science 33 (2007): 163-180.
Sharma 2004 Sharma, Nikhil. “The origin of the ‘data information knowledge wisdom’ hierarchy.” (2004).
Sloman 2009 Sloman, Aaron. “What’s information, for an organism or intelligent machine? How can a machine or organism mean?” Information and Computation: Essays on Scientific and Philosophical Understanding of Foundations of Information and Computation, World Scientific Series (2009).
Thorp 2012 Thorp, Jer. “Big Data Is Not the New Oil”. Harvard Business Review, November 30 (2012).
Weinberger 2013 Weinberger, David. Too Big To Know. Rethinking Knowledge Now That the Facts Aren’t the Facts, Experts Are Everywhere, and the Smartest Person in the Room Is the Room. New York: Basic Books, 2013.
Zeleny 1987 Zeleny, Milan. “Management support systems: towards integrated knowledge management”, Human Systems Management 7(1) (1987): 59–70.
