By David Hand
Why what you don’t know matters…
This book explores the Achilles heel of data science. Going beyond the data you have, it examines the data you don’t have, illustrating with many real-life examples how a lack of awareness of what you are missing can lead to distorted understanding, incorrect conclusions, and mistaken actions. It then shows how such dangerous ignorance can be detected and what to do to avoid the risks: how to shine a light on dark data, revealing what was previously concealed and enabling you to draw accurate conclusions and take appropriate actions even in the face of ignorance. Going a step further, it also demonstrates how this new perspective enables you to apply ignorance strategically to your advantage, leading to greater understanding and better decisions. In short, it flips the world of data science on its head and looks at things from the opposite direction.
David Hand shines a bright light onto the dark corners of statistics. This is a learned book but a witty, readable, and important one. I learned a lot and so will you.
– Tim Harford, author of Fifty Inventions That Shaped the Modern Economy
[A] penetrating study of missing (‘dark’) data and its impacts on decisions . . . Hand offers expert training, from recognizing when facts are being cherry-picked to designing randomized trials. A book illuminating shadowed corners in science, medicine and policy.
– Barbara Kiser, Nature
This insightful book should be required reading for everyone in an age when ‘fake news’ and the explosion of data go hand in hand.
– Adrian Smith, Chief Executive, The Alan Turing Institute
Bestselling Author
David J. Hand is emeritus professor of mathematics and senior research investigator at Imperial College London, a former president of the Royal Statistical Society, and a Fellow of the British Academy. His previous books include The Improbability Principle, Measurement: A Very Short Introduction, Statistics: A Very Short Introduction, The Wellbeing of Nations, and Principles of Data Mining.
The Dark Data Blog
Is there more than this? Dark data behind the story
Dark data that you do have
Sometimes the phrase “dark data” is used in a narrow sense to describe data that an organisation has collected but which has not yet been used to gain understanding or insight. The data might have been used for an immediate operational purpose (e.g. adding up the...
How Dark Data led us astray
Missing frequencies in energy data
Philip Q. Hanser of Northeastern University sent me this very nice example of DD-Type 14 dark data: ‘I was studying the question of wind turbine output variability to try to understand how much non-wind turbine back-up resources are needed to ensure the electrical...
Covid-19 and dark data
Ideally, whenever we have to make a decision we collect all the relevant data and condense it down so that we can make the best decision. However, the notion of “all the relevant data” is an elusive one, and very often, perhaps most often, much of the relevant data...
Much dark data arises in the form of entire records missing – people who refuse to take part in surveys, stars which do not radiate in the visible spectrum, patients not yet showing symptoms, and so on. But other dark data are more subtle. One example is when numbers...