I rarely read pop-sci books, and I review books even more rarely. However, I bought „Everybody Lies“ some months ago and have just finished it. It took me about four months to get through, partly because reading it as a researcher made me so angry.
Stephens-Davidowitz's book is mainly about how Big Data from the Internet allows us to draw conclusions about people and their behavior. A lot of studies are cited, almost exclusively studies using observational data from the Internet. His own studies mostly involve search data from Google Trends, Twitter or Google News. Many of the questions raised are indeed interesting, and the methodology promises further insights for both companies and researchers.
There is no problem with this kind of research per se: a lot can be learned from observational studies, especially for questions where strict experimental manipulation is not possible for ethical or economic reasons. The data collected on social media and elsewhere on the web provides insights that might have been unavailable before. The most important thing, however, is to distinguish between the different methodologies and the inferences each of them permits. And in this respect „Everybody Lies“ falls very short. There is a whole chapter dedicated to causal inference and to how only experiments can truly provide causal evidence. Still, the author implicitly draws causal inferences from purely observational data again and again.
By making assumptions and using theory and adequate models, you can make causal inferences from observational data, but doing this well is far more difficult than the book suggests.
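To make the point concrete, here is a toy simulation of my own (it is not taken from the book, and all names and numbers in it are made up for illustration). It assumes a simple linear world in which a confounder `z` drives both an "exposure" `x` and an outcome `y`, while `x` itself has no effect on `y` at all. The naive regression of `y` on `x` still finds a clear association; only after adjusting for `z` does the estimate drop to roughly zero.

```python
import random


def slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var = sum((a - mean_x) ** 2 for a in x)
    return cov / var


def residuals(x, y):
    """What is left of y after removing its linear dependence on x."""
    n = len(x)
    b = slope(x, y)
    a = sum(y) / n - b * sum(x) / n
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


def simulate(n=20000, true_effect=0.0, seed=1):
    """Return (naive, adjusted) estimates of the effect of x on y.

    Hypothetical data-generating process (an assumption of this sketch):
    z is a confounder that influences both x and y.
    """
    rng = random.Random(seed)
    z = [rng.gauss(0, 1) for _ in range(n)]
    x = [zi + rng.gauss(0, 1) for zi in z]
    y = [true_effect * xi + 2 * zi + rng.gauss(0, 1) for xi, zi in zip(x, z)]

    # Naive estimate: regress y on x and ignore z.
    naive = slope(x, y)
    # Adjusted estimate: remove z from both x and y, then regress the residuals.
    adjusted = slope(residuals(z, x), residuals(z, y))
    return naive, adjusted


if __name__ == "__main__":
    naive, adjusted = simulate()
    print(f"naive slope:    {naive:.2f}")
    print(f"adjusted slope: {adjusted:.2f}")
```

The adjustment only works here because I know the confounder, have measured it, and have specified the right model. With search or social media data none of these can be taken for granted, which is exactly why such inferences are hard to do well.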
Well, this is a pop-sci book that aims to tell laypeople about data science and Big Data, so no philosophical or statistical consideration of inferential methods is to be expected. Nevertheless, from a book that claims to present a “revolutionary” approach to data collection, data analysis and inference, I would expect somewhat more precision and realism.
In the concluding chapter a reference is made to Karl Popper and his criticism of social science.
“To the extent this was ever true, the Big Data revolution has changed that. If Karl Popper were alive today and attended a presentation by Raj Chetty, Jesse Shapiro, Esther Duflo, or (humor me) myself, I strongly suspect he would not have the same reaction he had back then.”
I beg to differ. I believe the line of research presented in this book is the very kind of research Popper criticized. Unreflective thinking, vague theories (if any at all) and simply analyzing data are not science. If the reproducibility crisis has taught us one thing, it is that we need to take care to ask good questions, design good studies (true experiments and observational studies alike) and carry out good analyses.
Maybe I am super skeptical because the book is about what I do and what I care about: empirical research in times of big data and methods for data analytics. But I feel that this book does not do an adequate job of representing what data science can do and how it works, even allowing for the need to boil things down for laypeople.
And one last quote from the first half of the book:
Data science makes many parts of Freud falsifiable — it puts many of his famous theories to the test.
I am simply not sure if the author understands “falsifiability”. Or inferential statistics. Or empirical methodology. Or social science in general.
Just finished „Everybody Lies“. That book made me angry: no, Big Data will not be the sole future of social science. And I’m not sure if the author understands science, Popper and scientific theory.
— Christopher Harms (@chrisharms) December 25, 2017
0/10. Would not read again.