Data visualization: a reading list

Here is a collection of books and peer-reviewed articles on data visualization. There is a lot of good material on the philosophy, principles and practices of data visualization.

I plan to update the list with additional material in the future (see the current version as a draft). Do reach out if you have any recommendations.

Introduction

Graphs in Statistical Analysis (Anscombe 1973)
An Economist’s Guide to Visualizing Data (Schwabish 2014)
Data Visualization in Sociology (Healy and Moody 2014)
Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm (Weissgerber et al. 2015)
Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods (Cleveland and McGill 1984)
Graphic Display of Data (Wilkinson 2012)
Visualizing Data in Political Science (Traunmüller 2020)
Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks (Schwabish 2021)

History

Historical Development of the Graphical Representation of Statistical Data (Funkhouser 1937)
Quantitative Graphics in Statistics: A Brief History (Beniger and Robyn 1978)

Tips and recommendations

Ten Simple Rules for Better Figures (Rougier et al. 2014)
Designing Graphs for Decision-Makers (Zacks and Franconeri 2020)
Designing Effective Graphs (Frees and Miller 1998)
Fundamental Statistical Concepts in Presenting Data: Principles for Constructing Better Graphics (Donahue 2011)
Designing Better Graphs by Including Distributional Information and Integrating Words, Numbers, and Images (Lane and Sándor 2009)

Analysis and decision making

Statistical inference for exploratory data analysis and model diagnostics (Buja et al. 2009)
Statistics and Decisions: The Importance of Communication and the Power of Graphical Presentation (Mahon 1977)
The Eight Steps of Data Analysis: A Graphical Framework to Promote Sound Statistical Analysis (Fife 2020)

Uncertainty

Researchers Misunderstand Confidence Intervals and Standard Error Bars (Belia et al. 2005)
Error bars in experimental biology (Cumming et al. 2007)
Confidence Intervals and the Within-the-Bar Bias (Pentoney and Berger 2016)
Depicting Error (Wainer 1996)
When (ish) is My Bus?: User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems (Kay et al. 2016)
Decisions With Uncertainty: The Glass Half Full (Joslyn and LeClerc 2013)
Uncertainty Visualization (Padilla et al. 2020)
A Probabilistic Grammar of Graphics (Pu and Kay 2020)

Tables

Let’s Practice What We Preach: Turning Tables into Graphs (Gelman et al. 2002)
Why Tables Are Really Much Better Than Graphs (Gelman 2011)
Graphs or Tables (Ehrenberg 1978)
Using Graphs Instead of Tables in Political Science (Kastellec and Leoni 2007)
Ten Guidelines for Better Tables (Schwabish 2020)

Deciding on a chart

Graph and chart aesthetics for experts and laymen in design: The role of familiarity and perceived ease of use (Quispel et al. 2016)

Chart types

Boxplots

40 years of boxplots (Wickham and Stryjewski 2011)

Pie charts

No Humble Pie: The Origins and Usage of a Statistical Chart (Spence 2005)

Infographics

Infovis and Statistical Graphics: Different Goals, Different Looks (Gelman and Unwin 2013)
InfoVis Is So Much More: A Comment on Gelman and Unwin and an Invitation to Consider the Opportunities (Kosara 2013)
InfoVis and Statistical Graphics: Comment (Murrell 2013)
Graphical Criticism: Some Historical Notes (Wickham 2013)
Tradeoffs in Information Graphics (Gelman and Unwin 2013)

Maps

Visualizing uncertainty in areal data with bivariate choropleth maps, map pixelation and glyph rotation (Lucchesi and Wikle 2017)

Scatterplot

The Many Faces of a Scatterplot (Cleveland and McGill 1984)
The early origins and development of the scatterplot (Friendly and Denis 2005)

Dot plots

Dot Plots: A Useful Alternative to Bar Charts (Robbins 2006)

3D charts

The Pseudo Third Dimension (Haemer 1951)

Teaching pedagogy

Correlational Analysis and Interpretation: Graphs Prevent Gaffes (Peden 2001)
Numbers, Pictures, and Politics: Teaching Research Methods Through Data Visualizations (Rom 2015)
Data Analysis and Data Visualization as Active Learning in Political Science (Henshaw and Meinke 2018)

Software

Excel

Effective Data Visualization: The Right Chart for the Right Data (Evergreen 2016)

R

Data Visualization (Healy 2018)
Data Visualization with R (Kabacoff 2018)
ggplot2: Elegant Graphics for Data Analysis (Wickham 2009)
Fundamentals of Data Visualization (Wilke 2019)
R Graphics Cookbook (Chang 2020)

Stata

A Visual Guide to Stata Graphics (Mitchell 2012)


Changelog
– 2021-03-01: Add ‘Better Data Visualizations’
– 2020-08-03: Add ‘Ten Guidelines for Better Tables’
– 2020-07-14: Add ‘Designing Graphs for Decision-Makers’ and ‘A Probabilistic Grammar of Graphics’ (ht: Simon Straubinger)

10 method books you should read before you die

In this post you will find my 10 recommendations for method books you should read (or at least buy to impress your so-called friends). I have tried my best to put some order into the list so you can begin from the beginning. However, you should be able to read the books in any order you prefer.

Before we begin, I should note a few things. First, the list is ‘biased’ towards quantitative approaches. This is not to say that such books are more important or better (they are); the list is simply a reflection of my personal biases and professional interests. Second, while I can recommend books such as Data Analysis Using Regression and Multilevel/Hierarchical Models, Mostly Harmless Econometrics and Quantitative Social Science etc., I decided to go with 10 recommendations instead of 15 or 20.

1. The Seven Deadly Sins of Psychology: A Manifesto for Reforming the Culture of Scientific Practice
Science is broken. We all know that, but Chris Chambers knows it better than anyone else. He has been part of the open science movement for a long time and provides a tour de force through how “bad” science (i.e. most science) is conducted. From confirmation bias to p-hacking and everything else you need to be aware of when you read the endnotes in PNAS (i.e. the method section).

I suggest that this is the first book you should read. The book reminds you that science is done by humans and that no specific method and no amount of statistics can remove the human element in doing scientific research. The book is about the procedures we don’t think about but should. Most importantly, I find the book optimistic insofar as it is pragmatic in terms of what we can do in order to conduct better science.

Related to this, I can also recommend this article: Five ways to fix statistics

2. Bit by Bit: Social Research in the Digital Age
This is a great book by Matthew J. Salganik. The book is introductory in its material and provides a lot of interesting and relevant examples. For that reason, I have used this book in my teaching.

The book provides a good introduction to the basics of social science research with a focus on contemporary data sources, e.g. social media data, and the different methods we can use. In addition, I also find the ‘Ethics’ chapter much more relevant than what you often find in similar books.

Interestingly, and another reason why I can definitely recommend this book, the book is available for free online. If you do like the book, consider buying a copy.

3. Understanding Psychology as a Science: An Introduction to Scientific and Statistical Inference
Multiple books deal with philosophy of science and research methods, but no book is better than Understanding Psychology as a Science to give a solid introduction to the philosophy of (social) science.

What I find great about this book is that it fills a gap between philosophy of science and research methods that most books covering these topics leave open. Specifically, the book connects the work of Karl Popper and Imre Lakatos on scientific inference to the foundations of statistics (in particular hypothesis testing and significance testing).

4. Designing Social Inquiry: Scientific Inference in Qualitative Research
There is no way around this political science classic. Whether you like it or not, you cannot engage with the literature on research design in political science without having read KKV (an abbreviation of the names of the three authors, King, Keohane and Verba).

The book is now over 25 years old (published in 1994) but still worth reading.

I have read the book from A to Z a few times (it is an easy read).

5-7. Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction; Experimental and Quasi-Experimental Designs for Generalized Causal Inference; and Counterfactuals and Causal Inference: Methods and Principles for Social Research

There are different causal models. Each of these models has its advantages and disadvantages. The three most important causal models to know about are Rubin’s causal model, Campbell’s causal model and Pearl’s causal model (see Shadish and Sullivan 2012 for a comparison).

In my view, the most important causal model to be familiar with is the potential outcomes framework. In Causal Inference for Statistics, Social, and Biomedical Sciences: An Introduction, Guido W. Imbens and Donald B. Rubin provide an introduction to Rubin’s causal model and several topics related to experimental and observational research.

Next, the classic book on the validity model of causality (Campbell’s causal model) is Experimental and Quasi-Experimental Designs for Generalized Causal Inference. This book is written with psychology research in mind but is relevant for most of the social sciences. What I like about this book is that it devotes a lot of attention to the threats to validity that researchers will often encounter but might not even consider.

For an introduction to Pearl’s causal model, I recommend Counterfactuals and Causal Inference: Methods and Principles for Social Research. This book provides a very good introduction to the directed acyclic graph (DAG) framework for causality.

Some might ask why I don’t recommend any of the work by Judea Pearl himself. In short, while I do like his work, I am not a great fan of his writing. His book Causality: Models, Reasoning and Inference is not a good introduction (especially not for most social scientists) and The Book of Why: The New Science of Cause and Effect does not do a good job of positioning the framework within the broader literature (in other words, I agree with Peter M. Aronow and Fredrik Sävje that the book is selective and narrow in its introduction to the history of causality).

I recommend reading the three books and comparing the different approaches to causality. Not for the purpose of finding your ‘causality tribe’, but – on the contrary – to understand the strengths and limitations of different approaches.

8. Field Experiments – Design, Analysis, and Interpretation
Field Experiments – Design, Analysis, and Interpretation is a solid book on how to design, analyse and interpret experiments. In other words, the subtitle of the book is very much correct. If you have very limited experience with experiments, this book is a must read.

The book is great at introducing the logic of the experimental method and connecting this to statistical topics such as different estimators, how to calculate standard errors etc.

Also, while Don Green, one of the co-authors, was involved in some problematic “empirical” research (to say the least), this book is definitely still worth your time.

9. Design of Observational Studies

Design of Observational Studies by Paul Rosenbaum is one of the best books for understanding the design of observational studies (not to be confused with Observational Studies by the same author).

The book deals with statistical approaches to observational studies (including matching) and is not too difficult to get into (even for social science students). I have also included it on this list as it covers various elements of observational studies that I didn’t find in any other books.

10. Experimental Political Science and the Study of Causality: From Nature to the Lab
If you are into experiments, this book is the primer on all aspects of experiments. What is great about this book is that it covers a lot of topics and shows how different experimental traditions within economics and psychology look at these topics. For example, what is the role of deception in experiments and what can we learn from experiments when deception is involved?

This is, in other words, the go-to reference for people who want to conduct experimental political science. And even if you are not a political scientist, I can highly recommend this book.

These are my ten recommendations. Have fun! Last, my apologies for the clickbait title. These books will not sell themselves. Also, if you made it this far I am sure you wouldn’t need an apology in any case.