Potpourri: Statistics #78

Investigation of Data Irregularities in Doing Business 2018 and Doing Business 2020
Dyadic Clustering in International Relations
Forecasting: Principles and Practice
Data Disasters
A Quick How-to on Labelling Bar Graphs in ggplot2
Data visualisation using R, for researchers who don’t use R
Easy access to high-resolution daily climate data for Europe
Put R Models in Production
Machine learning, explained
Three ways to visualize binary survey data
In defense of simple charts
Modern Statistics with R
How to avoid machine learning pitfalls: a guide for academic researchers
Tune xgboost models with early stopping to predict shelter animal status
Machine-learning on dirty data in Python: a tutorial
I saw your RCT and I have some worries! FAQs
Up and running with officedown
Use racing methods to tune xgboost models and predict home runs
The 5-minute learn: Create pretty and geographically accurate transport maps in R
R’s Internal Data Formats: .Rda, .RData, .rds
Improve Your Code – Best Practices for Durable Code
An educator’s perspective of the tidyverse
Estimating regression coefficients using a Neural Network (from scratch)
Let users choose which plot you want to show
A look into ANOVA. The long way.
3 alternatives to a discrete color scale legend in ggplot2
Downloading the Census Household Pulse Survey in R
The Stata Guide
The Four Pipes of magrittr
Introducing {facetious} – alternate facets for ggplot2
Alternatives to Simple Color Legends in ggplot2
Top 3 Coding Best Practices from the Shiny Contest
Visualizing ordinal variables
Making Shiny apps mobile friendly
Climate circles
Elegant and informative maps with tmap
Exploring R² and regression variance with Euler/Venn diagrams
Exploring Pamela Jakiela’s simple TWFE diagnostics with R
The marginaleffects package for R
A lightweight data validation ecosystem with R, GitHub, and Slack
Create spatial square/hexagon grids and count points inside in R with sf
A daily updated JSON dataset of all the Open House London venues, events, and metadata
Animating Network Evolutions with gganimate
Beyond Bar and Box Plots
Causal Inference in R Workshop
Odds != Probability
How to visualize polls and results of the German election with Datawrapper
Irreproducibility in Machine Learning
tidybundestag
A collection of themes for RStudio
Shiny, Tableau, and PowerBI: Better Business Intelligence
Automate PowerPoint Production Using R
Estimating graph dimension with cross-validated eigenvalues
Understanding text size and resolution in ggplot2
Introduction to linear mixed models


Previous posts: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77

New book: Reporting Public Opinion

I am happy to announce the publication of a new book, ‘Reporting Public Opinion: How the Media Turns Boring Polls into Biased News‘, co-authored with Zoltán Fazekas. The book is about how and why news coverage of opinion polls is disproportionately about change. Specifically, journalists are more likely to pick opinion polls that show changes, even when such changes are within the margin of error, and to highlight such changes in the reporting – and the public, pundits and politicians are more likely to respond to and share such polls.

Here is the puzzle we address throughout the various chapters: how can most opinion polls show a lot of stability over short periods of time while the reporting of opinion polls is dominated by change?

Even for the most hardcore followers of politics, opinion polls are quite boring in and of themselves. In most cases they show nothing new. When we take the margin of error into account, a new opinion poll will most likely show no statistically significant shift for any of the political parties of interest. And when there is a large change, it is most likely a statistical fluke we should be cautious about. I have over the years written countless posts about such opinion polls being covered in the Danish media.

The book is our attempt to provide a unified framework for understanding these dynamics in a systematic manner. In the first chapter, we introduce the theoretical puzzle and outline the main limitation of existing studies on the topic, namely that they tend to focus on one specific stage of the coverage, such as whether methodological details are present or not. To fully understand how opinion polls are covered and consumed in contemporary democracies, we argue that we need to combine different literatures on opinion polls and examine how a strong preference for change can explain biases in how polls travel through several stages, from their initial collection to how they reach the public.

In the second chapter, we further develop a framework that focuses on the temporal dimension of how opinion polls are brought to the public via the media. This chapter serves as an introduction to the four stages that opinion polls have to go through in our framework. Specifically, we show how each stage – or activity – will lead to polls showing greater changes getting more attention. This is illustrated below:

Next, throughout Chapters 3, 4, and 5, we cover the stages of opinion polls in greater detail and show collectively how opinion polls are turned into specific news stories. In Chapter 3, we focus on the selection of opinion polls. That is, we investigate what can explain whether journalists decide to cover an opinion poll or not. In Chapter 4, we turn to the content of the reporting of opinion polls, i.e. the news articles dedicated to the opinion polls that journalists have decided to report on. In doing this, we show how the selection and reporting of opinion polls are shaped by a similar preference for change. Notably, when introducing the idea of change, we devote extensive attention to how best to measure change and what the availability of these change measures means for selection and reporting.

In Chapter 5, we analyse the next natural stage in the life of opinion polls: how politicians, experts and the public respond to them and to the stories written about them. Essentially, we delve into the implications of how these opinion polls are selected and covered. Here, we show that both elites and the broader public have a strong preference for engaging with (responding to or sharing) opinion polls that show greater changes or support a well-defined change narrative. Interestingly, we find that opinion polls showing greater changes are much more likely to go viral on Twitter.

In Chapter 6, we turn our attention to the alternatives to the reporting of opinion polls. Here, we discuss how no opinion polls at all, poll aggregators, social media, and vox pops can be seen as alternatives to opinion polls, and in particular what their strengths and limitations are. The ambition here is not to force the reader to decide whether opinion polls are good or bad, but rather to understand how alternatives to opinion polls can mitigate or amplify the biases introduced in the previous chapters.

Last, in Chapter 7, we conclude by considering how the media might report on opinion polls, weighing the trade-offs between what the polls often show and what journalists wish they showed. Specifically, we first discuss the implications of the findings for how we understand the political coverage of opinion polls today and then discuss the most important questions to be answered in future work.

The book is the product of years of work on how opinion polls are reported in the media. While the topic should be of interest to most people who care about politics and opinion polls, this is an academic book, and I should emphasise that it might be a tough read for a non-academic audience.

You can buy the book at Waterstones, Bookshop, Springer, Blackwell’s and Palgrave.

Causality models: Campbell, Rubin and Pearl

In political science, the predominant way to discuss causality is in relation to experiments and counterfactuals (within the potential outcomes framework). However, we also use concepts such as internal and external validity and sometimes we use arrows to show how different concepts are connected. When I was introduced to causality, it was on a PowerPoint slide with the symbol X, a rightwards arrow, and the symbol Y, together with a few bullet points on the specific criteria that should be met before we can say that a relationship is causal (inspired by John Gerring’s criterial approach; see, e.g., Gerring 2005).

Importantly, there are multiple models we can consider when we want to discuss causality. In brief, there are three popular causality models today: 1) the Campbell model (focusing on threats to validity), 2) the Rubin model (focusing on potential outcomes), and 3) the Pearl model (focusing on directed acyclic graphs). The names of the models come from the researchers who have been instrumental in developing them (Donald Campbell, Donald Rubin and Judea Pearl). I believe a good understanding of these three models is a prerequisite for discussing causal inference within quantitative social science.

Luckily, we have good introductions to the three frameworks that compare their main similarities and differences. The special issue introduced by Maxwell (2010) focuses on two of the frameworks, namely those related to Campbell and Rubin. What is great about the special issue is that it focuses on important differences between the two frameworks but also on how they are complementary. That being said, it does not pay a lot of attention to Pearl’s framework. Shadish (2010) and West and Thoemmes (2010) provide comparisons of the work by Campbell and Rubin on causal inference. Rubin (2010) and Imbens (2010) provide some additional reflections on the causal models from their own perspectives.

The best primer for understanding the three frameworks is the book chapter by Shadish and Sullivan (2012). They make it clear that all three models acknowledge the importance of manipulable causes and bring an experimental terminology into observational research. In addition, they highlight the importance of assumptions (as causal inference without assumptions is impossible). Unfortunately, they do not summarise the key similarities and differences between the models in a table. For that reason, I decided to create the table below to provide a brief overview of the three models. Keep in mind that the table provides a simplified comparison and that there are important nuances you will only fully understand by consulting the relevant literature.

Dimension | Campbell | Rubin | Pearl
Core | Validity typology and the associated threats to validity | Precise conceptualization of causal inference | Directed acyclic graphs (DAGs)
Goal | Create a generalized causal theory | Define an effect clearly and precisely | State the conditions under which a given DAG can support a causal inference
Fields of development | Psychology | Statistics, program evaluation | Artificial intelligence, machine learning
Examples of main concepts | Internal validity, external validity, statistical conclusion validity, construct validity | Potential outcomes, causal effect, stable unit treatment value assumption | Node, edge, collider, d-separation, back-door criterion, do(x) operator
Definition of effect | Difference between counterfactuals | Difference between potential outcomes | The space of probability distributions on Y using the do(x) operator
Causal generalisation | Meta-analysis, construct and external validity | Response surface analysis, mediational modeling | Specified within the DAG
Assumption for valid inference in observational research | All threats to validity ruled out | Strong ignorability | Correct DAG
Examples of application | Quasi-experiments | Missing data imputation, propensity scores | Mediational paths
Conceptual and philosophical scope | Wide-ranging | Narrow, formal statistical model | Narrow, formal statistical model
Emphasis | Descriptive causation | Descriptive causation | Explanatory causation
Preference for randomized experiments | Yes | Yes | No
Focus on effect or mechanism | Effect | Effect | Mechanism
Limitation | General lack of quantification, no formal statistical model (lacks analytic sophistication) | Limited focus on features of research designs with observational data | Vulnerability to misspecification

The Campbell model focuses on validity, i.e., the quality of the conclusions you can draw based on your research. The four types of validity to consider here are statistical conclusion validity, internal validity, construct validity, and external validity. Most important for the causal model is internal validity, that is, the extent to which the research design identifies a causal relationship. External validity refers to the extent to which we can generalise the causal relationship to other populations and contexts. I believe one of the key advantages here is the comprehensive list of potential threats to validity developed in this work. Some of these potential threats are more relevant for specific designs or results, and being familiar with them will make you a much more critical (and thereby better) researcher. The best comprehensive introduction to the Campbell model is Shadish et al. (2002).

The Rubin model focuses on potential outcomes and how units have potential outcomes under different conditions (most often with and without a binary treatment). For example, Y(1) is the array of potential outcomes under treatment and Y(0) the array of potential outcomes under control. This is especially useful when considering an experiment and how randomisation realises one potential outcome for each unit, which can, in combination with other units, be used to calculate the average treatment effect (as we cannot estimate individual-level causal effects). To solve the fundamental problem of causal inference (that we can observe each unit in only one condition), we would need a time machine, and in the absence of such science-fiction tools we are left with the importance of the assignment mechanism for causal inference (to estimate effects such as the ATE, LATE, PATE, ATT, ATC, and ITT). One of the key advantages of this model is that it clarifies how potential outcomes are turned into one realised outcome and which assumptions we rely on. For example, the Stable Unit Treatment Value Assumption (SUTVA) implies that the potential outcomes for one unit are unaffected by the treatment assigned to other units, which emphasises the importance of minimising interference between units. The best comprehensive introduction to the Rubin model is Imbens and Rubin (2015).
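To make the logic concrete, here is a minimal R sketch (with made-up numbers, not taken from any of the references above) that simulates potential outcomes, computes the true average treatment effect, and shows that a simple difference in observed means recovers it under random assignment:

```r
set.seed(42)

n  <- 1000
y0 <- rnorm(n, mean = 50, sd = 10)     # potential outcome without treatment, Y(0)
y1 <- y0 + rnorm(n, mean = 5, sd = 2)  # potential outcome with treatment, Y(1)

# The true (but unobservable) average treatment effect
mean(y1 - y0)

# Random assignment realises only one potential outcome per unit
d     <- rbinom(n, 1, 0.5)             # treatment indicator
y_obs <- ifelse(d == 1, y1, y0)        # the fundamental problem: we never see both

# The difference in observed means estimates the ATE under randomisation
mean(y_obs[d == 1]) - mean(y_obs[d == 0])
```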

The Pearl model approaches causal identification through directed acyclic graphs (DAGs), i.e., how conditioning on a variable along a path blocks that path, and which paths need to be blocked in order to make causal inferences. When working with this model of causality, you often deal with multiple paths rather than a simple setup with only two groups, one outcome and a single treatment. DAGs can also be understood as non-parametric structural equation models, and they are particularly useful when working with conditional probabilities and Bayesian networks/graphical models.

One of the main advantages of the Pearl model is that it forces you to think much more carefully about your causal model, including what not to control for. For that reason, the model is much better geared to causal inference in complicated settings than, say, the Rubin model.
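As a small illustration of this point, the sketch below uses a hypothetical DAG (not one taken from the works discussed here) and the dagitty package in R to ask which variables to adjust for; note that the collider is correctly left out of the adjustment set:

```r
# install.packages("dagitty")  # if not already installed
library(dagitty)

# Hypothetical causal model: Z confounds the X -> Y relationship, and C is a
# collider (a common effect of X and Y). Controlling for C would open a
# non-causal path between X and Y.
g <- dagitty("dag {
  Z -> X
  Z -> Y
  X -> Y
  X -> C
  Y -> C
}")

# Which variables must we adjust for to identify the effect of X on Y?
adjustmentSets(g, exposure = "X", outcome = "Y")
#> { Z }   (the collider C is, correctly, not part of the adjustment set)
```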

However, there are also some noteworthy limitations. Interactions and effect heterogeneity are only implicit in the model, and it can be difficult to convey such ideas (whereas it is easier to consider conditional average treatment effects in the Rubin model). While DAGs are helpful for understanding complex causal models, they are often less helpful when we have to consider the parametric assumptions needed to estimate causal effects in practice.

The best introduction to the Pearl model is, surprisingly, not the work by Pearl himself (although I did enjoy The Book of Why). As a political scientist (or a social scientist more generally), I find introductions such as Morgan and Winship (2014), Elwert (2013), Elwert and Winship (2014), Dablander (2020), and Rohrer (2018) much more accessible.

(For Danish readers, you can also check out my lecture slides from 2016 on the Rubin model, the Campbell model and the Pearl model. I also made a different version of the table presented above in Danish that you can find here.)

In political science, researchers have mostly relied on the work by Rubin and Campbell, and less so on the work by Pearl. However, recently we have seen some good work that relies on the insights provided by DAGs. Great examples include the work on racially biased policing in the U.S. (see Knox et al. 2020) and the work on estimating controlled direct effects (Acharya et al. 2016).

Imbens (2020) provides a good and critical discussion of DAGs in relation to the Rubin model (favouring potential outcomes over DAGs as the preferred approach to causality within the social sciences). Matthay and Glymour (2020) show how the threats to internal, external, construct and statistical conclusion validity can be presented as DAGs. Lundberg et al. (2021) show how both potential outcomes and DAGs can be used to outline the identification assumptions linking a theoretical estimand to an empirical estimand. This is amazing work, and everybody with an interest in strong causal inference connecting statistical evidence to theory should read it.

My opinionated take is that the three models work well together, but not necessarily at the same time, when thinking about theories, research designs and data. Specifically, I prefer Pearl → Rubin → Campbell. First, use Pearl to outline the causal model (with a particular focus on what not to include). Second, use Rubin to focus on the causal estimand of interest and to consider different estimators and assumptions (SITA/SUTVA). Third, use Campbell to discuss threats to validity, measurement error, etc.

In sum, all three models are worth being familiar with if you do quantitative (and even qualitative) social science.

How (not) to study suicide terrorism

Today marks the 20-year anniversary of 9/11. That made me look into one of the most salient methodological discussions within political science on how to study suicide terrorism.

Suicide terrorism is a difficult topic to study. Why? Because we cannot learn about the causes (or correlates) of suicide terrorism by studying only cases of terrorism. Pape (2003) studies 188 suicide attacks in the period 1980–2001. He concludes that there is a strategic logic to these attacks, namely that they pay off for the organisations and groups carrying them out.

Ashworth et al. (2008) use simple statistics such as conditional probabilities to show that there are problems with the paper in question, namely that the original paper “samples on the dependent variable.” I especially liked this formulation in the conclusion: “It is important to note that our critique of Pape’s (2003) analysis does not make the well-known point that association does not imply causation. Rather, because Pape collects only instances of suicide terrorism, his data do not even let him calculate the needed associations.”
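To see the point about conditional probabilities in stylised form, consider the following R sketch with entirely made-up data (not Pape’s): with the full data we can compare the probability of an attack with and without some factor X, but once we keep only the attacks, that comparison is no longer possible.

```r
set.seed(1)

# Made-up population: X is a hypothetical explanatory factor, Y indicates
# whether a suicide attack occurs. Here X has no effect on Y at all.
n <- 100000
x <- rbinom(n, 1, 0.5)
y <- rbinom(n, 1, 0.01)

# With the full data we can compare P(Y = 1 | X = 1) and P(Y = 1 | X = 0)
mean(y[x == 1]) - mean(y[x == 0])   # approximately zero, as it should be

# 'Sampling on the dependent variable': keep only the attacks (Y = 1)
attacks <- data.frame(x = x, y = y)[y == 1, ]

# From the attacks alone we only see P(X = 1 | Y = 1), roughly 0.5 here,
# which tells us nothing about whether X raises the probability of an attack
mean(attacks$x)
```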

Pape (2008) provides a reply to the critique raised by Ashworth and colleagues. He first brings a long excerpt from his book not taking the critique of Ashworth et al. into account. Then, he writes: “One might still wonder whether the article is flawed by sample bias because it considered systematically only actual instances of suicide terrorism. The answer is no, for two reasons. First, the article did not sample suicide terrorism, but collected the universe of suicide terrorist attacks worldwide from 1980 through 2001. […] There is no such thing as sample bias in collecting a universe. Second, although it is true that the universe systematically studied did not include suicide terrorist campaigns that did not happen, and that this limits the claims that my article could make, this does not mean that my analysis could not support any claims or that it could not support the claims I actually made.”

Importantly, even if you have the universe of suicide terrorist attacks, you should still treat it as a sample (especially if you want to make policy recommendations about future cases we have not yet seen). In other words, this is a weird way of defending a flawed analysis. In an unpublished rejoinder, Ashworth (2008) provides some additional arguments for why the response to the criticism is flawed. Also, Horowitz (2010) shows that when you increase the universe of cases, Pape’s findings do not hold.

The debate is more than ten years old but reminiscent of similar contemporary debates on data and causality. Accordingly, I find it to be a good read for people interested in research design, data and inference – and a good case for discussing what can (and cannot) be learned from ‘selecting on the dependent variable’. Last, and most importantly, if you want to understand this amazing tweet, it is good to be familiar with the debate.

Assorted links #6

151. The Art of Command Line
152. The Ultimate Guide to Inflation
153. How will climate change shape climate opinion?
154. We Should All Be More Afraid of Driving
155. A decade and a half of instability: The history of Google messaging apps
156. Twenty Years Gone: What Bobby McIlvaine Left Behind
157. Burning out and quitting
158. 100 Very Short Rules for a Better Life
159. How Flash games shaped the video game industry
160. How I practice at what I do
161. 10 Positions Chess Engines Just Don’t Understand
162. Hundreds of Ways to Get S#!+ Done—and We Still Don’t
163. Tank Man
164. This page is a truly naked, brutalist html quine.
165. The 25 Most Significant Works of Postwar Architecture
166. Eunoia: Words That Don’t Translate
Some documentaries I like that are available on YouTube:
167. Ways of Seeing: Episode 1, Episode 2, Episode 3, Episode 4
168. Kubrick Remembered
169. The King of Kong
170. Lektionen in Finsternis
171. Koyaanisqatsi
172. Baraka
173. Do Not Split
174. 66 scener fra Amerika
175. The Power of Nightmares: The Rise of the Politics of Fear: Baby It’s Cold Outside, The Phantom Victory, The Shadows in the Cave
176. Zizek!
177. Powers of Ten
178. Modern Marvels: The Berlin Wall
179. The Hotline Miami Story
180. Stop Making Sense

Has Socialdemokratiet declined in the polls? #4

At TV 2 you can read that a striking poll from Megafon suggests a new balance of power in Danish politics. Specifically, it is highlighted that Socialdemokratiet and Venstre are both set for a noticeable decline.

Based on TV 2’s coverage of the Megafon poll, it is easy to believe that this particular poll shows big changes. As I will show in this post, that is not at all the case. The article first states: “In a new poll, conducted by Megafon for TV 2 and Politiken, the governing party drops 2.7 percentage points compared with the previous poll.” This figure does not match what they report in the accompanying chart, where it is clear that the drop is only 2.1 percentage points.

It is relevant to compare the new Megafon poll (from August) with the previous Megafon poll (from May). When we compare a poll with an election result, there is statistical uncertainty only for the poll, not for the election result. When we compare a poll with an earlier poll, it is important to take the statistical uncertainty of both into account.

We can calculate the statistical uncertainty of the difference between two proportions using the following formula, which gives us the standard error of the difference in support for a party between two polls:

$$ \sqrt{p_{a}\frac{(1-p_{a})}{n_{a}} + p_{b}\frac{(1-p_{b})}{n_{b}}} $$

Where $ p_{a} $ and $ n_{a} $ are, respectively, the support for a party (e.g. 0.3 if the party polls at 30% of the vote) and the sample size, both in poll $ a $, and $ p_{b} $ and $ n_{b} $ are, respectively, the support for the party and the sample size in poll $ b $.

We can then use this uncertainty to see whether there is a difference between the two proportions. Specifically, we can multiply the standard error by 1.96, which gives us a 95% confidence interval when we add it to and subtract it from the difference between $ p_{a} $ and $ p_{b} $. If you want to compare the numbers from two polls quickly but do not feel like doing these calculations yourself, you can always use this service.
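Here is a minimal R sketch of the calculation (the support shares and sample sizes are hypothetical, not the actual Megafon numbers):

```r
# Support for a party in two polls (proportions) and the sample sizes
# (made-up numbers for illustration)
p_a <- 0.30; n_a <- 1000   # poll a
p_b <- 0.27; n_b <- 1000   # poll b

# Standard error of the difference between the two proportions
se_diff <- sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)

# 95% confidence interval for the change from poll a to poll b
diff <- p_b - p_a
c(lower = diff - 1.96 * se_diff, upper = diff + 1.96 * se_diff)
#> roughly -0.07 to 0.01, i.e. the change is within the margin of error
```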

In the figure below, we take a closer look at how large the differences in party support are between the two Megafon polls. Is this a striking poll that suggests a new balance of power in Danish politics? No.

In the figure we can see that Socialdemokratiet is the party that declines the most, but it is also the largest party, which is why the statistical uncertainty is greater. It is clear, however, that we are not looking at big changes. Note also that TV 2 writes that the decline particularly benefits SF and Enhedslisten, which “both make notable gains”. I have no idea what it means for the parties to make notable gains.

What I find notable here is the opposite. Since the previous Megafon poll, which was conducted before the summer holidays, no party has moved notably in the polls. This is in line with my point in the previous posts about support for Socialdemokratiet in recent polls. It is also confirmed when we look at the polls from Epinion. If we compare the latest Epinion poll with their poll from before the summer holidays, we see a difference in Socialdemokratiet’s support of 0.0%.

My guess is that if all the other polls had shown exactly the same as Megafon’s latest poll, there would have been no story. That is also why Berlingske’s weighted average is mentioned a couple of times in the article, where it is emphasised that the poll contrasts with what the Berlingske Barometer shows. It is interesting that TV 2 repeatedly refers to the Berlingske Barometer, since Megafon has done what it could to avoid being included in this weighted average. Megafon (or TV 2 or Politiken?) does not want to contribute to our having the best available knowledge of what the polls show (by being included in the Berlingske Barometer), but is perfectly fine with the Barometer being used to summarise other polls. That is laughable.

As with earlier posts I have written about Socialdemokratiet’s support in the polls, I would like to see more polls systematically showing a decline across polling firms before concluding that the party is closer to 25% than 30% of the vote. Several media outlets, however, seem content to draw conclusions from single polls.