Calculating seats in the Folketing with R

In many opinion polls, party support is reported not only as the share of votes in percent but also as seat counts. As is well known, the D'Hondt method is used to allocate the constituency seats at Danish general elections, which, together with the compensatory seats, ensures a proportional relationship between votes and seats.

If you want to estimate how many seats the respective parties stand to win, I warmly recommend the seatdist package for R. It is developed by Juraj Medzihorsky and can be found here. Once you have installed the package, you can easily load it in R and use the giveseats() function to calculate seat counts:

giveseats(c(33, 6, 10, 7, 8, 1, 3, 1, 5, 17, 8, 1), 
          ns = 175, 
          thresh = 0.02,
          method = "dh")

The first thing we pass to the function is a vector with the parties' support in percent (I have left out decimals here just to make it easier to read). 33 is, for example, the support for Socialdemokratiet in percent. ns specifies how many seats to allocate (number of seats, 175 in this case), thresh specifies the electoral threshold (2% in this case), and method is the allocation method ("dh" for D'Hondt).

Here we can see that Socialdemokratiet stands to win around 61 seats at the next general election. This is of course an estimate, since 1) there is uncertainty in the poll and 2) not all seats are allocated this simply at the actual election. We also do not take the four North Atlantic seats into account. Nevertheless, it is relatively easy to get an estimate of how large the parties' support is in seats. The package also offers a wide range of options for examining the parties' seat counts under other allocation methods.
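
For intuition, the D'Hondt method itself is easy to implement from scratch: divide each party's vote count by 1, 2, 3, … and hand the seats to the largest quotients. Below is a minimal base R sketch of the algorithm (my own illustration, not the seatdist implementation). Note that it applies no electoral threshold and breaks ties arbitrarily, so the counts differ slightly from the giveseats() output above (Socialdemokratiet ends up with around 60 seats instead of 61):

# Minimal D'Hondt sketch: divide each party's votes by 1..ns and give
# the ns seats to the largest quotients (no threshold, arbitrary ties)
dhondt <- function(votes, ns) {
  quotients <- outer(votes, seq_len(ns), "/")   # one row per party
  party <- rep(seq_along(votes), times = ns)    # row index, column-major
  winners <- party[order(quotients, decreasing = TRUE)[seq_len(ns)]]
  tabulate(winners, nbins = length(votes))
}

dhondt(c(33, 6, 10, 7, 8, 1, 3, 1, 5, 17, 8, 1), 175)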

Potpourri: Statistics #70 {gt}

gt – a (G)rammar of (T)ables
Functions and Themes for gt tables
Beautiful Tables in R: gt and the grammar of tables
Embedding custom HTML in gt tables
A 3-way crosstab table using {gt}
Replicating a New York Times Table of Swedish COVID-19 deaths with gt
Spending on Education
The Big Mac Index Table
gtsummary: Presentation-Ready Data Summary and Analytic Result Tables


Previous posts: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69

How to improve your figures #2: Don’t show overlapping text labels

I was reading this study on the impact of Weberian bureaucracy on economic growth published in Comparative Political Studies. It’s a great article and I can highly recommend reading it.

I like that the study presents most of the results in figures. In fact, there are more figures than tables in the article. However, a few of the figures present several data points (countries) in scatter plots with labels on all points (country names). Here is one example:

As you can see, several country labels overlap, making it difficult to read the country names. The problem is not as severe as it could have been (as the authors have made the height greater than the width, leaving more space for horizontal text). However, for a lot of the labels it is simply not possible to read the country names.

Importantly, this is not only about aesthetics. When several country labels overlap, it is no longer possible to see whether there are actual data points hidden by the labels.

To improve the figure, my suggestion would be to only show some of the value labels. In the figure below I have tried to only show the country names for the countries that you can actually read in the figure above.

In my view, this is a clear improvement of the original figure.

My R-script to create the figure is here:

library("tidyverse")
library("haven")

bureaucracygrowth <- read_dta("22725104_Replication_data_Bureaucracy_Growth.dta")

bureaucracygrowth %>% 
  # Keep a country label only for the points where the label is actually
  # readable in the original figure; all other points get an empty label
  mutate(country_name_show = case_when(
    v2stcritrecadmv9 < -1  ~ country_name,
    QoG_expert_q2_a > 6.3 | QoG_expert_q2_a < 2 ~ country_name,
    v2stcritrecadmv9 > 0.7 & QoG_expert_q2_a < 3.5 ~ country_name,
    v2stcritrecadmv9 < 1 & QoG_expert_q2_a > 4.4 ~ country_name,
    TRUE ~ ""
  )) %>% 
  ggplot(aes(v2stcritrecadmv9, QoG_expert_q2_a)) +
  geom_smooth(method = "lm", se = FALSE) +
  # geom_text_repel() nudges the remaining labels so they do not overlap
  ggrepel::geom_text_repel(aes(label = country_name_show)) +
  geom_point() +
  theme_minimal() +
  labs(y = "Meritocratic recruitment (QoG expert-survey), 2014",
       x = "Meritocratic recruitment (V-Dem), 2014")

ggsave("bureaucracygrowth.png", width = 6, height = 6)

Potpourri: Statistics #69

Hands-On Data Visualization: Interactive Storytelling from Spreadsheets to Code
Reflecting on “Vote Cones”
Least squares as springs
Applying PCA to fictional character personalities
Bayes Rules! An Introduction to Bayesian Modeling with R
tiktokr: An R Scraper for Tiktok
Efficient and beautiful data synthesis: Taking your tidyverse skills to the next level
A Gentle Introduction to Tidy Model Stacking
11 Short Machine Learning Ethics Videos
Your first R package in 1 hour
What is a dot plot?
JavaScript for R
Literature on Recent Advances in Applied Micro Methods
In Fallout Over Polls, ‘Margin of Error’ Gets New Scrutiny
Programming Choice Experiments in Qualtrics
The list of 2020 visualization lists
The 9 concepts and formulas in probability that every data scientist should know
Collapse repetitive piping with reduce()
Economics charts in R using ggplot2
Top 10 tips to make your R package even more awesome
Running R Scripts on a Schedule with GitHub Actions
Leveraging labelled data in R
Creating and using custom ggplot2 themes
Advanced R Course
The intuition behind averaging
Underrated Tidyverse Functions
Bullet Chart Variants in R
Using the tidyverse with Databases – Part I


Previous posts: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68

Potpourri: Statistics #68

Rain, Rain, Go away: 137 potential exclusion-restriction violations for studies using weather as an instrumental variable
Awesome R Learning Resources
A Quick Guide for Journalists to the Use and Reporting of Opinion Polls
Mapping congressional roll calls
Fancy Times and Scales with COVID data
Colors via clrs.cc in R
grstyle: Customizing Stata graphs made easy
American political data & R
Best-Practice Recommendations for Defining, Identifying, and Handling Outliers
Likelihood Ratios: A Tutorial
Data Science related quotes
cleanplots: Stata graphics scheme
PLSC 31101: Computational Tools for Social Science
Covid-19: The global crisis — in data
Working with Large Spatial Data in R
PCA tidyverse style
The many Flavours of Missing Values
Introducing RStudio and R Markdown
Tools for Analyzing R Code the Tidy Way
Dive into dplyr (tutorial #1)
The Good, the Bad and the Ugly: how (not) to visualize data
Programatically Generating PDF Reports with the Tidyverse
Building an animation step-by-step with gganimate
“package ‘foo’ is not available” – What to do when R tells you it can’t install a package

Tidyverse resources on YouTube

I have been watching a lot of YouTube videos lately with people using the tidyverse. These videos are not tutorials per se but rather demonstrations of how to wrangle and analyse data. They rely heavily on dplyr and ggplot2 as well as packages associated with the tidyverse, e.g. tidytext, and they often work with data from the TidyTuesday project.

I can highly recommend the following three channels: 1) David Robinson, 2) Julia Silge and 3) TidyX. The first two write their own code. All of them are good at going through various functions in R, demonstrating the power of the tidyverse.

The videos by David Robinson are great for beginners. He is good at introducing various packages and functions in a simple manner (or as simple as possible). In addition, you can find a systematic overview of the packages and functions introduced in the different videos here (made by Alex Cookson).

Of course, you can also – with little extra work – find other good videos on YouTube related to tidyverse, e.g. this video with 18 tips and tricks. The key benefit of videos on YouTube (compared to text guides) is that you can actually see how things are carried out and pick up on good/best practices. There is also an overview of other YouTube accounts here (though I am not yet familiar with all of them).

In brief, if you are already familiar with the basics of R but are looking for various videos on how to improve your data wrangling and visualisation skills, I can highly recommend these resources on YouTube.

Potpourri: Statistics #67

Computational Causal Inference at Netflix
Tools for Ex-post Survey Data Harmonization
How to pick more beautiful colors for your data visualizations
Shiny in Production: App and Database Syncing
Introduction to Causal Inference
An Illustration of Decision Trees and Random Forests with an Application to the 2016 Trump Vote
Key things to know about election polling in the United States
State-of-the-art NLP models from R
Introduction to Stan in R
How to write your own R package and publish it on CRAN
Bayesian Analysis for A/B Testing
Estimating House Effects
Heatmaps in ggplot2
The Taboo Against Explicit Causal Inference in Nonexperimental Psychology
Spreadsheet workflows in R
A beginner’s guide to Shiny modules
Dataviz Interview
10 Things to Know About Survey Experiments
Applying Weights
Creating effective interrupted time series graphs: Review and recommendations
Lasso and the Methods of Causality
How We Designed The Look Of Our 2020 Forecast
Taking Control of Plot Scaling
How to measure spatial diversity and segregation?
10+ Guidelines for Better Tables in R
How maps in the media make us more negative about migrants
Comparing two proportions in the same survey
Quantitative Social Science Methods, I (Gov2001 at Harvard University)
Introduction to Computational Thinking
Creating R Packages with devtools
Introduction to Statistical Learning in R
Textrecipes series: Pretrained Word Embedding

Potpourri: Statistics #66

How To Read 2020 Polls Like A Pro
Visualizing Complex Science
What is machine learning, and how does it work?
Choosing Fonts for Your Data Visualization
Why linear mixed-effects models are probably not the solution to your missing data problems
Outstanding User Interfaces with Shiny
How I Taught Tidymodels, Virtually
How your colorblind and colorweak readers see your colors
Graphic Content: How Visualizing Data Is a Life-or-Death Matter
How to Create Brand Colors for Data Visualization Style Guidelines
The R package workflow
Introducing Modeltime: Tidy Time Series Forecasting using Tidymodels
Tidy Geospatial Networks in R
PCA with Age of Empires II data
A practical guide to geospatial interpolation with R
Oh my GOSH: Calculating all possible meta-analysis study combinations
Rating children’s books with empirical Bayes estimation
Normalizing and rescaling children’s book ratings
Tips from an R Journalist
How to improve your R package
A very short introduction to Tidyverse
purrr: Introduction and Application
Supervised Machine Learning for Text Analysis in R
4 Tips to Make Your Shiny Dashboard Faster
How I share knowledge around R Markdown
Teaching Statistics and Data Science Online
a ggplot2 grammar guide
Five Tidyverse Tricks You May Not Know About
How to build a Tufte-style weather graph in R using ggplot2

Data visualization: a reading list

Here is a collection of books and peer-reviewed articles on data visualization. There is a lot of good material on the philosophy, principles and practices of data visualization.

I plan to update the list with additional material in the future (consider the current version a draft). Do reach out if you have any recommendations.

Introduction

Graphs in Statistical Analysis (Anscombe 1973)
An Economist’s Guide to Visualizing Data (Schwabish 2014)
Data Visualization in Sociology (Healy and Moody 2014)
Beyond Bar and Line Graphs: Time for a New Data Presentation Paradigm (Weissgerber et al. 2015)
Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods (Cleveland and McGill 1984)
Graphic Display of Data (Wilkinson 2012)
Visualizing Data in Political Science (Traunmüller 2020)
Better Data Visualizations: A Guide for Scholars, Researchers, and Wonks (Schwabish 2021)

History

Historical Development of the Graphical Representation of Statistical Data (Funkhouser 1937)
Quantitative Graphics in Statistics: A Brief History (Beniger and Robyn 1978)

Tips and recommendations

Ten Simple Rules for Better Figures (Rougier et al. 2014)
Designing Graphs for Decision-Makers (Zacks and Franconeri 2020)
Designing Effective Graphs (Frees and Miller 1998)
Fundamental Statistical Concepts in Presenting Data: Principles for Constructing Better Graphics (Donahue 2011)
Designing Better Graphs by Including Distributional Information and Integrating Words, Numbers, and Images (Lane and Sándor 2009)

Analysis and decision making

Statistical inference for exploratory data analysis and model diagnostics (Buja et al. 2009)
Statistics and Decisions: The Importance of Communication and the Power of Graphical Presentation (Mahon 1977)
The Eight Steps of Data Analysis: A Graphical Framework to Promote Sound Statistical Analysis (Fife 2020)

Uncertainty

Researchers Misunderstand Confidence Intervals and Standard Error Bars (Belia et al. 2005)
Error bars in experimental biology (Cumming et al. 2007)
Confidence Intervals and the Within-the-Bar Bias (Pentoney and Berger 2016)
Depicting Error (Wainer 1996)
When (ish) is My Bus?: User-centered Visualizations of Uncertainty in Everyday, Mobile Predictive Systems (Kay et al. 2016)
Decisions With Uncertainty: The Glass Half Full (Joslyn and LeClerc 2013)
Uncertainty Visualization (Padilla et al. 2020)
A Probabilistic Grammar of Graphics (Pu and Kay 2020)

Tables

Let’s Practice What We Preach: Turning Tables into Graphs (Gelman et al. 2002)
Why Tables Are Really Much Better Than Graphs (Gelman 2011)
Graphs or Tables (Ehrenberg 1978)
Using Graphs Instead of Tables in Political Science (Kastellec and Leoni 2007)
Ten Guidelines for Better Tables (Schwabish 2020)

Deciding on a chart

Graph and chart aesthetics for experts and laymen in design: The role of familiarity and perceived ease of use (Quispel et al. 2016)

Chart types

Boxplots

40 years of boxplots (Wickham and Stryjewski 2011)

Pie charts

No Humble Pie: The Origins and Usage of a Statistical Chart (Spence 2005)

Infographics

Infovis and Statistical Graphics: Different Goals, Different Looks (Gelman and Unwin 2013)
InfoVis Is So Much More: A Comment on Gelman and Unwin and an Invitation to Consider the Opportunities (Kosara 2013)
InfoVis and Statistical Graphics: Comment (Murrell 2013)
Graphical Criticism: Some Historical Notes (Wickham 2013)
Tradeoffs in Information Graphics (Gelman and Unwin 2013)

Maps

Visualizing uncertainty in areal data with bivariate choropleth maps, map pixelation and glyph rotation (Lucchesi and Wikle 2017)

Scatterplot

The Many Faces of a Scatterplot (Cleveland and McGill 1984)
The early origins and development of the scatterplot (Friendly and Denis 2005)

Dot plots

Dot Plots: A Useful Alternative to Bar Charts (Robbins 2006)

3D charts

The Pseudo Third Dimension (Haemer 1951)

Teaching pedagogy

Correlational Analysis and Interpretation: Graphs Prevent Gaffes (Peden 2001)
Numbers, Pictures, and Politics: Teaching Research Methods Through Data Visualizations (Rom 2015)
Data Analysis and Data Visualization as Active Learning in Political Science (Henshaw and Meinke 2018)

Software

Excel

Effective Data Visualization: The Right Chart for the Right Data (Evergreen 2016)

R

Data Visualization (Healy 2018)
Data Visualization with R (Kabacoff 2018)
ggplot2: Elegant Graphics for Data Analysis (Wickham 2009)
Fundamentals of Data Visualization (Wilke 2019)
R Graphics Cookbook (Chang 2020)

Stata

A Visual Guide to Stata Graphics (Mitchell 2012)


Changelog
– 2021-03-01: Add ‘Better Data Visualizations’
– 2020-08-03: Add ‘Ten Guidelines for Better Tables’
– 2020-07-14: Add ‘Designing Graphs for Decision-Makers’ and ‘A Probabilistic Grammar of Graphics’ (ht: Simon Straubinger)

A response to Andrew Gelman

In a new blog post, Andrew Gelman writes that the findings in an article of ours are best explained by forking paths. I encourage you to read the blog post and, if you still care about the topic, continue and read this post as well.

This is going to be a (relatively) long post. In brief, I will show that the criticism is misleading. Specifically, I will show that it is easy to find the effect we report in our paper (without “statistical rule-following that’s out of control”) and that Andrew Gelman is either very unlucky or, what I find more likely, very selective in what he reports. I have no confidence that Andrew Gelman engaged with our material with an open mind; on the contrary, I believe he invested a non-trivial amount of time building up a (false) narrative about the validity and reliability of our findings.

That being said, Andrew Gelman was polite in reaching out, and he gave me the opportunity to comment on his criticism. Beyond a few clarifications, I decided not to share most of the comments below in a private conversation. Again, based upon his language and his analysis of our data, I am convinced that he has no interest in engaging in a constructive discussion about the validity of our findings. For that reason, I find it better to keep everything public and transparent.

Our contribution

In our paper, we show that winning political office has significant implications for the longevity of candidates for US gubernatorial office. Here is the abstract:

Does political office cause worse or better longevity prospects? Two perspectives in the literature offer contradicting answers. First, increased income, social status, and political connections obtained through holding office can increase longevity. Second, increased stress and working hours associated with holding office can have detrimental effects on longevity. To provide causal evidence, we exploit a regression discontinuity design with unique data on the longevity of candidates for US gubernatorial office. The results show that politicians winning a close election live 5–10 years longer than candidates who lose.

And here is the table with the main result:

The paper is published in Political Science Research and Methods. You can find the replication material for the article here.

Is our finding replicated?

Before we get into the details of the analysis and whatnot, I am happy to confirm that our findings are similar to those in a recent study published by the economists Mark Borgschulte and Jacob Vogler in Journal of Economic Behavior & Organization. For the sample most similar to ours, they find a local average treatment effect of 6.26 years:

It is great to see that other people are working on the topic and reaching similar conclusions. Overall, I believe that the effect we report in our paper is not only reproducible but also replicated by another set of researchers. I did inform Andrew Gelman about this study, but I can understand his reasons for not linking to it in his blog post.

How difficult is it to reproduce our findings?

In brief, Andrew Gelman is not convinced by our study. That’s okay. I am rarely convinced when I see a new study (it’s easy to think about potential limitations and statistical issues). However, what we see here are several characterisations of the results in particular and the statistical approach in general, such as “silly”, a “scientific failure”, “statistical rule-following that’s out of control”, “fragile” and “fatally flawed”.

No scientific study is perfect. Data is messy and to err is human. Accordingly, it is important and healthy for science that we closely and thoroughly inspect the quality of each other’s work (especially when we consider how many errors can easily slip through peer review). Not only do I appreciate that smart colleagues devote their scarce time to looking at my work, I also encourage people to check out my work and reach out if something is not working out (or write a blog post or publish a paper or make a tweet). For that reason, I make all my material publicly available (most often on Harvard Dataverse and GitHub). That being said, in this case, I am simply not convinced that our study is fatally flawed.

Andrew Gelman argues that we report an estimate that is very difficult to reproduce (and, at the end of the day, unrepresentative of a true effect): “I tried a few other things but I couldn’t figure out how to get that huge and statistically significant coefficient in the 5-10 year range. Until . . . I ran their regression discontinuity model”.

Seriously? Is it really that difficult to obtain a significant effect in line with what we report in the paper? Based upon Andrew Gelman’s post, I can understand if that’s the impression people might have. Maybe Andrew Gelman should consider reading Gelman and Hill (2007) and following the first suggestion on how to run an RDD analysis. From page 214:

Without any complicated procedures, what happens if we get the data and follow the procedure as described in the introductory textbook? Well, let us look at the data and run the suggested regression.

# Load tidyverse (to make life easy)
library("tidyverse")

# Load the data
df_rdd <- data.table::fread("longevity.csv")

# Make the data ready for the analysis
df_m <- df_rdd %>% 
  filter(year >= 1945, living_day_imp_post > 0) %>% 
  mutate(won = ifelse(margin_pct_1 >= 0, 1, 0),
         margin = margin_pct_1)

# Run regression
df_m %>% 
  filter(abs(margin) < 5) %>% 
  lm(living_day_imp_post ~ won + margin, data = .) %>% 
  broom::tidy()
# A tibble: 3 x 5
  term        estimate std.error statistic  p.value
  <chr>          <dbl>     <dbl>     <dbl>    <dbl>
1 (Intercept)    8429.      676.     12.5  4.63e-28
2 won            3930.     1193.      3.29 1.13e- 3
3 margin         -656.      219.     -3.00 2.98e- 3
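
Since the outcome (living_day_imp_post) is measured in days, the coefficient on won translates into years as follows:

# The outcome is in days, so convert the 'won' coefficient to years
3930 / 365.24
[1] 10.76005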

The effect of winning office is ~10 years according to this model. I am not saying that this is the best model to estimate (it is not a model we report in the paper). However, that’s it. Nothing more. We don’t need rocket science (or a series of covariates and weird analytical choices) to reproduce the main result. You might hate the result, not believe it, believe it is nothing but noise etc., but at least have the decency to acknowledge that it is there.

How can it be that difficult to get this estimate? Gelman and Hill (2007) is more than 10 years old (I can still very much recommend it though), but I would suggest that you at least try to follow your own guidelines before you try to convince your readers that you could not reproduce a set of results. That is, if you don’t want to make a fool of yourself.

There is a certain irony to all of this, especially when the reason we didn’t pursue certain analytical choices was that it “would take additional work” (or, that’s at least what Andrew Gelman speculates). I know that I can’t expect too much of Andrew’s time; he is on a tight schedule with a new blog post every day (with an incentive – or at least motivation – to point out flaws in research, especially in studies using regression discontinuity designs), but… why not show how easy it is to get the main effect, however much you dislike it or the design that gave birth to it, without “statistical rule-following that’s out of control”? Why pretend that you can only find the effect when you follow our regression discontinuity model?

(Also, do note the number of cases in the example in the textbook, i.e. n = 68. Not a single word about power here. There is something funny about how the textbook example is good enough when you believe the result, but the sample size turns this into a “fatally flawed” study when you don’t believe the result. I know, the book is old and things have changed over the last few years (“New Statistics“, all for the better). However, I have seen hundreds of RDD studies with less power than ours that are nowhere near as reliable. And don’t get me started on IV regression studies with weak instruments.)

The conclusion that our results are difficult to reproduce is misleading. Or, in other words, it was convenient for his blog post that he didn’t bother to run the first RDD analysis suggested by Gelman and Hill (2007).

So what did he do? Interestingly, in order to show how difficult it is to reproduce our results, you need to take a few analytical steps. I don’t want to speculate too much but, for lack of better words, we can say that Andrew Gelman fell prey to the garden of forking paths.

Specifically, Andrew Gelman is not trying to limit the number of decisions he has to make in order to find a non-significant effect (or, I think he was, but that didn’t work out). On the contrary, he is very much interested in getting “the most natural predictors” (I know, I laughed too!) right from the get-go. What’s the difference between falling prey to the garden of forking paths and having common-sense insights into ‘the most natural predictors’? You tell me.

Let us look at what exactly Andrew Gelman says he did in order to get to his statistically non-significant effect:

I created the decades_since_1950 variable as I had the idea that longevity might be increasing, and I put it in decades rather than years to get a more interpretable coefficient. I restricted the data to 1945-2012 and to candidates who were no longer alive at this time because that’s what was done in the paper, and I considered election margins of less than 10 percentage points because that’s what they showed in their graph, and also this did seem like a reasonable boundary for close elections that could’ve gone either way (so that we could consider it as a randomly assigned treatment).

Here is the problem. When I do all of this, I get an effect! Hmm. Hmm. Hmm. We had better dig into the code reported in the blog post. Here is the code he used to arrive at the first reported regression:

df_rdd <- data.table::fread("longevity.csv")
death_date <- sapply(df_rdd[,"death_date_imp"], as.character) 
living <- df_rdd[,"living"] == "yes"
death_date[living] <- "2020-01-01"
election_year <- as.vector(unlist(df_rdd[,"year"]))
election_date <- paste(election_year, "-11-05", sep="")
more_days <- as.vector(as.Date(death_date) - as.Date(election_date))
more_years <- more_days/365.24
age <- as.vector(unlist(df_rdd[,"living_day_imp_pre"]))/365.24
n <- nrow(df_rdd)
name <- paste(unlist(df_rdd[,"cand_last"]), unlist(df_rdd[,"cand_first"]), unlist(df_rdd[,"cand_middle"]))
first_race <- c(TRUE, name[2:n] != name[1:(n-1)])
margin <- as.vector(unlist(df_rdd[,"margin_pct_1"]))
won <- ifelse(margin > 0, 1, 0)
lifetime <- age + more_years
decades_since_1950 <- (election_year - 1950)/10
data <- data.frame(margin, won, election_year, age, more_years, living, lifetime, decades_since_1950)
subset <- first_race & election_year >= 1945 & election_year <= 2012 & abs(margin) < 10 & !living
library("arm")
fit_1a <- lm(more_years ~ won + age + decades_since_1950 + margin, data=data, subset=subset) 
display(fit_1a)
lm(formula = more_years ~ won + age + decades_since_1950 + margin, 
    data = data, subset = subset)
                   coef.est coef.se
(Intercept)        78.60     4.05  
won                 2.39     2.44  
age                -0.98     0.08  
decades_since_1950 -0.21     0.51  
margin             -0.11     0.22  
---
n = 311, k = 5
residual sd = 10.73, R-Squared = 0.35

(No, the code is not a historical document on how people wrote R code in the 90s – nor a paid ad for tidyverse.)

Here the effect of winning office is only 2.39 years (all his models show estimates between 1 and 3 years). The first thing I notice here is that the sample size is substantially different from mine, so it must be something to do with the subsetting. Ah, I get it! He also restricted the sample with the first_race variable. Let us try to subset according to the actual procedure outlined in the paragraph above and estimate the model again.

subset_reported <- election_year >= 1945 & election_year <= 2012 & abs(margin) < 10 & !living
fit_1a_reported <- lm(more_years ~ won + age + decades_since_1950 + margin, data=data, subset=subset_reported) 
display(fit_1a_reported)
lm(formula = more_years ~ won + age + decades_since_1950 + margin, 
    data = data, subset = subset_reported)
                   coef.est coef.se
(Intercept)        74.65     3.18  
won                 3.15     1.89  
age                -0.91     0.06  
decades_since_1950 -0.03     0.41  
margin             -0.19     0.17  
---
n = 499, k = 5
residual sd = 10.93, R-Squared = 0.33

That makes more sense. Somebody might even want to call it statistically significant (I will, for the sake of argument, not do this here). My theory is that Andrew Gelman initially did as he wrote in the blog post but decided that it would not be good for his criticism to actually find an effect and, accordingly, took another path in the garden. In other words, the effect is not good to introduce in the first model in a blog post about a paper with forking paths and “statistical rule-following that’s out of control”. However, I can say that what Andrew Gelman is doing here is the simple act of ‘p-hacking in reverse’ (type-2 professor-level p-hacking instead of the well-known newb type-1 p-hacking).

Here is another funny thing: later in the blog post, Andrew Gelman does follow the actual procedure he described. Now, however, it is framed as an explicit choice to include “duplicate cases” — and just to laugh at the “silly” results: “Just for laffs, I re-ran the analysis including the duplicate cases”.

That’s a fucked-up sense of humour. Different strokes for different folks, I guess. Andrew Gelman played around with the data till he got the insignificant finding he wanted and then decided to attribute effects consistent with those in the paper to ‘just for laughs’ or to ‘including data’ (that was not excluded in the first place). What is the difference between selecting “the most natural predictors” and including variables just for laughs? Garden of forking paths, I guess.

In any case, congratulations! You could select a set of covariates that returns a sample size of 311 and the standard errors you wanted when you decided to go into the garden.

Using the non-significant effect, Gelman then continues to introduce a set of follow-up regressions to show that it is not possible for him to get anywhere near a significant effect, implying that a significant effect can only be obtained by forking paths: “What about excluding decades_since_1950? […] Nahhh, it doesn’t do much. We could exclude age also: […] Now the estimate’s even smaller and noisier! We should’ve kept age in the model in any case. We could up the power by including more elections: […] Now we have almost 500 cases, but we’re still not seeing that large and statistically significant effect.”

How (un)lucky can you be? Or, how little scientific integrity can you show when you engage with the material?

Here’s a thought experiment: If we had used the “most natural predictors” in the paper and found an effect (which we still did after all), would Andrew Gelman then have agreed with us that those were the most natural predictors? And if the simple model presented above (with no covariates) had returned no effect, would Andrew Gelman still have found “the most natural predictors” to be relevant as the most natural predictors? Of course not.

As you will see in his blog post, he notes that he limits the sample significantly, but only to keep things simple: “To keep things simple, I just kept the first election in the dataset for each candidate”. I suggest he replace “keep things simple” with “make sure I have a small sample size and a non-significant effect and a chance to keep this blog post interesting”. Sure, there can be valid reasons to exclude these observations (or at least to reflect upon how best to model the data at hand), but if you are trying to tell us that our study is useless, please provide better reasons for discarding a significant number of cases than wanting to “keep things simple”.

Again, I am not convinced by the argument that he was unable to reproduce an effect similar to that reported in the paper.

What we did in the paper was not to use a simple OLS regression to estimate the effect, but the rdrobust package for robust nonparametric regression discontinuity estimates. Here is the main result reported in the paper:

rdrobust::rdrobust(y = df_m$living_day_imp_post, 
                   x = df_m$margin) %>% 
  summary()
Call: rdrobust

Number of Obs.                 1092
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                 516         576
Eff. Number of Obs.            236         243
Order est. (p)                   1           1
Order bias  (q)                  2           2
BW est. (h)                  9.541       9.541
BW bias (b)                 19.017      19.017
rho (h/b)                    0.502       0.502
Unique Obs.                    516         555

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
=============================================================================
  Conventional  2749.283   873.601     3.147     0.002  [1037.057 , 4461.509]  
        Robust         -         -     3.188     0.001  [1197.646 , 5020.823]  
=============================================================================

This is the effect. Again, nothing more. Andrew Gelman also implies that age needs to be included as a covariate in this approach, so here is the model with age as a predictor (conveniently not reported in his blog post):

rdrobust::rdrobust(y = df_m$living_day_imp_post, 
                   x = df_m$margin, 
                   covs = df_m$living_day_imp_pre) %>% 
  summary()
Call: rdrobust

Number of Obs.                 1092
BW type                       mserd
Kernel                   Triangular
VCE method                       NN

Number of Obs.                 516         576
Eff. Number of Obs.            255         265
Order est. (p)                   1           1
Order bias  (q)                  2           2
BW est. (h)                 10.412      10.412
BW bias (b)                 20.323      20.323
rho (h/b)                    0.512       0.512
Unique Obs.                    516         555

=============================================================================
        Method     Coef. Std. Err.         z     P>|z|      [ 95% C.I. ]       
=============================================================================
  Conventional  1948.131   701.527     2.777     0.005   [573.164 , 3323.098]  
        Robust         -         -     2.838     0.005   [689.788 , 3770.580]  
=============================================================================

This is indeed a more precise estimate.
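
As a side note for readers skimming the output: the rdrobust estimates are also in days. Converting the conventional estimates from the two models above to years puts both point estimates squarely in the range we report in the paper:

# Conventional RD estimates above are in days; convert to years
round(c(no_covariates = 2749.283, with_age = 1948.131) / 365.24, 1)
no_covariates      with_age 
          7.5           5.3 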

We do present a lot of results in the paper and the appendix. You have a lot of potential choices when you do an RDD analysis (bandwidth selection, covariates, local-polynomial order, etc.), and that definitely opens up a lot of possibilities in terms of forking paths.

We report the most parsimonious results in the paper in order not to give ourselves the freedom to pursue several (forking) paths (including making our own case for what the most natural predictors are). We are transparent about the models we estimated in our paper, also when the result is not statistically significant. Specifically, not all estimates reported in our paper are significant, and hopefully our approach shows that it is indeed possible to get non-significant effects (though still interesting effect sizes). For the choice of bandwidth, for example, Andrew Gelman writes that the “analysis is very sensitive to the bandwidth”. We do show that it is possible to obtain insignificant effects when you estimate the models with a specific bandwidth. Here is the first figure in the appendix:

We find the largest effect estimates with a bandwidth of ~5 using no covariates (Model 1). For the covariates, I am sure, as Andrew Gelman most likely can confirm, it is possible to reduce this effect further with some forking paths.
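
For anyone who wants to probe this themselves, here is a sketch of the kind of bandwidth sensitivity check behind that appendix figure. It assumes df_m as constructed above; h is rdrobust’s manual bandwidth argument, and the first element of the returned coefficients is the conventional estimate.

# Sketch: conventional RD estimate (in years) across manual bandwidths
bws <- 3:20
est_years <- purrr::map_dbl(bws, function(bw) {
  fit <- rdrobust::rdrobust(y = df_m$living_day_imp_post,
                            x = df_m$margin,
                            h = bw)
  fit$coef[1] / 365.24   # first element = conventional estimate
})
plot(bws, est_years, type = "b",
     xlab = "Bandwidth (pct. points)", ylab = "Estimate (years)")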

There is, as Andrew Gelman also points out, no smoking gun. What he does instead is to provide a lot of scribbles about the sociology of science, common sense and fatally flawed statistics. The narrative mode is ‘stream of consciousness’ with a lot of negative words. The purpose is to show that there are so many issues with our study, that it is, again, fatally flawed.

From what I can see, it is mostly a series of misunderstandings or, at best, comments that are completely irrelevant. I guess his aim is to show that there are so many issues that the sum of these proves his bigger point that the study is silly.

The first illustrative example is this point: “Then I looked above, and I saw the effective number of observations was only 236 or 243, not even as many as the 311 I had earlier!”

I suggest that you consult the documentation for rdrobust:

It’s the number of observations on each side of the threshold. The total number of observations is 479. And just to see if we can agree on one thing here (though, having seen how our analysis is treated in the blog post, I remain skeptical):

479 > 311
[1] TRUE

Also, even if it was “only” around 240, it is still a sample size almost four times greater than the textbook example provided in Gelman and Hill (2007). I am not using this as an argument but as a recommendation to update the material if you want to conclude that what everybody else is doing is fatally flawed.

Andrew Gelman also hints at issues with the data. Specifically, it is supposedly a problem that we have additional data that we do not use, such as politicians who died before the next election. I am unable to see how it can be a limitation that we provide all the data we have, including cases that we do not use in our analysis. Also, he says that it is not possible to get the dataset in the correct format, which is simply incorrect (it is actually easy to download .csv files from the Harvard Dataverse).

Enough about this ‘circumstantial evidence’. The overall conclusion of the blog post is the following paragraph: “If you do a study of an effect that is small and highly variable (which this one is: to the extent that winning or losing can have large effects on your lifespan, the effect will surely vary a lot from person to person), you’ve set yourself up for scientific failure: you’re working with noise.”

I believe there is more signal than noise in this data. I definitely do not see our work as a scientific failure. The effects are all positive and not highly variable. Yes, they tend to be closer to 5 years than to 10 years, but that’s not necessarily noise. Again, this is in line with the finding for governors running in elections post-1908 reported by other researchers, where the effect size is ~6 years.

Next, these are not small effects. Even if you manage to press them all the way down to a few years, they are still substantial.

I do agree a lot with the point on heterogeneity, i.e. that the effects will vary a lot from person to person. We do explore that to some extent in the paper, but we try not to make too strong inferences about any of these subsample analyses. However, I will be happy to see future research deal with this exact point.

An effect size of up to 10 years is a large effect and, yes, it would have been interesting as well if the effect had been 2 years. If the true effect of winning office on longevity is, say, 0.5 years, would we have sufficient statistical power to detect such an effect and conclude that it was statistically different from 0? I don’t think so, and I believe that Andrew Gelman has some good reflections on Type M and Type S errors in relation to effect sizes that are also relevant to our study.
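
To put a rough number on this, here is a back-of-the-envelope power calculation. It is only a sketch: it treats the RDD as a simple two-group comparison at the cutoff (which it is not) and borrows a residual sd of roughly 11 years from the OLS fits quoted above.

# Rough power to detect a true effect of 0.5 years with ~240 candidates
# on each side of the cutoff and a residual sd of ~11 years (assumptions)
power.t.test(n = 240, delta = 0.5, sd = 11)

With these numbers, the power is around 7 percent. An effect of that size would be far beyond what the design could reliably detect.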

Is it against common sense that politicians who win office live significantly different lives and that these differences might add up to a substantial difference in longevity prospects? Maybe. I don’t think so. If I did, I would not have worked on a paper testing this dynamic. However, as always, it is important to remain skeptical towards the findings of our study (even when they are in line with the findings of another study).

I will not speculate about whether it is against common sense or not. As always, everything is obvious once you know the answer. What I can say is that I would never let the size of an effect be decisive for whether I would like to publish the result or not (for that reason, I have also published several null findings). Also, when we are dealing with effect sizes, most people are unimpressed by small effect sizes (and a Cohen’s d of .9 is considered a small-to-moderate effect by lay people). Accordingly, I am not sure how strong an approach common sense is (in and of itself) when we are to evaluate effect sizes.

That being said, my skepticism always increases as the effect size increases. And I would be surprised (and worried) if nobody decided to take a look at our data. Again, I am thankful that Andrew Gelman spent a significant amount of his time on this. Interestingly, Jon Mellon, co-director of the British Election Study, was also critical towards the effect size. He did look at the data. However, he did not reach the same conclusion as Andrew Gelman:

I am not including this here to say that Jon Mellon is on my side, as this is not about being on one side or the other. However, I am saying that I find it interesting when other people look at the data as well without reaching the supposedly obvious conclusion that it is impossible to produce our models without making weird analytical choices. Also, I am open to the possibility that Jon Mellon – like anybody else – will update what he thinks about the findings in our paper in the light of these blog posts.

The role of visual inference in RDDs

The first thing Andrew Gelman presents in his blog post is a visual summary of our data (a scatter plot). He uses this to say that there is nothing going on around the threshold (the discontinuity) and that all we can see is evidence of noise (and even the garden of forking paths!). I am not impressed by this reasoning, but I still find it relevant to comment on. Specifically, we know that a visual presentation of a discontinuity can be strong evidence for an effect, but it is not sufficient to say that there is no effect just because we cannot eyeball a difference.

To understand this, consider this figure made by the economist C. Kirabo Jackson on how RD plots are not useful to assess the existence of an effect.

Specifically, the figure shows a statistically significant effect, but it is clear that we cannot see this effect in the plot. I do still believe it is relevant to look at the raw data to get a sense of what we are working with, but I am not sure why our evidence should be less compelling just because the effect is not visible in the raw data.
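
The point is easy to demonstrate with simulated data. Here is a minimal sketch (entirely made-up numbers, not our data): a true jump at the cutoff that a regression picks up comfortably but that is invisible in the raw scatter plot.

# Simulate a true jump of 5 at the cutoff, drowned visually by noise
set.seed(1)
n <- 2000
x <- runif(n, -10, 10)                                # running variable
y <- 50 - 0.5 * x + 5 * (x >= 0) + rnorm(n, sd = 15)  # true jump of 5 at 0

plot(x, y, pch = 16, col = rgb(0, 0, 0, 0.3))         # no visible jump
abline(v = 0, lty = 2)

# ... yet a simple regression recovers an estimate close to the true jump
summary(lm(y ~ I(x >= 0) + x))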

To further illustrate this, I took a look at one of my favourite RDD studies, namely ‘What Happens When Extremists Win Primaries?‘ by Andrew B. Hall (published in American Political Science Review). From the abstract: “When an extremist — as measured by primary-election campaign receipt patterns — wins a “coin-flip” election over a more moderate candidate, the party’s general-election vote share decreases on average by approximately 9–13 percentage points, and the probability that the party wins the seat decreases by 35–54 percentage points.”

Here are figures of the raw data from that study, similar to the ones Gelman presented in his blog post using our raw data:

What is clear here is that the raw data does not allow us to assess whether there is a strong effect (i.e. that when an extremist wins an election, the vote share decreases on average by approximately 9–13 percentage points). I am not picking this study to criticize the design (or say that “look, everybody else is doing this!”). Again, I pick this study as it is one of my favourite RDD studies that also shows large effects, or as Andrew B. Hall writes in the paper: “These are large effects.”

In sum, I am not convinced that the raw data says anything meaningful about whether we are left with noise-mining.

Concluding remarks

Overall, the purpose of this post is not to say that I am correct and Andrew Gelman is wrong. I go where the data brings me (even if it is not “common sense”). If it turns out there is no large effect of winning office on longevity, that’s fine with me. I’m not a politician.

Do download the data, read the paper, play around with the data, update the data (alas, in the long run we will have no missing data), try different models, find out how much work you need to put into this in order to make the estimates statistically non-significant, etc. Again, you can make a lot of decisions when conducting this type of analysis, including but not limited to bandwidth choices, outliers, second-order polynomials, alternative cutoffs and various restricted samples (we provide tests on all of these in the Appendix, but of course this is not enough – and we have provided the replication material for that exact reason).

I am not saying that any of the above is proof that Andrew Gelman deliberately only presented models that suited his “common sense” belief that our study must be a failure (“we knew ahead of time not to trust this result because of the unrealistically large effect sizes”). However, I can say that I, in the future, will be much more critical towards his way of presenting his analyses of other people’s work on his blog (and maybe even in his published work).

In general, I am sure I agree with Andrew Gelman on more topics than I disagree. As he wrote in a blog post the other day (in relation to another topic): “Build strong models, make big assumptions, issue strong statements, and then when (inevitably) you’re proven wrong, when you’re embarrassed in front of the world, own your errors and figure out what went wrong in your reasoning. Science is self-correcting—but only if we self-correct.” That’s spot on.

We definitely agree on the importance of being able to look into other people’s analyses, results and interpretations. Nullius in verba and whatnot. Furthermore, the last thing I want to do is to look ungrateful when people are reproducing my work (I know from my prior work how pathetic scientists can be when you show that it’s difficult to reproduce their work).

That being said, I was wondering whether I should dignify Andrew Gelman’s criticism with a response. He did little to engage with the material (and it shows). Here is my view: Andrew Gelman is an academic Woody Allen. Some of his work is very good, but his blog post on our study is closer to A Rainy Day in New York than, say, Annie Hall.

Overall, I see a contribution in our paper. As always, a single study is of little value in and of itself, but I do see a contribution. For that reason, I don’t agree that our paper is a “scientific failure”, but I can see how such a categorisation is needed, as the criticism would be less effective in this case without these exaggerations.

To Andrew Gelman, any effect size he’s not convinced by is best explained by ‘forking paths’ (if all you have is a hammer, everything looks like a nail). Even if it requires a few detours in the garden to get to that point. I believe we agree that it’s possible to fool yourself with forking paths without ever realizing it. The disagreement here is primarily about who is not realizing it.