A new study, published in the Proceedings of the National Academy of Sciences, argues that ‘wolf attacks predict far-right voting’ – at least in Germany. The findings are statistically significant, the effect sizes are (too?) large, and there is even a pre-registration of the study. Yet, I see a few red flags (in addition to the fact that it is published in PNAS).
The main problem with the study is that it was not initially about ‘far-right voting’, i.e., voting for Alternative für Deutschland (AfD). That is, voting for a far-right party was not the main outcome of interest according to the pre-registration plan. Here is what the pre-registration plan says is the main hypothesis being tested: Wolf attacks lower the vote share of the Green party.
AfD is not the Green party. Nevertheless, here is how the researchers describe the outcome of interest in the paper: “Our primary outcome of interest is the vote share of the far-right AfD, but we also analyze votes for the proenvironmental Green party.” The main outcome of interest in the pre-registration plan has now become a matter of secondary interest. And if we look at the key dependent variables according to the pre-registration plan, we find: The vote share of the Green Party across federal and state elections at the municipality level.
So why is this study about far-right voting and not the Green party? The answer is provided in the paper: “At the same time, we find inconsistent evidence on whether wolf attacks are associated with a drop in voting for the Green party.” In other words, the researchers did initially look at the Green party, found no evidence for their predicted hypothesis, and decided to go with one of the hypotheses they would examine in an ‘exploratory manner’.
If the authors had been transparent about this in the paper, I would not mind it at all. However, the paper does not examine wolf attacks and far-right voting in an ‘exploratory manner’; instead, it presents this relationship as if it were the main hypothesis of interest that was part of the pre-registration.
To understand what is going on, we need to see how the pre-registration plan in question provides a lot of freedom if the main hypothesis fails (which it did). Specifically, the pre-registration plan linked to the study lists three other hypotheses: that wolf attacks raise the vote share of the far-right AfD, that wolf attacks increase (environmental) polarization, and that wolf attacks increase turnout.
One could argue that this is not a problem: add as many hypotheses for secondary analyses as you can to the pre-registration plan, write up the paper when one of them works out, and say that everything simply follows the pre-registration plan. I believe this is misleading, but I am not surprised to see a journal like PNAS run with it. There is no theory in the paper, and the talk about a potential mechanism is explored only in passing, using a few different datasets (that, conveniently, were not part of the pre-registration plan).
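To see the statistical cost of this flexibility, consider a back-of-the-envelope sketch (my own illustration, not anything from the paper). It assumes four independent tests at the conventional 0.05 threshold; the actual outcomes are correlated across hypotheses and elections, so the exact numbers are only indicative:

```python
import numpy as np

rng = np.random.default_rng(42)

n_sims = 100_000   # simulated 'studies' in which no effect exists
n_hypotheses = 4   # one primary hypothesis plus three fallbacks
alpha = 0.05

# Under the null, each p-value is uniform on [0, 1]. Assume the four
# tests are independent; in reality they are correlated, which dampens
# (but does not remove) the inflation.
p_values = rng.uniform(size=(n_sims, n_hypotheses))
share_with_a_hit = (p_values < alpha).any(axis=1).mean()

print(f"Analytic:  {1 - (1 - alpha) ** n_hypotheses:.3f}")  # ~0.185
print(f"Simulated: {share_with_a_hit:.3f}")                 # ~0.185
```

In other words, with four pre-registered outcomes to choose from, roughly one in five studies of a non-existent effect can still be written up as a ‘significant’, pre-registered finding.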
What we end up with is a paper on how wolf attacks increase the vote shares of the far-right AfD across different elections. Even if we believe this is all fine, there are theoretical and empirical issues to consider, and I do not believe the paper in question should make us conclude that wolf attacks have important implications for elections.
First, why are the researchers focusing on AfD at all? Because, as described in the pre-registration plan, it is an ‘anti-environmental party’. If you have a theory for why the pro-environment Green party should be affected (and this is your main hypothesis!), but end up writing a paper about how wolf attacks only matter for the support of an anti-environment party, I am not convinced you are working within the correct theoretical framework. You are merely fishing in the sea of p-values until you find something that you know will satisfy a journal like PNAS.
Second, why look at all federal elections starting in 1990 and all state elections starting in 2000, when AfD first ran in 2013? Sure, you can talk a lot about having fine-grained data, but is all this data actually useful? And can we capture relevant differences between the municipalities with a few covariates? I see a lot of challenges with spatial data like this and, as the researchers say in the paper: ‘one should be cautious in interpreting the findings in a causal manner’. You bet. I would even be cautious in interpreting the findings in a correlational manner.
It is always interesting to see what researchers are willing to do in order to get ‘research’ into a journal like PNAS. However, if they wanted to preserve any level of scientific integrity, I would have recommended following the pre-registration plan and getting the paper published in a better journal (in terms of research quality). That being said, I am no stranger to the incentives researchers have to consider, and I do sympathise with what they initially tried to do with this project.
To move beyond wolf attacks, the general question of interest is how many secondary hypotheses we should allow in a pre-registration plan before we should be concerned. Of course, the researchers could simply have made four separate pre-registrations (for four different papers) and only written up the one about wolf attacks and AfD, but if we are to take pre-registration plans seriously, we need to see a better overlap between plans and papers. What I think would be useful moving forward is a stronger link between pre-registration and the publishing process. Specifically, PNAS should have accepted (or rejected) the paper on the basis of the pre-registration alone, thereby not putting the researchers in a position where they felt they had to tip the scales and mislead the readers.
If you look at the details of the pre-registration plan, you can also see that the authors consider a continuous treatment (the number of wolf attacks) in addition to the initial treatment. If one of the four hypotheses did not work out with the initial treatment, they would still have at least four other models to examine (while still saying that the results were predicted by a pre-registration plan). When a pre-registration plan is used to convince editors, reviewers and readers of the predicted nature of an analysis, I am not convinced pre-registration plans are working as intended.
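Extending the sketch from above: crossing the four hypotheses with two treatment codings gives eight pre-registered chances to find something. Under the same (admittedly stylized) independence assumption:

```python
# Familywise error rate as the pre-registered menu grows. Eight
# specifications = four hypotheses x two treatment codings.
alpha = 0.05

for n_specs, label in [(4, "4 hypotheses, 1 treatment coding "),
                       (8, "4 hypotheses x 2 treatment codings")]:
    fwer = 1 - (1 - alpha) ** n_specs
    print(f"{label}: P(at least one p < .05 | no effect) = {fwer:.3f}")
```

Under these assumptions, roughly a third of studies of a non-existent effect would deliver at least one ‘predicted’ significant result.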
The other day, I wrote about the many causes of Brexit. One of the points I tried to make was that there is an endless supply of potential predictors of social phenomena, but identifying a new predictor is not a sufficient condition for improving our understanding of a specific phenomenon. At the end of the day, alas, I am not convinced that the study in question has improved our understanding of why Germans voted for AfD in the study period. Instead, the researchers made a pre-registration plan with a set of outcomes and decided which finding to write up once they had had a look at the results (we know this as the Texas sharpshooter fallacy).
Lastly, it is impossible to read the study without thinking back to the time when political science jumped the shark. My concern is that this study is not much better despite being ‘predicted’ in a pre-registration plan.