Natural experiments are experiments that take place in the ‘real’ world (i.e., in ‘nature’) without any randomisation managed by the researcher. In contrast to a ‘normal’ experiment where only the involved researchers are aware of the experiment and the data collection, natural experiments are often available to all researchers with access to the relevant data.
This opens up specific issues with natural experiments that we rarely encounter with other types of experiments, namely that different researchers use the same natural experiment to address different research questions. A recent study by Heath et al. (2023) highlights the problem in the context of multiple hypothesis testing when researchers reuse natural experiments. Specifically, once they correct for multiple hypothesis testing, many findings from specific natural experiments no longer hold up (and instead look like false positives).
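To see what such a correction does in practice, consider a minimal sketch using Python's statsmodels library. The p-values are made up for illustration, and Holm's method is just one common correction:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical p-values from ten outcomes tested against the same
# natural experiment (made-up numbers, for illustration only)
p_values = [0.002, 0.01, 0.01, 0.03, 0.03, 0.04, 0.04, 0.06, 0.20, 0.45]

# Naive reading: seven 'findings' clear the conventional 5% threshold
print(sum(p < 0.05 for p in p_values))  # 7

# Correcting across all tests run on the same experiment (Holm method):
# only the smallest p-value survives
rejected, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="holm")
print(sum(rejected))  # 1
```

The point is not the specific method but the bookkeeping: the correction only works if we count all the tests run on the same experiment, including those run by other researchers.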
The problem with reusing natural experiments is well-known within the econometric literature on instrumental variables. Morck and Yeung (2011), for example, describe how the reuse of instruments is a tragedy of the commons within the literature: “A Tragedy of the Commons has led to an overuse of instrumental variables and a depletion of the actual stock of valid instruments for all econometricians. Each time an instrumental variable is shown to work in one study, that result automatically generates a latent variable problem in every other study that has used, or will use, the same instrumental variable, or another correlated with it, in a similar context.” (p. 50)
When we have two studies using the same natural experiment but with different theories and outcomes of interest, can we read and understand these studies in isolation? Does each study provide a contribution that is independent of what the other study shows? Can we still say one plus one equals two? Or are there diminishing returns when researchers use the same natural experiment?
To my knowledge, very little attention has been devoted to these issues within political science (the work by Jonathan Mellon on instrumental variables is an important exception). However, if we consider some of the natural experiments studying the impact of the 2015 Islamic State terrorist attacks in France on public opinion, we can begin to see some of the issues with how researchers use natural experiments in political science.
Muñoz et al. (2020) showed, in an introduction to the ‘Unexpected Event during Survey Design’ (UESD), how the Charlie Hebdo terrorist attack could be used to estimate the impact of the attack on French citizens’ satisfaction with their national government using data from the European Social Survey.
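For readers unfamiliar with the design, the logic is simple: respondents interviewed just before the event serve as the control group, and respondents interviewed just after serve as the treatment group. Here is a minimal sketch in Python of that comparison on simulated ESS-style data (the column names, the 30-day window, and the simulated effect are all assumptions for illustration, not the actual ESS setup):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
attack_date = pd.Timestamp("2015-01-07")  # date of the Charlie Hebdo attack

# Simulated ESS-style data: interview dates around the attack and a 0-10
# government-satisfaction item (column names are assumptions for illustration)
n = 2000
dates = attack_date + pd.to_timedelta(rng.integers(-60, 61, n), unit="D")
after = (dates >= attack_date).astype(int)
stfgov = np.clip(rng.normal(5 + 0.3 * after, 2), 0, 10)  # simulated effect
ess = pd.DataFrame({"interview_date": dates, "after": after, "stfgov": stfgov})

# UESD logic: compare respondents interviewed just before the attack with
# those interviewed just after, within a fixed window around the event
window = ess[ess["interview_date"].between(
    attack_date - pd.Timedelta(days=30),
    attack_date + pd.Timedelta(days=30),
)]
print(smf.ols("stfgov ~ after", data=window).fit().summary())
```

However, this is by no means the only study using the Charlie Hebdo terrorist attack as a natural experiment.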
Solheim (2019), for example, also uses data from the European Social Survey to examine the causal effect of the Charlie Hebdo terrorist attack on public opinion. In this paper, however, the outcome of interest is attitudes towards immigration policy. There are other important differences as well, such as the countries of interest and potential moderators (e.g., news consumption). Accordingly, we begin to see how researchers can use the exact same natural experiment and the same data to explore different research questions.
In recent years, we have seen multiple other studies using the Charlie Hebdo terrorist attack as a natural experiment exploiting data from the European Social Survey. Colombo et al. (2022) look at the effects of terrorist attacks on well-being. Giani (2021) looks at the effect on fear and ethnic prejudice. Giani and Merlino (2021) look at perceived ethnoracial discrimination. Peri et al. (2023) look at attitudes towards immigration, political orientation, satisfaction with the national government, and trust in parliament. Savelkoul et al. (2022) look at attitudes towards Muslim immigrants. Turkoglu and Chadefaux (2023) look at life satisfaction, happiness, and attitudes toward the government, institutions, and immigrants.
Again, all of these studies use the same data (the European Social Survey) and the Charlie Hebdo terrorist attack as a natural experiment. However, there are several differences between the studies. Some also look at other terrorist attacks. Some look at the attacks in different countries (i.e., not only in France). And, as is of interest here and illustrated above, the studies look at different outcomes. How can we even begin to make sense of these studies individually when we have theoretical reasons to expect that Charlie Hebdo could matter for each of these outcomes? Such a theoretical model (e.g., a DAG) will look like spaghetti alla carbonara.
I have previously written about some of these challenges in the literature on the (many) causes of Brexit (#1, #2), but I believe the challenges here are much more related to the problem of the tragedy of the commons identified in studies using the same instrumental variable(s). Whereas my concern in the context of the studies on Brexit was about one outcome (the Brexit vote) with several independent causes, here we are looking at one independent cause with several different effects.
My recommendation, if you plan to use data from the European Social Survey to estimate the effect(s) of the Charlie Hebdo terrorist attack, is that you engage with the other studies before you let your sheep graze on the common land of natural experiments. In other words, you cannot just download the relevant data from the European Social Survey and pretend that you live in a world where there are not already a lot of different studies on the impact of the Charlie Hebdo terrorist attack. More importantly, you cannot read any of these studies on their own without taking the arguments, theories and findings from the other studies into account – even if they look at different outcomes.
There is one more study that is relevant to look at here. Silva (2018) showed, also studying the impact of the Charlie Hebdo attack using data from the European Social Survey, that there “is no evidence of average impacts across a range of issues, from xenophobia to ideological self-placement and immigration policy preferences”. In other words, no impact of the terrorist attack on different outcomes. Since several of the studies above found statistically significant effects of the attack on different outcomes, it is important to consider whether there are reasons to be concerned about false positives in at least some of the studies outlined above. That is, are researchers more likely to focus on the outcomes, moderators, countries, periods (a few days or several weeks before and after the attack?) and model specifications that show a statistically significant effect of the Charlie Hebdo attack on public opinion? Or are studies that show significant effects of salient events more likely to end up being published?
Another reason why the study by Silva (2018) is relevant is that it shows somewhat similar findings when studying the impact of the Paris shootings on November 13, 2015 (also called the Bataclan attack), using data from Eurobarometer. That is, the main findings replicate for another terrorist attack in France in 2015. Unsurprisingly, as with Charlie Hebdo and the European Social Survey, several studies have used the Paris shootings and data from Eurobarometer as a natural experiment.
Coupe (2017) uses Eurobarometer data to explore the impact of the November 13 attacks in Paris on expectations, trust and happiness. Ferrín et al. (2020) use the data to estimate the impact of terrorist attacks on Europeans’ attitudes towards immigrants. Nussio et al. (2019) look at the impact on attitudes toward migrants and refugees. And Nussio et al. (2021) look at whether terrorist attacks make people more likely to pick terrorism as the most important issue. My concerns mentioned above are also relevant here.
Notably, we also have a few studies looking at the November 13 attacks using data sources other than Eurobarometer. Breton and Eady (2022), for example, look at the impact on attitudes toward Syrian refugees in Canada. And van Hauwaert and Huber (2020) look at anti-immigrant opinions, immigration salience, political polarisation, social cohesion, societal integration, and political trust. However, despite not using the same data, these studies still use the same natural experiment.
Again, the more of these studies we see using the same dataset and the same natural experiment, but with different theories and outcomes of interest, the more we face a tragedy of the commons. I do not believe any additional study using the above-mentioned cases as natural experiments adds a lot to our understanding of how terrorist attacks matter for public opinion. This is not to say that these studies are bad on their own (or even when aggregated), but they do not add up in the manner we would like to see if we aspire to cumulative science.
My specific concern is that when we begin to compare these studies, we will see that they cannot all be true (i.e., several of these studies will draw conclusions that are not robust). For example, when using such natural experiments, we often face a trade-off between statistical power and confounding (see, for example, this great paper on causal exaggeration), and my concern is that several of these studies will not replicate well within a comprehensive framework (i.e., a robust framework that tries to replicate all of the individual findings in the different studies).
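To make the power-versus-confounding trade-off concrete, here is a small extension of the simulated sketch above (it reuses `ess`, `attack_date`, and `smf` from that setup). Re-estimating the effect across windows of different widths shows how narrow windows limit confounding from unrelated trends but leave few respondents and noisy estimates, while wide windows buy precision at the price of comparability:

```python
# Re-estimate the effect for windows of different widths around the attack
# (reuses `ess`, `attack_date` and `smf` from the simulated sketch above)
for days in (3, 7, 14, 30, 60):
    window = ess[ess["interview_date"].between(
        attack_date - pd.Timedelta(days=days),
        attack_date + pd.Timedelta(days=days),
    )]
    fit = smf.ols("stfgov ~ after", data=window).fit()
    print(f"±{days:>2} days: n={len(window):>4}, "
          f"effect={fit.params['after']:+.2f} (se={fit.bse['after']:.2f})")
```

In real data, the different window choices across the studies listed above are exactly the kind of researcher degree of freedom that makes their findings hard to compare.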
Again, natural experiments often rely on data collected for other purposes. Such data often comes with a rich set of variables, including various potential outcomes, covariates, placebo tests, etc. The repeated use of the same natural experiment with the same data might well be a tragedy of the commons, and I would like to see more work addressing these concerns.