Causality models: Campbell, Rubin and Pearl

In political science, the predominant way to discuss causality is in relation to experiments and counterfactuals (within the potential outcomes framework). However, we also use concepts such as internal and external validity and sometimes we use arrows to show how different concepts are connected. When I was introduced to causality, it was on a PowerPoint slide with the symbol X, a rightwards arrow, and the symbol Y, together with a few bullet points on the specific criteria that should be met before we can say that a relationship is causal (inspired by John Gerring’s criterial approach; see, e.g., Gerring 2005).

Importantly, there are multiple models we can consider when we want to discuss causality. In brief, there are three popular causality models today: 1) the Campbell model (focusing on threats to validity), 2) the Rubin model (focusing on potential outcomes), and 3) the Pearl model (focusing on directed acyclic graphs). The models are named after the researchers who have been instrumental in their development (Donald Campbell, Donald Rubin and Judea Pearl). I believe a good understanding of these three models is a prerequisite for discussing causal inference within quantitative social science.

Luckily, we have good introductions to the three frameworks that compare the main similarities and differences. The special issue introduced by Maxwell (2010) focuses on two of the frameworks, namely the frameworks related to Campbell and Rubin. What is great about the special issue is that it focuses on important differences between the two frameworks but also on how the two frameworks are complementary. That being said, it does not pay a lot of attention to Pearl’s framework. Shadish (2010) and West and Thoemmes (2010) provide comparisons of the work by Campbell and Rubin on causal inference. Rubin (2010) and Imbens (2010) further provide some additional reflections on the causal models from their own perspectives.

The best primer to understand the three frameworks is the book chapter by Shadish and Sullivan (2012). They make it clear that all three models of causality acknowledge the importance of manipulable causes and bring an experimental terminology into observational research. In addition, they highlight the importance of assumptions (as causal inference without assumptions is impossible). Unfortunately, they do not summarise the key similarities and differences between the models in a table. For that reason, I decided to create the table below to provide a brief overview of the three models. Keep in mind that the table provides a simplified comparison and there are important nuances that you will only fully understand by consulting the relevant literature.

| | Campbell | Rubin | Pearl |
|---|---|---|---|
| Core | Validity typology and the associated threats to validity | Precise conceptualization of causal inference | Directed acyclic graphs (DAGs) |
| Goal | Create a generalized causal theory | Define an effect clearly and precisely | State the conditions under which a given DAG can support a causal inference |
| Fields of development | Psychology | Statistics, program evaluation | Artificial intelligence, machine learning |
| Examples of main concepts | Internal validity, external validity, statistical conclusion validity, construct validity | Potential outcomes, causal effect, stable-unit-treatment-value assumption | Node, edge, collider, d-separation, back-door criterion, do(x) operator |
| Definition of effect | Difference between counterfactuals | Difference between potential outcomes | The space of probability distributions on Y using the do(x) operator |
| Causal generalisation | Meta-analysis, construct and external validity | Response surface analysis, mediational modeling | Specified within the DAG |
| Assumption for valid inference in observational research | Ruled out all threats to validity | Strong ignorability | Correct DAG |
| Examples of application | Quasi-experiments | Missing data imputation, propensity scores | Mediational paths |
| Conceptual and philosophical scope | Wide-ranging | Narrow, formal statistical model | Narrow, formal statistical model |
| Emphasis | Descriptive causation | Descriptive causation | Explanatory causation |
| Preference for randomized experiments | Yes | Yes | No |
| Focus on effect or mechanism | Effect | Effect | Mechanism |
| Limitation | General lack of quantification, no formal statistical model (lacks analytic sophistication) | Limited focus on features of research designs with observational data | Vulnerability to misspecification |

The Campbell model focuses on validity, i.e., the quality of the conclusions you can make based on your research. The four types of validity to consider here are: 1) statistical conclusion validity, 2) internal validity, 3) construct validity, and 4) external validity. Most important for the causal model is internal validity, that is, the extent to which the research design identifies a causal relationship. External validity refers to the extent to which we can generalise the causal relationship to other populations/contexts. I believe one of the key advantages here is the comprehensive list of potential threats to validity listed in this work. Some of these potential threats are more relevant for specific designs or results, and being familiar with these potential threats will make you a much more critical (and thereby better) researcher. The best comprehensive introduction to the Campbell model is Shadish et al. (2002).

The Rubin model focuses on potential outcomes and how units have potential outcomes in different conditions (most often with and without a binary treatment). For example, Y(1) is an array of potential outcomes under treatment 1 and Y(0) is an array of potential outcomes under treatment 0. This is especially useful when considering an experiment and how randomisation can realise one potential outcome for a unit that can, in combination with other units, be used to calculate the average treatment effect (as we cannot estimate individual-level causal effects). To solve the fundamental problem of causal inference (that we can only observe each unit in one world, i.e., under one treatment condition) we would need a time machine, and in the absence of such science fiction tools, we are left with the importance of the assignment mechanism for causal inference (to estimate effects such as ATE, LATE, PATE, ATT, ATC, and ITT). One of the key advantages of this model is that it helps us understand how potential outcomes are turned into one realised outcome and the assumptions we rely on. For example, the Stable Unit Treatment Value Assumption (SUTVA) implies that potential outcomes for one unit are unaffected by the treatment of another unit. This emphasises the importance of minimising the interference between units. The best comprehensive introduction to the Rubin model is Imbens and Rubin (2015).
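To make the logic of potential outcomes a bit more concrete, here is a minimal simulation sketch in Python. It is an illustration under stated assumptions, not taken from any of the works cited: the function name `simulate_experiment`, the constant treatment effect of 2, the standard normal Y(0), and the 50/50 random assignment are all made up for the example. The point is only that each unit has both Y(0) and Y(1), that random assignment reveals one of them, and that the difference in means between the groups then recovers the average treatment effect.

```python
import random
import statistics


def simulate_experiment(n_units=10_000, true_effect=2.0, seed=42):
    """Simulate potential outcomes and a randomised experiment.

    Returns (true_ate, estimated_ate). All parameter values are
    illustrative assumptions, not estimates from real data.
    """
    rng = random.Random(seed)

    # Every unit has two potential outcomes: Y(0) and Y(1).
    y0 = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    y1 = [y + true_effect for y in y0]  # assumed constant unit-level effect

    # The assignment mechanism: a fair coin flip per unit.
    treated = [rng.random() < 0.5 for _ in range(n_units)]

    # Only one potential outcome per unit is ever realised and observed.
    observed = [y1[i] if treated[i] else y0[i] for i in range(n_units)]

    # The true ATE is only knowable here because we simulated both outcomes.
    true_ate = statistics.fmean(y1[i] - y0[i] for i in range(n_units))

    # Difference-in-means estimator using observed outcomes only.
    mean_treated = statistics.fmean(o for o, t in zip(observed, treated) if t)
    mean_control = statistics.fmean(o for o, t in zip(observed, treated) if not t)
    return true_ate, mean_treated - mean_control


if __name__ == "__main__":
    true_ate, estimate = simulate_experiment()
    print(f"true ATE: {true_ate:.3f}, difference in means: {estimate:.3f}")
```

With these settings the estimate should land close to the assumed effect of 2, even though no individual-level effect is ever observed. Note that the simulation builds in SUTVA by construction: no unit's outcome depends on another unit's assignment.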

The Pearl model provides causal identification through directed acyclic graphs (DAGs), i.e., how conditioning on a variable along a path blocks the path, and how specific effects need to be restricted in order to make causal inferences. When working with this model of causality, you are often working with multiple paths and not a simple setup where you only have two groups, one outcome and a single treatment. DAGs can also be understood as non-parametric structural equation models, and are particularly useful when working with conditional probabilities and Bayesian networks/graphical models.
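As a worked formula, consider the standard back-door adjustment. The notation is an assumption for illustration: X is the treatment, Y the outcome, and Z a set of observed variables that satisfies the back-door criterion in the assumed DAG (Z blocks every path between X and Y that enters X through an arrow, and no variable in Z is a descendant of X). Under those conditions, and only if the DAG is correct, the interventional distribution can be written in terms of observable quantities:

```latex
P\bigl(y \mid do(x)\bigr) \;=\; \sum_{z} P(y \mid x, z)\, P(z)
```

In words: condition on Z within each stratum to block the back-door paths, then average over the population distribution of Z. Without the do(x) operator, the naive conditional P(y | x) would instead weight the strata by P(z | x), which is where confounding enters.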

One of the main advantages of the Pearl model is that it forces you to think much more carefully about your causal model, including what not to control for. For that reason, the model is much better geared to causal inference in complicated settings than, say, the Rubin model.
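The point about what not to control for can be illustrated with a collider. The sketch below is a toy simulation under assumed settings (the names `pearson` and `collider_demo`, the normal distributions, the noise level and the selection threshold are all hypothetical choices): X and Y are generated independently, both cause C (X → C ← Y), and conditioning on C, here by keeping only units with a high value of C, creates an association between X and Y that does not exist in the full data.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient for two equally long lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def collider_demo(n_units=20_000, threshold=1.0, seed=1):
    """Return (correlation in full data, correlation after selecting on C)."""
    rng = random.Random(seed)

    # X and Y are independent by construction: no causal link between them.
    x = [rng.gauss(0.0, 1.0) for _ in range(n_units)]
    y = [rng.gauss(0.0, 1.0) for _ in range(n_units)]

    # C is a collider: it is caused by both X and Y (plus noise).
    c = [xi + yi + rng.gauss(0.0, 0.5) for xi, yi in zip(x, y)]

    r_all = pearson(x, y)

    # Conditioning on the collider by keeping only units with high C.
    selected = [(xi, yi) for xi, yi, ci in zip(x, y, c) if ci > threshold]
    r_selected = pearson([s[0] for s in selected], [s[1] for s in selected])
    return r_all, r_selected


if __name__ == "__main__":
    r_all, r_selected = collider_demo()
    print(f"corr(X, Y) in full data:       {r_all:.3f}")
    print(f"corr(X, Y) among high-C units: {r_selected:.3f}")
```

In the full data the correlation should be close to zero, while in the selected subsample it becomes clearly negative. Selecting on C is only one way of conditioning; including C as a control variable in a regression would open the same path.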

However, there are also some noteworthy limitations. Interactions and effect heterogeneity are only implicit in the model, and it can be difficult to convey such ideas (whereas it is easier to consider conditional average treatment effects in the Rubin model). While DAGs are helpful for understanding complex causal models, they are often less helpful when we have to consider the parametric assumptions we need to estimate causal effects in practice.

The best introduction to the Pearl model is, surprisingly, not the work by Pearl himself (although I did enjoy The Book of Why). As a political scientist (or a social scientist more generally), I find introductions such as Morgan and Winship (2014), Elwert (2013), Elwert and Winship (2014), Dablander (2020), and Rohrer (2018) much more accessible.

(For Danish readers, you can also check out my lecture slides from 2016 on the Rubin model, the Campbell model and the Pearl model. I also made a different version of the table presented above in Danish that you can find here.)

In political science, researchers have mostly relied on the work by Rubin and Campbell, and less so on the work by Pearl. However, recently we have seen some good work that relies on the insights provided by DAGs. Great examples include the work on racially biased policing in the U.S. (see Knox et al. 2020) and the work on estimating controlled direct effects (Acharya et al. 2016).

Imbens (2020) provides a good and critical discussion of DAGs in relation to the Rubin model (in favour of potential outcomes over DAGs as the preferred model of causality within the social sciences). Matthay and Glymour (2020) show how the threats to internal, external, construct and statistical conclusion validity can be presented as DAGs. Lundberg et al. (2021) show how both potential outcomes and DAGs can be used to outline the identification assumptions linking a theoretical estimand to an empirical estimand. This is amazing work and everybody with an interest in strong causal inference connecting statistical evidence to theory should read it.

My opinionated take is that the three models work well together but not necessarily at the same time when thinking about theories, research designs and data. Specifically, I prefer Pearl → Rubin → Campbell. First, use Pearl to outline the causal model (with a particular focus on what not to include). Second, use Rubin to focus on the causal estimand of interest and consider different estimators and assumptions (SITA/SUTVA). Third, use Campbell to discuss threats to validity, measurement error, etc.

In sum, the three models are all good to be familiar with if you do quantitative (and even qualitative) social science.