Since the publication of Nudge: Improving Decisions about Health, Wealth, and Happiness, thousands of studies have examined how different nudges can change attitudes and behaviour. The core proposition of the book was that nudging is an effective way to change behaviour, and for that reason, the book is packed with examples of how nudges can lead to better outcomes than traditional, paternalistic policies.
However, not all nudges are effective. We should not see the evidence presented in Nudge as representative of the broader population of nudge findings. As with much of social science these days, we know that many findings do not replicate and that the expected effect in most studies is close to zero (both in terms of statistical and practical significance).
In a recent paper, “How effective is nudging? A quantitative review on the effect sizes and limits of empirical nudging studies”, the authors conclude that “only 62% of nudging treatments are statistically significant”. Is this a small number? Compared to findings in other literatures, especially within social psychology, a 62% rate of statistically significant treatments is quite impressive.
My primary issue with this study – and similar studies interested in whether nudges work – is that there is no agreement in the literature on when a treatment should be categorized as a nudge. For that reason, I am not convinced that a quantitative review like the one above can provide a meaningful estimate of the proportion of nudges that are statistically significant (even if we only look at the published literature). Specifically, consider the criteria used in the literature review to select relevant studies: “we did not include studies that did not mention, ‘nudge’ or ‘nudging’, that did not quote the original work (Thaler and Sunstein, 2008), or that had no other link to the nudge concept.”
My guess is that nudging studies with statistically significant findings are more likely to cite the original work by Thaler and Sunstein. Or, more importantly, interventions that do show statistically significant effects are more likely to be called nudges in the first place. To understand why, let us take a look at the definition of a nudge offered in Nudge:
“A nudge, as we will use the term, is any aspect of the choice architecture that alters people’s behavior in a predictable way without forbidding any options or significantly changing their economic incentives.”
In the definition of “a nudge”, it is essential that an aspect of the choice architecture alters people’s behavior. In other words, if an aspect of the choice architecture has no effect, it is not a nudge. This is a definition that is set up to succeed. If an intervention works, it is a nudge. If not, it is not a nudge. Conceptually, it makes no sense to say that x% of nudges work, as 100% of nudges will have an impact; if they do not, they are not nudges. Heads I win, tails you lose.
With this definition in mind, I am surprised that 38% of the effects are not statistically significant. However, I think I know why the number is that high. The review looks at 100 studies including 317 effect sizes. Not all effect sizes are created equal, and most studies have one key effect that is more important than the rest. If a study reports two effects, one significant and one insignificant, the effect presented as the most important one (in this case the nudge) will more often be the statistically significant effect. In other words, among the effects actually framed as nudges in the studies examined in the review, more than 62% will “work”. The authors do not provide a lot of information on this, but they do basically confirm the pattern: “Occasionally, statistically insignificant effects are reported to be insignificant in the discussion section of the primary publications.”
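To see how this kind of selection can inflate the headline number, here is a minimal sketch in Python. The sample sizes and effect sizes are my own hypothetical choices, not numbers from the review: if every study reports two effects and the effect presented as “the nudge” tends to be the statistically strongest one, the share of significant results among the labelled nudges will exceed the share among all reported effects.

```python
# Toy simulation of "key effect" selection (illustrative numbers only).
import numpy as np

rng = np.random.default_rng(1)
n_studies = 10_000
n_per_arm = 250          # hypothetical sample size per arm
true_effect = 0.2        # hypothetical standardized effect for both outcomes

# z-statistics for two reported effects per study (two-sample comparison,
# unit variance, so the standard error of the difference is sqrt(2/n)).
se = np.sqrt(2 / n_per_arm)
z = rng.normal(loc=true_effect / se, scale=1.0, size=(n_studies, 2))
significant = np.abs(z) > 1.96

share_all = significant.mean()
# "Key" effect = the one with the larger |z|, mimicking the tendency to
# present the statistically strongest result as the nudge.
key = np.abs(z).argmax(axis=1)
share_key = significant[np.arange(n_studies), key].mean()

print(f"Significant among all reported effects:  {share_all:.0%}")
print(f"Significant among 'key' (nudge) effects: {share_key:.0%}")
```

Under these assumptions, the share of significant results among all reported effects sits around the review’s 62%, while the share among the “key” effects is substantially higher, which is exactly the gap my argument relies on.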
For those reasons, and if we set aside my conceptual concerns for a second, I fully agree with the authors when they conclude “that the findings of this study have to be interpreted with great care and rather represent an upper bound of the effectiveness of nudging.”
The authors are aware of the obvious publication bias: “Moreover, we might be victim of a possible publication bias as many studies with insignificant results are often not published.” I am surprised, however, that they do not discuss the simple fact that they only look at 100 studies. This might seem impressive, but if you have followed the field since 2008, you know that far more than 100 nudging studies were conducted between 2008 and 2018. For example, the Behavioural Insights Team in the UK (i.e., a single research team) conducted more than 300 experiments over a six-year period (see Maynard and Munafò 2018).
To better understand this problem, consider the paper “RCTs to Scale: Comprehensive Evidence from Two Nudge Units”, which compares all effects from two Nudge Units in the United States to those of published nudging studies. The finding is as unsurprising as it is depressing: there is a huge discrepancy, with published studies reporting much larger effects and having much lower statistical power. This discrepancy can primarily be explained by publication bias.
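To make the mechanism concrete, here is a minimal Python sketch of publication bias under hypothetical numbers (the effect sizes and standard errors are my own illustrative choices, not estimates from the paper): when only the statistically significant estimates from small, noisy studies get published, the published average ends up several times larger than the true effect that large in-house trials recover.

```python
# Toy simulation of publication bias (illustrative numbers only).
import numpy as np

rng = np.random.default_rng(7)
true_effect = 1.5        # hypothetical true lift in percentage points
n_trials = 5_000

# Academic-style studies: small samples, noisy estimates, published if p < .05
se_small = 2.0
est_small = rng.normal(true_effect, se_small, n_trials)
published = est_small[np.abs(est_small / se_small) > 1.96]

# Nudge Unit-style trials: large samples, and every result is recorded
se_large = 0.3
est_large = rng.normal(true_effect, se_large, n_trials)

print(f"True effect:                    {true_effect:.1f} pp")
print(f"Mean published (small) effect:  {published.mean():.1f} pp "
      f"({len(published) / n_trials:.0%} of small studies published)")
print(f"Mean Nudge Unit (large) effect: {est_large.mean():.1f} pp")
```

In this sketch, the published average is several times the true effect simply because the selection on significance filters out the modest estimates, while the large, fully reported trials recover the true effect almost exactly, which is the qualitative pattern the paper documents.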
The point of this post is not to say that nudging is not effective. However, we should be much more aware of the conceptual and methodological challenges in providing reliable answers to how effective nudges are, and especially to how effective nudging is relative to other solutions (see Benartzi et al. 2017 for a similar point).