How effective is nudging? #3 – Erik Gahner Larsen

In a previous post, I discussed a study by Mertens et al. (2022) showing that the average effect size of nudging is large (i.e., a Cohen's d of 0.45). In that post, I identified several limitations in the study that led me to conclude that the average effect size it reported was most likely overestimated and not representative of the effect sizes we should, all else equal, expect.

The article has since been revised with a series of corrections. In other words, the paper has been changed, and the original replication material is no longer available. You can instead find the updated replication material here (if you follow the link in my previous post, you can confirm that the original replication data is no longer publicly available).

The revised replication material includes a file describing how 50 effect sizes have been updated. A few effect sizes have been removed because they came from a retracted study. However, most of the changes are minor and, from what I can tell, do not matter for any of the conclusions presented in the article. Notably, the problematic effect sizes I informed the corresponding authors about (cf. my previous post) have also been updated, so I assume they were errors, although the authors never got back to me (which is totally fine; I assume they are busy people).

It is great to see the authors do whatever is needed to set the record straight. We need to make sure that the data available in the academic literature is valid and reliable, and one can only wish that other researchers did the same. However, I would have preferred that they kept the original URL to the replication data and used proper version control to keep track of the changes.

Despite the changes, there are still concerns about how large we should expect the (average) effect size to be. Unsurprisingly, I am not the only one who has been saying that the meta-analytically derived effect size is most likely overestimated. In PNAS, the journal that published the original study, you can now find three comments that make more or less the same point. Bakdash and Marusich (2022) conclude that 'nudges have more limited than general effectiveness'. Szaszi et al. (2022) conclude that there is 'no reason to expect large and consistent effects when designing nudge experiments or running interventions'. Finally, Maier et al. (2022) find that 'no evidence remains for the mean effect after accounting for publication bias'. All three comments use the data made available by Mertens et al. (2022) to reach these conclusions.

The core argument in these comments is that the nudging literature suffers from publication bias. In other words, the estimates of nudge effects available in the academic literature, as captured by the studies in the meta-analysis, are not representative of the population of nudge effect sizes. In a reply, the authors of the original meta-analysis acknowledge that publication bias is an issue in the literature. Stuart Ritchie also provides a good overview of the three responses to the original meta-analysis and of the serious issues with publication bias in this literature. The three comments are all good and, as a rule of thumb, comments on articles in PNAS are of much higher quality than the studies published in the journal.
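To make the mechanism concrete, here is a minimal simulation sketch of how publication bias inflates an average effect size. It is not a reanalysis of the Mertens et al. (2022) data and does not reproduce the methods used in the three comments. All numbers are hypothetical assumptions chosen for illustration: a small true effect (d = 0.1), 50 participants per group, and a literature in which only studies with a positive, statistically significant result (using a normal approximation, z > 1.96) get published. The function names `cohens_d` and `simulate_literature` are illustrative, not from any package.

```python
import math
import random
import statistics


def cohens_d(treatment, control):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n1, n2 = len(treatment), len(control)
    v1 = statistics.variance(treatment)
    v2 = statistics.variance(control)
    pooled_sd = math.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (statistics.mean(treatment) - statistics.mean(control)) / pooled_sd


def simulate_literature(true_d=0.1, n_per_group=50, n_studies=2000, seed=1):
    """Return (all_effects, published_effects) for a hypothetical literature.

    Assumption: a study is 'published' only if its effect is positive and
    significant, using a normal approximation to the t-test (z > 1.96).
    """
    rng = random.Random(seed)
    all_effects, published = [], []
    for _ in range(n_studies):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        treatment = [rng.gauss(true_d, 1.0) for _ in range(n_per_group)]
        d = cohens_d(treatment, control)
        # With two equally sized groups, the t statistic equals d * sqrt(n / 2).
        z = d * math.sqrt(n_per_group / 2)
        all_effects.append(d)
        if z > 1.96:
            published.append(d)
    return all_effects, published


all_effects, published = simulate_literature()
print(f"Mean d, all studies:       {statistics.mean(all_effects):.2f}")
print(f"Mean d, published studies: {statistics.mean(published):.2f}")
print(f"Share published:           {len(published) / len(all_effects):.0%}")
```

Under these assumptions, only a small share of studies clears the significance filter, and those that do have effect sizes several times larger than the true effect. A meta-analysis that only sees the published studies would therefore overestimate the average effect, even though every individual study was conducted honestly.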

I spoke to a journalist from the Danish outlet Zetland the other day and gave my two cents on the topic. You can find the article where I am quoted here (in Danish). In brief, I say that the original meta-analysis painted an overly optimistic picture of how effective nudging is, but that we should not simply conclude that the impact of nudging is indistinguishable from zero. We also discussed various related topics, some of which I will save for another post.

That being said, a few points are worth elaborating on. We know that the replication crisis is a major problem across the social sciences, so why should we expect the literature on nudging to be any different? In fact, I would be more surprised if there were no issues related to publication bias in this literature.

What I do find problematic is the need to take a series of different interventions with different theoretical mechanisms and simply call them all 'nudges'. I am not convinced that it makes much sense to talk about the 'average' effect of nudging, and it is no surprise that the only thing people seem to agree on here is that we should expect heterogeneity in the effect sizes.

We know that some nudges work well, and that some nudges work only in some contexts and for some people. In a new study, Stango and Zinman (2022) find substantial cross-person heterogeneity in biases, so there is no reason to expect that a nudge will have the same effect on different people in different contexts. Put simply, nudges can be very difficult to scale.

Behavioural economics is all about identifying biases that can explain why nudges are effective. Little did we know that the most influential bias, explaining a lot of the effects in the literature, would turn out to be publication bias. However, what the literature needs now is not another meta-analysis or another attempt to provide a more reliable estimate of the average effect size. Instead, I see a need for a much stronger theoretical foundation (see this post by Jason Collins for some references to relevant material on this point). Specifically, nudging, and behavioural economics more generally, pays too much attention to specific biases rather than providing strong theoretical explanations that can help us understand behaviour across contexts and people.

While we know that nudges can work, there is also an important question to ask about the baseline: nudges might work, but compared to what? Benartzi et al. (2017), for example, found that nudge interventions are often more effective than traditional interventions (such as tax incentives). However, many studies on nudging do not compare a nudge intervention to other types of interventions. Instead, what is most often tested is the impact of a nudge compared to another nudge or to a control group with no intervention at all. To get a more nuanced understanding of how effective nudges are, we need better and stronger tests of nudges vis-à-vis other policy tools.

In sum, we can conclude that nudging is not as effective as we might have thought if we only consulted the published studies and the meta-analyses aggregating the effects available in the literature. However, we should not conclude that nudging does not work. So let me end this post with a very academic conclusion: additional theoretical and empirical work is needed.