In my previous post on nudging, I discussed a series of critical comments published in PNAS on the meta-analysis demonstrating a substantial average effect of nudging. The key question of interest is whether it makes sense to conduct a meta-analysis of the effectiveness of nudging and, if so, what the average effect size is. In other words, all else equal, what should be our best estimate of the effect of a nudge?
I read a blog post on the meta-analysis with a focus on its inclusion/exclusion criteria. It is a good blog post, and it made me think more about the 447 effect sizes from the 212 publications included in the meta-analysis. Specifically, it made me consider the number of studies on nudging. 212 publications? Nudging has been very popular, especially within the last 10 to 15 years. How many studies do we actually have on nudging? That is, what literature are we actually talking about when we talk about nudging? We should be looking at thousands of studies, not 212 or any other number in the low hundreds. Before we begin to estimate the average effect of nudges, I would be just as interested in estimating the actual number of relevant studies examining their impact.
What is the population of interest? How many nudges are we potentially looking at? If we were to sample a nudge at random from the population of nudges, what would its effect be? Of course, that effect would not be the same as the average effect of nudges in the scientific literature, but I find these questions relevant to consider before we even begin to think about the relevance of the meta-analysis. Only when we understand these issues can we begin to consider the relevant selection criteria for the meta-analysis and identify the studies of interest. Again, that number should definitely be greater than 212, and I have no confidence that the 212 selected studies are representative of most nudges being studied, representative of most nudging studies being published, or a random sample of nudges from the ‘population of nudges’. My guess is that 212 is a better estimate of the number of unique cognitive biases behavioral economists are interested in than of the number of relevant studies.
In addition, how many of these studies are published? How many effect sizes in these studies are related to nudges? How many of these studies are explicitly about nudging/nudges? My concern is not only that the 212 studies are the tip of the iceberg, but that we are dealing with so many different concepts and so much heterogeneity in the literature that it is not even possible to find the ‘true’ number of empirical studies on nudging.
Importantly, I am not the only one to have these concerns. In a series of blog posts, Uri Simonsohn, Leif Nelson and Joe Simmons talk about how ‘the averaging of incommensurable results’ is doomed to fail (with nudging being a good example). They are not against meta-analyses per se, but they argue that meta-analyses are primarily useful when the studies going into such analyses have identical manipulations and measures. This is by no means the case when you consider any review or meta-analysis of the literature on nudging.
For example, as they show in a post, when you look at the different studies going into the nudging meta-analysis, such as the different “reminder” effects, you will see that these studies are very different and it is a weird exercise to try to calculate the average effect on the basis of these studies. Here is their conclusion: “In sum, we believe that many nudges undoubtedly exert real and meaningful effects on behavior. But you won’t learn that – or which ones – by computing a bunch of averages, or adjusting those averages for publication bias. Instead, you have to read the studies and do some thinking.”
And in the most recent follow-up post, they read three specific studies related to the same domain: “While these three studies are loosely speaking related to “the environment”, it’s unclear to us how to decipher the meaning of the mean that combines the effect of (1) telling people all bananas cost the same on the share of eco bananas purchased, (2) telling households a researcher is coming to check their stickers on placing said stickers, and (3) defaulting academics into a CO2 fee on paying that fee.”
In other words, even when we compare nudges within the same subject matter, it can be extremely difficult, if not impossible, to say anything meaningful about the average effect of nudging. While it makes sense to talk about the usual suspects when evaluating meta-analyses, such as publication bias, there are more fundamental issues with such analyses. This is similar to the point I made in my first post on the effectiveness of nudging, namely that the very definition of nudging makes it difficult to conduct a meta-analysis in the first place, and we should be critical towards any estimate that tries to shed light on how effective nudges are on average.
However, to be fair, a lot of these points are not only relevant for the literature on nudging. Meta-analyses are not perfect (for a good and balanced introduction to meta-analysis, see this chapter), and we know that meta-analytic effect sizes are, on average, larger than replication effect sizes (Kvarven et al. 2019). We also know that publication selection bias affects meta-analyses across different fields. Recently, Bartoš et al. (2022) concluded that: “The median probability of the presence of an effect in economics decreased from 99.9% to 29.7% after adjusting for publication selection bias. This reduction was slightly lower in psychology (98.9% → 55.7%) and considerably lower in medicine (38.0% → 27.5%).”
Furthermore, one should be cautious about looking at (asymmetric) funnel plots in a meta-analysis and drawing conclusions about publication bias. Specifically, people might overestimate the extent of publication bias if researchers choose their sample sizes with a specific effect size in mind (which seems plausible). For more on this point, read this great Twitter thread by Will Fithian.
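To make the point concrete, here is a minimal simulation sketch of my own (not taken from the thread). It assumes that researchers power each study for the effect size they expect, so studies of larger effects end up with smaller samples, and that nothing is left unpublished. The resulting positive correlation between observed effects and their standard errors is exactly the asymmetry a funnel plot would display.

```python
import numpy as np

rng = np.random.default_rng(1)
n_studies = 500

# Heterogeneous true effects (standardized mean differences), all positive.
true_d = rng.normal(loc=0.3, scale=0.15, size=n_studies).clip(min=0.05)

# Hypothetical design rule: each study is powered (~80%) for the effect the
# researcher expects, using the rule of thumb n per arm ≈ 16 / d^2.
n_per_arm = np.ceil(16 / true_d**2).astype(int)

# Approximate standard error of Cohen's d for a two-arm comparison.
se = np.sqrt(2 / n_per_arm)
observed_d = rng.normal(true_d, se)

# A positive correlation between observed effects and their standard errors
# is the asymmetry a funnel plot would show, with no publication bias at all.
print(np.corrcoef(observed_d, se)[0, 1])
```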
We should know by now that the methodological decisions made in a meta-analysis matter a great deal for its conclusions, and that two meta-analyses of the same question can reach different conclusions (for a good example involving meta-analyses of the effectiveness of a growth mindset, see this Twitter thread). In this particular case, where we have significant concerns about the methodology, I see no reason to believe any of the conclusions.
Even if we set all of these issues aside and assume for a second that the meta-analysis is great, how should we begin to make sense of Cohen’s d and the like? What conclusions should we draw? Should we set a threshold below which we would be less interested in nudging? I am not convinced. I recently read a piece by Grice et al. (2020) on person-centered effect sizes, and I would like to see more work trying to understand how many people in these different studies behave (or respond) in a manner that is consistent with the theoretical expectations. My point here is that I believe any meta-analysis on nudging also suffers from the lack of strong theories and too much focus on individual tests of cognitive biases, counterintuitive/surprising findings, etc.
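As a rough illustration of what a more person-centered reading could look like (my own sketch, not the method used by Grice et al., and assuming normal distributions with equal variances), a group-level Cohen’s d can be translated into statements about individuals, such as the share of treated people who score above the control-group mean. The value of d below is purely illustrative.

```python
from scipy.stats import norm

d = 0.43  # illustrative value; plug in whatever average estimate you want to read

# Share of treated individuals scoring above the control-group mean.
above_control_mean = norm.cdf(d)

# Probability that a randomly drawn treated person outscores a randomly
# drawn control person (the common-language effect size).
prob_superiority = norm.cdf(d / 2**0.5)

print(f"{above_control_mean:.0%} of treated individuals score above the control mean")
print(f"{prob_superiority:.0%} probability of superiority")
```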
How effective is nudging? I don’t know, but I believe we are closer to getting started on answering this question than to drawing any final conclusions, and before we collect and analyse the studies of (potential) interest, we should consider a series of conceptual and methodological challenges.