Statistical issues

Here is a collection of statistical issues and misunderstandings you will often encounter in empirical research. My plan is to add more examples in the future.

– Absence of evidence fallacy: Absence of evidence for a finding should not be interpreted as evidence of its absence (Altman and Bland 1995).
– Berkson’s paradox: Conditioning on a variable can create a spurious correlation (i.e., collider bias, conditioning on a collider) (Berkson 1946).
– Cronbach’s alpha: The coefficient is often misunderstood; there is no particular level of alpha that is desired or adequate (Hoekstra et al. 2018).
– Garbage can regression: Adding too many independent variables to your regression model (i.e., a kitchen-sink approach) (Achen 2004).
– Garden of forking paths: Researchers conduct multiple analyses but only end up reporting a subset of these (data-dependent analysis) (Gelman and Loken 2014).
– Moderation vs. mediation: A moderator is a variable that affects the direction and/or strength of the relation between two variables – not the same as a mediator, which transmits the effect (Baron and Kenny 1986).
– Multivariate vs. multivariable: A multivariate model is a model with multiple dependent variables; a multivariable model has multiple independent variables (Mustillo et al. 2018).
– p-value as a probability: The p-value is not the probability that the null hypothesis is true (Greenland et al. 2016).
– Prosecutor’s fallacy: Incorrectly assuming that Pr(A|B) = Pr(B|A) (Westreich et al. 2014).
– Simpson’s paradox: A trend in the data can disappear or reverse when looking at subgroups in the data (Simpson 1951).
– Spurious correlation: Two variables correlate but are not causally related (Simon 1954).
– Statistical power: The importance of having sufficient data to estimate the effect size of interest (Cohen 1992).
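Several of these issues are easy to demonstrate in a few lines of code. For instance, Simpson’s paradox – a trend reversing in subgroups – can be reproduced with a toy two-hospital example (all numbers are made up for illustration):

```python
# Simpson's paradox: an association that reverses once we split by subgroup.
# Hypothetical setup: Hospital A treats mostly severe cases, Hospital B mostly
# mild ones, and severe cases have lower success rates everywhere.

def rate(successes, total):
    return successes / total

a_mild = (18, 20)      # Hospital A, mild cases: 90% success
a_severe = (32, 80)    # Hospital A, severe cases: 40% success
b_mild = (64, 80)      # Hospital B, mild cases: 80% success
b_severe = (6, 20)     # Hospital B, severe cases: 30% success

# Within each severity group, Hospital A has the higher success rate...
assert rate(*a_mild) > rate(*b_mild)
assert rate(*a_severe) > rate(*b_severe)

# ...yet aggregating over severity reverses the ranking.
a_total = rate(a_mild[0] + a_severe[0], a_mild[1] + a_severe[1])  # 50/100
b_total = rate(b_mild[0] + b_severe[0], b_mild[1] + b_severe[1])  # 70/100
assert a_total < b_total
```

The reversal happens because case severity (the confounder) is distributed unevenly across hospitals, which is why subgroup analyses can tell a different story than the pooled data.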

Recommended readings

Kennedy, P. E. 2002. Sinning in the Basement: What are the Rules? The Ten Commandments of Applied Econometrics. Journal of Economic Surveys 16(4): 569-589.

Makin, T. R., and J. O. de Xivry. 2019. Science Forum: Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife 8:e48175.

Motulsky, H. J. 2014. Common Misconceptions about Data Analysis and Statistics. Journal of Pharmacology and Experimental Therapeutics 351(1): 200-205.

Schrodt, P. A. 2014. Seven deadly sins of contemporary quantitative political analysis. Journal of Peace Research 51(2): 287-300.

Opinion polls on Politologi.dk #3

In 2020, I made a series of major changes to my visualisations of the opinion polls on Politologi.dk. Since then, I have made a few small additions now and then, for example when new parties announce their arrival. Recently, however, I have also added some extra features to the site, which I go through here.

Opinion polls from different electoral terms

The overview on the front page is intended as a quick overview of how the parties have been doing in the polls over the most recent months. It is not intended as an overview of the entire electoral term or of previous terms; it simply shows the latest 75 polls.

Several people have requested that it be made easier to see more than the latest 75 polls. I have therefore created separate pages where you can easily get an overview of the polls from recent electoral terms (including the current one). Here, for example, is a figure with the polls from the 2011-2015 term:

I originally had the idea of also adding some dynamic elements, for example the option to choose which parties and which period you would like to see polls for. For this I experimented with plotly and ggplot2, but it quickly turned out that it would take up quite a lot of space. It is, however, one of the things I will try to add at some point in the future.

The red and blue blocs

Danish politics is, as is well known, about being able to count to 90. When the Social Democrats slide in the polls, it matters to them whether the votes stay within the red bloc or move to the blue bloc.

The media will also only focus more on the balance of power between the red and blue blocs as we get closer to an election, which is why we need to look at more than the support for the individual parties.

For that reason – and prompted by a request – I have added the support for the red and blue blocs. Here is an example of what it looks like:

This makes it easy to see how the two blocs stand relative to each other, but it is not a figure showing changes in the way the party figures do. There are two reasons for this. First, the statistical uncertainty is greater for the blocs than for the parties. Second, changes in the polls mostly happen within the blocs. When Nye Borgerlige gains or loses a lot, the votes are more likely to come from other blue parties than from red parties.
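The first point – greater uncertainty for blocs than for parties – follows directly from the binomial margin of error, which peaks at 50% support. A minimal Python sketch (the sample size of 1,000 respondents is an assumption, roughly typical for Danish polls):

```python
import math

def moe(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

n = 1000  # hypothetical poll size

# A bloc polling near 50% carries more uncertainty than a party near 25%,
# which in turn carries more than a small party near 5%.
assert moe(0.50, n) > moe(0.25, n) > moe(0.05, n)
print(round(moe(0.50, n), 3), round(moe(0.25, n), 3), round(moe(0.05, n), 3))
```

This is also why, in the poll tables further down, the margins shrink as party support gets smaller.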

Posts with the latest opinion polls

When a new poll is released, it is also covered in its own post. To create a table with the numbers from the latest poll, I use the fantastic gt package in R (see here for an overview of good resources for this package).

Here is an example with a poll from December 2021 conducted by Voxmeter:

| Party | Support (23 December) | Uncertainty (95% CI) | Result (GE ’19) | Difference (pp) |
| --- | --- | --- | --- | --- |
| Socialdemokratiet | 26.3% | ±2.7% | 25.9% | 0.4 |
| Venstre | 15.2% | ±2.2% | 23.4% | −8.2 |
| Konservative | 14.5% | ±2.2% | 6.6% | 7.9 |
| Enhedslisten | 9.1% | ±1.8% | 6.9% | 2.2 |
| SF | 8.6% | ±1.7% | 7.7% | 0.9 |
| Dansk Folkeparti | 6.9% | ±1.6% | 8.7% | −1.8 |
| Radikale Venstre | 6.5% | ±1.5% | 8.6% | −2.1 |
| Nye Borgerlige | 6.4% | ±1.5% | 2.4% | 4.0 |
| Liberal Alliance | 2.6% | ±1.0% | 2.3% | 0.3 |
| Kristendemokraterne | 1.6% | ±0.8% | 1.7% | −0.1 |
| Alternativet | 1.3% | ±0.7% | 3.0% | −1.7 |
| Frie Grønne | 0.2% | ±0.3% | – | – |
| Veganerpartiet | 0.0% | ±0.0% | – | – |

The idea is that the table can easily show how the parties stand relative to one another, with the largest parties shown first – and with just as much focus on the statistical uncertainty for each party. This layout also makes it easy to see how the statistical uncertainty is smaller for the small parties, and by following these posts you will quickly get a sense of the statistical uncertainty at different levels of support.

There is certainly room for improvement, and if you have suggestions or comments, I would love to hear from you.

New article in Journal of Political Science Education: Beyond the Numbers

Together with Gianna Maria Eick, Ben Baumberg Geiger and Trude Sundberg, I have an article in the new issue of Journal of Political Science Education. Here is the abstract:

A number of studies demonstrate that quantitative teaching provides social science students with analytical and critical skills. Accordingly, the skills acquired during quantitative teaching are assumed to enhance students’ progress in and after their degree. However, previous studies rely on subjective measures of students’ evaluations of their skills. So far, no prior studies have examined whether the skills obtained through quantitative teaching can be transferred to an overall better performance at university. In order to address this gap, we use high-quality administrative records to examine the impact of quantitative teaching on undergraduate students’ overall marks. The results show that students subject to additional quantitative teaching obtain significantly better marks throughout their studies. The evidence emphasizes the importance of methodological pluralism for social science students.

I presented the findings at an event at the British Academy in 2018, and it is great to finally see the paper in print. You can find it here.

Potpourri: Statistics #80

– Data Vis Dispatch: June 22, June 29, July 6, July 13, July 20, July 27, August 3, August 10, August 17, August 24, August 31, September 7, September 14, September 21, September 28, October 5, October 12, October 19, October 26, November 2, November 9, November 16, November 23, November 30, December 7, December 14
– A beginner’s guide to Shiny modules
– Fill the region between two lines in ggplot2
– Custom {ggplot2} point shapes with {gggrid}
– An Exploratory Introduction to the Plotly Package
– Lessons on ML Platforms — from Netflix, DoorDash, Spotify, and more
– Advanced Data Visualisation with R: R Graphics using grid, ggplot2 internals, Writing ggplot2 extensions, Overview of tools for interactive plots, Digging deeper into reactive elements in shiny, Web apps to deliver effective data visualisation
– A new dataviz+streaming project all about The Office!
– Top 21 #RStats tweets of 2021
– Survival Analysis in tidymodels
– Survival Analysis in Python
– Comparing Distributions
– The minimum post-stratification weight in a simple-random-sample equals the response rate
– R Markdown Lesser-Known Tips & Tricks #1: Working in the RStudio IDE
– Declutter and Focus: Empirically Evaluating Design Guidelines for Effective Data Communication
– Get coordinates from fictitious maps
– Efficient and beautiful data visualisation
– Making Waves in ggplot: An Rtistry Tutorial
– Learning to code
– What do you need to do to make a matching estimator convincing? Rhetorical vs statistical checks
– Obtaining consistent time series from Google Trends
– Tidy Data Tutor helps you visualize data analysis pipelines
– Linear Algebra Done Right
– The Science of Pie Charts – Why We Don’t Read them By Angle
– Estimating mood from existing surveys
– How not to be lost with VSCode when coming from RStudio?
– making maps with R
– Estimating correlations adjusted for group membership
– Animated map and lineplot with R
– Spatial Data Programming with Python
– The Open Handbook of Experience Sampling Methodology
– How to extract speeches held at Austria’s parliament
– Row-wise operations with the {tidyverse}
– Using deep learning to generate offensive license plates
– A brief tour of tabbycat
– 6 simple Shiny things I have learned from creating a somewhat small app
– Doing maps in R
– Clarity and Aesthetics in Data Visualization: Guidelines
– Winners of the 2021 Table Contest


Previous posts: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78 #79

How should the government cut emissions?

Governments around the world need to cut emissions. However, there is no simple template to follow and, most importantly, different initiatives will not attract the same level of public support. For that reason, we need to consider how governments can most effectively cut emissions with the support of the public.

In a new report by Demos and WWF, Climate Consensus, survey data is used to shed light on the package of policies that represents the commitments and trade-offs the public is prepared to make in order to reach a 42% reduction in emissions by 2030. The report considers multiple policy areas, e.g., electricity, transportation, food, and flights, and the costs of specific policies.

The report was covered by outlets such as The Guardian, The Times, and BBC. Do check it out. Full disclosure: I played a minor role in the work as I conducted the cluster analysis for the report.

New article in Journal of Hospital Infection: Nudging hand hygiene compliance

In the December issue of Journal of Hospital Infection, you will find our new article titled Nudging hand hygiene compliance: a large-scale field experiment on hospital visitors. Here is the abstract:

Background. Hospital-care-associated infections (HCAIs) represent the most frequent adverse event during care delivery, affecting hundreds of millions of patients around the world. Implementing and ensuring conformity to standard precautions, particularly best hand hygiene practices, is regarded as one of the most important and cheapest strategies for preventing HCAIs. However, despite consistent efforts at increasing conformity to standard hand hygiene practices at hospitals, research has repeatedly documented low conformity levels amongst staff, patients and visitors alike. Aim. The behavioural sciences have documented the potential of adjusting seemingly irrelevant contextual features in order to ‘nudge’ people to conform to desirable behaviours such as hand hygiene compliance (HHC). In this field experiment we investigate the effect on HHC amongst visitors upon entry of a hospital by varying such features. Methods. Over 50 days, we observed the HHC of a total of 46,435 hospital visitors upon their entry to the hospital in a field experimental design covering eight variations over the salience, placement and assertion of the hand sanitizer in the foyer, including the presence of the yearly national HHC campaign and a follow up during the COVID-19 pandemic. Findings. Our experiment found that varying seemingly irrelevant features increased HHC from a baseline of 0.4%–19.7% (47.6% during COVID-19). The experiment also found that the national HHC-campaign had no direct statistically significant effect on HHC. Conclusion. Varying seemingly irrelevant contextual features provides an effective, generic, cheap and easy to scale approach to increasing HHC relative to sanitizing one’s hands at hospitals.

And here is a figure with some of the key findings on differences in hand hygiene compliance:

You can read it online here. The replication material is available on GitHub.

Trust, mistrust and distrust

Trust is important for a well-functioning society and democracy. If we cannot trust each other, politicians and political institutions, we are in deep trouble. For that reason, I am happy to see that scientists aim to understand not only what can explain trust, but also how we should understand trust in the first place.

Specifically, what does it mean to have low trust? Is the opposite of trust mistrust or distrust? Apparently, neither of the two is the opposite of trust. Low trust is different from mistrust and distrust. And mistrust and distrust are not the same. This is the core argument in a series of academic publications.

To take one example, consider the working paper “Exploring Trust, Mistrust and Distrust”. The paper describes how trust, distrust and mistrust are all part of the same trust family with different manifestations, evaluative triggers, associated attitudes and behavioural consequences. Trust is, for example, associated with confidence, whereas distrust is associated with insecurity and mistrust is associated with caution (see Table 1 in the paper for an overview).

My initial thought upon reading about these different notions of trust was one of trust in their empirical relevance (it is quite obvious that trust, mistrust and distrust are different things), but the more I thought about it the more mistrusting I became, until I started distrusting the trust family. I would even go as far as saying that trust in politicians can be understood as a unidimensional concept where ‘trust’ covers the high values on the scale, ‘distrust’ the low values – and ‘mistrust’ falls between the two.

I am not contesting that it is possible to conduct a factor analysis that will return eigenvalues good enough (i.e., above 1) to make you sleep well at night, but this is not sufficient if you want to develop a new scale. Specifically, there are several potential problems and psychometric properties to consider before you can – with any level of confidence – conclude that you have demonstrated different dimensions of trust.

To illustrate one potential issue, consider the set of survey items designed to measure trust, mistrust and distrust (from Table 2 in the paper):

Trust:
– The government has good intentions
– The government understands the needs of my community
– Politicians often put the country above their own interests
– Most politicians are honest and truthful
– In general, the government usually does the right thing

Distrust:
– The government acts unfairly towards people like me
– Politicians usually ignore people like me
– Politicians don’t respect people like me
– Politicians are often incompetent and ineffective

Mistrust:
– People in the government often show poor judgement
– It is best to be cautious about trusting the government
– Information provided by the government is generally unreliable
– In general, politicians are open about their decisions
– I am usually cautious about trusting politicians
– I am unsure whether to believe most politicians

One of my main concerns here is related to acquiescence bias, i.e., that respondents are more likely to agree than disagree with statements. This is a concern as all survey items with a positive interpretation (good intentions, honest and truthful, etc.) are in one branch of the trust family. Similarly, all of the items in the distrust category are negative (“ignore people”, “incompetent and ineffective”, etc.). My concern is that these items are tapping into the same underlying concept and the differentiating factor is the acquiescence bias.

Actually, there is one survey item in the ‘Mistrust’ branch that has a positive direction, namely “In general, politicians are open about their decisions”. Unsurprisingly, the results of a factor analysis (provided in Table 6 in the paper) show that this specific item does not load well on the ‘Mistrust’ branch – but on the ‘Trust’ branch. This is definitely not a good sign.

These issues also show up in other papers using these survey items to measure three branches of trust. First, consider the paper “How trust, mistrust and distrust shape the governance of the COVID-19 crisis“. Here, you can see that “In general, politicians are open about their decisions” shows no factor loading with the ‘Mistrust’ branch – but, again, it fits nicely within the ‘Trust’ branch.

Second, consider the paper “Trust, Mistrust and Distrust: A Gendered Perspective on Meanings and Measurements“. This paper is interested in gender differences related to the three branches of trust. However, one core problem is that, at least for men, there is no empirical evidence for three dimensions. Or as the authors write in the paper: “Surprisingly, however, the men’s EFA identifies only two latent concepts: trust and distrust, meaning mistrust appears not to exist for male respondents.”

This is not to say that we should not consider and work with different notions of trust, but doing so will require much more work on the scale development than I have seen in the literature so far. Of course, I might have missed some instrumental work within the literature, and in that case I hope future studies will do a better job of citing this work.

I do not have access to any of the data used in the papers mentioned above. However, some of the fun empirical questions to ask are whether you can have high trust, high distrust and high mistrust at the same time and whether any discriminant validity tests would confirm that we are working with three distinct branches of trust. I believe such insights are needed before we capture any practical variation that – at the end of the day – improves our understanding of how trust matters for contemporary politics.

Visualising support for different corona restrictions

I have previously been critical of the figures shared in connection with the handling of the corona pandemic. In this post, I give an example of a figure that could (relatively) easily be improved. It is a figure showing the support for introducing various restrictions (available on page 5 of this report):

The problem with the figure is that it takes work for the reader to connect each category to the values in the bars. If, for example, we want to see how many support “Tvunget hjemmearbejde i offentlige [sic]” (mandatory working from home in the public sector), we have to match the colour in the figure with the corresponding category. This is a challenge with all the figures in the report, so the above is just one representative example.

In fact, the colours serve no purpose in the figure other than exactly this (note also that the legend takes up more space than the bar chart itself). If the goal of the figure is to make it easy to identify which measures enjoy the greatest support among citizens, it would also make sense to sort the bars from highest to lowest support.
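The two fixes – one colour and sorted bars – are straightforward to implement. My own figures are made in R, but the idea translates directly; here is a minimal Python/matplotlib sketch, where the category names and numbers are made up for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # render without a display
import matplotlib.pyplot as plt

# Hypothetical support shares; the point is the design, not the data.
support = {
    "Mask mandate": 71,
    "Mandatory working from home": 62,
    "Assembly limit": 55,
    "Closed nightlife": 48,
}

# Sort ascending so the most-supported measure ends up at the top
# of a horizontal bar chart.
ordered = sorted(support.items(), key=lambda kv: kv[1])
labels = [k for k, _ in ordered]
values = [v for _, v in ordered]

fig, ax = plt.subplots()
ax.barh(labels, values, color="grey")  # one colour: position carries the information
ax.set_xlabel("Support (%)")
fig.tight_layout()
fig.savefig("support_sorted.png")
```

With direct labels on sorted bars, the reader no longer has to shuttle between a legend and the chart.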

My take on how the figure could look is shown below. Note, however, that the exact numbers are not reported, so I have simply read off the values by eye.

In the figure, it is now easy to read off the level of support for each category and to identify which measures have the highest and lowest support. And all this without using 10+ different colours that add no meaningful information to the figure.

Have the Social Democrats lost ground in the polls? #5

How are the Social Democrats doing in the polls? In previous posts, I have argued that weighted averages and specific individual polls have not convincingly shown that the Social Democrats have lost ground in the polls (#1, #2, #3, #4).

Last year, I pointed out that just because a weighted average at Altinget put the Social Democrats at 31.5%, it did not mean that the party had lost support. On the contrary, 31.5% was quite compatible with where the party had likely been throughout large parts of 2020. There was thus more variation between pollsters in the level of support for the Social Democrats than within pollsters, where support was relatively stable.

In the subsequent posts, I emphasised that it obviously could not be ruled out that the Social Democrats were losing – or could lose – ground, but that the media often misinterpreted the polls when drawing such conclusions. This is mainly because, when you want to look at a difference between two polls, you have to take the statistical uncertainty of both polls into account (journalists often assume that the statistical uncertainty of the earlier poll is 0).
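The correct comparison combines the uncertainty of both polls: the standard error of a difference between two independent proportions is the square root of the sum of the two variances. A Python sketch with made-up numbers (two polls of 1,000 respondents, support dropping from 27% to 25%):

```python
import math

def moe(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n respondents."""
    return z * math.sqrt(p * (1 - p) / n)

n1 = n2 = 1000            # hypothetical sample sizes
p1, p2 = 0.27, 0.25       # hypothetical earlier and later poll results

# Wrong: judging the 2-point drop against the new poll's margin alone,
# i.e. treating the earlier poll as if its uncertainty were 0.
moe_single = moe(p2, n2)

# Right: the uncertainty of the *difference* combines both polls.
se_diff = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
moe_diff = 1.96 * se_diff

assert moe_diff > moe_single       # the combined margin is wider
assert abs(p1 - p2) < moe_diff     # here, a 2-point drop is within the noise
```

In this example the difference would need to exceed roughly 3.8 percentage points before it is statistically distinguishable from no change, even though each poll on its own has a margin of about 2.7 points.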

It is now quite a while since I last looked at the support for the Social Democrats in the polls. Has anything happened since? Over the past months, there have been polls showing both stability and decline for the Social Democrats. On 25 October, for example, Altinget reported that “S ligger stabilt og fastholder lille coronagevinst” (the Social Democrats are stable and retain a small corona gain). The same day, however, BT, Kristeligt Dagblad, Netavisen Pio, Avisen Danmark and others reported that support for the red bloc was now below 50%.

To see exactly how the Social Democrats have fared since the corona pandemic broke out, I have taken all polls from January 2020 until today and estimated a model that takes house effects in the polls into account.
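The post does not spell out the model, but one common approach is to regress poll results on a time trend plus pollster dummies, so each pollster’s systematic offset (“house effect”) is absorbed by its dummy. A simulated Python sketch – the pollster names, effect sizes, and the linear trend are all illustrative, not the actual model:

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up polls: a true level that declines over time, plus pollster-specific
# offsets ("house effects") and sampling noise.
pollsters = ["Voxmeter", "Epinion", "Gallup"]
house = {"Voxmeter": 0.010, "Epinion": -0.010, "Gallup": 0.0}

n_polls = 120
t = rng.uniform(0, 1, n_polls)                 # time of each poll (0 = start)
who = rng.choice(pollsters, n_polls)
true_level = 0.32 - 0.07 * t                   # gradual decline from 32% to 25%
y = true_level + np.array([house[w] for w in who]) \
    + rng.normal(0, 0.01, n_polls)             # sampling noise

# Design matrix: intercept, time trend, and dummies for two pollsters
# (Gallup is the reference category).
X = np.column_stack([
    np.ones(n_polls), t,
    (who == "Voxmeter").astype(float),
    (who == "Epinion").astype(float),
])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# beta[2] and beta[3] recover the house effects relative to Gallup.
print(beta.round(3))
```

Once the house effects are absorbed by the dummies, the time trend reflects genuine movement in support rather than which pollster happened to field a poll that week.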

As expected, we see that the Social Democrats enjoyed a large boost in the polls following the first corona lockdown in early 2020. The party then remained relatively stable on the right side of 30%, but from the end of the summer until now, support for the Social Democrats has gradually dropped to the level it was at before the corona pandemic. Specifically, my best estimate is that the Social Democrats would get 25% of the vote if an election were held today.

Politics has its ups and downs. The interesting thing here is that support for the Social Democrats has changed considerably in a relatively short time. It went up fast – and it has come down slowly. The decline has thus been too gradual for us to pick up such shifts in the weekly polls (setting aside the surge at the beginning of the pandemic).

Taking all the polls and the house effects into account, there is thus evidence that the Social Democrats have lost ground in the polls over the past months. In the latest polls, they are back to square one (that is, where they were before the corona pandemic).

What we do not know is whether the voters who have recently left the Social Democrats are the same ones who came to the party at the beginning of the pandemic. If so, the changes have followed the mantra of ‘easy come, easy go’. My guess is that the polling firms will be able to answer this, but for now we have no evidence either way.

Microdosing psychedelics, mental health and conditioning on a collider

In a new study, “Adults who microdose psychedelics report health related motivations and lower levels of anxiety and depression compared to non‐microdosers”, a team of researchers concludes that microdosers, i.e., people who use psychedelic substances at sub‐sensorium ‘microdoses’, exhibit lower levels of depression, anxiety, and stress.

The study relies on self-reported, observational data. As this is not a placebo-controlled study that could establish a strong counterfactual group, it is important to examine how exactly the study constructs a treatment group of microdosers and a control group of non-microdosers. In this post, I will show that there are some significant limitations, maybe even problems, with the study design and the inferences made in the paper.

Figure 1 in the paper provides a flow chart with the steps used to construct the two groups of interest. The baseline survey was completed by 8,703 respondents (from 84 nations), as 20% of the respondents did not provide sufficient data to take part in the study. Around 52% of the respondents who entered the study also filled out the Depression, Anxiety, Stress Scale-21 (DASS-21), although the manuscript does not provide any information on how these 52% of respondents differ from the study participants who did not answer these questions.

Well, let’s begin from the … beginning. How did people find their way into the study? Is it a random sample? No, on the contrary, people self-selected into the study, so do not assume that any conclusions made in the study will generalize to the population at large (whatever that population might or might not be). According to the webpage of the study, the study was launched on November 15, 2019, on the Joe Rogan Experience podcast. I am quite confident concluding that people not only listening to Joe Rogan, but also deciding to participate in a study on microdosing, are weird (not psychologically WEIRD, but weird-weird). Furthermore, as the authors acknowledge in the paper, only iPhone users could participate in the study (there was no Android app).

The next challenge is identifying who the microdosers are. You can see in the figure that the survey asks “are you currently engaging in a practise of microdosing”, and the relevant word here is “currently”. From what I can tell, the authors did not ask about previous microdosing experience. Why is that important? Because the control group, i.e., non-microdosers, will include both those who have never microdosed and those with a history of microdosing, we are introducing a bias. For example, it might be that those who are no longer microdosing had negative experiences with microdosing and, for that reason, worse mental health. This makes it very difficult to substantiate that any differences between the two groups can be attributed to, for example, a positive effect of microdosing in the treatment group (and not a negative effect of microdosing in the control group).

Unsurprisingly, we see that there are significant differences between the two groups in terms of who the microdosers and non-microdosers are. Table 1, shown below, shows that microdosers tend to be older (highlighted with yellow and green) and more likely to live in urban community settings (highlighted with blue and pink). Is that something that is taken into account in any of the statistical models? Alas, no. None of the models even attempt to adjust for any of these covariates. It’s hip to chi-square.

In Table 2 in the paper, shown below, we see that microdosers are less likely to use alcohol frequently and more likely to abstain from tobacco (though more likely to use cannabis more frequently). How can we say that any differences between the two groups can be attributed to microdosing psychedelics and not, say, smoking cannabis? Or not drinking alcohol? We can’t.

More interestingly, microdosers are more likely to report a history of mental health problems. Specifically, microdosers are more likely to report having experiences with depression and PTSD/trauma-related mental health problems.

Well, this is interesting. The study is about the positive impact of microdosing on mental health. Notice this part from the paper: “Microdosers were generally similar to non‐microdosing controls with regard to demographics, but were more likely to report a history of mental health concerns. Among individuals reporting mental health concerns, microdosers exhibited lower levels of depression, anxiety, and stress across gender.” How can the difference between the two groups suddenly change sign from negative to positive?

Say hello to our good old friend: conditioning on a collider. We have a sample selection based on values of the collider (mental health) that creates a non-causal association between microdosing and anxiety/depression. People who microdose are more likely to report mental health concerns, and depressed people are more likely to report mental health concerns. When we select observations based upon mental health concerns, we select observations that are more likely to be microdosing or to be depressed (on average). There is simply no justification for conditioning on mental health status (it also reminds me of the birth weight paradox within epidemiological research).
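The mechanism is easy to simulate: make microdosing and depression independent by construction, let both raise the probability of reporting mental health concerns, and then condition on that report. A sketch where all probabilities are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Independent by construction: microdosing and depression are unrelated.
microdose = rng.random(n) < 0.3
depressed = rng.random(n) < 0.3

# Reporting a "history of mental health concerns" (the collider) is made
# more likely by BOTH microdosing and depression.
p_report = 0.1 + 0.4 * microdose + 0.4 * depressed
report = rng.random(n) < p_report

def corr(a, b):
    return np.corrcoef(a, b)[0, 1]

full = corr(microdose, depressed)                          # ~0 in the full sample
conditioned = corr(microdose[report], depressed[report])   # clearly negative

print(round(full, 3), round(conditioned, 3))
```

Despite zero true association, the subsample of "reporters" shows a sizeable negative correlation between microdosing and depression, which is exactly the kind of artefact the paper's subgroup result could reflect.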

The key findings are reported in Figure 2 in the paper:

Here we see a negative “effect” of microdosing psychedelics on anxiety, depression and stress. Again, we are conditioning on a collider, so don’t put too much stock in the results. Actually, ignore them altogether. However, even if we assume that these results were valid, we can add a few points. First, notice how the figure uses the y-axis to mislead. Second, and relatedly, the “effect” sizes are negligible. Maybe that is why it is called microdosing.

Of note, I have not looked at the raw data, so I cannot guarantee that I have identified all relevant issues with the empirics. The data availability statement of the paper says: “All data generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.” I reached out to the corresponding author and asked for the replication material. Specifically, I said: “I have a few ideas for robustness tests I would like to explore to better understand the findings, and I am wondering whether you can share the replication data. I can confirm that I will not share the data with anybody else. I hope this constitutes a reasonable request.” The researchers were kind enough to let me know that they are still ‘preparing the data for public release’, but at the time of writing this post, I cannot analyse the data. I will of course write a follow-up post when the data is available if – and only if – there is anything of interest to say.

I always assume that researchers care about testing a hypothesis rather than looking for support for a hypothesis. However, in the specific case I am doubtful that the researchers have put the theory to a fair test. Why? Take a look at the (depressingly long) competing interests statement in the paper:

Joseph Rootman has received research funding from Quantified Citizen Technologies who provided the data collection platform for this study. Pamela Kryskow is a member of the clinical advisory board of Numinus Wellness which is a company that provides psychedelic psychotherapy services. Pamela Kryskow is compensated for this role with Numinus stock. Kalin Harvey is the CTO and co-founder of Quantified Citizen which is a company that produces software for decentralized mobile research, which was used in this study. Paul Stamets is a minority investor in Quantified Citizen and is an applicant on pending patents combining psilocybin mushrooms, Lions Mane mushrooms and niacin. Eesmyal Santos-Brault is the CEO and co-founder of Quantified Citizen which is a company that produces software for decentralized mobile research, which was used in this study. Kim PC Kuypers is a principle investigator on research projects, the present study not included, that are sponsored by Mindmed and Silopharma which are companies that are developing psychedelic medicines. Vince Polito is a science advisor for Mydecine Innovations Group, which is a company that is developing psychedelic medicines. Francoise Bourzat is a collaborator in the study on psilocybin assisted psychotherapy for COVID related grief at Pacific Neuroscience Institute, Santa Monica, CA. Zach Walsh is in paid advisory relationships with Numinus Wellness and Entheo Tech Biomedical regarding the medical development of psychedelics.

We have both a theoretical bias (these researchers are being paid to support the medical development of psychedelics) and a methodological bias (using and showing the validity of a specific data collection platform). The competing interests statement in the paper can easily explain why and how the researchers can end up showing a negative correlation between microdosing psychedelics and anxiety, depression and stress, even when the raw data shows positive correlations.

Will microdosing psychedelics have an impact on your mental health? If so, is it positive? Or negative? We cannot use any of the findings in the paper to answer these (relevant) questions. Hopefully future research can help us out here.