Erik Gahner Larsen

Why you should not trust the Facebook experiment

Recently, there has been a lot of focus on the implications of using Facebook. One study, “The Facebook Experiment: Quitting Facebook Leads to Higher Levels of Well-Being”, argues that people who leave Facebook are more satisfied with their lives. Matthew Yglesias talks about the study in this clip from Vox:

The study also got some attention back in 2016 when it was published (see e.g. The Guardian). This is not surprising, as the study presents experimental evidence that people who were randomly assigned to not use Facebook reported greater well-being on a series of outcomes.

The only problem is that the study is fundamentally flawed.

The study finds that people who did not use Facebook for a week reported significantly higher levels of life satisfaction. The design relied on pre- and post-test measures from a control group and a treatment group, where the treatment group did not use Facebook for a week. The problem – and the reason we should not believe the results – is that people who took part in the study were aware of the purpose of the experiment and signed up with the aim of not using Facebook! In short, this biases the results and thereby has implications for the inferences made in the study. Specifically, we are unable to conclude whether the differences between the treatment group and the control group are due to an effect of quitting Facebook or to an artifact of the design.

First, when respondents are aware of the purpose of the study, we face serious challenges with experimenter demand effects. People assigned to the treatment group will know that they are expected to show positive reactions to the treatment. In other words, there might not be a causal effect of not being on Facebook for a week, but simply an effect induced by the design of the study.

An example of the information available to the respondents prior to the experiment can be found in the nationwide coverage. The article (sorry – it’s in Danish) informs the reader that the researchers expect that using Facebook will have a negative impact on well-being.

Second, when people know what the experiment is about and sign up with the aim of not using Facebook, we should expect a serious attrition bias, i.e. that people who are not assigned to their preferred treatment will drop out of the experiment. In other words, attrition bias arises when the loss of respondents is systematically correlated with experimental conditions. This is also what we find in this case. People who got the information that they should continue to use Facebook dropped out of the study.

Figure 1 shows the number of subjects in each group before and after the randomisation in the Facebook experiment. In short, there was nontrivial attrition bias: people assigned to the control group were more likely to drop out of the study.

Figure 1: Attrition across conditions

The dashed line indicates the attrition bias. We can see that the control group is substantially smaller than the treatment group.
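To make the attrition problem concrete, here is a minimal sketch in R of how one could test whether dropout is associated with the assigned condition. The group sizes are hypothetical placeholders, not the numbers reported in the paper; with the actual counts behind Figure 1, the same test applies.

```r
# Minimal sketch: test whether dropout differs by experimental condition.
# The counts below are hypothetical placeholders, not the study's numbers.
assigned  <- c(treatment = 500, control = 500)   # randomised to each condition
completed <- c(treatment = 420, control = 320)   # completed the post-test survey

dropped <- assigned - completed

# 2 x 2 table: rows = completion status, columns = condition
tab <- rbind(completed = completed, dropped = dropped)

# Chi-squared test of whether dropping out is independent of condition
chisq.test(tab)
```

If dropout is clearly related to the assigned condition, the groups that remain after randomisation can no longer be assumed to be comparable.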

Third, when people sign up to an experiment with a specific purpose (i.e. not using Facebook), they will be less likely to comply with their assigned treatment status. This is also what we see in the study. Specifically, as is described in the paper: “in the control group, the participants’ Facebook use declined during the experiment from a level of 1 hour daily use before the experiment to a level of 45 minutes of daily Facebook use during the week of the experiment.” (p. 663)
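As a stylised illustration – assuming, purely for the sake of the example, that well-being falls linearly with daily minutes on Facebook, which is not the paper's model, and using a hypothetical effect size – the sketch below shows how this partial compliance in the control group shrinks the contrast the experiment actually delivers:

```r
# Stylised illustration, not the paper's model: suppose well-being falls by
# 'beta' points for every minute of daily Facebook use (hypothetical number).
beta <- 0.01

minutes_before  <- 60   # control group's daily use before the experiment (1 hour)
minutes_control <- 45   # control group's daily use during the experiment week
minutes_treated <- 0    # treatment group quits entirely

# Contrast we would like to estimate: quitting vs. ordinary (60 minutes) use
intended_contrast <- beta * (minutes_before - minutes_treated)

# Contrast actually realised when the control group also cuts back
observed_contrast <- beta * (minutes_control - minutes_treated)

c(intended = intended_contrast, observed = observed_contrast)
# The realised difference understates the effect of quitting relative to
# ordinary use, because the control group is partially "treated" as well.
```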

These issues are problematic and I see no reason to believe any of the effects reported in the paper. When people sign up to an experiment with a preference for not being on Facebook, we cannot draw inferences beyond this sample and say anything about whether people will be more or less happy by not using Facebook.

Potpourri: Statistics #45

How I find new research

There is a lot of new and interesting academic research coming out every day: working papers, book chapters (you can usually ignore these), journal articles, books, etc. So, how do you stay up to date on all this new research? Here are my personal recommendations.

First and most importantly: Twitter. This is by far the easiest way to keep yourself updated. You don’t need to (re)tweet or in any other way engage in the conversations on Twitter, but you should at least have an account and follow your favourite scholars.[1]

Luckily, it is impossible not to hear about new research from a person if you follow that person on Twitter. Furthermore, people are usually good at tweeting about interesting research similar to their own interests (which hopefully will overlap with your interests).

That also brings us to the challenge of using Twitter: information overload. The more people you follow on Twitter, the more difficult it is to ensure that you notice the tweets relevant to you. It is very easy to follow new people on Twitter. Good Twitter use is not about following as many researchers as possible but about optimizing the signal-to-noise ratio, i.e. seeing more relevant tweets and fewer irrelevant tweets.

I can recommend that you do a mental cluster analysis and create (private) lists of people connected within their respective domains. For example, you can create lists with academics within different fields/topics (U.S. political scientists, European political scientists, open science, R, economists, psychologists, sociologists, etc.).

While there is some overlap between the lists, they structure your Twitter use and make it easier to stay up to date than one major feed with everybody, especially if you have been offline or away from Twitter for several days and have to catch up. You can read more about lists on Twitter here.

Second, Google Scholar. An important feature of Google Scholar is that you can follow researchers, articles and keywords (so-called email alerts). If you follow a researcher on Google Scholar, you will get an email notification when that person has new research. You can also follow citations to that person, i.e. get email notifications about new research that cites the person’s work.

Within any scientific subfield there is usually a review piece or two that everybody cites. It is a good idea to sign up for notifications for those articles so you get an email when new work cites them. Last, if you work with specific concepts, it is a good idea to follow such keywords as well.

Third, journal RSS feeds. This was my main method for years, basically getting notifications about the most recent issue of a journal and/or articles available in advance/FirstView. I still follow the journals, but it is getting less useful for three reasons. First, there is a heavy delay, so you have often seen the work months (if not years) before the actual publication (especially if you use the two methods above). Second, there is an overlap with the above methods, so if anything relevant is coming out, you can be sure that it will reach your Twitter feed. Third, going back to the signal-to-noise ratio, the more generic journals you follow, the more irrelevant research will end up in your feed.

These are just a few of the ways in which you can find new research (again, my personal recommendations). If you want another example of how you can find new research in line with your interests, see this tweet from John B. Holbein (he usually tweets a lot of interesting political science research).

  [1] If they are not on Twitter, you should reconsider whether they are in fact your favourites.

Problems with The Global Gender Gap Report

Or, why is Rwanda doing better than Denmark?

In this post I outline basic methodological problems with The Global Gender Gap Report (the GGGR). The GGGR is developed by the World Economic Forum (WEF) and “benchmarks 144 countries on their progress towards gender parity across four thematic dimensions.”

Benchmarking 144 very different countries on their gender parity is a challenging task. Sadly, the report from the World Economic Forum is not doing a great job accommodating the challenges. The issues in the report are severe and the rankings should not be taken seriously. In short, the country rankings in the GGGR are misleading at best and completely meaningless at worst.

I will look at the most recent report from 2017 and illustrate some interrelated problems. There are other issues with the report, but below I touch upon some of the most important ones. For some of the other issues in the report, see my (and others’) comments in this article (sorry, it is in Danish).

The GGGR measures the relative gaps between women and men across four thematic dimensions: health, education, economy and politics. Across the four dimensions, 13 out of the 14 variables are ratios.

For the subindex Health and Survival, the variables are 1) sex ratio at birth and 2) female healthy life expectancy (also as a ratio relative to the male value). This subindex will help us understand one of the main problems with the report, namely that it is not tapping into any meaningful gender gaps. Specifically, we will look at healthy life expectancy. This is a measure of “Average number of years that a person can expect to live in full health, calculated by taking into account years lived in less than full health due to disease and/or injury.”

Since men are doing exceptionally badly on the healthy life expectancy variable in Rwanda (with a value of 52.3), Rwanda gets a very good score on this variable, and this contributes to its overall rank as number 4 in the Global Gender Gap Report. Figure 1 shows the 15 countries doing best on the gender parity list (notice Rwanda as number 4). The blue lines indicate the size of the gender gap.

Figure 1: Gender gap rankings, top 15 countries

The report is partly aware of this issue, as the authors write: “the Index is constructed to rank countries on their gender gaps not on their development level.” (p. 4). However, this is a serious problem, as developed countries are doing much better in terms of the gender gap in health and survival, but this is not reflected in the rankings (on the contrary, countries are punished for it, cf. below).

In other words, the first key problem is that the index is not necessarily measuring progress towards gender parity.

The report argues that the “Index rewards countries that reach the point where outcomes for women equal those for men, but it neither rewards nor penalizes cases in which women are outperforming men in particular indicators in some countries.” (p. 5) However, this is simply not correct for the measure of healthy life expectancy.

If we take Rwanda in 2017 as an example, the healthy life expectancy for women is 60.8 years, whereas it is 52.3 for men (a difference of 8.5 years). This is a big gender gap, but it is rewarded by the Index because women are outperforming men (remember that Rwanda is number 1 on the subindex). If we then look at Denmark in 2017, the value for women is 72.3 and 70.0 for men (a difference of 2.3 years). This is punished by the Index with a rank of 104 for Denmark.

In other words, while the gender gap is obviously smaller in Denmark (2.3 years) than in Rwanda (8.5 years), Rwanda gets a much better ranking on this specific variable (103 places better!). This leads to a better overall ranking, as the Index rewards the gender gap in Rwanda (contributing to its overall rank of 4) and punishes Denmark with an overall rank of 14. Consequently, we cannot say anything about the overall gender gap in Rwanda and/or Denmark by looking at the Index (or any other country for that matter).
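To make the arithmetic explicit, here is a small R snippet that computes the female-to-male ratios in healthy life expectancy from the 2017 figures cited above. The ratio is what goes into the Index; the absolute gap in years is what the report’s rhetoric suggests it captures.

```r
# Female-to-male ratios in healthy life expectancy, 2017 figures cited above
hle <- data.frame(
  country = c("Rwanda", "Denmark"),
  female  = c(60.8, 72.3),
  male    = c(52.3, 70.0)
)

hle$gap_years <- hle$female - hle$male   # absolute gender gap in years
hle$ratio     <- hle$female / hle$male   # what the Index actually scores

hle
# Rwanda has the larger absolute gap (8.5 vs. 2.3 years) but also the larger
# ratio (about 1.16 vs. 1.03), and it is the ratio, not the gap in years,
# that drives the ranking.
```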

When we combine these issues, we see that the Index – all else equal – directly rewards countries with low development. To illustrate this, let us compare Rwanda and Denmark in 2016. In Denmark, the gender gap in healthy life expectancy was 2 years, resulting in a female-to-male ratio of 1.03 (71 years/69 years). In Rwanda, the gender gap was also 2 years, resulting in a female-to-male ratio of 1.04 (57 years/55 years).

As the Index rewards a greater ratio, lower development values are rewarded (i.e. lower healthy life expectancy). Consequently, since the gender gap was the same in Denmark and Rwanda in 2016, but Rwanda had a lower healthy life expectancy, Rwanda performed better on the Index (13 places better than Denmark). This problem becomes more and more serious as the overall level of development decreases and the gender gap increases.
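The same point can be shown with the 2016 figures in a short sketch: an identical absolute gap produces a higher ratio where the overall level of healthy life expectancy is lower.

```r
# 2016 figures: an identical absolute gap of 2 years in both countries,
# but a higher ratio where overall healthy life expectancy is lower
female <- c(Denmark = 71, Rwanda = 57)
male   <- c(Denmark = 69, Rwanda = 55)

female - male   # 2 years in both countries
female / male   # Denmark ~1.03, Rwanda ~1.04: the lower level inflates the ratio
```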

To show the implication of this, Figure 2 presents the countries with the best ranking (number 1) in 2017 on health and survival. There is an interesting absence of developed Western countries. (But do note that even Syria is doing a top-notch job in the GGGR when it comes to health and survival!)

Figure 2: Gender gap in health and survival, best countries

The nature of the problems makes it difficult to make comparisons between countries and use the rankings to say anything meaningful about what is going on in the individual countries over time. Accordingly, it is a bad measure for any meaningful policy discussion.

The World Economic Forum writes in the report: “The Global Gender Gap Index was first introduced by the World Economic Forum in 2006 as a framework for capturing the magnitude of gender-based disparities and tracking their progress over time.” (page vii)

However, the problem is that we cannot say anything about progress over time when we look at the Index! From 2016 to 2017, Rwanda went from being number 100 to number 1 in healthy life expectancy despite an increase in the gender gap.

Gender parity is an important topic and I am sure the World Economic Forum is doing a great job pushing this agenda and turning it into an even more salient issue. However, with the current setup and these measures, I see no reason to take the rankings seriously. Future reports will have to take the aspects discussed above into account before we can compare gender parity across countries.

New paper in Nature Human Behavior: Justify your alpha

Together with 87 other scientists, I am a co-author of a new paper in Nature Human Behavior. The paper is titled Justify your alpha and the abstract is as follows:

In response to recommendations to redefine statistical significance to P ≤ 0.005, we propose that researchers should transparently report and justify all choices they make when designing a study, including the alpha level.

The paper can be found here and more information on the context for the project can be found here.

Potpourri: Statistics #44

New article in European Sociological Review: Welfare Retrenchments and Government Support

My article, ‘Welfare Retrenchments and Government Support: Evidence from a Natural Experiment’, is now published in the European Sociological Review (vol. 34, no. 1). The abstract sums up the content of the article:

A large body of literature has provided mixed results on the impact of welfare retrenchments on government support. This article examines whether the impact of welfare retrenchments can be explained by proximity, i.e. whether or not the retrenched policy is related to people’s everyday lives. To overcome limitations in previous studies, the empirical approach utilizes a natural experiment with data from the European Social Survey collected concurrently with a salient retrenchment reform of the education grant system in Denmark. The results confirm that people proximate to a welfare policy react substantially stronger to retrenchment reforms than the general public. Robustness and placebo tests further show that the results are not caused by non-personal proximities or satisfaction levels not related to the reform and the government. In sum, the findings speak to a growing body of literature interested in the impact of government policies on mass public.

The article is available as open access here. The replication material can be found at the Harvard Dataverse and at GitHub.

Should the Queen abdicate?

One of the last news stories of the past year was that a clear majority of Danes wanted the Queen to abdicate. The basis for this was an opinion poll conducted by the polling firm Wilke for Avisen.dk, which was picked up by various other news outlets.

There are several good reasons to be critical of that poll, which is why I was happy to comment on it for TjekDet.

What I did not know when I commented on the poll was that 21 percent of the respondents answered ‘don’t know’, and that these respondents were excluded from the reported results. This only makes the points of criticism even more relevant. It is also amusing to see how a misleading headline can be defended with the argument that one is “doing journalism”.
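To see why excluding the ‘don’t know’ answers matters, here is a minimal sketch with hypothetical shares (not the actual Wilke figures): dropping 21 percent ‘don’t know’ responses from the denominator can turn a minority of all respondents into a reported “clear majority”.

```r
# Hypothetical shares for illustration only (not the actual Wilke figures):
# of all respondents, 44% answer "yes, abdicate", 35% "no", 21% "don't know".
yes <- 0.44
no  <- 0.35
dk  <- 0.21

# Share reported when "don't know" answers are excluded from the denominator
yes / (yes + no)
# ~0.56: a minority of all respondents becomes a "clear majority"
# once a fifth of the sample is dropped.
```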

Should the media report methodological information when covering opinion polls?

News articles about opinion polls often tell stories rooted in random noise, make absurd interpretations based on misleading question wordings, “forget” to report who paid for the polls, and so on.

That is why I have argued time and again that the media should report methodological details, as these are essential for assessing how good the coverage of a poll is. In other words, if methodological information is missing, we are unable to assess the quality of a poll.

In 2011, together with a good friend, I decided to collect a large number of media articles and examine how good the media were at reporting methodological details. The motivation was the limited systematic knowledge about this in a Danish context, but also a frustration with what we saw as the media’s inadequate reporting of methodological information.

Based on previous studies, we chose to focus on specific aspects, including whether the question wording, the sample size and the margin of error were reported. Overall, the results confirmed our expectations and were published in Tidsskriftet Politik.

Although I find methodological information relevant in most contexts, I am not an uncritical proponent of simply reporting as much methodological information as possible. In this post, I will therefore do what I can to reduce the relevance of our aforementioned study, or at least discuss some of the caveats that are important to keep in mind.

First, space in news articles is limited. AAPOR, for example, lists more than 10 methodological items that should be disclosed, and there will be cases where space does not allow reporting that much information. Space constraints are less of a concern for online articles, but one should nevertheless be aware that there are simply natural limits to how long stories about opinion polls can be.

Second, not all methodological information is equally relevant. What is relevant in one context can be close to irrelevant in another. For a poll on party choice, for example, the exact question wording is usually not crucial, whereas the wording in a poll on attitudes towards a specific political issue is highly relevant – and in many cases decisive for the answers you get.

Third, reporting a lot of methodological information can lead readers to remember less from a polling article. It may therefore make sense to recommend that journalists do not aim to report a double-digit number of methodological items, but instead consider which methodological information is relevant in the specific context.

Fourth, it is not a given that methodological information helps readers understand polls. A reader may remember what the margin of error of a poll is, but that does not mean that the reader understands what the margin of error actually measures and how it should be interpreted. Methodological information can therefore often not stand alone. Some information may be necessary, but it is rarely sufficient.

Fifth, other factors affect how readers judge the credibility of polls, so we should not consider methodological information in isolation. An American study shows that citizens are more likely to find a poll reliable if it aligns with their own political convictions, and reporting methodological information makes little or no difference in that respect. The gains from reporting methodological information are therefore probably smaller than we have assumed.

All of this means that reporting methodological information cannot stand alone. It is more important to focus on whether the narratives journalists construct are consistent with the polls they report than to count how many methodological details are reported. There may well be cases where two or three methodological details are all that is needed, and extra information comes at the expense of other information and the reader’s experience.

When we conducted our study in 2011, we found numerous examples of news stories with an explicit discrepancy between the methodological details and the article itself. Example 1: “All shifts, however, lie within the poll’s margin of error of 2.8 percent.” Example 2: “But even though the Social Democrats (S) go up from 25 percent of the votes yesterday to 26.5 in today’s poll, one should note that the movement lies within the margin of error of the sample survey.” Example 3: “The movements are within the margin of error.”
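As a worked illustration of the uncertainty mentioned in these quotes, the sketch below computes an approximate 95 percent margin of error for a poll proportion. The sample size is a hypothetical value, not taken from the quoted polls.

```r
# Approximate 95% margin of error for an estimated proportion p with sample size n
margin_of_error <- function(p, n, z = 1.96) {
  z * sqrt(p * (1 - p) / n)
}

# Hypothetical sample size of 1,000 respondents (not from the quoted polls)
n <- 1000

# Margin of error around a party polling at 25% support
margin_of_error(p = 0.25, n = n)   # about +/- 2.7 percentage points

# So a move from 25 to 26.5 percent (as in Example 2) is well within this
# uncertainty and should not be reported as a real shift.
```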

In those cases the margin of error was mentioned, but the coverage was not good. We can care ever so much about how many methodological details are reported, but if, in the end, the coverage does not take them seriously, we have far bigger problems. These are issues Yosef Bhatti and Rasmus Tue Pedersen take up in their study of how polls are reported in relation to the margin of error.

My impression is that in most cases journalists are not qualified to judge which information is relevant. It is therefore often arbitrary whether methodological information is reported – and if so, which. As a rule of thumb, the media should report methodological information, but more information is not always better, and in the worst case it shifts focus away from which methodological details are important and how they are used.