Erik Gahner Larsen

Potpourri: Statistics #53

How many people would vote for Klaus Riskær Pedersen?

At B.T. you can read about a shock poll that puts a number on how many people would vote for Klaus Riskær Pedersen's party of the same name: "His newly formed party, which bears his name, stands to receive 1.9 percent of the votes."

The only shocking thing about this poll is how poor it is from a methodological perspective. YouGov, which conducted the poll, is certainly doing what it can to show that it should not be taken seriously as a polling institute. The problem with the poll is that it is not an ordinary opinion poll in which voters were asked which party they would vote for if an election were held tomorrow.

Instead, voters were asked the following question: "How inclined would you be to vote for 'Partiet Klaus Riskær Pedersen' at the coming general election, which must be held no later than 17 June 2019?"

This leads more people to say that they would vote for the party than would be the case in a normal opinion poll. To illustrate the problem, we can look at earlier cases. In 2016, a Voxmeter poll showed that 10.8 percent of Danes would vote for Nye Borgerlige. That figure was flat-out wrong, and solely a result of the same procedure used in the current YouGov poll.

Later in 2016, a similar poll from Gallup measured support for Danskernes Parti. It showed 3.4 percent support for a party that has long since been forgotten. Again, a number that says nothing at all about how many people would vote for the party at a general election.

Polls of this kind are useless to everyone except journalists who do not understand methodology and, of course, the parties in question, which need to show that they have support among the public. The only interesting aspect of this poll is that not even such a misleading and flawed measurement provides evidence that Klaus Riskær Pedersen is on track to win a seat.

How will the Danes vote in the European Parliament election?

In a new report from the European Parliament, conducted by Kantar Public, numbers are given on how the parties will fare at the coming European Parliament election.

However, the report has significant weaknesses, and these weaknesses mean that one should be careful not to read too much into the results. I comment on this in an article at Mandag Morgen's TjekDet. The article can be read here.

Potpourri: Statistics #52

How to improve your figures #1: Don’t use the y-axis to mislead

There are good reasons to think carefully about the y-axis when you design figures, including considerations on whether to start your y-axis at zero or not. In this post, I provide a simple piece of advice: when presenting bar charts on a linear scale, start at 0. Not 0.38. Not 0.31. Not 0.04. 0.

The figure below, from Hanel et al. (2018), depicts the same data in three panels. It shows how the same data can be presented in different ways with implications for how we perceive differences between groups.

In the first panel, we see the distributions of the two groups. In the second panel, we see that the y-axis starts at 4.6. In that figure, it looks like the value for Poland (the red bar) is three times greater than the value for the UK (the blue bar). In the third panel, relative to the second panel, we see a much better presentation of the two groups with a y-axis starting at zero.
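
The distortion from a truncated baseline is easy to quantify: with a non-zero axis start, the ratio of drawn bar heights becomes (a − s)/(b − s) instead of a/b. A minimal sketch in the spirit of the Poland/UK comparison (the values here are illustrative assumptions, not the actual numbers from Hanel et al. 2018):

```python
# Illustrative group means on a 1-7 scale; NOT the actual values
# from Hanel et al. (2018).
poland, uk = 5.2, 4.8

def visual_ratio(a, b, axis_start):
    """Ratio of drawn bar heights when the y-axis starts at axis_start."""
    return (a - axis_start) / (b - axis_start)

true_ratio = poland / uk                    # ~1.08: the groups barely differ
truncated = visual_ratio(poland, uk, 4.6)   # 0.6 / 0.2 = 3: looks 3x larger
honest = visual_ratio(poland, uk, 0.0)      # identical to the true ratio

print(round(true_ratio, 2), round(truncated, 2), round(honest, 2))
```

A difference of under ten percent is drawn as a threefold difference once the baseline moves to 4.6; starting the axis at zero makes the drawn ratio equal the true ratio.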

Despite the fact that bar charts with arbitrary, non-zero starting points on the y-axis are problematic, I see them again and again in scientific publications. Take for example this new article in the American Political Science Review, where the bar charts use the y-axis to mislead. Specifically, they leave the impression of a greater difference between the two groups than is supported by the data:

For another example, take this new article in Political Communication, where the bar chart conveniently starts at 4.00% to give the impression of a large difference between the groups. (On a side note, I can't believe how unlucky the authors were. The only statistical finding in the article is the one that wasn't preregistered.)

Alas, journals and books are filled with examples of bar charts that use the y-axis to mislead. The general issue is that these figures do not comply with the principle of proportional ink: "The sizes of shaded areas in a visualization need to be proportional to the data values they represent."

This is not to say that y-axes should always start at zero. On the contrary, there are many cases where figures should definitely not start at zero (see this article from Quartz and this video from Vox for more information). However, when creating a bar plot, the best way to improve your figure is to comply with the principle of proportional ink. Start at 0.

Do men face more discrimination?

An article in the Daily Mail presents the argument that men face more discrimination than women. Similarly, RT writes: “Contrary to everything you’ve ever been told, in most developed countries men are actually more disadvantaged than women, according to new research published in one of the world’s leading scientific journals.” And Yahoo Finance writes “that men actually face more discrimination than women”. The story also made it all the way to Fox News.

The coverage builds upon a new article published in PLOS ONE, "A simplified approach to measuring national gender inequality". The article begins with a critique of existing indices, especially the popular Global Gender Gap Index (GGGI, an index I also criticised in a post last year). The authors focus in particular on one aspect of the GGGI, namely that no country can (by definition) be more favourable towards women than towards men. As they formulate their critique: "there is no defensible rationale for truncating scores on an 'equality' measure when they disadvantage boys or men."

Based on this, they develop a measure of national gender inequality tapping into three specific dimensions: 1) educational opportunities, 2) life expectancy and 3) life satisfaction. They call this the BIGI, the Basic Index of Gender Inequality. The objective of the index is to pay attention to measures where women perform as well as or better than men. It is, for example, a well-known fact that women tend to live longer than men.

In Figure 1 in the article, the authors present the deviation from gender parity across the 134 countries in the sample (missing data is shown in black):

What can we learn from this analysis? Here is the main finding of the article that is getting the most attention in the media coverage: “In 91 (68%) of the 131 countries, men were on average more disadvantaged than women, and in the other 43 (32%) countries, women were more disadvantaged than men. The international median of the BIGI is -0.017 (SD = 0.062), that is, nearly a two percent deviation from parity, favoring women.”
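
To build intuition for what a signed parity score of this kind measures, here is a sketch of one way such an index over three dimensions could be computed. Both the averaging scheme and the country values below are illustrative assumptions, not the actual BIGI formula or data from the article; the sign convention matches the quote (negative scores favour women, i.e. men are more disadvantaged):

```python
# Hypothetical country values for women (w) and men (m); NOT data from
# the article, and the formula is an illustrative guess, not the exact BIGI.
dimensions = {
    "education":         {"w": 12.1, "m": 12.4},  # years of schooling
    "life expectancy":   {"w": 83.0, "m": 79.0},  # years
    "life satisfaction": {"w": 7.1,  "m": 6.9},   # 0-10 scale
}

def parity_deviation(w, m):
    """Signed relative gap: negative when men do worse than women."""
    return (m - w) / ((w + m) / 2)

score = sum(parity_deviation(d["w"], d["m"]) for d in dimensions.values()) / 3
print(round(score, 3))  # negative: men more disadvantaged on these numbers
```

Note how a large female advantage in life expectancy can outweigh small male advantages elsewhere, which is exactly why the choice of dimensions drives the headline result.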

While I believe it is great to put emphasis on dimensions where men face problems to a greater extent than women, I do believe there are noteworthy limitations in the study that are lost in the coverage. These limitations are significant, and once they are taken into account, there is no support for the conclusion that men face more discrimination than women.

First and most importantly, the selling point of the study – i.e. the simplified approach – is also the main limitation. The study focuses on a limited set of indicators, selected in favour of men (in other words, to show numbers where women are doing better than men in a lot of countries), that are not necessarily providing a representative picture of gender inequality in a comparative perspective. Accordingly, it is misleading to draw conclusions about whether men in general are more disadvantaged than women.

Second, and related, the study argues that the approach “avoids the difficulties of choosing and weighing indices that are relevant in some contexts but not others, and often may reflect life choices rather than restricted opportunities”. I would argue that simply getting rid of most indicators is not a suitable solution to the challenge of finding relevant indices. As an example, the authors mention that “the ratio of male to female national politicians is only relevant to the tiny proportion of people who choose a political career”. This is incorrect as several studies demonstrate the implications and relevance of having female politicians beyond the career trajectories of the respective politicians (e.g. Anzia and Berry 2011, Clayton et al. 2019, Gilardi 2015, Ladam et al. 2018, and Mendelberg et al. 2013).

Third, I would not draw conclusions about the state of gender (in)equality in different countries based on the BIGI. Saudi Arabia, for example, is one of the countries with a relatively high level of overall average gender parity. Granted, I do not know a lot about gender equality and discrimination in Saudi Arabia, but I am reluctant to call the country a national gender equality pioneer. While the authors provide some post hoc reflections on why Saudi Arabia ranks so highly, I see no convincing case for taking these scores seriously.

Fourth, even if we want to compare countries, we are unable to say whether there are any statistically significant differences between the countries. It can be difficult to compare the scores on the index in substantive terms, and we are unable to say whether any country is actually significantly more equal than any other.

Fifth, and related, one of the measures used to create the index is the overall life satisfaction data from the Gallup World Poll. However, the authors do not take any measurement error or uncertainty into account in any of the estimates. Accordingly, while they argue that the life satisfaction score is culturally independent, I do believe additional work is needed before the index scores can support the conclusions the researchers draw from them. Furthermore, the use of survey data significantly limits data availability and quality. In brief, I am not convinced that the life satisfaction data is of equal quality and equally representative in the 134 surveyed countries, and the spatial and temporal coverage of the index is limited as a result.

Sixth, in connection to the media coverage described above, the study says nothing about discrimination at all. Even if we do not take any of the limitations outlined so far into account, we cannot say anything about actual discrimination. In other words, the news coverage of the study is extremely misleading.

Overall, while I appreciate the objective of providing a better measure of gender inequality in a comparative setting, I do believe that the limitations outlined above render the index useless for actual policy recommendations. As I told the Danish newspaper Weekendavisen the other day, it is important to look at gender inequalities across different countries, but I cannot see the usefulness of this particular index in its current form.

New book chapter: Analysing behaviour with experiments

Together with Pelle Guldborg Hansen, I have written a chapter on analysing behaviour with experiments for the book 'Metodekogebogen – 129 analysemetoder fra humaniora og samfundsvidenskab', edited by Mie Femø Nielsen and Svend Skriver.

As the title suggests, the book is a collection of 129 entries on methods relevant to the humanities and social sciences. A description of the book from the promotional material:

You are told in plain language what to do, step by step. Where do you begin? What should you pay attention to along the way? What kind of results can a particular method give you, and roughly how much time will you need?

The book gives you inspiration to try out other methods, so you can carry out analyses and gain surprising insights. It becomes easier to conduct ambitious, well-designed and exciting studies. The result is new knowledge, both for you and for society at large.

It can be bought here, here, here and here.

The ailing sex in Weekendavisen

I commented in Weekendavisen on a new study of gender equality between men and women in a comparative perspective. Among other things, I am quoted as saying:

"Of course it is relevant to look at differences between the sexes in Denmark when it comes to health, education and life satisfaction, but it becomes completely meaningless when it is suggested that we should look to Bahrain to do things better,"

The article can be read in print and online.

Potpourri: Statistics #51

Podcast on opinion polls

I had the great pleasure of talking to my good friend and colleague, Jack Bridgewater, about opinion polls on the podcast How to Win Arguments with Numbers. The other guests on the podcast this season are Matthew Goodwin, Ruth Dassonneville, Shane Singh, Amanda Bittner, Robert S. Erikson and Joshua Townsley.

You will not learn a lot about how to win arguments with numbers, but hopefully a thing or two on opinion polls. You can find it on Apple Podcasts and SoundCloud. Or listen here:

Read the lightly edited transcript (with references):

JACK BRIDGEWATER: Thanks for coming on the podcast, Erik. If we can begin by just asking the question: what is polling? We talk a lot about polls and a lot of people have different interpretations of polls, but I think it is seldom that we actually think about “What is the methodology behind polling?” and “How does this process actually work?”.

ERIK GAHNER LARSEN: Thanks for having me on, Jack. When we talk about polling we generally talk about opinion polls. What we talk about is a survey designed to represent the opinions of a population. We can't go out and ask everybody about their opinions all the time, but we can ask a representative sample of a population. So we can ask some people and, by asking some people, we can draw conclusions about a lot of people. You can compare it to a blood test. Luckily, we do not have to test all the blood in a body before we can make conclusions about, say, your body. In the same way, by asking a representative sample of a population, we can draw conclusions about what the population thinks of an issue.

However, we rely on certain assumptions. First of all, we make the assumption that opinion polls are representative of the population, so we have the idea that the sample is, on all relevant characteristics, similar to the population, e.g. a similar share of men and women as in the population, of young and elderly voters, and so forth. That's also where we can see some opinion polls go wrong, if there are systematic biases. But even when there are no systematic biases, we will still have uncertainty. The thing about opinion polls is that we will never talk about 100% certainty. We will have some margin of error when we talk about polls. I think that is sometimes lost in translation, when we are unable to disseminate or communicate the uncertainty we are working with in an opinion poll.
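
The margin of error mentioned here follows directly from the sample size. A minimal sketch of the standard calculation for a single proportion (it assumes simple random sampling, which real polls only approximate, so published margins can understate the true uncertainty):

```python
import math

def margin_of_error(p, n, z=1.96):
    """95% margin of error for a proportion p estimated from a
    simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

# A typical poll: 1,000 respondents, a party polling at 25 percent.
moe = margin_of_error(0.25, 1000)
print(f"25% +/- {moe * 100:.1f} percentage points")  # roughly +/- 2.7 points
```

Quadrupling the sample size only halves the margin of error, which is why polls rarely go far beyond 1,000 to 2,000 respondents.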

BRIDGEWATER: What are some of the other problems with polling? What else can go wrong?

LARSEN: We are having issues with the way people respond to polls and whether they respond at all. We have response biases and non-response biases. We know that the way questions are asked affects the answers we get. One of the issues we had in the 2016 election was whether people would lie about voting for Trump or not, the argument being that some people would like to vote for Trump but would not be honest about it. So we have two broad challenges: first, whether the right people are being asked, i.e. whether we are good enough at making a poll representative, and, second, once we have a representative poll, whether we are tapping into people's true preferences and attitudes.

BRIDGEWATER: I think, from an outside perspective, it is often underappreciated just how important polling is to the social sciences. Not only voting behaviour, but all of the social sciences.

LARSEN: Totally. More generally, we live in a democracy and it is important to know about people’s opinions. The best way to know about that is to ask people in a systematic manner. That is something a lot of people do in the social sciences, including political scientists and psychologists. A lot of my colleagues do nothing but conduct surveys and opinion polls, and we know that it is one of the best ways to tap into what people think about certain issues. For better or worse, it is one of the best methods we have; we have alternatives such as vox pops and betting markets. For example, we had betting markets in relation to the Brexit referendum. We also know that politicians, and in particular governments, care about opinion polls as well. Politicians look at opinion polls when they design policies and we know that parties conduct their own opinion polls for internal use to test different political messages.

There is also a brand new study out in the journal West European Politics showing that when governments are polling well, they are more likely to call an election. Governments look at opinion polls and ask "If we call an election now, are we able to win?". And conversely, if they are doing badly in the polls, they are more likely to break up the government without calling a new election. So we know that opinion polls are quite important, not only for scientists, but also for politicians and the public. To understand contemporary politics, we need to look at opinion polls.

BRIDGEWATER: But the fact that governments could be more likely to call an election if they are doing well in the polls, well, obviously we saw an example of that in the UK with Theresa May. That was probably one of the motivations, that they were so far ahead of Labour. But that could tap into a fundamental misunderstanding of polling. There is a lot of evidence to show that outside election periods, opinion polls to do with voting behaviour are not massively informative.

LARSEN: They are to a large extent. However, you are correct that we can’t necessarily predict an election by looking at opinion polls. We know that a lot of things can happen during an election campaign. A government can only look at the polls and see what people will vote today, but they can’t call an election and say “Oh, tomorrow you need to go to the polling station and give your vote”. We can only look at the opinion polls and make certain assumptions and predictions. That being said, they tend to be somewhat correct in what they are predicting.

BRIDGEWATER: If we think about recent polls, that have been seen as failures, most notably Brexit, the 2017 UK election, the 2016 US election, popular opinion seems to be that polling is in crisis, but that isn’t necessarily the insider perspective?

LARSEN: No, exactly. The popular take at the moment is that opinion polls are wrong and we can't use them anymore. We had, as you say, the Brexit referendum in 2016. We also had the election in 2015. We had the election of Donald Trump in 2016, where the main take was that the polls were wrong. First of all, for the presidential election, as Professor Erikson told you last week, the polls on the popular vote were actually quite spot on. I guess we are good at looking at these specific examples, but as scientists we also know that we should not cherry-pick our cases. The research that has looked into this uses mean absolute error, a measure of how incorrect opinion polls are, and when we look at this measure, we see a strong correlation between what the polls show and the election outcomes. So in general, opinion polls are quite good at predicting elections. And when we look at these data over time, we don't see that opinion polls are becoming less good at predicting election outcomes.
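
The mean absolute error measure referred to here can be sketched in a few lines. The poll numbers and result below are hypothetical, purely to show the arithmetic:

```python
def mean_absolute_error(polls, result):
    """Average absolute gap, in percentage points, between a set of
    final polls for one party and that party's actual vote share."""
    return sum(abs(p - result) for p in polls) / len(polls)

# Hypothetical final polls for one party versus a hypothetical result:
final_polls = [24.1, 25.3, 23.8, 24.9, 25.6]
actual_share = 24.7

print(mean_absolute_error(final_polls, actual_share))  # roughly 0.64 points
```

Computed across many parties and elections, this is the kind of summary statistic that lets researchers say whether poll accuracy is trending up or down over time, rather than judging polling by single memorable misses.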

There have of course been some cases where the opinion polls could have done better, but we also have a negativity bias. When opinion polls are doing fine, we tend to forget that and only look at the specific polls that are incorrect. It's like the referee in a soccer match, where we only remember the decisions we do not agree with. When opinion polls are doing a fine job, we tend not to even recognise or appreciate that. When we look at this in a systematic manner, we see that most polls are doing just fine. What might be the more interesting issue is how polls are being used and how they are being covered in the media.

BRIDGEWATER: Obviously, the media is the middleman between the raw polling and the public. You have quite specialist sites like FiveThirtyEight who are more clued up on how polling works and how we should be a bit more cautious when looking at certain outcomes, but other media outlets have some kind of bias, and that is going to massively inform how they report. What is the research on what informs how the media presents polls?

LARSEN: There are two interesting elements to this. We have two different bodies of literature on how the media communicate opinion polls. The first looks at individual polls: how do media outlets select which polls to cover? What we can see there is that the more extreme a poll is, the more likely it is to be picked up by media outlets. For example, if you have six opinion polls and five of them show that nothing has changed over the last week, and then a sixth poll shows something very extreme, then journalists are much more likely to pay attention to that last poll, knowing full well that it is probably not an accurate picture.

I have talked to journalists about this issue. Why is it that they pay so much attention to individual polls? I don’t believe journalists are stupid, not all of them at least. They know to a large extent that a specific poll might not be what we will find in follow-up polls, but it is so damn easy to write up an article about that, and it’s something that will give a lot of likes, shares and a lot of attention.

I have done some research on this together with a colleague, Zoltán Fazekas at the University of Oslo, where we have looked into this issue. We have looked at what types of news stories are being covered and how opinion polls are being disseminated in the coverage.

Second, the more interesting thing in terms of the coverage is when polls are being aggregated. That’s what we saw in the 2016 election. It’s not like people can say “Yeah, but this opinion poll showed this in the election”. What we are looking at, and what we are mostly talking about, when we look at the 2016 presidential election are these forecasts, e.g. that Clinton has a 98% chance of winning the election. That’s the more problematic issue, when we take a lot of different polls and add them up together and say that there is a specific probability of a certain outcome.

The person who made the best prediction was Nate Silver at FiveThirtyEight. He gave Donald Trump a 28% chance of winning. When people see this, and there is research on this, they are not good at assessing the probability of this actually happening. What people do is overestimate the probability of a certain outcome when they see these numbers presented in a probabilistic manner. When they see that Hillary Clinton has a 75% chance of winning, they don't think about the likelihood of Trump winning. So when Hillary Clinton does not win, the polls must be incorrect. And of course, some polls were incorrect in key states, but that is not the point. We are very bad at assessing these probabilities and making sense of them. I think that's one of the key lessons we can draw from the 2016 presidential election: how do we actually communicate and aggregate these opinion polls?

What is happening is that we are getting rid of some of the uncertainty. When we add up all these opinion polls, even though a lot of these polls will be correct, if they are biased in some minor manner they can all add up and give Hillary Clinton a 98% chance of winning which most likely will be false.
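
The point about small shared biases adding up can be made concrete. The sketch below assumes normally distributed errors and uses illustrative numbers; it is not any particular forecaster's model. Averaging many polls shrinks the sampling error, but a bias common to all the polls never averages away, and ignoring it produces exactly the kind of near-certain forecast described above:

```python
import math
from statistics import NormalDist

def win_probability(margin, se):
    """P(true margin > 0), assuming normally distributed polling error."""
    return NormalDist().cdf(margin / se)

poll_sd = 3.0    # sampling error of one poll, in percentage points
n_polls = 20
margin = 3.0     # the leader's average margin across the polls

# If poll errors were independent, averaging 20 polls shrinks the error:
se_independent = poll_sd / math.sqrt(n_polls)           # ~0.67 points
# If the polls also share a common ~2-point bias, it never averages away:
se_correlated = math.sqrt(se_independent**2 + 2.0**2)   # ~2.1 points

print(round(win_probability(margin, se_independent), 3))  # near certainty
print(round(win_probability(margin, se_correlated), 3))   # far more modest
```

The same 3-point polling lead yields a forecast above 99.9% under the independence assumption, but a much more modest one once a plausible shared bias is acknowledged.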

BRIDGEWATER: When someone has a 75% chance of winning that means they have a 25% chance of not winning. If they don’t win, that doesn’t mean the prediction was wrong.

LARSEN: Exactly. I'm quite ambivalent about that interpretation, though. You are totally right that it means that one out of four times we will see another outcome, but it is also important to keep in mind that it's an easy excuse to use if you are Nate Silver at FiveThirtyEight, i.e. "We didn't say 100%, so there is nothing wrong". I can see that argument, but we might want to think about ways in which we can communicate opinion polls, and the aggregated information from them, while remembering the uncertainty, rather than communicating such a strong sense of certainty.

BRIDGEWATER: Based on the information we had at the time, it was still, regardless of the outcome, a sensible prediction to think that Hillary was going to win.

LARSEN: It is a very good point. If we look at these forecasts in isolation, we can say that Nate Silver only gave Hillary Clinton 72%, but if we look at the other forecasts, one forecast gave Hillary Clinton 85%, and I think it was the Huffington Post that gave Clinton 98%. I don't think people just look at one forecast; they look at all of them, or at least some of them, and conclude that there is a systematic pattern. That will of course also affect the overall reporting. We had stories about what Hillary Clinton would do when she became president. It is something that has spillover effects on other aspects of the political coverage. It was basically assumed that she would be the next president.

There are some discussions about whether that could affect the election as well, e.g. whether the certainty that Hillary would win made people less likely to vote, or whether people were more likely to vote for a third candidate because Clinton was the most likely winner. So, people might not be good at looking at these individual forecasts in isolation. There might also be an asymmetry in the way that we don't treat the probability of Hillary Clinton winning as equivalent to the reverse, namely her corresponding probability of losing. If we had paid more attention to the fact that Donald Trump in some forecasts had a 25% probability of winning, people might have perceived that information in a different way.

BRIDGEWATER: If someone told you that you had a 25% chance of winning the lottery, that would be amazing.

LARSEN: I like those odds.

BRIDGEWATER: Going forward, what are the lessons we can learn – both the media, but also us consumers of news – about how to interpret polls and how to make the best of polls?

LARSEN: The first thing to keep in mind is that polls are not perfect. Some of the people who are most critical of polls are the people working with them, such as scientists. We need to be critical towards polls. We should accept that they are a great tool, but not a perfect one. They are the best method we know of compared to the alternatives. Polling is way better than asking random people on the street what they think. It is better than looking at betting markets and so forth.

What we need are discussions about not only how to conduct opinion polls in the future, but also how we can ensure that journalists cover polls in the best possible way. They should be aware of the uncertainties and the potential problems with these polls, and also have some self-awareness about the impact that the coverage might have on the public. One argument could be that opinion polls can be self-fulfilling prophecies. They can have a bandwagon effect, where people are more likely to go with the popular candidate, but that wasn't totally in line with what we saw in 2016. The other mechanism is that they might demobilise some voters. Those are some of the debates we should be having at the moment.

As for the more general picture, people will also keep discussing opinion polls in relation to specific elections. We have the midterms coming up in the US, and I'm sure there will be a lot of discussion about the quality of opinion polls. We will have people saying either that opinion polls were saved by the election or that it finally proved that there is no hope for them.

But I couldn’t care less about the individual outcomes and how polls are doing in one specific election. It is important to keep in mind that we want to look at overall patterns and how polls are performing in general. Opinion polls might be correct but for wrong reasons. We want to evaluate opinion polls based on the methods that they are using. We want to ensure that they are conducted in a transparent manner so we can evaluate how good they are. That is something that will be interesting to follow in the future.

We know that a lot of researchers are looking at non-representative samples. So, how can we use samples that are not representative of a population, but apply statistical techniques to make them representative? In 2012, researchers used the Xbox gaming platform, a very non-representative sample dominated by men, young men in particular. They adjusted the data with a technique called multilevel regression and post-stratification and managed to predict the election. Those are some of the interesting things going on at the moment, i.e. researchers trying to use non-representative samples to make polls better.
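
The post-stratification step can be sketched with a toy example. This shows only the reweighting part; full MRP additionally fits a multilevel regression to estimate support within thinly populated cells. All numbers below are made up:

```python
# A deliberately skewed "Xbox-style" sample: mostly young men.
# cell: (share of the sample, support for candidate A within the cell)
sample = {
    "young men":   (0.80, 0.60),
    "young women": (0.10, 0.45),
    "older":       (0.10, 0.40),
}
# Known population shares for the same cells (e.g. from a census):
population = {"young men": 0.20, "young women": 0.20, "older": 0.60}

# Raw estimate: dominated by the over-represented young men.
raw = sum(share * support for share, support in sample.values())
# Post-stratified estimate: reweight each cell to its population share.
poststratified = sum(population[cell] * support
                     for cell, (_, support) in sample.items())

print(round(raw, 3), round(poststratified, 3))  # 0.565 vs 0.45
```

The raw sample wrongly suggests candidate A is winning; reweighting the cells to their population shares reverses the conclusion, which is the core idea behind making non-representative samples usable.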

We also see more and more people use social media data to try to make predictions about the public. As more people from different sociodemographic and socioeconomic groups begin to use social media, there will be endless ways of making interesting predictions about what will happen, and of tapping into public opinion in ways that we might not even be able to with traditional survey techniques.

When people say that it is the death of opinion polls I think it is the opposite. We have only seen the beginning now and we are going to see a lot more interesting stuff in the future.

BRIDGEWATER: Thanks a lot! Very interesting.

LARSEN: My pleasure.