Potpourri: Statistics #71

SDS 375/395 Data Visualization in R
Demystifying the coalesce function
Data Viz Bookmarks
Data Science: A First Introduction
Crime by the Numbers
The value of p
The Tidyverse in a Table
Sample Size Justification
Learn tidytext with my new learnr course
Using random effects in GAMs with mgcv
Public Policy Analytics: Code & Context for Data Science in Government
How to run 100 regressions without loops in R
Spreadsheet mistakes – news stories
Weights in statistics
Importing Multiple Files Quickly and Efficiently
Making Sense of Sensitivity: Extending Omitted Variable Bias
Microsoft365R: an R interface to the Microsoft 365 suite
fixest: Fast Fixed-Effects Estimations
Grab World Bank Data in R with {WDI}
Lists are my secret weapon for reporting stats with knitr
Building a team of internal R packages
Tidyverse Skills for Data Science in R
Practical Applications in R for Psychologists
Transform List into Dataframe with tidyr and purrr
Main terms and concepts in R
A complete guide to scales
Computational Thinking for Social Scientists
A Crash Course in Good and Bad Controls
Causal design patterns for data analysts
Modern Data Science with R
Generating SQL with {dbplyr} and sqlfluff
Hypothesis test by hand
How to Use Git/GitHub with R
Testing for normality
Scrape Hundreds of PDF Documents From the Web with R and rvest
Radial Patterns in ggplot2
a gRadual intRoduction to Shiny
Reading tables from images with magick
ggplot Wizardry Hands-On


Previous posts: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70

Why are more respondents not necessarily better? #3

Avisen Danmark reports that a new polling firm named Electica has started conducting political opinion polls:

It is a new player on the market, the firm Electica, which has Nye Borgerlige at 11 percent, Venstre at 9.8 and Konservative at 12.6. Electica polls for Alliancen, which consists of the trade unions NNF, Blik & Rør, Dansk El-forbund and Malerforbundet, and because they poll 5,000 representatively selected Danes rather than the roughly 1,000-2,000 that other firms base their polls on, the statistical uncertainty is only half as large as usual.

I am in no way impressed by this description. We should not have greater confidence that this poll is more precise than what the other polling firms can deliver with around 1,000 respondents. It is true that more respondents result in a smaller statistical uncertainty, but – as noted in previous posts – this does not in itself mean that the poll is more precise.
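The arithmetic behind the 'half as large' claim is straightforward: the margin of error shrinks with the square root of the sample size, so you need four times as many respondents to halve the uncertainty. A quick R illustration (a minimal sketch with a hypothetical party share, not the Electica data):

moe <- function(p, n) 1.96 * sqrt(p * (1 - p) / n)  # 95% margin of error

moe(0.25, 1250)  # ~0.024, i.e. +/- 2.4 percentage points
moe(0.25, 5000)  # ~0.012, i.e. +/- 1.2 percentage points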

I do not rule out that the numbers in the poll may be accurate, but I have chosen not to include this poll in my overview at Politologi.dk. First, it has not been possible for me to find any description of Electica, including what kinds of analyses they actually conduct. Second, I am sceptical of the information about the poll that is presented in the coverage:

The survey was conducted by Electica for Alliancen, which consists of the trade unions NNF, Blik & Rør, Malerforbundet and Dansk El-forbund.

5,000 interviews were conducted among a representative sample of the Danish population aged 16 or older. Participants between 16 and 18 were filtered out. 11 percent did not answer which party they would vote for. That left 4,138 participants over 18.

The 5,000 were surveyed in the period 1-14 February.

The 5,000 were selected on the basis of the known distribution of gender, age and region, and the results were subsequently weighted so that they reflect the known Danish distribution of gender, age and region.

The maximum statistical uncertainty is +/- 1.5 percentage points.

What interest do the respective trade unions have in paying for opinion polls? Would they also stand behind the very same poll if it had shown that support for the Social Democrats was far lower than what other polling firms report? I believe one can rightly have doubts about this, which is why I will not put these polls on a par with the polls conducted by the established polling firms and YouGov for other media outlets.

One could object, as some experts tell Avisen Danmark, that the quality is on a par with other polling firms. I cannot rule this out, but I am far more reluctant to confirm that the quality is the same. Without being presented with information beyond what is currently available, I simply do not believe that this poll is just as good – only with even more precision.

There are obvious questions I would like answered: How were the 5,000 respondents selected? It is described as an online survey, but how exactly did they recruit 5,000 respondents who constitute a representative sample once you weight by gender, age and region? (And are the answers the same if you also weight by vote choice at the 2019 election?)

More respondents in a poll is not a mark of quality in and of itself. If you are a (new) polling firm trying to stand out from the rest of the industry, my advice would be to sell yourself not on a larger sample but on greater transparency about what exactly you do.

Policy feedback effects on public opinion: a list of quantitative studies

In my article in the Policy Studies Journal (published in 2019), I provided a review of published quantitative studies that explicitly examine policy feedback effects on public opinion.

Since then, I have noticed several other studies being published, and below I provide a list of the studies I have in my archive. As you can read in my article, I introduced several criteria for the selection of studies of interest (for example, I only look at studies using individual-level data).

As I have not been meticulously following the literature in recent years, I am sure the list is not exhaustive. I plan to update the list when I find additional studies, so do feel free to reach out if you are aware of studies that I have missed.

Study Context Policy Outcome
Abou-Chadi and Finnigan (2019) 29 countries Same-sex rights Attitudes toward homosexuality
Anderson (2009) 16/17 countries Labour market policies Social ties
Andersson et al. (2018) Sweden Asylum applications Attitudes towards refugees
Banducci et al. (2016) 28 countries Family policy Government policy attitudes
Barabas (2009) U.S. Private investment account program Support for privatization policies
Barnes and Hope (2017) U.S. Means-tested public assistance Political socialization
Beaudonnet (2015) 27 countries Welfare efficacy Support for the European Union
Bendz (2015) Sweden Privatization reform Attitudes toward health care privatization
Bendz (2017) Sweden Privatization option Attitudes toward health care privatization
Branham (2018) U.S. Policy spending Policy support
Breznau (2017) 19 countries Social spending and decommodification Government responsibility
Bruch et al. (2010) U.S. Government assistance Political engagement
Bruch and Soss (2018) U.S. School experiences Political engagement and government trust
Burlacu et al. (2018) Germany and Sweden Waiting time rights Health system satisfaction
Busemeyer (2013) 20 countries Private share in education funding Attitudes toward redistribution
Busemeyer and Goerres (2014) 20 countries Education Political participation
Busemeyer and Goerres (2020) Germany Public childcare fees Fair fee level
Busemeyer and Iversen (2014) 20 countries Public share of education spending Attitudes toward government spending on education
Busemeyer and Iversen (2020) 20 OECD countries Private welfare provision Support for the welfare state
Busemeyer and Neimanns (2017) 21 countries Childcare and unemployment benefits Government responsibility
Chattopadhyay (2017) U.S. Dependent coverage provision Policy support and political engagement
Córdova and Kras (2020) Brazil Women’s police stations Trust in the police
Davenport (2015) U.S. Policy-induced risk Political participation
Dellmuth and Chalmers (2018) 13 EU member states EU spending Support for the EU
Ellingsæter et al. (2017) Norway Childcare service reforms Childcare service attitudes
Fernandez and Jaime-Castillo (2013) 27 European countries Pension policy attitudes (e.g. generosity) Attitudes toward increasing contributions to the pension system
Fervers (2019) Germany Labour market reform Vote intention
Flavin and Griffin (2009) U.S. Multiple Policy preferences
Flavin and Hartney (2015) U.S. Bargaining laws Political participation
Fleming (2014) U.S. School voucher program Multiple
Garritzmann (2015) 17 countries Education expenditures Attitudes towards student support
Gingrich (2014) 16 countries Tax design and welfare visibility Right-wing party vote
Gingrich (2019) Britain Education policy Economic equality attitudes
Gingrich and Ansell (2012) 18 countries Employment protection legislation and single payer system Government spending attitudes
Guo and Ting (2015) China Social insurance coverage Political participation
Gusmano et al. (2002) U.S. Health care policies Attitudes toward employer involvement in health care
Haselswerdt (2017) U.S. Medicaid beneficiary Political participation
Haselswerdt and Bartels (2015) U.S. Tax expenditure policy tool Approval of social programs
Hedegaard (2014) Denmark Proximity to welfare recipient Social policy preferences
Hedegaard and Larsen (2014) Denmark Proximity to welfare recipient Social policy preferences
Hern (2017a) Zambia Policy access Political participation
Hern (2017b) Zambia Government project access Political participation
Hetling et al. (2008) U.S. Welfare reform Attitudes toward welfare recipients
Im and Meng (2016) China Multiple welfare policies Attitudes toward government responsibility
Jacobs and Mettler (2018) U.S. Access to health care Affordable Care Act attitudes
Jordan (2010) 11 countries Hierarchical health care system Attitudes toward government responsibility
Jordan (2013) 17 countries Welfare policy generosity Government responsibility for welfare
Kerner (2020) 8 countries Pension system Attitudes towards neoliberalism
Kotsadam and Jakobsson (2011) Sweden and Norway Prostitution law Attitudes toward prostitution
Kreitzer et al. (2014) Iowa (U.S.) Same-sex marriage legalizing Support for same-sex marriage
Kumlin (2011) 11 countries Social policy generosity Satisfaction with democracy
Kumlin (2014) Sweden Welfare policy information Performance evaluation
Kumlin and Rothstein (2005) Sweden Needs-tested policies Interpersonal trust
Kweon (2018) 18 European countries Labor market policies Vote choice
Larsen (2018) Denmark Retrenchment reform Government support
Larsen (2020) 30 countries Healthcare policies Government attitudes
Lavery (2014) U.S. Policy information design Political knowledge and engagement
Lavery (2017) U.S. Policy information Political engagement
Lerman and McCabe (2017) U.S. Public insurance Support for health care policies
Li and Wu (2018) China Pension scheme Political trust
Lindh (2015) 17 countries Private funding and public employment Support for market distribution of services
Lü (2014) China Policy benefit Attitudes toward government responsibility and trust in government
Lynch and Myrskylä (2009) 11 European countries Public pensions Attitudes toward pension reforms
MacLean (2011) Africa Public schools and clinics Political participation
Maltby (2017) U.S. Jail ratio Political attitudes and participation
Mettler (2002) U.S. Educational benefits Political participation
Mettler and Stonecash (2008) U.S. Means-tested programs Political participation
Mettler and Welch (2004) U.S. Educational benefits Political participation
Munoz et al. (2014) Spain Austerity package Political engagement
Nagayoshi and Hjerm (2015) 26 countries Labour market policies Anti-immigration attitudes
Ofosu et al. (2019) U.S. Same-sex rights Antigay bias
Pacheco (2013) U.S. Smoking legislation Attitudes toward smoking and smokers
Raven et al. (2011) Netherlands Welfare state spending Preferences for social security spending
Rhodes (2014) U.S. Education policies Political engagement
Rosenthal (2019) U.S. Universal and means-tested policies Political participation
Rönnerstrand and Oskarson (2020) Sweden Waiting-time guarantee Hospital service satisfaction
Sances and Clinton (2021) U.S. Expansion of Medicaid Support towards the Affordable Care Act
Schneider and Jacoby (2003) U.S. Public assistance Multiple
Shore (2014) 26 countries Social benefits Political engagement
Simonovits et al. (2019) U.S. Agricultural payments Electoral participation
Soss (1999) U.S. Social policies (AFDC and SSDI) Political engagement
Soss and Schram (2007) U.S. Welfare reform (TANF) Multiple
Stensöta and Bendz (2020) Sweden Early retirement generosity Policy trust
Sumino (2016) 19 countries Share of taxes in household income Support for taxation
Svallfors (2010) Germany Policy regime Attitudes towards government responsibilities
Swartz et al. (2009) U.S. Social policy assistance Political engagement
Theiss and Kurowska (2019) 9 European countries Social welfare benefits Protest behaviour
Vannoni (2019) Four countries Tobacco advertisement bans Tobacco control attitudes
van Oorschot and Meuleman (2014) 23 countries Unemployment policies Perception of deservingness of the unemployed
Watson (2015) U.K. Conditional benefits recipient Political engagement
Weaver and Lerman (2010) U.S. Contact with the authorities Political engagement and political trust
Yang and Shen (2021) China Social welfare benefits Political trust
Zhu and Lipsmeyer (2015) 19 countries Privatization of healthcare responsibility Support for increasing government healthcare spending
Ziller and Helbling (2017) 21 countries Antidiscrimination laws Public administration evaluation, political trust and democratic satisfaction

Paying for good journalism

There is no such thing as a free lunch. And there is no such thing as free journalism. And there is definitely no such thing as free good journalism. Yet I consume a lot of free good journalism. Good journalism is a public good. However, to get access to good journalism, as with most things that are good in this world, someone has to pay. This also means that I am not convinced that we can approach journalism as something people will be willing to pay for when/if it becomes cheaper.

I consume journalism from various outlets, including (but definitely not limited to) The Economist, The Atlantic, The New Yorker, Financial Times, The Guardian, and the Danish Weekendavisen, without (at least directly) paying anything. I have on multiple occasions considered subscribing to one or more of these outlets, but I did not find it worth the money. Not because I do not find any of the prices reasonable. The main issue is that each outlet only provides a small proportion of the content I care about. For that reason, I agree 100% with this tweet arguing that the system is broken: “Starting mid-june I began tracking each Twitter link that I follow which ends up at a source that blocks adblockers & demands a subscription to view content. I calculated the monthly cost of each of those sources, if I subscribed to all to see their content. I’m only 15 days into the experiment and the cost so far is $182.47/month. In other words, paying for the journalistic content I value, in the current market system, is a $2,189/year expense. That’s an awful lot. I could pitch in a buck per article and come out ahead.”

The issue is that good journalism is a public good made up of several (semi-)private organisations, and I do not have one entry point to get access to everything I need (or at least the content that I would like to pay for). I tried out Blendle back in 2016 and bought several articles, but I never used it beyond that. Maybe I found it difficult to navigate the platform, maybe I simply didn't integrate it into my news consumption habits.

There has been a lot of talk about the need for a ‘Netflix for journalism’. I disagree with this idea for at least two reasons. First, while it is not always easy to distinguish between the two, journalism is not entertainment like TV shows and movies. As Robert Putnam describes it in Bowling Alone: “Although modern media offer both information and entertainment, they increasingly blur the line between the two — it is important from the point of view of civic engagement to treat the two somewhat separately.” In other words, there are often strong analytical reasons to not simply look at journalism as something that should (or could) work as ‘Netflix for journalism’. If ‘Netflix for journalism’ actually worked, I would be concerned about whether it actually was journalism.

Second, I am not even convinced that the ‘Netflix for entertainment’ model is working. This model was more appealing five years ago when Netflix was one of the few players on the streaming market. Today, you have Disney+, HBO Max, Amazon Prime and other streaming platforms. Maybe Netflix is moving closer to ‘New York Times for entertainment’ than media outlets are moving closer to ‘Netflix for journalism’?

The most recent trend has, if anything, been further away from a ‘Netflix for journalism’ model and more towards following (and paying) individual journalists. Substack is the best example of such a model and there are good reasons not to see that as a good model for most journalism.

Of course, a lot of research has already been conducted on why people pay for news. Fletcher and Nielsen (2017), for example, examine people’s willingness to pay for online news and find that people who already pay for offline news and younger people are more willing to pay for online news (see also Goyanes 2015). This is not surprising.

However, there are questions in this domain I would like some researchers to devote more time to. Here is one hypothesis: People will be more willing to pay for expensive journalism. In other words, as journalism becomes cheaper, we might not see the willingness to pay increase but rather decrease. It's simply a race to the bottom.

If this hypothesis is correct, then journalism is a Veblen good. That is, demand for journalism increases as the price increases (or, in economic terms, the demand curve for journalism can slope upwards). This is partially because the price reflects the quality of a product (see also Coelho and McClure 1993). There are several examples from the medical literature of expensive placebos working better. In one study, by Waber et al. (2008), respondents in an experiment reported that a painkiller costing $2.50 a dose worked much better than a painkiller costing 10 cents, despite both of them being placebos. Might similar mechanisms be at play when people consume journalism?

One relatively easy way to test this would be to build a “news portal” in an experiment (similar to the approach in Bryanov et al. 2020) and explore how the price of journalism affects decision-making. When are people more likely to pay for expensive news? Which citizens are more likely to opt for cheap alternatives? What role do clickbait headlines play? Will paywalls lead some people to be more inclined to opt for sources with a greater amount of mis- and/or disinformation?

It is a well-known point that a lot of bad journalism is free: “the New York Times, the New Yorker, the Washington Post, the New Republic, New York, Harper’s, the New York Review of Books, the Financial Times, and the London Times all have paywalls. Breitbart, Fox News, the Daily Wire, the Federalist, the Washington Examiner, InfoWars: free!”. However, as I stated above, most of the good journalism I consume is free as well. Accordingly, I am not convinced that the solution to bad journalism is to make good journalism cheaper or more easily available.

How to improve your figures #3: Don’t show variable names

When you plot a figure in your favourite statistical software, you will most likely see the name of the variable(s) you are plotting. If your income variable is called inc, your software will label the axis inc and not income. In most cases variable names are not sufficient, and you should, for that reason, not show variable names in your figures.

Good variable names are easy to read and write – and follow specific naming conventions. For example, you cannot (and should not) include spaces in your variable names. That is why we use underscores (_) to separate words in variable names. However, R, SPSS and Stata will happily show such underscores in your figures – and you need to fix that.
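In ggplot2, for example, you can override the default axis titles with labs(). A minimal sketch (hypothetical data and variable names, just to show the idea):

library(ggplot2)

d <- data.frame(edu_years = rnorm(100, 13, 3),   # hypothetical data with
                inc = rnorm(100, 30000, 5000))   # typical raw variable names

ggplot(d, aes(x = edu_years, y = inc)) +
  geom_point() +
  labs(x = "Years of education",  # instead of 'edu_years'
       y = "Income")              # instead of 'inc'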

I believe this is data visualisation 101 but it is something I see a lot, including in published research. For example, take a look at this figure (Figure 1 from this paper):

As you can see, we have Exitfree, Anti_EU and some GDP* variables. The good thing about this paper is that the variable names are mentioned in the main text as well: “Individuals and parties may have ideological objections to European integration and hence desire a free exit right irrespective of whether their country is peripheral. To control for this, a variable variable [sic] ‘Anti_EU’ is constructed based on the variable ‘eu_anti_pro’ in the ParlGov database”. However, I would still recommend that you do not show the raw variable names in the figures but use proper labels (with spaces and everything).

Let’s look at another few examples from this paper. Here is the first figure:

The important thing is not what the figure is about, but the labels. You will see labels such as PID_rep_dem and age_real. These are not good labels to have in a figure in a paper. age_real is not mentioned anywhere in the paper (only age as a covariate is mentioned).

Let us take a look at Figure 3 from the same paper:

Here you will see a variable called form2. What was form 1? Is there a form 3? When we rely on variable names instead of clear labels, we introduce ambiguity and make it difficult for the reader to understand what is going on. Notice also the difference between Figure 1 and Figure 3 for age, i.e. age_real and real_age. Are those variables the same (i.e. a correlation of 1)? And if that is the case, why have two age variables?

Okay, next example. Look at Figure 6 from this paper:

Here we see a variable on the x-axis called yrs_since1920 (years since 1920). It would be better to label this axis simply “Years since 1920”. Or even better: show the actual years on the axis. Notice also here the 1.sønderjylland_ny label. Sønderjylland is not mentioned in the paper, and it is not clear how ny (new in Danish) should be understood here (most likely that it wasn't the first Sønderjylland variable created in the data).

Let’s take another example, specifically Figure 3 from this paper:

Here we see the good old underscores en masse. anti_elite, immigrant_blame, ring_wing_complete_populism, rich_blame and left_wing_complete_populism. There are 29 authors on the article in question. Too many cooks spoil the broth? Nahh, I am sure most of the authors on the manuscript didn’t even bother looking at the figures (also, if you want to have fun, take a critical look at the results provided in the appendix!).

And now I notice that all of the examples I have provided above are from Stata. I promise it is a coincidence. However, let’s take one last example from R just to confirm that it is not only an issue in Stata. Specifically, look at Figure 3 in this paper (or Figure 4, Figure 5 and Figure 6):

The figure shows trends in public opinion on economic issues in the United States from 1972 to 2016. There are too many dots in the labels here. guar.jobs.n.income, FS.aid.4.college etc. are not ideal labels for a figure.

In sum, I like most of the papers above (there is a reason I found the examples in the first place). However, it is a major turn-off that the figures do not show actual labels but simply rely on the variable names or weird abbreviations to show crucial information.

The media and the mackerel sandwich

One of the topics the media and political commentators have focused on quite a bit lately is Prime Minister Mette Frederiksen's mackerel sandwich on Instagram. The focus has especially been on how it can be seen as effective political communication (see, for example, here and here). It seems that most people have an opinion about Mette Frederiksen's use of Instagram. This post is no exception. I have no opinion about the post itself, however, but an opinion about the media's opinion about the post.

The interesting thing is that this is in many ways such banal political communication that anyone and everyone can have an opinion about it. There is nothing subtle. No ambiguity. No substance. Only because the media focus on the political communication does it manage to become effective political communication. It is thus not an effect of social media but – insofar as there is an effect – an effect of legacy media. I therefore do not find it particularly interesting to discuss whether the political communication on Instagram is effective in and of itself (if you ask social media experts who make a living advising on social media, they will usually say that communication on social media is very effective).

Nor do I find it relevant to discuss whether food can be political (one of the parties eligible to stand for election is called Veganerpartiet). Research shows that even children judge other children based on the food they eat. Other research shows that we trust people more when they eat the same food as ourselves. It therefore does not surprise me that Mette Frederiksen shares a picture of a mackerel sandwich.

What does surprise me is how surprised the media seem to be that politicians would share pictures of this kind. The essence of potentially effective political communication lies not in the substance of the post itself, or the absence thereof, but in the media's reaction to it. It is precisely the media's naivety that is the whole premise for it making sense to call the post effective political communication at all.

A picture of a mackerel sandwich shared with one's followers on social media could hardly be called effective political communication if the media did not pick up the topic precisely as political communication. My problem is not that asking the question “Is this effective political communication?” can in itself affect whether the political communication becomes effective. I merely doubt that the media themselves are aware of this.

Calculating seats in the Folketing with R

In many opinion polls, support for the parties is reported not only as the share of the vote in percent but also as seat counts. The D'Hondt method is, as is well known, used to allocate the constituency seats at Folketing elections, which together with the compensatory seats ensure a proportional relationship between votes and seats.

If you want to estimate how many seats the respective parties stand to get, I warmly recommend the seatdist package for R. It is developed by Juraj Medzihorsky and can be found here. Once you have installed the package, you can easily load it into R and use the giveseats() function to calculate seats:

library(seatdist)

giveseats(c(33, 6, 10, 7, 8, 1, 3, 1, 5, 17, 8, 1),  # party support in percent
          ns = 175,       # number of seats to allocate
          thresh = 0.02,  # electoral threshold (2 percent)
          method = "dh")  # D'Hondt

The first thing we give the function is a vector with support for the parties in percent (I have omitted decimals here simply to make it easier to read). 33, for example, is the support for Socialdemokratiet in percent. ns specifies how many seats we are allocating (number of seats, 175 in this case), thresh specifies the electoral threshold (2% in this case), and method is our allocation method (dh for D'Hondt).

Here we can see that Socialdemokratiet would get around 61 seats at the next general election. This is of course an estimate, since 1) there is uncertainty in the poll and 2) not all seats are allocated this simply at the election. We also do not take the four North Atlantic seats into account. Nevertheless, it is relatively easy to get an estimate of how large support for the parties is in seats. The package also offers a wide range of options for examining the parties' seat counts under other allocation methods.
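If you are curious about what the D'Hondt method actually does, here is a minimal base-R sketch of the highest-averages logic (an illustration with hypothetical party labels only; use seatdist for real analyses):

# Divide each party's vote share by 1, 2, ..., ns and hand the
# ns seats to the largest quotients (D'Hondt, highest averages)
dhondt <- function(votes, parties, ns = 175, thresh = 0.02) {
  share <- votes / sum(votes)
  votes <- votes[share >= thresh]      # apply the electoral threshold
  parties <- parties[share >= thresh]
  q <- data.frame(party = rep(parties, each = ns),
                  quotient = as.vector(sapply(votes, function(v) v / 1:ns)))
  winners <- q[order(-q$quotient), ][1:ns, ]  # ns largest quotients win
  table(winners$party)
}

dhondt(c(33, 6, 10, 7, 8, 1, 3, 1, 5, 17, 8, 1), parties = LETTERS[1:12])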

Do YouTube outages cause rape?

Here is a strange study: When You Can’t Tube… Impact of a Major YouTube Outage on Rapes. And here is the abstract for the study:

On Tuesday, October 16, 2018, YouTube experienced a major and rare global service outage. Using high-frequency crime data from the United States, we document an important increase in rapes in the 24-hour period following the outage. We investigate various potential underlying channels that may link the YouTube outage to the subsequent observed increase in rapes. The overall evidence only supports the hypothesis that the increase in rapes was driven by an increase in pornography viewing.

I find the actual mechanism improbable, i.e. that people – when YouTube is down – will be more likely to engage in certain criminal activities. For that reason, I decided to actually read the paper.

The first reason for calling the study strange is that I do not remember any stories about ‘a major and rare service outage’. Of course, this is most likely just an event that I did not hear about and that happened at a time when I did not use YouTube. Major services do go down now and then, including YouTube (together with other Google services) back in December.

Interestingly, when I looked at the paper, I could not find any explicit information about how long the outage under study actually lasted. The only information on the length of outages is in footnote 8: “Other minor outages were on June 16, 2017 at 10 a.m. for 2 hours; November 12, 2018 at 5 p.m. for an hour; November 18, 2018 at 7 p.m. for half an hour. Note that these outages did not occur at night.” I guess a few hours of downtime is a minor outage? Well, apparently not.

When I looked into what media outlets reported in relation to the ‘major and rare global service outage’, I was surprised. For example, as reported by Business Insider: “YouTube abruptly went down for around an hour on Tuesday evening”. Around an hour? Well, no surprise I don’t remember it. And no surprise the length of the outage is not mentioned in the paper. I guess the key point is that this outage occurred at night.

There is, however, information that the outage occurred at night between 9 and 11pm: “On Tuesday, October 16, 2018, between 9 and 11p.m. Eastern time, YouTube experienced a major and rare global service outage.” I guess this was the length of the outage, i.e. from 9pm to 11pm. But why then describe another outage of 2 hours as a minor outage!? And why focus on this specific outage?

This is a red flag to me, i.e. when researchers have several events they could explore. Then add other outcomes and time windows to the mix, as they write in the paper: “We find that other crimes and offenses (including drug, alcohol, and traffic) were not affected by the outage. We also report that the observed increase in rapes did not occur in the 2-hour period during the outage, but in the 22-hour period after YouTube service had been restored.” This is a major red flag.

What if the researchers had found that drug-related crimes increased within the first six hours of the outage? Would that be in line with the theory as well? To be fair, the researchers do develop a theoretical framework with the mandatory reference to Becker 1968, but did they actually develop this framework prior to looking at the data? Maybe, but I have been around the block long enough to be suspicious – especially when you relegate the theoretical framework to the appendix and use individual-level descriptions such as “the agent rapes if the utility from raping is greater than zero” to explain a coefficient estimated with noisy aggregate-level data.

Wait, why am I not impressed by the results? Take a look at Figure 4 in the paper with the placebo tests:

For these placebo tests to be convincing, we would like to see the coefficients randomly distributed around zero, with the coefficient for the day of the outage (0 on the x-axis) standing out. This is not the case here. On the contrary, we see effects similar to the main result on other so-called placebo days. This is not convincing, and there is, in my view, no systematic evidence showing that the YouTube outage caused an increase in the number of rapes.
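To make that logic concrete, here is a small noise-only simulation sketch in R (no real crime data): under the null of no effect, placebo-day estimates scatter around zero, and only a genuine effect on the outage day itself should stand out from that cloud.

set.seed(2021)
days <- -30:30                                # days relative to the outage
est <- rnorm(length(days), mean = 0, sd = 1)  # estimates under the null

plot(days, est, pch = 19,
     xlab = "Placebo day (0 = outage)", ylab = "Estimated effect")
abline(h = 0, lty = 2)  # convincing placebo estimates cluster around this line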

In sum, the theory and empirics are too weak to convincingly demonstrate that we should expect (or even predict) an increase in the number of rapes when YouTube is down.