Ten great R functions #2

This is a follow-up post to my previous post on great R functions. I use some of these functions a lot, while a few of them have been very helpful on at least one occasion.

11. dplyr::coalesce()

I have been working with data where the relevant information was split across two columns but needed to be in one. For example, there might be an outcome for the treatment group and an outcome for the control group in a survey experiment, but each observation only has a value on one of these variables.

To create one variable with all of the information, we can use the coalesce() function. This function finds the first non-missing element across several columns and returns that value. In the example below we create a new variable (var3) that is merged from two other variables.

library(dplyr)  # tibble(), mutate(), coalesce() and %>% are all available via dplyr

df <- tibble(id = 1:4,
             var1 = c(1, 2, NA, NA),
             var2 = c(NA, NA, 3, 4))

df %>% 
  mutate(var3 = coalesce(var1, var2))

The new variable will have the values 1, 2, 3 and 4.

12. fs::dir_ls()

If you need a character vector with the files in a folder, optionally restricted to files matching a specific regular expression, the dir_ls() function in the fs package has you covered. The example below will return all *.csv files in your working directory (you can also specify a path if it should not be your working directory).

fs::dir_ls(regexp = "\\.csv$")

13. janitor::clean_names()

The clean_names() function does exactly what it promises: clean names. When I get Excel datasets to work with, the first row often has names that are not ideal variable names, including spaces and special characters.

In the example below I create an empty dataset with two variables with the horrible names Annual sales (USD) and Growth rate (%). Then I use the clean_names() function to get clean names from the data frame. Specifically, the function takes the variable names and edits them into snake_case names.

df <- data.frame("Annual sales (USD)" = NA,
                 "Growth rate (%)" = NA)

janitor::clean_names(df)

The variable names returned from the function are annual_sales_usd and growth_rate. Much better!

14. dplyr::add_count()

Yet another function that does exactly what it promises. add_count() adds a count for the variable of interest, i.e., the number of observations that share a specific value. The example below counts how many observations have each value of the gear variable in mtcars and adds that information to a new variable (gear_n).

mtcars %>% 
  add_count(gear, name = "gear_n")

The function is similar to the count() function, but it does not collapse the data to one row per group. Accordingly, you should only use count() if you want to summarise your data without having to use group_by().
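For comparison, here is a quick sketch of what count() returns for the same variable (one row per value of gear, instead of keeping all observations):

mtcars %>% 
  count(gear, name = "gear_n")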

15. performance::check_collinearity()

When you estimate a regression model, you often need to check whether certain assumptions hold. The performance package has a lot of relevant functions that make this easy, such as check_collinearity(). This function lets you easily examine the potential multicollinearity in your model.
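A minimal sketch of how this can look in practice (the model below is just an illustration using mtcars, not a model from any of my analyses):

library(performance)

# Fit a regression model with predictors that are likely to be correlated
model <- lm(mpg ~ wt + disp + hp, data = mtcars)

# Returns variance inflation factors (VIFs) and flags problematic predictors
check_collinearity(model)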

You can read more about the function and see examples here.

16. dplyr::across()

If you need to apply a function (or functions) across multiple columns, across() is a great function to use. In one of my scripts, I had to create confidence intervals for poll estimates, and I used the function to create new variables with the maximum and minimum estimates.

polls %>% 
  mutate(across(starts_with("party"), 
                ~ .x + 1.96 * sqrt((.x * (100 - .x)) / n), 
                .names = "ci_max_{.col}"),
         across(starts_with("party"), 
                ~ .x - 1.96 * sqrt((.x * (100 - .x)) / n), 
                .names = "ci_min_{.col}")
  )

As you can see, the function takes all variables that start with “party”, calculates the lower and upper estimates and saves the information in new variables.

17. RVerbalExpressions::rx()

Writing regular expressions can be difficult and involve a lot of frustration. The rx() function and its helpers let you write readable code that returns the regular expression you want. You can see several good examples on how to use the function here.
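To give a rough idea, here is a sketch of building a regular expression for simple URLs with the chainable rx_*() helpers (the exact helper names below are from the package as I recall them, so do check the documentation):

library(RVerbalExpressions)
library(magrittr)  # for the %>% pipe

url_rx <- rx() %>% 
  rx_start_of_line() %>% 
  rx_find("http") %>% 
  rx_maybe("s") %>% 
  rx_find("://") %>% 
  rx_maybe("www.") %>% 
  rx_anything_but(" ") %>% 
  rx_end_of_line()

grepl(url_rx, "https://www.example.com")  # TRUE if the pattern matches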

18. lubridate::make_date()

make_date() is a great function that easily creates a date variable when you have the information on year, month and day in three separate variables. For example:

df %>% 
  mutate(date = make_date(year, month, day))

19. dplyr::pull()

If you want to extract a single column from a data frame, you can use the pull() function. The example below pulls the gear variable from the data frame and then returns the summary of the variable.

mtcars %>% 
  pull(gear) %>% 
  summary()

Similarly, if you want to extract an element from a list, you can use the pluck() function.
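A small sketch of pluck() with a made-up list, extracting a single element from a nested list by name:

library(purrr)

x <- list(results = list(estimate = 0.42, n = 100))

pluck(x, "results", "estimate")  # returns 0.42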

20. scales::show_col()

This was a function I was not familiar with until I saw Andrew Heiss mentioning it on Twitter. It is an amazing function to explore different colour schemes. Do check it out.
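A quick sketch of how it can be used, either with a handful of specific colours or with a full palette:

library(scales)

# Display a few specific colours as a grid of swatches
show_col(c("#1b9e77", "#d95f02", "#7570b3", "#e7298a"))

# Or inspect an entire palette, e.g. nine colours from the viridis palette
show_col(viridis_pal()(9))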

Correlation and causation with Friends

There has been a lot of talk about the TV show Friends in 2021, especially related to the reunion of the cast. I have written about the show before (in Danish), and while it is not a show that I am going to watch again anytime soon, there are a few scenes that might be useful for teaching purposes.

Specifically, I used a few clips in the past when I was teaching students about how correlation does not imply causation. The first clip is about a conversation between Joey and Rachel on why their fridge broke down. The second clip is a conversation between most of the friends about what happens when Phoebe visits the dentist.

Of course, the examples are superficial and I have primarily used the two examples in my introductory teaching to let students discuss correlation and causality in the simplest of terms. My experience is that both examples work well.

Last, when teaching statistics, you might also consider using data related to Friends. Emil Hvitfeldt has released an R package with the entire transcript from the TV show Friends. The package is on CRAN. There is some good material on how to analyse the data in the TidyTuesday videos from David Robinson and TidyX. Notably, this is not the only R package related to Friends. You also have the centralperk package that enables you to get random quotes from Friends.

If you would like to analyse data from other TV shows, you can also check out the entire transcript from The Office. For examples on how to analyse this data, I can highly recommend the blog posts by Eric Ekholm (part 1, part 2, part 3).

The reliability of flight emission calculators

You cannot go on a flight without increasing your personal carbon footprint. But how much are you – on average – increasing your carbon footprint when you fly from, say, London to New York? And how much do you need to pay in order to offset your flight? Luckily, there are popular state-of-the-art carbon calculators that let you enter your departure and arrival destination in order to give you an estimate on the amount of CO2 (or CO2e) that your specific flight has caused – and how much you should pay to offset your flight.

There has been some research on online carbon footprint calculators. Mulrow et al. (2019), for example, provide a review of 31 online carbon footprint calculators that help people calculate their carbon footprint in relation to home energy (e.g., electric, gas, heating oil, wood, and charcoal), transportation (e.g., car, motorcycle, bus, train, tram, subway, ferry, and taxi), air transportation (i.e. flights), food, water/waste water and other categories (such as recycling). Interestingly, for air transportation, they find significant variation in how detailed these calculators are. Some just let users pick whether they have been on a short or a long flight, whereas the best calculators let people pick the origin and destination airports.

It is not as simple as one might think to calculate the carbon emissions associated with a flight. Birnik (2013) describes two relevant quality principles for online carbon calculators in relation to transportation emissions, namely that they should allow users to 1) model their transportation-related emissions in detail and 2) include radiative forcing of flights when modeling flight emissions. Accordingly, there are a lot of technical aspects to consider when calculating the carbon emissions associated with a flight. Do we simply rely on the shortest distance between departure and arrival airports? Do we include a detour factor? What about the aircraft type? And what about the fact that fuel consumption differs between short-haul and long-haul flights? And as per-passenger emissions are often calculated by dividing the aircraft's total emissions by the average number of seats multiplied by the load factor, should you pay more if fewer people are on the flight? And what about the emission of other greenhouse gases alongside carbon?
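To make the per-passenger logic concrete, here is a back-of-the-envelope sketch with purely illustrative numbers (none of these figures are taken from the calculators discussed below):

fuel_burn_kg    <- 60000  # hypothetical total fuel burned on the flight, in kg
co2_per_kg_fuel <- 3.16   # approximate kg of CO2 emitted per kg of jet fuel
seats           <- 300    # hypothetical average number of seats on the aircraft
load_factor     <- 0.8    # hypothetical share of seats that are occupied
rf_multiplier   <- 1.9    # hypothetical radiative forcing multiplier

total_co2_kg <- fuel_burn_kg * co2_per_kg_fuel

# Divide the total emissions by the occupied seats and (optionally) apply
# the radiative forcing multiplier to get a per-passenger estimate
per_passenger_co2_kg <- total_co2_kg / (seats * load_factor) * rf_multiplier
per_passenger_co2_kg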

Unsurprisingly, different calculators end up with different estimates for the same flight. This article, for example, states that: “A long-haul return flight to New York from London can produce anywhere between 0.9 and 2.8 tonnes of CO2.” And this great post argues that you cannot always trust flight emission calculations. This made me wonder not only how much flight emission calculators differ in their calculations, but also what the implications are for how much people need to pay in order to offset their flights.

I decided to collect data on how much it would cost to offset your flight from London to New York from different carbon credit sellers. Specifically, I looked at data from atmosfair, C-Level, Carbon Footprint, Carbon Habitat, Clear, ClimateCare, Co2nsensus, CredibleCarbon, Ecosphere+, GoodPlanet, Greentripper, Leapfrog, myclimate, Offsetters, South Pole, TerraPass, World Land Trust, Wren, and Zeromission. These carbon emission calculators all allow you to pick your departure and arrival airport and then give you an exact price for how much it will cost to offset your flight.

There is a lot of variation in the level of detail in the different calculators. Some of the calculators include radiative forcing (e.g., Carbon Footprint), and some calculators let you provide detailed information on your flight, e.g., atmosfair (flight type and aircraft type). The various assumptions these calculators rely on will lead to different emissions estimates. However, we do not know how much variation we will see in the offsetting prices, i.e., how much you will need to pay in order to offset your flight.

For my small-scale analysis, I picked London Heathrow Airport (LHR) as the departure airport and John F. Kennedy International Airport (JFK) as the arrival airport for a direct flight (no stopover, no return). I also did the same but with a non-direct flight (with a stop in Amsterdam Airport Schiphol, AMS). Last, I also did a short-haul flight from LHR to AMS. For all trips, I selected one person. I got the carbon emissions and prices for all flight classes that were available (Economy, Business, etc.). If different offset projects with different tCO2 prices were available, I picked the cheapest.

This gave me a total of 155 estimates. The flight emission calculators report not only different emission estimates but also very different prices for offsetting the flight. As is clear from the figure below, we see very different carbon emission estimates and prices (in GBP).

In the figure showing the flights from London to Amsterdam, we see relatively identical prices despite some variation in the emission estimates. However, for a long-haul flight from London to New York, we see offset prices – for a first-class flight – in the range of £10-70. This indicates that it is not an exact science to calculate how much you need to pay in order to offset your trip.

We can see that there is also some variation in the estimates of how much carbon is being emitted. Conditional upon the flight type, the data here shows that you can emit anywhere from below 1000 kg CO2e to above 3500 kg CO2e when you fly from London to New York. As a fun fact, between 2000 and 5000 kg CO2e is the amount of carbon it takes to produce one kg of dried cannabis flower in the United States (cf. Summers et al. 2021).

The variation in the prices can – at least partially – be explained by the different projects the carbon credit sellers buy credits from. While most of them buy credits from verified projects, there is still a lot of variation in the quality and costs. Sadly, there is a lack of transparency in the market and it is not easy to assess the quality of the projects that different carbon credit sellers offer. However, I would definitely not go with the cheapest prices in the figure above if I was about to offset a flight.

You should care about reducing your carbon footprint. Seriously. However, don’t see flight emission calculators as very precise (especially when looking at the price of offsetting) and, as always, do consider low-emission alternatives to flying when/if you travel.

My Top 50 – Danish rap #2

In 2010, I wrote a post with my 50 favourite tracks in Danish rap. There are without a doubt tracks on that list that would not make a similar list if I were to put one together now. Not to mention the tracks I can barely recognise or remember today.

Instead of revising the list, it suits me better to simply add more tracks to it in 2021 (as with my post with TV series recommendations from the previous year). In this post I therefore add 50 extra tracks to the list (so, technically, a top 100 now).

  1. Benal – Nu Her
  2. Benal – Tænker Lidt På dig (feat. Wads)
  3. Binær – Nakkesved
  4. Clemens – Uanset hvad I siger
  5. Emil Kruse – Hver Gang Du Er Nær
  6. Emil Kruse – Hvorhen
  7. Emil Kruse – Hørt Fra Dig
  8. Emil Kruse – Ka’ Du Se
  9. Hans Philip – Et Studie i Overtænkning
  10. Hans Philip – Drenge & Piger
  11. Hans Philip – Saa Blaa
  12. Hans Philip – Siger Ingenting
  13. Hvid Sjokolade – Disharmoni
  14. Joe True – Grill ting (feat. Stik Op og Twajs)
  15. Jokeren – Den Tørstige Digter
  16. Jokeren – Gå Væk! (feat. Blæs Bukki)
  17. Jøden – Udknast
  18. Kasper Spez – Det Måtte Jo Komme
  19. Kasper Spez – Så Fint
  20. L.O.C. – Anneliese Michel
  21. L.O.C. – Metanoia
  22. Loke Deph – Abus
  23. Loke Deph – Anne Linnet
  24. Loke Deph – Beluga (feat. Esben)
  25. Loke Deph – God Tid
  26. Loke Deph – Kom Ind
  27. Loke Deph – Malstrøm
  28. Loke Deph – Pantomine
  29. Loke Deph – Vor Frelser
  30. Malk de Koijn – En Gang
  31. Malk de Koijn – Rocstar
  32. Malk de Koijn – Weekendkriger
  33. Marvelous Mosell – Elementernes Rasen (feat. Tue Track)
  34. Marvelous Mosell – Visdom
  35. Mund de Carlo – 87’er
  36. Mund de Carlo – Igen
  37. Mund de Carlo – Stik den
  38. Pede B – Fokuseret
  39. Pede B – Undskyld Vi Er Her
  40. Per Vers – Uden Navn
  41. Per Vers – Uheldige Svin
  42. Pind – Plastic
  43. Strøm – I Familiens Skød
  44. Sund Fornuft – Yeah!
  45. Trepac – Jeres Søn
  46. UFO Yepha – Stille Og Roligt Knald På
  47. UFO Yepha – Vi 2 Amigo
  48. Ukendt Kunstner – Feelings
  49. Ukendt Kunstner – Boulevarden
  50. Østkyst Hustlers – B-Mand

I expect to add another 50 tracks to the list in about ten years.

25 interesting facts #11

251. Democratization is associated with more deforestation (Sanford 2021)

252. Since the 1950s, there has been a moral aversion to using water as a weapon in armed conflicts (Grech-Madin 2021)

253. People with disagreeable personalities do not have an advantage in pursuing power at work (Anderson et al. 2020)

254. Overconfidence in news judgments is associated with false news susceptibility (Lyons et al. 2021)

255. The correlation between social media use and negative outcomes (such as loneliness) is small (Appel et al. 2020)

256. Brexit increased consumer prices by 2.9 percent, costing the average household £870 per year (Breinlich et al. 2021)

257. Users contribute to Stack Overflow as a way to improve future employment prospects (Xu et al. 2020)

258. People with low self-esteem and a weaker sense of control over their fates are more likely to blame the political system for the challenges they face in their lives (Baird and Wolak 2021)

259. Between the fifth and ninth centuries CE, Korea and Japan peacefully developed state institutions through emulation and learning from China (Huang and Kang 2021)

260. The proportion of employees describing their jobs as useless is low and declining (Soffia et al. 2021)

261. People are more likely to attribute moral standing to beautiful animals (Klebl et al. 2021)

262. Wolves make roadways safer because they reduce deer–vehicle collisions (Raynor et al. 2021)

263. Marijuana use is not a reliable gateway cause of illicit drug use (Jorgensen and Wells 2021)

264. Push polls increase false memories for fake news stories (Murphy et al. 2021)

265. World Bank project aid targets richer parts of countries (Briggs 2021)

266. Most people, if given the opportunity, would not want to know about upcoming negative events (Gigerenzer and Garcia-Retamero 2017)

267. There is substantial evidence on the negative impact of climate change on the planet (Hausfather et al. 2020, Im et al. 2017, Kulp and Strauss 2019, Mora et al. 2017)

268. The George Floyd protests decreased favorability toward the police (Reny and Newman 2021)

269. In Venezuela, the economic consequences of the Chavez administration were bleak (Grier and Maynard 2016)

270. Since the 1980s, the global twinning rate has increased by a third (Monden et al. 2021)

271. Some people are attracted sexually to intelligence (Gignac et al. 2018)

272. Luxury consumption can be a profitable social strategy (Nelissen and Meijers 2011)

273. Our daily mobility is characterized by a deep-rooted regularity (Song et al. 2010)

274. Men and women candidates are similarly persistent after losing elections (Bernhard and de Benedictis-Kessner 2021)

275. Public attitudes toward immigration in Europe become more negative closer to elections (Dekeyser and Freedman 2021)


Previous posts: #10 #9 #8 #7 #6 #5 #4 #3 #2 #1

Potpourri: Statistics #79

Bayes Rules! An Introduction to Bayesian Modeling with R
A friendly introduction to machine learning compilers and optimizers
A History of Polar Area / Coxcomb / Rose charts & how to make them in R’s ggplot2
A Dataset of Cryptic Crossword Clues
Survival Analysis: Part I: Basic concepts and first analyses, Part II: Multivariate data analysis – an introduction to concepts and methods, Part III: Multivariate data analysis – choosing a model and assessing its adequacy and fit, Part IV: Further concepts and methods in survival analysis
Dataviz Accessibility Resources
RegExplain
A Succinct Intro to R
Deep Learning’s Diminishing Returns
Working with Google Sheets from R
The Rise of the Pandemic Dashboard
Predicting FT Trending Topics
The Art of Linear Algebra: Graphic Notes on “Linear Algebra for Everyone”
Modeling Possibly Nonlinear Confounders
ggHoriPlot: build horizon plots in ggplot2
Finding the Eras of MTV’s The Challenge Through Clustering
Why data scientists shouldn’t need to know Kubernetes
Creating a Dataset from an Image in R Markdown using reticulate
Neural Networks from scratch
plotDK: Plot Summary Statistics as Choropleth Maps of Danish Administrative Areas
The Power of Parameterized Reports With Plumber
Riding tables with {gt} and {gtExtras}
How to explain gradient boosting
How to visualize decision trees
Speech and Language Processing
Sexy up your logistic regression model with logit dotplots
AI’s Islamophobia problem
Possession Is The Puzzle Of Soccer Analytics. These Models Are Trying To Solve It.


Previous posts: #1 #2 #3 #4 #5 #6 #7 #8 #9 #10 #11 #12 #13 #14 #15 #16 #17 #18 #19 #20 #21 #22 #23 #24 #25 #26 #27 #28 #29 #30 #31 #32 #33 #34 #35 #36 #37 #38 #39 #40 #41 #42 #43 #44 #45 #46 #47 #48 #49 #50 #51 #52 #53 #54 #55 #56 #57 #58 #59 #60 #61 #62 #63 #64 #65 #66 #67 #68 #69 #70 #71 #72 #73 #74 #75 #76 #77 #78

Assorted links #7

181. Lectures on unemployment
182. Reversals in psychology
183. Objective or Biased
184. Place-Based Carbon Calculator
185. User:Emijrp/All Human Knowledge
186. The 500 Greatest Songs of All Time
187. “I’ll Finish It This Week” And Other Lies
188. How to Study Mathematics
189. Some observations about life in Denmark vs. life in the US
190. ChessCoach: A neural network-based chess engine capable of natural language commentary
191. Music for Programming
192. Through Scandinavia, Darkly: A Criminological Critique of Nordic Noir
193. The Regular Expression Edition
194. How To Get Better at Painting – Without Painting Anything
195. The housing theory of everything
196. On the impracticality of a cheeseburger
197. The tangled history of mRNA vaccines
198. The invisible addiction: is it time to give up caffeine?
199. The worst volume control UI in the world
200. The biggest pandemic risk? Viral misinformation
201. The Is this prime? game
202. Open Source Alternatives
203. The most counterintuitive facts in all of mathematics, computer science, and physics
204. Halt and Catch Fire Syllabus
205. Typing Practice
206. Into the Fairy Castle: The Persistence of Victorian Liberalism
207. Chart Appreciation: Iraq’s Bloody Toll by Simon Scarr
208. A Fable of the OC
209. The PayPal Mafia
210. In Praise of Small Menus


Previous posts: #1 #2 #3 #4 #5 #6

How to study

I was reading this article on how to study. The article provides great advice, such as spacing out your study sessions and relying on retrieval practice. That made me reflect upon my own approach to studying and how it has changed over time. When I started studying (many years ago now!), I read every single word in every text to make sure I did not miss out on anything important. I read the text from A to Z, from the first to the last page (not including the list of references). (Ironically, today, I primarily consult the list of references when I read academic texts before actually reading anything beyond the title and abstract.) However, this took a very long time and was definitely not a sustainable strategy.

Luckily, I found out that reading everything was not only a waste of time but also not the best way to engage with the material. In the years when I was teaching, I gave a lot of students advice on how to study, and I told them again and again that what is important to learn is how to study rather than the specific content in the curriculum. Teach a man to fish and what have you.

Today, what I find interesting is the need to move from our traditional understanding of literacy to that of digital literacy. In brief, studying today is radically different from studying, say, 20 years ago. The table in the article From Written to Digital: The New Literacy covers the key differences between the two well.

This is not to say that reading is not important. It is. But when you study, you should focus on reading beyond the text at hand. Think about how it is connected to other studies, papers, books, ideas, etc. The important thing is not what you get out of a text – but where you store what you get out of it.

Time is a limited resource and you need to optimise your reading. I truly believe in slow reading, and the more time I spend with a text, the more time I spend not only processing each paragraph but also thinking about connections and implications. Books are like meals. You do not remember every aspect of any meal you eat, but they have an impact on your thinking, and you need to be very cautious with what you put into your body/brain. However, when studying, you cannot spend too much time with the same text as the marginal return will quickly decline.

So, how should you study? There are (at least) five different study strategies, namely (re)reading, highlighting, note-taking, outlining and flash cards (Miyatsu et al. 2018). I don’t think one strategy is intrinsically better, so I think it is more a question of finding the strategy that suits you best. Putnam et al. (2016) provide a set of specific strategies for how to optimise learning that are worth considering (from Table 1 in the paper):

  • Space out your learning.
    • Study for a little bit every day, rather than cramming in one long session.
    • Start studying early, and touch on each topic during each study session.
    • Reading before class and reviewing lecture notes after class will help consolidate what was covered in class.
  • Learn more by testing yourself.
    • Instead of writing a chapter summary as you read, write down what you remember after you read, recalling the details from memory. Then, check to see how well you did (the read-recite-review method).
    • Answer the “end-of-chapter” questions both before and after you read a chapter.
    • Use flash cards to learn key vocabulary. Retrieve the idea from memory (before looking at the answer) and use a larger (rather than a smaller) stack of cards. Put answers you missed back in the deck at an early place and the ones you got right at the end. Finally, aim to recall each item correctly multiple times before taking a card out of the deck.
    • Be skeptical about what you think you know—testing yourself can provide a better picture about which concepts you know well and which you might need to study further.
  • Get the most out of your class sessions.
    • Attend every class session.
    • Stay focused during class by leaving your laptop at home; you’ll avoid distracting yourself and your classmates, and you may remember more by taking notes by hand.
    • Ask your professor for a copy of any PowerPoint slides before class, so that you can take notes directly on the slide handout.
  • Be an active reader.
    • Instead of speeding through your reading, slow down and aim for understanding.
    • Ask yourself questions as you read, such as, “What did I learn on this page?” and “What on this page is new to me?”
    • Finally, write some of your own questions about tricky concepts: “What is an example of X in real life?” or “How is Theory X different from Theory Z?”
  • Other general tips.
    • Get organized early in the semester: Put major due dates and exams on your calendar, set reminders to start studying early, and be sure to look at your calendar at least once a week so you can plan ahead.
    • Get some exercise. Going for a 50-min walk in nature can enhance your ability to focus on difficult tasks.
    • Sleep! Sleeping is critical for ensuring that memories are successfully stored in long-term memory.

There are different ways to study, and my own challenge over the years has primarily been one of finding the motivation. Interestingly, the motivation to study for me personally is often stronger when I have studied. For example, when I have accomplished something, my motivation to keep going is stronger. When I completed a module, my motivation to read through the papers and books again was stronger than prior to taking the module. I guess what I am trying to say is that finding a good way to study is not easy, and whatever works for you … works for you.

Updating the replication material for “Welfare Retrenchments and Government Support”

In 2017, I pushed the replication material for my article, ‘Welfare Retrenchments and Government Support’, to a GitHub repository. I had been working on the article for years and the code was not necessarily up to date. It worked perfectly, gave the exact estimates and was relatively easy to read. Accordingly, everything was good, life was simple and I felt confident that I would never have to look at the code again.

This turned out not to be the case. I recently got an email from a student who was unable to get the exact estimates reported in Table 1 in the paper, even when following my script and using the data I made publicly available. I went through the code and noticed that I could not reproduce the exact estimates with my current R setup. Sure, the results were substantively identical, but they were not exactly the same – and the N was also different.

I looked into the issue and could see that the default sampling algorithm used by sample() – and controlled via set.seed() – changed in R 3.6.0. As I ran the original analyses in R 3.3.1, and I am now using R 4.1.0, this could explain why the matching procedure I rely on is not returning the exact matches. For that reason, I decided to make some updates to the replication material so that there is now a dataset with the matched data. The script is doing the same as before, but it is now relying on the matched data obtained with the setup in R 3.3.1. This should make it a lot easier to get the exact same estimates as provided throughout the paper.
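For reference, R 3.6.0 and later can mimic the old sampling behaviour by explicitly requesting the pre-3.6.0 sampler via the sample.kind argument (the seed below is just a placeholder, not the one used in the paper):

# The default sampler changed in R 3.6.0, so the same seed can give
# different draws across R versions
set.seed(42)
sample(1:10, 3)

# Reproduce the pre-3.6.0 behaviour (R will warn that the old
# "Rounding" sampler is being used)
set.seed(42, sample.kind = "Rounding")
sample(1:10, 3)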

To increase the chances of long-term reproducibility, I should consider using packrat or a Docker container (I primarily use Docker for my Shiny dashboards). However, as the analyses are mostly a few OLS regressions, I believe this would be overkill and would not necessarily make it easier for most people to download the data and script and play around with the results. And I don’t mind making extra updates in the future if needed in order to reproduce the results with different setups.

Interestingly, I did all of these analyses before I doubled down on tidyverse and for that reason I decided to make a series of additional updates to the material, including:

  • More spaces to make the code easier to read. For example, instead of x=week, y=su it is now x = week, y = su.
  • The use of underscores (snake case) instead of dots. For example, the object ess.matched is now ess_matched.
  • A significant reduction in the use of dollar signs (primarily by the use of mutate()).
  • The use of pivot_longer() instead of gather().
  • No double mention of the variable edulevel in the variable selection.
  • Removing the deprecated type.dots argument from rdplot().
  • The use of seq(0.01, 0.25, 0.01) instead of having 0.01, 0.02, 0.03, 0.04, etc. all the way to 0.25!
  • The use of map_df() instead of a for loop.

And a series of other minor changes that make the code easier to read and use in 2021. I have made the updated material available in the GitHub repository. There is a revised R script for the analysis, a dataset with the matched observations and a file with the session info on the setup I used to reproduce the results.
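To illustrate the kind of change behind the last two bullet points above, here is a rough sketch (with made-up objects, not the actual replication code) of replacing a for loop with map_df():

library(dplyr)
library(purrr)

# Hypothetical example: run the same computation for a range of bandwidths
bandwidths <- seq(0.01, 0.25, 0.01)

# Instead of a for loop that rbind()s the results together, map_df() runs
# the function once per bandwidth and row-binds the resulting data frames
results <- map_df(bandwidths, function(bw) {
  tibble(bandwidth = bw,
         estimate  = mean(rnorm(100, sd = bw)))  # placeholder computation
})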

I have started using the new native pipe operator in R (|>) instead of the tidyverse pipe (%>%), but I decided not to change this in the current version to make sure that the script is also working well using the version of R I used to conduct the analysis years ago. In other words, the 2021 script should work using both R 3.3.1 and R 4.1.0.

I also thought about using the essurvey package to get the data from the European Social Survey (we have an example on how to do that in the Quantitative Politics with R book), but I find it safer to only work with local copies of the data and not rely on this package being available in the future.

In a parallel universe, a more productive version of myself would spend time and energy on more fruitful endeavours than updating the material for an article published years ago. However, I can highly recommend going through old material to see whether it still works. Some of the issues you might encounter will help you a lot in ensuring that the replication material you create for future projects is also more likely to stand the test of time.

33

Thirty-three. Another year, same me ±95% CIs. One third of a hundred, give or take.

I read the post I wrote last year when I turned 32. It all seemed so recent. I could, in principle, repost my thoughts from last year and call it ’33’. There is not much of significance, if anything, to report on in my life now that I am 33. However, especially in the context of COVID-19, I guess that is a point in and of itself. It all feels like ‘no news’. It is the experience of waking up one morning and suddenly being a year older. One year of the one life I have to live. Definitely not a year wasted, but not a year “lived” either.

Of course, I have lived another year. In the grand scheme of things, I seriously cannot complain even one bit. I am very much aware of my privileges. I have had a lot of great experiences. I have not had any major setbacks. I have never worked this much. I have never relaxed this much. I have never read this much. I have never written this much. Et cetera. Maybe that’s the reason I felt like this year just … happened?

When I was younger I used to assume that the lifespan of a human being was 100 years, or that my lifespan would be ~100 years. I knew that most people would not live to be 100, but it made sense to use 100 as a heuristic. That is the only way 33 stands out. 1/3 of 100. Based on the ONS ‘Life expectancy calculator‘, the average life expectancy for a 33-year-old male is 85 years. In other words, 52 years left (meaning that I – all else equal – have lived more than one third of my life now). My chance/risk of reaching 100 is 6.9%. Not 7.0 or 6.8%, but 6.9%.

This is ceteris paribus. With that in mind, I like the 6.9%. However, at this point, I don’t think I can do a lot more to increase my life expectancy at the margins. I live (relatively) healthy and there are no additional low-hanging fruits. What I can do is to have a subjectively longer life. Time seems more subjective as I get older, and if I had to live the rest of my life in pandemic mode, it would feel relatively shorter. New experiences – such as travelling – will make my life subjectively longer. In other words, to make my life as long as possible, I need to plan it in ways that feel longer – not by visiting the gym more often and eating less meat.

That being said, I am not sure I can do a lot. I can’t escape the fact that life, for the most part, is the day-to-day experience, and some days I feel like I am second-screening ‘the real life’. Getting shit done and calling it a day. Or as Stig Johansson formulated it, “All those days that came and went, little did I know that they were life.”

I am taking it for granted that people experience a lot of significant changes in their lifetime, such as technological and societal developments. It was only when I read the following passage from Matt Ridley’s book, How Innovation Works, that I got to think about how this is the exception rather than the norm: “Before the last two centuries, innovation was rare. A person could live his or her whole life without once experiencing a new technology: carts, ploughs, axes, candles, creeds and corn looked the same when you died as when you were born.”

I have lost count of the new technologies that have seen the light of day since I was a kid. Even at the age of 33 I have experienced a lot more new technologies than my ancestors experienced in a lifetime. This and the fact that the average life expectancy was, historically speaking, much lower in the past, made me conclude (yet again) that I should not complain. I have on all accounts already had a longer life, both in objective and subjective terms, than what most people could expect to experience in the past. I guess that puts history in perspective.

I remember watching Good Bye Lenin! in the cinema in 2003. That is 18 years ago now. The movie came out less than 14 years after the fall of the Berlin Wall. That’s weird to me. When I watched it, I was closer in time to the historic event depicted in the movie than I am now to watching the movie for the first time. At some point in the future, assuming I get to live until I am 75 (the odds are good!), the events depicted in Der Untergang will be closer to when I first watched the movie than the time that has passed since then. I guess that puts history in perspective, too. I was also reading this article in The Atlantic that makes it clear that 2050 is closer to today than 1990, and significant changes to the climate will happen within our lifetime: “A child born today won’t enter the professional workforce until 2043; under the current timeline, decarbonization will be just about licked by the time they turn 30. Their job will be to live with climate change: They will see Antarctica’s crucial 2050s in the prime of their career.” Damn.

Well, for now we can focus on the present. The pandemic as we know it is (hopefully) over. It has been a weird year and a half, and I definitely lost faith in multiple things during the pandemic. Shaking hands, airports, the United States (or, whatever faith that was left), nine to five, time more generally, John Ioannidis, menu cards, etc. However, there is at least one thing I have gained faith in during the pandemic: QR codes.

What’s up for the next year? I don’t know. More of the same, I guess. I don’t have big dreams and mountains to climb. I definitely don’t seek or need fame – or convince the world of my individual brilliance. More importantly, I really enjoy working within a great team. In general, at least in my case, I believe reliable consistency beats occasional brilliance. However, my biggest fear is still to be complacent, especially because that is what I gravitate towards.

All this also confirmed that it was a good move to leave academia. Actually, this has been the first full year without an academic job since I got my PhD. I found it surprisingly easy to give up a permanent academic position and I have not even once considered looking into ways of getting back. Neither of my parents is an academic, and I never had the feeling that being an academic was part of my identity. More importantly, I do not look at my academic friends and see lives that inspire me. That’s totally fine. I am sure it is great for a lot of people, but it did not do it for me. I also found John Williams’ Stoner depressing when I read it years ago and I doubt that would change upon a second reading.

More than anything, I enjoy that I am 33 and not 23. I don’t miss being younger. As I get older, I am quite confident that I will develop certain peculiar quirks (as most people do). Hopefully, I can do that with a certain level of open-mindedness and self-awareness. The good thing about getting older is that I am much more selective in terms of what I spend time on and what I care about. I do not let trivialities live rent free in my head to the same extent as I did ten years ago. I think I think a lot more about what I think about. What do I remember? What do I forget? I obviously don’t see hyperthymesia as a good thing, or even an option in my case, and I don’t want my memories to be a random sample of what I experience – but a carefully curated selection (to the best possible extent).

The bad thing about getting older is that it will be a slow process characterised by physical and cognitive decay, at least at some point and to some extent. The body I am in now will be the body I have to stay in when I am 43. Accordingly, I have to prepare for my 40s in my 30s while still enjoying my 30s. And I have to do that in a way that I did not have to do when I was in my 20s. I have to hit the gym and think about exercise in a different way. I am not there yet, but I could sympathise with, and maybe even relate to, the following lines from one of my favourite albums from 2020, Open Mike Eagle’s ‘Anime, Trauma and Divorce‘: “Started doing more pushups, back pain when I look up/ taking down what I put up, knee hurt when I stood up”. I had a few osteopathic appointments this year – not because I needed it, but because I want to do what I can to make sure that I will not need it in the future.

To reiterate what I said in my previous post: I write this for nobody but myself. I write this primarily to have something for myself to go back and read in the future. In some ways this is pretty similar to the FutureMe service where you can send a letter to your future self. I read old blog posts and I don’t recognise the person writing them, and I fear that if I don’t write a post like this, I will not be able to recall what was on my mind at a certain point in time in the future.

I know I shouldn’t care but the numbers tell me – for reasons beyond my comprehension – that people read my posts. I do sometimes think about the impression people (might) be left with if they only read my blog. The irony is that I read a lot of great books and articles, talk to a lot of interesting people, and have an overall positive outlook on things. This is ironic as I am more inclined to write about something if I have some critical, and often negative, remarks about a piece of research or the media coverage of opinion polls. You can call it a negativity bias. I have tried to reduce this bias by design over recent years to make the blog more aligned with – and representative of – what I care about. For example, by sharing more of the interesting stuff I find in blog posts with links to interesting studies, potpourris, assorted links, etc. And by framing my blog posts in an explicitly more constructive way, e.g., to say “How to improve your figures” instead of “What I do not like about this figure”. However, I am sure there are still many ways in which I can become better at this in the future (and maybe it will come naturally as I get older!).

Well, again, not much to report on. A year where I existed but did not necessarily live. This can sound pessimistic but it is not. Actually, it is quite optimistic. It is only upon reflection now that I reach this conclusion, and it is from a point of optimism, i.e. that the next year will be even better.

Last, this is the second year in a row I write a post like this. I don’t know whether I will write another post next year. Maybe, maybe not. Let’s see. Maybe I will wait a few years before I pick up on it again. In the best of all worlds I can break the average life expectancy and write a blog post titled ’86’ in 2074. In the grand scheme of things, it will happen before I know it.