Here are ten R functions that have saved me a lot of time over the years.
forcats package has a lot of great functions. The one I use the most is the
fct_reorder() function. I have also seen David Robinson using it a lot in his YouTube videos (I recommend his videos in this post).
The function is good to change the order of values in a factor variable, e.g. if you want to make sure there is some structure to the values you present in a bar chart:
I have lost count of the number of times I have used the
countrycode package. If you are doing comparative research and not using the
countrycode() function, you are in for a treat.
In a lot of datasets you will not have the full country name (e.g. Denmark), but something like ISO 3166-1 alpha-2 codes (e.g. DK). The
countrycode() function can easily return country names based on ISO codes (or vice versa). Here is an example:
countrycode(c("DK", "SE"), origin = "iso2c", destination = "country.name")
This code will return Denmark and Sweden. As you can see, you simply provide the “origin” (i.e. the type of data you have) and the “destination” (i.e. the type of data you would like). I especially find this function useful when I need to merge datasets with different country variables and when I want to present full country names in a visualisation instead of ISO codes.
Last, if you are working on a country-level dataset, make sure that it is easy to match the countries with any of the variables available in the
I recently had to work with a dataset where each country had several priorities in relation to the Sustainable Development Goals (SDGs). However, there was only one SDG variable with information on the relevant SDGs for each country. The
separate_rows() function is great to turn such data into multiple rows.
df <- tibble( country = c(1, 2), SDG = c("SDG 5,SDG 17,SDG 3", "SDG 1,SDG 2,SDG 3") ) df %>% separate_rows(SDG, sep = ",", convert = TRUE)
sep argument is specifying what separator you would like to use to separate the information (in this case a comma). The code will return a tibble with two variables and six observations.
I often use the
crossing() function when I need to create a data frame from scratch. For example, if you need to create a country-year data frame for a few countries from 1965 to 2021, you can create a data frame where each country has a row for each year. Here is an example:
crossing(country = c("Denmark", "Sweden"), year = 1965:2021, value = NA_real_)
I had to scrape a PDF file but the text I got from the document was reversed, e.g. ‘Agriculture’ was ‘erutlucirgA’. There might be different ways to do this in an easy way, but the function
stri_reverse() in the
stringi package did the trick. Here is a simple example:
x <- "snoitcnuf R taerg neT" stringi::stri_reverse(x)
And what we get is: “Ten great R functions”.
reduce() function is a great to collapse repetitive piping. There is a good blog post on the function here. To illustrate, when I used to merge several data frames into one large data frame, I used multiple lines of
reduce(list(df_1, df_2, df_3, df_4), left_join, by = c("iso2c", "year"))
The code will left join all data frames on the
If you have multiple rows in a data frame, e.g. multiple countries, but want a unique row for each country, you can use the
distinct() function to get distinct rows. In the example below we have four rows but we turn them into a data frame with distinct rows on the variable x.
df <- tibble( x = c(1, 1, 2, 2), y = c(1, 1, 2, 4) ) df %>% dplyr::distinct(x, .keep_all = TRUE)
regex_left_join() from the
fuzzyjoin is great if you need to merge a data frame based upon a regular expression. I found this useful when I had to join data frames with different country names.
Here is a simple example where we join two data frames where it merges the rows for both “Denmark” and “denmark”.
df_1 <- data.frame( country = c("Denmark", "denmark"), year = 2020:2021 ) df_2 <- data_frame(regex_country = c("[Dd]enmark"), type = 1:2) df_1 %>% fuzzyjoin::regex_inner_join(df_2, by = c(country = "regex_country"))
I used to look up the
theme() function when I had to remove the title of a legend, or use
scale_x_continuous() if I had to change the title of the x-axis. Not anymore. The
labs() function is an easy way to change the labels in your figure. You can also use it to change the title and subtitle of your figure. Highly recommended.
When I check some of my old code, I often see lines like this:
df %>% filter(!is.na(var1))
However, there is a much easier way to do this, namely using the
df %>% drop_na(var1)
This is not only much easier to write than having to rely on two functions, but also a lot easier to read.