Seven ways to find data

Data might be the new oil (there are arguments for and against this). While there is definitely a lot of data out there for you to drill, it can be difficult to find the exact data you need.

In this post I will outline seven different strategies to 1) keep yourself updated on new data sources and 2) find older datasets. I do not necessarily recommend using all of them (there is significant overlap between what you will find using the different strategies), and I have ranked the strategies according to my own personal preferences.

1. Newsletters
One of the best ways to keep yourself updated on new datasets is by getting the updates directly to your mailbox. Here, I can highly recommend the weekly newsletter Data Is Plural by Jeremy Singer-Vine.

While there are other newsletters out there, my impression is that if you subscribe to Data Is Plural, you should be covered. In addition, you can take a look at the structured archive of datasets covered in the newsletter (841 datasets at the time of writing). If you do not already subscribe to the newsletter, do yourself a favour and sign up.

2. GitHub repositories
Another good way to find data is to explore GitHub repositories. A lot of repositories host data (e.g. media outlets like FiveThirtyEight), and by exploring popular repositories, you will often find interesting data.

There are also repositories that list datasets you might find interesting. Awesome Public Datasets, for example, is a list of open datasets from a wide range of fields (GIS, neuroscience, sports, climate etc.). I curate the PolData repository where you can find a list of political datasets (elections, international relations, parties, policies etc.).
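When you find a data file in a GitHub repository, the address in your browser points to the file's web page and not to the file itself. Below is a minimal sketch, in Python, of how to turn such a page address into the corresponding raw file address. The function name github_raw_url and the example owner, repository and file names are my own placeholders and do not refer to any of the repositories mentioned above.

```python
from urllib.parse import urlsplit, urlunsplit


def github_raw_url(page_url: str) -> str:
    """Convert a github.com file-page URL into its raw.githubusercontent.com form."""
    parts = urlsplit(page_url)
    if parts.netloc != "github.com":
        raise ValueError("expected a github.com URL")

    segments = parts.path.strip("/").split("/")
    # Expected shape: <owner>/<repo>/blob/<branch>/<path to file>
    if len(segments) < 5 or segments[2] != "blob":
        raise ValueError("expected /<owner>/<repo>/blob/<branch>/<file>")

    # Drop the "blob" segment and keep everything else in order.
    owner, repo, _, *rest = segments
    raw_path = "/" + "/".join([owner, repo, *rest])
    return urlunsplit(("https", "raw.githubusercontent.com", raw_path, "", ""))


if __name__ == "__main__":
    # Hypothetical repository and file, used only to show the URL shape.
    print(github_raw_url(
        "https://github.com/example-owner/example-repo/blob/main/data/file.csv"
    ))
```

The resulting address can usually be passed directly to whatever you use to read CSV files, so you do not have to download the file by hand first.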

3. Twitter
Twitter is, as always, a good way to keep yourself in the loop. While there are specific users on Twitter that tweet about new and old datasets (such as GetTheData and Pew Research Methods), the most useful strategy here is to follow researchers.

Researchers care about sharing useful resources such as datasets. To illustrate, I found this amazing resource on free and open psychological datasets on Twitter.

4. Harvard Dataverse
The Harvard Dataverse is another great place to find datasets. The search function works well and there is publicly available data on a wide range of topics (especially for political scientists).

Notably, I also use this service to get a sense of forthcoming articles (as the data is usually stored online before the articles hit your RSS and/or Twitter feed). For example, journals such as the American Journal of Political Science and the Journal of Politics have their own dataverses where they archive datasets well in advance of the actual publication.

Psychologists might prefer OSF instead of the Harvard Dataverse. However, I find OSF cumbersome to use and a mess when you want to explore potential datasets.

5. Facebook groups
Facebook is usually not my cup of tea (let us be honest: it is shit). That being said, there are some good groups for academics to explore. One of these is Political Science Data where people are good at sharing links to new resources. Furthermore, this is also a good place to ask for data suggestions. My impression is that there are similar Facebook groups available for other scientific domains as well.

6. Reddit
If you already use Reddit, the subreddit r/datasets is worth looking into. The quality of the submissions is not always great, but you will often find some interesting datasets from various fields.

Another subreddit to check out is r/dataisbeautiful, where people share data visualizations (mostly original content). While sharing data is not the main objective of the subreddit, you will most likely find a lot of interesting data there.

7. Google Dataset Search
Last, we have Google Dataset Search. I like the idea of having a Google for datasets, and this is literally a Google for datasets. That being said, I have not used this service a lot, and whenever I use it to find data, I am not convinced that it is the best strategy to use. Accordingly, I recommend trying the six strategies introduced above before using this service.

Potpourri: Statistics #57

Keep It Together: Using the tidyverse for machine learning
Learn to purrr
Mastering Shiny
A Comprehensive List of Handy R Packages
The challenges of using machine learning to identify gender in images
How is polling done around the world?
How to Get Better at Embracing Unknowns
Drawing maps in R
Potential Outcome and Directed Acyclic Graph Approaches to Causality: Relevance for Empirical Practice in Economics
Visualizing Locke and Mill: a tidytext analysis
Tutorial: Cleaning UK Office for National Statistics data in R
Transitioning into the tidyverse: part 1, part 2
Your Friendly Guide to Colors in Data Visualisation
Optimising your R code – a guided example
Learning data visualization
Reference Collection to push back against “Common Statistical Myths”
mutate_all(), select_if(), summarise_at()… what’s the deal with scoped verbs?!
Tools for Exploring and Comparing Data Frames
Tom’s Cookbook for Better Viz
Themes to Improve Your ggplot Figures
Lesser Known R Features
What Statistics Can and Can’t Tell Us About Ourselves
A Graphical Introduction to tidyr’s pivot_*()
n() cool #dplyr things
Bayesian Linear Mixed Models: Random Intercepts, Slopes, and Missing Data
Prepping data for #rstats #tidyverse and a priori planning
NYT-style urban heat island maps

Word limits in political science journals

Different political science journals have different article formats with different word/page limits. Consequently, whenever you want to submit an article to a journal, the first thing to look up is the exact word limit.

In order to get a sense of the different article formats and word limits in political science journals, I have created an overview. The overview shows word limits for long articles, short articles and review essays/articles.

The overview currently consists of 65 journals and I will most likely add more journals (and more features) in the future. Do reach out on Twitter or drop me an email if you have any feedback or if there is a specific journal of relevance to political scientists that I should add to the overview.

Last, the overview is sorted by impact factor (obtained with the excellent scholar package in R).
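To make the structure of such an overview concrete, here is a minimal sketch in Python of how you could sort journals by impact factor and check which of them accept a manuscript of a given length. This is not how the actual overview was built (that relied on the scholar package in R). The journal names, impact factors and word limits below are placeholders that I made up for illustration, as are the names Journal, sorted_by_impact and fits.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Journal:
    name: str
    impact_factor: float
    long_limit: Optional[int]   # word limit for long articles, None if not offered
    short_limit: Optional[int]  # word limit for short articles, None if not offered


# Placeholder entries only: not real journals, impact factors or limits.
JOURNALS = [
    Journal("Journal A", 2.1, 10000, 4000),
    Journal("Journal B", 4.3, 12000, None),
    Journal("Journal C", 3.0, 8000, 3000),
]


def sorted_by_impact(journals: List[Journal]) -> List[Journal]:
    """Return the journals ordered from highest to lowest impact factor."""
    return sorted(journals, key=lambda j: j.impact_factor, reverse=True)


def fits(journals: List[Journal], word_count: int, fmt: str = "long") -> List[str]:
    """Names of journals whose limit for the given format allows word_count."""
    names = []
    for journal in sorted_by_impact(journals):
        limit = journal.long_limit if fmt == "long" else journal.short_limit
        # Skip journals that do not offer the format at all.
        if limit is not None and word_count <= limit:
            names.append(journal.name)
    return names


if __name__ == "__main__":
    print([j.name for j in sorted_by_impact(JOURNALS)])
    print(fits(JOURNALS, 9000))
    print(fits(JOURNALS, 3500, fmt="short"))
```

Keeping a missing format as None, and not as 0, makes it possible to tell "this journal has no short article format" apart from "this journal has a very strict limit".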