How to improve your figures #10: Do not use word clouds

In a previous post, I argued that people should not use pie charts. In this post I am going to make a similar case for word clouds. In short, I will argue that word clouds provide ‘foggy’ insights (pun intended).

Specifically, I will discuss a word cloud that got a lot of attention in the UK on what people think about Boris Johnson. The word cloud in question was made by JL Partners and shared on Twitter. Here it is:

Unsurprisingly, the word cloud went viral. Here is a tweet that got more than 30,000 likes. And here is another tweet that got more than 10,000 likes. It even made it onto broadcast television. People love word clouds.

I waited a long time for the underlying data to become available, as promised by the polling company. However, that never happened. For that reason, I can only point out the limitations and problems with the word cloud without looking at the raw data.

My main issue with the word cloud is that we cannot really say whether most people call Boris Johnson a liar. We know that the word cloud is based upon 2,000 responses and that 72% were negative, but how many of these respondents called Boris Johnson a liar? 5%? 15%? 20%? 50%? The word cloud cannot tell us. We need numbers!

Next, it is easy to get the impression that such word clouds simply appear out of the blue. Nothing could be further from the truth. In most cases, you need to preprocess the answers before you can create the word cloud. Notice how we see answers such as “hes” and “he’s” in the word cloud. That is not because a respondent only said “he’s”, but because respondents gave phrases such as “he’s a liar” and “he’s a leader” that were then split into separate words. This is not a problem per se, but it makes it paramount that the material is publicly available. Not only the data, but also the script used to create the word cloud.
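To show how much the preprocessing can matter, here is a minimal Python sketch. The three responses, the stopword list and the tokenize function are all made up for illustration; they are not the polling company's data or script, which were never published. The sketch only shows how the same answers give different word counts depending on whether contractions such as “he’s” are cleaned up first.

```python
import re
from collections import Counter

# Hypothetical responses, invented for illustration (not the real survey data).
responses = [
    "He's a liar",
    "he’s a leader",
    "Hes the PM",
]

# A tiny, assumed stopword list; a real script would use a much longer one.
STOPWORDS = {"a", "the", "he", "is"}


def tokenize(response, expand_contractions):
    """Lowercase a response, optionally expand "he's"/"hes", drop stopwords."""
    text = response.lower().replace("’", "'")  # normalise curly apostrophes
    if expand_contractions:
        # Treat "he's" and the typo "hes" as "he is".
        text = re.sub(r"\bhe'?s\b", "he is", text)
    tokens = re.findall(r"[a-z']+", text)
    return [token for token in tokens if token not in STOPWORDS]


# Word counts without and with the contraction clean-up.
naive = Counter(t for r in responses for t in tokenize(r, False))
cleaned = Counter(t for r in responses for t in tokenize(r, True))

print("Without clean-up:", dict(naive))
print("With clean-up:   ", dict(cleaned))
```

Without the clean-up step, “he's” and “hes” end up as words in their own right, which is presumably what happened in the word cloud above. That is why the script matters as much as the data.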

Notice also how a specific word, “liar”, is given the colour blue. This is to make it stand out. It is easy to look at the word cloud and believe that most people said “liar”, but how many people said “liar” compared to “leader” or “pm”? Are we looking at 80 people who said “liar” and 70 people who said “leader”? We cannot tell based on the word cloud. In general, it is very difficult to make relative comparisons between words in word clouds.
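One alternative that makes such comparisons easy is a plain frequency table, or a bar chart built from it. The Python sketch below assumes the responses have already been cleaned into a list of words per respondent. The data and the mention_table function are hypothetical and only serve to show the idea: count each word once per respondent and report both the count and the share of respondents.

```python
from collections import Counter

# Hypothetical, already-cleaned responses: one list of words per respondent.
responses = [
    ["liar"],
    ["liar", "clown"],
    ["leader"],
    ["leader"],
    ["pm"],
    ["liar"],
]


def mention_table(responses):
    """Return (word, respondents mentioning it, share of respondents) rows."""
    n = len(responses)
    # set() counts a word once per respondent, so shares are shares of people.
    counts = Counter(word for words in responses for word in set(words))
    return [(word, count, count / n) for word, count in counts.most_common()]


# Print a simple text bar chart, most mentioned word first.
for word, count, share in mention_table(responses):
    bar = "#" * count
    print(f"{word:<8} {bar:<5} {count:>3} ({share:.0%})")
```

With a table like this, the reader can see directly whether “liar” was mentioned by half of the respondents or only slightly more often than “leader”.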

For those reasons, I suggest that you always consider alternatives to word clouds. If you go with a word cloud, make sure that your material is publicly available and that it is easy to compare the results from the word cloud to other presentations of the data. Finally, if you still feel tempted to make a word cloud, check out this post on better ways to design word clouds.