Wordcloud

This page explains how to build a wordcloud using React and D3.js. It uses the d3-cloud plugin to compute the position of each word, and renders the words with React.

This section is rather short as I'm not a big fan of wordclouds. They can be quite misleading and you should consider building a barplot or a lollipop plot instead.

Useful links
This page uses the d3-cloud plugin, which you can install in your project with npm install d3-cloud.

The Data

The data is an array. Each item is an object describing a word: its text, together with a related value that will be used to size the word on the final figure.

Note that you can add any additional property here, like a color, a font weight or anything else that you want to use to draw the word later on.

const data = [
  { text: "hello", value: 12 },
  { text: "world", value: 2 },
];
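
If you start from raw text rather than from precomputed values, the array has to be built first. Here is a minimal sketch, assuming that the number of occurrences is a good enough value for each word. The countWords helper and its tokenizing rule (lowercase ASCII letters, digits and apostrophes) are only an illustration and are not part of d3-cloud.

```javascript
// Hypothetical helper: turn raw text into the [{ text, value }] shape used above.
function countWords(rawText) {
  const counts = new Map();
  // Lowercase the text, then keep runs of letters, digits and apostrophes as words.
  const tokens = rawText.toLowerCase().match(/[a-z0-9']+/g) ?? [];
  for (const token of tokens) {
    counts.set(token, (counts.get(token) ?? 0) + 1);
  }
  // Sort by decreasing count so that the most frequent words come first.
  return [...counts.entries()]
    .map(([text, value]) => ({ text, value }))
    .sort((a, b) => b.value - a.value);
}

const data = countWords("hello world, hello again");
```

A real dataset would probably also need to drop very common words such as "the" or "and" before counting.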

Most basic wordcloud with React and D3.js

Everything starts by instantiating a wordcloud layout using the d3Cloud() function of the d3-cloud library.

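import d3Cloud from "d3-cloud";

// Configure the layout; nothing is computed until layout.start() is called.
const layout = d3Cloud()
  .words(data) // array of { text, value } objects
  .size([width, height]) // dimensions of the drawing area, in pixels
  .fontSize((d) => fontSizeScale(d.value)) // font size derived from each word's value
  .padding(10) // spacing kept around each word, in pixels
  .on("end", (words) => {
    // Called once the layout has finished trying to place every word
    setWordsPosition(words);
  });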
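
The fontSizeScale function used above is not defined in the snippet. Here is one possible stand-in, assuming non-negative values and a square-root mapping between value and font size. The makeFontSizeScale name and the 10 to 60 pixel range are arbitrary choices for illustration. In a real project, scaleSqrt or scaleLinear from d3-scale would be the usual way to get the same result.

```javascript
// Hypothetical stand-in for fontSizeScale: maps a value to a font size in pixels.
// Assumes non-negative values and a square-root mapping (an illustrative choice).
function makeFontSizeScale(values, minSize = 10, maxSize = 60) {
  const min = Math.min(...values);
  const max = Math.max(...values);
  return (value) => {
    // When all values are equal, fall back to the middle of the range.
    if (max === min) return (minSize + maxSize) / 2;
    // Position of the value between min and max, measured in square-root space.
    const t = (Math.sqrt(value) - Math.sqrt(min)) / (Math.sqrt(max) - Math.sqrt(min));
    return minSize + t * (maxSize - minSize);
  };
}

// Built from the values of the example dataset: 12 and 2.
const fontSizeScale = makeFontSizeScale([12, 2]);
```

With this scale the smallest value gets the smallest font size and the largest value gets the largest one.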

This layout can then be started from a useEffect by calling layout.start(). The layout algorithm loops through the words of the dataset and tries to place each of them on the chart, avoiding overlaps with the words already placed.

Once the loop is over, the layout algorithm produces a words array and passes it to the callback registered with .on("end", ...). This callback updates a state that stores the position and features of each word.
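
The start and end steps can be sketched as a small helper that a useEffect could call. This is only one way to organise it: startWordLayout is a hypothetical name, and the layout factory is passed in as createLayout (d3Cloud in a real component) so that the helper itself imports nothing. It also assumes that the layout writes its results onto the word objects it receives, which is why the words are copied first.

```javascript
// Sketch: configure a layout, start it, and return a cleanup function.
// `createLayout` is expected to be the d3Cloud factory from "d3-cloud".
function startWordLayout({ createLayout, words, width, height, fontSize, onEnd }) {
  const layout = createLayout()
    // Copy each word so that the original dataset is left untouched.
    .words(words.map((w) => ({ ...w })))
    .size([width, height])
    .fontSize(fontSize)
    .padding(10)
    .on("end", onEnd);

  // Nothing is computed until start() is called.
  layout.start();

  // The returned function stops the layout; it can serve as the useEffect cleanup.
  return () => layout.stop();
}

// Possible use inside a component (React and d3-cloud assumed to be imported):
// useEffect(
//   () =>
//     startWordLayout({
//       createLayout: d3Cloud,
//       words: data,
//       width,
//       height,
//       fontSize: (d) => fontSizeScale(d.value),
//       onEnd: setWordsPosition,
//     }),
//   [data, width, height]
// );
```

Listing data, width and height as dependencies means the layout is recomputed whenever one of them changes.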

It is thus possible to map through those word features and draw them using HTML, SVG or canvas. Here is an example using HTML:


Most basic wordcloud made with React and D3.js
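
For comparison, here is a minimal sketch of an SVG rendering. It follows the conventions of the example in the d3-cloud README: each placed word carries text, size, x, y and rotate, the x and y coordinates are measured from the center of the layout, and the text is anchored in its middle. The wordsToSvg and escapeXml helpers are hypothetical, and the "serif" default is assumed to match the font the layout used.

```javascript
// Escape the characters that would break the SVG markup.
function escapeXml(s) {
  return s.replace(/&/g, "&amp;").replace(/</g, "&lt;").replace(/>/g, "&gt;");
}

// Sketch: build SVG markup from the words produced by the layout.
function wordsToSvg(words, width, height, fontFamily = "serif") {
  const texts = words.map(
    (w) =>
      `<text text-anchor="middle" font-family="${fontFamily}" font-size="${w.size}" ` +
      `transform="translate(${w.x},${w.y}) rotate(${w.rotate})">${escapeXml(w.text)}</text>`
  );
  // Word coordinates are relative to the center, so the group is shifted there.
  return (
    `<svg width="${width}" height="${height}">` +
    `<g transform="translate(${width / 2},${height / 2})">${texts.join("")}</g></svg>`
  );
}
```

The same attributes can be set on text elements in JSX. Note that a mismatch between the font used by the layout and the one used for drawing is one possible cause of overlapping words.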

Todo: write better explanation
Todo: the layout algorithm currently provides imperfect values, resulting in many word overlaps. Please tell me if you find where the bug is.

Warning

Wordclouds are useful for quickly perceiving the most prominent terms. They are widely used in media and well understood by the public. However, they are criticized for 2 main reasons:

  • Area is a poor metaphor for a numeric value: the human eye perceives it poorly
  • Longer words appear bigger by construction
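
The second point can be illustrated with a rough calculation. This is only a sketch: approximateArea is a hypothetical helper, and it assumes that every glyph is about 0.6 em wide, which is a ballpark figure that varies with the font.

```javascript
// Rough estimate of the footprint of a word, in square pixels.
// Assumption: each glyph is about 0.6 em wide and 1 em tall.
function approximateArea(text, fontSize, glyphWidthEm = 0.6) {
  return text.length * glyphWidthEm * fontSize * fontSize;
}

// Two words drawn at the same font size, as if they had the same value.
const shortWord = approximateArea("cat", 20);
const longWord = approximateArea("caterpillar", 20);

// The ratio only depends on the number of letters: 11 / 3.
const ratio = longWord / shortWord;
```

Under this assumption, "caterpillar" covers between three and four times the area of "cat" even though both words encode the same value.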

To put it in a nutshell, wordclouds must be avoided. You can read more about that in data-to-viz. Why not consider a lollipop plot or a barplot instead?

Contact

👋 Hey, I'm Yan and I'm currently working on this project!

Feedback is welcome ❤️. You can file an issue on GitHub, drop me a message on Twitter, or even send me an email by pasting yan.holtz.data together with gmail.com. You can also subscribe to the newsletter to know when I publish more content!