NOTE: Before running this notebook, be sure to place your copy of RomeoAndJuliet.txt in the same folder as the notebook.

12.3.2 Visualizing Word Frequencies with Word Clouds

Installing the wordcloud Module

  • conda install -c conda-forge wordcloud
    • Windows users should run the Anaconda Prompt as an Administrator

Loading the Text

In [1]:
from pathlib import Path
In [2]:
text = Path('RomeoAndJuliet.txt').read_text()

Loading the Mask Image that Specifies the Word Cloud’s Shape

  • WordCloud fills non-white areas of a mask image with text
  • Load the mask using the imread function from the imageio module that comes with Anaconda
In [3]:
import imageio
In [4]:
mask_image = imageio.imread('mask_heart.png')
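
A mask does not have to come from an image file. As a minimal sketch, assuming NumPy is available (it ships with Anaconda), the following builds a circular mask array in which 255 (white) marks the area to leave empty and 0 marks the area to fill with text. The function name circular_mask and its parameters are hypothetical, chosen only for this illustration.

```python
import numpy as np


def circular_mask(size=300, radius=130):
    """Return a size-by-size array: 0 inside the circle, 255 (white) outside."""
    center = size // 2
    # Open grids of row (y) and column (x) indices that broadcast to size x size
    y, x = np.ogrid[:size, :size]
    # True for every pixel that lies outside the circle
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    # White (255) pixels are left empty; non-white (0) pixels can hold words
    return (255 * outside).astype(np.uint8)


mask = circular_mask()
print(mask.shape)
```

An array like this could be passed as the mask keyword argument in place of mask_image when creating the WordCloud object.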

Configuring the WordCloud Object

In [5]:
from wordcloud import WordCloud   
In [6]:
wordcloud = WordCloud(width=1000, height=1000, 
    colormap='prism', mask=mask_image, background_color='white')

Generating the Word Cloud

  • WordCloud’s generate method receives the text to use in the word cloud as an argument and creates the word cloud, which it returns as a WordCloud object
In [7]:
wordcloud = wordcloud.generate(text)
  • Before creating the word cloud, generate removes stop words from the text argument, using the wordcloud module’s built-in stop-words list
  • It then calculates the word frequencies for the remaining words
  • It builds the cloud with a maximum of 200 words by default; to change this limit, specify the max_words keyword argument when creating the WordCloud object
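
The steps above can be sketched in plain Python. This illustrates the idea only and is not how the wordcloud module is implemented: the stop-word set here is a tiny hypothetical sample, the tokenizing pattern is an assumption, and the function name top_word_frequencies is made up for this example.

```python
import re
from collections import Counter

# Hypothetical, tiny stop-word set for illustration only;
# the wordcloud module's built-in list is much larger
SAMPLE_STOP_WORDS = {'the', 'a', 'and', 'is', 'of', 'to', 'in'}


def top_word_frequencies(text, stop_words=SAMPLE_STOP_WORDS, max_words=200):
    """Return up to max_words (word, count) pairs, most frequent first."""
    # Split the lowercased text into words (assumed tokenizing rule)
    words = re.findall(r"[a-z']+", text.lower())
    # Step 1: remove the stop words
    kept = [word for word in words if word not in stop_words]
    # Steps 2 and 3: count the remaining words and keep the most frequent ones
    return Counter(kept).most_common(max_words)


sample = 'The rose is a rose and the name of the rose is a sweet name'
print(top_word_frequencies(sample, max_words=2))  # [('rose', 3), ('name', 2)]
```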

Saving the Word Cloud as an Image File

In [8]:
wordcloud = wordcloud.to_file('RomeoAndJulietHeart.png')

Generating a Word Cloud from a Dictionary

  • If you already have a dictionary of word counts, pass it to WordCloud’s fit_words method to create a word cloud
  • fit_words assumes you have already removed the stop words, so filter them out of the dictionary first
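
As a minimal sketch of that workflow, the following filters stop words out of a dictionary before it is handed to fit_words. The counts and the stop-word set are made-up values for illustration, and cloud_from_counts is a hypothetical helper name; it imports wordcloud only when called, so the filtering step runs even where the module is not installed.

```python
# Made-up word counts and stop words, for illustration only
word_counts = {'romeo': 5, 'the': 9, 'juliet': 4, 'and': 8, 'love': 3}
stop_words = {'the', 'and'}

# Keep only the entries whose key is not a stop word
filtered_counts = {word: count for word, count in word_counts.items()
                   if word not in stop_words}


def cloud_from_counts(counts):
    """Hypothetical helper: build a word cloud from a word-count dictionary."""
    from wordcloud import WordCloud  # imported here so the filtering runs without it
    return WordCloud(background_color='white').fit_words(counts)


print(filtered_counts)  # {'romeo': 5, 'juliet': 4, 'love': 3}
```

Calling cloud_from_counts(filtered_counts) should produce a WordCloud object that can be saved or displayed like the one generated from the text. The wordcloud module also exposes its built-in list as STOPWORDS, which could stand in for the hand-written set here.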

Displaying the Image with Matplotlib

In [9]:
%matplotlib inline
In [10]:
import matplotlib.pyplot as plt
In [11]:
plt.imshow(wordcloud)
<matplotlib.image.AxesImage at 0x11c5954e0>


