NOTE: Before running this notebook, be sure to place your copy of RomeoAndJuliet.txt in the same folder as the notebook.

12.3.2 Visualizing Word Frequencies with Word Clouds

Installing the wordcloud Module

  • conda install -c conda-forge wordcloud
    • Windows users should run the Anaconda Prompt as an Administrator

Loading the Text

In [1]:
from pathlib import Path
In [2]:
text = Path('RomeoAndJuliet.txt').read_text()

Loading the Mask Image that Specifies the Word Cloud’s Shape

  • WordCloud fills non-white areas of a mask image with text
  • Load the mask using the imread function from the imageio module that comes with Anaconda
In [3]:
import imageio
In [4]:
mask_image = imageio.imread('mask_heart.png')
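
To make the mask rule concrete, here is a minimal sketch that builds a small circular mask in plain Python instead of loading an image file. The names make_circle_mask, WHITE and DRAWABLE are hypothetical, chosen for illustration. The sketch assumes only the convention stated above: white pixels (value 255) stay empty and all other pixels may receive words.

```python
WHITE = 255     # white pixels are masked out: no words are drawn there
DRAWABLE = 0    # any non-white value marks an area that can hold words


def make_circle_mask(size):
    """Return a size-by-size grid (list of rows) containing a filled circle."""
    center = (size - 1) / 2
    radius = size / 2
    rows = []
    for y in range(size):
        row = []
        for x in range(size):
            # A pixel is drawable when it lies inside the circle
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row.append(DRAWABLE if inside else WHITE)
        rows.append(row)
    return rows


mask_rows = make_circle_mask(9)

# Show the mask as text: '#' is drawable, '.' is white (left empty)
for row in mask_rows:
    print(''.join('#' if pixel != WHITE else '.' for pixel in row))
```

Converting mask_rows to an 8-bit NumPy array, for example with numpy.array(mask_rows, dtype=numpy.uint8), should give an object that can be passed as the mask argument in place of a loaded image.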

Configuring the WordCloud Object

In [5]:
from wordcloud import WordCloud   
In [6]:
wordcloud = WordCloud(width=1000, height=1000, 
    colormap='prism', mask=mask_image, background_color='white')

Generating the Word Cloud

  • WordCloud’s generate method receives the text as an argument, creates the word cloud and returns it as a WordCloud object (the same object on which the method was called)
In [7]:
wordcloud = wordcloud.generate(text)
  • The generate method performs the following steps:
    • removes stop words from the text argument, using the wordcloud module’s built-in stop-words list
    • calculates the word frequencies for the remaining words
    • builds the cloud with a maximum of 200 words by default; to change the limit, specify the max_words keyword argument when creating the WordCloud object
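
The three steps above can be approximated in plain Python to see what generate does to the text. This is a rough sketch, not the wordcloud module's actual implementation. The tiny STOP_WORDS set, the regular expression used for tokenizing and the function name top_word_counts are all assumptions made for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-words set; the wordcloud module's
# built-in list is much longer
STOP_WORDS = {'a', 'and', 'is', 'of', 'the', 'thou', 'to'}


def top_word_counts(text, max_words=200):
    """Return up to max_words (word, count) pairs, most frequent first."""
    # Split the text into lowercase words (letters and apostrophes only)
    words = re.findall(r"[a-z']+", text.lower())

    # Step 1: remove the stop words
    kept = [word for word in words if word not in STOP_WORDS]

    # Step 2: calculate the frequencies of the remaining words
    counts = Counter(kept)

    # Step 3: keep only the max_words most frequent words
    return counts.most_common(max_words)


sample = ('Romeo, Romeo! Wherefore art thou Romeo? '
          'Deny thy father and refuse thy name')
print(top_word_counts(sample, max_words=2))
```

In a real word cloud, each word's count determines its relative size, so the pairs returned here correspond to the largest words in the image.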

Saving the Word Cloud as an Image File

In [8]:
wordcloud = wordcloud.to_file('RomeoAndJulietHeart.png')

Generating a Word Cloud from a Dictionary

  • If you already have a dictionary of word counts, pass it to WordCloud’s fit_words method to create a word cloud from it
  • Unlike generate, fit_words does not remove stop words, so remove them from the dictionary first if you do not want them in the cloud
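
Because fit_words uses the dictionary as is, any stop-word filtering has to happen beforehand. The following sketch shows one way to do that with a dictionary comprehension. The counts in word_counts are made-up sample data, and stop_words is a hypothetical short set standing in for a real stop-words list.

```python
# Made-up word counts for illustration only
word_counts = {'the': 30, 'romeo': 12, 'and': 25, 'juliet': 9, 'love': 7}

# Hypothetical stand-in for a full stop-words list
stop_words = {'the', 'and'}

# Keep only the entries whose word is not a stop word
frequencies = {word: count for word, count in word_counts.items()
               if word not in stop_words}

print(frequencies)
```

The filtered dictionary could then be passed to the WordCloud object configured earlier, as in wordcloud.fit_words(frequencies). The wordcloud module also exposes its built-in list as STOPWORDS, which could be used in place of stop_words.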

Displaying the Image with Matplotlib

In [9]:
%matplotlib inline
In [10]:
import matplotlib.pyplot as plt
In [11]:
plt.imshow(wordcloud)
Out[11]:
<matplotlib.image.AxesImage at 0x11c5954e0>

[Image: RomeoAndJulietHeart.png, the heart-shaped word cloud]


©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 12 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.

DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.