NOTE: Before running this notebook, place a copy of your downloaded RomeoAndJuliet.txt file in the same folder as this notebook.
Pandas visualization capabilities are based on Matplotlib, so launch IPython with the following command for this session:
ipython --matplotlib
Or enable Matplotlib in Jupyter with the following magic:
%matplotlib inline
from pathlib import Path
from textblob import TextBlob
blob = TextBlob(Path('RomeoAndJuliet.txt').read_text())
from nltk.corpus import stopwords
stop_words = stopwords.words('english')
items = blob.word_counts.items()
The expression item[0] gets the word from each tuple so we can check whether it’s in stop_words:
items = [item for item in items if item[0] not in stop_words]
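Membership tests against a list scan it element by element, so for larger corpora it can help to convert the stop words to a set first. The following is a minimal sketch using a small hand-written stop-word set and made-up counts (both are assumptions for illustration, not NLTK's list or the play's actual frequencies):

```python
# Hypothetical sample data standing in for blob.word_counts.items()
sample_items = [('the', 10), ('romeo', 5), ('and', 8), ('juliet', 4)]

# Hypothetical stand-in for stopwords.words('english'); a set gives fast lookups
sample_stop_words = {'the', 'and', 'a', 'of'}

# Keep only the (word, count) tuples whose word is not a stop word
filtered = [item for item in sample_items if item[0] not in sample_stop_words]
print(filtered)  # [('romeo', 5), ('juliet', 4)]
```

With the real data, the same idea would amount to wrapping the NLTK list in set(...) before filtering.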
To sort the tuples by frequency, use the itemgetter function from the Python Standard Library’s operator module:
from operator import itemgetter
sorted_items = sorted(items, key=itemgetter(1), reverse=True)
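To see what itemgetter(1) does in the call above, here is a small sketch with made-up counts (the sample data is an assumption for illustration). itemgetter(1) returns a callable that fetches element 1 of whatever it is given, and sorted uses that value as the sort key:

```python
from operator import itemgetter

# Hypothetical (word, count) tuples
sample_items = [('juliet', 4), ('romeo', 5), ('love', 3)]

# Callable that returns element 1 (the count) of its argument
get_count = itemgetter(1)
print(get_count(('romeo', 5)))  # 5

# Sort by count, largest first
sample_sorted = sorted(sample_items, key=get_count, reverse=True)
print(sample_sorted)  # [('romeo', 5), ('juliet', 4), ('love', 3)]
```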
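TextBlob tokenization splits all contractions at their apostrophes and counts the total number of apostrophes as one of the “words”. If you display sorted_items[0], you’ll see that apostrophes are the most frequently occurring “word”, with 867 of them. To keep only real words, the slice that follows skips element 0. (If your environment does not count apostrophes this way, element 0 may instead be a real word such as 'romeo'.)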
top20 = sorted_items[1:21]
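Skipping element 0 assumes the apostrophe entry sorts first. A hedged alternative, sketched here with mostly made-up counts, is to drop any token that is not purely alphabetic before slicing, so the result does not depend on position. Note that str.isalpha also rejects tokens containing digits or hyphens, which may or may not be what you want:

```python
# Hypothetical sorted (word, count) tuples; the first entry mimics the
# apostrophe "word", and the remaining counts are invented for illustration
sample_sorted = [('’', 867), ('romeo', 300), ('thou', 280), ('juliet', 190)]

# Keep only tokens made entirely of letters
words_only = [item for item in sample_sorted if item[0].isalpha()]

# Take up to the first 20 remaining entries
sample_top = words_only[:20]
print(sample_top)  # [('romeo', 300), ('thou', 280), ('juliet', 190)]
```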
import pandas as pd
df = pd.DataFrame(top20, columns=['word', 'count'])
df
The bar method of the DataFrame’s plot property creates and displays a Matplotlib bar chart:
axes = df.plot.bar(x='word', y='count', legend=False)
import matplotlib.pyplot as plt
plt.gcf().tight_layout()
©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.