NOTE: Before running this notebook, place a copy of your downloaded RomeoAndJuliet.txt file in the same folder as this notebook.

12.2.11 Word Frequencies

  • Various techniques for detecting similarity between documents rely on word frequencies
  • TextBlob can count word frequencies for you
  • Path’s read_text method opens the file, reads its entire contents into a string and closes the file as soon as it finishes reading
In [1]:
from pathlib import Path
In [2]:
from textblob import TextBlob
In [3]:
blob = TextBlob(Path('RomeoAndJuliet.txt').read_text())
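  • As a rough illustration of what read_text does for you, the following sketch compares it with an explicit with statement. The sample file, its one line of text and the variable names are hypothetical stand-ins, created in a temporary folder so the sketch runs without RomeoAndJuliet.txt

```python
from pathlib import Path
import tempfile

# Hypothetical sample file, created only so this sketch is self-contained
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / 'sample.txt'
    sample.write_text('But soft, what light through yonder window breaks?',
                      encoding='utf-8')

    # read_text opens the file, reads all of it and closes it in one call
    text_from_path = sample.read_text(encoding='utf-8')

    # Roughly the same steps written out with an explicit with statement
    with open(sample, encoding='utf-8') as file:
        text_from_open = file.read()

print(text_from_path == text_from_open)  # True
print(file.closed)                       # True
```

  • Passing encoding explicitly is optional, but it avoids depending on the platform’s default encoding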
  • Access the word frequencies through the TextBlob’s word_counts dictionary
In [4]:
blob.word_counts['juliet']
Out[4]:
190
In [5]:
blob.word_counts['romeo']
Out[5]:
315
In [6]:
blob.word_counts['thou']
Out[6]:
278
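  • To see what a word-frequency dictionary involves, here is a minimal sketch that uses only the Python Standard Library. It is not TextBlob’s implementation: the count_words function, its simplified regular-expression tokenizer and the sample sentence are assumptions made for illustration, so its counts for a full play may differ from word_counts

```python
import re
from collections import Counter

def count_words(text):
    """Return a Counter mapping each lowercase word to its number of occurrences."""
    # Simplified tokenizer (an assumption): runs of letters and apostrophes
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)

# Hypothetical sample text standing in for the full play
sample = 'Romeo, Romeo! Wherefore art thou Romeo? Deny thy father.'
counts = count_words(sample)

print(counts['romeo'])         # 3
print(counts['juliet'])        # 0, because a Counter returns 0 for a missing key
print(counts.most_common(1))   # [('romeo', 3)]
```

  • Lowercasing the text before counting is why the lookups use lowercase keys, as the word_counts lookups above also do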
  • If you have already tokenized a TextBlob into a WordList (for example, via its words or noun_phrases property), you can count a specific item in the list with the count method
In [7]:
blob.words.count('joy')
Out[7]:
14
In [8]:
blob.noun_phrases.count('lady capulet')
Out[8]:
46
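  • A WordList is a list subclass, so counting an item works much like counting in a plain Python list. The sketch below uses a hypothetical token list to show the difference between an exact-match count and a count that ignores case. TextBlob’s WordList count should ignore case by default (via its case_sensitive keyword argument), but treat that as an assumption to check against the TextBlob version you have installed

```python
# Hypothetical token list standing in for blob.words
tokens = ['Joy', 'and', 'sorrow', 'joy', 'comes', 'joy']

# list.count compares items exactly, so 'Joy' is not counted here
exact_matches = tokens.count('joy')

# Lowercasing every token first makes the count case-insensitive
any_case_matches = [token.lower() for token in tokens].count('joy')

print(exact_matches)     # 2
print(any_case_matches)  # 3
```

  • Each call to count scans the whole list, so when you need counts for many different words, a frequency dictionary built once is usually the better choice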

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 12 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.

DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.