Note: This notebook contains ALL the code for Sections 12.2.1 through 12.2.7

12.2.1 Create a TextBlob

In [1]:
from textblob import TextBlob
In [2]:
text = 'Today is a beautiful day. Tomorrow looks like bad weather.'
In [3]:
blob = TextBlob(text)
In [4]:
blob
Out[4]:
TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")

TextBlob, Sentences and Words Support String Methods and Comparisons

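  • Sentences and TextBlobs inherit from BaseBlob, which defines many common methods and properties; Words are a subclass of Python’s built-in str type, so they also support string methods and comparisons
  • See the BaseBlob documentation for details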

12.2.2 Tokenizing Text into Sentences and Words

  • Getting a list of sentences
In [5]:
blob.sentences
Out[5]:
[Sentence("Today is a beautiful day."),
 Sentence("Tomorrow looks like bad weather.")]
  • A WordList is a subclass of Python’s built-in list type with additional NLP methods.
  • Contains TextBlob Word objects
In [6]:
blob.words
Out[6]:
WordList(['Today', 'is', 'a', 'beautiful', 'day', 'Tomorrow', 'looks', 'like', 'bad', 'weather'])
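TextBlob builds on NLTK's tokenizers. To make the results above concrete, here is a deliberately naive, regex-based sketch that splits the same text into sentences and words. It is a substitute technique for illustration only, not TextBlob's algorithm; for example, it would wrongly split a sentence after an abbreviation such as “Dr.”.

```python
import re

text = 'Today is a beautiful day. Tomorrow looks like bad weather.'

def naive_sentences(text):
    """Split after '.', '!' or '?' when followed by whitespace (breaks on 'Dr. Smith')."""
    return re.split(r'(?<=[.!?])\s+', text.strip())

def naive_words(text):
    """Return runs of letters and apostrophes, dropping punctuation."""
    return re.findall(r"[A-Za-z']+", text)

print(naive_sentences(text))
print(naive_words(text))
```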

12.2.3 Parts-of-Speech Tagging

  • Evaluate words based on context to determine parts of speech, which can help determine meaning
  • Eight primary English parts of speech
    • nouns, pronouns, verbs, adjectives, adverbs, prepositions, conjunctions and interjections (words that express emotion and that are typically followed by punctuation, like “Yes!” or “Ha!”)
    • Many subcategories
  • Some words have multiple meanings
    • E.g., “set” and “run” have hundreds of meanings each!
In [7]:
blob
Out[7]:
TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")
In [8]:
blob.tags
Out[8]:
[('Today', 'NN'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('beautiful', 'JJ'),
 ('day', 'NN'),
 ('Tomorrow', 'NNP'),
 ('looks', 'VBZ'),
 ('like', 'IN'),
 ('bad', 'JJ'),
 ('weather', 'NN')]
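The tags in Out[8] come from the Penn Treebank tag set, which has about three dozen word-level tags. The hand-written lookup below describes only the six tags that appear in this example; it is not part of TextBlob's API.

```python
# Descriptions of the Penn Treebank tags that appear in Out[8].
# This subset is hand-written for illustration and is not exhaustive.
PENN_TAGS = {
    'DT': 'determiner',
    'IN': 'preposition or subordinating conjunction',
    'JJ': 'adjective',
    'NN': 'noun, singular or mass',
    'NNP': 'proper noun, singular',
    'VBZ': 'verb, 3rd person singular present',
}

# The (word, tag) pairs shown in Out[8]
tags = [('Today', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'),
        ('day', 'NN'), ('Tomorrow', 'NNP'), ('looks', 'VBZ'), ('like', 'IN'),
        ('bad', 'JJ'), ('weather', 'NN')]

for word, tag in tags:
    print(f'{word:<10} {tag:<4} {PENN_TAGS.get(tag, "unknown tag")}')
```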

12.2.4 Extracting Noun Phrases

  • Suppose you’re preparing to purchase a water ski
  • You might search for “best water ski”; here, “water ski” is a noun phrase
  • For best results, the search engine must parse the noun phrase properly
  • Try searching for “best water”, “best ski”, “water ski” and “best water ski” and compare the results
In [9]:
blob
Out[9]:
TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")
In [10]:
blob.noun_phrases
Out[10]:
WordList(['beautiful day', 'tomorrow', 'bad weather'])
  • A Word can represent a noun phrase with multiple words.
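Noun-phrase extraction builds on part-of-speech tags. The sketch below is a simplified chunker, not TextBlob's extractor: within each sentence it groups runs of adjectives (JJ*) and nouns (NN*), keeps a group only if it ends in a noun and either has several words or is a single proper noun (NNP), and lowercases the result as Out[10] does. These keep/discard rules are assumptions chosen to illustrate the idea; applied to the tags from Out[8], they happen to reproduce Out[10].

```python
# Tagged sentences from Out[8], split at the sentence boundary
tagged_sentences = [
    [('Today', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'), ('day', 'NN')],
    [('Tomorrow', 'NNP'), ('looks', 'VBZ'), ('like', 'IN'), ('bad', 'JJ'), ('weather', 'NN')],
]

def is_chunk_tag(tag):
    """Adjectives (JJ, JJR, JJS) and nouns (NN, NNS, NNP, NNPS) may join a chunk."""
    return tag.startswith('JJ') or tag.startswith('NN')

def keep(chunk):
    """Hypothetical filter: end in a noun and be multi-word or a single proper noun."""
    if not chunk or not chunk[-1][1].startswith('NN'):
        return False
    return len(chunk) > 1 or chunk[0][1] == 'NNP'

def naive_noun_phrases(tagged_sentences):
    phrases = []
    for sentence in tagged_sentences:
        chunk = []
        for word, tag in sentence + [('', 'END')]:  # sentinel flushes the final chunk
            if is_chunk_tag(tag):
                chunk.append((word, tag))
            else:
                if keep(chunk):
                    phrases.append(' '.join(w.lower() for w, _ in chunk))
                chunk = []
    return phrases

print(naive_noun_phrases(tagged_sentences))
```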

12.2.5 Sentiment Analysis with TextBlob’s Default Sentiment Analyzer

  • Determines whether text is positive, neutral or negative.
  • One of the most common and valuable NLP tasks (several later case studies do it)
  • Consider the positive word “good” and the negative word “bad”
    • Alone, they are positive and negative, respectively, but...
    • “The food is not good” clearly has negative sentiment
    • “The movie was not bad” clearly has positive sentiment (but not as positive as “The movie was excellent!”)
  • Complex machine-learning problem, but libraries like TextBlob can do it for you
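To see why negation makes sentiment analysis harder than a simple word lookup, here is a toy lexicon-based scorer. The word scores and the rule that a preceding negation word multiplies a score by -0.5 are made-up assumptions for illustration; TextBlob's analyzers use their own lexicons and models.

```python
import re

# Hypothetical polarity lexicon (values invented for this example)
LEXICON = {'good': 0.7, 'bad': -0.7, 'excellent': 1.0}
NEGATIONS = {'not', 'never', 'no'}
NEGATION_FACTOR = -0.5  # assumed: a negation flips the sign and weakens the score

def toy_polarity(sentence):
    """Average the scores of lexicon words, adjusting any word preceded by a negation."""
    words = re.findall(r"[a-z']+", sentence.lower())
    scores = []
    for i, word in enumerate(words):
        if word in LEXICON:
            score = LEXICON[word]
            if i > 0 and words[i - 1] in NEGATIONS:
                score *= NEGATION_FACTOR
            scores.append(score)
    return sum(scores) / len(scores) if scores else 0.0

for s in ['The food is good', 'The food is not good',
          'The movie was not bad', 'The movie was excellent!']:
    print(f'{toy_polarity(s):+.2f}  {s}')
```

With these assumed values, “not bad” scores positive but lower than “excellent”, matching the intuition described above.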

Getting the Sentiment of a TextBlob

In [11]:
blob
Out[11]:
TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")
In [12]:
blob.sentiment
Out[12]:
Sentiment(polarity=0.07500000000000007, subjectivity=0.8333333333333333)
  • polarity is the sentiment, ranging from -1.0 (negative) to 1.0 (positive), with 0.0 being neutral.
  • subjectivity is a value from 0.0 (objective) to 1.0 (subjective).
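Applications often turn a polarity score into a label. The helper below is a hypothetical sketch of one common approach, a dead zone around 0.0; the 0.1 threshold is an arbitrary assumption, not a TextBlob setting.

```python
def polarity_label(polarity, threshold=0.1):
    """Map a polarity in [-1.0, 1.0] to a label; the dead zone size is an arbitrary choice."""
    if not -1.0 <= polarity <= 1.0:
        raise ValueError('polarity must be between -1.0 and 1.0')
    if polarity > threshold:
        return 'positive'
    if polarity < -threshold:
        return 'negative'
    return 'neutral'

print(polarity_label(0.07500000000000007))  # whole TextBlob, Out[12]
print(polarity_label(0.85))                 # first sentence's polarity, In [16]
print(polarity_label(-0.6999999999999998))  # second sentence's polarity, In [16]
```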

Getting the polarity and subjectivity from the Sentiment Object

  • The IPython %precision magic sets the default display precision for standalone float objects and float objects in built-in types like lists, dictionaries and tuples:
In [13]:
%precision 3
Out[13]:
'%.3f'
In [14]:
blob.sentiment.polarity
Out[14]:
0.075
In [15]:
blob.sentiment.subjectivity
Out[15]:
0.833
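Because %precision affects only what IPython shows in Out[] cells, the underlying values keep full precision and print() output is unchanged, which is why In [16] still prints -0.6999999999999998. In plain Python, you can round a value for display with a format specifier or with round():

```python
polarity = 0.07500000000000007  # value shown in Out[12]

print(f'{polarity:.3f}')   # formatted string for display only
print(round(polarity, 3))  # a new float rounded to 3 decimal places
print(polarity)            # the original value is unchanged
```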

Getting the Sentiment of a Sentence

  • One sentence is positive (0.85) and the other is negative (-0.6999999999999998, which is -0.7 up to floating-point rounding), which might explain why the sentiment of the TextBlob as a whole was close to 0.0 (neutral)
In [16]:
for sentence in blob.sentences:
    print(sentence.sentiment)
Sentiment(polarity=0.85, subjectivity=1.0)
Sentiment(polarity=-0.6999999999999998, subjectivity=0.6666666666666666)
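In this example, the blob-level scores in Out[12] equal the mean of the two sentence-level scores, as the arithmetic below checks. Treat this as a property of this particular text rather than a general rule: as we understand it, TextBlob's default analyzer (based on the Pattern library) averages over the individual scored words and phrases it finds, not over sentences, so sentences containing different numbers of scored words would not be weighted equally.

```python
import math

# Per-sentence (polarity, subjectivity) pairs printed by In [16]
sentence_scores = [(0.85, 1.0), (-0.6999999999999998, 0.6666666666666666)]

mean_polarity = sum(p for p, _ in sentence_scores) / len(sentence_scores)
mean_subjectivity = sum(s for _, s in sentence_scores) / len(sentence_scores)

# Compare with the blob-level scores from Out[12]
print(math.isclose(mean_polarity, 0.07500000000000007))
print(math.isclose(mean_subjectivity, 0.8333333333333333))
```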

12.2.6 Sentiment Analysis with the NaiveBayesAnalyzer

In [17]:
from textblob.sentiments import NaiveBayesAnalyzer
In [18]:
blob = TextBlob(text, analyzer=NaiveBayesAnalyzer())
In [19]:
blob
Out[19]:
TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")
In [20]:
blob.sentiment
Out[20]:
Sentiment(classification='neg', p_pos=0.47662917962091056, p_neg=0.5233708203790892)
In [21]:
for sentence in blob.sentences:
    print(sentence.sentiment)
Sentiment(classification='pos', p_pos=0.8117563121751951, p_neg=0.18824368782480477)
Sentiment(classification='neg', p_pos=0.174363226578349, p_neg=0.8256367734216521)
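Unlike the default analyzer, the NaiveBayesAnalyzer returns a classification plus the probabilities that the text is positive (p_pos) and negative (p_neg). TextBlob trains this classifier on NLTK's movie-reviews corpus, so that corpus must be downloaded (for example, with python -m textblob.download_corpora), and the first sentiment request can be slow while training runs. The check below, on the values from Out[20] and In [21], shows that the two probabilities sum to 1 and that the classification is the more probable label. The NBSentiment namedtuple is a hypothetical stand-in that only mimics the shape of the output; it is not TextBlob's own class.

```python
import math
from collections import namedtuple

# Hypothetical stand-in with the same fields as the output above
NBSentiment = namedtuple('NBSentiment', ['classification', 'p_pos', 'p_neg'])

results = [
    NBSentiment('neg', 0.47662917962091056, 0.5233708203790892),  # whole text, Out[20]
    NBSentiment('pos', 0.8117563121751951, 0.18824368782480477),  # first sentence
    NBSentiment('neg', 0.174363226578349, 0.8256367734216521),    # second sentence
]

for r in results:
    most_likely = 'pos' if r.p_pos > r.p_neg else 'neg'
    print(r.classification, most_likely, math.isclose(r.p_pos + r.p_neg, 1.0))
```

Note that the whole text is classified 'neg' with p_neg only slightly above 0.5, which is consistent with the near-neutral polarity the default analyzer reported.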

12.2.7 Language Detection and Translation

  • Google Translate, Microsoft Bing Translator and others can translate between scores of languages instantly
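  • These services are now working on near-real-time translation
    • Converse in real time with people who do not know your natural language
  • In the IBM Watson presentation, we'll develop a script that does inter-language translation
  • TextBlob’s detect_language and translate methods send the text to Google Translate, so they require an Internet connection; later TextBlob releases deprecated and then removed both methods, so the following cells may fail with newer versions of TextBlob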
In [22]:
blob
Out[22]:
TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")
In [23]:
blob.detect_language()
Out[23]:
'en'

In [24]:
spanish = blob.translate(to='es')
In [25]:
spanish
Out[25]:
TextBlob("Hoy es un hermoso dia. Mañana parece mal tiempo.")
In [26]:
spanish.detect_language()
Out[26]:
'es'
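detect_language() delegates the work to Google's service. To illustrate one simple idea behind language identification, the toy sketch below counts common function words from two tiny, hand-picked lists. Those lists and the guess_language helper are assumptions for illustration; real detectors typically use statistical models, often over character sequences. The Chinese example shows why word lists alone break down for languages written without spaces between words.

```python
import re

# Tiny, hand-picked function-word lists (an assumption for this sketch)
FUNCTION_WORDS = {
    'en': {'the', 'is', 'a', 'an', 'and', 'of', 'to', 'like'},
    'es': {'el', 'la', 'es', 'un', 'una', 'y', 'de', 'que'},
}

def guess_language(text):
    """Return the language whose function words occur most often, or None if none occur."""
    words = re.findall(r'\w+', text.lower())
    counts = {lang: sum(word in vocab for word in words)
              for lang, vocab in FUNCTION_WORDS.items()}
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None

print(guess_language('Today is a beautiful day. Tomorrow looks like bad weather.'))
print(guess_language('Hoy es un hermoso dia. Mañana parece mal tiempo.'))
print(guess_language('今天是美好的一天。明天看起来像恶劣的天气。'))
```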

In [27]:
chinese = blob.translate(to='zh')
In [28]:
chinese
Out[28]:
TextBlob("今天是美好的一天。明天看起来像恶劣的天气。")
In [29]:
chinese.detect_language()
Out[29]:
'zh-CN'

  • Can specify a source language explicitly by passing the from_lang keyword argument to the translate method
chinese = blob.translate(from_lang='en', to='zh')
In [30]:
spanish.translate()
Out[30]:
TextBlob("Today is a beautiful day. Tomorrow seems bad weather.")
In [31]:
chinese.translate() 
Out[31]:
TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")
  • Note the slight difference in the English results.
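To pinpoint the difference, Python's standard difflib module can compare the two back-translations from Out[30] and Out[31] word by word:

```python
import difflib

from_spanish = 'Today is a beautiful day. Tomorrow seems bad weather.'
from_chinese = 'Today is a beautiful day. Tomorrow looks like bad weather.'

a, b = from_spanish.split(), from_chinese.split()
matcher = difflib.SequenceMatcher(a=a, b=b)

# Report only the word ranges that differ between the two translations
for op, i1, i2, j1, j2 in matcher.get_opcodes():
    if op != 'equal':
        print(f'{op}: {a[i1:i2]} -> {b[j1:j2]}')
```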

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.

DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.