Note: This notebook contains ALL the code for Sections 12.2.1 through 12.2.7

12.2.1 Create a TextBlob¶

from textblob import TextBlob

text = 'Today is a beautiful day. Tomorrow looks like bad weather.'

blob = TextBlob(text)

blob

TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")

`TextBlob`, `Sentence`s and `Word`s Support String Methods and Comparisons¶

Sentences, Words and TextBlobs inherit from BaseBlob, which defines many common methods and properties
BaseBlob documentation
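One way a text-wrapping class can support string methods and comparisons is sketched below. `MiniBlob` is a hypothetical stand-in written for illustration only; it is not TextBlob's `BaseBlob` code, and the real class may be organized quite differently.

```python
from functools import total_ordering


@total_ordering
class MiniBlob:
    """Hypothetical stand-in for a BaseBlob-style class (not TextBlob's code)."""

    def __init__(self, text):
        self.raw = text

    def __eq__(self, other):
        # Compare against another MiniBlob or against a plain string.
        return self.raw == getattr(other, 'raw', other)

    def __lt__(self, other):
        return self.raw < getattr(other, 'raw', other)

    def __hash__(self):
        return hash(self.raw)

    def __getattr__(self, name):
        # Runs only when normal lookup fails. Forward to the wrapped string
        # so that methods such as upper() or startswith() work unchanged.
        if name == 'raw':
            raise AttributeError(name)
        return getattr(self.raw, name)


blob_like = MiniBlob('Today is a beautiful day.')
starts_with_today = blob_like.startswith('Today')        # string method
equals_plain_str = blob_like == 'Today is a beautiful day.'  # comparison with a str
sorts_before = blob_like < MiniBlob('Tomorrow')          # ordering comparison
print(starts_with_today, equals_plain_str, sorts_before)
```

The point of the sketch is that defining comparison and delegation logic once in a base class lets every subclass behave like a string without repeating that code.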

12.2.2 Tokenizing Text into Sentences and Words¶

Getting a list of sentences

blob.sentences

[Sentence("Today is a beautiful day."),
 Sentence("Tomorrow looks like bad weather.")]

A WordList is a subclass of Python’s built-in list type with additional NLP methods.
Contains TextBlob Word objects

blob.words

WordList(['Today', 'is', 'a', 'beautiful', 'day', 'Tomorrow', 'looks', 'like', 'bad', 'weather'])
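To get a feel for what tokenizing involves, here is a minimal sketch that uses only Python's `re` module. The two regular expressions are simplifying assumptions, not TextBlob's tokenizers, and they would mishandle cases such as abbreviations ("Dr. Smith") that a real NLP tokenizer is designed to cope with.

```python
import re

text = 'Today is a beautiful day. Tomorrow looks like bad weather.'

# Split after '.', '!' or '?' when followed by whitespace (a naive rule).
sentences = re.split(r'(?<=[.!?])\s+', text.strip())

# Keep runs of letters, digits and apostrophes; punctuation is dropped.
words = re.findall(r"[A-Za-z0-9']+", text)

print(sentences)
print(words)
```

For this simple text the sketch yields the same two sentences and ten words shown above, but only because the text contains no tricky punctuation.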

12.2.3 Parts-of-Speech Tagging¶

Evaluate words based on context to determine parts of speech, which can help determine meaning
Eight primary English parts of speech
- nouns, pronouns, verbs, adjectives, adverbs, prepositions, conjunctions and interjections (words that express emotion and that are typically followed by punctuation, like “Yes!” or “Ha!”)
- Many subcategories
Some words have multiple meanings
- E.g., “set” and “run” have hundreds of meanings each!

blob

TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")

blob.tags

[('Today', 'NN'),
 ('is', 'VBZ'),
 ('a', 'DT'),
 ('beautiful', 'JJ'),
 ('day', 'NN'),
 ('Tomorrow', 'NNP'),
 ('looks', 'VBZ'),
 ('like', 'IN'),
 ('bad', 'JJ'),
 ('weather', 'NN')]

12.2.3 Parts-of-Speech Tagging (cont.)¶

TextBlob uses a PatternTagger to determine parts-of-speech
Uses pattern library POS tagging
Pattern's 63 parts-of-speech tags
In preceding output:
- NN—a singular noun or mass noun
- VBZ—a third person singular present verb
- DT—a determiner (the, an, that, this, my, their, etc.)
- JJ—an adjective
- NNP—a proper singular noun
- IN—a subordinating conjunction or preposition
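Because the tags come back as a list of (word, tag) tuples, ordinary Python is enough to work with them. The sketch below copies the tagger output shown earlier into a literal list (so it runs without TextBlob) and uses a small lookup dictionary, `tag_names`, which is a helper introduced here for illustration and covers only the six tags described above.

```python
# Tagger output copied from the preceding cell.
tags = [('Today', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'),
        ('day', 'NN'), ('Tomorrow', 'NNP'), ('looks', 'VBZ'), ('like', 'IN'),
        ('bad', 'JJ'), ('weather', 'NN')]

# Descriptions for the six tags that appear in this output only.
tag_names = {
    'NN': 'singular or mass noun',
    'VBZ': 'third person singular present verb',
    'DT': 'determiner',
    'JJ': 'adjective',
    'NNP': 'proper singular noun',
    'IN': 'subordinating conjunction or preposition',
}

for word, tag in tags:
    print(f'{word:<10}{tag:<5}{tag_names[tag]}')

# Filtering by tag, for example to collect the adjectives.
adjectives = [word for word, tag in tags if tag == 'JJ']
print(adjectives)
```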

12.2.4 Extracting Noun Phrases¶

Preparing to purchase a water ski
Might search for “best water ski”—“water ski” is a noun phrase
For best results, search engine must parse the noun phrase properly
Try searching for “best water,” “best ski,” “water ski” and “best water ski” and see what you get

blob

TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")

blob.noun_phrases

WordList(['beautiful day', 'tomorrow', 'bad weather'])

A Word can represent a noun phrase with multiple words.
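The idea of a noun phrase can be made concrete with a naive chunker that groups optional adjectives followed by one or more nouns. This is a simplified sketch under that one assumed rule, not the algorithm TextBlob uses, and the function name `naive_noun_phrases` is hypothetical. The input is the tagger output from Section 12.2.3, split by sentence.

```python
# Tags from Section 12.2.3, grouped by sentence.
sentence_tags = [
    [('Today', 'NN'), ('is', 'VBZ'), ('a', 'DT'), ('beautiful', 'JJ'),
     ('day', 'NN')],
    [('Tomorrow', 'NNP'), ('looks', 'VBZ'), ('like', 'IN'), ('bad', 'JJ'),
     ('weather', 'NN')],
]


def naive_noun_phrases(tags):
    """Collect runs of adjectives (JJ) followed by nouns (NN*)."""
    phrases, current, has_noun = [], [], False
    for word, tag in tags + [('', 'END')]:  # sentinel flushes the last run
        if tag.startswith('NN'):
            current.append(word)
            has_noun = True
        elif tag == 'JJ' and not has_noun:
            current.append(word)
        else:
            if has_noun:
                phrases.append(' '.join(current).lower())
            # An adjective after a noun starts a new candidate phrase.
            current = [word] if tag == 'JJ' else []
            has_noun = False
    return phrases


phrases = [p for tags in sentence_tags for p in naive_noun_phrases(tags)]
print(phrases)
```

The sketch also reports 'today', which TextBlob's extractor left out of the output above, a reminder that real noun-phrase extraction applies more than one simple rule.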

12.2.5 Sentiment Analysis with TextBlob’s Default Sentiment Analyzer¶

Determines whether text is positive, neutral or negative.
One of the most common and valuable NLP tasks (several later case studies do it)
Consider the positive word “good” and the negative word “bad”
- Alone they are positive and negative, respectively, but...
- The food is not good — clearly has negative sentiment
- The movie was not bad — clearly has positive sentiment (but not as positive as The movie was excellent!)
Complex machine-learning problem, but libraries like TextBlob can do it for you

Getting the Sentiment of a TextBlob¶

blob

TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")

blob.sentiment

Sentiment(polarity=0.07500000000000007, subjectivity=0.8333333333333333)

polarity is the sentiment — from -1.0 (negative) to 1.0 (positive) with 0.0 being neutral.
subjectivity is a value from 0.0 (objective) to 1.0 (subjective).
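The two numbers are easier to read once they are mapped to labels. The sketch below is one possible interpretation: the `neutral_band` of 0.1 and the 0.5 subjectivity cutoff are assumptions chosen for illustration, not values defined by TextBlob, and `describe` is a hypothetical helper.

```python
def describe(polarity, subjectivity, neutral_band=0.1):
    """Turn polarity and subjectivity scores into a short text label.

    neutral_band and the 0.5 cutoff are illustrative assumptions.
    """
    if polarity > neutral_band:
        tone = 'positive'
    elif polarity < -neutral_band:
        tone = 'negative'
    else:
        tone = 'neutral'
    stance = 'subjective' if subjectivity >= 0.5 else 'objective'
    return f'{tone}, mostly {stance}'


# Scores copied from the blob.sentiment output above.
summary = describe(0.07500000000000007, 0.8333333333333333)
print(summary)
```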

Getting the polarity and subjectivity from the Sentiment Object¶

%precision magic specifies the default precision for standalone float objects and float objects in built-in types like lists, dictionaries and tuples:

%precision 3

'%.3f'

blob.sentiment.polarity

0.075

blob.sentiment.subjectivity

0.833
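%precision is an IPython magic, so it has no effect in a plain Python script. A minimal sketch of getting the same three-decimal display outside IPython, using the values from the output above:

```python
# Values copied from the blob.sentiment output.
polarity = 0.07500000000000007
subjectivity = 0.8333333333333333

# An f-string format specifier and the older % operator give the same result.
polarity_text = f'{polarity:.3f}'
subjectivity_text = '%.3f' % subjectivity

print(polarity_text, subjectivity_text)
```

Both approaches change only how the number is displayed; the stored float keeps its full precision.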

Getting the Sentiment of a Sentence¶

One is positive (0.85) and one is negative (-0.6999999999999998), which might explain why the entire TextBlob’s sentiment was close to 0.0 (neutral)

for sentence in blob.sentences:
    print(sentence.sentiment)

Sentiment(polarity=0.85, subjectivity=1.0)
Sentiment(polarity=-0.6999999999999998, subjectivity=0.6666666666666666)
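The arithmetic behind that observation can be checked directly. In this particular example, the mean of the two sentence scores matches the overall polarity and subjectivity reported earlier; the sketch below only demonstrates that coincidence for this text and does not claim that averaging sentences is how the analyzer computes a blob's sentiment in general.

```python
from statistics import mean

# (polarity, subjectivity) pairs copied from the sentence output above.
sentence_scores = [(0.85, 1.0), (-0.6999999999999998, 0.6666666666666666)]

mean_polarity = mean(p for p, _ in sentence_scores)
mean_subjectivity = mean(s for _, s in sentence_scores)

print(f'{mean_polarity:.3f} {mean_subjectivity:.3f}')
```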

12.2.6 Sentiment Analysis with the NaiveBayesAnalyzer¶

from textblob.sentiments import NaiveBayesAnalyzer

blob = TextBlob(text, analyzer=NaiveBayesAnalyzer())

blob

TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")

blob.sentiment

Sentiment(classification='neg', p_pos=0.47662917962091056, p_neg=0.5233708203790892)

for sentence in blob.sentences:
    print(sentence.sentiment)

Sentiment(classification='pos', p_pos=0.8117563121751951, p_neg=0.18824368782480477)
Sentiment(classification='neg', p_pos=0.174363226578349, p_neg=0.8256367734216521)
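Unlike the default analyzer, these results report a classification plus two probabilities. The sketch below copies the three results above into a local `namedtuple` (a stand-in defined here so the example runs without TextBlob) and checks two properties visible in the output: each pair of probabilities sums to approximately 1.0, and the classification matches the larger probability.

```python
import math
from collections import namedtuple

# Local stand-in with the same field names as the output above.
Sentiment = namedtuple('Sentiment', ['classification', 'p_pos', 'p_neg'])

results = [
    Sentiment('neg', 0.47662917962091056, 0.5233708203790892),   # whole text
    Sentiment('pos', 0.8117563121751951, 0.18824368782480477),   # sentence 1
    Sentiment('neg', 0.174363226578349, 0.8256367734216521),     # sentence 2
]

checks = []
for r in results:
    sums_to_one = math.isclose(r.p_pos + r.p_neg, 1.0)
    label_from_probs = 'pos' if r.p_pos > r.p_neg else 'neg'
    checks.append((sums_to_one, label_from_probs == r.classification))

print(checks)
```

Note how close the whole-text probabilities are to 0.5 each, which echoes the near-neutral polarity from the default analyzer.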

12.2.7 Language Detection and Translation¶

Google Translate, Microsoft Bing Translator and others can translate between scores of languages instantly
Now working on near-real-time translation
- Converse in real time with people who do not know your natural language
In the IBM Watson presentation, we'll develop a script that does inter-language translation

blob

TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")

blob.detect_language()

'en'

12.2.7 Language Detection and Translation (cont.)¶

spanish = blob.translate(to='es')

spanish

TextBlob("Hoy es un hermoso dia. Mañana parece mal tiempo.")

spanish.detect_language()

'es'

12.2.7 Language Detection and Translation (cont.)¶

chinese = blob.translate(to='zh')

chinese

TextBlob("今天是美好的一天。明天看起来像恶劣的天气。")

chinese.detect_language()

'zh-CN'

12.2.7 Language Detection and Translation (cont.)¶

Can specify a source language explicitly by passing the from_lang keyword argument to the translate method

chinese = blob.translate(from_lang='en', to='zh')

from_lang and to use ISO 639-1 language codes
Google Translate’s list of supported languages

spanish.translate()

TextBlob("Today is a beautiful day. Tomorrow seems bad weather.")

chinese.translate()

TextBlob("Today is a beautiful day. Tomorrow looks like bad weather.")

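Note the slight difference in the English results: the Spanish round trip returns “seems” where the original text has “looks like,” while the Chinese round trip reproduces the original text exactly.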
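A round-trip difference like this can be located programmatically. The following sketch uses the standard library's `difflib` to compare the original text with the Spanish round-trip result word by word; both strings are copied from the cells above, so no translation service is called.

```python
import difflib

original = 'Today is a beautiful day. Tomorrow looks like bad weather.'
round_trip = 'Today is a beautiful day. Tomorrow seems bad weather.'

a = original.split()
b = round_trip.split()

# Each opcode describes how to turn a slice of a into a slice of b.
matcher = difflib.SequenceMatcher(a=a, b=b)
changes = [(a[i1:i2], b[j1:j2])
           for op, i1, i2, j1, j2 in matcher.get_opcodes()
           if op != 'equal']

print(changes)
```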

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.

DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
