12.2.14 n-grams

  • n-gram — a sequence of n text items, such as letters in words or words in a sentence.
  • Used to identify letters or words that frequently appear adjacent to one another, in applications such as:
    • Predictive text input
    • Speech-to-text
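
The definition above can be sketched without any library as a sliding window over a sequence. This is a minimal illustration, not how TextBlob implements it; the function name `ngrams` and its default of `n=3` are assumptions chosen for this example.

```python
def ngrams(items, n=3):
    """Return every run of n consecutive items as a list of tuples."""
    if n < 1:
        raise ValueError('n must be at least 1')
    # Slide a window of width n across the sequence, one position at a time.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Word n-grams: split the sentence into words first.
words = 'Today is a beautiful day'.split()
word_trigrams = ngrams(words)

# Letter n-grams: a string is already a sequence of characters.
letter_bigrams = ngrams('day', n=2)

print(word_trigrams)
print(letter_bigrams)
```

A sequence shorter than n yields an empty list, because no full window fits.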
In [1]:
from textblob import TextBlob
In [2]:
text = 'Today is a beautiful day. Tomorrow looks like bad weather.'
In [3]:
blob = TextBlob(text)
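  • TextBlob’s ngrams method produces a list of WordList n-grams, by default of length three (known as trigrams)
  • Use keyword argument n to produce n-grams of any desired length
  • Punctuation is dropped and n-grams span sentence boundaries, as in ['beautiful', 'day', 'Tomorrow'] below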
In [4]:
blob.ngrams()
Out[4]:
[WordList(['Today', 'is', 'a']),
 WordList(['is', 'a', 'beautiful']),
 WordList(['a', 'beautiful', 'day']),
 WordList(['beautiful', 'day', 'Tomorrow']),
 WordList(['day', 'Tomorrow', 'looks']),
 WordList(['Tomorrow', 'looks', 'like']),
 WordList(['looks', 'like', 'bad']),
 WordList(['like', 'bad', 'weather'])]
In [5]:
blob.ngrams(n=5)
Out[5]:
[WordList(['Today', 'is', 'a', 'beautiful', 'day']),
 WordList(['is', 'a', 'beautiful', 'day', 'Tomorrow']),
 WordList(['a', 'beautiful', 'day', 'Tomorrow', 'looks']),
 WordList(['beautiful', 'day', 'Tomorrow', 'looks', 'like']),
 WordList(['day', 'Tomorrow', 'looks', 'like', 'bad']),
 WordList(['Tomorrow', 'looks', 'like', 'bad', 'weather'])]
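
To show how adjacent-word counts might feed predictive text input, the following sketch tallies bigrams (n-grams of length two) and suggests the word that most often follows a given word. The sample sentence, the `followers` table and the `suggest` helper are all hypothetical, introduced only for this illustration; a real system would train on a far larger corpus.

```python
from collections import Counter, defaultdict

# Hypothetical sample text, chosen so that some word pairs repeat.
text = 'the cat sat on the mat and the cat slept'
words = text.split()

# Map each word to a Counter of the words that immediately follow it.
followers = defaultdict(Counter)
for first, second in zip(words, words[1:]):
    followers[first][second] += 1

def suggest(word):
    """Return the most frequent follower of word, or None if none was seen."""
    counts = followers.get(word)
    return counts.most_common(1)[0][0] if counts else None

print(suggest('the'))
```

Here 'the' is followed by 'cat' twice and by 'mat' once, so 'cat' is the suggestion. A word that was never followed by anything in the sample, such as 'slept', yields None.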

