12.5 Named Entity Recognition with spaCy

  • NLP can determine what a text is about
  • Named entity recognition attempts to locate and categorize items
    • dates, times, quantities, places, people, things, organizations and more
  • May also want to check out Textacy
    • Built on spaCy and supports additional NLP tasks

Install spaCy

  • spaCy Quickstart guide
  • conda install -c conda-forge spacy
    • Windows users should run the Anaconda Prompt as an Administrator
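  • Download spaCy's English (en) "model" for processing text
    python -m spacy download en
    • In spaCy 3 and later the en shortcut is no longer available; use a full model name such as en_core_web_sm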
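
  • Before loading a model, it can help to confirm that the package is visible to the current Python environment
  • A minimal sketch using only the standard library; the helper name is_installed is hypothetical and not part of spaCy

```python
import importlib.util

def is_installed(package_name):
    """Return True if a top-level package can be found in this environment."""
    # find_spec returns None when no top-level module of that name is found
    return importlib.util.find_spec(package_name) is not None

# Reports whether spaCy is importable here, without actually importing it
print('spaCy installed:', is_installed('spacy'))
```

  • If this reports False, repeat the conda install step in the same environment that runs IPython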

Loading the Language Model

  • spaCy docs recommend the variable name nlp.
In [1]:
import spacy
In [2]:
nlp = spacy.load('en')  # spaCy 3+ requires a full model name, e.g. 'en_core_web_sm'

Creating a spaCy Doc

  • Use the nlp object to create a spaCy Doc object representing the document to process.
In [3]:
document = nlp('In 1994, Tim Berners-Lee founded the ' + 
    'World Wide Web Consortium (W3C), devoted to ' +
    'developing web technologies')

Getting the Named Entities

  • A Doc's ents property returns a tuple of Span objects representing the named entities
  • Spans have many properties
  • Display each entity's text and label_ (the kind of entity)
In [4]:
for entity in document.ents:
    print(f'{entity.text}: {entity.label_}')
1994: DATE
Tim Berners-Lee: PERSON
the World Wide Web Consortium (W3C: ORG
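
  • A common next step is to collect the entities by kind
  • A minimal sketch, assuming only that each entity has text and label_ attributes
    • group_by_label is a hypothetical helper, not a spaCy function
    • SimpleNamespace objects stand in for the Spans so the example runs without a downloaded model; with spaCy loaded, pass document.ents instead

```python
from collections import defaultdict
from types import SimpleNamespace

def group_by_label(entities):
    """Map each entity label to the list of entity texts carrying it."""
    groups = defaultdict(list)
    for entity in entities:
        groups[entity.label_].append(entity.text)
    return dict(groups)

# Stand-ins for the Spans in document.ents, copied from the output above
sample_ents = [
    SimpleNamespace(text='1994', label_='DATE'),
    SimpleNamespace(text='Tim Berners-Lee', label_='PERSON'),
    SimpleNamespace(text='the World Wide Web Consortium (W3C', label_='ORG'),
]

grouped = group_by_label(sample_ents)
for label, texts in grouped.items():
    print(f'{label}: {texts}')
```

  • Keying on label_ makes it easy to pull out, say, every PERSON mentioned in a longer text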

