12.5 Named Entity Recognition with spaCy

  • NLP can determine what a text is about
  • Named entity recognition attempts to locate and categorize items
    • dates, times, quantities, places, people, things, organizations and more
  • You may also want to check out textacy
    • Built on spaCy and supports additional NLP tasks

Install spaCy

  • spaCy Quickstart guide
  • conda install -c conda-forge spacy
    • Windows users should run the Anaconda Prompt as an Administrator
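  • Download spaCy's English (en) "model" for processing text
    ipython -m spacy download en
  • spaCy 3.0 and later no longer accept the en shortcut; download the full package name instead
    python -m spacy download en_core_web_sm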

Loading the Language Model

  • spaCy docs recommend the variable name nlp.
In [1]:
import spacy
In [2]:
nlp = spacy.load('en') 
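
  • The name passed to spacy.load depends on the installed spaCy version: the 'en' shortcut works only before 3.0, and later versions expect a full package name such as 'en_core_web_sm'.
  • A minimal sketch of choosing the name, assuming the matching model has already been downloaded. pick_model_name is a hypothetical helper (not part of spaCy), and the version string is passed in so the snippet runs without spaCy installed.

```python
def pick_model_name(spacy_version):
    """Return the English model name that suits a given spaCy version string."""
    major = int(spacy_version.split('.')[0])
    # Shortcut names such as 'en' were removed in spaCy 3.0.
    return 'en' if major < 3 else 'en_core_web_sm'


# Illustrative version strings only; with spaCy installed you could call
# spacy.load(pick_model_name(spacy.__version__)) instead.
print(pick_model_name('2.3.5'))
print(pick_model_name('3.7.2'))
```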

Creating a spaCy Doc

  • Use the nlp object to create a spaCy Doc object representing the document to process.
In [3]:
document = nlp('In 1994, Tim Berners-Lee founded the ' + 
    'World Wide Web Consortium (W3C), devoted to ' +
    'developing web technologies')

Getting the Named Entities

  • The Doc object's ents property returns a tuple of Span objects representing the named entities
  • Spans have many properties
  • The loop below displays each Span's text (the entity's text) and label_ (the kind of entity)
In [4]:
for entity in document.ents:
    print(f'{entity.text}: {entity.label_}')
1994: DATE
Tim Berners-Lee: PERSON
the World Wide Web Consortium (W3C: ORG
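
  • Once the text and label_ values are extracted, it can be handy to group the entities by kind.
  • A minimal sketch using the three pairs printed above as literal data, so it runs without spaCy installed. group_by_label is a hypothetical helper, not part of spaCy.

```python
from collections import defaultdict


def group_by_label(pairs):
    """Map each entity label to the list of entity texts that carry it."""
    grouped = defaultdict(list)
    for text, label in pairs:
        grouped[label].append(text)
    return dict(grouped)


# Literal copies of the (text, label_) pairs printed above.
pairs = [
    ('1994', 'DATE'),
    ('Tim Berners-Lee', 'PERSON'),
    ('the World Wide Web Consortium (W3C', 'ORG'),
]

entities_by_label = group_by_label(pairs)
for label, texts in entities_by_label.items():
    print(f'{label}: {texts}')
```

  • With a real Doc, the same helper could be called as group_by_label((entity.text, entity.label_) for entity in document.ents).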

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.

DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.