NOTE: Before running this notebook, be sure to place your copy of the play (RomeoAndJuliet.txt, the file name used in the code below) in the same folder as the notebook.

12.4 Readability Assessment with Textatistic

  • Text readability is affected by
    • vocabulary used
    • sentence structure
    • sentence length
    • topic
    • and more.
  • Writing tools such as Grammarly use readability measures to help tune writing for readability
  • Textatistic uses several popular readability formulas
    • Flesch Reading Ease
    • Flesch-Kincaid
    • Gunning Fog
    • Simple Measure of Gobbledygook (SMOG)
    • Dale-Chall

Install Textatistic

pip install textatistic

Calculating Statistics and Readability Scores

In [10]:
from pathlib import Path
In [11]:
text = Path('RomeoAndJuliet.txt').read_text()
In [12]:
from textatistic import Textatistic
In [13]:
readability = Textatistic(text)
  • Textatistic's dict method returns a dictionary containing the text statistics and the readability scores:
In [5]:
%precision 3
Out[5]:
'%.3f'
In [6]:
readability.dict()
Out[6]:
{'char_count': 115141,
 'word_count': 26120,
 'sent_count': 3218,
 'sybl_count': 30166,
 'notdalechall_count': 5823,
 'polysyblword_count': 549,
 'flesch_score': 100.892,
 'fleschkincaid_score': 1.203,
 'gunningfog_score': 4.087,
 'smog_score': 5.489,
 'dalechall_score': 7.559}
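
As a cross-check, the five scores in the output above can be recomputed from the counts alone. The sketch below assumes the commonly published forms of each formula; Textatistic's own implementation is not shown in this notebook, so this is only an illustration, and the names `counts` and `recomputed` are hypothetical.

```python
import math

# Counts copied from the readability.dict() output above
counts = {
    'word_count': 26120,
    'sent_count': 3218,
    'sybl_count': 30166,
    'notdalechall_count': 5823,
    'polysyblword_count': 549,
}

# Ratios shared by several formulas
words_per_sentence = counts['word_count'] / counts['sent_count']
syllables_per_word = counts['sybl_count'] / counts['word_count']
polysyllable_percent = 100 * counts['polysyblword_count'] / counts['word_count']
difficult_percent = 100 * counts['notdalechall_count'] / counts['word_count']

# Flesch Reading Ease (higher is easier to read)
flesch = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word

# Flesch-Kincaid grade level
fleschkincaid = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59

# Gunning Fog index
gunningfog = 0.4 * (words_per_sentence + polysyllable_percent)

# SMOG grade
smog = 1.0430 * math.sqrt(
    30 * counts['polysyblword_count'] / counts['sent_count']) + 3.1291

# Dale-Chall score; the constant is added only when more than
# 5% of the words are not on the Dale-Chall list
dalechall = 0.1579 * difficult_percent + 0.0496 * words_per_sentence
if difficult_percent > 5:
    dalechall += 3.6365

recomputed = {
    'flesch_score': flesch,
    'fleschkincaid_score': fleschkincaid,
    'gunningfog_score': gunningfog,
    'smog_score': smog,
    'dalechall_score': dalechall,
}

for name, value in recomputed.items():
    print(f'{name}: {value:.3f}')
```

Under these assumptions, the recomputed values agree with the scores shown above to three decimal places. The short average sentence length (about 8 words) and the low syllables-per-word ratio push the Flesch score slightly above 100.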

Calculating Statistics and Readability Scores (cont.)

  • Each dictionary value is also available as a Textatistic property with the same name as its key (for example, readability.word_count). The statistics produced include:
  • char_count—number of characters
  • word_count—number of words
  • sent_count—number of sentences
  • sybl_count—number of syllables
  • notdalechall_count—# of words not on the Dale-Chall list (words understood by 80% of 5th graders)
    • Higher is less readable
  • polysyblword_count—# of words with 3+ syllables
  • flesch_score—Flesch Reading Ease score
    • Scores of 90 or higher are considered readable by 5th graders
    • Scores below 30 require a college degree
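
To make the Flesch Reading Ease thresholds concrete, here is a minimal sketch of a lookup helper. The function name `flesch_band` is hypothetical, and the intermediate bands come from the commonly cited Flesch table rather than from this notebook; only the 90+ and below-30 cutoffs are stated above.

```python
def flesch_band(score):
    """Return a rough reading-level label for a Flesch Reading Ease score."""
    # (lower bound, label) pairs, checked from easiest to hardest;
    # intermediate bands are an assumption based on the commonly cited table
    bands = [
        (90, '5th grade'),
        (80, '6th grade'),
        (70, '7th grade'),
        (60, '8th-9th grade'),
        (50, '10th-12th grade'),
        (30, 'college'),
    ]
    for lower_bound, label in bands:
        if score >= lower_bound:
            return label
    return 'college graduate'  # scores below 30


print(flesch_band(100.892))  # flesch_score from the output shown earlier
```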

Calculating Statistics and Readability Scores (cont.)

  • fleschkincaid_score—Flesch-Kincaid score corresponds to a specific grade level
  • gunningfog_score—Gunning Fog index value corresponds to a specific grade level
  • smog_score—Simple Measure of Gobbledygook (SMOG) score
    • Corresponds to years of education required to understand text
    • Considered particularly effective for healthcare materials
  • dalechall_score—Dale-Chall score
    • Maps to grade levels from 4 and below to college graduate (grade 16) and above
    • Considered most reliable for a broad range of text types
    • See the Wikipedia article on the Dale-Chall readability formula
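
The Dale-Chall grade mapping can be sketched as a simple lookup. The function name `dalechall_grade` is hypothetical, and the band boundaries are taken from the commonly published Dale-Chall table, not from Textatistic itself, so treat them as an assumption.

```python
def dalechall_grade(score):
    """Return the grade band commonly associated with a Dale-Chall score."""
    # (exclusive upper bound, label) pairs from the commonly published table
    bands = [
        (5.0, 'grade 4 and below'),
        (6.0, 'grades 5-6'),
        (7.0, 'grades 7-8'),
        (8.0, 'grades 9-10'),
        (9.0, 'grades 11-12'),
        (10.0, 'grades 13-15 (college)'),
    ]
    for upper_bound, label in bands:
        if score < upper_bound:
            return label
    return 'grade 16 and above (college graduate)'


print(dalechall_grade(7.559))  # dalechall_score from the output shown earlier
```

By this mapping, the play's Dale-Chall score falls in a noticeably higher grade band than its Flesch-Kincaid score of 1.203 suggests. That is plausible for a text with short sentences but many words not on the Dale-Chall list.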


©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.

DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.