8.12.1 re Module and Function fullmatch

  • fullmatch checks whether the entire string in its second argument matches the pattern in its first argument
In [ ]:
import re

Matching Literal Characters

In [ ]:
pattern = '02215'
In [ ]:
'Match' if re.fullmatch(pattern, '02215') else 'No match'
In [ ]:
'Match' if re.fullmatch(pattern, '51220') else 'No match'
  • First argument is the regular expression pattern to match
    • Any string can be a regular expression
    • Literal characters match themselves in the specified order
  • Second argument is the string that should entirely match the pattern
  • If the second argument matches the pattern in the first argument, fullmatch returns a match object containing the matching text; a match object evaluates to True in a condition
  • For no match, fullmatch returns None, which evaluates to False
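You can check the bullets above directly. This is a minimal sketch that assumes Python 3.7 or later. The helper `describe` is a hypothetical name used only for this illustration. It calls the match object's `group()` method to get the matched text. It also contrasts fullmatch with `re.search`, which accepts a match anywhere in the string.

```python
import re

def describe(pattern, text):
    """Return the fully matched text, or None when text does not match pattern entirely."""
    match = re.fullmatch(pattern, text)  # a match object on success, None otherwise
    return match.group() if match else None

result = re.fullmatch('02215', '02215')
print(result.group())  # 02215
print(bool(result))    # True

print(re.fullmatch('02215', '51220'))  # None
print(describe('02215', '022150'))     # None: the extra trailing 0 prevents a full match

# re.search, by contrast, looks for the pattern anywhere in the string
print(re.search('02215', '022150').group())  # 02215
```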

Metacharacters, Character Classes and Quantifiers

  • Regular expressions typically contain various special symbols called metacharacters:
Regular expression metacharacters
[] {} () \ * + ^ $ ? . |
  • The \ metacharacter begins each predefined character class
  • Each predefined character class matches a specific set of characters
In [ ]:
'Valid' if re.fullmatch(r'\d{5}', '02215') else 'Invalid'
In [ ]:
'Valid' if re.fullmatch(r'\d{5}', '9876') else 'Invalid'
  • In \d{5}, \d is a character class representing a digit (0–9)
  • \d is a predefined character class, that is, a regular expression escape sequence that matches exactly one character
  • To match more than one, follow the character class with a quantifier
  • {5} repeats \d five times to match five consecutive digits
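The sketch below shows that a character class matches a single character and that {5} only repeats it. It compares \d, \d{5} and the spelled-out pattern \d\d\d\d\d. The constant names and sample strings are illustrative assumptions.

```python
import re

# \d alone matches exactly one digit character
print(bool(re.fullmatch(r'\d', '7')))   # True
print(bool(re.fullmatch(r'\d', '77')))  # False: \d matches only one character

# {5} repeats the preceding \d five times, so these two patterns are equivalent
FIVE_DIGITS = r'\d{5}'
SPELLED_OUT = r'\d\d\d\d\d'

samples = ['02215', '9876', '123456', '0221a']
for text in samples:
    quantified = re.fullmatch(FIVE_DIGITS, text) is not None
    repeated = re.fullmatch(SPELLED_OUT, text) is not None
    print(f'{text!r}: {quantified} (same as spelled-out pattern: {quantified == repeated})')
```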

Other Predefined Character Classes

  • To match any metacharacter as its literal value, precede it with a backslash (\)
    • For example, \\ matches a backslash (\) and \$ matches a dollar sign ($)
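Here is a short sketch of escaping metacharacters. Raw strings (r'...') keep the backslashes intact so the regex engine receives them. The sketch also uses `re.escape`, a standard re function that backslash-escapes every special character in a string. The sample strings are made up for illustration.

```python
import re

# Unescaped, $ is a metacharacter (end of string), so this pattern
# cannot match the literal text '$5'
print(re.fullmatch(r'$\d', '$5'))          # None

# Escaped, \$ matches a literal dollar sign
print(bool(re.fullmatch(r'\$\d', '$5')))  # True

# \\ in a raw string is the regex escape for one literal backslash
print(bool(re.fullmatch(r'C:\\temp', 'C:\\temp')))  # True

# re.escape adds the backslashes when the literal text is only known at run time
price = '$9.99'
pattern = re.escape(price)
print(bool(re.fullmatch(pattern, '$9.99')))  # True
print(bool(re.fullmatch(pattern, '$9x99')))  # False: the escaped . matches only '.'
```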
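Character class   Matches
\d                Any digit (0–9).
\D                Any character that is not a digit.
\s                Any whitespace character (such as spaces, tabs and newlines).
\S                Any character that is not a whitespace character.
\w                Any word character (also called an alphanumeric character): any uppercase or lowercase letter, any digit or an underscore.
\W                Any character that is not a word character.
  • For str patterns, these classes are Unicode-aware by default. For example, \d also matches non-ASCII decimal digits such as Arabic-Indic digits. Passing the re.ASCII flag restricts \d, \s and \w (and their negations) to the ASCII sets listed above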
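The sketch below tries each predefined class from the table on one sample character. The samples are chosen only for illustration. It then combines several classes with quantifiers. Finally, it compares \d with and without the `re.ASCII` flag on U+0663, an Arabic-Indic digit.

```python
import re

# Each predefined class matches exactly one character
checks = [
    (r'\d', '7'),   # digit
    (r'\D', 'x'),   # non-digit
    (r'\s', '\t'),  # whitespace (tab)
    (r'\S', '#'),   # non-whitespace
    (r'\w', '_'),   # word character: the underscore counts
    (r'\W', '-'),   # non-word character
]
for pattern, text in checks:
    print(f'{pattern} matches {text!r}: {re.fullmatch(pattern, text) is not None}')

# Combining classes with quantifiers: word characters, one whitespace character, digits
print(bool(re.fullmatch(r'\w+\s\d+', 'Room 101')))  # True

# For str patterns, \d is Unicode-aware; re.ASCII restricts it to 0-9
arabic_indic_three = '\u0663'
print(bool(re.fullmatch(r'\d', arabic_indic_three)))            # True
print(bool(re.fullmatch(r'\d', arabic_indic_three, re.ASCII)))  # False
```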

Custom Character Classes

  • Square brackets, [], define a custom character class that matches a single character
  • [aeiou] matches a lowercase vowel
  • [A-Z] matches an uppercase letter
  • [a-z] matches a lowercase letter
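  • [a-zA-Z] matches any lowercase or uppercase letter
  • A ^ immediately after [ negates the class: [^a-z] matches any one character that is not a lowercase letter
  • Most metacharacters lose their special meaning inside []: [*+$] matches a literal *, + or $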
In [ ]:
'Valid' if re.fullmatch('[A-Z][a-z]*', 'Wally') else 'Invalid'
In [ ]:
'Valid' if re.fullmatch('[A-Z][a-z]*', 'eva') else 'Invalid'
In [ ]:
'Match' if re.fullmatch('[^a-z]', 'A') else 'No match'
In [ ]:
'Match' if re.fullmatch('[^a-z]', 'a') else 'No match'
In [ ]:
'Match' if re.fullmatch('[*+$]', '*') else 'No match'
In [ ]:
'Match' if re.fullmatch('[*+$]', '!') else 'No match'
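Here are a few more custom-class checks in the same spirit as the cells above. The helper `matches` is a hypothetical convenience wrapper around `re.fullmatch`. It is not part of the re module.

```python
import re

def matches(pattern, text):
    """True if text fully matches pattern (hypothetical helper for this demo)."""
    return re.fullmatch(pattern, text) is not None

# A custom class matches exactly one character from the set
print(matches('[aeiou]', 'e'), matches('[aeiou]', 'y'))    # True False

# Ranges can be combined inside one pair of brackets
print(matches('[a-zA-Z]', 'Q'), matches('[a-zA-Z]', '5'))  # True False

# A ^ right after [ negates the class: any one character NOT listed
print(matches('[^aeiou]', 'y'), matches('[^aeiou]', 'a'))  # True False

# Inside brackets * + $ are literal; the class still matches only one character
print(matches('[*+$]', '+'), matches('[*+$]', '*+'))       # True False

# A capitalized name: one uppercase letter, then any number of lowercase letters
names = ['Wally', 'eva', 'E', 'McDonald']
print([name for name in names if matches('[A-Z][a-z]*', name)])  # ['Wally', 'E']
```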

* vs. + Quantifier

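  • * matches zero or more occurrences of a subexpression, so [A-Z][a-z]* also accepts a lone uppercase letter
  • + matches at least one occurrence of a subexpression, so [A-Z][a-z]+ requires at least one lowercase letter
  • * and + are greedy: they match as many characters as possible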
In [ ]:
'Valid' if re.fullmatch('[A-Z][a-z]+', 'Wally') else 'Invalid'
In [ ]:
'Valid' if re.fullmatch('[A-Z][a-z]+', 'E') else 'Invalid'
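Because fullmatch must consume the whole string, it cannot show how much a greedy quantifier takes by itself. This sketch therefore also uses `re.match`, which anchors only at the start of the string. It shows that [a-z]+ grabs every letter it can. The sample strings are illustrative.

```python
import re

# fullmatch: * allows zero lowercase letters after the capital, + requires at least one
for pattern in ('[A-Z][a-z]*', '[A-Z][a-z]+'):
    print(pattern, [word for word in ['Wally', 'E'] if re.fullmatch(pattern, word)])
# [A-Z][a-z]* ['Wally', 'E']
# [A-Z][a-z]+ ['Wally']

# re.match anchors only at the start, so it reveals how much a greedy quantifier consumes
print(re.match('[a-z]+', 'abc123').group())     # abc (as many letters as possible)
print(repr(re.match('[a-z]*', '123').group()))  # '' (zero letters still satisfies *)
print(re.match('[a-z]+', '123'))                # None (+ needs at least one letter)
```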

Other Quantifiers

  • The ? quantifier matches zero or one occurrence of a subexpression
In [ ]:
'Match' if re.fullmatch('labell?ed', 'labelled') else 'No match'
In [ ]:
'Match' if re.fullmatch('labell?ed', 'labeled') else 'No match'
In [ ]:
'Match' if re.fullmatch('labell?ed', 'labellled') else 'No match'
  • labell?ed matches labelled (the U.K. English spelling) and labeled (the U.S. English spelling), but not the misspelled word labellled
  • l? makes the final l optional: zero or one additional l may appear before the remaining literal characters ed
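This sketch applies ? to a second spelling pair (color/colour, chosen here for illustration). It also confirms that ? behaves like the quantifier {0,1}. The word list and the helper `accepted` are hypothetical.

```python
import re

words = ['labelled', 'labeled', 'labellled', 'color', 'colour', 'colouur']

def accepted(pattern):
    """Words from the list that fully match pattern (hypothetical helper)."""
    return [w for w in words if re.fullmatch(pattern, w)]

print(accepted('labell?ed'))  # ['labelled', 'labeled']
print(accepted('colou?r'))    # ['color', 'colour']

# ? makes only the preceding character optional and is equivalent to {0,1}
print(accepted('colou{0,1}r') == accepted('colou?r'))  # True
```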

  • Can match at least n occurrences of a subexpression with the {n,} quantifier
In [ ]:
'Match' if re.fullmatch(r'\d{3,}', '123') else 'No match'
In [ ]:
'Match' if re.fullmatch(r'\d{3,}', '1234567890') else 'No match'
In [ ]:
'Match' if re.fullmatch(r'\d{3,}', '12') else 'No match'
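This sketch shows which lengths {3,} accepts. It also shows that {1,} and {0,} repeat the same way + and * do. The sample strings and the helper name `full_matches` are assumptions for illustration.

```python
import re

samples = ['', '1', '12', '123', '1234567890']

def full_matches(pattern):
    """Samples that fully match pattern (hypothetical helper)."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(full_matches(r'\d{3,}'))  # ['123', '1234567890']

# {1,} and {0,} express the same repetition as + and *
print(full_matches(r'\d{1,}') == full_matches(r'\d+'))  # True
print(full_matches(r'\d{0,}') == full_matches(r'\d*'))  # True
```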

  • Can match between n and m (inclusive) occurrences of a subexpression with the {n,m} quantifier
In [ ]:
'Match' if re.fullmatch(r'\d{3,6}', '123') else 'No match'
In [ ]:
'Match' if re.fullmatch(r'\d{3,6}', '123456') else 'No match'
In [ ]:
'Match' if re.fullmatch(r'\d{3,6}', '1234567') else 'No match'
In [ ]:
'Match' if re.fullmatch(r'\d{3,6}', '12') else 'No match'
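You can probe the boundaries of {3,6} by testing digit strings of every length from 1 to 8. The sketch also shows {n} with a single number, which requires exactly n repetitions. The sample digits are arbitrary.

```python
import re

DIGITS = '1234567890'

# Lengths (1 through 8) of the digit prefixes that \d{3,6} fully matches
accepted_lengths = [n for n in range(1, 9) if re.fullmatch(r'\d{3,6}', DIGITS[:n])]
print(accepted_lengths)  # [3, 4, 5, 6]

# {n} with a single number means exactly n repetitions
print(bool(re.fullmatch(r'\d{3}', '123')))   # True
print(bool(re.fullmatch(r'\d{3}', '1234')))  # False
```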

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.

DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.