8.12.1 re Module and Function fullmatch

  • fullmatch checks whether the entire string in its second argument matches the pattern in its first argument
import re

Matching Literal Characters

pattern = '02215'
'Match' if re.fullmatch(pattern, '02215') else 'No match'
'Match' if re.fullmatch(pattern, '51220') else 'No match'
  • First argument is the regular expression pattern to match
    • Any string can be a regular expression
    • Literal characters match themselves in the specified order
  • Second argument is the string that should entirely match the pattern
  • If the second argument matches the pattern in the first argument, fullmatch returns a match object containing the matched text; in a condition, that object evaluates to True
  • For no match, fullmatch returns None, which evaluates to False in a condition
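
The two return values described above can be inspected directly. This is a minimal sketch; the variable names match, matched_text and no_match are illustrative and do not come from the original, and group() is the match object's method for retrieving the matched text.

```python
import re

pattern = '02215'

# A successful fullmatch returns a match object, which is truthy.
match = re.fullmatch(pattern, '02215')
matched_text = match.group()  # the text that matched the pattern

# An unsuccessful fullmatch returns None, which is falsy.
no_match = re.fullmatch(pattern, '51220')

print(bool(match), matched_text)  # True 02215
print(no_match)                   # None
```

Because None is falsy and a match object is truthy, the conditional expressions in the snippets above can test the result of fullmatch directly.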

Metacharacters, Character Classes and Quantifiers

  • Regular expressions typically contain various special symbols called metacharacters:
Regular expression metacharacters
[] {} () \ * + ^ $ ? . |
  • The \ metacharacter begins each predefined character class
  • Each predefined character class matches a specific set of characters
'Valid' if re.fullmatch(r'\d{5}', '02215') else 'Invalid'
'Valid' if re.fullmatch(r'\d{5}', '9876') else 'Invalid'
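  • In \d{5}, \d is a character class representing a digit (0–9); for str patterns, Python's \d also accepts other Unicode decimal digits unless the re.ASCII flag is used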
  • A character class is a regular expression escape sequence that matches one character
  • To match more than one, follow the character class with a quantifier
  • {5} repeats \d five times to match five consecutive digits
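
One way to package the \d{5} check is a small validation function. This is a sketch only; the function name is_five_digit_zip is hypothetical and not part of the original.

```python
import re

def is_five_digit_zip(text):
    """Return True if text consists of exactly five digits."""
    return re.fullmatch(r'\d{5}', text) is not None

print(is_five_digit_zip('02215'))   # True
print(is_five_digit_zip('9876'))    # False: only four digits
print(is_five_digit_zip('022150'))  # False: six digits
```

Because fullmatch must consume the entire string, both too-short and too-long inputs are rejected.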

Other Predefined Character Classes

  • To match any metacharacter as its literal value, precede it by a backslash (\)
    • For example, \\ matches a backslash (\) and \$ matches a dollar sign ($)
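
A short sketch of escaping metacharacters, assuming a simple dollars-and-cents format chosen purely for illustration (the name price_pattern is hypothetical):

```python
import re

# \$ matches a literal dollar sign and \. matches a literal period.
price_pattern = r'\$\d+\.\d{2}'

with_dollar = bool(re.fullmatch(price_pattern, '$19.99'))
without_dollar = bool(re.fullmatch(price_pattern, '19.99'))

# \\ in the pattern matches one literal backslash in the string.
backslash = bool(re.fullmatch(r'C:\\temp', 'C:\\temp'))

print(with_dollar)     # True
print(without_dollar)  # False: the dollar sign is required
print(backslash)       # True
```

Writing patterns as raw strings (the r prefix) keeps Python from interpreting the backslashes before the re module sees them.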
Character class   Matches
\d                Any digit (0–9).
\D                Any character that is not a digit.
\s                Any whitespace character (such as spaces, tabs and newlines).
\S                Any character that is not a whitespace character.
\w                Any word character (also called an alphanumeric character): any uppercase or lowercase letter, any digit or an underscore.
\W                Any character that is not a word character.
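
The classes in the table can be checked one character at a time. The sample characters below are arbitrary choices for illustration, not taken from the original.

```python
import re

# (pattern, single-character test string) pairs
samples = [
    (r'\d', '7'),   # digit
    (r'\D', '7'),   # non-digit class applied to a digit
    (r'\s', '\t'),  # a tab is whitespace
    (r'\S', '\t'),  # non-whitespace class applied to a tab
    (r'\w', '_'),   # underscore counts as a word character
    (r'\W', '_'),   # non-word class applied to an underscore
]

results = [bool(re.fullmatch(pattern, text)) for pattern, text in samples]
print(results)  # [True, False, True, False, True, False]
```

Each uppercase class is the complement of its lowercase counterpart, so the results alternate.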

Custom Character Classes

  • Square brackets, [], define a custom character class that matches a single character
  • [aeiou] matches a lowercase vowel
  • [A-Z] matches an uppercase letter
  • [a-z] matches a lowercase letter
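  • [a-zA-Z] matches any lowercase or uppercase letter
  • The * quantifier matches zero or more occurrences of the subexpression to its left, so [A-Z][a-z]* matches an uppercase letter followed by any number of lowercase letters
  • When a custom character class starts with a caret (^), it matches any single character that is not listed, so [^a-z] matches any character that is not a lowercase letter
  • Metacharacters such as *, + and $ are treated as literal characters inside a custom character class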
'Valid' if re.fullmatch('[A-Z][a-z]*', 'Wally') else 'Invalid'
'Valid' if re.fullmatch('[A-Z][a-z]*', 'eva') else 'Invalid'
'Match' if re.fullmatch('[^a-z]', 'A') else 'No match'
'Match' if re.fullmatch('[^a-z]', 'a') else 'No match'
'Match' if re.fullmatch('[*+$]', '*') else 'No match'
'Match' if re.fullmatch('[*+$]', '!') else 'No match'
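
The negated class and the bracketed metacharacters used in the snippets above can be tried against several inputs at once. This is a small sketch; the list names negated and literal are illustrative.

```python
import re

# A leading ^ inside brackets negates the class: [^a-z] matches any
# single character that is NOT a lowercase letter.
negated = [bool(re.fullmatch('[^a-z]', ch)) for ch in ('A', '5', 'a')]

# Inside brackets, the metacharacters * + $ stand for themselves.
literal = [bool(re.fullmatch('[*+$]', ch)) for ch in ('*', '+', '$', '!')]

print(negated)  # [True, True, False]
print(literal)  # [True, True, True, False]
```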

* vs. + Quantifier

  • + matches at least one occurrence of a subexpression, whereas * also accepts zero occurrences
  • * and + are greedy: they match as many characters as possible
'Valid' if re.fullmatch('[A-Z][a-z]+', 'Wally') else 'Invalid'
'Valid' if re.fullmatch('[A-Z][a-z]+', 'E') else 'Invalid'
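
A side-by-side sketch of the two quantifiers. The names star, plus and comparison are illustrative. The greediness check uses re.match, a different re function that only requires a match at the start of the string; it is brought in here as an assumption to make greediness visible, since fullmatch always has to consume the whole string anyway.

```python
import re

star = '[A-Z][a-z]*'  # uppercase letter, then zero or more lowercase letters
plus = '[A-Z][a-z]+'  # uppercase letter, then one or more lowercase letters

# Map each name to (matches star pattern, matches plus pattern).
comparison = {
    name: (bool(re.fullmatch(star, name)), bool(re.fullmatch(plus, name)))
    for name in ('Wally', 'E', 'eva')
}
print(comparison)
# {'Wally': (True, True), 'E': (True, False), 'eva': (False, False)}

# Greedy matching: [a-z]+ takes all three letters, not just the first.
greedy = re.match('[a-z]+', 'abc123').group()
print(greedy)  # abc
```

The single letter 'E' is the only input on which the two patterns disagree, because it has zero lowercase letters after the capital.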

Other Quantifiers

  • ? quantifier matches zero or one occurrences of a subexpression
'Match' if re.fullmatch('labell?ed', 'labelled') else 'No match'
'Match' if re.fullmatch('labell?ed', 'labeled') else 'No match'
'Match' if re.fullmatch('labell?ed', 'labellled') else 'No match'
  • labell?ed matches labelled (the U.K. English spelling) and labeled (the U.S. English spelling), but not the misspelled word labellled
  • l? indicates that an optional second l (zero or one occurrence) may appear before the remaining literal characters ed
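
The same idea applies to other spelling variants. The pattern colou?r below is an illustrative example that does not appear in the original.

```python
import re

pattern = 'colou?r'  # the u is optional: zero or one occurrence

words = ['color', 'colour', 'colouur', 'colr']
results = {word: bool(re.fullmatch(pattern, word)) for word in words}

print(results)
# {'color': True, 'colour': True, 'colouur': False, 'colr': False}
```

The ? applies only to the character (or subexpression) immediately before it, so every other letter in the pattern remains required.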

Other Quantifiers (cont.)

  • The {n,} quantifier matches at least n occurrences of a subexpression
'Match' if re.fullmatch(r'\d{3,}', '123') else 'No match'
'Match' if re.fullmatch(r'\d{3,}', '1234567890') else 'No match'
'Match' if re.fullmatch(r'\d{3,}', '12') else 'No match'

Other Quantifiers (cont.)

  • The {n,m} quantifier matches between n and m (inclusive) occurrences of a subexpression
'Match' if re.fullmatch(r'\d{3,6}', '123') else 'No match'
'Match' if re.fullmatch(r'\d{3,6}', '123456') else 'No match'
'Match' if re.fullmatch(r'\d{3,6}', '1234567') else 'No match'
'Match' if re.fullmatch(r'\d{3,6}', '12') else 'No match'
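
To see where the boundaries of {n,} and {n,m} fall, the following sketch tests digit strings of every length from 1 to 8. The helper name matches_by_length is hypothetical.

```python
import re

def matches_by_length(pattern, lengths):
    """Map each length n to whether a string of n digits fully matches pattern."""
    return {n: bool(re.fullmatch(pattern, '1' * n)) for n in lengths}

at_least_three = matches_by_length(r'\d{3,}', range(1, 9))
three_to_six = matches_by_length(r'\d{3,6}', range(1, 9))

# Lengths accepted by each pattern
print([n for n, ok in at_least_three.items() if ok])  # [3, 4, 5, 6, 7, 8]
print([n for n, ok in three_to_six.items() if ok])    # [3, 4, 5, 6]
```

Both bounds are inclusive, and omitting the upper bound in {n,} removes the maximum entirely.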

