8.12.3 Other Search Functions; Accessing Matches

Function search—Finding the First Match Anywhere in a String

  • search looks in a string for the first occurrence of a substring that matches a regular expression and returns a match object (of type re.Match in Python 3.7 and later; earlier versions display it as SRE_Match) that contains the matching substring
  • Match object’s group method returns that substring
In [ ]:
import re
In [ ]:
result = re.search('Python', 'Python is fun')
In [ ]:
result.group() if result else 'not found'
  • search returns None if the string does not contain the pattern
In [ ]:
result2 = re.search('fun!', 'Python is fun')
In [ ]:
result2.group() if result2 else 'not found'
  • To search for a match only at the beginning of a string, use function match instead of search
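The difference between search and match can be seen in a short sketch. It reuses the sample string from the cells above; the variable names are illustrative only.

```python
import re

text = 'Python is fun'

# search scans the whole string, so it finds 'fun' at the end
found_anywhere = re.search('fun', text)

# match only tries at the start of the string, so 'fun' is not found there
found_at_start = re.match('fun', text)

print(found_anywhere.group() if found_anywhere else 'not found')  # fun
print(found_at_start.group() if found_at_start else 'not found')  # not found
print(re.match('Python', text).group())                           # Python
```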

Ignoring Case with the Optional flags Keyword Argument

  • Many re module functions receive an optional flags keyword argument
  • Changes how regular expressions are matched
In [ ]:
result3 = re.search('Sam', 'SAM WHITE', flags=re.IGNORECASE)
In [ ]:
result3.group() if result3 else 'not found'
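As a minimal sketch of what the flag changes, the same search can be run with and without re.IGNORECASE. The variable names here are illustrative and not part of the original cells.

```python
import re

name = 'SAM WHITE'

# Without flags, matching is case sensitive, so 'Sam' does not match 'SAM'
case_sensitive = re.search('Sam', name)

# re.IGNORECASE makes letters match regardless of case
case_insensitive = re.search('Sam', name, flags=re.IGNORECASE)

print(case_sensitive)            # None
print(case_insensitive.group())  # SAM
```

Note that group returns the text as it appears in the searched string ('SAM'), not as it is written in the pattern.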

Metacharacters That Restrict Matches to the Beginning or End of a String

  • ^ metacharacter at the beginning of a regular expression (and not inside square brackets) is an anchor
  • Indicates that the expression matches only the beginning of a string
In [ ]:
result = re.search('^Python', 'Python is fun')
In [ ]:
result.group() if result else 'not found'
In [ ]:
result = re.search('^fun', 'Python is fun')
In [ ]:
result.group() if result else 'not found'
  • $ metacharacter at the end of a regular expression is an anchor indicating that the expression matches only the end of a string
In [ ]:
result = re.search('Python$', 'Python is fun')
In [ ]:
result.group() if result else 'not found'
In [ ]:
result = re.search('fun$', 'Python is fun')
In [ ]:
result.group() if result else 'not found'
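The following sketch, using the same sample string, combines both anchors and also illustrates the "not inside square brackets" caveat for ^. It is an illustrative example rather than part of the original cells.

```python
import re

text = 'Python is fun'

# ^ and $ together require the pattern to span the entire string
print(bool(re.search('^Python is fun$', text)))  # True
print(bool(re.search('^Python$', text)))         # False

# Inside square brackets, a leading ^ negates the character class instead
# of anchoring: [^a-z] matches any one character that is not a lowercase letter
first_non_lowercase = re.search('[^a-z]', text)
print(first_non_lowercase.group())  # P
```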

Functions findall and finditer—Finding All Matches in a String

  • findall finds every matching substring in a string
  • Returns a list of the matching substrings
In [ ]:
contact = 'Wally White, Home: 555-555-1234, Work: 555-555-4321'
In [ ]:
re.findall(r'\d{3}-\d{3}-\d{4}', contact)
  • finditer works like findall, but returns a lazy iterable of match objects
In [ ]:
for phone in re.finditer(r'\d{3}-\d{3}-\d{4}', contact):
    print(phone.group())
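One practical consequence of the difference: findall gives only the matching strings, while each match object produced by finditer also records where the match was found. A small sketch, assuming the same contact string (the names phone_pattern, numbers, and positions are illustrative):

```python
import re

contact = 'Wally White, Home: 555-555-1234, Work: 555-555-4321'
phone_pattern = r'\d{3}-\d{3}-\d{4}'

# findall builds the whole list of matching strings up front
numbers = re.findall(phone_pattern, contact)
print(numbers)  # ['555-555-1234', '555-555-4321']

# finditer yields one match object at a time; span() gives the
# (start, end) indices of that match within the searched string
positions = [(m.group(), m.span()) for m in re.finditer(phone_pattern, contact)]
for number, (start, end) in positions:
    print(number, 'found at', start, 'to', end)
```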

Capturing Substrings in a Match

  • Use the parentheses metacharacters, ( and ), to capture substrings in a match
In [ ]:
text = 'Charlie Cyan, e-mail: demo1@deitel.com'
In [ ]:
pattern = r'([A-Z][a-z]+ [A-Z][a-z]+), e-mail: (\w+@\w+\.\w{3})'
In [ ]:
result = re.search(pattern, text)
  • The regular expression specifies two substrings to capture, each denoted by the metacharacters ( and )
  • ( and ) do not affect whether the pattern is found in the string text
  • search returns a match object only if the entire pattern is found in the string text; otherwise it returns None
  • Match object’s groups method returns a tuple of the captured substrings
In [ ]:
result.groups()
  • Match object’s group method, called with no argument, returns the entire match as a single string
In [ ]:
result.group()
  • Access each captured substring by passing an integer to the group method
In [ ]:
result.group(1)
In [ ]:
result.group(2)
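Putting these pieces together, a minimal sketch of working with captured substrings might look like the following. It reuses text and pattern from above; the names name, email, and both are illustrative.

```python
import re

text = 'Charlie Cyan, e-mail: demo1@deitel.com'
pattern = r'([A-Z][a-z]+ [A-Z][a-z]+), e-mail: (\w+@\w+\.\w{3})'

result = re.search(pattern, text)

if result:  # guard against None before using the match object
    name, email = result.groups()  # unpack the two captured substrings
    both = result.group(1, 2)      # several group numbers return a tuple
    print(name)   # Charlie Cyan
    print(email)  # demo1@deitel.com
    print(both)   # ('Charlie Cyan', 'demo1@deitel.com')
    print(result.group(0) == result.group())  # True: group 0 is the entire match
```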

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
