8.9 Splitting and Joining Strings

  • Tokens are typically separated by whitespace characters such as space, tab and newline, though other characters may be used; the separators are known as delimiters
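As a minimal sketch of whitespace tokenizing, calling split with no arguments breaks a string at runs of whitespace and discards leading and trailing whitespace. The variable names sentence and tokens are illustrative and do not come from the original section.

```python
# Illustrative string mixing spaces, a tab and a newline as delimiters.
sentence = '  This is\ta sentence\nwith  mixed whitespace '

# With no arguments, split treats any run of whitespace as one delimiter
# and drops leading and trailing whitespace.
tokens = sentence.split()
print(tokens)
# ['This', 'is', 'a', 'sentence', 'with', 'mixed', 'whitespace']
```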

Splitting Strings

  • To tokenize a string at a custom delimiter, pass the delimiter string as the argument to split
In [29]:
letters = 'A, B, C, D'
In [30]:
letters.split(', ')
Out[30]:
['A', 'B', 'C', 'D']
  • Specify the maximum number of splits with an integer as the second argument
  • The last token is the remainder of the string after the maximum number of splits
In [31]:
letters.split(', ', 2)
Out[31]:
['A', 'B', 'C, D']
  • rsplit performs the same task as split but processes the maximum number of splits from the end of the string toward the beginning
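The difference between split and rsplit only shows up when the maximum number of splits is smaller than the number of delimiters. The following sketch reuses the letters string from above; the names from_left and from_right are illustrative.

```python
letters = 'A, B, C, D'

# split counts its two splits from the left, so the remainder is the last item.
from_left = letters.split(', ', 2)
print(from_left)   # ['A', 'B', 'C, D']

# rsplit counts its two splits from the right, so the remainder is the first item.
from_right = letters.rsplit(', ', 2)
print(from_right)  # ['A, B', 'C', 'D']
```

Without a maximum number of splits, the two methods return the same list.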

Joining Strings

  • join concatenates the strings in its argument, which must be an iterable containing only string values
  • The separator between the concatenated items is the string on which you call join
In [32]:
letters_list = ['A', 'B', 'C', 'D']
In [33]:
','.join(letters_list)
Out[33]:
'A,B,C,D'
  • Join the results of a list comprehension that creates a list of strings
In [34]:
','.join([str(i) for i in range(10)])
Out[34]:
'0,1,2,3,4,5,6,7,8,9'
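The requirement that the iterable contain only strings is why the snippet above converts each integer with str. As a small sketch (the names numbers, message and joined are illustrative), passing non-string items raises a TypeError, and map(str, ...) is one alternative to the list comprehension:

```python
numbers = [1, 2, 3]

# join raises a TypeError if any item in the iterable is not a string.
try:
    ','.join(numbers)
except TypeError as error:
    message = str(error)
    print('TypeError:', message)

# Converting each item to a string first makes the call succeed.
joined = ','.join(map(str, numbers))
print(joined)  # 1,2,3
```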

String Methods partition and rpartition

String method partition splits a string at the first occurrence of the method’s separator argument and returns a tuple of three strings:

  • the part of the original string before the separator
  • the separator itself
  • the part of the string after the separator
In [35]:
'Amanda: 89, 97, 92'.partition(': ')
Out[35]:
('Amanda', ': ', '89, 97, 92')
  • To search for the separator from the end of the string, use method rpartition
In [36]:
url = 'http://www.deitel.com/books/PyCDS/table_of_contents.html'
In [37]:
rest_of_url, separator, document = url.rpartition('/')
In [38]:
document
Out[38]:
'table_of_contents.html'
In [39]:
rest_of_url
Out[39]:
'http://www.deitel.com/books/PyCDS'
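Two details are worth seeing in code: both methods split at only one occurrence of the separator, and both still return a three-item tuple when the separator is not found. The strings below are illustrative and are not from the original section.

```python
record = 'a: b: c'

# partition splits at the first occurrence, rpartition at the last.
first = record.partition(': ')
last = record.rpartition(': ')
print(first)  # ('a', ': ', 'b: c')
print(last)   # ('a: b', ': ', 'c')

# If the separator is missing, partition puts the whole string first
# and rpartition puts it last; the other two items are empty strings.
missing = 'Amanda'.partition(': ')
rmissing = 'Amanda'.rpartition(': ')
print(missing)   # ('Amanda', '', '')
print(rmissing)  # ('', '', 'Amanda')
```

Because the result always has three items, unpacking into three variables is safe even when the separator is absent.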

String Method splitlines

  • splitlines returns a list of new strings representing the lines of text in the original string, split at each line boundary (such as \n, \r\n or \r); by default the line breaks are not included
In [48]:
lines = """This is line 1
This is line2
This is line3"""
In [49]:
lines
Out[49]:
'This is line 1\nThis is line2\nThis is line3'
In [50]:
lines.splitlines()
Out[50]:
['This is line 1', 'This is line2', 'This is line3']
  • Passing True (the keepends argument) to splitlines keeps the line break at the end of each line
In [51]:
lines.splitlines(True)
Out[51]:
['This is line 1\n', 'This is line2\n', 'This is line3']
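For comparison, here is a brief sketch of how splitlines differs from split('\n') on text that mixes line endings and ends with a newline. The text value is illustrative.

```python
# Illustrative text with a Windows-style \r\n, a plain \n and a trailing \n.
text = 'first\r\nsecond\nthird\n'

# splitlines recognizes both line endings and adds no empty final item.
by_lines = text.splitlines()
print(by_lines)    # ['first', 'second', 'third']

# split('\n') leaves the \r in place and yields an empty string at the end.
by_newline = text.split('\n')
print(by_newline)  # ['first\r', 'second', 'third', '']
```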

©1992–2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud.
