How to read a long string line by line in Python?

In today’s Python automation tutorial, we will read a long string, which you might acquire from an external file (text, CSV, pickle, or others), a pandas DataFrame, or a SQL database, into a Python list.

We will start with a simpler use case in which a multi-line string variable is already defined, and then move on to a more practical case in which we parse the content of a text file line by line and add each line to a list.

Reading a multi-line string line by line

Assume that we have the following string variable defined:

log_str = 'This is the first line. The file was changed at:2022-08-12 \n This is the second line. The file was changed at:2022-09-13 \n This is the third line. The file was changed at:2022-10-13'

The simplest way to split this string into lines is to use the string splitlines() method, which divides the string at line boundaries such as the newline character \n. Here’s how to achieve that:

my_log_lst = log_str.splitlines()
print(my_log_lst)

This will print the following list of strings. Note that the spaces around each \n in the original string are kept:

['This is the first line. The file was changed at:2022-08-12 ', ' This is the second line. The file was changed at:2022-09-13 ', ' This is the third line. The file was changed at:2022-10-13']
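The list above still carries the spaces that surrounded each \n in the original string. Here is a minimal sketch of one way to remove them, and of the keepends option of splitlines(). The names clean_lines and raw_lines are ours, chosen only for illustration:

```python
log_str = (
    "This is the first line. The file was changed at:2022-08-12 \n"
    " This is the second line. The file was changed at:2022-09-13 \n"
    " This is the third line. The file was changed at:2022-10-13"
)

# splitlines() drops the line breaks; strip() removes the leftover spaces
clean_lines = [line.strip() for line in log_str.splitlines()]
print(clean_lines)

# keepends=True keeps the line break at the end of each element instead
raw_lines = log_str.splitlines(keepends=True)
print(raw_lines)
```

Stripping is optional. Keep the raw lines if leading or trailing whitespace is meaningful in your data.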

Appending strings that match a condition

A variation of the previous exercise is a case in which we would like to keep only the lines that contain a specific substring. We can use a simple list comprehension:

my_filter_log_lst = [line for line in log_str.splitlines() if '12' in line]

print(my_filter_log_lst)

Only the first line contains the substring '12', so it is the only one that passes the condition in our list comprehension. This will print the following list:

['This is the first line. The file was changed at:2022-08-12 ']
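A short substring such as '12' can also match text you did not intend, for example a day, a month, or part of a year. As a rough sketch, the filter can be wrapped in a small helper that takes a more specific substring. The function name lines_containing and the search term "2022-09" are hypothetical and chosen only for illustration:

```python
def lines_containing(text, needle):
    """Return the stripped lines of text that contain needle."""
    return [line.strip() for line in text.splitlines() if needle in line]


log_str = (
    "This is the first line. The file was changed at:2022-08-12 \n"
    " This is the second line. The file was changed at:2022-09-13 \n"
    " This is the third line. The file was changed at:2022-10-13"
)

# keep only the lines stamped with September 2022
matches = lines_containing(log_str, "2022-09")
print(matches)
```

With the sample string, only the second line is stamped 2022-09, so matches holds a single element.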

Reading text file contents into a list

In the next case, we have a multi-line text file that we would like to import into a Python list object for further manipulation. Consider the code below. We’ll first define a file path object, then open the file and iterate through its content. Each line is returned as a string object. We then remove the trailing newline from each string and append the line to the list. Last, we print the list contents.

from pathlib import Path

# define file object
my_file = Path(r"C:\WorkDir\text_file.txt")

# iterate through the lines (strings)
my_log_lst = []
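with open(my_file, 'r') as f:
    # start=1 makes the line numbers begin at 1 instead of 0
    for i, line in enumerate(f, start=1):
        # rstrip() removes the trailing newline from each line
        my_log_lst.append(f"Line: {i}, Content={line.rstrip()}")

# print the list contents
print(my_log_lst)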

This will print the following list:

['Line: 1, Content=This is the first line. The file was changed at:2022-08-13', 
'Line: 2, Content=This is the second line. The file was changed at:2022-09-13',
'Line: 3, Content=This is the third line. The file was changed at:2022-10-13']
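When the file is small enough to fit in memory, the loop can be replaced by reading the whole file at once and splitting it with splitlines(). The sketch below is one possible alternative, not the only way to do it. It writes a hypothetical sample file into a temporary directory so that it runs on its own; in practice you would point sample_file at your own file:

```python
from pathlib import Path
import tempfile

# hypothetical sample file, created here only so the sketch is self-contained
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_file = Path(tmp_dir) / "text_file.txt"
    sample_file.write_text("first line\nsecond line\nthird line\n", encoding="utf-8")

    # read the whole file as one string, then split it into lines
    file_lines = sample_file.read_text(encoding="utf-8").splitlines()

print(file_lines)
```

For very large files, iterating over the file object line by line is usually the safer choice, since it does not load the entire content into memory.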