How to read a text file into a list of words in Python?

In this short tutorial we will learn how to retrieve the contents of a file as a list of words with Python.

Define sample file contents

Let’s start by creating a new file with some sample content that you can use to follow along with this tutorial.

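from pathlib import Path
# Sample file path (the C:\WorkDir folder must already exist)
f_path = Path(r"C:\WorkDir\word_file.txt")
# write_text() creates or overwrites the file and returns the number of characters written
f_path.write_text('This is our log file that we will parse using Python code.')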

This will return the integer 58, which is the number of characters in the text we have written.
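The C:\WorkDir path above assumes a Windows machine where that folder already exists; write_text() raises FileNotFoundError if the parent directory is missing. As a portable alternative (our assumption, for readers on other systems), here is a minimal sketch that writes the same sample text into a temporary directory and checks the returned character count:

```python
import tempfile
from pathlib import Path

SAMPLE_TEXT = 'This is our log file that we will parse using Python code.'

# Create a throwaway directory instead of relying on C:\WorkDir existing.
# mkdtemp() does not delete the directory automatically.
tmp_dir = Path(tempfile.mkdtemp())
f_path = tmp_dir / "word_file.txt"

# Path.write_text returns the number of characters written.
chars_written = f_path.write_text(SAMPLE_TEXT)
print(chars_written)  # 58
```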

Read the words of a file into a list

To create a list of words from the file, we’ll chain two methods:

  • read(): reads the entire file into a single string.
  • split(): splits a string into a list of substrings. Called with no arguments, it splits on any run of whitespace.
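Calling split() with no argument is what makes this work on real text: it splits on any run of whitespace (spaces, tabs, newlines) and drops empty strings. Passing an explicit ' ' separator behaves differently, as this small comparison on a made-up string shows:

```python
text = "a  b\n c"

# No argument: split on runs of any whitespace, discarding empty strings.
default_split = text.split()
print(default_split)  # ['a', 'b', 'c']

# Explicit single-space separator: every space is a boundary,
# so consecutive spaces yield '' and the newline stays attached.
space_split = text.split(' ')
print(space_split)  # ['a', '', 'b\n', 'c']
```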

We’ll also use a with statement, which closes the file automatically when the block exits, so we don’t need to call close() explicitly after reading it.
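To see what the with statement does for you, this short sketch (using a temporary file as a stand-in for the tutorial’s file) checks the file object’s closed attribute inside and after the block:

```python
import tempfile
from pathlib import Path

# Stand-in for the tutorial's word_file.txt.
f_path = Path(tempfile.mkdtemp()) / "word_file.txt"
f_path.write_text("This is our log file")

with open(f_path, 'r') as f_object:
    inside = f_object.closed      # False: the file is open inside the block
    words = f_object.read().split()

after = f_object.closed           # True: closed automatically on exit
print(inside, after, words)       # False True ['This', 'is', 'our', 'log', 'file']
```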

# Read the whole file into one string, then split it on whitespace
with open(f_path, 'r') as f_object:
    word_lst = f_object.read().split()

If we print word_lst, we’ll get a word-by-word split of our file:

print(word_lst)
['This', 'is', 'our', 'log', 'file', 'that', 'we', 'will', 'parse', 'using', 'Python', 'code.']
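Notice that the last item is 'code.' with its trailing period, because split() only cuts on whitespace. If you need bare words, one option (an addition, not part of the original recipe) is to strip surrounding punctuation using str.strip() and the standard string.punctuation constant:

```python
import string

word_lst = ['This', 'is', 'our', 'log', 'file', 'that', 'we',
            'will', 'parse', 'using', 'Python', 'code.']

# strip() only removes characters at the ends, so "don't" keeps its apostrophe.
clean_words = [w.strip(string.punctuation) for w in word_lst]

# Drop tokens that were nothing but punctuation (e.g. a lone "-").
clean_words = [w for w in clean_words if w]
print(clean_words)
```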

Read a multi-line file word by word

The method outlined above also works for multi-line files: it returns all of the words in one flat list. But what if we want a separate list of words for every line in the file?
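For the flat-list case, read().split() needs no changes, since newlines count as whitespace. A quick check with a temporary two-line file (the file name and text are arbitrary):

```python
import tempfile
from pathlib import Path

f_path = Path(tempfile.mkdtemp()) / "multi_line_file.txt"
f_path.write_text("first line here\nsecond line")

with open(f_path, 'r') as f_object:
    # Newlines are whitespace, so words from all lines merge into one list.
    flat_lst = f_object.read().split()

print(flat_lst)  # ['first', 'line', 'here', 'second', 'line']
```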

Let’s define a multi-line file and split it line by line.

from pathlib import Path

# define content for a multi-line file
f_path = Path(r"C:\WorkDir\multi_line_file.txt")
f_path.write_text('This is our log file that we will parse using Python code. \n This is the second line.')

# open the file and read it line by line into a list of strings
with open(f_path, 'r') as f_object:
    line_lst = f_object.readlines()

# use a list comprehension to split each line string into words
# (split() also discards the trailing newline)
multi_word_lst = [line.split() for line in line_lst]

# output the list
print(multi_word_lst)

This prints a list of lists, with one inner list holding the words of each line:

[['This', 'is', 'our', 'log', 'file', 'that', 'we', 'will', 'parse', 'using', 'Python', 'code.'], ['This', 'is', 'the', 'second', 'line.']]
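readlines() first builds a list of every raw line string before splitting. A common alternative is to iterate over the file object directly, which avoids that intermediate list. This sketch writes the same two-line content to a temporary file and produces the same list of lists:

```python
import tempfile
from pathlib import Path

content = ('This is our log file that we will parse using Python code. \n'
           ' This is the second line.')
f_path = Path(tempfile.mkdtemp()) / "multi_line_file.txt"
f_path.write_text(content)

with open(f_path, 'r') as f_object:
    # Iterating a file object yields one line per step, newline included;
    # split() discards the trailing newline along with other whitespace.
    multi_word_lst = [line.split() for line in f_object]

print(multi_word_lst)
```

One side effect worth knowing: a blank line in the file yields an empty inner list [].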
