12. Reading & Writing Files

12.1. Reading a File

In Python, handling files begins with opening them using the built-in open function. The function takes the filename as its first argument and an optional mode as its second. For reading, the mode is 'r' (read), which is also the default when no mode is given. Here’s a basic example demonstrating how to open a file and close it when you are done.

file = open('example.txt', 'r')
# Process the file here
file.close()

When opening files, several issues might arise, such as the file not existing or permission problems preventing access. To handle these situations gracefully, it is advisable to use a try-except block. This allows you to catch and respond to specific exceptions.

file = None
try:
    file = open('example.txt', 'r')
    # File operations
except FileNotFoundError:
    print("The file does not exist.")
except IOError:
    print("An error occurred while accessing the file.")
finally:
    if file:
        file.close()

This approach ensures that your program can handle file-related errors without crashing and provides feedback to users about what went wrong.
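
The same pattern can be wrapped in a function, as in the sketch below. The helper name read_text_or_none is hypothetical, and the separate PermissionError handler is an assumption about which failure you might want to report on its own. FileNotFoundError and PermissionError are both subclasses of OSError, and IOError is an alias of OSError in Python 3, so the more specific handlers are listed first.

```python
import os
import tempfile


def read_text_or_none(path):
    """Hypothetical helper: return the file's text, or None if it cannot be read."""
    file = None
    try:
        file = open(path, 'r')
        return file.read()
    except FileNotFoundError:
        print("The file does not exist.")
    except PermissionError:
        print("You do not have permission to read the file.")
    except OSError:
        # IOError is an alias of OSError in Python 3, so this covers the rest.
        print("An error occurred while accessing the file.")
    finally:
        # Runs on every path out of the try block, including after 'return'.
        if file is not None:
            file.close()
    return None


# Try the helper on one existing file and one missing file.
tmp = tempfile.TemporaryDirectory()
existing_path = os.path.join(tmp.name, 'example.txt')
file = open(existing_path, 'w')
file.write("hello")
file.close()

found = read_text_or_none(existing_path)                            # "hello"
missing = read_text_or_none(os.path.join(tmp.name, 'missing.txt'))  # None
tmp.cleanup()
```

Returning None on failure is only one possible design; a larger program might prefer to let the exception propagate to the caller.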

12.2. Working Directory in Python

The working directory, also known as the current working directory (CWD), is the default directory where a Python script operates and where file operations are performed if no specific path is provided. Essentially, it is the directory in which Python looks for files to open or where it saves files if a relative path is specified.

When you run a Python script, the working directory is initially set to the location from which the script was executed. This directory serves as the base path for resolving relative file paths, meaning any file operations using relative paths will be performed relative to this directory.

12.2.1. Determining the Working Directory

To determine the current working directory in Python, you use the os module, which provides a function called getcwd. This function returns the absolute path of the current working directory as a string, which is not necessarily the directory that contains the script. Here’s how you can use it:

import os

current_directory = os.getcwd()
print("Current Working Directory:", current_directory)

12.2.2. How is the Working Directory Determined?

  • Initial Default: When you start a Python script from a command line or terminal, the working directory is typically the directory from which you executed the script. For instance, if you run python my_script.py from /home/user/projects/, then /home/user/projects/ is the initial working directory.

  • Programmatic Changes: You can programmatically change the working directory within your script using the os.chdir function. This can be useful if your script needs to operate in a different directory or if you need to switch directories during execution.

import os

# Change the working directory
os.chdir('/path/to/new/directory')

In this example, os.chdir('/path/to/new/directory') sets the working directory to /path/to/new/directory.
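
A minimal sketch of changing the working directory and then restoring it, so that relative paths used later still resolve as before. The temporary directory stands in for a real target directory and is an assumption made only so the example can run anywhere; saving and restoring the original directory is a suggested habit, not something os.chdir requires.

```python
import os
import tempfile

original_directory = os.getcwd()

# Stand-in for '/path/to/new/directory' so the sketch runs on any machine.
tmp = tempfile.TemporaryDirectory()
try:
    os.chdir(tmp.name)
    inside = os.getcwd()
    # samefile() compares the actual directories, ignoring symbolic links.
    inside_matches = os.path.samefile(inside, tmp.name)
finally:
    # Always go back, even if something above failed.
    os.chdir(original_directory)
    tmp.cleanup()

restored = os.getcwd()
print("Back in:", restored)
```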

When running Python scripts from an IDE (see IDE (Integrated Development Environment)) such as PyCharm or Visual Studio Code, the working directory may be configured within the IDE’s settings. Each IDE has its own way of specifying or changing the working directory for scripts, so check your IDE’s configuration if relative paths resolve somewhere unexpected or you need to change the working directory.

12.2.3. Relative and Absolute Paths

When specifying file paths (see Path), you have two primary options: relative and absolute paths.

  • Relative Path: This path is relative to the current working directory (see Working Directory) of the script. For example, 'data/example.txt' indicates that the file is located in a folder named data within the current directory. Relative paths are useful for portability, as they allow your code to be more flexible when moving between different environments.

  • Absolute Path: An absolute path provides the complete path from the root of the filesystem to the file. For example, '/home/user/data/example.txt' on Linux/macOS or 'C:\\Users\\User\\data\\example.txt' on Windows. Absolute paths are unambiguous but can reduce portability if you share your code with others who have different directory structures.
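
The relationship between the two kinds of path can be sketched with the os.path module. The path data/example.txt is only the illustrative name used above; the file does not need to exist, because these functions work on the path text alone.

```python
import os

# Build the relative path with os.path.join so the separator fits the platform.
relative_path = os.path.join('data', 'example.txt')

# abspath() resolves a relative path against the current working directory.
absolute_path = os.path.abspath(relative_path)

print(os.path.isabs(relative_path))   # False
print(os.path.isabs(absolute_path))   # True
print(absolute_path)                  # depends on where you run the script
```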

12.3. Reading Lines

In Python, when a file is opened for reading, the file cursor (see File Cursor) indicates the current position within the file. Initially, this cursor is positioned at the start of the file. As you read through the file, the cursor advances forward through the data.
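
The cursor can be observed with the file methods tell and seek, which are not used elsewhere in this chapter and are introduced here only for illustration. The sketch below writes a small temporary file (an assumption, so that the example is self-contained), reads two lines, then moves the cursor back to the start.

```python
import os
import tempfile

tmp = tempfile.TemporaryDirectory()
path = os.path.join(tmp.name, 'cursor_demo.txt')

file = open(path, 'w', encoding='utf-8')
file.write("first\nsecond\n")
file.close()

file = open(path, 'r', encoding='utf-8')
try:
    start = file.tell()            # 0: the cursor starts at the beginning
    first = file.readline()        # the cursor moves past the first line
    second = file.readline()       # ...and then past the second line
    file.seek(0)                   # move the cursor back to the start
    first_again = file.readline()  # the first line is read a second time
finally:
    file.close()
    tmp.cleanup()

print(repr(first), repr(second), repr(first_again))
```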

12.3.1. Reading Lines with a While Loop

Reading lines from a file one by one can be accomplished using a while loop. The readline method reads a single line from the file; the newline character (\n) is left at the end of the string, and is only omitted on the last line of the file if the file does not end in a newline. This makes the return value unambiguous; if readline returns an empty string, the end of the file has been reached, while a blank line is represented by '\n', a string containing only a single newline.

file = None
try:
    file = open('example.txt', 'r')
    line = file.readline()
    while line:
        # line already ends with '\n', so stop print from adding another
        print(line, end='')
        line = file.readline()
except FileNotFoundError:
    print("The file does not exist.")
except IOError:
    print("An error occurred while accessing the file.")
finally:
    if file:
        file.close()

In this example:

  • file.readline() reads a single line from the file.

  • The while loop continues until line is an empty string, which signifies the end of the file.

  • The finally block ensures that the file is properly closed, even if an error occurs during the file operations.
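
To see the return values of readline described above, the following sketch records every value the method returns, including the empty string that ends the loop. The temporary file and its contents are assumptions chosen to include a blank line and a last line without a trailing newline.

```python
import os
import tempfile

tmp = tempfile.TemporaryDirectory()
path = os.path.join(tmp.name, 'blank_line_demo.txt')

# Three lines: the second is blank and the last has no trailing newline.
file = open(path, 'w', encoding='utf-8')
file.write("first\n\nlast")
file.close()

returned = []
file = open(path, 'r', encoding='utf-8')
try:
    line = file.readline()
    while line:
        returned.append(line)
        line = file.readline()
    returned.append(line)  # the empty string that ended the loop
finally:
    file.close()
    tmp.cleanup()

print(returned)  # ['first\n', '\n', 'last', '']
```

The blank line is the non-empty string '\n', so it does not stop the loop; only the empty string at the end of the file does.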

12.3.2. Reading Lines with a For Loop

A more concise and Pythonic approach to reading lines from a file is using a for loop. This method is preferred for its simplicity and readability.

file = None
try:
    file = open('example.txt', 'r')
    for line in file:
        # line already ends with '\n', so stop print from adding another
        print(line, end='')
except FileNotFoundError:
    print("The file does not exist.")
except IOError:
    print("An error occurred while accessing the file.")
finally:
    if file:
        file.close()

In this example:

  • The for loop iterates over each line in the file directly.

  • Each line is printed.
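
Because the file object can be iterated directly, it also combines with built-ins such as enumerate. The sketch below numbers each line and removes its trailing newline with rstrip; the temporary file and its contents are assumptions used only to make the example runnable.

```python
import os
import tempfile

tmp = tempfile.TemporaryDirectory()
path = os.path.join(tmp.name, 'numbered_demo.txt')

file = open(path, 'w', encoding='utf-8')
file.write("alpha\nbeta\ngamma\n")
file.close()

numbered = []
file = open(path, 'r', encoding='utf-8')
try:
    # enumerate() pairs each line with a counter, starting at 1 here.
    for line_number, line in enumerate(file, start=1):
        # rstrip('\n') drops only the trailing newline, not other whitespace.
        numbered.append(f"{line_number}: {line.rstrip(chr(10))}")
finally:
    file.close()
    tmp.cleanup()

print(numbered)  # ['1: alpha', '2: beta', '3: gamma']
```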

12.3.3. Reading All Lines with readlines()

If the goal is to read all lines from a file into a list, you can use the readlines method. This method reads the entire file into memory and returns a list in which each element is one line from the file, including its trailing newline.

file = None
try:
    file = open('example.txt', 'r')
    lines = file.readlines()
    for line in lines:
        # line already ends with '\n', so stop print from adding another
        print(line, end='')
except FileNotFoundError:
    print("The file does not exist.")
except IOError:
    print("An error occurred while accessing the file.")
finally:
    if file:
        file.close()

In this example:

  • file.readlines() reads all lines from the file and stores them as a list of strings.

  • The for loop iterates through the list and prints each line.
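
As a point of comparison, the sketch below reads the same file twice: once with readlines, which keeps the newline on each element, and once with read followed by the string method splitlines, which drops it. The temporary file is an assumption for illustration, and read().splitlines() is offered as one common alternative rather than a recommendation from this chapter.

```python
import os
import tempfile

tmp = tempfile.TemporaryDirectory()
path = os.path.join(tmp.name, 'readlines_demo.txt')

file = open(path, 'w', encoding='utf-8')
file.write("apple\nbanana\n")
file.close()

file = open(path, 'r', encoding='utf-8')
try:
    with_newlines = file.readlines()              # newline kept on each element
    file.seek(0)                                  # rewind to read the file again
    without_newlines = file.read().splitlines()   # newline removed
finally:
    file.close()
    tmp.cleanup()

print(with_newlines)     # ['apple\n', 'banana\n']
print(without_newlines)  # ['apple', 'banana']
```

Both approaches load the whole file into memory, so for very large files iterating with a for loop is usually the safer choice. Note that splitlines also splits on a few other line-boundary characters besides '\n'.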

12.4. File Encoding

A character encoding is a scheme that maps characters to numbers, and ultimately to the bytes used to store and transmit them. Historically, different encoding schemes have been developed to handle various character sets, including alphabets, punctuation marks, and special symbols.

  • ASCII (American Standard Code for Information Interchange): One of the earliest and simplest encodings, ASCII represents characters using 7 bits, which allows for 128 unique characters. It includes English letters, digits, punctuation, and some control characters but does not support accented characters or non-English alphabets.

  • Extended ASCII: To accommodate characters beyond the basic ASCII set, various 8-bit extensions of ASCII were developed. They use 8 bits, allowing for 256 characters, but they are mutually incompatible and each still falls short when it comes to representing diverse global alphabets.

  • ISO-8859-1 (Latin-1): An 8-bit character set that extends ASCII to include characters commonly used in Western European languages. It includes accented characters but is limited in scope.

  • UTF-8 (Unicode Transformation Format, 8-bit): The most widely used encoding today, UTF-8 can represent any character in the Unicode standard, which includes virtually all characters from all written languages. It uses one to four bytes per character and encodes ASCII characters as the same single bytes that ASCII uses, which makes it both space-efficient for mostly-ASCII text and comprehensive.
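
The variable width of UTF-8 can be checked directly with the string method encode. The four sample characters below are arbitrary choices for illustration, written as escape sequences so the example does not depend on how this text itself is encoded.

```python
# Sample characters, written as escapes: A, e-acute, euro sign, an emoji.
samples = {
    'A': 'A',
    'e-acute': '\u00e9',
    'euro sign': '\u20ac',
    'emoji': '\U0001F600',
}

byte_lengths = {}
for name, character in samples.items():
    # encode() returns the bytes that would be stored in a UTF-8 file.
    byte_lengths[name] = len(character.encode('utf-8'))

print(byte_lengths)  # {'A': 1, 'e-acute': 2, 'euro sign': 3, 'emoji': 4}

# ASCII characters are stored as the same single byte in both encodings.
same_as_ascii = 'A'.encode('utf-8') == 'A'.encode('ascii')
```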

If the encoding used to read a file does not match the file’s actual encoding, several issues can occur:

  • Character Corruption: Characters may be misinterpreted and displayed as garbled text or question marks.

  • Errors During Reading: Python may raise a UnicodeDecodeError if it encounters bytes that are not valid in the expected encoding.

  • Loss of Information: Important information might be lost or rendered incorrectly, making the text unreadable or misleading.

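For example, if a file encoded in UTF-8 is read using ASCII encoding, Python raises a UnicodeDecodeError at the first non-ASCII character (such as an accented letter). If the same file is read using ISO-8859-1 instead, no error is raised, but each non-ASCII character is silently decoded as two or more wrong characters.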
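
Both outcomes can be reproduced without a file by decoding bytes directly, as in this sketch. The word used is an arbitrary example; bytes.decode performs the same conversion that open applies when reading a file in text mode.

```python
# The bytes a UTF-8 file would contain for the word "cafe" with an e-acute.
data = 'caf\u00e9'.encode('utf-8')
print(data)  # b'caf\xc3\xa9'

# Wrong 8-bit encoding: no error, but the accented letter becomes two characters.
garbled = data.decode('latin-1')
print(garbled)

# ASCII cannot represent bytes above 127, so decoding fails outright.
try:
    data.decode('ascii')
    ascii_failed = False
except UnicodeDecodeError:
    ascii_failed = True

# The matching encoding recovers the original text.
correct = data.decode('utf-8')
```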

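12.4.1. Specifying Encoding

To read a file reliably, specify its encoding when opening it. If you omit the encoding argument, Python falls back to a default that can depend on the platform and may not match the file. Here’s how you can do this in Python without using the with statement: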

file = None
try:
    file = open('example.txt', 'r', encoding='utf-8')
    lines = file.readlines()
    for line in lines:
        print(line.strip())  # strip() removes leading/trailing whitespace
except FileNotFoundError:
    print("The file does not exist.")
except IOError:
    print("An error occurred while accessing the file.")
except UnicodeDecodeError:
    print("There was an error decoding the file. Please check the file's encoding.")
finally:
    if file:
        file.close()

In this example:

  • We use open to open the file. The encoding='utf-8' parameter specifies that the file is encoded in UTF-8.

  • We handle UnicodeDecodeError exceptions, which are raised when the file contains bytes that are not valid in the specified encoding.

This approach lets your program report a mismatched encoding gracefully instead of crashing. It does not detect the correct encoding for you: a file in another encoding may still decode without error and produce garbled text.
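
One way to react to a UnicodeDecodeError is to retry with another encoding, sketched below. The helper name read_with_fallback and the idea of trying a fixed list of encodings in order are assumptions for illustration, not a reliable detection method: latin-1 accepts every possible byte, so it never fails and only makes sense as a last resort.

```python
import os
import tempfile


def read_with_fallback(path, encodings):
    """Hypothetical helper: return (text, encoding) for the first encoding that works."""
    for encoding in encodings:
        file = None
        try:
            file = open(path, 'r', encoding=encoding)
            return file.read(), encoding
        except UnicodeDecodeError:
            # The bytes are not valid in this encoding; try the next one.
            continue
        finally:
            if file is not None:
                file.close()
    return None, None


tmp = tempfile.TemporaryDirectory()
path = os.path.join(tmp.name, 'latin1_demo.txt')

# Create a file that is deliberately NOT encoded in UTF-8.
file = open(path, 'w', encoding='latin-1')
file.write("caf\u00e9\n")
file.close()

text, used_encoding = read_with_fallback(path, ['utf-8', 'latin-1'])
tmp.cleanup()

print(used_encoding)  # latin-1
```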

12.5. Writing Files

When you need to write data to a file, you use different modes depending on your requirements.

  • Mode 'w': Opens the file for writing. If the file does not exist, it creates a new one. If the file already exists, it truncates the file, clearing its contents before writing.

  • Mode 'a': Opens the file for appending data. New content is added at the end of the file without altering existing data.

Then, you may use the write method, which writes the string exactly as given and does not add a newline at the end. Here’s an example of writing to a file:

file = None
try:
    file = open('example.txt', 'w')
    file.write("Hello, World!")
except FileNotFoundError:
    # In 'w' mode a missing file is created, so this means a missing directory.
    print("The directory for the file does not exist.")
except IOError:
    print("An error occurred while writing to the file.")
finally:
    if file:
        file.close()
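
The difference between the 'w' and 'a' modes described above can be sketched as follows. The helper functions write_text and read_text are hypothetical and exist only to keep the example short; the temporary file is an assumption so that the example does not touch your own files.

```python
import os
import tempfile


def write_text(path, mode, text):
    """Hypothetical helper: open path in the given mode, write text, close."""
    file = open(path, mode, encoding='utf-8')
    try:
        file.write(text)
    finally:
        file.close()


def read_text(path):
    """Hypothetical helper: return the whole content of the file."""
    file = open(path, 'r', encoding='utf-8')
    try:
        return file.read()
    finally:
        file.close()


tmp = tempfile.TemporaryDirectory()
path = os.path.join(tmp.name, 'modes_demo.txt')

write_text(path, 'w', "first line\n")    # 'w' creates the file
write_text(path, 'a', "second line\n")   # 'a' adds to the end
after_append = read_text(path)           # both lines are present

write_text(path, 'w', "replaced")        # 'w' clears the old content first
after_overwrite = read_text(path)        # only the new text remains
tmp.cleanup()

print(repr(after_append))
print(repr(after_overwrite))
```

Each newline has to be written explicitly, since write does not add one.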