Python for Biological Data Analysis, Part 2: Working with Files
Originally published on my legacy blog in 2019. Updated for clarity and modern Python file handling on 5 February 2026.
In Part 1, I worked with short sequences directly in Python strings. In real bioinformatics workflows, sequences are usually stored in files, often very large ones. This post covers the basic read and write patterns you need to handle them safely.
Reading an external file
A clean modern approach is to use a context manager (with open(...)) so the file closes automatically.
filename = "mydna.txt"
with open(filename, "r", encoding="utf-8") as sequence_file:
    seq_contents = sequence_file.read()
print(seq_contents)
This pattern is safer than leaving files open accidentally.
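Because real sequence files can be very large, calling .read() on the whole file is not always practical. The sketch below shows one common alternative: iterating over the file object so only one line is in memory at a time. The helper name count_bases and the throwaway demo file are assumptions made for illustration, not part of the original post.

```python
import os
import tempfile


def count_bases(path):
    """Count A, C, G and T characters in a text file, one line at a time."""
    counts = {"A": 0, "C": 0, "G": 0, "T": 0}
    with open(path, "r", encoding="utf-8") as handle:
        # Iterating over the handle yields one line per loop pass,
        # so the whole file never has to fit in memory at once.
        for line in handle:
            for base in line.strip().upper():
                if base in counts:
                    counts[base] += 1
    return counts


# Demo on a temporary file so the example runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "mydna.txt")
    with open(demo_path, "w", encoding="utf-8") as handle:
        handle.write("ATTGCTGA\nGGAATC\n")
    base_counts = count_bases(demo_path)
    print(base_counts)
```

Characters other than A, C, G and T (for example N or a header line) are simply ignored here; a real workflow would probably want to handle them explicitly.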
File modes you should know
open() accepts a mode argument:
- "r": read mode (default).
- "w": write mode (overwrites existing content, creates file if missing).
- "a": append mode (adds content to the end, creates file if missing).
filename = "mydna.txt"
# Read: the file must already exist.
with open(filename, "r", encoding="utf-8") as read_handle:
    existing = read_handle.read()
# Write: replaces whatever the file contained before.
with open(filename, "w", encoding="utf-8") as write_handle:
    write_handle.write("ATTGCTGA\n")
# Append: adds to the end and keeps the existing content.
with open(filename, "a", encoding="utf-8") as append_handle:
    append_handle.write("GGAATC\n")
Writing content to a file
Writing is done with .write().
with open("output.txt", "w", encoding="utf-8") as out_file:
    out_file.write("ATCGATCG\n")
Use "w" carefully because it replaces existing content.
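One way to guard against accidental overwrites is the built-in exclusive-creation mode "x", which makes open() raise FileExistsError when the file is already present. The following is a minimal sketch of that idea; the helper name write_new_file and the temporary demo directory are hypothetical and only there to keep the example self-contained.

```python
import os
import tempfile


def write_new_file(path, text):
    """Write text to path only if no file exists there yet.

    Returns True if the file was created, False if it already existed.
    """
    try:
        # "x" means exclusive creation: open() raises FileExistsError
        # instead of silently replacing an existing file.
        with open(path, "x", encoding="utf-8") as out_file:
            out_file.write(text)
        return True
    except FileExistsError:
        return False


# Demo in a temporary directory so no real data is touched.
with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "output.txt")
    first_attempt = write_new_file(demo_path, "ATCGATCG\n")   # creates the file
    second_attempt = write_new_file(demo_path, "GGGGGGGG\n")  # refused, file exists
    with open(demo_path, "r", encoding="utf-8") as in_file:
        final_contents = in_file.read()
    print(first_attempt, second_attempt, repr(final_contents))
```

Whether to skip, rename or abort when the file already exists depends on the workflow; returning False is just one possible choice.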
Closing files
If you use with open(...), explicit .close() is not required. Python closes the file automatically when the block ends.
If you use open() without with, then you must call .close() yourself.
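To make the difference concrete, here is a small sketch that writes a file with a manual open() and .close() pair, then reads it back with a context manager. Wrapping the manual version in try/finally is a common way to make sure .close() still runs if the write fails; the temporary directory and variable names are assumptions made for illustration.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp_dir:
    demo_path = os.path.join(tmp_dir, "mydna.txt")

    # Manual style: we are responsible for closing the handle ourselves.
    out_file = open(demo_path, "w", encoding="utf-8")
    try:
        out_file.write("ATTGCTGA\n")
    finally:
        out_file.close()  # runs even if write() raised an exception
    closed_after_manual = out_file.closed

    # Context-manager style: the handle is closed when the block ends.
    with open(demo_path, "r", encoding="utf-8") as in_file:
        seq_contents = in_file.read()
    closed_after_with = in_file.closed

print(closed_after_manual, closed_after_with, repr(seq_contents))
```

Both styles end with a closed file, but the with version needs fewer lines and leaves less room for forgetting the cleanup step.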
Practical checklist
- Store the filename in a variable when reused.
- Open the file with the correct mode.
- Read or write using methods like .read() and .write().
- Prefer context managers to avoid file-handle leaks.
- Be explicit when using write mode so you do not overwrite important data unintentionally.
These basics make it much easier to move from toy examples to realistic biological datasets.