Extracting Mapped and Unmapped Reads from FASTQ
Bioinformatics
Command Line
Sequencing
A practical command-line workflow to separate mapped and unmapped reads and recover them into FASTQ files.
Originally published on my legacy blog in 2019. Updated for technical accuracy and command clarity on 5 February 2026.
When only a fraction of reads maps to a reference genome, a common next step is to split mapped and unmapped reads for separate inspection.
This post shows a straightforward workflow using samtools and seqtk.
Workflow overview
- Align reads to a reference and produce a SAM/BAM file.
- Split mapped vs unmapped alignments using SAM FLAG filters.
- Extract read IDs from each group.
- Pull those reads from the original FASTQ.
Step 1: Start from an alignment file
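Assume you already have an alignment file such as sample.bam, produced by aligning original.fastq to your reference. Check that unmapped reads were kept in that file. If the aligner or an upstream filtering step dropped them, the unmapped subset in the next step will be empty.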
Step 2: Split mapped and unmapped alignments
Use SAM FLAG-based filters:
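- -F 4 excludes records that have FLAG bit 0x4 (segment unmapped) set. It therefore keeps mapped records, including any secondary or supplementary alignments.
- -f 4 keeps only records that have bit 0x4 set, i.e. unmapped reads.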
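Both filters test one bit of the FLAG field (column 2 of a SAM record): 0x4, "segment unmapped". The following minimal sketch reproduces that test with POSIX awk on three hypothetical records. It can help you check what a filter will keep. The records and file names are illustrative only.

```shell
#!/bin/sh
# Toy check of the FLAG bit that `samtools view -F 4` / `-f 4` test.
# Records are hypothetical; only QNAME (col 1) and FLAG (col 2) matter here.
set -eu
tmp=$(mktemp -d)
printf 'readA\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII\n'  >  "$tmp/toy.sam"
printf 'readB\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n'         >> "$tmp/toy.sam"
printf 'readC\t16\tchr2\t500\t60\t4M\t*\t0\t0\tACGT\tIIII\n' >> "$tmp/toy.sam"

# POSIX awk has no bitwise AND, so test bit 0x4 arithmetically:
# int(FLAG / 4) % 2 is 1 exactly when the 0x4 ("unmapped") bit is set.
awk -F'\t' 'int($2 / 4) % 2 == 0 { print $1 }' "$tmp/toy.sam" > "$tmp/mapped.txt"   # like -F 4
awk -F'\t' 'int($2 / 4) % 2 == 1 { print $1 }' "$tmp/toy.sam" > "$tmp/unmapped.txt" # like -f 4

cat "$tmp/mapped.txt"    # readA and readC (FLAG 16 only marks the reverse strand)
cat "$tmp/unmapped.txt"  # readB
```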
samtools view -b -F 4 sample.bam > sample.mapped.bam
samtools view -b -f 4 sample.bam > sample.unmapped.bam
Step 3: Extract unique read IDs
Get the first column (QNAME), sort, and deduplicate:
samtools view sample.mapped.bam | cut -f1 | sort -u > mapped_ids.lst
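samtools view sample.unmapped.bam | cut -f1 | sort -u > unmapped_ids.lst
Using sort -u removes duplicate IDs. These arise when a read has secondary or supplementary alignments in addition to its primary record. For paired-end data, they also arise because both mates share one QNAME.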
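To see where duplicate IDs come from, this sketch runs the same cut -f1 step on hypothetical QNAME/FLAG pairs. It then shows an alternative to deduplication: keeping only primary mapped records by testing the secondary (0x100) and supplementary (0x800) bits. The data and file names are assumptions for illustration.

```shell
#!/bin/sh
# Hypothetical QNAME/FLAG pairs (the first two SAM columns only):
#   readA: primary (0) + supplementary (2048)
#   readB: primary (16) + secondary (256)
#   readC: unmapped (4)
set -eu
tmp=$(mktemp -d)
printf 'readA\t0\nreadA\t2048\nreadB\t16\nreadB\t256\nreadC\t4\n' > "$tmp/toy.tsv"

# Without deduplication, readA and readB are each listed twice.
cut -f1 "$tmp/toy.tsv" > "$tmp/ids_raw.lst"

# sort -u keeps one entry per read name.
cut -f1 "$tmp/toy.tsv" | LC_ALL=C sort -u > "$tmp/ids_unique.lst"

# Alternative: keep only primary mapped records by requiring bits
# 0x4 (unmapped), 0x100 (secondary) and 0x800 (supplementary) to be clear,
# the same selection as `samtools view -F 0x904`.
awk -F'\t' 'int($2 / 4) % 2 == 0 && int($2 / 256) % 2 == 0 &&
            int($2 / 2048) % 2 == 0 { print $1 }' "$tmp/toy.tsv" > "$tmp/primary_mapped.lst"

cat "$tmp/ids_unique.lst"      # readA, readB, readC
cat "$tmp/primary_mapped.lst"  # readA, readB
```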
Step 4: Recover reads from the original FASTQ
Pass each ID list to seqtk subseq, which writes out the FASTQ records whose read names appear in the list:
seqtk subseq original.fastq mapped_ids.lst > mapped.fastq
seqtk subseq original.fastq unmapped_ids.lst > unmapped.fastq
Now you have:
- mapped.fastq: reads that mapped to the reference.
- unmapped.fastq: reads that did not map.
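To illustrate the selection logic, the following awk sketch keeps FASTQ records whose name appears in an ID list. It does not reflect how seqtk is implemented. It assumes unwrapped 4-line FASTQ records and a non-empty ID list. The reads and file names are hypothetical.

```shell
#!/bin/sh
# Illustration of the selection `seqtk subseq` performs (not seqtk's
# implementation). Assumes 4-line FASTQ records and a non-empty ID list.
set -eu
tmp=$(mktemp -d)
printf '@r1 sample=1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n@r3\nTTAA\n+\nIIII\n' > "$tmp/original.fastq"
printf 'r1\nr3\n' > "$tmp/ids.lst"

awk 'NR == FNR { want[$1] = 1; next }   # first file: load wanted IDs
     FNR % 4 == 1 {                      # header line of each record
       keep = (substr($1, 2) in want)    # name = text after "@" up to whitespace
     }
     keep' "$tmp/ids.lst" "$tmp/original.fastq" > "$tmp/subset.fastq"

cat "$tmp/subset.fastq"   # records r1 and r3, four lines each
```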
Optional: convert unmapped FASTQ to FASTA
seqtk seq -a unmapped.fastq > unmapped.fa
This can be useful for quick downstream checks such as BLAST searches.
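The conversion keeps only the header and sequence lines of each record. The following minimal awk sketch assumes 4-line FASTQ records and uses hypothetical input. On real data, seqtk parses records more robustly, so prefer it there.

```shell
#!/bin/sh
# Awk sketch of FASTQ-to-FASTA conversion, assuming 4-line records.
# Input reads are hypothetical.
set -eu
tmp=$(mktemp -d)
printf '@u1 len=4\nACGT\n+\nIIII\n@u2\nTTGA\n+\nII#I\n' > "$tmp/unmapped.fastq"

# Header: replace the leading "@" with ">"; keep the sequence line;
# drop the "+" separator and the quality line.
awk 'FNR % 4 == 1 { print ">" substr($0, 2) }
     FNR % 4 == 2 { print }' "$tmp/unmapped.fastq" > "$tmp/unmapped.fa"

cat "$tmp/unmapped.fa"   # >u1 len=4, ACGT, >u2, TTGA
```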
Practical notes
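- For paired-end data, keep mate synchronization in mind. Both mates share one QNAME, so a pair with one mapped and one unmapped mate lands in both ID lists. Extracting from the R1 and R2 files with the same ID list keeps the outputs paired, provided both files list reads in the same order. Some aligners (e.g. BWA-MEM) drop trailing /1 and /2 suffixes from read names, so check that the IDs match the FASTQ headers.
- Check whether your aligner emits secondary or supplementary records. If you want only primary alignments in the mapped set, add their flags to the exclusion filter. For example, samtools view -b -F 0x904 keeps only primary mapped records.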
- Keep a record of software versions to improve reproducibility.
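The following minimal sketch shows the paired-end caveat in action. It applies the bit-0x4 test to hypothetical paired records, builds one sorted ID list per class, and uses comm to list names present in both. Those names are pairs with one mapped and one unmapped mate. The records and file names are illustrative assumptions.

```shell
#!/bin/sh
# Hypothetical paired-end QNAME/FLAG records (first two SAM columns):
#   pair1: both mates mapped (99, 147)
#   pair2: mate 1 mapped (73), mate 2 unmapped (133)
#   pair3: both mates unmapped (77, 141)
set -eu
tmp=$(mktemp -d)
printf 'pair1\t99\npair1\t147\npair2\t73\npair2\t133\npair3\t77\npair3\t141\n' > "$tmp/toy.tsv"

# Same bit-0x4 test as -F 4 / -f 4, then one ID per read name.
awk -F'\t' 'int($2 / 4) % 2 == 0 { print $1 }' "$tmp/toy.tsv" | LC_ALL=C sort -u > "$tmp/mapped_ids.lst"
awk -F'\t' 'int($2 / 4) % 2 == 1 { print $1 }' "$tmp/toy.tsv" | LC_ALL=C sort -u > "$tmp/unmapped_ids.lst"

# comm -12 prints lines common to both sorted files: split pairs.
LC_ALL=C comm -12 "$tmp/mapped_ids.lst" "$tmp/unmapped_ids.lst" > "$tmp/split_pairs.lst"
cat "$tmp/split_pairs.lst"   # pair2
```

How to treat such pairs depends on the downstream analysis. You could keep them with the mapped set, the unmapped set, or both.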