Introduction to Trimmomatic
Trimmomatic is a fast, multithreaded command-line tool for trimming Illumina FASTQ data, in both single-end and paired-end form.
It provides methods for:
- Removing adapter sequences
- Trimming low-quality bases
- Filtering reads based on quality and length
This guide provides an in-depth tutorial on using Trimmomatic, detailing its features, usage modes, processing steps, and examples.
Installation and Prerequisites
System Requirements
- Java Runtime Environment (JRE)
- Sufficient computational resources
- Input FASTQ files, either uncompressed or compressed with gzip or bzip2 (Trimmomatic reads both compressed formats directly)
Installation
Trimmomatic is distributed as a .jar file requiring Java. To install:
- Ensure Java Runtime Environment (JRE) is installed.
- Download the latest version of Trimmomatic from the official repository.
- Save the trimmomatic-.jar file to your desired directory.
You can run Trimmomatic using the java -jar command.
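Before the first run it can help to confirm that Java is on the PATH and that the jar is where you expect it. The following is a minimal sketch, not part of Trimmomatic itself: the function name `find_trimmomatic_jar` is hypothetical, and the only assumption about the file is that its name matches `trimmomatic-*.jar`.

```shell
# find_trimmomatic_jar DIR  (hypothetical helper, not shipped with Trimmomatic)
# Prints the path of a trimmomatic-*.jar found in DIR (the last one in glob
# order if several are present); fails if none is found.
find_trimmomatic_jar() (
  dir=$1
  found=""
  for jar in "$dir"/trimmomatic-*.jar; do
    # When nothing matches, the unexpanded pattern is not a file and is skipped
    if [ -f "$jar" ]; then
      found=$jar
    fi
  done
  [ -n "$found" ] && printf '%s\n' "$found"
)

# Report whether a Java runtime is available on PATH
if command -v java >/dev/null 2>&1; then
  echo "java found: $(command -v java)"
else
  echo "java not found on PATH; install a JRE first" >&2
fi
```

Calling `find_trimmomatic_jar "$HOME/tools"` (the directory is only an example) prints the jar path that can then be passed to `java -jar`.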
Basic Command Structure
Trimmomatic supports two primary modes of operation:
Single-End Mode
    java -jar trimmomatic.jar SE [options] \
        <input_file> <output_file> \
        <processing_steps>
Paired-End Mode
    java -jar trimmomatic.jar PE [options] \
        <input_file_1> <input_file_2> \
        <output_paired_1> <output_unpaired_1> \
        <output_paired_2> <output_unpaired_2> \
        <processing_steps>
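The order of the six file arguments in paired-end mode is easy to get wrong. The sketch below only prints a command line so the order can be inspected before anything is run. The function name `pe_command` and the output naming scheme built from a prefix are assumptions made for illustration, and file names containing spaces are not quoted in the printed text.

```shell
# pe_command JAR R1 R2 PREFIX STEP...  (hypothetical helper)
# Prints (does not run) a paired-end command line with the file arguments in
# the expected order: two inputs, then paired/unpaired outputs for the forward
# read, then paired/unpaired outputs for the reverse read.
pe_command() (
  jar=$1 r1=$2 r2=$3 prefix=$4
  shift 4
  printf 'java -jar %s PE %s %s %s %s %s %s' "$jar" "$r1" "$r2" \
    "${prefix}_forward_paired.fq.gz" "${prefix}_forward_unpaired.fq.gz" \
    "${prefix}_reverse_paired.fq.gz" "${prefix}_reverse_unpaired.fq.gz"
  # Append the processing steps in the order given
  for step in "$@"; do
    printf ' %s' "$step"
  done
  printf '\n'
)

pe_command trimmomatic.jar input_forward.fq.gz input_reverse.fq.gz output MINLEN:36
```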
Application Example: PE Mode
Recommended Command
    java -jar trimmomatic-0.39.jar PE \
        input_forward.fq.gz input_reverse.fq.gz \
        output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
        output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True \
        LEADING:3 TRAILING:3 MINLEN:36
Alternative Command (Less Sensitive for Adapters, with Sliding-Window Trimming)
    java -jar trimmomatic-0.39.jar PE -phred33 \
        input_forward.fq.gz input_reverse.fq.gz \
        output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
        output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
        LEADING:3 TRAILING:3 \
        SLIDINGWINDOW:4:15 MINLEN:36
Processing Steps Breakdown
- Remove adapters using TruSeq3-PE.fa
- Remove leading low-quality bases (quality < 3)
- Remove trailing low-quality bases (quality < 3)
- Optional: Sliding window trimming (4-base window, quality threshold 15)
- Drop reads shorter than 36 bases
Application Example: SE Mode
    java -jar trimmomatic-0.39.jar SE -phred33 \
        input.fq.gz output.fq.gz \
        ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 \
        LEADING:3 TRAILING:3 \
        SLIDINGWINDOW:4:15 MINLEN:36
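With several single-end files, the same command is usually wrapped in a loop. The following dry-run sketch prints one command per input instead of executing it. The sample file names, the `.trimmed.fq.gz` output suffix, the `jar` variable, and the helper name `se_output_name` are all illustrative assumptions.

```shell
# se_output_name INPUT  (hypothetical helper)
# Derives an output name by replacing the .fq.gz suffix with .trimmed.fq.gz
se_output_name() (
  printf '%s\n' "${1%.fq.gz}.trimmed.fq.gz"
)

jar=trimmomatic.jar  # assumption: adjust to the path of your jar

# Dry run: echo each command rather than running it
for in_file in sample_A.fq.gz sample_B.fq.gz; do
  echo java -jar "$jar" SE -phred33 \
    "$in_file" "$(se_output_name "$in_file")" \
    ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 \
    SLIDINGWINDOW:4:15 MINLEN:36
done
```

Removing the `echo` turns the dry run into an actual run once the printed commands look right.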
Processing Steps
Trimmomatic offers multiple processing steps that can be applied in sequence:
ILLUMINACLIP: Adapter Removal
    ILLUMINACLIP:<adapter_file>:<seed_mismatches>:<palindrome_clip_threshold>:<simple_clip_threshold>[:<min_adapter_length>:<keep_both_reads>]
    #Example with the two optional fields (minimum adapter length 2, keep both reads)
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True
    #Example without the optional fields (less sensitive for adapters)
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10
- Removes Illumina adapters
- Supports both simple and palindrome clipping strategies
- Recommended for most Illumina sequencing data
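Because the ILLUMINACLIP fields are positional, a long specification can be hard to read at a glance. The sketch below splits a specification on `:` and labels each field. The function name `describe_illuminaclip` and the lowercase labels are illustrative; the last two fields correspond to the optional minimum adapter length and keep-both-reads settings.

```shell
# describe_illuminaclip SPEC  (hypothetical helper)
# Splits an ILLUMINACLIP step on ':' and prints one labelled field per line.
describe_illuminaclip() (
  IFS=:
  set -- $1          # $1 becomes the literal word ILLUMINACLIP, $2.. the fields
  echo "adapter_file=$2"
  echo "seed_mismatches=$3"
  echo "palindrome_clip_threshold=$4"
  echo "simple_clip_threshold=$5"
  # The two trailing fields are optional
  if [ $# -ge 6 ]; then echo "min_adapter_length=$6"; fi
  if [ $# -ge 7 ]; then echo "keep_both_reads=$7"; fi
)

describe_illuminaclip ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True
```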
SLIDINGWINDOW: Quality-Based Trimming
    SLIDINGWINDOW:<window_size>:<required_quality>
    #Example
    SLIDINGWINDOW:4:15
    #Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15
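- Scans each read from the start with a window of fixed width
- Cuts the read once the average quality within the window falls below the threshold
- Because the average is taken over several bases, a single low-quality base does not by itself cause the high-quality bases after it to be removed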
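The windowing idea can be illustrated on a plain list of quality scores. This is a simplified sketch of the principle, not Trimmomatic's implementation: it cuts at the start of the first window whose mean falls below the threshold, and the exact cut position chosen by Trimmomatic may differ. The function name `sliding_window_keep` is hypothetical.

```shell
# sliding_window_keep WINDOW MIN_AVG Q1 Q2 ...  (simplified illustration)
# Prints how many bases are kept from the start of the read: the read is cut
# at the start of the first window whose mean quality is below MIN_AVG.
sliding_window_keep() (
  window=$1 min_avg=$2
  shift 2
  printf '%s\n' "$*" | awk -v w="$window" -v q="$min_avg" '{
    n = NF
    for (start = 1; start + w - 1 <= n; start++) {
      sum = 0
      for (i = start; i < start + w; i++) sum += $i
      if (sum / w < q) { print start - 1; exit }
    }
    print n   # no window failed, keep the whole read
  }'
)

sliding_window_keep 4 15 30 30 30 30 30 2 2 2    # prints 4
sliding_window_keep 4 15 30 30 2 30 30 30 30 30  # prints 8
```

The second call shows the point of averaging: one base of quality 2 surrounded by good bases never pulls a window mean below 15, so nothing is cut.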
LEADING: Trim Low-Quality Bases from Start
    LEADING:<quality_threshold>
    #Example
    LEADING:3
    #Remove leading bases with quality below 3
- Removes low-quality bases from the start of reads
TRAILING: Trim Low-Quality Bases from End
    TRAILING:<quality_threshold>
    #Example
    TRAILING:3
    #Remove trailing bases with quality below 3
- Removes low-quality bases from the end of reads
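LEADING and TRAILING only look at the two ends of a read and stop at the first base that meets the threshold. The sketch below illustrates that behaviour on a list of quality scores. It is an illustration of the idea rather than Trimmomatic's code, and the function name `trim_ends` is hypothetical.

```shell
# trim_ends THRESHOLD Q1 Q2 ...  (simplified illustration)
# Prints "first last" (1-based positions) of the bases kept after dropping
# bases with quality below THRESHOLD from both ends; "0 0" if none survive.
trim_ends() (
  threshold=$1
  shift
  printf '%s\n' "$*" | awk -v t="$threshold" '{
    first = 1
    while (first <= NF && $first < t) first++    # like LEADING
    last = NF
    while (last >= first && $last < t) last--    # like TRAILING
    if (first > last) print "0 0"; else print first, last
  }'
)

trim_ends 3 2 2 30 30 1 30 2   # prints 3 6
```

In the example the base of quality 1 at position 5 is kept, because it lies between two bases that pass the threshold.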
CROP: Limit Read Length
- Truncates reads to a specified maximum length by removing bases from the end
HEADCROP: Remove Fixed Number of Bases
- Removes a specified number of bases from the start of reads
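Both steps take a single number, written as `CROP:<length>` and `HEADCROP:<length>`. Their effect on a sequence can be sketched with `cut`. The helper names `crop_read` and `headcrop_read` are hypothetical, and the sketch handles only the sequence string, whereas Trimmomatic trims the matching quality string as well.

```shell
# crop_read LENGTH SEQ: keep at most the first LENGTH bases (the CROP idea)
crop_read() (
  printf '%s\n' "$2" | cut -c "1-$1"
)

# headcrop_read COUNT SEQ: drop the first COUNT bases (the HEADCROP idea)
headcrop_read() (
  printf '%s\n' "$2" | cut -c "$(( $1 + 1 ))-"
)

crop_read 5 ACGTACGTAC      # prints ACGTA
headcrop_read 3 ACGTACGTAC  # prints TACGTAC
```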
MINLEN: Minimum Read Length Filter
    MINLEN:<length>
    #Example
    MINLEN:36
- Discards reads shorter than specified length
- Typically applied after other processing steps
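MINLEN is normally placed last because the earlier steps shorten reads, and the length check should see the final result. The filtering idea itself can be sketched with awk on uncompressed, strictly 4-line FASTQ records. The function name `minlen_filter` is hypothetical and this is not a replacement for the real step.

```shell
# minlen_filter MIN_LENGTH < in.fastq > out.fastq  (simplified illustration)
# Keeps only 4-line FASTQ records whose sequence has at least MIN_LENGTH bases.
minlen_filter() {
  awk -v min="$1" '
    NR % 4 == 1 { header = $0 }
    NR % 4 == 2 { seq = $0 }
    NR % 4 == 3 { plus = $0 }
    NR % 4 == 0 { if (length(seq) >= min) print header "\n" seq "\n" plus "\n" $0 }
  '
}

# Two records; only the first has at least 3 bases and is printed
printf '@r1\nACGT\n+\nIIII\n@r2\nAC\n+\nII\n' | minlen_filter 3
```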
Practical Example: Paired-End Processing
    java -jar trimmomatic-0.39.jar PE \
        input_forward.fq.gz input_reverse.fq.gz \
        output_paired_forward.fq.gz output_unpaired_forward.fq.gz \
        output_paired_reverse.fq.gz output_unpaired_reverse.fq.gz \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
        LEADING:3 \
        TRAILING:3 \
        SLIDINGWINDOW:4:15 \
        MINLEN:36
This example demonstrates:
- Adapter removal using TruSeq3 adapters
- Removing low-quality bases from start and end
- Sliding window quality trimming
- Minimum length filtering
Adapter Sequences
Choosing the Right Adapter File
- TruSeq2: For GAII machines
- TruSeq3: For HiSeq and MiSeq machines
- Verify adapter sequences using FastQC’s “Overrepresented Sequences” report
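The rule of thumb above can be written down as a small lookup. This is only a sketch: the function name `adapter_file_for` is hypothetical, the platform labels are the ones used in this guide, and the result is just a file name in the style of `TruSeq3-PE.fa` that still has to exist in your adapter directory.

```shell
# adapter_file_for PLATFORM MODE  (hypothetical helper)
# PLATFORM is GAII, HiSeq or MiSeq; MODE is SE or PE.
adapter_file_for() (
  case "$1" in
    GAII)        family=TruSeq2 ;;
    HiSeq|MiSeq) family=TruSeq3 ;;
    *) echo "unknown platform: $1" >&2; exit 1 ;;
  esac
  case "$2" in
    SE|PE) printf '%s-%s.fa\n' "$family" "$2" ;;
    *) echo "mode must be SE or PE" >&2; exit 1 ;;
  esac
)

adapter_file_for HiSeq PE   # prints TruSeq3-PE.fa
```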
Best Practices
- Always use the most recent adapter sequences
- Adjust parameters based on your specific sequencing platform
- Validate trimming results with quality control tools
- Consider downstream analysis requirements when setting thresholds
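- Use the -threads parameter for multi-core processing; if it is omitted, Trimmomatic selects the number of threads automatically
- Quality encoding is auto-detected since version 0.32, so -phred33 or -phred64 is only needed to set it explicitly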
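One simple way to validate a run is to compare read counts before and after trimming. The sketch below counts records in a gzip-compressed FASTQ file, assuming strictly 4-line records. The function name `count_reads` is hypothetical.

```shell
# count_reads FILE.fq.gz  (hypothetical helper)
# Prints the number of records in a gzip-compressed, 4-line-per-record FASTQ file.
count_reads() (
  gzip -cd "$1" | awk 'END { print NR / 4 }'
)
```

Running it on an input file and on the corresponding paired output shows how many reads survived; a very low survival rate usually means the thresholds or the adapter file deserve a second look.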
Troubleshooting
- Check input file formats (FASTQ, compressed)
- Verify adapter sequences
- Ensure Java version compatibility
- Review trimming logs for detailed processing information
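The first item on this list can be partly automated. The following sketch checks whether the first record of a file has the expected FASTQ shape: a header starting with `@`, a separator starting with `+`, and sequence and quality lines of equal length. The function name `looks_like_fastq` is hypothetical, and only plain and gzip-compressed files are handled.

```shell
# looks_like_fastq FILE  (hypothetical helper)
# Succeeds if the first four lines of FILE form a plausible FASTQ record.
looks_like_fastq() (
  case "$1" in
    *.gz) reader="gzip -cd" ;;
    *)    reader="cat" ;;
  esac
  $reader "$1" | head -n 4 | awk '
    NR == 1 && substr($0, 1, 1) != "@" { bad = 1 }
    NR == 2 { seqlen = length($0) }
    NR == 3 && substr($0, 1, 1) != "+" { bad = 1 }
    NR == 4 && length($0) != seqlen { bad = 1 }
    END { exit (bad || NR < 4) ? 1 : 0 }
  '
)
```

A typical use is `looks_like_fastq input.fq.gz || echo "input.fq.gz does not look like FASTQ"`, where the file name is only an example.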
Limitations and Considerations
- Adapter detection is a trade-off between sensitivity and specificity
- Short adapter fragments might remain undetected
- Performance varies with read length and quality
Always validate and optimize parameters for your specific dataset and research requirements.
References
Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114-2120. doi:10.1093/bioinformatics/btu170