Comprehensive Trimmomatic Usage Guide

Trimmomatic is a powerful, fast, and flexible command-line tool designed for trimming and filtering high-throughput sequencing data, particularly Illumina FASTQ files.

Introduction to Trimmomatic

Trimmomatic is a multithreaded command-line tool for preprocessing Illumina FASTQ data before downstream analysis. It provides methods for:

  • Removing adapter sequences
  • Trimming low-quality bases
  • Filtering reads based on quality and length
  • Supporting both single-end and paired-end sequencing data

This guide provides an in-depth tutorial on using Trimmomatic, detailing its features, usage modes, processing steps, and examples.

Installation and Prerequisites

System Requirements

  • Java Runtime Environment (JRE)
  • Sufficient computational resources
  • Input FASTQ files, optionally compressed with gzip or bzip2 (recognized by the .gz or .bz2 file extension)

Installation

Trimmomatic is distributed as a .jar file requiring Java. To install:

  1. Ensure Java Runtime Environment (JRE) is installed.
  2. Download the latest version of Trimmomatic from the official repository.
  3. Save the trimmomatic-<version>.jar file to your desired directory.

You can run Trimmomatic using the java -jar command.
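
Before the first run it can help to confirm that Java is reachable from your shell. The following is a minimal sketch, assuming a POSIX-like shell; `require_tool` is a hypothetical helper name introduced here for illustration and is not part of Trimmomatic.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if a command is available on the PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Print the Java version only when a java executable is actually available.
if require_tool java; then
    java -version
fi
```

If the check reports that java is missing, install a JRE before trying to run the Trimmomatic jar.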

Basic Command Structure

Trimmomatic supports two primary modes of operation:

Single-End Mode

java -jar trimmomatic.jar SE [options] \
<input_file> <output_file> \
<processing_steps>

Paired-End Mode

java -jar trimmomatic.jar PE [options] \
<input_file_1> <input_file_2> \
<output_paired_1> <output_unpaired_1> \
<output_paired_2> <output_unpaired_2> \
<processing_steps>

Application Example: PE Mode

java -jar trimmomatic-0.39.jar PE \
    input_forward.fq.gz input_reverse.fq.gz \
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True \
    LEADING:3 TRAILING:3 MINLEN:36

Alternative Command (Less Sensitive for Adapters)

java -jar trimmomatic-0.35.jar PE -phred33 \
    input_forward.fq.gz input_reverse.fq.gz \
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
    LEADING:3 TRAILING:3 \
    SLIDINGWINDOW:4:15 MINLEN:36

Processing Steps Breakdown

  1. Remove adapters using TruSeq3-PE.fa
  2. Remove leading low-quality bases (quality < 3)
  3. Remove trailing low-quality bases (quality < 3)
  4. Optional: Sliding window trimming (4-base window, quality threshold 15)
  5. Drop reads shorter than 36 bases

Application Example: SE Mode

java -jar trimmomatic-0.35.jar SE -phred33 \
    input.fq.gz output.fq.gz \
    ILLUMINACLIP:TruSeq3-SE.fa:2:30:10 \
    LEADING:3 TRAILING:3 \
    SLIDINGWINDOW:4:15 MINLEN:36

Processing Steps

Trimmomatic offers multiple processing steps that can be applied in sequence:

ILLUMINACLIP: Adapter Removal

ILLUMINACLIP:<adapter_file>:<seed_mismatches>:<palindrome_clip_threshold>:<simple_clip_threshold>[:<min_adapter_length>:<keep_both_reads>]
#Remove adapters; the optional fields 2:True lower the minimum adapter length for palindrome mode to 2 and keep both reads
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True
#Less sensitive for adapters (optional fields left at their defaults)
ILLUMINACLIP:TruSeq3-PE.fa:2:30:10

  • Removes Illumina adapters
  • Supports both simple and palindrome clipping strategies
  • Recommended for most Illumina sequencing data
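
Because the ILLUMINACLIP fields are purely positional, it is easy to swap two numbers by accident. A minimal sketch of one way to keep them readable in a script is to assemble the step from named variables. The variable names are illustrative, the values are those of the example above, and the last two (optional) fields correspond to the minimum adapter length and keep-both-reads settings.

```shell
#!/usr/bin/env bash
# Values taken from the example in this guide; variable names are illustrative.
adapter_file="TruSeq3-PE.fa"
seed_mismatches=2
palindrome_clip_threshold=30
simple_clip_threshold=10
min_adapter_length=2      # optional fifth field
keep_both_reads=True      # optional sixth field

# Join the fields with colons in the order ILLUMINACLIP expects.
illuminaclip="ILLUMINACLIP:${adapter_file}:${seed_mismatches}:${palindrome_clip_threshold}:${simple_clip_threshold}:${min_adapter_length}:${keep_both_reads}"

echo "$illuminaclip"   # prints ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:True
```

The resulting string can then be passed as a single argument at the end of a Trimmomatic command line.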

SLIDINGWINDOW: Quality-Based Trimming

SLIDINGWINDOW:<window_size>:<required_quality>
#Example
SLIDINGWINDOW:4:15
#Scan the read with a 4-base wide sliding window, cutting when the average quality per base drops below 15

  • Scans reads from the 5' end with a sliding window
  • Trims when the average quality within the window drops below the threshold
  • Because several bases are averaged, a single low-quality base does not cause high-quality data later in the read to be removed
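
To make the windowing idea concrete, here is a simplified sketch in shell and awk. It is not Trimmomatic's implementation and may differ from it in detail, for instance in exactly where the cut is placed relative to the failing window. The hypothetical function `sliding_window_keep` takes a window size, a required average quality, and a list of per-base Phred scores, and prints how many bases would be kept if the read were cut at the start of the first window whose average falls below the threshold.

```shell
#!/usr/bin/env bash
# Simplified illustration only; not Trimmomatic's actual algorithm.
# $1 = window size, $2 = required average quality, rest = per-base Phred scores
sliding_window_keep() {
    local window="$1" required="$2"
    shift 2
    echo "$@" | awk -v w="$window" -v q="$required" '{
        keep = NF                          # default: keep the whole read
        for (start = 1; start + w - 1 <= NF; start++) {
            sum = 0
            for (j = start; j < start + w; j++) sum += $j
            if (sum / w < q) {             # window average below threshold
                keep = start - 1           # cut at the start of this window
                break
            }
        }
        print keep
    }'
}

# Quality decays toward the 3-prime end: the window 20,10,8,5 averages 10.75.
sliding_window_keep 4 15 30 30 30 30 28 20 10 8 5 2   # prints 5

# One isolated bad base: every window still averages 23 or more.
sliding_window_keep 4 15 30 30 2 30 30 30 30 30       # prints 8
```

The second call shows the point of averaging: the isolated base with quality 2 never pulls a 4-base window below 15, so nothing is trimmed.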

LEADING: Trim Low-Quality Bases from Start

LEADING:<quality_threshold>
#Example
LEADING:3
#Remove leading bases with quality below 3

  • Removes low-quality bases from the start of reads

TRAILING: Trim Low-Quality Bases from End

TRAILING:<quality_threshold>
#Example
TRAILING:3
#Remove trailing bases with quality below 3

  • Removes low-quality bases from the end of reads
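
The thresholds given to LEADING and TRAILING are Phred quality scores, which FASTQ files store as single ASCII characters. As a small sketch, assuming Phred+33 encoding (the -phred33 option used in the examples above), the hypothetical helper `phred33_score` converts one quality character back to its numeric score.

```shell
#!/usr/bin/env bash
# Hypothetical helper: decode one Phred+33 quality character to its score.
phred33_score() {
    local code
    # With a leading quote, printf %d prints the character's numeric code.
    code=$(printf '%d' "'$1")
    echo $((code - 33))
}

phred33_score '#'   # ASCII 35, prints 2
phred33_score 'I'   # ASCII 73, prints 40
```

With LEADING:3, a base whose quality character is '#' (score 2) at the start of a read falls below the threshold, while 'I' (score 40) does not.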

CROP: Limit Read Length

CROP:<length>

  • Truncates reads to a specified maximum length by removing bases from the end

HEADCROP: Remove Fixed Number of Bases

HEADCROP:<length>

  • Removes a specified number of bases from the start of reads
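
The difference between CROP and HEADCROP is easiest to see on a short string. The sketch below only imitates the effect on a sequence with `cut`. The function names `crop` and `headcrop` are illustrative, and in a real FASTQ record the quality line is shortened in the same way as the sequence line.

```shell
#!/usr/bin/env bash
# Illustration of the two steps on a plain string (not a FASTQ parser).
crop()     { printf '%s\n' "$2" | cut -c "1-$1"; }            # keep the first N characters
headcrop() { printf '%s\n' "$2" | cut -c "$(( $1 + 1 ))-"; }  # drop the first N characters

read_seq="ACGTACGTAC"
crop 6 "$read_seq"       # prints ACGTAC
headcrop 3 "$read_seq"   # prints TACGTAC
```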

MINLEN: Minimum Read Length Filter

MINLEN:<length>
#Example
MINLEN:36

  • Discards reads shorter than specified length
  • Typically applied after other processing steps

Practical Example: Paired-End Processing

java -jar trimmomatic-0.32.jar PE \
    input_forward.fq.gz input_reverse.fq.gz \
    output_paired_forward.fq.gz output_unpaired_forward.fq.gz \
    output_paired_reverse.fq.gz output_unpaired_reverse.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
    LEADING:3 \
    TRAILING:3 \
    SLIDINGWINDOW:4:15 \
    MINLEN:36

This example demonstrates:

  • Adapter removal using TruSeq3 adapters
  • Removing low-quality bases from start and end
  • Sliding window quality trimming
  • Minimum length filtering
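
When several samples need the same treatment, the command above can be wrapped in a loop. The following is a sketch under stated assumptions: the sample names, the `_R1`/`_R2` file naming scheme, and the jar file name are hypothetical and must be adapted to your data. It is written as a dry run that only prints each command.

```shell
#!/usr/bin/env bash
# Hypothetical jar name and file naming scheme; adjust to your setup.
jar="trimmomatic-0.39.jar"

# Print (not run) the PE command for one sample prefix.
build_pe_command() {
    local s="$1"
    echo java -jar "$jar" PE \
        "${s}_R1.fq.gz" "${s}_R2.fq.gz" \
        "${s}_R1.paired.fq.gz" "${s}_R1.unpaired.fq.gz" \
        "${s}_R2.paired.fq.gz" "${s}_R2.unpaired.fq.gz" \
        ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
        LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
}

# Dry run over two made-up sample names.
for s in sampleA sampleB; do
    build_pe_command "$s"
done
```

Once the printed commands look right, removing the `echo` inside `build_pe_command` turns the dry run into an actual run.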

Adapter Sequences

Choosing the Right Adapter File

  • TruSeq2: For GAII machines
  • TruSeq3: For HiSeq and MiSeq machines
  • Verify adapter sequences using FastQC’s “Overrepresented Sequences” report

Best Practices

  • Use the adapter file that matches your library preparation kit and sequencer
  • Adjust parameters based on your specific sequencing platform
  • Validate trimming results with quality control tools
  • Consider downstream analysis requirements when setting thresholds
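
One simple way to validate a trimming run is to compare read counts before and after. A minimal sketch, assuming well-formed FASTQ with exactly four lines per record: the hypothetical helper `count_reads` counts the reads in a plain or gzip-compressed file.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count reads in a FASTQ file (plain or .gz).
# Assumes four lines per record, i.e. no wrapped sequence lines.
count_reads() {
    case "$1" in
        *.gz) gzip -cd "$1" ;;   # decompress to stdout
        *)    cat "$1" ;;
    esac | awk 'END { print NR / 4 }'
}

# Demo on a tiny two-read file created on the fly.
tmp=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nGGCC\n+\nIIII\n' > "$tmp"
count_reads "$tmp"   # prints 2
rm -f "$tmp"
```

Running it on an input file and on the corresponding paired output shows how many read pairs survived trimming.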

Performance Optimization

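  • Use the -threads <n> option to process reads on multiple cores
  • If -threads is omitted, the number of threads is selected automatically
  • Since version 0.32, the quality encoding (-phred33 or -phred64) is detected automatically when it is not specified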
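
As a sketch of setting the thread count explicitly, the snippet below asks the system for the number of online cores and passes it to -threads. It assumes `getconf _NPROCESSORS_ONLN` is available (it falls back to 1 otherwise) and reuses the placeholder file names from the examples in this guide. It is a dry run that only prints the command.

```shell
#!/usr/bin/env bash
# Number of online CPU cores; fall back to 1 if getconf cannot report it.
threads=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)

# Dry run: "echo" prints the command. Remove it to actually run Trimmomatic.
echo java -jar trimmomatic-0.39.jar PE -threads "$threads" \
    input_forward.fq.gz input_reverse.fq.gz \
    output_forward_paired.fq.gz output_forward_unpaired.fq.gz \
    output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz \
    ILLUMINACLIP:TruSeq3-PE.fa:2:30:10 \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
```

On a shared machine you may prefer a smaller fixed value than the full core count.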

Troubleshooting

  • Check input file formats (FASTQ, compressed)
  • Verify adapter sequences
  • Ensure Java version compatibility
  • Review the trimming log (written when the -trimlog <file> option is given) for per-read processing details

Limitations and Considerations

  • Adapter detection is a trade-off between sensitivity and specificity
  • Short adapter fragments might remain undetected
  • Performance varies with read length and quality

Always validate and optimize parameters for your specific dataset and research requirements.


References

Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina Sequence Data. Bioinformatics, btu170.

This post is licensed under CC BY-NC 4.0 by the author.