SPAdes Basic Usage

Comprehensive guide to SPAdes genome assembly input: FASTQ file preparation, quality control, and preprocessing for accurate sequencing data analysis across single-cell, paired-end, and metagenomic research applications.

SPAdes Basic Input

SPAdes accepts paired-end reads, mate-pairs, and single (unpaired) reads in FASTA or FASTQ format; input files may be gzip-compressed. The format and library types below cover different sequencing technologies and experimental designs:

  1. FASTQ Format (Recommended)
  • Standard format for high-throughput sequencing data
  • Contains sequence reads with quality scores
  • Supported file extensions: .fastq, .fq
  • Compression support: .gz, .bz2

FASTQ Format Structure

@Sequence_Identifier
NUCLEOTIDE_SEQUENCE
+
QUALITY_SCORES
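
The four-line record layout above can be checked mechanically before assembly. The following is a minimal sketch, assuming an uncompressed FASTQ file with one sequence line per record; `fastq_check` is a hypothetical helper name and is not part of SPAdes.

```shell
# Hypothetical helper (not part of SPAdes): verify that a plain-text FASTQ file
# consists of well-formed four-line records, then print the record count.
fastq_check() {
    awk '
        # Line 1 of each record: identifier, must start with "@"
        NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header must start with @\n", NR; bad = 1; exit }
        # Line 2: nucleotide sequence, remember its length
        NR % 4 == 2 { seqlen = length($0) }
        # Line 3: separator, must start with "+"
        NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: separator must start with +\n", NR; bad = 1; exit }
        # Line 4: quality string, must be as long as the sequence
        NR % 4 == 0 && length($0) != seqlen { printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit }
        END {
            if (bad) exit 1
            if (NR % 4 != 0) { printf "truncated record: %d lines\n", NR; exit 1 }
            printf "%d\n", NR / 4
        }
    ' "$1"
}
```

Running `fastq_check reads.fastq` (file name is illustrative) prints the number of records and returns a non-zero exit status on the first malformed record it finds.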

Supported Input Types

  1. Paired-End Reads: the most common input type, supplied as two separate files for forward and reverse reads. Command format:

    spades.py -1 forward_reads.fastq -2 reverse_reads.fastq -o assembly_output
    
  2. Single-End Reads: a single file with unpaired reads. Command format:

    spades.py -s single_reads.fastq -o assembly_output
    
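  3. Mate-Pair Libraries: long-range paired reads, passed with the --mp<#>-1 and --mp<#>-2 flags (orientation can be set with --mp<#>-rf, --mp<#>-fr or --mp<#>-ff). Mate-pair libraries are normally supplied alongside a paired-end or unpaired library, as in item 4. Mate-pair flags:

    spades.py --mp1-1 mate_pair1_1.fastq --mp1-2 mate_pair1_2.fastq -o assembly_output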
    
  4. Multiple Libraries: combine different sequencing libraries in one run. Command format:

    spades.py -1 pe_forward.fastq -2 pe_reverse.fastq \
           --mp1-1 mp_forward.fastq --mp1-2 mp_reverse.fastq \
           -o assembly_output
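
Paired libraries only make sense if the forward and reverse files contain the same number of reads, so a quick count before launching an assembly can save a failed run. The sketch below assumes four-line FASTQ records and that compressed files carry a `.gz` extension; `count_reads` and `check_pairs` are hypothetical helper names, not SPAdes commands.

```shell
# Hypothetical helpers (not part of SPAdes).

# Print the number of reads in a FASTQ file, plain or gzip-compressed.
# Assumes four lines per record.
count_reads() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { printf "%d\n", NR / 4 }'
}

# Succeed only if both files of a paired library hold the same number of reads.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" != "$n2" ]; then
        echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "$n1 read pairs"
}
```

A possible use is to gate the assembly on the check, for example `check_pairs forward_reads.fastq reverse_reads.fastq && spades.py -1 forward_reads.fastq -2 reverse_reads.fastq -o assembly_output`. The check compares counts only; it does not confirm that reads appear in the same order in both files.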
    

Input Quality Considerations

Read Quality Metrics

Metric                | Recommendation       | Minimum Threshold
Read Length           | ≥ 75 base pairs      | 50 base pairs
Quality Encoding      | Phred+33 or Phred+64 | -
Average Quality Score | ≥ Q30                | Q20
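
To compare a dataset against the thresholds above, the read lengths and base qualities can be summarized directly from the FASTQ file. This is a rough sketch that assumes an uncompressed file with Phred+33 encoding (Phred+64 data would need the offset changed); `fastq_stats` is a hypothetical helper name. Dedicated QC tools report far more detail.

```shell
# Hypothetical helper (not part of SPAdes): print the read count, shortest read
# length and mean base quality of a plain-text FASTQ file.
# Assumes Phred+33 quality encoding.
fastq_stats() {
    awk '
        # Build a character-to-ASCII-code lookup for printable characters
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i }
        # Sequence line: track the number of reads and the shortest length
        NR % 4 == 2 {
            len = length($0)
            if (reads == 0 || len < minlen) minlen = len
            reads++
        }
        # Quality line: accumulate Phred scores (ASCII code minus 33)
        NR % 4 == 0 {
            for (i = 1; i <= length($0); i++) { qsum += ord[substr($0, i, 1)] - 33; bases++ }
        }
        END {
            if (bases == 0) { print "no reads found"; exit 1 }
            printf "reads=%d min_length=%d mean_quality=%.1f\n", reads, minlen, qsum / bases
        }
    ' "$1"
}
```

The mean is taken over all bases in the file, so a few very poor reads can hide behind many good ones; per-read or per-position summaries are more informative when quality is borderline.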

Preprocessing Recommendations

Quality Trimming: Remove low-quality bases from read ends with tools such as Trimmomatic, fastp, or Cutadapt.

Adapter Removal: Remove sequencing adapters with tools such as Cutadapt or AdapterRemoval.

Read Filtering: Remove short reads, reads with excessive N bases, and low-complexity reads.
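
The length and N-content filters described above can be sketched with a few lines of awk. This is an illustration only, assuming an uncompressed single-end FASTQ file; `filter_reads` and its `min_len` and `max_n` arguments are hypothetical names, and low-complexity filtering is not covered. For paired-end data, use a pair-aware tool such as those listed above, because filtering each file separately would break read pairing.

```shell
# Hypothetical helper (not part of SPAdes): keep only reads that are at least
# min_len bases long and contain at most max_n ambiguous (N) bases.
# Usage: filter_reads <fastq> <min_len> <max_n>
# Single-end sketch: do not apply it separately to the two files of a pair.
filter_reads() {
    awk -v min_len="$2" -v max_n="$3" '
        NR % 4 == 1 { header = $0 }
        NR % 4 == 2 { seq = $0 }
        NR % 4 == 3 { sep = $0 }
        NR % 4 == 0 {
            # gsub returns the number of matches; "&" leaves seq unchanged
            n_count = gsub(/[Nn]/, "&", seq)
            if (length(seq) >= min_len && n_count <= max_n)
                print header "\n" seq "\n" sep "\n" $0
        }
    ' "$1"
}
```

For example, `filter_reads single_reads.fastq 50 2 > single_reads.filtered.fastq` would keep reads of at least 50 bases with no more than two N bases; the threshold values here are illustrative.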

Input Preparation Workflow Example

# Example preprocessing pipeline
# 1. Quality and adapter trimming
trimmomatic PE input_R1.fastq input_R2.fastq \
    output_R1_paired.fastq output_R1_unpaired.fastq \
    output_R2_paired.fastq output_R2_unpaired.fastq \
    ILLUMINACLIP:adapters.fa:2:30:10 \
    LEADING:3 TRAILING:3 MINLEN:75

# 2. SPAdes assembly
spades.py -1 output_R1_paired.fastq \
          -2 output_R2_paired.fastq \
          --careful -o assembly_output

Best Practices

  • Retain original raw sequencing data
  • Document all preprocessing steps
  • Use compressed input files
  • Validate input quality before assembly
  • Choose appropriate assembly mode

Emerging Techniques

  • Machine learning-assisted read filtering
  • Adaptive k-mer size selection
  • Enhanced metagenomic assembly algorithms

Citation and usage

Prjibelski, Andrey, et al. “Using SPAdes De Novo Assembler.” Current Protocols in Bioinformatics, vol. 70, no. 1, June 2020, https://doi.org/10.1002/cpbi.102.

This post is licensed under CC BY-NC 4.0 by the author.