SPAdes Basic Usage

Comprehensive guide to SPAdes genome assembly input: FASTQ file preparation, quality control, and preprocessing for accurate sequencing data analysis across single-cell, paired-end, and metagenomic research applications.

Posted Dec 13, 2024

By Beaven Manjengwa



SPAdes Basic Input

SPAdes accepts paired-end reads, mate-pair reads, and single (unpaired) reads in FASTA or FASTQ format, and the files may be gzip-compressed. The formats and library types below cover the common sequencing technologies and experimental designs:

FASTQ Format (Recommended)

Standard format for high-throughput sequencing data
Contains sequence reads with per-base quality scores
Supported file extensions: .fastq, .fq
Compression support: .gz (gzip); confirm .bz2 support for your SPAdes version before relying on it

FASTQ Format Structure

@Sequence_Identifier
NUCLEOTIDE_SEQUENCE
+
QUALITY_SCORES
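The four-line layout above can be checked programmatically before assembly. The following is a minimal Python sketch, not part of SPAdes: the names `FastqRecord` and `read_fastq` are hypothetical, and it assumes the common unwrapped layout in which each record occupies exactly four lines.

```python
import io
from typing import Iterator, NamedTuple, TextIO


class FastqRecord(NamedTuple):
    identifier: str  # header text after the leading '@'
    sequence: str
    quality: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a four-line-per-record FASTQ stream (assumed layout)."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        if not header.strip():
            continue  # tolerate blank lines between records or at the end
        header = header.rstrip("\r\n")
        sequence = handle.readline().rstrip("\r\n")
        separator = handle.readline().rstrip("\r\n")
        quality = handle.readline().rstrip("\r\n")
        if not header.startswith("@"):
            raise ValueError(f"expected '@' header, got {header!r}")
        if not separator.startswith("+"):
            raise ValueError(f"expected '+' separator, got {separator!r}")
        if len(sequence) != len(quality):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:], sequence, quality)


# Two tiny made-up records, used only to exercise the parser.
example = io.StringIO("@read1\nACGT\n+\nIIII\n@read2\nGGCC\n+\n!!II\n")
records = list(read_fastq(example))
```

A truncated or malformed file raises `ValueError` at the first bad record, which is usually cheaper to discover here than partway through an assembly run.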

Supported Input Types

Paired-End Reads: The most common input type, supplied as two separate files for forward and reverse reads. The output directory (-o) is required. Command format:
spades.py -1 forward_reads.fastq -2 reverse_reads.fastq -o output_dir
Single-End Reads: A single file of unpaired reads. Command format:
spades.py -s single_reads.fastq -o output_dir

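Mate-Pair Libraries: Long-range paired reads, passed with the --mp<#>-1 and --mp<#>-2 flags, where <#> is the library number. Read orientation can be set with --mp<#>-rf, --mp<#>-fr, or --mp<#>-ff. Ordinary mate pairs supplement a paired-end or unpaired library rather than replace it, so they are given alongside one:

spades.py -1 pe_1.fastq -2 pe_2.fastq \
       --mp1-1 mate_pair1_1.fastq --mp1-2 mate_pair1_2.fastq -o output_dir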

Multiple Libraries: Combine different sequencing libraries in a single run by listing the flags for each library.

spades.py -1 pe_forward.fastq -2 pe_reverse.fastq \
       --mp1-1 mp_forward.fastq --mp1-2 mp_reverse.fastq \
       -o output_dir
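When the same library layout is assembled repeatedly, it can help to build the argument list in code rather than by hand. The sketch below is an illustration only: `build_spades_command` is a hypothetical helper, it uses just the flags shown in this post (-1, -2, -s, --mp1-1, --mp1-2, -o), and it assumes SPAdes needs at least one paired-end or unpaired library in addition to any mate pairs.

```python
import shlex
from typing import List, Optional, Tuple


def build_spades_command(
    output_dir: str,
    paired_end: Optional[Tuple[str, str]] = None,
    single: Optional[str] = None,
    mate_pair: Optional[Tuple[str, str]] = None,
) -> List[str]:
    """Assemble a spades.py argument list from the supplied libraries."""
    # Assumption: mate pairs alone are not enough to start an assembly.
    if paired_end is None and single is None:
        raise ValueError("a paired-end or unpaired library is required")
    command = ["spades.py"]
    if paired_end is not None:
        command += ["-1", paired_end[0], "-2", paired_end[1]]
    if single is not None:
        command += ["-s", single]
    if mate_pair is not None:
        command += ["--mp1-1", mate_pair[0], "--mp1-2", mate_pair[1]]
    command += ["-o", output_dir]
    return command


command = build_spades_command(
    "assembly_output",
    paired_end=("pe_forward.fastq", "pe_reverse.fastq"),
    mate_pair=("mp_forward.fastq", "mp_reverse.fastq"),
)
# Print a copy-pasteable shell command for logging or review.
print(shlex.join(command))
```

The returned list can be handed to `subprocess.run(command, check=True)` on a machine where spades.py is on the PATH. Keeping it as a list avoids shell quoting problems with unusual file names.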

Input Quality Considerations

Read Quality Metrics

Metric	Recommendation	Minimum Threshold
Read Length	≥ 75 base pairs	50 base pairs
Quality Encoding	Phred+33 or Phred+64	-
Average Quality Score	≥ Q30	Q20
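The metrics in the table can be computed per read from the sequence and quality lines. The following Python sketch is a rough illustration: the function names are hypothetical, the defaults are the minimum thresholds from the table (50 bp, Q20), and the offset guess is a simple heuristic that treats any quality character below ASCII 64 as proof of Phred+33. Files containing only high-quality Phred+33 reads would be misread as Phred+64, so treat the result as a hint.

```python
from statistics import mean
from typing import Iterable, Optional

PHRED64_LOWEST_CHAR = 64  # '@' encodes Q0 under Phred+64


def guess_phred_offset(quality_strings: Iterable[str]) -> Optional[int]:
    """Heuristically guess 33 or 64 from the lowest quality character seen."""
    lowest = min((min(q) for q in quality_strings if q), default=None)
    if lowest is None:
        return None  # no quality data to inspect
    return 33 if ord(lowest) < PHRED64_LOWEST_CHAR else 64


def mean_quality(quality: str, offset: int = 33) -> float:
    """Arithmetic mean of the per-base Phred scores in one read."""
    if not quality:
        return 0.0
    return mean(ord(char) - offset for char in quality)


def meets_thresholds(
    sequence: str,
    quality: str,
    offset: int = 33,
    min_length: int = 50,
    min_mean_quality: float = 20.0,
) -> bool:
    """Apply the table's minimum read length and average quality."""
    return (
        len(sequence) >= min_length
        and mean_quality(quality, offset) >= min_mean_quality
    )
```

The arithmetic mean of Phred scores is a convenient summary but is more forgiving than averaging the underlying error probabilities, so a read that passes here can still contain a few very poor bases.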

Preprocessing Recommendations

Quality Trimming: Remove low-quality bases from read ends; Tools: Trimmomatic, fastp, Cutadapt

Adapter Removal: Remove sequencing adapters; Use tools like Cutadapt or AdapterRemoval

Read Filtering: Remove short reads; Filter out reads with excessive N bases; Remove reads with low complexity
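The read-filtering criteria above can be expressed as a small predicate. This Python sketch is illustrative only: the function names are hypothetical, the 50 bp minimum comes from the table earlier in this post, and the 10% N limit and 90% single-base limit are assumed values chosen for the example. The low-complexity check is a crude stand-in that only catches reads dominated by one base. Dedicated tools use more robust measures.

```python
from collections import Counter


def n_fraction(sequence: str) -> float:
    """Fraction of bases in the read that are N (ambiguous)."""
    if not sequence:
        return 1.0
    return sequence.upper().count("N") / len(sequence)


def is_low_complexity(sequence: str, max_single_base_fraction: float = 0.9) -> bool:
    """Flag reads dominated by a single base, such as poly-A or poly-G runs."""
    if not sequence:
        return True
    most_common_count = Counter(sequence.upper()).most_common(1)[0][1]
    return most_common_count / len(sequence) > max_single_base_fraction


def keep_read(
    sequence: str,
    min_length: int = 50,
    max_n_fraction: float = 0.1,
) -> bool:
    """Return True when a read passes the length, N-content, and complexity checks."""
    return (
        len(sequence) >= min_length
        and n_fraction(sequence) <= max_n_fraction
        and not is_low_complexity(sequence)
    )
```

For paired-end data, both mates need to be dropped together when either one fails, otherwise the forward and reverse files fall out of step.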

Input Preparation Workflow Example

# Example preprocessing pipeline
# 1. Quality and adapter trimming
trimmomatic PE input_R1.fastq input_R2.fastq \
    output_R1_paired.fastq output_R1_unpaired.fastq \
    output_R2_paired.fastq output_R2_unpaired.fastq \
    ILLUMINACLIP:adapters.fa:2:30:10 \
    LEADING:3 TRAILING:3 MINLEN:75

# 2. SPAdes assembly
spades.py -1 output_R1_paired.fastq \
          -2 output_R2_paired.fastq \
          --careful -o assembly_output
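Between the two steps, it can be worth confirming that the paired outputs still list mates in the same order, since the -1 and -2 files are read in step. The following Python sketch is a hedged illustration: `read_ids` and `pairs_in_sync` are hypothetical helpers, and they assume four-line FASTQ records whose read names match once an optional trailing /1 or /2 is removed.

```python
import io
from itertools import zip_longest
from typing import Iterator, TextIO


def read_ids(handle: TextIO) -> Iterator[str]:
    """Yield the read name from each header line (every fourth line)."""
    for line_number, line in enumerate(handle):
        if line_number % 4 != 0 or not line.strip():
            continue
        parts = line[1:].split()  # drop '@', ignore any description text
        name = parts[0] if parts else ""
        # Older Illumina headers mark mates with a /1 or /2 suffix.
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def pairs_in_sync(forward: TextIO, reverse: TextIO) -> bool:
    """True when both files hold the same read names in the same order."""
    for forward_id, reverse_id in zip_longest(read_ids(forward), read_ids(reverse)):
        if forward_id is None or reverse_id is None or forward_id != reverse_id:
            return False
    return True


# Made-up records used only to exercise the check.
forward_reads = io.StringIO("@r1/1\nACGT\n+\nIIII\n@r2/1\nGGCC\n+\nIIII\n")
reverse_reads = io.StringIO("@r1/2\nTTAA\n+\nIIII\n@r2/2\nCCGG\n+\nIIII\n")
in_sync = pairs_in_sync(forward_reads, reverse_reads)
```

With real files, the same function can be called on handles from `open("output_R1_paired.fastq")` and `open("output_R2_paired.fastq")`.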

Recommended Workflow

Retain original raw sequencing data
Document all preprocessing steps
Use compressed input files
Validate input quality before assembly
Choose appropriate assembly mode
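Two of the points above, keeping the raw data and using compressed inputs, can be combined in a short helper. This is a minimal Python sketch with hypothetical function names: it writes a gzip copy next to the original file without deleting it, and opens either plain or gzip-compressed FASTQ for inspection.

```python
import gzip
import shutil
from pathlib import Path
from typing import TextIO


def gzip_fastq(path: Path) -> Path:
    """Write a gzip-compressed copy beside the original and return its path."""
    compressed = path.with_name(path.name + ".gz")
    with open(path, "rb") as source, gzip.open(compressed, "wb") as target:
        shutil.copyfileobj(source, target)
    return compressed  # the original file is left untouched


def open_fastq(path: Path) -> TextIO:
    """Open a FASTQ file as text, decompressing when the name ends in .gz."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")
```

For example, `gzip_fastq(Path("input_R1.fastq"))` would produce `input_R1.fastq.gz`, which can then be passed to spades.py in place of the uncompressed file.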

Emerging Techniques

Machine learning-assisted read filtering
Adaptive k-mer size selection
Enhanced metagenomic assembly algorithms

Citation and usage

Prjibelski, Andrey, et al. “Using SPAdes De Novo Assembler.” Current Protocols in Bioinformatics, vol. 70, no. 1, June 2020, https://doi.org/10.1002/cpbi.102.


This post is licensed under CC BY-NC 4.0 by the author.