SPAdes Basic Usage
Comprehensive guide to SPAdes genome assembly input: FASTQ file preparation, quality control, and preprocessing for accurate sequencing data analysis across single-cell, paired-end, and metagenomic research applications.
SPAdes Basic Input
SPAdes takes as input paired-end reads, mate-pairs and single (unpaired) reads in FASTA and FASTQ (can be gzipped) formats. SPAdes supports multiple input file formats for different sequencing technologies and experimental designs:
- FASTQ Format (Recommended)
  - Standard format for high-throughput sequencing data
  - Contains sequence reads with quality scores
  - Supported file extensions: .fastq, .fq
  - Compression support: .gz, .bz2
FASTQ Format Structure
@Sequence_Identifier
NUCLEOTIDE_SEQUENCE
+
QUALITY_SCORES
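The four-line layout above can be checked mechanically before a file is handed to the assembler. The following is a minimal sketch, assuming an uncompressed FASTQ file with exactly four lines per record (no wrapped sequence lines); the function name `check_fastq` is hypothetical and not part of SPAdes.

```shell
# check_fastq FILE: verify that every record follows the four-line layout.
# Assumption: uncompressed FASTQ, sequence and quality each on a single line.
check_fastq() {
    awk '
        NR % 4 == 1 && substr($0, 1, 1) != "@" { printf "line %d: header must start with @\n", NR; bad = 1; exit }
        NR % 4 == 2 { seqlen = length($0) }
        NR % 4 == 3 && substr($0, 1, 1) != "+" { printf "line %d: separator must start with +\n", NR; bad = 1; exit }
        NR % 4 == 0 && length($0) != seqlen { printf "line %d: quality length differs from sequence length\n", NR; bad = 1; exit }
        END {
            # A line count that is not a multiple of four means the last record is cut off.
            if (!bad && NR % 4 != 0) { printf "truncated record at end of file (%d lines)\n", NR; bad = 1 }
            exit bad
        }
    ' "$1"
}
```

Calling `check_fastq reads.fastq` returns a zero exit status when the layout is consistent and prints the first problem it finds otherwise. It only checks structure, not whether the bases or quality characters are valid.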
Supported Input Types
- Paired-End Reads
Most common input type; two separate files for forward and reverse reads. Every SPAdes run also needs an output directory, set with -o.
Command format:
spades.py -1 forward_reads.fastq -2 reverse_reads.fastq -o output_dir
- Single-End Reads
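Single file with unpaired reads.
Command format:
spades.py -s single_reads.fastq -o output_dir
- Mate-Pair Libraries
Long-range paired reads, passed with the --mp<#>-1 and --mp<#>-2 flags (add --mp<#>-rf to state reverse-forward orientation explicitly). Mate-pair libraries normally accompany a paired-end or unpaired library, as under Multiple Libraries.
Flag format:
spades.py --mp1-1 mate_pair1_1.fastq --mp1-2 mate_pair1_2.fastq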
- Multiple Libraries
Combine different sequencing libraries in one run.
Command format:
spades.py -1 pe_forward.fastq -2 pe_reverse.fastq \
--mp1-1 mp_forward.fastq --mp1-2 mp_reverse.fastq \
-o output_dir
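Paired-end and mate-pair libraries are passed as two files whose reads correspond record for record, so a quick sanity check is to confirm that both files hold the same number of reads. The sketch below assumes four-line FASTQ records and handles gzipped files by extension; the function names `count_reads` and `check_pairs` are hypothetical helpers, not SPAdes commands.

```shell
# count_reads FILE: number of FASTQ records (lines / 4); .gz files are decompressed on the fly.
count_reads() {
    case "$1" in
        *.gz) gzip -cd -- "$1" ;;
        *)    cat -- "$1" ;;
    esac | awk 'END { print int(NR / 4) }'
}

# check_pairs R1 R2: succeed only if both files contain the same number of reads.
check_pairs() {
    n1=$(count_reads "$1")
    n2=$(count_reads "$2")
    if [ "$n1" -ne "$n2" ]; then
        echo "read count mismatch: $1 has $n1, $2 has $n2" >&2
        return 1
    fi
    echo "$n1 read pairs"
}
```

For example, `check_pairs forward_reads.fastq reverse_reads.fastq` prints the number of pairs or fails with a message on standard error. Equal counts do not prove that read names match in order; they only catch truncated or mismatched files.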
Input Quality Considerations
Read Quality Metrics
| Metric | Recommendation | Minimum Threshold |
|---|---|---|
| Read Length | ≥ 75 base pairs | 50 base pairs |
| Quality Encoding | Phred+33 or Phred+64 | - |
| Average Quality Score | ≥ Q30 | Q20 |
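The metrics in the table can be estimated directly from a FASTQ file. The sketch below is one way to do it with awk, assuming an uncompressed four-line FASTQ; the function name `fastq_stats` is hypothetical. The quality offset is only guessed: any character below ASCII 59 rules out Phred+64, but a Phred+33 file containing nothing but very high qualities would be misread as Phred+64, so treat the result as a hint.

```shell
# fastq_stats FILE: print read count, minimum and mean read length,
# a guessed quality offset, and the mean base quality under that offset.
fastq_stats() {
    awk '
        BEGIN { for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i; minq = 127 }
        NR % 4 == 2 {
            n++; len = length($0); bases += len
            if (n == 1 || len < minlen) minlen = len
        }
        NR % 4 == 0 {
            for (i = 1; i <= length($0); i++) {
                q = ord[substr($0, i, 1)]
                qsum += q
                if (q < minq) minq = q
            }
        }
        END {
            if (n == 0 || bases == 0) { print "no reads"; exit 1 }
            # Heuristic: characters below ASCII 59 cannot occur in Phred+64 data.
            offset = (minq < 59) ? 33 : 64
            printf "reads=%d min_len=%d mean_len=%.1f offset=%d mean_q=%.1f\n", n, minlen, bases / n, offset, qsum / bases - offset
        }
    ' "$1"
}
```

Running `fastq_stats reads.fastq` gives a single summary line that can be compared against the recommended and minimum values above. Dedicated tools report far more detail; this is only a quick first look.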
Preprocessing Recommendations
- Quality Trimming: remove low-quality bases from read ends. Tools: Trimmomatic, fastp, Cutadapt.
- Adapter Removal: remove sequencing adapters with tools such as Cutadapt or AdapterRemoval.
- Read Filtering: remove short reads, reads with excessive N bases, and reads with low complexity.
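To make the read-filtering step concrete, here is a minimal awk sketch that drops reads below a length cutoff or with too many N bases. It assumes a single-end, uncompressed four-line FASTQ; the function name `filter_reads` and its argument order are hypothetical. It should not be run separately on the two files of a paired-end library, because that would break the pairing; the pair-aware tools named above are the better choice there.

```shell
# filter_reads FILE MIN_LEN MAX_N: print records whose sequence is at least
# MIN_LEN bases long and contains at most MAX_N ambiguous (N) bases.
filter_reads() {
    awk -v min_len="$2" -v max_n="$3" '
        { rec[(NR - 1) % 4] = $0 }
        NR % 4 == 0 {
            seq = rec[1]
            n_count = gsub(/[Nn]/, "", seq)   # gsub returns the number of N bases removed from the copy
            if (length(rec[1]) >= min_len && n_count <= max_n)
                printf "%s\n%s\n%s\n%s\n", rec[0], rec[1], rec[2], rec[3]
        }
    ' "$1"
}
```

A call such as `filter_reads single_reads.fastq 50 2 > single_reads.filtered.fastq` keeps reads of 50 bases or more with no more than two Ns; both thresholds are illustrative values. Low-complexity filtering is not covered by this sketch.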
Input Preparation Workflow Example
# Example preprocessing pipeline
# 1. Quality and adapter trimming
trimmomatic PE input_R1.fastq input_R2.fastq \
output_R1_paired.fastq output_R1_unpaired.fastq \
output_R2_paired.fastq output_R2_unpaired.fastq \
ILLUMINACLIP:adapters.fa:2:30:10 \
LEADING:3 TRAILING:3 MINLEN:75
# 2. SPAdes assembly
spades.py -1 output_R1_paired.fastq \
-2 output_R2_paired.fastq \
--careful -o assembly_output
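The pipeline above discards the reads that lost their mate during trimming. SPAdes accepts such reads for a paired-end library through the `--pe1-s` option, so one possible extension is to concatenate the two unpaired files and pass them along. The sketch below only prints the command instead of running it; the function name `build_spades_cmd` and the file name `output_unpaired.fastq` are assumptions made for illustration.

```shell
# build_spades_cmd R1 R2 UNPAIRED OUTDIR: print a spades.py command line.
# The unpaired file is included only when it exists and is non-empty.
build_spades_cmd() {
    r1=$1; r2=$2; unpaired=$3; outdir=$4
    set -- spades.py --pe1-1 "$r1" --pe1-2 "$r2"
    if [ -s "$unpaired" ]; then
        set -- "$@" --pe1-s "$unpaired"
    fi
    set -- "$@" --careful -o "$outdir"
    echo "$*"
}

# Example use with the Trimmomatic outputs named above (left commented out):
# cat output_R1_unpaired.fastq output_R2_unpaired.fastq > output_unpaired.fastq
# build_spades_cmd output_R1_paired.fastq output_R2_paired.fastq output_unpaired.fastq assembly_output
```

Printing the command first makes it easy to record in a preprocessing log before executing it. File names containing spaces would need quoting before the printed line is reused.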
Recommended Workflow
- Retain original raw sequencing data
- Document all preprocessing steps
- Use compressed input files
- Validate input quality before assembly
- Choose appropriate assembly mode
Emerging Techniques
- Machine learning-assisted read filtering
- Adaptive k-mer size selection
- Enhanced metagenomic assembly algorithms
Citation and usage
Prjibelski, Andrey, et al. “Using SPAdes De Novo Assembler.” Current Protocols in Bioinformatics, vol. 70, no. 1, June 2020, https://doi.org/10.1002/cpbi.102.