SPAdes Assembler Toolkit
The SPAdes toolkit is a powerful bioinformatics tool for genome assembly from next-generation sequencing data. It is widely used for bacterial genomes and is compatible with Illumina and other sequencing platforms. This guide provides a detailed installation tutorial, usage examples, and tips to maximize efficiency.
SPAdes overview
SPAdes (St. Petersburg genome assembler) is primarily developed for Illumina sequencing data but can be used for IonTorrent as well. Most SPAdes pipelines support a hybrid mode, i.e. they allow long reads (PacBio and Oxford Nanopore) to be used as supplementary data. The package enables the assembly of bacterial isolates, single-cell genomes, metagenomes, and transcriptomes, and features specialized modules for plasmid and RNA virus recovery. It integrates k-mer-based algorithms for read processing, graph manipulation, and sequence alignment, supporting diverse genomic analyses through a modular pipeline architecture [1].
Supported Sequencing Platforms
- Illumina (MiSeq, HiSeq, NovaSeq)
- Ion Torrent
- 454 Roche
- PacBio and Oxford Nanopore (as supplementary long reads in hybrid mode)
Installation
SPAdes requires a 64-bit Linux system or Mac OS and Python (3.8 or higher) to be pre-installed on it. To obtain SPAdes you can either download binaries or download source code and compile it yourself.
Supported Operating Systems
- Ubuntu 20.04 LTS and later
- CentOS/RHEL 8.x
- macOS 10.15 (Catalina) and later
- Windows 10/11 with Windows Subsystem for Linux (WSL2)
Dependencies
- Python 3.8 or later (Linux systems typically include Python)
- CMake 3.16 or later (only needed when compiling from source)
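Before installing, it can be useful to confirm that the local Python meets the 3.8 minimum stated in the installation requirements. The following is a minimal sketch, not part of SPAdes: `version_ge` is a hypothetical helper that compares dotted numeric versions, and the check only reports what it finds.

```bash
#!/usr/bin/env bash
# version_ge A B: succeed when dotted version A is >= B (numeric fields only).
# Hypothetical helper for illustration; it is not shipped with SPAdes.
version_ge() {
    local IFS=.
    local -a a=($1) b=($2)
    local i x y
    for ((i = 0; i < ${#a[@]} || i < ${#b[@]}; i++)); do
        x=${a[i]:-0}
        y=${b[i]:-0}
        if ((10#$x > 10#$y)); then return 0; fi
        if ((10#$x < 10#$y)); then return 1; fi
    done
    return 0
}

# Report whether python3 on PATH satisfies the 3.8 minimum.
if command -v python3 >/dev/null 2>&1; then
    py_version=$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')
    if version_ge "$py_version" "3.8"; then
        echo "Python $py_version: OK"
    else
        echo "Python $py_version: too old, SPAdes needs 3.8 or higher"
    fi
else
    echo "python3 not found on PATH"
fi
```

The same helper could be reused for other minimum versions, for example comparing the output of `cmake --version` against 3.16 when building from source.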
Conda Installation (Recommended)
```bash
# Create a new conda environment
conda create -n spades_env python=3.8
# Activate the environment
conda activate spades_env
# Install SPAdes
conda install -c bioconda spades
```
Downloading SPAdes Linux binaries
```bash
# Replace the URL with the latest release version.
wget https://github.com/ablab/spades/releases/download/v4.0.0/SPAdes-4.0.0-Linux.tar.gz
tar -xzf SPAdes-4.0.0-Linux.tar.gz
cd SPAdes-4.0.0-Linux/bin/
./spades.py --help
# Add the bin directory to PATH (optional)
export PATH=$PATH:/path/to/SPAdes-4.0.0-Linux/bin
```
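Since the download URL embeds the release number in several places, one way to switch releases is to derive everything from a single variable. This is a sketch under an assumption: that other releases follow the same naming pattern as the 4.0.0 URL used in this guide. `SPADES_VERSION` is a hypothetical variable name, and the script only prints the values rather than downloading anything.

```bash
# Hypothetical variable: set this to the release you want to install.
SPADES_VERSION="4.0.0"

# Assumption: the archive and URL follow the same pattern as the 4.0.0 release.
archive="SPAdes-${SPADES_VERSION}-Linux.tar.gz"
url="https://github.com/ablab/spades/releases/download/v${SPADES_VERSION}/${archive}"

echo "Download URL: ${url}"
echo "Unpacked bin directory: ${archive%.tar.gz}/bin"
# To download and unpack: wget "$url" && tar -xzf "$archive"
```

Checking the releases page before running `wget` remains the safest way to confirm that the archive name is correct for a given version.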
You can also compile SPAdes from source (requires g++ 9.0+, cmake 3.16+, zlib and libbz2).
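```bash
# Prerequisites (Debian/Ubuntu package names, including zlib and libbz2)
sudo apt-get install cmake gcc g++ python3-dev zlib1g-dev libbz2-dev
# Clone the SPAdes repository
git clone https://github.com/ablab/spades.git
cd spades
# Configure and compile
# (if the repository root has no CMakeLists.txt, pass the directory that contains it to cmake)
mkdir build && cd build
cmake ..
make
./bin/spades.py --help
```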
Example of Assembling RNA-Seq data
rnaSPAdes is a transcriptome assembly tool for eukaryotic and prokaryotic short reads. It supports paired-end, single-end, and hybrid assemblies with PacBio/Nanopore reads. Its limitations include no support for the --careful and --cov-cutoff options, constraints on which pipeline modes can be combined, and automatic k-mer size selection intended to prevent chimeric transcripts [1].
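Because `--careful` and `--cov-cutoff` are not available in RNA mode, a wrapper script may want to reject them before launching a long job. The sketch below shows one way to do that; `check_rna_args` is a hypothetical helper, not a SPAdes feature, and it only inspects the two options named above.

```bash
# check_rna_args ARGS...: fail when --rna is combined with an option
# that the text above lists as unsupported in RNA mode.
# Hypothetical helper for illustration only.
check_rna_args() {
    local rna=0 arg
    local -a bad=()
    for arg in "$@"; do
        case "$arg" in
            --rna) rna=1 ;;
            --careful|--cov-cutoff) bad+=("$arg") ;;
        esac
    done
    if ((rna)) && ((${#bad[@]} > 0)); then
        echo "Not supported with --rna: ${bad[*]}" >&2
        return 1
    fi
    return 0
}

# Example: this argument list passes the check.
check_rna_args --rna -1 reads_1.fastq -2 reads_2.fastq && echo "arguments look fine"
```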
SPAdes command line
```bash
# Paths containing spaces must be quoted, and each continuation
# backslash needs a space before it.
spades.py --rna \
  -1 "/media/kashmir/HP P900/Main/deduplicated_1.fastq" \
  -2 "/media/kashmir/HP P900/Main/deduplicated_2.fastq" \
  -m 60 -t 32 -k 33,55,77,99,127 \
  -o "/media/kashmir/HP P900/Main/spades_output"
```
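Input paths that contain spaces are a common source of broken command lines. One way to keep them intact is to build the command as a bash array, where each element is passed as exactly one argument. The sketch below uses hypothetical paths and resource values (`data_dir`, `threads`, `memory_gb`) and only prints the command as a dry run.

```bash
# Hypothetical locations; the space in the directory name is deliberate.
data_dir="/data/my project"
threads=8
memory_gb=32

# Each array element is one argument, so spaces inside paths are preserved.
cmd=(spades.py --rna
     -1 "${data_dir}/reads_1.fastq"
     -2 "${data_dir}/reads_2.fastq"
     -t "$threads" -m "$memory_gb"
     -o "${data_dir}/spades_output")

# Dry run: print the command with shell quoting instead of executing it.
printf '%q ' "${cmd[@]}"
printf '\n'

# To run the assembly for real: "${cmd[@]}"
```

The array form also makes it easy to append optional arguments conditionally without re-quoting the whole line.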
SPAdes params.txt
```text
System information:
  SPAdes version: 3.13.1
  Python version: 3.10.12
  OS: Linux-6.8.0-49-generic-x86_64-with-glibc2.35
Output dir: /media/kashmir/HP P900/Main/spades_output
Mode: ONLY assembling (without read error correction)
Debug mode is turned OFF
Dataset parameters:
  RNA-seq mode
  Reads:
    Library number: 1, library type: paired-end
      orientation: fr
      left reads: ['/media/kashmir/HP P900/Main/deduplicated_1.fastq']
      right reads: ['/media/kashmir/HP P900/Main/deduplicated_2.fastq']
      interlaced reads: not specified
      single reads: not specified
      merged reads: not specified
Assembly parameters:
  k: [33, 55, 77, 99, 127]
  Repeat resolution is enabled
  Mismatch careful mode is turned OFF
  MismatchCorrector will be SKIPPED
  Coverage cutoff is turned OFF
Other parameters:
  Dir for temp files: /media/kashmir/HP P900/Main/spades_output/tmp
  Threads: 32
  Memory limit (in Gb): 60
```
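When comparing several runs, it can be handy to pull individual settings out of `params.txt`. The following sketch assumes the `key: value` layout shown above; `param_value` is a hypothetical helper and the `spades_output` path is a placeholder. Keys containing a `/` would need escaping and are not handled here.

```bash
# param_value FILE KEY: print the text after "KEY:" on the first matching line.
# Hypothetical helper; assumes the "key: value" layout of params.txt shown above.
param_value() {
    local file=$1 key=$2
    sed -n "s/^[[:space:]]*${key}:[[:space:]]*//p" "$file" | head -n 1
}

params="spades_output/params.txt"   # placeholder path, adjust to your run
if [ -f "$params" ]; then
    echo "k-mer sizes:  $(param_value "$params" "k")"
    echo "Threads:      $(param_value "$params" "Threads")"
    echo "Memory limit: $(param_value "$params" "Memory limit (in Gb)") Gb"
fi
```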
Look for key files
```bash
output_dir/
├── corrected/            # Error-corrected reads
├── scaffolds.fasta       # Final scaffolds
├── contigs.fasta         # Final contigs
├── assembly_graph.fastg  # Assembly graph in FASTG format
├── contigs.paths         # Paths in the assembly graph
├── scaffolds.paths       # Scaffold paths
├── params.txt            # Parameters used
└── spades.log            # Log file
```
> The output contigs.fasta should contain high-quality, assembled contigs in FASTA format. Further validation with tools like QUAST is recommended to confirm assembly accuracy.
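After a run, a quick check that the key files exist and are non-empty can catch a failed or interrupted assembly early. This is a minimal sketch: `missing_outputs` is a hypothetical helper, `spades_output` is a placeholder directory, and the file list is taken from the tree above.

```bash
# missing_outputs DIR FILE...: print each FILE that is absent or empty in DIR.
# Hypothetical helper for illustration only.
missing_outputs() {
    local dir=$1 f
    shift
    for f in "$@"; do
        [ -s "${dir}/${f}" ] || echo "$f"
    done
}

out_dir="spades_output"   # placeholder, adjust to your run
if [ -d "$out_dir" ]; then
    missing_outputs "$out_dir" \
        contigs.fasta scaffolds.fasta assembly_graph.fastg params.txt spades.log
fi
```

An empty result means every listed file is present and has content. Anything printed is worth looking up in `spades.log`.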
The end of the log file
```bash
===== Assembling finished. Used k-mer sizes: 33, 55, 77, 99, 127
* Assembled transcripts are in "/media/kashmir/HP P900/Main/spades_output/transcripts.fasta"
* Paths in the assembly graph corresponding to the transcripts are in "/media/kashmir/HP P900/Main/spades_output/transcripts.paths"
* Hard filtered transcripts are in "/media/kashmir/HP P900/Main/spades_output/hard_filtered_transcripts.fasta"
* Soft filtered transcripts are in "/media/kashmir/HP P900/Main/spades_output/soft_filtered_transcripts.fasta"
* Assembly graph is in "/media/kashmir/HP P900/Main/spades_output/assembly_graph.fastg"
* Assembly graph in GFA format is in "/media/kashmir/HP P900/Main/spades_output/assembly_graph_with_scaffolds.gfa"
======= SPAdes pipeline finished.
SPAdes log can be found here: /media/kashmir/HP P900/Main/spades_output/spades.log
Thank you for using SPAdes!
```
### rnaSPAdes output
rnaSPAdes generates multiple output files:
- `transcripts.fasta`: Main output file (recommended for most projects)
- `hard_filtered_transcripts.fasta`: Long, reliable, high-expression transcripts
- `soft_filtered_transcripts.fasta`: Short, low-expression transcripts
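To see how the three transcript files differ in size, one option is to count the FASTA records in each. The sketch below assumes a placeholder output directory named `spades_output`; `count_records` is a hypothetical helper that counts header lines.

```bash
# count_records FASTA: print the number of sequences (lines starting with ">").
# grep -c exits non-zero when the count is 0, so "|| true" keeps that case quiet.
count_records() {
    grep -c '^>' "$1" || true
}

out_dir="spades_output"   # placeholder, adjust to your run
for name in transcripts hard_filtered_transcripts soft_filtered_transcripts; do
    fasta="${out_dir}/${name}.fasta"
    if [ -f "$fasta" ]; then
        echo "${name}: $(count_records "$fasta") sequences"
    fi
done
```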
Contig names follow the format `>NODE_97_length_6237_cov_11.9819_g8_i2`, where the components are the node number (97), length (6237), coverage (11.9819), gene group (g8), and transcript index within that gene (i2).
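The naming scheme can be split into its fields with a bash regular expression, which is useful when filtering transcripts by length or coverage. This is a sketch based only on the example name above; `parse_header` is a hypothetical helper, and names that deviate from this pattern are rejected.

```bash
# parse_header HEADER: split an rnaSPAdes-style name into its fields.
# Returns non-zero when the name does not match the expected pattern.
parse_header() {
    local re='^>?NODE_([0-9]+)_length_([0-9]+)_cov_([0-9.]+)_g([0-9]+)_i([0-9]+)$'
    if [[ $1 =~ $re ]]; then
        printf 'node=%s length=%s coverage=%s gene=%s isoform=%s\n' \
            "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}" \
            "${BASH_REMATCH[4]}" "${BASH_REMATCH[5]}"
    else
        return 1
    fi
}

parse_header ">NODE_97_length_6237_cov_11.9819_g8_i2"
# prints: node=97 length=6237 coverage=11.9819 gene=8 isoform=2
```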
Feedback and bug reports
Please leave your comments and bug reports at the SPAdes GitHub repository tracker.
References
1. Prjibelski, Andrey, et al. “Using SPAdes De Novo Assembler.” Current Protocols in Bioinformatics, vol. 70, no. 1, June 2020, https://doi.org/10.1002/cpbi.102.