In this study, we present a comprehensive benchmark to systematically evaluate the performance of existing spatial omics integration methods. By providing a structured comparison and scenario-specific recommendations, our benchmark aims to guide researchers in selecting appropriate methods for spatial omics data analysis, facilitating robust and reproducible integrative studies.
In total, our benchmarking study includes 58 experimental datasets and 60 simulated datasets, derived from 33 distinct source datasets, and systematically assesses 37 integration methods.
| Method | Version | GitHub Link |
|---|---|---|
| STAGATE | v1.0.0 | Link |
| DeepST | github (1daa513) | Link |
| GraphST | v1.1.1 | Link |
| PRECAST | v1.6.5 | Link |
| STG3Net | github (c94bff4) | Link |
| spatiAlign | v1.0.2 | Link |
| SEDR | github (ef48360) | Link |
| CellCharter | v0.3.5 | Link |
| BANKSY | v1.4.0 | Link |
| STADIA | v1.0.1 | Link |
| SpaDo | v1.2.0 | Link |
| STAMP | v0.1.3 | Link |
| spaVAE | github (3673cad) | Link |
| spCLUE | github (bbd2c34) | Link |
| Tacos | github (843e50b) | Link |
| stClinic | v0.0.10 | Link |
| NicheCompass | v0.2.3 | Link |
| STAligner | v1.0.0 | Link |
| INSPIRE | github (12b7516) | Link |
| stMSA | github (221730c) | Link |
| SPIRAL | v1.0 | Link |
| STitch3D | v1.0.3 | Link |
| SPACEL | v1.1.8 | Link |
| SANTO | github (b82a7b9) | Link |
| PASTE | v1.4.0 | Link |
| PASTE2 | v1.0.1 | Link |
| scSLAT | v0.3.0 | Link |
| Spateo | v1.1.0 | Link |
| moscot | v0.3.5 | Link |
| STalign | v1.0 | Link |
| GPSA | v0.8 | Link |
| CAST | v0.4 | Link |
| Method | Version | GitHub Link |
|---|---|---|
| SpatialGlue | v1.1.5 | Link |
| SMOPCA | v0.1.1 | Link |
| COSMOS | v1.0.1 | Link |
| spaMGCN | github (77dfe67) | Link |
| SSGATE | github (706bc56) | Link |
All method-specific software environments are packaged as Singularity image files (.sif), ensuring consistent and reproducible execution across different computing nodes without manual dependency management. The .sif files can be obtained from [link].
To run the pipeline, Nextflow and Singularity must be available on your system:
```bash
# Check Nextflow installation
nextflow -version

# Check Singularity installation
singularity --version
```
The benchmarking pipeline is implemented using Nextflow (DSL2), enabling reproducible, scalable, and modular execution across high-performance computing environments. All methods are containerized via Singularity, ensuring consistent software environments across different computing nodes.
The pipeline is organized into three main stages:

1. Data Loading (`LoadDatasets`): Preprocesses and filters input AnnData objects according to the specified gene selection strategy.
2. Method Execution (`Run<Method>`): Each integration method runs as an independent process with method-specific resource configurations (CPU/GPU, memory, time limits).
3. Evaluation (`RunEvaluation`): Computes a comprehensive set of benchmarking metrics for each method output.
GPU-accelerated methods are submitted to dedicated GPU partitions with appropriate SLURM directives. Memory and time allocations are automatically adjusted based on dataset size, with large-scale datasets (e.g., >10,000 spots) receiving elevated resource budgets. Failed tasks are logged and skipped without interrupting the overall pipeline execution.
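As an illustration, this size-based scaling can be expressed as a simple tiered lookup. The >10,000-spot threshold follows the text above, but the specific memory and time budgets below are hypothetical, not the pipeline's actual configuration:

```python
# Hypothetical sketch of size-based resource tiering; the >10,000-spot
# threshold is from the text, but the budgets themselves are invented.
def resource_budget(n_spots: int) -> dict:
    """Return illustrative SLURM-style memory/time budgets by dataset size."""
    if n_spots > 10_000:  # large-scale datasets receive elevated budgets
        return {"memory": "128 GB", "time": "48h"}
    return {"memory": "32 GB", "time": "12h"}
```

In the actual pipeline, the equivalent logic lives in Nextflow process directives rather than Python.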
For methods with tunable hyperparameters (e.g., `knn`, `pcs`), the pipeline supports systematic parameter sweeps defined via `param_space` in the configuration file. When pre-optimized parameters are available (stored in `best_params.csv`), the pipeline automatically selects the optimal configuration for each method-dataset combination, bypassing the full search.
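A minimal sketch of this selection logic, assuming `best_params.csv` stores one row per method-dataset pair (the column names `method`, `dataset`, `knn`, and `pcs` are assumptions about the file's layout):

```python
import csv
import io

# Sketch of pre-optimized parameter lookup; the column names are
# assumptions about the layout of best_params.csv.
def load_best_params(csv_text: str) -> dict:
    """Index pre-optimized hyperparameters by (method, dataset)."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["method"], row["dataset"])
        best[key] = {"knn": int(row["knn"]), "pcs": int(row["pcs"])}
    return best

example = "method,dataset,knn,pcs\nSTAGATE,DLPFC,15,30\n"
params = load_best_params(example)
assert params[("STAGATE", "DLPFC")] == {"knn": 15, "pcs": 30}
```

Method-dataset pairs absent from the lookup would fall back to the full `param_space` sweep.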
```groovy
param_space = [
    seed: [42, 101, 123, 456, 789],
    knn: [5, 10, 15, 20, 30],
    pcs: [10, 20, 30, 50, 100, 150, 200],
    gene: ['all']
]
```
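The full grid is the Cartesian product of these value lists. The Python enumeration below mirrors the Groovy map above to show the sweep size per method-dataset pair:

```python
from itertools import product

# Mirror of the param_space above; the sweep enumerates every combination.
param_space = {
    "seed": [42, 101, 123, 456, 789],
    "knn": [5, 10, 15, 20, 30],
    "pcs": [10, 20, 30, 50, 100, 150, 200],
    "gene": ["all"],
}
combos = list(product(*param_space.values()))
assert len(combos) == 5 * 5 * 7 * 1  # 175 runs per method-dataset pair
```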
All random seeds are fixed at both the Python and R levels to ensure reproducibility across runs. Nextflow's built-in caching mechanism allows interrupted pipelines to resume from the last completed task without redundant computation.
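On the Python side, seed fixing of this kind typically covers the standard library, NumPy, and (where installed) PyTorch; the sketch below illustrates the pattern, with `fix_seeds` being a hypothetical helper rather than the pipeline's actual function. The R-level equivalent (`set.seed()`) is applied separately inside the R-based containers:

```python
import random

import numpy as np

def fix_seeds(seed: int = 42) -> None:
    """Fix Python-level random seeds for reproducible runs (illustrative)."""
    random.seed(seed)
    np.random.seed(seed)
    try:  # torch is only present in the GPU method containers
        import torch
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass
```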
