SOI_Bench

Charting the Landscape of Spatial Omics Integration: A Comprehensive Benchmarking Study

Penghui Yang, Xiang Zhou*

In this study, we present a comprehensive benchmark to systematically evaluate the performance of existing spatial omics integration methods. By providing a structured comparison and scenario-specific recommendations, our benchmark aims to guide researchers in selecting appropriate methods for spatial omics data analysis, facilitating robust and reproducible integrative studies.


Methods information

In total, our benchmarking study includes 58 experimental datasets and 60 simulated datasets, derived from 33 distinct source datasets, and systematically assesses a total of 37 integration methods.

Multi-slice Integration

Method Version GitHub Link
STAGATE v1.0.0 Link
DeepST github (1daa513) Link
GraphST v1.1.1 Link
PRECAST v1.6.5 Link
STG3Net github (c94bff4) Link
spatiAlign v1.0.2 Link
SEDR github (ef48360) Link
CellCharter v0.3.5 Link
BANKSY v1.4.0 Link
STADIA v1.0.1 Link
SpaDo v1.2.0 Link
STAMP v0.1.3 Link
spaVAE github (3673cad) Link
spCLUE github (bbd2c34) Link
Tacos github (843e50b) Link
stClinic v0.0.10 Link
NicheCompass v0.2.3 Link
STAligner v1.0.0 Link
INSPIRE github (12b7516) Link
stMSA github (221730c) Link
SPIRAL v1.0 Link
STitch3D v1.0.3 Link
SPACEL v1.1.8 Link
SANTO github (b82a7b9) Link
PASTE v1.4.0 Link
PASTE2 v1.0.1 Link
scSLAT v0.3.0 Link
Spateo v1.1.0 Link
moscot v0.3.5 Link
STalign v1.0 Link
GPSA v0.8 Link
CAST v0.4 Link

Spatial Multi-omics Integration

Method Version GitHub Link
SpatialGlue v1.1.5 Link
SMOPCA v0.1.1 Link
COSMOS v1.0.1 Link
spaMGCN github (77dfe67) Link
SSGATE github (706bc56) Link

Requirements and Installation

All method-specific software environments are packaged as Singularity image files (.sif), ensuring consistent and reproducible execution across different computing nodes without manual dependency management. The .sif files can be obtained from [link].

To run the pipeline, Nextflow and Singularity must be available on your system:

# Check Nextflow installation
nextflow -version

# Check Singularity installation
singularity --version

Evaluation Pipeline

Workflow Management

The benchmarking pipeline is implemented using Nextflow (DSL2), enabling reproducible, scalable, and modular execution across high-performance computing environments. All methods are containerized via Singularity, ensuring consistent software environments across different computing nodes.

The pipeline is organized into three main stages:

  1. Data Loading (LoadDatasets): Preprocesses and filters input AnnData objects according to the specified gene selection strategy.

  2. Method Execution (Run<Method>): Each integration method runs as an independent process with method-specific resource configurations (CPU/GPU, memory, time limits).

  3. Evaluation (RunEvaluation): Computes a comprehensive set of benchmarking metrics for each method output.
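The three stages above can be sketched as a minimal Nextflow DSL2 workflow. The process names `LoadDatasets` and `RunEvaluation` follow the text; `RunSTAGATE` stands in for one `Run<Method>` process, and the script names, file patterns, and channel wiring are illustrative assumptions, not the repository's actual implementation.

```nextflow
// Minimal DSL2 sketch of the three-stage pipeline layout (illustrative only).
nextflow.enable.dsl = 2

process LoadDatasets {
    input:
    path adata

    output:
    path 'filtered.h5ad'

    script:
    "filter_genes.py --input ${adata} --output filtered.h5ad"  // hypothetical script
}

process RunSTAGATE {   // one such process exists per benchmarked method
    input:
    path adata

    output:
    path 'embedding.h5ad'

    script:
    "run_stagate.py --input ${adata} --output embedding.h5ad"  // hypothetical script
}

process RunEvaluation {
    input:
    path result

    output:
    path 'metrics.csv'

    script:
    "evaluate.py --input ${result} --output metrics.csv"       // hypothetical script
}

workflow {
    datasets = Channel.fromPath("${params.input_dir}/*.h5ad")
    RunEvaluation(RunSTAGATE(LoadDatasets(datasets)))
}
```

Because each `Run<Method>` step is an independent process, methods can be added or dropped without touching the loading or evaluation stages.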

Resource Configuration

GPU-accelerated methods are submitted to dedicated GPU partitions with appropriate SLURM directives. Memory and time allocations are automatically adjusted based on dataset size, with large-scale datasets (e.g., >10,000 spots) receiving elevated resource budgets. Failed tasks are logged and skipped without interrupting the overall pipeline execution.
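A process-scope configuration of this kind might look as follows in `nextflow.config`. This is a sketch under the assumption of a SLURM executor; the label name, queue, memory thresholds, and `params.n_spots` parameter are hypothetical.

```nextflow
// Illustrative resource configuration (not the repository's actual config).
process {
    executor = 'slurm'

    // GPU-accelerated methods are tagged and routed to a GPU partition.
    withLabel: 'gpu' {
        queue          = 'gpu'
        clusterOptions = '--gres=gpu:1'
    }

    // Elevated budgets for large datasets (e.g., >10,000 spots).
    memory = { params.n_spots > 10000 ? 128.GB : 64.GB }
    time   = { params.n_spots > 10000 ? 48.h   : 12.h  }

    // Log failed tasks and continue instead of aborting the pipeline.
    errorStrategy = 'ignore'
}
```

The `errorStrategy = 'ignore'` directive is what allows individual method failures to be recorded without interrupting the overall run.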

Parameter Search

For methods with tunable hyperparameters (e.g., knn, pcs), the pipeline supports systematic parameter sweeps defined via param_space in the configuration file. When pre-optimized parameters are available (stored in best_params.csv), the pipeline automatically selects the optimal configuration for each method-dataset combination, bypassing the full search.

param_space = [
    seed: [42, 101, 123, 456, 789],
    knn:  [5, 10, 15, 20, 30],
    pcs:  [10, 20, 30, 50, 100, 150, 200],
    gene: ['all']
]
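The bypass step could be expressed with Nextflow's CSV operators, for example as below. The column names (`method`, `dataset`, `knn`, `pcs`) are assumptions about the layout of `best_params.csv`, not its documented schema.

```nextflow
// Hypothetical sketch: read pre-optimized parameters and emit one tuple per
// method-dataset pair, so the full sweep can be skipped when an entry exists.
best_params = Channel
    .fromPath('best_params.csv')
    .splitCsv(header: true)
    .map { row ->
        tuple(row.method, row.dataset, [knn: row.knn as int, pcs: row.pcs as int])
    }
```

Joining this channel against the method-dataset channel (e.g., with the `join` operator) would then select the optimal configuration wherever one has been recorded.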

Reproducibility

All random seeds are fixed at both the Python and R levels to ensure reproducibility across runs. Nextflow's built-in caching mechanism allows interrupted pipelines to resume from the last completed task without redundant computation.
