`bayesVG`

bayesVG
Installation
- R package
- Stan-related dependencies
Usage
Contact information
References

`bayesVG`

Installation

R package

You can install the most recent version of bayesVG using:

remotes::install_github("jr-leary7/bayesVG")

Stan-related dependencies

Two primary dependencies of bayesVG are the CmdStan software and the cmdstanr R package. These must be installed manually like so, making sure to compile the underlying C++ code with the appropriate optimizations:

install.packages("cmdstanr", repos = c("https://stan-dev.r-universe.dev", getOption("repos")))
library(cmdstanr)
install_cmdstan(cores = 4L,
                overwrite = TRUE,
                cpp_options = list("CXXFLAGS += -O3 -march=native -mtune=native -ffast-math"), 
                check_toolchain = FALSE)
check_cmdstan_toolchain()

Usage

Libraries

library(Seurat)
library(bayesVG)

HVG detection

Data

First, we load the 10X Genomics pbmc3k dataset, which is composed of 2,700 peripheral blood mononuclear cells from a single healthy donor.

data("seu_pbmc")

Modeling

Now we’re able to model gene expression using the meanfield variational inference (VI) algorithm, summarize the approximate posterior distribution of variance for each gene, and classify the top 3,000 most-variable genes as HVGs. The findVariableFeaturesBayes() function can take as input either a Seurat or a SingleCellExperiment object.

seu_pbmc <- findVariableFeaturesBayes(seu_pbmc, 
                                      n.cells.subsample = 500L, 
                                      algorithm = "meanfield",
                                      save.model = TRUE) %>% 
            classifyHVGs(n.HVG = 3000L)

We can generate the summary table (which is sorted by default) and extract the HVGs like so. These genes can then be used as the basis for downstream analyses such as PCA, clustering, UMAP visualization, etc.

summary_hvg <- getBayesianGeneStats(seu_pbmc)
top3k_hvgs <- summary_hvg$gene[1:3000]

The extractModel() function pulls out the fitted brms model from the object’s unstructured metadata, allowing the user to perform posterior predictive checks (PPCs) and other diagnostics. Note: this requires that the HVG identification function be run with save.model = TRUE.

hvg_model <- extractModel(seu_pbmc)

SVG detection

Data

First, we load the 10X Genomics anterior mouse brain dataset that we’ve included in the package.

data("seu_brain")

Before running bayesVG for SVG detection it’s necessary to normalize the expression data and identify a set of 3,000 naive HVGs.

seu_brain <- NormalizeData(seu_brain, verbose = FALSE)
naive_hvgs <- getNaiveHVGs(seu_brain, n.hvg = 3000L)

Modeling

Now we can model gene expression with an approximate multivariate hierarchical Gaussian process (GP) via the meanfield VI algorithm, summarize the spatial component of variance for each gene, and classify the top 1,000 most spatially variable genes as SVGs. The findSpatiallyVariableFeaturesBayes() function can take as input either a Seurat or a SpatialExperiment object.

seu_brain <- findSpatiallyVariableFeaturesBayes(seu_brain, 
                                                naive.hvgs = naive_hvgs, 
                                                likelihood = "gaussian", 
                                                kernel = "matern", 
                                                kernel.smoothness = 2.5, 
                                                algorithm = "meanfield", 
                                                n.cores = 4L, 
                                                save.model = TRUE) %>% 
             classifySVGs(n.SVG = 1000L)

We can extract the summary table (which like the HVG summary table is sorted by default) and extract the top 1,000 SVGs like so. These genes can then be used as the basis for downstream analyses such as PCA, spatial clustering, UMAP visualization, etc.

summary_svg <- getBayesianGeneStats(seu_brain)
top1k_svgs <- summary_svg$gene[1:1000]

Since we ran the SVG identification model function with save.model = TRUE, we can extract the cmdstanr fit like so:

svg_model <- extractModel(seu_brain)

We can cluster the SVG set (using a very fast modified Bayesian Gaussian mixture model) into spatial modules as shown below. The clustering function returns a PCA embedding of the SVGs, a table of the soft cluster assignment probabilities, and the log-likelihood and Bayesian information criterion (BIC) of the clustering. The clustering code uses the fullrank VI algorithm by default due to the multi-modality and high correlations of the approximate posterior distribution.

svg_clusters <- clusterSVGsBayes(seu_brain, 
                                 svgs = top1k_svgs, 
                                 n.clust = 5L, 
                                 n.cores = 4L)

Next, we can score the SVG clusters using UCell under the hood. These scores can then be visualized using e.g., spatial scatterplots, violin plots, or UMAPs.

seu_brain <- scoreSpatialModules(seu_brain, svg.clusters = svg_clusters)

Lastly, with our spatial modules identified and scores we can perform gene set enrichment analysis (GSEA) on each module; this is done internally using gProfiler2. All you need to do is provide the results from clusterSVGsBayes() and specify the correct genus & species e.g., “hsapiens” for human or “mmusculus” for mouse.

enrich_res <- enrichSpatialModules(svg_clusters, species = "mmusculus")

Contact information

This package is developed & maintained by Jack R. Leary. Feel free to reach out by opening an issue or by email (j.leary@ufl.edu) if more detailed assistance is needed.

References

Jordan, M. et al. An Introduction to Variational Methods for Graphical Models. Machine Learning (1999).
Burkner, P. Bayesian Item Response Modeling in R with brms and Stan. Journal of Statistical Software (2021).
Gabry, J. et al. cmdstanr: R Interface to CmdStan. (2024).
Rasmussen, C. and Williams, C. Gaussian Processes for Machine Learning. The MIT Press (2005).
Riutort-Mayol, G. et al. Practical Hilbert Space Approximate Bayesian Gaussian Processes for Probabilistic Programming. arXiv (2020).
Carpenter, B. et al. Stan: A Probabilistic Programming Language. Journal of Statistical Software (2017).

Name		Name	Last commit message	Last commit date
Latest commit History 274 Commits
.github		.github
R		R
data-raw		data-raw
data		data
inst		inst
man		man
tests		tests
.Rbuildignore		.Rbuildignore
.gitignore		.gitignore
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
DESCRIPTION		DESCRIPTION
LICENSE		LICENSE
NAMESPACE		NAMESPACE
NEWS.md		NEWS.md
README.Rmd		README.Rmd
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

`bayesVG`

Installation

R package

Stan-related dependencies

Usage

Libraries

HVG detection

Data

Modeling

SVG detection

Data

Modeling

Contact information

References

About

Uh oh!

Releases 8

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

bayesVG

Installation

R package

Stan-related dependencies

Usage

Libraries

HVG detection

Data

Modeling

SVG detection

Data

Modeling

Contact information

References

About

Topics

Resources

License

Code of conduct

Uh oh!

Stars

Watchers

Forks

Releases 8

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

`bayesVG`

Packages