Entrain ligand-velocity analysis
entrain_velo.Rmd
Entrain ligand-velocity analysis from an Anndata object.
This document outlines a basic Entrain analysis starting from an
scverse
anndata
object with pre-calculated
velocities. By the end of this document, you will identify ligands that
are driving the velocities in your data.
Prior assumptions: Entrain-Velocity Analysis requires the following:
- Single-Cell RNA data on a dataset of cells differentiating as well as data on their microenvironmental niche. The niche can be contained in a separate dataset (e.g., if you have sequenced in a separate capture) or in the same dataset as the differentiating cells.
- A
.h5ad
object produced by scvelo that contains robust velocity data. Please ensure that your biology you are interested in is conducive to producing trustworthy velocities.. - You have (some) idea of which cell clusters comprise the ‘niche’, or ligand-expressing cells, in your dataset.
library("Seurat"); library("SeuratWrappers");library("reticulate"); library("dplyr"); library("circlize");
library("entrain")
Pre-processing and loading required data.
Conda setup.
Because RNA velocity is natively done in python, this workflow
involves calling python scripts from R to process the anndata object.
The interfacing between python and R is handled by the package
reticulate
. We recommend using conda to setup a python
environment that contains the required packages for velocity
analysis.
reticulate::conda_create("r-reticulate-entrain")
reticulate::conda_install("r-reticulate-entrain", packages = c("scanpy", "leidenalg", "python-igraph", "adjustText", "loompy"), channel="conda-forge")
reticulate::conda_install("r-reticulate-entrain", packages = "scvelo", pip=TRUE)
reticulate::use_condaenv("r-reticulate-entrain")
Loading in required data
We will download a developing mouse brain atlas dataset at gestation day 10/11. This dataset contains both RNA expression and RNA velocities needed for our analysis. The following code block downloads the .gz file, decompresses it, and converts it into a Seurat object for analysis.
library("SeuratDisk")
options(timeout=3600)
h5ad_path <- "Manno_E10_E11.h5ad"
download.file("https://zenodo.org/record/7874401/files/Manno_E10_E11.h5ad.gz", "Manno_E10_E11.h5ad.gz")
system("gunzip Manno_E10_E11.h5ad.gz")
Convert("Manno_E10_E11.h5ad", dest = "h5seurat", overwrite = FALSE, verbose = TRUE)
obj <- LoadH5Seurat("Manno_E10_E11.h5seurat", assays="RNA")
Entrain relies on the NicheNet database of ligand-receptor-gene networks for the prior knowledge needed to make conclusions about environmentally-influenced trajectories. The genes here have been pre-converted from the original human genes to mouse orthologs.
Data at a glance
The data consists of cells in the developing mouse brain at day 10 after gestation. This comprises a population of neuroblasts rapidly differentiating to neurons (our cells we are going to analyse), as well as their complex microenvironment made up of cells from the endoderm, mesoderm, fibroblastic, blood, and immune compartments.
Run Entrain from a velocity anndata object.
Entrain fully integrates the scvelo package for velocity analysis. Below are the main steps:
Clustering Velocities
- First, cluster velocities into groups of similar velocity dynamics.
The resultant velocity clusters (or
vclusters
) are a separate entity from the usual cell annotation clusters. Cells of the same celltype cluster may not necessarily be in the same velocity cluster.
adata <- entrain_cluster_velocities(h5ad_path, resolution=0.1)
We can see that the velocity clusters roughly, but not exactly, correspond to cell type clusters, indicating that velocities are moderately correlated with cell type.
Fitting Ligands to Velocities
The fitting is encapsulated in the single function
entrain_velocity()
, shown below. This function firstly runs
scvelo.tl.recover_dynamics()
to rank genes based on their
fit-likelihoods, a measure of a gene’s fit to the ordinary differential
equation models that underlie RNA velocity derivation. Higher values
denote more confidence in the velocity. Next, it runs Entrain analysis
on the velocity likelihoods on a cluster-by-cluster basis.
We will also save the resulting .h5ad object for plotting our results afterwards.
sender_cluster_names <- c("Blood", "Fibroblast", "Immune", "Mesoderm")
Idents(obj) <- "Class"
save_adata = "../../entrain_vignette_data/manno_entrain_result.h5ad"
obj_v <- entrain_velocity(receiver_obj = obj,
adata = adata,
save_adata = save_adata,
sender_cluster_key = "Class",
sender_clusters = sender_cluster_names,
lr_network = lr_network_mm,
num_top_genes = NULL,
ligand_target_matrix = ligand_target_matrix_mm)
- Note that as of July 2023, there is an existing bug in
scvelo.tl.recover_dynamics()
. Using `numpy version 1.23.5 may fix the problem. Also see #1058.
Visualizing results
At a glance
We can visualize the top ranked ligands at a glance with the function
plot_velocity_ligands()
. By default, the function only
visualizes results with positive variance explained, with the assumption
that negative variance explained denotes poor model accuracy.
plot_velocity_ligands(adata=save_adata,
plot_output_path = "manno_entrain.png")
This suggests a number of ligands responsible for neurogenic differentiation processes. These are well-established in literature as key drivers for cell differentiation.
In detail
Entrain-Velocity provides detailed plotting functions for visualizing extracellular and intracellular regulation predicted to be driving the observed velocities.
Sender Cell Visualization
Visualization of cells that express the ligands found in our
analysis. Cells are coloured based on mean expression of the
top_n_ligands
predicted to be driving velocities in the
velocity cluster.
vclusters_to_plot <- c("vcluster0", "vcluster1", "vcluster2", "vcluster3")
save_adata = "../../entrain_vignette_data/manno_entrain_result.h5ad"
plot_sender_influence_velocity(adata= save_adata,
velocity_clusters = vclusters_to_plot,
title = "velocity_sender_influences.png")
####
Visualizing ligand importances Visualization of ligands and their
relative importances, or contribution, towards each
vcluster
.
g<-velocity_ligand_importance_heatmap(obj_v,
n_top_ligands=5) +
ggplot2::scale_fill_viridis_c(option="inferno")
g
Visualizing high velocity likelihood genes
Commonly, a small proportion of the total genes contribute the
majority of the observed RNA velocity dynamics. You may wish to
visualize these genes by plotting the top likelihood genes as a sanity
check to confirm that scvelo
is appropriate for your
biological system. These genes are the ones found by scvelo
to most robustly fit the underlying theoretical model. The function
plot_velocity_likelihoods_heatmap()
plots, as a heatmap,
the n_top_genes
ranked by velocity likelihood for each
velocity cluster.
g<-plot_velocity_likelihoods_heatmap(obj_v,
n_top_genes=5)
g
Visualizing intracellular regulation downstream of the ligand-receptor pair.
The function plot_velocity_ligand_targets
extracts
regulatory information from NicheNetR
to visualize the
predicted regulatory linkages between the n_top_ligands
and
the n_top_targets
high velocity likelihood genes.
g<-plot_velocity_ligand_targets(obj_v,
velocity_cluster="vcluster1",
n_top_ligands=10L,
ligand_target_matrix=ligand_target_matrix_mm)
g
Visualization with a circos plot
Circos plots are a popular format for visualizing connections between
ligand receptor pairs. We recreate this method with the function
draw_entrain_circos()
, making sure to feed it NicheNet’s
weighted database: weighted_networks
.
weighted_networks <- readRDS(url("https://zenodo.org/record/7874401/files/weighted_networks_mm.rds"))
draw_entrain_circos(obj = obj_v,
vcluster= "vcluster3",
lr_network = lr_network_mm,
weighted_networks = weighted_networks)
## Warning in RColorBrewer::brewer.pal(length(ligand_types), palette): minimal value for n is 3, returning requested palette with 3 different levels