Entrain ligand-velocity analysis in Python
entrain_velo_python.Rmd
Entrain ligand-velocity analysis from an Anndata object in Python.
This document outlines a basic Entrain analysis starting from an
scverse
anndata
object with pre-calculated
velocities. By the end of this document, you will identify ligands that
are driving the velocities in your data.
Prior assumptions: Entrain-Velocity Analysis requires the following:
- Single-Cell RNA data on a dataset of cells differentiating as well as data on their microenvironmental niche.
- A
.h5ad
object produced by scvelo that contains robust velocity data. Please ensure that your biology you are interested in is conducive to producing trustworthy velocities.. - You have (some) idea of which cell clusters comprise the ‘niche’, or ligand-expressing cells, in your dataset.
Pre-processing and loading required data.
Conda setup.
We recommend using conda to setup a python environment that contains the required packages for velocity analysis.
-n entrain-velocity
conda create -velocity
conda activate entrain-c conda-forge scanpy leidenalg python-igraph adjustText loompy matplotlib seaborn scikit-learn
conda install
conda install scvelo-omnipath
pip install pypath pip install entrain
We can then load in our libraries in python
import entrain as en
import pandas as pd
import anndata as ad
import scvelo as scv
Downloading required data
First, we will download an example dataset of a developing mouse brain atlas dataset at gestation day 10/11, as well as required regulatory network information.
Entrain relies on the NicheNet database of ligand-receptor-gene networks for the prior knowledge needed to make conclusions about environmentally-influenced trajectories. The genes here have been pre-converted from the original human genes to mouse orthologs.
= pd.read_csv("ligand_target_matrix_mm.csv",index_col=0)
ligand_target_matrix_mm = ad.read_h5ad("Manno_E10_E11.h5ad") adata
Data at a glance
The data consists of cells in the developing mouse brain at day 10 after gestation. This comprises a population of neuroblasts rapidly differentiating to neurons (our cells we are going to analyse), as well as their complex microenvironment made up of cells from the endoderm, mesoderm, fibroblastic, blood, and immune compartments.
scv.set_figure_params()
sc.pl.umap(adata, color = "Class", save = "manno_python_umap.png")
Run Entrain from a velocity anndata object.
Entrain fully integrates the scvelo package for velocity analysis. Below are the main steps:
Clustering Velocities
- First, cluster velocities into groups of similar velocity dynamics.
The resultant velocity clusters (or
vclusters
) are a separate entity from the usual cell annotation clusters. Cells of the same celltype cluster may not necessarily be in the same velocity cluster.
adata = en.cluster_velocities(adata, resolution=0.05)
en.plot_velocity_clusters_python(adata,
plot_file = "plot_velocity_clusters.png",
velocity_cluster_key = "velocity_clusters")
We can see that the velocity clusters roughly, but not exactly, correspond to cell type clusters, indicating that velocities are moderately correlated with cell type.
Note that as of July 2023, there is a bug in scvelo
that prevents plotting. See https://github.com/theislab/scvelo/issues/1098
Recover Dynamics
Velocity likelihoods for each cluster are then calculated via scvelo, to feed into later Entrain analysis.
= en.recover_dynamics_clusters(adata,
adata = 10,
n_jobs = True,
return_adata =None) n_top_genes
- Note that as of July 2023, there is an existing bug in
scvelo.tl.recover_dynamics()
. Using `numpy version 1.23.5 may fix the problem. Also see #1058.
If some of your clusters contain few cells, you may get numerous
warnings
e.g. WARNING: TDRD6 not recoverable due to insufficient samples.
.
This is expected in clusters that contain few cells. If you’d like to
remove these clusters from further analysis, you can specify the
argument vclusters =
in the next step.
Fitting Ligands to Velocities
We will also save the resulting .h5ad object for plotting our results afterwards.
= ["Blood", "Fibroblast", "Immune", "Mesoderm"]
sender_cluster_names = en.get_velocity_ligands(adata,
adata ="mouse",
organism=sender_clusters,
sender_clusters= "Class",
sender_cluster_key =ligand_target_matrix_mm) ligand_target_matrix
Visualizing results
At a glance
We can visualize the top ranked ligands at a glance with the function
plot_velocity_ligands_python
. By default, the function only
visualizes results with positive variance explained, with the assumption
that negative variance explained denotes poor model accuracy.
en.plot_velocity_ligands_python(adata,="plasma",
cell_palette= "black",
velocity_cluster_palette ="velocity_clusters",
color= "manno_entrain_python.png") plot_output_path
Sender Cell Visualization
Visualization of cells that express the ligands found in our
analysis. Cells are coloured based on mean expression of the
top_n_ligands
predicted to be driving velocities in the
velocity cluster.
= ["vcluster0", "vcluster1", "vcluster2", "vcluster3"]
vclusters_to_plot plot_sender_influence_velocity(adata= adata,
velocity_clusters = vclusters_to_plot,
title = "velocity_sender_influences_python.png")
Visualizing ligand importances
Visualization of ligands and their relative importances, or
contribution, towards each vcluster
.
en.plot_ligand_importance_heatmap(adata, filename="importance_heatmap.png")
Visualizing high velocity likelihood genes
Commonly, a small proportion of the total genes contribute the
majority of the observed RNA velocity dynamics. You may wish to
visualize these genes by plotting the top likelihood genes as a sanity
check to confirm that scvelo
is appropriate for your
biological system. These genes are the ones found by scvelo
to most robustly fit the underlying theoretical model. The function
plot_likelihoods_heatmap
plots, as a heatmap, the
n_top_genes
ranked by velocity likelihood for each velocity
cluster.
en.plot_likelihoods_heatmap(adata, filename="manno_python_likelihoods_heatmap.png")
Visualizing intracellular regulation downstream of the ligand-receptor pair.
The function plot_ligand_targets_heatmap
extracts
regulatory information from NicheNetR
to visualize the
predicted regulatory linkages between the n_top_ligands
and
the top velocity likelihood genes.
en.plot_ligand_targets_heatmap(
adata,
ligand_target_matrix,
velocity_cluster = "vcluster1",
filename = "velocity_ligand_targets.png")