Notes from Tania: Bookmarks: single cell RNA-seq tutorials and tools

These are my bookmarks for single cell transcriptomics resources and tutorials.

scRNA-seq introductions

How to make R objects for single cell data, e.g. SingleCellExperiment, SummarizedExperiment

How to convert a spreadsheet containing a count matrix into the format required by most single cell RNA-seq tutorials, e.g. a .csv.gz file downloaded from NCBI GEO
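As a rough sketch of that conversion (not the linked tutorial's code), the steps are: read the compressed table, move the gene names into the row names, and store the values as a sparse genes x cells matrix. The toy table, the file layout (first column holds gene names) and the variable names are assumptions made for illustration. The SingleCellExperiment step only runs if that Bioconductor package is installed.

```r
library(Matrix)  # ships with standard R installations

# Write a tiny toy table to a temporary .csv.gz so the sketch is self-contained.
# Assumed layout: first column = gene names, remaining columns = cells.
toy <- data.frame(
  gene  = c("GeneA", "GeneB", "GeneC"),
  cell1 = c(0, 3, 1),
  cell2 = c(5, 0, 0),
  cell3 = c(0, 0, 2)
)
csv_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(csv_path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads gzip-compressed files directly.
tab <- read.csv(csv_path, row.names = 1, check.names = FALSE)

# Convert to a numeric matrix, then to a sparse genes x cells count matrix.
mat <- as.matrix(tab)
storage.mode(mat) <- "double"
counts <- Matrix(mat, sparse = TRUE)
dim(counts)

# Optional: wrap the matrix in a SingleCellExperiment if the package is available.
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  sce <- SingleCellExperiment::SingleCellExperiment(assays = list(counts = counts))
}
```

The same sparse matrix can usually be passed to Seurat's CreateSeuratObject(counts = counts) as well.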

Getting Started with Seurat v4 (Satija lab tutorials list)

Many tutorials here, for different scRNA-seq goals

Guided clustering tutorial with 3000 PBMC cells

Setup Seurat object
Standard pre-processing workflow & quality control
Data normalization
Identifying highly variable features (genes)
Clustering, UMAP/tSNE plots
Differential gene expression analysis

Basics of single cell analysis with Bioconductor

University of Cambridge intro to single cell RNA-seq analysis

Identification of low-quality cells using MADs values
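A minimal base R sketch of the MAD idea, assuming a threshold of 3 MADs below the median on the log scale (the toy library sizes and the threshold are illustrative choices, not taken from the linked course). In Bioconductor workflows, scuttle::isOutlier() is the usual function for this kind of filter.

```r
# Toy per-cell QC metric: total counts (library size) for 10 cells,
# with one obviously low-quality cell at the end.
lib_size <- c(5200, 4800, 5100, 4950, 5300, 5050, 4700, 5150, 4900, 300)

# Work on the log scale so the lower threshold cannot fall below zero counts.
log_lib <- log10(lib_size)
med <- median(log_lib)
dev <- mad(log_lib)  # base R mad() applies the 1.4826 scaling constant by default

# Flag cells more than n_mads MADs below the median (assumed threshold).
n_mads <- 3
low_quality <- log_lib < med - n_mads * dev

which(low_quality)
```

The same pattern works for other metrics such as the number of detected genes. For mitochondrial percentage the flag would go in the other direction (unusually high values).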

Lectures, textbooks, video tutorials, & interpretation

Determining the optimal number of clusters with elbow plots
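One common way to build such an elbow plot, sketched here with k-means on simulated data (an assumption, since the linked resource may use a different clustering method): compute the total within-cluster sum of squares for a range of k and look for the point where the curve flattens.

```r
set.seed(1)

# Simulated 2-D data with three well-separated groups (a stand-in for PCA scores).
centers <- rbind(c(0, 0), c(10, 10), c(-10, 10))
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1]), rnorm(50, centers[i, 2]))
}))

# Total within-cluster sum of squares for k = 1..8.
ks <- 1:8
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# The "elbow" is where adding clusters stops giving large drops in WSS.
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```

With these simulated groups the drop should be steep up to k = 3 and shallow afterwards. Real scRNA-seq data rarely gives such a clean elbow.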

OSCA Basics: Basics of Single-Cell Analysis with Bioconductor by Robert Amezquita, Aaron Lun, Stephanie Hicks, Raphael Gottardo (2024)

Quality control - various QC metrics, identifying & removing low quality cells, diagnostic plots
Normalization - library size, deconvolution, spike-ins, scaling and log-transformation
Feature selection - quantifying variation, sequencing noise, batch effects, etc
Dimensionality reduction - PCA plots
Clustering - k means clustering, hierarchical clustering, subclustering
Marker gene detection - dot plots, expression plots
Cell type annotation - using other references, specific genes, markers, diagnostic heatmaps
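To make the library size normalization step above concrete, here is a base R sketch under simple assumptions: size factors are each cell's total counts scaled to a mean of 1, and the log-transformation uses log2 with a pseudocount of 1. The toy matrix is made up. In the Bioconductor workflow this is normally done with functions such as scuttle::logNormCounts() rather than by hand.

```r
# Toy genes x cells count matrix. matrix() fills by column,
# so each line of numbers below is one cell.
counts <- matrix(
  c(10, 0, 5,    # cell1
    20, 2, 8,    # cell2
    40, 0, 22),  # cell3
  nrow = 3,
  dimnames = list(c("GeneA", "GeneB", "GeneC"), c("cell1", "cell2", "cell3"))
)

# Library size factors: total counts per cell, scaled to have mean 1.
lib_size <- colSums(counts)
size_factors <- lib_size / mean(lib_size)

# Divide each cell (column) by its size factor, then log2-transform
# with a pseudocount of 1.
norm_counts <- sweep(counts, 2, size_factors, "/")
logcounts <- log2(norm_counts + 1)
round(logcounts, 2)
```

After scaling, every cell has the same total, so remaining differences between cells are no longer driven by sequencing depth alone.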

SingleR Book: Assigning cell types with SingleR by Aaron Lun (2024)

Using references
Annotation diagnostics
Using multiple references
Exploiting cell ontology
Example dataset from pancreas

scran:

Using scran to analyze single-cell RNA-seq data

Automated PC choice
Graph-based clustering
Identifying marker genes
Detecting correlated genes
Converting to other formats allows for pseudobulk analysis with edgeR or DESeq2

Seurat:

Seurat v5 Command Cheat Sheet
Dimensional Reduction Vignette - explains where things are stored and how to access them
Combining Two 10X Runs - how to merge different samples for a joint analysis

Merge 2+ Seurat objects with Seurat's merge function. By default, Seurat uses the raw counts and doesn't keep normalization.
Merge normalized data by adding merge.data = TRUE

Introduction to scRNA-seq integration (2023)

Split layers
Analyze without integration
Integrated data
Identify conserved cell type markers
Identify differential expressed genes across conditions
Alternatively, perform integration with SCTransform-normalized datasets

Integrative analysis in Seurat v5 - how to combine data from different samples or experiments
Tips for integrating large datasets in Seurat v4.3 - what steps to run in what order, how to reduce computational needs

Create a list of Seurat objects to integrate
Perform normalization, feature selection, and scaling separately for each dataset
Run PCA on each object in the list
Integrate datasets, and proceed with joint analysis

10x Genomics tutorial:

Interpreting Cell Ranger Web Summary Files for Single Cell Gene Expression Assays, CG000329. Highly recommended. They show what plot results look like for typical (good) samples, heterogeneous samples, and compromised (bad) samples.
Human reference genome annotations

YouTube tutorials:

"Seurat Object Explained: Beginner's Guide and Demo" by chatomics
Introduction to scRNA-seq Data Analysis by 10x Genomics: Cell Ranger, Loupe browser, cloud analysis
Quality Assessment Using the Cell Ranger Web Summary by 10x Genomics
"Statistical analysis of single-cell RNA-seq data with multiple samples" by DahShu

Data visualization - types of plots and how to make them

Data visualization methods in Seurat - ridge plots, violin plots, feature plots, dot plots, heatmaps, visualizing coexpression

Split Dot Plot - color code by an additional variable such as a condition

Clustered dot plot using ComplexHeatmap

Let's Plot 7: Clustered Dot Plots in the ggverse (Eye Informatician)

tSNE vs UMAP, two methods to show clustering:

Understanding UMAP (Andy Coenen, Adam Pearce)
tSNE vs UMAP: Global Structure

SCpubr - an R package to make publication ready plots for single cell RNA-seq

Dim plots - dimensional reduction, similar to PCA or UMAP plots
Feature plots - dim plot with a continuous scale for gene expression visualization across clusters
Nebulosa plots - computes a density plot for specific gene markers so you can see where they are most expressed
Bee Swarm plots
Violin plots
Ridge plots - stacked density plots, one per group (similar information to violin plots)
Dot plots - show gene expression of different markers across different clusters
Bar plots
Box plots
Geyser plots
Alluvial plots
Sankey plots
Chord Diagram plots - circos plots
Volcano plots

Doublet detection and visualization

Cell labeling, label transfer, single cell reference mapping

Mapping and annotating query datasets (Satija lab, Oct 2023)

Web Resources for Cell Type Annotation (10x Genomics Analysis Guide, 2024)

Azimuth: App for reference based single cell analysis - helps annotate clusters. You can upload the Seurat object .rds file to the app and get predictions. Troubleshoot error with Seurat v5.

Install signac first, otherwise you may get an installation error
Azimuth annotation on Seurat

Combining samples

Theory -

See "Statistical analysis of single-cell RNA-seq data with multiple samples" (YouTube, 1hr lecture)

Q & A -

Recommendations for combining multiple 10x runs into one SingleCellExperiment

Process each sample separately for initial QC steps (cell filtering, removing doublets)
Take notes on QC of each individual sample
Beware that batch correction steps can remove differentially expressed genes.

"You can avoid this with careful experimental design, e.g., paired WT/KO samples in each batch so that correction cannot remove genotype differences. You can also detect DE genes between conditions by summing cells within each batch (possibly per population) and treating them as pseudo-bulk for edgeR analyses (see https://doi.org/10.1093/biostatistics/kxw055; https://pubmed.ncbi.nlm.nih.gov/28334062/). This complements a batch-corrected single-cell-level analysis, e.g., when a treatment causes both a systematic DE and changes in population composition."

When to combine samples in the pre-processing of 10x scRNA-seq data? (2019)

Pre-process each sample separately (cell filtering, removing empty droplets, doublets, etc)
Cluster each sample independently for QC purposes - to check samples
Cluster samples together afterward?

The difference between merge and integration with Seurat objects (2021)

Only merge data before pre-processing if using technical replicates with low batch effects?

How to handle large Seurat objects (5GB+) in R?

Increase R memory size
Switch to a high performance computing machine when it becomes too much

Tutorials -

Seurat - Combining Two 10X Runs (10/2023)

Code to use Read10X function on separate datasets
Code to combine data and add dataset IDs with merge function

Commands for Seurat object integration & pseudobulk analysis

Merge objects (without integration)
Merge objects (with integration)
Pseudobulk analysis - group cells together based on multiple categories

Differential expression testing (3/2024)

Compare different cell types within the same sample
Compare same cell types across different samples
Aggregate gene expression to perform pseudobulk DE analysis with DESeq2, edgeR, or limma
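The aggregation step for pseudobulk analysis can be sketched in base R as summing raw counts over all cells that share a sample and cell type. The toy data and label names below are invented for illustration. In Seurat the equivalent step is usually done with AggregateExpression().

```r
set.seed(42)

# Toy data: 4 genes x 12 cells, with a sample and a cell-type label per cell.
counts <- matrix(rpois(4 * 12, lambda = 5), nrow = 4,
                 dimnames = list(paste0("Gene", 1:4), paste0("cell", 1:12)))
sample_id <- rep(c("S1", "S2", "S3"), each = 4)
cell_type <- rep(c("T", "B"), times = 6)

# One pseudobulk column per sample x cell-type combination.
group <- paste(sample_id, cell_type, sep = "_")

# rowsum() sums rows by group, so transpose to cells x genes and back again.
pseudobulk <- t(rowsum(t(counts), group = group))
pseudobulk
```

The resulting genes x groups matrix of summed counts is the kind of input that bulk tools such as DESeq2, edgeR, or limma expect, with samples (not cells) as the replicates.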

How to define batch -

This assumes you have a small spreadsheet "donor_metadata" with one row per sample and one column per metadata field, including a column labeled "batch". The "ID" column holds the sample names, which should match the IDs used when importing the individual Seurat datasets.

rownames(donor_metadata) <- donor_metadata$ID

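## Create dataframe with batch info for every cell
## (left_join keeps the row order of x, so rows stay aligned with the cells)
cellBatch <- dplyr::left_join(
  x = data.frame(
            rownames = rownames(pbmc@meta.data),
            ID = pbmc@meta.data$orig.ident),
  y = donor_metadata[, c("ID", "batch")],
  by = "ID")
head(cellBatch)

## Add the batch label to the Seurat object so pbmc$batch exists for later steps
## (assumes each ID appears only once in donor_metadata)
pbmc$batch <- cellBatch$batch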

How to assign Azimuth labels and split layers by batch -

## Azimuth labeling
Layers(pbmc)
pbmc <- JoinLayers(pbmc)  ## to fix Azimuth error
pbmc <- Azimuth::RunAzimuth(pbmc, reference = "pbmcref")
pbmc

Layers(pbmc)

## See cell type annotations added
head(pbmc@meta.data, 10)

## Split layers only AFTER running Azimuth. Define the column to use for batches.
pbmc[["RNA"]] <- split(pbmc[["RNA"]], f = pbmc$batch)
Layers(pbmc[["RNA"]])

## Run normalization, feature selection, scaling, and PCA on the split layers
pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc)
pbmc <- ScaleData(pbmc)
pbmc <- RunPCA(pbmc)

Batch correction (integration)

13.3.1 Batch correction: canonical correlation analysis (CCA) using Seurat (Broad Institute) - old method, but code is still helpful for learning

Uses separate Seurat objects (old way)

Integrative analysis in Seurat v5 - recommended new method

Streamlined one-line integrative analysis (new way)
Uses one Seurat object created by merging different Seurat objects, then splitting layers to define batches
Includes example code for different integration methods, including CCA and Harmony

Harmony R package (Korsunsky et al 2019, Nature Methods) - method for batch correction of single cell data

"Harmony enables the integration of ~10^6 cells on a personal computer"

Benchmarking atlas-level data integration in single-cell genomics (Luecken et al 2021, Nature Methods)

scANVI, Scanorama and scVI perform best for scRNA-seq
scATAC-seq integration performance depends on feature space (genes) & most methods performed poorly for scATAC
"scATAC-seq batch effects were only consistently overcome by LIGER and Harmony, which prioritize batch removal over conservation of biological variation."

Differential Expression Testing

Differential expression testing (Seurat)

p_val : p-value (unadjusted)
avg_log2FC : log fold-change of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.
pct.1 : The percentage of cells where the feature is detected in the first group
pct.2 : The percentage of cells where the feature is detected in the second group
p_val_adj : Adjusted p-value, based on Bonferroni correction using all features in the dataset.
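To show where these columns come from, here is a base R sketch that computes pct.1, pct.2, an unadjusted Wilcoxon p-value, and a Bonferroni-adjusted p-value on simulated data. It illustrates the definitions only and is not Seurat's implementation. The toy matrix, group labels, and the planted DE gene are assumptions. Seurat reports pct.1 and pct.2 as fractions between 0 and 1, which is what the sketch does too.

```r
set.seed(7)

# Toy log-normalized expression: 5 genes x 40 cells, two groups of 20 cells.
expr <- matrix(rexp(5 * 40), nrow = 5,
               dimnames = list(paste0("Gene", 1:5), NULL))
expr[expr < 0.5] <- 0                      # mimic dropout: many zeros
group <- rep(c("g1", "g2"), each = 20)
expr["Gene1", group == "g1"] <- expr["Gene1", group == "g1"] + 3  # planted DE gene

# pct.1 / pct.2: fraction of cells in each group with detected (non-zero) expression.
pct.1 <- rowMeans(expr[, group == "g1"] > 0)
pct.2 <- rowMeans(expr[, group == "g2"] > 0)

# Unadjusted p-value per gene from a Wilcoxon rank-sum test.
p_val <- apply(expr, 1, function(g) {
  wilcox.test(g[group == "g1"], g[group == "g2"], exact = FALSE)$p.value
})

# Bonferroni adjustment: multiply by the number of features, capped at 1.
# Here all 5 features are tested, so n equals the number of rows.
p_val_adj <- p.adjust(p_val, method = "bonferroni")

data.frame(p_val, pct.1, pct.2, p_val_adj)
```

Because the correction uses every feature in the dataset, p_val_adj is conservative when only a subset of genes is actually tested.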

Differential expression across conditions (Seurat integration subsection)

Receptor-Ligand interactions

LIANA: a LIgand-receptor ANalysis frAmework - an R package and python tool for identifying and scoring receptor-ligand interactions in datasets

Spatial transcriptomics

Analysis of spatial datasets (Sequencing-based)

Analysis of spatial datasets (Imaging-based)

STELLAR (Python-based tool) from Stanford to annotate single cell data, can be used for cross tissue and cross donor spatial transcriptomics data

Multiomics: scRNA-seq and scATAC-seq

Integrating scRNA-seq and scATAC-seq data (Satija lab)

Integrative analysis in Seurat v5 (Satija lab, Oct 2023)

"For this vignette, we use a dataset of human PBMC profiled with seven different technologies, profiled as part of a systematic comparative analysis (pbmcsca). The data is available as part of our SeuratData package."

Azimuth annotation for scRNA-seq and scATAC-seq data (Satija lab)

Signac: a comprehensive R package for the analysis of single-cell chromatin data (Stuart lab)

scRNA-seq data analysis for non-programmers

Galaxy - software for nonprogrammers to use for scRNA-seq analysis

Background reading - general

"The technology and biology of single-cell RNA sequencing", Kolodziejczyk et al 2015. Molecular Cell. Review.
"An Introduction to the Analysis of Single-Cell RNA-Sequencing Data", AlJanahi et al 2018. Mol Ther Methods Clin Dev.
"The human mitochondrial transcriptome", Mercer et al 2012. Cell.

"In the heart, mitochondrial transcripts comprise almost 30% of total mRNA, whereas mitochondria contribute a lower bound of ∼5% to the total mRNA of tissues with lower energy demands (adrenal, ovary, thyroid, prostate, testes, lung, lymph and white blood cells)."
Normal proportions of mitochondrial transcripts vary by sample type -- this is relevant for setting cutoffs! Don't just use the percent.mt cutoff from the Seurat 3kPBMC tutorial.
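A small simulation makes the point about cutoffs concrete. The sketch below uses made-up data: the gene list, the hypothetical helper simulate_counts(), and the 3% and 30% mitochondrial fractions are all assumptions chosen to mimic a low-mito and a heart-like tissue. It computes percent mitochondrial counts per cell and checks how many cells would pass a fixed 5% cutoff like the one in the PBMC tutorial. In Seurat the percentage itself is usually computed with PercentageFeatureSet(pbmc, pattern = "^MT-").

```r
set.seed(3)
genes <- c("MT-CO1", "MT-ND1", "ACTB", "GAPDH", "MYH7")

# Hypothetical helper: simulate cells with a given expected mitochondrial fraction.
simulate_counts <- function(n_cells, mt_fraction, depth = 2000) {
  prob <- c(rep(mt_fraction / 2, 2), rep((1 - mt_fraction) / 3, 3))
  m <- rmultinom(n_cells, size = depth, prob = prob)
  rownames(m) <- genes
  m
}
pbmc_like  <- simulate_counts(50, mt_fraction = 0.03)
heart_like <- simulate_counts(50, mt_fraction = 0.30)

# Percentage of each cell's counts that come from genes prefixed "MT-".
percent_mt <- function(counts) {
  is_mt <- grepl("^MT-", rownames(counts))
  100 * colSums(counts[is_mt, , drop = FALSE]) / colSums(counts)
}

# Fraction of cells that would survive a fixed 5% cutoff in each toy tissue.
mean(percent_mt(pbmc_like) < 5)
mean(percent_mt(heart_like) < 5)
```

In this toy setup the fixed cutoff keeps the low-mito cells and discards the heart-like cells, even though nothing is wrong with them. That is the reason to choose the threshold per tissue.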

Background reading - placenta, endometrium

Mareckova M, Garcia-Alonso L, ..., Vento-Tormo R. "An integrated single-cell reference atlas of the human endometrium." Nature Genetics, 2024. [PMID: 39198675; PMCID: PMC11387200]

Human endometrium with/without endometriosis
ReproductiveCellAtlas.org
HECA = Human Endometrium Cell Atlas, >313k cells
Integrated 6 scRNA-seq databases & new Mareckova (cells) dataset

Wang M, Liu Y, ..., Wang H. "Single-nucleus multi-omic profiling of human placental syncytiotrophoblasts identifies cellular trajectories during pregnancy." Nature Genetics, 2024. [PMID: 38267607; PMCID: PMC10864176]

Human placenta at first and late third trimester
n=6 placenta in early pregnancy (6-9 weeks gestation)
n=6 placenta in late pregnancy (38-39 weeks gestation)
Integrated separate snRNA-seq and snATAC-seq

Arutyunyan A,... Vento-Tormo R. "Spatial multiomics map of trophoblast development in early pregnancy." Nature, 2023. [PMID: 36991123; PMCID: PMC10076224]

Human placenta and decidua frozen into blocks for spatial experiments
Tissue cryopreserved with cold OCT medium and flash-frozen using a dry ice-isopentane slurry.
Single nuclei used for multiomics (snRNA-seq, snATAC-seq)

Ji K, Chen L, ..., Liu H. "Integrating single-cell RNA sequencing with spatial transcriptomics reveals an immune landscape of human myometrium during labour." Clin Transl Med, 2022. [PMID: 37095651; PMCID: PMC10126311]

Human myometrial tissue collected during C-section deliveries (singleton, uncomplicated full term)
n=6 TIL, term in labor
n=6 TNL, term in non-labor
Tissue was washed with PBS, minced and enzymatically dissociated briefly:
3 mg/ml collagenase IV, 2 mg/ml papain, and 120 Units/ml DNase I at 37C for 20 min. Cell suspension was passed through stacked 70-30um filters, then passed through the Dead Cell Removal Kit (Miltenyi). Washed with PBS + 0.04% BSA twice.

Koel M, Krjutskov, ... Altmae S. "Human endometrial cell-type-specific RNA sequencing provides new insights into the embryo–endometrium interplay." Human Reproduction, 2022. [PMID: 36339249; PMCID: PMC9632455]

Human endometrium cells sorted with FACS, then bulk RNA-seq
n=16 healthy women from Estonia and Spain, mean age 29.7, normal BMI, no hormonal medication for 3 months; normal serum levels of progesterone, prolactin, and testosterone; negative for STIs, no uterine pathologies or endometriosis or PCOS, at least one live birth.
Per woman, n=2 endometrial biopsies within the same menstrual cycle (early secretory & mid-secretory/receptive phases)
NCBI GEO accession GSE97929 (32 samples): 16 paired endometrial samples

Sun T*, Gonzalez TL*, ..., Pisarska MD. "Sexually dimorphic crosstalk at the maternal-fetal interface." J Clin Endocrinol Metab, 2020. [PMID: 32772088; PMCID: PMC7571453] *co-first authors.

Human placenta at late first trimester during CVS appointments
NCBI GEO accession GSE131696 (6 samples) = Single cell RNA-seq
NCBI GEO accession GSE131874 (8 samples) = Bulk total RNA-seq of matched decidua and placenta
Tissue was washed with PBS, minced and enzymatically dissociated:
300U/ml collagenase, 0.25% trypsin, and 200μg/ml DNase I at 37C for 90 min. Cells spun 1200 rpm for 10 min, resuspended in Chang medium (which contains 16% serum), and treated with 1x red blood cell lysis buffer for 15 min, then cells were washed again and strained through a 70um filter. [Details]

Vento-Tormo R, ..., Teichmann SA. "Single-cell reconstruction of the early maternal–fetal interface in humans." Nature, 2018. [PMID: 30429548; PMCID: PMC7612850]

Human placenta at first trimester

October 10, 2024