October 10, 2024

Bookmarks: single cell RNA-seq tutorials and tools

These are my bookmarks for single cell transcriptomics resources and tutorials.

scRNA-seq introductions

How to make R objects for single cell data, e.g. SingleCellExperiment, SummarizedExperiment  

Getting Started with Seurat v4  (Satija lab tutorials list)

  • Many tutorials here, for different scRNA-seq goals

Guided clustering tutorial with 3000 PBMC cells

  • Setup Seurat object
  • Standard pre-processing workflow & quality control
  • Data normalization
  • Identifying highly variable features (genes)
  • Clustering, UMAP/tSNE plots
  • Differential gene expression analysis

Basics of single cell analysis with Bioconductor

University of Cambridge intro to single cell RNA-seq analysis

  • Identification of low-quality cells using median absolute deviations (MADs)
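
A MAD-based filter can be sketched in base R. This is a minimal illustration on made-up library sizes, not the course's own code; the log10 transform and the 3-MAD cutoff are assumptions chosen for the example. In a real Bioconductor workflow, scuttle::isOutlier() does this kind of filtering.

```r
# Toy library sizes (total counts per cell); values are made up for illustration
lib_sizes <- c(5200, 4800, 5100, 4950, 5300, 120, 5050, 4700)

# Work on the log scale so the threshold is relative rather than absolute
log_lib <- log10(lib_sizes)

# Median and MAD (R's mad() scales by 1.4826 to be comparable to a standard deviation)
med <- median(log_lib)
dev <- mad(log_lib)

# Flag cells more than 3 MADs below the median as low quality (assumed cutoff)
low_quality <- log_lib < med - 3 * dev
which(low_quality)
```

Only the low side is tested here because a very small library size usually points to a damaged or empty droplet; other metrics, such as mitochondrial percentage, would be tested on the high side.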


Lectures, textbooks, video tutorials, & interpretation

Determining the optimal number of clusters with elbow plots
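
As a rough sketch of the elbow idea, the snippet below runs k-means for several values of k on simulated data and plots the total within-cluster sum of squares. The data, the range of k, and the use of base R kmeans() are all assumptions for illustration; the linked tutorial may use a different clustering method.

```r
set.seed(1)

# Toy data: three well-separated groups in two dimensions (simulated)
centers <- rbind(c(0, 0), c(10, 10), c(-10, 10))
x <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1]), rnorm(50, centers[i, 2]))
}))

# Total within-cluster sum of squares for k = 1..6
ks <- 1:6
wss <- sapply(ks, function(k) kmeans(x, centers = k, nstart = 10)$tot.withinss)

# The "elbow" is the k after which adding clusters barely reduces the WSS
plot(ks, wss, type = "b",
     xlab = "Number of clusters (k)",
     ylab = "Total within-cluster sum of squares")
```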


OSCA Basics: Basics of Single-Cell Analysis with Bioconductor by Robert Amezquita, Aaron Lun, Stephanie Hicks, Raphael Gottardo (2024)
  • Quality control - various QC metrics, identifying & removing low quality cells, diagnostic plots
  • Normalization - library size, deconvolution, spike-ins, scaling and log-transformation
  • Feature selection - quantifying variation, sequencing noise, batch effects, etc.
  • Dimensionality reduction - PCA plots
  • Clustering - k-means clustering, hierarchical clustering, subclustering
  • Marker gene detection - dot plots, expression plots
  • Cell type annotation - using other references, specific genes, markers, diagnostic heatmaps
  • Using references
  • Annotation diagnostics
  • Using multiple references
  • Exploiting cell ontology
  • Example dataset from pancreas
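
To make the "library size" normalization step concrete, here is a minimal base R sketch on a made-up count matrix. It assumes size factors are library sizes scaled to a mean of 1, followed by a log2 transform with a pseudocount of 1; the book itself works with SingleCellExperiment objects and dedicated functions rather than this hand-rolled version.

```r
# Toy count matrix: genes in rows, cells in columns (values are made up).
# matrix() fills column by column, so each line below is one cell.
counts <- matrix(c(10, 0, 5,
                   20, 2, 8,
                    5, 1, 2),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("cell1", "cell2", "cell3")))

# Library size = total counts per cell
lib_sizes <- colSums(counts)

# Size factors: library sizes scaled so that their mean is 1
size_factors <- lib_sizes / mean(lib_sizes)

# Divide each cell (column) by its size factor, then log2-transform with a pseudocount
logcounts <- log2(sweep(counts, 2, size_factors, "/") + 1)
round(logcounts, 2)
```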

scran:
  • Automated PC choice
  • Graph-based clustering
  • Identifying marker genes
  • Detecting correlated genes
  • Converting to other formats allows for pseudobulk analysis with edgeR or DESeq2
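
Pseudobulk here means summing the raw counts of all cells that belong to the same sample, so that each sample becomes one column, as in bulk RNA-seq. A minimal base R sketch, with a made-up matrix and hypothetical sample labels:

```r
# Toy count matrix: 3 genes x 4 cells (values are made up)
counts <- matrix(1:12, nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:4)))

# Hypothetical sample of origin for each cell
sample_id <- c("s1", "s1", "s2", "s2")

# rowsum() sums rows by group, so transpose to put cells in rows, then transpose back
pseudobulk <- t(rowsum(t(counts), group = sample_id))
pseudobulk
```

In practice the grouping is usually sample plus cell type or cluster, and the resulting matrix is what gets passed on to edgeR or DESeq2.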

Seurat:
  • Seurat v5 Command Cheat Sheet
  • Dimensional Reduction Vignette - explains where things are stored and how to access them
  • Combining Two 10X Runs - how to merge different samples for a joint analysis
    • Merge 2+ Seurat objects with Seurat's merge function. By default, Seurat uses the raw counts and doesn't keep normalization.
    • Merge normalized data by adding merge.data = TRUE
  • Introduction to scRNA-seq integration (2023)
    • Split layers
    • Analyze without integration
    • Integrated data 
    • Identify conserved cell type markers
    • Identify differentially expressed genes across conditions
    • Alternatively, perform integration with SCTransform-normalized datasets
  • Integrative analysis in Seurat v5 - how to combine data from different samples or experiments
  • Tips for integrating large datasets in Seurat v4.3 - what steps to run in what order, how to reduce computational needs
    • Create a list of Seurat objects to integrate
    • Perform normalization, feature selection, and scaling separately for each dataset
    • Run PCA on each object in the list
    • Integrate datasets, and proceed with joint analysis

10x Genomics tutorial:
YouTube tutorials:

Data visualization - types of plots and how to make them

Data visualization methods in Seurat  - ridge plots, violin plots, feature plots, dot plots, heatmaps, visualizing coexpression

Split Dot Plot  - color code by an additional variable such as a condition

Clustered dot plot using ComplexHeatmap

Let's Plot 7: Clustered Dot Plots in the ggverse  (Eye Informatician)


tSNE vs UMAP, two methods to show clustering:


SCpubr - an R package to make publication ready plots for single cell RNA-seq

  • Dim plots - dimensional reduction, similar to PCA or UMAP plots
  • Feature plots - dim plot with a continuous scale for gene expression visualization across clusters
  • Nebulosa plots - computes a density plot for specific gene markers so you can see where they are most expressed
  • Bee Swarm plots
  • Violin plots
  • Ridge plots - stacked density curves, one per group (same information as violin plots)
  • Dot plots - show gene expression of different markers across different clusters
  • Bar plots
  • Box plots
  • Geyser plots
  • Alluvial plots
  • Sankey plots
  • Chord Diagram plots - circos plots
  • Volcano plots


Cell labeling, label transfer, single cell reference mapping

Mapping and annotating query datasets (Satija lab, Oct 2023)

Web Resources for Cell Type Annotation  (10x Genomics Analysis Guide, 2024)

Azimuth: App for reference based single cell analysis - helps annotate clusters. You can upload the Seurat object .rds file to the app and get predictions. Troubleshoot error with Seurat v5.
 

Combining samples

Theory - 

Q & A -

Tutorials -


How to define batch -
This assumes you have a small data frame "donor_metadata" (e.g. imported from a spreadsheet) with rows = samples and columns = metadata, including a column labeled "batch". The "ID" column holds the sample names, which should match the IDs used when importing the individual Seurat datasets.

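rownames(donor_metadata) <- donor_metadata$ID

## Create dataframe with batch info for every cell.
## left_join() keeps the row order of x, so rows stay aligned with the cells.
cellBatch <- dplyr::left_join(
  x = data.frame(
            rownames = rownames(pbmc@meta.data),
            ID = pbmc@meta.data$orig.ident),
  y = donor_metadata[, c("ID", "batch")],
  by = "ID")
head(cellBatch)

## Store the batch label in the Seurat metadata (pbmc$batch is used later to split layers)
pbmc$batch <- cellBatch$batch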
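
The same lookup can be tried without Seurat or dplyr. The sketch below uses base R match() on made-up donor IDs; the object names and values are hypothetical, and "donorZ" is included on purpose to show what an unmatched sample ID looks like.

```r
# Hypothetical per-sample metadata
donor_metadata <- data.frame(ID = c("donorA", "donorB", "donorC"),
                             batch = c("batch1", "batch1", "batch2"),
                             stringsAsFactors = FALSE)

# Hypothetical per-cell sample IDs (what orig.ident would hold)
cell_ids <- c("donorB", "donorA", "donorC", "donorB", "donorZ")

# Look up each cell's batch; the result has one entry per cell, in cell order
cell_batch <- donor_metadata$batch[match(cell_ids, donor_metadata$ID)]
cell_batch

# Cells whose sample ID is missing from the metadata come back as NA
sum(is.na(cell_batch))
```

Checking for NA values before splitting layers is worthwhile, since a mismatch between the metadata IDs and the IDs used at import would otherwise go unnoticed.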


How to assign Azimuth labels and split layers by batch - 
## Azimuth labeling
Layers(pbmc)
pbmc <- JoinLayers(pbmc)  ## to fix Azimuth error
pbmc <- Azimuth::RunAzimuth(pbmc, reference = "pbmcref")
pbmc

Layers(pbmc)

## See cell type annotations added
head(pbmc@meta.data, 10)

## Split layers only AFTER running Azimuth. Define the column to use for batches.
pbmc[["RNA"]] <- split(pbmc[["RNA"]], f = pbmc$batch)
Layers(pbmc[["RNA"]])

## Run normalizations and scaling
pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc)
pbmc <- ScaleData(pbmc)
pbmc <- RunPCA(pbmc)
 

Batch correction (integration)

13.3.1 Batch correction: canonical correlation analysis (CCA) using Seurat (Broad Institute) - old method, but code is still helpful for learning

  • Uses separate Seurat objects (old way)

Integrative analysis in Seurat v5 - recommended new method

  • Streamlined one-line integrative analysis (new way)
  • Uses one Seurat object created by merging different Seurat objects, then splitting layers to define batches
  • Includes example code for different integration methods, including CCA and Harmony

Harmony R package  (Korsunsky et al 2019 Nature Methods) - method for batch correction of single cell data

  • "Harmony enables the integration of ~10^6 cells on a personal computer"

Benchmarking atlas-level data integration in single-cell genomics (Luecken et al 2021, Nature Methods)

  • scANVI, Scanorama and scVI perform best for scRNA-seq
  • scATAC-seq integration performance depends on feature space (genes) & most methods performed poorly for scATAC
  • "scATAC-seq batch effects were only consistently overcome by LIGER and Harmony, which prioritize batch removal over conservation of biological variation."



Differential Expression Testing

Differential expression testing (Seurat)

  • p_val : p-value (unadjusted)
  • avg_log2FC : log2 fold-change of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.
  • pct.1 : The percentage of cells where the feature is detected in the first group
  • pct.2 : The percentage of cells where the feature is detected in the second group
  • p_val_adj : Adjusted p-value, based on Bonferroni correction using all features in the dataset.
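
The Bonferroni adjustment itself is simple: each p-value is multiplied by the number of tests and capped at 1. A minimal base R sketch with made-up p-values, assuming a dataset of 2000 features (both the gene names and the feature count are hypothetical):

```r
# Made-up unadjusted p-values for three genes
p_val <- c(geneA = 1e-6, geneB = 4e-4, geneC = 0.03)

# Assumed total number of features in the dataset, not just the genes reported
n_features <- 2000

# Bonferroni: multiply by the number of tests, cap at 1
p_val_adj <- p.adjust(p_val, method = "bonferroni", n = n_features)
p_val_adj
```

Because the correction uses every feature in the dataset, a gene with a small unadjusted p-value (geneB above) can still end up with a large adjusted one.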


Differential expression across conditions (Seurat integration subsection)



Receptor-Ligand interactions

LIANA: a LIgand-receptor ANalysis frAmework  - an R package and python tool for identifying and scoring receptor-ligand interactions in datasets


Spatial transcriptomics

Analysis of spatial datasets (Sequencing-based)

Analysis of spatial datasets (Imaging-based)

STELLAR (Python-based tool) from Stanford to annotate single cell data; can be used for cross-tissue and cross-donor spatial transcriptomics data


Multiomics: scRNA-seq and scATAC-seq

Integrating scRNA-seq and scATAC-seq data  (Satija lab)

Integrative analysis in Seurat v5 (Satija lab, Oct 2023)

"For this vignette, we use a dataset of human PBMC profiled with seven different technologies, profiled as part of a systematic comparative analysis (pbmcsca). The data is available as part of our SeuratData package."

Azimuth annotation for scRNA-seq and scATAC-seq data (Satija lab)

Signac: a comprehensive R package for the analysis of single-cell chromatin data (Stuart lab)


scRNA-seq data analysis for non-programmers

Galaxy  - software for nonprogrammers to use for scRNA-seq analysis

Background reading - general



Background reading - placenta, endometrium

  • Arutyunyan A,... Vento-Tormo R. "Spatial multiomics map of trophoblast development in early pregnancy." Nature, 2023. [PMID: 36991123; PMCID: PMC10076224]
    • Human placenta and decidua frozen into blocks for spatial experiments
    • Tissue cryopreserved with cold OCT medium and flash-frozen using a dry ice-isopentane slurry.
    • Single nuclei used for multiomics (snRNA-seq, snATAC-seq)

  • Ji K, Chen L, ..., Liu H. "Integrating single-cell RNA sequencing with spatial transcriptomics reveals an immune landscape of human myometrium during labour." Clin Transl Med, 2022. [PMID: 37095651; PMCID: PMC10126311]
    • Human myometrial tissue collected during C-section deliveries (singleton, uncomplicated full term)
    • n=6 TIL, term in labor
    • n=6 TNL, term not in labor
    • Tissue was washed with PBS, minced and enzymatically dissociated briefly:
      3 mg/ml collagenase IV, 2 mg/ml papain, and 120 Units/ml DNase I at 37°C for 20 min. Cell suspension was passed through stacked 70 μm and 30 μm filters, then passed through the Dead Cell Removal Kit (Miltenyi). Washed with PBS + 0.04% BSA twice.

  • Koel M, Krjutskov, ... Altmae S. "Human endometrial cell-type-specific RNA sequencing provides new insights into the embryo–endometrium interplay." Human Reproduction, 2022. [PMID: 36339249; PMCID: PMC9632455]
    • Human endometrium cells sorted with FACS, then bulk RNA-seq
    • n=16 healthy women from Estonia and Spain, mean age 29.7, normal BMI, no hormonal medication for 3 months; normal serum levels of progesterone, prolactin, and testosterone; negative for STIs, no uterine pathologies or endometriosis or PCOS, at least one live birth.
    • Per woman, n=2 endometrial biopsies within the same menstrual cycle (early secretory & mid-secretory/receptive phases)
    • NCBI GEO accession GSE97929 (32 samples): 16 paired endometrial samples 

  • Sun T*, Gonzalez TL*, ..., Pisarska MD. "Sexually dimorphic crosstalk at the maternal-fetal interface." J Clin Endocrinol Metab, 2020. [PMID: 32772088; PMCID: PMC7571453] *co-first authors.
    • Human placenta at late first trimester during CVS appointments
    • NCBI GEO accession GSE131696 (6 samples) = Single cell RNA-seq
    • NCBI GEO accession GSE131874 (8 samples) = Bulk total RNA-seq of matched decidua and placenta
    • Tissue was washed with PBS, minced and enzymatically dissociated:
      300 U/ml collagenase, 0.25% trypsin, and 200 μg/ml DNase I at 37°C for 90 min. Cells spun 1200 rpm for 10 min, resuspended in Chang medium (which contains 16% serum), and treated with 1x red blood cell lysis buffer for 15 min, then cells were washed again and strained through a 70 μm filter. [Details]

