These are my bookmarks for single cell transcriptomics resources and tutorials.
scRNA-seq introductions
How to make R objects for single cell data, e.g. SingleCellExperiment, SummarizedExperiment
- How to take a spreadsheet with a matrix and convert it to the format needed for many other single cell RNA-seq tutorials, e.g. if you download a .csv.gz file from NCBI GEO
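As a rough sketch of the spreadsheet-to-matrix step, the snippet below writes a tiny made-up genes x cells table to a gzipped CSV (standing in for a GEO download), reads it back, and converts it to a matrix. The file contents, gene names, and cell names are hypothetical, and the SingleCellExperiment step only runs if that Bioconductor package is installed.

```r
# Hypothetical toy table standing in for a .csv.gz file downloaded from NCBI GEO
toy <- data.frame(
  gene  = c("GeneA", "GeneB", "GeneC"),
  cell1 = c(0, 3, 1),
  cell2 = c(2, 0, 0),
  cell3 = c(5, 1, 0)
)
csv_path <- tempfile(fileext = ".csv.gz")
con <- gzfile(csv_path, "w")
write.csv(toy, con, row.names = FALSE)
close(con)

# read.csv() reads gzip-compressed files directly; the first column holds gene names
counts_df <- read.csv(csv_path, row.names = 1, check.names = FALSE)
counts <- as.matrix(counts_df)

# Most tutorials expect genes in rows and cells in columns; use t() if yours is flipped
dim(counts)

# Optional: wrap the matrix in a SingleCellExperiment if the package is available
if (requireNamespace("SingleCellExperiment", quietly = TRUE)) {
  sce <- SingleCellExperiment::SingleCellExperiment(assays = list(counts = counts))
  print(sce)
}
```

For real datasets, converting `counts` to a sparse matrix (for example with the Matrix package) usually saves a lot of memory.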
Getting Started with Seurat v4 (Satija lab tutorials list)
- Many tutorials here, for different scRNA-seq goals
Guided clustering tutorial with 3000 PBMC cells
- Setup Seurat object
- Standard pre-processing workflow & quality control
- Data normalization
- Identifying highly variable features (genes)
- Clustering, UMAP/tSNE plots
- Differential gene expression analysis
Basics of single cell analysis with Bioconductor
University of Cambridge intro to single cell RNA-seq analysis
- Identification of low-quality cells using MADs values
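The MAD idea can be sketched in base R as follows. This is a simplified illustration, not the course's code: the simulated library sizes, the log10 transform, the 3-MAD threshold, and the helper name `flag_low_mad` are all assumptions for the example.

```r
# Simulated library sizes: 200 typical cells plus 3 obviously poor ones (made-up data)
set.seed(1)
lib_size <- c(round(rlnorm(200, meanlog = log(5000), sdlog = 0.3)), 150, 90, 60)

# Hypothetical helper: flag cells more than `nmads` MADs below the median on a log10 scale
flag_low_mad <- function(x, nmads = 3) {
  log_x <- log10(x)
  med <- median(log_x)
  # mad() scales by 1.4826 by default, so it is comparable to a standard deviation
  spread <- mad(log_x)
  log_x < (med - nmads * spread)
}

low_quality <- flag_low_mad(lib_size)
table(low_quality)
```

The same approach can be applied to other QC metrics, such as the number of detected genes (low tail) or the mitochondrial percentage (high tail, so the comparison flips).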
Lectures, textbooks, video tutorials, & interpretation
Determining the optimal number of clusters with elbow plots
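A minimal sketch of the elbow idea, assuming k-means on made-up 2-D data with three groups: compute the total within-cluster sum of squares for a range of k and look for the point where adding clusters stops helping much. The data and the range of k are illustrative choices.

```r
set.seed(42)
# Hypothetical 2-D data: three well-separated groups of 50 points each
centers <- rbind(c(0, 0), c(6, 6), c(0, 6))
pts <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(50, centers[i, 1], 0.5), rnorm(50, centers[i, 2], 0.5))
}))

# Total within-cluster sum of squares for each candidate number of clusters
k_values <- 1:8
wss <- sapply(k_values, function(k) {
  kmeans(pts, centers = k, nstart = 10)$tot.withinss
})

# Draw the elbow plot when running interactively
if (interactive()) {
  plot(k_values, wss, type = "b",
       xlab = "Number of clusters (k)",
       ylab = "Total within-cluster sum of squares")
}
```

With this toy data the curve should flatten after k = 3. Real scRNA-seq data rarely gives such a clean elbow, so treat it as one hint among several.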
OSCA Basics: Basics of Single-Cell Analysis with Bioconductor by Robert Amezquita, Aaron Lun, Stephanie Hicks, Raphael Gottardo (2024)
- Quality control - various QC metrics, identifying & removing low quality cells, diagnostic plots
- Normalization - library size, deconvolution, spike-ins, scaling and log-transformation
- Feature selection - quantifying variation, sequencing noise, batch effects, etc.
- Dimensionality reduction - PCA plots
- Clustering - k-means clustering, hierarchical clustering, subclustering
- Marker gene detection - dot plots, expression plots
- Cell type annotation - using other references, specific genes, markers, diagnostic heatmaps
- Using references
- Annotation diagnostics
- Using multiple references
- Exploiting cell ontology
- Example dataset from pancreas
- Automated PC choice
- Graph-based clustering
- Identifying marker genes
- Detecting correlated genes
- Converting to other formats allows for pseudobulk analysis with edgeR or DESeq2
- Seurat v5 Command Cheat Sheet
- Dimensional Reduction Vignette - explains where things are stored and how to access them
- Combining Two 10X Runs - how to merge different samples for a joint analysis
- Merge 2+ Seurat objects with Seurat's merge function. By default, Seurat uses the raw counts and doesn't keep normalization.
- Merge normalized data by adding merge.data = TRUE
- Introduction to scRNA-seq integration (2023)
- Split layers
- Analyze without integration
- Integrated data
- Identify conserved cell type markers
- Identify differential expressed genes across conditions
- Alternatively, perform integration with SCTransform-normalized datasets
- Integrative analysis in Seurat v5 - how to combine data from different samples or experiments
- Tips for integrating large datasets in Seurat v4.3 - what steps to run in what order, how to reduce computational needs
- Create a list of Seurat objects to integrate
- Perform normalization, feature selection, and scaling separately for each dataset
- Run PCA on each object in the list
- Integrate datasets, and proceed with joint analysis
- Interpreting Cell Ranger Web Summary Files for Single Cell Gene Expression Assays, CG000329. Highly recommended. They show what plot results look like for typical (good) samples, heterogeneous samples, and compromised (bad) samples.
- Human reference genome annotations
- "Seurat Object Explained: Beginner's Guide and Demo" by chatomics
- Introduction to scRNA-seq Data Analysis by 10x Genomics: Cell Ranger, Loupe browser, cloud analysis
- Quality Assessment Using the Cell Ranger Web Summary by 10x Genomics
- "Statistical analysis of single-cell RNA-seq data with multiple samples" by DahShu
Data visualization - types of plots and how to make them
Data visualization methods in Seurat - ridge plots, violin plots, feature plots, dot plots, heatmaps, visualizing coexpression
Split Dot Plot - color code by an additional variable such as a condition
Clustered dot plot using ComplexHeatmap
Let's Plot 7: Clustered Dot Plots in the ggverse (Eye Informatician)
tSNE vs UMAP, two methods to show clustering:
- Understanding UMAP (Andy Coenen, Adam Pearce)
- tSNE vs UMAP: Global Structure
SCpubr - an R package to make publication ready plots for single cell RNA-seq
- Dim plots - dimensional reduction, similar to PCA or UMAP plots
- Feature plots - dim plot with a continuous scale for gene expression visualization across clusters
- Nebulosa plots - computes a density plot for specific gene markers so you can see where they are most expressed
- Bee Swarm plots
- Violin plots
- Ridge plots - stacked density plots, one per group (like several half-violin plots together)
- Dot plots - show gene expression of different markers across different clusters
- Bar plots
- Box plots
- Geyser plots
- Alluvial plots
- Sankey plots
- Chord Diagram plots - circos plots
- Volcano plots
Cell labeling, label transfer, single cell reference mapping
Mapping and annotating query datasets (Satija lab, Oct 2023)
Web Resources for Cell Type Annotation (10x Genomics Analysis Guide, 2024)
- Install Signac first, otherwise you may get an installation error
- Azimuth annotation on Seurat
Combining samples
- See "Statistical analysis of single-cell RNA-seq data with multiple samples" (YouTube, 1hr lecture)
- Recommendations for combining multiple 10x runs into one SingleCellExperiment
- Process each sample separately for initial QC steps (cell filtering, removing doublets)
- Take notes on QC of each individual sample
- Beware that batch correction steps can remove differentially expressed genes.
- "You can avoid this with careful experimental design, e.g., paired WT/KO samples in each batch so that correction cannot remove genotype differences. You can also detect DE genes between conditions by summing cells within each batch (possibly per population) and treating them as pseudo-bulk for edgeR analyses (see https://doi.org/10.1093/biostatistics/kxw055; https://pubmed.ncbi.nlm.nih.gov/28334062/). This complements a batch-corrected single-cell-level analysis, e.g., when a treatment causes both a systematic DE and changes in population composition."
- When to combine samples in the pre-processing of 10x scRNA-seq data? (2019)
- Pre-process each sample separately (cell filtering, removing empty droplets, doublets, etc)
- Cluster each sample independently for QC purposes - to check samples
- Cluster samples together afterward?
- The difference between merge and integration with Seurat objects (2021)
- Only merge data before pre-processing if using technical replicates with low batch effects?
- How to handle large Seurat objects (5GB+) in R?
- Increase R memory size
- Switch to a high performance computing machine when it becomes too much
- Seurat - Combining Two 10X Runs (10/2023)
- Code to use Read10X function on separate datasets
- Code to combine data and add dataset IDs with merge function
- Commands for Seurat object integration & pseudobulk analysis
- Merge objects (without integration)
- Merge objects (with integration)
- Pseudobulk analysis - group cells together based on multiple categories
- Differential expression testing (3/2024)
- Compare different cell types within the same sample
- Compare same cell types across different samples
- Aggregate gene expression to perform pseudobulk DE analysis with DESeq2, edgeR, or limma
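To make the pseudobulk idea concrete, here is a base R sketch that sums counts across cells sharing the same sample and cell type. The toy matrix and labels are made up; on a real Seurat object you would normally use Seurat's own aggregation helper rather than this hand-rolled version.

```r
# Hypothetical toy data: 4 genes x 6 cells, with a sample and cell type label per cell
counts <- matrix(
  c(1, 0, 2, 0, 3, 1,
    0, 0, 1, 4, 0, 2,
    5, 2, 0, 1, 1, 0,
    0, 1, 0, 0, 2, 2),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("Gene", 1:4), paste0("cell", 1:6))
)
sample_id <- c("S1", "S1", "S1", "S2", "S2", "S2")
cell_type <- c("T", "T", "B", "T", "B", "B")

# One pseudobulk column per sample x cell type combination
group <- paste(sample_id, cell_type, sep = "_")

# rowsum() sums rows by group, so transpose to cells x genes, sum, then transpose back
pseudobulk <- t(rowsum(t(counts), group = group))
pseudobulk
```

The resulting genes x groups count matrix, plus a table describing each column, is the kind of input that DESeq2, edgeR, or limma work from.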
## Load packages used below
library(Seurat)
library(dplyr)
## Assumes `pbmc` (Seurat object) and `donor_metadata` (data frame with unique "ID" values and a "batch" column) already exist
rownames(donor_metadata) <- donor_metadata$ID
## Create dataframe with batch info for every cell
cellBatch <- dplyr::left_join(
  x = data.frame(
    rownames = rownames(pbmc@meta.data),
    ID = pbmc@meta.data$orig.ident),
  y = donor_metadata[, c("ID", "batch")],
  by = "ID")
head(cellBatch)
## Add batch to the Seurat metadata if it is not there already
## (left_join keeps the row order of x, so rows still match the cell order)
pbmc$batch <- cellBatch$batch
## Azimuth labeling
Layers(pbmc)
pbmc <- JoinLayers(pbmc) ## to fix Azimuth error
pbmc <- Azimuth::RunAzimuth(pbmc, reference = "pbmcref")
pbmc
Layers(pbmc)
## See cell type annotations added
head(pbmc@meta.data, 10)
## Split layers only AFTER running Azimuth. Define the column to use for batches.
pbmc[["RNA"]] <- split(pbmc[["RNA"]], f = pbmc$batch)
Layers(pbmc[["RNA"]])
## Run normalizations and scaling
pbmc <- NormalizeData(pbmc)
pbmc <- FindVariableFeatures(pbmc)
pbmc <- ScaleData(pbmc)
## Run PCA on the scaled data
pbmc <- RunPCA(pbmc)
Batch correction (integration)
13.3.1 Batch correction: canonical correlation analysis (CCA) using Seurat (Broad Institute) - old method, but code is still helpful for learning
- Uses separate Seurat objects (old way)
Integrative analysis in Seurat v5 - recommended new method
- Streamlined one-line integrative analysis (new way)
- Uses one Seurat object created by merging different Seurat objects, then splitting layers to define batches
- Includes example code for different integration methods, including CCA and Harmony
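A rough sketch of what the Seurat v5 workflow looks like when wrapped in a function. The helper name `integrate_batches`, the `batch` column name, and the choice of CCA are my assumptions for illustration; the function is only defined here, and it needs Seurat v5 plus a merged Seurat object to actually run.

```r
# Sketch only: `obj` is assumed to be a merged Seurat v5 object whose metadata
# has a batch column; integrate_batches() is a hypothetical helper name.
integrate_batches <- function(obj, batch_column = "batch") {
  stopifnot(requireNamespace("Seurat", quietly = TRUE))
  # Split the RNA assay into one layer per batch
  obj[["RNA"]] <- split(obj[["RNA"]], f = obj[[batch_column, drop = TRUE]])
  # Standard preprocessing, run per layer
  obj <- Seurat::NormalizeData(obj)
  obj <- Seurat::FindVariableFeatures(obj)
  obj <- Seurat::ScaleData(obj)
  obj <- Seurat::RunPCA(obj)
  # One-line integration; swap in Seurat::HarmonyIntegration to try Harmony instead
  obj <- Seurat::IntegrateLayers(
    object = obj, method = Seurat::CCAIntegration,
    orig.reduction = "pca", new.reduction = "integrated.cca"
  )
  # Rejoin layers before downstream steps such as differential expression
  obj[["RNA"]] <- SeuratObject::JoinLayers(obj[["RNA"]])
  obj
}
```

Usage would be along the lines of `pbmc <- integrate_batches(pbmc, batch_column = "batch")`, followed by clustering and UMAP on the `integrated.cca` reduction.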
Harmony R package (Korsunsky et al 2019, Nature Methods) - method for batch correction of single cell data
- "Harmony enables the integration of ~10^6 cells on a personal computer"
Benchmarking atlas-level data integration in single-cell genomics (Luecken et al 2021, Nature Methods)
- scANVI, Scanorama and scVI perform best for scRNA-seq
- scATAC-seq integration performance depends on feature space (genes) & most methods performed poorly for scATAC
- "scATAC-seq batch effects were only consistently overcome by LIGER and Harmony, which prioritize batch removal over conservation of biological variation."
Differential Expression Testing
Differential expression testing (Seurat)
- p_val : p-value (unadjusted)
- avg_log2FC : log fold-change of the average expression between the two groups. Positive values indicate that the feature is more highly expressed in the first group.
- pct.1 : The percentage of cells where the feature is detected in the first group
- pct.2 : The percentage of cells where the feature is detected in the second group
- p_val_adj : Adjusted p-value, based on Bonferroni correction using all features in the dataset.
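To see where these columns come from, here is a simplified base R sketch on made-up data. It is not Seurat's implementation: the fold change below is a plain difference of mean log-scale expression (Seurat's formula differs), the expression values are simulated, and pct.1/pct.2 are shown as fractions between 0 and 1.

```r
set.seed(7)
# Hypothetical log-scale expression: 3 genes x 40 cells, two groups of 20 cells
group <- rep(c("g1", "g2"), each = 20)
expr <- rbind(
  GeneUp   = c(rnorm(20, mean = 3, sd = 0.5), rnorm(20, mean = 0.5, sd = 0.5)),
  GeneSame = rnorm(40, mean = 1, sd = 0.5),
  GeneRare = c(rep(0, 20), rep(0, 15), rnorm(5, mean = 1, sd = 0.2))
)
expr[expr < 0] <- 0  # expression values cannot be negative

in_g1 <- group == "g1"
de <- data.frame(
  # Unadjusted Wilcoxon rank-sum p-value per gene
  p_val = apply(expr, 1, function(x) {
    wilcox.test(x[in_g1], x[!in_g1], exact = FALSE)$p.value
  }),
  # Simplified fold change: positive means higher in the first group
  avg_log2FC = rowMeans(expr[, in_g1]) - rowMeans(expr[, !in_g1]),
  # Fraction of cells in each group where the gene is detected
  pct.1 = rowMeans(expr[, in_g1] > 0),
  pct.2 = rowMeans(expr[, !in_g1] > 0)
)
# Bonferroni: multiply each p-value by the number of features tested, capped at 1
de$p_val_adj <- p.adjust(de$p_val, method = "bonferroni")
de
```

With thousands of genes instead of three, the Bonferroni multiplier gets large, which is why p_val_adj is so much more conservative than p_val.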
Differential expression across conditions (Seurat integration subsection)
Receptor-Ligand interactions
LIANA: a LIgand-receptor ANalysis frAmework - an R package and python tool for identifying and scoring receptor-ligand interactions in datasets
Spatial transcriptomics
Analysis of spatial datasets (Sequencing-based)
Analysis of spatial datasets (Imaging-based)
STELLAR (Python-based tool from Stanford) for annotating single cell data; can be used for cross-tissue and cross-donor spatial transcriptomics data
Multiomics: scRNA-seq and scATAC-seq
Integrating scRNA-seq and scATAC-seq data (Satija lab)
Integrative analysis in Seurat v5 (Satija lab, Oct 2023)
"For this vignette, we use a dataset of human PBMC profiled with seven different technologies, profiled as part of a systematic comparative analysis (pbmcsca). The data is available as part of our SeuratData package."
Azimuth annotation for scRNA-seq and scATAC-seq data (Satija lab)
Signac: a comprehensive R package for the analysis of single-cell chromatin data (Stuart lab)
scRNA-seq data analysis for non-programmers
Background reading - general
- "The technology and biology of single-cell RNA sequencing", Kolodziejczyk et al 2015. Molecular Cell. Review.
- "An Introduction to the Analysis of Single-Cell RNA-Sequencing Data", AlJanahi et al 2018. Mol Ther Methods Clin Dev.
- "The human mitochondrial transcriptome", Mercer et al 2011. Cell.
- "In the heart, mitochondrial transcripts comprise almost 30% of total mRNA, whereas mitochondria contribute a lower bound of ∼5% to the total mRNA of tissues with lower energy demands (adrenal, ovary, thyroid, prostate, testes, lung, lymph and white blood cells)."
- Normal proportions of mitochondrial transcripts vary by sample type -- this is relevant for setting cutoffs! Don't just use the percent.mt cutoff from the Seurat 3kPBMC tutorial.
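For reference, the mitochondrial percentage itself is simple to compute from a count matrix. The sketch below uses a made-up 4-gene matrix and assumes human gene symbols, where mitochondrial genes start with "MT-" (mouse symbols use "mt-"). The cutoff of 25 is a placeholder, not a recommendation.

```r
# Hypothetical toy counts: genes x cells, human mitochondrial genes prefixed "MT-"
counts <- matrix(
  c(50, 40, 10,
    30, 50, 20,
    10,  5, 40,
    10,  5, 30),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("ACTB", "GAPDH", "MT-CO1", "MT-ND1"),
                  c("cell1", "cell2", "cell3"))
)

# Percent of each cell's counts that come from mitochondrial genes
is_mito <- grepl("^MT-", rownames(counts))
percent_mt <- 100 * colSums(counts[is_mito, , drop = FALSE]) / colSums(counts)
percent_mt

# Assumption: placeholder cutoff; choose it from the distribution seen in your tissue type
mt_cutoff <- 25
keep <- percent_mt < mt_cutoff
keep
```

Plotting the distribution of `percent_mt` per sample before picking `mt_cutoff` is a good habit, since heart or other high-energy tissues will sit much higher than PBMCs.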
Background reading - placenta, endometrium
- Mareckova M, Garcia-Alonso L, ..., Vento-Tormo R. "An integrated single-cell reference atlas of the human endometrium." Nature Genetics, 2024. [PMID: 39198675; PMCID: PMC11387200]
- Human endometrium with/without endometriosis
- ReproductiveCellAtlas.org
- HECA = Human Endometrium Cell Atlas, >313k cells
- Integrated 6 scRNA-seq databases & new Mareckova (cells) dataset
- Wang M, Liu Y, ..., Wang H. "Single-nucleus multi-omic profiling of human placental syncytiotrophoblasts identifies cellular trajectories during pregnancy." Nature Genetics, 2024. [PMID: 38267607; PMCID: PMC10864176]
- Human placenta at first and late third trimester
- n=6 placenta in early pregnancy (6-9 weeks gestation)
- n=6 placenta in late pregnancy (38-39 weeks gestation)
- Integrated separate snRNA-seq and snATAC-seq
- Arutyunyan A,... Vento-Tormo R. "Spatial multiomics map of trophoblast development in early pregnancy." Nature, 2023. [PMID: 36991123; PMCID: PMC10076224]
- Human placenta and decidua frozen into blocks for spatial experiments
- Tissue cryopreserved with cold OCT medium and flash-frozen using a dry ice-isopentane slurry.
- Single nuclei used for multiomics (snRNA-seq, snATAC-seq)
- Ji K, Chen L, ..., Liu H. "Integrating single-cell RNA sequencing with spatial transcriptomics reveals an immune landscape of human myometrium during labour." Clin Transl Med, 2023. [PMID: 37095651; PMCID: PMC10126311]
- Human myometrial tissue collected during C-section deliveries (singleton, uncomplicated full term)
- n=6 TIL, term in labor
- n=6 TNL, term in non-labor
- Tissue was washed with PBS, minced, and enzymatically dissociated briefly (3 mg/ml collagenase IV, 2 mg/ml papain, and 120 units/ml DNase I) at 37°C for 20 min. Cell suspension was passed through stacked 70-30 μm filters, then passed through the Dead Cell Removal Kit (Miltenyi). Washed with PBS + 0.04% BSA twice.
- Koel M, Krjutskov, ... Altmae S. "Human endometrial cell-type-specific RNA sequencing provides new insights into the embryo–endometrium interplay." Human Reproduction, 2022. [PMID: 36339249; PMCID: PMC9632455]
- Human endometrium cells sorted with FACS, then bulk RNA-seq
- n=16 healthy women from Estonia and Spain, mean age 29.7, normal BMI, no hormonal medication for 3 months; normal serum levels of progesterone, prolactin, and testosterone; negative for STIs, no uterine pathologies or endometriosis or PCOS, at least one live birth.
- Per woman, n=2 endometrial biopsies within the same menstrual cycle (early secretory & mid-secretory/receptive phases)
- NCBI GEO accession GSE97929 (32 samples): 16 paired endometrial samples
- Sun T*, Gonzalez TL*, ..., Pisarska MD. “Sexually dimorphic crosstalk at the maternal-fetal interface.” J Clin Endocrinol Metab, 2020. [PMID: 32772088; PMCID: PMC7571453] *co-first authors.
- Human placenta at late first trimester during CVS appointments
- NCBI GEO accession GSE131696 (6 samples) = Single cell RNA-seq
- NCBI GEO accession GSE131874 (8 samples) = Bulk total RNA-seq of matched decidua and placenta
- Tissue was washed with PBS, minced, and enzymatically dissociated: 300 U/ml collagenase, 0.25% trypsin, and 200 μg/ml DNase I at 37°C for 90 min. Cells were spun at 1200 rpm for 10 min, resuspended in Chang medium (which contains 16% serum), and treated with 1x red blood cell lysis buffer for 15 min, then washed again and strained through a 70 μm filter. [Details]
- Vento-Tormo R, ..., Teichmann SA. "Single-cell reconstruction of the early maternal–fetal interface in humans." Nature, 2018. [PMID: 30429548; PMCID: PMC7612850]
- Human placenta at first trimester