May 11, 2024

R code for Manhattan plots: identify the FDR=0.05 equivalent P-value (for plotting purposes)

For Manhattan plots and other plots where you're plotting -log10(P-values) or untransformed P-values, you sometimes want to draw a line to mark where FDR=0.05 is reached. But where do you draw the line? Below is R code that identifies the P-value whose FDR is nearest to 0.05. You can then use the variable FDR5.equiv.P to create a line on your Manhattan plot.

The code assumes you have a data frame with two columns, "Pval" and "FDR".
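
A minimal sketch of one way to do this, assuming a data frame named df (a hypothetical name) with the "Pval" and "FDR" columns. The P-values are made up, and the FDR column is computed here with the Benjamini-Hochberg method via p.adjust(); your own FDR column may have been computed differently.

```r
# Made-up example data; replace with your own data frame.
df <- data.frame(Pval = c(1e-6, 1e-5, 1e-4, 0.001, 0.01, 0.02, 0.2, 0.5, 0.8, 0.9))
df$FDR <- p.adjust(df$Pval, method = "BH")  # assumption: Benjamini-Hochberg FDR

# Find the row whose FDR is closest to 0.05 and take its P-value.
nearest.row <- which.min(abs(df$FDR - 0.05))
FDR5.equiv.P <- df$Pval[nearest.row]

# To draw the threshold on an existing plot of -log10(P-values) in base R:
# abline(h = -log10(FDR5.equiv.P), lty = 2)
```

Note that the nearest FDR can fall slightly above 0.05. If you would rather have a conservative line, one alternative is max(df$Pval[df$FDR <= 0.05]), which takes the largest P-value that still passes FDR<=0.05.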

March 26, 2024

Advice: Flash talks

What is a flash talk?

Flash talks are brief formal presentations about your research, usually 1-3 minutes long. They are mini oral presentations that accompany research posters or journal articles. Other names include lightning talks, speed talks, and rapid-fire talks. Their goal is to convey big-picture ideas and make your research interesting to a broad audience.

They are similar to elevator pitches, except flash talks include a visual element (usually 1-3 PowerPoint slides). Sometimes the visual element is your full poster and nothing else, in which case you want to design your poster with this in mind. Add large visuals and large text.

Advice for flash talks (video links):

Winning Tips for Preparing a Successful Three-Minute Thesis 3MT® Presentation, OhioUPhysics, YouTube [12:32]

How to give a flash talk - tips and tricks for scientists, European Molecular Biology Laboratory (EMBL), YouTube [2:58]

The perfect pitch - explaining your research in one minute, Kungl. Ingenjörsvetenskapsakademien IVA, YouTube [7:32]

February 13, 2024

Free Databases and Software for Molecular Biology, Genetics, and Bioinformatics [2024]

DNA, RNA, and protein sequences - databases

Ensembl - genome reference website with gene annotations. Go here for gene, genome sequence, and splicing isoform information. The Ensembl Gene IDs are useful for RNA-seq primary and secondary analysis. For example: the ID for the gene encoding actin beta is ENSG00000075624.
  • Computer-generated IDs for unique sequences. The pseudoautosomal genes shared by chromosomes X and Y are not duplicated; they are represented only once, on chromosome X.

GENCODE - genome reference website with gene annotations. Human and mouse data.
  • Human-annotated IDs. The pseudoautosomal genes from chromosomes X and Y are given separate IDs, keeping "ENSG0..." for the chromosome X copy (as per Ensembl.org) and replacing the first zero with an R for the chromosome Y copy, "ENSGR..."
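
If you need to match GENCODE IDs against Ensembl IDs, the naming pattern described above can be handled with a regular expression. This is only a sketch: the first ID is the actin beta ID quoted above, the "ENSGR" ID is made up to follow the pattern, and the variable names are hypothetical. Check the ID format used in the GENCODE release you are working with.

```r
# Illustrative IDs: actin beta (from the text above) and a made-up "ENSGR" ID.
ids <- c("ENSG00000075624", "ENSGR0000123456")

# TRUE for IDs that mark a chromosome Y pseudoautosomal copy.
is.par.y <- grepl("^ENSGR", ids)

# Map each ID back to the Ensembl-style ID by restoring the first zero.
ensembl.ids <- sub("^ENSGR", "ENSG0", ids)
```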

UC Santa Cruz Genome Browser - database of genomic DNA annotations for various species. It previously had annotations only for the older human genome reference (version hg19, also called GRCh37), but annotations are now being added for the newer version (hg38, GRCh38).

February 6, 2024

How to batch rename microscope photos with Irfanview (Windows freeware)

Irfanview is free graphics editing software for Windows. It is meant for editing existing files, not drawing new images. Use Photoshop, Illustrator, Inkscape, PowerPoint, Paint, or other graphics programs to make new images. The purpose of Irfanview is "only" editing, but it is packed full of tools that are useful for science, such as cropping, resizing, brightness/contrast/gamma/hue adjustments, image resolution changes, color replacement, conversion to grayscale or black/white, batch conversion of file type/size/name, and more.

I love Irfanview for renaming microscopy photos, which I show below.

Step 1: Download and install Irfanview.

Step 2: Save microscopy photos. If your microscope allows you to export photos but not to name them (like my example here), then take a "spacer photo" in between your samples so that you know when sample A ends and sample B begins. Spacer photos can be anything that looks different, for example a photo of a dark or white area that you will recognize is not either sample. In my example here, I took two photos of the text labeling on the hemacytometer, in between samples.

Step 3: Open Irfanview and select File:Batch Conversion/Rename...

November 24, 2023

Gardening: seeds

 Places to buy seeds. Not for research, just vegetable and flower gardening.

California


Other USA

Look up your plant hardiness zone by zip code here.

September 2, 2023

Vocabulary for reading DNA methylation papers

 If you are new to the field of DNA methylation, here is some introductory vocabulary.

  • 27k array - short for Infinium HumanMethylation27 BeadChip (approx 27,000 methylation sites). An early DNA methylation array that samples CpG sites across the human genome. Not used anymore.
  • 450k array - short for Infinium HumanMethylation450 BeadChip (>450,000 methylation sites). For several years this was the gold standard for array-based DNA methylation measurements, though now the EPIC array is available with more methylation sites.
  • Beta value and M value are two terms used to describe the methylation measurement.
    • Beta values range from 0 to 1 and describe the proportion of methylation for a specific site in the sample, from completely unmethylated to completely methylated. 
    • M values can be negative or positive. They are a log2 transformation of the ratio of methylated to unmethylated signal. M values are generally more useful than beta values as input for statistical models. If you want to describe differentially methylated probes (DMPs), use M values for your analysis.
  • Annoyingly, "beta" (or sometimes "delta beta") can also refer to the model coefficient from a generalized linear model. In that case, it describes whether a site is more or less methylated in group1 versus group2. The sign (positive or negative) gives the direction of the difference. Consult the manuscript to figure out which group has the higher methylation.
  • Bisulfite conversion - this is a chemical change done by the lab to make it easier to distinguish unmethylated versus methylated sites on the DNA. It is done before the DNA methylation measurement step, whether that be bisulfite sequencing or bisulfite PCR or a methylation array. Bisulfite conversion does NOT mean bisulfite sequencing.
  • Bisulfite sequencing - this is a method of measuring DNA methylation in a sample, using bisulfite conversion followed by DNA sequencing. 
  • Bonferroni correction - this is a way of correcting p-values when you have a lot of measurements and thus a higher risk of false positives. The Bonferroni method is much stricter than the FDR method.
    • P<0.05 is "nominally" significant, meaning it seems significant for a single site or gene but if you're looking at many sites or genes then you have a higher chance that your data has some false positives, and therefore P<0.05 isn't good enough for big data. You need to adjust for multiple comparisons to cut down the risk of false positives.
    • FDR<0.05 is actually significant for big data.
    • Bonferroni<0.05 is actually significant for big data and much stricter than FDR<0.05.
    • Data points with Bonferroni<0.05 will also meet the criteria for FDR<0.05 and P<0.05, since Bonferroni<0.05 is stricter than both.
  • CpG - dinucleotide grouping of a cytosine base followed by a guanine base on the same DNA strand. It is not the same as C and G binding across DNA strands. CpG methylation (with a methyl group on the C base) is what people usually mean when they say DNA methylation, since it's the most common form of DNA methylation.
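
Two of the terms above can be made concrete with a few lines of R. This is a sketch with made-up numbers. It assumes the commonly used logit (base 2) relationship between beta and M values, and it uses the Benjamini-Hochberg method in p.adjust() as the FDR; a given paper may use a different offset or FDR method.

```r
# Made-up beta values (proportion methylated) for three sites.
beta <- c(0.1, 0.5, 0.9)

# Logit (base 2) transformation commonly used to get M values from beta values.
M <- log2(beta / (1 - beta))

# Back-transform the M values to beta values.
beta.back <- 2^M / (2^M + 1)

# Made-up nominal P-values for six sites.
P <- c(0.0001, 0.004, 0.02, 0.03, 0.045, 0.6)
FDR <- p.adjust(P, method = "BH")           # assumption: Benjamini-Hochberg FDR
Bonf <- p.adjust(P, method = "bonferroni")  # Bonferroni-adjusted P-values

# Count how many sites pass each threshold.
n.pass <- c(P = sum(P < 0.05), FDR = sum(FDR < 0.05), Bonferroni = sum(Bonf < 0.05))
```

Here a beta value of 0.5 maps to an M value of 0, with negative M values for mostly unmethylated sites and positive M values for mostly methylated sites. In n.pass, five sites pass P<0.05, four pass FDR<0.05, and two pass Bonferroni<0.05, which shows how each correction is stricter than the one before it.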
