Links to free tools and tutorials. To be updated occasionally...
Reading in large CSV spreadsheets (e.g. 1-2 GB)
- Speed up the import. Use fread() from the "data.table" R package instead of base read.csv(). It is faster and lets you import only certain columns (the select and drop arguments) or only certain rows (the nrows and skip arguments).
- Reading a 2 GB spreadsheet takes far more memory than the file size suggests. In my experience, 16 GB of RAM is enough, while 8 GB results in out-of-memory errors.
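
A minimal sketch of a partial import with fread(), assuming the "data.table" package is installed. The demo file and its column names (gene_id, sample_a, sample_b, notes) are made up for illustration; swap in your own file path and column names.

```r
library(data.table)

# Write a small demo CSV so the example is self-contained (hypothetical data).
demo_path <- tempfile(fileext = ".csv")
demo <- data.table(
  gene_id  = paste0("gene", 1:1000),
  sample_a = 1:1000,
  sample_b = 1001:2000,
  notes    = "unused"
)
fwrite(demo, demo_path)

# Import only two named columns and only the first 100 data rows.
subset_dt <- fread(demo_path, select = c("gene_id", "sample_a"), nrows = 100)

dim(subset_dt)
names(subset_dt)
```

Reading fewer columns is usually the biggest memory saver, since fread() never loads the columns you leave out. If you only need a quick look at a big file, a small nrows value lets you check the column names and types before committing to the full import.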
Annotating omics data with Ensembl's BioMart tool (R, Perl)
- "biomaRt" R package
- Ensembl.org also provides a Perl API for BioMart
RNA-seq workflow (Unix, R)
- UCLA workshops (YouTube recordings of live workshops):
Comparing edgeR, DESeq2, and limma
DESeq2 differential expression analysis (R)
edgeR workflow
Machine learning 101 (Matlab, Python)
Comparing sequences
- BLAST - compare your sequence to another given sequence, to the human genome, or to other genomes or transcriptomes
  - Nucleotide BLAST (blastn) - compare nucleotide sequences
  - blastx - give it a DNA/RNA sequence to be compared to proteins
  - tblastn - give it a protein sequence to be compared to DNA/RNA sequences
  - Protein BLAST (blastp) - compare protein sequences
- ClustalW - input sequences in FASTA format and align them
  - Set the "Gap Extension Penalty" to zero if you're comparing DNA to mRNA. This allows introns to interrupt the alignment as long gaps, which the default settings avoid.
  - Use ClustalW when you want to see the full alignment, not just the short windows of good alignment that BLAST reports
Enrichment Analysis - methods with no coding required
- Ingenuity Pathway Analysis (QIAGEN) - click on "Resources" and search for webinars. The software is free to download, but using it requires a paid license. Cedars-Sinai has an institutional license that you can request through EIS.
- Gene Ontology - free but less detailed than IPA
Data visualization
Statistical models
General links
- Pak Yu's GitHub: https://sfpacman.github.io/cookbook/index.html