October 25, 2016

Free Databases and Software for Molecular Biology, Genetics, and Basic Bioinformatics [2016]

SEQUENCE DATABASES AND TOOLS

  • UniProt.org - Go here to find information on specific proteins: functional and structural domains, pI, molecular weight, homologs, expression information, and protein ontology notes. "The mission of UniProt is to provide the scientific community with a comprehensive, high-quality and freely accessible resource of protein sequence and functional information."
  • Ensembl - Go here for gene, genome sequence, and splicing isoform information. The Ensembl Gene IDs are useful for RNA-seq analysis. For example: the ID for the gene encoding actin beta is ENSG00000075624.
  • GenBank - NIH Genetic Sequence Database - Use this to find annotated sequence data for genes of interest. For example, look up the cDNA sequence of actin beta, click on "Send:" in the top right corner, choose "Complete Record", choose destination "File", and download format "Genbank" or "Genbank (full)". That GenBank file can be opened with any sequence software that supports annotations (e.g. SnapGene, SeqBuilder, Benchling, ApE), and the annotations will appear alongside the DNA sequence. My favorite free software for viewing annotated files is SnapGene Viewer.
  • Primer-BLAST - Use this tool to design specific primers for a gene of interest for qPCR or regular PCR.
  • GeneMANIA.org - Make protein signaling and interaction networks. Input a list of proteins and see if they are known to interact. The results are from published data, so unstudied proteins will mistakenly look like they don't interact with anything. For an example network, try this input:
    • CDX2
    • MMP2
    • HLA-G
  • The Bio-Analytic Resource (BAR) for Plant Biology - Useful tools specifically geared for plant biologists. Genome browsers, expression mappers (eFP browsers), etc.
  • bioDBnet: db2db - Database to Database Conversions - Really useful for RNA-seq and other large-scale experiments! If you only have a list of gene names or IDs, use db2db to generate a list of gene name synonyms, gene descriptions, biotypes (e.g. protein_coding, lincRNA), accession IDs in different databases, etc. Try using the Ensembl Gene ID input for actin beta (ENSG00000075624).
  • Clustal W - Multiple Sequence Alignment - Use this to align multiple sequences (DNA or protein). I like this version of Clustal W specifically because it lets me alter the alignment parameters. If you are aligning similar sequences, but one of them has an intron or another interruption, the default parameters will result in a poor alignment. To improve it, reduce the "gap extension penalty" so that the long gap created by the intron does not ruin the alignment score. Otherwise, the winning alignment will be a useless one full of 1-3 base gaps all over the place. When aligning differently spliced sequences that are otherwise expected to be similar, keep the "gap open penalty" high and the "gap extension penalty" low to get a better result. This is what I do when I manually check the sequencing results of an unknown splice product against the genomic sequence.
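
To see why a high gap open penalty combined with a low gap extension penalty suits intron-containing sequences, here is a minimal Python sketch of a global alignment with affine gap penalties (the Gotoh variant of Needleman-Wunsch). This is not Clustal W's actual algorithm or scoring scheme. The function names, the toy exon/intron sequences, and the match/mismatch scores are all assumptions made up for illustration.

```python
import re

NEG = float("-inf")

def affine_align(a, b, match=1.0, mismatch=-1.0, gap_open=10.0, gap_extend=0.5):
    """Global alignment where a gap of length k costs gap_open + (k - 1) * gap_extend."""
    n, m = len(a), len(b)
    # M: a[i-1] aligned to b[j-1]; X: a[i-1] aligned to a gap; Y: b[j-1] aligned to a gap
    M = [[NEG] * (m + 1) for _ in range(n + 1)]
    X = [[NEG] * (m + 1) for _ in range(n + 1)]
    Y = [[NEG] * (m + 1) for _ in range(n + 1)]
    M[0][0] = 0.0
    for i in range(1, n + 1):
        X[i][0] = -gap_open - gap_extend * (i - 1)
    for j in range(1, m + 1):
        Y[0][j] = -gap_open - gap_extend * (j - 1)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            M[i][j] = max(M[i - 1][j - 1], X[i - 1][j - 1], Y[i - 1][j - 1]) + s
            X[i][j] = max(M[i - 1][j] - gap_open, X[i - 1][j] - gap_extend, Y[i - 1][j] - gap_open)
            Y[i][j] = max(M[i][j - 1] - gap_open, Y[i][j - 1] - gap_extend, X[i][j - 1] - gap_open)

    # Trace back from the best final state, re-deriving each predecessor state.
    score, state = max((M[n][m], "M"), (X[n][m], "X"), (Y[n][m], "Y"))
    i, j = n, m
    row_a, row_b = [], []
    while i > 0 or j > 0:
        if state == "M":
            row_a.append(a[i - 1]); row_b.append(b[j - 1])
            i -= 1; j -= 1
            options = [(M[i][j], "M"), (X[i][j], "X"), (Y[i][j], "Y")]
        elif state == "X":
            row_a.append(a[i - 1]); row_b.append("-")
            i -= 1
            options = [(M[i][j] - gap_open, "M"), (X[i][j] - gap_extend, "X"), (Y[i][j] - gap_open, "Y")]
        else:
            row_a.append("-"); row_b.append(b[j - 1])
            j -= 1
            options = [(M[i][j] - gap_open, "M"), (Y[i][j] - gap_extend, "Y"), (X[i][j] - gap_open, "X")]
        state = max(options)[1]
    return score, "".join(reversed(row_a)), "".join(reversed(row_b))

def gap_runs(row):
    """Count separate stretches of '-' in one row of an alignment."""
    return len(re.findall(r"-+", row))

# Toy sequences (made up): a "genomic" sequence with one intron and its spliced "cDNA".
exon1, intron, exon2 = "ATGGCCAAGCTGACC", "GTAAGTTTTCTTCCAG", "TCGGTGCTGAAGTAA"
genomic = exon1 + intron + exon2
cdna = exon1 + exon2

score, row_a, row_b = affine_align(genomic, cdna, gap_open=10.0, gap_extend=0.5)
print(row_a)
print(row_b)
print("score:", score, "gap runs in cDNA row:", gap_runs(row_b))
```

In this toy model, opening the intron gap costs 10 and each of the remaining 15 intron bases adds only 0.5, so keeping the intron as one uninterrupted gap is far cheaper than opening a second gap anywhere else. Try raising gap_extend or lowering gap_open to see how the total penalty for the intron changes.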

SOFTWARE

SnapGene Viewer showing annotated map of EGFP-HyP5SM.
SnapGene can find common features to fix poorly-annotated plasmid files.
  • SnapGene Viewer - This is my favorite sequence annotation software. It will read the annotated files generated by GenBank (.gb) as well as sequence formats from for-pay software like the LaserGene Suite's SeqBuilder. If you upload the raw sequence of a plasmid, SnapGene Viewer will search for common sequences and helpfully annotate known promoters, epitope tags, selectable markers, origins of replication, reporter genes, terminators, and many other sequences. It is useful for complex annotations (allowing multiple notes, color coding, breaks in the annotated sequence, etc). It also reads DNA chromatogram files (.abi) and can be used to analyze sequencing results, but the free version does not allow for easy alignment to a reference gene. For DNA sequencing analysis, I prefer Benchling.

    Checking restriction digest sites in DsRed2 using SerialCloner.

  • SerialCloner - Great offline tool for cloning, especially restriction digest cloning. I also use it to check if my primers are specific to my gene or if they may bind another part of the plasmid. Its sequence alignment tool will find and report less-than-perfect matches on either strand so that I can manually decide if a primer will work for me. Con: It only shows its best match.
      
Looking at the consensus sequences of plant L5 ribosomal proteins with BioEdit.
  • BioEdit - A useful tool for generating and saving sequence alignments. No longer maintained, but still worth a download. It was designed for Windows XP, but I have used it with Windows 7/8/10. If you have trouble, right-click the .exe file, select "Properties", go to the "Compatibility" tab, and run under compatibility settings for Windows XP.
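
The primer specificity check described for SerialCloner above can be roughed out in a few lines of Python. This is a minimal sketch, not how SerialCloner works internally: it slides the primer along the template as written and along its reverse complement, counts mismatches without allowing gaps, and reports every site at or under a mismatch limit rather than only the best one. The function names, the max_mismatches parameter, and the toy sequences are assumptions for illustration, and the template is treated as linear, so a site spanning the origin of a circular plasmid would be missed.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def find_primer_sites(template, primer, max_mismatches=2):
    """List (start, strand, mismatches) for every ungapped site within the mismatch limit.

    strand "+" means the primer sequence itself appears in the template as written;
    strand "-" means its reverse complement appears there.
    """
    template = template.upper()
    primer = primer.upper()
    hits = []
    for strand, query in (("+", primer), ("-", reverse_complement(primer))):
        for start in range(len(template) - len(query) + 1):
            window = template[start:start + len(query)]
            mismatches = sum(1 for x, y in zip(window, query) if x != y)
            if mismatches <= max_mismatches:
                hits.append((start, strand, mismatches))
    # Best matches first, then by position.
    return sorted(hits, key=lambda hit: (hit[2], hit[0]))

# Toy template (made up): one perfect site, plus a one-mismatch site on the other strand.
primer = "ATGGTGAGCAAGGGCGAG"
off_target = reverse_complement("ATGGTGAGCTAGGGCGAG")  # differs from the primer at one base
template = "GGATCC" + primer + "TTTTGCACTG" + off_target + "GAATTC"

hits = find_primer_sites(template, primer, max_mismatches=2)
for start, strand, mismatches in hits:
    print(f"start={start} strand={strand} mismatches={mismatches}")
```

Listing every site instead of only the top one makes it easier to spot a second, weaker binding site elsewhere in the plasmid.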

ONLINE SOFTWARE


  • Benchling - Online sequence annotation, alignment, and sequence data analysis software. The annotation capabilities are very simple right now, so I prefer SnapGene for that. I mainly use Benchling for checking sequencing results. Upload the DNA chromatogram file (.abi) from the sequencing results and the predicted DNA (e.g. the gene you are cloning into a plasmid) as a reference. Align the two sequences and check the chromatogram for SNPs.


