August 25, 2021

How to look up gene synonyms quickly with db2db (and fix the calendar gene issue)

If you have a list of gene symbols or Ensembl IDs and need to identify gene synonyms, try the Database to Database Conversions tool from bioDBnet: https://biodbnet-abcc.ncifcrf.gov/db/db2db.php

It is also a great way to correct gene symbols that Excel has converted to calendar dates after opening a CSV file.

Quick tutorial for db2db


Input: use the "Ensembl Gene ID" when available (highly recommended!) or "Gene Symbol" if you only have the name. Ensembl Gene IDs are better because they are unique, whereas gene symbols are not always unique and can change with genome updates.

Outputs: Chromosomal Location, Ensembl Gene ID, Gene Symbol, Gene Synonyms, etc. Whichever input type you select will be missing from the output options list, but db2db automatically returns it in the results.

Select multiple outputs by holding the "Ctrl" key on your keyboard while you click.

Organism: 9606 for human (not optional for our research). If you leave this blank, db2db will return information for other species.

ID List: paste in your list of gene symbols or Ensembl IDs directly from Excel. It will paste as 1 ID per row.

Identifier values can have comma(,): Default no. Don't change if you pasted a column from Excel. Do change if you are pasting in a list with comma-separated values.

Remove duplicate input values: set this to No. The default is "Yes", but with duplicates removed you might get back a different number of rows than you put in, which makes it difficult to copy/paste results back into your spreadsheet.

Other defaults are fine.

Click Submit.

Click "Results in Excel" to download an .xls file.

Double-click the .xls file to open it. You will get an error message saying the file might be corrupted, but it is not. Excel detects that the file was generated by code, so it looks unusual, but it should open fine.

Add columns to your original spreadsheet to make space for all of your db2db output columns, including the returned input column. Paste all columns from db2db into your original Excel sheet.

Check your results before incorporating them into your original spreadsheet

Important: check that the input rows are not scrambled before you copy/paste columns back into your original spreadsheet. Use a function to compare your original input and the returned input from db2db.

Column A: original input column you provided db2db
Column B: input column you received back from db2db output

Make a new column and paste in this formula, adjusting the row number to match your first data row, then fill it down the column: =A2=B2

The formula returns TRUE or FALSE for each row, depending on whether the two values match. Search the column to make sure that every row says TRUE.
 
FALSE rows mean your data is scrambled! Check why and fix it. 
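
If you prefer to run the same order check outside Excel, here is a minimal Python sketch. It assumes you have already read the two columns into lists of strings. The function name `find_mismatches` is hypothetical and the IDs are only example values.

```python
def find_mismatches(original, returned):
    """Return (row_number, original_value, returned_value) for rows that differ.

    Row numbers count from 1 within the lists, not within the spreadsheet.
    """
    if len(original) != len(returned):
        raise ValueError(
            f"Row counts differ: {len(original)} original vs {len(returned)} returned"
        )
    mismatches = []
    for row_number, (a, b) in enumerate(zip(original, returned), start=1):
        # Ignore stray whitespace that copy/paste sometimes adds.
        if a.strip() != b.strip():
            mismatches.append((row_number, a, b))
    return mismatches


# Example values only: the column you gave db2db and the column it gave back.
original = ["ENSG00000141510", "ENSG00000139618", "ENSG00000012048"]
returned_ok = ["ENSG00000141510", "ENSG00000139618", "ENSG00000012048"]
returned_scrambled = ["ENSG00000141510", "ENSG00000012048", "ENSG00000139618"]

print(find_mismatches(original, returned_ok))         # no mismatches
print(find_mismatches(original, returned_scrambled))  # rows 2 and 3 differ
```

An empty list means every row matched. Unlike Excel's = comparison, which ignores upper/lower case, this check is case-sensitive.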


Example: fixing the calendar genes issue 

In older human genome annotations, some gene symbols looked like calendar dates (e.g. MARCH1, MARC1, SEPT9). By default, Excel converts those symbols to actual calendar dates when it opens a CSV file. More recent annotations use updated symbols that prevent this issue (for example, MARCH1 is now MARCHF1 and SEPT9 is now SEPTIN9).
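
To find the affected rows without scrolling through the spreadsheet, you could flag values that look like Excel dates. This is a rough Python sketch, assuming Excel displayed the converted symbols as day-month or month-day text such as "1-Mar" or "Sep-09". The exact display depends on your Excel and locale settings, so treat the pattern as a starting point. The names `DATE_LIKE` and `looks_like_excel_date` are hypothetical.

```python
import re

MONTHS = "Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec"

# Matches text such as "1-Mar" or "Mar-01" (assumed display formats).
DATE_LIKE = re.compile(
    r"^(?:\d{1,2}-(?:" + MONTHS + r")|(?:" + MONTHS + r")-\d{1,2})$",
    re.IGNORECASE,
)


def looks_like_excel_date(value):
    """Return True if a gene column value looks like a date Excel created."""
    return bool(DATE_LIKE.match(value.strip()))


# Example values from a "Gene" column after Excel opened the CSV file.
symbols = ["1-Mar", "TP53", "9-Sep", "MARCHF1", "1-Dec"]
flagged = [s for s in symbols if looks_like_excel_date(s)]
print(flagged)
```

Only the date-like strings are flagged. Real symbols such as TP53 or MARCHF1 do not match the pattern.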

Here, I used db2db to get those newer gene names for an RNA-seq project. I first sorted column "Gene" alphabetically to bring the calendar genes to the top. I input "GENCODE_ensembl" values into db2db for rows 4-31 and received back two columns. I created a "Check order" column to quickly check for any mismatched Ensembl Gene IDs (there were none).


The last row with a calendar issue (row 31) didn't have a gene symbol result from db2db, but I searched the Ensembl Gene ID on Ensembl.org myself and it corresponds to gene symbol DELEC1, which I added manually. 

My last step was to copy/paste the symbols from db2db into column "Gene" to fix the calendar issue.  
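
As an alternative to copy/paste, the symbols can be matched back by Ensembl Gene ID, so row order no longer matters. The sketch below is only an illustration: the column names "Gene" and "GENCODE_ensembl" come from my spreadsheet, the IDs are placeholders rather than real Ensembl IDs, the function name `update_symbols` is hypothetical, and I am assuming a missing db2db result arrives as an empty string or "-".

```python
def update_symbols(rows, id_to_symbol, id_key="GENCODE_ensembl", symbol_key="Gene"):
    """Replace the gene symbol in each row with the db2db symbol for its ID.

    Returns (updated_rows, unresolved_ids). Rows without a usable db2db
    symbol are left unchanged and their IDs are reported for manual lookup.
    """
    updated = []
    unresolved = []
    for row in rows:
        new_row = dict(row)  # copy so the original rows are not modified
        symbol = id_to_symbol.get(row[id_key], "")
        # Assumption: db2db marks a missing value as "" or "-".
        if symbol and symbol != "-":
            new_row[symbol_key] = symbol
        else:
            unresolved.append(row[id_key])
        updated.append(new_row)
    return updated, unresolved


# Placeholder IDs, not real Ensembl Gene IDs.
rows = [
    {"Gene": "1-Mar", "GENCODE_ensembl": "PLACEHOLDER_ID_1"},
    {"Gene": "9-Sep", "GENCODE_ensembl": "PLACEHOLDER_ID_2"},
    {"Gene": "1-Dec", "GENCODE_ensembl": "PLACEHOLDER_ID_3"},
]
id_to_symbol = {
    "PLACEHOLDER_ID_1": "MARCHF1",
    "PLACEHOLDER_ID_2": "SEPTIN9",
    "PLACEHOLDER_ID_3": "-",  # no symbol returned
}

fixed, unresolved = update_symbols(rows, id_to_symbol)
print([row["Gene"] for row in fixed])
print(unresolved)
```

Any ID listed in `unresolved` still needs a manual search on Ensembl.org, as I did for DELEC1.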

Alternatives to db2db

Ensembl.org's BioMart website: https://m.ensembl.org/biomart/martview

Tutorial for BioMart from Ensembl: https://m.ensembl.org/info/data/biomart/index.html
