Quick tutorial for db2db
Outputs: Chromosomal Location, Ensembl Gene ID, Gene Symbol, Gene Synonyms, etc. The input type you selected will not appear in the list of output options, but db2db automatically returns it alongside your results.
ID List: paste in your list of genes or Ensembl IDs directly from Excel. It will paste as 1 ID per row.
Remove duplicate input values: No. The default is "Yes", but with that setting you might get back a different number of rows than you put in, which makes it difficult to copy/paste results back into your spreadsheet.
Other defaults are fine.
Click Submit.
Click "Results in Excel" to download an .xls file.
Double-click the .xls file to open it. Excel will show an error message saying the file might be corrupted, but it is not. Excel is detecting that the file was generated by code, so its format looks unusual, but it should open fine.
Add columns to your original spreadsheet to make space for all your db2db output columns, including your returned input data. Paste all columns from db2db into your original Excel sheet.
Check your results before incorporating them into your original spreadsheet
Important: check that the input rows are not scrambled before you copy/paste columns back into your original spreadsheet. Use a function to compare your original input and the returned input from db2db.
Column A: original input column you provided db2db
Column B: input column you received back from db2db output
Make a new column and paste in this formula: =A=B
Depending on whether the column values match, it will return TRUE or FALSE for each row. Search to make sure that everything says TRUE.
FALSE rows mean your data is scrambled! Check why and fix it.
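If you prefer to run the same check outside Excel, here is a minimal Python sketch of the row-by-row comparison. The function name `find_mismatches` and the example IDs are made up for illustration, and the sketch assumes you have already read the two columns into lists of equal length. Note that this comparison is case-sensitive, whereas Excel's =A=B ignores case.

```python
def find_mismatches(original, returned):
    """Return the 1-based positions where the two columns disagree.

    `original` is the input column you gave db2db and `returned` is the
    input column db2db gave back (hypothetical names for this sketch).
    """
    if len(original) != len(returned):
        # A different row count already means the columns cannot be
        # pasted side by side, so stop here.
        raise ValueError(
            f"row counts differ: {len(original)} vs {len(returned)}"
        )
    return [
        position
        for position, (a, b) in enumerate(zip(original, returned), start=1)
        if a != b
    ]


# Made-up placeholder IDs, not real Ensembl Gene IDs.
original = ["ID_1", "ID_2", "ID_3"]
returned = ["ID_1", "ID_3", "ID_2"]
print(find_mismatches(original, returned))  # → [2, 3]
```

An empty list is the equivalent of every row saying TRUE. The positions count from the first value in the list, so add an offset if your spreadsheet has a header row.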
Example: fixing the calendar genes issue
In older versions of the human genome annotation, some genes had calendar date-like symbols (e.g. MARCH1, MARC1, SEPT9). By default, opening CSV files in Excel converts the gene symbol text to actual calendar dates for those genes. The most recent versions have updated these names to prevent this issue.
Here, I used db2db to get those newer gene names for an RNA-seq project. I first sorted column "Gene" alphabetically to bring the calendar genes to the top. I input "GENCODE_ensembl" values into db2db for rows 4-31 and received back two columns. I created a "Check order" column to quickly check for any mismatched Ensembl Gene IDs (there were none).
The last row with a calendar issue (row 31) didn't have a gene symbol result from db2db, but I searched the Ensembl Gene ID on Ensembl.org myself and it corresponds to gene symbol DELEC1, which I added manually.
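As an alternative to sorting alphabetically, the affected rows could also be located with a short script. The Python sketch below flags values that look like a converted date. It assumes the converted symbols appear as a day number followed by a three-letter month abbreviation, such as "1-Mar" or "9-Sep"; other date formats would not be caught. The function name `flag_date_like` is made up for illustration.

```python
import re

# Assumption: converted symbols look like "1-Mar" or "9-Sep".
# Other date formats (e.g. locale-specific ones) are not matched.
DATE_LIKE = re.compile(
    r"^\d{1,2}-(Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)$",
    re.IGNORECASE,
)


def flag_date_like(symbols):
    """Return the 1-based positions of values that look like dates."""
    return [
        position
        for position, symbol in enumerate(symbols, start=1)
        if DATE_LIKE.match(symbol.strip())
    ]


# Illustrative values only, not taken from the project spreadsheet.
gene_column = ["TP53", "1-Mar", "9-Sep", "DELEC1"]
print(flag_date_like(gene_column))  # → [2, 3]
```

The flagged positions tell you which rows need their Ensembl Gene IDs sent to db2db. Rows that come back without a gene symbol still need to be looked up by hand.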