September 13, 2020

How to BLAST a gene of interest (starting from the gene symbol)

This guide assumes you are starting from a gene symbol and do not yet have the sequence or an accession ID.


Step 1: search for your gene in NCBI's databases

Start with either NCBI Nucleotide or NCBI Gene. The examples below use NCBI Gene.
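The same search can also be expressed as a URL for NCBI's E-utilities "esearch" endpoint. This is only a minimal sketch: the function name gene_search_url is hypothetical, and the [Gene Name] and [Organism] field tags are an assumption about how you would want to narrow the search. It builds the URL without sending any request.

```python
from urllib.parse import urlencode

# Base URL of the NCBI E-utilities esearch endpoint.
EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def gene_search_url(symbol, organism="Homo sapiens"):
    """Build an esearch URL that looks up a gene symbol in NCBI Gene."""
    # Field tags restrict the match to the gene name and the organism.
    term = f"{symbol}[Gene Name] AND {organism}[Organism]"
    params = {"db": "gene", "term": term, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# Print the URL for the noncoding gene used in this post.
print(gene_search_url("XIST"))
```

Opening the printed URL in a browser should return a list of matching NCBI Gene IDs rather than the full search results page.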

Search results for the noncoding gene "XIST" in NCBI Gene, after clicking the "+" to expand the "RefSeq Sequences" section:


Luckily, there is only one reference transcript for XIST. For genes with several transcripts, either continue with all of them or pick a subset based on other criteria (e.g. which one is longest, which ones code for protein, or which one is flagged as "RefSeq Select" or "canonical").
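If you have already noted down the candidate transcripts, the selection rule can be written out explicitly. The sketch below is one possible rule, not the only sensible one: it prefers a transcript flagged as RefSeq Select and otherwise falls back to the longest. The Transcript class, the accession strings and the lengths are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    accession: str               # placeholder IDs below, not real accessions
    length: int                  # transcript length in nucleotides
    refseq_select: bool = False  # True if flagged "RefSeq Select"


def pick_transcript(transcripts):
    """Prefer a RefSeq Select transcript; otherwise take the longest one."""
    if not transcripts:
        raise ValueError("no transcripts to choose from")
    selected = [t for t in transcripts if t.refseq_select]
    pool = selected or transcripts
    return max(pool, key=lambda t: t.length)


# Made-up records, for illustration only.
candidates = [
    Transcript("NM_EXAMPLE_A", 5200),
    Transcript("NM_EXAMPLE_B", 6100),
    Transcript("NM_EXAMPLE_C", 4800, refseq_select=True),
]
print(pick_transcript(candidates).accession)
```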


Search results for protein-coding gene "UTY" in NCBI Gene, after expanding the RefSeq Sequences section, but before clicking on any result:




If the NCBI Gene search results are not enough, you can get more information on transcripts by opening the full NCBI Gene page for UTY. Specifically, go to the section titled "NCBI Reference Sequences (RefSeq)". Page for UTY:


Read the descriptions for each transcript to help you decide. Examples for UTY:



Step 2: Select an NCBI accession ID to BLAST

Clicking on an NCBI accession ID from another NCBI page (e.g. NCBI Gene) takes you to the record for that sequence in the NCBI Nucleotide database.

NCBI accession ID prefixes:

  • NR_ = non-coding RNA transcript
  • NM_ = mRNA transcript (protein-coding)
  • NP_ = protein sequence
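When you are juggling several IDs, a small lookup helps you keep track of which ones are RNA and which are protein. This sketch covers only the three prefixes listed above (RefSeq has others). The function name and the accession strings are placeholders, not real records.

```python
# Maps each RefSeq prefix from the list above to (molecule type, description).
PREFIX_KINDS = {
    "NM_": ("nucleotide", "mRNA transcript"),
    "NR_": ("nucleotide", "RNA transcript"),
    "NP_": ("protein", "protein sequence"),
}


def classify_accession(accession):
    """Return (molecule type, description) for a RefSeq accession ID."""
    prefix = accession.strip().upper()[:3]
    if prefix not in PREFIX_KINDS:
        raise ValueError(f"prefix not covered here: {accession!r}")
    return PREFIX_KINDS[prefix]


# Placeholder accession IDs, for illustration only.
for acc in ["NM_000000.0", "NR_000000.0", "NP_000000.0"]:
    print(acc, classify_accession(acc))
```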



From the NCBI Nucleotide page, click on "Run BLAST". This takes you to the NCBI BLAST website. Select the desired filters in "Choose Search Set", e.g. narrow the search down to human by entering taxid 9606 (Homo sapiens) in the Organism field.

Check "Show results in a new window", then click the blue "BLAST" button.
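For reference, the form choices above correspond roughly to the parameters of NCBI's BLAST URL API. The sketch below only assembles those parameters and does not submit anything. The function name, the placeholder accession and the choice of "refseq_rna" as the database are assumptions for illustration.

```python
from urllib.parse import urlencode

# Endpoint used by the NCBI BLAST URL API.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def blast_put_params(query, database, program="blastn", taxid=9606,
                     megablast=True):
    """Assemble parameters for a BLAST search submission (CMD=Put)."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
        # Restrict hits to one organism; taxid 9606 is Homo sapiens.
        "ENTREZ_QUERY": f"txid{taxid}[ORGN]",
    }
    if program == "blastn" and megablast:
        # megablast is the "highly similar sequences" nucleotide search.
        params["MEGABLAST"] = "on"
    return params


# Placeholder accession ID; replace it with the one you selected.
params = blast_put_params("NM_000000.0", "refseq_rna")
print(BLAST_URL)
print(urlencode(params))
```

If you do script submissions this way, check NCBI's usage guidelines first, since they limit how often automated searches may be sent.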


Results are just other NCBI transcript variants of UTY, which is expected but not interesting:



If the goal is to find other genes with similar sequences, try one or more of these adjustments to get more targeted results:
  • Change the Database drop-down selection to one of the reference sequences databases.
    • Reference RNA sequences (refseq_rna)
    • RefSeq Genome Database (refseq_genomes)
    • Human RefSeqGene sequences (RefSeq_Gene)
  • Adjust the Program Selection in the BLAST options (previous page) and rerun the search.
    • Try "somewhat similar sequences (blastn)".
  • Add multiple accession IDs from the same gene to the Query Sequence text box.
    • Get these from NCBI Gene or NCBI Nucleotide.
    • Use only RNA or only protein accession IDs (don't mix the two types in one search).
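The last adjustment is easy to get wrong when pasting IDs by hand. As a rough sketch, the helper below checks that a list of accession IDs is all RNA or all protein (judging only by the NM_, NR_ and NP_ prefixes) and joins them one per line, ready to paste into the Query Sequence box. The function name and the accession strings are placeholders.

```python
NUCLEOTIDE_PREFIXES = ("NM_", "NR_")
PROTEIN_PREFIXES = ("NP_",)


def build_query_box(accessions):
    """Join accession IDs one per line, refusing to mix RNA and protein."""
    cleaned = [a.strip() for a in accessions if a.strip()]
    if not cleaned:
        raise ValueError("no accession IDs given")
    all_rna = all(a.upper().startswith(NUCLEOTIDE_PREFIXES) for a in cleaned)
    all_protein = all(a.upper().startswith(PROTEIN_PREFIXES) for a in cleaned)
    if not (all_rna or all_protein):
        raise ValueError("use only RNA or only protein accession IDs")
    return "\n".join(cleaned)


# Placeholder accession IDs, for illustration only.
print(build_query_box(["NM_000000.0", "NM_000001.0", "NR_000000.0"]))
```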
