FASTA format
FASTA files are plain text files used to store sequence information. Each record has two parts:
>title line starting with a greater-than sign
sequence on the following line(s), with no blank lines or special characters
FASTA files usually have the extension .fasta or .fa, but .txt works too because they are plain text files either way. Open them in a text editor such as Notepad. For large sequences, Notepad++ handles big files better. Word or any other program that reads plain text will also work (if you edit in Word, save as plain text).
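Because the format is just a title line followed by sequence lines, it is also easy to read programmatically. The sketch below is a minimal, hypothetical reader (`read_fasta` is a name chosen here, not part of any library). It joins multi-line sequences and splits the file into (title, sequence) pairs.

```python
import io
from typing import Iterator, TextIO


def read_fasta(handle: TextIO) -> Iterator[tuple[str, str]]:
    """Yield (title, sequence) pairs from a FASTA text stream."""
    title = None
    chunks: list[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # be lenient: skip stray blank lines
        if line.startswith(">"):
            # A new title line closes the previous record.
            if title is not None:
                yield title, "".join(chunks)
            title, chunks = line[1:], []
        else:
            if title is None:
                raise ValueError("sequence data found before the first '>' title line")
            chunks.append(line)
    # Emit the last record in the file.
    if title is not None:
        yield title, "".join(chunks)


# Usage with an in-memory example (hypothetical records, not real sequences):
example = ">seq1 example\nATGC\nGGTA\n>seq2\nTTAA\n"
records = list(read_fasta(io.StringIO(example)))
print(records)
```

For a file on disk, pass `open("your_file.fa")` instead of the `StringIO` object.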
Select an accession ID:
- Search for your gene of interest in the NCBI Gene database, e.g. DDX3X.
- Go to the section "NCBI Reference Sequences (RefSeq)" and click on an accession ID.
The most useful NCBI accession prefixes are those for curated RefSeq sequences, which are based on GenBank cDNA and EST data:
- NR_ = noncoding RNA transcript (no known protein-coding region)
- NM_ = protein-coding mRNA transcript
- NP_ = protein sequence
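Other NCBI accessions come from eukaryotic genome annotation pipelines. These map to the genome but haven't been curated yet (so they may be less useful):
- XM_ = predicted mRNA transcript
- XR_ = predicted noncoding RNA transcript
- XP_ = predicted protein sequence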
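When you handle many accession IDs, the prefix rules can be turned into a small lookup table. This is a minimal sketch: it covers only the curated prefixes listed above plus the model (predicted) prefixes XM_, XR_ and XP_. The function name `classify_accession` and the example ID `NM_000000.1` are hypothetical placeholders, not real NCBI identifiers or tools.

```python
# Hypothetical lookup table of RefSeq accession prefixes (not exhaustive).
REFSEQ_PREFIXES = {
    "NM_": "mRNA, protein-coding (curated RefSeq)",
    "NR_": "noncoding RNA (curated RefSeq)",
    "NP_": "protein (curated RefSeq)",
    "XM_": "mRNA, predicted (model RefSeq)",
    "XR_": "noncoding RNA, predicted (model RefSeq)",
    "XP_": "protein, predicted (model RefSeq)",
}


def classify_accession(accession: str) -> str:
    """Return a description of an accession based on its first three characters."""
    prefix = accession.strip().upper()[:3]
    return REFSEQ_PREFIXES.get(prefix, "unknown (prefix not covered here)")


print(classify_accession("NM_000000.1"))  # placeholder ID, format only
```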
Download a FASTA file (.fa text file) from NCBI Nucleotide
After you click the accession ID, NCBI Nucleotide opens with the sequence information (e.g. the page for DDX3X).
Click "Send to:" in the upper right corner.
Select Complete Record, then File, then format "FASTA". Click Create File.
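The same FASTA file can also be fetched without the web page through NCBI's E-utilities EFetch service. This is a sketch under assumptions: the helper names are hypothetical, `NM_000000.1` is a placeholder accession, and `db="nuccore"` is assumed for nucleotide records (use `db="protein"` for NP_/XP_ IDs). Scripts should stay within NCBI's request-rate limits, so check the E-utilities documentation before running this in a loop.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession: str, db: str = "nuccore") -> str:
    """Build an EFetch URL that returns the record as FASTA text."""
    params = {"db": db, "id": accession, "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def download_fasta(accession: str, path: str, db: str = "nuccore") -> None:
    """Download one record and save it as a .fa text file (needs network access)."""
    with urlopen(efetch_fasta_url(accession, db), timeout=30) as response:
        text = response.read().decode("utf-8")
    with open(path, "w", encoding="utf-8") as fh:
        fh.write(text)


print(efetch_fasta_url("NM_000000.1"))  # placeholder accession
# download_fasta("NM_000000.1", "example.fa")  # uncomment with a real accession
```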
The resulting FASTA file can be opened with any text file reader. Here are the first few lines for DDX3X:
Retrieve a FASTA sequence (no download)
If you just want to copy and paste and don't need a saved file, click the "FASTA" link on the upper left side.
Alignments
Clustal Omega & Clustal W
You can manually compare your FASTA-formatted sequence with other FASTA sequences by aligning them with Clustal Omega (the more recent version) or Clustal W.
Use ClustalW if comparing intron-containing and intronless sequences (e.g. genomic versus coding sequence) because you can reduce the Gap Open Penalty to zero to get a better alignment.
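Why a zero gap-open penalty helps can be seen with a simple affine gap score, where a gap of length L costs (gap open + L × gap extension). An intron shows up as one long gap in the coding sequence, so a large opening penalty punishes the biologically correct alignment. This is a simplified illustration only. Clustal W actually scores with substitution matrices and position-specific gap rules, and the scores, penalty values and toy sequences below are hypothetical.

```python
def affine_alignment_score(a: str, b: str, match: int = 1, mismatch: int = -1,
                           gap_open: int = 10, gap_extend: int = 1) -> int:
    """Score two already-aligned rows ('-' = gap) with an affine gap penalty."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    score = 0
    in_gap_a = in_gap_b = False
    for x, y in zip(a, b):
        if x == "-" and y == "-":
            continue  # gap in both rows: ignore the column
        if x == "-":
            # Pay the opening penalty only on the first column of a gap.
            score -= gap_extend + (0 if in_gap_a else gap_open)
            in_gap_a, in_gap_b = True, False
        elif y == "-":
            score -= gap_extend + (0 if in_gap_b else gap_open)
            in_gap_a, in_gap_b = False, True
        else:
            score += match if x == y else mismatch
            in_gap_a = in_gap_b = False
    return score


# Toy "genomic" row (exon + 10-nt intron + exon) aligned to its "coding" row.
genomic = "ATGGCC" + "GTAAGTACAG" + "TTCTAA"
coding = "ATGGCC" + "-" * 10 + "TTCTAA"
print(affine_alignment_score(genomic, coding, gap_open=10))  # 12 matches - (10 + 10)
print(affine_alignment_score(genomic, coding, gap_open=0))   # 12 matches - 10
```

With the opening penalty at zero, only the length of the intron gap is charged, so the intron-spanning alignment is no longer heavily penalized just for opening a gap.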
NCBI BLAST
The FASTA sequence can also be used with NCBI BLAST tools to compare your sequence against whole databases.
- Nucleotide blast (blastn) - compares your nucleotide sequence to known nucleotide sequences.
- blastx - translates your nucleotide sequence input to amino acids, then compares the results to known protein sequences. Conveniently, blastx tries all 6 reading frames of your input (3 forward, 3 reverse).
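The six-frame translation that blastx performs before searching can be sketched in a few lines. This assumes the standard genetic code (NCBI translation table 1) and labels frames +1 to +3 on the input strand and -1 to -3 on its reverse complement. Incomplete or ambiguous codons (e.g. containing N) become "X". The function names are hypothetical, and this is not BLAST's own code.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq: str) -> str:
    """Translate complete codons from position 0; '*' = stop, 'X' = unknown codon."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frames(seq: str) -> dict[str, str]:
    """Translate all three forward and three reverse reading frames."""
    seq = seq.upper()
    rc = reverse_complement(seq)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(seq[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


for frame, protein in six_frames("ATGGCCTAA").items():
    print(frame, protein)
```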