PARSESNP (Project Aligned Related Sequences and Evaluate SNPs) is a web-based tool for the analysis of polymorphisms in genes. Given a reference DNA sequence (genomic or cDNA) and a gene model, it determines the translated amino acid sequence and the effects of the supplied polymorphisms on the expressed gene product. If a homology model is provided, PARSESNP can also predict the severity of missense changes.
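As a rough illustration of the first step (deriving a protein from a genomic sequence and a gene model), the sketch below splices coding exons together and translates them with the standard genetic code (NCBI translation table 1). The gene-model representation (a list of 1-based, inclusive exon coordinates on the forward strand) and the function names `splice` and `translate` are assumptions for illustration, not PARSESNP's internal format.

```python
# Standard genetic code: codons enumerated with bases in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def splice(genomic: str, exons: list[tuple[int, int]]) -> str:
    """Join coding exons (hypothetical 1-based, inclusive, forward-strand coordinates)."""
    return "".join(genomic[start - 1:end] for start, end in exons).upper()


def translate(cds: str) -> str:
    """Translate complete codons, stopping after the first stop codon ('*')."""
    cds = cds.upper()
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3], "X")  # 'X' for codons with non-ACGT bases
        protein.append(aa)
        if aa == "*":
            break
    return "".join(protein)


genomic = "ATGGCCgtaagcagGATTGA"  # hypothetical toy gene: exon, intron, exon
cds = splice(genomic, [(1, 6), (15, 20)])
print(cds, translate(cds))  # prints: ATGGCCGATTGA MAD*
```

Real gene models also need reverse-strand handling and partial codons at exon boundaries, which this sketch leaves out.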
Variants can be read in from a number of databases, including HGMD, SwissProt, and dbSNP. When using variants from these databases, check that the reference DNA sequence corresponds exactly to the variants being processed; numbering, especially of protein sequences, can often be inconsistent. Variants included in a GenBank record are also read in when the sequence is supplied as an NCBI URL or a GenBank file (even if it is entered through the input preprocessor). In addition, users can enter their own variants manually, either through the variant file upload option or through the web form presented after the first PARSESNP form submission. To enter more than five variants by hand this way, set the "No. of variants to enter by hand" field to the appropriate number before submitting the initial form.
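The advice to confirm that database variants match the reference sequence can be automated. The sketch below, assuming each variant is a hypothetical `(position, ref_base, alt_base)` tuple with 1-based positions, flags variants whose stated reference base disagrees with the sequence and searches for a numbering offset under which all of them agree. The function names are illustrative only.

```python
Variant = tuple[int, str, str]  # hypothetical (1-based position, reference base, variant base)


def matches_reference(reference: str, position: int, ref_base: str) -> bool:
    """True if `reference` carries `ref_base` at the 1-based `position`."""
    if not 1 <= position <= len(reference):
        return False
    return reference[position - 1].upper() == ref_base.upper()


def mismatched(reference: str, variants: list[Variant]) -> list[Variant]:
    """Variants whose stated reference base disagrees with the sequence."""
    return [v for v in variants if not matches_reference(reference, v[0], v[1])]


def consistent_offsets(reference: str, variants: list[Variant], max_shift: int = 10) -> list[int]:
    """Numbering offsets (added to every position) under which all variants match."""
    return [
        shift
        for shift in range(-max_shift, max_shift + 1)
        if all(matches_reference(reference, pos + shift, ref) for pos, ref, _ in variants)
    ]


reference = "ATGGCCGATTGA"
variants = [(1, "G", "A"), (3, "C", "T")]  # hypothetical, numbered from a different origin
print(mismatched(reference, variants))
print(consistent_offsets(reference, variants))
```

With only a couple of variants several offsets can fit (here both 2 and 3); the more variants are checked, the more an offset that fits all of them suggests a simple numbering shift rather than a genuine mismatch.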
Variants can be entered in any of the following formats:
The PARSESNP output can seem a bit intimidating at first glance, but it's really quite harmless. At the top of the page is the gene name entered by the user, followed by a list of the Blocks families that were used as a homology model. This is followed by one or more images showing the locations of polymorphisms on the genomic and coding sequence of the gene.
The images of variants positioned on the gene read from left (the start of the gene) to right (the end of the gene). The first line of the image, the green boxes with a line through the middle, shows the locations of Blocks on the gene. If a Block spans an intron, the middle line continues through the intron, but the top and bottom lines appear only over exonic sequence. The second line, the orange boxes connected by lines, shows the locations of coding exons on the gene: the boxes are the exons, and the thin lines represent the introns. In a graph of the coding sequence, intron locations are marked by vertical orange lines, but the introns themselves are not shown. The third section of the graph shows the locations of the polymorphisms. A polymorphism in an exon is represented by an upward-pointing triangle, and a polymorphism in an intron by a downward-pointing triangle. The first row of triangles, colored red, marks nonsense and splice junction changes; the second row, colored black, marks missense changes; and the third row, colored purple, marks silent changes. The total length of the sequence displayed on the graph is shown at the end of the sequence.
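The drawing rules above (row and color by effect, triangle direction by exon or intron, left-to-right placement) can be summarized in a small sketch. The effect labels, the `PlottedVariant` class, and the pixel-scaling helper are hypothetical; only the four effect categories named in the text are handled, and nothing here reflects how PARSESNP actually renders its images.

```python
from dataclasses import dataclass

# Row and color for each effect category named in the text (labels are assumptions).
ROW_STYLE = {
    "nonsense": (1, "red"),
    "splice": (1, "red"),
    "missense": (2, "black"),
    "silent": (3, "purple"),
}


@dataclass
class PlottedVariant:
    position: int  # 1-based position along the displayed sequence
    effect: str    # one of the keys of ROW_STYLE


def in_exon(position: int, exons: list[tuple[int, int]]) -> bool:
    """True if the position falls inside any inclusive (start, end) exon."""
    return any(start <= position <= end for start, end in exons)


def x_pixel(position: int, seq_length: int, width: int) -> int:
    """Map a 1-based sequence position to a 0-based pixel column, left to right."""
    return (position - 1) * width // seq_length


def glyph(variant: PlottedVariant, exons: list[tuple[int, int]]) -> tuple[int, str, str]:
    """(row, color, triangle direction): 'up' for exonic, 'down' for intronic variants."""
    row, color = ROW_STYLE[variant.effect]
    direction = "up" if in_exon(variant.position, exons) else "down"
    return row, color, direction


exons = [(1, 6), (15, 20)]  # hypothetical gene model
print(glyph(PlottedVariant(7, "splice"), exons))
print(glyph(PlottedVariant(4, "silent"), exons))
```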
This is followed by a table of the variants, in the order in which they occur in the sequence. For each variant, there is a link to the location of the change on the genomic sequence (the "G" link) and, if the variant is in a coding region, on the cDNA sequence (the "C" link). Each variant also shows the change in nucleotide sequence, the effect on translation or splicing, and a list of restriction enzyme polymorphisms caused by the change. If a Blocks family was provided as a homology model, the PSSM difference score is shown for each missense change that falls within a Block; similarly, if a protein sequence alignment containing the reference sequence was provided, a SIFT score is shown for each missense change. Next comes the user-supplied description of the change, or a statement of how the variant was entered if that isn't immediately obvious from the nucleotide change or effect columns. The final column lists the zygosity of the change: changes entered using an ambiguous nucleotide are considered heterozygous, and all others homozygous.
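Two of the table columns follow simple rules that can be sketched directly. Below, zygosity is inferred from whether the entered base is an IUPAC ambiguity code, and restriction enzyme polymorphisms are found by comparing recognition-site matches before and after a single-base substitution. The function names and the enzyme dictionary are assumptions for illustration; EcoRI (GAATTC) and BamHI (GGATCC) are real palindromic recognition sequences, so scanning one strand is enough for them, but non-palindromic sites would also need the reverse complement.

```python
# IUPAC codes for ambiguous nucleotides (two-, three- and four-base ambiguities).
AMBIGUOUS = set("RYSWKMBDHVN")


def zygosity(entered_base: str) -> str:
    """Ambiguous nucleotides count as heterozygous; plain A/C/G/T as homozygous."""
    return "heterozygous" if entered_base.upper() in AMBIGUOUS else "homozygous"


def site_starts(seq: str, motif: str) -> set[int]:
    """0-based start positions of every (possibly overlapping) match of motif."""
    k = len(motif)
    return {i for i in range(len(seq) - k + 1) if seq[i:i + k] == motif}


def restriction_changes(reference: str, position: int, alt: str,
                        enzymes: dict[str, str]) -> list[tuple[str, str]]:
    """Enzymes whose site is gained or lost by substituting `alt` at 1-based `position`.

    Only the given strand is scanned, which suffices for palindromic sites.
    """
    variant = reference[:position - 1] + alt + reference[position:]
    changes = []
    for name, motif in enzymes.items():
        before, after = site_starts(reference, motif), site_starts(variant, motif)
        if after - before:
            changes.append((name, "gained"))
        if before - after:
            changes.append((name, "lost"))
    return changes


enzymes = {"EcoRI": "GAATTC", "BamHI": "GGATCC"}
print(zygosity("R"), zygosity("A"))
print(restriction_changes("AAGAATTCTT", 5, "G", enzymes))
```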
The table is followed by a link to download its contents as a tab-separated values (TSV) text file. If a Blocks model was provided and variants fall in a region covered by a Block, an option is offered to search 3D Blocks and view the variants on a 3D protein structure. If no Blocks model was supplied, the user has the option of submitting one and reprocessing the submitted polymorphisms.
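A downloaded TSV file can be loaded with Python's standard `csv` module. This sketch assumes the file's first line holds column headers; the column names and rows in the sample string are hypothetical placeholders, so substitute the headers that actually appear in your downloaded file.

```python
import csv
import io
from typing import TextIO


def read_variant_table(handle: TextIO) -> list[dict[str, str]]:
    """Read a tab-separated file whose first line holds the column headers."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Hypothetical excerpt; real column names come from the downloaded file's header line.
sample = "Variant\tEffect\tZygosity\n1\tW123*\theterozygous\n2\tA45T\thomozygous\n"
rows = read_variant_table(io.StringIO(sample))
heterozygous = [row["Variant"] for row in rows if row["Zygosity"] == "heterozygous"]
print(heterozygous)
```

For a file on disk, open it with `open(path, newline="")` and pass the handle to `read_variant_table`; `path` is whatever name you saved the download under.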
The last element of the PARSESNP output is a detailed display of the reference sequence with the polymorphisms marked. Both the genomic and coding sequences are shown: in the genomic sequence, introns appear as continuous runs of lower-case nucleotides, and in the coding sequence, intron locations are marked by a vertical bar. The main portion of the display is a series of lower-case letters (the DNA sequence) with a series of upper-case letters (the amino acid sequence) above it. In coding regions, the DNA sequence is broken up into codons. Each line of the amino acid sequence ends with the position of the last amino acid shown, and each line of the DNA sequence ends with the position of the last nucleotide shown.
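The coding-sequence layout described above can be approximated in a few lines. This is a sketch of the general idea only, assuming one amino acid letter per codon and a hypothetical `codons_per_line` setting; it does not reproduce the intron bars, Block annotations, or variant lines of the real PARSESNP display.

```python
def format_coding_display(cds: str, protein: str, codons_per_line: int = 10) -> str:
    """Lay out a coding sequence in codons with one-letter amino acids above them.

    Each amino acid sits over the middle base of its codon, and each line ends
    with the number of the last amino acid or nucleotide shown on it.
    """
    n_codons = len(cds) // 3
    if len(protein) != n_codons:
        raise ValueError("expected exactly one amino acid per codon")
    lines = []
    for first in range(0, n_codons, codons_per_line):
        last = min(first + codons_per_line, n_codons)
        aa_row = " ".join(f" {protein[i].upper()} " for i in range(first, last))
        dna_row = " ".join(cds[3 * i:3 * i + 3].lower() for i in range(first, last))
        lines.append(f"{aa_row}  {last}")
        lines.append(f"{dna_row}  {3 * last}")
    return "\n".join(lines)


print(format_coding_display("ATGGCCGATTGA", "MAD*", codons_per_line=2))
```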
Block hits are shown on the amino acid sequence. The name of the Block, the MAST p-value for the match between the Block and the reference sequence, and the information content of the Block are shown on the line above the amino acid sequence. Amino acids within the Block are underlined; if a Block is interrupted by an intron, the underlining stops for the length of the intron. How well the reference sequence matches the Block is shown by the coloring of its amino acids: those colored green are most similar to the corresponding column of the aligned Block and have a PSSM score greater than 2, those colored red have a PSSM score less than 0, and those colored black have an intermediate score.
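The coloring thresholds above translate directly into code. In this sketch the PSSM is a hypothetical list of per-column dictionaries mapping residues to precomputed scores; how PARSESNP derives those scores from the Blocks alignment is not modeled here.

```python
def residue_color(pssm_score: float) -> str:
    """Color for a reference residue: green above 2, red below 0, black otherwise."""
    if pssm_score > 2:
        return "green"
    if pssm_score < 0:
        return "red"
    return "black"


def color_block(segment: str, pssm: list[dict[str, float]]) -> list[tuple[str, str]]:
    """Pair each residue of the aligned segment with the color for its column score."""
    return [(aa, residue_color(column[aa])) for aa, column in zip(segment, pssm)]


# Hypothetical three-column PSSM (residue -> score per column).
pssm = [{"L": 3.0, "I": 1.0}, {"K": -1.0, "R": 2.5}, {"G": 1.5}]
print(color_block("LKG", pssm))
```

Note that a score of exactly 2 or exactly 0 falls in the intermediate (black) band, matching the strict "greater than" and "less than" wording.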
Variants are shown below the DNA sequence. The nucleotide change is shown directly under the affected nucleotide. For changes in coding regions, this is followed by the effect on the protein, given as the original amino acid, the amino acid position, and the new amino acid (* for a stop codon). Each variant is then labeled with a number; the same identifier is used in the table. The variant display is colored according to the severity of its effect. Changes to a stop codon and splice junction changes are colored red, and silent changes are colored black. A missense change within a Block is colored according to its PSSM difference score: if the score is less than 0, indicating that the variant residue is more similar to the corresponding column of the Blocks alignment than the reference residue, the change is colored green; if it is greater than 10, it is colored red; otherwise it is colored black. Missense changes outside a Block are also colored black.
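The variant coloring rules can be sketched as follows. The PSSM difference score is computed here as the reference residue's score minus the variant residue's score, which matches the stated sign convention (negative means the variant fits the column better) but is an assumption about the exact formula. The effect labels, the compact "W123*" label format, and the function names are likewise illustrative.

```python
from typing import Optional


def pssm_difference(column: dict[str, float], ref_aa: str, alt_aa: str) -> float:
    """Assumed definition: reference score minus variant score for one Block column."""
    return column[ref_aa] - column[alt_aa]


def variant_label(ref_aa: str, aa_position: int, alt_aa: str) -> str:
    """Original amino acid, position, new amino acid ('*' for a stop codon)."""
    return f"{ref_aa}{aa_position}{alt_aa}"


def variant_color(effect: str, pssm_diff: Optional[float] = None) -> str:
    """Color a variant; pass pssm_diff only for missense changes inside a Block."""
    if effect in ("nonsense", "splice"):
        return "red"
    if effect == "missense" and pssm_diff is not None:
        if pssm_diff < 0:
            return "green"
        if pssm_diff > 10:
            return "red"
    return "black"  # silent changes, missense outside a Block, intermediate scores


column = {"W": 12.0, "C": -2.0, "F": 8.0}  # hypothetical Block column scores
print(variant_label("W", 123, "C"), variant_color("missense", pssm_difference(column, "W", "C")))
```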
For more help filling out the PARSESNP web form, please visit the PARSESNP glossary.
Created 14 January 2003, last modified 1 April 2003
© 2003 The proWeb Project.