Introduction to the
CODDLE and PARSESNP Input Preprocessor

The CODDLE and PARSESNP input preprocessor is designed to free the user from some of the more mundane tasks in preparing to use either of those programs. It allows for the quick and easy determination of a gene model from genomic sequence and either protein sequence or cDNA sequence. It will search the Blocks database for similar Blocks families using Reverse PSI-BLAST. It even automates the building of an alignment of related sequences using the SIFT package.

Two things are definitely needed to begin processing: the DNA sequence of the gene and a way to determine the gene model. If working with an entry from NCBI, the latter will be pulled out for you automatically; the same holds true for GenBank formatted files created by Sequin. If the NCBI entry or GenBank file contains more than one CDS line (i.e. more than one gene), however, you will need to copy the gene model out and paste it in by hand, so that the preprocessor knows which gene you want to use. If you know the gene model, just enter it in the proper format into the field provided. Otherwise, it can be determined from the genomic sequence and either the cDNA sequence or the amino acid sequence.
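To illustrate what a gene model taken from a CDS line amounts to, here is a minimal Python sketch that reads a GenBank-style CDS location such as join(3..8,15..20) and splices the exons out of the genomic sequence. It is not the preprocessor's own code; the function names parse_cds_location and splice_exons are hypothetical, and the sketch assumes a forward-strand gene whose exons are all written as start..end ranges.

```python
import re


def parse_cds_location(location):
    """Return exon (start, end) pairs, 1-based and inclusive.

    Assumption: forward strand only, every exon written as start..end.
    Partial markers (< and >) are accepted and ignored.
    """
    if "complement" in location:
        raise ValueError("reverse-strand locations are not handled in this sketch")
    pairs = re.findall(r"<?(\d+)\.\.>?(\d+)", location)
    if not pairs:
        raise ValueError("no start..end ranges found in location")
    return [(int(start), int(end)) for start, end in pairs]


def splice_exons(genomic, exons):
    """Concatenate the exon segments of a genomic sequence."""
    # GenBank coordinates are 1-based inclusive; Python slices are 0-based.
    return "".join(genomic[start - 1:end] for start, end in exons)


genomic = "GGATGAAAGTCCAGTTTTAAGG"
exons = parse_cds_location("join(3..8,15..20)")
coding = splice_exons(genomic, exons)
print(exons)   # [(3, 8), (15, 20)]
print(coding)  # ATGAAATTTTAA
```

When an entry has several CDS lines, each one would yield a different exon list, which is why the intended gene model has to be chosen by hand.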

If processing a gene that uses a non-standard genetic code, be sure to set the genetic code to the proper value. Also, if working with a gene fragment, be sure to set the 'First exon begins at codon position' field to the position in the codon of the first exonic nucleotide to avoid a frameshift.
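The effect of the 'First exon begins at codon position' setting can be seen in a short Python sketch. It assumes the standard genetic code only and a fragment that is already spliced; translate_fragment and its first_codon_position parameter are hypothetical names chosen for illustration, not part of CODDLE or PARSESNP.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
STANDARD_AAS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), STANDARD_AAS)
}


def translate_fragment(coding_seq, first_codon_position=1):
    """Translate a coding fragment whose first base sits at codon position 1, 2 or 3."""
    if first_codon_position not in (1, 2, 3):
        raise ValueError("codon position must be 1, 2 or 3")
    # Bases to skip so that translation starts on the first complete codon.
    skip = (4 - first_codon_position) % 3
    seq = coding_seq.upper()[skip:]
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X")  # X marks codons with ambiguous bases
        for i in range(0, usable, 3)
    )


full = "ATGAAATTTTAA"
fragment = full[1:]  # first base is now the second position of a codon

print(translate_fragment(full))         # MKF*
print(translate_fragment(fragment, 2))  # KF*
print(translate_fragment(fragment, 1))  # *NF  (wrong setting: frameshifted)
```

With the wrong setting the fragment is read out of frame, giving an unrelated peptide and a premature stop.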

If the program is unable to determine the gene model on the first try, don't give up! It's likely because there is a small discrepancy between the genomic sequence you used and your cDNA or protein sequence. The error output should give you a rough idea of where the problem lies. After modifying one of your sequences to correct the problem, try resubmitting. If it still doesn't work, use the error output to write your own gene model by hand in the proper format.

The output from the input preprocessor can be broken down into five main sections:

  1. Warnings, at the top of the page, in red. These indicate potential problems with your input data. Please read them and make sure you understand what they mean and whether or not there is really a problem before proceeding. This section may be absent.
  2. A table of hits to the Blocks database. The full results can be viewed by clicking on the 'Reverse PSI-BLAST Search Results' link. The checkboxes allow the user to choose which Blocks families will be considered in further processing by CODDLE or PARSESNP. Each Blocks family ID is a hyperlink to the entry in the Blocks database, and the score is a link to that entry in the search results.
  3. Below that are buttons that allow the user to proceed to CODDLE or PARSESNP using the determined gene model and the selected Blocks families.
  4. Next there is a button that allows the user to attempt to build an alignment of related sequences using SIFT, if not satisfied with the options from the database. The information content cutoff allows the user to choose how diverged the block should be; 0 is the least conserved, and 4.32 is the most conserved. Values in the range from 2.75 to 3.25 are recommended. The user can also choose which database to search. SwissProt/TrEMBL, the default, contains the most sequences, but some are known to be in error. SwissProt by itself is much smaller, and therefore faster to search, as well as being more reliable; however, it often yields an inferior alignment because fewer sequences are represented.
  5. Finally, the determined gene model is listed, as well as the translated amino acid sequence, as a check.

Should you want to run CODDLE or PARSESNP multiple times on the same gene, save the Exon/Intron Position statement and the Blocks families you want to use; you can enter that information directly into CODDLE or PARSESNP, along with the genomic sequence, and skip the input preprocessor entirely.
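The 0 to 4.32 range of the information content cutoff in item 4 matches log2(20), the maximum information in bits for a position that can hold any of 20 amino acids. The Python sketch below computes a textbook per-column information content to show where the two endpoints come from. It is an assumption that this simple formula resembles what SIFT uses internally; column_information is a hypothetical name, and no pseudocounts or sequence weighting are applied.

```python
import math
from collections import Counter

MAX_INFORMATION = math.log2(20)  # about 4.32 bits for 20 amino acids


def column_information(column):
    """Information content (bits) of one alignment column.

    Computed as log2(20) minus the Shannon entropy of the observed
    residue frequencies; gap characters are ignored.
    """
    residues = [r for r in column.upper() if r.isalpha()]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return MAX_INFORMATION - entropy


conserved = column_information("LLLLLLLL")             # one residue only
mixed = column_information("LLLLIIVV")                 # partly conserved
diverse = column_information("ACDEFGHIKLMNPQRSTVWY")   # all 20 residues once

print(round(conserved, 2))  # 4.32
print(round(mixed, 2))      # 2.82
print(round(diverse, 2))    # 0.0
```

Under this definition, the recommended cutoff of 2.75 to 3.25 corresponds to positions that are clearly constrained but still tolerate a few alternative residues.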


Created 14 January 2003, last modified 14 January 2003

© 2003 The proWeb Project.