Nucleic Acids Research Advance Access published online on January 18, 2008
Nucleic Acids Research, doi:10.1093/nar/gkm1181
Methods Online
Codon choice in genes depends on flanking sequence information—implications for theoretical reverse translation
1Department of Molecular Biology and Microbiology, Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, 32816, USA, 2Genomics and Regulatory Systems Group, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and 3Department of Biostatistics, University of Texas School of Public Health, El Paso, TX, 79902, USA
*To whom correspondence should be addressed. Tel: 407 823 3633; 407 823 3635; Email: acole{at}mail.ucf.edu
Received September 26, 2007. Revised December 7, 2007. Accepted December 27, 2007.
Algorithms for theoretical reverse translation have direct applications in degenerate PCR. The conventional practice is to create several degenerate primers, each of which variably encodes the peptide region of interest. In the current work, we analyzed, for each codon, the flanking residues in proteins and determined their influence on codon choice. From this, we created a method for theoretical reverse translation that includes information from the flanking residues of the protein in question. Our method, named the neighbor correlation method (NCM), and its enhancement, the consensus-NCM (c-NCM), performed significantly better than the conventional codon-usage statistic method (CSM). Using NCM and c-NCM, we increased the average sequence identity from 77% to as much as 81%. Furthermore, coverage at 80% identity increased significantly, from <20% (CSM) to >75% (c-NCM). The algorithms, their applications and implications are discussed herein.
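To make the contrast between a codon-usage approach and a flanking-residue approach concrete, the following is a minimal Python sketch; it is not the NCM or c-NCM algorithm itself, whose details are not given in this abstract. It assumes that a codon-usage method picks the most frequent codon for each amino acid, and that a neighbor-aware method picks the most frequent codon given the residues immediately before and after, falling back to plain usage when a context was never observed. The function names, the `^` and `$` end markers, and the toy training sequences are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def codons(cds):
    """Split a coding sequence into consecutive triplets."""
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def flank_key(protein, i):
    """Context key: (previous residue, residue, next residue).
    '^' and '$' are hypothetical markers for the sequence ends."""
    prev_aa = protein[i - 1] if i > 0 else "^"
    next_aa = protein[i + 1] if i + 1 < len(protein) else "$"
    return (prev_aa, protein[i], next_aa)


def train(pairs):
    """Count codons per amino acid and per flanking context."""
    usage = defaultdict(Counter)    # aa -> codon counts
    context = defaultdict(Counter)  # (prev, aa, next) -> codon counts
    for protein, cds in pairs:
        triplets = codons(cds)
        assert len(triplets) == len(protein)
        for i, codon in enumerate(triplets):
            usage[protein[i]][codon] += 1
            context[flank_key(protein, i)][codon] += 1
    return usage, context


def reverse_translate_usage(protein, usage):
    """Usage-only baseline: most frequent codon for each residue."""
    return "".join(usage[aa].most_common(1)[0][0] for aa in protein)


def reverse_translate_neighbor(protein, usage, context):
    """Neighbor-aware sketch: most frequent codon in this flanking
    context, falling back to overall usage for unseen contexts."""
    out = []
    for i, aa in enumerate(protein):
        table = context.get(flank_key(protein, i)) or usage[aa]
        out.append(table.most_common(1)[0][0])
    return "".join(out)


def identity(a, b):
    """Fraction of matching nucleotides between equal-length sequences."""
    assert len(a) == len(b)
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Toy training data (invented for illustration, not real genes).
pairs = [
    ("MAK", "ATGGCCAAA"),
    ("MAK", "ATGGCCAAA"),
    ("KAM", "AAAGCTATG"),
    ("KAM", "AAAGCTATG"),
    ("KAM", "AAGGCTATG"),
]
usage, context = train(pairs)

truth = "ATGGCCAAA"
by_usage = reverse_translate_usage("MAK", usage)
by_neighbor = reverse_translate_neighbor("MAK", usage, context)
print(by_usage, round(identity(by_usage, truth), 2))
print(by_neighbor, round(identity(by_neighbor, truth), 2))
```

In this toy set, alanine is encoded by GCT more often overall but by GCC whenever it sits between M and K, so the usage-only prediction misses one nucleotide while the context-aware prediction recovers the training sequence. Evaluating on the training data overstates performance; a realistic comparison would score held-out genes.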