Nucleic Acids Research Advance Access originally published online on January 18, 2008
Nucleic Acids Research 2008 36(3):e16; doi:10.1093/nar/gkm1181
Nucleic Acids Research, 2008, Vol. 36, No. 3 e16
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods Online
Codon choice in genes depends on flanking sequence information—implications for theoretical reverse translation
1Department of Molecular Biology and Microbiology, Burnett School of Biomedical Sciences, University of Central Florida, Orlando, FL, 32816, USA, 2Genomics and Regulatory Systems Group, EMBL-European Bioinformatics Institute, Wellcome Trust Genome Campus, Hinxton, Cambridge CB10 1SD, UK and 3Department of Biostatistics, University of Texas School of Public Health, El Paso, TX, 79902, USA
*To whom correspondence should be addressed. Tel: 407 823 3633; 407 823 3635; Email: acole@mail.ucf.edu
Received September 26, 2007. Revised December 7, 2007. Accepted December 27, 2007.
Algorithms for theoretical reverse translation have direct applications in degenerate PCR. The conventional practice is to create several degenerate primers, each of which variably encodes the peptide region of interest. In the current work, for each codon, we analyzed the flanking residues in proteins and determined their influence on codon choice. From this, we created a method for theoretical reverse translation that incorporates information from the flanking residues of the protein in question. Our method, named the neighbor correlation method (NCM), and its enhancement, the consensus-NCM (c-NCM), performed significantly better than the conventional codon-usage statistic method (CSM). Using NCM and c-NCM, we were able to increase the average sequence identity from 77% up to 81%. Furthermore, coverage at 80% identity increased significantly, from <20% (CSM) to >75% (c-NCM). The algorithms, their applications and implications are discussed herein.
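To make the contrast concrete, the following minimal Python sketch compares a codon-usage baseline with a reverse translation that conditions codon choice on the flanking residues. It is a toy illustration of the general idea only, not the authors' NCM or c-NCM: the training pairs, the function names (`train`, `reverse_translate`, `identity`), the use of one residue on each side as context, and the fallback to global codon usage are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Hypothetical toy training set of (protein, coding DNA) pairs, invented for illustration.
TRAINING = [
    ("MKL", "ATGAAACTG"),
    ("MKV", "ATGAAAGTT"),
    ("MKM", "ATGAAAATG"),
    ("AKL", "GCTAAGCTG"),
    ("AKV", "GCTAAGGTT"),
]


def split_codons(dna):
    """Split a coding sequence into consecutive 3-nucleotide codons."""
    return [dna[i:i + 3] for i in range(0, len(dna), 3)]


def neighbors(protein, i):
    """Return the residues flanking position i ('^' and '$' mark the ends)."""
    prev_aa = protein[i - 1] if i > 0 else "^"
    next_aa = protein[i + 1] if i + 1 < len(protein) else "$"
    return prev_aa, next_aa


def train(pairs):
    """Count codons per residue, and per (previous, residue, next) context."""
    usage = defaultdict(Counter)    # residue -> codon counts (usage baseline)
    context = defaultdict(Counter)  # (prev, residue, next) -> codon counts
    for protein, dna in pairs:
        for i, (aa, codon) in enumerate(zip(protein, split_codons(dna))):
            prev_aa, next_aa = neighbors(protein, i)
            usage[aa][codon] += 1
            context[(prev_aa, aa, next_aa)][codon] += 1
    return usage, context


def reverse_translate(protein, usage, context=None):
    """Pick the most frequent codon per residue.

    With `context` supplied, the choice is conditioned on the flanking
    residues, falling back to global usage for unseen contexts (an
    assumption of this sketch).
    """
    dna = []
    for i, aa in enumerate(protein):
        if aa not in usage:
            raise ValueError(f"no codon statistics for residue {aa!r}")
        key = (*neighbors(protein, i)[:1], aa, *neighbors(protein, i)[1:])
        seen = context is not None and key in context
        table = context[key] if seen else usage[aa]
        dna.append(table.most_common(1)[0][0])
    return "".join(dna)


def identity(a, b):
    """Fraction of identical positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


usage, context = train(TRAINING)
true_dna = "GCTAAGCTG"  # coding sequence of "AKL" in the toy data
baseline = reverse_translate("AKL", usage)             # "GCTAAACTG"
conditioned = reverse_translate("AKL", usage, context)  # "GCTAAGCTG"
print(identity(baseline, true_dna), identity(conditioned, true_dna))
```

In this toy data, lysine is encoded as AAA after methionine and as AAG after alanine, so the usage-only baseline misses one nucleotide of "AKL" while the context-conditioned choice recovers it. The query is drawn from the training pairs, so the example shows the mechanism only and says nothing about real-world accuracy.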