Nucleic Acids Research Advance Access originally published online on December 10, 2008
Nucleic Acids Research 2009 37(2):591-601; doi:10.1093/nar/gkn917
Nucleic Acids Research, 2009, Vol. 37, No. 2 591-601
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology
Identification of protein-coding sequences using the hybridization of 18S rRNA and mRNA during translation
1 Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695-7911; 2 Department of Computer Science, North Carolina State University, Raleigh, NC 27695-8206; 3 Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC 27695-7911; 4 Department of Computer Science, North Carolina State University, Raleigh, NC 27695-8206 and 5 Department of Forestry and Environmental Resources, North Carolina State University, Raleigh, NC 27695-8002, USA
*To whom correspondence should be addressed. Tel: +1 919 684 0621; Fax: +1 919 684 0900; Email: cx6{at}duke.edu
Received September 26, 2008. Revised October 29, 2008. Accepted October 30, 2008.
In this article, we introduce a new approach to distinguishing protein-coding sequences from non-coding sequences using a period-3 free energy signal that arises from the interactions of the 3'-terminal nucleotides of the 18S rRNA with mRNA. We extracted features of the amplitude and phase of the period-3 signal that are characteristic of protein-coding regions and absent from non-coding regions, and used them to distinguish protein-coding sequences from non-coding sequences. We tested the method on all the experimental genes from Saccharomyces cerevisiae and Schizosaccharomyces pombe. The identifications were consistent with the corresponding GenBank annotations, and the method performed better than existing methods that use a period-3 signal. Preliminary tests on a number of fly, mouse and human genes suggest that our method is applicable to higher eukaryotic genes. Tests on pseudogenes indicated that most pseudogenes have no period-3 signal. Further exploration of the 3'-tail of 18S rRNA and pattern analysis of protein-coding sequences supported our hypothesis that the 3'-tail of 18S rRNA plays a synchronizing role throughout the translation elongation process, which in turn can be exploited to identify protein-coding sequences.
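The abstract does not say how the amplitude and phase of the period-3 signal are computed. As an illustration only, one common way to measure a period-3 component in a numeric signal is a single-bin discrete Fourier transform (DFT) at frequency 1/3; this is a generic technique swapped in for the sketch, not necessarily the authors' method. The sketch assumes the free energy signal has already been computed as one value per mRNA position (for example, the hybridization free energy of the 18S rRNA 3'-tail against the mRNA at that position). The function names `period3_component` and `sliding_period3`, the default window length and the step are hypothetical choices, not parameters from the article.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def period3_component(signal: Sequence[float]) -> Tuple[float, float]:
    """Amplitude and phase of the period-3 component of a numeric signal.

    Hypothetical helper: evaluates one DFT bin at frequency 1/3 after
    removing the mean, so a constant offset does not contribute.
    """
    n = len(signal)
    if n < 3:
        raise ValueError("signal must contain at least 3 values")
    mean = sum(signal) / n
    acc = 0j
    for i, x in enumerate(signal):
        # Correlate the centred signal with a complex sinusoid of period 3.
        acc += (x - mean) * cmath.exp(-2j * math.pi * i / 3)
    # For A*cos(2*pi*i/3 + phi) with len a multiple of 3, this returns A and phi.
    amplitude = 2 * abs(acc) / n
    phase = cmath.phase(acc)  # radians in [-pi, pi]
    return amplitude, phase


def sliding_period3(
    signal: Sequence[float], window: int = 99, step: int = 3
) -> List[Tuple[int, float, float]]:
    """(start, amplitude, phase) for each window along the signal.

    Hypothetical defaults. A step that is a multiple of 3 keeps the phases
    of different windows in the same reference frame, because shifting the
    window start by 3 leaves exp(-2j*pi*i/3) unchanged.
    """
    if window < 3 or step < 1:
        raise ValueError("window must be >= 3 and step >= 1")
    results = []
    for start in range(0, len(signal) - window + 1, step):
        amp, ph = period3_component(signal[start:start + window])
        results.append((start, amp, ph))
    return results
```

For instance, a toy signal that dips to -3.0 at every third position, `[-3.0, 0.0, 0.0] * 10`, gives an amplitude of 2.0 and a phase of ±π, whereas a period-2 signal such as `[1.0, -1.0] * 15` gives an amplitude of essentially zero. How amplitude and phase are turned into a coding versus non-coding decision is not described in this abstract, so no classifier is shown.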