Nucleic Acids Research, 1992, Vol. 20, No. 24 6441-6450
© 1992
SURVEY AND SUMMARY
Assessment of protein coding measures
Theoretical Biology and Biophysics Group, and Center for Human Genome Studies, Los Alamos National Laboratory, Los Alamos, NM 87545, USA
Received July 20, 1992. Revised November 13, 1992. Accepted November 13, 1992.
A number of methods for recognizing protein coding genes in DNA sequence have been published over the last 13 years, and new, more comprehensive algorithms, drawing on the repertoire of existing techniques, continue to be developed. To optimize continued development, it is valuable to systematically review and evaluate published techniques. At the core of most gene recognition algorithms are one or more coding measures: functions that produce, given any sample window of sequence, a number or vector intended to measure the degree to which the sample sequence resembles a window of typical exonic DNA. In this paper we review and synthesize the underlying coding measures from published algorithms. A standardized benchmark is described, and each of the measures is evaluated according to this benchmark. Our main conclusion is that a very simple and obvious measure, counting oligomers, is more effective than any of the more sophisticated measures. Different measures contain different information; however, there is a great deal of redundancy in the current suite of measures. We show that in future development of gene recognition algorithms, attention can probably be limited to six of the twenty or so measures proposed to date.
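To make the idea of an oligomer-counting coding measure concrete, here is a minimal Python sketch. It is not the benchmark or the exact formulation evaluated in this paper. It assumes a frame-independent variant that scores a window by the average log-odds of its overlapping k-mers (hexamers by default), with the log-odds estimated from hypothetical labelled coding and noncoding training sequences using additive pseudocounts. The names `kmer_counts`, `log_odds_table`, and `oligomer_score` are illustrative only.

```python
import math
from collections import Counter
from itertools import product

ALPHABET = "ACGT"


def kmer_counts(seqs, k):
    """Count overlapping k-mers made only of A/C/G/T across a list of sequences."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if all(c in ALPHABET for c in word):
                counts[word] += 1
    return counts


def log_odds_table(coding, noncoding, k=6, pseudo=1.0):
    """Per-k-mer log(P_coding / P_noncoding), with additive pseudocounts.

    Assumption: the training sets are hypothetical labelled exon and
    non-exon sequences; the paper's own training data are not reproduced.
    """
    c, n = kmer_counts(coding, k), kmer_counts(noncoding, k)
    words = ["".join(p) for p in product(ALPHABET, repeat=k)]
    c_tot = sum(c.values()) + pseudo * len(words)
    n_tot = sum(n.values()) + pseudo * len(words)
    return {
        w: math.log((c[w] + pseudo) / c_tot) - math.log((n[w] + pseudo) / n_tot)
        for w in words
    }


def oligomer_score(window, table, k):
    """Average log-odds over the window's k-mers; > 0 means more coding-like.

    k-mers containing ambiguous bases (not in the table) are skipped;
    a window with no scorable k-mers returns 0.0.
    """
    window = window.upper()
    scores = [
        table[window[i:i + k]]
        for i in range(len(window) - k + 1)
        if window[i:i + k] in table
    ]
    return sum(scores) / len(scores) if scores else 0.0
```

For example, `table = log_odds_table(["GCCGCCGCCGCCGCC"], ["ATATATATATATATA"], k=3)` followed by `oligomer_score("GCCGCCGCC", table, 3)` yields a positive value, while `oligomer_score("ATATATAT", table, 3)` yields a negative one. The toy call uses k = 3 only to keep the data short; hexamers give 4,096 table entries and need realistically sized training sets for stable estimates.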