Nucleic Acids Research, 2000, Vol. 28, No. 14 2804-2814
© 2000 Oxford University Press
Recognition of protein coding genes in the yeast genome at better than 95% accuracy based on the Z curve
Department of Physics, Tianjin University, Tianjin 300072, China
The Z curve is a three-dimensional space curve that uniquely represents a given DNA sequence, in the sense that each can be uniquely reconstructed from the other. Based on the Z curve, a new algorithm for finding protein coding genes, specific to the yeast genome and better than 95% accurate, is proposed. Six cross-validation tests were performed to confirm this accuracy. Using the new algorithm, the number of protein coding genes in the yeast genome is re-estimated. The estimate rests on the assumption that the unknown genes have statistical properties similar to those of the known genes. It is found that the number of protein coding genes in the 16 yeast chromosomes is 5645, significantly smaller than the widely accepted 5800–6000 and much larger than the 4800 recently estimated by another group. The mitochondrial genes were not included in this estimate. A codingness index called the YZ score (YZ ∈ [0,1]) is proposed to recognize protein coding genes in the yeast genome. An ORF is classified as coding if YZ > 0.5 and as non-coding if YZ < 0.5. The ORFs annotated in the MIPS (Munich Information Centre for Protein Sequences) database that the present algorithm recognizes as non-coding are listed in detail in this paper. The YZ scores for all ORFs annotated in the MIPS database have been calculated and are available on request by email to the corresponding author.
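The one-to-one correspondence between a DNA sequence and its Z curve can be illustrated with a short sketch. The abstract does not state the coordinate definitions, so the code below assumes the commonly used Z curve convention: x separates purines (A, G) from pyrimidines (C, T), y separates amino (A, C) from keto (G, T) bases, and z separates weak (A, T) from strong (G, C) hydrogen-bonding bases. The names `STEPS`, `z_curve` and `reconstruct` are hypothetical and chosen only for this illustration; this is not the gene-finding algorithm or the YZ score itself.

```python
# Minimal sketch of the Z curve mapping, ASSUMING the usual convention
# (the abstract itself does not give these definitions):
#   x: purine (A, G) = +1, pyrimidine (C, T) = -1
#   y: amino  (A, C) = +1, keto       (G, T) = -1
#   z: weak   (A, T) = +1, strong     (G, C) = -1
STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}
# Each base has a distinct step vector, so the mapping can be inverted.
BASES = {step: base for base, step in STEPS.items()}


def z_curve(seq):
    """Return the cumulative (x, y, z) point after each base of seq."""
    x = y = z = 0
    points = []
    for base in seq.upper():
        dx, dy, dz = STEPS[base]  # raises KeyError on non-ACGT symbols
        x, y, z = x + dx, y + dy, z + dz
        points.append((x, y, z))
    return points


def reconstruct(points):
    """Recover the DNA sequence from its Z curve points."""
    prev = (0, 0, 0)
    bases = []
    for point in points:
        # The difference between consecutive points identifies the base.
        step = tuple(c - p for c, p in zip(point, prev))
        bases.append(BASES[step])
        prev = point
    return "".join(bases)


if __name__ == "__main__":
    sequence = "ATGGCTTAA"
    curve = z_curve(sequence)
    print(curve)
    print(reconstruct(curve) == sequence)
```

Because the four step vectors are distinct, consecutive differences of the curve recover the sequence exactly, which is the sense in which each can be reconstructed from the other.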
* To whom correspondence should be addressed. Tel: +86 22 2740 1008; Fax: +86 22 2335 8329; Email: ctzhang@tju.edu.cn


