Nucleic Acids Research, 2002, Vol. 30, No. 19 4103-4117
© 2002 Oxford University Press
Current methods of gene prediction, their strengths and weaknesses
Institut de Pharmacologie et Biologie Structurale, UMR 5089, 205 route de Narbonne, F-31077 Toulouse Cedex, France, 1 INRIA Rhône-Alpes, UMR 5558 Biométrie et Biologie Évolutive, Université Claude Bernard, Lyon I, 43 Boulevard du 11 Novembre, F-69622 Villeurbanne Cedex, France, 2 INRA Toulouse, Département de Biométrie et Intelligence Artificielle, Chemin de Borde Rouge, BP 27, F-31326 Castanet-Tolosan Cedex, France and 3 Laboratoire Associé de l'INRA (France), Universiteit Gent, Ledeganckstraat 35, B-9000 Gent, Belgium
*To whom correspondence should be addressed. Tel: +33 5 61 17 59 53; Fax: +33 5 61 17 59 94; Email: catherine.mathe{at}ipbs.fr
While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed to address one part of this problem: locating the genes along a genome. This paper reviews the existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described, and the resulting software is classified according to both the method and the type of evidence used. Finally, the difficulties and pitfalls encountered by the programs are detailed, showing that improvements are needed and that new directions must be considered.