
Nucleic Acids Research, 2002, Vol. 30, No. 19 4103-4117
© 2002 Oxford University Press

Current methods of gene prediction, their strengths and weaknesses

Catherine Mathé*, Marie-France Sagot1, Thomas Schiex2 and Pierre Rouzé3

Institut de Pharmacologie et Biologie Structurale, UMR 5089, 205 route de Narbonne, F-31077 Toulouse Cedex, France, 1 INRIA Rhône-Alpes, UMR 5558 Biométrie et Biologie Évolutive, Université Claude Bernard, Lyon I, 43 Boulevard du 11 Novembre, F-69622 Villeurbanne Cedex, France, 2 INRA Toulouse, Département de Biométrie et Intelligence Artificielle, Chemin de Borde Rouge, BP 27, F-31326 Castanet-Tolosan Cedex, France and 3 Laboratoire Associé de l’INRA (France), Universiteit Gent, Ledeganckstraat 35, B-9000 Gent, Belgium

*To whom correspondence should be addressed. Tel: +33 5 61 17 59 53; Fax: +33 5 61 17 59 94; Email: catherine.mathe{at}ipbs.fr

While the genomes of many organisms have been sequenced over the last few years, transforming such raw sequence data into knowledge remains a hard task. A great number of prediction programs have been developed that try to address one part of this problem, which consists of locating the genes along a genome. This paper reviews the existing approaches to predicting genes in eukaryotic genomes and underlines their intrinsic advantages and limitations. The main mathematical models and computational algorithms adopted are also briefly described and the resulting software classified according to both the method and the type of evidence used. Finally, the difficulties and pitfalls encountered by the programs are detailed, showing that improvements are needed and that new directions must be considered.






