
Nucleic Acids Research 2005 33(20):6494-6506; doi:10.1093/nar/gki937

Published online 28 November 2005

© The Author 2005. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions{at}oxfordjournals.org


Article

Gene identification in novel eukaryotic genomes by self-training algorithm

Alexandre Lomsadze1, Vardges Ter-Hovhannisyan1, Yury O. Chernoff1 and Mark Borodovsky1,2,*

1 School of Biology, Georgia Institute of Technology, Atlanta, GA 30332-0230, USA
2 Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, GA 30332-0535, USA

*To whom correspondence should be addressed. Tel: +1 404 894 8432; Fax: +1 404 894 0519; Email: mark{at}amber.biology.gatech.edu

Received August 5, 2005. Revised October 12, 2005. Accepted October 12, 2005.

Finding new protein-coding genes is one of the most important goals of eukaryotic genome sequencing projects. However, the genomic organization of novel eukaryotic genomes is diverse, and ab initio gene finding tools tuned for previously studied species are rarely suitable for efficient gene hunting in DNA sequences of a new genome. Gene identification methods based on mapping cDNA and expressed sequence tags (ESTs) to genomic DNA, or those using alignments to closely related genomes, rely on the existence of abundant cDNA and EST data and/or the availability of reference genomes. Conventional statistical ab initio methods require large training sets of validated genes for estimating gene model parameters. In practice, none of these types of data may be available in sufficient amounts until rather late stages of sequencing a novel genome. Nevertheless, we have shown that gene finding in eukaryotic genomes can be carried out in parallel with the estimation of statistical models directly from as yet anonymous genomic DNA. The suggested method of running gene prediction in parallel with model parameter estimation follows the path of iterative Viterbi training: rounds of labeling the genomic sequence into coding and non-coding regions are followed by rounds of model parameter estimation. Several dynamically changing restrictions on the possible range of model parameters are added to filter out fluctuations in the initial steps of the algorithm that could redirect the iteration process away from the biologically relevant point in parameter space. Tests on well-studied eukaryotic genomes have shown that the new method performs comparably to or better than conventional methods in which supervised model training precedes the gene prediction step. Several novel genomes have been analyzed, and biologically interesting findings are discussed.
Thus, a self-training algorithm that had been assumed feasible only for prokaryotic genomes has now been developed for ab initio eukaryotic gene identification.
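
The alternation of sequence labeling and parameter re-estimation described above can be illustrated with a toy example. The Python sketch below is not the authors' algorithm or their gene model; it assumes a deliberately simplified two-state hidden Markov model (one "coding" and one "noncoding" state emitting single nucleotides) and an invented toy sequence. The function names, the pseudocount, and the lower bound on self-transition probabilities (a hypothetical stand-in for the dynamically changing parameter restrictions mentioned in the abstract) are all assumptions made for illustration.

```python
import math

STATES = ("coding", "noncoding")
ALPHABET = "ACGT"


def normalize(counts):
    """Turn a dict of non-negative counts into probabilities."""
    total = sum(counts.values())
    return {key: value / total for key, value in counts.items()}


def viterbi(seq, start, trans, emit):
    """Most likely state path for seq under the current model (log space)."""
    score = [{s: math.log(start[s]) + math.log(emit[s][seq[0]]) for s in STATES}]
    back = [{}]
    for t in range(1, len(seq)):
        score.append({})
        back.append({})
        for s in STATES:
            prev = max(STATES, key=lambda p: score[t - 1][p] + math.log(trans[p][s]))
            score[t][s] = (score[t - 1][prev] + math.log(trans[prev][s])
                           + math.log(emit[s][seq[t]]))
            back[t][s] = prev
    path = [max(STATES, key=lambda s: score[-1][s])]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path


def reestimate(seq, path, pseudo=1.0):
    """Re-estimate parameters by counting over the hard labeling (with pseudocounts)."""
    emit_counts = {s: {c: pseudo for c in ALPHABET} for s in STATES}
    trans_counts = {s: {r: pseudo for r in STATES} for s in STATES}
    for symbol, state in zip(seq, path):
        emit_counts[state][symbol] += 1
    for a, b in zip(path, path[1:]):
        trans_counts[a][b] += 1
    trans = {s: normalize(trans_counts[s]) for s in STATES}
    emit = {s: normalize(emit_counts[s]) for s in STATES}
    return trans, emit


def clamp_self_transitions(trans, min_stay):
    """Hypothetical restriction: keep each self-transition probability >= min_stay."""
    clamped = {}
    for s in STATES:
        other = next(r for r in STATES if r != s)
        stay = max(trans[s][s], min_stay)
        clamped[s] = {s: stay, other: 1.0 - stay}
    return clamped


def viterbi_training(seq, start, trans, emit, max_iter=20):
    """Alternate labeling and re-estimation until the labeling stops changing."""
    path = None
    for iteration in range(max_iter):
        new_path = viterbi(seq, start, trans, emit)
        if new_path == path:
            break
        path = new_path
        trans, emit = reestimate(seq, path)
        # Assumed schedule: a tight bound early on that relaxes in later rounds.
        min_stay = max(0.5, 0.9 - 0.05 * iteration)
        trans = clamp_self_transitions(trans, min_stay)
    return path, trans, emit


if __name__ == "__main__":
    # Toy sequence: a GC-rich block flanked by AT-rich blocks (invented data).
    toy_seq = "AT" * 10 + "GC" * 10 + "TA" * 10
    start0 = {"coding": 0.5, "noncoding": 0.5}
    trans0 = {"coding": {"coding": 0.9, "noncoding": 0.1},
              "noncoding": {"coding": 0.1, "noncoding": 0.9}}
    # Weak initial guess: the "coding" state is only slightly GC-rich.
    emit0 = {"coding": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
             "noncoding": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3}}
    labels, _, emit_final = viterbi_training(toy_seq, start0, trans0, emit0)
    print("".join("C" if s == "coding" else "n" for s in labels))
    print(emit_final["coding"])
```

Starting from a weak initial guess, each round sharpens the emission estimates using only the unlabeled sequence itself, which is the sense in which the training is unsupervised. A realistic eukaryotic gene model would need many more states (exons, introns, splice sites, intergenic regions) and explicit length distributions, none of which are modeled here.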


The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.






