Heuristic approach to deriving models for gene finding
J Besemer and M Borodovsky
School of Biology, Georgia Institute of Technology, Atlanta, GA 30332-0230, USA.
Nucleic Acids Research, Vol 27, Issue 19, 3911-3920, Copyright © 1999 by Oxford University Press

Computer methods of accurate gene finding in DNA sequences require models
of protein coding and non-coding regions derived either from experimentally
validated training sets or from large amounts of anonymous DNA sequence.
Here we propose a new, heuristic method producing fairly accurate
inhomogeneous Markov models of protein coding regions. The new method needs
such a small amount of DNA sequence data that the model can be built 'on
the fly' by a web server for any DNA sequence >400 nt. Tests on 10
complete bacterial genomes performed with the GeneMark.hmm program showed
that the new models detect 93.1% of annotated genes on average, while
models built by traditional training detect 93.9% on average. Models built
by the heuristic approach could be used to find genes in small fragments
of anonymous prokaryotic genomes and in genomes of organelles, viruses,
phages and plasmids, as well as in highly inhomogeneous genomes where
adjustment of models to local DNA composition is needed. The heuristic
method also gives insight into the mechanism of codon usage pattern
evolution.
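The abstract names the model type but does not spell it out. As a rough illustration of what an "inhomogeneous Markov model of protein coding regions" means, the sketch below implements a generic order-k Markov chain whose transition probabilities depend on codon position (index mod 3), a homogeneous chain as a stand-in for non-coding DNA, and a log-likelihood-ratio score for a window in each of the three frames. It estimates parameters by simple counting on a training sequence, i.e. the traditional training route the abstract contrasts with, not the authors' heuristic procedure. The class and function names, the pseudocount smoothing, and the toy sequence are all hypothetical.

```python
import math
from collections import Counter, defaultdict

ALPHABET = "ACGT"


class PeriodicMarkov:
    """Order-k Markov chain with a separate transition table for each phase
    (position i mod period). period=3 gives an inhomogeneous, codon-position
    dependent model of coding DNA; period=1 gives an ordinary homogeneous
    chain, used here as a hypothetical stand-in for non-coding DNA."""

    def __init__(self, order=2, period=3, pseudocount=1.0):
        self.order = order
        self.period = period
        self.pseudocount = pseudocount
        # counts[phase][context][nucleotide] -> occurrence count
        self.counts = [defaultdict(Counter) for _ in range(period)]

    def _transitions(self, seq):
        """Yield (index, context, nucleotide), skipping non-ACGT symbols."""
        seq = seq.upper()
        k = self.order
        for i in range(k, len(seq)):
            ctx, nt = seq[i - k:i], seq[i]
            if nt in ALPHABET and all(c in ALPHABET for c in ctx):
                yield i, ctx, nt

    def fit(self, seq):
        """Count transitions; assumes seq[0] is at codon position 0."""
        for i, ctx, nt in self._transitions(seq):
            self.counts[i % self.period][ctx][nt] += 1
        return self

    def prob(self, phase, ctx, nt):
        """P(nt | ctx, phase) with additive pseudocount smoothing."""
        c = self.counts[phase % self.period].get(ctx, Counter())
        total = sum(c.values())
        return (c[nt] + self.pseudocount) / (total + len(ALPHABET) * self.pseudocount)

    def log_prob(self, seq, start_phase=0):
        """Log-likelihood of seq when seq[0] sits at phase start_phase.
        The first `order` nucleotides are used only as context."""
        return sum(
            math.log(self.prob(start_phase + i, ctx, nt))
            for i, ctx, nt in self._transitions(seq)
        )


def coding_score(window, coding, noncoding, start_phase):
    """Log-likelihood ratio of the coding model (in the given frame)
    versus the homogeneous background model."""
    return coding.log_prob(window, start_phase) - noncoding.log_prob(window)


if __name__ == "__main__":
    # Toy, hypothetical data: a strongly phase-biased "coding" string.
    # The background is trained on the same string so that the only
    # difference between the two models is phase dependence.
    train = "GCA" * 10
    coding = PeriodicMarkov(order=0, period=3).fit(train)
    noncoding = PeriodicMarkov(order=0, period=1).fit(train)
    window = "GCAGCA"
    scores = {f: coding_score(window, coding, noncoding, f) for f in range(3)}
    best = max(scores, key=scores.get)
    print(f"best frame: {best}, scores: {scores}")
```

Running the module reports frame 0 as the best-scoring frame for the toy window. Raising `order` lets the chain capture codon and dicodon context, but the number of transition probabilities grows as 3 × 4^(k+1), which illustrates why estimating such models from a short sequence by counting alone is difficult.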