Nucleic Acids Research, Vol 26, Issue 12 2941-2947, Copyright © 1998 by Oxford University Press
D Frishman, A Mironov, HW Mewes and M Gelfand
Analysis of a newly sequenced bacterial genome starts with identification
of protein-coding genes. Functional assignment of proteins requires
exact knowledge of protein N-termini. We present a new program, ORPHEUS, that
identifies candidate genes and accurately predicts gene starts. The
analysis starts with a database similarity search and identification of
reliable gene fragments. The latter are used to derive statistical
characteristics of protein-coding regions and ribosome-binding sites and to
predict the complete set of genes in the analyzed genome. In a test on
Bacillus subtilis and Escherichia coli genomes, the program correctly
identified 93.3% (resp. 96.3%) of experimentally annotated genes longer
than 100 codons described in the PIR-International database, and for these
genes 96.3% (83.9%) of starts were predicted exactly. Furthermore, 98.9%
(99.1%) of genes longer than 100 codons annotated in GenBank were found,
and 92.9% (75.7%) of predicted starts coincided with the feature table
description. Finally, for the complete gene complements of B.subtilis and
E.coli, including genes shorter than 100 codons, gene prediction accuracy
was 88.9 and 87.1%, respectively, with 94.2 and 76.7% starts coinciding
with the existing annotation.
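
The pipeline outlined in the abstract (reliable fragments, then statistics of coding regions and ribosome-binding sites, then gene and start prediction) can be illustrated with a minimal Python sketch. This is not the ORPHEUS implementation: the function names, the uniform background model, the fixed Shine-Dalgarno core AGGAGG, the 20-nucleotide upstream window and the forward-strand-only scan are all simplifying assumptions made for illustration. The database similarity search is assumed to have already produced the in-frame "reliable" fragments.

```python
import math
from collections import Counter
from itertools import product

STOPS = {"TAA", "TAG", "TGA"}
STARTS = {"ATG", "GTG", "TTG"}  # common bacterial start codons
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_log_odds(reliable_fragments, pseudocount=1.0):
    """Log-odds of each codon in trusted in-frame fragments vs. a uniform background."""
    counts = Counter()
    for frag in reliable_fragments:
        for i in range(0, len(frag) - 2, 3):
            counts[frag[i:i + 3]] += 1
    total = sum(counts[c] for c in CODONS) + pseudocount * len(CODONS)
    return {c: math.log((counts[c] + pseudocount) / total * len(CODONS))
            for c in CODONS}


def find_orfs(seq, min_codons=100):
    """Yield (candidate_starts, end) for forward-strand ORFs; end is just past the stop."""
    for frame in range(3):
        starts = []
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if codon in STOPS:
                # Length is measured from the most upstream candidate start.
                if starts and (i - starts[0]) // 3 >= min_codons:
                    yield starts, i + 3
                starts = []
            elif codon in STARTS:
                starts.append(i)


def score_orf(seq, start, end, log_odds):
    """Sum codon log-odds from the start codon up to (not including) the stop codon."""
    return sum(log_odds.get(seq[i:i + 3], 0.0) for i in range(start, end - 3, 3))


def rbs_score(seq, start, motif="AGGAGG", window=20):
    """Best number of positions matching the assumed motif upstream of a start."""
    upstream = seq[max(0, start - window):start]
    best = 0
    for i in range(len(upstream) - len(motif) + 1):
        matches = sum(a == b for a, b in zip(upstream[i:i + len(motif)], motif))
        best = max(best, matches)
    return best


def predict_genes(seq, log_odds, min_codons=100, min_score=0.0):
    """Keep ORFs with enough coding potential; choose the start with the best RBS match."""
    genes = []
    for starts, end in find_orfs(seq, min_codons):
        if score_orf(seq, starts[0], end, log_odds) < min_score:
            continue
        # Ties on the RBS score go to the most upstream start.
        best_start = max(starts, key=lambda s: (rbs_score(seq, s), -s))
        genes.append((best_start, end))
    return genes
```

Calling `codon_log_odds` on the trusted fragments and passing the result to `predict_genes` together with the genome sequence returns 0-based, half-open `(start, end)` coordinates. The default `min_codons=100` mirrors the length threshold quoted in the abstract; a realistic tool would also scan the reverse strand and learn the ribosome-binding-site model from the data rather than fixing a motif.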
Combining diverse evidence for gene recognition in completely sequenced bacterial genomes [published erratum appears in Nucleic Acids Res 1998 Aug 15;26(16):following 3870]
Munich Information Center for Protein Sequences (MIPS) of the German National Center for Health and Environment (GSF), Am Klopferspitz 18a, 82152 Martinsried, Germany. frishman@mips.biochem.mpg.de