Nucleic Acids Research, Vol 27, Issue 17 3577-3582, Copyright © 1999 by Oxford University Press
SS Hannenhalli, WS Hayes, AG Hatzigeorgiou and JW Fickett
With the growing number of completely sequenced bacterial genomes, accurate
gene prediction in bacterial genomes remains an important problem. Although
the existing tools predict genes in bacterial genomes with high overall
accuracy, their ability to pinpoint the translation start site remains
unsatisfactory. In this paper, we present a novel approach to bacterial
start site prediction that takes into account multiple features of a
potential start site, viz., ribosome binding site (RBS) binding energy,
distance of the RBS from the start codon, distance from the beginning of
the maximal ORF to the start codon, the start codon itself and the
coding/non-coding potential around the start site. Mixed integer programming
was used to optimize the discriminatory system. The accuracy of this
approach is up to 90%, compared with 70% for the most common tools in
fully automated mode (that is, without expert human post-processing of
results). The approach is evaluated using Bacillus subtilis, Escherichia
coli and Pyrococcus furiosus. These three genomes cover a broad spectrum of
bacterial genomes, since B. subtilis is a Gram-positive bacterium, E. coli is
a Gram-negative bacterium and P. furiosus is an archaebacterium. A
significant problem is generating a set of 'true' start sites for algorithm
training, in the absence of experimental work. We found that sequence
conservation between P. furiosus and the related Pyrococcus horikoshii
clearly delimited the gene start in many cases, providing a sufficient
training set.
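
The feature combination described above can be illustrated with a minimal sketch. It is not the authors' implementation: the weights, the 20-nt upstream window and the preferred 7-nt spacer are hypothetical placeholders (the paper optimizes its discriminator by mixed integer programming, which is not reproduced here), the RBS binding energy is replaced by a simple count of matches to the Shine-Dalgarno consensus AGGAGG, and the coding/non-coding potential feature is omitted. The function names `best_rbs` and `rank_start_sites` are illustrative only.

```python
# Hypothetical, hand-picked parameters; the paper fits its discriminator
# with mixed integer programming instead.
START_WEIGHT = {"ATG": 1.0, "GTG": 0.3, "TTG": 0.1}  # assumed codon preferences
SD_CONSENSUS = "AGGAGG"   # Shine-Dalgarno consensus, a crude proxy for RBS binding energy
PREFERRED_SPACER = 7      # assumed typical RBS-to-start-codon gap (nt)
UPSTREAM_WINDOW = 20      # assumed search window upstream of each candidate (nt)


def best_rbs(upstream):
    """Return (matches, spacer) for the best consensus match in `upstream`.

    `spacer` is the number of bases between the matched window and the
    end of `upstream` (i.e. the candidate start codon). It is None when
    `upstream` is too short to hold the consensus.
    """
    best_matches, best_spacer = 0, None
    k = len(SD_CONSENSUS)
    for i in range(len(upstream) - k + 1):
        matches = sum(a == b for a, b in zip(upstream[i:i + k], SD_CONSENSUS))
        if matches > best_matches:
            best_matches, best_spacer = matches, len(upstream) - (i + k)
    return best_matches, best_spacer


def rank_start_sites(seq, orf_begin, orf_end):
    """Score each in-frame start codon in seq[orf_begin:orf_end], best first.

    `orf_begin` is the first base after the upstream in-frame stop codon
    (the beginning of the maximal ORF); `orf_end` is the first base of the
    terminating stop codon.
    """
    candidates = []
    for pos in range(orf_begin, orf_end - 2, 3):
        codon = seq[pos:pos + 3]
        if codon not in START_WEIGHT:
            continue
        upstream = seq[max(0, pos - UPSTREAM_WINDOW):pos]
        matches, spacer = best_rbs(upstream)
        if spacer is None:
            spacer_penalty = PREFERRED_SPACER
        else:
            spacer_penalty = abs(spacer - PREFERRED_SPACER)
        codons_from_orf_begin = (pos - orf_begin) // 3
        # Linear combination of the features; all coefficients are placeholders.
        score = (1.0 * matches
                 - 0.2 * spacer_penalty
                 - 0.05 * codons_from_orf_begin
                 + START_WEIGHT[codon])
        candidates.append({"pos": pos, "codon": codon, "sd_matches": matches,
                           "spacer": spacer, "score": score})
    return sorted(candidates, key=lambda c: c["score"], reverse=True)


if __name__ == "__main__":
    # Toy sequence: stop, RBS-like motif, 6-nt spacer, ATG, ..., GTG, ..., stop.
    toy = "TAAAGGAGGTTTTTTATGAAACCCGTGAAATTTTAA"
    for cand in rank_start_sites(toy, orf_begin=3, orf_end=33):
        print(cand)
```

In this toy sequence the ATG candidate ranks above the downstream GTG because it has a full consensus match at a near-preferred spacing, lies closer to the beginning of the maximal ORF, and uses the more favored start codon.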
Bacterial start site prediction
Bioinformatics, SmithKline Beecham Pharmaceuticals, 709 Swedeland Road, PO Box 1539, King of Prussia, PA 19406, USA. hannes00@mh.vs.sbphrd.com



