Nucleic Acids Research, 1994, Vol. 22, No. 22 4756-4767
© 1994
COMPUTATIONAL BIOLOGY
Intrinsic and extrinsic approaches for detecting genes in a bacterial genome
School of Biology, Georgia Institute of Technology, Atlanta, GA 30332-0230 and ¹National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA
Received June 21, 1994. Revised September 28, 1994. Accepted September 28, 1994.
The unannotated regions of the Escherichia coli genomic DNA sequence from the EcoSeq6 database, totaling 1,278 intergenic sequences with a combined length of 359,279 base pairs, were analyzed using computer-assisted methods with the aim of identifying putative unknown genes. The proposed strategy for finding new genes includes two key elements: i) prediction of expressed open reading frames (ORFs) using the GeneMark method, which is based on Markov chain models for coding and non-coding regions of Escherichia coli DNA, and ii) a search for protein sequence similarities using programs based on the BLAST algorithm and programs for motif identification. A total of 354 putative expressed ORFs were predicted by GeneMark. Using the BLASTX and TBLASTN programs, it was shown that 208 ORFs located in the unannotated regions of the E. coli chromosome are significantly similar to other protein sequences. Identification of 182 ORFs as probable genes was supported by both GeneMark and BLAST; these comprise 51.4% of the GeneMark hits and 87.5% of the BLAST hits. Seventy-three putative new genes, comprising 20.6% of the GeneMark predictions, belong to ancient conserved protein families that include both eubacterial and eukaryotic members. This value is close to the overall proportion of highly conserved sequences among eubacterial proteins, indicating that the majority of the putative expressed ORFs that are predicted by GeneMark but have no significant BLAST hits are nevertheless likely to be real genes. The majority of the putative genes identified by BLAST search have been described since the release of the EcoSeq6 database, but about 70 genes have not been detected so far. Among these new identifications are genes encoding proteins with a variety of predicted functions, including dehydrogenases, kinases, several other metabolic enzymes, ATPases, rRNA methyltransferases, membrane proteins, and different types of regulatory proteins.
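The overlap figures quoted in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original analysis: the variable names and the `percent` helper are assumptions introduced here, and the only inputs are the four counts stated in the abstract.

```python
# Counts taken directly from the abstract.
genemark_orfs = 354   # putative expressed ORFs predicted by GeneMark
blast_orfs = 208      # ORFs with significant BLASTX/TBLASTN similarity
both = 182            # ORFs supported by both GeneMark and BLAST
conserved = 73        # ORFs in ancient conserved protein families


def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Shares reported in the abstract.
share_of_genemark = percent(both, genemark_orfs)
share_of_blast = percent(both, blast_orfs)
conserved_share = percent(conserved, genemark_orfs)

# Quantities implied by the counts (hypothetical names, simple set arithmetic).
genemark_only = genemark_orfs - both
blast_only = blast_orfs - both

print(f"Both methods: {share_of_genemark}% of GeneMark hits, "
      f"{share_of_blast}% of BLAST hits")
print(f"Ancient conserved families: {conserved_share}% of GeneMark predictions")
print(f"GeneMark only: {genemark_only}, BLAST only: {blast_only}")
```

Under these counts, the ORFs predicted by GeneMark without BLAST support (the `genemark_only` value) are the ones the abstract argues are nevertheless likely to be real genes.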