Nucleic Acids Research, 1994, Vol. 22, No. 22, 4768-4778
© 1994
COMPUTATIONAL BIOLOGY
A hidden Markov model that finds genes in E.coli DNA
Nordita, Blegdamsvej 17, DK-2100 Copenhagen, Denmark; ¹Sinsheimer Laboratories, University of California, Santa Cruz, CA 95064; ²Computer and Information Sciences, University of California, Santa Cruz, CA 95064, USA
*To whom correspondence should be addressed
Received June 21, 1994. Revised September 28, 1994. Accepted September 28, 1994.
A hidden Markov model (HMM) has been developed to find protein-coding genes in E.coli DNA, using E.coli genome DNA sequence from the EcoSeq6 database maintained by Kenn Rudd. This HMM includes states that model the codons and their frequencies in E.coli genes. It also models the patterns found in the intergenic region, including repetitive extragenic palindromic sequences and the Shine-Dalgarno motif. Raw genomic DNA sequence may contain sequencing errors and/or frameshifts, so the model allows for the (very unlikely) possibility of insertions and deletions of individual nucleotides within a codon. The parameters of the HMM are estimated from approximately one million nucleotides of annotated DNA in EcoSeq6. The model is then tested on a disjoint set of contigs containing about 325,000 nucleotides. The HMM finds the exact locations of about 80% of the known E.coli genes and approximate locations for about 10% more. It also finds several potentially new genes. In addition, it locates several places where insertion or deletion errors and/or frameshifts may be present in the contigs.
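The abstract only outlines the model. The minimal Python sketch below illustrates its two computational steps: estimating HMM parameters from annotated DNA, then labelling new DNA as coding or intergenic. It is a toy model assumed for illustration, not the authors' HMM. The design rests on several assumptions:

- A hypothetical intergenic state `N` emits one nucleotide per step.
- A hypothetical coding state `C` emits one whole codon per step, so codon frequencies are modelled directly.
- Parameters are simple counts, with pseudocounts, taken from a single labelled sequence.
- Decoding uses the Viterbi algorithm.

The sketch leaves out start and stop codons, the Shine-Dalgarno and REP patterns, the reverse strand, and the within-codon insertion and deletion states. The function names (`train`, `decode`) and the training data are invented for this example.

```python
import math
from collections import Counter

BASES = "ACGT"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
STATES = "NC"             # toy states: N = intergenic, C = coding (assumption)
WIDTH = {"N": 1, "C": 3}  # nucleotides emitted per step by each state


def train(seq, labels, pseudo=1.0):
    """Count-based estimates (with pseudocounts) from one labelled sequence.
    labels[i] is 'N' or 'C'; each coding run is assumed to be whole codons."""
    base_n, codon_n, trans_n, path = Counter(), Counter(), Counter(), []
    i = 0
    while i < len(seq):
        state = labels[i]
        emit = seq[i:i + WIDTH[state]]
        counts = codon_n if state == "C" else base_n
        counts[emit] += 1
        path.append(state)
        i += WIDTH[state]
    for prev, nxt in zip(path, path[1:]):
        trans_n[(prev, nxt)] += 1

    def log_freqs(counts, keys):
        # Normalise counts plus pseudocounts into log-probabilities.
        total = sum(counts[k] for k in keys) + pseudo * len(keys)
        return {k: math.log((counts[k] + pseudo) / total) for k in keys}

    log_e = {"N": log_freqs(base_n, list(BASES)), "C": log_freqs(codon_n, CODONS)}
    log_t = {}
    for s in STATES:
        row = log_freqs(Counter({t: trans_n[(s, t)] for t in STATES}), list(STATES))
        log_t.update({(s, t): lp for t, lp in row.items()})
    return log_e, log_t


def decode(seq, model):
    """Viterbi decoding: one label ('N' or 'C') per nucleotide.
    seq must contain only A, C, G, T."""
    if not seq:
        return ""
    log_e, log_t = model
    n = len(seq)
    score = {s: [-math.inf] * (n + 1) for s in STATES}  # best log-prob of prefix
    back = {s: [None] * (n + 1) for s in STATES}        # previous state on that path
    for i in range(1, n + 1):
        for s in STATES:
            j = i - WIDTH[s]  # position where this state's emission starts
            if j < 0:
                continue
            if j == 0:  # first step: uniform start probability
                best, arg = math.log(0.5), None
            else:
                best, arg = max((score[p][j] + log_t[(p, s)], p) for p in STATES)
            score[s][i] = best + log_e[s][seq[j:i]]
            back[s][i] = arg
    # Trace back from the best final state, one emission at a time.
    state, i, out = max(STATES, key=lambda s: score[s][n]), n, []
    while state is not None:
        out.append(state * WIDTH[state])
        state, i = back[state][i], i - WIDTH[state]
    return "".join(reversed(out))


# Invented toy training data: AT-rich intergenic flanks around GCC/GGC codons.
TRAIN_SEQ = "TATATTAAT" + "GCCGGCGCCGGC" + "ATTTAATA"
TRAIN_LABELS = "N" * 9 + "C" * 12 + "N" * 8

if __name__ == "__main__":
    model = train(TRAIN_SEQ, TRAIN_LABELS)
    print(decode("TATAT" + "GCCGCCGCC" + "ATATA", model))  # prints NNNNNCCCCCCCCCNNNNN
```

In this sketch the coding state consumes three nucleotides per step, so every predicted coding run has a length that is a multiple of three. In the model described above, the low-probability insertions and deletions within a codon are what let a gene prediction tolerate sequencing errors or frameshifts that break this reading frame.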