Nucleic Acids Research, 1994, Vol. 22, No. 22 4768-4778
© 1994


COMPUTATIONAL BIOLOGY

A hidden Markov model that finds genes in E.coli DNA

Anders Krogh, I. Saira Mian1 and David Haussler2,*

Nordita, Blegdamsvej 17, DK-2100 Copenhagen, Denmark, 1Sinsheimer Laboratories, University of California, Santa Cruz, CA 95064 and 2Computer and Information Sciences, University of California, Santa Cruz, CA 95064, USA

*To whom correspondence should be addressed

Received June 21, 1994. Revised September 28, 1994. Accepted September 28, 1994.

A hidden Markov model (HMM) has been developed to find protein-coding genes in E.coli DNA using E.coli genome DNA sequence from the EcoSeq6 database maintained by Kenn Rudd. This HMM includes states that model the codons and their frequencies in E.coli genes, as well as the patterns found in the intergenic region, including repetitive extragenic palindromic sequences and the Shine-Dalgarno motif. To account for potential sequencing errors and/or frameshifts in raw genomic DNA sequence, it allows for the (very unlikely) possibility of insertions and deletions of individual nucleotides within a codon. The parameters of the HMM are estimated using approximately one million nucleotides of annotated DNA in EcoSeq6, and the model is tested on a disjoint set of contigs containing about 325,000 nucleotides. The HMM finds the exact locations of about 80% of the known E.coli genes, and approximate locations for about 10%. It also finds several potentially new genes, and locates several places where insertion or deletion errors and/or frameshifts may be present in the contigs.
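
To make the idea of codon states and an intergenic state concrete, the following is a minimal sketch of Viterbi decoding on a toy four-state HMM. It is not the architecture described in the abstract: it has one intergenic state and three codon-position states, it omits the insertion/deletion states and the Shine-Dalgarno and palindromic-repeat submodels, and every probability in it is a hypothetical value chosen for illustration rather than an estimate from EcoSeq6. The names `STATES`, `START`, `TRANS`, `EMIT`, and `viterbi` are likewise assumptions of this sketch.

```python
import math

# Toy state set: "I" is intergenic; "C1".."C3" are the three codon positions.
STATES = ["I", "C1", "C2", "C3"]

# All numbers below are hypothetical, chosen only for illustration.
START = {"I": 1.0, "C1": 0.0, "C2": 0.0, "C3": 0.0}
TRANS = {
    "I":  {"I": 0.9, "C1": 0.1},   # stay intergenic or start a codon
    "C1": {"C2": 1.0},             # codon positions advance in order
    "C2": {"C3": 1.0},
    "C3": {"C1": 0.9, "I": 0.1},   # next codon or leave the gene
}
EMIT = {
    "I":  {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "C1": {"A": 0.25, "C": 0.20, "G": 0.40, "T": 0.15},
    "C2": {"A": 0.30, "C": 0.25, "G": 0.15, "T": 0.30},
    "C3": {"A": 0.15, "C": 0.30, "G": 0.30, "T": 0.25},
}


def log(p):
    """Natural log that maps probability 0 to -inf instead of raising."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # scores[s]: best log-probability of any path that ends in state s.
    scores = {s: log(START[s]) + log(EMIT[s][seq[0]]) for s in STATES}
    backpointers = []
    for base in seq[1:]:
        new_scores, pointers = {}, {}
        for s in STATES:
            # Best predecessor of s; missing transitions have probability 0.
            prev = max(STATES,
                       key=lambda r: scores[r] + log(TRANS[r].get(s, 0.0)))
            new_scores[s] = (scores[prev] + log(TRANS[prev].get(s, 0.0))
                             + log(EMIT[s][base]))
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: scores[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("ATGGCGGCGGCGTAA"))
```

Working in log space avoids numerical underflow on long sequences, which matters at the scale of the contigs mentioned above. A model closer to the one in the abstract would replace the three codon-position states with states for individual codons weighted by their frequencies, and would add low-probability insertion and deletion transitions.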

