Nucleic Acids Research, 2002, Vol. 30, No. 19 4321-4328
© 2002 Oxford University Press
A comparison of profile hidden Markov model procedures for remote homology detection
MRC Laboratory of Molecular Biology, Hills Road, Cambridge CB2 2QH, UK
*To whom correspondence should be addressed. Tel: +44 1223 402479; Fax: +44 1223 213556; Email: mm238{at}mrc-lmb.cam.ac.uk
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Profile hidden Markov models (HMMs) are amongst the most successful procedures for detecting remote homology between proteins. There are two popular profile HMM programs, HMMER and SAM. Little is known about their performance relative to each other and to the recently improved version of PSI-BLAST. Here we compare the two programs to each other and to non-HMM methods, to determine their relative performance and the features that are important for their success. The quality of the multiple sequence alignments used to build models was the most important factor affecting the overall performance of profile HMMs. The SAM T99 procedure is needed to produce high quality alignments automatically, and the lack of an equivalent component in HMMER makes it less complete as a package. Using the default options and parameters as would be expected of an inexpert user, it was found that from identical alignments SAM consistently produces better models than HMMER and that the relative performance of the model-scoring components varies. On average, HMMER was found to be between one and three times faster than SAM when searching databases larger than 2000 sequences, SAM being faster on smaller ones. Both methods were shown to have effective low complexity and repeat sequence masking using their null models, and the accuracy of their E-values was comparable. It was found that the SAM T99 iterative database search procedure performs better than the most recent version of PSI-BLAST, but that scoring of PSI-BLAST profiles is more than 30 times faster than scoring of SAM models.
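The comparison of E-value accuracy mentioned above rests on what an E-value means: the number of false positives expected per database search at or below that value. As a minimal sketch only, and not the procedure used in the paper, the following Python shows one way such a check could be set up. The function name `observed_errors_per_query`, the `(e_value, is_homolog)` hit format and the toy numbers are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import Iterable, List, Tuple


def observed_errors_per_query(
    hits: Iterable[Tuple[float, bool]],
    num_queries: int,
    thresholds: Iterable[float],
) -> List[Tuple[float, float]]:
    """Return (threshold, observed false positives per query) pairs.

    hits        -- hypothetical pooled results from all queries, one
                   (e_value, is_homolog) pair per reported hit
    num_queries -- number of searches that produced the pooled hits
    thresholds  -- E-value cut-offs to evaluate
    """
    if num_queries <= 0:
        raise ValueError("num_queries must be positive")

    # Keep only the E-values of hits that are NOT true homologues.
    false_positive_evalues = sorted(e for e, is_homolog in hits if not is_homolog)

    results = []
    for threshold in thresholds:
        # Count false positives reported at or below this threshold.
        count = bisect_right(false_positive_evalues, threshold)
        results.append((threshold, count / num_queries))
    return results


# Toy data, invented for illustration: 6 hits pooled from 4 queries.
toy_hits = [
    (0.001, True),
    (0.02, False),
    (0.5, False),
    (0.8, True),
    (3.0, False),
    (7.0, False),
]

for threshold, rate in observed_errors_per_query(toy_hits, 4, [0.1, 1.0, 10.0]):
    print(f"E-value <= {threshold}: {rate} false positives per query")
```

If reported E-values were perfectly calibrated, the observed rate at each threshold would be close to the threshold itself; a method whose observed rates lie consistently above or below that line is reporting optimistic or conservative E-values respectively.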



