Published online 2 September 2005
Limitations and potentials of current motif discovery algorithms
1 Department of Biological Sciences, College of Science, Purdue University, West Lafayette, IN 47907, USA
2 Department of Computer Science, College of Science, Purdue University, West Lafayette, IN 47907, USA
3 Markey Center for Structural Biology, College of Science, Purdue University, West Lafayette, IN 47907, USA
4 The Bindley Bioscience Center, College of Science, Purdue University, West Lafayette, IN 47907, USA
*To whom correspondence should be addressed. Tel: +1 765 496 2284; Fax: +1 765 494 1189; Email: dkihara{at}purdue.edu
Received March 3, 2005. Revised May 21, 2005. Accepted August 9, 2005.
Computational methods for de novo identification of gene regulation elements, such as transcription factor binding sites, have proved useful for deciphering genetic regulatory networks. However, despite the availability of a large number of algorithms, their strengths and weaknesses are not sufficiently understood. Here, we designed a comprehensive set of performance measures and benchmarked five modern sequence-based motif discovery algorithms using large datasets generated from Escherichia coli RegulonDB. Factors that affect prediction accuracy, scalability and reliability are characterized. The nucleotide-level and binding-site-level accuracies are very low, while the motif-level accuracy is relatively high, which indicates that the algorithms can usually capture at least one correct motif in an input sequence. To exploit diverse predictions from multiple runs of one or more algorithms, a consensus ensemble algorithm has been developed, which achieved a 6–45% improvement over the base algorithms by increasing both the sensitivity and specificity. Our study illustrates the limitations and potentials of existing sequence-based motif discovery algorithms. Taking advantage of the revealed potentials, several promising directions for further improvement are discussed. Since sequence-based algorithms are the baseline of most modern motif discovery algorithms, this paper suggests that substantial improvements are possible for them.
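The abstract does not spell out the performance measures or the ensemble procedure, so the following Python sketch is only an illustration of the general idea, not the authors' method. It assumes that predicted and annotated binding sites are half-open `(start, end)` intervals on one sequence, that nucleotide-level accuracy is reported as sensitivity and positive predictive value, and that a consensus is formed by simple per-position voting across runs. The function names `covered_positions`, `nucleotide_accuracy` and `consensus_sites`, and the `min_votes` threshold, are hypothetical.

```python
from collections import Counter


def covered_positions(sites):
    """Expand half-open (start, end) intervals into a set of positions."""
    positions = set()
    for start, end in sites:
        positions.update(range(start, end))
    return positions


def nucleotide_accuracy(predicted_sites, true_sites):
    """Return (sensitivity, positive predictive value) per nucleotide.

    Assumed definitions: sensitivity = TP / annotated positions,
    PPV = TP / predicted positions.
    """
    pred = covered_positions(predicted_sites)
    true = covered_positions(true_sites)
    tp = len(pred & true)
    sensitivity = tp / len(true) if true else 0.0
    ppv = tp / len(pred) if pred else 0.0
    return sensitivity, ppv


def consensus_sites(runs, min_votes):
    """Keep positions predicted by at least min_votes runs.

    Each run votes at most once per position; surviving positions are
    merged back into half-open intervals.
    """
    votes = Counter()
    for sites in runs:
        votes.update(covered_positions(sites))
    kept = sorted(p for p, v in votes.items() if v >= min_votes)
    merged = []
    for p in kept:
        if merged and p == merged[-1][1]:
            merged[-1][1] = p + 1  # extend the current interval
        else:
            merged.append([p, p + 1])  # start a new interval
    return [tuple(interval) for interval in merged]


# Toy data (invented for illustration): one annotated site, three runs.
true_sites = [(10, 20)]
runs = [[(8, 18)], [(12, 22)], [(40, 50)]]
consensus = consensus_sites(runs, min_votes=2)
sensitivity, ppv = nucleotide_accuracy(consensus, true_sites)
```

In this toy case the third run's spurious site is voted out, so the consensus trades some sensitivity for a higher positive predictive value; a lower `min_votes` would shift the balance the other way.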




