Nucleic Acids Research Advance Access originally published online on October 13, 2006
Nucleic Acids Research 2006 34(20):5730-5739; doi:10.1093/nar/gkl585
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Nucleic Acids Research, 2006, Vol. 34, No. 20 5730-5739
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology |
A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites
Department of Biochemistry CA 94305, USA 1 Department of Computer Science, Stanford University CA 94305, USA
*To whom correspondence should be addressed. Tel: 650 723 5976; Fax: 650 723 6783; Email: briannau{at}stanford.edu
Received June 8, 2006. Revised July 25, 2006. Accepted July 27, 2006.
Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar k-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate k-mer is part of a motif or not, we base our decision not on how well the k-mer conforms to a model of the motif as a whole, but how similar it is to specific, known k-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs.
*Correspondence may also be addressed to Eugene Fratkin. Email: fratkin{at}cs.stanford.edu
![]()
CiteULike
Connotea
Del.icio.us What's this?
This article has been cited by other articles:
![]() |
W. Fu, P. Ray, and E. P. Xing DISCOVER: a feature-based discriminative method for motif search in complex genomes Bioinformatics, June 15, 2009; 25(12): i321 - i329. [Abstract] [Full Text] [PDF] |
||||
![]() |
P. Kheradpour, A. Stark, S. Roy, and M. Kellis Reliable prediction of regulator targets using 12 Drosophila genomes Genome Res., December 1, 2007; 17(12): 1919 - 1931. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. Li, Y. Liang, and R. L. Bass GAPWM: a genetic algorithm method for optimizing a position weight matrix Bioinformatics, May 15, 2007; 23(10): 1188 - 1194. [Abstract] [Full Text] [PDF] |
||||

