Published online 26 February 2004
Nucleic Acids Research, 2004, Vol. 32, No. 4 1372-1381
© 2004 Oxford University Press
Detection of functional DNA motifs via statistical over-representation
1 Bioinformatics Program, Boston University, 44 Cummington Street, Boston, MA 02215, USA, 2 Department of Neurology, Boston University, 715 Albany Street, Boston, MA 02118, USA, 3 Department of Biology, Boston University, 5 Cummington Street, Boston, MA 02215, USA and 4 Department of Biomedical Engineering, Boston University, 44 Cummington Street, Boston, MA 02215, USA
*To whom correspondence should be addressed. Tel: +1 617 353 3509; Fax: +1 617 353 6766; Email: zhiping{at}bu.edu
The interaction of proteins with DNA recognition motifs regulates a number of fundamental biological processes, including transcription. To understand these processes, we need to know which motifs are present in a sequence and which factors bind to them. We describe a method to screen a set of DNA sequences against a precompiled library of motifs, and assess which, if any, of the motifs are statistically over- or under-represented in the sequences. Over-represented motifs are good candidates for playing a functional role in the sequences, while under-representation hints that if the motif were present, it would have a harmful dysregulatory effect. We apply our method (implemented as a computer program called Clover) to dopamine-responsive promoters, sequences flanking binding sites for the transcription factor LSF, sequences that direct transcription in muscle and liver, and Drosophila segmentation enhancers. In each case Clover successfully detects motifs known to function in the sequences, and intriguing and testable hypotheses are made concerning additional motifs. Clover compares favorably with an ab initio motif discovery algorithm based on sequence alignment, when the motif library includes only a homolog of the factor that actually regulates the sequences. It also demonstrates superior performance over two contingency-table-based over-representation methods. In conclusion, Clover has the potential to greatly accelerate characterization of signals that regulate transcription.
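The general idea of screening sequences against a motif and asking whether it is over- or under-represented can be illustrated with a toy permutation test. This is only a sketch under stated assumptions and is not Clover's scoring scheme: the function names, the "mean best-site score" statistic, the shuffled-sequence null model, and the example count matrix and sequences are all hypothetical choices made for illustration.

```python
import math
import random

BASES = "ACGT"

def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log-odds scores (assumed uniform background)."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({b: math.log((column[b] + pseudocount) / total / background)
                       for b in BASES})
    return matrix

def best_site_score(sequence, matrix):
    """Highest log-odds score over all windows of the motif's width."""
    width = len(matrix)
    return max(sum(matrix[i][sequence[start + i]] for i in range(width))
               for start in range(len(sequence) - width + 1))

def set_score(sequences, matrix):
    """Illustrative statistic: mean best-site score across the sequence set."""
    return sum(best_site_score(s, matrix) for s in sequences) / len(sequences)

def shuffled(sequence, rng):
    """Null model (an assumption): shuffle letters, preserving base composition."""
    letters = list(sequence)
    rng.shuffle(letters)
    return "".join(letters)

def representation_p_values(sequences, matrix, trials=200, seed=0):
    """Empirical P-values for over- and under-representation versus shuffles."""
    rng = random.Random(seed)
    observed = set_score(sequences, matrix)
    null = [set_score([shuffled(s, rng) for s in sequences], matrix)
            for _ in range(trials)]
    p_over = (1 + sum(n >= observed for n in null)) / (trials + 1)
    p_under = (1 + sum(n <= observed for n in null)) / (trials + 1)
    return observed, p_over, p_under

# Hypothetical motif: ten aligned sites that all read GATTACA.
counts = [{b: (10 if b == c else 0) for b in BASES} for c in "GATTACA"]
matrix = log_odds_matrix(counts)

# Hypothetical sequence set, each containing one copy of the motif.
sequences = [
    "CCGTGATTACAGGCTCCGTA",
    "TGCAGGCGATTACACCTGCG",
    "GATTACACGCGTCCGGTCAG",
    "ACGGTCCGCTGGATTACACG",
    "CTGCGGATTACATCGCGGCA",
]

observed, p_over, p_under = representation_p_values(sequences, matrix)
print(f"score={observed:.2f}  P(over)={p_over:.3f}  P(under)={p_under:.3f}")
```

A small P(over) suggests the motif scores better in the real sequences than in composition-matched shuffles; a small P(under) would suggest the opposite. A real analysis would repeat this for every motif in a library and account for multiple testing.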