Nucleic Acids Research Advance Access first published online on May 29, 2009
This version published online on June 2, 2009
Nucleic Acids Research, doi:10.1093/nar/gkp423
Methods Online
Measuring spatial preferences at fine-scale resolution identifies known and novel cis-regulatory element candidates and functional motif-pair relationships
1Biology Department, 2Institute for Genome Sciences and Policy and 3Departments of Biostatistics and Bioinformatics and Computer Science; Duke University, Durham, NC 27708, USA
*To whom correspondence should be addressed. Tel: +1 919 668 6249; Fax: +1 919 660 7293; Email: kdy2{at}duke.edu. Correspondence may also be addressed to Uwe Ohler. Tel: +1 919 668 5388; Fax: +1 919 668 0795; Email: uwe.ohler{at}duke.edu
Received February 12, 2009. Revised May 5, 2009. Accepted May 8, 2009.
Transcriptional regulation is mediated by the collective binding of proteins called transcription factors to cis-regulatory elements. A handful of factors are known to function at particular distances from the transcription start site, although the extent to which this occurs is not well understood. Spatial dependencies can also exist between pairs of binding motifs, facilitating factor-pair interactions. We sought to determine to what extent spatial preferences measured at fine-scale resolution could be used to predict cis-regulatory elements as well as motif pairs bound by interacting proteins. We introduce the motif positional function model, which predicts spatial biases using regression analysis, differentiating noise from true position-specific overrepresentation at single-nucleotide resolution. Our method predicts 48 consensus motifs exhibiting positional enrichment within human promoters, including 14 motifs without known binding partners. We then extend the model to analyze distance preferences between pairs of motifs. We find that motif pairs bound by interacting factors often co-occur preferentially at multiple distances, with intervals between preferred distances often corresponding to the turn of the DNA double helix. This offers a novel means by which to predict sequence elements with a collective role in gene regulation.
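To make the two quantities discussed in the abstract concrete, the following is a minimal Python sketch of the raw counts such an analysis would start from: how often a motif begins at each position in promoters aligned on the transcription start site, and how often two motifs co-occur at each distance. It is not the motif positional function model itself, and it performs no regression or significance testing. The function names (`motif_positions`, `position_counts`, `pair_distance_counts`), the exact-string matching of consensus motifs, and the toy sequences are all assumptions made for illustration.

```python
from collections import Counter


def motif_positions(seq, motif):
    """Return every 0-based start index of `motif` in `seq`, overlaps included."""
    hits = []
    start = seq.find(motif)
    while start != -1:
        hits.append(start)
        start = seq.find(motif, start + 1)
    return hits


def position_counts(promoters, motif):
    """Count motif starts per position across sequences.

    Assumes the sequences are equal-length and aligned on the same
    reference point (hypothetically, the transcription start site).
    """
    counts = Counter()
    for seq in promoters:
        counts.update(motif_positions(seq, motif))
    return counts


def pair_distance_counts(promoters, motif_a, motif_b):
    """Count signed distances (start of motif_b minus start of motif_a)
    over every co-occurrence within the same sequence."""
    distances = Counter()
    for seq in promoters:
        for a in motif_positions(seq, motif_a):
            for b in motif_positions(seq, motif_b):
                distances[b - a] += 1
    return distances


# Toy sequences invented for illustration; they are not real promoters.
promoters = [
    "GGTATAAACCGGCAATCC",
    "CCTATAAAGGTTCAATGG",
    "AATATAAATTCCGGCAAT",
]

print(position_counts(promoters, "TATAAA"))
print(pair_distance_counts(promoters, "TATAAA", "CAAT"))
```

A histogram like the one returned by `pair_distance_counts` is where a periodic pattern in preferred distances would show up; separating such peaks from noise is the part the abstract attributes to regression analysis, which this sketch deliberately leaves out.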