Nucleic Acids Research, 2003, Vol. 31, No. 20 6016-6026
© 2003 Oxford University Press
Distance preferences in the arrangement of binding motifs and hierarchical levels in organization of transcription regulatory information
Scientific Center Genetika, Moscow 113545, Russia, 1 Engelhardt Institute of Molecular Biology, Moscow 119991, Russia and 2 Department of Biology, New York University, New York, NY 10003-6688, USA
*To whom correspondence should be addressed. Tel: +1 212 683 87 81; Fax: +1 212 995 47 10; Email: dap5{at}nyu.edu
Received June 6, 2003; Revised July 31, 2003; Accepted September 2, 2003
| ABSTRACT |
|---|
We explored distance preferences in the arrangement of binding motifs for five transcription factors (Bicoid, Krüppel, Hunchback, Knirps and Caudal) in a large set of Drosophila cis-regulatory modules (CRMs). Analysis of non-overlapping binding motifs revealed the presence of periodic signals specific to particular combinations of binding motifs. The most striking periodic signals (10 bp for Bicoid and 11 bp for Hunchback) suggest preferential positioning of some binding site combinations on the same side of the DNA helix. We also analyzed distance preferences in arrangements of highly correlated overlapping binding motifs, such as Bicoid and Krüppel. Based on the distance analysis, we extracted preferential binding site arrangements and proposed models for potential composite elements (CEs) and antagonistic motif pairs involved in the function of developmental CRMs. Our results suggest that there are distinct hierarchical levels in the organization of transcription regulatory information. We discuss the role of the hierarchy in understanding transcriptional regulation and in detection of transcription regulatory regions in genomes.
| INTRODUCTION |
|---|
Initiation of tissue-specific or spatio-specific transcription in multicellular organisms requires binding of multiple transcription factor molecules to transcription regulatory regions, such as promoters and enhancers (cis-regulatory modules; CRMs). Multiple binding motifs and even multiple binding sites for the same motif present in the regulatory regions are often described as regulatory clusters (1-6). Statistical models based on motif clustering are helpful for finding novel CRMs in the genome, but very often they consider only site density (cluster significance) and relative site affinity (such as a weighted matrix score) (6). However, it is known that specific arrangements of binding motifs within the regulatory regions (regulatory clusters) are necessary to achieve proper biological function. Incorporating such architectural features into formal clustering models might facilitate computational recognition of CRMs and interpretation of their biological function (7).
Specific arrangements between binding sites are known from many examples in biology. For instance, recent quantitative studies of basal transcription (8) revealed a striking dependence of basal promoter activity on both the distance and the orientation of an artificial activator binding site (Gal4) and the TATA box. An optimal spacing between binding sites (NF-Y and SRE motifs) has also been demonstrated in the human SREBP-2 promoter (sterol regulatory element-binding protein) (9). In vitro analysis of binding site arrangements in the rat collagenase-3 promoter (10) has revealed that a 10 bp (helical) phasing in binding site distribution provides maximal transcriptional activity. The importance of the helical phasing and specific binding site arrangement was also demonstrated in vivo for the murine CD4 promoter (11). In some cases, even a very small difference in the distance between binding motifs results in dramatically different transcriptional outcomes. One of the most striking examples of this kind is the binding of the POU domain transcription factor Pit-1 to its target sites, differentially spaced (2 bp difference) in growth hormone and prolactin gene promoters (12). The helical phasing (10 bp) has also been demonstrated computationally (13) using a large number of proximal eukaryotic promoters (14) and the list of binding motifs available from the TRANSFAC database (15). In many of the described quantitative experimental studies, the disruption of specific spacing (phasing) between binding sites resulted in reduction, but not abolishment, of transcription. This fact, together with some known cases of successful promoter reconstruction (16-18), also supports the presence of a certain flexibility in site arrangement.
The biological reasons leading to a specific arrangement of sites in promoters are clear: the transcription factors, bound to promoter DNA, are also involved in specific protein-protein interactions (19,20); therefore, the binding motifs must be distributed in the promoter in a non-random fashion. In other words, the arrangement of binding motifs can control the formation of 3D protein complexes involved in initiation of specific transcription.
Attempts to reveal and describe specific site arrangements resulted in a very interesting concept of composite elements (CEs) (21). In the simplest case, a CE corresponds to a pair of individual binding motifs located at a particular distance and involved in formation of specific tertiary (DNA-protein-protein-DNA) complexes. Identical CEs may perform related functions in different genes. Further development of this concept resulted in construction of a dedicated database TRANSCompel (22), combining sequences for 256 (Release 6.0) CEs from different organisms. Currently, the CE concept is widely used for finding co-localized, synergistic (antagonistic) binding motif pairs (23,24) or combinatorial arrays of motifs responsible for the formation of similar gene expression profiles (25-27).
In the current work, we explored preferential site distances in CRMs of Drosophila developmental genes. CRMs are transcription regulatory units (1 kb range), often located far from the transcription start site and responsible for spatio-temporal expression of their cognate developmental genes (3). We recently built a database containing known functional Drosophila CRMs (see our web resource: http://homepages.nyu.edu/dap5/PCL/appendix2.htm) together with a list of matrices for a number of transcription factors and known transcriptional interactions (6,28). Selection of relevant binding motifs in a particular functionally related group of transcription regulatory regions minimizes the risk of false positives, which is known to be a problem for large-scale analysis of highly diverse data sets.
In this work, we have shown that the binding sites in CRMs of Drosophila are arranged in particular ways, indicating the presence of specific developmental CEs. We also discuss a general model describing hierarchical levels in organization of transcriptional information and the role of CEs in understanding the responses of developmental genes to transcriptional signals.
| METHODS |
|---|
In order to identify binding site cores, we calculated information content I (bits) in the ith column of a binding motif alignment as the Shannon entropy for this alignment column (29):
In this equation, $q_{i\alpha}$ represents the frequency of the letter $\alpha$ ($\alpha \in \{A,C,G,T\}$) in the ith position of the alignment. To calculate the score of a binding motif match, we constructed position weighted matrices (PWMs) for each motif using the equation with a pseudocount parameter:
$S_{i\alpha}$ is the score of letter $\alpha$ in position i, $n_{i\alpha}$ is the number of letters $\alpha$ in column i of the motif alignment, $q_{\alpha}$ is the frequency of letter $\alpha$ in the Drosophila genome, and a is the pseudocount parameter, which we set equal to 1. In our previous publications, we discussed how to select the PWM cutoff for the binding motifs in this particular system (28). For each PWM cutoff value, we estimated the site frequency $E_S$ as the total number of motif matches above the PWM cutoff in the entire Drosophila genome normalized to the length of the genome.
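The two quantities defined above can be illustrated with a minimal Python sketch. The exact formulas are assumptions here: information content is taken as 2 bits minus the column entropy, and the PWM score as a log-odds ratio with one common pseudocount form. The function names, the toy alignment and the uniform background are hypothetical and are not taken from the authors' software.

```python
import math

ALPHABET = "ACGT"

def column_counts(sites):
    """Count letters in each column of equal-length aligned sites."""
    width = len(sites[0])
    counts = [{x: 0 for x in ALPHABET} for _ in range(width)]
    for site in sites:
        for i, letter in enumerate(site):
            counts[i][letter] += 1
    return counts

def information_content(counts):
    """Per-column information in bits, assumed form: 2 - Shannon entropy."""
    info = []
    for col in counts:
        total = sum(col.values())
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in col.values() if n > 0)
        info.append(2.0 - entropy)
    return info

def pwm(counts, background, a=1.0):
    """Log-odds matrix with pseudocount a (assumed form, not the paper's)."""
    matrix = []
    for col in counts:
        total = sum(col.values())
        matrix.append({
            x: math.log((col[x] + a * background[x])
                        / ((total + a) * background[x]))
            for x in ALPHABET
        })
    return matrix

def score(matrix, word):
    """Sum of per-position scores for a word of the motif width."""
    return sum(col[letter] for col, letter in zip(matrix, word))

# Hypothetical toy alignment and a placeholder uniform background.
toy_sites = ["TAATCC", "TAATCC", "TAAGCC", "TAATCT"]
toy_background = {x: 0.25 for x in ALPHABET}
toy_matrix = pwm(column_counts(toy_sites), toy_background)
```

Columns with high information content would define the binding site core; in practice the background frequencies should come from the genome rather than the uniform placeholder used here.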
To estimate randomness of binding site distribution in the CRMs, we have found all distances between neighboring sites and compared the observed distance distribution with the expected distance distribution in a random Bernoulli sequence (see also equation 7). For each jth interval of 10 distances (j = 5, 15...225), we calculated Z-scores (see Fig. 1):
In this equation, the expected number of distances $N_j^{exp}$ for each distance interval j in genomic samples was calculated taking into account the site frequency in the genome ($E_S^G = 5 \times 10^{-4}$). The corresponding PWM cutoff values for Bicoid (Bcd) and Krüppel (Kr) are shown in Table 1. In the case of the CRM data set, we calculated $E_S$ from the number of sites having the same frequency in the genome (the same PWM cutoff), but actually found in the CRM data set ($E_S^{CRM}$ = the total number of sites in CRMs/total CRM length). Due to the limited size of CRMs, we calculated the total number of distances between the neighboring sites $N_{300}^{CRM}$ only in the range of 1-300 bp. With correction for the maximal possible number of distances (conservative estimation), we calculated the expected number of sites in the jth distance interval as:
To obtain comparable statistical values, the Z-scores for genomic sequences were calculated for the sample size of CRMs ($N^G = N_{300}^{CRM}$) using the same equation. Statistical noise caused by the small sample size prevented further reduction of the selected distance intervals (see Fig. 1A). Larger intervals would result in lower resolution by distance.
Spectra of distance distributions in the frequency domain were built using the Matlab® (Mathworks, Inc.) signal processing module. We used a filtered fast Fourier transform (FFT) algorithm, implemented in the multiple signal classification method (MUSIC). The order of FFT was set to the maximal, as well as the signal dimension (149 for 300 input points, analyzed range of distances), thus keeping all putative signals without noise reduction. Input signals (Z-scores) for periodic analysis were generated from distance distributions (histograms smoothed by three distance points) using equation 3, taking into account the site frequency observed in the CRM data set (total number of sites in CRMs/total CRM length, see above).
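As an illustration of the periodic analysis, the following Python sketch smooths a distance histogram over three points and reports the period with the highest spectral power. It deliberately swaps in a plain discrete Fourier periodogram for the MUSIC method used in Matlab, so it is a simpler stand-in and not a reimplementation of the authors' procedure. All function names and the synthetic input are hypothetical.

```python
import cmath
import math

def smooth(values, window=3):
    """Moving average over `window` points; edges use the points available."""
    half = window // 2
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def periodogram(signal):
    """Return (period_bp, power) pairs from a direct discrete Fourier transform."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the constant component
    spectrum = []
    for k in range(1, n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        spectrum.append((n / k, abs(coeff) ** 2 / n))
    return spectrum

def dominant_period(signal):
    """Period (in bp) of the strongest spectral component."""
    return max(periodogram(signal), key=lambda pair: pair[1])[0]

# Synthetic input: one count at every distance that is a multiple of 10 bp.
synthetic = [1.0 if d % 10 == 0 else 0.0 for d in range(1, 301)]
period = dominant_period(smooth(synthetic))
```

A direct transform is quadratic in the number of points, which is acceptable for 300 distances; a subspace method such as MUSIC would give sharper peaks on short, noisy inputs.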
To extract Bcd/Kr functional elements, we calculated PWM scores for Bcd and Kr, respectively, for each position (with the +2 bp shift) of CRM sequences and large genomic samples (1 Mb total). Then, for each jth PWM score zone (i.e. Bcd 5.4-5.8, Kr 7.4-7.8, see Table 2), we found the number of matches in the CRM data set $N_j^{obs}$ and compared this number with the number obtained from genome samples $N_j^{exp}$. Notice that in the described test, $N_j^{obs}$ and $N_j^{exp}$ were different (j here is a PWM score zone) from those given above.
To minimize errors caused by the small size of the sample (the number of overlaps in CRM sequences), we calculated a conservative estimator for the number of overlaps, $N_j^{exp}$. To find this number, we evaluated separately the conditional probability of observing a Kr match given a Bcd match, $P_j^G(KR_{i+2}|BCD_i)$, and the conditional probability of observing a Bcd match given a Kr match, $P_j^G(BCD_{i-2}|KR_i)$. Both conditional probabilities were estimated from the entire genome. To obtain the conservative estimation, we took the maximum over the two expectations for each jth PWM score zone:
$$N_j^{exp} = \max\left[N_{j,Bcd}^{CRM}\, P_j^G(KR_{i+2}|BCD_i);\; N_{j,Kr}^{CRM}\, P_j^G(BCD_{i-2}|KR_i)\right] \qquad (5)$$
Here $N_{j,Bcd}^{CRM}$ and $N_{j,Kr}^{CRM}$ are the numbers of Bcd and Kr matches, respectively, found in CRMs for the jth PWM score zone. Notice that the conditional probabilities can also be calculated for any PWM score zone, e.g. for sites having site probability above a selected cutoff E (shown in Table 1):
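The conservative estimate in equation 5 can be sketched in a few lines of Python. The count-ratio estimate of the conditional probabilities is an assumption about how they might be obtained from genome counts, and every number in the example is invented for illustration.

```python
def conditional_probability(n_joint, n_condition):
    """Estimate P(A | B) as joint match count over the count of B matches."""
    return n_joint / n_condition if n_condition else 0.0

def conservative_expected_overlaps(n_bcd_crm, n_kr_crm,
                                   p_kr_given_bcd, p_bcd_given_kr):
    """Equation 5: the larger of the two expectations for one score zone."""
    return max(n_bcd_crm * p_kr_given_bcd, n_kr_crm * p_bcd_given_kr)

# Hypothetical genome counts for one PWM score zone.
p_kr_bcd = conditional_probability(n_joint=100, n_condition=2000)
p_bcd_kr = conditional_probability(n_joint=100, n_condition=1000)

# Hypothetical CRM counts for the same zone.
n_exp = conservative_expected_overlaps(40, 15, p_kr_bcd, p_bcd_kr)
```

Taking the maximum makes the expected number as large as either motif alone would justify, so an observed excess over this value is less likely to be an artifact of the small sample.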
| RESULTS |
|---|
Mapping binding sites and defining distances
To perform an analysis of distances between binding sites, it is necessary (i) to select and map binding motifs; (ii) to delineate a relevant sequence data set; and (iii) to formulate a working definition of distances between the binding site matches.
We limited our choice to the five best binding motifs for the transcriptional regulators Bcd, Caudal (Cad), Hunchback (Hb), Kr and Knirps (Kni), having a relatively large number of occurrences in our CRM database. To map these binding motifs, we employed a PWM search with parameters described earlier (6,28). In general, we considered only high-affinity sites with probabilities not exceeding $10^{-3}$, as estimated from the Drosophila genome (see Methods).
To generate a representative sequence data set, we considered only CRMs regulated by any of the selected transcription factors and containing multiple high-affinity binding sites for these proteins. The positions of binding site clusters previously identified in the context of these CRMs (6) provided a formal criterion for establishing boundaries of the selected early developmental CRMs. The total size of analyzed CRMs after the described pre-screening procedure combined >68 kb of sequence data in 33 non-overlapping contigs. The sequence data can be obtained from our web resource (see New York University website: http://homepages.nyu.edu/dap5/PCL/pseudoobscura/train_plus_contigs.zip).
Since the binding motifs for selected transcription factors have different widths, we measured the distances between the centers of binding site cores made by site alignment columns with a high informational content (see Methods). Table 3 illustrates the procedure of distance measurement. Notice that distances even for the same binding motif may require coordinate adjustment due to the asymmetry of the motif.
With the described rules, the distances can be measured between sites that belong to the same binding motif or between sites belonging to different binding motifs; between sites located on the same DNA strand (in tandem) or sites on the opposite strands (in palindrome).
Non-random arrangement of binding motifs in CRMs
To test whether the binding site arrangement in the Drosophila CRMs is non-random, we calculated all distances between neighboring binding sites for the five motifs, and compared the obtained distance distribution with the random expectation. In a random sequence, the probability of observing distance n between two neighboring PWM matches can be calculated from the geometric distribution (30):
$$P(n) = E_S (1 - E_S)^{n-1} \qquad (7)$$
where $E_S$ is the site occurrence in the selected data set and n (bases) is the distance between the neighboring motif matches. In this test, PWM cutoff values (site likelihood ratio values) were set to achieve the same site occurrence for each binding motif in the Drosophila genome, equal to $E = 5 \times 10^{-4}$ (6). We excluded close distances from this consideration (see Overlapping and correlated motifs below).
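To make the random expectation concrete, the Python sketch below evaluates equation 7 and compares binned distance counts with it. The Z-score normalization used here, (observed − expected)/√expected, is an assumption and not necessarily the form used by the authors; the sketch also omits their correction for the limited CRM length. Function names and the example distances are hypothetical.

```python
import math

def geometric_pmf(n, site_freq):
    """Equation 7: probability that neighboring matches are n bases apart."""
    return site_freq * (1.0 - site_freq) ** (n - 1)

def interval_probability(lo, hi, site_freq):
    """P(lo <= n <= hi), the closed form of the geometric sum."""
    return (1.0 - site_freq) ** (lo - 1) - (1.0 - site_freq) ** hi

def z_scores(distances, site_freq, bin_width=10, max_distance=300):
    """Return (lo, hi, z) per distance bin under the assumed normalization."""
    total = len(distances)
    scores = []
    for lo in range(1, max_distance + 1, bin_width):
        hi = lo + bin_width - 1
        observed = sum(1 for d in distances if lo <= d <= hi)
        expected = total * interval_probability(lo, hi, site_freq)
        z = (observed - expected) / math.sqrt(expected) if expected > 0 else 0.0
        scores.append((lo, hi, z))
    return scores

# Hypothetical neighbor distances, concentrated at short range.
example = z_scores([5] * 20, site_freq=5e-4)
```

With a site frequency of 5 × 10^-4 the expected fraction of distances in any 10 bp bin is about 0.5%, so even a modest excess of short distances yields a large positive score in the first bins.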
This statistical test (see Fig. 1 and Methods) demonstrated that the distances between sites in CRMs are smaller than expected from the described random model. We also analyzed the distribution of distances between the neighboring binding sites in Drosophila genome samples (1% of genome total). In fact, a similar distance distribution was observed (see Fig. 1B), although the significance of the short distances in genome samples was much smaller. The existence of microsatellites, repetitive sequences and other correlations in DNA (31) can explain the deviation of the observed from the expected distance distribution in genome samples.
The described analysis demonstrated that binding sites in Drosophila CRMs are distributed in a non-random fashion and the fraction of sites having spacing in the range 50-60 bp is larger than expected. This distance range might correspond to CEs containing several closely spaced binding sites. Notice also that even stand-alone sites may represent parts of CEs, which were not detected in our search using five binding motifs.
Periodic signals in arrangement of a single binding motif
To explore whether binding sites are distributed periodically in CRMs, we calculated distances between any two binding sites (all possible binding site pairs in a CRM) belonging to the same binding motif or to a binding motif combination. In this case, the distance expectation in a random sequence is independent of the distance itself and has a uniform distribution (30). Therefore, we compared the empirical (observed) distribution of distances for any binding site pair with the uniform distribution.
To minimize interference between periodic signals, specific to different binding motifs (or binding motif combinations), we focused our attention on analysis of distances between sites belonging to one or two binding motifs. Periodic signals present in the resulting distance distributions were assessed using Fourier analysis (see Methods).
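The all-pairs distance distributions described above can be sketched in Python as follows. Site positions are assumed to be core-center coordinates within one CRM, the helper names are hypothetical, and the comparison with the uniform expectation is reduced to a simple ratio to the mean count.

```python
from itertools import combinations, product

def same_motif_distances(positions):
    """Distances between all pairs of sites of one motif in one CRM."""
    return [abs(b - a) for a, b in combinations(positions, 2)]

def cross_motif_distances(positions_a, positions_b):
    """Distances between every site of motif A and every site of motif B."""
    return [abs(b - a) for a, b in product(positions_a, positions_b)]

def histogram(distances, max_distance=300):
    """Counts for distances 1..max_distance (index d - 1 holds distance d)."""
    counts = [0] * max_distance
    for d in distances:
        if 1 <= d <= max_distance:
            counts[d - 1] += 1
    return counts

def ratio_to_uniform(counts):
    """Each count divided by the mean count, the uniform expectation."""
    mean = sum(counts) / len(counts)
    return [c / mean if mean else 0.0 for c in counts]

# Hypothetical site positions in a single CRM.
bcd_hist = histogram(same_motif_distances([0, 10, 20, 30]))
bcd_kr = cross_motif_distances([0, 10], [17])
```

Histograms from individual CRMs would be summed over the data set before smoothing and spectral analysis; distances of zero or beyond the analyzed range are simply dropped.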
First, we built distance distributions and the corresponding Fourier spectra for the three most frequent binding motifs from our data set, Bcd, Hb and Kr. The most striking result, confirming the hypothesis of helical phasing (see Introduction), was obtained for Bcd (see Figs 2A and B, 3A and B, and 5A). The vast majority of high-affinity Bcd sites are positioned at distances close to 10, 20, 30 etc. bp. The periodicity in the arrangement of Bcd sites drops rapidly with decreasing site affinity, supporting the biological importance of this specific signal. A similar, but not identical periodic signal was observed in the distribution of distances between binding sites for Hb (Figs 2C and 3D). In this case, however, the period was equal not to 10 but to 11 bp. This difference in periodicity might be explained by a slightly different DNA conformation (twist) of the two binding motifs (compare CCTAATCCC, the consensus for Bcd, and TTTTTTTG, the consensus for Hb). Surprisingly, the distribution of another binding motif, Kr, showed no periodic signal corresponding to the helical phasing (see Fig. 3E). The different structure of the Kr DNA-binding domain together with the different mechanism of Kr binding might explain the absence of the helical phasing in the distribution of Kr sites. Bcd is known to be involved in cooperative DNA binding (32), which typically requires several closely spaced binding sites. Instead, Kr seems to be involved in competitive rather than cooperative DNA binding (see Overlapping and correlated motifs below) (33).
The arrangement of binding motifs for another transcriptional activator, Cad, also displayed helical phasing (data not shown). The Kni motif has a low number of occurrences in our data set and was not considered in this type of analysis.
Periodic signals in arrangement of a binding motif combination
To extract periodic signals corresponding to a specific combination of binding motifs (potential synergistic pairs or CEs), we analyzed distance preferences for pairs (any two matches) of the most frequent motifs from our database, Bcd-Hb, Bcd-Kr and Kr-Hb, and the corresponding Fourier spectra. Expression patterns of Bcd, Hb and Kr in the early embryo have substantial overlaps and the transcription factors are expected to be involved in direct synergistic or antagonistic interactions (33).
Analysis of the Bcd-Hb pair revealed two phasing signals (10 and 11 bp), corresponding to Bcd-Bcd and Hb-Hb combinations (Fig. 3G). The high amplitude of the signal corresponding to the double helical period (22 bp) is the result of signal interference from Bcd-Bcd and Hb-Hb pairs (compare positions of peaks corresponding to 2x period in Fig. 3A and D). We also generated the differential Bcd-Hb spectrum (data not shown) by removing distances for Bcd-Bcd and Hb-Hb pairs from our consideration, but detected no high-amplitude periodic signals. Given the periodicity detected in distributions of Bcd and Hb motifs separately, even the presence of a single specific distance for the Bcd-Hb pair would result in a periodic signal. The absence of specific distances between Bcd and Hb suggests that they perform their functions rather independently and perhaps their binding motifs never belong to the same CE, or the potential Bcd-Hb CEs have a flexible structure and are difficult to detect using our type of analysis.
We also explored distance preferences in the distribution of another interesting motif pair, Bcd-Kr. These motifs are very similar (CCTAATCCC, the Bcd consensus, and TAACCCTTT, the Kr consensus) and the corresponding transcription factors are involved in antagonistic interactions by competing for the same binding sequences in regulatory regions (34). We analyzed both the short- (<5 bp, see Overlapping and correlated motifs, below) and the long-range Bcd-Kr distances. Periodic analysis of the Bcd-Kr distribution (long-range distances) revealed the presence of a new signal, having a period rather opposite to the helical (17 bp, see Fig. 3H). The signal was absent in Fourier spectra, built for either Bcd or Kr motif distribution (Fig. 3A and E). The differential Fourier spectrum, generated for Bcd-Kr distances only (data not shown), confirmed the presence of the 17 bp periodic signal and of an additional signal, with a period close to that of the helical (11 bp). This second signal is expected, as Bcd and Kr motifs are highly correlated (2 bp shift) and the Bcd sites are distributed periodically (10 bp). However, the signal corresponding to 17 bp is new and its presence suggests that the non-overlapping Bcd and Kr sites have a tendency to be placed on the opposite sides of the DNA helix (bound proteins are facing in opposite directions). In this case, the non-overlapping Bcd and Kr sites may belong to distinct CEs, performing independent functions (see Discussion).
We also extracted periodic signals specific to some other motif combinations (Hb-Kr, no new signals) as well as signals present in the combination of all five binding motifs (Bcd, Hb, Kr, Kni and Cad). In the latter case, we detected the same helical signal (11 bp), although with somewhat lower amplitude. In addition, we measured periodicity in the distribution of experimental sites (not PWM matches) in one of the best studied enhancer regions, even-skipped stripe 2, from six Drosophila species (35). In this case, we also detected the major signal with a period close to 10 bp and the opposite phase signal (17 bp), presumably corresponding to the Bcd-Kr motif combination (see Fig. 3C). Table 4 summarizes data for detected periodicities in distributions of the considered binding motif combinations.
These data clearly demonstrate that the arrangement of non-overlapping binding motifs in regulatory regions cannot be described by the simple helical phasing formula. Instead, each binding motif as well as each binding motif combination exhibits its own periodicity, sometimes quite different from the major helical signal (10-11 bp).
Overlapping and correlated motifs
In the analysis described above, we considered only non-overlapping sites separated by distances exceeding the binding motif lengths. Nevertheless, the overlapping sites are of interest, especially when the binding motifs correlate and the transcription factors compete for the same binding sequences. As described above, the Bcd-Kr (activator-repressor) motif combination is a characteristic example of this quite common biological situation. Bcd (consensus CCTAATCCC) and Kr (consensus TAACCCTTT) motifs may overlap by chance (consensus CCTAAYCCYTTT), but some of the overlaps do correspond to functional antagonistic elements (composite sites) and some do not. We calculated the possible fraction of the functional Bcd/Kr overlaps and extracted these putative antagonistic elements from our database of developmental CRMs.
In the first step, we estimated what distance (shift) between Bcd and Kr matches causes maximal motif correlation. For each possible shift, we calculated PWM scores for the Bcd and Kr motifs in every position of a test DNA sequence. Figure 4 shows that the maximal correlation value (CC = 0.7) between the two motifs is observed if they are placed on the same DNA strand (according to the orientations shown) and shifted by two bases. Correlation values obtained for other Bcd/Kr motif shifts were low (no overlapping high-scoring matches). In the second step, we compared the frequency of words containing overlapping Bcd and Kr sites (with the 2 bp shift) in the Drosophila genome with the frequency of the same words in the developmental CRMs (see Methods). Table 2 shows a comparison of the numbers of Bcd/Kr overlaps found in the CRM data set and the corresponding numbers found in the genome and normalized to the sample size (number of sites) of the CRM data set. Words corresponding to Bcd/Kr overlaps and overrepresented in CRM sequences (see numbers in bold in Table 2) were extracted and aligned as shown in Figure 5B. The alignment contains many of the known functional Bcd-Kr elements, for instance those found previously in the even-skipped stripe 2 region.
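The first step, finding the shift at which two PWM score profiles correlate best, can be illustrated with a short Python sketch. It assumes the correlation value is a Pearson coefficient between the score of one motif at position i and the score of the other at position i + shift; the function names and the synthetic score profiles are hypothetical.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    if n < 2:
        return 0.0
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0

def shifted_correlation(scores_a, scores_b, shift):
    """Correlate scores_a[i] with scores_b[i + shift] over valid positions."""
    pairs = [(scores_a[i], scores_b[i + shift])
             for i in range(len(scores_a))
             if 0 <= i + shift < len(scores_b)]
    return pearson([p[0] for p in pairs], [p[1] for p in pairs])

def best_shift(scores_a, scores_b, max_shift=10):
    """Shift in [-max_shift, max_shift] with the highest correlation."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda s: shifted_correlation(scores_a, scores_b, s))

# Synthetic profiles: the second is the first delayed by two positions.
profile_a = [math.sin(0.37 * i * i) for i in range(200)]
profile_b = [0.0, 0.0] + profile_a[:-2]
shift = best_shift(profile_a, profile_b)
```

For real data the two profiles would be PWM scores of the two motifs along the same strand of a test sequence, and the scan would be repeated for each relative orientation.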
The described analysis demonstrates how the test for motif interdependence might help in extraction of antagonistic elements, such as the Bcd/Kr composite binding site. Given a set of transcription regulatory sequences and a list of binding motifs, it seems to be possible to reveal the presence of potential antagonistic relationships among the motifs (competitive binding) without any additional information.
| DISCUSSION |
|---|
Preferred site arrangement in developmental enhancers
The preferential arrangement of binding sites for transcription factors in regulatory modules might be considered as a specific type of functional information encoded in regulatory DNA. In the current work, we demonstrated how to extract preferential site arrangements and potential CEs from large data sets using periodic analysis and a test for motif interdependence.
We have shown that the distribution of Bcd and Hb motifs, considered alone, fits well to the known helical phasing rule (10-11 bp), and so the bound transcription factors are placed on the same surface of DNA. According to existing models (10), this preferential binding site arrangement facilitates protein-protein interactions and promotes formation of specific tertiary complexes (DNA-protein-protein-DNA), involved in activation of specific transcription. The phenomenon of helical phasing is also known from the distribution of nucleosome positional signals (13), and thus it is very likely that multiple DNA-protein contacts, caused by any large protein complex, have a good chance of following the helical phasing rule. The example in Figure 5A represents actual sequences, containing periodically distributed binding sites for Bcd. Alignment of these sequences reveals a common element, containing at least 2-3 high-scoring Bcd matches, which are present in many developmental CRMs.
It is more important, however, that the helical phasing rule is not sufficient to describe arrangement of any binding motif or binding motif combination in any regulatory sequence. This fact is demonstrated by periodic signals detected in the distribution of the Kr motif and of the Bcd-Kr binding motif combination (17 bp, see Fig. 3E and H).
Hierarchical levels in the organization of transcription regulatory information
CRMs represent independent functional units, responsible for the formation of specific expression patterns in the developing fly embryo. This functional independence of CRMs suggests that they constitute one of the upper levels in the informational hierarchy. Conversely, binding motifs for transcription factors represent the bottom level, as they cannot be divided further, e.g. into smaller functional subwords. Combinatorial arrangements of binding motifs such as CEs and antagonistic pairs of overlapping motifs (e.g. Bcd/Kr) might represent yet another, middle level in the informational hierarchy.
In this work, we demonstrated that binding motifs are distributed in Drosophila CRMs in a non-random fashion, with a large fraction of sites belonging to small closely spaced groups (50-70 bp range). Moreover, within these small groups (putative CEs), the distances between the binding sites are far from random and comply with some specific spacing requirements. We believe that these two findings confirm the presence of the middle hierarchical level, defining (i) order, (ii) orientation and (iii) spacing in small groups of binding sites. If so, then the upper informational level, represented by entire promoters and CRMs, combines several such small functional groups or several CEs, acting independently in their response to the spectrum of native transcriptional signals. For example, repression of the same promoter (CRM) by two transcription factors might be achieved through an independent response of two or more corresponding CEs to the concentrations of these proteins.
One can see that maximal spatial independence of adjacent CEs might be achieved through positioning of corresponding protein complexes on opposite sides of the DNA helix. In this respect, our finding of the 17 bp phasing (opposite to the helical) in the distribution of the Bcd-Kr motif combination (see Fig. 3C and H) fits the proposed model (see Fig. 6). Three hierarchical levels, binding motifs, CEs and CRMs (as well as proximal promoters), appear to describe the distribution of binding motifs and explain the biological function of motif combinations.
Genomics approaches and promoter analysis
Specific signals, detected in the distribution of binding motifs at the middle level, could be helpful in finding similar CEs in the genome and improving promoter recognition algorithms. On the basis of our periodic analysis and analysis of overlapping motifs, we generated models (alignments) for two putative CEs, containing Bcd-Bcd (synergistic) and Bcd-Kr (antagonistic) motif combinations (see Fig. 5). The motifs corresponding to the CEs are wider, and they provide better grounds for a specific search than the binding motifs for transcription factors themselves. Pre-screening with CEs might also facilitate identification of functional binding motif clusters and functional CRMs in the genome. Periodic arrangement (e.g. helical phasing) of binding motifs may also help in finding unknown binding motifs in regulatory sequences. This idea can be implemented in motif-extracting software, such as Gibbs Sampler.
CEs can become indispensable for understanding the transcription regulatory code and for reconstructing entire CRMs and promoters. In this respect, identification of CEs in promoters and CRMs and consequent analysis of these CEs both in vivo and in silico represents a high priority goal, as important as identification of promoters themselves. We believe that in future the CE concept will prove to be a powerful tool in analysis of transcription regulatory regions and promoter reconstruction. For instance, the existence of structurally different CEs (together with the different site affinity) might explain the differential gene response to concentrations of transcriptional regulators. Identical CEs found in promoters of different genes may suggest involvement of the genes in the same regulatory cascades and might help in analysis of their function.
Future progress in computational identification of developmental CEs in Drosophila CRMs will depend on the amount and the quality of available binding motifs as well as on the number of available functional CRMs, regulated by the corresponding transcription factors.
| ACKNOWLEDGEMENTS |
|---|
We thank Stephen Small for discussion, valuable remarks and help with manuscript preparation. We would also like to thank our reviewers for their careful work and helpful comments. This work was supported by a grant from the National Institutes of Health (EM064864) to Claude Desplan. V.J.M. and A.P.L. were also supported in part by grants from the Ludwig Institute for Cancer Research, HHMI East Europe (55000309) and RFBR (02-04-49111). The CRM annotation and related resources are available at the following address: http://homepages.nyu.edu/dap5/PCL/appendix2.htm.