ABSTRACT
The DNA binding domain of the yeast transcriptional activator CYP1(HAP1) contains a zinc-cluster structure. The structures of the DNA binding domain-DNA complexes of two other zinc-cluster proteins (GAL4 and PPR1) have been studied by X-ray crystallography. Their binding domains present, besides the zinc cluster, a short linker peptide and a dimerization element. They recognize, as homodimers, two rotationally symmetric CGG trinucleotides, the linker peptide and the dimerization element playing a crucial role in binding specificity. Surprisingly, CYP1 recognizes degenerate forms of a direct repeat, CGGnnnTAnCGGnnnTA, and the role of its linker is under discussion. To better understand the binding specificity of CYP1, we have studied, by NMR, the interaction between the CYP1(55-126) peptide and two DNA fragments derived from the CYC1 upstream activation sequence 1B. Our data indicate that CYP1(55-126) interacts with a CGG and with a thymine 5 bp downstream. The CGG trinucleotide is recognized by the zinc cluster in the major groove, as for GAL4 and PPR1, and the thymine is bound in the minor groove by the N-terminal region, which possesses a basic stretch of arginyl and lysyl residues. This suggests that the CYP1(55-126) N-terminal region could play a role in the affinity and/or specificity of the interaction with its DNA targets, in contrast to GAL4 and PPR1.
The CYP1 protein is an oxygen-dependent transcriptional activator of the yeast Saccharomyces cerevisiae (1 -3 ). Its DNA binding domain is located in the N-terminal region of the molecule (2 ,3 ) and belongs to the zinc-cluster family, which is characterized by the presence of two zinc ions complexed by the sulfur atoms of six cysteines (4 ,5 ). On the basis of amino acid alignment, several proteins of this family, including CYP1, could be grouped in a subclass in which the zinc cluster is connected to a dimerization element by a short linker peptide (6 ).
The interactions with DNA of several members of this subclass have been studied. They have been shown to recognize DNA sequences containing two rotationally symmetrical or two directly repeated CGG trinucleotides separated by a variable number of base pairs (6 ). Analysis of the crystal structures of the DNA-DNA binding domain complexes of GAL4 (7 ) and PPR1 (8 ) has shown that binding of the CGG trinucleotides is ensured by a highly conserved helix of each zinc cluster. In this context, the linker element has been proposed to play an essential role. Indeed, the specificity of the interaction seems to be determined by the fit of the distance between the two zinc-cluster domains in the dimer, imposed by the structure of the linkers, to the distance between the two CGGs in the DNA target (9 ). The GAL4 and PPR1 linkers have the same length, yet they recognize two rotationally symmetrical CGGs spaced by 11 and 6 bp respectively. GAL4 linkers are completely spread on the DNA and contribute to the stability of the complex by non-specific interactions with phosphate groups located between the two CGGs. The PPR1 linkers adopt a structure that brings the two zinc-cluster regions into close proximity. This induces an asymmetrical disposition of the two monomers stabilized by the presence of contacts between the zinc cluster of one monomer and the linker peptide of the other.
CYP1 seems to behave rather differently. In contrast to known GAL4 and PPR1 targets, which nearly always present a perfect inverted repeat of the CGG motif (10 ,11 ), CYP1 targets do not really present a consensus sequence (12 and references therein). Some of them may be read either as inverted repeats (CCGn7CGG) or as direct repeats (CGGn6CGG), and some do not present any CGG repeat at all. In fact, selection experiments (13 ) and analysis of various CYP1 binding sites (12 ) suggested that the natural targets of CYP1 are degenerate forms of the optimal sequence, CGGnnnTAnCGGnnnTA. All these observations lead to three important structural questions: (i) does the CYP1 zinc cluster recognize CGG trinucleotides as do the zinc clusters of PPR1 and GAL4?; (ii) does the TA doublet play a direct role in the interaction?; (iii) in the case of a direct involvement of the TA doublets in binding, what part of the CYP1 DNA binding domain interacts with the additional TA doublets? To address these questions, we have undertaken a study of the interactions between the CYP1(55-126) peptide, whose structure was previously determined (14 ), and a DNA fragment derived from CYC1 upstream activation sequence 1B (CYC1-UAS1B). Our data indicate that the CGG trinucleotide is recognized by the zinc-cluster domain in the major groove, as in the case of GAL4 and PPR1, while the N-terminal part of the peptide, which is unstructured in GAL4 (7 ) and PPR1 (8 ) complexes, interacts in the minor groove with a thymine located 5 nt downstream of the CGG at the position corresponding to an adenine in the optimal sequence proposed by Zhang and Guarente (13 ).
The 55-126 fragment of the CYP1 DNA binding domain, whose structure of the (60-100) region has been previously determined by NMR spectroscopy (14 ), was expressed and purified as previously described (5 ). Cells were grown on minimal medium supplemented with [15N]ammonium chloride, leading to uniformly 15N-labeled samples.
All single-strand oligonucleotides were synthesized with a model 7500 DNA synthesizer (Milligen). The two duplexes (16 and 11 bp respectively) were prepared by mixing equimolar amounts of the two strands (determined by absorbance at 260 nm); the mixture was then heated to 95°C and annealed by slow cooling to room temperature.
The resulting samples were dialyzed against 50 mmol/l NaH2PO4/K2HPO4, 100 or 200 mmol/l NaCl buffer, pH 6.0. Finally, 10% D2O and 0.03% NaN3 (to prevent bacterial growth) were added. D2O samples were prepared by freeze drying H2O samples and redissolving them in pure D2O.
We have studied the evolution of the NMR spectra of the 16 and 11 bp DNA fragments in the presence of increasing amounts of protein (protein/16 bp and protein/11 bp experiments) and the evolution of the protein spectra upon addition of the 16 bp fragment (16 bp/protein experiment). Addition of increasing amounts of protein to a 16 bp fragment sample was done twice, using two different salt concentrations (100 and 200 mmol/l NaCl).
Before each experiment, the DNA and protein samples were dialyzed against the same buffer (50 mmol/l NaH2PO4/K2HPO4, 100 or 200 mmol/l NaCl, pH 6.30 for the CYP1-16 bp complexes and pH 6.0 for the CYP1-11 bp complex). The added species was fractionated and each fraction lyophilized. After recording the first reference spectrum, the sample was carefully removed from the NMR tube, mixed with the lyophilized fraction and returned to the NMR tube. This process was repeated for each addition. Four spectra (at ratios of 0.25:2, 0.5:2, 0.75:2 and 1:2) were recorded for the 16 bp/protein experiment, leading to a final concentration of 1.5 mmol/l DNA and 3 mmol/l CYP1. The reverse protein/16 bp experiment was carried out by recording five spectra (at ratios of 0.33:1, 0.66:1, 1:1, 1.33:1 and 1.66:1) in a first experiment and four spectra (0.25:1, 0.50:1, 0.75:1 and 1:1) in the second, using a 1.5 mmol/l DNA concentration in both cases. Similarly, three spectra (0.25:1, 0.5:1 and 1:1) were recorded for the protein/11 bp experiment, leading to a final concentration of 2 mmol/l for both the protein and the DNA.
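The concentration of the added species at each titration point follows directly from the molar ratios and the final concentrations quoted above. The short Python sketch below works through that arithmetic. It assumes that the sample volume stays constant during the titration (plausible, since each fraction is added in lyophilized form, but not stated explicitly) and the function name `added_concentration` is introduced here for illustration only.

```python
def added_concentration(ratio_added, ratio_fixed, fixed_mM):
    """Concentration (mmol/l) of the titrated species at a given molar ratio.

    Assumes the volume, and hence the concentration of the fixed species,
    does not change when a lyophilized fraction is added.
    """
    return fixed_mM * ratio_added / ratio_fixed


# (concentration of the fixed species in mmol/l, list of added:fixed ratios)
EXPERIMENTS = {
    "16 bp/protein": (3.0, [(0.25, 2), (0.5, 2), (0.75, 2), (1, 2)]),
    "protein/16 bp, first series": (
        1.5, [(0.33, 1), (0.66, 1), (1, 1), (1.33, 1), (1.66, 1)]),
    "protein/16 bp, second series": (
        1.5, [(0.25, 1), (0.50, 1), (0.75, 1), (1, 1)]),
    "protein/11 bp": (2.0, [(0.25, 1), (0.5, 1), (1, 1)]),
}

for name, (fixed_mM, ratios) in EXPERIMENTS.items():
    for added, fixed in ratios:
        conc = added_concentration(added, fixed, fixed_mM)
        print(f"{name}: ratio {added}:{fixed} -> {conc:.3f} mmol/l added species")
```

With these inputs the last point of the 16 bp/protein series gives 1.5 mmol/l DNA for 3 mmol/l protein, in agreement with the final concentrations reported in the text.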
Spectra were collected on a Bruker AMX-600 spectrometer equipped with a gradient 13C/15N/1H triple resonance probe.
The CYP1(55-126) fragment we have studied (Fig. 1 ) is similar to those used in the crystallographic and NMR studies of GAL4 (7 ,27 ,28 ) and PPR1 (8 ). It consists of the zinc cluster subdomain (residues 64-95) linked to the first 16 amino acids of the putative dimerization helix (residues 111-126) by the linker peptide (residues 96-110). In addition, nine residues (55-63) are present at the N-terminus. The linkers of the GAL4 and PPR1 fragments are shorter (nine residues instead of 14), but the lengths of the dimerization helices are similar (16 residues for CYP1 and GAL4, 17 for PPR1). The main difference is the presence in the case of the PPR1 fragment of a 23 amino acid tail at the C-terminus. The CYP1(55-126) fragment was previously studied alone in solution, leading to determination of the structure of the zinc cluster region, the other parts of the molecule remaining unstructured (14 ).
The two DNA targets are part of CYC1 upstream activation sequence 1B (CYC1-UAS1B) (Fig. 1 ). The first, corresponding to the 16 bp (GCCGGGGTTTACGGAC) sequence, was chosen to promote binding of two molecules of protein, possibly as a dimer. The second fragment of 11 bp (CCGGGGTTTAC) was used to look in more detail at the interactions between the protein and one of the CGG trinucleotides.
The main difficulty of nucleic acid assignment is the poor dispersion of the spectra, amplified, in our case, by the lack of symmetry of the fragments. Despite these difficulties, the assignment of all proton resonances, with the exception of a few H5'H5'' protons, was obtained using the well-described standard procedure (29 ).
The presence of the low field imino proton resonances confirmed the double helical structure of the 16 bp sequence and analysis of the intra-residual and sequential (H6-H8)/H2'H2'' correlations argued in favor of an overall B-type conformation. The non-exchangeable protons were assigned using the T8 T9 T10 triplet on one strand and the unique T18 C19 sequence on the other as a starting point. All but the G1 and G17 imino protons, which are located at the extremities of the DNA fragment and are probably in very fast exchange with the solvent, were identified. Assignment of the cytosine amino protons was also quite straightforward. In contrast, none of the adenine and guanine NH2 protons could be observed in the recorded spectra.
The 11 bp sequence corresponds to the C2G31-C12G21 region of the 16 bp sequence (the numbering of which will be used for both oligonucleotides). Its assignment was thus mainly derived from that of the 16 bp sequence by superimposing the NOESY spectra of the two molecules. Most of the correlations present in both spectra were found at similar positions. The main variations of chemical shifts concerned the protons of the base pairs located at the termini of the 11 bp sequence, namely C2 and G31 on one strand and C12, G21 and T22 on the other.
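The base pair labels used throughout (G1C32, T10A23, C12G21 and so on) can be regenerated from the two sequences. The Python sketch below does this under an assumed numbering convention that is inferred from the base pairs named in the text rather than stated explicitly: the strand written in the text is numbered 1-16 from its 5' end and the complementary strand 17-32 from its own 5' end, so that residue i pairs with residue 33 - i. The function name `number_duplex` is introduced here for illustration only.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

SEQ16 = "GCCGGGGTTTACGGAC"  # 16 bp target, strand given in the text
SEQ11 = "CCGGGGTTTAC"       # 11 bp target


def number_duplex(top_strand):
    """Return (top, bottom) residue labels for each base pair.

    Assumed convention: top strand numbered 1..n, bottom strand n+1..2n,
    both 5'->3', so that residue i pairs with residue 2n + 1 - i.
    """
    n = len(top_strand)
    return [(f"{base}{i}", f"{COMPLEMENT[base]}{2 * n + 1 - i}")
            for i, base in enumerate(top_strand, start=1)]


pairs16 = number_duplex(SEQ16)

# Locate the 11 bp target inside the 16 bp target and reuse its numbering.
start = SEQ16.find(SEQ11)
pairs11 = pairs16[start:start + len(SEQ11)]

print("16 bp termini:", pairs16[0], pairs16[-1])
print("11 bp termini:", pairs11[0], pairs11[-1])
print("TT doublet:   ", pairs16[8], pairs16[9])
```

Under this convention the 11 bp fragment spans the C2G31 to C12G21 base pairs, and the TT doublet discussed in the text corresponds to the T9A24 and T10A23 base pairs.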
Figure 2 shows the comparison of four HSQC spectra of the protein recorded at increasing DNA:protein ratios. Clearly, the correlations broaden markedly and decrease in intensity but undergo only very small chemical shift variations, suggesting that the complex has an intermediate exchange rate. However, the variations of the intensities are very inhomogeneous. At a 0.25:2 16 bp:protein ratio most of the correlations are still present, with a more pronounced decrease for the zinc-cluster protons. This is amplified at a ratio of 0.5:2, where the cluster signals disappear. Finally, at a 1:2 16 bp:protein ratio nearly all correlations are absent.
Comparison of the relative intensities between the spectra recorded at ratios of 0:2 and 0.25:2 (Fig. 3 ) allows delineation of three regions. The N-terminal fragment (60-64), the first helix (65-71) and the beginning of the following strand (72-73) are characterized by a noticeable diminution in the relative intensities, leading to the disappearance of the Ile60 and Leu62-Cys64 residues. On the other hand, the cluster C-terminal region Tyr95-Gln98 and the Trp100, Ala101, Asn111, Asp112 and Val121 residues, which belong to the linker and the dimerization element, show an increase in relative intensity. Finally, the central part of the cluster remains stable, with the exception of the His80 intensity increase.
The decrease in the relative intensities in the 67-73 region, in particular of Arg68, Arg70, Lys71, Val72 and Lys73, demonstrates the importance of this region for the interaction. These amino acids are conserved or type conserved in the GAL4 and PPR1 zinc clusters and have been shown to be involved in contact with DNA (9 ,7 ). This suggests that the CYP1 zinc cluster binds to DNA in the same manner as GAL4 and PPR1. The decrease in Lys86 intensity also suggests that this residue participates in contacts either with DNA or with the second protein. Indeed, a Lys86 -> Ile mutation induces a loss of affinity of CYP1 for the CYC1 and CYC7 UAS (30 ). More surprising are the variations observed at both ends of the cluster. The increases in the Trp100, Ala101, Asn111, Asp112 and Val121 relative intensities, together with the observation that the chemical shifts of most of the non-assigned correlations stay unmodified throughout the experiment, suggest that the linker and the dimerization helix remain unstructured. On the other hand, the disappearance of the Ile60, Leu62, Ser63 and Cys64 correlations suggests that these amino acids acquire structure in the presence of DNA. This latter result, confirmed by analysis of the protein-11 bp complex (see later), indicates that the N-terminal region of CYP1(55-126) behaves differently from that of GAL4, which remains unstructured in the presence of DNA (7 ). Finally, the particular behavior of His80 does not seem to be related to any direct interaction. In fact, its intensity increase almost certainly reflects mobility of the residue, located in a loop at the junction of the two half-domains of the zinc cluster.
Addition of the protein to the 16 bp fragment sample also resulted in a marked broadening of the resonances, leading to the disappearance of nearly all DNA signals at a ratio of 1.66:1. However, a closer look at the imino region shows that the rate of disappearance depends on the proton considered. Using the 2D NOESY series, it appears that the first resonances to disappear belong to G4C29 (imino and amino resonances) and T10A23 (T10 imino and H1', A23 H2) base pairs and C26 (H1'H2'H2''). On the other hand, the resonances of the base pairs located at the two termini (G1C32, G31 and C16G17, T18) together with those of the central region (T8A25, T9) remain visible until the end of the experiment.
Unfortunately, the decrease in the intensities was too fast to allow a detailed analysis. So the experiment was repeated, focusing on the first half of the curve (protein:16 bp ratios of 0.25:1, 0.50:1, 0.75:1 and 1:1), with a higher salt concentration in the hope of accelerating the exchange rate. Indeed, many correlations of the DNA (followed on NOESY spectra) and of the protein (detected on an HSQC spectrum) remain visible at the ratio of 1:1.
The mean intensity, calculated on all analyzed correlations, is 60% for the 0.25:1 and 35% for the 0.5:1 and 1:1 protein:16 bp ratios. As shown in Figure 4a and b (H6,H8-H1' correlations), the main variations concern the G30, G4 and G5 bases (which correspond to the first CGG triplet), G14 (which belongs to the second CGG) and the A23 and C26 bases. The C12G21 base pair (corresponding to the cytosine of the second CGG) seems unaffected, while the C3 protons (which belong to the first CGG) disappear rapidly, suggesting a binding difference between the two CGGs. A similar observation can be made in Figure 4c and d (CG imino-amino correlations), which shows that the correlations concerning the first CGG triplet disappear faster than those concerning the second. The effect of binding on the T10A23 base pair is also visible through the imino-H2 (Fig. 4 e) and to a lesser extent through the CH3-H6 (Fig. 4 f) correlations.
Ha and co-workers (12 ) have recently proposed an optimal sequence for CYP1 binding sequences formed by the repetition of two CGG and two TA motifs, CGGnnnTAnCGGnnnTA. The DNA fragment we have used in the present study was prepared from the wild-type CYC1-UAS1B and presents a CGGnnnTTnCGG motif. The TA doublet of the optimal sequence is replaced by a TT in the first half-site and is absent in the second. As expected, CYP1 binding perturbs the proton signals of the two CGGs, but also those of the T10A23 base pair, which corresponds to the second base pair of the TT doublet. Interestingly, the main effect is seen on the T10 imino and A23 H2 protons, which are located in the minor groove of the DNA, while the CH3 and H6 protons (in the major groove) are only weakly affected. The importance of this additional interaction is also indirectly assessed by the non-equivalence of the two half-sites. As shown by both protein/16 bp experiments, the H6,H8-H1' and imino-amino correlations of the first CGG triplet disappear more rapidly than those of the second, suggesting that CYP1(55-126) presents a higher affinity for the first half-site, which possesses the TT doublet, than for the second, which does not. Strikingly, the T9A24 base pair, whose importance was also suggested by Zhang and Guarente (13 ), seems unaffected by binding of CYP1 in our experiments. It is impossible to rule out that this could result from the use of a truncated CYP1 fragment. However, another explanation may be considered. It has been shown that the binding of GAL4 is sensitive to the nature of the base pairs in the middle of the site, even in the absence of any specific contacts (31 ). Similarly, we can imagine a structural role for the first base pair that would favor, for example, correct orientation of the second.
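The relationship between the optimal sequence and the 16 bp fragment can be checked position by position. The Python sketch below aligns CGGnnnTAnCGGnnnTA on the first CGG of the fragment and lists the positions that differ. It assumes that "n" stands for any base and uses the numbering of the strand as written in the text (first base = 1); the function name `compare_to_optimal` is introduced here for illustration only.

```python
OPTIMAL = "CGGnnnTAnCGGnnnTA"    # optimal sequence quoted in the text
FRAGMENT = "GCCGGGGTTTACGGAC"    # 16 bp fragment from CYC1-UAS1B


def compare_to_optimal(fragment, pattern, offset):
    """Compare a fragment with a pattern aligned at a 0-based offset.

    Returns (mismatches, outside):
      mismatches: (observed base with 1-based number, expected base)
      outside:    (pattern index, expected base) for specified pattern
                  positions that fall beyond the end of the fragment
    Positions written 'n' in the pattern are treated as 'any base'.
    """
    mismatches, outside = [], []
    for k, expected in enumerate(pattern):
        if expected == "n":
            continue
        pos = offset + k
        if pos >= len(fragment):
            outside.append((k, expected))
        elif fragment[pos] != expected:
            mismatches.append((f"{fragment[pos]}{pos + 1}", expected))
    return mismatches, outside


offset = FRAGMENT.find("CGG")  # align on the first CGG (C3-G5)
mismatches, outside = compare_to_optimal(FRAGMENT, OPTIMAL, offset)

print("mismatches:", mismatches)
print("pattern positions beyond the fragment:", outside)
```

The only mismatch inside the fragment is T10, which occupies the position of the adenine of the first TA doublet, while the second TA doublet of the optimal sequence falls beyond the 3' end of the 16 bp fragment.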
As previously, all resonances broaden as the higher molecular weight complex becomes the predominant species. But, in addition, we also observe the displacement of some correlations (Fig. 5 ), suggesting that several protons now show a fast exchange rate. This phenomenon concerns both the DNA and the protein.
DNA evolution was followed using the imino-amino, H5/H6 and H6-H8/H1' correlation regions, where no protein signal was present. Many correlations disappear, in particular those of G4, G5 and T10 on one strand and A23, C28 and C29 on the other. We also observe a large chemical shift variation of the C3, C26, C27 and G30 resonances. As expected, these modifications concern the C3G30, G4C29, G5C28 motif and its surrounding bases (C26 and C27), but also the T10A23 base pair. The T8/A25 and T9/A24 imino-amino correlations remain visible throughout the experiment.
Similarly, superimposition of the HMQC spectra of the free and complexed protein (Fig. 6 ) shows that some residues undergo a large chemical shift variation, in particular the 63-66 and the 69-72 regions, together with Cys81. Others do not seem to be influenced by the interaction, i.e. the 75-80, 82-93, 95-101 fragments and Asp111, Asn112 and Val121. These results are confirmed by the evolution of the correlation intensities observed in the NOESY spectrum of the complex (data not shown). In addition, some cross-peaks 'appear' in the spectrum. These new cross-peaks result from large displacements of correlations previously located in the crowded central region of the spectrum and corresponding to non-assigned protons of the unstructured regions in the free form of CYP1.
These observations agree with our previous experiments. Even in the presence of a half-site (11 bp), the CYP1(55-126) fragment interacts with the CGGnnnnT sequence. Our data also show clearly that a part of the protein acquires structure upon DNA binding. Considering that the DNA target contains only one half-site, that Trp100, Ala101, Asp111, Asn112 and Val121 remain unaffected and that the Ser63-Ile66 segment undergoes a large chemical shift variation, it seems clear that the N-terminal part of the CYP1(55-126) fragment is involved in the interaction. This confirms our previous hypothesis and strongly suggests that the T10/A23 base pair is recognized by the N-terminal region of the CYP1(55-126) protein.
Using 2D NOESY and 1D NOE difference experiments, we were able to observe 20 intermolecular contacts between the protein and the 11 bp fragment (Table 1 ). They concern three protein residues (Arg68, Lys71 and Val72) and five DNA bases (C2, C3, C27, C28 and C29) and demonstrate an interaction between the region of the cluster first helix (Lys69-Val72) and the CGG (C3-G5 and C28-G30) motif.
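The counts quoted above can be cross-checked against the contact list of Table 1. The Python sketch below transcribes the 20 protein-DNA proton pairs from the table (Greek letters are written as `a`, `b`, `g` for convenience) and tallies them per residue and per base; the variable names are introduced here for illustration only.

```python
from collections import Counter

# (protein residue, protein proton, DNA base, DNA proton), from Table 1
NOESY_CONTACTS = [
    ("Lys71", "Ha", "Cyt3", "H5"), ("Lys71", "Ha", "Cyt28", "H5"),
    ("Lys71", "Hb", "Cyt28", "NH2a"), ("Lys71", "Hb'", "Cyt28", "NH2b"),
    ("Lys71", "Hb'", "Cyt3", "H5"), ("Lys71", "Hb'", "Cyt29", "NH2a"),
    ("Lys71", "Hb'", "Cyt29", "NH2b"), ("Lys71", "Hg", "Cyt3", "H5"),
    ("Lys71", "Hg", "Cyt29", "H5"), ("Lys71", "Hg", "Cyt28", "NH2b"),
    ("Lys71", "Hg", "Cyt3", "NH2b"), ("Arg68", "Hb", "Cyt27", "H5"),
    ("Arg68", "Hg", "Cyt27", "H5"),
]
NOE_1D_CONTACTS = [
    ("Val72", "CH3", "Cyt3", "H5"), ("Val72", "CH3", "Cyt29", "H5"),
    ("Val72", "CH3", "Cyt3", "NH2a"), ("Val72", "CH3", "Cyt2", "H5"),
    ("Val72", "CH3", "Cyt2", "H6"), ("Val72", "CH3", "Cyt29", "NH2a"),
    ("Val72", "CH3", "Cyt29", "NH2b"),
]
CONTACTS = NOESY_CONTACTS + NOE_1D_CONTACTS

per_residue = Counter(residue for residue, _, _, _ in CONTACTS)
per_base = Counter(base for _, _, base, _ in CONTACTS)

print("total contacts:", len(CONTACTS))
print("per residue:", dict(per_residue))
print("per base:", dict(per_base))
```

The tally gives 20 contacts involving three residues and five cytosines, with Lys71 alone accounting for 11 of them.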
Many of these contacts are similar to those observed for GAL4 (Lys71Ha/C28H5, Lys71Hb'/C3H5, Lys71Hg/C3H5 and Val72CH3/C2H5) (28 ). We thus decided to build a preliminary model of the complex using the relative protein/DNA disposition observed in the case of GAL4 and our intermolecular NOEs (Fig. 7 ). After refinement this model displays no bad contacts and a single distance violation >0.5 Å (between the Lys71 Hγ and the C29 H5), which may be due to the fact that we kept the DNA structure rigid during the minimization.
The structure, in agreement with all the data we have previously obtained, supports the idea that the CYP1 zinc cluster domain recognizes the CGG trinucleotides, as do those of GAL4 and PPR1. However, the N-terminal region of GAL4 appears unstructured and rather far from the DNA. In contrast, we observe that the CYP1 Ile60-Cys64 fragment has a conformation that brings the Ile60 residue into the minor groove of the DNA near C26 (which may affect its intensity variations). This may look rather strange, considering the absence of any constraint between the N-terminal region of the protein and the DNA. In fact, it appears that this particular Ile60-Cys64 peptide structure is present in nearly all free CYP1 structures. This results from the presence of Pro61, which restrains the available conformational space, and of several NOE restraints observed in the free structure, between Leu62, Ser63 and Cys64 and their surroundings (in particular Cys67, Arg68, Cys74, Tyr95 and Met96).
The manner in which GAL4 and PPR1 recognize DNA seems rather well understood today. Nearly all their known UASs contain two rotationally symmetrical CGG trinucleotides (10 ,11 ) and all the specific interactions occur between these two CGGs and the two zinc clusters of a protein dimer (7 ,8 ,31 ). In the case of CYP1 the picture is more complicated. The various UASs have very heterogeneous sequences. They correspond to a direct repetition of a CGG trinucleotide, but with many variations (for example CYC1-UAS1A and -B, CGGn6CGG; CYB2-UAS1, AAGGn6CGG; CYC7, CGCn6CGC) (12 and references therein). It has been shown recently that the two CGG trinucleotides of the latter two targets are not functionally equivalent (32 ). In addition, several experiments indicate the existence of contacts outside the CGGs. Methyl interaction experiments (33 ) have shown that CYP1 interacts not only with the CGG (CYC1) or CGC (CYC7) trinucleotides in the major groove, but also with a stretch of As in the minor groove covering 6 or 7 bp. More recently, selection (13 ) and mutation (12 ) experiments have led to the conclusion that CYP1 recognizes degenerate forms of the CGGnnnTAnCGGnnnTA optimal sequence.
Our data confirm binding of the CGG trinucleotides and of, at least, a TA base pair five residues downstream in the case of the CYC1-UAS1B target. They show that the CGG trinucleotide is recognized by the zinc cluster in a manner similar to that found for GAL4 and PPR1, but also that the additional TA base pair is bound in the DNA minor groove by the CYP1(55-126) N-terminal region. The sequence of this region (Arg55-Lys-Arg-Asn-Arg-Ile-Pro-Leu-Ser63) contains a stretch of four basic residues. This stretch is even longer in the whole protein (Ser50-Ser-Lys-Ile-Lys-Arg-Lys-Arg-Asn-Arg-Ile-Pro-Leu-Ser63). Similar basic regions have been described at the N-termini of the λ repressor (34 ) and, more recently, of the GAGA protein (35 ). They have been demonstrated to play a critical role in DNA binding. In addition, saturation mutation experiments conducted on the CYP1(55-125) fragment (30 ) have led to the characterization of several mutants that modulate the activity and/or affinity of CYP1 for the CYC1- and CYC7-UAS. They concern the linker region but also the N-terminal (Lys54-Ile66) fragment.
Thus, considering our NMR results together with all the data in the literature, we propose a model of CYP1-CYC1-UAS1B interaction in which the CGG trinucleotide is recognized by the zinc cluster domain of the protein and a supplementary region is recognized by the basic residue-rich Lys52-Ile-Lys-Arg-Lys-Arg-Asn-Arg59 fragment, the zinc cluster and the basic residue-rich region being linked by the Ile60-Ser63 tetrapeptide. Interestingly, an analysis of the 47 zinc cluster protein N-terminus sequences contained in the Swissprot databank reveals that 36 of them (but neither GAL4 nor PPR1) contain at least one basic residue-rich region, suggesting that the puzzling behavior of CYP1 may not be an exception.
We thank Dr Dardel and Dr Timmerman for helpful discussions and critical comments on the manuscript and Mrs D.Menay for the synthesis of the oligonucleotides. This work was supported by the French Association Pour la Recherche contre le Cancer (ARC).
*To whom correspondence should be addressed. Tel: +33 1 69 33 48 32; Fax: +33 1 69 33 30 10; Email: francois.bontems@polytechnique.fr
Protein             DNA

NOEs from the NOESY spectra
Lys71 Hα            Cyt3 H5
Lys71 Hα            Cyt28 H5
Lys71 Hβ            Cyt28 NH2a
Lys71 Hβ'           Cyt28 NH2b
Lys71 Hβ'           Cyt3 H5
Lys71 Hβ'           Cyt29 NH2a
Lys71 Hβ'           Cyt29 NH2b
Lys71 Hγ            Cyt3 H5
Lys71 Hγ            Cyt29 H5
Lys71 Hγ            Cyt28 NH2b
Lys71 Hγ            Cyt3 NH2b
Arg68 Hβ            Cyt27 H5
Arg68 Hγ            Cyt27 H5

NOEs from the 1D NOE difference experiments
Val72 CH3           Cyt3 H5
Val72 CH3           Cyt29 H5
Val72 CH3           Cyt3 NH2a
Val72 CH3           Cyt2 H5
Val72 CH3           Cyt2 H6
Val72 CH3           Cyt29 NH2a
Val72 CH3           Cyt29 NH2b
REFERENCES

