Cysteine 50 of the POUH domain determines the range of targets recognized by POU proteins
Alexander G. Stepchenko*, Nadya N. Luchina and Elizaveta V. Pankratova
Engelhardt Institute of Molecular Biology, Russian Academy of Sciences, 32 Vavilov str., 117984 Moscow, Russia
Received March 26, 1997; Revised and Accepted June 4, 1997
ABSTRACT
The best target of the POU proteins Oct-1 and Oct-2 is the octamer sequence ATGCAAAT. POU proteins also recognize, with weaker affinity, the TAAT-like targets of another group of regulatory factors, the homeoproteins. It has not been known why Cys50 of the POUH domain is absolutely conserved, whereas residue 50 varies among homeoproteins. To assess the importance of Cys50 for the binding specificity of POU proteins, every other amino acid was substituted for Cys at position 50, and the resulting mutants were tested with probes containing octamer (ATGCAAATNN) or homeospecific binding sites. Only the wild-type POU domain adequately discriminated between the octamer and homeospecific sites, and its affinity was only slightly affected by the nucleotide sequence flanking the octamer at the 3'-end. Any amino acid substitution at position 50 resulted in a mutant protein that bound efficiently both to the octamer and to TAAT-like sequences. Moreover, for the mutants, the 3'-flanking sequences influenced binding to a much greater extent.
INTRODUCTION
In vertebrates, POU proteins are involved in the regulation of transcription at different stages of development (1-3). The octamer motif ATGCAAAT is the recognition site of the POU proteins Oct-1 and Oct-2. This sequence is found in the regulatory regions of the snRNA and histone H2B genes, in promoters of immunoglobulin genes and in the replication origins of SV40 and Ad4 (4-9). In some cases, activation of promoters containing the octamer sequence requires association of Oct-1 and Oct-2 with the B-cell-specific cofactor OBF-1 (Bob1, OCA-B) (10-12) or with general transcription factors (TBP and TFIIB). Such associations may restrict POU protein-DNA recognition and gene transcription (13,14). POU proteins also bind to the `degenerate octamer' motif ATGCTAATGARAT or to the TAATGARAT sequence (15-17), which contains the POUH domain binding site TAATGA (18-23). The conserved residues Val47 and Asn51, located in the third helix of the POUH domain, make base-specific contacts in the DNA major groove and thus recognize the 3'-half of the octamer binding site ATGCAAAT. Residues in the N-terminal arm of the POU homeodomain may form additional bonds in the minor groove (19,24). Genetic and biochemical studies of homeoproteins show that the variable residue at position 50 contacts the bases flanking the TAAT core at the 3'-end and plays a key role in determining binding specificity (25-30). In contrast to homeoproteins, residue 50 of the POUH domain is highly conserved. X-ray analysis revealed that Cys50 of the Oct-1 POUH domain makes van der Waals contacts with the methyl groups of thymines 3'-flanking the octamer site (24). On the other hand, site selection experiments demonstrated only a slight preference of the POU protein for thymine or adenine at position 9 and for adenine at position 10 of the binding site ATGCAAATT/AA (19,31). Moreover, the C50Q substitution was shown to have little effect on the DNA affinity of the Pit-1 POUH domain (32).
At present, it remains unknown why Cys50 is absolutely conserved in POU proteins. We propose that this residue restricts the range of potential POU protein targets. Evolutionary selection of transcription factors may have been aimed both at increasing their affinity for their `own' targets and at decreasing their affinity for `foreign' DNA-binding sites. This may apply, at least in part, to families of transcription factors with homologous DNA-binding domains, such as homeoproteins and Pax proteins.
This work examines the role of the residue at position 50 of the POUH domain in the recognition of octamer (ATGCAAATNN) and homeospecific (TAATNN) binding sites.
MATERIALS AND METHODS
Construction of expression plasmids and purification of proteins
The cDNA fragment encoding amino acids 179-392 (33), which contains the POU domain, was cloned between the BamHI and HindIII sites of the pUR292 expression vector and introduced into Escherichia coli BMH 71-18 (20). The cells were grown in LB medium to an OD600 of 0.3, and IPTG was then added to 0.5 mM. After 2 h of induction, the bacterial cells were harvested by centrifugation, resuspended in 6 vol of buffer (10 mM Tris-HCl pH 8.0, 10 mM EDTA, 1 mM PMSF and 0.2 mg/ml lysozyme), incubated for 20 min at 0°C and freeze-thawed in the presence of 0.1% NP-40. Bacterial debris and nucleic acids were pelleted by centrifugation. Proteins were precipitated with ammonium sulphate at 35% saturation, collected by centrifugation, resuspended in PBS containing 1 mM DTT and 1 mM PMSF, and dialyzed against the same buffer. The protein solution was then passed through a Sepharose 2B column carrying immobilized rabbit polyclonal antibodies against total E.coli BMH 71-18 proteins. The flow-through was collected and dialyzed against 100 mM KCl, 40 mM Tris-HCl pH 7.8, 1 mM EDTA, 1 mM DTT, 1 mM PMSF and 50% glycerol. Protein concentration and purity were determined by SDS-PAGE. To determine the concentrations of the mutant proteins, increasing amounts of highly purified wild-type POU protein were loaded on the gel. The gel was stained with Coomassie blue G-250 and scanned with a laser densitometer. The absorbance of the bands corresponding to the wild-type POU protein was plotted as a function of protein amount (calibration curve), and the amount of undegraded C50K POU protein was determined from this calibration curve.
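The concentration estimate above relies on a standard curve built from the wild-type protein. A minimal sketch of that step follows, assuming the densitometric signal is linear in protein amount over the loaded range and that an ordinary least-squares line is an adequate fit; the function names and the numbers in the example are hypothetical and are not data from this study.

```python
# Hypothetical sketch: estimate the amount of a mutant POU protein from the
# densitometric signal of its band, using a linear calibration curve built
# from lanes loaded with known amounts of purified wild-type POU protein.
# Assumes the Coomassie signal is linear in protein amount over this range.

def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def amount_from_signal(signal, slope, intercept):
    """Invert the calibration curve: band signal -> protein amount."""
    return (signal - intercept) / slope

# Made-up calibration: wild-type protein loaded (ug) vs band signal.
wt_amounts = [0.25, 0.5, 1.0, 2.0]
wt_signals = [0.5, 1.0, 2.0, 4.0]
slope, intercept = fit_line(wt_amounts, wt_signals)
estimated = amount_from_signal(3.0, slope, intercept)  # a mutant band's signal
print(estimated)  # 1.5
```

Reading only the undegraded band, as the text does for C50K, means degradation products do not inflate the estimate of active protein.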
Site-directed mutagenesis
Phage DNA preparation and mutagenesis were performed with the Pharmacia Oligonucleotide-Directed in vitro Mutagenesis System, using two mutagenic oligonucleotides: 5'-GGTTGCAGAACCAGACTTTGATCACTTCCTTCTCC-3' (R46) and 5'-GTTCTGGCGCCGGTTNNNGAACCAGACGCGGATCAC-3' (C50). The first was used to replace Arg46 of the POUH domain with Lys, and the second to substitute other amino acids for Cys50 (N = A, G, T or C). Mutations were identified by dideoxy sequencing.
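Because the C50 oligonucleotide carries a fully degenerate NNN triplet, a single mutagenesis reaction can in principle encode every amino acid at position 50, as well as stop codons. The sketch below illustrates this with the standard genetic code; since NNN is its own reverse complement as a pattern, the result does not depend on which strand the oligonucleotide represents. The helper name is hypothetical.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1): codons listed with the
# first, second and third bases each cycling through T, C, A, G, third fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def residues_from_degenerate_codon(pattern="NNN"):
    """Translate every codon matching a pattern in which N = A, G, T or C."""
    choices = ["AGTC" if base == "N" else base for base in pattern]
    return {CODON_TABLE["".join(codon)] for codon in product(*choices)}

residues = residues_from_degenerate_codon("NNN")
print(len(CODON_TABLE), len(residues - {"*"}))  # 64 20
```

The same helper shows that a more restricted pattern such as TGN reaches only Cys, Trp and a stop codon.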
Oligonucleotides
Two sets (`TAAT' and `OCT') of 35 bp oligonucleotides were synthesized to assay DNA binding by the wild-type and mutant POU proteins. Positions 1-7 of the TAAT probes are numbered over ATAATNA (group 1) or ATAATGN (group 2), and positions 1-10 of the OCT probes over ATGCAAATNN:
TAAT group 1: 5'-AGGTACCTGAGTTGATAATNAGACTGTCTCTAGAG-3'
TAAT group 2: 5'-AGGTACCTGAGTTGATAATGNGACTGTCTCTAGAG-3'
OCT: 5'-AGGTACCTGAGATGCAAATNNGACTGTCTCTAGAG-3'
Each group of the `TAAT' set consisted of four oligonucleotides with A, T, G or C at position 6 (group 1) or position 7 (group 2). The `OCT' set contained 16 oligonucleotides with two randomized nucleotides immediately downstream of the octamer site (ATGCAAATNN). The flanking sequences were identical in all probes and contained no additional TAAT or ATGC motifs that could affect probe binding.
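The probe sets follow directly from the templates above by filling each N with every base. A short sketch of that expansion is given below; the template strings are transcribed from the sequences listed in this section, while the variable and function names are hypothetical.

```python
from itertools import product

# Templates transcribed from the Oligonucleotides section; N marks each
# randomized position. Left and right flanks are identical in every probe.
TAAT_GROUP1 = "AGGTACCTGAGTTGATAATNAGACTGTCTCTAGAG"  # N at site position 6
TAAT_GROUP2 = "AGGTACCTGAGTTGATAATGNGACTGTCTCTAGAG"  # N at site position 7
OCT_SET = "AGGTACCTGAGATGCAAATNNGACTGTCTCTAGAG"      # N at site positions 9, 10

def expand(template, alphabet="ATGC"):
    """Return every concrete sequence obtained by filling each N with a base."""
    slots = [i for i, base in enumerate(template) if base == "N"]
    probes = []
    for fill in product(alphabet, repeat=len(slots)):
        seq = list(template)
        for i, base in zip(slots, fill):
            seq[i] = base
        probes.append("".join(seq))
    return probes

group1, group2, octs = expand(TAAT_GROUP1), expand(TAAT_GROUP2), expand(OCT_SET)
print(len(group1), len(group2), len(octs))  # 4 4 16
```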
Gel retardation assay
Binding reactions were performed as described previously (21). The amount of free DNA was plotted as a function of the log of the POU domain concentration (Fig. 1). Under conditions of protein excess, the dissociation constant was taken as the protein concentration required to bind 50% of the DNA probe; this value was normalized to that of the wild-type protein binding to the octamer site ATGCAAATGA to give the relative dissociation constant.
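The half-binding point can be read off the free-DNA versus log[protein] curve in several ways, and the text does not specify which was used. The sketch below assumes simple linear interpolation between the two titration points that bracket 50% free probe on a log10 axis; the function names and the synthetic titration (generated from a 1:1 binding model with an arbitrary Kd of 3) are illustrative only.

```python
import math

def half_binding_concentration(concentrations, fraction_free):
    """Protein concentration at which 50% of the DNA probe is still free.

    Interpolates linearly on a log10(concentration) axis between the two
    titration points that bracket 50% free probe. Assumes the free fraction
    falls monotonically as protein concentration rises.
    """
    points = sorted(zip(concentrations, fraction_free))
    for (c1, f1), (c2, f2) in zip(points, points[1:]):
        if f1 >= 0.5 >= f2 and f1 != f2:
            x1, x2 = math.log10(c1), math.log10(c2)
            x = x1 + (f1 - 0.5) * (x2 - x1) / (f1 - f2)
            return 10 ** x
    raise ValueError("titration does not bracket 50% free probe")

def relative_kd(kd_probe, kd_wt_octamer):
    """Normalize to wild-type POU on ATGCAAATGA, whose relative Kd is then 1."""
    return kd_probe / kd_wt_octamer

# Synthetic titration from a 1:1 model, free = Kd / (Kd + P), with Kd = 3.
conc = [1.0, 2.0, 4.0, 8.0]
free = [3.0 / (3.0 + p) for p in conc]
kd_est = half_binding_concentration(conc, free)
print(round(kd_est, 2))
```

With protein in excess over probe, the concentration giving 50% binding approximates Kd; interpolating in log space mirrors the way the data are plotted, and denser titrations shrink the interpolation error.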
Methylation and ethylation interference assay
Labeled DNA fragments were methylated with DMS by the method of Siebenlist and Gilbert (34). Modification of DNA with DEPC was performed according to Sturm et al. (35). The binding reaction was carried out as described previously (21), except that the samples contained 1-5 µg poly d(G·C)·poly d(G·C) and 1-3 ng of modified DNA as a probe. Free and protein-bound DNA fragments were located by autoradiography, eluted from the gel and further processed as described (34). The cleavage products were resolved on a 7% sequencing gel and quantitated by scanning the autoradiograph with a laser densitometer.
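One common way to turn the densitometer scans into interference calls is to compare each band's intensity in the bound and free lanes. The sketch below assumes that approach: it normalizes the bound/free ratio to a reference position presumed not to contact the protein and flags positions below an arbitrary threshold of 0.5. The normalization, the threshold, the function names and the intensities are all hypothetical, not the authors' procedure.

```python
def interference_ratios(bound, free, reference):
    """Bound/free band-intensity ratio for each modified position.

    bound, free: densitometry values per position (e.g. "A3") from the
    protein-bound and free DNA lanes. Ratios are scaled so that the
    reference position, assumed not to be contacted by the protein, equals 1.
    """
    scale = bound[reference] / free[reference]
    return {pos: (bound[pos] / free[pos]) / scale for pos in bound}

def interfering_positions(ratios, threshold=0.5):
    """Positions whose modification depletes the bound fraction below threshold."""
    return sorted(pos for pos, ratio in ratios.items() if ratio < threshold)

# Made-up intensities for illustration only.
bound = {"A3": 0.2, "A4": 0.3, "G6": 1.0, "A7": 1.1}
free = {"A3": 1.0, "A4": 1.0, "G6": 1.0, "A7": 1.0}
ratios = interference_ratios(bound, free, reference="G6")
print(interfering_positions(ratios))  # ['A3', 'A4']
```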
RESULTS
Interaction of the wild-type POU domain with homeospecific binding sites TAATNN and octamer sequences ATGCAAATNN
The effect of nucleotide substitutions in the homeospecific binding site TAATNN on the affinity of the Oct-2 POU domain was estimated in the gel retardation assay (Fig. 1A). The equilibrium dissociation constant of the protein-DNA complex was determined at a constant concentration of the DNA probe and increasing concentrations of the POU domain (Fig. 1B). As seen in Table 1, the wild-type POU domain favors adenine at position 7 and guanine or thymine at position 6, so that TAATG/TA is the optimal homeospecific binding site. With other nucleotides at these positions, the affinity falls to about one-quarter. The variation in the dissociation constant upon substitution of the nucleotides 3'-flanking the homeospecific binding site can be characterized by a dispersion coefficient DispH = Rel.KdHmax/Rel.KdHmin, where Rel.KdHmax and Rel.KdHmin are the maximum and minimum relative dissociation constants. For the wild-type POU domain, DispH = 3.9. This small dispersion may arise because Cys50 of the POU domain forms no contacts with the nucleotides 3'-flanking the homeospecific binding site TAATNN, or because such contacts contribute insignificantly to the stability of the DNA-protein complex. To distinguish between these possibilities, the contacts between POU domain residues and nucleotides of the optimal binding site ATAATGA were mapped in interference experiments (Fig. 2A). Ethylation and methylation of A3 and A4 in the top strand and of A2 and A5 in the bottom strand interfered with POU domain binding. Ethylation or methylation of the nucleotides 3'-flanking the TAAT core had no effect.
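The dispersion coefficient is simply the ratio of the largest to the smallest relative dissociation constant within a probe set. A one-function sketch follows; the function name and the example values are hypothetical and are not taken from Table 1.

```python
def dispersion(relative_kds):
    """Dispersion coefficient Disp = Rel.Kd(max) / Rel.Kd(min).

    relative_kds maps each probe in a set (e.g. the TAAT probes) to its
    relative dissociation constant; values must be positive.
    """
    values = list(relative_kds.values())
    return max(values) / min(values)

# Hypothetical relative Kd values, not taken from Table 1.
example = {"probe_a": 2.0, "probe_b": 3.0, "probe_c": 5.0}
print(dispersion(example))  # 2.5
```

A value of 1 means the flanking substitutions have no effect at all; the larger the value, the more the protein's affinity depends on the 3'-flanking nucleotides.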
Binding of mutated POU proteins to homeospecific and octamer sites
Cys50 was replaced by each of the other 19 amino acids, and the mutant proteins were assayed with homeospecific ATAATNN and octamer ATGCAAATNN sites. As seen in Table 1, only five of the 19 residues (Ala, Lys, Asn, Gln and Ser) preserve DNA-binding ability. The C50R POU mutant stands apart because of its very relaxed specificity for homeo sites: substitution of any nucleotide at positions 1-7 of the homeospecific site is tolerated without a dramatic decrease in affinity (relative dissociation constants of 5-35). All mutants that still interact with homeo sites show higher affinity for their optimal sites and a wider range of dispersion than the wild-type POU domain (Table 1). It is therefore reasonable to suggest that in the mutant POU homeodomains the residue at position 50 is in closer contact with the nucleotides 3'-flanking the TAAT core, and that these proteins recognize homeospecific sites in a manner similar to homeoproteins. To test this assumption, ethylation and methylation interference experiments were performed (Fig. 3); all contacts identified are summarized below the autoradiograph. The other POU mutants, C50S, C50Q and C50K, gave very similar methylation and ethylation patterns (data not shown). As with the wild-type POU domain (Fig. 2A), methylation and ethylation of A2 and A3 in the top strand and of A1 in the bottom strand, as well as ethylation of A4 in the bottom strand, interfered with mutant protein binding. In contrast to the wild-type POU domain, binding of the mutants was also impaired by ethylation of the nucleotides at positions 6 and 7 and by ethylation and methylation of the adenine at position -1.
DISCUSSION
Our experimental results and data in the literature (18,19,21,31,37) suggest that the wild-type POU domain has the following DNA-binding properties: (i) its affinity for the octamer sequence is high and depends only slightly on the 3'-flanking nucleotides; (ii) it can recognize homeospecific sites containing the TAAT core, but with weaker affinity. Cys50 of the POUH domain is not in close proximity to the nucleotides 3'-flanking the binding sites (Fig. 2), yet in the case of homeospecific sequences these nucleotides significantly affect protein affinity (Table 1); in the case of the octamer site the effect is slight. Why does Cys50 of the POU proteins not contact the nucleotides 3'-flanking the octamer sequence? There appear to be two reasons. First, DNA bending by the POU domain (38) may move Cys50 away from the 3'-flanking nucleotides and relieve internal stress. Second, the SH group of Cys50 forms a hydrogen bond with the carbonyl oxygen of the highly conserved Arg46 (24), which may alter the orientation of Cys50 relative to the 3'-flanking nucleotides. Homeoproteins do not have Cys at position 50, and position 46 is occupied by Lys in most cases. In homeoproteins, a hydrogen bond between residues 46 and 50 may form only if Gln50 of the even-skipped homeodomain adopts a non-canonical conformation (30). To assess the mutual effect of residues 46 and 50, we examined the DNA-binding properties of the R46K POU mutant. This substitution reduced the affinity for both TAAT and octamer sites about 10-fold, changed the optimal binding site (to ATGCAAATGT) and increased the dispersion to 3.1 (data not shown). Can Cys50 alone define the unique DNA-binding specificity of POU proteins? Six of the 19 residues tested (Ala, Asn, Gln, Ser, Lys and Arg) can replace Cys50 while retaining the ability of the POU domain to bind homeospecific sites.
In all cases, the affinity of the mutant proteins for their optimal binding sites is higher than that of the wild-type POU domain for its preferred homeospecific sites TAATG/TA (Table 1). For the mutants, but not for the wild-type POU protein, ethylation of nucleotides 3'-flanking the TAAT core interfered with binding (Fig. 2). The ethylation and methylation patterns are similar to those of classical homeoproteins (28), and the optimal binding sites for the C50A, C50Q and C50K POU proteins coincide with the corresponding sites for homeoproteins (27,29,39) (Fig. 3 and Table 1).
We believe that Cys50 defines the mode of POUH domain-DNA interaction. With Cys at this position, the POUH domain follows the `POU' interaction pattern, and this residue is not involved in recognition of the 3'-flanking nucleotides. When Cys50 is replaced by any amino acid except Arg, POU domain-DNA binding follows the `homeo' pattern; the mutants, like classical homeoproteins, are then capable of specific, high-affinity binding to homeospecific sites. Differences between the wild-type and mutant POU domains are even more pronounced in their interaction with the octamer. As seen in Figure 4, only the wild-type POU domain adequately discriminates between the two types of target: its affinity for every octamer sequence is about three times higher than that for any homeospecific site. All mutants that bind to the TAAT core have significantly lower discrimination coefficients (Table 2). The oct- and homeo-areas in Figure 4 are either close to each other (C50A) or overlap (C50Q and C50K); hence, such mutants bind homeospecific sequences as well as octamer sites. In contrast, the C50H, C50T, C50M and C50V mutants do not bind TAAT-like sequences but do recognize the oct sites. All of them have high dispersion values and hence bind efficiently to only a limited number of potential octamer targets. Evolutionary selection adapts not only POU proteins to their targets but, concurrently, the targets to POU proteins. In this connection, it is noteworthy that two homeospecific sites, TAATGA and TAATTA, have the highest affinity for the wild-type POU protein (Table 1). The first is part of all TAATGARAT and ATGCTAATGARAT targets, in which the conserved GA is essential for POUH domain binding (18,22). The second is not found in these natural targets, perhaps because TAATTA is a high-affinity binding site for many Antp-family homeoproteins, which would therefore compete with POU proteins for TAATTARAT or ATGCTAATTARAT targets.
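The exact definition of the discrimination coefficient in Table 2 is not given in this section. One reading consistent with the statement that the wild-type affinity for every octamer is about three times that for any homeospecific site is the ratio of the strongest homeo site's relative Kd to the weakest octamer site's relative Kd. The sketch below implements that assumed definition only; the function name and numbers are hypothetical.

```python
def discrimination_coefficient(oct_kds, homeo_kds):
    """Assumed definition: weakest octamer site versus strongest homeo site.

    Inputs are relative dissociation constants (lower = tighter binding).
    A value above 1 means every octamer probe is bound more tightly than
    any homeospecific probe; a value of 1 or less means the ranges touch
    or overlap.
    """
    return min(homeo_kds) / max(oct_kds)

# Hypothetical relative Kd values for illustration only.
separated = discrimination_coefficient([1.0, 1.5, 2.0], [6.0, 8.0, 9.0])
overlapping = discrimination_coefficient([1.0, 4.0], [2.0, 3.0])
print(separated, overlapping)  # 3.0 0.5
```

Under this reading, separated oct- and homeo-areas in Figure 4 correspond to a coefficient above 1, and overlapping areas (as described for C50Q and C50K) to a coefficient at or below 1.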
Thus, no residue at position 50 other than Cys gives POU proteins the capability of highly selective recognition of their specific targets and of binding to them independently of the 3'-flanking nucleotides.
ACKNOWLEDGEMENTS
The authors are grateful to Prof. Oleg L. Polanovsky for valuable discussions, and to Prof. Walter Schaffner (Institut für Molekularbiologie II der Universität Zürich) for advice and support in the early stages of this work. Thanks are also due to Dr Rechinsky for critical reading of the manuscript, and to Elena Sytina and Viktor Petukhov for assistance. This work was supported by the Russian Human Genome program, the Russian State program `Frontiers in Genetics' and the International Science Foundation (grants N26000 and N263000).
17 apRhys,C.M.J., Ciufo,D.M., O'Neill,E.A., Kelly,T.J. and Hayward,G.S. (1989) J. Virol. 63, 2798-2812.
18 Verrijzer,C.P., Kal,A.J. and van der Vliet,P.C. (1990) Genes Dev. 4, 1964-1974.
19 Verrijzer,C.P., Alkema,M.J., van Weperen,W.W., van Leeuwen,H.C., Strating,M.J.J. and van der Vliet,P.C. (1992) EMBO J. 11, 4993-5003.