ABSTRACT
The DNA-binding domain of the oncoprotein c-Myb consists of three imperfect tryptophan-rich repeats, R1, R2 and R3. Each repeat forms an independent mini-domain with a helix-turn-helix related motif, and the repeats are connected by linkers containing highly conserved residues. The location of the linker between two DNA-binding units suggests a function analogous to a dimerisation motif with a critical role in positioning the recognition helices of each mini-domain. Mutational analysis of the minimal DNA-binding domain of chicken c-Myb (R2 and R3) revealed that besides the recognition helices of each repeat, the linker connecting them was of critical importance in maintaining specific DNA-binding. A comparison of several linker sequences from different Myb proteins revealed a highly conserved motif of four amino acids in the first half of the linker: LNPE (L138 to E141 in chicken c-Myb R2R3). Substitution of residues within this sequence led to reduced stability of protein-DNA complexes and even loss of DNA-binding. The two most affected mutants showed increased accessibility to proteases, and fluorescence emission spectra and quenching experiments revealed greater average exposure of tryptophans, which suggests changes in conformation of the proteins. From the structure of R2R3 we propose that the LNPE motif provides two functions: anchorage to the first repeat (through L) and determination of the direction of the bridge to the next repeat (through P).
The c-myb proto-oncogene is expressed in a limited range of differentiating cell types and has a critical role in early stages of hematopoiesis (1,2). The c-Myb transcription factor plays a role in the balance between proliferation and differentiation in these cells. Aberrant overexpression of the transcription factor inhibits differentiation of hematopoietic precursor cells (3), while loss of c-Myb expression inhibits their proliferation (4-6).
The c-myb gene encodes a 75 kDa sequence-specific DNA-binding transcription factor with at least three functional domains (2). The DNA-binding domain (DBD) is located near the N-terminus and is a highly conserved tryptophan-rich region composed of three imperfect repeats (R1, R2 and R3), each with a helix-turn-helix related motif (7-9). Each repeat has a distinct function and is folded independently of the others. R3 is tightly folded in solution and mainly responsible for the sequence-specific recognition of the AAC-core in the binding site (9,10). R2 is a more flexible unit with a cavity inside the hydrophobic core (11), and seems to undergo slow conformational fluctuations and specific conformational changes upon binding to DNA (11-14). The function of R1 is thought to be stabilisation of the protein-DNA complex through electrostatic interactions (15-17). The minimal region in Myb giving sequence-specific DNA-binding has been delimited to the R2 and R3 repeats (18-20). The R2R3 of Myb binds to the major groove of DNA continuously (10), similar to transcription factor IIIA-type zinc fingers (21-23).
The majority of eukaryotic transcription factors analysed so far use motifs that insert an α-helix into the major groove of the target DNA to obtain specific binding. However, due to the curving of the DNA surface, a single straight α-helix can only contact 4-6 base pairs, which is too short an interaction surface to achieve the necessary specificity (24). Therefore, sequence-specific DNA-binding usually requires two or more subdomains to get a sufficiently large interaction surface in the major groove. Many transcription factors achieve this by forming homo- or heterodimers, for example leucine zipper proteins, helix-loop-helix proteins and nuclear receptors (25). In addition to the primary function of binding monomers together, the dimerisation domain is also responsible for positioning the two DNA-recognition units at the correct distance and in the correct direction.
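One way to see why a 4-6 base pair contact is too short is a back-of-the-envelope estimate of how often such a site would occur by chance. The sketch below assumes a random sequence with equally frequent bases and a hypothetical genome size of 10^9 bp; both are illustrative assumptions and not values taken from this work, and the function name is ours.

```python
def expected_random_sites(site_length: int, genome_size_bp: float) -> float:
    """Expected chance occurrences of one specific site on one strand,
    assuming independent, equally frequent bases (an idealisation)."""
    return genome_size_bp / (4 ** site_length)


# Hypothetical genome size, chosen only for illustration.
GENOME_SIZE_BP = 1e9

for n in (4, 6, 12):
    hits = expected_random_sites(n, GENOME_SIZE_BP)
    print(f"{n:2d} bp site: ~{hits:,.0f} chance occurrences")
```

Under these assumptions a 6 bp site is expected hundreds of thousands of times, whereas doubling the contacted length to 12 bp reduces this to a few tens, in line with the need for two or more recognition units.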
Certain families of transcription factors bind their target as monomers but still use multiple recognition units to contact DNA. The best studied example is the large family of zinc finger proteins, which bind DNA through covalently linked mini-domains. Another example is the covalently linked tryptophan-rich repeats in the DNA-binding domain of the Myb family of transcription factors. In these transcription factors with repeated motifs, the relative positioning of the mini-domains might be directed by the linker sequences joining them. If so, these linkers would have a role as important as that of the dimerisation domains in dimeric transcription factors. Accordingly, the linkers in these transcription factors contain distinct conserved residues. The evolutionary conservation of these linkers in Myb and zinc finger proteins suggests a distinct functional role in the activity of the protein. Choo and Klug (26) studied the influence on DNA-binding of linker sequences connecting zinc fingers and found that replacement of individual amino acids could reduce binding by factors of up to 24.
In the present work we addressed the question of whether the linkers between the repeats in the Myb proteins might be of similar importance. We report that certain single amino acid substitutions in the linker sequence of c-Myb R2R3 impair the ability of the protein to bind DNA. The weakening or loss of binding is accompanied by changes in conformation of the protein as seen from protease cleavage patterns, fluorescence emission spectra and quenching experiments.
The minimal DNA-binding domain of the chicken c-Myb protein, R2R3, and derived mutants were expressed in Escherichia coli using the T7 system (27). Proteins were purified as described by Gabrielsen et al. (8).
The following mutations were introduced in the R2 region of c-Myb R2R3 by site-directed in vitro mutagenesis: T96R, Q101A, R102A, E105A, PK112/113A, H121A, K123A, R125A, Q129A, R131A, R133A and N136A, and in the linker region: L138A, N139A, P140A, P140G and E141A (amino acids numbered from the first residue in the chicken c-Myb protein). The mutants Q101A, R102A, Q129A, R133A, N136A and N139A have been described previously (8).
DNA-binding was monitored by the electrophoretic mobility shift assay (EMSA) (28), with the modifications described in Gabrielsen et al. (8). The MRE probe was a 23mer duplex oligo derived from the c-Myb recognition element (MRE) in the upstream region (site A) of the mim-1 gene (29):
5'-GCATTATAACGGTTTTTTAGCGC-3'
3'-CGTAATATTGCCAAAAAATCGCG-5'
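As a quick consistency check on the probe, the sketch below confirms that the two printed strands are complementary, that the duplex is a 23mer, and where the AAC core sits on the top strand. The helper names are ours and purely illustrative.

```python
TOP_STRAND = "GCATTATAACGGTTTTTTAGCGC"      # written 5'->3'
BOTTOM_STRAND = "CGTAATATTGCCAAAAAATCGCG"   # written 3'->5', as printed above

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def complement(seq: str) -> str:
    """Base-by-base complement, without reversing the strand."""
    return seq.translate(COMPLEMENT)


def probe_summary(top: str, bottom: str) -> dict:
    return {
        "length": len(top),
        # The bottom strand is printed 3'->5', so it should equal the
        # plain complement of the top strand.
        "strands_pair": complement(top) == bottom,
        # 1-based position of the first AAC triplet (0 if absent).
        "aac_core_start": top.find("AAC") + 1,
    }


summary = probe_summary(TOP_STRAND, BOTTOM_STRAND)
print(summary)
```

For the sequences as printed, the strands pair along their full length of 23 bases and the AAC core begins at position 8 of the top strand.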
Trypsin and chymotrypsin solutions were made fresh each time from powder dissolved in TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0). Purified c-Myb proteins were diluted in TE buffer and incubated with or without protease at 37°C for 15 min, and the reaction was stopped by addition of SDS loading buffer. The samples were heated at 95°C for 2 min and loaded on a 10-20% polyacrylamide gradient gel containing SDS according to Laemmli (30).
A Perkin-Elmer LS-50B Luminescence Spectrometer and a Perkin-Elmer Luminescence Spectroscopy Cell of 120 µl were used for the fluorescence experiments. The excitation wavelength was 295 nm, and the excitation slit 15 nm. Emission spectra were recorded between 310 and 400 nm with an emission slit of 5 nm and a scanning speed of 500 nm/min. Each recording was made as an average of three accumulated scans. Samples were prepared from purified proteins in TE buffer at a concentration of 1-2 µM. In the quenching experiments (31), each protein was measured at its emission maximum. Immediately before recording, 126 µl diluted protein was added to a test tube containing 14 µl of either TE buffer or 1.5, 3, 4.5, 6 or 7.5 M acrylamide solution to obtain final concentrations of 0, 0.1, 0.2, 0.3, 0.4 and 0.5 M acrylamide. Fluorescence values were corrected with respect to background fluorescence and the inner filter effect of acrylamide according to Parker (32). For denaturation, proteins were incubated with 6 M guanidinium chloride for 20-30 min at room temperature before recording.
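Acrylamide quenching series of this kind are commonly analysed with the Stern-Volmer relation, F0/F = 1 + K_SV[Q], where a larger K_SV is generally taken to indicate more solvent-accessible tryptophans. The text does not state which analysis was applied, so the sketch below is only an illustration under that assumption. The intensities are synthetic values generated from a made-up K_SV, not measurements from this study, and the function name is hypothetical.

```python
def stern_volmer_constant(quencher_molar, intensities):
    """Least-squares slope through the origin of (F0/F - 1) versus [Q].

    The first intensity must be the unquenched sample ([Q] = 0).
    """
    f0 = intensities[0]
    y = [f0 / f - 1.0 for f in intensities]
    numerator = sum(q * yi for q, yi in zip(quencher_molar, y))
    denominator = sum(q * q for q in quencher_molar)
    return numerator / denominator


# Final acrylamide concentrations as stated in the text (M).
ACRYLAMIDE_M = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5]

# Synthetic, already-corrected intensities built from an assumed K_SV.
ASSUMED_KSV = 5.0  # M^-1, illustrative only
synthetic_f = [100.0 / (1.0 + ASSUMED_KSV * q) for q in ACRYLAMIDE_M]

ksv = stern_volmer_constant(ACRYLAMIDE_M, synthetic_f)
print(f"Recovered K_SV = {ksv:.2f} M^-1")
```

Comparing the fitted constants of wild-type and mutant proteins, and of native and guanidinium-denatured samples, would be one way to express differences in average tryptophan exposure on a common scale.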
In a screening study of a series of single amino acid substitutions dispersed throughout the second repeat (R2) in the DNA-binding domain of chicken c-Myb and the linker connecting the second and third repeats, we observed two regions especially affected by mutations. When performing simple EMSA studies on crude bacterial extracts containing these mutants, we found that mutations in the third α-helix of R2 abolished or severely reduced DNA-binding (Fig. 1), as would be expected from previous reports (7,8). However, we also observed that one mutation in the linker connecting R2 and R3 abolished binding (P140G) while another did not (N139A) (Fig. 1). This suggested that the linker region between the repeats R2 and R3 might have an equally important role in binding as the third helix of R2, and we decided to investigate more closely the importance of the linkers.
To compare linker sequences for the conserved residues, we defined the 'linker' sequence of the c-Myb DNA-binding domain to span from the end of the third α-helix in one repeat to the beginning of the first α-helix in the next repeat (for chicken c-Myb that is N136 to T148), assuming conservation of these secondary structure elements. A comparison of linkers connecting R1 and R2 and linkers connecting R2 and R3 in Myb proteins from humans, higher mammals, frog, plants, insects and yeast gave the following conserved linker residues: x x L(100) N(82) P(100) E(67) x x K(97) x x W(97) T(97) (numbers in parentheses referring to percent conservation). This shows a highly conserved sequence of four amino acids in the first half of the linker, LNPE, and a single conserved lysine closer to R3. In chicken c-Myb the conserved amino acids are L138, N139, P140, E141 and K144. The 3-D location of the LNPE motif according to the NMR structure of Ogata et al. (10) is shown in Figure 2. Frampton et al. (7) performed a mutational analysis of c-Myb R2R3 in which they substituted K144 with isoleucine and observed a severe decrease in binding. They also replaced the entire triplet NPE (139-141) with RRK, found in the C-terminal part of R3, and observed a complete loss of DNA-binding.
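The percent conservation figures in the consensus above can be obtained by counting residues column by column in an alignment of linker sequences. The following is a minimal sketch of that calculation. The four 13-residue sequences are invented placeholders, not real Myb linkers, and the 60% cut-off used to print "x" is an arbitrary assumption for display only.

```python
from collections import Counter


def column_conservation(aligned_linkers):
    """Return (most common residue, percent of sequences) for each column."""
    n = len(aligned_linkers)
    result = []
    for column in zip(*aligned_linkers):
        residue, count = Counter(column).most_common(1)[0]
        result.append((residue, 100.0 * count / n))
    return result


# Invented placeholders of equal length, NOT real Myb linker sequences.
TOY_LINKERS = [
    "AALNPEAAKAAWT",
    "GGLNPEGGKGGWT",
    "SSLDPQSSKSSWT",
    "TTLNPDTTRTTWT",
]

conservation = column_conservation(TOY_LINKERS)
for position, (residue, percent) in enumerate(conservation, start=1):
    label = residue if percent >= 60 else "x"  # arbitrary display cut-off
    print(f"{position:2d} {label} {percent:5.1f}%")
```

With real sequences, the input would be the linkers extracted between the third helix of one repeat and the first helix of the next, as defined in the text, aligned to equal length.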
In this work we have investigated the importance of the linker sequence between the R2 and the R3 repeats in the DNA-binding domain of the chicken c-Myb protein. We have shown that certain mutations in the highly conserved LNPE motif in the first half of the linker severely affect the protein's ability to bind DNA. Mutations of the invariant residues L138 and P140 were found to have great impact on both DNA-binding and conformation of the protein. Point mutation of the highly conserved K144 was shown previously to severely affect DNA-binding as well (7).
The Myb protein is a transcription factor with a repeated DNA-binding motif. The DNA-binding domain is made up of three imperfectly repeated mini-domains, R1, R2 and R3, connected by short linker sequences. We see only one reasonable explanation for why the linker sequence should be of critical importance for DNA-binding: that it has a role in positioning the two DNA-interacting subdomains, R2 and R3, relative to each other to achieve an optimal DNA-binding surface. This explanation would define the linker as a functional analogue of the various dimerisation domains found in transcription factors forming homo- or heterodimers, since many of them also have a critical role in mediating an optimal relative orientation of the DNA-binding monomers. We have not provided conclusive evidence for this hypothesis, but all our observations can easily be rationalised within such a framework. The high conservation of specific residues would not have been expected if the linker were only a passive chain holding two autonomous domains physically together. Our mutational analysis also demonstrates that these residues are conserved because they are necessary to achieve stable protein-DNA complexes. Furthermore, increasing the flexibility of the linker by replacing a stiff proline residue with a glycine abolished DNA-binding. In our collection of nearly 20 single amino acid replacements in R2R3 this is the only mutation outside the recognition helices that has such a large negative effect on DNA-binding.
An NMR-derived 3-D model of the mouse R2R3-DNA complex allows us to rationalise our observations. Displaying this structure revealed that the leucine in position 138 in the linker of R2R3 seems to make close contact with the hydrophobic core and thereby the tryptophans of R2 (Fig. 2). It appears to be a kind of last anchoring, and then the proline in position 140 changes the direction of the peptide chain, which makes the linker break away from R2. A comparison of a wide range of linker sequences from Myb proteins in many organisms confirmed that these two amino acids are invariant in all linker sequences we analysed. The loop from glutamate 141 to serine 146 has some flexibility, as Ogata et al. (11) observed when measuring NMR relaxation parameters. Accordingly, the relative orientation of R2 and R3 is not fixed in solution, while upon binding to DNA their relative orientation becomes fixed (34). At the R3 side the flexibility ends with the docking of the tryptophan and the start of the first helix (E149). To keep our hypothesis consistent with these observations we have to assume that the conserved residues in the linker limit the relative orientation of the two subdomains to an allowed range of conformations and that upon DNA-binding the final fixed orientation is achieved.
Mutations in the four conserved amino acids LNPE in the first half of the linker sequence of R2R3 give rise to specific protein-DNA complexes with great variation in binding affinity and stability. The L138A mutant bound to the MRE probe with approximately one fourth the strength of the wild-type protein, and the complex was severely destabilised under competitive conditions. The mutant was also more sensitive to proteases, and together with fluorescence emission and quenching studies this suggests that the mutation had caused an increase in the average exposure of the peptide chain and the tryptophans to the solvent. These changes might be due to the alanine, with its smaller side chain, failing to make as extensive contact with the hydrophobic core and the tryptophans of R2 as leucine does, thereby leaving one tryptophan more exposed to solvent than in the wild-type R2R3 protein.
Two mutations were made in position 140: P140A and P140G. The idea was that a glycine residue would be the only amino acid able to mimic the conformation of the peptide bond made by proline, thereby letting the linker bend in the same way as in the wild-type protein. However, the P140G mutant was found to be unable to bind to the MRE, while the P140A mutant bound to the MRE but with reduced complex stability. Studies of the conformation revealed that the alanine mutant had the same average exposure of tryptophans as the wild-type protein, and the same degree of quenching, from which we conclude that the overall conformation is not severely altered. The glycine mutant, on the other hand, had a much higher average exposure of tryptophans, with increased accessibility to proteases and to quenching by acrylamide, which indicates changes in conformation of the protein. Because glycine lacks a side chain, the protein should have increased freedom of rotation in the mutated region, and the impact that this has on the overall conformation may be too severe for the protein to be able to recover an active DNA-binding conformation. The linker might simply be too flexible and leave the protein outside the acceptable range of conformations, resulting in loss of function and failure to position R2 and R3 correctly for DNA-binding.
The two other mutants, N139A and E141A, corresponding to highly conserved but not invariant positions, were practically unchanged with regard to DNA-binding and protein conformation, except for a slightly increased off-rate seen with N139A.
Zinc finger proteins represent another family of transcription factors besides c-Myb in which the DBD is built up from imperfect repeats. Choo and Klug (26) reported evidence for the importance of the conserved linker sequences between the individual zinc fingers. Upon DNA-binding the zinc fingers must be positioned correctly for continuous binding, each finger contacting 3 base pairs in the major groove in a simple and repetitive manner. When heterologous linkers were inserted, binding was abolished, and single amino acid replacements led to severe reduction of the binding activity. Their results are similar to those obtained with the c-Myb linkers in our study, and clearly suggest that in these transcription factors the linkers are not merely random sequences linking together autonomous domains, but also have an active role in the overall conformation and function of the proteins.
This work was supported by The Norwegian Research Council, The Norwegian Cancer
Society, the Odd Fellow Foundation for Medical Science and the Anders Jahres
Foundation.
REFERENCES

