* To whom correspondence should be addressed
ABSTRACT
Anomalous expansion of the DNA triplet (CTG)n causes myotonic dystrophy. Structural studies have been carried out on (CTG)n repeats in an attempt to better understand the molecular mechanism of repeat expansion. NMR and gel electrophoretic studies demonstrate the presence of hairpin structures for (CTG)5 and (CTG)6 in solution. The monomeric hairpin structure remains invariant over a wide range of salt concentrations (10-200 mM NaCl), DNA concentrations (micromolar to millimolar in DNA strand) and pH (6.0-7.5). The (CTG)n hairpin contains three bases in the loop when n is odd and four bases when n is even. For both odd and even n the stacking and pairing in the stem remain the same, i.e. two hydrogen bond T·T pairs stack with the neighboring G·C pairs. All the nucleotides in (CTG)5 and (CTG)6 adopt C2'-endo, anti conformations. Full-relaxation matrix analysis has been performed to derive the NOE distance constraints from NOESY experiments at seven different mixing times (25, 50, 75, 100, 125, 200 and 500 ms). NOESY-derived distance constraints were subsequently used in restrained molecular dynamics simulations to obtain a family of structures consistent with the NMR data. The theoretical order parameters are computed for H5-H6 (cytosines) and H2'-H2'' dipolar correlations for both (CTG)5 and (CTG)6 by employing the Lipari-Szabo formalism. Experimental data show that the cytosine in the loop of the (CTG)5 hairpin is slightly more flexible than those in the stem. The cytosine in the loop of the (CTG)6 hairpin is extremely flexible, implying that the dynamics of the four base loop is intrinsically different from that of the three base loop.
The involvement of DNA hairpins in biological processes has been known for several years in both prokaryotic and eukaryotic systems (1). The presence of hairpins is well documented in prokaryotic and eukaryotic replication origins (2-3). Also, hairpins have been shown to be a part of the cruciforms that release superhelical stress in circular DNA and act as putative intermediates in genetic recombination in prokaryotes (4). Nevertheless, it is only recently that we have begun to understand the biological role of DNA hairpins in eukaryotic systems (5). Most striking is the observation of DNA capping at telomeres due to formation of hairpins with G quartet stems (6-7). Also, triple helix-mediated regulation is caused by hairpin folding of the polypurine strand at eukaryotic promoters (8). However, the hairpins formed by the fragile X triplet repeats (CGG)n·(CCG)n are perhaps biologically the most relevant structures, since they explain two major characteristics associated with fragile X syndrome, namely expansion of the repeat and hypermethylation of the CpG island inside the repeat (9).
In this article we report structural studies on DNA hairpins formed by the triplet repeat (CTG)n encountered in myotonic dystrophy (DM). The DM triplets are located on the 3'-untranslated side of the myotonin protein kinase gene, DMPK (10-12). Inordinate expansion of the (CTG)n triplets leads to DM, a progressive neuromuscular disorder (13-14). Here we propose that similar hairpins are also formed in mRNA of the myotonin kinase gene. A few of these RNA hairpins immediately following the termination codon in normal phenotypes ensure efficient termination of transcription by specifically binding to transcription termination factors. However, inordinately expanded 3' (CTG)n triplets in disease phenotypes disable efficient termination of transcription, probably due to the presence of multiple hairpins, which leads to the loss of specific binding to the (CTG)n sequence immediately after the termination codon.
Structural studies are reported for odd and even numbers of repeats (n), i.e. n = 5 and 6 respectively. The choice of n (5 or 6) ensures a stable hairpin with a reasonably long stem, while at the same time allowing a detailed structural characterization by NMR without complications from severe resonance overlap.
The oligonucleotides d(CTG)5,6, d[(CTG)2CAG(CTG)2], d[(CTG)2CCG(CTG)2] and d(CGCTAGCTTGCG) were synthesized by the solid phase phosphoramidite method with an Applied Biosystems synthesizer and purified by passing through a Sephadex column. The product was then ethanol precipitated and lyophilized several times.
Temperature-dependent imino proton profiles were obtained for the temperature range 5-50°C. In most cases the imino proton resonances completely disappeared above 50°C. The UV melting temperatures of (CTG)5,6 hairpins were in the range 54-58°C for 10-200 mM NaCl, pH ~7. pH was varied between 6 and 9 in order to examine the nature and the susceptibility of different base pairs to open-close reactions.
All the NMR experiments were performed on a Bruker AMX-500 spectrometer at the Los Alamos National Laboratory (Los Alamos, NM) or a UNITY-500 spectrometer at Iowa State University (Ames, IA). Chemical shifts were measured with reference to 3,3,3-trimethylsilylpropionate as an internal standard. One-dimensional proton spectra in H2O:D2O (9:1) were recorded using the jump-return (JR) method (15), keeping the excitation maximum near the base paired imino proton resonances of G·C and the null at the H2O resonance. In all the NOESY experiments in D2O the HDO signal was presaturated during 80% of the relaxation delay and 20% of the mixing time. The mixing times ranged from 25 to 500 ms (25, 50, 75, 100, 125, 200 and 500 ms). The saturation power was optimized to give minimum bleaching of the resonances close to the HDO signal. The DQF-COSY spectra were recorded with a modified phase cycling scheme (16). In all the two-dimensional experiments, except for NOESY with JR detection (JR-NOESY), 2048 data points in the t2 and 1024 in the t1 dimension were collected; 512 data points in t1 were collected for JR-NOESY. All the two-dimensional experiments were done in phase sensitive mode (17).
The NMR data were processed on a Silicon Graphics workstation (Indigo2) with Felix software (version 2.3; Biosym Technology Inc.). A shifted square sine-bell function (shift of 70°) was used in both dimensions for all the two-dimensional NOESY and JR-NOESY data. The same window function, but with a shift of 85°, was used for processing the DQF-COSY data with 2048 points in the t2 and 1024 in the t1 dimension. The volumes of the NOESY cross-peaks were obtained by the integration routine in the Felix software. Deconvolution of the imino proton spectra was also done by the Felix software.
The nature of base pairing patterns was identified by chemical shift values of
the imino protons, temperature-dependent imino proton profiles and the NOE profiles of the imino protons.
Glycosyl torsions and sugar puckers were deduced from the NOE intensities between base (H8/H6) and sugar (H1', H2'/2'', H3', H5'/5'') protons and the H1'-H2' and H2'-H3' J-coupling constants derived from DQF-COSY. A set of average inter-proton distances for pairwise interactions was obtained by performing full-relaxation matrix simulation with the NOE intensities from mixing time-dependent (25-500 ms) NOESY spectra (18). Following the previously described procedure (19), these distance constraints and the base pairing constraints were used in high temperature restrained molecular dynamics (res-MD) and energy minimization calculations to sample a set of low energy hairpin structures consistent with the NMR data of d(CTG)5,6.
The res-MD simulations were performed using AMBER software (version 4.0). Calculations were done in vacuum including all non-bonding pairs and with a dielectric constant of 78.5. All the energy terms were calculated by employing the all-atom force field of Weiner et al. (20). The initial system was equilibrated at 600 K in a 10 ps constant temperature MD simulation. During equilibration the end and the neck base pairs were constrained by the hydrogen bonding potential so that the hairpin structure did not break open into a random coil. The resulting structure at the end of the equilibration period was used as the starting structure for a 200 ps constant energy res-MD simulation. Conformations along the trajectory, one every 2 ps, were collected and energy minimized. Energy minimization relaxed the system to the bottom of the energy basin. A hierarchy of sampled configurations was defined by progressively dividing the structures among clusters in order to distinguish local and global arrangements of atoms. Details of the methodology of both MD and cluster analysis have been previously reported (19).
The cross-relaxation constants (σ) were determined following the method of Macura et al. (21) by fitting the NOE cross-peak intensities scaled by the sum of both the diagonal peak intensities. The apparent correlation times were estimated by the method of bisection (22) using the following equation
\[ \sigma = \frac{56.92}{r^{6}}\left[6J(2\omega) - J(\omega)\right] \tag{1} \]
where
\[ J(\omega) = \frac{\tau_c}{1 + \omega^{2}\tau_c^{2}}. \]
J(ω), τc, r and ω are spectral density, correlation time of the dipolar vector, length of the vector and spectrometer frequency respectively. Distances of 2.45 Å for H5-H6 and 1.79 Å for H2'-H2'' were used to estimate apparent correlation times.
Within the limit of fast internal motion the ratio of cross-relaxation constants is expressed as (22)
\[ \frac{\sigma_1}{\sigma_2} = \frac{S_1^{2}\, r_2^{6}}{S_2^{2}\, r_1^{6}} \tag{2} \]
where the subscripts 1 and 2 denote the dipolar vectors under consideration and S² represents the order parameter. The order parameters for the H5-H6 dipolar vector of C4 in both (CTG)5 and (CTG)6 are assumed to be 1.00 and the order parameters of H5-H6 of the remaining cytosines and H2'-H2'' from the sugars are estimated using equation 2.
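As a worked illustration of equation 2, the sketch below rearranges it to S1² = S2² (σ1/σ2)(r1/r2)⁶ and applies it to a pair of H5-H6 vectors, taking the C4 vector as the reference with S² = 1.00 as stated in the text. The function name, argument names and the choice of Python are illustrative assumptions, not part of the original work, and any number the function returns is only the arithmetic consequence of equation 2, not a value reported by the authors.

```python
def order_parameter(sigma, sigma_ref, r=1.0, r_ref=1.0, s2_ref=1.0):
    """Return S^2 of a dipolar vector from equation 2 (fast internal motion limit).

    sigma, sigma_ref : cross-relaxation constants (s^-1) of the vector of
                       interest and of the reference vector
    r, r_ref         : inter-proton distances of the two vectors (same unit)
    s2_ref           : order parameter assumed for the reference vector
    """
    if sigma_ref <= 0 or r_ref <= 0:
        raise ValueError("reference sigma and distance must be positive")
    # sigma1/sigma2 = (S1^2 r2^6)/(S2^2 r1^6)  =>  S1^2 = S2^2 (sigma1/sigma2)(r1/r2)^6
    return s2_ref * (sigma / sigma_ref) * (r / r_ref) ** 6


# Hypothetical usage with two H5-H6 vectors (both r = 2.45 Å), so the
# distance factor cancels and S^2 reduces to the ratio of the sigmas.
s2_loop = order_parameter(sigma=0.48, sigma_ref=0.61, r=2.45, r_ref=2.45)
```

When the two vectors are of different types (for example H2'-H2'' against the H5-H6 reference), the distances 1.79 Å and 2.45 Å quoted in the text would enter through the (r1/r2)⁶ factor.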
Theoretical computation of S².
The generalized order parameters for different dipolar vectors are computed by the Lipari-Szabo formalism (23,24). In this formalism the time dependence of the dipolar correlation function for internal motion is defined by the following equation
\[ C_I(t) = \frac{1}{5}\left\{P_2[\mu(0)\cdot\mu(t)]\right\} \tag{3} \]
where P2(x) is the Legendre polynomial, P2(x) = (1/2)(3x² - 1). μ(0) and μ(t) are the dipolar vectors at time 0 and t respectively. In order to compute the time dependence of the correlation function MD simulations with only the hydrogen bonding constraints were performed for 600 ps. The structures, one for each 0.05 ps, were collected and employed for integration (equation 3). The value of the correlation function for each time point was averaged over the possible configurations [for example, CI(0) was averaged over all the configurations (N), CI(t) was averaged over N - 1 configurations, CI(2t) was averaged over N - 2 configurations and so on]. The time dependence of the correlation function was approximated to the following function (23)
\[ C_I(t) = S^{2} + (1 - S^{2})\,e^{-t/\tau_e} \tag{4} \]
where τe is an effective correlation time. τe is expressed in terms of τ, the correlation time for internal motion, and τc, the correlation time for isotropic tumbling
\[ \tau^{-1} = \tau_c^{-1} + \tau_e^{-1} \tag{5} \]
Equation 4 has a simple interpretation that at t = 0 the value of CI(t) is equal to 1.0 and at t = ∞ (t >> τe) the value of CI(t) is equal to S².
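The averaging scheme described for equation 3 (lag 0 over N configurations, lag 1 over N - 1, and so on) can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' program: the function names are hypothetical, the dipolar vectors are assumed to have been extracted from the saved MD frames already, the correlation function is normalized so that CI(0) = 1 (matching the interpretation given for equation 4, i.e. the 1/5 prefactor is left out), and S² is estimated simply as the mean of the long-time tail rather than by a fit to equation 4.

```python
import math


def p2(x):
    """Second-order Legendre polynomial, P2(x) = (3x^2 - 1)/2."""
    return 0.5 * (3.0 * x * x - 1.0)


def unit(v):
    """Normalize a 3-vector to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)


def internal_correlation(vectors, max_lag):
    """Return [C_I(0), C_I(dt), ..., C_I(max_lag*dt)].

    vectors : list of 3-vectors, the dipolar vector in each saved frame
    Lag k is averaged over the N - k available pairs of frames.
    """
    u = [unit(v) for v in vectors]
    n = len(u)
    corr = []
    for k in range(max_lag + 1):
        total = 0.0
        for i in range(n - k):
            dot = sum(a * b for a, b in zip(u[i], u[i + k]))
            total += p2(dot)
        corr.append(total / (n - k))
    return corr


def plateau_order_parameter(corr, tail_fraction=0.5):
    """Estimate S^2 as the mean of the tail of C_I(t), where t >> tau_e."""
    start = min(int(len(corr) * (1.0 - tail_fraction)), len(corr) - 1)
    tail = corr[start:]
    return sum(tail) / len(tail)
```

For a rigid vector every lag gives CI = 1 and the estimate is S² = 1; the more the vector reorients within the trajectory, the lower the plateau. With the sampling quoted in the text (600 ps saved every 0.05 ps) the input list would hold about 12 000 frames.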
Figure 1 shows the possible structural forms of (CTG)5,6 and their analogs studied in this work. Note that (CTG)5,6 can adopt either a monomeric hairpin or a mismatched duplex (Fig. 1A-C). The non-denaturing gel electrophoretic mobility data for d(CTG)5,6 in Figure 2 distinguish between these two possibilities. Note that the oligomers d(CTG)5,6 migrate faster than the 10 bp duplex (faster band in lane M). This suggests the presence of unimolecular hairpins. The d(CTG)5 hairpin is expected to migrate like the 7/8 bp duplex, while the d(CTG)6 hairpin should migrate like the 9/10 bp duplex. Also, note that similar gel patterns are observed at two different NaCl concentrations. In addition, hairpins still remain the predominant conformation, even when the DNA concentrations of d(CTG)5,6 are raised from 0.25 to 25 µM (data not shown).
Figure 3 shows the imino proton spectra of d(CTG)5 at 5°C for (A) 2, (B) 0.5 and (C) 0.1 mM and (D) 25 µM DNA strand. The pH and NaCl concentrations were maintained at 6.3 and 10 mM respectively. Note that the spectra remain the same within the 0.025-2.0 mM concentration range; this is typical of DNA hairpins. Resonances around 13.1 and 10.9 p.p.m. account for four Watson-Crick G·C and two T·T pairs respectively. This is consistent with the hairpin of (CTG)5 as shown in Figure 1A. The broad resonance at 12.4 p.p.m. is believed to be due to the minor homoduplex. On the other hand, the imino proton spectrum of (CTG)6 (shown in Fig. 5D-F) is consistent with five G·C and two T·T base pairs, as shown in the hairpin structure of Figure 1B. The maximization of G·C pairs in the stem of the hairpin is achieved for even and odd repeat numbers. The same base pairing in the stem for even and odd repeat numbers requires different loop geometries. For example, (CTG)5 exhibits a (CTG) trinucleotide loop, while (CTG)6 shows a (TGCT) tetranucleotide loop (Fig. 1A and B).
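The loop rule and the base pair counts quoted above can be checked with a short sketch. It assumes a blunt-ended fold in which nucleotide i pairs with nucleotide N + 1 - i (N = 3n), with a three base loop for odd n and a four base loop for even n; the function names are hypothetical and the code is only a bookkeeping aid, not a structure prediction.

```python
def fold_blunt_hairpin(n):
    """Fold d(CTG)n into a blunt-ended hairpin.

    Loop size follows the rule in the text: 3 bases for odd n, 4 for even n.
    Returns (loop_sequence, pairs), where pairs is a list of
    (5'-side label, 3'-side label) tuples such as ("C1", "G15").
    """
    if n < 2:
        raise ValueError("need at least two repeats to form a stem")
    seq = "CTG" * n
    total = len(seq)
    loop_len = 3 if n % 2 else 4
    stem_len = (total - loop_len) // 2
    pairs = []
    for i in range(stem_len):
        j = total - 1 - i  # partner counted from the 3'-end
        pairs.append((f"{seq[i]}{i + 1}", f"{seq[j]}{j + 1}"))
    loop = seq[stem_len:stem_len + loop_len]
    return loop, pairs


def count_pairs(pairs):
    """Return (number of G·C pairs, number of T·T pairs) in the stem."""
    gc = sum(1 for a, b in pairs if {a[0], b[0]} == {"G", "C"})
    tt = sum(1 for a, b in pairs if a[0] == b[0] == "T")
    return gc, tt


for n in (5, 6):
    loop, pairs = fold_blunt_hairpin(n)
    print(n, loop, count_pairs(pairs))
# prints:
# 5 CTG (4, 2)
# 6 TGCT (5, 2)
```

For n = 5 the loop is C7-T8-G9 and the stem holds four G·C and two T·T pairs; for n = 6 the loop is T8-G9-C10-T11 and the stem holds five G·C and two T·T pairs, in agreement with the imino proton counts given in the text.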
In the (CTG)5 hairpin (Fig. 3) the imino signals from the bases in the loop appear at 11.2 p.p.m. This peak sharpens on lowering the pH below 7 and is the first to disappear when the temperature or pH is gradually raised to ~15°C or ~7.5 respectively. The area under this loop signal indicates that the contribution originates from one base (i.e. either G or T). This means that while one of the bases (T8 or G9) in the loop remains excluded from the solvent, the other is constantly in fast exchange with the solvent. The imino protons undergoing fast exchange with the solvent do not give identifiable resonances. In order to identify which base in the loop was excluded from the solvent one-dimensional JR spectra were recorded for (CTG)2CCG(CTG)2 (Fig. 4A) and (CTG)2CAG(CTG)2 (Fig. 4B) under the same solution conditions as for (CTG)5. The loop signal in both analogs is identical to that in the spectrum of (CTG)5 shown in Figure 3A. Since T8 is replaced in these analogs by C or A, neither of which carries an imino proton, this implies that the resonance at 11.2 p.p.m. in Figures 3A and 4A and B originates from the imino proton of G9. However, the presence of a Watson-Crick A·T imino signal at 13.8 p.p.m. (Fig. 4B) in the case of (CTG)2CAG(CTG)2 is indicative of the co-existence of hairpin (Fig. 1D) and slipped homoduplex (Fig. 1E). Note that Watson-Crick A·T pairs are only expected in a slipped duplex, not in a blunt duplex which contains T·T and A·A pairs. This is a clear example of how a single base mutation could considerably change the structural preference.
The imino protons of a two hydrogen bond T·T pair resonate at two different frequencies if they sample two different chemical shift environments, which, in general, depends upon the flanking sequence. For example, in the self-complementary duplex formed by CGCTAGCTTGCG the two imino protons of the T·T pair appear at 10.9 and 10.4 p.p.m. (Fig. 5G; 25). However, in a repetitive DNA structure such as the (CTG)n hairpin (Fig. 1A and B) two imino protons of a T·T pair resonate closer in frequency due to the similarity in their chemical shift environments (Fig. 3A and B). Nevertheless, at lower DNA concentration (0.1 mM) and at 15°C the two signals begin to resolve by ~0.1 p.p.m. Further, important evidence for two hydrogen bond T·T pairs comes from the ratio of the integrated intensity (I.I) of imino proton resonances of G·C pairs to that of T·T pairs. For example, in (CTG)5 I.I(G·C)/I.I(T·T) = 1 if T·T pairs contain two hydrogen bonds and I.I(G·C)/I.I(T·T) = 2 if T·T pairs contain a single hydrogen bond. The experimentally observed ratio I.I(G·C)/I.I(T·T) is equal to 1, indicating the presence of two hydrogen bond T·T pairs. Unfortunately, the spectral overlap of the two imino signals prevents conventional identification of two hydrogen bond T·T pairs through the observation of imino-imino NOEs (25). We therefore compared the exchange properties of the T·T pairs in (CTG)5,6 hairpins and those in the self-complementary duplex of 5'-CGCTAGCTTGCG-3'. Figure 5 shows the imino proton spectra of (CTG)5, (CTG)6 and the self-complementary duplex of CGCTAGCTTGCG at pH 6.03, 7.0 and 8.0. Figure 6 shows the pH-dependent line widths of the imino resonances of T·T pairs for the hairpin structure of (CTG)5 and the duplex of CGCTAGCTTGCG. Note that the imino protons in the two hydrogen bond T·T pairs in the duplex have similar pH dependencies of exchange when compared with those of T·T pairs in the hairpins. This also supports the conclusion that the T·T pairs in (CTG)5,6 hairpins contain two hydrogen bonds.
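The intensity ratio argument is a proton count, which the sketch below makes explicit. It assumes that each Watson-Crick G·C pair contributes one observable imino proton and that a T·T pair contributes one observable imino proton per hydrogen bond (the other exchanging too fast to be seen); this counting rule is our reading of the ratios quoted in the text, and the function name is hypothetical.

```python
def imino_intensity_ratio(n_gc_pairs, n_tt_pairs, tt_hbonds):
    """Expected I.I(G·C)/I.I(T·T) for the stem of a hairpin.

    n_gc_pairs : number of Watson-Crick G·C pairs (one imino proton each)
    n_tt_pairs : number of T·T pairs
    tt_hbonds  : hydrogen bonds per T·T pair (1 or 2), taken here as the
                 number of observable imino protons per T·T pair
    """
    if tt_hbonds not in (1, 2):
        raise ValueError("a T·T pair is modeled with one or two hydrogen bonds")
    if n_tt_pairs <= 0:
        raise ValueError("at least one T·T pair is required")
    return n_gc_pairs / (n_tt_pairs * tt_hbonds)


# (CTG)5 stem: four G·C and two T·T pairs
two_bond = imino_intensity_ratio(4, 2, tt_hbonds=2)  # 4 / (2 * 2) = 1.0
one_bond = imino_intensity_ratio(4, 2, tt_hbonds=1)  # 4 / (2 * 1) = 2.0
```

The observed ratio of 1 for (CTG)5 therefore matches the two hydrogen bond case.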
Two-dimensional JR-NOESY experiments at 150 ms mixing time were performed to identify the Watson-Crick G·C and mismatched T·T pairs of (CTG)5,6 hairpins. As shown in Figures 3 and 5, all four imino protons of G·C pairs come under the same envelope. This prevents sequential assignment of all four signals. However, second order NOEs from the imino protons to H5 of the cytosines (via the amino N4-H of C) help us to identify the cytosines in the G·C pairs. In addition, NOEs between the imino protons of the G·C and T·T pairs (Fig. 7) and between the imino protons of the T·T pairs and the amino N4-H of the G·C pairs are observed; such NOEs are expected only when the T·T pairs in the stem are stacked with the two neighboring G·C pairs.
Figure 8A shows the sequential assignment for the (CTG)5 hairpin in the H8/H6 versus H1'/H5(C) NOESY cross-section for 500 ms mixing time, while Figure 8B shows the sequential assignment of H1', H2'/H2'' spin systems in a DQF-COSY cross-section. Similarly, NOESY and DQF-COSY spectra were recorded for (CTG)6 (data not shown). Additional NOESY and DQF-COSY spectra of H2'/H2''/CH3 versus H3' and H3'/H4' versus H5'/H5'' cross-sections enable the complete sequential assignment of the spin systems H8/H6, H5/CH3, H1', H2'/H2'', H3', H4', H5'/H5'' belonging to all the nucleotides in (CTG)5,6 hairpins.
Comparison of the NOESY and DQF-COSY data reveals that all the constituent nucleotides in (CTG)5,6 hairpins adopt C2'-endo, anti conformations, i.e. the backbone torsion angle δ (for sugar pucker) falls within 110-160° and the glycosyl torsion angle χ falls within 210-270°. Inter-nucleotide distance constraints are present for the proton pairs H8/H6(i)-H1'(i - 1), H8/H6(i)-H5(i + 1), H8/H6(i)-H2''(i - 1), etc.
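The torsion angle windows quoted above translate directly into a per-nucleotide check, for example when screening the structures of a computed ensemble. The helper below is a hypothetical sketch: it uses only the two ranges given in the text (δ within 110-160°, χ within 210-270°) and wraps angles into 0-360° first, so that a χ reported as -110° is treated as 250°.

```python
def is_c2_endo_anti(delta_deg, chi_deg):
    """True if both torsions fall inside the ranges quoted in the text.

    delta_deg : backbone torsion delta in degrees (110-160 for C2'-endo)
    chi_deg   : glycosyl torsion chi in degrees (210-270 for anti)
    Angles are wrapped into [0, 360) before the comparison.
    """
    delta = delta_deg % 360.0
    chi = chi_deg % 360.0
    return 110.0 <= delta <= 160.0 and 210.0 <= chi <= 270.0


# Illustrative inputs (not measured values from this work)
print(is_c2_endo_anti(140.0, 250.0))   # True
print(is_c2_endo_anti(140.0, -110.0))  # True, since -110° wraps to 250°
print(is_c2_endo_anti(85.0, 250.0))    # False, delta is outside 110-160°
```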
Full-relaxation matrix simulation with NOE intensities from mixing time-dependent NOESY spectra produces a set of average inter-proton distances (which defines the initial structure) and a lower and an upper bound for the NOE matched average distances. The single correlation time approximation, as evidenced by experimentally determined cross-relaxation constants and the apparent correlation times (Table 1), was used for the computation of all the relaxation matrix elements. The lower and upper bounds are the result of choosing several different initial guesses for the linked atom least squares refinement procedure (18). These are used for the lower and upper limits of the distance constraint with respect to the corresponding spin pair for the res-MD simulations. An ensemble of structures was isolated from the 200 ps constrained MD trajectory at 600 K and energy minimized either for 2500 conjugate gradient steps or until the root mean square value of the first derivative of energy was below 0.1 kcal/mol/Å. One hundred structures were derived and clustered into conformationally similar structures for both (CTG)5 and (CTG)6. The details of the methodology have been reported elsewhere (19).
Table 1
The initial structures of (CTG)5 for res-MD simulations were constructed for four different models of loop folding: (i) three bases in the 3'-side of the stem; (ii) one base in the 5'- with two bases in the 3'-side of the stem; (iii) two bases in the 5'- with one base in the 3'-side; (iv) three bases in the 5'-side of the stem. Res-MD simulations were done separately for each model. The structures derived from these four models show differences only in the single-stranded loop segment of the hairpins. Figure 9 shows the lowest energy structure for each of the models. In model (i) (Fig. 9A) all the three bases in the loop are stacked in the 3'-side of the stem. In model (ii) (Fig. 9B) T8 and G9 are stacked with each other in the 3'-side while C7 is stacked in the 5'-side of the stem. Model (iii) (Fig. 9C) has G9 stacked in the 3'-side of the stem while C7 and T8 are stacked in the 5'-side of the stem. In model (iv) (Fig. 9D) C7, T8 and G9 are all stacked in the 5'-side of the stem, although G9 is partially flipped out of the stacked array. The small number of inter-nucleotide distance constraints in the loop portion of the hairpin structure (e.g. one constraint for G6 and C7 and two constraints for G9 and C10) does not allow us to rigorously distinguish the four loop stacking possibilities. However, model (i) shows better agreement with the distance constraints in the loop. Figure 10 shows the lowest energy structure for (CTG)6 which satisfies the NMR constraints. The initial structure was constructed with two bases in each side of the stem. T·T pairing appears to be present in the loop of this model, although we do not have any experimental evidence for a T·T pair in the loop. It is possible that the T·T pair in the loop of (CTG)6 opens and closes so fast on the NMR time scale that we could not observe the imino signal. The results of the cluster analysis for (CTG)5,6 will be made available to readers on request.
The derivation of the flexibility pattern through the H2'-H2'' dipolar interaction is more complicated than for H5-H6, since it involves the motional characteristics of the sugar ring and its dependence on many factors, like the nature of the base pair (Watson-Crick or mismatch), the nature of the bases (for example C or G in a G·C pair), the neighboring bases, the extent of stacking, etc. However, the general features are apparent from the theoretical data shown in Figure 11B. In the case of (CTG)5 G3 and G12 are the least flexible, as evidenced by the highest order parameter values. Within the loop the 5' stacking of C7, T8 and G9 makes C7 and T8 less flexible than G9. Sugars corresponding to mismatches are also more flexible than those from Watson-Crick pairs (Fig. 11B; C10-T11-G12). Similar features are also observed for (CTG)6.
NMR and gel electrophoresis data (26) unequivocally demonstrate that (CTG)n triplets form hairpin structures (for n = 5 or 6) over a wide range of solution conditions. Two hydrogen bond T·T pairs are stacked with the two flanking G·C pairs in the stem. For the (CTG)5 hairpin we have explored all four possible types of loop stacking. We observe that a stack of (CTG) in the 3'-side of the stem (Fig. 9A) is most consistent with the NMR data. We have also examined the site-specific mobility of the bases in (CTG)n hairpins. For (CTG)5 the loop cytosine, C7, shows slightly greater mobility than the cytosines in the stem, as judged by the values of σ and τapp (Table 1). However, for (CTG)6 the cytosine in the loop, C10, is extremely flexible. From the hairpin structure of (CTG)6 (Fig. 10) it is apparent that C10 (located at the tip of the loop) is free to sample different configurations without affecting the rest of the structure. Therefore, our structural studies have not only demonstrated that (CTG)n triplets form hairpin structures in solution, but have also completely characterized the structural and dynamic properties of these hairpins. These properties include the stereochemistry of hairpin folding, the conformations of the individual nucleotides in the hairpin, the base pairing in the stem of the hairpin, the stacking of the bases in the stem and loop, the site-specific dynamics of the bases in the stem and loop and the differential open-close reactions of the Watson-Crick G·C and mismatched T·T pairs in the hairpin.
Recently Mitas et al. (28) suggested the possibility of (CTG)n hairpins based upon gel electrophoresis, digestion by single-strand-specific P1 enzyme and chemical modification studies. Although the gross morphology of a hairpin was evident, neither the exact stereochemistry of hairpin folding nor the nature of the T·T pair (i.e. one or two hydrogen bonds) could be directly obtained from their data. In addition, Mitas et al. used (CTG)n sequences flanked by Watson-Crick complementary elements at the 5'- and 3'-ends, which forced hairpin folding of the central (CTG)n sequence. In another independent study Gacy et al. (29) reported NMR and thermodynamic data on long and natural (CTG)25 sequences to show the formation of hairpin structures. However, Gacy et al. also did not report quantitative details of the structure and dynamics of these hairpins. Nonetheless, the simple observation by us (this work; 9,26,27) and others (28-30) that (CTG)n triplets adopt hairpin structures immediately offers a structural basis for hairpin-induced slippage during replication and the subsequent expansion (31,32).
The intrinsic propensity for hairpin formation by the (CTG)n sequence may also manifest itself at the level of mRNA. It is easy to visualize the possible biological role of such an RNA hairpin, especially when the (CTG)n triplet occurs on the 3'-untranslated side of the mRNA (10-12). It has long been demonstrated that a stable hairpin on the 3'-untranslated side of early genes in bacteriophage T3 ensures efficient termination of transcription both in vitro and in vivo (33); such a hairpin is the specific target of protein factor tau. Recently an evolutionarily conserved (from frog to human) hairpin has been located on the 3'-untranslated side of the H2A and H4 genes (34); here again a specific protein complex binds this hairpin to ensure efficient termination of transcription. Therefore, it appears that formation of RNA hairpins by (CTG)n sequences on the 3'-untranslated side of the DMPK gene may either halt the transcription machinery or provide a specific target for protein binding in post-transcriptional mRNA processing. It has recently been reported (35) that precursor mRNAs from normal and DM alleles show no differences. However, post-transcriptional processing of the normal and DM alleles is quite different in that mRNA maturation is severely impaired when (CTG)n triplets are expanded in disease phenotypes. All these data (33-35) agree with our hypothesis that a few (CTG)n hairpins enable the formation of specific RNA-protein complexes required for efficient termination of transcription and for post-transcriptional mRNA processing. This specificity is impaired when the (CTG)n triplets are expanded. The recent claim (36) that increased nucleosomal binding of expanded (CTG)n triplets affects DMPK transcription seems questionable, because if it were true there would be a difference in the levels of precursor mRNA synthesis from normal and DM alleles.
This work was supported by LANL grant XL-77 and the Human Genome Project of the Office of Health and Environmental
Research of the Department of Energy. We thank Ms Sue Thompson for synthesis
and purification of the DNA oligomers. We thank Dr Xian Chen for helping us
with the gel electrophoresis experiments. We are grateful to Dr Cliff Unkefer
for giving us access to the 500 MHz Bruker-AMX NMR spectrometer at CST-4. SVSM thanks Dr R. K. Moyzis for financial support. Portions of this work were presented at the 39th Annual Meeting of
the Biophysical Society, San Francisco, CA, February, 1995.






Interaction   Hairpin   Base position   σ (s^-1) a   τapp (ns)
H5-H6         (CTG)5    C7              0.48         2.1
                        C1/C10          0.53         2.2
                        C4/C13          0.61         2.5
              (CTG)6    C1              0.66         2.7
                        C10             b
H2'-H2''      (CTG)5    T8              0.86         1.0
                        T5/T11/T14      0.58         0.9
              (CTG)6    G9/G18          0.34         0.8
                        C1/C16          0.86         1.0
                        C4/C7           0.80         1.0
                        T2/T11          0.80         1.0
                        C10/T8          0.98         1.1
                        T5/T14/T17      0.46         0.9



REFERENCES
