ABSTRACT
Efficient transcription from the human immunodeficiency virus (HIV) promoter
depends on binding of the viral regulatory protein Tat to a cis-acting RNA regulatory element, TAR. Tat binds at a trinucleotide bulge located near the apex of
the TAR stem-loop structure. An essential feature of Tat-TAR interaction is that the protein induces a conformational
change in TAR that repositions the functional groups on the bases and the
phosphate backbone that are critical for specific intermolecular recognition of
TAR RNA. We have previously determined a high resolution structure for the bound form of TAR RNA using heteronuclear NMR. Here, we describe a high
resolution structure of the free TAR RNA based on 871 experimentally determined
restraints. In the free TAR RNA, bulged residues U23 and C24 are stacked within
the helix, while U25 is looped out. This creates a major distortion of the
phosphate backbone between C24 and G26. In contrast, in the bound TAR RNA, each
of the three residues from the bulge is looped out of the helix and U23 is
drawn into proximity with G26 through contacts with an arginine residue that is
inserted between the two bases. Thus, TAR RNA undergoes a transition from a
structure with an open and accessible major groove to a much more tightly
packed structure that is folded around basic side chains emanating from the Tat
protein.
The human immunodeficiency virus type-1 (HIV-1) Tat protein activates transcription from the viral long terminal repeat by stimulating elongation efficiency of RNA polymerase II (1-7). Tat is introduced to the transcription complex following binding to a cis-acting RNA regulatory element called TAR, the trans-activation response element (8-14). Tat acts analogously to the bacteriophage λ N anti-terminator protein and creates a modified transcription complex by binding to the elongating RNA polymerase together with TAR RNA and cellular co-factors (2,7,15,16).
The TAR RNA element is positioned immediately after the transcription start site
(nt +1 to +59) and forms a stable hairpin structure (8,12,17,18). Tat binds in the region of a three base (UCU or UUU) bulge and recognises both the identity of adjacent Watson-Crick base pairs and the positions of surrounding phosphate groups (8,10-12,19-25). The interaction between Tat and TAR is essential for viral growth; mutants in TAR with reduced affinity for Tat are unable to replicate efficiently (26,27).
Tat belongs to the `basic domain' family of RNA binding proteins (28). Members of this family, which also includes HIV-1 Rev protein and the bacteriophage λ N protein, carry arginine-rich sequences that mediate their interactions with RNA (19,28,29). Upon binding to TAR RNA, the basic region of Tat promotes a conformational rearrangement in TAR which places the functional groups recognised by the protein in a specific spatial arrangement (18,30,31). Evidence that the conformational change in TAR RNA is essential for specific Tat binding comes from the observation that mutations in TAR that result in the largest decreases in Tat affinity are associated with residues G26 and U23, the two bases mediating the conformational change (8).
Using heteronuclear multidimensional NMR, we have previously determined the structure of a 29 nt fragment that contains the top part of the
TAR stem-loop bound to a 36 residue peptide which contains the essential basic and core regions of Tat (31). In the present work, we have used 871 NMR-derived experimental restraints to determine the structure of the free HIV-1 TAR fragment. In contrast to the compressed structure adopted by
the bound form of TAR RNA, the bulge and adjacent base pairs in the free TAR
RNA structure are found in an exceptionally wide and accessible major groove.
Comparison of the free and bound structures of TAR also reveals two distinct
mechanisms for accommodating bulged residues within an RNA duplex.
The NMR structure was determined using either a 29 nt RNA oligonucleotide corresponding to the apical region of the wild-type TAR sequence (Fig. 1) or a 27 nt RNA oligonucleotide in which the apical loop sequence of TAR (CUGGGA) was replaced by a stable tetraloop sequence (UUCG). Milligram quantities of TAR RNA were synthesised by in vitro transcription of DNA oligonucleotides in a T7 polymerase system and purified by polyacrylamide gel electrophoresis. The RNA concentrations were determined from the UV absorbance at 260 nm. Preparation of >98% 15N-labelled and 15N-13C-labelled RNA was as described (31-33).
NMR spectra were recorded on a Bruker AMX-500 spectrometer operating at 500 MHz (1H), a 300 MHz Bruker DRX-300 or a Bruker DMX-600 spectrometer operating at 600 MHz and equipped with triple resonance gradient probes. One- and two-dimensional 1H NMR spectra were recorded at 1-2 mM RNA concentrations in 5 mM phosphate buffer, pH 5.5. High quality spectra with the wild-type TAR sequence were obtained only in the presence of ~50 mM NaCl, whereas the tetraloop TAR variant yielded high quality spectra under all conditions.
To determine the structure of the free TAR RNA, essentially the same set of NMR experiments was carried out as in the studies of the TAR RNA-peptide complex (31). Briefly, NMR experiments included a two-dimensional NOESY build-up series, two-dimensional 1H-13C HSQC- and HCCH-TOCSY experiments, three-dimensional 13C-edited HCCH-COSY, HMQC-TOCSY and NOESY-HMQC experiments and a series of 31P-1H correlation spectra. Experimental details concerning NMR data acquisition and processing have been described previously (31).
Hydrogen bonding constraints for base pairs in the TAR RNA stem were based on
distinctive patterns of NOE interactions involving U and G imino resonances,
together with characteristic chemical shift ranges and rates of exchange with
solvent (34). Hydrogen bonding constraints were introduced as distance constraints between heavy atoms (2.9 ± 0.3 Å). No hydrogen bonding constraints were introduced for the loop or bulge regions.
Distance restraints between non-exchangeable protons were obtained from the intensities of cross-peaks in NOE build-up series and from a three-dimensional 13C-edited NOESY spectrum acquired at 150 ms mixing time. Briefly, strong peaks were introduced as 0-3 Å distance constraints, medium peaks as 0-4 Å constraints and weak peaks as 0-5 Å constraints. NOE cross-peaks that were only observable at long mixing times (100-200 ms) were treated as very weak constraints (0-6.0 Å) in order to reduce the possibility that inaccurate restraints arising from spin diffusion could distort the final structure. NOE interactions involving exchangeable resonances were only incorporated with upper boundaries of 5.5 or 6.5 Å (31,35), with the exception of the very strong NOE cross-peaks from Watson-Crick base pairs (involving the A-H2 and U-NH3 resonances or the C-NH2 and G-NH1 resonances), which were introduced as medium constraints (upper bounds 4 Å).
After completion of a first round of calculations, predicted NOEs which were absent from the restraint list were investigated as described (31). We found no example of predicted close contacts (<3.8 Å) that did not generate clear NOE interactions in the NOESY spectra.
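The intensity classes described above map directly onto upper distance bounds. The following is a minimal sketch of that mapping and of how a violation against such a bound could be measured; the function names, the class labels used as dictionary keys and the atom labels in the usage note are choices made for this illustration and are not taken from the X-PLOR input used in the study.

```python
# Upper distance bounds (in Å) for each qualitative NOE class, as stated in the text.
# The key names are labels chosen for this sketch only.
NOE_UPPER_BOUND = {
    "strong": 3.0,
    "medium": 4.0,
    "weak": 5.0,
    "very_weak": 6.0,  # cross-peaks observable only at long mixing times
}


def noe_restraint(atom_i, atom_j, noe_class):
    """Return (atom_i, atom_j, lower, upper) for a non-exchangeable proton pair."""
    upper = NOE_UPPER_BOUND[noe_class]
    return (atom_i, atom_j, 0.0, upper)


def violation(distance, restraint):
    """Amount (Å) by which a model distance falls outside the restraint bounds."""
    _, _, lower, upper = restraint
    if distance > upper:
        return distance - upper
    if distance < lower:
        return lower - distance
    return 0.0
```

For example, `noe_restraint("A22.H8", "U23.H6", "medium")` (hypothetical atom labels) yields a 0-4 Å restraint, and a model distance of 4.3 Å would then count as a 0.3 Å violation.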
The procedure used to derive dihedral angle constraints has recently been described in great detail (31,34,36). Briefly, α (O3'-P-O5'-C5') and ζ (C3'-O3'-P-O5') were constrained only very qualitatively (0 ± 120°) from 31P chemical shifts. 31P chemical shifts are dependent on a number of factors aside from torsion angles (31,34,36-38). Therefore, very qualitative restraints were introduced, and only when standard 31P chemical shift values (between 4 and 5 p.p.m.) were observed. The β (P-O5'-C5'-C4'), ε (C4'-C3'-O3'-P) and γ (O5'-C5'-C4'-C3') dihedral angles were constrained using semiquantitative estimates of coupling constants (31,36). The sugar puckers, identifying the δ (C5'-C4'-C3'-O3') dihedral angles, were constrained using a variety of 1H-1H and 1H-13C scalar couplings to C3'-endo (δ = 85 ± 30°) conformations or were left unconstrained in cases where significant conformational averaging was present.
No dihedral restraints, other than restraints on δ, were incorporated in the bulge region, due to evidence of averaging in the observed patterns of scalar couplings. This is consistent with the observation of enhanced conformational flexibility in this region of the structure (39).
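Because torsion angles are periodic, a restraint of the form target ± tolerance has to be checked with wrap-around at ±180°. The sketch below shows one way to do this for the two restraint ranges quoted above (0 ± 120° for α and ζ, 85 ± 30° for δ). The function and constant names are assumptions made for this illustration; the flat-bottomed form is a common convention and is not claimed to reproduce the exact potential used in the calculations.

```python
def angular_deviation(angle, target):
    """Signed difference angle - target in degrees, wrapped into [-180, 180)."""
    return (angle - target + 180.0) % 360.0 - 180.0


def dihedral_violation(angle, target, tolerance):
    """Degrees by which `angle` lies outside target ± tolerance (0 if inside)."""
    excess = abs(angular_deviation(angle, target)) - tolerance
    return max(0.0, excess)


# (target, tolerance) pairs in degrees, taken from the text.
ALPHA_ZETA = (0.0, 120.0)      # qualitative restraint from 31P chemical shifts
DELTA_C3_ENDO = (85.0, 30.0)   # C3'-endo sugar pucker
```

With these definitions, a δ angle of 120° would violate the C3'-endo restraint by 5°, whereas an α angle of 350° (equivalent to -10°) would satisfy the 0 ± 120° restraint.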
RNA structures were calculated using restrained molecular dynamics followed by
energy minimisation with an all-atom force field using X-PLOR (40). The standard X-PLOR 3.1 angle parameters were modified as described to obtain more realistic sugar puckers (35). RMS deviations between the average structure (calculated using `clusterpose'; 41) and the converged structures were calculated for all atoms and for subsets of the structure.
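As an illustration of the RMS deviation calculation, the sketch below computes mean coordinates over an ensemble and the RMSD of each member from that mean, optionally restricted to a subset of atoms. It assumes the structures have already been superimposed and share the same atom ordering; it does not reproduce the `clusterpose' algorithm, whose details are not given here, and all names are chosen for this example.

```python
import math


def average_structure(structures):
    """Mean coordinates over an ensemble; each structure is a list of (x, y, z)."""
    n = len(structures)
    n_atoms = len(structures[0])
    return [
        tuple(sum(s[a][k] for s in structures) / n for k in range(3))
        for a in range(n_atoms)
    ]


def rmsd(coords_a, coords_b, atom_indices=None):
    """RMS deviation (same units as input) over all atoms or a chosen subset."""
    if atom_indices is None:
        atom_indices = range(len(coords_a))
    idx = list(atom_indices)
    squared = sum(
        (coords_a[i][k] - coords_b[i][k]) ** 2
        for i in idx
        for k in range(3)
    )
    return math.sqrt(squared / len(idx))
```

Passing the atom indices of a stem or of the bulge as `atom_indices` gives the per-region values of the kind reported for subsets of the structure.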
A total of 754 interproton distance restraints derived from NOESY data (448 intranucleotide and 306 internucleotide), corresponding to an average of ~28 constraints/nucleotide, were used in the structure determination. A total of 49 internucleotide restraints were non-sequential, including interstrand restraints involving exchangeable and A-H2 protons. Three hundred and twenty three distance restraints were conformationally redundant, meaning that the interproton distances in question are already restrained by covalent bond length and angle restraints to within values less than those of the NMR restraints. Although the presence of these restraints does not affect the final outcome (C.Gubser, personal communication), it is nevertheless important to sift through them carefully to identify important internucleotide NOEs unambiguously. In addition to the interproton distance restraints, 25 hydrogen bonds for the base pairs forming the stem were used. No hydrogen bonding constraints were introduced in the flexible apical loop or within the bulge region. A total of 92 dihedral angles (12 α, 15 β, 17 γ, 19 δ, 17 ε and 12 ζ) were constrained to different degrees of precision as described above. No non-experimental `planarity' restraints were introduced at any stage, so no assumption was made about the planarity of the base pairs.
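The restraint counts quoted above can be checked against the overall total of 871 with a short tally. The sketch assumes that each of the 25 hydrogen bonds contributes one distance restraint, which is what makes the sum agree with the stated total; the variable names are chosen for this illustration.

```python
# Restraint counts as stated in the text.
noe = {"intranucleotide": 448, "internucleotide": 306}
hydrogen_bonds = 25  # assumed: one heavy-atom distance restraint per hydrogen bond
dihedrals = {
    "alpha": 12, "beta": 15, "gamma": 17,
    "delta": 19, "epsilon": 17, "zeta": 12,
}

n_noe = sum(noe.values())             # interproton distance restraints
n_dihedral = sum(dihedrals.values())  # dihedral angle restraints
total = n_noe + hydrogen_bonds + n_dihedral

# Average number of NOE restraints per nucleotide for the 27 nt construct.
noe_per_nucleotide = n_noe / 27

print(n_noe, n_dihedral, total, round(noe_per_nucleotide, 1))
```

The tally gives 754 NOE-derived restraints, 92 dihedral restraints and 871 restraints in total. The quoted average of ~28 constraints/nucleotide matches division by 27 nucleotides; that this refers to the tetraloop construct is an inference rather than something stated in the text.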
A total of 60 structures with random backbone torsion angles were generated as initial models for structure determination. Structure calculations were then done in three successive steps (31,34,35,42,43). Converged and non-converged structures were distinguished by the number and total energy of constraint violations. For the best 20 converged structures, fewer than five distance violations of between 0.1 and 0.2 Å were found, whereas violations of >0.3 Å were observed in the non-converged structures (21 onwards; Fig. 4). No violations of dihedral angle constraints of >2° were observed for the converged structures. The overall energy was also significantly higher (by at least 15 kcal/mol) for the non-converged structures. RMSD values were obtained from energy-ordered RMSD profiles (44).
The binding site for Tat is centred around a three nucleotide bulge found near the apex of the TAR RNA stem-loop (Fig. 1). Extensive mutagenesis and chemical probing studies have defined the key functional groups on TAR RNA that are required for high affinity Tat binding (8,9,11-13,21-25,45,46). The first residue in the bulge must always be a uridine (U23), but the other residues in the bulge, C24 and U25, appear to act predominantly as spacers and may be replaced by other nucleotides, or even by non-nucleotide linkers (8,10,11,13,45). Tat recognises the identity of two base pairs in the stem above the U-rich bulge, G26·C39 and A27·U38 (8,21). Critical phosphate contacts involve phosphates P21, P22 and P40, which are located below the bulge on both strands (19,23-25,47).
The Tat binding site can be presented on short oligoribonucleotides that carry the sequence of the apical portion of TAR RNA, as well as on short duplexes that span the bulge region but lack the apical loop (8,23,25,45). The NMR studies described here used singly (15N) and doubly (15N-13C) labelled 29 residue ribo-oligonucleotides containing the 3 nt UCU bulge and the apical loop (Fig. 1) (31). For ease of assignment and spectral simplification, data were also obtained from a 15N-labelled TAR RNA sample in which the wild-type apical loop (CUGGGA) was replaced by a stable tetraloop sequence (UUCG) (31). The tetraloop substitution leads to only a small decrease in Tat binding affinity (8).
Qualitative analysis of the free TAR RNA spectra shows that the stems
surrounding the UCU bulge form a stable double helical structure. Sharp imino
proton resonances for U and G residues within the stem regions are observed at chemical shifts characteristic of Watson-Crick base pairs. Strong evidence for base pairing in the two double helical regions is also provided by prominent U-H3-A-H2 and G imino-C amino NOE interactions. However, no such NOE cross-peaks are found corresponding to the A22·U40 base pair immediately below the bulge, suggesting that, in contrast to the bound TAR RNA structure (31), this base pair may be unstable in the free TAR RNA structure.
Across from the bulge strand, residues C39 and U40 produce each of the
sequential NOEs found in helical A-form structures. However, the sequential NOE interaction between U40 H6
and C39 H1' is unusually strong, indicating that this is a region where the helix
becomes significantly distorted.
Base-sugar connectivity pathways typical of A-form RNA, including both anomeric and other sugar resonances, are
observed in the bulge region up through residue C24. As an example, sequential aromatic-H1' NOE interactions are shown for residues A22-C24 in Figure 2. Further evidence that base stacking continues from A22 through the first two residues of the bulge (U23 and C24) comes from aromatic-aromatic NOE interactions between residues A22 and U23, and sugar-sugar cross-peaks between A22 and U23 and between U23 and C24. Nine sequential NOE cross-peaks were observed between residues U23 and C24 and eight cross-peaks were found between residues A22 and U23. These NOE interactions demonstrate that the first two residues in the bulge are stacked continuously over A22.
The structure of free TAR RNA was calculated using a set of 871 experimental
constraints and procedures successfully applied to the determination of the
structure of the bound TAR RNA (31). Calculations began with fully randomised starting structures. Extensive restrained molecular dynamics simulations at high temperatures were performed in order to obtain a global fold of the RNA that was consistent with the experimental data. Subsequent steps included restrained molecular dynamics and energy minimisation at decreasing temperatures (31,34,42,43). A final energy minimisation in the presence and absence of electrostatic
interactions was carried out in order to evaluate whether calculated structures also had satisfactory van der Waals and electrostatic
contacts.
Figure 3 shows a major groove view of a representative structure for the free TAR compared with a similar view of the bound TAR structure. In both structures,
the helical regions adjacent to the bulge (residues G18-G21, C41-C44 and G26-C29, G36-C39) show a characteristic A-form geometry with apparent major groove
widths within limits typical for A-form structures. However, the three base TAR bulge and the neighbouring residues define a pocket of major groove accessibility in the free TAR RNA structure that is clearly more extensive than the
comparable regions in either the bound TAR RNA structure or in A-form RNA.
Of 60 calculations initiated with different randomised starting structures,
eight failed to reach completion altogether due to unacceptable violations of
covalent geometry. The remaining structures were analysed in terms of energies
of violations of the NOE restraints and overall energies, including energies associated with constraints on covalent geometry and, after final refinement,
electrostatic and van der Waals energies. These 52 structures were ordered according to their agreement with the data as defined by the total energy of NOE violations (E_NOE) (Fig. 4). Similar results are obtained when the total energy (E_tot) is used to order the structures (34). As shown in Figure 4, E_NOE increases dramatically after structure 29. Therefore, structures 30 onwards can be considered to be non-converged. The separation between structures 1-20 and 21-29 is also clear, though less sharp than for 30 onwards. Within the first 20 structures, there are no violations of NOE constraints >0.2 Å or dihedral angle restraint violations >2°. The next set of nine structures (21-29) shows significantly higher energies of NOE violations. The structures in this group each showed one to three NOE constraint violations of 0.2-0.5 Å.
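The energy-ordering step can be illustrated with a short sketch that sorts structures by their NOE violation energy and splits the list at the largest jump between consecutive values. This is one simple way of automating such a cut and is an assumption made for this illustration: the cut-offs reported here were identified from the energy profile in Figure 4, not by this procedure, and the function name is hypothetical.

```python
def split_at_largest_jump(energies):
    """Sort energies and split them where consecutive values differ the most.

    Returns (low_energy_group, high_energy_group). Requires at least two values.
    """
    ordered = sorted(energies)
    jumps = [ordered[i + 1] - ordered[i] for i in range(len(ordered) - 1)]
    cut = jumps.index(max(jumps)) + 1
    return ordered[:cut], ordered[cut:]
```

Applied to made-up values such as `[1.0, 1.2, 0.9, 1.1, 9.5, 10.2]` (not data from this study), the function places the four low values in the first group and the two high values in the second. A profile with two distinct steps, like the one described above, would need the split to be applied twice or inspected by eye.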
Superpositions of the 20 best structures are shown for the stem regions in
Figure 5. RMS deviations relative to the average structure are similar for the two stem regions (~1 Å), a value that is probably close to the current realistic limit for precision of RNA structure determination (34). The bulge region (residues G21-C24) is less precisely defined than the stem regions (the overall RMS deviation about the average structure for residues G21-C24 is just under 2 Å), but continuous helical stacking is observed from residues G21 to C24 in 23 out of the 29 lowest energy structures. Including structures 21-29 in this analysis only slightly increases the RMS deviations from the average structure for the stem regions and does not increase the RMS deviations in the bulge region.
Twenty three of the 29 lowest energy structures showed continuous stacking from
residue G21 to C24 and are very similar to the representative structure shown in Figure 3A and in Figure 6. In three structures, residue A22 is unstacked and tilted by >45° relative to the plane of the G21·C41 base pair, while partial stacking of U23 on A22 and of C24 on
U23 is still observed. In the remaining three structures, U23 appears sideways
and stacking is disrupted throughout this region. These six alternative
structures each show small NOE violations within the bulge region. Their
presence in the set of converged structures probably reflects the lack of
powerful cross-strand NOE restraints in the absence of base pairing within this region,
although it is difficult to rule out the possibility that these structures
represent transient conformers present in solution.
Bulge motifs are common building blocks of RNA structures and are often present
at sites recognised by RNA binding proteins. NMR studies of the TAR RNA
structure reveal two radically different methods for accommodating bulged
residues within duplex RNA and provide a detailed understanding of the
mechanism of Tat-TAR recognition (18,30,31).
The free TAR RNA structure, reported in this paper, presents a paradigm for the
packing of bulged residues into a continuous RNA helix. The presence of several
bulged residues stacked continuously within one strand forces a physical
separation between the base pairs adjoining the two stem regions on that
strand. To accommodate this distortion, while maintaining the energetically
favourable continuous stacking interactions of the A-form helices from the adjacent regions, some separation between the
residues on the non-bulged strand is also required. This may be related to a discontinuity in the direction of the helix axis, which is observed in hydrodynamic and optical experiments for free TAR (52,53), as well as for other bulged nucleic acids (54-58). Thus, the NMR data are consistent with these experiments, even though the global characteristics of TAR RNA are not defined sufficiently
precisely by the NMR data to provide an exact measure of the degree of bending
induced by the bulged residues.
A second type of distortion results from the difference in twist between the
strand carrying the bulged residues and the opposite strand. In the free TAR
RNA structure, the disparity in twist between the two strands is accommodated
by U25 looping out of the helix and by introducing a negative twist between G26
and C24 (Fig. 6). A slight increase in twist is also observed on the opposite strand, around
residue U40, although the precision of the present study does not allow us to
determine precisely how this increased twist is distributed between residues
C39, U40 and C41.
The bound TAR RNA illustrates a radically different means of adjusting to the
presence of the three base bulge. When basic ligands bind TAR (18,31), the bulged residues U23 and C24 loop out of the helix and a binding pocket is created that places the guanidinium and εNH groups of the arginine within hydrogen bonding distance of functional groups on G26 and U23. The aliphatic side chain of an arginine residue also comes into very close contact with U23 and forms a stacking interaction underneath this bulged residue reminiscent of cation-π interactions observed in proteins and their complexes (59). These interactions, together with a reduction in the energetic strain between
the two stem regions above and below the bulge in the bound structure, appear
to compensate for the loss of energy associated with disruption of the A22-U23-C24 stacking interactions present in the free TAR structure.
The NMR data are clearly consistent with a major conformer for the free TAR
bulge along the lines described above. However, the data also indicate that the
bulge is more flexible than the helical regions or the bound bulge. Thus,
binding of Tat not only induces a substantial conformational rearrangement in
TAR, but also locally produces a more rigid structure. As a consequence of the
flexibility in the bulge region of free TAR, the TAR bulge samples a wider region of conformational space in the absence of ligands than when bound by Tat-derived peptides. In this situation, a method of structure calculation
that assumes one structure rather than multiple conformers does not represent a
description of the full conformational range sampled by free TAR. It is
possible that minor conformers that differ from the predominant one described here are also present. Nevertheless, we believe that such conformers, if present, must be sparsely populated. If two or more major conformers that differ significantly from the predominant conformer were populated, one would expect to find mutually inconsistent NOE interactions
leading to severe violations of experimental restraints. This is clearly not
the case. Alternative structures resulting from the calculations represent
local minima which happen to fit the data reasonably well but may or may not
occur in solution. Furthermore, there are small restraint violations observed for these structures concentrated within the bulge region.
In an early study, Weeks and Crothers (12) demonstrated that the N7 groups found on the purines adjacent to the TAR RNA bulge are unusually accessible to chemical modification by DEPC and DMS. In agreement with the chemical modification data, the NMR structures reported here also demonstrate that the TAR RNA bulge region is surrounded by a widened and accessible major groove (Figs 3 and 6).
How does the open structure at the TAR RNA bulge contribute to Tat binding? By
analogy with DNA recognition by major groove binding proteins, Weeks and Crothers (12) proposed that the widened major groove created by the bulged residues in TAR RNA facilitates Tat recognition by permitting the insertion of large and comparatively rigid protein structural elements, such as α-helices. However, this model is inconsistent with the NMR data showing that the major groove in the bound TAR RNA structure is not wide enough to allow penetration by such elements (18,31). Instead, TAR RNA appears to fold itself around the basic region of Tat, which exists as an extended and disordered chain in solution (60), though some α-helical tendency appears within the basic region of an HIV-EIAV Tat hybrid peptide (61). A similar mechanism occurs when basic peptides derived from Tat bind to TAR RNA (8,12,14,46,62,63). These short, basic peptides are unfolded in solution, but show new intrapeptide NOE cross-peaks after binding to TAR RNA (31).
The conformational change in TAR RNA repositions the P(21-22), P(22-23) and P(40-41) phosphates, which provide energetically important contacts with Tat (25), around the arginine binding pocket on one surface of the TAR RNA structure (Fig. 3). Model building suggests that these phosphates can be easily contacted by
other basic residues found in the TAR RNA binding region (data not shown).
Contacts between these phosphates and amino acid side chains from Tat
contribute not only to the affinity of the interaction, but also to its
specificity, by providing discrimination with respect to other bulged RNA
structures. The Tat-TAR interaction therefore provides a clear example of the `indirect readout' of nucleic acid sequences through recognition of backbone phosphates
whose positions are uniquely defined by the RNA structure. The importance of
the conformational change in TAR for Tat binding is confirmed by the
observation that the mutations that produce the most severe reductions in TAR
activity involve G26 and U23 and disrupt the intermolecular interactions that
are responsible for the folding transition (8,12). Thus, although the widened major groove in the free TAR structure may
facilitate initial protein binding events, it is clear that high affinity Tat
binding to TAR is due to the refolding of TAR RNA around an extended and
unstructured basic domain from Tat, rather than being due to the insertion of a
large and stable protein structural element into a large pre-formed RNA binding site.
The interactions between Tat and TAR, as described in detail above, are critical
for virus replication. For example, replication of HIV is strongly inhibited by
over-expression of TAR RNA `decoy' sequences which act as competitive
inhibitors of Tat binding (64-66). Thus, it seems likely that small molecules that inhibit Tat binding to its
recognition site will also have antiviral activity. Two strategies for drug
design are suggested by the structural studies on TAR RNA. First, small
molecules that target the bound conformation of TAR RNA are expected to behave
as competitive inhibitors of Tat binding. An alternative strategy is to target
the unbound TAR structure. A small molecule that is able to bind free TAR RNA
with high affinity and block the conformational change could also be an
effective inhibitor of the Tat-TAR interaction and, consequently, of HIV growth.
See supplementary material available in NAR Online.
We thank Andres Ramos and Dr Mike Gait for helpful discussions and Brian Wimberly for use of the program `backwheel' for generating plots of torsion angles. This work was partially supported by a grant from the MRC Aids Directed Program (940 5057).