A revised secondary structure model for the 3'-end of hepatitis B virus pregenomic RNA
Alistair H. Kidd1 and Karin Kidd-Ljunggren1,2,*
1Department of Virology, University of Umeå, 90 185 Umeå, Sweden and 2Department of Infectious Diseases, University of Lund, 221 85 Lund, Sweden
Received June 11, 1996; Revised and Accepted July 11, 1996
ABSTRACT
The polymerase encoded by human hepatitis B virus, which has reverse
transcriptase and RNase H activity, binds to its pregenomic RNA template in a
two-step process involving a terminal redundancy. Both first strand and second
strand DNA synthesis involve primer translocation and second strand synthesis
involves a template jump. Three parts of the genome, including the so-called core promoter, are known to show deletions in strains usually
arising after long-standing HBV infection, but also in some patients treated with interferon.
A computer-based study of RNA template folding in the core promoter region,
accommodating well-known point mutations, has generated a model for the 3' DR1
primer binding site as being part of a superstructure encompassing an
already well-established stem-loop. Depending on the identity of nucleotides 1762 and 1764, the
DR1 region may assume two alternative secondary structures which stabilize it
as a primer binding site to different extents. Remarkably, one of these
structures includes a pronounced loop which coincides with at least 12 related
deletions seen in HBV DNA from different patients. Thus according to the model,
the 5'- and 3'-ends of pregenomic RNA, which share primary sequences but have separate
functions, are not structural equivalents. An RNA superstructure near the 3'-end
of all HBV transcripts could have far-reaching implications for the modulation of both genome replication
and post-transcriptional processing.
INTRODUCTION
Hepadnavirus genomes have a compact organization in which all transcriptional
regulatory signals coincide with open reading frames (ORFs). Almost half of the
3.2 kb partially double-stranded DNA genome of human hepatitis B virus (HBV) has overlapping ORFs.
The ORFs are termed precore/core, polymerase, pre-S/S and X. There are five major unidirectional transcripts, including the
3.5 kb pregenome and a 3.5 kb RNA species termed precore RNA, which is 30
nucleotides (nt) longer (1). In common with precore RNA, pregenomic RNA acts as a
messenger RNA, but it functions additionally as a template for reverse
transcription (Fig. 1). This occurs in immature core particles and encapsidation
of the hepadnavirus pregenome and polymerase together is required for DNA
synthesis (2).
MATERIALS AND METHODS
HBV strains and sequence determination
Several HBV strains used in this study have been described previously (14). As
the X gene/core promoter sequences were already established, sequencing
was extended downstream to include the precore region/encapsidation signal. All
sequencing was performed by the method of Kretz et al. (23) as described
previously (24).
Nucleotide numbering throughout corresponds to that of Okamoto et al. (11). This
differs by 2 nt from the numbering system adopted by Chen et al. (22) for core
promoter analysis.
Phylogenetic analysis
HBV X gene sequences were aligned with those of several known strains (14) using
CLUSTALV (25) and subsequently analysed using the PHYLIP package, version 3.5
(26). The programs SEQBOOT, DNAPARS, CONSENSE and DRAWTREE were used consecutively
as described previously (14).
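The resampling step that precedes parsimony analysis in such a pipeline can be sketched as follows. This is only an illustration of bootstrap resampling of alignment columns, the general idea behind SEQBOOT, and not a reimplementation of PHYLIP; the function name `bootstrap_columns` and the toy alignment are hypothetical.

```python
import random


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    Columns (sites) are drawn with replacement until the replicate has
    as many sites as the original; every sequence keeps the same columns.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    # Draw column indices with replacement.
    cols = [rng.randrange(length) for _ in range(length)]
    return ["".join(row[c] for c in cols) for row in alignment]


if __name__ == "__main__":
    # Toy alignment (hypothetical sequences, not HBV data).
    toy = ["ACGTAC", "ACGTTC", "ATGTAC"]
    rng = random.Random(0)  # fixed seed so the replicate is reproducible
    for row in bootstrap_columns(toy, rng):
        print(row)
```

Each replicate would then be passed to a tree-building step, and the resulting trees summarized as a consensus.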
Computer prediction of RNA secondary structure
Computer models of RNA folding were generated using the MFOLD program, a part of
the GCG package (27), which is based on the revised predictions of Jaeger and
Zuker (28). This program allows consideration of suboptimal folding of RNA. The
on-line facility of M. Zuker (Institute for Biomedical Computing, Washington
University, St Louis, MO; http://www.ibc.wustl.edu/~zuker/RNA/form1.cgi) was used
for the purpose of confirmation, as was RNADRAW (29), a graphical interactive
program based on the Vienna RNA package. All calculations were performed for a
temperature of 37°C.
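The programs named above predict structure by free energy minimization with experimentally derived parameters. As a much simpler stand-in for readers unfamiliar with the approach, the sketch below uses Nussinov-style base-pair maximization, which is a different and cruder technique: it ignores stacking energies, temperature and suboptimal folds. The function names and the minimum hairpin loop size of 3 are assumptions made for illustration.

```python
# Watson-Crick pairs plus the G-U wobble pair.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_table(seq, min_loop=3):
    """Fill dp[i][j] = maximum number of nested base pairs in seq[i..j]."""
    n = len(seq)
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i + 1][j]  # case: base i is left unpaired
            for k in range(i + min_loop + 1, j + 1):
                if (seq[i], seq[k]) in PAIRS:  # case: base i pairs with k
                    right = dp[k + 1][j] if k < j else 0
                    best = max(best, 1 + dp[i + 1][k - 1] + right)
            dp[i][j] = best
    return dp


def fold(seq, min_loop=3):
    """Return one maximum-pairing structure in dot-bracket notation."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    dp = nussinov_table(seq, min_loop)
    structure = ["."] * n
    stack = [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop:
            continue  # interval too short to contain a pair
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
            continue
        for k in range(i + min_loop + 1, j + 1):
            if (seq[i], seq[k]) in PAIRS:
                right = dp[k + 1][j] if k < j else 0
                if dp[i][j] == 1 + dp[i + 1][k - 1] + right:
                    structure[i], structure[k] = "(", ")"
                    stack.append((i + 1, k - 1))
                    if k < j:
                        stack.append((k + 1, j))
                    break
    return "".join(structure)


if __name__ == "__main__":
    print(fold("GGGAAAACCC"))  # toy hairpin, not an HBV sequence
```

Because every allowed pair counts equally here, such a sketch cannot rank alternative structures by stability, which is what the comparison of the two DR1 conformations requires.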
RESULTS
Taking into account that AGG -> TGA changes at positions 1762-1764 in the so-called core promoter are so common and that many natural
deletions arise across this same region (nt 1748-1777), we undertook a computer-based study of secondary structure prediction. Our main premise was
that there must be a unifying structural explanation for these two apparently
linked phenomena which is compatible with our present knowledge of HBV genome
transcription and replication. Since the initiator and TATA elements of precore
and pregenomic RNA transcription are now well defined (22), the replication
events following transcription and mediated by the viral
polymerase might rather be the reason for such mutations arising.
Seventy-five HBV sequences (including 45 of our own) spanning the core promoter,
the precore region and the first 28 nt of the core region were aligned and
compared. These sequences represented genotypes A-D, as determined by
phylogenetic analysis (14). From this alignment, a map of genotype-specific
changes was made (Fig. 2). Much of the computer-based analysis was arbitrarily
based on a genotype A HBV sequence from the databank (accession no. V00866). This
3.2 kb genomic sequence was edited to resemble 3.5 kb pregenomic RNA, then
subedited for analysis in sections. Most of what follows deals with analysis of a
375 base sequence representing the 3'-end of the pregenomic RNA, from nt 1582 to
1956 according to the genomic numbering (11).
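The editing step described above amounts to cutting segments out of a circular genome by 1-based genomic coordinates and writing them as RNA. A minimal sketch follows, assuming a hypothetical helper `circular_slice` and a random placeholder sequence in place of the databank entry; the genome length of 3200 is likewise a stand-in, not the length of V00866.

```python
import random


def circular_slice(genome, start, end):
    """Return nt start..end (1-based, inclusive) of a circular genome.

    If start > end the segment wraps around the origin, which is how a
    transcript longer than one genome unit picks up a terminal redundancy.
    """
    n = len(genome)
    if not (1 <= start <= n and 1 <= end <= n):
        raise ValueError("coordinates must lie within 1..len(genome)")
    if start <= end:
        return genome[start - 1:end]
    return genome[start - 1:] + genome[:end]


def to_rna(dna):
    """Write a DNA plus-strand sequence as RNA."""
    return dna.upper().replace("T", "U")


if __name__ == "__main__":
    rng = random.Random(0)
    # Placeholder genome: random bases, NOT the V00866 sequence.
    genome = "".join(rng.choice("ACGT") for _ in range(3200))
    segment = to_rna(circular_slice(genome, 1582, 1956))
    print(len(segment))  # number of bases in the nt 1582-1956 window
```

The window from nt 1582 to 1956 contains 1956 - 1582 + 1 = 375 bases, matching the length given in the text.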
DISCUSSION
In current research on hepadnavirus replication, much interest lies in the
elucidation of how the virus polymerase first interacts with the 5' stem-loop structure of pregenomic RNA and accomplishes a 4 nt primer
synthesis. The events of encapsidation are tightly linked to these early events
of reverse transcription and are obvious targets for intervention. Many
questions about this process remain unanswered. Why, for example, does the
polymerase prefer the 5' copy of the encapsidation signal to the 3' copy, despite the fact that a 3' copy can be made to function and cause priming of DR1 in
artificial constructs (34)? If the 3' copy is not required for viral replication, what strategy does the virus
use to avoid its use or any competitive effect that it might have? One simple
explanation could be that the 3' redundancy in pregenomic RNA shares the same primary sequence, but not
the same three-dimensional conformation.
Related to this first enigma is the additional one of polyadenylation signal
usage. The one polyadenylation signal, which is somewhat unusual
(U1916AUAAA1921), is present at the 3'-end of all HBV transcripts, but both
pregenomic RNA and precore RNA carry it at the 5'-end also. The reason why the
upstream copy of the signal is ignored was addressed by Russnak and Ganem using
WHV (35), who found that proper usage depended on multiple sequences 5' of it, which increased its efficiency of use. These sequences were
located within 400 nt of the polyadenylation signal. The authors suggested a
stem-loop (now known to correspond to the 5' encapsidation signal) immediately upstream of the poly(A) signal
as having a possible role in activating it.
The structures proposed in this paper, invoking the concept of structural differences at the 5'- and 3'-ends of pregenomic RNA, are of relevance in answering
the above questions. These structures can be regarded as variants of a working
model, which both deserve and demand experimental confirmation. The analysis
was originally performed to check whether core promoter mutations, both point
mutations and deletions, might arise as a result of secondary structural
considerations. The computer work presented here is convincing evidence that
they may.
We propose that the creation of deletions in the core promoter region of HBV is
a result of template skipping by the polymerase. This is compatible with what
we know of its natural function: template jumping is thought to be the
mechanism by which plus strand synthesis in hepadnaviruses is continued (2). The
proposal is also compatible with the computer generated secondary
structures, where natural deletions arising in the core promoter map to a
well-defined loop region (Fig. 4). Since completing this first analysis, we have
examined the pre-S/S region of HBV for predicted RNA secondary structure and,
again, both a 129 base deletion (36) and a completely overlapping 183 base form
of the deletion (37-39) mapped to a long predicted stem-loop structure (data not
shown).
Since the genome of HBV has limited coding capacity (with overlapping ORFs and
regulatory signals), it is reasonable to assume that only certain regions can
be deleted and perhaps even only at certain stages of infection or under
certain circumstances. Thus, although template skipping may arise with some
frequency throughout the pregenomic RNA as a result of secondary structures,
such events are probably almost invariably lethal for the continuation of that
genome once expressed. It is possible that deletions are only tolerated in the
natural history of HBV infection when the need for certain products or parts of
products declines. In the case of the so-called core promoter deletions mapping within the X ORF, there could be a
decreased need for those functions of the X protein that are associated with
its C-terminal domain (40-42). However, deletions could also be of advantage to
the survival of HBV. For the pre-S region, deletions are probably related to the
removal of epitopes and escape from the immune response (17). Deletions in the
HBV genome have been seen to arise during or after treatment with interferon
(36). One reason could be that double-stranded RNA, which is present in any RNA
with appreciable secondary structure, potently promotes interferon action through
more than one pathway (43). Accidental loss of such secondary structures, if not entirely lethal for
replication, might thus be of advantage to the survival of HBV during
interferon therapy.
By attempting to explain mutations in the HBV core promoter region, we have
developed a model for the 3'-end of pregenomic RNA of HBV, WHV and GSHV involving two closely
linked stem-loop regions. These stem-loop regions (referred to as 1 and 2 in this paper) may undergo
base pairing to form a common stem (Figs 3 and 6), but this may not be a
functional requirement. The most important consideration is the environment
proposed for DR1 within stem-loop 1 of this secondary structure. This involves
conserved base pairing (Fig. 6) and could even involve weak three-dimensional
interaction with nucleotides of stem-loop 2. The latter possibility has until now been considered to be
unimportant for the replication process, based largely on work with DHBV.
The existence of stem-loop 1 at the 3'-end of pregenomic RNA will be more difficult to test
experimentally than the existence of stem-loop 2. Stem-loop 1 as proposed for HBV and WHV does not have the same
stringency of base pairing as stem-loop 2 and could be predicted to be more of a dynamic structure. We note
that no predicted Watson-Crick base pairing extends over more than seven bases, which may have
relevance for easy unfolding of the structure during reverse transcription. The
5'-end of DR1 in our model, like the site of 4 nt primer synthesis, is
part of a bulge, which may have importance for the poorly understood mechanism
of primer transfer, at least in mammalian hepadnaviruses.
With regard to the well-known point mutations in HBV, it is worth considering why a UGA motif at
positions 1762-1764 might be an advantage over an AGG motif, which was predicted to base
pair with part of the DR1 sequence (Fig. 3a). The most likely explanation for
this would be that a change to UGA causes a secondary structure shift (Fig. 3b)
and that the substituting nucleotides can support the whole of the DR1 region
by base pairing, except for the four nucleotides directly involved in priming
of minus strand DNA. This would have the advantage of minimizing molecular
movement in this region even further and at the same time making all four
primer binding nucleotides free from predicted base pairing (5'-UUCA-3'), instead of just three (5'-UUC-3').
Why, then, would the UGA motif not be favoured in the 'wild-type' sequence, rather than arising during infection? The answer to this
may be that this region also encodes the X protein and that the AGG/UGA change
causes a non-conservative change in two amino acids (K130V131) thought to reside
in an important domain of the protein for its transactivation function (40-42).
Thus, during establishment of infection, it may be an advantage (or a
). Thus, during establishment of infection, it may be an advantage (or a
necessity) for the virus to have X protein activity in infected cells, whereas
during long-established infection and especially during antiviral treatment,
efficiency of nucleic acid replication may be a more important factor.
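The coding consequence of the point mutations can be illustrated with a short translation sketch. It rests on assumptions that the text does not state: that X codons 130 and 131 span nt 1761-1766 and that this hexamer reads AAGGTC in the wild-type DNA. The function names and sequences are therefore illustrative only, and the standard genetic code is used.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(seq):
    """Translate an in-frame DNA or RNA sequence into one-letter amino acids."""
    seq = seq.upper().replace("U", "T")
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, len(seq), 3))


def mutate(seq, first_pos, changes):
    """Apply point changes given as {genomic position: new base}.

    first_pos is the genomic position of seq[0].
    """
    bases = list(seq)
    for pos, base in changes.items():
        bases[pos - first_pos] = base
    return "".join(bases)


if __name__ == "__main__":
    wild_type = "AAGGTC"  # ASSUMED sequence of nt 1761-1766 (codons 130-131)
    mutant = mutate(wild_type, 1761, {1762: "T", 1764: "A"})
    print(wild_type, translate(wild_type))
    print(mutant, translate(mutant))
```

Under these assumptions the wild-type hexamer translates to Lys-Val and the double mutant to Met-Ile, so both residues named in the text are altered by the change at positions 1762 and 1764.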
It is interesting to note that computer folding predicted that both position
1762 alone and 1764 alone can determine whether stem-loop 1 would assume a long
or a branched structure (Fig. 3a and b). The motif AGA at 1762-1764, however,
creates a loop as a branch, rather than a stem-loop. The branch would probably be
better stabilized on a further change of the motif to UGA. Whether a stem-loop is
possible also involves consideration of nucleotide identity at
position 1773 (C or U), since the majority of genotype B and C strains have C
at this position, which cannot base pair with nt 1761 (A).
To our knowledge, no core promoter deletions have been detected which eradicate
the A1778GG1780 motif, which we postulate to be instrumental in stabilizing the
central part of the DR1 sequence in Figure 3b. Since deletions have been
described involving sequences up to and including position 1777 (Fig. 4;
16,18,19,30-33) and since the majority of these deletions have been predicted
here by computer folding to preserve the interaction between A1778GG1780 and the
central motif of DR1 (C1828CU1830), motif A1778GG1780 may indeed play an
indispensable structural role in the conformation depicted in Figure 3b. We note
that deletion of sequences bearing A1762GG1764 (between positions 1748 and 1777)
in the conformation of Figure 3a could lead directly to the conformation of
Figure 3b. However, while there is evidence that such deletions arise from
strains carrying U1762 and/or A1764 (16), it is not clear from the literature how
often deletions can arise from strains carrying AGG at positions 1762-1764.
As a model, the superstructure with partially base paired DR1 not only explains
the existence of specific core promoter point mutations and deletions, but it
also offers an additional explanation to that of Li et al. (21) and Lok et al.
(20) for the importance of the precore stop mutation in virus replication. There
appears to be a substantial decrease in the replication competence of genotype
A HBV strains on developing the codon 28 stop mutation (21). While this mutation
in genotype B, C and D strains might, by virtue of creating a stronger U-A base
pair from U1858-G1896, stabilize encapsidation in the 5' copy of stem-loop 2, the
C-A pair created in genotype A may have significance at the 3' copy of the
stem-loop. A radical alteration in stem-loop 2 (and destruction of its close
association with stem-loop 1) after the so-called precore codon 28 mutation might serve to alter the three-dimensional RNA conformation of DR1 or its immediate
environment and therefore its ability to accept the 4 nt primer. Additionally,
it is not yet clear whether local secondary structure could have an effect on
the functioning of the polyadenylation signal which serves for all HBV mRNAs,
but testing of this is clearly necessary in the light of our findings.
ACKNOWLEDGEMENTS
We thank Prof. Glenn Björk for advice and Prof. Göran Wadell for support. This study was financed by grants from the
Swedish Medical Research Council (MFR), the Magnus Bergwall Trust and the
Research Foundation of the Department of Oncology, University of Umeå.
REFERENCES
1 Yuh,C.-H., Chang,Y.-L. and Ting,L.-P. (1992) J. Virol., 66, 4073-4084.
26 Felsenstein,J. (1993) PHYLIP: phylogeny inference package V.3.5c. Distributed by the author, Department of Genetics, University of Washington, Seattle, WA.
27 Genetics Computer Group Inc. (1995) Program manual of the Wisconsin Sequence Analysis Package V.8.1.
28 Jaeger,J.A., Turner,D.H. and Zuker,M. (1989) Proc. Natl. Acad. Sci. USA, 86, 7706-7710.
29 Matzura,O. and Wennborg,A. (1995) RNADRAW V.1.1. http://mango.mef.ki.se/~ole/rnadraw/rnadraw.html.