Secondary structure of the 3'-untranslated region of flaviviruses: similarities and differences
Vitali Proutski*, Ernest A. Gould1 and Edward C. Holmes
The Wellcome Trust Centre for the Epidemiology of Infectious Disease, Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK and 1NERC Institute of Virology and Environmental Microbiology, Mansfield Road, Oxford OX1 3SR, UK
Received December 2, 1996;
Revised and Accepted January 31, 1997
ABSTRACT
Genetic algorithm-based RNA secondary structure prediction was used in combination with comparative sequence analysis to construct models of folding for the distal part of the 3'-untranslated region of flaviviruses belonging to four serological groups. Elements of RNA secondary structure that are preserved among all the flaviviruses studied were revealed, despite the high degree of sequence divergence between them. At the same time, structural elements were observed that distinguish members of different serological groups and, in particular, a region of remarkable structural divergence between the tick-borne and mosquito-borne flaviviruses was found. Application of the genetic algorithm also revealed that the 3'-terminus of flaviviral genomic RNA may take on alternative conformations, which are not observed in the 3'-terminus of complementary minus strand RNA. These alternative folding patterns may have roles in the regulation of transcription and translation initiation and in the switch between them.
INTRODUCTION
Flaviviruses share common morphological features of the virion, antigenic determinants and genomic organization. Most of the flaviviruses are arthropod-borne and can be divided into two major groups in this respect: those transmitted by mosquitoes (mosquito-borne) and those where ticks are the main vector (tick-borne). Originally the flaviviruses were classified on the basis of serological relatedness (1,2), a classification which has generally been confirmed by phylogenetic analysis of viral genomic sequences (3,4), although some flaviviruses, for example yellow fever virus (YF), remain unclassified at the serological level (5).
All flaviviruses have a single-stranded RNA genome of positive polarity which is ~11 kb in length. Genomic RNA is the only virus-specific messenger RNA in flavivirus-infected cells and encodes a single open reading frame of ~10 kb. Translation of genomic RNA results in a single large polyprotein precursor, which undergoes proteolytic cleavage and gives rise to 10 mature viral proteins (6,7).
Flanking the coding region are the 5'- and 3'-untranslated regions (UTRs). The 5'-UTR of flaviviruses is relatively short (95-132 bases in length), while the 3'-UTR is usually longer but demonstrates extensive heterogeneity in size and sequence between different viral species and even among different strains within the same species (8-10). This divergence is primarily concentrated within the proximal part of the 3'-UTR following the stop codon, where long deletions, insertions, sequence repeats and even poly(A) stretches have been observed (8,9,11). At the same time, the distal part of the 3'-UTR (~330-400 nt in length) exhibits relatively high sequence identity among different strains of the same viral species and even among the members of serological groups (8-10,12). Interestingly, some strains of tick-borne encephalitis virus (TBE) have very short 3'-UTRs and demonstrate that the distal part of this region alone may be sufficient for the existence of a viable virus (9), suggesting that it may represent a functional core of the flaviviral 3'-UTR where all or most of the important elements in viral translation, replication and assembly are concentrated. Though the precise mechanisms underlying these functions are currently not well understood, some conserved elements of RNA primary and potential secondary structure have been identified. In particular, computer-predicted folding patterns and RNase cleavage experiments have demonstrated that the marginal 3'-terminal nucleotides of all flaviviruses form a long stable hairpin structure (3'-LSH), which preserves its shape despite significant differences in primary structure (8,9,13-17). Such conservation suggests that this structural element has functional importance, a concept supported by the demonstration of a specific interaction between the 3'-LSH of flaviviruses and some host cellular proteins that are thought to be components of the virus replication complex (18). It also indicates that it is the secondary structure, rather than the primary sequence, that is under selective constraint.
Although other conserved RNA sequence motifs have been found within the 3'-UTR of flaviviruses, little is known about the secondary structure upstream of the 3'-LSH. Specifically, while some RNA sequences within the 3'-UTR distinguish mosquito-borne from tick-borne flaviviruses, to date
there is no information identifying any structural differences between these
two groups.
In the present study we constructed models of potential RNA secondary structure
for the distal part of the flaviviral 3'-UTR, including the region upstream of the 3'-LSH. These models reveal a number of structural
elements that preserve their conformation in most of the flaviviruses, as well
as structures that characterize the members of different serological groups.
The possible functional implications of some of these structures are discussed.
MATERIALS AND METHODS
Flaviviral 3'-UTR nucleotide sequences
We used complete or partial sequences of the 3'-UTR from both tick-borne and mosquito-borne flaviviruses belonging to three serological groups: Japanese encephalitis (JE), dengue (DEN) and tick-borne encephalitis (TBE) (5). We also analysed the 3'-UTR sequences of wild and vaccine strains of yellow fever virus (YF) (ungrouped according to the classification of Calisher et al.; 2) that had previously been used for the construction of an RNA secondary structure model of this virus (Proutski et al., manuscript submitted). Thus, all flaviviral 3'-UTR sequences currently available in the GenBank database were utilized. The GenBank accession nos of the sequences used are given in Table 1.
The sequences were grouped according to their serological classification, aligned using the ClustalW program (19) and subsequently corrected manually. Alignments are available from the authors upon request.
Secondary structure prediction and comparative analysis
The possible folding of the distal part of the 3'-UTR of flaviviruses was predicted using a genetic algorithm (GA) implemented in the STAR program (20,21). The GA has several advantages over the most widely used algorithms for secondary structure prediction, which are based on a search for the minimal free energy state (22). Firstly, the GA simulates the natural folding pathway which takes place during RNA elongation. This not only enables new stems to be added to the growing RNA chain, but also allows structures to be removed at later stages of the simulation if some other pairings are found to be more favourable. The GA also allows the user to follow the folding pathway and reveal the metastable structures that appear during RNA synthesis, which might have some functional importance despite being disrupted at later stages of synthesis (23). Finally, the GA allows prediction of some tertiary interactions.
In this study we performed secondary structure prediction for each sequence and then compared the results of prediction manually and by means of a program module, CovarSearch (available from the authors on request), which was developed to reveal possible compensatory mutations in the sequences (covariations). The program uses an alignment file as input data and outputs a list of covariant positions within the alignment, taking into account only the substitutions that occur in both strands of the possible helix (G-C -> G-U substitutions were not counted as covariant). Currently two compensatory mutations are considered to be sufficient to prove the stem region of a hairpin. In this study we revealed a number of elements of secondary structure containing two or more compensatory mutations between several related sequences. Structures with one site of compensatory mutation were also considered preferentially over those containing no covariant nucleotides. All structures containing compensatory mutations were considered as being under selection pressure and, therefore, functionally important.
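The covariation criterion described above can be illustrated with a minimal Python sketch. This is not the CovarSearch module itself, whose implementation is not described here; the function names, the toy alignment and the column indices are assumptions made purely for illustration. A pair of alignment columns is reported only if two sequences carry different base pairs that differ on both strands, so that a G-C -> G-U change is not counted.

```python
from itertools import combinations

# Base pairs allowed in a helix: Watson-Crick plus G-U wobble.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}
PAIRS = WATSON_CRICK | WOBBLE


def is_covariation(pair_a, pair_b):
    """True if both pairs can form and they differ on BOTH strands."""
    if pair_a not in PAIRS or pair_b not in PAIRS:
        return False
    return pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]


def covariant_columns(alignment, helix_columns):
    """Return the (i, j) column pairs supported by at least one covariation.

    alignment     -- list of equal-length aligned RNA strings (hypothetical input)
    helix_columns -- list of (i, j) 0-based column pairs of a proposed helix
    """
    supported = []
    for i, j in helix_columns:
        observed = {(seq[i], seq[j]) for seq in alignment}
        if any(is_covariation(a, b) for a, b in combinations(observed, 2)):
            supported.append((i, j))
    return supported


# Toy alignment (invented): a 3 bp stem closing a 5 nt loop.
toy_alignment = [
    "GGCAUUUUGCC",  # G-C at columns (1, 9)
    "GACAUUUUGUC",  # A-U at columns (1, 9): change on both strands
    "GGCAUUUUGUC",  # G-U at columns (1, 9): change on one strand only
]
toy_helix = [(0, 10), (1, 9), (2, 8)]
print(covariant_columns(toy_alignment, toy_helix))
```

In this toy case only columns (1, 9) are reported, because the G-C and A-U pairs differ on both strands, whereas the G-U variant alone would not count.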
Manual comparison of the predicted secondary structures allowed us to identify and remove from the analysis structural elements that occurred only sporadically in individual sequences, while retaining structures that, although not supported by compensatory mutations, exhibited similar conformational features among sequences.
Structure prediction and comparative analysis were performed twice. First, whole sequences of the region in question were analysed so that regions with independent structures that did not interfere with the structures from other regions could be defined. Structure prediction and comparison were then repeated for each region to identify possible variants of folding and to construct a general model of secondary structure. Secondary structure models for each sequence analysed are available from the authors upon request.
RESULTS
Three structural regions could be defined within the distal part of the flaviviral 3'-UTR, each containing independent elements of secondary structure that are not likely to interfere with the elements from other regions (Figs 1-4).
Region III
Region III is the best-studied part of the distal flaviviral genomic 3'-UTR. A number of authors have reported the results of computer predictions and direct experimental evidence for the existence of a stable and conformationally conserved secondary structure in this region among all flaviviruses and its possible role in viral replication (6,8,9,13-17,23). The models of secondary structure we propose generally comply with these previous findings and show that the secondary structure of the most distal part of the 3'-UTR is not disturbed by pairings with sequences from upstream regions I and II. However, for dengue viruses our model of secondary structure for region III (Figs 1, 5 and 6IV and V) differs from that proposed previously (13,16-18). Specifically, in our model the 3'-LSH common to all flaviviruses is much shorter than has been shown before, although it preserves the upper part of the stem with the characteristic lateral stem-loop structure on top. This discrepancy is explained by the fact that only the truncated part of the DEN 3'-terminal sequence was used previously in determination of the secondary structure (17). The addition of 5' nucleotides that do not participate in any pairings with upstream sequences results in the formation of several stable stems that prevent the formation of a longer 3'-LSH. For example, the 3'-terminal sequence of DEN3 virus has a free energy of -30.1 kcal/mol according to our model (Fig. 5A), whereas it is significantly higher, -25.6 kcal/mol, for the model previously proposed (Fig. 5B; 13,17,18). This previous model is kinetically as well as energetically unfeasible, since the longer stem can only be formed late in RNA synthesis, which means that energetically more stable structures formed earlier must be disrupted.
Recently, experimental evidence for the formation of a pseudoknot structure by the truncated 3'-terminal sequence of DEN3 and other flaviviruses has been published (17). Our calculations also show that for the truncated sequence a structure with this pseudoknot (Fig. 5C) is thermodynamically more stable (-28.0 kcal/mol) and kinetically more likely than the long straight stem (Fig. 5B), although for the longer sequence of the 3'-terminus an even more stable folding is possible (Figs 1 and 5A). Nevertheless, taking into account the folding pattern of region III of other flaviviruses and the fact that the conformation of RNA can be significantly changed by interaction with proteins (25,26), we cannot entirely exclude the existence of a longer stem for DEN viruses as well. This issue can only be resolved by experimentally testing the secondary structures formed by the longer sequences of the DEN virus 3'-UTR.
Figure 5. The folding pattern of region III of DEN3 virus (GenBank accession no. M93130). (A) Structure according to our prediction, formed by a longer sequence of region III. (B) Less stable structure as predicted previously (13,17,18), formed by a truncated sequence of region III. (C) More stable structure with a pseudoknot, formed by a truncated sequence of region III. Regions and nucleotides of importance are denoted as in Figure 1.
DISCUSSION
We have used all currently available 3'-UTR sequences of flaviviruses in order to determine the potential folding of the distal part of this region. The models reveal three structurally independent regions within this part of the 3'-UTR (Figs 1-4).
Region I exhibits a high rate of sequence divergence even between closely related viruses, a divergence which has caused some structural variation (Figs 1-4). However, comparison of the structural motifs (many of which are supported by compensatory mutations) reveals a clear similarity among all flaviviruses. Previously we found an association between the conformation of region I and the virulence of YF virus (Proutski et al., manuscript submitted). This finding and the observation that this region can preserve a similar overall topology in even the most distantly related flaviviruses suggest that these proposed structures do exist in vivo and have some functional importance. However, the variants of folding observed in different viruses remain to be interpreted: either the function performed by this region does not require a strict structural conservation or the variants of structure reflect the biological differences between viruses.
The structural properties of region II allowed us to separate the flaviviruses studied into three groups, with dramatic differences between them and remarkable structural similarity within each group. The first group comprises viruses from the JE and DEN serological groups. Despite extensive variation in sequence, these viruses share a general topology for the secondary structure of this region which exposes highly conserved double- and single-stranded motifs: duplicated lateral bulge- and stem-loop structures and the top loop regions of the main hairpins. This structural identity of region II supports a close evolutionary relationship between the DEN and JE serological groups (3,4).
The folding pattern of region II of another mosquito-borne flavivirus, YF virus, appears to be different from that observed for
the DEN and JE serogroup viruses. However, the sequence motif (CS2) of the
duplicated lateral bulge- and stem-loop structures which characterizes other mosquito-borne viruses was also found in a highly conserved hairpin
within region II of YF. Conservation of this sequence and the fact that in all
cases it is exposed in the conserved structural elements suggest that it has
functional significance in the mosquito-borne flaviviruses.
The secondary structure model of region II of tick-borne flaviviruses, which is supported by a number of compensatory
mutations, is very different from that proposed for the mosquito-borne flaviviruses. In particular, no single secondary structure or primary structure motif defined in the mosquito-borne flaviviruses has been found in any of the tick-borne viruses. The folding of region II may therefore be an
important structural determinant in the tick-borne flaviviruses.
The conservation in shape of region III, with the formation of a long stable hairpin (3'-LSH), was previously shown to be characteristic for all flaviviruses. The models of secondary structure for the longer part of the 3'-UTR we have constructed confirm that nothing from the upstream regions interferes with typical folding of the 3'-terminus. However, our model of the secondary structure of region III of the DEN viruses slightly contradicts the previously reported models and implies that they possess a shorter 3'-LSH than found in other flaviviruses (Fig. 5). This discrepancy remains to be resolved experimentally.
Conformational conservation within the 3'-terminus of the flaviviral genome suggests that the structures have some important functions, which may include initiation and regulation of
transcription of the minus strand RNA, regulation of translation, participation
in virion assembly and stabilization of the RNA genome, which lacks a poly(A)
tail.
It is well known that in vivo flaviviruses demonstrate a remarkable disproportion in the synthesis of plus and minus strand RNAs: 10-100 times more plus strand RNA than minus strand is produced (27,28). The biological explanation for this is the double function of the genomic plus strand RNA: it is used as a template both for transcription of the minus strand and translation of the polyprotein precursor, while the minus strand is only transcribed into the new plus strand. However, nothing is known about the mechanisms that regulate initiation of transcription and translation and control the switch between them.
The GA which we used for prediction of the secondary structure simulates the natural pathway of RNA folding and allows the user to follow this pathway during elongation of the RNA chain. From this we observed that during 'synthesis' of the 3'-terminus of genomic RNA of all flaviviruses the growing RNA chain can form similar intermediate metastable structures that are disrupted by the algorithm in the later stages of 'synthesis' in order to form the more stable 'final' structure. In the intermediate folding the 3'-LSH was usually much shorter than that of the 'final' structure (Fig. 6). Previously we found an association between the predicted folding structure of the 3'-terminus and the virulence of YF virus: all vaccine strains have a much shorter 3'-LSH stem than all wild (virulent) strains (Proutski et al., manuscript submitted). This finding was supported by the experimental evidence that interaction of the 3'-LSH with proteins that are believed to be components of the virus replication complex was sensitive to the length of the 3'-LSH (18). Thus, we can assume that the shorter 3'-LSH which is formed as part of an intermediate structure has a lower capacity to interact with the components of the replication complex and to initiate transcription. The existence of the intermediate structure may then delay formation of the 'final' structure, which is capable of initiating transcription, and therefore provide a sufficient time window to start the process of translation. Indeed, in the life cycle of a flavivirus, transcription cannot be started until the virus-specific RNA polymerase is translated and processed. One possible problem with this hypothesis is that the real rate of RNA chain synthesis is ~20 bases/s (29) and individual stem formation takes milliseconds (30), so that the intermediate structure must be disrupted and replaced by the 'final' one extremely quickly. However, it has been shown that the lifetime of the metastable structure is long and sufficient to perform its function (23).
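The gap between these two timescales can be made concrete with a short calculation in Python. The synthesis rate of ~20 bases/s and the 330-400 nt length of the distal 3'-UTR are the figures quoted in this paper. The 1 ms stem formation time is an assumed order-of-magnitude value standing in for 'milliseconds', and the function name is hypothetical.

```python
SYNTHESIS_RATE_NT_PER_S = 20.0  # approximate chain elongation rate quoted in the text
STEM_FORMATION_S = 1e-3         # ASSUMPTION: 1 ms as a stand-in for "milliseconds"


def synthesis_time(length_nt, rate_nt_per_s=SYNTHESIS_RATE_NT_PER_S):
    """Seconds needed to synthesize length_nt nucleotides at a constant rate."""
    return length_nt / rate_nt_per_s


# Lower and upper bounds of the distal 3'-UTR length given in the Introduction.
for length_nt in (330, 400):
    seconds = synthesis_time(length_nt)
    ratio = seconds / STEM_FORMATION_S
    print(f"{length_nt} nt: {seconds:.1f} s of synthesis, "
          f"~{ratio:.0f} stem-formation times")
```

Under these assumptions, synthesis of the distal region takes tens of seconds, several orders of magnitude longer than the formation of a single stem. Rearrangement into the 'final' structure is therefore not limited by chain elongation.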
The calculated energies of the intermediate metastable and the 'final' structures are very close in all flaviviruses. This implies that in vivo the population of genomic RNAs may represent a dynamic equilibrium of alternative conformations of the 3'-termini (Fig. 6). We speculate that one of these conformations, with a longer 3'-LSH, is able to bind the proteins of the polymerase complex and thereby initiate transcription, while another, with a truncated stem of the 3'-UTR, is unable or, at least, less able to do this. Some, so far unknown, mechanisms may shift this equilibrium in one direction or another and, as a result, favour transcription or translation.
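The link between closely spaced energies and a mixed population can be sketched with a simple two-state Boltzmann calculation in Python. This is an illustrative thermodynamic approximation only. It ignores folding kinetics and protein binding, it is not part of the GA used in this study, and the energy values in the example are hypothetical rather than taken from our models.

```python
import math

GAS_CONSTANT = 1.987e-3  # kcal/(mol*K)


def boltzmann_fractions(energies, temperature_k=310.15):
    """Equilibrium fraction of each conformation from its free energy (kcal/mol).

    energies -- dict mapping a conformation label to its free energy
    Assumes the listed conformations are the only states of the ensemble.
    """
    rt = GAS_CONSTANT * temperature_k
    lowest = min(energies.values())  # shift energies for numerical stability
    weights = {name: math.exp(-(e - lowest) / rt) for name, e in energies.items()}
    total = sum(weights.values())
    return {name: w / total for name, w in weights.items()}


# HYPOTHETICAL energies, 1 kcal/mol apart, chosen only to illustrate the argument.
example = {"long 3'-LSH": -30.0, "truncated 3'-LSH": -29.0}
for name, fraction in boltzmann_fractions(example).items():
    print(f"{name}: {fraction:.2f}")
```

With a difference of only 1 kcal/mol at 37 °C the less stable conformation would still account for roughly one-sixth of the molecules, whereas a difference of several kcal/mol would leave it almost unpopulated.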
Figure 6. Alternative structures which can be formed by the distal part of the 3'-UTR of various flaviviruses. I, WN virus (GenBank accession no. M12294); II, JE virus, strain JaOArS982 (GenBank accession no. M18370); III, YF virus, Trinidad79 strain (GenBank accession no. U52420); IV, DEN2 virus, Jamaica strain (GenBank accession no. M20558); V, DEN1 virus (GenBank accession no. M87512); VI, TBE virus, Neudoerfl strain (GenBank accession no. U27495). (A) Structure with the shortened hairpin; (B) structure with a long straight hairpin. - or ¦ denote canonical pairing, * non-canonical pairing and : possible pairings which were not predicted by the program. In some cases (IA, IIA and VIA) it is currently impossible to assess the energy value of the structures, because there are no available data for double pseudoknots. However, the energy was estimated (in kcal/mol) for the structures without taking into account the contribution of these pseudoknots and it is presumed that the real energy is lower. Regions and nucleotides of importance are denoted as in Figure 1.
To test this alternative folding hypothesis we constructed secondary structure models for the 3'-termini of the complementary minus strand RNAs of DEN and YF viruses (not shown). We found that this region can also form a long stable hairpin structure which may be recognized by the components of the replication complex and so initiate transcription from the minus strand, but we found no alternative folding which would affect the length of this hairpin. Thus, the alternative folding of the 3'-terminus of flaviviral genomic RNA, which is not observed in the 3'-terminus of minus strand RNA, may work as a switching mechanism between translation and replication. Indeed, in infected cells minus strand RNAs are only found within the replication complex, whereas the protein-free plus chains are efficiently produced for encapsidation into the virion (13).
One more piece of evidence supporting the functional importance of the alternative folding comes from the tick-borne flaviviruses. In the 3'-terminus of these viruses a conserved 11 nt domain (5'-UCUUGUUCUCC-3') has been identified which is complementary to a sequence near the 5'-end of the viral genome (8). These sequence motifs are considered functionally analogous to the putative cyclization sequences of mosquito-borne flaviviruses (15). Interestingly, in the 'final' structure of the 3'-terminus of TBE this sequence occurs in the lower part of the 3'-LSH stem (Fig. 6VIB). However, in the alternative model of folding, with the truncated 3'-LSH, this sequence can be exposed in the loop region of another hairpin (hairpin Y in Fig. 6VIA) and so is more accessible for complementary interaction with the respective 5'-end sequence. According to our hypothesis, folding with the shorter 3'-LSH is transcriptionally less efficient, so that the complementary interaction between the loop region in the 3'-terminus and the corresponding sequence in the 5'-terminus of genomic RNA, which leads to cyclization of the flaviviral genome, may not be involved in replication, but instead may be the first step in virion packaging. Interestingly, in all models of the secondary structure we have constructed for region III of mosquito-borne flaviviruses the cyclization sequence is either exposed in the single-stranded region or (in YF virus) participates in a very unstable pairing. Hence, it can also be accessed by the 5'-end complement and cause genome cyclization.
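The complementarity underlying the proposed cyclization can be checked with a few lines of Python. The 11 nt motif is the one quoted above. The 5'-terminal fragment in the example is an invented toy sequence, not a real TBE sequence, and the function names are hypothetical. Only strict Watson-Crick complementarity is tested, so G-U wobble pairs and mismatches are ignored.

```python
# Watson-Crick complement table for RNA.
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna):
    """Return the reverse complement of an RNA string written 5'->3'."""
    return rna.translate(COMPLEMENT)[::-1]


def find_complementary_sites(target, motif):
    """0-based start positions in target that can pair perfectly with motif."""
    partner = reverse_complement(motif)
    last_start = len(target) - len(partner)
    return [i for i in range(last_start + 1) if target.startswith(partner, i)]


MOTIF_3P = "UCUUGUUCUCC"  # conserved 11 nt domain quoted in the text

# HYPOTHETICAL 5'-terminal fragment, built for illustration only.
toy_5p_fragment = "AUGCAUGC" + "GGAGAACAAGA" + "CUGAC"

print(reverse_complement(MOTIF_3P))
print(find_complementary_sites(toy_5p_fragment, MOTIF_3P))
```

Applied to a real 5'-terminal sequence, the same search would locate candidate partners of the 3' motif. Whether such a site is accessible would still depend on the local folding discussed above.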
In conclusion, the secondary structure models of the flaviviral 3'-UTR we have constructed demonstrate similarities between the viruses and, at the same time, highlight structural differences which may reflect their underlying biological differences. The elements of secondary structure were found to contain a high number of compensatory mutations, which support our models of folding and suggest that these elements are under strong functional constraint. These functions, which may provide an insight into the delicate mechanisms of viral life, remain to be determined experimentally. Furthermore, the functional importance of these structures makes them potentially attractive targets for both antiviral drug design and vaccine development: the rational modification of these structures or the functionally important sequences which are exposed by them could produce an avirulent but immunogenic virus, while chemical agents that specifically interact with the structures of the flaviviral 3'-UTR and interfere with their functions may be used against these viruses.
ACKNOWLEDGEMENTS
This work was funded by research grants from The Wellcome Trust and The Royal
Society.
REFERENCES
1 Porterfield,J.S. (1980) In Schlesinger,R.W. (ed.), The Togaviruses. Academic Press, New York, NY, pp. 13-46.
2 Calisher,C.H., Karabatsos,N., Dalrymple,J.M., Shope,R.E., Porterfield,J., Westaway,E.G. and Brandt,W.E. (1989) J. Gen. Virol., 70, 37-43.
3 Marin,M.S., de Zanotto,P.M.A., Gritsun,T.S. and Gould,E.A. (1995) Virology, 206, 1133-1139.
4 Zanotto,P.M. de A., Gould,E.A., Gao,G.E., Harvey,P.H. and Holmes,E.C. (1996) Proc. Natl. Acad. Sci. USA, 93, 548-553.
5 Rice,C.M. (1996) In Fields,B.N. et al. (eds), Fields Virology, 3rd Edn. Lippincott-Raven Publishers, Philadelphia, PA, pp. 931-959.