ABSTRACT
We have created a catalogue comprising all viroid and viroid-like RNA sequences that, to our knowledge, had either been published or were
available from on-line sequence libraries as of October 1, 1995. In developing this catalogue, we resolved nomenclature ambiguities, determined the likely ancestral sequence of most species and predicted the most stable secondary
structures of these sequences using the MulFold package. Only viroids of the PSTVd type possessed a rod-like secondary structure, while most other viroids adopted branched
secondary structures. Several viroids have predicted secondary structures that include, at one extremity, either a Y-shaped or a cruciform structure reminiscent of the tRNA-like ends of virus genomes. However, it remains unknown whether these predicted structures are adopted in solution and whether they serve a particular function in vivo. Additional information, such as the positions of the self-catalytic domains, is included in the catalogue, together with an analysis of the compiled data. The catalogue will be available on the World Wide Web
(http://www.callisto.si.usherb.ca/~jpperra), on computer disk and in printed form. It should provide an excellent reference
point for further studies.
Viroids are small single-stranded circular RNA molecules (246-463 nt) that infect higher plants, causing diseases in crop species and significant economic losses in the
agricultural industry (1). It has been proposed that viroids replicate in a DNA-independent manner via a rolling circle mechanism involving the synthesis
of multimeric strands, which are then cleaved into monomers and circularized to produce the progeny viroids (1,2).
We have developed a catalogue to facilitate viroid research by presenting a large amount of viroid sequence and related data in a
comprehensive and user-friendly format. We compiled a total of 182 sequences from 21 viroids (3-79), eight plant satellite viroid-like RNAs (80-88)
and the viroid-like domain of the human hepatitis δ virus (HDV) RNA (89). In addition, we have included transcripts from the mitochondrial DNA
satellite II of both newt (90) and carnation (91), which have been proposed to be retroviroid-like elements that evolved from viroids by retroposition into host DNA (92).
Table 1 summarizes these sequences. Here, we describe the catalogue and present an analysis of its content.
Table 1. Summary of the viroid and viroid-like RNA sequences included in the catalogue
*Because only a domain of HDV RNA is related to viroids, only one variant
sequence has been included, even though several other sequences have been
determined.
This compilation comprises all sequences that, to our knowledge, had been
published or were available from the sequence library file servers (NCBI
sequence libraries) as of October 1, 1995. Only complete sequences were
retained, because it appeared difficult to establish a criterion for how large a
fraction of a sequence should be for it to be listed in the compilation.
Given the sequence heterogeneity of viroids, including sequences from partial
cDNA clones could lead to erroneous conclusions in subsequent studies; such
sequences were therefore omitted.
For viroids, we adopted the classification system proposed by Koltunow and Rezaian (93). Viroids are divided into two types: the ASBVd type (also named group A), whose members possess the capacity to self-cleave but lack a conserved core region (CCR), and the
PSTVd type (group B), whose members possess a CCR but have no known self-cleaving properties. Viroids of the PSTVd type are subdivided into two groups: the PSTVd group (subgroup B1) and the ASSVd group (subgroup B2). The viroids of these two
groups are easily distinguished by the sequences of their respective CCRs (93). Only the sequences of HSVd, a member of the PSTVd group, were subdivided
according to their host of initial isolation.
In the catalogue, each species is listed by its complete name and number of
sequence variants (see Fig. 1). For each species, this is followed by a list of the sequence variants with their new identifiers
(see below), sequence library accession numbers, bank loci
(when available), length in nucleotides, the number of each type of nucleotide,
complete publication information and the complete nucleotide sequence in 10 nt
blocks to facilitate further analyses. For species that possess known
self-catalytic domains (hammerhead, hairpin and delta), the positions of the conserved sequences required for
cleavage are reported. In addition, the secondary structure of the most likely ancestral variant was predicted using the MulFold structure
prediction package (see below). The chosen ancestral variants are
reported in Table 1, and their predicted secondary structures are appended to the catalogue.
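The per-variant entries described above list nucleotide counts and print each sequence in 10 nt blocks. The Python sketch below shows one way such fields could be generated from a raw sequence string; the function names, the position prefix on each line and the number of blocks per line are illustrative assumptions, not the catalogue's actual layout.

```python
from collections import Counter


def composition(seq: str) -> dict[str, int]:
    """Count each nucleotide type; a DNA-style 'T' is read as 'U'."""
    normalized = seq.upper().replace("T", "U")
    counts = Counter(normalized)
    return {nt: counts.get(nt, 0) for nt in "ACGU"}


def ten_nt_blocks(seq: str, blocks_per_line: int = 6) -> str:
    """Lay a sequence out in 10 nt blocks, each line prefixed by the
    1-based position of its first nucleotide (a layout assumption)."""
    blocks = [seq[i:i + 10] for i in range(0, len(seq), 10)]
    lines = []
    for start in range(0, len(blocks), blocks_per_line):
        first_position = start * 10 + 1
        row = " ".join(blocks[start:start + blocks_per_line])
        lines.append(f"{first_position:>5} {row}")
    return "\n".join(lines)


if __name__ == "__main__":
    toy = "ACGUACGUAA" * 2 + "ACGUA"  # hypothetical 25 nt toy sequence
    print(len(toy), composition(toy))  # 25 {'A': 10, 'C': 5, 'G': 5, 'U': 5}
    print(ten_nt_blocks(toy, blocks_per_line=2))
```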
To simplify the nomenclature of the included sequences, we used an
identification scheme based on the usual abbreviation of an RNA species
followed by a number. For example, the original PSTVd sequence (30) is identified as PSTVd.1. Numbers are assigned in order of the date of report. For
sequences both published and reported in on-line libraries, the publication date was given priority over the library
submission date. When more than one sequence was reported simultaneously, we
assigned numbers to the entries arbitrarily. For HSVd variants only, the number is
preceded by a letter indicating the host from which the variant was first isolated. For example,
HSVd.h1 refers to an isolate from hop. We believe that the proposed
nomenclature will facilitate further sequence identification.
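The identification scheme can be expressed as a small sorting rule. The sketch below is a hypothetical illustration: the `Entry` record, its fields and the dates in the example are invented, and numbering HSVd variants separately for each host letter is an assumption inferred from identifiers such as HSVd.c1 and HSVd.c2.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Entry:
    species: str                      # usual abbreviation, e.g. "PSTVd"
    published: Optional[date] = None  # journal publication date, if any
    submitted: Optional[date] = None  # on-line library submission date, if any
    host_letter: str = ""             # HSVd only, e.g. "h" for hop


def report_date(entry: Entry) -> date:
    """The publication date takes priority over the library submission date."""
    when = entry.published or entry.submitted
    if when is None:
        raise ValueError(f"{entry.species} entry has no report date")
    return when


def assign_ids(entries: list[Entry]) -> list[str]:
    """Number variants within each species in order of report date.

    sorted() is stable, so entries reported on the same date keep their
    input order, which stands in for the arbitrary choice in the text.
    """
    counters: dict[tuple[str, str], int] = {}
    ids = [""] * len(entries)
    order = sorted(range(len(entries)), key=lambda k: report_date(entries[k]))
    for k in order:
        entry = entries[k]
        # Assumption: HSVd numbers are counted separately for each host letter.
        prefix = entry.host_letter if entry.species == "HSVd" else ""
        key = (entry.species, prefix)
        counters[key] = counters.get(key, 0) + 1
        ids[k] = f"{entry.species}.{prefix}{counters[key]}"
    return ids


if __name__ == "__main__":
    records = [  # hypothetical records and dates
        Entry("PSTVd", published=date(1985, 1, 1), submitted=date(1984, 6, 1)),
        Entry("PSTVd", published=date(1978, 8, 1)),
        Entry("PSTVd", submitted=date(1990, 3, 1)),
        Entry("HSVd", published=date(1988, 5, 1), host_letter="h"),
        Entry("HSVd", published=date(1993, 2, 1), host_letter="c"),
        Entry("HSVd", published=date(1994, 7, 1), host_letter="c"),
    ]
    print(assign_ids(records))
    # ['PSTVd.2', 'PSTVd.1', 'PSTVd.3', 'HSVd.h1', 'HSVd.c1', 'HSVd.c2']
```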
In creating this catalogue, we resolved some identification ambiguities. Our
proposed clarifications are listed below:
(i) Two sequences have previously been identified as grapevine viroid, GVVd (7,25). Phylogenetic analysis of these sequences (data not shown) clearly
demonstrated that one is a variant of CEVd (7), which we identified as CEVd.22, while the other is an HSVd variant infecting
grapevine (25), which we identified as HSVd.g5;
(ii) The second sequence reported as a GYSVd variant has been reported elsewhere
as a distinct species, GVd 1B (68). We used the latter identification, as it is supported by phylogenetic analysis,
which assigned 52 substitutions between GVd 1B and the likely ancestral
variant GYSVd.27 (data not shown);
(iii) Two sequence variants of CPFVd have been reported (28,29). They belong to the HSVd species and were isolated from cucumber. We have identified these
two sequences as HSVd.c1 and .c2;
(iv) Dapple apple viroid (DAVd; EMBL accession number X71599) appeared to be the
likely ancestral variant of apple scar skin viroid (ASSVd.4) rather than a
distinct species;
(v) We used the nomenclature CoVd (Coleus viroids), which includes Coleus blumei viroid species I (CoVd.1, ref. 64) and
Coleus yellow viroid (CoVd.2, ref. 65). This is in agreement with the suggestion of Fonseca et al. (65).
We analyzed the compiled sequences for the general characteristics of viroids
and viroid-like RNAs, taking into account the phylogenetic reconstruction reported by
Elena et al. (94), as well as an updated reconstruction confirming it (F. Bussière, D. Lafontaine and J.-P. Perreault, unpublished data). No relationship can be discerned
among nucleotide percentages, sequence length and phylogenetic
clustering. The nucleotide differences between variants of the same species are
located primarily in the pathogenesis (P) and variable (V) domains of the
viroids; however, contrary to a previous conclusion (95), they are not restricted to these two domains.
Furthermore, no relationship has been established between viroids and either
their host or their worldwide geographic distribution, nor have any specific
signature sequences been identified for viroids that infect the same host. In
contrast, phylogenetic reconstructions and secondary structure predictions revealed some interesting features.
To study the relationships between variants belonging to the same
species and to identify the most likely ancestral sequence, we performed either
direct inferences or phylogenetic reconstructions. Identifying the
probable ancestral variants will be useful in further phylogenetic
reconstructions between species. When two sequences were known, we first
checked whether one could be derived from the other and, if so, used the
ancestral one. If they did not fulfill this requirement, the sequences were considered
to belong to different taxa. If more than two variants existed, the ancestral state was either
inferred directly or obtained by phylogenetic analysis. In the latter case, the
sequences were aligned using a multiple sequence alignment algorithm
with hierarchical clustering (Multalin package, ref. 96), followed by minor rearrangements of the alignment, and the phylogenetic analysis
was then performed using the maximum parsimony method (PAUP package, ref. 97).
From the sequences listed in Table 1, we deduced the currently most probable ancestral variant of each species, and this list is appended to the compilation. The addition of more sequences may
facilitate the identification of ancestral variants.
Exhaustive phylogenetic reconstructions of the species that have large numbers
of variants (ASBVd, CCCVd, CEVd, GYSVd, HSVd and PSTVd) have been performed.
The case of CCCVd was the simplest among these viroids. Five CCCVd variants
(CCCVd.3-CCCVd.7) resulted from a duplication of a region of CCCVd.2 followed by further substitutions, as suggested previously (95).
CCCVd.1 differs from CCCVd.2 by a single nucleotide change, and this
substitution is not found in the other five, larger variants; therefore, we
inferred CCCVd.2 to be the oldest variant based on the available data. In contrast,
the other viroids with several variants required phylogenetic reconstructions to identify the likely oldest sequence. Most of the variants of these viroids
differ by only a small number of changes (~1-10 substitutions, deletions and/or insertions). When all variants
of one of these species were considered for a phylogenetic reconstruction,
several sequences could have evolved in different ways from either the
ancestral variant or other variants. Thus, several trees with the same total
length were inferred for a given species (also observed by Dr Robert Owens
for HSVd and PSTVd, personal communication). Because no biological data (for example, replication data) permit
validation of these phylogenetic trees, a
consensus tree was deduced and the likely oldest sequence was defined as the
ancestral one.
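PAUP scores each candidate tree by its length, the minimum number of character changes the tree requires, so trees of equal length are equally parsimonious. The sketch below uses Fitch's small-parsimony algorithm to compute that length for one fixed, rooted binary topology; it does not search tree space as PAUP does, the four-taxon alignment is invented, and treating gaps as a fifth state is an assumption.

```python
from typing import Union

# A rooted binary tree: a leaf is a taxon name, an internal node is a pair.
Tree = Union[str, tuple]


def fitch_site(tree: Tree, states: dict[str, str]) -> tuple[set[str], int]:
    """Fitch's bottom-up pass for one alignment column.

    Returns the candidate state set at this node and the number of
    changes needed within the subtree.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left_set, left_cost = fitch_site(tree[0], states)
    right_set, right_cost = fitch_site(tree[1], states)
    shared = left_set & right_set
    if shared:
        return shared, left_cost + right_cost
    # Empty intersection: one change is required below this node.
    return left_set | right_set, left_cost + right_cost + 1


def tree_length(tree: Tree, alignment: dict[str, str]) -> int:
    """Parsimony length: minimum number of changes summed over all columns.
    Gaps ('-') are treated as a fifth state (an assumption)."""
    width = len(next(iter(alignment.values())))
    total = 0
    for col in range(width):
        column = {name: seq[col] for name, seq in alignment.items()}
        total += fitch_site(tree, column)[1]
    return total


if __name__ == "__main__":
    aln = {"A": "ACGU", "B": "ACGA", "C": "ACCU", "D": "GCCU"}  # toy data
    print(tree_length((("A", "B"), ("C", "D")), aln))  # 3
    print(tree_length((("A", "C"), ("B", "D")), aln))  # 4
```

Ties between such lengths across different topologies are exactly the situation in which several equally short trees are obtained and a consensus must be taken.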
The identity of the host infected by a viroid might be useful for
validating phylogenetic trees. For example, HSVd variants infect a wide range
of hosts, including hop, grapevine, plum, pear, peach, citrus and cucumber.
Taking into account the host specificity of HSVd variants, we analyzed the
derived trees to verify whether variant clusters reflected host specificity. As
previously reported (26), the HSVd variants clustered mainly into three types: (i) hop (hop,
grapevine, peach and pear isolates), (ii) plum (grapevine, peach and plum
isolates) and (iii) citrus (citrus and cucumber isolates). The HSVd
phylogenetic tree did not strictly reflect host specificity; hence, host data did
not allow validation of the phylogenetic data.
For viroids that have only two known variants, the sequences are nearly
identical and therefore did not allow identification of the
ancestor. One exception was TASVd, whose variants TASVd.1 and .2 share 91.5% sequence
identity (50); they appear to be the most distant variants among species with two known
sequences. A phylogenetic reconstruction of some PSTVd-type viroids shows that the two TASVd sequences branched onto the main
lineage (Fig. 2). These two branches were separated by an average of 16 substitutions. These
results are consistent with two possibilities: either the two TASVd sequences
belong to two different species that infect the same host, or they are two
variants of the same species. The latter case would suggest that CEVd evolved
directly from TASVd rather than from a common ancestor. Further characterization
should determine which of these two scenarios is correct.
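Figures such as the 91.5% quoted for TASVd depend on the alignment and on how gaps are counted. The helper below is a minimal sketch of one common convention (identical columns divided by alignment columns, ignoring columns that are gaps in both rows); it is an assumption, not necessarily the convention used in ref. 50, and for circular RNAs the result also depends on where each circle was opened before aligning.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percentage of alignment columns carrying the same nucleotide in both rows.

    Convention (an assumption): columns that are gaps in both rows are
    ignored; a gap opposite a nucleotide counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    identical = sum(1 for x, y in columns if x == y and x != "-")
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy alignment: 5 identical columns out of 6 (one gap column).
    print(round(percent_identity("ACGU-A", "ACGUUA"), 2))  # 83.33
```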
To allow comparison between the most stable secondary structures of the
ancestral variants, we performed computer analyses on the previously selected
sequences. We used the MulFold structure prediction package (98) of GCG (Genetics Computer Group) version 8.0 installed on a UNIX system (at
the Institut de Recherche Clinique de Montréal). For each selected sequence, a secondary structure prediction
was obtained, and the resulting structures were converted into 'connect' files
using the plotfold-H software. The connect file of each predicted structure follows the
sequence on the website, allowing any investigator to work with it
using their own graphics package. For the printed version of the catalogue, the
secondary structure connect files have been displayed using the RNADrawn
package and are appended.
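The connect layout is not spelled out here, so the parser below assumes the widely used Zuker-style CT table: a header line starting with the sequence length, then one line per nucleotide giving index, base, previous index, next index, pairing partner (0 if unpaired) and original numbering. Connect files written by GCG or plotfold-H may carry extra header lines that would need to be skipped; the function names are hypothetical.

```python
def parse_ct(text: str) -> tuple[str, list[tuple[int, int]]]:
    """Read a Zuker-style connect (CT) table into a sequence and 1-based pairs."""
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0].split()[0])  # header: first field is the length
    bases: list[str] = []
    partner: dict[int, int] = {}
    for line in lines[1:n + 1]:
        fields = line.split()
        index, base, mate = int(fields[0]), fields[1], int(fields[4])
        bases.append(base)
        if mate:
            partner[index] = mate
    if len(bases) != n:
        raise ValueError(f"expected {n} nucleotide lines, found {len(bases)}")
    # Each pair must be reported symmetrically by both partners.
    for i, j in partner.items():
        if partner.get(j) != i:
            raise ValueError(f"inconsistent pair {i}-{j}")
    pairs = sorted((i, j) for i, j in partner.items() if i < j)
    return "".join(bases), pairs


def to_dot_bracket(length: int, pairs: list[tuple[int, int]]) -> str:
    """Render 1-based, pseudoknot-free base pairs in dot-bracket notation."""
    symbols = ["."] * length
    for i, j in pairs:
        symbols[i - 1], symbols[j - 1] = "(", ")"
    return "".join(symbols)
```

Converting to dot-bracket is only one convenient target; the same pair list could be handed to any drawing package.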
Most of the published viroid secondary structures were predicted with the
original RNAfold package. These original predictions led to the proposal that
the most stable secondary structures of the classical viroids of the PSTVd type (i.e. the PSTVd and ASSVd groups) are rod-like structures composed of alternating single- and double-stranded regions (1,95).
Over time, the rod-like secondary structure almost became an identification criterion for
viroids. Analysis of our predicted secondary structures led to unexpected
results, which are summarized in Figure 3. To facilitate the analysis, the results are presented according to the
phylogenetic tree of viroids and related RNAs.
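The distinction between rod-like and branched structures can be stated in terms of loop junctions: every loop of a rod is bordered by at most two helices (hairpin, bulge and internal loops), whereas a Y-shaped or cruciform element is a junction of three or four helices. The sketch below counts the helices bordering each loop of a circular RNA given in pseudoknot-free dot-bracket notation; it is an illustrative topological test under those assumptions, not the procedure used to produce Figure 3.

```python
_EXTERIOR = -1  # key for the loop that contains the point where the circle was opened


def loop_degrees(structure: str) -> list[int]:
    """Number of helices bordering each loop of a circular RNA.

    `structure` is pseudoknot-free dot-bracket for the circle opened at an
    arbitrary position. The loop closed by a pair is bordered by that pair
    plus each pair immediately nested inside it. Once the circle is closed,
    the exterior loop is an ordinary loop bordered by every top-level pair.
    """
    children: dict[int, int] = {_EXTERIOR: 0}
    stack: list[int] = []
    for pos, symbol in enumerate(structure):
        if symbol == "(":
            parent = stack[-1] if stack else _EXTERIOR
            children[parent] += 1
            children[pos] = 0
            stack.append(pos)
        elif symbol == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos + 1}")
            stack.pop()
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unmatched '('")
    return [count if key == _EXTERIOR else count + 1
            for key, count in children.items()]


def is_rod_like(structure: str) -> bool:
    """Rod-like: no loop is a junction of three or more helices."""
    return max(loop_degrees(structure), default=0) <= 2


if __name__ == "__main__":
    print(is_rod_like("((((....))))"))                    # True
    print(max(loop_degrees("((..((..))((..))((..))..))")))  # 4 (cruciform)
```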
We have attempted to compile a catalogue of all viroid and viroid-like sequences published in journals or available from the GenBank and
EMBL nucleotide sequence libraries. To this end, the EMBL and NCBI library file
servers were searched for new sequences using several queries. The sequences
obtained were added to the catalogue, together with any pertinent
information. The authors would appreciate being informed of any omitted
sequences or errors in the data set, and we intend to correct any such errors in
the future. Furthermore, the catalogue will be updated a few times per year. In
future catalogue updates, no priority will be attributed to publication. This
compilation will be available on the World Wide Web
(http://www.callisto.si.usherb.ca/~jpperra). In this manner, all viroid researchers will have the possibility
of, as well as some responsibility for, updating this compilation by electronic
mail submission (jp.perre@courrier.usherb.ca) following the example in Figure 1.
The viroid and viroid-like RNA catalogue will also be available on floppy disks readable on
microcomputers running MS-DOS, or in hard copy form.
The authors thank Drs Rudra P. Singh and Robert A. Owens for critical review of
sections of the catalogue. This work was supported in part by grants from the
Natural Sciences and Engineering Research Council (NSERC) and by the Medical
Research Council (MRC) of Canada to J.-P.P., and in part by a scientific team grant from Fonds pour la Formation
des Chercheurs et l'Avancement de la Recherche du Québec (FCAR). F.B. and D.L. are recipients of predoctoral fellowships from
FCAR and Fonds de la Recherche en Santé du Québec (FRSQ), respectively. J.-P.P. holds a scholarship from the Medical Research Council
of Canada.

REFERENCES


