ABSTRACT
The Antwerp database on small ribosomal subunit RNA offers over 4300 nucleotide
sequences (August 1995). All these sequences are stored in the form of an
alignment based on the adopted secondary structure model, which in turn is
corroborated by the observation of compensating substitutions in the alignment.
Besides the primary and secondary structure information, literature references,
accession numbers and detailed taxonomic information are also compiled. The
complete database is made available to the scientific community through
anonymous ftp and World Wide Web (WWW).
The database on small ribosomal subunit RNA (further abbreviated as SSU rRNA)
contained 4331 sequences in August 1995. This number comprises 1035 eukaryotic,
97 archaeal, 2988 bacterial, 64 plastid and 147 mitochondrial sequences.
Partial sequences are included only if the combined length of the sequenced
segments amounts to >= 70% of the estimated chain length of the molecule. The chain length of a
partially determined sequence is estimated by comparing it to a complete
sequence of a close relative. All sequences are stored in the form of an
alignment and contain the postulated secondary structure pattern in encoded
form.
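The inclusion rule for partial sequences can be illustrated with a minimal sketch. The function name, its parameters and the example lengths are hypothetical and are not part of the database software; the only element taken from the text is the 70% threshold applied to the combined length of the sequenced segments.

```python
def is_eligible(segment_lengths, estimated_chain_length, threshold_percent=70):
    """Return True if a partial sequence meets the inclusion criterion.

    segment_lengths: lengths (in nucleotides) of the sequenced segments.
    estimated_chain_length: chain length estimated from a complete sequence
        of a close relative (hypothetical input; how the estimate is made
        is not modelled here).
    """
    if estimated_chain_length <= 0:
        raise ValueError("estimated chain length must be positive")
    # Integer arithmetic avoids rounding effects at exactly 70%.
    return sum(segment_lengths) * 100 >= threshold_percent * estimated_chain_length


# Illustrative values only: two segments against an estimated 1500 nt chain.
print(is_eligible([600, 500], 1500))  # 1100 of 1500 nt sequenced
print(is_eligible([500, 400], 1500))  # 900 of 1500 nt sequenced
```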
Table 1 lists the different eukaryotic taxa and the number of representatives in
the database. The taxonomic classification of the species is according to Brusca
and Brusca (1) for the Animalia, according to Cronquist (2) for the higher
plants, according to Ainsworth et al. (3) for the zygomycetes and ascomycetes,
according to Moore (4) for the basidiomycetes and ustomycetes, and according to
Margulis et al. (5) for the remaining eukaryotes, viz. the Protoctista.
Table 1. List of eukaryotic taxa represented in the database and number of their
representatives
Table 2 covers the prokaryotic SSU rRNA sequences. The classification is based
on the construction of evolutionary trees. In short, new sequences retrieved
from the EMBL (6) and/or GenBank (7) nucleotide sequence libraries are aligned
with their presumed closest relative. Evolutionary trees are then constructed by
the neighbor-joining method (8), and according to the phylogenetic position
observed, the species are assigned to one of the taxa described by Woese and
coworkers (9, 10) and our research group (11, 12). In the case of the Bacteria,
no hierarchical distinction is made between divisions and subdivisions such as
the α, β, γ, δ and ε subdivisions of the division Proteobacteria, since these
subdivisions do not always form together a monophyletic cluster in evolutionary
trees. In particular the δ and ε subdivisions are regularly clustered separately
from the other Proteobacteria (11, 12). Furthermore, the γ subdivision is often
found to be paraphyletic (e.g. 10, 11), embracing the Proteobacteria β. In
previous papers describing the Antwerp rRNA database (11, 13), we also
distinguished the subdivision γ*, which was formed by species attributed to the
Proteobacteria group by Woese and collaborators but separated from the majority
of other γ Proteobacteria by the Proteobacteria β. However, since the position
of the Proteobacteria β cluster within the γ subdivision is not stable, we no
longer discriminate between γ and γ* Proteobacteria, and bacteria previously
ascribed to the latter taxon are now placed in the γ subdivision. For the
Archaea, a distinction is made between the divisions Crenarchaeota and
Euryarchaeota (14). The latter division is further subdivided into 8
subdivisions.
Table 2. List of prokaryotic taxa represented in the database and number of
their representatives
Other databases concerning SSU rRNA structure (15, 16) and known mutations in
Escherichia coli 16S rRNA (17) can be found in the present and the previous
database issues of this journal.
The secondary structure models adopted for prokaryotic and eukaryotic SSU rRNAs
were originally derived (18) by comparison of 6 eucaryal, 1 archaeal, 4
bacterial, 2 plastidial and 1 mitochondrial SSU rRNA sequences available in 1984
and by surveying 13 secondary structure models proposed at the time in papers
listed in (18). Gradual improvements were made to the models, as reported in
subsequent papers describing our database on SSU rRNA structure (19-23, 11, 13),
taking into account compensating substitutions observed in our sequence
alignments (24) and the results of studies by others (reviewed in 25). The model
presently followed for bacterial SSU rRNAs is essentially identical to the
models made available in graphic form by Gutell (15). It is illustrated in
Figure 1 with the SSU rRNA of the Gram-positive bacterium Bacillus subtilis. The
model followed for eukaryotic SSU rRNAs includes a secondary structure pattern
in certain variable areas left undefined in the models distributed by Gutell
(15). It is illustrated in Figure 2 with the SSU rRNA of the dinoflagellate
Alexandrium tamarense.
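The idea of a compensating substitution can be made concrete with a small sketch. It rests on an assumed working definition that this paper does not spell out: at two alignment columns postulated to pair, both bases differ between two sequences while pairing potential (Watson-Crick or G-U) is retained in each. The function name, the set of allowed pairs and the toy sequences are hypothetical.

```python
# Base combinations assumed able to pair: Watson-Crick plus G-U wobble.
PAIRING = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def is_compensating(seq_a, seq_b, i, j):
    """Check columns i and j (0-based) of two aligned sequences.

    Returns True when both sequences can form a pair between the two
    columns and both bases of the pair differ between the sequences.
    """
    pair_a = (seq_a[i], seq_a[j])
    pair_b = (seq_b[i], seq_b[j])
    both_pair = pair_a in PAIRING and pair_b in PAIRING
    both_changed = pair_a[0] != pair_b[0] and pair_a[1] != pair_b[1]
    return both_pair and both_changed


# Toy aligned sequences: columns 1 and 5 hold C-G in one and U-A in the other.
seq_a = "GCAAAGC"
seq_b = "GUAAAAC"
print(is_compensating(seq_a, seq_b, 1, 5))  # pair changed on both sides
print(is_compensating(seq_a, seq_b, 0, 6))  # identical G-C pair in both
```

Under this definition, each such column pair counts as support for the postulated helix, which is how the alignment corroborates the model.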
Each SSU rRNA sequence is stored in a separate file, in order to simplify access
to the data. Each file contains primary and secondary structure information, as
well as annotations such as accession number, literature reference and detailed
taxonomic specifications. The SSU rRNA database is made available through
anonymous ftp on the server rrna.uia.ac.be or by World Wide Web at URL
http://rrna.uia.ac.be/rrna/ssuform.html. We recommend connecting to the database
via WWW because it is more user-friendly: sequences can easily be selected one
by one, by taxonomic group, or by a combination of both. Sequences can be
retrieved in different formats. On-line information about the database is also
available.
For those who choose to connect via ftp, the directory `pub' holds a file called
`readme', which contains information on the database contents and on how to
obtain SSU rRNA sequences. We suggest fetching and reading this file before
downloading other data. The names of the files on this server are
produced from the species name by taking characters of the genus and species
names. Their extension is a code describing the phylogenetic group to which the
species belongs. This makes it possible to either retrieve specific sequences
using the full name, or to retrieve a set of sequences belonging to a
phylogenetic group using wild cards. A program available on the server makes it
possible to create different file formats and to integrate several sequences
into an alignment.
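Retrieval by wild card can be sketched as follows. The file names and group codes in this example are invented for illustration; the actual naming scheme and extension codes are documented in the `readme' file on the server and are not reproduced here. The sketch only shows how an extension that encodes the phylogenetic group allows a whole group to be selected with one pattern.

```python
from fnmatch import fnmatchcase


def select_by_group(filenames, group_code):
    """Return the file names whose extension equals the given group code."""
    pattern = f"*.{group_code}"
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.
    return [name for name in filenames if fnmatchcase(name, pattern)]


# Hypothetical directory listing: name derived from genus and species,
# extension standing for a phylogenetic group (codes are made up).
listing = ["bacsub.grp1", "esccol.grp2", "aletam.grp3", "bacmeg.grp1"]

print(select_by_group(listing, "grp1"))
```

A single sequence is still retrieved by its full file name, so the same listing serves both kinds of request.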
If problems occur in connecting to the server or in retrieving data, the authors
can be contacted by electronic mail to dwachter@uia.ua.ac.be or
yvdp@uia.ua.ac.be. Users publishing results based on data retrieved from our
database are requested to cite this paper.
Our research is supported by the BIOTECH programme of the Commission of the
European Communities (contract BIO2-CT94-3098), by the Programme on Interuniversity Poles of Attraction of
the Office for Scientific, Cultural and Technical Affairs of the Belgian State
(contract 23), and by the National Fund for Scientific Research. We thank
Sabine Chapelle for the computer drawings of the secondary structure models.
Yves Van de Peer and Peter De Rijk are Research Assistants of the National Fund
for Scientific Research.


REFERENCES
