ABSTRACT
The RNA editing process in protozoan parasites is controlled by small RNA
molecules known as guide RNAs (gRNAs). The gRNA database is a comprehensive
compilation of published guide RNA sequences from eight different kinetoplastid
organisms. In addition to the RNA primary sequences, information on the gene
localization, the experimental verification of the transcripts, and literature
citations are provided. Accessory information includes the secondary structures
of four Trypanosoma brucei gRNAs as well as a computer-modelled three-dimensional gRNA structure. The
database is made available as a hypertext document accessible via the World
Wide Web (WWW) or from the authors in a printed form.
Guide RNAs (gRNAs) are small, metabolically stable mitochondrial transcripts
identified only in kinetoplastid organisms such as Trypanosoma, Leishmania or Crithidia. The molecules carry out a central function during the unusual mitochondrial
RNA processing reaction known as kinetoplastid (k) RNA editing (for recent
reviews see 1, 2). During editing, uridylate residues are inserted into and deleted from
mitochondrial transcripts, thus completing the sequence information of these
mRNAs. Guide RNAs provide the information for the U insertion/deletion process
by base pairing to pre-edited mRNAs. They are encoded on the mitochondrial mini- or maxicircle DNA elements in kinetoplastid organisms and the RNAs
are presumably primary transcripts. Guide RNAs have an average length of 50-70 nucleotides (nt) with a strong A/U nucleotide bias. The primary
sequence of gRNAs can be divided into three functional domains: first, a region
of complementarity located at the 5'-end, termed anchor sequence, which is thought to create the initial
contact with the pre-edited mRNA; second, an informational sequence domain which presumably
directs the editing reaction; and third, a posttranscriptionally added 3' oligo(U) extension, sometimes of >20 nt in length. More than 200
different gRNAs have been estimated to be required for the editing of all
encrypted genes in Trypanosoma brucei (3) and there is an ~3-fold higher coding capacity for gRNA genes in that organism. Thus, in
addition to the large number of different gRNAs, the potential for gRNA
redundancy exists (4). Guide RNAs have been suggested to fold into simple secondary structures,
comprising two consecutive stem-loop elements with both terminal ends in a
single-stranded conformation (5).
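Two of the sequence features described above, the A/U nucleotide bias and the 3' oligo(U) extension, can be illustrated with a minimal Python sketch. The function names, the example sequence and the minimum tail length of 5 nt are assumptions made for illustration only; they are not taken from the database. Note also that a simple string operation cannot distinguish posttranscriptionally added U residues from genomically encoded U residues at the 3' end, so the split shown here is only an approximation.

```python
def normalize(seq):
    """Upper-case a sequence and write it as RNA (T -> U)."""
    return seq.upper().replace("T", "U")


def split_oligo_u_tail(seq, min_tail=5):
    """Split a sequence into (body, tail), where tail is a 3' run of U.

    min_tail is an arbitrary illustrative threshold: shorter 3' U runs
    are left attached to the body and an empty tail is returned.
    """
    seq = normalize(seq)
    body = seq.rstrip("U")
    tail = seq[len(body):]
    if len(tail) < min_tail:
        return seq, ""
    return body, tail


def au_fraction(seq):
    """Return the fraction of A and U residues in a sequence."""
    seq = normalize(seq)
    if not seq:
        return 0.0
    return sum(base in "AU" for base in seq) / len(seq)


# Invented example sequence, not a database entry.
example = "AUAUACAGAGUAAUAGAUUGAAAUUAGAUAAGUA" + "U" * 12
body, tail = split_oligo_u_tail(example)
print(len(body), len(tail), round(au_fraction(body), 2))
```

Because the database lists primary sequences without the oligo(U) extension, a step of this kind would only be needed for sequences taken from other sources, such as cDNA clones.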
Release 1.0 of the database contains 235 gRNA sequence entries including
published sequences through September 30, 1996. The sequences stem from eight
different kinetoplastid species: Trypanosoma brucei, Trypanosoma cruzi, Trypanosoma congolense, Trypanosoma equiperdum, Leishmania tarentolae, Leishmania infantum, Leishmania gymnodactyli and Crithidia fasciculata. The compilation is arranged in tabular form, listing for each entry: organism
and name of the gRNAs, their primary sequences [not including the 3' oligo(U) extension] and their localization on the mitochondrial genome
(see Fig. 1 for an example). The order in which the gRNAs have been listed is from left to right with reference to the linear maxicircle map as given in (6). The nomenclature of gRNAs differs depending on the laboratory involved and
the molecules are listed in a 5' to 3' order: the gRNA required to edit a 5' region of a mRNA sequence is listed before that which is
involved in editing a 3' region. The amount of sequence shown for a gRNA may exceed the actual
length of the gRNA. This is because in many cases the 5' and 3' termini have not been determined experimentally or because
heterogeneity has been observed when gRNAs have been analyzed by primer
extension or cDNA sequencing. For 159 of the 235 gRNAs, the existence of the
molecules within RNA preparations has been experimentally verified by Northern
blotting, primer extension, direct cDNA cloning, or by being isolated as part
of a gRNA/mRNA chimera. The remaining sequences have to be considered putative
gRNAs based on their base complementarity to fully edited mRNA sequence
domains. Since all sequences were collected from published information, the
corresponding references are provided in an associated hypertext document
including MEDLINE identification numbers. In most cases, these references
will provide an alignment of the gRNAs with their cognate mRNAs.
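The per-entry information described above (organism, gRNA name, primary sequence, localization, experimental verification and literature references) could be represented as a simple record. The following Python sketch is an illustration only: the class, its field names and the placeholder values are hypothetical and do not reflect the actual layout of the hypertext tables.

```python
from dataclasses import dataclass, field


@dataclass
class GuideRNAEntry:
    """Hypothetical record for one database entry (field names assumed)."""
    organism: str
    name: str
    sequence: str        # primary sequence without the 3' oligo(U) extension
    localization: str    # position on the mitochondrial genome
    verified: bool       # True if experimentally detected in RNA preparations
    references: list = field(default_factory=list)  # e.g. MEDLINE identifiers


def verified_fraction(entries):
    """Return the fraction of entries with experimental verification."""
    if not entries:
        return 0.0
    return sum(entry.verified for entry in entries) / len(entries)


# Placeholder entries reproducing only the counts stated in the text:
# 159 verified gRNAs out of 235. All other values are invented.
entries = [
    GuideRNAEntry(
        organism="placeholder organism",
        name=f"placeholder-{i}",
        sequence="",
        localization="unknown",
        verified=(i < 159),
    )
    for i in range(235)
]
print(f"{verified_fraction(entries):.1%}")
```

With the counts given for Release 1.0, roughly two thirds of the entries are experimentally verified; the remainder would carry `verified=False` and correspond to the putative gRNAs.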
The gRNA database is accessible via the URL: http://www.biochem.mpg.de/~goeringe/. A printed version can be obtained upon request from any of the
authors, who can be contacted by electronic mail
(goeringe@alf.biochem.mpg.de or souza@alf.biochem.mpg.de) or by mail at the
address given above. Users of the database should cite this publication.
Corrections, new entries, reports of errors and omissions, and other materials for inclusion
in the database are welcome. New information will be accepted in
any form. Unpublished data will be held confidential if required.
This work was supported by grants from the German Ministry for Education and
Research (BMBF) and the German Research Foundation (DFG) to H.U.G.
REFERENCES
