Nucleic Acids Research, 2000, Vol. 28, No. 12 2342-2352
© 2000 Oxford University Press
Evolutionary appearance of genes encoding proteins associated with box H/ACA snoRNAs: Cbf5p in Euglena gracilis, an early diverging eukaryote, and candidate Gar1p and Nop10p homologs in archaebacteria
Program in Evolutionary Biology, Canadian Institute for Advanced Research, Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia B3H 4H7, Canada
Received February 22, 2000; Revised and Accepted April 24, 2000.
DDBJ/EMBL/GenBank accession nos AF234319, AF234320.
ABSTRACT
A reverse transcription–polymerase chain reaction (RT–PCR) approach was used to clone a cDNA encoding the Euglena gracilis homolog of yeast Cbf5p, a protein component of the box H/ACA class of snoRNPs that mediate pseudouridine formation in eukaryotic rRNA. Cbf5p is a putative pseudouridine synthase, and the Euglena homolog is the first full-length Cbf5p sequence to be reported for an early diverging unicellular eukaryote (protist). Phylogenetic analysis of putative pseudouridine synthase sequences confirms that archaebacterial and eukaryotic (including Euglena) Cbf5p proteins are specifically related and are distinct from the TruB/Pus4p clade that is responsible for formation of pseudouridine at position 55 in eubacterial (TruB) and eukaryotic (Pus4p) tRNAs. Using a bioinformatics approach, we also identified archaebacterial genes encoding candidate homologs of yeast Gar1p and Nop10p, two additional proteins known to be associated with eukaryotic box H/ACA snoRNPs. These observations raise the possibility that pseudouridine formation in archaebacterial rRNA may be dependent on analogs of the eukaryotic box H/ACA snoRNPs, whose evolutionary origin may therefore predate the split between Archaea (archaebacteria) and Eucarya (eukaryotes). Database searches further revealed, in archaebacterial and some eukaryotic genomes, two previously unrecognized groups of genes (here designated PsuX and PsuY) distantly related to the Cbf5p/TruB gene family.
INTRODUCTION
In eukaryotes, cytoplasmic rRNA species other than 5S rRNA are transcribed as part of a longer precursor (pre-rRNA), from which the mature rRNAs are processed. In most cases, the pre-rRNA, which includes external and internal transcribed spacers, undergoes endonucleolytic cleavage to yield large subunit (LSU) rRNAs (5.8S and 26–28S) and a small subunit (SSU) rRNA. To date, the mechanisms of eukaryotic pre-rRNA processing have been well studied only in relatively late diverging species such as yeast (Saccharomyces cerevisiae) and vertebrate animals. These investigations have provided information about the complexity and order of processing events, as well as the involvement of numerous cis-elements and trans-acting factors (for recent reviews, see 1–4).
In the protist phylum Euglenozoa, which includes euglenid protozoa such as Euglena gracilis and kinetoplastid protozoa such as Trypanosoma spp, the LSU rRNA is further fragmented into smaller pieces (a total of 14 in the case of E.gracilis; 5). Euglena gracilis LSU rRNA also contains a substantially higher proportion of O2'-methylnucleosides than its counterparts in yeast and vertebrates (6; M.N.Schnare, personal communication). As in other eukaryotes, the highly fragmented euglenid LSU rRNA is derived from a pre-rRNA that includes the SSU rRNA sequence (7,8); however, information about trans-acting factors and details about the rRNA maturation pathway in E.gracilis are still quite limited (8,9).
Although questions about the precise phylogenetic position of the Euglenozoa within the domain Eucarya (eukaryotes) have been raised by recent protein phylogenies (10–13), this phylum is generally considered to represent an early-branching eukaryotic lineage (14). In this context, studies of ribosome biogenesis in members of the Euglenozoa should provide insights into the evolutionary origin of the system that processes eukaryotic rRNA.
The box H/ACA snoRNAs constitute one of the two major classes of snoRNA in eukaryotes (15). Box H/ACA snoRNAs display a secondary structure that features two hairpin motifs and two conserved sequence blocks, box H (ANANNA, in the hinge region between the two hairpin structures) and box ACA (ACANNN, at the 3' end) (15–18). The box H/ACA snoRNAs participate in either rRNA processing (e.g. yeast snR30) (19) or formation of pseudouridine (5-ribosyluracil, Ψ) in rRNA. The latter function relies on complementarity within pseudouridine pockets to short regions flanking the modification sites in rRNA (16,20). Some box H/ACA snoRNAs (e.g. yeast snR10) (20,21) are involved in both nucleolytic processing and post-transcriptional modification. Box H/ACA snoRNAs have been identified in so-called crown eukaryotes such as animals, plants and yeast (reviewed in 22,23; see also 18,24–26) as well as in a ciliate protozoon, Tetrahymena thermophila (27).
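The two consensus blocks described above (ANANNA for box H, ACANNN at the 3' end for box ACA) can be expressed as simple patterns. The following is a minimal sketch, not a snoRNA prediction method: the function names and the toy sequence are hypothetical, and a real search would also have to test the two-hairpin secondary structure, which is not attempted here.

```python
import re

# Box H consensus ANANNA, written as a lookahead so overlapping hits are reported.
BOX_H = re.compile(r"(?=A[ACGU]A[ACGU]{2}A)")
# Box ACA consensus ACANNN, anchored so that ACA lies three nucleotides from the 3' end.
BOX_ACA = re.compile(r"ACA[ACGU]{3}$")


def normalize(seq):
    """Upper-case the sequence and convert DNA-style T to U."""
    return seq.upper().replace("T", "U")


def find_box_h(seq):
    """Return 0-based start positions of every ANANNA match."""
    return [m.start() for m in BOX_H.finditer(normalize(seq))]


def has_box_aca(seq):
    """Return True if the sequence ends with ACANNN."""
    return BOX_ACA.search(normalize(seq)) is not None


if __name__ == "__main__":
    toy = "GGAUACCAGGCCACAUUU"  # hypothetical sequence, for illustration only
    print(find_box_h(toy), has_box_aca(toy))
```

A sequence passing both tests is only a candidate; the consensus motifs are short and degenerate, so chance matches are expected in any long RNA.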
In S.cerevisiae, box H/ACA snoRNAs form a complex with the proteins Gar1p, Cbf5p, Nhp2p and Nop10p (4,28–31). Cbf5p, originally identified as a centromere/microtubule-binding protein (32), is an essential protein in yeast, implicated in rRNA processing (33) and stability of box H/ACA snoRNAs (4). Cbf5p shares sequence similarity with the TruB/Pus4p family of tRNA Ψ55 synthases (34–38) and, in association with box H/ACA snoRNAs, is believed to act as an rRNA Ψ synthase. Eukaryotic Cbf5p homologs have been identified and characterized in rat (NAP57; 34), human (dyskerin; 39,40), Drosophila [Nop60B (41) or Minifly protein (26)] and fungi (42). Genes for archaebacterial Cbf5p homologs have also been identified (43–47) but their function is currently unclear.
As yet, there is no published information about box H/ACA snoRNP proteins in unicellular eukaryotes (protists), especially ones thought to be early branching members of the domain Eucarya. As part of an on-going investigation of rRNA processing and modification in the early diverging protist, E.gracilis, we have cloned and characterized a full-length cDNA encoding the Cbf5p protein of this organism. In the course of this study, we also identified candidate genes encoding archaebacterial homologs of two of the proteins (Gar1p and Nop10p) known to be associated with eukaryotic box H/ACA snoRNPs, as well as genes specifying two novel protein families related to Cbf5p and TruB.
MATERIALS AND METHODS
RNA and DNA
Total cellular DNA and RNA from E.gracilis were prepared as described by Breckenridge et al. (48). Poly(A)+ RNA was selected from total RNA using the PolyATtract system (Promega, Madison, WI).
Oligonucleotides
Oligoribonucleotide P-1R (5'-AAUAAAGCGGCCGCGGAUCCAA-3') was purchased from Dalton Chemical Laboratories Inc. (North York, ON, Canada). Oligodeoxyribonucleotides (Table 1) were obtained from Gibco BRL (Burlington, ON, Canada) or ID Labs Biotechnology, Inc. (London, ON, Canada).
Generation of Cbf5p cDNA sequence by reverse transcription–polymerase chain reaction (RT–PCR)
RT of a fraction enriched in poly(A)+ RNA (~500 ng) was carried out using Superscript II reverse transcriptase (RNase H−; Gibco BRL) and modified oligo(dT) primers (Table 1), dTP-4 for degenerate PCR or P-16 for 3' RACE (rapid amplification of cDNA ends), following the supplier's protocol. The oligo-capping method of Maruyama and Sugano (49) was used in 5' RACE experiments. In brief, a fraction enriched in poly(A)+ RNA was treated with calf intestinal alkaline phosphatase (New England Biolabs, Beverly, MA) followed by tobacco acid pyrophosphatase (Epicentre Technologies, Madison, WI). Using T4 RNA ligase (Amersham-Pharmacia Biotech, Cleveland, OH), the 3' end of RNA oligo P-1R was joined to the 5' end of the RNA originally possessing the cap structure. The resulting RNA was subjected to RT as described above using P-87 (Table 1) as a primer.
PCR mixtures were prepared as described by Baskaran et al. (50) using Pfu DNA polymerase (1.5 U/ml, Stratagene, La Jolla, CA) but substituting Taq DNA polymerase (25 U/ml of reaction mix, Gibco BRL) for the KlenTaq1 polymerase. Initially, two internal portions of Cbf5p sequence were amplified by degenerate PCR using modified touchdown protocols (51) with the primer pairs P-70/P-73 and P-77/P-79 (Table 1; degenerate RT–PCR, Fig. 1). These PCR products were purified by non-denaturing polyacrylamide gel electrophoresis, re-amplified using a conventional PCR cycle, then cloned and sequenced as outlined below. New primers (P-85 and P-87, then P-86 and P-88 in the nested reaction), designed on the basis of the sequence information obtained, were used to connect the two portions (internal RT–PCR, Fig. 1). For 5' RACE, the combination P-55 + P-87 was followed by P-4 + P-95 in the nested reaction (Fig. 1). For 3' RACE, P-85 + P-55, then P-86 + P-4 were used in the nested reaction (3' RACE I, Fig. 1).
To verify poly(A) addition site(s) more precisely, an additional round of 3' RACE was carried out using as template dTP-4-primed cDNA together with primers specific for sequence further downstream (P-99 and P-107, respectively, in primary and nested PCR) and anchor-specific primers (P-4 and P-55, respectively, in primary and nested reactions; 3' RACE II, Fig. 1). Ten clones selected in this way were sequenced.
Nested PCR products were purified by agarose gel electrophoresis, cloned (52), then sequenced by the dideoxy method (53) with modifications (48,54,55). To minimize the possibility of PCR mutation, at least three independent clones from each reaction were sequenced on both strands. The P-86 to P-88 portion of the 3' RACE I products was not fully analyzed because this region overlapped parts of other RT–PCR products.
Genomic PCR
To examine the 3' terminal region of the E.gracilis Cbf5p gene encompassing the region that includes sequence heterogeneities noted during cDNA analysis (see Results), a PCR walking strategy (56) was used with minor modifications. In brief, E.gracilis total DNA was digested separately with a number of restriction enzymes that generate blunt ends. A partially double-stranded adapter (P-22 and P23ddC annealed together; see table 1 and figure 1 of ref. 56) was ligated to both ends of the resulting restriction fragments using T4 DNA ligase. The ligation products were then used as templates in PCR with different combinations of cDNA- and adapter-specific primers (P-163 and P-24 in the primary reaction, P-164 and P-25 in the secondary reaction; Table 1). One of the PCR products (~420 bp), amplified from template prepared from DraI-digested DNA, was purified, cloned and sequenced.
Northern hybridization
The fraction enriched in poly(A)+ RNA (500 ng) was electrophoresed in a 1% agarose–0.4 M formaldehyde gel (57), then transferred to GeneScreen Plus nylon membrane (NEN, Boston, MA) according to the protocol of Chomczynski and Mackey (58). A cloned cDNA fragment (P-86 to P-88) was amplified by PCR using a corresponding plasmid clone as template DNA. The amplification product was purified by agarose gel electrophoresis, then labeled by a random priming protocol (59) using [α-32P]dATP. Hybridization was performed in 0.5 M sodium phosphate buffer (pH 7.2), 7% SDS and 1 mM EDTA (60) at 65°C overnight. The membrane was washed in 40 mM sodium phosphate buffer, 5% SDS and 1 mM EDTA at 65°C for 5 min, then in 40 mM sodium phosphate buffer, 1% SDS and 1 mM EDTA at 65°C for 20 min (3x), and finally subjected to autoradiography with an intensifying screen at −75°C.
Southern hybridization
Total cellular DNA (5 µg) was hydrolyzed with restriction enzymes and the products were separated by electrophoresis in a 0.7% agarose gel, then transferred to GeneScreen Plus as described by Chomczynski (61). The cDNA fragment was amplified by PCR and labeled as above. Hybridization and autoradiography were carried out as described above.
Analysis of sequence data
Sequence data were assembled and analyzed using MacDNASIS version 3.5 (Hitachi Software, Yokohama, Japan). Searches of public domain nucleotide and protein sequence databases were carried out by Gapped-BLAST (BLASTP or TBLASTN; 62) through the NCBI web server (www.ncbi.nlm.nih.gov) using the default options unless otherwise specified. Sequence alignments were generated with Clustal W version 1.7 (63) followed by manual modification. Based on the alignment, 181 amino acid positions were selected for phylogenetic analysis, with positions of insertion and deletion omitted. Phylogenetic trees were constructed using the quartet puzzling and maximum likelihood methods of protein phylogeny in the PUZZLE 4.0 program (64). The JTT-F + Γ model of amino acid substitutions was assumed in the analysis (64–66). Rate heterogeneity among sites was approximated by a discrete gamma distribution (with four categories).
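The omission of insertion and deletion positions before tree building can be illustrated with a short sketch. It assumes an alignment held as a dictionary of equal-length strings with "-" or "." marking gaps; the function names and the toy alignment are hypothetical, and the sketch covers only the gap-removal step, not the manual selection that produced the 181 positions used here.

```python
def ungapped_columns(alignment, gap_chars="-."):
    """Return indices of alignment columns in which no sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    return [
        i
        for i in range(n_cols)
        if all(seq[i] not in gap_chars for seq in alignment.values())
    ]


def strip_gapped_columns(alignment):
    """Return a copy of the alignment restricted to gap-free columns."""
    keep = ungapped_columns(alignment)
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Hypothetical three-sequence alignment, for illustration only.
    toy = {"seqA": "MK-LV", "seqB": "MKALV", "seqC": "M-ALV"}
    print(strip_gapped_columns(toy))
```

Dropping every column that contains a gap is the most conservative choice; it discards some informative sites but avoids scoring positions whose homology is uncertain.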
RESULTS
Assembly of a cDNA sequence encoding E.gracilis Cbf5p
As summarized in Figure 1, an RT–PCR approach was used to assemble a complete cDNA sequence encoding Euglena Cbf5p. Initially, highly degenerate primers were designed on the basis of conserved peptide motifs identified in an alignment of eukaryotic homologs of yeast Cbf5p. Two of the primer sets successfully amplified short cDNA fragments that appeared to specify portions of Euglena Cbf5p (degenerate PCR in Fig. 1). These partial cDNA sequences were used in further primer design to connect the initial two sequences, after which additional sets of primers were synthesized as necessary to allow completion of the sequence by RACE techniques.
The 5' end of the cDNA was obtained by an oligo-capping method (49) that in theory should yield the authentic 5' end of the corresponding mature mRNA. In E.gracilis, as in the trypanosomatid protozoa, most mRNAs acquire a common 5' terminal sequence as a result of a trans-splicing event that covalently joins the 5' portion of a small, separately transcribed RNA, the spliced leader (SL) RNA, to different mRNA transcripts (67). In our hands, the oligo-capping procedure yielded an SL sequence having two additional 5' nucleotides (AC) compared with previously reported sequences (67,68). Known gene sequences for the Euglena SL RNA do have AC at the corresponding positions (68); thus, the Euglena SL sequence very likely is 28 nt long, beginning with the sequence 5'-ACAC...
The SL RNA features an as yet uncharacterized methylguanosine 5' cap structure, as suggested by analyses (Y.Watanabe, unpublished results) using an anti-TMG antibody that in our hands reacts with monomethyl- as well as trimethylguanosine (69). Previous studies of SL sequences in Euglena had employed direct reverse transcriptase sequencing, with the enzyme evidently stopping at the third or fourth positions from the 5' end, presumably as a result of post-transcriptional modifications in the SL sequence (70; see also 71, but no sequencing gel shown). The protocol used in the present study may allow reverse transcriptase extension through these modifications to some degree; alternatively, reverse transcriptase readthrough may reflect undermodification at the 5' end of the Euglena SL RNA, as observed in the case of the trypanosomatid SL (72–74). The latter RNA has a characteristic cap4 structure in which the first four 5' nucleotides are methylated in the base and/or sugar (O2'-methyl) moieties (72,75). Our results and those of previous studies suggest that Euglena SL RNA likely harbors modifications in its first four 5' nucleotides. Thus, Euglena and kinetoplastid protozoa may be similar not only in possessing trans-splicing but also in having an SL RNA that is extensively modified at its 5' end. Recently, a 28-nt SL beginning with 5'-ACUC... and also possibly containing a 5' cap structure with extensive modifications was characterized in the colorless euglenid, Entosiphon sulcatum (76).
In 3' RACE experiments that examined cDNA synthesized using an oligo(dT) primer, we obtained evidence of heterogeneity at the poly(A) addition site (Fig. 2). 3' RACE analysis also revealed a likely sequence heterogeneity at a single position (Fig. 2): at this site, four clones had C whereas five clones had T, with one clone lacking the corresponding region altogether (see below). It seemed unlikely that these differences could be attributed to RT or PCR artifacts because they were observed in clones generated in independent 3' RACE experiments (data not shown). On the other hand, sequencing of the genomic PCR product comprising the 3' end of the cDNA indicated that the position in question is T in the gene sequence (three out of three independent clones sequenced). The poly(A) addition site in the one deleted clone is ambiguous due to the possible occurrence of A residues immediately 5' to the poly(A) tail; all other clones had a pyrimidine residue at the poly(A) junction.
Genomic Southern analysis (data not shown), combined with the results of genomic PCR, makes it unlikely that there are multiple copies of the Cbf5p gene in E.gracilis nuclear DNA, although detailed Cbf5p gene structure has not yet been investigated. Northern blot analysis (not shown) revealed a Cbf5p mRNA of ~1.7 kb, consistent with the size of the complete cDNA sequence plus about 100 3' terminal A residues.
Characteristics of the Euglena Cbf5p sequence
The E.gracilis Cbf5p cDNA sequence contains an open reading frame of 467 residues specifying a protein of predicted molecular mass 52 392 Da (Fig. 2). Like the homologous fungal sequences, Euglena Cbf5p lacks an identifiable positively charged N-terminal extension that in metazoan Cbf5p sequences is assumed to represent a nuclear localization signal. However, Euglena Cbf5p does possess the C-terminal KKE/D repeats that are characteristic of all known eukaryotic Cbf5p homologs. These repeats display microtubule-binding activity in vitro and are essential for viability in S.cerevisiae (32) (although not in another yeast, Kluyveromyces lactis; 42).
In Escherichia coli Ψ synthases acting on rRNA and tRNA, a highly conserved aspartate residue is essential for activity (77–80), and a possible catalytic role for this residue via its β-carbonyl group has been proposed (77,81). A recent crystallographic study supports this idea (82). A motif that includes this conserved Asp (the TruB motif II) has been identified not only among members of the TruB/Pus4p family of tRNA Ψ55 synthases (37) but also in Cbf5p (36), and the functional importance of this Asp in both TruB (80) and Cbf5p (83) has now been demonstrated. A partial alignment of eukaryotic and archaebacterial Cbf5p homologs, encompassing highly conserved regions (motifs I and II) characteristic of Ψ synthases, is shown in Figure 3 (a full alignment of the corresponding Cbf5p sequences is available upon request). Exceptional conservation of the motif II sequence (including the functionally critical Asp residue) is evident among Cbf5p homologs, including the E.gracilis one.
An unusual arrangement of the Cbf5p gene is seen in Aeropyrum pernix, the only member of the archaebacterial kingdom Crenarchaeota for which a complete genome sequence is available (the other completely sequenced archaebacterial genomes being from members of the Euryarchaeota). In this case, the Cbf5p homolog is encoded by two partially overlapping open reading frames and the motif II aspartate residue (Asp60) is replaced by glutamate (47). In mutagenesis experiments with the E.coli TruA tRNA Ψ synthase, Huang et al. (77) found that Glu at this position could not substitute functionally for the conserved Asp.
Phylogenetic relationships
Phylogenetic analysis of eukaryotic and archaebacterial Cbf5p homologs (Fig. 4) suggests that Euglena diverged from the main branch of eukaryotic evolution earlier than the other eukaryotes for which Cbf5p sequences are currently available. However, confirmation of this point will require a more diverse collection of eukaryotic Cbf5p sequences, including additional protist ones. This phylogenetic analysis also demonstrates that archaebacterial and eukaryotic (including Euglena) Cbf5p sequences comprise two distinct branches of a single clade to the exclusion of the affiliated TruB (eubacterial) and Pus4p (eukaryotic) sequences, which form a separate clade. These affinities are supported by the presence of unique insertions found only in TruB and yeast Pus4p sequences and by the absence of an apparent pseudouridine synthase and archaeosine (PUA) domain in Pus4p and most eubacterial TruB sequences (84).
Potential homologs of box H/ACA snoRNP proteins in archaebacterial genomes
The known presence in archaebacterial genomes of genes encoding homologs of Cbf5p and Nhp2p (another box H/ACA snoRNP-specific protein) prompted us to search for genes that might specify Gar1p and Nop10p, the remaining two proteins recently identified as components of box H/ACA snoRNPs (30,31). In a TBLASTN search of sequenced archaebacterial genomes using the yeast Gar1p sequence, a potential Methanobacterium homolog was identified (E value 0.004). Using this Methanobacterium sequence as query, putative homologs from other archaebacterial genomes were identified (E values ranging from 10^-12 to 0.43 for Aeropyrum). Figure 5A presents an alignment of the N-terminal portion of inferred archaebacterial and eukaryotic Gar1p sequences. The archaebacterial versions are predicted to have shorter N- and C-terminal regions than their eukaryotic homologs, with sequence identity limited to short blocks dispersed throughout the sequence. In this regard, it has been shown in the case of S.cerevisiae Gar1p that Gly–Arg repeats at both N- and C-termini are not essential for growth (85), with the protein produced by in vitro translation able to interact with box H/ACA snoRNAs through its internal core region (86). Thus, the shorter archaebacterial Gar1p candidates could well be functional.
Using the yeast Nop10p sequence in a TBLASTN search of complete archaebacterial genome sequences, we identified possible homologs of this protein; E values ranged from 10^-5 to 0.84 (except for 4.9 in the case of Methanococcus), with corresponding values between 10^-16 and 10^-10 obtained using the Pyrococcus Nop10p homologs (which have the identical protein sequence) as query. Amino acid sequence identity among the putative archaebacterial and eukaryotic Nop10p homologs (Fig. 5B) is somewhat greater than in the case of Gar1p, with N-terminal and C-terminal regions being the most divergent. However, in contrast to Gar1p, the putative Nop10p sequences from archaebacteria and eukaryotes are virtually the same length.
Examination of sequenced archaebacterial genomes revealed that in all cases, the putative Nop10p gene is physically linked to the gene encoding the homolog of eukaryotic translation initiation factor IF2-, and sometimes ribosomal protein genes, as well, possibly in the same operon (Fig. 7A). Further, genes for some candidate archaebacterial Gar1p homologs (in Methanococcus, Methanobacterium, Archaeoglobus and Aeropyrum) are physically linked to genes for archaebacterial homologs of eukaryotic transcription factor IIB (TFIIB) (Fig. 7B). In Archaea, transcription of both rRNA and mRNA requires TFIIB (87). Furthermore, the candidate Aeropyrum Gar1p gene is in the same transcriptional orientation (and so may be part of the same operon) as the gene for ribosomal protein S8E (Fig. 7B). The organization of these putative archaebacterial Gar1p and Nop10p genes suggests that their expression may be co-regulated with components of the transcription and translation machineries in these organisms.
Identification in archaebacteria and some eukaryotes of genes encoding a novel group of proteins related to the Cbf5p/TruB gene family
In an attempt to identify additional protein sequences related to known Ψ synthases, we conducted BLAST searches using as query an internal portion of the E.gracilis Cbf5p sequence (positions G66 to A271) that excluded Cbf5p-specific N- and C-terminal regions. In particular, the query sequence lacked the PUA domain, which has been proposed as a possible RNA-binding domain in a broad range of known or putative RNA-binding proteins (84). In addition to the expected Cbf5p/TruB family sequences (E values ranging from 10^-86 to 0.45 except for a value of 5.1 in the case of S.cerevisiae Pus4p), this search detected a previously uncharacterized Archaeoglobus sequence (GenBank accession no. AAB90092) at an E value of 0.052. Using this sequence in Gapped-BLAST searches (62), additional novel Cbf5p/TruB-related sequences (here designated PsuX) were detected (Fig. 6A). In BLASTP or TBLASTN searches, we identified PsuX sequences with high statistical significance (E values <10^-20) in all completely sequenced archaebacterial genomes, as well as detecting possible orthologous full-length sequences in the animals Caenorhabditis elegans and Drosophila melanogaster and a partial sequence in a protist, Giardia intestinalis; on the other hand, no PsuX-related sequences were evident in the yeast (S.cerevisiae) or eubacterial genomes. BLAST searches of EST databases suggested that mammals and plants also express a PsuX homolog (data not shown). After four rounds of iteration using the Archaeoglobus fulgidus sequence as query in a more sensitive PSI-BLAST (62) search and with the low-complexity filter program SEG (88) off, authentic Cbf5p/TruB family sequences but no pseudo-positives appeared with a high degree of significance (E values <10^-8).
The Cbf5p/TruB-homologous portion of the PsuX protein sequence does not contain a readily identifiable TruB motif II (Fig. 6A) which, as noted above, includes the functionally critical Asp residue. However, alignment of PsuX protein sequences did reveal a highly conserved stretch, (A/S)GRED(V/I)D(A/V)R(M/T/V)LG (positions 183–194 in the A.fulgidus sequence) (Fig. 6A), that is likely a functionally important region. This conserved stretch resembles known Ψ synthase motifs (77) and includes two Asp residues (Fig. 6B). In Figure 6B, the PsuX2 alignment follows the DxxxxG pattern proposed for the TruB/RluA/RsuA superfamily (79).
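The conserved PsuX stretch (A/S)GRED(V/I)D(A/V)R(M/T/V)LG translates directly into a 12-residue pattern, matching the span of positions 183–194. A minimal sketch of scanning a protein sequence for it follows; the function name and the test peptides are hypothetical, and an exact-match pattern of this kind will miss divergent homologs that a profile search would recover.

```python
import re

# (A/S)GRED(V/I)D(A/V)R(M/T/V)LG written as a regular expression (12 residues).
PSUX_MOTIF = re.compile(r"[AS]GRED[VI]D[AV]R[MTV]LG")


def find_psux_motif(protein):
    """Return 1-based inclusive (start, end) of the first motif match, or None."""
    match = PSUX_MOTIF.search(protein.upper())
    if match is None:
        return None
    return (match.start() + 1, match.end())


if __name__ == "__main__":
    # Hypothetical peptide, for illustration only (not a database sequence).
    print(find_psux_motif("MKTAGREDVDARMLGKK"))
```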
Another notable feature of PsuX sequences is the presence of CX2C motifs within the N-terminal region. Some of these motifs (e.g., C4X2C7, C20X2C23, C139X2C142 and C147X2C150 in A.fulgidus) are well conserved among the aligned sequences (not shown), although the C139X2C142 and C147X2C150 motifs are shared only among A.fulgidus, Methanococcus jannaschii and Methanobacterium thermoautotrophicum. The CX2C motif is frequently found in metal-binding domains of various nucleic acid-binding proteins (89). Pus1p, a yeast tRNA Ψ synthase, contains a zinc ion essential for function and tRNA binding, and potential zinc-binding elements similar to CX2C motifs have been proposed for this protein (90).
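The CX2C notation used above (for example C4X2C7: cysteines at positions 4 and 7 separated by any two residues) can be generated mechanically from a sequence. The sketch below is illustrative only; the function name and the toy peptide are hypothetical, and listing the motifs says nothing about whether they bind metal.

```python
import re

# Cysteine, any two residues, cysteine; a lookahead reports overlapping motifs too.
CX2C = re.compile(r"(?=C..C)")


def cx2c_labels(protein):
    """Return labels such as 'C4X2C7' (1-based positions) for every CX2C motif."""
    labels = []
    for match in CX2C.finditer(protein.upper()):
        first = match.start() + 1  # 1-based position of the first cysteine
        labels.append(f"C{first}X2C{first + 3}")
    return labels


if __name__ == "__main__":
    # Hypothetical peptide, for illustration only.
    print(cx2c_labels("MAACPECGGC"))
```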
In M.jannaschii and M.thermoautotrophicum, the PsuX gene is physically linked to the gene encoding ribosomal protein L21E and (in the case of M.thermoautotrophicum) the gene for an archaebacterial homolog of the signal recognition particle GTPase, Ffh/SRP54, as well (Fig. 7C). Furthermore, in the genome of Pyrococcus spp, genes encoding PsuX and a homolog of eukaryotic initiation factor IF2- are arrayed in a head-to-head fashion (Fig. 7C). These observations suggest the possibility in at least some archaebacteria of co-regulation of the expression of PsuX genes and of genes encoding proteins related to translation, as suggested above for some of the archaebacterial candidate Gar1p and Nop10p homologs.
DISCUSSION
On the basis of cDNA analysis, the Euglena Cbf5p mRNA has a 71-nt 5' untranslated leader, including a 28-nt trans-spliced leader that is two nucleotides longer at the 5' end than previously reported for the Euglena SL (67,68). A cluster of polyadenylation sites occurs between 122 and 136 nt downstream of an inferred UAG termination codon. However, no clear polyadenylation signals are evident.
The E.gracilis Cbf5p sequence is the first reported example of a protist homolog of this key rRNA modification protein. The Euglena branch is the earliest one in a eukaryotic Cbf5p phylogenetic tree, consistent with the early divergence of Euglena in phylogenies based on rRNA sequence comparisons; however, this placement should be considered tentative given the highly biased nature of the current Cbf5p database which, with the exception of Euglena Cbf5p, consists exclusively of animal and fungal sequences. In any event, Euglena Cbf5p branches robustly within the eukaryotic sub-tree, distinct from both archaebacterial Cbf5p and eubacterial/eukaryotic TruB/Pus4p sequences. Euglena Cbf5p has all of the conserved motifs characteristic of Ψ synthases, as well as the distinctive C-terminal KKE repeat motif found only in eukaryotic Cbf5p sequences.
Our finding of a Cbf5p sequence in Euglena as well as the presence of box H/ACA snoRNAs in Tetrahymena (27) strongly indicates that protists possess a box H/ACA snoRNP-based system for rRNA processing and pseudouridylylation of rRNA. This conclusion is supported by the presence in public databases of partial protist cDNA sequences encoding box H/ACA snoRNP proteins. These protein sequences include Gar1p, Cbf5p and Nhp2p from an apicomplexan, Cryptosporidium parvum (GenBank accession nos AA532317, AA532324 and AA224694, respectively), and Nop10p from kinetoplastid protozoa, Trypanosoma spp (AA681026 and AA952384). In contrast, eubacteria use a set of site-specific (sometimes multisite-specific) rRNA Ψ synthases, namely RluA, B, C, D and E (comprising the RluA family) and RsuA (78,79,91–97).
The situation with respect to rRNA pseudouridylylation in archaebacteria is unclear at present. Based on the relatively small number of Ψ residues in the rRNA of the crenarchaeote, Sulfolobus solfataricus (98), and the apparent absence of a Gar1p homolog in archaebacterial genomes, Lafontaine and Tollervey (99) suggested that archaebacteria may have a snoRNA-independent system of Ψ formation in rRNA. On the other hand, as noted recently (79,84), genes for the eubacterial-type RluA and RsuA families of Ψ synthases are not apparent in any of the completely sequenced archaebacterial genomes. Moreover, although the LSU rRNA of Sulfolobus acidocaldarius contains only six Ψ residues (compared with 9 and 55, respectively, in E.coli and human LSU rRNA), none of these archaebacterial Ψ residues is specifically shared with eubacteria to the exclusion of eukaryotes (100); rather, three are at unique positions, two are shared with both eubacteria and eukaryotes, and one is shared only with eukaryotes. Our report of archaebacterial sequences encoding candidate homologs of Gar1p and Nop10p, in conjunction with prior evidence of archaebacterial Cbf5p and Nhp2p homologs (30,31,101,102), lends additional support to the proposition that a snoRNA-based system operates in Archaea to generate Ψ in rRNA. However, biochemical characterization of these archaebacterial proteins will be required to verify their proposed function.
Table 2 summarizes the known distribution of Ψ synthases within the three domains of life. In addition to one or more members of the site-specific RluA and RsuA sub-families of rRNA Ψ synthases, eubacteria generally contain one TruA gene and (with the exception of Mycoplasma spp and Helicobacter pylori) one TruB gene (79). The TruA and TruB Ψ synthases catalyze formation of Ψ at positions 38–40 and 55, respectively, in tRNA [in E.coli, the RluA synthase is a dual-specificity enzyme that carries out pseudouridylylation at position 32 in tRNA as well as position 746 in the LSU (23S) rRNA (91)]. The RluA- and RsuA-type synthases bear no statistically significant similarity to the Cbf5p/TruB superfamily, although they do possess short sequence motifs diagnostic of Ψ synthases in general.
Convincing Cbf5p and TruA homologs have been identified in those archaebacterial genomes that have been completely sequenced; however, no genes that are specifically related to TruB have been reported. If, as suggested here, Cbf5p functions as an rRNA Ψ synthase in archaebacteria, these observations raise the question of how Ψ55, which is known to occur in archaebacterial tRNAs (103), is formed. This may be another situation in which a single Ψ synthase (in this case, Cbf5p or TruA) possesses dual specificity, acting on tRNA as well as rRNA, or at different sites in tRNA. Alternatively, genes for other (tRNA-specific) Ψ synthases may exist in archaebacterial genomes but be too divergent to be recognized in database searches by the methods currently available. A third possibility is that the novel PsuX family reported here plays a role in Ψ55 synthesis in tRNA, as its organismal distribution might suggest (Table 2). For example, although Pus4p orthologs are not detectable in sequenced archaebacterial genomes or in the C.elegans and D.melanogaster genome sequences, PsuX homologs are present in these genomes. Conversely, the yeast (S.cerevisiae) genome apparently lacks PsuX homologs but does encode a tRNA Ψ55 synthase (Pus4p).
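The complementary distribution of Pus4p and PsuX described in this paragraph can be expressed as a simple presence/absence check. The following is a minimal sketch that encodes only the four cases named in the text; the dictionary layout and the function name `is_complementary` are illustrative assumptions and are not part of the original analysis.

```python
# Presence (True) or absence (False) of Pus4p and PsuX homologs, restricted
# to the cases stated in the text. The data layout is an illustrative
# assumption, not a reproduction of Table 2.
distribution = {
    "sequenced archaebacterial genomes": {"Pus4p": False, "PsuX": True},
    "C.elegans": {"Pus4p": False, "PsuX": True},
    "D.melanogaster": {"Pus4p": False, "PsuX": True},
    "S.cerevisiae": {"Pus4p": True, "PsuX": False},
}


def is_complementary(table):
    """Return True if every genome has exactly one of Pus4p or PsuX."""
    return all(row["Pus4p"] != row["PsuX"] for row in table.values())


print(is_complementary(distribution))
```

A mutually exclusive pattern of this kind is compatible with PsuX substituting for Pus4p, but it does not demonstrate function, which is why the text presents the idea only as one of three possibilities.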
Lafontaine and Tollervey (99) have suggested that the Cbf5p gene might have originated via duplication of an ancestral TruB-like gene (probably) after the separation of the domains Eucarya and Archaea. However, because the archaebacterial TruB homolog is more closely related to the Cbf5p class than to the TruB sub-family per se, it is likely that any duplication of a TruB-like gene would have pre-dated the postulated archaebacterial–eukaryotic split. In fact, the accumulating evidence does suggest that a transition in the rRNA Ψ-synthesizing machinery may have occurred in an archaebacterial–eukaryotic common ancestor, from the simpler, site-specific, eubacterial type of RluA/RsuA Ψ synthases to the more complex, snoRNP-dependent type present in eukaryotes and (as suggested here) archaebacteria. Alternatively (but perhaps less likely in view of the complexity factor), the snoRNP-based Cbf5p Ψ synthase system may have been ancestral, with retention of this system in the archaebacterial–eukaryotic line but with a shift to an RluA/RsuA-based system in eubacteria. A third possibility is the direct evolution of a TruB-like, tRNA-specific activity into a Cbf5p-like, rRNA-specific enzyme (perhaps initially retaining specificity for tRNA as well), again presumably in an archaebacterial–eukaryotic common ancestor. Current data do not allow us to distinguish among these possibilities.
This evolutionary picture is complicated by the presence in some eukaryotes of a TruB ortholog (Pus4p in yeast) having tRNA Ψ55 specificity. It is possible that this gene traces its origin to an ancestral, duplicated TruB-like gene, with one of the duplicates diverging to give the Cbf5p family and the other retaining TruB structure and function. However, this scenario would require that the TruB gene was lost in the archaebacterial lineage but retained (as a Pus4p homolog) in the eukaryotic lineage. Also, one must account for the fact that eukaryotic Pus4p sequences are substantially more closely related to eubacterial TruB sequences than to either archaebacterial or eukaryotic Cbf5p sequences (Fig. 4). Thus, if duplication of a TruB-like gene had occurred in an archaebacterial–eukaryotic common ancestor, one of the duplicates would have had to diverge so radically that the evolutionary descendants of this duplication (Cbf5p and Pus4p in eukaryotes) no longer bear evidence of a specifically shared common ancestry. Further clouding this issue is the novel PsuX family described here, which is distantly related to both the Cbf5p and TruB sub-families.
An alternative possibility to account for the origin of Pus4p is lateral transfer of the TruB gene, e.g. from the eubacteria-like endosymbiont that gave rise to mitochondria. This scenario would account for the fact that eukaryotic Pus4p is more similar to eubacterial TruB than to eukaryotic Cbf5p (31,84). The phylogenetic placement of Pus4p sequences (Fig. 4) is consistent with a direct eubacterial ancestry, although evidence of a specific α-proteobacterial ancestry, as expected for a mitochondrial origin (104), is not apparent. Again, this may largely be a sampling limitation, with only two full-length Pus4p sequences available at the moment. With regard to a possible endosymbiotic origin of the Pus4p gene, we note that the Pus4p protein catalyzes formation of Ψ55 in mitochondrial as well as cytosolic tRNAs in yeast (38).
After submission of this manuscript, the nearly complete genome sequence of D.melanogaster was published (105). Using E.coli Ψ synthase sequences in a BLASTP search against the database of predicted Drosophila protein sequences, we detected (at E values <10⁻⁴) two and three putative homologs, respectively, of RluA and TruA, but no RsuA counterpart. Also, in addition to the Cbf5p (Nop60Bp/minifly) and PsuX orthologs already discussed, we encountered a third Drosophila protein sharing a low level (E = 2 × 10⁻³) of sequence similarity with E.coli TruB, namely the CB7849 gene product (AAF57283.1). The latter sequence detects closely related proteins encoded by the human (AAD20059.1; E = 4 × 10⁻²⁸) and C.elegans (AAF60570.1; E = 5 × 10⁻⁹) genomes. The human homolog of this protein group (for which we propose the name PsuY) was previously detected as a TruB homolog (TRUB2/HUMAN in figure 5 of ref. 79); however, no conserved candidate TruB motif II is evident in these PsuY sequences.
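The E-value screening described above can be illustrated with a short filter over tabular BLAST results. This is a minimal sketch, not the procedure actually used: it assumes tab-separated output in the 12-column layout of present-day BLAST+ (`-outfmt 6`, E value in column 11), which postdates this work, and the subject identifiers and scores in the example rows are hypothetical.

```python
import io

E_VALUE_CUTOFF = 1e-4  # threshold quoted in the text


def significant_hits(handle, cutoff=E_VALUE_CUTOFF):
    """Collect (query, subject, E value) for hits with E value below cutoff.

    Assumes 12-column tab-separated BLAST output with the E value in
    column 11 (index 10); this layout is an assumption of the sketch.
    """
    hits = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject, evalue = fields[0], fields[1], float(fields[10])
        if evalue < cutoff:
            hits.append((query, subject, evalue))
    return hits


# Hypothetical rows: one hit well below the cutoff, one above it.
example = io.StringIO(
    "RluA\thypothetical_1\t35.0\t200\t120\t5\t1\t200\t10\t210\t3e-20\t95.0\n"
    "TruB\thypothetical_2\t24.0\t150\t100\t8\t5\t150\t20\t170\t2e-03\t38.0\n"
)
for query, subject, evalue in significant_hits(example):
    print(query, subject, evalue)
```

Under this cutoff, a hit at E = 2 × 10⁻³ such as the TruB match reported above would be excluded, which is consistent with its description as a low level of similarity requiring separate scrutiny.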
| ACKNOWLEDGEMENTS |
|---|
We thank Dr Murray N. Schnare and Michael Charette for valuable discussion and advice, and other members of the Gray Lab for critical comment. M.W.G., who is a Fellow in the Program in Evolutionary Biology, Canadian Institute for Advanced Research, gratefully acknowledges salary and interaction support from the CIAR. This work was funded by operating grant MT-11212 from the Medical Research Council of Canada to M.W.G.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 902 494 2521; Fax: +1 902 494 1355; Email: m.w.gray@dal.ca
| REFERENCES |
|---|
1 Eichler,D.C. and Craig,N. (1994) Prog. Nucleic Acid Res. Mol. Biol., 49, 197–239.
2 Venema,J. and Tollervey,D. (1995) Yeast, 11, 1629–1650.
3 Morrissey,J.P. and Tollervey,D. (1995) Trends Biochem. Sci., 20, 78–82.
4 Lafontaine,D.L.J., Bousquet-Antonelli,C., Henry,Y., Caizergues-Ferrer,M. and Tollervey,D. (1998) Genes Dev., 12, 527–537.
5 Schnare,M.N. and Gray,M.W. (1990) J. Mol. Biol., 215, 73–83.
6 Gray,M.W. and Schnare,M.N. (1996) In Zimmermann,R.A. and Dahlberg,A.E. (eds), Ribosomal RNA: Structure, Evolution, Processing, and Function in Protein Biosynthesis. CRC Press, Inc., Boca Raton, FL, pp. 49–69.
7 Schnare,M.N., Cook,J.R. and Gray,M.W. (1990) J. Mol. Biol., 215, 85–91.
8 Greenwood,S.J. and Gray,M.W. (1998) Biochim. Biophys. Acta, 1443, 128–138.
9 Greenwood,S.J., Schnare,M.N. and Gray,M.W. (1996) Curr. Genet., 30, 338–346.
10 Edlind,T.D., Li,J., Visvesvara,G.S., Vodkin,M.H., McLaughlin,G.L. and Katiyar,S.K. (1996) Mol. Phylogenet. Evol., 5, 359–367.
11 Keeling,P.J. and Doolittle,W.F. (1996) Mol. Biol. Evol., 13, 1297–1305.
12 Li,J., Katiyar,S.K., Hamelin,A., Visvesvara,G.S. and Edlind,T.D. (1996) Mol. Biochem. Parasitol., 78, 289–295.
13 Roger,A.J., Sandblom,O., Doolittle,W.F. and Philippe,H. (1999) Mol. Biol. Evol., 16, 218–233.
14 Sogin,M.L. (1991) Curr. Opin. Genet. Dev., 1, 457–463.
15 Balakin,A.G., Smith,L. and Fournier,M.J. (1996) Cell, 86, 823–834.
16 Ganot,P., Bortolin,M.-L. and Kiss,T. (1997) Cell, 89, 799–809.
17 Ganot,P., Caizergues-Ferrer,M. and Kiss,T. (1997) Genes Dev., 11, 941–956.
18 Selvamurugan,N., Joost,O.H., Haas,E.S., Brown,J.W., Galvin,N.J. and Eliceiri,G.L. (1997) Nucleic Acids Res., 25, 1591–1596.
19 Morrissey,J.P. and Tollervey,D. (1993) Mol. Cell. Biol., 13, 2469–2477.
20 Ni,J., Tien,A.L. and Fournier,M.J. (1997) Cell, 89, 565–573.
21 Tollervey,D. (1987) EMBO J., 6, 4169–4175.
22 Smith,C.M. and Steitz,J.A. (1997) Cell, 89, 669–672.
23 Tollervey,D. and Kiss,T. (1997) Curr. Opin. Cell Biol., 9, 337–342.
24 Olivas,W.M., Muhlrad,D. and Parker,R. (1997) Nucleic Acids Res., 25, 4619–4625.
25 Leader,D.J., Clark,G.P., Watters,J., Beven,A.F., Shaw,P.J. and Brown,J.W.S. (1997) EMBO J., 16, 5742–5751.
26 Giordano,E., Peluso,I., Senger,S. and Furia,M. (1999) J. Cell Biol., 144, 1123–1133.
27 Nielsen,H., Ørum,H. and Engberg,J. (1992) FEBS Lett., 307, 337–342.
28 Lübben,B., Fabrizio,P., Kastner,B. and Lührmann,R. (1995) J. Biol. Chem., 270, 11549–11554.
29 Bousquet-Antonelli,C., Henry,Y., Gélugne,J.-P., Caizergues-Ferrer,M. and Kiss,T. (1997) EMBO J., 16, 4770–4776.
30 Henras,A., Henry,Y., Bousquet-Antonelli,C., Noaillac-Depeyre,J., Gélugne,J.-P. and Caizergues-Ferrer,M. (1998) EMBO J., 17, 7078–7090.
31 Watkins,N.J., Gottschalk,A., Neubauer,G., Kastner,B., Fabrizio,P., Mann,M. and Lührmann,R. (1998) RNA, 4, 1549–1568.
32 Jiang,W., Middleton,K., Yoon,H.-J., Fouquet,C. and Carbon,J. (1993) Mol. Cell. Biol., 13, 4884–4893.
33 Cadwell,C., Yoon,H.-J., Zebarjadian,Y. and Carbon,J. (1997) Mol. Cell. Biol., 17, 6175–6183.
34 Meier,U.T. and Blobel,G. (1994) J. Cell Biol., 127, 1505–1514 [published erratum appears in (1998) J. Cell Biol., 140, 447].
35 Nurse,K., Wrzesinski,J., Bakin,A., Lane,B.G. and Ofengand,J. (1995) RNA, 1, 102–112.
36 Koonin,E.V. (1996) Nucleic Acids Res., 24, 2411–2415.
37 Gustafsson,C., Reid,R., Greene,P.J. and Santi,D.V. (1996) Nucleic Acids Res., 24, 3756–3762.
38 Becker,H.F., Motorin,Y., Planta,R.J. and Grosjean,H. (1997) Nucleic Acids Res., 25, 4493–4499.
39 Heiss,N.S., Knight,S.W., Vulliamy,T.J., Klauck,S.M., Wiemann,S., Mason,P.J., Poustka,A. and Dokal,I. (1998) Nature Genet., 19, 32–38.
40 Mitchell,J.R., Wood,E. and Collins,K. (1999) Nature, 402, 551–555.
41 Phillips,B., Billin,A.N., Cadwell,C., Buchholz,R., Erickson,C., Merriam,J.R., Carbon,J. and Poole,S.J. (1998) Mol. Gen. Genet., 260, 20–29.
42 Winkler,A.A., Bobok,A., Zonneveld,B.J.M., Steensma,H.Y. and Hooykaas,P.J.J. (1998) Yeast, 14, 37–48.
43 Bult,C.J., White,O., Olsen,G.J., Zhou,L., Fleischmann,R.D., Sutton,G.G., Blake,J.A., FitzGerald,L.M., Clayton,R.A., Gocayne,J.D. et al. (1996) Science, 273, 1058–1073.
44 Smith,D.R., Doucette-Stamm,L.A., Deloughery,C., Lee,H., Dubois,J., Aldredge,T., Bashirzadeh,R., Blakely,D., Cook,R., Gilbert,K. et al. (1997) J. Bacteriol., 179, 7135–7155.
45 Klenk,H.-P., Clayton,R.A., Tomb,J.-F., White,O., Nelson,K.E., Ketchum,K.A., Dodson,R.J., Gwinn,M., Hickey,E.K., Peterson,J.D. et al. (1997) Nature, 390, 364–370.
46 Kawarabayasi,Y., Sawada,M., Horikawa,H., Haikawa,Y., Hino,Y., Yamamoto,S., Sekine,M., Baba,S., Kosugi,H., Hosoyama,A. et al. (1998) DNA Res., 5, 55–76.
47 Kawarabayasi,Y., Hino,Y., Horikawa,H., Yamazaki,S., Haikawa,Y., Jin-no,K., Takahashi,M., Sekine,M., Baba,S., Ankai,A. et al. (1999) DNA Res., 6, 83–101.
48 Breckenridge,D.G., Watanabe,Y., Greenwood,S.J., Gray,M.W. and Schnare,M.N. (1999) Proc. Natl Acad. Sci. USA, 96, 852–856.
49 Maruyama,K. and Sugano,S. (1994) Gene, 138, 171–174.
50 Baskaran,N., Kandpal,R.P., Bhargava,A.K., Glynn,M.W., Bale,A. and Weissman,S.M. (1996) Genome Res., 6, 633–638.
51 Don,R.H., Cox,P.T., Wainwright,B.J., Baker,K. and Mattick,J.S. (1991) Nucleic Acids Res., 19, 4008.
52 Marchuk,D., Drumm,M., Saulino,A. and Collins,F.S. (1991) Nucleic Acids Res., 19, 1154.
53 Sanger,F., Nicklen,S. and Coulson,A.R. (1977) Proc. Natl Acad. Sci. USA, 74, 5463–5467.
54 Mytelka,D.S. and Chamberlin,M.J. (1996) Nucleic Acids Res., 24, 2774–2781.
55 Frank,R., Müller,D. and Wolff,C. (1981) Nucleic Acids Res., 9, 4967–4979.
56 Siebert,P.D., Chenchik,A., Kellog,D.E., Lukyanov,K.A. and Lukyanov,S.A. (1995) Nucleic Acids Res., 23, 1087–1088.
57 Ausubel,F.M., Brent,R., Kingston,R.E., Moore,D.D., Seidman,J.G., Smith,J.A. and Struhl,K. (1987) Current Protocols in Molecular Biology. Greene Publishing Associates and Wiley-Interscience, New York, NY.
58 Chomczynski,P. and Mackey,K. (1994) Anal. Biochem., 221, 303–305.
59 Sambrook,J., Fritsch,E.F. and Maniatis,T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.
60 Church,G.M. and Gilbert,W. (1984) Proc. Natl Acad. Sci. USA, 81, 1991–1995.
61 Chomczynski,P. (1992) Anal. Biochem., 201, 134–139.
62 Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Nucleic Acids Res., 25, 3389–3402.
63 Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) Nucleic Acids Res., 22, 4673–4680.
64 Strimmer,K. and von Haeseler,A. (1996) Mol. Biol. Evol., 13, 964–969.
65 Jones,D.T., Taylor,W.R. and Thornton,J.M. (1992) Comput. Appl. Biosci., 8, 275–282.
66 Ota,T. and Nei,M. (1994) J. Mol. Evol., 38, 642–643.
67 Tessier,L.-H., Keller,M., Chan,R.L., Fournier,R., Weil,J.-H. and Imbault,P. (1991) EMBO J., 10, 2621–2625.
68 Keller,M., Tessier,L.H., Chan,R.L., Weil,J.H. and Imbault,P. (1992) Nucleic Acids Res., 20, 1711–1715.
69 Schnare,M.N. and Gray,M.W. (1999) J. Biol. Chem., 274, 23691–23694.
70 Montandon,P.-E. and Stutz,E. (1990) Nucleic Acids Res., 18, 75–82.
71 Chan,R.L., Keller,M., Canaday,J., Weil,J.-H. and Imbault,P. (1990) EMBO J., 9, 333–338.
72 Perry,K.L., Watkins,K.P. and Agabian,N. (1987) Proc. Natl Acad. Sci. USA, 84, 8190–8194.
73 McNally,K.P. and Agabian,N. (1992) Mol. Cell. Biol., 12, 4844–4851.
74 Lücke,S., Xu,G.-L., Palfi,Z., Cross,M., Bellofatto,V. and Bindereif,A. (1996) EMBO J., 15, 4380–4391.
75 Bangs,J.D., Crain,P.F., Hashizume,T., McCloskey,J.A. and Boothroyd,J.C. (1992) J. Biol. Chem., 267, 9805–9815.
76 Ebel,C., Frantz,C., Paulus,F. and Imbault,P. (1999) Curr. Genet., 35, 542–550.
77 Huang,L., Pookanjanatavip,M., Gu,X. and Santi,D.V. (1998) Biochemistry, 37, 344–351.
78 Raychaudhuri,S., Niu,L., Conrad,J., Lane,B.G. and Ofengand,J. (1999) J. Biol. Chem., 274, 18880–18886.
79 Conrad,J., Niu,L., Rudd,K., Lane,B.G. and Ofengand,J. (1999) RNA, 5, 751–763.
80 Ramamurthy,V., Swann,S.L., Paulson,J.L., Spedaliere,C.J. and Mueller,E.G. (1999) J. Biol. Chem., 274, 22225–22230.
81 Gu,X., Liu,Y. and Santi,D.V. (1999) Proc. Natl Acad. Sci. USA, 96, 14270–14275.
82 Foster,P.G., Huang,L., Santi,D.V. and Stroud,R.M. (2000) Nature Struct. Biol., 7, 23–27.
83 Zebarjadian,Y., King,T., Fournier,M.J., Clarke,L. and Carbon,J. (1999) Mol. Cell. Biol., 19, 7461–7472.
84 Aravind,L. and Koonin,E.V. (1999) J. Mol. Evol., 48, 291–302.






