Nucleic Acids Research, 2002, Vol. 30, No. 11 2575-2587
© 2002 Oxford University Press
Tracing the evolution of RNA structure in ribosomes
1Laboratory of Molecular Ecology and Evolution and Division of Molecular Biology, Department of Biology, University of Oslo, N-0316 Oslo, Norway and 2Vital NRG, Knoxville, TN, USA
Received November 26, 2001; Revised March 19, 2002; Accepted April 2, 2002.
ABSTRACT
The elucidation of ribosomal structure has shown that the function of ribosomes is fundamentally confined to dynamic interactions established between the RNA components of the ribosomal ensemble. These findings now enable a detailed analysis of the evolution of ribosomal RNA (rRNA) structure. The origin and diversification of rRNA was studied here using phylogenetic tools directly at the structural level. A rooted universal tree was reconstructed from the combined secondary structures of large (LSU) and small (SSU) subunit rRNA using cladistic methods and considerations in statistical mechanics. The evolution of the complete repertoire of structural ribosomal characters was formally traced lineage-by-lineage in the tree, showing a tendency towards molecular simplification and a homogeneous reduction of ribosomal structural change with time. Character tracing revealed patterns of evolution in inter-subunit bridge contacts and tRNA-binding sites that were consistent with the proposed coupling of tRNA translocation and subunit movement. These patterns support the concerted evolution of tRNA-binding sites in the two subunits and the ancestral nature and common origin of certain structural ribosomal features, such as the peptidyl (P) site, the functional relay of the penultimate stem helix of SSU rRNA, and other structures participating in ribosomal dynamics. Overall results provide a rare insight into the evolution of ribosomal structure.
INTRODUCTION
Ribosomes are macromolecular factories that synthesize proteins from amino acids with remarkable speed and accuracy, their function being fundamentally confined to ribosomal RNA (rRNA) (1). Crystal structures of the 30S and 50S ribosomal subunits have recently been determined at atomic resolution (2–5), and the complete structure of the 70S ribosome (at 5.5 Å) has been acquired in the presence of transcript molecules and cognate transfer RNA (tRNA) bound to aminoacyl (A), peptidyl (P) and exit (E) sites (6). These and other studies (7) clarify the many functional interactions established within the rRNA core. Dynamic bridges between rRNA, tRNA and peripheral proteins ratchet the tRNA molecules through the center of the ribosome, orienting subunits and facilitating their movement in the process of peptide elongation (6). Crystallography has also confirmed secondary structural models predicted during the past two decades by the study of compensating substitutions in sequence alignments (8,9) and has helped place the evolutionarily conserved and functional ribosomal regions in the center of the molecule (10). The careful identification of functional components in rRNA now provides a new perspective on translation (11) and enables an evolutionary tracing of these structures at the molecular level.
Phylogenies are branching histories of inheritance and useful tools for testing evolutionary hypotheses. They are usually described by trees, graphical and mathematical representations (containing branches and reticulations) that depict how contemporary common ancestry is (12). Phylogenetic trees trace the genealogical relationships of molecules and organisms at different taxonomical levels. They have been particularly useful for the comparative analysis of nucleic acid and protein sequences in subjects as diverse as molecular epidemiology, the study of genome organization and evolution, and the origin of life (13,14). In recent years there has been continued recognition that higher order structure is fundamental to establishing important structure–function relationships in biological macromolecules (15). The increasing acquisition of structural information and the inception of structural genomics (16) have reinforced this view. Many have studied the evolutionary diversification of structure at rather large structural scales (17). Evolutionary relationships have also been inferred directly from the structure of nucleic acid (18–21) and protein molecules (22–25), using formal approaches of phylogenetic analysis. These methods compare molecules using secondary structure representations or spatial backbone distributions. This contrasts with approaches of positional covariance in comparative sequence analysis that focus on inferring structure from phylogenetic diversification (8,9). I recently introduced a formal approach that recovers phylogenetic signal from the structure of nucleic acid molecules and can be applied to the study of organisms of highly divergent lineage (20,21). Phylogenetic relationships are inferred on the basis of shared and derived characteristics in RNA structure, using a cladistic approach and considerations in statistical mechanics (see Fig. 1).
Structural features that describe molecular components or entire molecules are treated as linearly ordered multi-state characters that are polarized by fixing the direction of evolutionary transformation towards molecular order. The assumption that RNA molecules are optimized by a process that increases favorable and decreases non-favorable inter- and intra-molecular interactions is supported by analytical models that reconstruct the structural repertoire of evolving RNA sequences from energetic and kinetic perspectives (26–28), the study of extant and randomized sequences (29–32), experimental verification of a tendency towards stability using thermodynamic principles generalized to account for non-equilibrium conditions (33,34), correlation between thermal stability and the occurrence of structural motifs in natural nucleic acids (35), and concordance between phylogenies inferred from structure, sequence and accepted classification (21). The approach ultimately reconstructs inherently rooted phylogenetic trees directly from evolved structure, allowing the evolutionary study of important functional components of RNA molecules at levels other than sequence. I reconstruct here a universal tree from the combined secondary structures of large (LSU) and small (SSU) subunit rRNA, trace structural characters in the trees, and study the origin and diversification of ribosomal structure.
MATERIALS AND METHODS
rRNA sequences and secondary structures derived from comparative sequence analysis were drawn from the Antwerp database (http://rrna.uia.ac.be) and the 5S rRNA Data Bank (http://cammsg3.caos.kun.nl). The SSU structures were corrected to account for pseudoknots in variable area V4 (36). RNA structures were decomposed into sub-structural components and their features characterized and coded using an alphanumerical format suitable for cladistic analysis, as previously described (20,21). Coded characters were based on the length in nucleotides of double-helical stem tracts (S), hairpin (H), bulge and interior loops (B), and unpaired segments such as free ends, connecting joints, and multiloop sequences separating stems (U). Other structural features, such as mean number of stems attached to a loop (D) and number of B loops in S (N), were used to describe 5S rRNA. Character states were represented by numbers 0–9 and letters A–P. Structural features with longer nucleotide lengths were given the maximum state (P), and if missing, the minimum state (0). The maximum number of character states was limited by the phylogenetic analysis programs used here (26 states). Structural alignments listed characters characterizing the structure in the 5'→3' direction as it is read in the sequence, and for each sequence segment, in the order S, B, H and U. Stem tracts were defined as those separated by multibranched and pseudoknot loops, and were considered as coaxial units even when harboring interior loops. They were defined by two complementary sequence segments and characters [named by a number and its prime according to established nomenclature (10)] to account for the difference in nucleotide number between stem and unpaired tracts. As previously described, topographic correspondence was the main criterion for determining character homology in unambiguously aligned rRNA molecules [see Caetano-Anollés (21) for details].
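The coding scheme described above can be sketched in a few lines. This is a minimal illustration, not the code used in the study: it assumes that a feature of n nucleotides maps directly to the nth symbol of the 26-state alphabet (0–9, then A–P), so that any length of 25 or more collapses to P. The names `STATES`, `encode_length` and `encode_row` are hypothetical.

```python
import string

# 26 state symbols: digits 0-9 followed by letters A-P, as stated in the text.
STATES = string.digits + string.ascii_uppercase[:16]


def encode_length(length):
    """Map a feature length in nucleotides to a single state symbol.

    Assumption (not from the paper): length n maps to the nth symbol.
    A missing feature (None) gets the minimum state '0'; lengths beyond
    the alphabet are capped at the maximum state 'P'.
    """
    if length is None:
        return STATES[0]
    if length < 0:
        raise ValueError("length must be non-negative")
    return STATES[min(length, len(STATES) - 1)]


def encode_row(lengths):
    """Encode one molecule's ordered list of feature lengths as a string."""
    return "".join(encode_length(n) for n in lengths)


if __name__ == "__main__":
    # Hypothetical feature lengths: a stem of 3 nt, a loop of 12 nt,
    # a missing feature, and a 30-nt segment that saturates at 'P'.
    print(encode_row([3, 12, None, 30]))
```

Capping at the maximum state loses information for very long segments, which is a consequence of the 26-state limit mentioned in the text rather than a property of the structures themselves.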
Molecules were also characterized by statistical metrics Q, P and S (31) as described in Figure 1. The equilibrium partition functions and base pair probabilities were calculated using the Vienna package (37). Phylogenetic relationships were inferred using the PAUP* package (38) and character reconstruction implemented in MacClade (39). Characters were polarized by distinguishing ancestral states as those with larger S and lower H, B, U, N and D state values. The assumptions, justification and limitations of the model of character change have been previously described (20,21). Phylogenetic trees were reconstructed using maximum parsimony as the optimality criterion, and were automatically rooted at the point where the hypothetical ancestor connected to the tree. Phylogenetic reliability was evaluated by the non-parametric bootstrap (BS) method (40) (implemented using at least 10³ pseudoreplicates in PAUP*) and by double decay (DD) analysis (41) [using RadCon (42)]. The structure of phylogenetic signal in the data was tested by the skewness (g1) of the length distribution of 10⁴ random trees, and permutation tail probability (PTP) tests of cladistic covariation using 10³ replicates. The homogeneity of partitions was analyzed using a modified Mickevich–Farris index of incongruence among data sets and 10³ heuristic search replicates (43). Topological congruence was measured using several tree comparison metrics and randomization tools implemented in COMPONENT (44). The distribution of character change in a clade was measured using the node index (Ni) metric. Character changes per branch in branches equidistant to the root node (by value n) were normalized to the total number of changes per branch in the clade, weighted (multiplied by n), averaged, and expressed on a 0–1 scale. These values were sometimes gap-recoded and used as discrete characters for maximum parsimony analysis.
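As an illustration of the skewness test mentioned above, the following sketch computes the standard g1 statistic (third central moment divided by the 1.5 power of the second) for a list of tree lengths. In the study the random-tree lengths came from PAUP*; the function name `skewness_g1` and the sample values below are hypothetical and serve only to show the calculation. A strongly left-skewed distribution (negative g1) is conventionally read as evidence of phylogenetic signal.

```python
def skewness_g1(tree_lengths):
    """Return the g1 skewness of a sample of tree lengths.

    g1 = m3 / m2**1.5, where m2 and m3 are the second and third
    central moments of the sample (population form, divisor n).
    """
    n = len(tree_lengths)
    if n < 3:
        raise ValueError("need at least three tree lengths")
    mean = sum(tree_lengths) / n
    m2 = sum((x - mean) ** 2 for x in tree_lengths) / n
    m3 = sum((x - mean) ** 3 for x in tree_lengths) / n
    if m2 == 0:
        raise ValueError("all tree lengths are identical")
    return m3 / m2 ** 1.5


if __name__ == "__main__":
    # Made-up lengths: most random trees are long, a few are much shorter,
    # which gives the left tail associated with structured data.
    lengths = [5200, 5210, 5195, 5205, 5198, 5100, 5207, 5050]
    print(round(skewness_g1(lengths), 3))
```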
RESULTS
Reconstruction of a universal tree based on rRNA structure
The origin and diversification of rRNA molecules was studied directly at the secondary structure level. The procedure of analysis followed the three steps outlined in Figure 1. First, rRNA sequences that were folded according to accepted structural models [inferred from comparative sequence analysis (10)] were described by a set of molecular attributes that characterize numerically their secondary structure. Two different sets of characters were used for this purpose, one describing directly the shape of the molecules, another describing general statistical properties (molecular statistics described in Fig. 1A). In this paper, I have focused on characters that describe each and every spatial component of secondary rRNA structure, since these characters can trace the actual details of the evolutionary transformation of molecular structure. Presently, statistical characters describe the evolution of complete molecular ensembles and cannot trace spatial features individually in complicated molecules such as rRNA. Their analysis will be described elsewhere. Each rRNA structure was dissected into double-helical stem tracts, hairpin loops, bulge and interior loops, and other single-stranded segments. The length of these components (in nucleotides) was tabulated using a simple alphanumeric format and used to generate a data matrix for cladistic analysis in which homologous characters were aligned in ordered columns following the 5'→3' direction of the sequence. Table 1 illustrates the data showing an alignment matrix of rRNA structural characters encoding inter-subunit bridge contacts and tRNA-binding sites. This is only a small subset representing only 5% of the rRNA data matrix. These matrices are used to characterize nucleic acid molecules and can be viewed as cladistic ensembles composed mostly of helical stem and loop regions that expand and/or contract in the course of evolution. Secondly, the structural characters identified were polarized by establishing a direction in their evolutionary transformation. Character polarization and its assumptions are carefully described in Figure 1B (and elsewhere) (21) and were implemented by identifying ancestral character states with the ANCSTATES command in PAUP*. Thirdly, the structural alignment matrices were used to reconstruct universal phylogenetic trees using maximum parsimony as the optimality criterion and established methods. This involved comparing molecules from species encompassing all primary organismal domains of life.
Only LSU and SSU rRNA molecules were analyzed in this study, as the small structured rRNA (e.g. 5S rRNA; Fig. 1) carried little phylogenetic information and was unsuitable for the reconstruction of a universal phylogeny. Data from the rRNA subunits in 29 selected taxa were combined and resulted in 1540 structural characters (878 and 662 for LSU and SSU rRNA, respectively) that included 384 S, 778 B, 117 H and 261 U features. The complete data set can be retrieved from the TreeBASE repository (http://herbaria.harvard.edu/treebase/) under study and matrix accession nos S730 and M1162, respectively. In order to diminish phylogenetic noise, 510 uninformative characters representing invariant features and autapomorphies were excluded before tree reconstruction. Combined analysis of the rRNA subunits produced a single universal tree (Fig. 2A). The data exhibited strong phylogenetic signal, as indicated by the distribution of cladogram length (p < 0.01) and PTP tests of cladistic co-variation (p = 0.001). However, the cladistic properties of rRNA structure appeared heterogeneous. The null hypothesis of congruence was rejected when combined LSU and SSU molecules, combined helix and unpaired regions, or combined structural domains in both molecules were tested for homogeneity of data partitions (p = 0.001). This suggests that the two subunits exhibit distinct phylogenetic histories, that the molecules themselves are heterogeneous, and that helix and unpaired regions express variant evolutionary patterns. However, congruence could not be rejected when unpaired regions were subdivided into structural components H, B and U (p = 0.425), or when functional characters defining inter-subunit bridges and tRNA-binding sites were compared with the rest of the molecules (p = 0.079). This was somewhat unanticipated, especially for functional regions that should have had differential histories imprinted on them (see below).
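The character counts reported above can be cross-checked with simple arithmetic. The short sketch below confirms that the per-subunit and per-feature tallies both sum to 1540, and derives the number of characters retained after exclusion. That retained count (1030) is not stated in the text; it is computed here from the reported figures, and the variable names are hypothetical.

```python
# Counts as reported in the text.
subunit_counts = {"LSU": 878, "SSU": 662}
feature_counts = {"S": 384, "B": 778, "H": 117, "U": 261}
excluded_uninformative = 510

# Both breakdowns should describe the same 1540 characters.
total = sum(subunit_counts.values())
assert total == sum(feature_counts.values()) == 1540

# Characters left for tree reconstruction (derived, not quoted).
informative = total - excluded_uninformative

if __name__ == "__main__":
    print(total, informative)  # 1540 1030
```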
The trees reconstructed from the secondary structures of individual or combined rRNA subunits were mostly congruent (partition distance, PD = 18–32; symmetric difference, SD = 0.12–0.25 for quartet analysis), rejecting a topological match by chance (p < 0.01). However, an approach of taxonomic congruence in partitioned analysis in which rRNA subunits were individually analyzed and trees combined by strict consensus resulted in considerable loss of resolution (Fig. 2B). Therefore, I have combined subunit data to enable character reconstruction in the ribosomal ensemble, reduce ambiguities, and maximize explanatory power (45). Note however that this could compromise the conservativeness of certain phylogenetic statements.
The topology of the universal tree was reasonably supported by BS and decay analysis (Fig. 2A), and was similar to topologies individually obtained from LSU and SSU structures (21). The tree branched in three monophyletic groups corresponding to the major domains of life. Clades defining Archaea and Bacteria were significantly supported (85 and 88% BS). In contrast, clade definition was marginal for Eucarya (72% BS), mostly due to the early branching of amitochondriate Archezoa. However, support for Eucarya was significant (86% BS) when a refined model of character evolution (see below) was used. The universal tree was rooted in the eukaryotic branch, albeit poorly (50% BS). A similar rooting was observed in reconstructions from LSU and SSU rRNA structure (with 97 and 50% BS, respectively) (21). Phylogenetic relationships generally matched traditional classification and revealed the unprecedented diversity of the eukaryotes. Decay and leaf stability values showed support being strongest for most of the relationships within Eucarya and weakest for those in Bacteria. Incongruence between phylogenies reconstructed from sequence and structure was also evident, such as the grouping of fungi and plants and the early branching of Archezoa in Eucarya (21). Note that phylogenetic analysis of structure may suffer from the same problems that affect reconstructions from sequence, such as mutational saturation, variation of evolutionary rates across sites, and covarion structure.
Inference of a model of rRNA character evolution
Given a phylogeny and an initial evolutionary hypothesis, it is possible to reconstruct histories of character change and use this information to infer a refined model of character evolution (39). The final objective is to generate a matrix of transformation costs between character states that assigns a probability to every possible change. Step matrices help understand mechanisms of evolution and can improve phylogenetic reconstruction. For example, substitution rate models utilize rate matrices that explain RNA sequence evolution by incorporating assumptions about base-pairing constraints (46). As with sequence, the definition of an initial model of RNA structural evolution requires assumptions about the evolutionary process itself. In the present study, the initial model considers that structural features increase or decrease in length in single steps, each step corresponding to the addition or removal of a nucleotide in a sequence. The process follows a linear and reversible path, but expresses an asymmetry in transformation costs (e.g. the transformation cost from lengths 1 to 2 differs from that from lengths 2 to 1). This accommodates a universal search for order that is supported by considerations in thermodynamics and statistical mechanics of molecules. Since these assumptions could not be falsified by phylogenetic reconstruction (21), the use of character history reconstructions to draw inferences about evolutionary processes related to RNA structure is reinforced.
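The initial model described above (linear, reversible, single-step changes with asymmetric costs) can be pictured as a step matrix. The sketch below builds such a matrix under stated assumptions: each one-nucleotide gain costs `gain_cost` and each one-nucleotide loss costs `loss_cost`, and multi-step changes simply add up along the linear path. The function name and the cost values are hypothetical and do not reproduce the weights used in the study.

```python
def ordered_step_matrix(n_states, gain_cost=1.0, loss_cost=2.0):
    """Build a step matrix for a linearly ordered, reversible character.

    Entry [i][j] is the cost of transforming state i into state j.
    Moving up the ordered series (j > i) accumulates gain_cost per step;
    moving down (j < i) accumulates loss_cost per step. Unequal costs
    give the asymmetry described in the text.
    """
    if n_states < 2:
        raise ValueError("need at least two states")
    matrix = []
    for i in range(n_states):
        row = []
        for j in range(n_states):
            if j >= i:
                row.append((j - i) * gain_cost)
            else:
                row.append((i - j) * loss_cost)
        matrix.append(row)
    return matrix


if __name__ == "__main__":
    # Four states are enough to show the pattern.
    for row in ordered_step_matrix(4):
        print(row)
```

Because costs add along the ordered path, a change of several nucleotides is never cheaper than the corresponding sequence of single steps, which keeps the matrix consistent with a linearly ordered character.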
Based on these considerations, a model of character evolution was inferred from patterns of character change using the State Changes and Stasis feature in MacClade, and the resulting step matrix of character transformation was used to reconstruct a refined universal tree. The relative frequencies of change were plotted in a bubble diagram (Fig. 3A) and were converted to a transformation type using functions described by Wheeler (47). The refined model depicted the frustrated energetics of base pairing invoked by the original model of character change. While changes occurred most frequently in single steps, there was a differential behavior of helix stacks and unpaired loop regions. In helices, losses were favored over gains for lengths of stacks ≤9–11 bp, with a reverse trend for longer segments. In unpaired regions, gains were favored over losses, with frequencies decreasing with nucleotide length. These trends were especially evident when one-step gains and losses were individually charted for each or both subunits (data not shown). The inferred step matrix was used to reconstruct a refined universal tree (Fig. 3B). This tree maintained the overall topology of the original reconstruction (PD = 10, SD = 0.08), but showed better support for the clade encompassing Eucarya.
Tracing the evolution of structural change in rRNA
The ability to trace structural features evolving along the lineages of the universal tree using parsimony-based methods offers a unique opportunity to study how evolutionary change is distributed and constrained in rRNA. It also enables the inference of hypothetical ancestral molecules by assigning character states (termed most parsimonious reconstruction sets) to each node in the tree. The historical reconstructions of the pathways taken by characters from one state to another were analyzed here on a phylogeny that, with debated exceptions (Archezoa), derives congruently from sequence, structural traits and traditional classification. To my knowledge, this is the first time that a complete repertoire of structural characters has been traced lineage-by-lineage on a tree using such a formal treatment.
Figure 4A describes ancestral LSU and SSU rRNA molecules inferred from reconstructions of character history. The ancestral molecules showed structural features absent or reduced in molecules from extant taxa in Eucarya (e.g. 8_1, 23_n, C1_n, D4_1 and E20_n), Archaea and Bacteria (e.g. 43_n, G5_n and H1_n components). These features generally coincided with hypervariable regions, known to have insertion sites toward the less conserved periphery of the ribosomal ensemble, and appearing with no clear evolutionary pattern (10).
The distribution of character change along the branches of the universal (dichotomous) tree showed that most structural changes in LSU and SSU rRNA molecules were of ancestral origin (Fig. 4B). The frequency of change was highest in Eucarya. Character change in branches (nodes) equidistant to the hypothetical ancestor (the root node) increased significantly towards the base of the universal tree, and this pattern was not significantly different between subunits following a two-way ANOVA for values normalized to the total number of informative characters in each molecule (df: 1, 7, 7, 96; branches: F = 2.68, p = 0.014; subunits: F = 0.76, p = 0.386; interaction: F = 1.53, p = 0.168). Overall, there were 0.095 ± 0.009 (SE) and 0.077 ± 0.010 changes per character occurring in LSU and SSU rRNA, respectively. The distribution of character change was also evaluated using a measure of central tendency, the node index (Ni). Ni is a conservative indicator of how derived are changes in branches within a clade, which is unbiased by clade topology, number of characters and levels of change, and ranges from 0 (changes occurring exclusively at the root) to 1 (changes exclusively at the leaves). Ni values for the LSU (0.426 ± 0.003) and SSU (0.388 ± 0.004) molecules were significantly lower (p < 0.001) than an average of shuffled data where changes were randomly distributed. This again indicates that changes are unequally distributed along the branches of the tree and cluster in its base.
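To make the node index more concrete, the sketch below implements one possible reading of its verbal description: mean changes per branch are computed for each level n of branches equidistant from the root, normalized to their sum over the clade, weighted by n, and divided by the deepest level so that the result falls between 0 (all change at the root) and 1 (all change at the leaves). The exact formula used in the study may differ; the function name `node_index` and the input layout are assumptions.

```python
def node_index(changes_by_depth):
    """Compute a node-index-like value on a 0-1 scale.

    changes_by_depth maps a depth n (0 = branches nearest the root) to a
    list with the number of character changes on each branch at that depth.
    """
    depths = sorted(changes_by_depth)
    max_depth = depths[-1]
    if max_depth == 0:
        raise ValueError("need branches at more than one depth")
    # Mean number of changes per branch at each depth.
    per_branch = {
        n: sum(changes_by_depth[n]) / len(changes_by_depth[n]) for n in depths
    }
    total = sum(per_branch.values())
    if total == 0:
        raise ValueError("no character changes recorded")
    # Weight each normalized level by its depth, then rescale to 0-1.
    weighted = sum(n * per_branch[n] / total for n in depths)
    return weighted / max_depth


if __name__ == "__main__":
    # Made-up clade in which most change sits near the root.
    example = {0: [12, 9], 1: [4, 3, 5, 2], 2: [1, 0, 2, 1, 0, 1, 0, 1]}
    print(round(node_index(example), 3))
```

Using per-branch means rather than raw totals keeps deeper levels, which contain more branches, from dominating the index simply because of tree shape.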
These observations reveal the following three patterns of evolution in the lineages of the universal tree: (i) molecular simplification, especially in highly variable regions; (ii) reduction of structural change with evolutionary time; and (iii) a concordant distribution of character change in both ribosomal subunits.
Tracing the evolution of structures important for ribosomal function
A number of structures important for ribosome function have been recently identified in the interface of the two rRNA subunits, located in an RNA core devoid of ribosomal proteins (6). The construction of three-dimensional nucleotide variability maps has shown that this core is highly conserved (10). Despite low relative substitution rates, the evolution of tRNA-binding sites in LSU rRNA [defined mostly by chemical footprinting and in vivo and in vitro functional studies (1)] was traced in the lineages of a tree generated from its structure (21). Here I have extended that preliminary study to the entire rRNA ensemble by including features important for ribosomal function that have been carefully resolved by low-resolution cryo-EM (48) and high-resolution structural analysis (27). Variable features that were individually traced in the universal evolutionary tree included sequences involved in peptidyl transferase activity, the translational cycle, and interaction between rRNA subunits (Fig. 4A). A total of 92 characters were studied (Table 1), encompassing inter-subunit bridge contacts and tRNA-binding sites (Table 2).
The pie charts in Figure 4B show a total of 204 character changes distributing along the branches of the universal tree. Over half of these occurred in inter-subunit contacts and were mostly derived. In turn, only 12% of changes in tRNA-binding sites occurred in P sites. These changes appeared ancestral to those in A and E sites, being mostly confined to the tree base and to Eucarya. The visual trends observed in the figure were confirmed by statistical analysis. Patterns of change were significant (p = 0.0001) when measured with the Ni metric (Table 2). Ni increased in the order P sites < E and A sites < bridges, in clear progression from ancestral to derived. This pattern of change may portray an actual origin of tRNA-binding sites, provided structural evolution was gradual at the onset of organismal diversification.
Ancestral–descendant relationships between functional components
In order to dissect evolutionary patterns further, Ni values were calculated for the major clades that were present in the universal tree and these were used as phylogenetic characters to reconstruct a rooted tree of ribosomal structures (Fig. 5). This approach established ancestral–descendant relationships between the functional ribosomal components themselves. Figure 5A shows the ribosomal outline of an interface view of rRNA subunits revealing the specific location of bridge contacts and tRNA-binding sites considered in this analysis. RNA interfaces included a total of 19 LSU and SSU rRNA mobile bridge contacts and P, A and E tRNA-binding sites in each rRNA subunit. For each subunit, tRNA-binding sites encompassed all interactions established with the cloverleaf tRNA structure described in Table 2.
The phylogenetic relationships between functional ribosomal structures were first reconstructed by pooling bridge components in each subunit together (Fig. 5B). The tree that was generated showed that (i) P sites in both rRNA subunits were ancestral to other functional structures; (ii) P, A and E sites clustered separately in three distinct groups, with E sites being more derived than the rest; and (iii) LSU rRNA bridge contacts had divergent histories and were more derived than their SSU rRNA counterparts. Statistical analysis of Ni values was for the most part consistent with these results (Table 2). The phylogenetic grouping of binding sites for tRNA in each subunit, with P sites appearing earlier than A sites, and E sites being the most derived, suggests that tRNA-binding sites in both rRNA subunits have been the subject of concerted evolution.
The phylogenetic relationships between mobile bridge contacts in individual subunits were also analyzed in relation to binding sites for tRNA (Fig. 5C). The phylogenetic tree showed the evolutionary relationships of individual mobile inter-subunit contacts. SSU rRNA bridge contacts B2a, B3, B5 and B6, all supported by the penultimate stem helix of the molecule (helix 49; see Figs 4A and 5A), were closely related phylogenetically to ancestral P site components. Stem helix 49, also known as helix 44 of 16S rRNA in the prokaryotic model, is the dominant structural component of the SSU rRNA interface and has been proposed as a central ribosomal functional relay (2). LSU rRNA contact B7a and the conserved B2b, both perturbed during tRNA translocation (6) and supported by coaxial helices E26–E28 (helices 69 and 71 of the prokaryotic model), and flexible contact B2a, were similarly related to the P site. These contacts form a crucial central LSU rRNA interface element implicated in translational fidelity (6) that interacts directly with the SSU rRNA functional relay through B2a and with the switch helix 31 through B2b. In contrast, several other bridge contacts were phylogenetic derivatives of the A and E sites, such as B2c, B7b, and LSU contacts B3 and B5. Interestingly, the A site finger B1a and protein-mediated B1b contacts between the LSU head and the SSU top appeared to be phylogenetic derivatives of the P site, despite being structurally distant (D14) and quite derived. These flexible contacts are involved in the ratchet-like inter-subunit rotation induced by elongation factor G (49). B1a was particularly variable, accounting for 16% of changes in LSU bridge contacts. Interestingly, these bridges involve protein–protein and protein–helix contacts (6), suggesting the recent role of ribosomal proteins in subunit interactions.
Character evolution was also reconstructed in the structural trees, confirming for example the ancestral nature of bridges B2b, B4, B8 and contacts supported by the penultimate stem helix of SSU rRNA. This approach was especially useful when studying structural components important for ribosomal translocation. The Thermus thermophilus LSU rRNA central interface element harboring the flexible bridge contact B2a exhibits an intriguing disorder in the crystal structure of helix 69 (6). This helix, here labeled E26, also establishes contacts with P-tRNA and A-tRNA. The disorder has been explained by the feeble support of the multi-branched loop structure (embodied in helices E24–E28) that harbors the central interface element and the absence of direct stacking–packing interactions with the rest of the LSU rRNA molecule. The disorder identifies features capable of independent motion that participate in ribosomal dynamics and involve both subunit contact and tRNA movement (6). Because of its central role in translocation, the E24–E28 multi-branched loop structure was taken out of molecular context and was characterized using statistical properties that describe the stability and uniqueness of folded conformations (see Fig. 1). These properties were traced in the universal tree using squared-change parsimony (Fig. 6). Interestingly, Q entropy values that describe the number of conflicting interactions and measure plasticity in the E24–E28 loop generally increased in the tree. They were considerably higher in Archaea and Eucarya than in Bacteria. This trend is the opposite of what is generally encountered in the rest of the rRNA molecules (G. Caetano-Anollés, manuscript in preparation), but was shared by the stem helix D14 that harbors B1a, the other flexible rRNA contact (data not shown). Therefore, ribosomal structures bearing flexible contacts appear subjected to strong functional constraints that force them to go against the universal search for molecular order evident in rRNA.
DISCUSSION
I have used cladistic principles to reconstruct phylogenetic history directly from the structure of nucleic acid molecules, focusing on the evolution of structural attributes that characterize how branched, stable and uniquely folded (plastic) nucleic acid molecules are. Structural attributes (characters) transform from one numerical value (character state) to another in linearly ordered and polarized pathways. These transformation [sensu Hennig (50)] pathways are defined here by a model of structural evolution driven by a fundamental and universal evolutionary tendency towards order. This model assumes that molecular evolution is constrained by the mapping of sequence into structure and that ancestral characteristics in extant molecules become fixed when trapped in the local optima of an adaptive landscape. The model does not consider other possible attributes that bear on biological function and are optimized during molecular evolution, given the limited knowledge of the complex selective forces operating at a functional level. Instead, function is regarded here as a first consequence of order and assumptions are only based on principles well grounded in the thermodynamics and statistical mechanics of molecules (26–32). Note that these assumptions could not be falsified by phylogenetic reconstruction, as congruence between phylogenetic trees derived from structure, sequence and traditional classification failed to reject the validity of character transformation hypotheses.
One particularly fundamental phylogeny represents the hierarchical classification of the living world and is based on comparative analysis of sequences encoding rRNA and several proteins (14). This phylogeny is known as the universal tree of life. Because of concerted evolution, there are no paralogous genes that can root the rRNA universal tree and this problem has been intractable. Here I was able to reconstruct a rooted universal tree from the combined structures of LSU and SSU rRNA molecules. The tree branched into three monophyletic groups corresponding to the major domains of life, and was rooted in the eukaryotic branch (Fig. 2). This rooting supports conclusions from a previous analysis of individual rRNA structures (21), but conflicts with the accepted view of a prokaryotic universal ancestor (51). Instead, it intimates an equally parsimonious eukaryotic origin of diversified life. Note that a eukaryotic ancestry of life (52) has recently been supported by phylogenetic analysis of several gene sequences [e.g. SRP (53)]. The existence of three major monophyletic groups parallels evolutionary reconstructions from comparative sequence analysis (51,54,55), and suggests that diversity originated in three dramatic evolutionary events. Within major clades, phylogenetic relationships generally matched traditional classification. However, there were instances of incongruence such as the basal placement of amitochondriate Archezoa in the Eucarya, which represents a highly debated topology (56).
Character tracing in the reconstructed tree allowed the inference of a refined model of character evolution, produced a step matrix of transformation costs for parsimony analysis, and resulted in a refined universal tree (Fig. 2). This tree retained the overall topology of the original reconstruction. As expected, the inferred model depicted the frustrated energetics of base pairing in RNA structure (26,28). The minimum free energy of a secondary structure can be considered the sum of its loop energies, which have been measured, tabulated, and used in folding algorithms as a function of loop size and delimiting base pairs (57–59). Energetically unfavorable loops constrain the formation of energetically favorable stacking regions, resulting in a vast arrangement of possible helix and unpaired segments [i.e. the structural repertoire of an RNA sequence (28)]. The contrasting probabilities of change in helix and loop regions reflect these opposing energetic trends. However, step matrices showed an unanticipated tendency to form rRNA molecules with paired segments of an optimal length of 10 bp. These observations suggest there is an evolutionary tendency to optimize molecular size. Interestingly, reconstructions of character history allowed the inference of ancestral rRNA molecules, showing there was a clear evolutionary trend towards molecular simplification (Fig. 3). This general trend is consistent with a proposal of simplification of molecular features in prokaryotes by gene loss and non-orthologous displacement (60), and streamlining of the rapidly replicating prokaryotic genomes (17). Interestingly, features that are simplified generally have a polyphyletic distribution in the universal tree and occur in the periphery of the ribosome (10). They are probably not constrained by function and can therefore afford to be transient during evolution (17). However, there was one striking exception in this study, the ancestral but very compact structure of Encephalitozoon cuniculi (61). This anomaly could be attributed to unusual evolutionary forces, fast evolutionary rates, and the intracellular lifestyle of this amitochondriate microsporidian endoparasite.
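The role of a step matrix of transformation costs in parsimony analysis, mentioned above, can be sketched with the generalized (Sankoff) parsimony recursion. The example below is a minimal illustration under stated assumptions: the nested-tuple tree, the three character states and both cost matrices are invented for the purpose and are not the matrices inferred in this study.

```python
import math


def sankoff(tree, step_matrix):
    """Return the minimum-cost vector for the root of `tree`.

    tree: an int (observed state at a tip) or a tuple of subtrees.
    step_matrix[i][j]: cost of transforming ancestral state i into state j.
    """
    n_states = len(step_matrix)
    if isinstance(tree, int):
        # A tip costs nothing in its observed state, infinity otherwise.
        return [0 if s == tree else math.inf for s in range(n_states)]
    costs = [0] * n_states
    for subtree in tree:
        child = sankoff(subtree, step_matrix)
        for s in range(n_states):
            # Cheapest way to reach any child state t from parent state s.
            costs[s] += min(step_matrix[s][t] + child[t]
                            for t in range(n_states))
    return costs


# Hypothetical cost matrices for a three-state character.
ordered = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]    # linearly ordered, reversible
polarized = [[0, 1, 2], [5, 0, 1], [5, 5, 0]]  # reversals penalized

toy_tree = ((0, 2), (2, 2))  # tips carry observed character states
ordered_costs = sankoff(toy_tree, ordered)
polarized_costs = sankoff(toy_tree, polarized)
```

With the reversible matrix the cheapest root assignment is state 2, whereas penalizing reversals shifts the optimum to state 0. This shows, in miniature, how polarized transformation costs influence the inferred ancestral condition.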
An evolutionary model of structural change was recently inferred from the dynamics of RNA populations evolving towards predefined structural targets in a constant environment (28,62). The model predicts a concurrent reduction of variability (genetic canalization) and plasticity during structural evolution that ultimately results in evolutionary lock-in and extreme modularity (i.e. the ability to maintain the structural integrity of autonomous components across a wide range of environments and genetic contexts) (28). This model can be tested in rRNA by inferring the history of character change, if we assume that constraints imposed on structure by the functional landscape (e.g. involving translation) were relatively constant during ribosomal evolution. The distribution of character change along the branches of the universal tree showed structural change clustering at its base (Fig. 4B and Table 2). This pattern is consistent with the proposed evolutionary model, providing direct support for the predicted reduction of structural variability in time. Similarly, tracing of cladistic characters reflecting the uniqueness of folded conformations supported an evolutionary decrease of the plastic repertoire (G. Caetano-Anollés, manuscript in preparation). Reductions in RNA plasticity have also been supported by studies in statistical mechanics of extant RNA molecules (including rRNA) (29–32). Overall results substantiate the concurrent decrease of genetic variability and structural plasticity during evolution. Since cladogenesis (i.e. the branching of lineages in a phylogenetic tree) is a consequence of natural selection (50), I propose here that these evolutionary lock-in mechanisms enhance cladogenesis at the expense of curtailing phenotypic innovation. As a consequence, ancestral characteristics in extant molecules result from lineages that retain exploratory properties, while those that are derived result from lock-in and express structural modularity as an evolutionary novelty. We are currently tracing modularity characteristics in phylogenetic trees to test some of these assumptions.
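One common way to put a number on structural plasticity is the Shannon entropy of base-pairing probabilities, which is low when each base has a single dominant partner and high when alternative pairings compete. The sketch below assumes this entropy form and uses made-up probabilities; whether it matches the exact definition of the Q character used in this study is an assumption, and the function name is hypothetical.

```python
import math


def shannon_entropy(pair_probs):
    """Shannon entropy (in bits) of a set of base-pair probabilities.

    pair_probs: dict mapping (i, j) index pairs to pairing probabilities.
    Zero-probability entries contribute nothing to the sum.
    """
    return sum(-p * math.log2(p) for p in pair_probs.values() if p > 0)


# Hypothetical examples: a uniquely folded helix versus one in which each
# base has two equally likely pairing partners.
rigid = {(1, 10): 1.0, (2, 9): 1.0}
plastic = {(1, 10): 0.5, (1, 12): 0.5, (2, 9): 0.5, (2, 11): 0.5}

rigid_entropy = shannon_entropy(rigid)
plastic_entropy = shannon_entropy(plastic)
```

Under this measure the uniquely folded example has zero entropy, while the example with competing partners scores higher, in line with the reading of higher values as greater plasticity.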
One of the most powerful features of the structure-based phylogenetic tree reconstruction proposed here is the ability to trace the evolution of functionally important components in nucleic acid molecules. This is particularly meaningful when structural information has been acquired in vitro using methods such as cryo-electron microscopy reconstruction, NMR spectroscopy, crystallography, and genetic and biochemical approaches. Recent progress in the structural determination of the ribosomal ensemble (2–7) has defined mobile inter-subunit bridge contacts and tRNA-binding sites at atomic resolution in rRNA. This has provided an unprecedented view of the mechanism of translation. Here I show that the history of structural change in these ribosomal components can be traced on the universal tree. The distributions of character change along the universal tree were used to establish phylogenetic relationships between the functional structures themselves (Fig. 5). This constitutes a novel approach that enables the phylogenetic reconstruction of structural trees and the establishment of ancestral–descendant relationships between the individual structural components of a nucleic acid molecule. The evolutionary patterns observed support several interesting and functionally important features such as the concerted evolution of tRNA-binding sites in the two subunits, and the ancestral nature and common origin of structures that participate in ribosomal dynamics. These structures include the peptidyl (P) site, the functional relay of the penultimate stem helix of SSU rRNA, and the central interface of LSU rRNA. The spatial arrangement of contacts and binding sites and the molecular disorder that characterizes the coupling of tRNA translocation and inter-subunit movement in crystallographic analysis (6) matched the evolutionary patterns observed here. Interestingly, mobile contacts centrally implicated in these mechanisms (B2a and B1a) appear to be supported by structures with atypical evolutionary trends that favor molecular disorder. This observation illustrates how functional constraints can curb the evolution of molecules.
Structures predicted by the study of compensating substitutions in sequence alignments (8–10) have recently been confirmed by crystallographic studies of bacterial and archaeal rRNA molecules (1–7). This has validated the comparative sequence analysis prediction method. The majority of secondary structural components are common to prokaryotic and eukaryotic molecules, and for the most part, structures are well defined (10), despite ongoing refinement of variable and taxon-specific rRNA regions (especially in Eucarya) (36). Therefore, structural inaccuracies are assumed not to be severe and are tolerated as systematic error in this study, pending a detailed crystallographic analysis of the eukaryotic rRNA ensemble. Mature rRNA molecules represent a mosaic of highly conserved domains and rapidly evolving areas always found in identical locations along the molecules (63). Two- and three-dimensional nucleotide variability maps have been constructed recently (10), showing that highly conserved regions are more abundant in single-stranded rRNA and are preferentially located in the center of the ribosome. In contrast, structures specific for Archaea, Bacteria and Eucarya are situated in its periphery. A preliminary analysis of structural character change has shown that change is maximal on the periphery of the rRNA ensemble (G. Caetano-Anollés, unpublished results), confirming inferences from three-dimensional variability maps (10). The inter-subunit bridge contacts and tRNA-binding sites traced in the present study fall within the conserved and centrally located areas. They belong to functional structures involved in tRNA binding, mRNA decoding and peptidyl transfer, which exhibit average nucleotide substitution rates 10–50 times lower than that of the average rRNA site. These structures are conserved and are present in the three domains of life. Structural inaccuracies in these sites should be considered negligible and inconsequential to the validity of the phylogenetic inferences presented in this study.
| SUPPLEMENTARY MATERIAL |
|---|
|
|
|---|
Supplementary Material is available at NAR Online.
| ACKNOWLEDGEMENTS |
|---|
I gratefully acknowledge G. E. Caetano-Anollés for help in character coding, V. Knudsen for computer programs, and Vital NRG for financial support.
| FOOTNOTES |
|---|
* Correspondence should be addressed to: Vital NRG, 1320 Beacon Hill Lane, Knoxville, TN 37919-7652, USA. Tel: +1 865 521 5083; Fax: +1 865 521 5083; Email: gustavoc{at}mac.com
| REFERENCES |
|---|
|
|
|---|
- Green,R. and Noller,H.F. (1997) Ribosomes and translation. Annu. Rev. Biochem., 66, 679–716.
- Cate,J.H., Yusupov,M.M., Yusupova,G.Z., Earnest,T.N. and Noller,H.F. (1999) X-ray crystal structure of 70S ribosome functional complexes. Science, 285, 2095–2104.
- Ban,N., Nissen,P., Hansen,J., Moore,P.B. and Steitz,T.A. (2000) The complete atomic structure of the large ribosomal subunit at 2.4 Å resolution. Science, 289, 905–920.
- Wimberly,B.T., Brodersen,D.E., Clemons,W.M.,Jr, Morgan-Warren,R.J., Carter,A.P., Vonrhein,C., Hartsch,T. and Ramakrishnan,V. (2000) Structure of the 30S ribosomal subunit. Nature, 407, 306–307.
- Schluenzen,F., Tocilj,A., Zarivach,R., Harms,J., Gluehmann,M., Janell,D., Bashan,A., Bartels,H., Agmon,I., Franceschi,F. and Yonath,A. (2000) Structure of functionally activated small ribosomal subunit at 3.3 Å resolution. Cell, 102, 615–623.
- Yusupov,M.M., Yusupova,G.Z., Baucom,A., Lieberman,K., Earnest,T.N., Cate,J.H.D. and Noller,H.F. (2001) Crystal structure of the ribosome at 5.5 Å resolution. Science, 292, 883–896.
- Ogle,J.M., Brodersen,D.E., Clemons,W.M.,Jr, Tarry,M.J., Carter,A.P. and Ramakrishnan,V. (2001) Recognition of cognate transfer RNA by the 30S ribosomal subunit. Science, 292, 897–902.
- James,B.D., Olsen,G.J. and Pace,N.R. (1989) Phylogenetic comparative analysis of RNA secondary structure. Methods Enzymol., 180, 227–239.
- Gutell,R.R., Larsen,N. and Woese,C.R. (1994) Lessons from an evolving rRNA: 16S and 23S rRNA structures from a comparative perspective. Microbiol. Rev., 58, 10–26.
- Wuyts,J., Van de Peer,Y. and De Wachter,R. (2001) Distribution of substitution rates and location of insertion sites in the tertiary structure of ribosomal RNA. Nucleic Acids Res., 29, 5017–5028.
- Woese,C.R. (2001) Translation: in retrospect and prospect. RNA, 7, 1055–1067.
- Page,R.D.M. and Holmes,E.C. (1998) Molecular Evolution: A Phylogenetic Approach. Blackwell Science, Oxford.
- Charlesworth,D., Charlesworth,B. and McVean,A.T. (2001) Genome sequences and evolutionary biology, a two-way interaction. Trends Ecol. Evol., 16, 235–242.
- Doolittle,W.D. (1999) Phylogenetic classification and the universal tree. Science, 284, 2124–2128.
- Vukmirovic,O.G. and Tilghman,S.M. (2000) Exploring genome space. Nature, 405, 820–822.
- Mittl,P.R.E. and Grütter,M.G. (2001) Structural genomics: opportunities and challenges. Curr. Opin. Chem. Biol., 5, 402–408.
- Clark,C.G. (1987) On the evolution of ribosomal RNA. J. Mol. Evol., 25, 343–350.
- Billoud,B., Guerrucci,M.-A., Masselot,M. and Deutsch,J.S. (2000) Cirripede phylogeny using a novel approach: molecular morphometrics. Mol. Biol. Evol., 17, 1435–1445.
- Collins,L.J., Moulton,V. and Penny,D. (2000) Use of RNA secondary structure for studying the evolution of RNase P and RNase MRP. J. Mol. Evol., 51, 194–204.
- Caetano-Anollés,G. (2001) Novel strategies to study the role of mutation and nucleic acid structure in evolution. Plant Cell Tissue Org. Culture, 67, 115–132.
- Caetano-Anollés,G. (2002) Evolved RNA secondary structure and the rooting of the universal tree of life. J. Mol. Evol., 54, 333–345.
- Eventoff,W. and Rossmann,M.G. (1975) The evolution of dehydrogenases and kinases. CRC Crit. Rev. Biochem., 3, 111–140.
- Johnson,M.S., Sutcliff,M.J. and Blundell,T.L. (1990) Molecular anatomy: phyletic relationships derived from three-dimensional structures of proteins. J. Mol. Evol., 30, 43–59.
- Bujnicki,J.M. (2000) Phylogeny of restriction endonuclease-like superfamily inferred from comparison of protein sequences. J. Mol. Evol., 50, 39–44.
- Breitling,R., Laubner,D. and Adamski,J. (2001) Structure-based phylogenetic analysis of short-chain alcohol dehydrogenases and reclassification of the 17beta-hydroxysteroid dehydrogenase family. Mol. Biol. Evol., 18, 2154–2161.
- Fontana,W. and Schuster,P. (1998) Shaping space: the possible and the attainable in RNA genotype–phenotype mapping. J. Theor. Biol., 194, 491–515.
- Wuchty,S., Fontana,W., Hofacker,I.L. and Schuster,P. (1999) Complete suboptimal folding of RNA and the stability of secondary structures. Biopolymers, 49, 145–165.
- Ancel,L.W. and Fontana,W. (2000) Plasticity, evolvability and modularity in RNA. J. Exp. Zool., 288, 242–283.
- Higgs,P.G. (1993) RNA secondary structure: a comparison or real and random sequences. J. Phys. I France, 3, 43–59.
- Higgs,P.G. (1995) Thermodynamic properties of transfer RNA: a computational study. J. Chem. Soc. Faraday Trans., 91, 2531–2540.
- Schultes,E.A., Hraber,P.T. and LaBean,T.H. (1999) Estimating the contributions of selection and self-organization in RNA secondary structure. J. Mol. Evol., 49, 76–83.
- Gultyaev,P.A., van Batenburg,F.H.D. and Pleij,C.W.A. (2002) Selective pressures on RNA hairpins in vivo and in vitro. J. Mol. Evol., 54, 1–8.
- Gladyshev,G.P. (1978) On the thermodynamics of biological evolution. J. Theor. Biol., 75, 425–441.
- Gladyshev,G.P. and Eshov,Y.A. (1982) Principles of the thermodynamics of biological systems. J. Theor. Biol., 94, 301–343.
- Kierzek,E., Biala,E. and Kierzek,R. (2001) Elements of thermodynamics in RNA evolution. Acta Biochim. Polonica, 48, 485–493.
- Wuyts,J., De Rijk,P., Van de Peer,Y., Pison,G., Rousseuw,P. and De Wachter,R. (2000) Comparative analysis of more than 3000 sequences reveals the existence of two pseudoknots in area V4 of eukaryotic small subunit ribosomal RNA. Nucleic Acids Res., 28, 4698–4708.
- Hofacker,I.L., Fontana,W., Stadler,P.F., Bonhoeffer,L.S., Tacker,M. and Schuster,P. (1994) Fast folding and comparison of RNA secondary structures. Monatshefte Chem., 125, 167–188.
- Swofford,D.L. (1999) Phylogenetic Analysis Using Parsimony and Other Programs (PAUP*), Version 4. Sinauer Assoc., Sunderland, MA.
- Maddison,W.P. and Maddison,D.R. (1999) MacClade: Analysis of Phylogeny and Character Evolution, Version 3.08. Sinauer Assoc., Sunderland, MA.
- Felsenstein,J. (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.
- Wilkinson,M., Thorley,J.L. and Upchurch,P. (2000) A chain is no longer than its weakest link: double decay analysis of phylogenetic hypotheses. Syst. Biol., 49, 754–776.
- Thorley,J.L. and Page,R.D.M. (2000) RadCon: phylogenetic tree comparison and consensus. Bioinformatics, 16, 486–487.
- Farris,J.S., Kållersjö,M., Kluge,A.G. and Bult,C. (1995) Testing significance of incongruence. Cladistics, 10, 315–319.
- Page,R.D.M. (1993) COMPONENT, Tree Comparison Software for Microsoft Windows, v. 2.0. The Natural History Museum, London.
- Nixon,K.C. and Carpenter,J.M. (1996) On simultaneous analysis. Cladistics, 12, 221–241.
- Savill,N.J., Hoyle,D.C. and Higgs,P.G. (2001) RNA sequence evolution with secondary structure constraints: comparison of substitution rate models using maximum-likelihood methods. Genetics, 157, 399–411.
- Wheeler,W.C. (1990) Combinatorial weights in phylogenetic analysis. A statistical parsimony procedure. Cladistics, 6, 269–275.
- Gabashvili,I.S., Agrawal,R.K., Spahn,C.M., Grassucci,R.A., Svergun,D.I., Frank,J. and Penczek,P. (2000) Solution structure of the E. coli 70S ribosome at 11.5 Å resolution. Cell, 100, 537–549.
- Frank,J. and Agrawal,R.K. (2000) A ratchet-like inter-subunit reorganization of the ribosome during translocation. Nature, 406, 318–322.
- Hennig,W. (1966) Phylogenetic Systematics. University of Illinois Press, Urbana, IL.
- Woese,C.R., Kandler,O. and Wheelis,M.L. (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria and Eucarya. Proc. Natl Acad. Sci. USA, 87, 4576–4579.
- Reanney,D.C. (1974) On the origin of prokaryotes. Theor. Biol., 48, 243–251.
- Brinkmann,H. and Philippe,H. (1999) Archaea sister group of bacteria? Indications from tree reconstruction artifacts in ancient phylogenies. Mol. Biol. Evol., 16, 817–825.
- Gouy,M. and Li,W.-H. (1989) Phylogenetic analysis based on rRNA sequences supports the archaebacterial rather than the eocyte tree. Nature, 339, 145–147.
- De Rijk,P., Van de Peer,Y., Van den Broeck,I. and De Wachter,R. (1995) Evolution according to the large ribosomal subunit RNA. J. Mol. Evol., 41, 366–375.
- Philippe,H., Germot,A. and Moreira,D. (2000) The new phylogeny of eukaryotes. Curr. Opin. Genet. Dev., 10, 596–601.
- Freier,S.M., Kierzek,R., Jaeger,J.A., Sugimoto,N., Caruthers,M.H., Neilson,T. and Turner,D.H. (1986) Improved free-energy parameters for prediction of RNA duplex stability. Proc. Natl Acad. Sci. USA, 83, 9373–9377.
- Jaeguer,J.A., Turner,D.H. and Zuker,M. (1989) Improved predictions of secondary structures for RNA. Proc. Natl Acad. Sci. USA, 86, 7706–7710.
- Mathews,D.W., Sabina,J., Zuker,M. and Turner,D.H. (1999) Expanded sequence dependence of thermodynamic parameters provides robust prediction of RNA secondary structure. J. Mol. Biol., 288, 911–940.




