Nucleic Acids Research Advance Access originally published online on September 18, 2006
Nucleic Acids Research 2006 34(17):4912-4924; doi:10.1093/nar/gkl472
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Nucleic Acids Research, 2006, Vol. 34, No. 17 4912-4924
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
RNA |
A set of nearest neighbor parameters for predicting the enthalpy change of RNA secondary structure formation
Department of Biochemistry & Biophysics and Center for Pediatric Biomedical Research, University of Rochester Medical Center 601 Elmwood Avenue, Box 712, Rochester, NY 14642, USA 1 Department of Chemistry, University of Rochester Box 0216, Rochester, NY 14627, USA
*To whom correspondence should be addressed. Tel: 1 585 275 1734; Fax: 1 585 275 6007; Email: david_mathews{at}urmc.rochester.edu
Received March 27, 2006. Revised June 20, 2006. Accepted June 20, 2006.
| ABSTRACT |
|---|
|
|
|---|
A complete set of nearest neighbor parameters to predict the enthalpy change of RNA secondary structure formation was derived. These parameters can be used with available free energy nearest neighbor parameters to extend the secondary structure prediction of RNA sequences to temperatures other than 37°C. The parameters were tested by predicting the secondary structures of sequences with known secondary structure that are from organisms with known optimal growth temperatures. Compared with the previous set of enthalpy nearest neighbor parameters, the sensitivity of base pair prediction improved from 65.2 to 68.9% at optimal growth temperatures ranging from 10 to 60°C. Base pair probabilities were predicted with a partition function and the positive predictive value of structure prediction is 90.4% when considering the base pairs in the lowest free energy structure with pairing probability of 0.99 or above. Moreover, a strong correlation is found between the predicted melting temperatures of RNA sequences and the optimal growth temperatures of the host organism. This indicates that organisms that live at higher temperatures have evolved RNA sequences with higher melting temperatures.
| INTRODUCTION |
|---|
|
|
|---|
RNA is more than a simple single-stranded sequence carrying genetic information as in the Central Dogma of Biology. For example, it can form tertiary structures that, such as proteins, can be catalytic. Natural and engineered RNA molecules are widely used as functional tools in enzymatic catalysis and genetic control (15). One current problem is how to predict the structures of functional RNA sequences.
Secondary structure, the sum of canonical base pairs, is stronger (69) and forms faster (10) than tertiary structure. Therefore, secondary structure can largely be determined without knowledge of tertiary structure. Comparative sequence analysis is a standard technique for determining the secondary structure of homologous RNA sequences (1113). When only a few or even a single sequence is available, the secondary structure at 37°C can be predicted by free energy minimization algorithms (1417) using a set of empirical free energy parameters, determined from optical melting experiments (1721). Each parameter only depends on the sequence identity of nucleotides in the motif and in adjacent base pairs and the total free energy is the sum of nearest neighbor terms. The average sensitivity (the percentage of known base pairs that are correctly predicted) of free energy minimization prediction has been benchmarked as high as 72.8 ± 9.4% for a diverse database of sequences having fewer than 800 nt (17). Furthermore, experimentally determined constraints can improve this accuracy of prediction up to 84% (17,18) for sequences with <6% pseudoknotted (non-nested) base pairs (17). Partition function prediction of base pair probabilities can be used to identify base pairs in the predicted lowest free energy structure that are much more likely than average to be in the known secondary structure (22,23). For example, 91.0% of base pairs in the lowest free energy structure with pairing probability of 0.99 or higher are contained in the known structure, on average (22). The high accuracy of thermodynamic structure prediction (17) demonstrates that many RNA secondary structures can be determined from sequences, without knowledge of any tertiary contacts or protein interactions.
The current set of free energy nearest neighbor parameters for predicting the free energy of RNA secondary structure, however, is limited to application at 37°C. Many organisms, thermophiles and psychrophiles, live at temperatures far from 37°C and many experiments are conducted at other temperatures. The prediction of secondary structure of RNA at arbitrary temperature would expand our knowledge of structure and evolution in the RNA world. Moreover, it would facilitate studying and designing functional RNA molecules at temperatures other than 37°C. The enthalpy nearest neighbor parameters can be used in conjunction with available free energy nearest neighbor parameters for 37°C to determine free energy nearest neighbors at other temperatures. But the most recent enthalpy parameters were derived in 1995 using a simple model (24). At that time, no themes had emerged for the sequence-dependent stability of internal loops. Subsequently, the nearest neighbor model for free energy change at 37°C was significantly improved (17) using experimental results. Therefore, we applied the principles of the current free energy nearest neighbor model (17,18) to determine a complete set of enthalpy nearest neighbor parameters using the available optical melting data.
| MATERIALS AND METHODS |
|---|
|
|
|---|
Database of experiments
The database of experimental data for derivation of enthalpy parameters is included in Supplementary Data. It includes 130 hairpin loops (2531), 37 bulge loops (32,33), 337 internal loops (17,18,3449) (99 of which are 2 x 2 internal loops), 74 multibranch loops (50,51) and 43 coaxial stacking models (5255).
Derivation and refinement of enthalpy parameters
Canonical base pairs
The enthalpies of WatsonCrick and GU base pairs were derived by Xia et al. (21) and Mathews et al. (18), respectively.
Dangling ends and terminal mismatches
Dangling ends are unpaired nucleotides adjacent to canonical pairs and their enthalpy parameters were compiled previously (24). Dangling ends on terminal GU pairs are treated similar to dangling ends on terminal AU pairs. Terminal mismatches are non-canonical pairs at the end of helixes. The enthalpy parameters of terminal mismatches are taken from another compilation (20), with the exception of mismatches on terminal GU pairs, which were measured recently (30).
If a terminal mismatch has the potential to pair canonically, the values of AC and CA mismatches are used for the purinepyrimidine mismatch and pyrimidinepurine mismatches, respectively. This is important for partition function calculations, where all possible secondary structures are considered.
Hairpin loops
The experimental enthalpies of hairpin loop formation are calculated from published experimental data (2531) with the following equation:
![]() |
is the experimental value for unfolding the hairpin loop with stem,
is calculated by the INN-HB parameters (18,21), without an intermolecular initiation term. The hairpin loop enthalpy parameters are estimated by linear regression using the same model as free energy nearest neighbor parameters (17), except that the GG first mismatch bonus observed for free energy does not apply for enthalpy because the bonus was not statistically significant for enthalpy. The GG stability bonus is therefore entropic in nature, consistent with the observation that GG mismatches are dynamic (56), i.e. they sample more than one single microstate on short timescales.
The enthalpies of hairpin loops are estimated by the following equation:
![]() |
(UU or GA first mismatch but not AG) is applied to loops with first mismatches of UU or GA (G on the 5' side and A on 3' side of loop). The oligo-C penalty applies only to loops composed of all C residues and, if n > 3, is calculated with
(oligo-C loops, n > 3) = An + B. For hairpinloops composed entirely of 3 C residues, the
(oligo-C loops, n = 3) is applied. The enthalpy parameters are listed in Table 1 and the database of measured loop enthalpies is available as Supplementary Data. In the absence of data, for hairpin loops longer than 9 nt, the initiation enthalpy is approximated with the initiation term for a hairpin of 9 nt. This assumes that additional instability of hairpin loops as the loop lengthens derives from the entropy (57).
|
The measured free energies at 37°C of some special hairpin loops of 3, 4 or 6 unpaired nucleotides (30,31,3436) are either more or less stable by 0.9 kcal/mol than the model predicts. The enthalpies for each of these sequences are listed in a separate lookup table (Table 2), to be consistent with the free energy parameters.
|
Bulge loops
RNA secondary structure is destabilized by bulge loops, which are an interruption of helical structure in one strand only (32,37,38). The initiation terms,
for bulge loops of 13 nt, are listed in Table 3. They are the average values of experimental data (32,33), calculated using the following equation:
![]() |
is the stacking enthalpy of the base pairs in the duplex without the bulge that flank the bulge loop in the duplex with the bulge. Because the difference of initiation enthalpies between 2 and 3 nt bulges is almost zero, it is assumed that the increasing instability for longer bulges (n
4) comes from the entropy of the loop closure (39,57). Thus, the initiation enthalpy for bulges longer than 3 nt is approximated as the 3 nt bulge enthalpy.
|
Assuming that helical stacking is continuous between the adjacent helices for single bulges, but is interrupted by longer bulges (39,40), the enthalpies of bulge loops are calculated with the following equation:
![]() |
is the canonical helix stacking enthalpy applied for the two closing base pairs as though the helix was not interrupted by the bulge loop.
Internal loops
Internal loop enthalpies were calculated from experimental data (17,18,3449) using the following equation:
![]() |
1 x 1 Internal loops (single mismatches)
For single non-canonical pairs (1 x 1 internal loops), the loop enthalpies are approximated by the following equation:
![]() |
is the enthalpy of initiation for a single non-canonical pair;
is the penalty for each AU or GU closing base pair;
is a bonus for a GG pair in a 1 x 1 loop; and
is a bonus for a 5'RU/3'YU stack in a 1 x 1 loop, where R is a purine and Y is a pyrimidine.
2 x 2 Internal loops (tandem mismatches)
The 2 x 2 internal loops, also called tandem mismatches, interrupt helical RNA with two opposing unpaired nucleotides on each strand. Many of the sequence-symmetric 2 x 2 loops have been studied experimentally (17,18, 3449) and their enthalpies are assembled in a periodic table (Table 4). Symmetric sequences that have not been measured are approximated by averaging the most adjacent columns that have been measured. For asymmetric 2 x 2 loops, the enthalpies are approximated using the following equation:
![]() |
(12.5 ± 2.7 kcal/mol) is applied to loops with a GG pair adjacent to an AA or any non-canonical pair with a pyrimidine and
p (2.4 ± 3.1 kcal/mol) is applied to loops with an AG or GA pair adjacent to a UC, CU or CC pair or with a UU pair adjacent to an AA pair.
|
Other internal loops
The enthalpies of other internal loops are approximated using the following equation:
![]() |
is the enthalpy of initiation for a loop of n nucleotides;
is a penalty for loops with unequal numbers of nucleotides on each side, with n1 and n2 the number of nucleotides on each side;
is applied for each sequence-specific first mismatch (Table 5), but it is not applied to loops of the form 1 x (n 1) with n > 3 (n is the total number of unpaired bases). Special first mismatch bonuses were determined for 2 x 3 and 1 x 2 internal loops with separate linear regressions.
|
Moreover, the free energy parameters (Table 6) were updated for internal loops based on recent experimental measurements. The free energy parameters were obtained using the method of Mathews et al. (17). The recent data include the 3 x 3 loops from Chen et al. (41), but excluding the 3 x 3 loops with a middle GA pair. The middle GA pair is shown to enhance stability and this extra stability cannot be predicted by the nearest neighbor parameter set used in this work (41).
|
Coaxial stacking
Coaxial stacking, which is a favorable interaction of two helices stacked end to end, occurs in multibranch loops and exterior loops. Stability increments for coaxial stacking were measured with a structure composed of a short oligonucleotide bound to a single-stranded end of a stemloop structure, creating a helical interface (5255). The enthalpy of coaxial stacking is quantified as follows:
![]() |
H°(correction) is the enthalpy for displacing a 3' dangling end on the stemloop structure if one is present.
When the helixes have no intervening mismatches, the enthalpy bonus is approximated by the nearest neighbor parameter (21) of a base pair in a helix. The excess enthalpy above the helical stacking nearest neighbor from Xia et al. (21),
, for each measured interface was calculated. With flush interfaces, i.e. with no intervening mismatch, and no strand extensions beyond the interface, the average excess enthalpy is 1.53 ± 1.45 kcal/mol. For interfaces followed by strand extensions, the excess enthalpy is 1.82 ± 1.13 kcal/mol. As the excess enthalpy changes are not statistically significant, coaxial stacking of helices with no intervening nucleotides is modeled with the enthalpy parameter in a helix.
With one intervening nucleotide from each strand, two helices can stack with an intervening mismatch between them. There are two stack increments: one is the mismatch stack at the end of one helix with continuous backbone, which is equal to the mismatch stacking parameter on a helix, and the other is the mismatch stack with discontinuous backbone, which is modeled as sequence independent. The average enthalpy of sequence independent stacks is 8.46 ± 2.75 kcal/mol. In addition to this, an enthalpy bonus of 0.4 or 0.2 kcal/mol are applied to intervening mismatches composed of nucleotides that could form a WatsonCrick or a GU base pair, respectively. These bonuses are identical to free energy increments that are used and are empirically found to improve structure prediction accuracy.
Multibranch loops
The parameters are determined by linear regression of experimental data for three- and four-way multibranch loops (50,51). In a nearest neighbor model, the bimolecular enthalpy (
) for the formation of the duplex with a multibranch loop is given by the following equation:
![]() |
H° predicted from nearest neighbor parameters for WatsonCrick pairs (without including bimolecular initiation so that
appears only once). The
is a term that accounts for the stacking enthalpy increment of the nucleotides that can stack on the hairpin loop stems to form a modified motif after the two strands have dissociated. This is the most favorable configuration with coaxial stacking of helixes (in the case of four-way multibranch loops) or of the stacking of unpaired nucleotides.
is the experimental value which is taken from
versus ln(CT/4) plots. The multibranch loop enthalpy initiation term (
) can be calculated from the above equation. The enthalpy of multibranch loops (
) is then modeled as the sum of two terms, initiation and stacking:
![]() |
![]() |
is a strain enthalpy that only applies to three-way multibranch loops with fewer than two unpaired nucleotides. The asym term is the average asymmetry that reflects the distribution of unpaired nucleotides, which is defined by the following equation:
![]() |
|
The average asymmetry is limited to 2.0, following the rules suggested by free energy parameters. Asymmetry cannot be applied, however, by dynamic programming algorithms for secondary structure prediction (17,22). Thus, the b term was excluded for secondary structure prediction and the parameters a and c were optimized by finding the parameters that lead to the highest average sensitivity of secondary structure prediction by free energy minimization. The maximum sensitivity of prediction was found with a = 30.0 kcal/mol and c = 2.2 kcal/mol.
Database of RNA secondary structures
The revised enthalpy nearest neighbor model was tested with RNA sequences with known secondary structure from organisms with known optimal growth temperature. The structures were taken from comparative analysis databases (4249,58,59). Small (16S) subunit rRNA sequences are divided into domains as defined by Jaeger et al. (39). Large (23S) subunit rRNA sequences are divided into domains of fewer than 700 nt each (18). The optimal growth temperatures of different organisms were taken from the Prokaryotic Growth Temperature Database (http://pgtdb.csie.ncu.edu.tw/) and the DSMZ German Collection of Microorganisms and Cell Cultures website (http://www.dsmz.de/). Only the RNA sequences of mesophiles (organisms living at temperatures between 10 and 60°C, but with organisms living at 37°C excluded) were chosen to test the sensitivity and positive predictive value (PPV) of secondary structure prediction. Considering that posttranscriptional modification (60) and high pressure (61) in the thermophiles and hyperthermophiles (organism living above 60°C) would change the thermodynamics of secondary structure formation, sequences from these organisms were excluded. A list of sequences and optimal growth temperatures used are available in Supplementary Data.
Accuracy of secondary structure prediction
The accuracy of structure prediction is determined by the sum of the canonical base pairs correctly predicted. A base pair is considered correctly predicted even if it is shifted by 1 nt on one side. For example a base pair between nucleotides i and j is considered to be correctly predicted if any of these base pairs is predicted: i to j, i to j 1, i to j + 1, i 1 to j or i + 1 to j. The predicted base pair between i 1 and j + 1, however, is not considered to be correct. This scoring scheme reflects the uncertainty of exact base pair matches in comparative sequence analysis and the possibility for dynamics in base pairing. The values of sensitivity and PPV of this scoring scheme are
23% higher than when determined with exact base pairing only, where only the i to j base pair is considered to be correct. The prediction accuracies are shown in Supplementary Tables 11 and 12. Each table includes accuracies determined when pairs can be shifted and when pairs must be an exact match.
Availability of parameters
Machine-readable tables of the enthalpy parameters are available on the Mathews lab website (http://rna.urmc.rochester.edu/).
| RESULTS |
|---|
|
|
|---|
Nearest neighbor model parameters
In the nearest neighbor model of free energy (17,18), the parameters for WatsonCrick base pairs are well determined at 37°C with errors <10%, or
0.10.2 kcal/mol (21). For other motifs such as loops and GU base pairs, individual nearest neighbor free energy increments are often determined with an error <0.5 kcal/mol (17,18). In order to extend the current model to predict free energy at temperatures other than 37°C, enthalpy parameters consistent with the current nearest neighbor model are required. The free energy at arbitrary temperature for each parameter is then
![]() | (1) |
H°) and entropy (
S°) are assumed to be temperature independent. As described in Materials and Methods, parameters for enthalpy prediction, compatible with the free energy model, were determined using available experimental data from optical melting experiments.
Experimental studies consistently demonstrate that enthalpy and entropy measurements have considerably larger percent error than free energy measurements. Free energy at 37°C is determined with greater precision because of correlation between errors in enthalpy and entropy (21). The larger experimental errors in enthalpy result in larger percent errors for enthalpy nearest neighbor parameters than free energy parameters. The enthalpy of RNA secondary structure is known to be a function of temperature. A linear model for heat capacity change predicts the following:
![]() | (2) |
![]() | (3) |
is a constant heat capacity change and T0 is a chosen reference temperature. It is hypothesized that the heat capacity change arises from the extent of stacking increasing with decreasing temperature. Thus,
is negative because single strands are more organized at low rather than high temperature (6267). The
can be estimated by linear fits of enthalpy and entropy changes as a function of melting temperature (50,51,62) or determined by isothermal titration calorimetry at multiple temperatures (68,69). However, the effects of heat capacity change on enthalpy and entropy are antagonistic in terms of free energy change:
![]() | (4) |
Therefore, for certain
T (
T = T T0),
can be neglected because the effects are compensated in terms of free energy. To calculate the compensation for a set of RNA duplexes (62), the free energy,
G°, was derived directly from Equation 4 assuming that the entropy and enthalpy were independent of temperature. Then the temperature-dependent free energy,
, was calculated with the measured non-zero
from Equations 24. The free energy difference, 
G° =
G°, increases with the deviation of temperature from T0 (37°C) (Figure 1). The exact 
G° for each duplex is shown in Table 8 for different temperatures. The experimental error in individual loop free energy nearest neighbor parameters at 37°C is as large as 0.5 kcal/mol (17), which corresponds to roughly a factor of 2 in equilibrium constant. Thus, the small 
G° for helices suggests that the approximation of
= 0 is reasonable for predictions from
10 to 60°C. Therefore, the enthalpy parameters derived here assume
= 0 and are most accurate at predicting free energy change close to 37°C.
|
|
Dynamic programming algorithm for RNA secondary structure prediction
RNAstructure is a program for RNA secondary structure prediction and analysis. It includes prediction of secondary structure by free energy minimization (17), prediction of base pair probabilities using a partition function (22), the efn2 function for predicting the free energy change of folding given a sequence and secondary structure (18), and the Dynalign algorithm for finding the secondary structure common to two sequences (70). RNAstructure was revised to make predictions at user-defined temperature. Because large internal loops are more likely at high temperature, the previous limitation on internal loop size (fewer than 30 unpaired nucleotides) (17,18,22) was removed by implementing the method of Lyngsø et al. (71). This provides an O(N3) algorithm that can predict internal loops of arbitrary size. Benchmarks for calculation time and memory requirement with and without this revision are shown in Table 9.
|
Sensitivities and PPVs of structure predictions
The enthalpy nearest neighbor parameters were compared with the previous parameters and model for enthalpy and free energy assembled by Serra and Turner (24) by predicting the secondary structures of RNA sequences with known secondary structures. Sensitivities, the percent of known base pairs that are correctly predicted, using both sets of parameters are shown in Figure 2 (detailed numbers are in Supplementary Table 11A) for different types of structural RNA sequences. The known structures of these sequences were taken from comparative analysis databases (4249,58,59). The average sensitivity is improved from 65.2 to 68.9% using the new parameters assembled here. Sensitivities are improved for most types of the RNA. The exceptions are 5S rRNA and Group II introns.
|
To test the enthalpy parameters, the accuracy of secondary structure prediction at optimal growth temperature was compared to the accuracy of structure prediction at 37°C for organisms that do not grow optimally at 37°C for several types of RNAs (Table 10). The comparison of predictions was shown in different groups divided by optimal growth temperature. The organisms in each group grow optimally in a certain range of temperatures. Compared to the prediction at 37°C, structure prediction at optimal growth temperature performs better for the organism living at temperatures between 22 and 37°C, but is worse at other optimal growth temperatures. This suggests that when enthalpy parameters are assumed to be temperature independent, their utility as a tool for deriving free energy parameters for use in predicting the lowest free energy structure is limited to a narrow temperature range. Small errors in enthalpy change parameters have a larger effect on free energy change parameter determination (Equation 1), the farther the temperature is from 37°C.
|
Figure 3 shows the PPV for base pairs from the lowest free energy structure for base pairs with different pairing probabilities (see detailed numbers in Supplementary Table 12A). They are predicted using a partition function calculation at optimal growth temperature (22). PPV is the percentage of predicted base pairs that are found in the known structure. The average PPV of all pairs in the lowest free energy structures is only 62.0%, which is lower than the sensitivity (68.9%). This suggests that the model over-predicts base pairs and/or that the base pairs may not be annotated completely in the structures from comparative analysis (22). For example, if a base pair is completely conserved, then it is sometimes not annotated by comparative analysis (4249,58,59). Base pair probabilities for all possible pairs are calculated with a partition function and grouped by different thresholds. The PPV is significantly higher for predicted base pairs in the lowest free energy structure with higher pairing probability. The average PPV is up to 90.4% for those known base pairs having probability of 0.99 or above. It has been demonstrated previously that base pair probabilities predicted at 37°C can be used to find pairs with high PPV (22). The fact that this holds true at other temperatures shows that the enthalpy parameters are robust for base pair probability prediction.
|
The fact that the accuracy of secondary structure prediction is sensitive to the accuracy of the nearest neighbor parameters, but the base pair probabilities remain a robust measure of confidence for a wide variety of temperatures is consistent with a previous work. Layton and Bundschuh (72) demonstrated that the predicted lowest free energy structure was often changed in repeated structure predictions after random adjustments of the nearest neighbor parameters within the limits of their error. Base pair probabilities, however, were less perturbed by changes in the parameters (72). With the extrapolation of nearest neighbor parameters to temperatures far from 37°C, the accuracy of the predicted lowest free energy structure is often reduced as compared to structure prediction at 37°C. The ability of the partition function predicted base pair probabilities to determine base pairs predicted with a higher confidence is unchanged with secondary structure prediction at temperatures far from 37°C. This is because the determination of base pair probabilities is not as perturbed by errors in the nearest neighbor parameters.
An example of secondary structure prediction at 37°C and at optimal growth temperature of 30°C is shown in Figure 4 for a tRNA sequence. The base pairs with higher predicted pairing probability (color annotated according to pairing probability in Figure 4B and C) are pairs predicted with greater confidence. For this sequence, secondary structure prediction is more accurate and the fidelity of structure prediction (as judged by the percent of high probability pairs) is improved at optimal growth temperature.
|
Correlation between melting temperature and optimal growth temperature
Melting temperature, Tm, is defined as the temperature at which half of strands are unpaired. Assuming that an RNA melts with a two-state transition, the melting temperature (in Kelvins) of a single-stranded RNA structure can be predicted by Tm =
H°/
S° (73). For example, the predicted melting temperatures (°C) for all hairpins in the database of optically melted sequences (Supplementary Data) (2531) are plotted in Figure 5 as a function of experimentally determined Tm. This shows that the parameters adequately reflect the thermal stabilities of RNA sequences with known Tm. Better correlation was found at higher temperatures. This is expected because most hairpins were measured with high melting temperatures in experiments (2531).
|
Melting temperature reflects the thermal stability of a structure. Therefore RNA structures in organisms living at higher temperature are expected to have higher melting temperatures. Figure 6A shows a plot of predicted melting temperatures of the lowest free energy structure versus organism optimal growth temperature (1090°C). A strong correlation (linear correlation coefficient of 0.797) is found between the melting temperature and the optimal growth temperature for different types of RNA structures. On the other hand, there appears to be less correlation between nucleotide content and optimal growth temperature (Figure 6BD) for diverse types of RNA, although uracil content of 16S rRNA of thermophiles and psychrophiles were found recently to correlate inversely with their optimal growth temperatures (74). Evidently, the thermal stability of RNA structure is not simply controlled by base content. Organisms that grow at high temperature have apparently evolved RNA secondary structures with a combination of motifs that provide thermal stability.
|
| DISCUSSION |
|---|
|
|
|---|
The nearest neighbor parameters for enthalpy were derived here using similar rules as for free energy nearest neighbor parameters at 37°C (17). This makes these parameters useful for determining free energy parameters at arbitrary temperature that are compatible with dynamic programming algorithms for secondary structure prediction. Some of the enthalpy parameters have large percent standard errors as compared with the parameters of free energy. This reflects the larger errors in the experimental results of enthalpy than free energy, but it also suggests that enthalpy may be more sequence dependent than free energy. This sequence dependence cannot be determined using the currently available database of optical melting experiments and suggests a need for further optical melting experiments on model RNA systems.
Another source of error comes from the assumption that the enthalpy and entropy are independent of the temperature in both the model and in the analysis of optical melting experiments. When the temperature is too far from 37°C, the sensitivity of prediction is expected to be worse than 68.9% on average because of the approximation of
. For example, experiments demonstrate cold denaturation of RNA (68,69), but the nearest neighbor model does not reproduce those results. Further experiments by isothermal titration calorimetry would be needed to provide the data for a model that can include a non-zero heat capacity change.
There are common error sources that should be considered for the prediction of base pairs. Free energy minimization assumes that the secondary structure is at equilibrium. The nearest neighbor model is an incomplete representation of structural free energy. The parameters average some sequence-specific effects and were derived from a limited set of experiments. Some RNA sequences, in particular mRNA, may sample multiple structures at equilibrium. The parameters are derived from experimental data at 1 M NaCl, whereas the salt concentration in different organisms may be very different.
In spite of all these limitations, the nearest neighbor model predicts secondary structures with a 72.8% average sensitivity (17). Recent experimental results on the self-folding of the 16S rRNA 5' domain (75) support the assumption of thermodynamic control of folding pathway. Moreover, the base pair prediction with the partition function can be used to determine pairs predicted with greater confidence (22).
In spite of the fact that the enthalpy parameters have larger percent errors than the free energy parameters for 37°C, the enthalpy parameters are able to predict optical melting temperatures for small model sequences. Predicted melting temperatures for structural RNA sequences correlate well with optimal growth temperature, suggesting that these parameters capture many of the sequence-dependent features of RNA folding enthalpy change.
| SUPPLEMENTARY DATA |
|---|
|
|
|---|
Supplementary Data are available at NAR Online.
| ACKNOWLEDGEMENTS |
|---|
The authors thank Rahul Tyagi and Andrew V. Uzilov for helpful discussions. D.H.M. is an Alfred P. Sloan Research Fellow. This work was supported by National Institutes of Health Grants GM22939 to D.H.T. and GM076485 to D.H.M. Funding to pay the Open Access publication charges for this article was provided by National Institutes of Health.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
|
|
|---|
- Nelson, P., Kiriakidou, M., Sharma, A., Maniataki, E., Mourelatos, Z. (2003) The microRNA world: small is mighty Trends Biochem. Sci, . 28, 534540[CrossRef][ISI][Medline] .
- Doudna, J. and Cech, T. (2002) The chemical repertoire of natural ribozymes Nature, 418, 222228[CrossRef][Medline] .
- Walter, P. and Blobel, G. (1982) Signal recognition particle contains a 7S RNA essential for protein translocation across the endoplasmic reticulum Nature, 299, 691698[CrossRef][Medline] .
- Lau, N.C., Lim, L.P., Weinstein, E.G., Bartel, D.P. (2001) An abundant class of tiny RNAs with probable regulatory roles in Caenorhabditis elegans Science, 294, 858862
[Abstract/Free Full Text] . - Lagos-Quintana, M., Rauhut, R., Lendeckel, W., Tuschl, T. (2001) Identification of novel genes coding for small expressed RNAs Science, 294, 853858
[Abstract/Free Full Text] . - Onoa, B. and Tinoco, I., Jr. (2004) RNA folding and unfolding Curr. Opin. Struct. Biol, . 14, 374379[CrossRef][ISI][Medline] .
- Crothers, D.M., Cole, P.E., Hilbers, C.W., Schulman, R.G. (1974) The molecular mechanism of thermal unfolding of Escherichia coli formylmethionine transfer RNA J. Mol. Biol, . 87, 6388[CrossRef][ISI][Medline] .
- Mathews, D.H., Banerjee, A.R., Luan, D.D., Eickbush, T.H., Turner, D.H. (1997) Secondary structure model of the RNA recognized by the reverse transcriptase from the R2 retrotransposable element RNA, 3, 116[Abstract] .
- Banerjee, A.R., Jaeger, J.A., Turner, D.H. (1993) Thermal unfolding of a group I ribozyme: the low temperature transition is primarily a disruption of tertiary structure Biochemistry, 32, 153163[CrossRef][Medline] .
- Woodson, S.A. (2000) Recent insights on RNA folding mechanisms from catalytic RNA Cell Mol. Life Sci, . 57, 796808[CrossRef][ISI][Medline] .
- James, B.D., Olsen, G.J., Pace, N.R. (1989) Phylogenetic comparative analysis of RNA secondary structure Methods Enzymol, . 180, 227239[ISI][Medline] .
- Pace, N.R., Thomas, B.C., Woese, C.R. (1999) Probing RNA structure, function, and history by comparative analysis In Gesteland, R.F., Cech, T.R., Atkins, J.F. (Eds.). The RNA World, 2nd edn Cold Spring Harbor Cold Spring Harbor Laboratory Press pp. 113141 .
- Woese, C.R., Gutell, R.R., Gupta, R., Noller, H.F. (1983) Detailed analysis of the higher order structure of 16S-like ribosomal ribonucleic acids Microbiol. Rev, . 47, 621669
[Free Full Text] . - Andronescu, M., Aguirre-Hernandez, R., Condon, A., Hoos, H.H. (2003) RNAsoft: A suite of RNA secondary structure prediction and design software tools Nucleic Acids Res, . 31, 34163422
[Abstract/Free Full Text] . - Hofacker, I.L. (2003) Vienna RNA secondary structure server Nucleic Acids Res, . 31, 34293431
[Abstract/Free Full Text] . - Zuker, M. (2003) Mfold web server for nucleic acid folding and hybridization prediction Nucleic Acids Res, . 31, 34063415
[Abstract/Free Full Text] . - Mathews, D.H., Disney, M.D., Childs, J.L., Schroeder, S.J., Zuker, M., Turner, D.H. (2004) Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure Proc. Natl Acad. Sci. USA, 101, 72877292
[Abstract/Free Full Text] . - Mathews, D.H., Sabina, J., Zuker, M., Turner, D.H. (1999) Expanded sequence dependence of thermodynamic parameters provides improved prediction of RNA Secondary Structure J. Mol. Biol, . 288, 911940[CrossRef][ISI][Medline] .
- Turner, D.H. (2000) Conformational changes In Bloomfield, V., Crothers, D., Tinoco, I. (Eds.). Nucleic Acids, Sausalito, CA University Science Books pp. 259334 .
- Xia, T., Mathews, D.H., Turner, D.H. (1999) Thermodynamics of RNA secondary structure formation In Soll, D.G., Nishimura, S., Moore, P.B. (Eds.). Prebiotic Chemistry, Molecular Fossils, Nucleosides, and RNA, NY Elsevier pp. 2147 .
- Xia, T., SantaLucia, J., Jr, Burkard, M.E., Kierzek, R., Schroeder, S.J., Jiao, X., Cox, C., Turner, D.H. (1998) Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with WatsonCrick pairs Biochemistry, 37, 1471914735[CrossRef][Medline] .
- Mathews, D.H. (2004) Using an RNA secondary structure partition function to determine confidence in base pairs predicted by free energy minimization RNA, 10, 11781190
[Abstract/Free Full Text] . - McCaskill, J.S. (1990) The equilibrium partition function and base pair probabilities for RNA secondary structure Biopolymers, 29, 11051119[CrossRef][ISI][Medline] .
- Serra, M.J. and Turner, D.H. (1995) Predicting thermodynamic properties of RNA Methods Enzymol, . 259, 242261[ISI][Medline] .
- Giese, M.R., Betschart, K., Dale, T., Riley, C.K., Rowan, C., Sprouse, K.J., Serra, M.J. (1998) Stability of RNA hairpins closed by wobble base pairs Biochemistry, 37, 10941100[CrossRef][Medline] .
- Serra, M.J., Lyttle, M.H., Axenson, T.J., Schadt, C.A., Turner, D.H. (1993) RNA hairpin loop stability depends on closing pair Nucleic Acids Res, . 21, 38453849
[Abstract/Free Full Text] . - Serra, M.J., Axenson, T.J., Turner, D.H. (1994) A model for the stabilities of RNA hairpins based on a study of the sequence dependence of stability for hairpins of six nucleotides Biochemistry, 33, 1428914296[CrossRef][Medline] .
- Serra, M.J., Barnes, T.W., Betschart, K., Gutierrez, M.J., Sprouse, K.J., Riley, C.K., Stewart, L., Temel, R.E. (1997) Improved parameters for the prediction of RNA hairpin stability Biochemistry, 36, 48444851[CrossRef][Medline] .
- Groebe, D.R. and Uhlenbeck, O.C. (1988) Characterization of RNA hairpin loop stability Nucleic Acids Res, . 16, 1172511735
[Abstract/Free Full Text] . - Dale, T., Smith, R., Serra, M.J. (2000) A test of the model to predict unusually stable RNA hairpin loop stability RNA, 6, 608615[Abstract] .
- Antao, V.P. and Tinoco, I., Jr. (1992) Thermodynamic parameters for loop formation in RNA and DNA hairpin tetraloops Nucleic Acids Res, . 20, 819824
[Abstract/Free Full Text] . - Longfellow, C.E., Kierzek, R., Turner, D.H. (1990) Thermodynamic and spectroscopic study of bulge loops in oligoribonucleotides Biochemistry, 29, 278285[CrossRef][Medline] .
- Znosko, B.M., Silvestri, S.B., Volkman, H., Boswell, B., Serra, M.J. (2002) Thermodynamic parameters for an expanded nearest-neighbor model for the formation of RNA duplexes with single nucleotide bulges Biochemistry, 41, 1040610417[CrossRef][Medline] .
- Shu, Z. and Bevilacqua, P.C. (1999) Isolation and characterization of thermodynamically stable and unstable RNA hairpins from a triloop combinatorial library Biochemistry, 38, 1536915379[CrossRef][Medline] .
- Proctor, D.J., Schaak, J.E., Bevilacqua, J.M., Falzone, C.J., Bevilacqua, P.C. (2002) Isolation and characterization of stable tetraloops with the motif YNMG that participates in tertiary interactions Biochemistry, 41, 1206212075[CrossRef][Medline] .
- Laing, L.G. and Hall, K.B. (1996) A model of the iron responsive element RNA hairpin loop structure determined from NMR and thermodynamic data Biochemistry, 35, 1358613596[CrossRef][Medline] .
- Groebe, D.R. and Uhlenbeck, O.C. (1989) Thermal stability of RNA hairpins containing a four-membered loop and a bulge nucleotide Biochemistry, 28, 742747[CrossRef][Medline] .
- Fink, T.R. and Crothers, D.M. (1972) Free energy of imperfect nucleic acid helices,I. The bulge defect J. Mol. Biol, . 66, 112[CrossRef][ISI][Medline] .
- Jaeger, J.A., Turner, D.H., Zuker, M. (1989) Improved predictions of secondary structures for RNA Proc. Natl Acad. Sci. USA, 86, 77067710
[Abstract/Free Full Text] . - Weeks, K.M. and Crothers, D.M. (1993) Major groove accessibility of RNA Science, 261, 15741577
[Abstract/Free Full Text] . - Chen, G., Znosko, B.M., Jiao, X., Turner, D.H. (2004) Factors affecting thermodynamic stabilities of RNA 3 x 3 internal loops Biochemistry, 43, 1286512876[CrossRef][Medline] .
- Gutell, R.R. (1994) Collection of small subunit (16S- and 16S-like) ribosomal RNA structures Nucleic Acids Res, . 22, 35023507
[Abstract/Free Full Text] . - Gutell, R.R., Gray, M.W., Schnare, M.N. (1993) A compilation of large subunit (23S- and 23S-like) ribosomal RNA structures Nucleic Acids Res, . 21, 30553074
[Free Full Text] . - Schnare, M.N., Damberger, S.H., Gray, M.W., Gutell, R.R. (1996) Comprehensive comparison of structural characteristics in eukaryotic cytoplasmic large subunit (23S-like) ribosomal RNA J. Mol. Biol, . 256, 701719[CrossRef][ISI][Medline] .
- Szymanski, M., Specht, T., Barciszewska, M.Z., Barciszewski, J., Erdmann, V.A. (1998) 5S rRNA data bank Nucleic Acids Res, . 26, 156159
[Abstract/Free Full Text] . - Sprinzl, M., Horn, C., Brown, M., Ioudovitch, A., Steinberg, S. (1998) Compilation of tRNA sequences and sequences of tRNA genes Nu


















(solid line) was calculated from
(



