Nucleic Acids Research Advance Access originally published online on June 16, 2009
Nucleic Acids Research 2009 37(Web Server issue):W281-W286; doi:10.1093/nar/gkp477
Nucleic Acids Research, 2009, Vol. 37, No. suppl_2 W281-W286
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
RNAmutants: a web server to explore the mutational landscape of RNA secondary structures
Jerome Waldispühl1,2,
Srinivas Devadas2,3,
Bonnie Berger1,2,* and
Peter Clote4,*
1Department of Mathematics, 2Computer Science and Artificial Intelligence Laboratory, MIT, Cambridge, MA 02139, 3Electrical Engineering and Computer Science, MIT, Cambridge and 4Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
*To whom correspondence should be addressed. Tel: +1 617 552 1332; Fax: +1 617 552 2011; Email: clote{at}bc.edu Correspondence may also be addressed to Bonnie Berger. Tel: +1 617 253 1827; Fax: +1 617 258 5429; Email: bab{at}mit.edu
Received February 26, 2009. Revised May 6, 2009. Accepted May 17, 2009.
 |
ABSTRACT
|
|---|
The history and mechanism of molecular evolution in DNA have
been greatly elucidated by contributions from genetics, probability
theory and bioinformatics—indeed, mathematical developments
such as Kimura's neutral theory, Kingman's coalescent theory
and efficient software such as
BLAST,
ClustalW,
Phylip, etc.,
provide the foundation for modern population genetics. In contrast
to DNA, the function of most noncoding RNA depends on tertiary
structure, experimentally known to be largely determined by
secondary structure, for which dynamic programming can efficiently
compute the minimum free energy secondary structure. For this
reason, understanding the effect of pointwise mutations in RNA
secondary structure could reveal fundamental properties of structural
RNA molecules and improve our understanding of molecular evolution
of RNA. The web server
RNAmutants provides several efficient
tools to compute the ensemble of low-energy secondary structures
for all
k-mutants of a given RNA sequence, where
k is bounded
by a user-specified upper bound. As we have previously shown,
these tools can be used to predict putative deleterious mutations
and to analyze regulatory sequences from the hepatitis C and
human immunodeficiency genomes. Web server is available at
http://bioinformatics.bc.edu/clotelab/RNAmutants/,
and downloadable binaries at
http://rnamutants.csail.mit.edu/.
 |
INTRODUCTION
|
|---|
Understanding the molecular evolution of DNA has proven essential
to modern biology. One of the main fields that has contributed
to our understanding of molecular evolution is population genetics,
in its modern form founded by Fisher (
1) and Wright (
2) in the
early part of the last century, when they posed and partially
solved the question of expected time (number of generations)
for gene allele fixation or extinction, known subsequently as
the (discrete) Fisher–Wright problem. This difficult problem
of probability theory was solved using various techniques, including
the Fokker–Planck single-variable diffusion equation (
1–4),
the coalescent (
5,
6), and a direct analysis of Markov chains
(
7). The Fisher–Wright model forms the foundation of Kimura's
widely accepted
neutral theory of molecular evolution, now a
cornerstone of modern genetics (
8).
A mutation in a protein coding gene may be deleterious depending on whether it causes a change of the coded amino acid. A measure of selective pressure on protein coding genes is the term Ka/Ks (also known as dN/dS), which is the ratio of the rate of nonsynonymous substitutions (Ka) to synonymous substitutions in a protein coding region (CDS). In contrast, a mutation in a nonprotein coding RNA gene may be deleterious if the underlying functional structure is changed. At present, there is no widely adopted measure of selective pressure in noncoding RNA genes; however, as explained in Waldispühl et al. (9), RNAmutants can be used to quantify the deleterious nature of pointwise mutations in noncoding RNA genes. The rationale for the consideration of mutational effects on RNA secondary structure is explained in the next paragraph.
The function of structural noncoding RNA [ribozymes (10), riboswitches (11), precursor microRNA (12), selenocysteine insertion sequence (SECIS) elements (13), transfer RNA, etc.] depends on tertiary structure, which Banerjee et al. (14) have shown experimentally to largely depend on secondary structure. Secondary structure can be predicted using dynamic programming energy minimization (15); indeed, Mathews et al. (16) have shown that the minimum free energy (MFE) structure, as determined in mfold (17) or RNAfold (18), includes 73% of the base pairs in the native as inferred from the X-ray structure or by comparative sequence analysis secondary structure, on average, when tested on RNA sequences of length 700 nt.
Computational tools like mfold of Zuker (17), Vienna RNA Package of Hofacker et al. (18), RNAStructure of Mathews and Turner (19), Sfold of Ding et al. (20,21), RNAbor of Freyhult et al. (22,23) and RNAsat of Waldispühl and Clote (24) probe the landscape of secondary structures of a given RNA sequence. RNA sequence/structure alignment tools like Dynalign by Mathews and Turner (25), FOLDALIGN by Havgaard et al. (26), MSARI of Coventry et al. (27), RNAz of Washietl et al. (28), etc., can be considered to be the RNA analog of BLAST and ClustalW, whereby conservation of secondary structure base pairing is taken into account.
Understanding the effect of pointwise mutations on RNA secondary structure reveals fundamental properties of structurally important RNA and may suggest potentially deleterious mutations in RNA viral pathogens. Designed explicitly for this purpose, the algorithm RNAmutants (9) allows users to analyze the low energy ensemble of mutant RNA sequences and structures. Given an RNA sequence s of length n, an upper bound K for the number of mutations allowed, a desired number N of secondary structures samples to be generated, and a temperature 0
T
100 in degrees Celsius, RNAmutants computes the following for all k
K simultaneously: (i) the MFE structure MFE
, its free energy and the Boltzmann partition function Z
, over all secondary structures of all k-point mutants; (ii) a plot of the ensemble free energy –RT ln Z
, as a function of k; and (iii) a collection of N RNA mutant sequences and their secondary structures, as sampled using the partition function. By comparing low-energy structures from mutant RNA with the consensus structures from the Rfam database (29), one can infer putative deleterious mutations, as performed in (9).
 |
DEFINITIONS AND METHODS
|
|---|
Definitions
Given RNA sequence
s =
s1, ...,
sn, for all 0
k
n, let Z


denote
the Boltzmann partition function at absolute temperature
T for
the collection of all secondary structures on all
k-point mutants;
i.e.
| (1) |
where the first
sum is taken over all
k-point mutants
s' =
s'
1, ...,
s'
n of
s =
s1, ...,
sn, and the second sum is taken over all secondary
structures

of the (fixed)
k-point mutant. Similarly, let
mfe

denote
the
k-point mutant
s' =
s'
1, ...,
s'
n of
s whose secondary structure
has least free energy over all
k-point mutants of
s, and let
MFE

denote its secondary structure. In the sequel,
mfe

is called the
k-superoptimal
mutant and MFE


is called the
k-superoptimal secondary structure.
Finally, we let
Zk,
mfek, MFE
k denote the corresponding values
at default temperature
T = 37°C.
Partition function and superoptimal structures
In (30), we introduced a novel algorithm to compute the partition function Z
for all k-point mutants of a given RNA sequence at absolute temperature T, with respect to the Nussinov energy model (31). In contrast to the Nussinov energy model, where each base pair contributes energy term of –1, the widely accepted Turner energy model (32) includes negative, stabilizing free energy terms for stacked base pairs as well as positive, destabilizing free energy terms for hairpins, bulges, internal loops and multiloops. With the exception of multiloops, for which an affine approximation is applied, these free energy parameters were obtained from UV absorption (optical melting) experiments first pioneered by Tinoco's Lab (33) and systematically carried out by Turner's Lab (32,34). For instance, at 37°C, Turner's rules assign stacking free energy of –2.24 kcal/mol to
|
Waldispühl et al. (35) developed a general algorithm AMSAG, applicable both to RNA and transmembrane protein structure prediction. Subsequently, Clote et al. (30) designed an algorithm to compute the partition function Z
with respect to the Nussinov energy model (31), and applied AMSAG to determine the k-superoptimal secondary structures with respect to an energy model intermediate between the Nussinov and Turner models. Recently, Waldispühl et al. (9) created a unified framework for simultaneously computing k-superoptimal secondary structures MFE
as well as the partition functions Z
with respect to the full Turner energy model. The resulting program, RNAmutants, was then applied to the analysis of regulatory portions of the hepatitis C and human immunodeficiency viral genomes. Of particular interest is the determination of putative deleterious mutations, many of which were validated in prior experimental work.
Using dynamic programming, RNAmutants computes mfe
, MFE
and Z
for all values of 0
k
K in worst-case time O(n3K2) and space O(n2K). From statistical mechanics, it is known that the expected internal energy
Ek
of all k-point mutants and their secondary structures is equal to RT2 times the partial derivative of ln Z
, and hence can be approximated using the difference Z
-Z
(30). Ensemble free energy -RT ln Z
can be computed as well and plotted as a function of k. Similarly, other thermodynamic parameters (heat capacity, etc.) can be obtained from the partition function.
 |
WEB SERVER
|
|---|
Input
The web server (
http://bioinformatics.bc.edu/clotelab/RNAmutants)
runs on a Linux cluster with head and file server nodes, and
25 compute nodes, including 6 Dell Power Edge 1750, 2x Intel
Xeon P4 (2.80 GHz), 2 GB RAM, 11 Dell Power Edge 1750, 2x Intel
Xeon P4 (2.80 GHz), 4 GB RAM, and 8 Dell Power Edge 1950, 2x
Intel Xeon E5430 Quad core (2.80 GHz), 16 GB RAM.
The input form for RNAmutants is shown in Figure 1. The user must submit an RNA sequence, either by pasting in the space provided, or by uploading a file. As well, the user must enter a valid email address (This email may be bogus; however, for long jobs that cannot be done interactively, the results will be sent to the email address provided), an upper bound for the number of pointwise mutations, the desired number of sampled structures, and optionally the temperature in degrees Celsius. Input for each job is saved under a unique anonymized job ID, sent to the user's email address, thus allowing the user to retrieve information from the old runs. As long as the user's browser is open, updates to the results page will be made; however, for long runs, the user will receive an email with job ID and link to the completed results page.
Output
If
K denotes the user-specified upper bound for the number of
mutations, then
RNAmutants computes for each
k
K the
k-superoptimal
sequence
mfek, secondary structure MFE
k and free energy
Ek,
where we recall that the superoptimal secondary structure MFE
k is that which has lowest free energy over all secondary structures
of all
k-point mutants of the input RNA sequence. Additionally,
RNAmutants computes the Boltzmann partition function
Zk =

e–E(
)/RT for each
k
K, and using this computes a sample of structures
from the low energy ensemble, following a technique similar
to (but distinct from) that of Ding and Lawrence (
20).
RNAmutants,
output of
mfek, MFE
k and
Ek is depicted in
Figure 2, while sampled
sequence/structure pairs are given in
Figure 3.

View larger version (16K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 2. Initial portion of one output file from RNAmutants for 51-nt portion of the 3'-untranslated region from murine β-galactoside binding protein mRNA, with NCBI accession code MUSGBPA. Web server displays all 51 superoptimal secondary structures, their free energy and mutation locations. Mutated nucleotides are shown in lower case. Each line contains the partition functin value Z(K), the sampled mutated sequence, its minimum free energy structure and the free energy of that structure.
|
|

View larger version (54K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 3. Initial portion of output file of 100 mutant sequence/structure pairs from RNAmutants for the 51-nt portion of the 3'-untranslated region from murine β-galactoside binding protein mRNA, with NCBI accession code MUSGBPA. Mutated nucleotides are shown in lower case.
|
|
By writing scripts to postprocess the output, a number of interesting
results can be obtained, as exemplified in
Figures 4–
6.
Figure 4 was generated using
RNAplot and
RNAfold from the
Vienna RNA Package (
18), using the 51 nt portion of the 3'-untranslated
region from murine β-galactoside binding protein mRNA,
with NCBI accession code MUSGBPA (
29). This figure shows the
Rfam consensus structure (
29), the MFE structure and the 20-superoptimal
structure. The upper triangular portion of
Figure 5A shows the
base pairing frequencies over all sampled structures for the
88-nt hepatitis delta virus ribozyme with EMBL accession code
X85253.1
[GenBank]
/682-769, while the lower triangular portion shows the
base pairs in the MFE structure. (We follow the dot plot conventions
of
Vienna RNA Package.)
Figure 5B shows superoptimal and ensemble
free energy (
y-axis), plotted as a function of number of pointwise
mutations (
x-axis).
Figure 6 displays the mutational profile
of the 48-nt HAR1 region, an important region of the novel RNA
gene HAR1F (
36), expressed in Cajal–Retzius neurons in
the developing human neocortex, a gene believed to show significant
evolutionary acceleration. The RNAmutants Web server provides
a tool to display the mutational profile determined for nc RNA
genes.

View larger version (93K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 6. Mutability profile of 48-nt HAR1 region, where the number of mutations ranges from 1 to 40. HAR1 is part of the novel RNA gene HAR1F (39), expressed in Cajal–Retzius neurons in the developing human neocortex, a gene believed to show significant evolutionary acceleration. As in traffic lights, red regions are not mutated, while green regions are mutated from wild-type nucleotide. The x-axis represents nucleotide position, as suggested by the logo plot below; the y-axis represents position-specific mutability. Mutability value of 0 corresponds to finding no mutations at that position among all samples, depicted by RGB color triple (255, 0, 0), while mutability value of 1 corresponds to finding every sample mutated at that position, depicted by RGB color triple (0, 255, 0), while fractional ratios of mutant positions are depicted by the triple ( , β, 0), where + β = 255. Python scripts that produced this PPM figure can be downloaded at the web server.
|
|
 |
CONCLUSION
|
|---|
RNAmutants is a novel application which computes, for each
k
K; (i) the MFE structure MFE


, free energy E


and the Boltzmann
partition function Z


, over all secondary structures of
all
k-point mutants; (ii) a plot of the ensemble free energy
-RT
ln Z


, as a function of
k; and (iii) a collection of RNA
mutant sequences and their secondary structures, as sampled
using the partition function. Since
RNAmutants runs in worst-case
O(
n3K2) time, where
n is the length of input RNA sequence, and
K is an upper bound for the number of mutations, the web server
cannot provide computational resources for large values of
n and
K. In such cases, the user should download executable code,
which can be retrieved from the web server.
RNAmutants allows
the user to estimate the impact of mutations on the structure
of functional RNA, and better understand the evolutionary process
of RNA molecules.
 |
SUPPLEMENTARY DATA
|
|---|
Supplementary Data are available at NAR Online.
 |
FUNDING
|
|---|
Deutscher Akademischer Austauschdienst for a visit to Max Planck
Institute of Molecular Genetics (to P.C.); foundation Digiteo
- Triangle de la Physique (to P.C.); National Science Foundation
(DBI-0543506 and DMS-0817971 to P.C.). Funding for open access
charge: National Science Foundation (DBI-0543506).
Conflict of interest statement. None declared.
 |
ACKNOWLEDGEMENTS
|
|---|
RNAmutants software was written by J. Waldispühl, who designed
the web site
http://rnamutants.csail.mit.edu/. The web server
and downloadable scripts at
bioinformatics.bc.edu/clotelab/RNAmutants were developed by P. Clote, using a general framework created
by Jason Piersampieri and Yann Ponty. Any opinions, findings,
and conclusions or recommendations expressed in this material
are those of the authors and do not necessarily reflect the
views of the National Science Foundation.
 |
Footnotes
|
|---|
The authors wish it to be known that, in their opinion, the
first and the last, author should be regarded as joint First
Authors.
 |
REFERENCES
|
|---|
- Fisher R. The Genetical Theory of Natural Selection. (1930) Oxford: Clarendon Press.
- Wright S. The differential equation of the distribution of gene frequencies. Proc. Natl Acad. Sci. USA (1945) 31:382–389.[Free Full Text]
- Watterson G. Some theoretical aspects of diffusion theory in population genetics. Ann. Math. Stat. (1962) 33:93–957. Correction: 34, 352.[CrossRef]
- Ewens W. The mean time for absorption in a process of genetic type. J. Austral. Math. Soc. (1963) 3:375–383.[CrossRef]
- Kingman JFC. The coalescent. Stoch. Proc. Appl. (1982) 13:235–248.[CrossRef]
- Hein J, Schierup M, Wiuf C. Gene Genealogies, Variation and Evolution: A Primer in Coalescent Theory. (2004) Oxford University Press.
- Buss S, Clote P. Solving the Fisher-Wright and coalescence problems with a discrete Markov chain analysis. Adv. Appl. Probab. (2005) 36:1175–1197.[CrossRef]
- Kimura M. Diffusion models in population genetics. J. Appl. Prob. (1964) 1:177–232.[CrossRef]
- Waldispuhl J, Devadas S, Berger B, Clote P. Efficient algorithms for probing the RNA mutation landscape. PLoS. Comput. Biol. (2008) 4:e1000124.[CrossRef][Medline]
- Doudna J, Cech T. The chemical repertoire of natural ribozymes. Nature (2002) 418:222–228.[CrossRef][Medline]
- Barrick J, Corbino K, Winkler W, Nahvi A, Mandal M, Collins J, Lee M, Roth A, Sudarsan N, Jona I, et al. New RNA motifs suggest an expanded scope for riboswitches in bacterial genetic control. Proc. Natl Acad. Sci. USA (2004) 101:6421–6426.[Abstract/Free Full Text]
- Ng KL, Mishra SK. De novo SVM classification of precursor microRNAs from genomic pseudo hairpins using global and intrinsic folding measures. Bioinformatics (2007) 23:1321–1330.[Abstract/Free Full Text]
- Commans S, Böck A. Selenocysteine inserting tRNAs: an overview. FEMS Microbiol. Rev. (1999) 23:333–351.
- Banerjee A, Jaeger J, Turner D. Thermal unfolding of a group I ribozyme: the low-temperature transition is primarily disruption of tertiary structure. Biochemistry (1993) 32:153–163.[CrossRef][Web of Science][Medline]
- Zuker M, Stiegler P. Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res. (1981) 9:133–148.[Abstract/Free Full Text]
- Mathews D, Sabina J, Zuker M, Turner H. Expanded sequence dependence of thermodynamic parameters provides robust prediction of RNA secondary structure. J. Mol. Biol. (1999) 288:911–940.[CrossRef][Web of Science][Medline]
- Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. (2003) 31:3406–3415.[Abstract/Free Full Text]
- Hofacker I. Vienna RNA secondary structure server. Nucleic Acids Res. (2003) 31:3429–3431.[Abstract/Free Full Text]
- Mathews D, Disney M, Childs J, Schroeder S, Zuker M, Turner D. Incorporating chemical modification constraints into a dynamic programming algorithm for prediction of RNA secondary structure. Proc. Natl Acad. Sci. USA (2004) 101:7287–7292.[Abstract/Free Full Text]
- Ding Y, Lawrence CE. A statistical sampling algorithm for RNA secondary structure prediction. Nucleic Acids Res. (2003) 31:7280–7301.[Abstract/Free Full Text]
- Ding Y, Chan C, Lawrence C. Sfold web server for statistical folding and rational design of nucleic acids. Nucleic Acids Res. (2004) 32:W135–W141.[Abstract/Free Full Text]
- Freyhult E, Moulton V, Clote P. Boltzmann probability of RNA structural neighbors and riboswitch detection. Bioinformatics (2007) 23:2054–2062.[Abstract/Free Full Text]
- Freyhult E, Moulton V, Clote P. RNAbor: a web server for RNA structural neighbors. Nucleic Acids Res. (2007) 35:W305–W309.[Abstract/Free Full Text]
- Waldispuhl J, Clote P. Computing the partition function and sampling for saturated secondary structures of RNA, with respect to the Turner energy model. J. Comput. Biol. (2007) 14:190–215.[CrossRef][Web of Science][Medline]
- Mathews D, Turner D. Dynalign: an algorithm for finding the secondary structure common to two RNA sequences. J. Mol. Biol. (2002) 317:191–203.[CrossRef][Web of Science][Medline]
- Havgaard JH, Lyngso RB, Gorodkin J. The FOLDALIGN web server for pairwise structural RNA alignment and mutual motif search. Nucleic Acids Res. (2005) 33:W650–W653.[Abstract/Free Full Text]
- Coventry A, Kleitman D, Berger B. MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. Proc. Natl Acad. Sci. USA (2004) 101:12102–12107.[Abstract/Free Full Text]
- Washietl S, Hofacker I, Stadler P. Fast and reliable prediction of noncoding RNAs. Proc. Natl Acad. Sci. USA (2005) 102:2454–2459.[Abstract/Free Full Text]
- Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy S. Rfam: an RNA family database. Nucleic Acids Res. (2003) 31:439–441.[Abstract/Free Full Text]
- Clote P, Waldispühl J, Behzadi B, Steyaert J.-M. Exploring the energy landscape of k-point mutagens of RNA. Bioinformatics (2005) 21:4140–4147.[Abstract/Free Full Text]
- Nussinov R, Jacobson AB. Fast algorithm for predicting the secondary structure of single stranded RNA. Proc. Natl Acad. Sci. USA (1980) 77:6309–6313.[Abstract/Free Full Text]
- Xia T, SantaLucia J, Burkard M, Kierzek R, Schroeder S, Jiao X, Cox C, Turner D. Thermodynamic parameters for an expanded nearest-neighbor model for formation of RNA duplexes with Watson-Crick base pairs. Biochemistry (1999) 37:14719–14735.[CrossRef][Web of Science]
- Tinoco I Jr., Schmitz M. Thermodynamics of formation of secondary structure in nucleic acids. In: Thermodynamics in Biology.—Di Cera E, ed. (2000) Oxford University Press. 131–176.
- Matthews D, Sabina J, Zuker M, Turner D. Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. (1999) 288:911–940.[CrossRef][Web of Science][Medline]
- Waldispuhl J, Behzadi B, Steyaert JM. An approximate matching algorithm for finding (sub-)optimal sequences in S-attributed grammars. Bioinformatics (2002) 18:S250–S259.[Abstract]
- Pollard KS, Salama SR, Lambert N, Lambot MA, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, et al. An RNA gene expressed during cortical development evolved rapidly in humans. Nature (2006) 443:167–172.[CrossRef][Medline]

CiteULike
Connotea
Del.icio.us What's this?