

Nucleic Acids Research Advance Access originally published online on April 22, 2007
Nucleic Acids Research 2007 35(Web Server issue):W335-W338; doi:10.1093/nar/gkm222

Nucleic Acids Research, 2007, Vol. 35, No. suppl_2 W335-W338
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Articles

The RNAz web server: prediction of thermodynamically stable and evolutionarily conserved RNA structures

Andreas R. Gruber, Richard Neuböck, Ivo L. Hofacker and Stefan Washietl*

Institute for Theoretical Chemistry, University of Vienna, Währingerstrasse 17, 1090 Vienna, Austria

*To whom correspondence should be addressed. Tel: +43-1-4277-52744; Fax: +43-1-4277-52793; Email: wash{at}tbi.univie.ac.at

Received January 29, 2007. Revised March 20, 2007. Accepted March 28, 2007.


    ABSTRACT
 
Many non-coding RNA genes and cis-acting regulatory elements of mRNAs contain RNA secondary structures that are critical for their function. Such functional RNAs can be predicted on the basis of thermodynamic stability and evolutionary conservation. We present a web server that uses the RNAz algorithm to detect functional RNA structures in multiple alignments of nucleotide sequences. The server provides access to a complete and fully automatic analysis pipeline that allows users not only to analyze single alignments in a variety of formats, but also to conduct complex screens of large genomic regions. Results are presented on a web page illustrated with various structure representations and can be downloaded for local viewing. The web server is available at: rna.tbi.univie.ac.at/RNAz.


    INTRODUCTION
 
Functional RNA structures play an important role in a variety of cellular processes. They can be found in independent non-coding RNAs (ncRNAs or ‘RNA genes’) as well as in untranslated regions of mRNAs (1,2). Defined RNA secondary structures in ncRNAs may be required for specific interactions with proteins as part of a ribonucleoprotein complex (e.g. the signal recognition particle), or for protein interaction during the maturation process of the RNA (e.g. microRNAs interacting with DICER). Direct catalytic activity of RNAs is also possible (e.g. type I and II self-splicing introns). RNA structures in untranslated regions of mRNAs often serve regulatory roles and act as interaction partners for proteins (e.g. the iron responsive element interacting with IRP1) or small ligands (e.g. riboswitches in bacteria).

Such functional RNAs have moved into the focus of interest in recent years, and computational techniques for their prediction have become increasingly important (3). Methods for prediction of secondary structure models from primary sequence have a long tradition and are readily available (4–6). However, any nucleotide sequence can be folded into a secondary structure using these programs. The challenge is to discriminate real functional RNA structures from random structures. There is general consensus in the community that the structure information contained in a single sequence is not sufficient to yield reasonable prediction accuracies (7,8). The observation that many functional RNA structures are conserved in evolution, and the massive availability of comparative sequence data, have led the efforts to predict functional RNAs in a clear direction: a series of programs has been developed recently that try to detect ‘structurally conserved’ RNAs (9–15).

Although the problem remains challenging, especially when scanning large genomes (3,13,16,17), such programs represent an important addition to today's arsenal of sequence analysis methods. In this article, we describe a web server that scans multiple sequence alignments for functional RNAs using the RNAz algorithm (12).


    THE RNAZ ALGORITHM
 
RNAz predicts functional RNA structures based on two criteria: (i) structural conservation, (ii) thermodynamic stability. It first predicts a consensus secondary structure using the RNAalifold algorithm (18). This is essentially an extension of standard minimum free energy folding algorithms with the constraint that all sequences have to fold into a common structure. Compensatory/consistent mutations, i.e. mutations that preserve a secondary structure, are incorporated as ‘bonus’ into the energy model, while inconsistent mutations are penalized. RNAz measures structural conservation by calculating the ratio of this consensus folding energy to the unconstrained folding energies of the single sequences.
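The ratio described above can be illustrated with a minimal sketch. It assumes that the single-sequence energies are summarized by their mean, as in reference (12); the function name and the energy values are hypothetical and do not come from the RNAz package.

```python
from statistics import mean


def structure_conservation_index(consensus_energy, single_energies):
    """Ratio of the consensus folding energy to the mean single-sequence energy.

    Energies are in kcal/mol and are normally negative. Values near 1 suggest
    that the sequences fold about as well together as they do on their own;
    values near 0 suggest that no common structure exists.
    """
    if not single_energies:
        raise ValueError("need at least one single-sequence energy")
    mean_single = mean(single_energies)
    if mean_single == 0:
        # Assumption: if no sequence folds at all, report no conservation.
        return 0.0
    return consensus_energy / mean_single


# Made-up energies for a four-sequence alignment (kcal/mol).
sci = structure_conservation_index(-30.0, [-32.0, -28.0, -30.0, -30.0])
print(f"SCI = {sci:.2f}")
```

Because compensatory mutations enter the consensus energy as a bonus, the ratio can exceed 1 for alignments with strong covariation.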

In addition, RNAz calculates a stability score for the sequences in the alignment because functional RNAs are known to be thermodynamically more stable than random sequences (8,19). Stability is measured as a normalized z-score of folding energies. It indicates how many standard deviations a given sequence is more or less stable than expected for random sequences of the same length and base composition.
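As a rough illustration of the z-score idea, the sketch below compares one folding energy with a set of background energies supplied by the caller, for example energies of shuffled sequences of the same length and base composition. How RNAz itself obtains the expected mean and standard deviation is described in reference (12) and is not reproduced here; the function name and all numbers are hypothetical.

```python
from statistics import mean, stdev


def folding_z_score(native_energy, background_energies):
    """Number of standard deviations by which native_energy differs from
    the mean of background_energies (all in kcal/mol)."""
    if len(background_energies) < 2:
        raise ValueError("need at least two background energies")
    mu = mean(background_energies)
    sigma = stdev(background_energies)
    if sigma == 0:
        raise ValueError("background energies have zero variance")
    return (native_energy - mu) / sigma


# Made-up values: the native sequence folds more stably than the background.
z = folding_z_score(-40.0, [-20.0, -25.0, -30.0, -25.0, -20.0, -30.0])
print(f"z = {z:.2f}")
```

Since lower free energy means higher stability, a negative z-score indicates a sequence that is more stable than expected by chance.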

Finally, an alignment is classified as ‘functional RNA’ or ‘other’ based on these two characteristic measures. A support vector machine learning technique which calculates an optimal combination of both scores is used for this purpose. Details of the algorithm can be found in reference (12).


    THE RNAZ WEB SERVER
 
The design of the web server was guided by three main goals: (i) minimizing the burden of manual pre-processing and formatting of the input data, (ii) providing a reasonable analysis pipeline that, on the one hand, works ‘out of the box’ but, on the other hand, can also be customized by the user and (iii) providing reasonable output for humans (e.g. graphical visualization, overview tables) and computers (e.g. annotation files, raw RNAz text output).

The web server operates in two different modes. In ‘Standard Analysis’ mode, usually one single alignment is analyzed. It is also possible to analyze more than one alignment in one session in this mode, but the alignments are treated as independent of each other. In ‘Genomic screen’ mode, a large number of alignments covering a genomic region can be screened and the results from all alignments are integrated at the end.

In the following, we describe general features of the server that apply to both modes of operation. Special requirements and features of the ‘Genomic screen’ mode are described in Section ‘Conducting genomic screens’. Detailed instructions on how to use the server are available as online help.

Uploading sequence alignments
Multiple sequence alignments can be provided by cut-and-paste or uploaded as a file (Figure 1A). The server can currently read the following alignment formats: CLUSTALW, FASTA, PHYLIP, NEXUS, MAF and XMFA. Alignments can be generated by any sequence-based alignment program (see (20) and (21) for a comparison of different programs on structural RNAs). However, one should not use ‘structurally enhanced’ alignments generated by programs that consider RNA structures. Although this appears counterintuitive, one has to keep in mind that RNAz was trained on pure sequence alignments, and structural alignments could result in artifactually high scores even for alignments without conserved RNA structure.


Figure 1. (A) Screenshots of the RNAz web server. The interface to upload alignments in Standard Analysis mode is shown on the left. The various slicing and filtering options together with context-sensitive online help are shown on the right. (B) Sample output for an alignment of ~400 columns that contains an H/ACA snoRNA in the middle and that was scored in overlapping windows of 120 columns with step size 40. The overview panel shows windows that were predicted to contain a significant RNA structure with an RNAz classification probability higher than 90%. Below, detailed results are shown for the window from positions 80 to 200, which contains the first of the two stem-loops that are typical for H/ACA snoRNAs. The output consists of a table summarizing alignment characteristics and RNAz results, graphical representations of the consensus structure (structure-annotated alignment, secondary structure drawing, base-pairing probability ‘dot-plot’) and secondary structure models in dot/bracket notation for each single sequence in comparison to the consensus structure.

 
File uploads are currently limited to 20 MB. This is enough, for example, to screen roughly 2 megabases of 6-way alignments in MAF format.

Local scanning and pre-processing of alignments
The RNAz algorithm works ‘globally’, i.e. the given alignment is scored as a whole. For long alignments (e.g. an alignment of a whole chromosome), this is neither computationally tractable nor biologically meaningful. Therefore, long alignments are scanned in overlapping windows. The window and step size can be set by the user. By default, a window size of 120 and a step size of 40 are used. This window size appears large enough to detect local secondary structures within long ncRNAs and, on the other hand, small enough to find short secondary structures without losing the signal in a much too long window.
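The window scan can be sketched as follows, using the default window and step size from the text. The function name is hypothetical, and the handling of the last columns of an alignment (one extra window placed flush with the end) is an assumption of this sketch, not a documented behavior of the server.

```python
def sliding_windows(length, window=120, step=40):
    """Return half-open (start, end) column ranges covering an alignment."""
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    if length <= window:
        # Short alignments are scored as a whole.
        return [(0, length)]
    starts = list(range(0, length - window + 1, step))
    if starts[-1] + window < length:
        # Assumption: add a final window flush with the alignment end
        # so that no trailing columns are left unscored.
        starts.append(length - window)
    return [(start, start + window) for start in starts]


# A 400-column alignment, as in the example of Figure 1B.
windows = sliding_windows(400)
print(len(windows), windows[2])
```

With these defaults every column away from the alignment ends is covered by three windows, which gives a structure several chances to fall entirely inside one of them.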

In addition to this step, alignments are filtered in various ways before they are analyzed with RNAz. In particular, automatically generated genomic alignments are full of gap-rich regions, dubiously aligned fragments or low-complexity regions. Such alignments are unlikely to contain true conserved structures and, in some cases, can cause artifactual predictions. Sequences that contain, for example, too many gaps or too many repeat-masked letters are therefore filtered out. Details of the filtering process can be set by the user (Figure 1A).
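A filter of this kind might look like the sketch below. It assumes that repeat-masked letters are written in lower case (the common soft-masking convention) and uses threshold values chosen only for illustration; neither the function names nor the thresholds are taken from the server.

```python
def gap_fraction(seq):
    """Fraction of alignment columns in which this sequence has a gap."""
    return seq.count("-") / len(seq) if seq else 1.0


def masked_fraction(seq):
    """Fraction of non-gap letters that are lower case (soft-masked)."""
    letters = [c for c in seq if c != "-"]
    if not letters:
        return 0.0
    return sum(c.islower() for c in letters) / len(letters)


def filter_sequences(alignment, max_gaps=0.25, max_masked=0.10):
    """Keep only sequences below both thresholds (hypothetical defaults)."""
    return {
        name: seq
        for name, seq in alignment.items()
        if gap_fraction(seq) <= max_gaps and masked_fraction(seq) <= max_masked
    }


# Toy window: 'mouse' is gap-rich and 'fugu' is largely repeat-masked.
alignment = {
    "human": "ACGTACGTAC",
    "mouse": "ACG----TAC",
    "fugu": "acgtaCGTAC",
}
kept = filter_sequences(alignment)
print(sorted(kept))
```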

The RNAz program in its current implementation can only analyze alignments with up to six sequences. Six sequences usually hold enough information to allow reasonable predictions. If there are more sequences in the given alignment, the server selects an optimal subset of sequences. A greedy algorithm is used that gradually selects sequences to optimize for a given target diversity in the alignment. By default, a subset of six sequences is chosen which is optimized for a mean pairwise sequence identity of 80%.
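One possible greedy strategy is sketched below; it is not necessarily the procedure used by the server. The sketch always keeps the first sequence as a starting point (an assumption) and then repeatedly adds the sequence that brings the mean pairwise identity of the selected set closest to the target. All function names are hypothetical.

```python
from itertools import combinations


def pairwise_identity(a, b):
    """Fraction of identical columns, ignoring columns that are gaps in both."""
    pairs = [(x, y) for x, y in zip(a, b) if not (x == "-" and y == "-")]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def mean_pairwise_identity(seqs):
    """Average identity over all pairs of sequences (needs two or more)."""
    pairs = list(combinations(seqs, 2))
    return sum(pairwise_identity(a, b) for a, b in pairs) / len(pairs)


def select_subset(seqs, n=6, target=0.8):
    """Greedily pick up to n sequence indices approaching the target identity."""
    selected = [0]  # assumption: the first sequence is always kept
    remaining = list(range(1, len(seqs)))
    while len(selected) < n and remaining:
        best = min(
            remaining,
            key=lambda i: abs(
                mean_pairwise_identity([seqs[j] for j in selected + [i]]) - target
            ),
        )
        selected.append(best)
        remaining.remove(best)
    return sorted(selected)


# Toy alignment: identities to the first sequence are 1.0, 0.8 and 0.0.
seqs = ["AAAAAAAAAA", "AAAAAAAAAA", "AAAAAAAACC", "CCCCCCCCCC"]
chosen = select_subset(seqs, n=2, target=0.8)
print(chosen)
```

A greedy choice like this is not guaranteed to find the globally best subset, but it avoids enumerating all combinations of sequences.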

The output
Sample output of the server is shown in Figure 1B. In ‘Standard Analysis’ mode, an overview of each uploaded alignment is shown. Windows containing predicted secondary structures are highlighted and detailed information (z-score, structure conservation index, RNAz P-value, etc.) is shown in a table. These results are supplemented by different visualizations of the predicted consensus secondary structure. A typical secondary structure drawing, a ‘dot-plot’ representing the base-pairing probability matrix, and a structure-annotated alignment are generated. All three visualizations are color coded, which makes it easy to identify compensatory/consistent mutations that support a predicted structure. In addition, the raw RNAz output can be viewed as a text file. In ‘Genomic screen’ mode, annotation files in the standard formats BED and GFF are also generated if desired. All result files are stored for 30 days on the server and can be downloaded as a single compressed archive file for local viewing.

Conducting genomic screens
For screening genomic regions, the ‘Genomic screen’ option must be chosen on the first page of the server. In general, the analysis pipeline and the generated output are the same as described above. However, only alignments in MAF and XMFA formats are read. These alignments need to fulfill some requirements: the identifier of the first sequence in the first alignment is used as ‘reference’. Each provided alignment must contain a sequence with this identifier, and correct genomic positions must be provided in the alignment at least for this reference sequence. The MAF and XMFA file formats provide fields to store this information.
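The reference requirement can be checked before uploading with a small script such as the following sketch for MAF input. It relies only on the standard layout of MAF sequence lines (`s src start size strand srcSize text`); the function names, the example identifiers and the chromosome sizes are made up for illustration.

```python
def parse_maf_blocks(text):
    """Parse the 's' lines of each alignment block in MAF-formatted text."""
    blocks, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "a" or line.startswith("a "):
            # An 'a' line opens a new alignment block.
            current = []
            blocks.append(current)
        elif line.startswith("s ") and current is not None:
            _, src, start, size, strand, src_size, seq = line.split()
            current.append({
                "src": src, "start": int(start), "size": int(size),
                "strand": strand, "src_size": int(src_size), "seq": seq,
            })
    return blocks


def blocks_missing_reference(blocks):
    """Use the first sequence of the first block as reference and report
    the indices of all blocks that do not contain it."""
    reference = blocks[0][0]["src"]
    missing = [
        i for i, block in enumerate(blocks)
        if all(row["src"] != reference for row in block)
    ]
    return reference, missing


example_maf = """##maf version=1
a score=100
s hg18.chr1 1000 12 + 247249719 ACGTACGTACGT
s mm8.chr4 2000 12 + 155029701 ACGTACGAACGT

a score=50
s mm8.chr4 3000 8 + 155029701 ACGTACGT
s rn4.chr5 4000 8 + 173096209 ACGTACGT
"""

blocks = parse_maf_blocks(example_maf)
reference, missing = blocks_missing_reference(blocks)
print(reference, missing)
```

In this toy input the second block lacks the reference sequence and would therefore not satisfy the requirement stated above.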

In this mode, too, alignments are sliced if necessary and filters are applied. After scoring of the filtered alignment windows, RNA predictions in overlapping windows are combined into non-overlapping genomic ‘loci’. The genomic locations of the predicted loci can be downloaded as a BED or GFF annotation file and are presented in an overview table. It is also possible to upload a file with existing annotation. This information is included in the overview table and allows the predictions to be compared with existing annotation. Each prediction shown in the overview table is linked to detailed result pages with the illustrations and tables explained earlier.
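Combining overlapping windows into loci amounts to merging intervals, as in the sketch below. It assumes that all hits lie on one chromosome and are given as 0-based, half-open coordinates, which is also the BED convention; the function names, the `locus` naming scheme and the coordinates are hypothetical.

```python
def merge_windows(hits):
    """Merge overlapping (start, end) intervals into non-overlapping loci."""
    loci = []
    for start, end in sorted(hits):
        if loci and start < loci[-1][1]:
            # Overlaps the previous locus: extend it.
            loci[-1][1] = max(loci[-1][1], end)
        else:
            loci.append([start, end])
    return [tuple(locus) for locus in loci]


def to_bed(chrom, loci, prefix="locus"):
    """Format loci as minimal four-column BED lines."""
    return [
        f"{chrom}\t{start}\t{end}\t{prefix}{i}"
        for i, (start, end) in enumerate(loci, 1)
    ]


# Made-up positive windows in reference coordinates.
hits = [(1080, 1200), (1040, 1160), (1400, 1520)]
loci = merge_windows(hits)
bed_lines = to_bed("chr1", loci)
print("\n".join(bed_lines))
```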


    IMPLEMENTATION
 
The web server was implemented using Apache, Perl, BioPerl (22), CGI and client-side JavaScript. The analysis pipeline builds upon the programs of the RNAz package version 1.0. As of writing this article, the system makes use of 4 Intel XEON 2.20 GHz CPUs for performing the calculations.


    ACKNOWLEDGEMENTS
 
The authors would like to thank Stephan Bernhart for useful discussions. This work was supported by the Austrian Genome Initiative GEN-AU of the BMWF (projects ‘non-coding RNA’ and ‘Bioinformatics Integration network’). Funding to pay the Open Access publication charges for this article was provided by GEN-AU.

Conflict of interest statement. None declared.


    REFERENCES
 

  1. Bompfünewerer AF, Flamm C, Fried C, Fritzsch G, Hofacker IL, Lehmann J, Missal K, Mosig A, Müller B, et al. Evolutionary patterns of non-coding RNAs. Th. Biosci (2005) 123:301–369.

  2. Mignone F, Gissi C, Liuni S, Pesole G. Untranslated regions of mRNAs. Genome Biol (2002) 3:REVIEWS0004.

  3. Athanasius F. Bompfünewerer Consortium. Backofen R, Bernhart SH, Flamm C, Fried C, Fritzsch G, Hackermüller J, Hertel J, Hofacker IL, et al. RNAs everywhere: genome-wide annotation of structured RNAs. J. Exp. Zool. B Mol. Dev. Evol (2007) 308:1–25.

  4. Zuker M. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res (2003) 31:3406–3415.

  5. Hofacker IL. Vienna RNA secondary structure server. Nucleic Acids Res (2003) 31:3429–3431.

  6. Do CB, Woods DA, Batzoglou S. CONTRAfold: RNA secondary structure prediction without physics-based models. Bioinformatics (2006) 22:e90–e98.

  7. Rivas E, Eddy S. Secondary structure alone is generally not statistically significant for the detection of noncoding RNAs. Bioinformatics (2000) 16:583–605.

  8. Washietl S, Hofacker IL. Consensus folding of aligned sequences as a new measure for the detection of functional RNAs by comparative genomics. J. Mol. Biol (2004) 342:19–39.

  9. Rivas E, Eddy SR. Noncoding RNA gene detection using comparative sequence analysis. BMC Bioinformatics (2001) 2:8.

  10. diBernardo D, Down T, Hubbard T. ddbRNA: detection of conserved secondary structures in multiple alignments. Bioinformatics (2003) 19:1606–1611.

  11. Coventry A, Kleitman DJ, Berger B. MSARI: multiple sequence alignments for statistical detection of RNA secondary structure. Proc. Natl. Acad. Sci. USA (2004) 101:12102–12107.

  12. Washietl S, Hofacker IL, Stadler PF. Fast and reliable prediction of noncoding RNAs. Proc. Natl. Acad. Sci. USA (2005) 102:2454–2459.

  13. Pedersen JS, Bejerano G, Siepel A, Rosenbloom K, Lindblad-Toh K, Lander ES, Kent J, Miller W, Haussler D. Classification of conserved RNA secondary structures in the human genome. PLoS Comput. Biol (2006) 2:e33.

  14. Torarinsson E, Sawera M, Havgaard J, Fredholm M, Gorodkin J. Thousands of corresponding human and mouse genomic regions unalignable in primary sequence contain common RNA structure. Genome Res (2006) 16:885–889.

  15. Uzilov AV, Keegan JM, Mathews DH. Detection of non-coding RNAs on the basis of predicted secondary structure formation free energy change. BMC Bioinformatics (2006) 7:173.

  16. Washietl S, Hofacker IL, Lukasser M, Hüttenhofer A, Stadler PF. Mapping of conserved RNA secondary structures predicts thousands of functional non-coding RNAs in the human genome. Nat. Biotechnol (2005) 23:1383–1390.

  17. Washietl S, Pedersen JS, Korbel JO, Gruber AR, Stocsits C, Hackermüller J, Hertel J, Lindemeyer M, Reiche K, et al. Structured RNAs in the ENCODE selected regions of the human genome. Genome Res (2007) in press.

  18. Hofacker IL, Fekete M, Stadler PF. Secondary structure prediction for aligned RNA sequences. J. Mol. Biol (2002) 319:1059–1066.

  19. Clote P, Ferre F, Kranakis E, Krizanc D. Structural RNA has lower folding energy than random RNA of the same dinucleotide frequency. RNA (2005) 11:578–591.

  20. Gardner PP, Wilm A, Washietl S. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res (2005) 33:2433–2439.

  21. Wilm A, Mainz I, Steger G. An enhanced RNA alignment benchmark for sequence alignment programs. Algorithms Mol. Biol (2006) 1:19.

  22. Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, et al. The bioperl toolkit: Perl modules for the life sciences. Genome Res (2002) 12:1611–1618.





