Nucleic Acids Research Advance Access published online on April 25, 2008
Nucleic Acids Research, doi:10.1093/nar/gkn222
Web Server
Software.ncrna.org: web servers for analyses of RNA sequences
1Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwa-no-ha, Chiba 277-8561, 2Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), 2-42 Aomi, Koto-ku, Tokyo 135-0064, 3Mizuho Information & Research Institute, Inc., 2-3, Kanda-Nishikicho, Chiyoda-ku, Tokyo 101-8443, 4Japan Biological Informatics Consortium, 10F TIME24 Building, 2-45 Aomi, Koto-ku, Tokyo 135-8073, 5Department of Biosciences and Informatics, Keio University, 3-14-1 Hiyoshi, Kohoku-ku, Yokohama, Kanagawa 223-8522 and 6INTEC Systems Institute, Inc., 1-3-3 Shinsuna, Koto-ku, Tokyo 136-0075, Japan
*To whom correspondence should be addressed. Tel: +81 33599 8089; Fax: +81 33599 8081; Email: asai{at}k.u-tokyo.ac.jp
Received February 7, 2008. Revised April 3, 2008. Accepted April 10, 2008.
ABSTRACT
We present web servers for analysis of non-coding RNA sequences based on their secondary structures. Software tools for structural multiple sequence alignment, structural pairwise sequence alignment and structural motif finding are available from both an integrated web server and individual stand-alone web servers. The servers are located at http://software.ncrna.org, together with evaluation information and software downloads. The website is freely available to all users, and no login is required.
INTRODUCTION
Comparisons, alignments and motif identification are essential procedures for extracting valuable information from biological sequences. Many effective software tools for these purposes exist for amino acid and DNA sequences, but they are of limited use for RNA sequences because they do not take possible secondary structures into account. Structure-based analysis of multiple RNA sequences has long been difficult in practice because of its extremely high computational cost; nevertheless, several algorithms have been proposed, and a few websites offer software tools that support structure-based analyses of RNA sequences, e.g. Vienna RNA Package (http://www.tbi.univie.ac.at/~ivo/RNA/), Sfold (http://sfold.wadsworth.org) and BiBiServ (http://bibiserv.techfak.uni-bielefeld.de/).
Recent progress in RNA sequence analysis has created a demand for rapid and accurate structure-based analyses of multiple RNA sequences. To this end, we have developed several software tools for comparison (1,2), alignment (3–8) and motif identification (9) of multiple RNA sequences; searches for conserved miRNAs (10); prediction of common secondary structures from multiple sequence alignments (11) and calculation of base-pairing probabilities for long sequences (12). Using these software tools, we have developed an integrated web server and stand-alone web servers (software.ncrna.org) that support multiple alignment, pairwise alignment and extraction of structural motifs of RNA sequences.
METHODS
The integrated web server and the stand-alone web servers we developed offer three types of RNA sequence analysis based on common potential secondary structures: pairwise alignment, multiple alignment and structural motif extraction. SCARNA, PHMMTS (pair hidden Markov models on tree structures), PSTAG (pair stochastic tree adjoining grammar), Murlet and MXSCARNA can be used on their own stand-alone web servers as well as on our integrated server. In addition, the source code for PHMMTS, PSTAG, Murlet and MXSCARNA is available for download. Brief introductions to these tools follow; detailed evaluations are given in refs (3–9), and summaries are available on the website (http://software.ncrna.org).
Pairwise alignment
SCARNA (3) is a fast structural pairwise alignment tool for RNA sequences of unknown secondary structure. It extracts stem candidates from each RNA sequence on the basis of base-pairing probabilities (12,13) and aligns the 5' and 3' regions of these stem candidates separately, using an engineered dynamic programming (DP) algorithm that roughly takes consistency into account. We compared SCARNA with several other alignment tools using Gardner's benchmark dataset (14) and a dataset of 5S ribosomal RNA, 5.8S ribosomal RNA and hammerhead ribozyme sequences from the Rfam database (15). For sequences with low similarity, the alignment accuracy of SCARNA was lower than that of programs that evaluate secondary structures more rigorously, e.g. Foldalign (16), Dynalign (17) and PMcomp (18). However, SCARNA was approximately one order of magnitude faster (e.g. <1 min for 1000 bases) and could align sequences longer than 1000 bases.
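To make the stem-candidate step more concrete, the Python sketch below scans a base-pairing probability matrix for fixed-length stem fragments (the term used in the title of ref 3), i.e. runs of `w` stacked pairs (i, j), (i+1, j-1), ... whose probabilities all reach a cutoff. The matrix is taken as given (computing it requires a McCaskill-style partition function (13)). The function name `stem_fragments`, the fragment length `w`, the threshold `p_min` and the minimum hairpin-loop length are illustrative assumptions, not SCARNA's actual parameters or code.

```python
from typing import List, Tuple

def stem_fragments(P: List[List[float]], w: int = 2, p_min: float = 0.1,
                   min_loop: int = 3) -> List[Tuple[int, int]]:
    """Return (i, j) for every run of w stacked pairs (i+k, j-k), k = 0..w-1,
    whose base-pairing probabilities are all >= p_min.

    P is an n x n base-pairing probability matrix with 0-based indices.
    The innermost pair (i+w-1, j-w+1) must enclose at least min_loop bases.
    """
    n = len(P)
    fragments = []
    for i in range(n):
        # smallest j that leaves min_loop unpaired bases inside the fragment
        for j in range(i + 2 * w + min_loop - 1, n):
            if all(P[i + k][j - k] >= p_min for k in range(w)):
                fragments.append((i, j))
    return fragments
```

Fragments may overlap (a three-pair stem yields two overlapping fragments when `w = 2`); how SCARNA scores and aligns such fragments is described in ref (3).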
PHMMTS (4,5) and PSTAG (6) align RNA sequences of unknown secondary structure to an RNA sequence of known secondary structure. PHMMTS handles only pseudoknot-free structures, whereas PSTAG can also accept pseudoknotted structures. In comparisons with ClustalW (19) on tRNA and hammerhead ribozyme datasets, PHMMTS assigned secondary structures more accurately. In a comparison with PHMMTS and ClustalW on sequences of HDV_ribozyme, an RNA family in PseudoBase (20) that includes pseudoknotted structures, PSTAG assigned secondary structures more accurately.
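The difference between PHMMTS and PSTAG rests on whether base pairs may cross. Two pairs (i, j) and (k, l) with i < k cross, forming a pseudoknot, exactly when i < k < j < l. The sketch below, using hypothetical helper names, checks a set of base pairs for such crossings; it illustrates the definition only and is not code from either tool.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

Pair = Tuple[int, int]

def crossing_pairs(pairs: Iterable[Pair]) -> List[Tuple[Pair, Pair]]:
    """Return every two base pairs that cross, i.e. i < k < j < l."""
    # Normalize each pair to (5' index, 3' index), then sort by the 5' index
    # so that the first pair in each combination has the smaller 5' end.
    ordered = sorted((min(i, j), max(i, j)) for i, j in pairs)
    return [((i, j), (k, l))
            for (i, j), (k, l) in combinations(ordered, 2)
            if i < k < j < l]

def is_pseudoknot_free(pairs: Iterable[Pair]) -> bool:
    """True if no two base pairs cross."""
    return not crossing_pairs(pairs)
```

A structure for which `crossing_pairs` returns a non-empty list lies outside the pseudoknot-free class that PHMMTS evaluates, whereas PSTAG can accept it.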
Multiple alignment
Murlet (7) and MXSCARNA (8) are structural multiple alignment tools for RNA sequences. Murlet is based on a pair SCFG (stochastic context-free grammar) but with dramatically reduced computational cost, and is applicable to RNA sequences of up to 300 bases. MXSCARNA extends SCARNA to progressive alignment and is applicable to RNA sequences of up to 5000 bases, although its accuracy for sequences longer than 500 bases has not been confirmed and the web server restricts sequence length to 1000 bases. We validated Murlet and MXSCARNA using the BRAliBase II benchmark dataset (14) and the dataset of Kiryu et al. (7). Both tools achieved accuracies comparable to those of ProbCons (21) in SPS (sum-of-pairs score). The accuracy of the predicted common secondary structures was evaluated by the MCC (Matthews correlation coefficient), and both tools showed accuracy comparable to that of Stemloc (22).
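The two accuracy measures named above can be sketched as follows. Here SPS is the fraction of residue pairs aligned in the reference alignment that are also aligned in the test alignment, and the MCC is computed over base pairs, treating every position pair i < j of a length-n sequence as a candidate so that true negatives are the remaining n(n-1)/2 candidates. Some RNA benchmarks instead approximate the MCC by the geometric mean of sensitivity and positive predictive value, so the exact conventions used in the cited evaluations may differ; the function names are illustrative.

```python
import math
from itertools import combinations
from typing import List, Set, Tuple

Residue = Tuple[int, int]  # (sequence index, position in the ungapped sequence)

def aligned_residue_pairs(alignment: List[str]) -> Set[Tuple[Residue, Residue]]:
    """Residue pairs that share a column in an equal-length gapped alignment."""
    pos = [0] * len(alignment)  # next ungapped position in each row
    pairs = set()
    for col in range(len(alignment[0])):
        column = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                column.append((s, pos[s]))
                pos[s] += 1
        pairs.update(combinations(column, 2))
    return pairs

def sum_of_pairs_score(test: List[str], reference: List[str]) -> float:
    """Fraction of reference-aligned residue pairs reproduced by the test alignment."""
    if [r.replace("-", "") for r in test] != [r.replace("-", "") for r in reference]:
        raise ValueError("alignments must hold the same sequences in the same order")
    ref = aligned_residue_pairs(reference)
    if not ref:
        return 0.0
    return len(aligned_residue_pairs(test) & ref) / len(ref)

def base_pair_mcc(predicted: Set[Tuple[int, int]], reference: Set[Tuple[int, int]],
                  n: int) -> float:
    """MCC over all n(n-1)/2 candidate pairs; pairs are given as (i, j) with i < j."""
    tp = len(predicted & reference)
    fp = len(predicted - reference)
    fn = len(reference - predicted)
    tn = n * (n - 1) // 2 - tp - fp - fn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0
```

With this definition, SPS only counts recovered reference pairs: aligning residues that the reference leaves unaligned is not penalized.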
Motif extraction
RNAmine (9) is a tool for extraction of structural motifs. This program uses a graph-mining technique to identify local sequences with frequent stem patterns from among a set of RNA sequences. RNAmine is currently available only on the integrated web server.
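RNAmine's graph-mining algorithm is described in ref (9). As a much simpler illustration of the underlying idea of support counting, the sketch below enumerates, in each sequence, every stem of exactly `w` contiguous Watson-Crick or G-U pairs, represents it by its 5' and 3' strand strings, and keeps patterns found in at least `min_support` sequences. The stem representation, pairing rules and names are assumptions for illustration; RNAmine mines more general stem patterns.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple

# Assumed pairing rules: Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
Stem = Tuple[str, str]  # (5' strand, 3' strand), both read 5' to 3'

def stems(seq: str, w: int, min_loop: int = 3) -> Set[Stem]:
    """All stems of exactly w contiguous pairs (i+k, j-k), k = 0..w-1, in seq."""
    found = set()
    n = len(seq)
    for i in range(n):
        # j starts where at least min_loop bases lie inside the innermost pair
        for j in range(i + 2 * w + min_loop - 1, n):
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(w)):
                found.add((seq[i:i + w], seq[j - w + 1:j + 1]))
    return found

def frequent_stems(seqs: Iterable[str], w: int = 3,
                   min_support: int = 2) -> Dict[Stem, int]:
    """Stem patterns present in at least min_support sequences, with their support."""
    support = Counter()
    for seq in seqs:
        support.update(stems(seq.upper(), w))  # a set, so each sequence counts once
    return {p: c for p, c in support.items() if c >= min_support}
```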
Additional tools
In addition to the six programs described above, SOKOS/CAN (1), Stem Kernel (2), miRRim (10), McCaskill-MEA (11) and Rfold (12) are available for download. Although pairwise alignment is the standard way to compare two biological sequences, kernel functions can serve as an alternative similarity measure. SOKOS/CAN and Stem Kernel are sequence comparison tools whose kernel functions use features of the potential secondary structures: SOKOS/CAN calculates a marginalized kernel on an SCFG, whereas Stem Kernel compares sequences with a kernel based on all possible stem patterns.
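To illustrate how a kernel acts as a similarity measure, the sketch below maps each sequence to counts of its length-`w` stem patterns (an assumed, simplified stem definition: contiguous Watson-Crick or G-U pairs enclosing at least three bases) and takes the dot product of the two count vectors, normalized so that k(x, x) = 1. This is a deliberately simplified stand-in with hypothetical names; it is neither the Stem Kernel of ref (2) nor the marginalized SCFG kernel of SOKOS/CAN (1).

```python
import math
from collections import Counter

PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def stem_features(seq: str, w: int = 3, min_loop: int = 3) -> Counter:
    """Count each (5' strand, 3' strand) pattern of w contiguous pairs."""
    feats = Counter()
    n = len(seq)
    for i in range(n):
        for j in range(i + 2 * w + min_loop - 1, n):
            if all((seq[i + k], seq[j - k]) in PAIRS for k in range(w)):
                feats[(seq[i:i + w], seq[j - w + 1:j + 1])] += 1
    return feats

def stem_pattern_kernel(x: str, y: str, w: int = 3) -> int:
    """Dot product of the stem-pattern count vectors of x and y."""
    fx, fy = stem_features(x, w), stem_features(y, w)
    return sum(c * fy[p] for p, c in fx.items())  # missing patterns count as 0

def normalized_kernel(x: str, y: str, w: int = 3) -> float:
    """k(x, y) / sqrt(k(x, x) k(y, y)); 0.0 if either sequence has no stems."""
    kxx, kyy = stem_pattern_kernel(x, x, w), stem_pattern_kernel(y, y, w)
    if not kxx or not kyy:
        return 0.0
    return stem_pattern_kernel(x, y, w) / math.sqrt(kxx * kyy)
```

Because such a kernel compares structural features rather than aligned positions, two sequences can score as similar even when their stems sit at different offsets.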
Predicting non-coding RNAs is difficult because no general characteristic sequence patterns are known for them. For specific families of non-coding RNAs, however, realistic predictions are possible; we developed miRRim (10) as a tool for finding conserved miRNAs.
McCaskill-MEA (11) predicts consensus secondary structures from given multiple alignments. Rfold (12) calculates local base-pairing probabilities without using sliding windows, based on the full energy model of the Vienna RNA Package (23).
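Judging from its name and the title of ref (11), McCaskill-MEA combines McCaskill base-pairing probabilities (13), averaged over the aligned sequences, with maximum expected accuracy (MEA) decoding. The sketch below shows one common form of that idea under stated assumptions: per-sequence probability matrices (taken as given) are mapped onto alignment columns and averaged, and a Nussinov-style dynamic program then selects the structure maximizing Σ 2γP_ij over predicted pairs plus Σ q_i over unpaired columns, where q_i = 1 − Σ_j P_ij. The weight γ, the minimum loop length and the function names are assumptions, and the exact objective of ref (11) may differ.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]

def average_bpp(alignment: List[str], bpps: List[Matrix]) -> Matrix:
    """Average per-sequence pairing-probability matrices on alignment columns.

    bpps[s] is indexed by ungapped positions of alignment[s]; gaps contribute 0.
    """
    ncol = len(alignment[0])
    avg = [[0.0] * ncol for _ in range(ncol)]
    for row, P in zip(alignment, bpps):
        cols = [c for c, ch in enumerate(row) if ch != "-"]  # residue -> column
        for a, ca in enumerate(cols):
            for b, cb in enumerate(cols):
                avg[ca][cb] += P[a][b] / len(alignment)
    return avg

def mea_structure(P: Matrix, gamma: float = 1.0, min_loop: int = 3) -> Tuple[str, float]:
    """Maximize sum(2*gamma*P[i][j]) over pairs + sum(q[i]) over unpaired positions."""
    n = len(P)
    q = [max(0.0, 1.0 - sum(row)) for row in P]  # probability of staying unpaired
    M = [[0.0] * n for _ in range(n)]

    def m(i: int, j: int) -> float:
        return M[i][j] if i <= j else 0.0  # empty interval scores 0

    def close(a: float, b: float) -> bool:
        return math.isclose(a, b, abs_tol=1e-9)

    for i in range(n):
        M[i][i] = q[i]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            best = max(m(i + 1, j) + q[i], m(i, j - 1) + q[j])
            if span > min_loop:  # pair (i, j) encloses at least min_loop bases
                best = max(best, m(i + 1, j - 1) + 2 * gamma * P[i][j])
            for k in range(i, j):  # bifurcation into [i..k] and [k+1..j]
                best = max(best, m(i, k) + m(k + 1, j))
            M[i][j] = best

    # Traceback: recover one optimal structure in dot-bracket notation.
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        s = M[i][j]
        if close(s, m(i + 1, j) + q[i]):
            stack.append((i + 1, j))
        elif close(s, m(i, j - 1) + q[j]):
            stack.append((i, j - 1))
        elif j - i > min_loop and close(s, m(i + 1, j - 1) + 2 * gamma * P[i][j]):
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            k = next(k for k in range(i, j) if close(s, m(i, k) + m(k + 1, j)))
            stack.extend([(i, k), (k + 1, j)])
    return "".join(structure), (M[0][n - 1] if n else 0.0)
```

Raising γ makes predicted pairs worth more relative to unpaired positions, so the decoder predicts more base pairs.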
DESCRIPTION OF SERVICES
Table 1 shows a list of software tools available at http://software.ncrna.org/.
The integrated web server and the stand-alone web servers offer web interfaces to SCARNA, PHMMTS, PSTAG, Murlet, MXSCARNA and RNAmine. On the integrated web server, users select one of three service types: multiple alignment, pairwise alignment or structural motif extraction. On the multiple alignment menu, users can select either Murlet or MXSCARNA. RNA sequences in multi-FASTA format can be entered directly or uploaded as a file. The server outputs a multiple alignment annotated with the predicted common secondary structure, a figure of that structure and a phylogenetic tree of the sequences.
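Both menus described here take RNA sequences in multi-FASTA format. The sketch below shows how such input might be parsed and checked on the client side before submission: header lines start with '>', the following sequence lines are concatenated, and each sequence is upper-cased, has T converted to U, and is validated against the alphabet ACGU. The T-to-U conversion, the alphabet check and the optional `max_len` argument are assumptions for illustration, not documented server behavior; 1000 would correspond to the MXSCARNA web-server length limit given in the Methods section.

```python
from typing import List, Optional, Tuple

def parse_multi_fasta(text: str, max_len: Optional[int] = None) -> List[Tuple[str, str]]:
    """Parse multi-FASTA text into (name, sequence) records of RNA sequences."""
    records: List[Tuple[str, List[str]]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            records.append((line[1:].strip(), []))
        elif not records:
            raise ValueError("sequence data found before the first '>' header")
        else:
            records[-1][1].append(line)

    result = []
    for name, chunks in records:
        seq = "".join(chunks).upper().replace("T", "U")  # assumed DNA-to-RNA conversion
        bad = set(seq) - set("ACGU")
        if not seq:
            raise ValueError(f"{name}: empty sequence")
        if bad:
            raise ValueError(f"{name}: unexpected characters {sorted(bad)}")
        if max_len is not None and len(seq) > max_len:
            raise ValueError(f"{name}: length {len(seq)} exceeds {max_len}")
        result.append((name, seq))
    return result
```

For example, `parse_multi_fasta(text, max_len=1000)` rejects any sequence longer than 1000 bases before it is sent.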
On the pairwise alignment menu, users can select SCARNA, PHMMTS or PSTAG. SCARNA accepts RNA sequences in multi-FASTA format, entered directly or uploaded as a file. The server outputs a pairwise alignment annotated with the predicted common secondary structure and a figure of that structure showing both aligned sequences. PHMMTS and PSTAG accept, as query sequences, RNA sequences of unknown secondary structure in multi-FASTA format (entered directly or uploaded as a file) and, as the template structure, a directly entered RNA sequence of known secondary structure together with that structure in dot-bracket format. The server outputs the alignments of the query sequences to the template structure, annotated with the secondary structures, along with structure figures of the same kind as those for SCARNA.
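For PHMMTS and PSTAG, the template is a sequence plus a dot-bracket string. The sketch below converts such a string into base pairs and checks it against its sequence: equal lengths, balanced brackets and, optionally, only Watson-Crick or G-U pairs. It handles only '(', ')' and '.', i.e. pseudoknot-free templates; how pseudoknotted templates are written for PSTAG is not covered here, and the helper names and the canonical-pair check are assumptions.

```python
from typing import List, Tuple

CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}  # assumed allowed pair types

def parse_dot_bracket(structure: str) -> List[Tuple[int, int]]:
    """Convert a '(', ')', '.' string into sorted 0-based (i, j) base pairs."""
    stack: List[int] = []
    pairs = []
    for pos, ch in enumerate(structure):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {pos}")
            pairs.append((stack.pop(), pos))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {pos}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)

def check_template(sequence: str, structure: str,
                   require_canonical: bool = True) -> List[Tuple[int, int]]:
    """Validate a template sequence/structure pair and return its base pairs."""
    seq = sequence.upper()
    if len(seq) != len(structure):
        raise ValueError("sequence and structure lengths differ")
    pairs = parse_dot_bracket(structure)
    if require_canonical:
        bad = [(i, j) for i, j in pairs if seq[i] + seq[j] not in CANONICAL]
        if bad:
            raise ValueError(f"non-canonical pairs: {bad}")
    return pairs
```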
On the structural motif extraction menu, users can select RNAmine, which accepts RNA sequences in multi-FASTA format, entered directly or uploaded as a file. The server outputs the extracted motifs as schematic secondary-structure figures, together with a list of member occurrences given by sequence name and position. Each member is linked to a detailed figure and to its structure-annotated sequence.
For each of the software tools described, once the server has returned its results, the user can go on to search various genomes for homologs of the RNA sequences with BLAT. Each hit is linked to the UCSC Genome Browser for functional RNAs (24).
FUTURE PLANS
In addition to web servers, web services for sequence analysis tools are desirable. We have already developed SOAP-based web services for Murlet, MXSCARNA, PHMMTS and PSTAG, and these services will be launched shortly.
CONCLUSION
We have developed web servers for analysis of RNA sequences in light of their secondary structures. The web servers offer six software tools for multiple sequence alignment, pairwise alignment and extraction of structural motifs of RNA sequences. They deliver results at practical speeds for tasks that have been thought to require high computational costs.
ACKNOWLEDGEMENTS
This work was supported in part by the Functional RNA Project funded by the New Energy and Industrial Technology Development Organization (NEDO) of Japan and by a Grant-in-Aid for Scientific Research on Priority Areas (Comparative Genomics) from the Ministry of Education, Culture, Sports, Science and Technology of Japan. The authors thank Ivo Hofacker, the author of the Vienna RNA Package, and Chuong Do, the author of ProbCons, because some of our programs include their parameters and/or source code. The authors thank the Japan Biological Informatics Consortium (JBIC) for its support through the Functional RNA Project and our colleagues in the Computational Biology Research Center (CBRC) for their useful discussions. Funding to pay the Open Access publication charges for this article was provided by a Grant-in-Aid for Scientific Research on Priority Areas (Comparative Genomics).
Conflict of interest statement. None declared.
REFERENCES
1. Kin T, Tsuda K, Asai K. Marginalized kernels for RNA sequence data analysis. Genome Inform. (2002) 13:112–122.
2. Sakakibara Y, Popendorf K, Ogawa N, Asai K, Sato K. Stem kernels for RNA sequence analyses. J. Bioinform. Comput. Biol. (2007) 5:1103–1122.
3. Tabei Y, Tsuda K, Kin T, Asai K. SCARNA: fast and accurate structural alignment of RNA sequences by matching fixed-length stem fragments. Bioinformatics (2006) 22:1723–1729.
4. Sakakibara Y. Pair hidden Markov models on tree structures. Bioinformatics (2003) 19:i232–i240.
5. Sato K, Sakakibara Y. RNA secondary structural alignment with conditional random fields. Bioinformatics (2005) 21:ii237–ii242.
6. Matsui H, Sato K, Sakakibara Y. Pair stochastic tree adjoining grammars for aligning and predicting pseudoknot RNA structures. Bioinformatics (2005) 21:2611–2617.
7. Kiryu H, Tabei Y, Kin T, Asai K. Murlet: a practical multiple alignment tool for structural RNA sequences. Bioinformatics (2007) 23:1588–1598.
8. Tabei Y, Kiryu H, Kin T, Asai K. A fast structural multiple alignment method for long RNA sequences. BMC Bioinformatics (2008) 9:33.
9. Hamada M, Tsuda K, Kudo T, Kin T, Asai K. Mining frequent stem patterns from unaligned RNA sequences. Bioinformatics (2006) 22:2480–2487.
10. Terai G, Komori T, Asai K, Kin T. miRRim: a novel system to find conserved miRNAs with high sensitivity and specificity. RNA (2007) 13:2081–2090.
11. Kiryu H, Kin T, Asai K. Robust prediction of consensus secondary structures using averaged base pairing probability matrices. Bioinformatics (2007) 23:434–441.
12. Kiryu H, Kin T, Asai K. Rfold: an exact algorithm for computing local base pairing probabilities. Bioinformatics (2008) 24:367–373.
13. McCaskill J. The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers (1990) 29:1105–1119.
14. Gardner P, Wilm A, Washietl S. A benchmark of multiple sequence alignment programs upon structural RNAs. Nucleic Acids Res. (2005) 33:2433–2439.
15. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. (2003) 31:439–441.
16. Torarinsson E, Havgaard JH, Gorodkin J. Multiple structural alignment and clustering of RNA sequences. Bioinformatics (2007) 23:926–932.
17. Harmanci AO, Sharma G, Mathews DH. Efficient pairwise RNA structure prediction using probabilistic alignment constraints in Dynalign. BMC Bioinformatics (2007) 8:130.
18. Hofacker IL, Bernhart SH, Stadler PF. Alignment of RNA base pairing probability matrices. Bioinformatics (2004) 20:2222–2227.
19. Thompson JD, Higgins DG, Gibson TJ. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. (1994) 22:4673–4680.
20. van Batenburg FHD, Gultyaev AP, Pleij WA, Ng J, Oliehoek J. PseudoBase: a database with RNA pseudoknots. Nucleic Acids Res. (2000) 28:201–204.
21. Do CB, Mahabhashyam MSP, Brudno M, Batzoglou S. PROBCONS: Probabilistic consistency-based multiple sequence alignment. Genome Res. (2005) 15:330–340.
22. Holmes I. Accelerated probabilistic inference of RNA structure evolution. BMC Bioinformatics (2005) 6:73.
23. Hofacker I. Vienna RNA secondary structure server. Nucleic Acids Res. (2003) 31:3429–3431.
24. Kin T, Yamada K, Terai G, Okida H, Yoshinari Y, Ono Y, Kojima A, Kimura Y, Komori T, Asai K. fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences. Nucleic Acids Res. (2006) 35:D145–D148.