Nucleic Acids Research Advance Access originally published online on November 11, 2006
Nucleic Acids Research 2007 35(Database issue):D145-D148; doi:10.1093/nar/gkl837
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences
1 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST), Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
2 Intec Web and Genome Informatics, 1-3-3 Shinsuna, Koto-ku, Tokyo 136-0075, Japan
3 Information and Mathematical Science Laboratory, 1-5-21 Oh-tsuka, Bunkyo-ku, Tokyo 112-0012, Japan
4 Mitsubishi Research Institute, 2-3-6 O-temachi, Chiyoda-ku, Tokyo 100-8140, Japan
5 Department of Computational Biology, Graduate School of Frontier Sciences, University of Tokyo, 5-1-5 Kashiwa-no-ha, Chiba 277-8583, Japan
*To whom correspondence should be addressed. Tel: +81 3 355 8059; Fax: +81 3 355 8081; Email: kin-taushin{at}aist.go.jp
Received August 15, 2006. Revised September 20, 2006. Accepted October 6, 2006.
| ABSTRACT |
|---|
There is an abundance of transcripts that code for no particular protein and that remain functionally uncharacterized. Some of these transcripts may have novel functions, while others might be junk transcripts. Unfortunately, experimentally validating such transcripts to find functional non-coding RNA candidates is very costly. Our primary interest is therefore to computationally mine candidate functional transcripts from a pool of uncharacterized transcripts. We introduce fRNAdb, a novel database service that hosts a large collection of non-coding transcripts, including annotated and non-annotated sequences from the H-inv database, NONCODE and RNAdb. A set of computational analyses has been performed on the included sequences, including RNA secondary structure motif discovery, EST support evaluation, cis-regulatory element search and protein homology search. fRNAdb provides an efficient interface that helps users filter transcripts by their own criteria to single out functional RNA candidates. fRNAdb is available at http://www.ncrna.org/
| INTRODUCTION |
|---|
fRNAdb is a database that helps in annotating non-coding transcripts acquired from three publicly available databases: H-inv, human full-length non-coding cDNAs (1); NONCODE, experimentally validated non-coding transcripts (2); and RNAdb, non-coding transcripts curated from the literature, the human chromosome 7 project and the RIKEN antisense pipeline, together with other putative non-coding RNAs (3). Details are shown in Table 1. Each transcript is analyzed for features such as maximum ORF length, number of protein homologs, average conservation score, transcription regulatory element motifs and the existence of CpG islands (listed in Table 2), which help in filtering for promising non-coding candidates. Transcripts can be filtered in many different ways with fRNAdb's main listing interface (see Figure 1). This interface is linked to our customized UCSC Genome Browser (4), which is equipped with original custom tracks designed for screening functional RNAs. Users can inspect a transcript of interest in a genomic view, with rich genomic information surrounding the mapped transcript. This information includes original UCSC tracks such as known genes, genome conservation and Affymetrix transcriptome tracks (5), as well as our own tracks such as conserved potential secondary structure, known RNA secondary structure motifs and regions with significant RNA secondary structure Z-scores (for details see Table 3).
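To make one of the per-transcript features concrete, the following is a minimal sketch of how a maximum ORF length could be computed. It is not the procedure used by fRNAdb, whose exact ORF definition is not given here. The function name `max_orf_length_aa` is hypothetical, and the sketch assumes that an ORF runs from an ATG codon to the next in-frame stop codon and that only the three forward frames are scanned.

```python
# Assumption: standard stop codons; ORF = ATG ... in-frame stop (stop excluded).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def max_orf_length_aa(seq: str) -> int:
    """Return the longest ATG-to-stop ORF length in amino acids.

    Only the three forward reading frames are scanned, and ORFs that
    run off the end of the sequence without a stop codon are ignored.
    Both choices are illustrative assumptions.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA alphabets
    best = 0
    for frame in range(3):
        start = None  # position of the ATG that opened the current ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i - start) // 3)  # codons before the stop
                start = None
    return best


if __name__ == "__main__":
    # ATG AAA TTT TAA: three codons precede the stop codon.
    print(max_orf_length_aa("ATGAAATTTTAA"))
```

Under these assumptions, a transcript would satisfy a "tiny ORF" criterion of fewer than 40 amino acids when `max_orf_length_aa(seq) < 40`.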
| fRNAdb |
|---|
fRNAdb provides two types of interfaces. The first presents a list of all transcripts rendered as a table with 35 columns, including the attributes described in Table 2 (Figure 1B). A tabbed control panel above the table offers five tabs labeled Basic, DB/ID, Expert, Sort and Column (Figure 1A). The Basic tab contains the basic filters: a collection of frequently used filters for quickly selecting transcripts that match common criteria for functional non-coding RNAs. For example, one can check Mapped to select only genome-mapped transcripts; Well conserved at best (Max > 50%) for transcripts whose exonic regions have a maximum conservation score >50% among 17 vertebrates (4); EST-supported for reliable expression evidence; Tiny ORF (<40 aa) to enrich for non-coding transcripts; Low Repeat Coverage (<30%) to avoid repeat element contamination; No protein homolog as another condition that enriches for non-coding transcripts; and No overlapping known gene to exclude transcripts that may be part of a protein-coding gene transcript. After the boxes are checked, the refresh button runs the filtering and presents the results. Our example conditions yield nine hits: one H-inv non-protein-coding cDNA and eight RNAdb literature-curated miRNAs. In other words, these criteria match real functional RNAs and also indicate that one non-coding transcript shares the same properties. Clicking on the ID of this transcript produces the detailed view shown in Figure 2. This feature visualizer gives a graphical representation of the sequence elements found in the transcript, including cis-regulatory elements, repeat elements, EST mapping regions and six-frame stop codon positions. There are many other ways to filter these non-coding transcripts, and many more potential candidates remain hidden in this dataset. More details of the basic filters are provided on the website.
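The combination of basic filters in this example can be read as a single conjunctive predicate over the transcript attributes. The sketch below illustrates that logic only; it is not fRNAdb's implementation. The `Transcript` record, its field names and the example IDs are hypothetical, while the thresholds (>50% maximum conservation, <40 aa ORF, <30% repeat coverage) are the ones named in the filter labels.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record holding the attributes used by the basic filters."""
    transcript_id: str
    mapped: bool                 # mapped to the genome
    max_conservation: float      # maximum conservation score, in percent
    est_supported: bool          # expression supported by ESTs
    max_orf_aa: int              # maximum ORF length, in amino acids
    repeat_coverage: float       # repeat element coverage, in percent
    protein_homologs: int        # number of protein homologs found
    overlaps_known_gene: bool    # overlaps a known gene


def passes_basic_filters(t: Transcript) -> bool:
    """True if the transcript satisfies every basic filter from the example."""
    return (
        t.mapped
        and t.max_conservation > 50.0
        and t.est_supported
        and t.max_orf_aa < 40
        and t.repeat_coverage < 30.0
        and t.protein_homologs == 0
        and not t.overlaps_known_gene
    )


if __name__ == "__main__":
    # Made-up records for illustration; they are not entries from fRNAdb.
    candidates = [
        Transcript("EXAMPLE_A", True, 82.0, True, 25, 5.0, 0, False),
        Transcript("EXAMPLE_B", True, 35.0, True, 120, 60.0, 3, True),
    ]
    print([t.transcript_id for t in candidates if passes_basic_filters(t)])
```

Expressing the filters as one predicate makes it easy to relax a single criterion, for example the conservation threshold, and observe how the candidate list changes.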
The remaining tabs offer additional functionality. The DB/ID tab contains DB selection and ID selection boxes. The DB selection box limits the target databases to any of those currently available: H-inv, NONCODE and RNAdb. The ID selection box selects target transcripts that match a given string pattern. For example, specifying FR000001 (an fRNAdb ID) in this box limits the target to transcript FR000001 alone. The wild-card % is allowed for pattern matching: specifying LIT% limits the search to targets whose original IDs start with LIT. The string pattern is matched against the ID, Acc. and Original columns. The Expert tab provides an interface for specifying multiple conditions, which allows more complex filtering than the basic filters. Please refer to the website for more details about the expert filters. The Sort tab provides an interface for sorting the table by multiple keys. The Column tab limits the visible columns of the main listing table. Since the 35-column table is too wide for ordinary browsers to display on a single screen, this interface can be used to narrow the table for better visibility.
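The ID selection behavior described above can be sketched as follows. This is an illustration under stated assumptions rather than fRNAdb's actual code: the helper names `like_to_regex` and `matches` are hypothetical, the % wild-card is assumed to match any run of characters (as in SQL LIKE), matching is assumed to be case-sensitive, and a row is represented as a plain dictionary keyed by the column names ID, Acc. and Original.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a pattern with % wild-cards into a compiled regular expression.

    Every character other than % is matched literally.
    """
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(literal_parts))


def matches(row: dict, pattern: str, columns=("ID", "Acc.", "Original")) -> bool:
    """True if the pattern matches the whole value of any searched column."""
    regex = like_to_regex(pattern)
    return any(regex.fullmatch(str(row.get(col, ""))) for col in columns)


if __name__ == "__main__":
    # Made-up rows for illustration; accessions and original IDs are invented.
    rows = [
        {"ID": "FR000001", "Acc.": "ACC0001", "Original": "LIT1234"},
        {"ID": "FR000002", "Acc.": "ACC0002", "Original": "HIT0001"},
    ]
    print([r["ID"] for r in rows if matches(r, "LIT%")])
    print([r["ID"] for r in rows if matches(r, "FR000001")])
```

Requiring a full match means that a pattern without % selects only exact values, which is consistent with FR000001 selecting a single transcript.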
| UCSC GENOME BROWSER FOR FUNCTIONAL RNAs |
|---|
We mirrored the UCSC Genome Browser and added custom tracks specific to functional RNAs and miRNAs, as shown in Tables 3 and 4. Most of the tracks have their own sources and reference papers. Our original tracks are RNA clusters, Rfam seed folds, tRNAscan-SE, Ultra Conserved Elements 17way and Z-score (details are shown in Table 3). In addition, we mapped RNA sequences from public functional RNA sequence databases, including Erdmann (6), NONCODE, RNAdb and Rfam. The UCSC Genome Browser already has several tracks for miRNA genes and targets, but we added more: miRBase (7) known miRNA genes; miRNAMap (8) and Berezikov's predicted miRNA genes (9); TarBase (10) known miRNA targets; and predicted miRNA targets from RNAhybrid (11), PicTar 4 species and 5 species (12), miRBase targets and T-ScanS miRNA targets (13). Our custom tracks can be downloaded with the Table browser, which can be accessed via the Table menu of the UCSC Genome Browser.
In the near future, fRNAdb will include more transcripts from other sequence databases and from non-coding gene prediction results. For example, Human Accelerated Regions (14) are currently included as a custom track of the Genome Browser, and the sequences of these non-coding gene candidates will be included in fRNAdb. We will also add more attributes to fRNAdb, especially attributes representing expression patterns of the transcripts or of protein genes related to the transcripts.
| ACKNOWLEDGEMENTS |
|---|
This research is partially supported by the Functional RNA project funded by the Ministry of Economy, Trade and Industry (METI). We thank Dr. Paul Horton for his kind help. Funding to pay the Open Access publication charges for this article was provided by the National Institute of Advanced Industrial Science and Technology (AIST).
Conflict of interest statement. None declared.
| REFERENCES |
|---|
- Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K.O., Barrero, R.A., Tamura, T., Yamaguchi-Kabata, Y., Tanino, M., et al. (2004) Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol., 2, 856-875.
- Liu, C., Bai, B., Skogerbo, G., Cai, L., Deng, W., Zhang, Y., Bu, D., Zhao, Y., Chen, R. (2005) NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res., 33, D112-D115.
- Pang, K.C., Stephen, S., Engstrom, P.G., Tajul-Arifin, K., Chen, W., Wahlestedt, C., Lenhard, B., Hayashizaki, Y., Mattick, J.S. (2005) RNAdb - a comprehensive mammalian noncoding RNA database. Nucleic Acids Res., 33, D125-D130.
- Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., et al. (2006) The UCSC Genome Browser Database: update 2006. Nucleic Acids Res., 34, D590-D598.
- Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., et al. (2005) Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science, 308, 1149-1154.
- Szymanski, M., Erdmann, V.A., Barciszewski, J. (2003) Noncoding regulatory RNAs database. Nucleic Acids Res., 31, 429-431.
- Griffiths-Jones, S. (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res., 34, D140-D144.
- Hsu, P.W., Huang, H.D., Hsu, S.D., Lin, L.Z., Tsou, A.P., Tseng, C.P., Stadler, P.F., Washietl, S., Hofacker, I.L. (2006) miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genome. Nucleic Acids Res., 34, D135-D139.
- Berezikov, E., Guryev, V., van de Belt, J., Wienholds, E., Plasterk, R.H., Cuppen, E. (2005) Phylogenetic shadowing and computational identification of human microRNA genes. Cell, 120, 21-24.
- Sethupathy, P., Corda, B., Hatzigeorgiou, A.G. (2006) TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA, 12, 192-197.
- Krüger, J. and Rehmsmeier, M. (2006) RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res., 34, W451-W454.
- Krek, A., Grun, D., Poy, M.N., Wolf, R., Rosenberg, L., Epstein, E.J., MacMenamin, P., da Piedade, I., Gunsalus, K.C., Stoffel, M., et al. (2005) Combinatorial microRNA target predictions. Nature Genet., 37, 495-500.
- Lewis, B.P., Burge, C.B., Bartel, D.P. (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120, 15-20.
- Pollard, K.S., Salama, S.R., Lambert, N., Lambot, M.A., Coppens, S., Pedersen, J.S., Katzman, S., King, B., Onodera, C., Siepel, A., et al. (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature, 443, 167-172.
- Washietl, S., Hofacker, I.L., Lukasser, M., Huttenhofer, A., Stadler, P.F. (2005) Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol., 23, 1383-1390.
- Furuno, M., Pang, K.C., Ninomiya, N., Fukuda, S., Frith, M.C., Bult, C., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., et al. (2006) Clusters of internally primed transcripts reveal novel long noncoding RNAs. PLoS Genet., 2, e37.
- Chen, J., Sun, M., Kent, W.J., Huang, X., Xie, H., Wang, W., Zhou, G., Shi, R.Z., Rowley, J.D. (2004) Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res., 32, 4812-4820.
- Lowe, T.M. and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25, 955-964.
- Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S., Haussler, D. (2004) Ultraconserved elements in the human genome. Science, 304, 1321-1325.
- Simons, C., Pheasant, M., Makunin, I.V., Mattick, J.S. (2005) Transposon-free regions in mammalian genome. Genome Res., 16, 164-172.