Nucleic Acids Research Advance Access published online on November 11, 2006
Nucleic Acids Research, doi:10.1093/nar/gkl837
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
fRNAdb: a platform for mining/annotating functional RNA candidates from non-coding RNA sequences
Taishin Kin1,*,
Kouichirou Yamada3,
Goro Terai2,
Hiroaki Okida2,
Yasuhiko Yoshinari4,
Yukiteru Ono3,
Aya Kojima2,
Yuki Kimura3,
Takashi Komori2 and
Kiyoshi Asai1,5
1 Computational Biology Research Center, National Institute of Advanced Industrial Science and Technology (AIST) Aomi 2-42, Koto-ku, Tokyo 135-0064, Japan
2 Intec Web and Genome Informatics, 1-3-3 Shinsuna Koto-ku, Tokyo 136-0075, Japan
3 Information and Mathematical Science Laboratory, 1-5-21 Oh-tsuka, Bunkyo-ku, Tokyo 112-0012, Japan
4 Mitsubishi Research Institute, 2-3-6 O-temachi, Chiyoda-ku, Tokyo 100-8140, Japan
5 Department of Computational Biology, Graduate School of Frontier Sciences University of Tokyo, 5-1-5 Kashiwa-no-ha, Chiba 277-8583, Japan
*To whom correspondence should be addressed. Tel: +81 3 355 8059; Fax: +81 3 355 8081; Email: kin-taushin{at}aist.go.jp
Received August 15, 2006. Revised September 20, 2006. Accepted October 6, 2006.
ABSTRACT
There is an abundance of transcripts that code for no particular
protein and that remain functionally uncharacterized. Some of
these transcripts may have novel functions while others might
be junk transcripts. Unfortunately, the experimental validation
of such transcripts to find functional non-coding RNA candidates
is very costly. Therefore, our primary interest is to computationally
mine candidate functional transcripts from a pool of uncharacterized
transcripts. We introduce fRNAdb: a novel database service that
hosts a large collection of non-coding transcripts including
annotated/non-annotated sequences from the H-inv database, NONCODE
and RNAdb. A set of computational analyses has been performed
on the included sequences. These analyses include RNA secondary
structure motif discovery, EST support evaluation, cis-regulatory
element search and protein homology search. fRNAdb provides
an efficient interface that helps users filter transcripts
by their own criteria to identify functional RNA candidates.
fRNAdb is available at
http://www.ncrna.org/
INTRODUCTION
fRNAdb is a database that helps in annotating non-coding transcripts
acquired from three publicly available databases: H-inv, human full-length
non-coding cDNAs (1); NONCODE, experimentally validated non-coding
transcripts (2); and RNAdb, non-coding transcripts curated from
the literature, the human chromosome 7 project and the RIKEN antisense
pipeline, together with other putative non-coding RNAs (3). Details are
shown in Table 1. Each transcript is analyzed for various features,
such as maximum ORF length, the number of protein homologs,
the average conservation score, transcription regulatory element
motifs and the existence of CpG islands (listed in Table 2),
that help in selecting promising non-coding candidates.
Transcripts can be filtered with fRNAdb's main listing interface
in many different ways (see Figure 1). This main listing interface
is linked to our custom UCSC Genome Browser (4) for functional
RNAs, which is equipped with original custom tracks designed for
screening functional RNAs. Users can inspect
a transcript of interest in a genomic view with rich genomic
information surrounding the mapped transcript. The information
includes the UCSC original tracks, such as known genes, genome
conservation and Affymetrix transcriptome tracks (5), and our
original tracks, such as conserved potential secondary structure,
existence of known RNA secondary structure motifs and significant
RNA secondary structure Z-score regions (for details see Table 3).
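One of the listed features, maximum ORF length, can be illustrated with a
minimal sketch. The article does not define how fRNAdb computes this value,
so the definition used here is an assumption: the longest run of codons
without a stop codon, taken over all six reading frames. The function names
are hypothetical and are not part of fRNAdb.

```python
# Assumption: an "ORF" is the longest stop-free run of codons in any of the
# six reading frames. fRNAdb's actual definition is not given in the article.
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]


def max_orf_length_aa(seq: str) -> int:
    """Length, in codons, of the longest stop-free run over six frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well
    best = 0
    for strand in (seq, reverse_complement(seq)):
        for frame in range(3):
            run = 0
            # Walk complete codons only; a trailing partial codon is ignored.
            for i in range(frame, len(strand) - 2, 3):
                if strand[i:i + 3] in STOP_CODONS:
                    run = 0
                else:
                    run += 1
                    best = max(best, run)
    return best
```

Under this definition, a transcript would satisfy a "tiny ORF" style criterion
when `max_orf_length_aa(seq)` falls below the chosen threshold.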
fRNAdb
fRNAdb provides two types of interfaces. The first page presents
a list of all transcripts rendered as a table with 35 columns,
including ones for the attributes described in Table 2 (Figure 1B).
A tabbed control panel is placed above the table, with
five tabs labeled Basic, DB/ID, Expert, Sort and Column
(Figure 1A). The Basic tab contains the basic filters: a collection
of frequently used filters that provide simple and quick selection
of transcripts that match common criteria of functional non-coding
RNAs. For example, checking Mapped selects only
genome-mapped transcripts; Well conserved at best (Max
> 50%) selects transcripts whose maximum conservation
score among 17 vertebrates (4) exceeds 50% in their exonic regions;
EST-supported requires reliable expression evidence;
Tiny ORF (<40 aa) enriches for non-coding
transcripts; Low Repeat Coverage (<30%) limits
repeat element contamination; No protein homolog
is another condition that enriches for non-coding transcripts;
and No overlapping known gene removes the
possibility that the transcript is part of a protein-coding gene
transcript. After the boxes are checked, clicking the refresh button
runs the filter and presents the results. Our example conditions
yield nine hits, including one H-inv non-protein coding cDNA
and eight RNAdb literature-curated miRNAs. In other words, these
criteria match real functional RNAs and also indicate that one
non-coding transcript shares the same properties. Clicking on
the ID of this transcript produces a detailed view of the transcript,
shown in Figure 2. This feature visualizer shows a graphical
representation of a variety of sequence elements found in the
transcript, including cis-regulatory elements, repeat elements,
EST mapping regions and six-frame stop codon positions. There are
many different ways to filter these non-coding transcripts, and
there are many more potential candidates hidden in this dataset.
More details of the basic filters are provided on the website.
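The combination of basic filters in the example above amounts to a
conjunction of simple per-transcript conditions. The following is a minimal
sketch of that logic, not fRNAdb's implementation: the record type and its
field names are hypothetical, and only the thresholds (>50% maximum
conservation, <40 aa ORF, <30% repeat coverage) are taken from the text.

```python
from dataclasses import dataclass


@dataclass
class Transcript:
    """Hypothetical record; field names are illustrative, not fRNAdb columns."""
    transcript_id: str
    mapped: bool                 # mapped to the genome
    max_conservation: float      # maximum conservation score, in percent
    est_supported: bool          # has EST support
    max_orf_aa: int              # maximum ORF length, in amino acids
    repeat_coverage: float       # repeat element coverage, in percent
    protein_homologs: int        # number of protein homologs found
    overlaps_known_gene: bool    # overlaps a known gene


def passes_basic_filters(t: Transcript) -> bool:
    """True if the transcript meets all seven example criteria from the text."""
    return (
        t.mapped
        and t.max_conservation > 50.0
        and t.est_supported
        and t.max_orf_aa < 40
        and t.repeat_coverage < 30.0
        and t.protein_homologs == 0
        and not t.overlaps_known_gene
    )


# Made-up records, for illustration only.
records = [
    Transcript("EXAMPLE_A", True, 72.0, True, 25, 5.0, 0, False),
    Transcript("EXAMPLE_B", True, 72.0, True, 120, 5.0, 3, True),
]
candidates = [t.transcript_id for t in records if passes_basic_filters(t)]
```

Expressing each checkbox as one boolean term makes it easy to see that
unchecking a box simply drops the corresponding term from the conjunction.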
The rest of the tabs offer additional functionality to further
improve usability. The DB/ID tab contains DB selection and ID
selection boxes. The DB selection box allows you to limit the
target databases to a subset of the currently available databases:
H-inv, NONCODE and RNAdb. The ID selection box lets you choose target
transcripts that match given string patterns. For example, specifying
FR000001 (an fRNAdb ID) in this box limits the target
to transcript FR000001 alone. The wild-card % is
allowed for pattern matching. Specifying LIT%
lets you limit the search to targets whose original IDs start
with LIT. The string pattern is matched against the
ID, Acc. and Original columns. The Expert tab provides an interface
for specifying multiple conditions, which lets you perform more complex
filtering than the basic filters allow. Please refer to the website
for more details about the expert filters. The Sort tab has
a sorting interface that lets you sort the table by multiple
sorting keys. The Column tab allows you to limit the visible columns
of the main listing table. Since the 35-column table is too
wide for ordinary browsers to display on a single screen, you
can narrow the table with this interface for better
visibility.
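The ID selection behavior described above can be sketched as follows. This is
an illustration under stated assumptions rather than fRNAdb's actual code: %
is assumed to match any run of characters (as in SQL LIKE), a pattern is
assumed to have to match a whole column value, and matching is assumed to be
case-sensitive. The row dictionary and the function names are hypothetical;
only the column names ID, Acc. and Original come from the text.

```python
import re

# Columns the pattern is matched against, as named in the text.
ID_COLUMNS = ("ID", "Acc.", "Original")


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a %-wildcard pattern into a compiled regular expression."""
    # Escape literal parts so characters such as "." are not treated as regex.
    literal_parts = [re.escape(part) for part in pattern.split("%")]
    return re.compile(".*".join(literal_parts), re.DOTALL)


def matches_id(row: dict, pattern: str) -> bool:
    """True if the pattern matches the full value of any ID-like column."""
    regex = wildcard_to_regex(pattern)
    return any(regex.fullmatch(row.get(col, "")) for col in ID_COLUMNS)


# Made-up row, for illustration only.
row = {"ID": "FR000001", "Acc.": "", "Original": "LIT0000"}
exact_hit = matches_id(row, "FR000001")   # exact ID
prefix_hit = matches_id(row, "LIT%")      # original IDs starting with LIT
```

Escaping the literal parts before joining them keeps the wild-card as the
only special character, so an ID containing a dot or bracket is matched
literally.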
UCSC GENOME BROWSER FOR FUNCTIONAL RNAs
We mirrored the UCSC Genome Browser and added our custom tracks
specific to functional RNAs and miRNAs, as shown in Tables 3 and 4.
Most of the tracks have their own sources and reference
papers. Our original tracks are RNA clusters, Rfam seed folds,
tRNAscan-SE, Ultra Conserved Elements 17way and Z-score (details
are shown in Table 3). In addition, we mapped RNA sequences from
public functional RNA sequence databases, including Erdmann (6),
NONCODE, RNAdb and Rfam. The UCSC Genome Browser has several
tracks for miRNA genes and targets, but we added more tracks,
including miRBase (7) known miRNA genes, miRNAMap (8) and Berezikov's
predicted miRNA genes (9), TarBase (10) known miRNA targets,
and predicted miRNA targets from RNAhybrid (11), PicTar 4 species
and 5 species (12), miRBase targets and T-ScanS miRNA targets
(13). Our custom tracks can be downloaded by using the Table browser,
which can be accessed via the Table menu of the UCSC
Genome Browser.
In the near future, fRNAdb will include more transcripts from
other sequence databases and from non-coding gene prediction results.
For example, the Human Accelerated Region (14) is currently included
as a custom track of our Genome Browser. Sequences of these
non-coding gene candidates will be included in fRNAdb. We will
also add more attributes to fRNAdb, especially attributes representing
the expression patterns of the transcripts or of protein genes related
to the transcripts.
ACKNOWLEDGEMENTS
This research is partially supported by the Functional RNA project
funded by the Ministry of Economy, Trade and Industry (METI). We
thank Dr. Paul Horton for his kind help. Funding to pay the
Open Access publication charges for this article was provided
by the National Institute of Advanced Industrial Science and Technology
(AIST).
Conflict of interest statement. None declared.
REFERENCES
1. Imanishi, T., Itoh, T., Suzuki, Y., O'Donovan, C., Fukuchi, S., Koyanagi, K.O., Barrero, R.A., Tamura, T., Yamaguchi-Kabata, Y., Tanino, M., et al. (2004) Integrative annotation of 21,037 human genes validated by full-length cDNA clones. PLoS Biol., 2, 856–875.
2. Liu, C., Bai, B., Skogerbo, G., Cai, L., Deng, W., Zhang, Y., Bu, D., Zhao, Y., Chen, R. (2005) NONCODE: an integrated knowledge database of non-coding RNAs. Nucleic Acids Res., 33, D112–D115.
3. Pang, K.C., Stephen, S., Engstrom, P.G., Tajul-Arifin, K., Chen, W., Wahlestedt, C., Lenhard, B., Hayashizaki, Y., Mattick, J.S. (2005) RNAdb - a comprehensive mammalian noncoding RNA database. Nucleic Acids Res., 33, D125–D130.
4. Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., et al. (2006) The UCSC Genome Browser Database: update 2006. Nucleic Acids Res., 34, D590–D598.
5. Cheng, J., Kapranov, P., Drenkow, J., Dike, S., Brubaker, S., Patel, S., Long, J., Stern, D., Tammana, H., Helt, G., et al. (2005) Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science, 308, 1149–1154.
6. Szymanski, M., Erdmann, V.A., Barciszewski, J. (2003) Noncoding regulatory RNAs database. Nucleic Acids Res., 31, 429–431.
7. Griffiths-Jones, S. (2006) miRBase: microRNA sequences, targets and gene nomenclature. Nucleic Acids Res., 34, D140–D144.
8. Hsu, P.W., Huang, H.D., Hsu, S.D., Lin, L.Z., Tsou, A.P., Tseng, C.P., Stadler, P.F., Washietl, S., Hofacker, I.L. (2006) miRNAMap: genomic maps of microRNA genes and their target genes in mammalian genome. Nucleic Acids Res., 34, D135–D139.
9. Berezikov, E., Guryev, V., van de Belt, J., Wienholds, E., Plasterk, R.H., Cuppen, E. (2005) Phylogenetic shadowing and computational identification of human microRNA genes. Cell, 120, 21–24.
10. Sethupathy, P., Corda, B., Hatzigeorgiou, A.G. (2006) TarBase: a comprehensive database of experimentally supported animal microRNA targets. RNA, 12, 192–197.
11. Krüger, J. and Rehmsmeier, M. (2006) RNAhybrid: microRNA target prediction easy, fast and flexible. Nucleic Acids Res., 34, W451–W454.
12. Krek, A., Grün, D., Poy, M.N., Wolf, R., Rosenberg, L., Epstein, E.J., MacMenamin, P., da Piedade, I., Gunsalus, K.C., Stoffel, M., et al. (2005) Combinatorial microRNA target predictions. Nature Genet., 37, 495–500.
13. Lewis, B.P., Burge, C.B., Bartel, D.P. (2005) Conserved seed pairing, often flanked by adenosines, indicates that thousands of human genes are microRNA targets. Cell, 120, 15–20.
14. Pollard, K.S., Salama, S.R., Lambert, N., Lambot, M.A., Coppens, S., Pedersen, J.S., Katzman, S., King, B., Onodera, C., Siepel, A., et al. (2006) An RNA gene expressed during cortical development evolved rapidly in humans. Nature, 443, 167–172.
15. Washietl, S., Hofacker, I.L., Lukasser, M., Huttenhofer, A., Stadler, P.F. (2005) Mapping of conserved RNA secondary structures predicts thousands of functional noncoding RNAs in the human genome. Nat. Biotechnol., 23, 1383–1390.
16. Furuno, M., Pang, K.C., Ninomiya, N., Fukuda, S., Frith, M.C., Bult, C., Kai, C., Kawai, J., Carninci, P., Hayashizaki, Y., et al. (2006) Clusters of internally primed transcripts reveal novel long noncoding RNAs. PLoS Genet., 2, e37.
17. Chen, J., Sun, M., Kent, W.J., Huang, X., Xie, H., Wang, W., Zhou, G., Shi, R.Z., Rowley, J.D. (2004) Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res., 32, 4812–4820.
18. Lowe, T.M. and Eddy, S.R. (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res., 25, 955–964.
19. Bejerano, G., Pheasant, M., Makunin, I., Stephen, S., Kent, W.J., Mattick, J.S., Haussler, D. (2004) Ultraconserved elements in the human genome. Science, 304, 1321–1325.
20. Simons, C., Pheasant, M., Makunin, I.V., Mattick, J.S. (2005) Transposon-free regions in mammalian genome. Genome Res., 16, 164–172.
