Nucleic Acids Research, 2001, Vol. 29, No. 19 3928-3938
© 2001 Oxford University Press
A computational approach to identify genes for functional RNAs in genomic sequences
Computational and Theoretical Biology Department, Physical Biosciences Division and ¹National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
Currently there is no successful computational approach for identification of genes encoding novel functional RNAs (fRNAs) in genomic sequences. We have developed a machine learning approach using neural networks and support vector machines to extract common features among known RNAs for prediction of new RNA genes in the unannotated regions of prokaryotic and archaeal genomes. The Escherichia coli genome was used for development, but we have applied this method to several other bacterial and archaeal genomes. Networks based on nucleotide composition were 80–90% accurate in jackknife testing experiments for bacteria and 90–99% for hyperthermophilic archaea. We also achieved a significant improvement in accuracy by combining these predictions with those obtained using a second set of parameters consisting of known RNA sequence motifs and the calculated free energy of folding. Several known fRNAs not included in the training datasets were identified as well as several hundred predicted novel RNAs. These studies indicate that there are many unidentified RNAs in simple genomes that can be predicted computationally as a precursor to experimental study. Public access to our RNA gene predictions and an interface for user predictions is available via the web.
* To whom correspondence should be addressed. Tel: +1 510 486 4305; Fax: +1 510 486 6059; Email: srholbrook{at}lbl.gov
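The composition-based classification and jackknife testing mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the exact input features or network architecture, so everything below is an assumption made for illustration: the feature vector (mono- and dinucleotide frequencies), the helper names (`composition_features`, `nearest_centroid_predict`, `jackknife_accuracy`), and the toy sequences are all hypothetical. A simple nearest-centroid classifier stands in for the neural networks and support vector machines used in the paper, purely to keep the example self-contained.

```python
from itertools import product

ALPHABET = "ACGT"
# All 16 dinucleotides in a fixed order (assumed feature layout, not from the paper).
DINUCS = ["".join(p) for p in product(ALPHABET, repeat=2)]


def composition_features(seq):
    """Return 4 mononucleotide + 16 dinucleotide frequencies for a sequence."""
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    if len(seq) < 2:
        raise ValueError("sequence must contain at least two nucleotides")
    mono = [seq.count(base) / len(seq) for base in ALPHABET]
    # Overlapping dinucleotides: positions (0,1), (1,2), ...
    pairs = [seq[i:i + 2] for i in range(len(seq) - 1)]
    di = [pairs.count(d) / len(pairs) for d in DINUCS]
    return mono + di


def centroid(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier: assign x to the class with the closest mean vector."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        c = centroid(rows)
        dist = sum((a - b) ** 2 for a, b in zip(x, c))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def jackknife_accuracy(X, y):
    """Leave-one-out testing: hold out each example once, train on the rest."""
    correct = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if nearest_centroid_predict(train_X, train_y, X[i]) == y[i]:
            correct += 1
    return correct / len(X)


# Toy data only: label 1 = "RNA-like", label 0 = "background" (made-up sequences).
toy_positive = ["GCGCGCGC", "GGCCGGCC", "CGCGGCGC"]
toy_negative = ["ATATATAT", "AATTAATT", "TATAATAT"]
X = [composition_features(s) for s in toy_positive + toy_negative]
y = [1] * len(toy_positive) + [0] * len(toy_negative)
accuracy = jackknife_accuracy(X, y)
```

In a realistic setting the positive examples would be known fRNA sequences and the negatives would be drawn from other genomic regions, and the stand-in classifier would be replaced by a trained network or SVM; the jackknife loop itself would stay the same.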