Nucleic Acids Research Advance Access originally published online on April 9, 2009
Nucleic Acids Research 2009 37(11):3569-3579; doi:10.1093/nar/gkp220
Nucleic Acids Research, 2009, Vol. 37, No. 11 3569-3579
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology
Accurate prediction of NAGNAG alternative splicing
1Leibniz Institute for Age Research – Fritz Lipmann Institute, Genome Analysis, Beutenbergstrasse 11, 07745 Jena, 2Albert-Ludwigs-University, Institute of Computer Science, Bioinformatics Group, Georges-Koehler-Allee 106, 79110 Freiburg, 3Friedrich-Schiller-University, Faculty of Biology and Pharmacy, Department of Bioinformatics, Ernst-Abbe-Platz 2, 07743 Jena and 4Leibniz Institute for Natural Product Research and Infection Biology, Hans-Knöll-Institute (HKI), Systems Biology/Bioinformatics, Beutenbergstrasse 11a, 07745 Jena, Germany
*To whom correspondence should be addressed. Tel: +49 (0) 761 203 7461; Fax: +49 (0) 761 203 7462; Email: backofen{at}informatik.uni-freiburg.de
Received October 6, 2008. Revised March 17, 2009. Accepted March 19, 2009.
Alternative splicing (AS) involving NAGNAG tandem acceptors is an evolutionarily widespread class of AS. Recent predictions of alternative acceptor usage reported better results for acceptors separated by larger distances than for NAGNAGs. To improve the latter, we used Bayesian networks (BN) and extensive experimental validation of the predictions. Using carefully constructed training and test datasets, a balanced sensitivity and specificity of 92% was achieved. A BN trained on the combined dataset was then used to make predictions, and 81% (38/47) of the experimentally tested predictions were verified. Applying a BN learned on human data to six other genomes, we show that while the performance for the vertebrate genomes matches that achieved on human data, there is a slight drop for Drosophila and worm. Lastly, using the prediction accuracy according to experimental validation, we estimate the number of yet undiscovered alternative NAGNAGs. State-of-the-art classifiers can produce highly accurate predictions of AS at NAGNAGs, indicating that we have identified the major features of the NAGNAG-splicing code within the splice site and its immediate neighborhood. Our results suggest that the mechanism behind NAGNAG AS is simple, stochastic, and conserved among vertebrates and beyond.
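As an illustration of the quantities named in the abstract, the following minimal Python sketch locates NAGNAG hexamers in a nucleotide string and computes sensitivity, specificity and the reported validation rate. It is not the authors' Bayesian network or pipeline. The function names `find_nagnag`, `sensitivity` and `specificity` and the example sequence are hypothetical; only the counts 38 and 47 come from the text.

```python
import re

# N stands for any nucleotide, so NAGNAG is [ACGT]AG[ACGT]AG.
# The lookahead lets overlapping hexamers (e.g. in CAGCAGCAG) all be reported.
NAGNAG = re.compile(r"(?=([ACGT]AG[ACGT]AG))")


def find_nagnag(seq):
    """Return (0-based start, hexamer) for every NAGNAG motif in seq."""
    return [(m.start(), m.group(1)) for m in NAGNAG.finditer(seq.upper())]


def sensitivity(tp, fn):
    """Fraction of true alternative NAGNAGs that are predicted as alternative."""
    return tp / (tp + fn)


def specificity(tn, fp):
    """Fraction of non-alternative NAGNAGs that are predicted as non-alternative."""
    return tn / (tn + fp)


# Hypothetical sequence, for illustration only.
hits = find_nagnag("ttcagcaggt")

# Validation rate stated in the abstract: 38 of 47 tested predictions verified.
validated_percent = round(100 * 38 / 47)
```

Note that the scan only finds candidate motifs by sequence pattern. Deciding whether a given NAGNAG is alternatively spliced is the classification task that the abstract addresses with a Bayesian network.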
Present address: Michael Hiller, Department of Developmental Biology, Stanford University, Stanford, CA 94305, USA.
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.