Nucleic Acids Research Advance Access originally published online on November 15, 2006
Nucleic Acids Research 2007 35(Database issue):D93-D98; doi:10.1093/nar/gkl884
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species
Department of Chemistry and Biochemistry, Center for Computational Biology, Institute for Genomics and Proteomics, Molecular Biology Institute, University of California Los Angeles, Los Angeles, CA 90095-1570, USA
1 Department of Biomathematics, David Geffen School of Medicine, University of California Los Angeles, Los Angeles, CA 90095, USA
*To whom correspondence should be addressed. Tel: +1 310 825 7374; Fax: +1 310 206 7286; Email: leec{at}mbi.ucla.edu
Received September 15, 2006. Revised October 10, 2006. Accepted October 11, 2006.
| ABSTRACT |
|---|
We have greatly expanded the Alternative Splicing Annotation Project (ASAP) database: (i) its human alternative splicing data are expanded approximately 3-fold over the previous ASAP database, to nearly 90 000 distinct alternative splicing events; (ii) it now provides genome-wide alternative splicing analyses for 15 vertebrate, insect and other animal species; (iii) it provides comprehensive comparative genomics information for comparing alternative splicing and splice site conservation across 17 aligned genomes, based on UCSC multigenome alignments; (iv) it provides an approximately 2- to 3-fold expansion in detection of tissue-specific alternative splicing events, and of cancer versus normal specific alternative splicing events. We have also constructed a novel database linking orthologous exons and orthologous introns between genomes, based on multigenome alignment of 17 animal species. It can be a valuable resource for studies of gene structure evolution. ASAP II provides a new web interface enabling more detailed exploration of the data, and integrating comparative genomics information with alternative splicing data. We provide a set of tools for advanced data-mining of ASAP II with Pygr (the Python Graph Database Framework for Bioinformatics), including powerful features such as graph query and multigenome alignment query. ASAP II is available at http://www.bioinformatics.ucla.edu/ASAP2.

| INTRODUCTION |
|---|
Alternative splicing plays an important role in protein diversity and gene regulation (1–3). Recent studies on alternative splicing estimate that 40–70% of human genes are alternatively spliced (4–6). Moreover, many splice variants alter the function of the protein product, and are involved in human diseases (7). Thus, alternative splicing is an important medical target for development of novel diagnostics and therapeutic drugs (8).
Genome-wide analyses of alternative splicing are mainly based on publicly available sequence databases such as GenBank (9) and Swiss-Prot/TrEMBL. HOLLYWOOD (10) and ASD (11) give comprehensive analyses of alternative splicing for human and mouse. Notably, those two databases provide comparative studies between human and mouse. Lee et al. (12) constructed DEDB for genome-wide analysis of alternative splicing in Drosophila melanogaster. In addition to alternative splicing analysis, ECgene (13,14) gives comprehensive results for functional annotation of proteins and expression analysis. Furthermore, it has recently been expanded to nine species.
The Alternative Splicing Annotation Project (ASAP) database (15) is a widely used resource providing a genome-wide analysis of human alternative splicing and tissue-specific splicing (4,16–20) based on expressed sequence tag (EST), messenger RNA (mRNA) and genome sequences. It has served as the basis for a wide variety of studies (21–28).
Here we describe a major expansion of the ASAP database, designed to make it a good resource for analyzing and comparing alternative splicing between a wide range of animal genomes. Whereas the original release of ASAP focused entirely on human data, we have now included genome-wide analyses of alternative splicing for 15 animal species from human to nematodes. Furthermore, we have added a new dimension of comparative genomics tools, for comparing alternative splicing patterns, conservation of splice sites, exons and introns, across 17 animal genomes.
| MATERIALS AND METHODS |
|---|
We downloaded UniGene (29), GenBank (9) and Entrez Genes (30) from the NCBI FTP site (UniGene: ftp://ftp.ncbi.nih.gov/repository/UniGene/; GenBank: ftp://ftp.ncbi.nih.gov/genbank/; Entrez Genes: ftp://ftp.ncbi.nih.gov/gene/) in January 2006. Genome assembly sequences, RefSeq (31)/mRNA alignments and RepeatMasker tracks were downloaded from the UCSC genome browser, except for the yellow fever mosquito genome, which was obtained from the Ensembl genome browser (32). Multigenome alignments for human (hg17), mouse (mm7), chicken (galGal2), fruit fly (dm2), zebrafish (danRer3) and western clawed frog (xenTro1) were downloaded from the UCSC genome browser.
To update the lists of tissue-specific and cancer versus normal specific genes for human, we downloaded EST library information from UniLib (ftp://ftp.ncbi.nih.gov/repository/UniLib/). A total of 2895 new human EST libraries were classified and added to the existing 47 tissue categories and normal/tumor types. In total, 8828 human EST libraries were classified into 47 tissues and normal/tumor. We used the same method as Xu and Lee (19) to calculate LOD values for tissue specificity and normal versus cancer specificity.
Orthologous exons, introns and splice site sequences were extracted using Pygr, which gives sub-millisecond access to any location of any genome in the multigenome alignments. Moreover, Pygr can be easily integrated with the ASAP II database; more detailed information will be available at the ASAP II website.
We defined exons (or introns) from two species as orthologous if at least one of their splice sites (for introns, the splice sites of the flanking exons) is exactly aligned in the multigenome alignments. This strategy increases the chance of finding orthologous exons, because the exons can lie within well-conserved blocks of the multigenome alignments. Conventional protein similarity-based methods can identify orthologous genes only if protein sequences are available. Moreover, the multigenome alignment-based method enables us to interpret how alternatively spliced exons and introns have evolved across distant species.
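The orthology criterion above can be illustrated with a minimal Python sketch. It is not the Pygr-based implementation used for ASAP II: the exon representation, the `aligned` position map and the function name are all hypothetical, and both exons are assumed to be on the same strand orientation in the alignment.

```python
from typing import Dict, Tuple

# Hypothetical representation: an exon is (start, end) on one genome.
Exon = Tuple[int, int]


def shares_splice_site(exon_a: Exon, exon_b: Exon, aligned: Dict[int, int]) -> bool:
    """Return True if at least one boundary of exon_a aligns exactly to the
    corresponding boundary of exon_b.

    `aligned` maps a position in species A to the exactly aligned position in
    species B; positions in gaps or unaligned regions are simply absent.
    """
    start_a, end_a = exon_a
    start_b, end_b = exon_b
    # dict.get returns None for unaligned positions, which never equals an int.
    return aligned.get(start_a) == start_b or aligned.get(end_a) == end_b


# Toy example: only the upstream boundary is exactly aligned.
toy_alignment = {100: 1100, 200: 1210}
print(shares_splice_site((100, 200), (1100, 1250), toy_alignment))
```

Requiring only one shared boundary, rather than both, is what lets exons with a shifted alternative splice site still be linked as orthologous.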
| RESULTS AND DISCUSSION |
|---|
Alternative splicing analyses
Compared with the previous release of ASAP (15), ASAP II provides an approximately 3-fold expansion in human alternative splicing events, to a total of 89 078 distinct alternative splicing relationships in human, detected within 11 717 genes (UniGene clusters). Out of the total set of multi-exon genes (22 220), 53% were detected to contain alternative splicing (Table 1). Focusing on genes with at least one mRNA sequence (for which our gene model is therefore likely to be full-length, and which generally have higher EST coverage), 75% (10 202 out of 13 690) were detected to contain alternative splicing. The continuing rapid growth in alternative splicing detection as a function of increased EST and mRNA counts suggests that the field is still far from saturation, and that far more experimental data will be required to obtain a complete catalog of human alternative splicing.
Another major change is the addition of alternative splicing analyses for 14 new animal genomes (Table 1), ranging from mammals, birds and fish, to insects, C.elegans and Ciona. This provides a very large dataset of non-human alternative splicing events (a total of 67 095 alternative splicing relationships, over three-quarters the size of the human alternative splicing dataset). However, due to the limited EST coverage for many animal genomes (e.g. Fugu, honeybee), these data cannot be considered comprehensive. The numbers of mapped UniGene clusters can be lower than expected for Ciona, Fugu and yellow fever mosquito because their genome assemblies are incomplete. For mouse, 8711 (53%) out of 16 404 multi-exon genes were detected to contain alternative splicing, as were 60% (8203 out of 13 626) of genes with at least one mRNA. Twenty-five percent of rat, 22% of western clawed frog, 22% of chicken, 26% of cow and 19% of fruit fly multi-exon genes were detected to contain alternative splicing. The proportions of alternatively spliced multi-exon genes for C.elegans (6%) and African malaria mosquito (8%) were lower than those for mammals. Alternative splicing analyses of the 15 most sequenced species extend the scope of research from human to nematodes, and support comparative and evolutionary studies between distantly related species.
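The percentages quoted for human and mouse follow directly from the gene counts given in the text. A short arithmetic check in Python, where the dictionary layout and labels are only for illustration:

```python
# (alternatively spliced genes, total genes), as quoted in the text
counts = {
    "human, multi-exon genes": (11717, 22220),
    "human, genes with >=1 mRNA": (10202, 13690),
    "mouse, multi-exon genes": (8711, 16404),
    "mouse, genes with >=1 mRNA": (8203, 13626),
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest integer."""
    return round(100 * part / whole)


percentages = {label: percent(a, t) for label, (a, t) in counts.items()}

# Non-human alternative splicing relationships relative to the human dataset
non_human_ratio = 67095 / 89078

for label, value in percentages.items():
    print(f"{label}: {value}%")
print(f"non-human / human relationships: {non_human_ratio:.2f}")
```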
As an illustration of ASAP II's value for biological discovery, we performed analyses of tissue specificity and cancer versus normal specificity of human alternative splice forms. ASAP II identified approximately 2- to 3-fold more tissue-specific splice forms than the previous ASAP release (19,20). We added 2895 new EST libraries to our tissue classification database (Materials and Methods): each library source was classified as one of 47 tissue types, and also as tumor versus normal in origin. We found 1709 high-confidence (LOD ≥ 3) tissue-specific alternative splicing relationships from 960 genes, and 273 high-confidence (LOD ≥ 3) cancer-specific relationships from 198 genes. The largest categories of tissue-specific splice forms were identified from brain/nerve, testis, skin, muscle and lymph. Users can download all EST library classification and log-odds (LOD) calculation results from the ASAP II download page and mine them for their own experimental candidates.
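Users who download the LOD results could select high-confidence candidates with a filter along the following lines. This is only a sketch: the record fields, gene names and values are hypothetical and do not reflect the actual ASAP II download format, and the LOD scores themselves are assumed to be precomputed (the calculation of Xu and Lee is not reproduced here). The minimum EST count of three is taken from the web interface description.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class SpliceRelationship:
    """Hypothetical record for one tissue-specific splicing relationship."""
    gene: str
    tissue: str
    lod: float       # precomputed log-odds score
    est_count: int   # number of supporting EST sequences


def high_confidence(records: Iterable[SpliceRelationship],
                    min_lod: float = 3.0,
                    min_ests: int = 3) -> List[SpliceRelationship]:
    """Keep relationships with LOD >= min_lod and at least min_ests ESTs."""
    return [r for r in records if r.lod >= min_lod and r.est_count >= min_ests]


# Toy data, invented for illustration only
toy = [
    SpliceRelationship("geneA", "brain", 4.2, 7),
    SpliceRelationship("geneB", "testis", 3.5, 2),   # too few ESTs
    SpliceRelationship("geneC", "muscle", 1.8, 12),  # LOD too low
]
selected = high_confidence(toy)
print([r.gene for r in selected])
```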
Comparative genomics analyses
To help researchers easily compare alternative splicing data between species, we performed a comprehensive comparative genomics analysis across 17 genomes (Table 2), identifying orthologous exons, introns and alternative splice events between these genomes. As a separate analysis that is valid even when the target genome has little or no alternative splicing data, we also analyzed the conservation of alternative exons and splice sites across 17 genomes. To do this, we used the well-established and characterized multigenome alignments (33) constructed for the UCSC genome browser (34). Orthologous exons and introns were defined by sharing at least one splice site in the multigenome alignments (Materials and Methods). Of 129 981 human internal exons, 85 673 (66%) have at least one orthologous exon identified in the hg17-referenced 17-species multigenome alignment. The total numbers of orthologous exons found using five different multigenome alignments are summarized in Table 2. This method can give a more comprehensive database of orthologous genes than the conventional protein similarity-based method. Furthermore, we constructed a multigenome splice site database from the UCSC multigenome alignments (Figure 1D). These data allow users both to compare observed splicing patterns between experimental data for different species, and to study the evolution of alternative exons and splice sites (by looking at their conservation) even in genomes for which no splicing data are available.
Database mining and tools
Users can mine ASAP II in several ways:
- by using the web interface (below);
- by downloading it as MySQL tables and performing SQL queries;
- by using Python tools that work directly with the ASAP II schema, for graph query of alternative splicing graphs and comparative genomics query of multigenome alignments.
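As a rough illustration of the SQL route, the sketch below counts alternatively spliced genes per species. The table name, column names and rows are hypothetical and do not describe the real ASAP II schema; an in-memory SQLite database stands in for the downloaded MySQL tables so that the example is self-contained.

```python
import sqlite3


def alt_spliced_gene_counts(conn: sqlite3.Connection):
    """Count distinct gene clusters with at least one splicing event, per species."""
    query = """
        SELECT species, COUNT(DISTINCT cluster_id) AS n_genes
        FROM alt_splicing_demo
        GROUP BY species
        ORDER BY species
    """
    return conn.execute(query).fetchall()


# Toy table with invented rows, for illustration only
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE alt_splicing_demo (species TEXT, cluster_id TEXT, event_type TEXT)"
)
conn.executemany(
    "INSERT INTO alt_splicing_demo VALUES (?, ?, ?)",
    [
        ("human", "cluster_A", "exon_skip"),
        ("human", "cluster_A", "alt_5prime"),
        ("human", "cluster_B", "exon_skip"),
        ("mouse", "cluster_C", "alt_3prime"),
    ],
)
result = alt_spliced_gene_counts(conn)
print(result)
```

Against the real download, the same kind of aggregate query would be run through a MySQL client with the table and column names documented on the ASAP II website.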
Web interface
ASAP II can be searched by several different criteria such as gene symbol, gene name and ID [UniGene (29), GenBank (9), etc.]. The web interface provides seven different kinds of views:
- user query, UniGene annotation, orthologous genes and genome browsers;
- genome alignment;
- exons & orthologous exons;
- introns & orthologous introns;
- alternative splicing;
- isoform and protein sequences;
- tissue & cancer versus normal specificity.
... ≥ 3 and at least three EST sequences (19,20). A short introduction to the web interface and a comprehensive user guide are available at the ASAP II website, http://www.bioinformatics.ucla.edu/ASAP2.
Comparative genomics is a major focus of the ASAP II web interface, which displays results from its new orthologous exons and introns database. For example, it displays the multiple alignments of splice site sequences as a phylogenetic tree (Figure 1D), enabling users to infer the evolutionary history of introns at a glance. In Figure 1D, one can easily see that this pair of splice sites appears to have evolved in an early mammalian ancestor, but not before. Many applications are possible. For example, researchers could identify recently evolved splice sites by selecting introns whose canonical splice site sequences (GT/AG) are conserved only within closely related species, but not in distant species.
ASAP II includes links to comparative genomics information from all views. All orthologous genes identified by multigenome alignments are listed in its annotation summary (Figure 1A). If the user clicks Show Orthologous Exons/Introns on any page, detailed information will be shown in a new window (Figure 1B and C).
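The splice site selection described above (canonical GT/AG conserved only in closely related species) might be sketched as follows. The data layout, the species groupings and the function name are assumptions made for illustration; they are not part of the ASAP II tools.

```python
from typing import Dict, Optional, Sequence, Tuple

# Donor and acceptor dinucleotides of a canonical intron
CANONICAL = ("GT", "AG")

# Hypothetical layout: species name -> (donor, acceptor) dinucleotides taken
# from the aligned splice site sequences, or None if the site is unaligned.
SpliceSites = Dict[str, Optional[Tuple[str, str]]]


def is_recently_evolved(sites: SpliceSites,
                        close_species: Sequence[str],
                        distant_species: Sequence[str]) -> bool:
    """True if GT/AG is present in every close species and in no distant one."""
    def canonical(species: str) -> bool:
        return sites.get(species) == CANONICAL

    return (len(close_species) > 0
            and all(canonical(s) for s in close_species)
            and not any(canonical(s) for s in distant_species))


# Toy intron, invented for illustration only
intron = {
    "human": ("GT", "AG"),
    "mouse": ("GT", "AG"),
    "chicken": ("GA", "AG"),
    "zebrafish": None,
}
print(is_recently_evolved(intron, ["human", "mouse"], ["chicken", "zebrafish"]))
```

Which species count as close or distant is left to the caller, since that choice depends on the evolutionary question being asked.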
Comparison with other alternative splicing databases
Alternative splicing analysis results can differ significantly between databases because each database uses different sequence databases, genome assemblies, sequence alignment methods, alignment filtering and stringency, etc. The total numbers of alternatively spliced genes and exons for other databases are summarized in Table 3. ASAP II has more alternatively spliced genes than ASD for human (11 717 versus 9929) and mouse (8711 versus 8211). However, DEDB has more spliced genes than ASAP II (13 222 versus 9683). ECgene has twice as many spliced genes as the other databases, suggesting the use of different stringency criteria for alignment filtering. HOLLYWOOD has more human internal exons than ASAP II (151 199 versus 129 981), but its percentage of alternative exons is significantly lower for human (25% versus 36%) and mouse (13% versus 21%). Presumably, this is because the sequence data used by HOLLYWOOD (January 2004) are older than those used by ASAP II (January 2006).
Update and future directions
ASAP II gives alternative splicing analysis of UniGene data released in January 2006 (Version JAN06). To provide up-to-date alternative splicing analysis, the ASAP II database will be updated within 2 years if the total number of available sequences increases significantly. Availability of a genome assembly is essential for supporting a new species; we will add new species to ASAP II, as well as to the orthologous exon/intron database, when their genome assemblies become publicly available.
We will also develop novel analysis methods for alternative splicing, such as analysis of the evolutionary history of exons and introns, and make them available in ASAP II. We hope that ASAP II can become a useful resource for comparative genomics studies in the post-genome era.
| ACKNOWLEDGEMENTS |
|---|
The authors wish to thank Calvin Pan, Qi Wang and Dr Yi Xing for valuable comments on this work. This work has been supported by NIH grant U54 RR021813, Department of Energy grant DE-FC02-02ER63421, and by a Dreyfus Foundation Teacher-Scholar Award to C.J.L. Funding to pay the Open Access publication charges for this article was provided by NIH grant U54 RR021813.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
- Black, D.L. (2003) Mechanisms of alternative pre-messenger RNA splicing. Annu. Rev. Biochem., 72, 291–336.
- Graveley, B.R. (2001) Alternative splicing: increasing diversity in the proteomic world. Trends Genet., 17, 100–107.
- Maniatis, T. and Tasic, B. (2002) Alternative pre-mRNA splicing and proteome expansion in metazoans. Nature, 418, 236–243.
- Modrek, B., Resch, A., Grasso, C., Lee, C. (2001) Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res., 29, 2850–2859.
- Kan, Z., States, D., Gish, W. (2002) Selecting for functional alternative splices in ESTs. Genome Res., 12, 1837–1845.
- Johnson, J.M., Castle, J., Garrett-Engele, P., Kan, Z., Loerch, P.M., Armour, C.D., Santos, R., Schadt, E.E., Stoughton, R., Shoemaker, D.D. (2003) Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science, 302, 2141–2144.
- Caceres, J.F. and Kornblihtt, A.R. (2002) Alternative splicing: multiple control mechanisms and involvement in human disease. Trends Genet., 18, 186–193.
- Mangasarian, A. (2005) Alternative RNA splicing and drug target identification. IDrugs, 8, 725–729.
- Benson, D.A., Karsch-Mizrachi, I., Lipman, D.J., Ostell, J., Wheeler, D.L. (2006) GenBank. Nucleic Acids Res., 34, D16–D20.
- Holste, D., Huo, G., Tung, V., Burge, C.B. (2006) HOLLYWOOD: a comparative relational database of alternative splicing. Nucleic Acids Res., 34, D56–D62.
- Stamm, S., Riethoven, J.J., Le Texier, V., Gopalakrishnan, C., Kumanduri, V., Tang, Y., Barbosa-Morais, N.L., Thanaraj, T.A. (2006) ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res., 34, D46–D55.
- Lee, B.T., Tan, T.W., Ranganathan, S. (2004) DEDB: a database of Drosophila melanogaster exons in splicing graph form. BMC Bioinformatics, 5, 189.
- Kim, P., Kim, N., Lee, Y., Kim, B., Shin, Y., Lee, S. (2005) ECgene: genome annotation for alternative splicing. Nucleic Acids Res., 33, D75–D79.
- Kim, N., Shin, S., Lee, S. (2005) ECgene: genome-based EST clustering and gene modeling for alternative splicing. Genome Res., 15, 566–576.
- Lee, C., Atanelov, L., Modrek, B., Xing, Y. (2003) ASAP: the alternative splicing annotation project. Nucleic Acids Res., 31, 101–105.
- Le, K., Mitsouras, K., Roy, M., Wang, Q., Xu, Q., Nelson, S.F., Lee, C. (2004) Detecting tissue-specific regulation of alternative splicing as a qualitative change in microarray data. Nucleic Acids Res., 32, e180.
- Xing, Y., Resch, A., Lee, C. (2004) The multiassembly problem: reconstructing multiple transcript isoforms from EST fragment mixtures. Genome Res., 14, 426–441.
- Xing, Y., Xu, Q., Lee, C. (2003) Widespread production of novel soluble protein isoforms by alternative splicing removal of transmembrane anchoring domains. FEBS Lett., 555, 572–578.
- Xu, Q. and Lee, C. (2003) Discovery of novel splice forms and functional analysis of cancer-specific alternative splicing in human expressed sequences. Nucleic Acids Res., 31, 5635–5643.
- Xu, Q., Modrek, B., Lee, C. (2002) Genome-wide detection of tissue-specific alternative splicing in the human transcriptome. Nucleic Acids Res., 30, 3754–3766.
- Resch, A., Xing, Y., Alekseyenko, A., Modrek, B., Lee, C. (2004) Evidence for a subpopulation of conserved alternative splicing events under selection pressure for protein reading frame preservation. Nucleic Acids Res., 32, 1261–1269.
- Cusack, B.P. and Wolfe, K.H. (2005) Changes in alternative splicing of human and mouse genes are accompanied by faster evolution of constitutive exons. Mol. Biol. Evol., 22, 2198–2208.
- Lian, Y. and Garner, H.R. (2005) Evidence for the regulation of alternative splicing via complementary DNA sequence repeats. Bioinformatics, 21, 1358–1364.
- Roy, M., Xu, Q., Lee, C. (2005) Evidence that public database records for many cancer-associated genes reflect a splice form found in tumors and lack normal splice forms. Nucleic Acids Res., 33, 5026–5033.
- Xing, Y. and Lee, C.J. (2005) Protein modularity of alternatively spliced exons is associated with tissue-specific regulation of alternative splicing. PLoS Genet., 1, e34.
- Chen, F.C., Wang, S.S., Chen, C.J., Li, W.H., Chuang, T.J. (2006) Alternatively and constitutively spliced exons are subject to different evolutionary forces. Mol. Biol. Evol., 23, 675–682.
- Xing, Y., Wang, Q., Lee, C. (2006) Evolutionary divergence of exon flanks: a dissection of mutability and selection. Genetics, 173, 1787–1791.
- Modrek, B. and Lee, C.J. (2003) Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nature Genet., 34, 177–180.
- Schuler, G.D. (1997) Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med., 75, 694–698.
- Maglott, D., Ostell, J., Pruitt, K.D., Tatusova, T. (2005) Entrez Gene: gene-centered information at NCBI. Nucleic Acids Res., 33, D54–D58.
- Pruitt, K.D., Tatusova, T., Maglott, D.R. (2005) NCBI Reference Sequence (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res., 33, D501–D504.
- Birney, E., Andrews, D., Caccamo, M., Chen, Y., Clarke, L., Coates, G., Cox, T., Cunningham, F., Curwen, V., Cutts, T., et al. (2006) Ensembl 2006. Nucleic Acids Res., 34, D556–D561.
- Blanchette, M., Kent, W.J., Riemer, C., Elnitski, L., Smit, A.F., Roskin, K.M., Baertsch, R., Rosenbloom, K., Clawson, H., Green, E.D., et al. (2004) Aligning multiple genomic sequences with the threaded blockset aligner. Genome Res., 14, 708–715.
- Hinrichs, A.S., Karolchik, D., Baertsch, R., Barber, G.P., Bejerano, G., Clawson, H., Diekhans, M., Furey, T.S., Harte, R.A., Hsu, F., et al. (2006) The UCSC Genome Browser Database: update 2006. Nucleic Acids Res., 34, D590–D598.

0). (C) Orthologous Introns. (1157 






