Nucleic Acids Research, 2004, Vol. 32, Database issue D70-D74
© 2004 Oxford University Press
EASED: Extended Alternatively Spliced EST Database
Max-Delbrück-Center for Molecular Medicine, Department of Bioinformatics, Robert-Rössle-Strasse 10, 13125 Berlin, Germany
*To whom correspondence should be addressed. Tel: +49 30 94062831; Fax: +49 30 94062834; Email: pospisil{at}mdc-berlin.de
Received August 14, 2003; Revised September 11, 2003; Accepted October 27, 2003
ABSTRACT
We established a database of alternative splice forms (ASforms) for nine eukaryotic organisms. ASforms are defined by comparing high-scoring ESTs with mRNA sequences using BLAST, taking known exon–intron information (from the Ensembl database) into account. Filtering programs compare the ends of each aligned sequence pair for deletions or insertions in the EST sequence, which indicate the existence of alternative splice forms with respect to the exon–intron boundaries. Moreover, we defined the alternative splice profile of each human sequence. It indicates the number of alternatively spliced ESTs (NAE), the number of constitutively spliced ESTs (NCE) and the number of alternative splice sites (NSS) per mRNA. NAE and NCE correspond to the EST coverage and can be used as a quality indicator for the predicted alternative splice variants. The NSS value specifies the splice propensity of a gene. Additionally, the tissue type information of all ESTs was included. This allows (i) restriction of the search to certain tissues and (ii) calculation of the tissue-NAEs, tissue-NCEs and tissue-NSSs. These scores are suitable for estimating the tissue specificity of certain ASforms. Furthermore, the developmental stage and disease information of the ESTs is available. EASED is accessible at http://eased.bioinf.mdc-berlin.de/.
INTRODUCTION
The concept of alternative splicing as a mechanism to create a high diversity of functional proteins in mammals has gained increasing support as evidence accumulated with the progress of the Human Genome Project (1). Investigations based on human sequence material (experimental data) and computational methods suggest that about half of the identified genes are alternatively spliced in conjunction with cellular processes (1–3).
Various computational approaches to the detection of alternative splice forms (ASforms) are based on expressed sequence tags (ESTs) [(4–12); see also the review by Modrek and Lee (13)]. Determining the conditions under which an mRNA was isolated involves the collection and classification of information belonging to the corresponding EST, e.g. the tissue origin, the developmental stage or the association with diseases such as cancer.
DESCRIPTION OF EASED
The Extended Alternatively Spliced EST Database is an online compendium of ASforms for several organisms (Arabidopsis thaliana, Bos taurus, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, Homo sapiens, Mus musculus, Rattus norvegicus, Xenopus laevis). At the moment, the additional information described here (alternative splice profile, tissue types, developmental stage, disease, classification of splice events) is only available for human.
PREDICTION ALGORITHM
ASforms are defined by comparing high-scoring ESTs with mRNA sequences [both from GenBank (14)] using BLAST (15). The exon–intron information for human sequences was obtained from the Ensembl database (16). (For the other organisms the exon–intron information is not yet included.) Repetitive sequences of all mRNAs were previously masked by MaskerAid (17). The algorithm used to identify ASforms takes the currently available mRNA sequences and aligns them to all available ESTs using the BLASTN program. A matching pair of mRNA and EST has to fulfil certain criteria to be considered as an ASform. The alignment has to show at least two high-scoring segment pairs (HSPs). Filtering programs with defined parameter thresholds (gap length, 30 nucleotides; HSP length, 100 nucleotides; percentage of identity of each HSP, 98%) compare the ends of each aligned sequence pair for deletions or insertions in the EST sequence, which suggest the existence of ASforms. All predicted ASforms are stored in a database.
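The filtering step can be illustrated with a minimal Python sketch. It is not the EASED implementation: the `HSP` record, the function names, the reading of the three thresholds as minimum values, and the assumption that all HSPs lie on the same strand in collinear order are ours.

```python
from dataclasses import dataclass
from typing import Dict, List

# Thresholds quoted in the text; treating them as minimum values is an assumption.
MIN_GAP_LENGTH = 30       # nucleotides
MIN_HSP_LENGTH = 100      # nucleotides
MIN_HSP_IDENTITY = 98.0   # percent


@dataclass
class HSP:
    """One HSP of an mRNA/EST BLASTN alignment (hypothetical record, 1-based inclusive)."""
    mrna_start: int
    mrna_end: int
    est_start: int
    est_end: int
    identity: float  # percent identity


def hsp_passes(hsp: HSP) -> bool:
    """Keep only HSPs that are long enough and similar enough."""
    length = hsp.mrna_end - hsp.mrna_start + 1
    return length >= MIN_HSP_LENGTH and hsp.identity >= MIN_HSP_IDENTITY


def candidate_gaps(hsps: List[HSP]) -> List[Dict[str, int]]:
    """Return the gaps between consecutive accepted HSPs that reach the gap threshold.

    Assumes all HSPs are on the same strand and collinear in both sequences.
    """
    good = sorted((h for h in hsps if hsp_passes(h)), key=lambda h: h.mrna_start)
    if len(good) < 2:  # the text requires at least two HSPs
        return []
    gaps = []
    for left, right in zip(good, good[1:]):
        mrna_gap = right.mrna_start - left.mrna_end - 1
        est_gap = right.est_start - left.est_end - 1
        if max(mrna_gap, est_gap) >= MIN_GAP_LENGTH:
            gaps.append({"mrna_gap": mrna_gap, "est_gap": est_gap})
    return gaps
```

In this sketch, two accepted HSPs that are adjacent on the EST but separated by 50 nucleotides on the mRNA yield one candidate gap, whereas a pair in which one HSP falls below the identity threshold yields none.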
Alternative Splice Profile (ASP)
We defined the so-called alternative splice profile (ASP) of each human sequence. It indicates the number of alternatively spliced ESTs (NAEs), the number of constitutively spliced ESTs (NCEs) as well as the number of alternative splice sites (NSSs) per mRNA. NAE and NCE correspond to the EST coverage and can be used as a quality indicator for the predicted alternative splice variants. The NSS value specifies the splice propensity of a gene.
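As a rough illustration of how the three ASP values relate to the underlying EST matches, the following Python sketch counts them from a list of matches. The `ESTMatch` record and the choice to count NSS as the number of distinct splice-site coordinates are assumptions made for the example, not details given in the text.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class ESTMatch:
    """Hypothetical record: one EST aligned to an mRNA.

    splice_sites holds (start, end) mRNA coordinates of each alternative
    splice event supported by the EST; it is empty for a constitutively
    spliced EST.
    """
    est_id: str
    splice_sites: Tuple[Tuple[int, int], ...] = ()


def alternative_splice_profile(matches: Iterable[ESTMatch]) -> Dict[str, int]:
    """Compute NAE, NCE and NSS for one mRNA from its EST matches."""
    matches = list(matches)
    nae = sum(1 for m in matches if m.splice_sites)
    nce = sum(1 for m in matches if not m.splice_sites)
    # Assumption: NSS counts distinct splice sites across all ESTs.
    distinct_sites = {site for m in matches for site in m.splice_sites}
    return {"NAE": nae, "NCE": nce, "NSS": len(distinct_sites)}
```

With two alternatively spliced ESTs that share one site and one constitutively spliced EST, the sketch reports NAE = 2, NCE = 1 and NSS equal to the number of distinct sites.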
Tissue types
Another useful feature for human sequences is the tissue type information. This information was derived from the MeSH (Medical Subject Headings) tree (http://www.nlm.nih.gov/mesh/meshhome.html). We use the tissue type classification for human in the second layer, which contains 43 different human tissues. This allows the search to be restricted to certain tissues and the calculation of the tissue-NAEs, tissue-NCEs and tissue-NSSs.
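The tissue-restricted counts can be sketched in the same way. The record layout, the tissue names and the derived "alternative fraction" below are illustrative assumptions; the text does not specify how the tissue scores are combined.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical record: (EST id, tissue label, is the EST alternatively spliced?)
ESTRecord = Tuple[str, str, bool]


def tissue_counts(records: Iterable[ESTRecord]) -> Dict[str, Dict[str, int]]:
    """Count alternatively (NAE) and constitutively (NCE) spliced ESTs per tissue."""
    counts: Dict[str, Dict[str, int]] = defaultdict(lambda: {"NAE": 0, "NCE": 0})
    for _est_id, tissue, is_alternative in records:
        counts[tissue]["NAE" if is_alternative else "NCE"] += 1
    return {tissue: dict(c) for tissue, c in counts.items()}


def alternative_fraction(counts: Dict[str, Dict[str, int]], tissue: str) -> Optional[float]:
    """Share of ESTs from one tissue that are alternatively spliced (an assumed score)."""
    c = counts.get(tissue)
    if c is None:
        return None
    total = c["NAE"] + c["NCE"]
    return c["NAE"] / total if total else None
```

Comparing such per-tissue fractions across tissues is one simple way to look for tissue-specific ASforms, provided the EST coverage per tissue is high enough to be meaningful.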
Additional information
All available information concerning the developmental stage (embryo, newborn, juvenile and adult) as well as the disease status (healthy, cancer and disease, the latter meaning all diseases other than those marked as cancer) was additionally included. As with the tissue-specific parameters, this information also enables the calculation of the ASPs for selected developmental stages or diseases.
Classification of the types of alternative splice events
We classify the types of alternative splice events in terms of the location of the HSP boundaries compared with the given exon–intron boundaries. We define an exact match of an HSP boundary to an exon–intron boundary with an assumed 10 bp uncertainty. In doing this we assumed three possible donor as well as acceptor splice site events (Fig. 1): the HSP start or end lies on the exon–intron boundary (xas or exact alternative splice; Fig. 1a), the HSP boundary lies within an exon (eas, alternative splice within an exon; Fig. 1b) or the HSP boundary lies within an intron (ias, alternative splice within an intron; Fig. 1c). For the donor site, the alternative splice events are termed 5xas, 5eas or 5ias. For the acceptor site, the classification gives rise to 3xas, 3eas or 3ias splice sites. Using this classification, we can mark all splice sites as (i) alternative 3' splice sites (3eas or 3ias), (ii) alternative 5' splice sites (5eas or 5ias), (iii) cassette exons (3xas and 5xas) and (iv) retained introns [exact 3' and 5' splice sites (3xas and 5xas); the inserted nucleotides originate from the intron sequence].
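The boundary classification can be sketched as follows. This is a simplified illustration rather than the authors' code: it assumes that the HSP boundary and the exon coordinates are expressed in one common coordinate system, that donor sites sit at exon ends and acceptor sites at exon starts, and the function name is hypothetical.

```python
from typing import List, Tuple

TOLERANCE = 10  # bp uncertainty allowed for an exact match, as stated in the text


def classify_boundary(position: int, exons: List[Tuple[int, int]], site: str) -> str:
    """Classify one HSP boundary as xas, eas or ias.

    position: coordinate of the HSP boundary
    exons:    (start, end) pairs, inclusive, in the same coordinate system
    site:     "5" for a donor site, "3" for an acceptor site
    """
    if site not in ("5", "3"):
        raise ValueError("site must be '5' (donor) or '3' (acceptor)")
    # Assumption: donor boundaries are exon ends, acceptor boundaries are exon starts.
    boundaries = [end if site == "5" else start for start, end in exons]
    if any(abs(position - b) <= TOLERANCE for b in boundaries):
        return site + "xas"  # exact match within the tolerance
    if any(start <= position <= end for start, end in exons):
        return site + "eas"  # boundary lies within an exon
    return site + "ias"      # otherwise it lies within an intron
```

For exons at 100 to 200 and 300 to 400, a donor-side boundary at position 205 is within the tolerance of the exon end and is labelled 5xas, while an acceptor-side boundary at position 250 falls between the exons and is labelled 3ias.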
Additionally, the type of alternative splicing is given as a skip (the EST sequence is shorter than the mRNA sequence and a gap between two HSPs was found on the mRNA) or insert (vice versa).
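A minimal Python sketch of the skip/insert distinction, assuming the gap lengths between two consecutive HSPs have already been measured on both sequences; the function name and the "undetermined" fallback for equal gaps are our additions.

```python
def splice_type(mrna_gap: int, est_gap: int) -> str:
    """Label an event from the gap lengths between two consecutive HSPs.

    mrna_gap: unaligned nucleotides between the HSPs on the mRNA
    est_gap:  unaligned nucleotides between the HSPs on the EST
    """
    if mrna_gap > est_gap:
        return "skip"    # sequence present in the mRNA is missing from the EST
    if est_gap > mrna_gap:
        return "insert"  # the EST carries sequence absent from the mRNA
    return "undetermined"  # assumption: equal gaps are not covered by the text
```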
DATABASE PRESENTATION
Query interface
The database for human alternative splice forms is accessible via http://eased.bioinf.mdc-berlin.de/. EASED provides a number of possibilities to search for ASforms:
(i) In the simplest mode, users can query for ASforms by mRNA, CDS or EST accession numbers, the Ensembl gene ID or Ensembl transcript ID.
(ii) A keyword search mode for all organisms identifies a gene via a full-text search of the gene's name and description.
(iii) The search for other identifiers includes GO numbers (insert e.g. GO:0007048), Swiss-Prot entry name, RefSeq ID, EC number (if indicated in the description line), chromosome number (insert e.g. 7 or Y) and protein ID.
(iv) The restriction to splice sites with a defined (e.g. high) number or percentage of ESTs can be used to select those splice sites with a high EST coverage.
(v) The tissue type search allows the extraction of all ASforms with a predetermined number or percentage of ESTs from one of the 43 human tissues.
(vi) Moreover, one can search for splice sites with a defined fraction of a selected developmental stage (adult, embryo, newborn, juvenile) or disease (cancer, healthy, other diseases).
(vii) A further restriction to splice sites with or without exact exon–intron boundaries and to skipped or inserted splicing events is possible.
All these search options can be combined in one query.
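Conceptually, combining search options amounts to applying several independent filters to the same set of splice sites. The Python sketch below illustrates that idea on in-memory records; the `SpliceSite` fields and the predicate helpers are hypothetical and do not reflect the EASED schema or web interface.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class SpliceSite:
    """Hypothetical record for one predicted alternative splice site."""
    gene: str
    chromosome: str
    event: str                   # "skip" or "insert"
    est_tissues: Dict[str, int]  # tissue label -> number of supporting ESTs


Predicate = Callable[[SpliceSite], bool]


def min_ests(n: int) -> Predicate:
    """Require at least n supporting ESTs in total."""
    return lambda s: sum(s.est_tissues.values()) >= n


def min_tissue_fraction(tissue: str, fraction: float) -> Predicate:
    """Require that at least `fraction` of the supporting ESTs come from `tissue`."""
    def predicate(s: SpliceSite) -> bool:
        total = sum(s.est_tissues.values())
        return total > 0 and s.est_tissues.get(tissue, 0) / total >= fraction
    return predicate


def event_is(kind: str) -> Predicate:
    """Restrict to skipped or inserted events."""
    return lambda s: s.event == kind


def query(sites: Iterable[SpliceSite], *predicates: Predicate) -> List[SpliceSite]:
    """Return the sites that satisfy every predicate."""
    return [s for s in sites if all(p(s) for p in predicates)]
```

A call such as `query(sites, min_ests(5), min_tissue_fraction("Brain", 0.5), event_is("skip"))` would then keep only well-covered, predominantly brain-derived skip events.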
The result of a query is summarized in a table with selectable links to the full information entries. The main features are listed in this table: the gene name, entry number, organism name and the number of splice sites (NSS). Detailed information is available by clicking on the entry name (Fig. 2).
Detailed information
For each ASform, we stored the following information from other databases: GenBank ID, CDS ID, Swiss-Prot ID (if available), length, taxonomy, mRNA entry name. The calculated alternative splice profile denotes the number of alternatively or constitutively spliced ESTs as mentioned, as well as the number of alternative splice sites. In the splice site view, a graphical overview of the location of the matching (alternatively spliced) ESTs is given. The EST information (tissue type, disease status and developmental stage) can be obtained for the whole transcript (section 'Alternative Splice Frequency') or for each splice site separately. The alternative splice event is classified by its length, the type (skip or insert) and the localization of the HSP compared with the exon–intron boundaries. To ease searching for the most interesting information, a color code for the tissue types and developmental stages and a bar code for the number of matching ESTs were established (see also http://eased.bioinf.mdc-berlin.de/eased_legend.html) (Fig. 3a and b).
FUTURE DIRECTIONS
EASED is an ongoing project. The features mentioned (ASP, tissue type, developmental stage, disease status and exon–intron information), which currently describe human sequences only, will be added for the eight other organisms in the near future. A number of new features are currently in development to expand the scope and usability of the resource. To this end, the algorithm that predicts potential ASforms will be improved and upgraded. As an important feature, it is planned to add evolutionary information, which will enable crosslinking of results from orthologous genes.
CONCLUSIONS
The EASED project is establishing a comprehensive database of alternatively spliced mRNAs from the (freely accessible) sequence pool of humans and eight model organisms. At the time of writing, EASED consists of nearly 30 000 alternatively spliced transcripts.
Moreover, EASED includes useful biological information, e.g. tissue type and developmental stage notation. Its main advantage for biologists is the possibility to search by such biologically relevant data. This feature is not yet included in any other alternative splice database and facilitates, e.g., extended statistical studies.
Another focus of EASED relates to finding candidate genes for the origin of diseases. Combining query parameters (e.g. the number of ESTs expressed in cancer tissue and in a certain tissue) enables the user to select sequences of interest. As a result of the parametrization of the alternative splice profile, a ranking of the queried sequences is possible. EASED will be updated regularly and will be extended in the coming months.
AVAILABILITY
EASED is freely available on the web at http://eased.bioinf.mdc-berlin.de/.
REFERENCES
1. McPherson,J.D., Marra,M., Hillier,L., Waterston,R.H., Chinwalla,A., Wallis,J., Sekhon,M., Wylie,K., Mardis,E.R., Wilson,R.K. et al. International Human Genome Sequencing Consortium (2001) A physical map of the human genome. Nature, 409, 934–941.
2. Brett,D., Pospisil,H., Valcarcel,J., Reich,J. and Bork,P. (2002) Alternative splicing and genome complexity. Nature Genet., 30, 29–30.
3. Mironov,A.A., Fickett,J.W. and Gelfand,M.S. (1999) Frequent alternative splicing of human genes. Genome Res., 9, 1288–1293.
4. Thanaraj,T.A. (1999) A clean data set of EST-confirmed splice sites from Homo sapiens and standards for clean-up procedures. Nucleic Acids Res., 27, 2627–2637.
5. Coward,E., Haas,S.A. and Vingron,M. (2002) SpliceNest: visualization of gene structure and alternative splicing based on EST clusters. Trends Genet., 18, 53–55.
6. Huang,Y.H., Chen,Y.T., Lai,J.J., Yang,S.T. and Yang,U.C. (2002) PALS db: Putative Alternative Splicing database. Nucleic Acids Res., 30, 186–190.
7. Thanaraj,T.A., Clark,F. and Muilu,J. (2003) Conservation of human alternative splice events in mouse. Nucleic Acids Res., 31, 2544–2552.
8. Kan,Z., Rouchka,E.C., Gish,W.R. and States,D.J. (2001) Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res., 11, 889–900.
9. Modrek,B., Resch,A., Grasso,C. and Lee,C. (2001) Genome-wide detection of alternative splicing in expressed sequences of human genes. Nucleic Acids Res., 29, 2850–2859.
10. Lee,C., Atanelov,L., Modrek,B. and Xing,Y. (2003) ASAP: the Alternative Splicing Annotation Project. Nucleic Acids Res., 31, 101–105.
11. Croft,L., Schandorff,S., Clark,F., Burrage,K., Archtander,P. and Mattick,J.S. (2000) ISIS, the intron information system, reveals the prevalence of alternative splicing in the human genome. Nature Genet., 24, 340–341.
12. Kent,W.J. and Zahler,A.M. (2000) The intronerator: exploring introns and alternative splicing in Caenorhabditis elegans. Nucleic Acids Res., 28, 91–93.
13. Modrek,B. and Lee,C. (2002) A genomic view of alternative splicing. Nature Genet., 30, 13–19.
14. Benson,D.A., Karsch-Mizrachi,I., Lipman,D.J., Ostell,J. and Wheeler,D.L. (2003) GenBank. Nucleic Acids Res., 31, 23–27.
15. Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
16. Clamp,M., Andrews,D., Barker,D., Bevan,P., Cameron,G., Chen,Y., Clark,L., Cox,T., Cuff,J., Curwen,V. et al. (2003) Ensembl 2002: accommodating comparative genomics. Nucleic Acids Res., 31, 38–42.
17. Bedell,J.A., Korf,I. and Gish,W. (2000) MaskerAid: a performance enhancement to RepeatMasker. Bioinformatics, 16, 1040–1041.