

Nucleic Acids Research Advance Access originally published online on June 21, 2007
Nucleic Acids Research 2007 35(Web Server issue):W292-W296; doi:10.1093/nar/gkm344

Nucleic Acids Research, 2007, Vol. 35, No. suppl_2 W292-W296
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Articles

BIPASS: BioInformatics Pipeline Alternative Splicing Services

Zoé Lacroix1,*, Christophe Legendre1, Louiqa Raschid2 and Ben Snyder2

1Scientific Data Management Laboratory, Arizona State University, PO Box 875706, Tempe AZ 85287-5706 and 2Department of Computer Science, University of Maryland, College Park, MD 20742, USA

*To whom correspondence should be addressed. Tel: (480) 727 6935; Fax: (480) 965-8325; Email: Zoe.Lacroix{at}asu.edu

Received January 31, 2007. Revised April 14, 2007. Accepted April 22, 2007.


    ABSTRACT
 
BioInformatics Pipeline Alternative Splicing Services (BIPASS) offer support to scientists interested in gathering information related to alternative splicing (AS) events. The service BIPAS–SpliceDB provides access to AS information that has been extracted a priori from various public databases and stored in a data warehouse. In contrast, the BIPAS–Align&Splice service allows scientists to submit their own sequences and genome to compute AS analysis results. BIPAS services offer various user-friendly ways to navigate through the results. AS results are organized at different conceptual levels (clusters and sequences), and are displayed in graphs or summarized in tables that can be downloaded in XML or text format. The two BIPAS services SpliceDB and Align&Splice are available online at http://bip.umiacs.umd.edu:8080/.


    INTRODUCTION
 
Alternative splicing (AS) is the process by which a pre-mRNA transcribed from one gene can be spliced into different mature mRNA molecules, and thus lead to different proteins. AS emerged as a major research topic after the high-throughput genome sequencing of the 1990s and the success of tools that perform pairwise alignment of genomic and transcript sequences. Improvements in the accuracy of these alignment tools have demonstrated that the previous assumption of a one-to-one mapping from a gene to a protein no longer holds. Instead, AS has come to be accepted as a common process for generating multiple proteins.

AS analysis has become a critical method for a variety of studies and experiments. AS information may contribute to reconstructing the events that create peptides/proteins from a gene, to the discovery of new exons, and thus new proteins, and to understanding which mechanisms are involved in AS and how they are triggered and regulated. The latter addresses issues related to how cell or tissue characteristics may affect gene translation leading to more than one protein, e.g. a comparison (by computation) of the expression of transcription factor(s) across multiple tissues [see details in (1)]. Additional research topics include the discovery of proteins involved in the AS process (spliceosome), the identification of specific sites (such as ESE, ESS, ISE or ISS) involved in the mechanism of AS, the study of activation and repression mechanisms and of external factors that can influence the AS mechanism, and the identification of possible ‘defects’ or ‘errors’ in AS events that may be directly or indirectly responsible for some diseases [see (2) for more details]. The phylogeny (3) and evolution of AS mechanisms and the identification of specific primers for one specific transcript have also been studied.


    FEATURES AND FUNCTIONALITIES
 
Developed jointly by Arizona State University and the University of Maryland at College Park, the BIPAS services are the first of a suite of BioInformatics Pipeline (BIP) services (4) designed to exploit scientific workflows and database mediation technology to implement scientific pipelines, and to develop useful tools for the AS community.

Technology
We chose IBM's WebSphere Information Integrator (WSII) as the mediator platform (5). Based on the relational data model, WSII builds an autonomous federation of data from heterogeneous sources, which appears to the user or front-end application as a single large relational database management system (RDBMS). The SQL query language is supported over all the federated sources, even if the native search capabilities of the underlying sources are less full-featured than SQL. Similarly, specialized non-SQL search capabilities of the underlying sources are also available through WSII. BIPASS uses custom Perl scripts, C++ programs and two alignment tools (Blat and SIM4). The BIPASS server is a Dell PC with two dual-core 64-bit processors running the 64-bit version of RedHat Enterprise Linux 4. Wrappers were developed to access and retrieve data from multiple public resources, e.g. GenBank (Entrez Nucleotide). BIPAS–SpliceDB exploits data stored a priori in the WSII data warehouse, whereas BIPAS–Align&Splice processes the users’ input and stores the data in the WSII database, where the pipeline (alignment and splicing) steps are performed.

Tools
AS analysis is performed in two successive steps. The first is an alignment of a transcript sequence against a genomic sequence, followed by a clustering step. BIPAS services exploit the BioInformatics Pipeline (BIP) toolbox (4).

BIP-Align, the first tool used in the BIP toolbox, can be used alone to create an alignment database, or with other BIP tools to create a database for a more specific function. The objective of BIP-Align is to map input transcripts to a given organism's genome, then store information in a database. Data are subject to user-definable quality filters. The BIP-Align tool extracts and integrates information from several data sources and loads input data in the BIP-SpliceDB, aligns all input transcripts to the genome, and filters and stores alignment data in the BIP-SpliceDB. The current design uses a two-step alignment process that utilizes Blat, then feeds the output through SIM4 to further refine the alignment.
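The control flow of the two-step alignment can be sketched as follows. This is only an illustration of the coarse-then-refine structure, not the BIP-Align implementation: the names `Alignment`, `Locus`, `align_transcripts`, `coarse`, `refine` and `keep` are hypothetical, the two aligners are passed in as plain functions (standing in for Blat and SIM4), and the quality filter is whatever predicate the caller supplies.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# Hypothetical candidate region: (chromosome, strand, start, end).
Locus = Tuple[str, str, int, int]


@dataclass
class Alignment:
    """Hypothetical record for one refined transcript-to-genome alignment."""
    transcript: str
    chrom: str
    strand: str
    exons: List[Tuple[int, int]]
    identity: float  # fraction of matching bases, 0.0 to 1.0


def align_transcripts(
    transcripts: Iterable[str],
    coarse: Callable[[str], List[Locus]],       # stand-in for the first, fast aligner
    refine: Callable[[str, Locus], Alignment],  # stand-in for the second, refining aligner
    keep: Callable[[Alignment], bool],          # user-definable quality filter
) -> List[Alignment]:
    """Run the coarse step, refine each candidate locus, keep what passes the filter."""
    stored: List[Alignment] = []
    for name in transcripts:
        for locus in coarse(name):
            alignment = refine(name, locus)
            if keep(alignment):
                stored.append(alignment)
    return stored
```

Passing the aligners and the filter as functions keeps the sketch independent of any particular command-line tool or threshold; in a real pipeline the two callables would wrap the external programs and parse their output.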

BIP-Splice takes the transcripts loaded and aligned with BIP-Align, clusters them, and performs alternative splicing analysis. The clustering algorithm [used in (1)] is a two-step process. First, transcripts are grouped with respect to overlap, i.e., all transcripts that overlap by at least one base pair are considered part of the same cluster. Overlap takes into account not only the genomic coordinates of the transcripts, but also the orientation, or strand. For instance, if two transcripts overlap in genomic position, but one maps to the positive strand and the other to the negative, they are members of two different clusters. The second clustering step uses each transcript's exon/intron structure to refine the clusters. To be a member of a cluster, a transcript must have at least one exon that overlaps by a minimum of one base pair with an exon of another transcript. If it does not meet this criterion, the transcript in question creates a new cluster. The quality filter provides a parameter which requires that all clusters have a minimum number of transcripts as members. The default value is 3, though a BIP-Splice database can be created which allows singleton and doubleton clusters if desired. Each cluster is analyzed to determine whether it exhibits any alternative splicing. The alternative splicing events that are recorded are:

  • Length variation. This refers to internal splice sites of exons, and if they differ between member transcripts. The 5' and 3' ends of the transcripts are not evaluated for splice variation because the sequence may be truncated.
  • Initial cassette exons. This type of exon is missing in one or more transcripts. An initial exon is the 5' exon of a transcript. To be flagged as an initial cassette exon, the exon cannot occur as an internal exon in any transcript.
  • Terminal cassette exons. Same as initial cassette exon, except it occurs at the 3' end.
  • Internal cassette exons. These are cassette exons present as internal exons in at least one transcript of the cluster. Internal cassette exons are assumed to be the most biologically relevant because truncated sequences may create artificial occurrences of initial and terminal cassette exons.
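The four event types listed above can be sketched in code. This is a simplified reading of the definitions, not the BIP-Splice implementation. It assumes plus-strand transcripts whose exons are given as inclusive (start, end) pairs in 5' to 3' order, merges overlapping exons into genomic exon regions, and treats a region as a cassette exon when at least one transcript of the cluster lacks it. The names `genomic_exons` and `classify_cluster` are hypothetical.

```python
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # inclusive (start, end), plus-strand coordinates


def genomic_exons(cluster: Dict[str, List[Exon]]) -> List[Exon]:
    """Merge overlapping transcript exons into genomic exon regions."""
    merged: List[List[int]] = []
    for start, end in sorted(e for exons in cluster.values() for e in exons):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [(s, e) for s, e in merged]


def classify_cluster(cluster: Dict[str, List[Exon]]) -> Dict[Exon, List[str]]:
    """Map each genomic exon region to the splice events observed for it."""
    events: Dict[Exon, List[str]] = {}
    for region in genomic_exons(cluster):
        roles = set()
        present_in = 0
        acceptors, donors = set(), set()  # internal splice sites only
        for exons in cluster.values():
            hits = [i for i, e in enumerate(exons)
                    if e[0] <= region[1] and region[0] <= e[1]]
            if not hits:
                continue
            present_in += 1
            for i in hits:
                first, last = i == 0, i == len(exons) - 1
                if first:
                    roles.add("initial")
                if last:
                    roles.add("terminal")
                if not first and not last:
                    roles.add("internal")
                # The transcript's own 5' and 3' ends are skipped:
                # the sequence may be truncated there.
                if not first:
                    acceptors.add(exons[i][0])
                if not last:
                    donors.add(exons[i][1])
        labels: List[str] = []
        if len(acceptors) > 1 or len(donors) > 1:
            labels.append("length variation")
        if present_in < len(cluster):  # missing in one or more transcripts
            if "internal" in roles:
                labels.append("internal cassette")
            elif "initial" in roles:
                labels.append("initial cassette")
            else:
                labels.append("terminal cassette")
        if labels:
            events[region] = labels
    return events
```

A cluster would then count as variant when `classify_cluster` returns a non-empty result. Minus-strand transcripts would need their exon order reversed before the first/last test, which the sketch leaves out.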

Any cluster with at least one form of splice variation is flagged as ‘variant’. A cluster typically represents intermediate transcripts [from the pre-messenger-RNA(s) to the mature messenger-RNA(s)] required to obtain one or several functional translated proteins from the same gene. The quality of the alternative splicing analysis depends on several parameters, including the size of the clusters. The more transcripts a cluster contains, the better the result.
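The two-step clustering described earlier in this section, together with the minimum cluster size filter, can be sketched as follows. This is a minimal sketch under stated assumptions, not the algorithm of (1): coordinates are inclusive, exons are sorted by start, and the second step is read as grouping transcripts that are linked directly or transitively by an exon overlap of at least one base pair. The names `Transcript`, `span_groups`, `refine_by_exons` and `cluster_transcripts` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    strand: str                          # "+" or "-"
    exons: Tuple[Tuple[int, int], ...]   # inclusive (start, end), sorted by start

    @property
    def start(self) -> int:
        return self.exons[0][0]

    @property
    def end(self) -> int:
        return self.exons[-1][1]


def share_exon(t: Transcript, u: Transcript) -> bool:
    """True if some exon of t overlaps some exon of u by at least one base."""
    return any(a[0] <= b[1] and b[0] <= a[1] for a in t.exons for b in u.exons)


def span_groups(transcripts: List[Transcript]) -> List[List[Transcript]]:
    """Step 1: group transcripts whose genomic spans overlap on the same strand."""
    by_strand = defaultdict(list)
    for t in transcripts:
        by_strand[(t.chrom, t.strand)].append(t)
    groups = []
    for key in sorted(by_strand):
        members = sorted(by_strand[key], key=lambda t: t.start)
        current, current_end = [members[0]], members[0].end
        for t in members[1:]:
            if t.start <= current_end:           # overlaps the running span
                current.append(t)
                current_end = max(current_end, t.end)
            else:
                groups.append(current)
                current, current_end = [t], t.end
        groups.append(current)
    return groups


def refine_by_exons(group: List[Transcript]) -> List[List[Transcript]]:
    """Step 2: split a span group into clusters linked by exon overlap."""
    clusters: List[List[Transcript]] = []
    for t in group:
        linked, rest = [], []
        for c in clusters:
            (linked if any(share_exon(t, u) for u in c) else rest).append(c)
        merged = [u for c in linked for u in c] + [t]
        clusters = rest + [merged]               # no link: t starts a new cluster
    return clusters


def cluster_transcripts(transcripts: List[Transcript],
                        min_size: int = 3) -> List[List[Transcript]]:
    """Apply both steps, then drop clusters smaller than min_size (default 3)."""
    clusters = []
    for group in span_groups(transcripts):
        clusters.extend(refine_by_exons(group))
    return [c for c in clusters if len(c) >= min_size]
```

Calling `cluster_transcripts(transcripts, min_size=1)` keeps singleton and doubleton clusters, which corresponds to relaxing the quality filter mentioned above.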

Databases
BIP–SpliceDB is a data warehouse populated with data that are automatically extracted using wrappers. Data are obtained from multiple public repositories including UCSC (genome data), GenBank/Entrez Nucleotide (full-length mRNAs) and dbEST (EST data). These databases are collections of transcript sequences (cDNA, mRNA and EST), which are often annotated. Annotations lead to a better characterization of the transcripts in each cluster, resulting in improved accuracy of AS results.


    BIPASS USER INTERFACE
 
The BIPASS front page, available at http://bip.umiacs.umd.edu:8080/, displays a form (shown in Figure 1) that gives access to two services: BIPAS–SpliceDB on the left and BIPAS–Align&Splice on the right. The display of the site is optimized for Mozilla Firefox 1.5.X.X and 2.0.X.X.


Figure 1. Homepage BIPAS services.

 
BIPAS–SpliceDB
The first service allows queries against our AS data warehouse.

BIPAS–SpliceDB input
To query BIPAS–SpliceDB, a user enters a keyword and selects the keyword type (Any, Genbank, Annotation) and an organism (Any, Human, or Mouse). Tooltips guide the user when the mouse pointer hovers over a question mark icon. For example, the user enters cdkn1 as an annotation and selects the human genome. Once the query is entered (see Figure 2), the user clicks on the GO button (row 2 in Figure 1) and the results are displayed (see Figure 3).


Figure 2. BIPAS–SpliceDB input.

 

Figure 3. Page of clusters results.

 
BIPAS–SpliceDB output
BIPAS–SpliceDB returns a table (Figure 3). The first row of the table indicates the total number of clusters found in the database (this entry has nine clusters). The second row lists the fields describing each cluster (number of clones, clone identifiers, number of genomic exons and whether it contains variants) and the fields related to the chromosome (identifier, orientation, beginning and end, organism and its genome version). The following rows display this information for each cluster matching the user's query.

The user can then click on the cluster of interest; the red arrow in Figure 3 indicates that the user selected the cluster Hs.chr6.p.17383. This selection opens a new window showing more details about the cluster including its transcripts, exons, introns, associated splice graph, etc. The details are shown in Figure 4.


Figure 4. Cluster information page.

 
The cluster page is divided into two components: the table and the graph, which gives information about the transcripts in the cluster. More links are available at this level. In particular, details on the sequence of genomic exons can be displayed by clicking on the link Sequence(s) in the column ‘View’. Download links are provided in three formats: XML or text format for the data and PNG format for the images. In the clickable graph, a click on an exon or intron points directly to its nucleotide sequence in a page containing all the exonic or intronic sequences for the genomic data. Clicking on the label of a transcript opens a new transcript page containing all information about the transcript: annotations, information about exons and introns, and a graphical representation of the transcript. At the top of that page, links to download the transcript information are provided in the same three formats.

BIPAS–Align&Splice
The second service does not access a data warehouse; instead, it runs the pipeline online.

BIPAS–Align&Splice input
Once the BIPAS–Align&Splice service is selected, the form shown in Figure 5 is displayed. The completion of the submission is a three-step process:

  1. Enter transcript sequence.
  2. Select or enter genomic data.
  3. Enter a valid e-mail address.


Figure 5. BIPAS–Align&Splice input.

 
There are two ways to submit transcript sequences (Paste sequence or Upload sequence from a file) and two formats that can be used (FASTA or GenBank format). Several transcript sequences may be submitted at the same time, as long as they are from the same organism and properly formatted. Note that the input must contain at least two exons to return results, as the protocol is run online.

The second step is the selection of genomic data. The whole genome of an organism is the default option, and the user can choose among the available organisms. A user who prefers to align their own transcripts against specific genomic data may enter that data as a single contiguous genomic sequence in FASTA format, either by uploading a file or by pasting the sequence. For example, one may submit a whole chromosome sequence as long as it is a single sequence in FASTA format.
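Before submitting, a user may want to check locally that the input meets the formatting requirements described above: one or more transcript records in FASTA format and, when custom genomic data are supplied, exactly one FASTA record. The sketch below is not part of BIPASS. The function names `parse_fasta` and `check_submission` are hypothetical, and restricting sequences to IUPAC nucleotide codes is an assumption rather than a documented rule of the service.

```python
from typing import Dict, List, Optional

# Assumption: IUPAC nucleotide codes are accepted; the service may differ.
IUPAC_NUCLEOTIDES = set("ACGTUNRYKMSWBDHV")


def parse_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record name: upper-case sequence}."""
    parts: Dict[str, List[str]] = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            name = header[0]
            if name in parts:
                raise ValueError(f"duplicate record name: {name}")
            parts[name] = []
        else:
            if name is None:
                raise ValueError("sequence data before the first '>' header")
            parts[name].append(line.upper())
    records = {n: "".join(chunks) for n, chunks in parts.items()}
    for n, seq in records.items():
        if not seq:
            raise ValueError(f"record {n} has no sequence")
        bad = set(seq) - IUPAC_NUCLEOTIDES
        if bad:
            raise ValueError(f"record {n} has unexpected symbols: {sorted(bad)}")
    return records


def check_submission(transcripts_fasta: str,
                     genome_fasta: Optional[str] = None) -> int:
    """Validate a submission and return the number of transcript records."""
    transcripts = parse_fasta(transcripts_fasta)
    if not transcripts:
        raise ValueError("no transcript sequence found")
    if genome_fasta is not None:
        genome = parse_fasta(genome_fasta)
        if len(genome) != 1:
            raise ValueError("custom genomic data must be a single FASTA sequence")
    return len(transcripts)
```

The check that all transcripts come from the same organism, and the requirement of at least two exons, cannot be verified from the FASTA text alone and are left to the user.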

BIPAS–Align&Splice output
BIPAS–Align&Splice returns a clusters page similar to Figure 3.

BIPASS and AS services
BIPASS are scientist-friendly services dedicated to alternative splicing. BIPAS provides two services directly accessible from its homepage. The BIPAS–SpliceDB service is similar to those provided by Hollywood (6), ASD (7) and H-DBAS (8). BIPAS–SpliceDB is based on an automatic a priori computation of data from different sources, both for transcripts and for genomic data. Hollywood, H-DBAS and ASD are also based on the automatic computation of genomic and transcript data, combined with manual data curation. Although manual curation may increase accuracy, it is time-consuming and labor-intensive. In contrast, automated extraction and computation allow efficient integration of new organisms and data sources. The BIPAS–Align&Splice service is similar to the service provided by ASPIC (9). BIPASS offer a new feature that can align transcripts to a whole genome, allowing users to provide (in one submission) multiple transcripts that are possibly widespread on the genome.


    CONCLUSION
 
BIPAS is an alternative to other AS services. It consolidates two services through a single convenient interface. The first allows the search of a pre-computed BIPAS–SpliceDB warehouse and the second allows the user to analyze AS events on their own transcript sequences. BIPASS will be maintained and improved by updating BIPAS–SpliceDB with new genomes (e.g., rat), providing advanced search features with new search fields (e.g. gene name, sequence) and creating an index of clusters by chromosome.


    ACKNOWLEDGEMENTS
 
This research was partially supported by the National Science Foundation grants IIS0222847, IIS0430915, IIS 0223042, and IIS 0222847. We thank Dr Terry Gaasterland and Dr Bahar Taneri for their contribution to the BIPAS project. Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Funding to pay the Open Access publication charges for this article was provided by the National Science Foundation.


    Footnotes
 
The authors wish it to be known that, in their opinion, all authors should be regarded as joint First Authors.


    REFERENCES
 

  1. Taneri B, Snyder B, Novoradovsky A, Gaasterland T. Alternative splicing of mouse transcription factors affects their DNA-binding domain architecture and is tissue specific. Genome Biol (2004) 5:R75.

  2. Blencowe BJ. Alternative splicing: new insights from global analyses. Cell (2006) 126:37–47.

  3. Riegert P, Wanner V, Bahram S. Genomics, isoforms, expression, and phylogeny of the MHC class I-related MR1 gene. J Immunol (1998) 161:4066–4077.

  4. Eckman AB, Gaasterland T, Lacroix Z, Raschid L, Snyder B, Vidal ME. Implementing a Bioinformatics Pipeline (BIP) on a mediator platform: Comparing cost and quality of alternate choices. In: Proceedings of the 22nd International Conference on Data Engineering Workshops (2006) Los Alamitos, CA, USA: IEEE Computer Society Press. 67.

  5. Haas L, Eckman BA, Kodali P, Lin E, Rice J, Schwarz PM. In: Lacroix Z, Critchlow T, eds. Bioinformatics: Managing Scientific Data (2003) San Francisco: Elsevier Science, Morgan Kaufmann Publishers. 303.

  6. Holste D, Huo G, Tung V, Burge CB. HOLLYWOOD: a comparative relational database of alternative splicing. Nucleic Acids Res (2006) 34:D56–D62.

  7. Stamm S, Riethoven JJ, Le Texier V, Gopalakrishnan C, Kumanduri V, Tang Y, Barbosa-Morais NL, Thanaraj TA. ASD: a bioinformatics resource on alternative splicing. Nucleic Acids Res (2006) 34:D46–D55.

  8. Takeda J, Suzuki Y, Nakao M, Kuroda T, Sugano S, Gojobori T, Imanishi T. H-DBAS: alternative splicing database of completely sequenced and manually annotated full-length cDNAs based on H-Invitational. Nucleic Acids Res (2007) 35:D104–D109.

  9. Bonizzoni P, Rizzi R, Pesole G. ASPIC: a novel method to predict the exon-intron structure of a gene that is optimally compatible to a set of transcript sequences. BMC Bioinformatics (2005) 6:244.

