Nucleic Acids Research, 2001, Vol. 29, No. 13 2850-2859
© 2001 Oxford University Press
Genome-wide detection of alternative splicing in expressed sequences of human genes
Received January 26, 2001; Revised and Accepted May 5, 2001.
| ABSTRACT |
|---|
We have identified 6201 alternative splice relationships in human genes, through a genome-wide analysis of expressed sequence tags (ESTs). Starting with 2.1 million human mRNA and EST sequences, we mapped expressed sequences onto the draft human genome sequence and only accepted splices that obeyed the standard splice site consensus. A large fraction (47%) of these were observed multiple times, indicating that they comprise a substantial fraction of the mRNA species. The vast majority of the detected alternative forms appear to be novel, and produce highly specific, biologically meaningful control of function in both known and novel human genes, e.g. specific removal of the lysosomal targeting signal from HLA-DM β chain, and replacement of the C-terminal transmembrane domain and cytoplasmic tail in an FC receptor β chain homolog with a different transmembrane domain and cytoplasmic tail, likely modulating its signal transduction activity. Our data indicate that a large proportion of human genes, probably 42% or more, are alternatively spliced, but that this appears to be observed mainly in certain types of molecules (e.g. cell surface receptors) and systemic functions, particularly the immune system and nervous system. These results provide a comprehensive dataset for understanding the role of alternative splicing in the human genome, accessible at http://www.bioinformatics.ucla.edu/HASDB.
| INTRODUCTION |
|---|
Alternative splicing is an important mechanism for modulating gene function. It can change how a gene acts in different tissues and developmental states by generating distinct mRNA isoforms composed of different selections of exons. Alternative splicing has been implicated in many processes, including sex determination (1), apoptosis (2) and acoustic tuning in the ear (3). Recently, it has been suggested that if alternative splicing is widespread in the human genome, it could represent a relatively efficient expansion of the genome's vocabulary of variant genes, by producing multiple functional forms of many genes. Its functional implications can be simple, generating a single alternative form, or can produce remarkable diversity. In the Drosophila gene Dscam, combinatorial alternative splicing of cassettes of exons, reminiscent of the combinatorial generation of immunoglobulin diversity, produces thousands of distinct functional isoforms (4). This gene, homologous to the human gene for Down's syndrome cell adhesion molecule (DSCAM), appears to be involved in neuronal guidance, where such diversity could be useful as a molecular address.
Alternative splicing has been studied intensively in hundreds of human genes (1,5), and it appears to be widespread, occurring in 5–30% of human genes (6,7) or perhaps as many as 35–40% (8,9). Recently, it has been reported that alternative splicing can be detected in expressed sequence tag (EST) sequencing (9) and has been analyzed in a collection of full-length mRNAs (8). Based upon estimates of the total number of human genes (10,11), it is likely that at least 10 000–20 000 human genes are alternatively spliced. However, currently only 899 alternatively spliced human genes are catalogued in the Alternative Splicing Database of Mammals (AsMamDB) (12).
We have performed a genome-wide analysis of alternative splicing based on human expressed sequence data, which greatly expands our knowledge of this central function in human molecular biology (Table 1). We have identified tens of thousands of splices, and thousands of alternative splices, in several thousand human genes. We have mapped all of these onto the draft human genome sequence, and verified that the putative splice junctions detected in the expressed sequences map onto genomic exon–intron junctions that match the known splice site consensus. Based on this genome-wide analysis of gene structure and alternative splicing, we have constructed a Human Alternative Splicing Database, at http://www.bioinformatics.ucla.edu/HASDB. In this paper we also show how our database can be used to study the impact of alternative splicing on protein function. We present an initial analysis of the patterns and functional role of alternative splicing across the human genome.
As we seek to show with examples in this paper, our database could be a useful resource to researchers who have found a new cDNA or human gene and wish to find additional information. It can help answer a wide range of questions, e.g. 'Are the two bands on a western blot due to alternative splicing?' or 'Do the genes in protein family X all use alternative splicing as a mechanism to modulate function?' The database integrates a variety of data for each gene, ranging from genomic map location to gene structure, with links to external resources such as GenBank, OMIM and SWISS-PROT. It provides a detailed alignment of the ESTs, mRNAs, genomic DNA and protein sequence, showing single nucleotide polymorphisms (SNPs) (13), exons and introns, splice site junctions, alternative splices and, most importantly, the raw experimental evidence for all of these features, including chromatogram traces from public EST sequencing projects.
| MATERIALS AND METHODS |
|---|
Data sources
Our analysis is based on two major types of data: human genomic sequence assemblies and human EST sequences. Human genomic assembly sequences (accession no. NT_XXXX) and draft BAC clone sequences [accession nos ACXXXX, ALXXXXX, etc. (14)] were downloaded from NCBI (ftp://ncbi.nlm.nih.gov/genome/seq and ftp://ncbi.nlm.nih.gov/genbank/gbhtgXXX.seq.gz). For partially sequenced clones, draft fragments of 4 kb or longer continuous sequence were included in our analysis. All the work described in this paper is based on the October 2000 release of these data. Human EST sequences were downloaded from UNIGENE (ftp://ncbi.nlm.nih.gov/repository/UniGene). We used the EST clustering provided by UNIGENE, and did not perform our own re-clustering of the EST sequences. All the work described in this paper is based on the December 2000 release of UNIGENE.
Genomic mapping of expressed sequence clusters
Consensus sequences from our previous analysis of human expressed sequences (13) were searched against a database of human genomic assembly sequences using BLAST (15). We used consensus sequences to eliminate non-consensus features of each UNIGENE cluster, such as EST sequencing errors, chimeric ESTs or contamination by a minority of similar but paralogous sequences. The consensus sequence excludes these features, and should prevent them from affecting the genomic search. Our assembly and consensus analysis of UNIGENE was previously described as part of SNP discovery from human expressed sequences (13), and the consensus sequences are available at http://www.bioinformatics.ucla.edu/snp/. Briefly, after assembly, the maximum likelihood traversal of the EST–mRNA alignment is generated using dynamic programming, producing a consensus sequence that excludes minority features such as sequencing errors, sequence differences due to paralog contamination, unaligned ends and inserts due to chimeric sequences or unspliced introns.
The search for the genomic location of each UNIGENE cluster was performed in two BLAST stages, with a mapping check between them. First, we identified the candidate gene regions in the genomic sequence for a given consensus using a BLAST threshold of E < 10^-50 and a nucleotide mismatch penalty of 11. Secondly, to check the candidate gene location, we searched for radiation hybrid mapping data for sequence tagged sites (STS) linked with this gene. Candidate regions that did not agree with the STS mapping information for the cluster were discarded. Thirdly, we identified the putative exons, by using a less stringent threshold (E < 10^-10) that will report shorter exons. The resulting BLAST hits must span the entire consensus, allowing only up to 100 bp of unmatched sequence at the consensus ends. Allowing this short unmatched region at the ends is necessary, since BLAST may not identify very small exons reliably. Genomic candidates are assessed in order of ascending expectation value, until a candidate passes our second BLAST stage. The matching genomic region, plus 2 kb on either end to allow for short or fragmentary exons at the ends that BLAST may have missed, is aligned with the complete set of ESTs and mRNAs for the UNIGENE cluster using dynamic programming (16,17), truncating the gap extension penalty beyond 16 bp to allow for introns. The full EST and mRNA sequence must match the genomic sequence to be kept for the alternative splicing analysis. If an EST has 6 bp or more of insert relative to genomic, it is excluded. Using this procedure we mapped 47 422 of the 86 244 UNIGENE clusters onto the genomic sequence. Based on our analysis of chromosome 22, and comparison with NCBI's Acembly gene mapping, we estimate a false negative rate for our mapping procedure of 20%, and an upper bound for its false positive rate of 3–5% (see Results).
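The truncated gap extension penalty mentioned above can be illustrated with a minimal sketch. The function name and the numeric values of `gap_open` and `gap_extend` are hypothetical and chosen only for illustration; the only detail taken from the text is that the extension term stops growing beyond 16 bp, so that an intron-sized gap costs no more than a 16 bp gap.

```python
def gap_penalty(length, gap_open=10.0, gap_extend=0.5, max_extended=16):
    """Affine gap cost whose extension term is capped at max_extended bp.

    gap_open and gap_extend are illustrative values, not those used in the
    paper; only the 16 bp cap comes from the text.
    """
    if length <= 0:
        return 0.0
    # Beyond max_extended bp the gap gets no more expensive, so a 935 bp
    # intron is penalized exactly like a 16 bp gap.
    return gap_open + gap_extend * min(length, max_extended)


if __name__ == "__main__":
    for n in (1, 4, 16, 935):
        print(n, gap_penalty(n))
```

With a cap of this kind, the dynamic programming alignment can open one long gap across an intron instead of forcing exonic sequence to mis-align.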
Alternative splicing analysis
Splicing is detected by a computational procedure that analyzes the genomic–EST–mRNA multiple sequence alignment. Briefly, the gene structure is marked on the genomic sequence, based on its alignment with ESTs and mRNAs, by drawing a connection between each pair of genomic letters aligned to a pair of letters in an expressed sequence that are adjacent (i.e. nucleotide i and i+1). Thus, an exon is identified by a contiguous segment of connected letters, an intron by a contiguous segment of unconnected letters and a splice by a connection that jumps from one genomic letter to a distant genomic letter. A candidate splice is therefore detected as a gap between two exons that match a single contiguous region of one or more ESTs. We report splices only for connections that skip >10 bp in the genomic sequence (representing an effective minimum intron length) to screen out sequencing error or alignment heterogeneity artefacts. Individual splice observations from different ESTs are joined together when their 5' splice sites match within 6 bp in the genomic sequence, and their 3' splice sites also match within 6 bp. This level of variation is permitted to screen out sequencing error and alignment artefacts that could give spurious alternative splices. All candidate splices were checked against the standard consensus splicing sequences, and all candidates with mismatches were discarded. It is possible that some of these mismatches may be viable deviations from the consensus sequence and represent real splices. However, we have excluded them from consideration in the results presented in this paper. This procedure was designed to be robust, and even in cases with a mis-assembled genomic sequence should not report spurious splices. Instead, genomic mis-assembly would likely cause mismatch with the ESTs, and complete exclusion of the cluster from our analysis.
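The connection-based detection described above can be sketched as follows. This is a simplified illustration, not the software used in this work: it assumes each expressed sequence has already been reduced to the list of forward-strand genomic coordinates aligned to its consecutive nucleotides, and the function names and the greedy grouping strategy are assumptions. Only the >10 bp skip and the 6 bp tolerance come from the text.

```python
from typing import List, Tuple

# (last exonic genomic position before the jump, first exonic position after it)
Splice = Tuple[int, int]


def find_splices(genomic_positions: List[int], min_skip: int = 10) -> List[Splice]:
    """Report connections between adjacent EST letters that skip > min_skip bp."""
    splices = []
    for left, right in zip(genomic_positions, genomic_positions[1:]):
        skipped = right - left - 1  # genomic letters left unconnected
        if skipped > min_skip:
            splices.append((left, right))
    return splices


def join_observations(observations: List[Splice], tolerance: int = 6) -> List[List[Splice]]:
    """Greedily group observations whose 5' and 3' sites both agree within tolerance bp."""
    clusters: List[List[Splice]] = []
    for obs in sorted(observations):
        for cluster in clusters:
            ref = cluster[0]  # compare against the first member of the group
            if abs(obs[0] - ref[0]) <= tolerance and abs(obs[1] - ref[1]) <= tolerance:
                cluster.append(obs)
                break
        else:
            clusters.append([obs])
    return clusters


if __name__ == "__main__":
    est = [100, 101, 102, 500, 501, 504, 505]  # toy coordinates
    print(find_splices(est))
    print(join_observations([(102, 500), (104, 498), (102, 900)]))
```

In the toy input, the 2 bp jump from 501 to 504 is ignored as alignment noise, while the jump from 102 to 500 is reported as a candidate splice.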
It should be noted that EST–mRNA versus genomic sequence alignments occasionally contain degenerate alignment positions, in which one or more nucleotides are identical in the genomic sequence on either side of a gap (intron). In this case our software checks each of the equivalent alignments to identify the correct splice junction.
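A minimal sketch of how such degenerate positions might be resolved is given below. It assumes a forward-strand genomic string, an intron given as a 0-based half-open interval, and the canonical GT...AG dinucleotides as a stand-in for the splice site consensus; the function names and the toy sequence are hypothetical.

```python
def equivalent_placements(genome, start, end):
    """All intron placements genome[s:e] that yield the same spliced transcript."""
    placements = [(start, end)]
    # Shift right: the base leaving the intron on the left must equal the
    # base entering it on the right.
    s, e = start, end
    while e < len(genome) and genome[s] == genome[e]:
        s, e = s + 1, e + 1
        placements.append((s, e))
    # Shift left: the symmetric condition.
    s, e = start, end
    while s > 0 and genome[s - 1] == genome[e - 1]:
        s, e = s - 1, e - 1
        placements.append((s, e))
    return sorted(placements)


def consensus_placements(genome, start, end):
    """Keep only the equivalent placements whose intron starts GT and ends AG."""
    return [
        (s, e)
        for s, e in equivalent_placements(genome, start, end)
        if genome[s:s + 2] == "GT" and genome[e - 2:e] == "AG"
    ]


if __name__ == "__main__":
    # Toy gene: exon "AACCG", intron "GTAAAACCCTTTAG", exon "CATT".
    genome = "AACCG" + "GTAAAACCCTTTAG" + "CATT"
    # An aligner could equally well have placed the gap one base to the left.
    print(equivalent_placements(genome, 4, 18))
    print(consensus_placements(genome, 4, 18))
```

Here the gap reported at (4, 18) is equivalent to (5, 19) because a G flanks the intron on both sides, and only the second placement is bounded by GT and AG.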
Alternative splices are reported when two detected splices overlap in the genomic sequence (and thus are mutually exclusive events). One important consequence of this definition is that alternative splicing always requires positive evidence (i.e. strong match of EST to genomic) on both sides of each compared splice. An alternative splice will never be reported simply because one EST is longer or shorter than others, or even if vector sequence was attached at one end. [Vector sequences are screened out of UNIGENE (18) data. However, it is still important to note that heterogeneity at EST ends will not give rise to reported alternative splices.] All splices, alternative splices, individual splice observations in specific sequences, source library information, gene information, genomic mapping information, etc. are stored within a relational database (MySQL), and are accessible for query via the web (http://www.bioinformatics.ucla.edu/HASDB). To assess the fraction of alternative splices detected based on mRNA, EST versus mRNA or EST versus EST evidence (Fig. 5D), we used a database query to compute these numbers for all the alternative splices in our database.
Functional analysis of alternative splicing
We have performed extensive visual analysis and verification of our results, for hundreds of different genes. We used the GeneMine software system (19) to validate all aspects of the genomic mapping of our clusters, the exons, introns, splice sites, alternative splicing analysis and impact on protein structure and function, by thoroughly examining each of these features in the genomic–EST–mRNA multiple sequence alignments. The GeneMine software is freely available to academic researchers (http://www.bioinformatics.ucla.edu/genemine).
To characterize the functional impacts of alternative splicing, a random sample of 50 clusters with alternative splicing and at least one full-length mRNA was generated (Table 2). The mRNA requirement was imposed to ensure that the cluster would contain as complete a set of the gene's exons as possible, to cover the full coding region and untranslated regions (UTRs). Without such coverage in many cases it is not even possible to define what the actual bounds of the coding region are, let alone get unbiased sampling of the coding region versus UTRs. To characterize the function of each gene product at both the cellular and systemic level required careful manual evaluation and study (i.e. not only sequence analysis but also digging into the available literature and information on the web). We did not feel that the twin objectives of accurate classification of the functional effects of alternative splicing and lack of selection bias could be provided reliably by electronic annotations at this time, although this is a very interesting area for further work. The effect of each alternative splice was evaluated manually, by careful examination of the complete alignment and available information using the GeneMine software. Most importantly, we considered all possible changes in the boundaries of the coding region (alternative initiation, alternative termination, truncation, extension, in-frame deletion and insertion). Since an alternative splice can change where the coding region starts and ends, it is incorrect to classify it as the UTR simply because it is upstream of the translation start site given by the GenBank annotation for the gene. We have adopted the policy that any alternative splice that alters the protein product will be classified as a coding region, regardless of its location relative to the GenBank CDS annotation.
In the process, the alternative splices affecting the coding region were identified as changing the N-terminal, C-terminal or internal region of the protein.
| RESULTS |
|---|
Detection of alternative splicing
Our analysis of alternative splicing is based strictly on experimental data, not theoretical models. Rather than seeking to predict alternative splices, we directly detect them as large inserts in EST data from the publicly available dbEST (20) and UNIGENE (18) databases. We measure the evidence for a genuine alternative splice via a series of criteria (Fig. 1). First, a set of ESTs must match over their full lengths, on both sides of a putative alternative splice (allowing for sequence error). A large insert in the middle of such a perfect match is a candidate alternative splice. Unlike many other types of genomics results such as SNPs and variations in expression level, alternative splicing does not resemble common experimental noise (such as sequencing error).
Next, the EST consensus sequence is mapped to the draft human genome sequence by homology search. Because human genes are broken into short exons, a genomic hit typically consists of many short matches. To be valid, these matches must be perfect (again allowing only for sequencing error), must all be in the same orientation (strand) and form a complete, correctly ordered walk through the EST consensus sequence. We require that each genomic–EST match region (putative exon) be bounded by consensus splice donor site and acceptor site sequences in the neighboring genomic (intron) sequence. Our results give an average internal exon size of 144 bp, with only 4% of internal exons >300 bp in length, similar to results obtained for known genes (21). Only 0.2% (79/39 862) of our introns were <60 bp, and the median intron length was 935 bp. The typical gene pattern of short internal exons ending in a single, long 3' exon can usually be verified because 3'-end sequences are highly represented in the EST data, and because 3' ESTs can be identified by their conspicuous poly(A) tails, which directly indicate the end of the 3' exon.
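The orientation and ordering requirements on the genomic matches can be expressed as a simple check. The sketch below is an assumption-laden illustration: the `Hit` record and its field names are hypothetical, the 100 bp allowance for unmatched consensus ends is taken from Materials and Methods, and the check deliberately ignores match quality, internal coverage gaps and splice site consensus.

```python
from typing import List, NamedTuple


class Hit(NamedTuple):
    q_start: int  # start in the consensus (0-based)
    q_end: int    # end in the consensus (exclusive)
    g_start: int  # start in the genomic sequence (0-based)
    g_end: int    # end in the genomic sequence (exclusive)
    strand: str   # "+" or "-"


def is_ordered_walk(hits: List[Hit], consensus_length: int,
                    max_unmatched_end: int = 100) -> bool:
    """Same strand, genomic order consistent with consensus order, ends covered."""
    if not hits:
        return False
    if len({h.strand for h in hits}) != 1:
        return False
    ordered = sorted(hits, key=lambda h: h.q_start)
    g_starts = [h.g_start for h in ordered]
    if ordered[0].strand == "-":
        # On the minus strand the genomic order runs opposite to the consensus.
        g_starts = g_starts[::-1]
    if any(a >= b for a, b in zip(g_starts, g_starts[1:])):
        return False
    return (ordered[0].q_start <= max_unmatched_end
            and consensus_length - ordered[-1].q_end <= max_unmatched_end)


if __name__ == "__main__":
    hits = [Hit(0, 150, 1000, 1150, "+"), Hit(150, 300, 5000, 5150, "+")]
    print(is_ordered_walk(hits, consensus_length=320))
```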
To assess the accuracy of our gene mapping and exon/intron structure, we have compared against the completely independent data produced by NCBI's Acembly, a human curated gene annotation effort (data downloaded from ftp://ncbi.nlm.nih.gov/genomes/H_sapiens). LocusLink provides an independent linkage between individual RefSeq genes and UNIGENE clusters (22). For genes mapped independently to the genomic sequence by RefSeq and our procedure, 97.3% mapped to the same genomic contig. Moreover, of those genes, 95% were mapped to the same nucleotides of the contig. While Acembly's mapping should not be assumed to be perfect, this high level of agreement between independent efforts is encouraging. Our exon details (derived in our procedure from our splice detection) match the NCBI Acembly exons in 97% of cases at the 5' splice site, and 96% at the 3' splice site (overall, 94% of the exons were identical). Our splice details matched the NCBI Acembly introns in 93% of cases at the 5'-end, and 92% at the 3'-end (86% matched exactly at both ends). Because of alternative splicing, a 100% correspondence is not expected.
A candidate alternative splice insert (from the EST) must pass a series of tests. First, it must also be found in the genomic sequence, matching an exonic region in the genomic sequence whose boundaries correspond to known splice site sequences. Since these splice site sequences are mostly intronic, this provides an independent validation of the alternative splice. It should be emphasized that differences in where ESTs begin and end in a gene (e.g. a shorter EST might give the appearance of a truncated gene product) will never be interpreted as an alternative splice by our procedure. We focus exclusively on detecting splicing, i.e. a contiguous region of the transcript that has been removed during mRNA processing. Detecting a splice in an EST requires extensive matches to both upstream and downstream exons. Our analysis identified 39 862 splices in 8429 clusters. Our analysis only reports alternative splices, i.e. pairs of validated splices that are mutually exclusive. Thus unspliced introns or other genomic contaminants will never be reported, since they result in the absence of a splice, not the creation of a new, mutually exclusive splice. To call an alternative splice, our procedure requires a pair of splices that match exactly at one splice site, and differ at the second splice site. This procedure can detect exon skipping, alternative 5' donor sites, and alternative 3' acceptor sites (Fig. 1B). 6201 such alternative splice relationships were identified in 2272 clusters. These diverse forms of evidence produce strong log odds scores for each detected alternative splice. A detailed statistical analysis of this evidence will be presented elsewhere (D.Miller, J.Aten, C.Grasso, B.Modrek and C.Lee, manuscript in preparation).
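The pairing rule described above (two validated splices that share one splice site exactly and differ at the other) can be sketched as follows. The coordinates are toy values on the forward strand, and the rule used here to separate exon skipping from an alternative donor or acceptor site (looking for a third validated splice that bounds the skipped exon) is an illustrative assumption rather than the procedure used in this work.

```python
from itertools import combinations

# A splice is (donor, acceptor): last base of the upstream exon and first
# base of the downstream exon, forward-strand genomic coordinates.


def classify_pair(a, b, all_splices):
    """Label a pair of splices, or return None if they share no splice site."""
    if a == b:
        return None
    if a[0] == b[0]:  # shared donor, different acceptors
        short, long_ = sorted((a, b), key=lambda s: s[1])
        # Exon skipping if another splice reaches the far acceptor from a
        # donor lying downstream of the near acceptor.
        if any(s[1] == long_[1] and s[0] > short[1] for s in all_splices):
            return "exon skipping"
        return "alternative 3' acceptor site"
    if a[1] == b[1]:  # shared acceptor, different donors
        long_, short = sorted((a, b), key=lambda s: s[0])
        if any(s[0] == long_[0] and s[1] < short[0] for s in all_splices):
            return "exon skipping"
        return "alternative 5' donor site"
    return None


def alternative_relationships(splices):
    """All pairs of distinct splices that share exactly one splice site."""
    found = []
    for a, b in combinations(sorted(set(splices)), 2):
        label = classify_pair(a, b, splices)
        if label is not None:
            found.append((a, b, label))
    return found


if __name__ == "__main__":
    # Toy gene with four exons; the two middle exons can each be skipped.
    splices = [(1000, 2001), (2117, 3001), (3036, 4001),
               (1000, 3001), (2117, 4001), (1000, 4001)]
    for a, b, label in alternative_relationships(splices):
        print(a, b, label)
```

Because two splices that share a donor or an acceptor necessarily overlap in the genomic sequence, every pair returned here is mutually exclusive by construction.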
As a typical validation example from our database, we illustrate the dystrophia myotonica protein kinase (DMPK) gene (Fig. 2), whose alternative splicing has previously been studied extensively. In DMPK, we identified three alternative splices in the EST data, all of which are verified by independent experimental results in the existing literature (23). Of the three alternative splices, one deletes the last 15 bp of exon 8, another skips exon 12 and exon 13, and the last deletes just 4 bp in exon 14. Figure 2 shows one of these alternative splice forms including junction and quality of match of the EST evidence versus the genomic sequence.
Novel alternative splice forms of a known gene
Figure 3 shows several novel alternative splices detected in a well-studied gene, HLA-DM β. Eighty ESTs from UNIGENE cluster Hs.1162 align to form a consensus sequence, which in turn matches an ordered series of segments on one strand of chromosome 6. The EST sequences match the genomic sequence closely, consistent with sequencing error. The EST sequences mark out a long 3' exon (359 bp) plus a series of five short exons, whose sizes (36–288 bp) match the range expected for internal exons. This matches the known gene location and structure for HLA-DM β (24,25). Eight splices are observed in these ESTs, where sequence matching one exonic region skips directly to a downstream exonic region as indicated in Figure 3A. The 16 putative exon boundaries implied by the ESTs map precisely to strong consensus splice acceptor and donor sites in the genomic sequence (Fig. 3C).
Four different alternatively spliced forms of HLA-DM β are observed: splices 3+4+5 (including exons IV and V in the mRNA product); splices 6+5 (skipping exon IV); splices 3+7 (excluding exon V); splice 8 (skipping exons IV and V). Exons IV and V are 117 and 36 bp in length, and thus these alternative splices are all in-frame. The protein coding region begins in exon I and ends in exon VI, so these splices produce four different forms of the HLA-DM β chain that differ at their C-terminus.
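The in-frame argument is simple arithmetic: skipping one or more exons preserves the reading frame when the total skipped length is a multiple of 3. A minimal check, using the exon lengths quoted above (the function name is hypothetical):

```python
def is_in_frame(skipped_exon_lengths):
    """True if removing exons of these lengths (bp) preserves the reading frame."""
    return sum(skipped_exon_lengths) % 3 == 0


if __name__ == "__main__":
    exon_iv, exon_v = 117, 36  # lengths in bp, from the text
    for label, skipped in [("skip IV", [exon_iv]),
                           ("skip V", [exon_v]),
                           ("skip IV and V", [exon_iv, exon_v])]:
        codons, remainder = divmod(sum(skipped), 3)
        print(label, is_in_frame(skipped), codons, remainder)
```

Exon IV corresponds to 39 codons and exon V to 12 codons, so each of the three skipping events removes a whole number of codons.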
Analysis of these forms reveals a remarkably simple and intriguing functional effect. HLA-DM is essential for the loading of class II MHC molecules with exogenous peptide antigens, a key step in antigen presentation and activation of the humoral immune response. This is thought to occur in early lysosomal compartments. HLA-DM is normally targeted to lysosomes, and its β chain contains a transmembrane domain anchoring its C-terminus (26,27). Exon IV is short, and corresponds precisely to the transmembrane domain. Exon V is very short, and encodes the lysosomal targeting signal YTPL, whose first residue begins at the start of the exon. Thus, the alternative splice regulates HLA-DM's targeting to endosomal compartments (by including or excluding the YTPL signal), as well as its anchoring to the membrane. Given HLA-DM's importance in antigen processing and presentation by class II MHC, this regulation is functionally interesting. Removing its targeting signal would likely redirect HLA-DM first to the plasma membrane, so that it would travel to lysosomes via endocytic pathways, altering the kinetics and conditions in which it first encounters class II MHC. It appears that the gene structure of the HLA-DM β gene has been carefully designed to enable control of HLA-DM function, by pulling out both the transmembrane helix and the lysosomal targeting signal into separate short exons (IV, V) that can be alternatively spliced in-frame (exon VI supplies the last 4 amino acids of the protein, identical in all forms). The alternatively spliced forms were detected in uterus (two ESTs), placenta, lymph, stomach and colon. Despite the fact that HLA-DM is the subject of intense research, we have not been able to find any report of such alternative splicing in the published literature, and it is thought to be novel by an expert on HLA-DM (E.Mellins, personal communication).
Scope of alternative splicing in human genes
Our genome-wide analysis detected thousands of alternative splices in the current, publicly available human genome data (Table 1). 6201 alternative splice relationships were detected in which two splices shared a common donor or acceptor site, but spliced to a different site on their other end (i.e. exon skipping, alternative 5' splice donor site or alternative 3' splice acceptor site; Fig. 1B). We found alternative splices in 27% of genes for which we had enough expressed sequence to cover more than a single exon. However, this estimate, based on analysis of all EST clusters, likely underestimates the real occurrence of alternative splicing, because the available EST data typically cover only a small part of the complete gene. To test this hypothesis, we analyzed the alternative splicing rate in genes for which mRNA sequence was available (representing all or part of the full gene). We detected one or more alternative splice forms in 42% of these genes, significantly higher than the rate observed in EST-only clusters. This is in close agreement with a previous study of mRNA-based expressed sequence clusters (8). Since fragmentation of the genomic sequence can also block complete coverage of a gene, we assessed the rate of alternative splice detection in genes mapped to chromosome 22. Of these, 43% contained alternative splices, including both mRNA and EST-only clusters.
The current EST data appears to be incomplete. Our procedure identified splices (i.e. multiple exons) in only 18% of the mapped EST clusters. However, for clusters that we mapped to chromosome 22 (full genomic) that also had an mRNA sequence, 88% contained at least one splice. A variety of factors such as the fragmentation of the draft human genome sequence, the large size of introns and the tendency of the ESTs to cluster at the 3'-end bias the current dataset against finding full-length genes, and probably underestimate the true level of alternative splicing. Moreover, since the current EST data for each gene represent only a subset of the tissues and cell types in which that gene is expressed, it is likely that the total occurrence of alternative splicing is much greater than what our analyses can detect. A large fraction of the EST alternative splice forms were observed multiple times (from different clones and different libraries), indicating that they constitute a relatively high fraction of total mRNA. Of our alternative splices, 2892 (47%) were observed in two or more EST sequences. These data represent a high confidence subset of the detected alternative splices.
Our analysis indicates that the vast majority of our database represents novel findings (Fig. 5D). Only 13% of our alternative splices were detected in mRNA sequences from GenBank, which presumably have been thoroughly studied. The remaining 87% could be detected only with ESTs. Our procedure also detected large numbers of alternative splicing events in completely novel genes. Approximately 1200 alternative splices were detected in clusters containing ESTs only.
Alternative splicing in a novel human gene
Figure 4 illustrates an example of alternative splice detection in a novel gene mapped in the human genome by our procedure. This gene has 33% identity to rat FC receptor I β chain, and 25% identity to CD20, and has a pattern of four predicted transmembrane domains characteristic of both proteins. At least seven different forms are detectable, all of which affect the protein product. In a pattern strikingly reminiscent of HLA-DM β, the C-terminal transmembrane region and cytoplasmic tail of the major form (form 1) are placed on a single, short exon (exon VI), that can be included or excluded to create different forms. One particularly interesting form is created by ignoring the normal splice from exon V to exon VI, extending the coding region from exon Va for 142 bp (which we have designated exon Vb). A polyadenylation site is predicted at the end of this sequence, and the ESTs are observed to terminate in poly(A) at this point. This alternative termination replaces the coding region of exons VI and VII with 40 amino acids encoded by exon Vb [terminated by a STOP codon 23 bp before the poly(A) site]. Intriguingly, this replacement C-terminal sequence also contains a predicted transmembrane sequence, and thus neatly substitutes a new C-terminal transmembrane domain and cytoplasmic tail. The cytoplasmic tail in equivalent FC receptor chains plays a key role in activating cytoplasmic signal transduction molecules (28,29), so this alternative form likely modulates the signal transduction activity of this receptor. This form is detected in placenta and kidney, while the majority form was detected in many different libraries.
| DISCUSSION |
|---|
Our results provide a comprehensive dataset for understanding the role of alternative splicing in the human genome. First of all, what is the function of alternative splicing: modification of the protein product, or of the untranslated regions that could affect mRNA localization and stability? Analysis of a random sample from our database (Table 2) indicates that 74% of alternative splices modified the protein product, whereas 22% were confined to the 5' UTR versus just 4% in the 3' UTR (Fig. 5A). This may simply reflect the larger fraction of exons in human genes that are protein-coding as opposed to UTR. This result fits expectations from molecular biology studies (1), but disagrees strongly with a bioinformatics analysis of a small set of ESTs (9), which reported 80% of genes with alternative splicing had an alternative splice in 5' UTR versus only 20% in coding regions. Our observation of little alternative splicing in 3' UTR is striking in view of the strong bias in the EST data towards the 3' exon. One possible explanation is that mRNA species with alternatively spliced 3' UTR sequence could contain sequences that destabilize the mRNA, resulting in fewer observations of these forms. In contrast, the effect on the protein product is seen much more frequently at the C-terminal end (3' in the mRNA) (Fig. 5B). We observed a tendency to replace the C-terminus (46%), as opposed to making an internal deletion, insertion or substitution (37%), or a replacement of the N-terminus (17%). In this respect, the examples we have shown (HLA-DM β and FC receptor I β homolog) are representative. Alternative splicing appears to be strongly biased to preserve the protein coding frame (Fig. 5C). Only 19% of alternative splices resulted in a truncation of the protein product due to frame-shift; occasionally alternative splicing was observed to add a new, extended C-terminal sequence through frame-shift (6%). Alternative splicing resulted in a switch to a new AUG start site on an alternative exon in 15% of cases. In contrast, replacing the C-terminus by switching to a different exon containing an alternative STOP codon occurred in 20% of cases. In-frame deletion or insertion of a new sequence in the middle of the protein accounted for 29 and 11% of cases, respectively.
In what types of molecules is alternative splicing commonly observed? Figure 5G shows a molecular classification of a random sample of alternatively spliced genes. The most abundant category is cell surface functions/receptors (29%), which includes membrane-anchored receptors (e.g. CD79B), integral membrane proteins (e.g. folate transporter SLC19A1) and proteins involved in cell surface adhesion (e.g. lectin, hyaluronoglucosaminidase 2). In two related categories, an additional 14% of alternatively spliced genes encode secreted proteins (e.g. Norrie disease protein; group-specific component) and 9% encode signal transduction molecules (e.g. phospholipase D2; RIT). The next two major categories are transcriptional regulation (14%; e.g. MYB, PAX6) and apoptosis (11%; e.g. BID, PNAS-1). Together, these functions of transmission, reception and response to cellular signals comprise >75% of the observed alternatively spliced genes. Proteins involved in metabolism (e.g. aldolase C), and organelle-specific sorting proteins were also observed. This sample is by no means comprehensive or exact, but indicates a trend towards cell surface interactions and signaling.
What types of systemic functions are most often affected by alternative splicing? Twenty-nine percent of the alternatively spliced genes encoded functions specific to the immune system (Fig. 5F; e.g. T-cell specific transcription factor 7, TNF receptor superfamily member 6). In particular, alternative splicing of immune system cell surface receptors was very prevalent. Neuronal functions (e.g. neuropilin, brain-specific aldolase C) comprised 12% of the total. The remaining genes possessed no clearly specific systemic function. These data suggest alternative splicing may play a large role in immune system and nervous system functions which require precise control of cellular differentiation and activation, to process large amounts of information. Controlling how each cell responds to a diverse array of signals can be achieved through alternative splicing of its receptors and signal transduction molecules.
How often is alternative splicing clearly associated with a specific tissue? Based on a sample of 50 genes, 14% of alternatively spliced genes in our dataset showed evidence of tissue specificity for the minor isoform. This estimate is based on a conservative definition requiring that the minor isoform be observed multiple times in a specific tissue in which the major form was not observed. Since in many known cases of tissue-specific alternative splicing both minor and major forms are observed in the same tissue, this probably misses many cases of real tissue specificity. Examples include DDR1, discoidin domain receptor, which has a minor form observed in muscle; and CG1I, a putative cyclin G1 interacting protein, which has isoforms observed specifically in ovary and brain. Within the small sample, tissue-specific minor isoforms were observed in novel, uncharacterized genes in brain, colon, testis and prostate.
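The conservative tissue-specificity criterion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the input format (one tissue label per EST supporting each isoform) and the reading of "multiple times" as at least two observations (`min_count=2`) are all hypothetical choices, not details given in the text.

```python
from collections import Counter


def tissue_specific_minor_tissues(minor_obs, major_obs, min_count=2):
    """Return tissues in which the minor isoform appears tissue-specific.

    minor_obs, major_obs: iterables of tissue labels, one label per EST
    supporting the minor or major isoform (hypothetical input format).
    min_count: assumed threshold for "observed multiple times".
    """
    minor_counts = Counter(minor_obs)
    major_tissues = set(major_obs)
    # Keep tissues where the minor form is seen repeatedly and the
    # major form is never seen.
    return sorted(
        tissue
        for tissue, n in minor_counts.items()
        if n >= min_count and tissue not in major_tissues
    )


# Made-up observations, for illustration only
minor = ["muscle", "muscle", "brain", "liver"]
major = ["brain", "liver", "liver", "kidney"]
print(tissue_specific_minor_tissues(minor, major))  # ['muscle']
```

Because a tissue is rejected as soon as the major form is seen there even once, a rule of this kind undercounts genes whose isoforms are co-expressed in the same tissue.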
How comprehensive is our dataset, and what are its prospects for growth? We have noted two causes of failure by our procedure to detect alternatively spliced forms that are known in the literature. First, a given gene may not yet map to the draft genome, a prerequisite in our procedure for analyzing its splicing. Secondly, some alternatively spliced mRNA forms are miscategorized as genomic DNA in GenBank, causing them to be excluded by our procedure. The former seems to be the more important cause of failure. Despite >90% completeness by total nucleotides sequenced, the draft genome used in this study (October 2000) only enabled mapping of 55% of UNIGENE expressed sequence clusters, because we require a full-length match versus the expressed gene sequence consensus (Table 1). The draft (i.e. incomplete) BAC clone sequences, which constituted the majority of this dataset, consisted in large part of short sequence fragments (4-10 kb) separated by unsequenced regions. Such fragments are too small to map a typical human gene (10-30 kb) by our conservative procedure. This trend is even stronger for the subset of genes that have full-length mRNAs. Of these clusters, only 41% could be mapped over their full length to an available genomic contig. To check whether this is due to the draft genome's fragmentation, we analyzed a subset of gene clusters that have been mapped by STS to chromosome 22, which has been almost completely sequenced. For these clusters, 77% could be mapped. Thus, given unbroken genomic sequence, our mapping procedure has a false negative rate of approximately 20%. These data suggest that completion of the human genome sequence, along with improvements in our algorithms, will at least double the number of alternative splices detected.
Our detection of alternative splicing should also grow with increasing EST data. In our current EST dataset (December 2000), splices were detected in only 18% of clusters, reflecting the fact that the average cluster consists of too few ESTs (one or two) and is too short (a few hundred base pairs) to cover more than a single exon. This is exaggerated by the strong bias of the ESTs to be from the 3'-end, since 3' exons tend to be much longer than typical internal exons. In contrast, in genes for which a full-length or partial mRNA sequence was available and which were mapped to a region of full-length genomic sequence (e.g. chromosome 22), 88% contained at least one splice (and typically many more).
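The mapping figures quoted above can be turned into a back-of-the-envelope calculation. The Python sketch below is only an illustration: it assumes, purely for the sake of the arithmetic, that the number of detectable splices scales linearly with the fraction of full-length mRNA clusters that can be mapped, which the text does not state. The function names are hypothetical.

```python
def false_negative_rate(mapped_fraction: float) -> float:
    """Fraction of clusters that fail to map despite unbroken genomic sequence."""
    return 1.0 - mapped_fraction


def fold_gain(current_fraction: float, future_fraction: float) -> float:
    """Fold increase in mapped clusters, assuming linear scaling (assumption)."""
    return future_fraction / current_fraction


draft_full_length = 0.41  # full-length mRNA clusters mapped on the draft genome
chr22 = 0.77              # clusters mapped on near-complete chromosome 22

print(round(false_negative_rate(chr22), 2))           # 0.23
print(round(fold_gain(draft_full_length, chr22), 2))  # 1.88
print(round(fold_gain(draft_full_length, 1.0), 2))    # 2.44
```

Under this simplified assumption, finished sequence alone gives roughly a 1.9-fold gain, and removing the remaining mapping failures would raise it to about 2.4-fold, which is in line with the expectation of at least a doubling.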
| ACKNOWLEDGEMENTS |
|---|
We wish to thank D. Black, D. Miller, S. Galbraith and D. Eisenberg for their helpful discussions and comments on this work, and K. Ke for assistance in constructing the HASDB web site. This work was supported by Department of Energy grant DE-FG03-87ER60615 and a grant from the Searle Scholars Program to C.L. B.M. is a predoctoral trainee supported by NSF IGERT Award #DGE-9987641.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 310 825 7374; Fax: +1 310 267 0248; Email: leec@mbi.ucla.edu
| REFERENCES |
|---|
1 Lopez,A.J. (1998) Alternative splicing of pre-mRNA: developmental consequences and mechanisms of regulation. Annu. Rev. Genet., 32, 279-305.
2 Boise,L.H., Gonzalez-Garcia,M., Postema,C.E., Ding,L., Lindsten,T., Turka,L.A., Mao,X., Nunez,G. and Thompson,C.B. (1993) bcl-x, a bcl-2-related gene that functions as a dominant regulator of apoptotic cell death. Cell, 74, 597-608.
3 Fettiplace,R. and Fuchs,P.A. (1999) Mechanisms of hair cell tuning. Annu. Rev. Physiol., 61, 809-834.
4 Schmucker,D., Clemens,J.C., Shu,H., Worby,C.A., Xiao,J., Muda,M., Dixon,J.E. and Zipursky,S.L. (2000) Drosophila Dscam is an axon guidance receptor exhibiting extraordinary molecular diversity. Cell, 101, 671-684.
5 Smith,C.W.J. and Valcarcel,J. (2000) Alternative pre-mRNA splicing: the logic of combinatorial control. Trends Biochem. Sci., 25, 381-388.
6 Sharp,P.A. (1994) Split genes and RNA splicing. Cell, 77, 805-815.
7 Sutcliffe,J.G. and Milner,R.J. (1988) Alternative mRNA splicing: the Shaker gene. Trends Genet., 4, 297-299.
8 Brett,D., Hanke,J., Lehmann,G., Haase,S., Delbruck,S., Krueger,S., Reich,J. and Bork,P. (2000) EST comparison indicates 38% of human mRNAs contain possible alternative splice forms. FEBS Lett., 474, 83-86.
9 Mironov,A.A., Fickett,J.W. and Gelfand,M.S. (1999) Frequent alternative splicing of human genes. Genome Res., 9, 1288-1293.
10 Liang,F., Holt,I., Pertea,G., Karamycheva,S., Salzberg,S.L. and Quackenbush,J. (2000) Gene Index analysis of the human genome estimates approximately 120,000 genes. Nat. Genet., 25, 239-240.
11 Ewing,B. and Green,P. (2000) Analysis of expressed sequence tags indicates 35,000 human genes. Nat. Genet., 25, 232-234.
12 Ji,H., Zhou,Q., Wen,F., Xia,H., Lu,X. and Li,Y. (2001) AsMamDB: an alternative splice database of mammals. Nucleic Acids Res., 29, 260-263.
13 Irizarry,K., Kustanovich,V., Li,C., Brown,N., Nelson,S., Wong,W. and Lee,C. (2000) Genome-wide analysis of single-nucleotide polymorphisms in human expressed sequences. Nat. Genet., 26, 233-236.
14 Jang,W., Chen,W.C., Sicotte,H. and Schuler,G.D. (1999) Making effective use of human genomic sequence data. Trends Genet., 15, 284-286.
15 Altschul,S.F., Gish,W., Miller,W., Myers,E.W. and Lipman,D.J. (1990) Basic local alignment search tool. J. Mol. Biol., 215, 403-410.
16 Needleman,S.B. and Wunsch,C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443-453.
17 Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195-197.
18 Schuler,G. (1997) Pieces of the puzzle: expressed sequence tags and the catalog of human genes. J. Mol. Med., 75, 694-698.
19 Lee,C. and Irizarry,K. (2001) The GeneMine system for genome/proteome annotation and collaborative data-mining. IBM Syst. J., 40, in press.
20 Boguski,M.S., Lowe,T.M. and Tolstoshev,C.M. (1993) dbEST - database for expressed sequence tags. Nat. Genet., 4, 332-333.
21 Hawkins,J.D. (1988) A survey on intron and exon lengths. Nucleic Acids Res., 16, 9893-9905.
22 Pruitt,K.D. and Maglott,D.R. (2001) RefSeq and LocusLink: NCBI gene-centered resources. Nucleic Acids Res., 29, 137-140.
23 Groenen,P.J., Wansink,D.G., Coerwinkel,M., van den Broek,W., Jansen,G. and Wieringa,B. (2000) Constitutive and regulated modes of splicing produce six major myotonic dystrophy protein kinase (DMPK) isoforms with distinct properties. Hum. Mol. Genet., 9, 605-616.
24 Kelly,A.P., Monaco,J.J., Cho,S.G. and Trowsdale,J. (1991) A new human HLA class II-related locus, DM. Nature, 353, 571-573.
25 Shaman,J., von Scheven,E., Morris,P., Chang,M.Y. and Mellins,E. (1995) Analysis of HLA-DMB mutants and -DMB genomic structure. Immunogenetics, 41, 117-124.
26 Sanderson,F., Kleijmeer,M.J., Kelly,A.P., Verwoerd,D., Tulp,A., Neefjes,J., Geueze,H.J. and Trowsdale,J. (1994) Accumulation of HLA-DM, a regulator of antigen presentation, in MHC class II compartments. Science, 266, 1566-1569.
27 Potter,P.K., Copier,J., Sacks,S.H., Calafat,J., Janssen,H., Neefjes,J. and Kelly,A.P. (1999) Accurate intracellular localization of HLA-DM requires correct spacing of a cytoplasmic YTPL targeting motif relative to the transmembrane domain. Eur. J. Immunol., 29, 3936-3944.
28 Daeron,M. (1997) Fc receptor biology. Annu. Rev. Immunol., 15, 203-234.
29 Kinet,J.P. (1999) The high affinity IgE receptor (FCεRI): from physiology to pathology. Annu. Rev. Immunol., 17, 931-972.