Nucleic Acids Research, 2000, Vol. 28, No. 21 4364-4375
© 2000 Oxford University Press
Analysis of canonical and non-canonical splice sites in mammalian genomes
Informatic Division, The Sanger Centre, Hinxton, Cambridge, CB10 1SA, UK
A set of 43 337 splice junction pairs was extracted from mammalian GenBank annotated genes. Expressed sequence tag (EST) sequences support 22 489 of them. Of these, 98.71% contain the canonical dinucleotides GT and AG at the donor and acceptor sites, respectively; 0.56% hold non-canonical GC-AG splice site pairs; and the remaining 0.73% occur in many small groups (each at most 0.05% of the set). Studying these groups, we observe that many of them contain splicing dinucleotides shifted from the annotated splice junction by one position. After close examination of such cases we present a new classification consisting of only eight observed types of splice site pairs (out of 256 a priori possible combinations). EST alignments allow us to verify the exonic part of the splice sites, but many non-canonical cases may be due to intron sequencing errors. This idea is given substantial support when we compare the sequences of human genes having non-canonical splice sites with the sequences deposited in GenBank by high throughput genome sequencing projects (HTG). A high proportion (156 out of 171) of the human non-canonical and EST-supported splice site sequences had a clear match in the human HTG. After corrections they can be classified as: 79 GC-AG pairs (of which one was an error that corrected to GC-AG); 61 errors that corrected to canonical GT-AG pairs; six AT-AC pairs (of which two were errors that corrected to AT-AC); one case produced from a non-existent intron; seven cases found in HTG that were deposited to GenBank; and only two remaining cases of supported non-canonical splice sites. If we assume that approximately the same situation holds for the whole set of annotated mammalian non-canonical splice sites, then 99.24% of splice site pairs should be GT-AG, 0.69% GC-AG, 0.05% AT-AC and only 0.02% could consist of other types of non-canonical splice sites.
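The extrapolation in the last sentence can be approximated with a few lines of arithmetic. The sketch below is not the authors' procedure: it assumes that the eight cases that could not be assigned to a splice site type (the non-existent intron and the seven HTG-deposited cases) are left out of the denominator, and that the remaining 148 cases are scaled to the 1.29% non-canonical share of EST-supported pairs. Under that assumption the rounded results agree with the figures quoted above. The function and variable names are illustrative only.

```python
# Shares among the 22 489 EST-supported pairs, as stated in the abstract.
CANONICAL_PCT = 98.71
NON_CANONICAL_PCT = 0.56 + 0.73  # GC-AG plus all remaining small groups

# Outcome of checking human non-canonical cases against HTG sequence.
# ASSUMPTION: the 1 non-existent intron and the 7 HTG-deposited cases
# are excluded, leaving 148 of the 156 matched cases.
resolved_counts = {"GT-AG": 61, "GC-AG": 79, "AT-AC": 6, "other": 2}


def extrapolate(counts, canonical_pct, non_canonical_pct):
    """Redistribute the non-canonical share according to the resolved counts."""
    total = sum(counts.values())
    estimates = {
        pair: non_canonical_pct * n / total for pair, n in counts.items()
    }
    # Annotated non-canonical pairs that corrected to GT-AG are added
    # to the pairs that were canonical to begin with.
    estimates["GT-AG"] += canonical_pct
    return estimates


estimates = extrapolate(resolved_counts, CANONICAL_PCT, NON_CANONICAL_PCT)
for pair, pct in estimates.items():
    print(f"{pair}: {pct:.2f}%")
```

The four estimates sum to 100% by construction, because the non-canonical share is only redistributed, never discarded.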
We analyze several characteristics of EST-verified splice sites and build weight matrices for the major groups, which can be incorporated into gene prediction programs. We also present a set of EST-verified canonical splice sites larger by two orders of magnitude than the current one (22 199 entries versus ~600) and finally, a set of 290 EST-supported non-canonical splice sites. Both sets should be significant for future investigations of the splicing mechanism.
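As an illustration of what a splice site weight matrix looks like in practice, the following is a minimal sketch of a generic log-odds position weight matrix. It is not the construction used in the paper, whose details are not given in this abstract. The uniform background, the pseudocount of 1 and the four toy donor sequences are all assumptions made for the example; real input would be the aligned EST-verified sites.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def weight_matrix(sites, background=None, pseudocount=1.0):
    """Build a log2-odds weight matrix from equal-length aligned sites."""
    if background is None:
        # ASSUMPTION: uniform base composition as the null model.
        background = {base: 0.25 for base in ALPHABET}
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    denominator = len(sites) + pseudocount * len(ALPHABET)
    matrix = []
    for position in range(length):
        counts = Counter(site[position] for site in sites)
        column = {
            base: math.log2(
                (counts[base] + pseudocount) / denominator / background[base]
            )
            for base in ALPHABET
        }
        matrix.append(column)
    return matrix


def score(matrix, sequence):
    """Sum the per-position log-odds for a candidate site."""
    return sum(column[base] for column, base in zip(matrix, sequence))


# Hypothetical donor-site windows (2 exonic + 6 intronic bases), toy data only.
toy_donors = ["AGGTAAGT", "AGGTGAGT", "CGGTAAGA", "AGGTAAGG"]
pwm = weight_matrix(toy_donors)
print(score(pwm, "AGGTAAGT"), score(pwm, "TTCCTTCC"))
```

A gene prediction program could use such a score as one feature among several when ranking candidate donor or acceptor positions; separate matrices would be built for each major group of splice site pairs.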
* To whom correspondence should be addressed at present address: EOS Biotechnology, 225A Gateway Boulevard, South San Francisco, CA 94080, USA. Tel: +1 650 246 2331; Fax: +1 650 583 3881; Email: solovyev@eosbiotech.com
Present addresses: M. Burset, Institut Municipal d'Investigació Mèdica (IMIM), C/Dr Aiguader 80, 08003 Barcelona, Spain; I. A. Seledtsov, Institute of Cytology and Genetics, Novosibirsk, 630090, Russia