Nucleic Acids Research
Intron-exon structures of eukaryotic model organisms
Introduction
Materials And Methods
GenBank sequence database
Exon databases
Removing redundancy
Searching for homologous genes
Results
Database construction
Intron-exon structures in the overall database
Intron-exon structures in the 10 model organisms
Correlation between intron size and genome size
Discussion
Acknowledgements
References
Received February 22, 1999; Revised and Accepted June 8, 1999
ABSTRACT To investigate the distribution of intron-exon structures of eukaryotic genes, we have constructed a general exon database comprising all available intron-containing genes and exon databases from 10 eukaryotic model organisms: Homo sapiens, Mus musculus, Gallus gallus, Rattus norvegicus, Arabidopsis thaliana, Zea mays, Schizosaccharomyces pombe, Aspergillus, Caenorhabditis elegans and Drosophila. We purged redundant genes to avoid the possible bias brought about by redundancy in the databases. After discarding those questionable introns that do not contain correct splice sites, the final database contained 17 102 introns, 21 019 exons and 2903 independent or quasi-independent genes. On average, a eukaryotic gene contains 3.7 introns per kb protein coding region. The exon distribution peaks around 30-40 residues and most introns are 40-125 nt long. The variable intron-exon structures of the 10 model organisms reveal two interesting statistical phenomena, which cast light on some previous speculations. (i) Genome size seems to be correlated with total intron length per gene. For example, invertebrate introns are smaller than those of human genes, while yeast introns are shorter than invertebrate introns. However, this correlation is weak, suggesting that other factors besides genome size may also affect intron size. (ii) Introns smaller than 50 nt are significantly less frequent than longer introns, possibly resulting from a minimum intron size requirement for intron splicing.
INTRODUCTION
In order to understand the structure and evolution of genes and genomes in this era of genomics, it is important to know the general statistical characteristics of the intron-exon structures of eukaryotic genes. On the one hand, designing a research project involving genomic structures requires an understanding of general characteristics of genes and genomes. On the other hand, floods of information from exponentially growing databases of DNA and protein sequences often overwhelm researchers who study individual genes or gene families, rendering it difficult to place a newly sequenced gene or a newly determined gene family in a general picture of eukaryotic genes and genomes. When one determines the intron-exon structure of a newly characterized gene, one wonders if it is a normal structure or if it represents an entirely novel structure. Finally, developing sensitive bioinformatics tools to find genes and open reading frames in eukaryotic genome sequences, an important task in genomics studies, also depends on a complete statistical description of intron-exon structures. An updated statistical description of intron-exon structures has been lacking and is imperative for the theoretical study of the origin and evolution of genes and genomes.
It has been a decade since the first compilation of intron-exon structures in eukaryotic genes was published (1). A number of authors published analyses of some characteristics of nuclear introns in a few particular organisms in the late 1980s and early 1990s (2-5). However, the databases have evolved in both size and content in recent years. The first change is the astronomical growth of the sequence databases as a consequence of sequencing the entire genomes of many organisms. The second feature is that more and more redundant genes have entered the databases. For example, the genome of Saccharomyces cerevisiae contains 35% proteins from the same gene families (6). How to efficiently define and exclude such genomic redundancy, which may bring bias to the analysis of intron-exon structure, has become a technical challenge. This investigation has considered these new factors in an attempt to portray the general features of gene structures in various model organisms.
We analyzed the statistical distribution of spliceosomal introns and exons of nuclear genes in various model organisms using a DNA sequence database released recently, GenBank 106. These observations, based on a large number of genes (we only chose those model organisms that have many genes sequenced), may be viewed as a general description of gene structures in those organisms. We found from these statistics that, not surprisingly, species have evolved considerably different intron-exon structures. Remarkably, we observed that such changes are correlated with the evolution of genomes and are constrained by functional properties of intron splicing processes. Such correlations bear some implications for some significant issues in gene evolution.
MATERIALS AND METHODS
GenBank sequence database
GenBank release 106 contains sequences of 1.5 × 10^9 nt in 2.2 × 10^6 entries. We downloaded all flat files that contain eukaryotic genes, including gbmam.seq, gbinv.seq, gbvrt.seq, gbpln.seq, gbrod.seq, gbpri1.seq and gbpri2.seq, to our alpha WDPS 500au workstation (Digital). All further analyses are based on the information stored in these files.
Exon databases
Using the method developed by Long et al. (7), we extracted all entries in the GenBank files that contain intron-exon structures to form a raw intron-exon database. This raw database includes locus names, definitions of intron-exon structures, species names, protein sequences and DNA sequences. Following the method of Long et al. (7), we then calculated all essential parameters, such as the sizes of introns and exons in the coding sequence (CDS), 3′-UTR and 5′-UTR. To avoid errors caused by erroneous intron submissions, we also collected the dinucleotides at the 5′ and 3′ splice sites as defined in the feature table. In the analysis we used only those introns that had the correct GT..AG signals at their splice sites. This filter also removed a minor class of introns with different splice site sequences. The removal, however, did not change the statistical results significantly, because this class represents only a small fraction (<1%) of the total introns (8).
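The extraction and splice-site filter described above can be illustrated with a minimal Python sketch. It is not the program of Long et al. (7); the function names are hypothetical, and it assumes a forward-strand CDS location of the form `join(a..b,c..d)` as used in GenBank feature tables (complement, partial and remote locations are not handled).

```python
import re

def parse_join(location):
    """Return 1-based inclusive (start, end) exon spans from a location
    string such as 'join(1..6,19..24)'. Forward strand only (assumption)."""
    return [(int(a), int(b)) for a, b in re.findall(r"(\d+)\.\.(\d+)", location)]

def introns_between(exons):
    """Introns are the gaps between consecutive exons (1-based, inclusive)."""
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])]

def has_gt_ag(sequence, intron):
    """True if the intron starts with GT and ends with AG."""
    start, end = intron
    s = sequence[start - 1:end].upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")

# Hypothetical toy entry: two 6 nt exons separated by a 12 nt intron.
sequence = "ATGAAA" + "GTAAGTTTTCAG" + "CCCTAA"
exons = parse_join("join(1..6,19..24)")
introns = introns_between(exons)
kept = [i for i in introns if has_gt_ag(sequence, i)]
intron_lengths = [end - start + 1 for start, end in kept]  # [12]
```

Introns failing `has_gt_ag` would simply be dropped from the length statistics, which mirrors the filter described in the text.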
In addition to the overall intron-exon database created from all available sequences, we also created intron-exon databases for the 10 model organisms that have many genes sequenced. These organisms are: Homo sapiens, Mus musculus, Gallus gallus, Rattus norvegicus, Arabidopsis thaliana (cress), Zea mays (corn), Schizosaccharomyces pombe, Aspergillus, Caenorhabditis elegans and Drosophila.
Removing redundancy
Many genes now have more than one copy in the database, either orthologous genes from different species or paralogous genes from a gene family in the same species. In some cases, the same gene has been sequenced and reported by different laboratories. An extreme case is the thousands of immunoglobulin sequences in the database. The uneven distribution of these redundant sequences will introduce bias into an analysis of intron-exon structures, e.g. of the intron number. To avoid this potential bias, we purged the intron-exon databases using the method of Long et al. (7). The purging is based on pairwise comparison of protein sequences. When two protein sequences have a similarity greater than or equal to 20%, calculated by fasta3 (9), we keep one sequence and drop the other, according to one of two rules. (i) If we are interested in the number of exons and introns per gene, we compare all the genes in the same gene family as defined by the 20% similarity criterion (all sequences whose pairwise similarity meets this criterion are grouped together and taken as a family). We take a gene with the most common intron and exon numbers as representative of the family. Only families that contain more than two sequences were considered for the purpose of comparison. (ii) To describe intron and exon lengths, we keep the genes that contain the highest intron and exon numbers as representatives of each family. This procedure created the largest unbiased sample of introns and exons from independent or quasi-independent genes.
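As a rough illustration of the purging logic, the sketch below groups genes into families and picks a representative under each of the two rules. It is not GBPURGE: the function names are hypothetical, the similar pairs are assumed to have been computed beforehand (e.g. by fasta3), and families are formed by single linkage, which is one possible reading of the grouping rule.

```python
from collections import Counter, defaultdict

def group_families(genes, similar_pairs):
    """Single-linkage grouping (assumption): genes connected through any
    chain of similar pairs end up in the same family."""
    parent = {g: g for g in genes}

    def find(g):
        while parent[g] != g:
            parent[g] = parent[parent[g]]  # path halving
            g = parent[g]
        return g

    for a, b in similar_pairs:
        parent[find(a)] = find(b)

    families = defaultdict(list)
    for g in genes:
        families[find(g)].append(g)
    return list(families.values())

def representative_by_common_count(family, intron_counts):
    """Rule (i): a gene with the most common intron number in the family."""
    modal = Counter(intron_counts[g] for g in family).most_common(1)[0][0]
    return next(g for g in family if intron_counts[g] == modal)

def representative_by_max_count(family, intron_counts):
    """Rule (ii): the gene with the highest intron number in the family."""
    return max(family, key=lambda g: intron_counts[g])

# Hypothetical data: a, b and c form one family; d stands alone.
genes = ["a", "b", "c", "d"]
pairs_above_threshold = [("a", "b"), ("b", "c")]
intron_counts = {"a": 3, "b": 3, "c": 5, "d": 1}
families = group_families(genes, pairs_above_threshold)
```

With these toy inputs, rule (i) would select `a` (intron number 3 is the most common) and rule (ii) would select `c` for the three-member family.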
Searching for homologous genes
In order to analyze the distribution of introns in homologous genes across the model organisms, we generated homologous gene families using GBPURGE (7) at a criterion of 30% similarity. In pairwise comparisons, if two sequences had a similarity of 30% or higher, we grouped them into a single family. From each gene family containing sequences of at least five model species, we kept one sequence from each organism, choosing a sequence with the most common number of introns in that species. We then generated databases of homologous genes for each species. We used this data set to analyze the relationship between introns and genome size. We also used the general databases of 10 model species for similar analysis for the purpose of comparison.
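The selection of homologous genes can be sketched in the same spirit. This is an illustrative reading of the procedure, not the authors' code; the record layout `(gene_id, species, intron_count)` and the function name are assumptions.

```python
from collections import Counter, defaultdict

def homologous_set(family, min_species=5):
    """family: list of (gene_id, species, intron_count) tuples.
    Returns {species: gene_id}, or {} if the family covers too few species."""
    by_species = defaultdict(list)
    for gene_id, species, n_introns in family:
        by_species[species].append((gene_id, n_introns))
    if len(by_species) < min_species:
        return {}
    chosen = {}
    for species, members in by_species.items():
        # Keep one gene carrying the most common intron number in that species.
        modal = Counter(n for _, n in members).most_common(1)[0][0]
        chosen[species] = next(g for g, n in members if n == modal)
    return chosen

# Hypothetical family spanning five species, with three human members.
family = [
    ("h1", "human", 8), ("h2", "human", 8), ("h3", "human", 9),
    ("m1", "mouse", 8), ("c1", "chicken", 7),
    ("d1", "Drosophila", 2), ("w1", "C.elegans", 4),
]
selection = homologous_set(family)
```

A family represented in fewer than five species returns an empty selection and would be left out of the homologous data set.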
RESULTS
Database construction
The original and purged databases are summarized in Table 1. These results show that in the current databases most of the sequences (>70%) are redundant: either paralogous genes from the same gene family (superfamily) or orthologous genes from different species. Purging of these redundant sequences efficiently avoided the bias brought about by redundant sequences.
Table 1. Number of genes and introns before and after purging
Intron-exon structures in the overall database
Protein coding region. Figure 1 summarizes the distribution of intron-exon structures in protein coding regions for the overall database. This distribution gives a clear picture of the eukaryotic genes. An average gene contains 3.7 introns in 1 kb of protein coding region, but with considerable variation: a gigantic gene, human collagen type VII (13), contains 117 introns; the Fugu fish gene homologous to the Huntington's disease gene contains 66 introns (22).
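The density measure used here, introns per kb of protein coding region, is a simple normalization. The sketch below shows the arithmetic on a hypothetical gene; the function names and the example values are assumptions for illustration only.

```python
def introns_per_kb(n_introns, cds_length_nt):
    """Number of introns normalized to 1000 nt of coding sequence."""
    return n_introns / (cds_length_nt / 1000.0)

def intron_length_per_kb(intron_lengths_nt, cds_length_nt):
    """Total intron length (nt) normalized to 1000 nt of coding sequence."""
    return sum(intron_lengths_nt) / (cds_length_nt / 1000.0)

def residues_to_nt(n_residues):
    """Exon lengths are reported in amino acids; three nucleotides per residue."""
    return 3 * n_residues

# Hypothetical gene: 1500 nt of CDS interrupted by five introns.
density = introns_per_kb(5, 1500)                                  # about 3.3 per kb
content = intron_length_per_kb([80, 95, 110, 400, 1200], 1500)     # nt per kb
```

Normalizing by coding length makes genes encoding proteins of different sizes comparable, which a raw intron count per gene would not.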
Figure 1. Intron and exon length distributions in the overall database. The distributions of individual intron (intron length distribution) and exon lengths, of the total intron content per gene (total intron length distribution) and of the total intron content per kb of coding sequence are summarized and graphed. Each graph consists of two parts, with smaller lengths above longer ones. Exon lengths are given in amino acids, intron lengths in nucleotides. The horizontal and vertical axes represent lengths and frequencies, respectively.
Figure 1 shows that exon lengths are distributed much more tightly than intron lengths. Most exons are 30-40 residues long, which is consistent with previous observations on smaller samples (2,7). A common intron is 40-125 nt long, but intron length varies enormously (the largest recorded in the database is 108 kb; human gene GenBank accession no. AC003992). The longest introns, although not in the database, are >300 kb in the dystrophin gene (10), which contains >700 kb of intron sequences (5). Human gene CIT987-SKA-34504 (M. D. Adams et al., GenBank accession no. AC002302) contains introns of 151 kb. The smallest introns were 18-20 nt long in the nucleomorph, a eukaryotic endosymbiont (11), and 21 nt in Paramecium tetraurelia, a ciliated protozoan (12).
UTR regions. In the database, 2% of genes contain descriptions of introns and exons in the 5′- and 3′-UTRs of the RNA (Fig. 2). Seventy-four genes have 5′-UTR sequences, with seven genes having one 5′-UTR intron. Sixty-nine genes have a single 3′-UTR exon. The lengths of these exons and introns show a variable distribution. For instance, the average 3′-UTR sequence is 340 nt long, with a minimum of 17 nt and maximum of 1376 nt; the lengths of the seven 5′-UTR introns range from 96 to 8214 nt.
[Figure 2 panels: Human, Mouse; Rat, Chicken; Drosophila, C.elegans; Cress, Corn; S.pombe, Aspergillus]
Figure 2. Intron and exon length distributions in model organisms and 5′-/3′-UTRs. The distributions of individual intron (intron length distribution) and exon lengths, of the total intron content per gene (total intron length distribution) and of the total intron content per kb of coding sequence are summarized and graphed. Each graph consists of two parts, with smaller lengths above longer ones, except in certain cases where a small sample size made this unnecessary (intron lengths for Aspergillus, total intron length and intron length per kb coding sequence for S.pombe and total intron length for corn). Exon lengths are given in amino acids, intron lengths in nucleotides. The horizontal and vertical axes represent lengths and frequencies, respectively.
Intron-exon structures in the 10 model organisms
Figure 2 shows the intron-exon structures of the 10 model organisms. Like the overall database, these organisms show a tighter distribution of exon lengths than of intron lengths, as well as minimum intron lengths.
Human genes have the most introns and the largest introns of the 10 species. An average human gene has four introns; the highest recorded number is 117 introns, in the collagen type VII gene (13). The mean intron length is 3413 nt, while the most common length is 75-150 nt. The longest introns are recorded in the BSC gene (its first intron is >71 kb) and the dystrophin gene (several introns >100 kb). The intron length in 1 kb of human gene CDS is close to 7 kb (6825 nt) on average. The other two mammalian species (mouse and rat) are similar to humans in the numbers of introns per kb of CDS. Their individual intron lengths and total intron lengths per kb of CDS or per gene are shorter than those of human genes but longer than those of the non-mammalian species. It appears that these two rodent species have shorter proteins.
The two fungi, S.pombe and Aspergillus, have the shortest introns. Their average gene contains only two introns per kb of CDS, totalling 160-280 nt. Individual introns on average are only 40-75 nt long. This is very similar to the case of S.cerevisiae, the first eukaryotic species whose complete genome was sequenced, which was found to have very few, short introns (14).
Chicken genes rank next to the mammalian genes in intron number and size. A slightly higher number of introns per gene is offset by shorter introns (700 nt on average), giving a total of 1.8 kb of intron sequence per kb of CDS.
Caenorhabditis elegans and Drosophila, two invertebrates, do not contain long introns. Their total intron lengths per kb of CDS average only 1000 and 600 nt, respectively. However, these two species have contrasting intron-exon structures. Caenorhabditis elegans genes contain more (4.0 introns per kb of CDS), shorter (467 nt each) introns, while Drosophila genes have fewer, longer introns (2.7 introns per kb of CDS, 564 nt each).
The genes of the two plant species, corn and cress, contain introns whose total lengths per kb of CDS are intermediate, like those of the two invertebrates. The number of introns is similar in the two plant species (3.9-4.3 per kb of CDS), but the average corn intron (328 nt) is longer than the average cress intron (240 nt).
Correlation between intron size and genome size
In our analysis, we observed a correlation between the size of genomes and the amount of intron sequences in their genes. Table 2 and Figure 3 show correlations between genome size and the number of introns per gene, number of introns per kb of CDS, total intron length per gene and intron length per kb of CDS. The correlation between genome size and total intron length per kb CDS is significant at a marginal level (P = 0.06 for R = 0.6). This weak correlation may suggest a limited causal relationship between intron content and genome complexity. However, it also indicates that other factors are likely to be involved in the evolution of intron size.
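The marginal significance quoted above can be checked with a short calculation. The sketch assumes a Pearson correlation over n = 10 model organisms and a two-tailed t-test with n - 2 degrees of freedom; the text does not state which test was used, so both are assumptions, and the function names are hypothetical.

```python
import math

def t_statistic(r, n):
    """t value for testing a correlation coefficient r from n data points."""
    return r * math.sqrt((n - 2) / (1 - r * r))

def t_pdf(x, df):
    """Probability density of Student's t distribution."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Two-tailed P value, integrating the density over [0, |t|] with
    Simpson's rule (steps must be even)."""
    t = abs(t)
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3
    return 1 - 2 * central_half

n = 10                      # assumed: one data point per model organism
t = t_statistic(0.6, n)     # about 2.12
p = two_tailed_p(t, n - 2)  # falls between 0.05 and 0.10
```

Under these assumptions the P value lies between 0.05 and 0.10, in line with the marginal level reported for R = 0.6.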
Figure 3. Correlation between genome size and average intron size/number in model organisms. Figures labeled `All genes' contain data from all the genes for each of the model organisms; figures labeled `Genes from same families' contain data only from genes present in at least five of the model organisms.
Table 2. Correlation between intron size (means) and genome size
The correlation coefficients (R) are calculated for the correlation between genome size and the measure of intron size indicated at the top of each column. Statistics from the purged database containing genes with the most common number of introns. Sample sizes given are for homologous genes.
Figure 3 reveals a possible relationship between genome size and intron-exon structure. The calculation is, however, based on the purged databases, in which different species may be represented by different types of genes. For example, the plant databases contain photosynthesis-related genes, while the human database contains immunoglobulin genes. Different types of genes may have different intron-exon structures. A more rigorous comparison requires homologous genes in the different organisms. We obtained sequences from 55 gene families that were represented in at least five of the 10 species. Figure 3 shows the relationship between intron content and genome size in these homologous genes. These results show an elevated correlation between intron content and genome size. The higher correlation values in homologous gene sets indicate a better control of the factors due to non-homologous genes in the analyses of Figure 3.
DISCUSSION
This analysis provides a general picture of the intron-exon structure of eukaryotic genes. On average, the analyzed genes have 3.7 introns per kb of protein coding region, typically 40-150 nt each. These statistics are subject to large variation. A number of introns >100 kb or <20 nt exist in the databases and the literature. One human gene contains more than 100 introns, while some genes in the Fugu fish have more than 60 introns. The intron-exon structures of different organisms are variable. These statistical analyses, in general, may provide a fundamental basis both for understanding the structure of a gene that is identified in molecular studies and for developing more sensitive tools to find genes or open reading frames in eukaryotic genome sequences. In particular, this investigation also suggests two interesting points, which may increase our understanding of the evolution of intron-exon structures.
The first is that although introns can be very long, the minimum intron size is limited by the length of the splicing signals. Most of the shortest introns observed were 20-30 bases long; very few were <20 bases. This indicates that in order to encode adequate splicing signals, introns cannot be too short. In fact, conservation analysis of the splice sites showed that the conserved sequence distribution in introns can be extended over >20 nt (14). The smallest recorded introns, found in protist genes, are 13-20 bases long (12). They also encode the minimum splice sequences (GT..AG dinucleotide sequences are encoded), supporting the general conclusion of this analysis.
Second, it has been speculated that intron content is correlated with genome size. For example, it was proposed that intronless prokaryotic genes might be a product of selection against introns for more efficient molecular processes of replication, transcription and processing (16). Recently, it was observed that small genome sizes in Drosophila were correlated with high deletion rates in the Helena retrosequence and introns (17-19). The correlations we have observed in the genomes of model organisms support, in general, the notion that the existence of non-functional elements should be consistent with the size of the genome. The small introns in bird genes provide another example of this relationship (20). However, the correlations as revealed in this study are only at a marginal level of significance in the total intron size per kb CDS, although the correlations increase in homologous genes. These analyses indicate a possibly true but weak connection between genome size and intron sizes, suggesting that there may be other factors involved as well.
This is the distribution of a large sample, with 2903 independent or quasi-independent genes and 17 102 introns. Cautionary notes, however, should be made. First, these statistics were calculated only from genes that contain introns, so information such as the number of introns per gene is only valid for these intron-containing genes. A complete survey of genes that do not contain introns is so far unavailable, except in the case of yeast (14,21). Second, when introns are very long, many researchers tend not to sequence them, for understandable reasons. As a result, the average intron sizes are underestimated to some extent. However, the smooth and fast decrease in frequency of intron sizes in Figure 1 implies that very large introns may make up a small proportion (<5%) of introns in the genome, although the final statistic awaits the completion of the human genome project. Finally, it is not unreasonable to believe that the mode of intron size distribution is likely a stable measurement of most introns.
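The contrast between the mean and the mode of a long-tailed length distribution can be made concrete with a small sketch. The lengths below are invented for illustration and are not taken from the database; the bin width of 25 nt and the function name are likewise assumptions.

```python
from collections import Counter

def modal_bin(lengths, bin_width=25):
    """Return (start, end) of the most populated length bin, end exclusive."""
    bins = Counter((length // bin_width) * bin_width for length in lengths)
    start = bins.most_common(1)[0][0]
    return start, start + bin_width

# Hypothetical intron lengths (nt): mostly short, with two very long introns.
lengths = [60, 70, 80, 85, 90, 95, 110, 120, 5000, 100000]
mean_length = sum(lengths) / len(lengths)
mode = modal_bin(lengths)

# Dropping the two long introns changes the mean drastically, not the mode.
short_only = [x for x in lengths if x < 1000]
mean_short = sum(short_only) / len(short_only)
mode_short = modal_bin(short_only)
```

In this toy sample the modal bin stays the same whether or not the two very long introns are included, while the mean changes by two orders of magnitude, which is why an underrepresentation of long introns affects the mean far more than the mode.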
ACKNOWLEDGEMENTS
We thank Carl Rosenberg at Falling Rain Genomics Inc. (Lincoln, MA) for developing the GBPURGE package which made automatic purging of gene redundancy possible. We thank Walter Gilbert for discussions. The laboratory of M.L. was supported by the Packard Fellowship in Science and Engineering and the National Science Foundation.
REFERENCES
*To whom correspondence should be addressed. Tel: +1 773 702 0557; Fax: +1 773 702 9740; Email: mlong{at}midway.uchicago.edu
Copyright© Oxford University Press, 1999.
||||
![]() |
I. B. Rogozin, K. S. Makarova, D. A. Natale, A. N. Spiridonov, R. L. Tatusov, Y. I. Wolf, J. Yin, and E. V. Koonin Congruent evolution of different classes of non-coding DNA in prokaryotic genomes Nucleic Acids Res., October 1, 2002; 30(19): 4264 - 4271. [Abstract] [Full Text] [PDF] |
||||
![]() |
R. Sorek, G. Ast, and D. Graur Alu-Containing Exons are Alternatively Spliced Genome Res., July 1, 2002; 12(7): 1060 - 1067. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Lynch Intron evolution as a population-genetic process PNAS, April 30, 2002; 99(9): 6118 - 6123. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. T. Webb, S. A. Shabalina, A. Yu. Ogurtsov, and A. S. Kondrashov Analysis of similarity within 142 pairs of orthologous intergenic regions of Caenorhabditis elegans and Caenorhabditis briggsae Nucleic Acids Res., March 1, 2002; 30(5): 1233 - 1239. [Abstract] [Full Text] [PDF] |
||||
![]() |
M. Sakharkar, F. Passetti, J. E. de Souza, M. Long, and S. J. de Souza ExInt: an Exon Intron Database Nucleic Acids Res., January 1, 2002; 30(1): 191 - 194. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. S. Mattick and M. J. Gagen Review ArticleThe Evolution of Controlled Multitasked Gene Networks: The Role of Introns and Other Noncoding RNAs in the Development of Complex Organisms Mol. Biol. Evol., September 1, 2001; 18(9): 1611 - 1630. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. M. Bergman and M. Kreitman Analysis of Conserved Noncoding DNA in Drosophila Reveals Similar Constraints in Intergenic and Intronic Sequences Genome Res., August 1, 2001; 11(8): 1335 - 1345. [Abstract] [Full Text] [PDF] |
||||
![]() |
L. M. Gomulski, R. J. Pitts, S. Costa, G. Saccone, C. Torti, L. C. Polito, G. Gasperi, A. R. Malacrida, F. C. Kafatos, and L. J. Zwiebel Genomic Organization and Characterization of the white Locus of the Mediterranean Fruitfly, Ceratitis capitata Genetics, March 1, 2001; 157(3): 1245 - 1255. [Abstract] [Full Text] |
||||
![]() |
T. Kohchi, K. Mukougawa, N. Frankenberg, M. Masuda, A. Yokota, and J. C. Lagarias The Arabidopsis HY2 Gene Encodes Phytochromobilin Synthase, a Ferredoxin-Dependent Biliverdin Reductase PLANT CELL, February 1, 2001; 13(2): 425 - 436. [Abstract] [Full Text] |
||||
![]() |
F. Hartung, H. Plchova, and H. Puchta Molecular characterisation of RecQ homologues in Arabidopsis thaliana Nucleic Acids Res., November 1, 2000; 28(21): 4275 - 4282. [Abstract] [Full Text] [PDF] |
||||
![]() |
J. M. Comeron and M. Kreitman The Correlation Between Intron Length and Recombination in Drosophila: Dynamic Equilibrium Between Mutational and Selective Forces Genetics, November 1, 2000; 156(3): 1175 - 1190. [Abstract] [Full Text] |
||||
![]() |
J. Trzcinska-Danielewicz and J. Fronk SURVEY AND SUMMARY: Exon-intron organization of genes in the slime mold Physarum polycephalum Nucleic Acids Res., September 15, 2000; 28(18): 3411 - 3416. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. A. Davis, L. Grate, M. Spingola, and M. Ares Jr Test of intron predictions reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast Nucleic Acids Res., April 15, 2000; 28(8): 1700 - 1706. [Abstract] [Full Text] [PDF] |
||||
![]() |
T. A. Thanaraj Positional characterisation of false positives from computational prediction of human splice sites Nucleic Acids Res., February 1, 2000; 28(3): 744 - 754. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. J. Schisler and J. D. Palmer The IDB and IEDB: intron sequence and evolution databases Nucleic Acids Res., January 1, 2000; 28(1): 181 - 184. [Abstract] [Full Text] [PDF] |
||||
![]() |
S. Saxonov, I. Daizadeh, A. Fedorov, and W. Gilbert EID: the Exon-Intron Database--an exhaustive database of protein-coding intron-containing genes Nucleic Acids Res., January 1, 2000; 28(1): 185 - 190. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. S. Walker, R. P. Shetty, K. Clark, S. G. Kazuko, A. Letsou, B. M. Olivera, and P. K. Bandyopadhyay On a Potential Global Role for Vitamin K-dependent gamma -Carboxylation in Animal Systems. EVIDENCE FOR A gamma -GLUTAMYL CARBOXYLASE IN DROSOPHILA J. Biol. Chem., March 9, 2001; 276(11): 7769 - 7774. [Abstract] [Full Text] [PDF] |
||||
![]() |
N. Boudet, S. Aubourg, C. Toffano-Nioche, M. Kreis, and A. Lecharny Evolution of Intron/Exon Structure of DEAD Helicase Family Genes in Arabidopsis, Caenorhabditis, and Drosophila Genome Res., December 1, 2001; 11(12): 2101 - 2114. [Abstract] [Full Text] [PDF] |
||||