Intron-exon structures of eukaryotic model organisms
Nucleic Acids Research Pages 3219-3228


Michael Deutsch, Manyuan Long*

Department of Ecology and Evolution, The University of Chicago, 1101 East 57th Street, Chicago, IL 60637, USA

Received February 22, 1999; Revised and Accepted June 8, 1999

ABSTRACT

To investigate the distribution of intron-exon structures of eukaryotic genes, we have constructed a general exon database comprising all available intron-containing genes and exon databases from 10 eukaryotic model organisms: Homo sapiens, Mus musculus, Gallus gallus, Rattus norvegicus, Arabidopsis thaliana, Zea mays, Schizosaccharomyces pombe, Aspergillus, Caenorhabditis elegans and Drosophila. We purged redundant genes to avoid the possible bias brought about by redundancy in the databases. After discarding those questionable introns that do not contain correct splice sites, the final database contained 17 102 introns, 21 019 exons and 2903 independent or quasi-independent genes. On average, a eukaryotic gene contains 3.7 introns per kb protein coding region. The exon distribution peaks around 30-40 residues and most introns are 40-125 nt long. The variable intron-exon structures of the 10 model organisms reveal two interesting statistical phenomena, which cast light on some previous speculations. (i) Genome size seems to be correlated with total intron length per gene. For example, invertebrate introns are smaller than those of human genes, while yeast introns are shorter than invertebrate introns. However, this correlation is weak, suggesting that other factors besides genome size may also affect intron size. (ii) Introns smaller than 50 nt are significantly less frequent than longer introns, possibly resulting from a minimum intron size requirement for intron splicing.

INTRODUCTION

In order to understand the structure and evolution of genes and genomes in this era of genomics, it is important to know the general statistical characteristics of the intron-exon structures of eukaryotic genes. On the one hand, designing a research project involving genomic structures requires an understanding of general characteristics of genes and genomes. On the other hand, floods of information from exponentially growing databases of DNA and protein sequences often overwhelm researchers who study individual genes or gene families, rendering it difficult to place a newly sequenced gene or a newly determined gene family in a general picture of eukaryotic genes and genomes. When one determines the intron-exon structure of a newly characterized gene, one wonders if it is a normal structure or if it represents an entirely novel structure. Finally, developing sensitive bioinformatics tools to find genes and open reading frames in eukaryotic genome sequences, an important task in genomics studies, also depends on a complete statistical description of intron-exon structures. An updated statistical description of intron-exon structures has been lacking and is imperative for the theoretical study of the origin and evolution of genes and genomes.

It has been a decade since the first compilation of intron-exon structures in eukaryotic genes was published (1). A number of authors published analyses of some characteristics of nuclear introns in a few particular organisms in the late 1980s and early 1990s (2-5). However, the databases have evolved in both size and content in recent years. The first change is the astronomical growth of the sequence databases as a consequence of sequencing the entire genomes of many organisms. The second feature is that more and more redundant genes have entered the databases. For example, the genome of Saccharomyces cerevisiae contains 35% proteins from the same gene families (6). How to efficiently define and exclude such genomic redundancy, which may bring bias to the analysis of intron-exon structure, has become a technical challenge. This investigation has considered these new factors in an attempt to portray the general features of gene structures in various model organisms.

We analyzed the statistical distribution of spliceosomal introns and exons of nuclear genes in various model organisms using a DNA sequence database released recently, GenBank 106. These observations, based on a large number of genes (we only chose those model organisms that have many genes sequenced), may be viewed as a general description of gene structures in those organisms. We found from these statistics that, not surprisingly, species have evolved considerably different intron-exon structures. Remarkably, we observed that such changes are correlated with the evolution of genomes and are constrained by functional properties of intron splicing processes. Such correlations bear some implications for some significant issues in gene evolution.

MATERIALS AND METHODS

GenBank sequence database

GenBank release 106 contains sequences of 1.5 × 10⁹ nt in 2.2 × 10⁶ entries. We downloaded all flat files that contain eukaryotic genes, including gbmam.seq, gbinv.seq, gbvrt.seq, gbpln.seq, gbrod.seq, gbpri1.seq and gbpri2.seq, to our alpha WDPS 500au workstation (Digital). All further analyses are based on the information stored in these files.

Exon databases

Using the method developed by Long et al. (7), we extracted all entries in the GenBank files that contain intron-exon structures to form a raw intron-exon database. This raw database includes information on locus names, definition of intron-exon structures, species name, protein sequences and DNA sequences. Following the method of Long et al. (7), we then calculated all essential parameters, such as the sizes of introns and exons in the regions of the coding sequence (CDS), 3′-UTR and 5′-UTR. In order to avoid errors brought about by erroneous intron submissions, we also collected the dinucleotides around the 5′ and 3′ splice sites as defined in the feature table. In the analysis we only used those sequences that had correct GT..AG signals at the splice sites within introns. This also deleted a minor class of introns that contain different splice site sequences. The deletion, however, did not change the statistical results significantly, because this class represents only a small fraction (<1%) of the total introns (8).
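The splice-site filter described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the program of Long et al. (7): the function names, the 1-based inclusive coordinate convention and the idea of passing exon spans as a sorted list are all hypothetical choices made here.

```python
def introns_from_exons(exons):
    """Return the intron spans lying between consecutive exons.

    exons: list of (start, end) pairs, 1-based inclusive, sorted 5' to 3'
    (an assumed convention, similar to a feature-table join).
    """
    return [(prev_end + 1, next_start - 1)
            for (_, prev_end), (next_start, _) in zip(exons, exons[1:])
            if next_start - prev_end > 1]


def has_gt_ag(seq, intron):
    """True if the intron starts with GT and ends with AG."""
    start, end = intron
    s = seq[start - 1:end].upper()
    return len(s) >= 4 and s.startswith("GT") and s.endswith("AG")


def filter_introns(seq, exons):
    """Keep only the introns that carry the GT..AG dinucleotides."""
    return [i for i in introns_from_exons(exons) if has_gt_ag(seq, i)]


def intron_lengths(introns):
    """Length in nt of each intron span."""
    return [end - start + 1 for start, end in introns]
```

Introns failing the check are simply dropped, which mirrors the decision in the text to discard both erroneous submissions and the minor class of introns with other splice-site sequences.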

In addition to the overall intron-exon database created from all available sequences, we also created intron-exon databases for the 10 model organisms that have many genes sequenced. These organisms are: Homo sapiens, Mus musculus, Gallus gallus, Rattus norvegicus, Arabidopsis thaliana (cress), Zea mays (corn), Schizosaccharomyces pombe, Aspergillus, Caenorhabditis elegans and Drosophila.

Removing redundancy

Many genes now have more than one copy in the database, either orthologous genes from different species or paralogous genes from a gene family in the same species. In some cases, the same genes have been sequenced and reported twice by different laboratories. An extreme case is that there are thousands of immunoglobulin sequences in the database. The uneven distribution of these redundant sequences in the databases will introduce bias into an analysis of intron-exon structures, e.g. the intron number. To avoid this potential bias, we purged the intron-exon databases using the method of Long et al. (7). The purging is based on pairwise comparison of protein sequences. When two protein sequences had a similarity greater than or equal to 20%, calculated by fasta3 (9), we kept one sequence and dropped the other, choosing the representative in one of two ways. (i) When we were interested in the number of exons and introns per gene, we compared all the genes in the same gene family as defined by the 20% similarity criterion (all sequences that in comparison have similarity >20% are grouped together and taken as a family). We took a gene with the most common intron and exon numbers as representative of the family. Only the families that contain more than two sequences were considered for the purpose of comparison. (ii) In order to describe intron and exon lengths, we kept the genes that contain the highest intron and exon numbers as representatives of each family. This procedure created the largest unbiased sample of introns and exons from independent or quasi-independent genes.
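The purging step can be illustrated with a small sketch. It assumes that pairwise similarities (e.g. as reported by fasta3) are already available as a dictionary, and that families are formed by single linkage, i.e. any chain of pairs at or above the threshold joins genes into one family; the text does not say exactly how GBPURGE builds families, so this grouping rule and all function names are assumptions.

```python
from collections import Counter, defaultdict


def group_families(genes, similarity, threshold=0.20):
    """Single-linkage grouping of genes whose pairwise similarity >= threshold."""
    parent = {g: g for g in genes}

    def find(g):
        # Follow parent links to the family root, compressing the path.
        while parent[g] != g:
            parent[g] = parent[parent[g]]
            g = parent[g]
        return g

    for (a, b), s in similarity.items():
        if s >= threshold:
            parent[find(a)] = find(b)

    families = defaultdict(list)
    for g in genes:
        families[find(g)].append(g)
    return sorted(sorted(f) for f in families.values())


def representative_by_mode(family, intron_count):
    """Criterion (i): a gene with the most common intron number in the family."""
    modal, _ = Counter(intron_count[g] for g in family).most_common(1)[0]
    return next(g for g in family if intron_count[g] == modal)


def representative_by_max(family, intron_count):
    """Criterion (ii): the gene with the highest intron number in the family."""
    return max(family, key=lambda g: intron_count[g])
```

Keeping the two selection rules as separate functions reflects the two purged databases described above: one for counting introns per gene and one for measuring intron and exon lengths.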

Searching for homologous genes

In order to analyze the distribution of introns in homologous genes across the model organisms, we generated homologous gene families using GBPURGE (7) at a criterion of 30% similarity. In pairwise comparisons, if two sequences had a similarity of 30% or higher, we grouped them into a single family. From each gene family containing sequences of at least five model species, we kept one sequence from each organism, choosing a sequence with the most common number of introns in that species. We then generated databases of homologous genes for each species. We used this data set to analyze the relationship between introns and genome size. We also used the general databases of 10 model species for similar analysis for the purpose of comparison.
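A minimal sketch of the selection of homologous genes follows. It assumes that a family has already been formed at the 30% criterion and is passed in as a list of gene identifiers, together with hypothetical lookup tables for species and intron number; none of these names come from GBPURGE.

```python
from collections import Counter, defaultdict


def homolog_set(family, species_of, intron_count, min_species=5):
    """Pick one gene per species from a family, or None if too few species.

    For each species, the chosen gene has the most common intron number
    among that species' members of the family.
    """
    by_species = defaultdict(list)
    for g in sorted(family):
        by_species[species_of[g]].append(g)
    if len(by_species) < min_species:
        return None
    chosen = {}
    for sp, members in by_species.items():
        modal, _ = Counter(intron_count[g] for g in members).most_common(1)[0]
        chosen[sp] = next(g for g in members if intron_count[g] == modal)
    return chosen
```

Families represented in fewer than five species return None and are skipped, so that every retained family contributes comparable genes to several organisms.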

RESULTS

Database construction

The original and purged databases are summarized in Table 1. These results show that in the current databases most of the sequences (>70%) are redundant; either paralogous genes from the same gene family (superfamily) or orthologous genes from different species. Purging of these redundant sequences efficiently avoided the bias brought about by redundant sequences.


Table 1. Number of genes and introns before and after purging

Intron-exon structures in the overall database

Protein coding region. Figure 1 summarizes the distribution of intron-exon structures in protein coding regions for the overall database. This distribution gives a clear picture of the eukaryotic genes. An average gene contains 3.7 introns in 1 kb of protein coding region, but with considerable variation: a gigantic gene, human collagen type VII (13), contains 117 introns; the Fugu fish gene homologous to the Huntington's disease gene contains 66 introns (22).
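The quantity "introns per kb of protein coding region" can be computed in more than one way. The sketch below, using hypothetical toy genes, contrasts the mean of per-gene densities with a pooled ratio; the text does not state which of the two was used, so neither should be read as the authors' exact procedure.

```python
def introns_per_kb(n_introns, cds_nt):
    """Intron density of one gene: introns per 1000 nt of coding sequence."""
    return n_introns / (cds_nt / 1000.0)


def mean_density(genes):
    """Average of the per-gene densities; genes is a list of (n_introns, cds_nt)."""
    densities = [introns_per_kb(n, cds) for n, cds in genes]
    return sum(densities) / len(densities)


def pooled_density(genes):
    """Total introns divided by total coding sequence in kb."""
    total_introns = sum(n for n, _ in genes)
    total_kb = sum(cds for _, cds in genes) / 1000.0
    return total_introns / total_kb
```

The two summaries differ whenever gene lengths and intron numbers vary together, which is one reason why averages of this kind cannot always be multiplied or divided into one another.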


Figure 1. Intron and exon length distributions in the overall database. The distributions of individual intron (intron length distribution) and exon lengths, of the total intron content per gene (total intron length distribution) and of the total intron content per kb of coding sequence are summarized and graphed. Each graph consists of two parts, with smaller lengths above longer ones. Exon lengths are given in amino acids, intron lengths in nucleotides. The horizontal and vertical axes represent lengths and frequencies, respectively.

Figure 1 shows that exon lengths are distributed much more tightly than intron lengths. Most exons are 30-40 residues long, which is consistent with previous observations on smaller samples (2,7). A common intron is 40-125 nt long; however, this statistic shows huge variation (the largest intron recorded in the database is 108 kb; human gene GenBank accession no. AC003992). The longest introns, although not in the database, are >300 kb in the dystrophin gene (10), which contains >700 kb of intron sequences (5). Human gene CIT987-SKA-34504 (M. D. Adams et al., GenBank accession no. AC002302) contains introns of 151 kb. The smallest introns were 18-20 nt long in the nucleomorph, a eukaryotic endosymbiont (11), and 21 nt in Paramecium tetraurelia, a ciliated protozoan (12).
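Because a few very long introns dominate the mean, the most populated length class is a useful companion statistic. The following sketch bins lengths into fixed-width classes and reports the modal class; the bin width of 25 nt and the function name are arbitrary choices made for illustration, not the binning used in Figure 1.

```python
from collections import Counter
from statistics import mean


def modal_bin(lengths, width=25):
    """Return (lower, upper) bounds of the most populated fixed-width bin.

    Ties are broken in favour of the shorter length class.
    """
    counts = Counter(n // width for n in lengths)
    best = max(counts, key=lambda b: (counts[b], -b))
    return best * width, (best + 1) * width - 1


def summarize(lengths, width=25):
    """Mean length and modal bin, side by side."""
    return mean(lengths), modal_bin(lengths, width)
```

With a right-skewed sample, such as one containing a single intron of several kb among many introns of about 100 nt, the mean can exceed the modal class by an order of magnitude.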

UTR regions. In the database, 2% of genes contain descriptions of introns and exons in the 5′- and 3′-UTRs of the RNA (Fig. 2). Seventy-four genes have 5′-UTR sequences, with seven genes having one 5′-UTR intron. Sixty-nine genes have a single 3′-UTR exon. The lengths of these exons and introns show a variable distribution. For instance, the average 3′-UTR sequence is 340 nt long, with a minimum of 17 nt and maximum of 1376 nt; the lengths of the seven 5′-UTR introns range from 96 to 8214 nt.

[Figure 2 panels: Human, Mouse; Rat, Chicken; Drosophila, C.elegans; Cress, Corn; S.pombe, Aspergillus]

Figure 2. Intron and exon length distributions in model organisms and 5′-/3′-UTRs. The distributions of individual intron (intron length distribution) and exon lengths, of the total intron content per gene (total intron length distribution) and of the total intron content per kb of coding sequence are summarized and graphed. Each graph consists of two parts, with smaller lengths above longer ones, except in certain cases where a small sample size made this unnecessary (intron lengths for Aspergillus, total intron length and intron length per kb coding sequence for S.pombe and total intron length for corn). Exon lengths are given in amino acids, intron lengths in nucleotides. The horizontal and vertical axes represent lengths and frequencies, respectively.

Intron-exon structures in the 10 model organisms

Figure 2 shows the intron-exon structures of the 10 model organisms. Like the overall database, these organisms show a tighter distribution of exon lengths than of intron lengths, as well as minimum intron lengths.

Human genes have the most introns and the largest introns of the 10 species. An average human gene has four introns; the highest recorded number is 116 introns, in the collagen type VII gene (13). The mean intron length is 3413 bases while the most common length is 75-150 nt. The longest introns are recorded in the BSC gene (its first intron is >71 kb) and the dystrophin gene (several introns >100 kb). The intron length in 1 kb of human gene CDS is close to 7 kb (6825 nt) on average. The other two mammalian species (mouse and rat) are similar to humans in the numbers of introns per kb of CDS. Their individual intron lengths and total intron lengths per kb of CDS or per gene are shorter than those of human but longer than those of the non-mammalian species. It appears that these two rodent species have shorter proteins.

The two fungi, S.pombe and Aspergillus, have the shortest introns. Their average gene contains only two introns per kb of CDS, totalling 160-280 nt. Individual introns on average are only 40-75 nt long. This is very similar to the case of S.cerevisiae, the first eukaryotic species whose complete genome was sequenced, which was found to have very few, short introns (14).

After the mammalian genes, chicken genes contain the most and the largest introns. Their slightly higher number of introns per gene is offset by shorter introns (700 nt on average), giving a total of 1.8 kb of intron sequence per kb of CDS.

Caenorhabditis elegans and Drosophila, two invertebrates, do not contain long introns. Their total intron lengths per kb of CDS average only 1000 and 600 nt, respectively. However, these two species have contrasting intron-exon structures. Caenorhabditis elegans genes contain more (4.0 introns per kb of CDS), shorter (467 nt each) introns, while Drosophila genes have fewer, longer introns (2.7 introns per kb of CDS, 564 nt each).

The genes of the two plant species, corn and cress, contain introns whose lengths per kb of CDS are intermediate, like the two invertebrates. The number of introns is similar in the two plant species (3.9-4.3 per kb of CDS), but the average corn intron (328 nt) is longer than the average cress intron (240 nt).

Correlation between intron size and genome size

In our analysis, we observed a correlation between the size of genomes and the amount of intron sequences in their genes. Table 2 and Figure 3 show correlations between genome size and the number of introns per gene, number of introns per kb of CDS, total intron length per gene and intron length per kb of CDS. The correlation between genome size and total intron length per kb CDS is significant at a marginal level (P = 0.06 for R = 0.6). This weak correlation may suggest a limited causal relationship between intron content and genome complexity. However, it also indicates that other factors are likely to be involved in the evolution of intron size.
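The reported significance level can be checked by hand with the usual t-test for a Pearson correlation coefficient. The sketch below assumes n = 10 data points (one per model organism), which the text does not state explicitly; the function names are illustrative.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """t value for testing r against zero, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


# R = 0.6 with the assumed n = 10 gives t of about 2.12 on 8 degrees of freedom.
t = t_statistic(0.6, 10)
```

For 8 degrees of freedom the two-sided 5% critical value of t is 2.306, so a t of about 2.12 falls just short of conventional significance, with a two-sided P between 0.05 and 0.10. This is in line with the marginal level reported above.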


Figure 3. Correlation between genome size and average intron size/number in model organisms. Figures labeled 'All genes' contain data from all the genes for each of the model organisms; figures labeled 'Genes from same families' contain data only from genes present in at least five of the model organisms.


Table 2. Correlation between intron size (means) and genome size
The correlation coefficients (R) are calculated for the correlation between genome size and the measure of intron size indicated at the top of each column. Statistics are from the purged database containing genes with the most common number of introns. Sample sizes given are for homologous genes.

Figure 3 reveals a possible relationship between genome size and intron-exon structure. The calculation is, however, based on the purged databases, in which different species may be represented by different types of genes. For example, the plant databases contain photosynthesis-related genes, while the human database contains immunoglobulin genes. Different types of genes may have different intron-exon structures. A more rigorous comparison requires homologous genes in the different organisms. We obtained sequences from 55 gene families that were represented in at least five of the 10 species. Figure 3 shows the relationship between intron content and genome size in these homologous genes. These results show an elevated correlation between intron content and genome size. The higher correlation values in homologous gene sets indicate a better control of the factors due to non-homologous genes in the analyses of Figure 3.

DISCUSSION

This analysis provides a general picture of the intron-exon structure of eukaryotic genes. On average, the analyzed genes have 3.7 introns of 40-150 nt each. These statistics are subject to large variation. A number of introns >100 kb or <20 nt exist in the databases and the literature. A human gene contains more than 100 introns, while some genes in the Fugu fish have more than 60 introns. The intron-exon structures of different organisms are variable. These statistical analyses, in general, may provide a fundamental basis for both understanding the structure of a gene that is identified in molecular studies and developing more sensitive tools to find genes or open reading frames in eukaryotic genome sequences. In particular, this investigation also suggests two interesting points, which may increase our understanding of the evolution of intron-exon structures.

The first is that although introns can be very long, the minimum intron size is limited by the length of the splicing signals. Most of the shortest introns observed were 20-30 bases long; very few were <20 bases. This indicates that in order to encode adequate splicing signals, introns cannot be too short. In fact, conservation analysis of the splice sites showed that the conserved sequence distribution in introns can be extended over >20 nt (14). The smallest recorded introns, found in protist genes, are 13-20 bases long (12). They also encode the minimum splice sequences (GT..AG dinucleotide sequences are encoded), supporting the general conclusion of this analysis.

Second, it has been speculated that intron content is correlated with genome size. For example, it was proposed that intronless prokaryotic genes might be a product of selection against introns for more efficient molecular processes of replication, transcription and processing (16). Recently, it was observed that small genome sizes in Drosophila were correlated with high deletion rates in the Helena retrosequence and introns (17-19). The correlations we have observed in the genomes of model organisms support, in general, the notion that the existence of non-functional elements should be consistent with the size of the genome. The small introns in bird genes provide another example of this relationship (20). However, the correlations as revealed in this study are only at a marginal level of significance in the total intron size per kb CDS, although the correlations increase in homologous genes. These analyses indicate a possibly true but weak connection between genome size and intron sizes, suggesting that there may be other factors involved as well.

This is the distribution of a large sample, with 2903 independent or quasi-independent genes and 17 102 introns. Cautionary notes, however, should be made. First, these statistics were calculated only from genes that contain introns, so information such as the number of introns per gene is only valid for these intron-containing genes. A complete survey of genes that do not contain introns is so far unavailable, except in the case of yeast (14,21). Second, when introns are very long, many researchers tend not to sequence them, for understandable reasons. As a result, the average intron sizes are underestimated to some extent. However, the smooth and fast decrease in frequency of intron sizes in Figure 1 implies that very large introns may make up a small proportion (<5%) of introns in the genome, although the final statistic awaits the completion of the human genome project. Finally, it is not unreasonable to believe that the mode of intron size distribution is likely a stable measurement of most introns.

ACKNOWLEDGEMENTS

We thank Carl Rosenberg at Falling Rain Genomics Inc. (Lincoln, MA) for developing the GBPURGE package which made automatic purging of gene redundancy possible. We thank Walter Gilbert for discussions. The laboratory of M.L. was supported by the Packard Fellowship in Science and Engineering and the National Science Foundation.

REFERENCES

1. Hawkins,J.D. (1988) Nucleic Acids Res., 16, 9893-9906.

2. Dorit,R.L., Schoenbach,L. and Gilbert,W. (1990) Science, 250, 1377-1382.

3. Palmer,J.D. and Logsdon,J.M.,Jr (1991) Curr. Opin. Genet. Dev., 1, 470-477.

4. Mount,S.M., Burks,C., Hertz,G., Stormo,G.D., White,O. and Fields,C. (1992) Nucleic Acids Res., 20, 4255-4262.

5. Fedorov,A., Suboch,G., Bujakov,M. and Fedorova,L. (1992) Nucleic Acids Res., 20, 2553-2557.

6. Das,S., Yu,L., Gaitatzes,C., Rogers,R., Freeman,J., Bienkowska,J., Adams,R.M., Smith,T.F. and Lindelien,J. (1997) Nature, 385, 29-30.

7. Long,M., Rosenberg,C. and Gilbert,W. (1995) Proc. Natl Acad. Sci. USA, 92, 12495-12499.

8. Sharp,P.A. (1997) Cell, 91, 875-879.

9. Pearson,W.R. (1994) Methods Mol. Biol., 24, 307-331.

10. Boyce,F.M., Beggs,A.H., Feener,C. and Kunkel,L.M. (1991) Proc. Natl Acad. Sci. USA, 88, 1276-1280.

11. Gilson,P.R. and McFadden,G.I. (1996) Proc. Natl Acad. Sci. USA, 93, 7737-7742.

12. Russell,C.B., Fraga,D. and Hinrichsen,R.D. (1994) Nucleic Acids Res., 22, 1221-1225.

13. Christiano,A.M., Hoffman,G.G., Chung-Honet,L.C., Lee,S., Cheng,W., Uitto,J. and Greenspan,D.S. (1994) Genomics, 21, 169-179.

14. Long,M., de Souza,S.J. and Gilbert,W. (1997) Cell, 91, 739-740.

15. Long,M., de Souza,S.J., Rosenberg,C. and Gilbert,W. (1998) Proc. Natl Acad. Sci. USA, 95, 219-223.

16. Doolittle,W.F. (1990) In Stone,E.M. and Schwartz,R.J. (eds), Intervening Sequences in Evolution and Development. Oxford University Press, Oxford, UK, pp. 43-62.

17. Petrov,D.A. and Hartl,D.L. (1998) Mol. Biol. Evol., 15, 293-302.

18. Petrov,D.A., Lozovskaya,E.R. and Hartl,D.L. (1996) Nature, 384, 346-349.

19. Moriyama,E.N., Petrov,D.A. and Hartl,D.L. (1998) Mol. Biol. Evol., 15, 770-773.

20. Hughes,A.L. and Hughes,M.K. (1995) Nature, 377, 391.

21. Goffeau,A., Barrell,B.G., Bussey,H., Davis,R.W., Dujon,B., Feldmann,H., Galibert,F., Hoheisel,J.D., Jacq,C., Johnston,M., Louis,E.J., Mewes,H.W., Murakami,Y., Philippsen,P., Tettelin,H. and Oliver,S.G. (1996) Science, 274, 546-567.

22. Baxendale,S., Abdulla,S., Elgar,G., Buck,D., Berks,M., Micklem,G., Durbin,R., Bates,G., Brenner,S., Beck,S. and Lehrach,H. (1995) Nature Genet., 10, 67-76.


*To whom correspondence should be addressed. Tel: +1 773 702 0557; Fax: +1 773 702 9740; Email: mlong{at}midway.uchicago.edu


Copyright© Oxford University Press, 1999.

Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Mol Biol EvolHome page
F. Belinky, O. Cohen, and D. Huchon
Large-Scale Parsimony Analysis of Metazoan Indels in Protein-Coding Genes
Mol. Biol. Evol., February 1, 2010; 27(2): 441 - 451.
[Abstract] [Full Text] [PDF]


Home page
Plant Physiol.Home page
Z. Zhu, Y. Zhang, and M. Long
Extensive Structural Renovation of Retrogenes in the Evolution of the Populus Genome
Plant Physiology, December 1, 2009; 151(4): 1943 - 1951.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
R. Andersson, S. Enroth, A. Rada-Iglesias, C. Wadelius, and J. Komorowski
Nucleosomes are well positioned in exons and carry characteristic histone modifications
Genome Res., October 1, 2009; 19(10): 1732 - 1741.
[Abstract] [Full Text] [PDF]


Home page
J HeredHome page
E. V. Koonin
Intron-Dominated Genomes of Early Ancestors of Eukaryotes
J. Hered., September 1, 2009; 100(5): 618 - 623.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
M. Hiller, S. Findeiss, S. Lein, M. Marz, C. Nickel, D. Rose, C. Schulz, R. Backofen, S. J. Prohaska, G. Reuter, et al.
Conserved introns reveal novel transcripts in Drosophila melanogaster
Genome Res., July 1, 2009; 19(7): 1289 - 1300.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
A. D. Cutter, A. Dey, and R. L. Murray
Evolution of the Caenorhabditis elegans Genome
Mol. Biol. Evol., June 1, 2009; 26(6): 1199 - 1234.
[Abstract] [Full Text] [PDF]


Home page
RNAHome page
M. Roy, N. Kim, Y. Xing, and C. Lee
The effect of intron length on exon creation ratios during the evolution of mammalian genomes
RNA, November 1, 2008; 14(11): 2261 - 2273.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
W. B. Barbazuk, Y. Fu, and K. M. McGinnis
Genome-wide analyses of alternative splicing in plants: Opportunities and challenges
Genome Res., September 1, 2008; 18(9): 1381 - 1392.
[Abstract] [Full Text] [PDF]


Home page
ScienceHome page
J. J. Emerson, M. Cardoso-Moreira, J. O. Borevitz, and M. Long
Natural Selection Shapes Genome-Wide Patterns of Copy-Number Polymorphism in Drosophila melanogaster
Science, June 20, 2008; 320(5883): 1629 - 1631.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. Csuros, I. B. Rogozin, and E. V. Koonin
Extremely Intron-Rich Genes in the Alveolate Ancestors Inferred with a Flexible Maximum-Likelihood Approach
Mol. Biol. Evol., May 1, 2008; 25(5): 903 - 911.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
N. Wahlberg and C. W. Wheat
Genomic Outposts Serve the Phylogenomic Pioneers: Designing Novel Nuclear Markers for Genomic DNA Extractions of Lepidoptera
Syst Biol, April 1, 2008; 57(2): 231 - 242.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
K. J. Hertel
Combinatorial Control of Exon Recognition
J. Biol. Chem., January 18, 2008; 283(3): 1211 - 1215.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
S. Schwartz, J. Silva, D. Burstein, T. Pupko, E. Eyras, and G. Ast
Large-scale comparative analysis of splicing signals and their corresponding splicing factors in eukaryotes
Genome Res., January 1, 2008; 18(1): 88 - 103.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. K. Basu, I. B. Rogozin, O. Deusch, T. Dagan, W. Martin, and E. V. Koonin
Evolutionary Dynamics of Introns in Plastid-Derived Genes in Plants: Saturation Nearly Reached but Slow Intron Gain Continues
Mol. Biol. Evol., January 1, 2008; 25(1): 111 - 119.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
L. Carmel, Y. I. Wolf, I. B. Rogozin, and E. V. Koonin
Three distinct modes of intron dynamics in the evolution of eukaryotes
Genome Res., July 1, 2007; 17(7): 1034 - 1044.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
A. Coghlan and R. Durbin
Genomix: a method for combining gene-finders' predictions, which uses evolutionary conservation of sequence and intron exon structure
Bioinformatics, June 15, 2007; 23(12): 1468 - 1475.
[Abstract] [Full Text] [PDF]


Home page
Plant Cell PhysiolHome page
Y. Kodama, S. Nagaya, A. Shinmyo, and K. Kato
Mapping and Characterization of DNase I Hypersensitive Sites in Arabidopsis Chromatin
Plant Cell Physiol., March 1, 2007; 48(3): 459 - 470.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
S. Aluri and M. Buttner
Identification and functional expression of the Arabidopsis thaliana vacuolar glucose transporter 1 and its role in seed germination and flowering
PNAS, February 13, 2007; 104(7): 2537 - 2542.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
X. Hong, D. G. Scofield, and M. Lynch
Intron Size, Abundance, and Distribution within Untranslated Regions of Genes
Mol. Biol. Evol., December 1, 2006; 23(12): 2392 - 2404.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
D. C. Presgraves
Intron Length Evolution in Drosophila
Mol. Biol. Evol., November 1, 2006; 23(11): 2203 - 2213.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
J. S. Hawkins, H. Kim, J. D. Nason, R. A. Wing, and J. F. Wendel
Differential lineage-specific amplification of transposable elements is responsible for genome size variation in Gossypium
Genome Res., October 1, 2006; 16(10): 1252 - 1261.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
P. R. Gilson, V. Su, C. H. Slamovits, M. E. Reith, P. J. Keeling, and G. I. McFadden
From the Cover: Complete nucleotide sequence of the chlorarachniophyte nucleomorph: Nature's smallest nucleus
PNAS, June 20, 2006; 103(25): 9566 - 9571.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
R. Agrawal and G. D. Stormo
Using mRNAs lengths to accurately predict the alternatively spliced gene products in Caenorhabditis elegans
Bioinformatics, May 15, 2006; 22(10): 1239 - 1244.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
B.-B. Wang and V. Brendel
Genomewide comparative analysis of alternative splicing in plants
PNAS, May 2, 2006; 103(18): 7175 - 7180.

K. L. Fox-Walsh, Y. Dou, B. J. Lam, S.-p. Hung, P. F. Baldi, and K. J. Hertel
The architecture of pre-mRNAs affects mechanisms of splice-site pairing
PNAS, November 8, 2005; 102(45): 16176 - 16181.

J. M. Burnette, E. Miyamoto-Sato, M. A. Schaub, J. Conklin, and A. J. Lopez
Subdivision of Large Introns in Drosophila by Recursive Splicing at Nonexonic Elements
Genetics, June 1, 2005; 170(2): 661 - 674.

S. Vanacova, W. Yan, J. M. Carlton, and P. J. Johnson
Spliceosomal introns in the deep-branching eukaryote Trichomonas vaginalis
PNAS, March 22, 2005; 102(12): 4430 - 4435.

M. J. Nicholson, M. K. Theodorou, and J. L. Brookman
Molecular analysis of the anaerobic rumen fungus Orpinomyces - insights into an AT-rich genome
Microbiology, January 1, 2005; 151(1): 121 - 133.

X. Wang, X. Zhao, J. Zhu, and W. Wu
Genome-wide Investigation of Intron Length Polymorphisms and Their Potential as Molecular Markers in Rice (Oryza sativa L.)
DNA Res, January 1, 2005; 12(6): 417 - 427.

V. Krauss, M. Pecyna, K. Kurz, and H. Sass
Phylogenetic Mapping of Intron Positions: A Case Study of Translation Initiation Factor eIF2γ
Mol. Biol. Evol., January 1, 2005; 22(1): 74 - 84.

D. T. Fritz, D. Liu, J. Xu, S. Jiang, and M. B. Rogers
Conservation of Bmp2 Post-transcriptional Regulatory Mechanisms
J. Biol. Chem., November 19, 2004; 279(47): 48950 - 48958.

D. M. Kupfer, S. D. Drabenstot, K. L. Buchanan, H. Lai, H. Zhu, D. W. Dyer, B. A. Roe, and J. W. Murphy
Introns and Splicing Elements of Five Diverse Fungi
Eukaryot. Cell, October 1, 2004; 3(5): 1088 - 1100.

C. E. Grover, H. Kim, R. A. Wing, A. H. Paterson, and J. F. Wendel
Incongruent Patterns of Local and Global Genome Size Evolution in Cotton
Genome Res., August 1, 2004; 14(8): 1474 - 1482.

W.-G. Qiu, N. Schisler, and A. Stoltzfus
The Evolutionary Gain of Spliceosomal Introns: Sequence and Phase Preferences
Mol. Biol. Evol., July 1, 2004; 21(7): 1252 - 1263.

R. Jarving, I. Jarving, R. Kurg, A. R. Brash, and N. Samel
On the Evolutionary Origin of Cyclooxygenase (COX) Isozymes: CHARACTERIZATION OF MARINE INVERTEBRATE COX GENES POINTS TO INDEPENDENT DUPLICATION EVENTS IN VERTEBRATE AND INVERTEBRATE LINEAGES
J. Biol. Chem., April 2, 2004; 279(14): 13624 - 13633.

A. Prachumwat, L. DeVincentis, and M. F. Palopoli
Intron Size Correlates Positively With Recombination Rate in Caenorhabditis elegans
Genetics, March 1, 2004; 166(3): 1585 - 1590.

Z. Zhang, P. M. Harrison, Y. Liu, and M. Gerstein
Millions of Years of Evolution Preserved: A Comprehensive Catalog of the Processed Pseudogenes in the Human Genome
Genome Res., December 1, 2003; 13(12): 2541 - 2558.

J. Parsch
Selective Constraints on Intron Evolution in Drosophila
Genetics, December 1, 2003; 165(4): 1843 - 1851.

M. Lynch and J. S. Conery
The Origins of Genome Complexity
Science, November 21, 2003; 302(5649): 1401 - 1404.

S. D. Drabenstot, D. M. Kupfer, J. D. White, D. W. Dyer, B. A. Roe, K. L. Buchanan, and J. W. Murphy
FELINES: a utility for extracting and examining EST-defined introns and exons
Nucleic Acids Res., November 15, 2003; 31(22): e141 - e141.

P. J. Clyne, J. S. Brotman, S. T. Sweeney, and G. Davis
Green Fluorescent Protein Tagging Drosophila Proteins at Their Native Genomic Loci With Small P Elements
Genetics, November 1, 2003; 165(3): 1433 - 1441.

V. Rapic-Otrin, V. Navazza, T. Nardo, E. Botta, M. McLenigan, D. C. Bisi, A. S. Levine, and M. Stefanini
True XP group E patients have a defective UV-damaged DNA binding protein complex and mutations in DDB2 which reveal the functional domains of its p48 product
Hum. Mol. Genet., July 1, 2003; 12(13): 1507 - 1522.

A. M. Nascimento, G. H. Goldman, S. Park, S. A. E. Marras, G. Delmas, U. Oza, K. Lolans, M. N. Dudley, P. A. Mann, and D. S. Perlin
Multiple Resistance Mechanisms among Aspergillus fumigatus Mutants with High-Level Resistance to Itraconazole
Antimicrob. Agents Chemother., May 1, 2003; 47(5): 1719 - 1726.

U. Pozzoli, G. Elgar, R. Cagliani, L. Riva, G. P. Comi, N. Bresolin, A. Bardoni, and M. Sironi
Comparative Analysis of Vertebrate Dystrophin Loci Indicate Intron Gigantism as a Common Feature
Genome Res., May 1, 2003; 13(5): 764 - 772.

E. Bon, S. Casaregola, G. Blandin, B. Llorente, C. Neuveglise, M. Munsterkotter, U. Guldener, H.-W. Mewes, J. V. Helden, B. Dujon, et al.
Molecular evolution of eukaryotic genomes: hemiascomycetous yeast spliceosomal introns
Nucleic Acids Res., February 15, 2003; 31(4): 1121 - 1135.

F. Hartung, F. R. Blattner, and H. Puchta
Intron gain and loss in the evolution of the conserved eukaryotic recombination machinery
Nucleic Acids Res., December 1, 2002; 30(23): 5175 - 5181.

J. Majewski and J. Ott
Distribution and Characterization of Regulatory Elements in the Human Genome
Genome Res., December 1, 2002; 12(12): 1827 - 1836.

J. F. Wendel, R. C. Cronn, I. Alvarez, B. Liu, R. L. Small, and D. S. Senchina
Intron Size and Genome Size in Plants
Mol. Biol. Evol., December 1, 2002; 19(12): 2346 - 2352.

H. Kaessmann, S. Zollner, A. Nekrutenko, and W.-H. Li
Signatures of Domain Shuffling in the Human Genome
Genome Res., November 1, 2002; 12(11): 1642 - 1650.

S. E. Ptak and D. A. Petrov
How Intron Splicing Affects the Deletion and Insertion Profile in Drosophila melanogaster
Genetics, November 1, 2002; 162(3): 1233 - 1244.

I. B. Rogozin, K. S. Makarova, D. A. Natale, A. N. Spiridonov, R. L. Tatusov, Y. I. Wolf, J. Yin, and E. V. Koonin
Congruent evolution of different classes of non-coding DNA in prokaryotic genomes
Nucleic Acids Res., October 1, 2002; 30(19): 4264 - 4271.

R. Sorek, G. Ast, and D. Graur
Alu-Containing Exons are Alternatively Spliced
Genome Res., July 1, 2002; 12(7): 1060 - 1067.

M. Lynch
Intron evolution as a population-genetic process
PNAS, April 30, 2002; 99(9): 6118 - 6123.

C. T. Webb, S. A. Shabalina, A. Yu. Ogurtsov, and A. S. Kondrashov
Analysis of similarity within 142 pairs of orthologous intergenic regions of Caenorhabditis elegans and Caenorhabditis briggsae
Nucleic Acids Res., March 1, 2002; 30(5): 1233 - 1239.

M. Sakharkar, F. Passetti, J. E. de Souza, M. Long, and S. J. de Souza
ExInt: an Exon Intron Database
Nucleic Acids Res., January 1, 2002; 30(1): 191 - 194.

J. S. Mattick and M. J. Gagen
Review Article: The Evolution of Controlled Multitasked Gene Networks: The Role of Introns and Other Noncoding RNAs in the Development of Complex Organisms
Mol. Biol. Evol., September 1, 2001; 18(9): 1611 - 1630.

C. M. Bergman and M. Kreitman
Analysis of Conserved Noncoding DNA in Drosophila Reveals Similar Constraints in Intergenic and Intronic Sequences
Genome Res., August 1, 2001; 11(8): 1335 - 1345.

L. M. Gomulski, R. J. Pitts, S. Costa, G. Saccone, C. Torti, L. C. Polito, G. Gasperi, A. R. Malacrida, F. C. Kafatos, and L. J. Zwiebel
Genomic Organization and Characterization of the white Locus of the Mediterranean Fruitfly, Ceratitis capitata
Genetics, March 1, 2001; 157(3): 1245 - 1255.

T. Kohchi, K. Mukougawa, N. Frankenberg, M. Masuda, A. Yokota, and J. C. Lagarias
The Arabidopsis HY2 Gene Encodes Phytochromobilin Synthase, a Ferredoxin-Dependent Biliverdin Reductase
PLANT CELL, February 1, 2001; 13(2): 425 - 436.

F. Hartung, H. Plchova, and H. Puchta
Molecular characterisation of RecQ homologues in Arabidopsis thaliana
Nucleic Acids Res., November 1, 2000; 28(21): 4275 - 4282.

J. M. Comeron and M. Kreitman
The Correlation Between Intron Length and Recombination in Drosophila: Dynamic Equilibrium Between Mutational and Selective Forces
Genetics, November 1, 2000; 156(3): 1175 - 1190.

J. Trzcinska-Danielewicz and J. Fronk
SURVEY AND SUMMARY: Exon-intron organization of genes in the slime mold Physarum polycephalum
Nucleic Acids Res., September 15, 2000; 28(18): 3411 - 3416.

C. A. Davis, L. Grate, M. Spingola, and M. Ares Jr
Test of intron predictions reveals novel splice sites, alternatively spliced mRNAs and new introns in meiotically regulated genes of yeast
Nucleic Acids Res., April 15, 2000; 28(8): 1700 - 1706.

T. A. Thanaraj
Positional characterisation of false positives from computational prediction of human splice sites
Nucleic Acids Res., February 1, 2000; 28(3): 744 - 754.

N. J. Schisler and J. D. Palmer
The IDB and IEDB: intron sequence and evolution databases
Nucleic Acids Res., January 1, 2000; 28(1): 181 - 184.

S. Saxonov, I. Daizadeh, A. Fedorov, and W. Gilbert
EID: the Exon-Intron Database--an exhaustive database of protein-coding intron-containing genes
Nucleic Acids Res., January 1, 2000; 28(1): 185 - 190.

C. S. Walker, R. P. Shetty, K. Clark, S. G. Kazuko, A. Letsou, B. M. Olivera, and P. K. Bandyopadhyay
On a Potential Global Role for Vitamin K-dependent γ-Carboxylation in Animal Systems. EVIDENCE FOR A γ-GLUTAMYL CARBOXYLASE IN DROSOPHILA
J. Biol. Chem., March 9, 2001; 276(11): 7769 - 7774.

N. Boudet, S. Aubourg, C. Toffano-Nioche, M. Kreis, and A. Lecharny
Evolution of Intron/Exon Structure of DEAD Helicase Family Genes in Arabidopsis, Caenorhabditis, and Drosophila
Genome Res., December 1, 2001; 11(12): 2101 - 2114.