Nucleic Acids Research, 2002, Vol. 30, No. 11 2515-2523
© 2002 Oxford University Press
Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes
Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, Box 208114, New Haven, CT 06520-8114, USA
Based on searches for disabled homologs to known proteins, we have identified a large population of pseudogenes in four sequenced eukaryotic genomes: the worm, yeast, fly and human (chromosomes 21 and 22 only). Each of our nearly 2500 pseudogenes is characterized by one or more disablements mid-domain, such as premature stops and frameshifts. Here, we perform a comprehensive survey of the amino acid and nucleotide composition of these pseudogenes in comparison to that of functional genes and intergenic DNA. We show that pseudogenes invariably have an amino acid composition intermediate between genes and translated intergenic DNA. Although the degree of intermediacy varies among the four organisms, in all cases, it is most evident for amino acid types that differ most in occurrence between genes and intergenic regions. The same intermediacy also applies to codon frequencies, especially in the worm and human. Moreover, the intermediate composition of pseudogenes applies even though the composition of the genes in the four organisms is markedly different, showing a strong correlation with the overall A/T content of the genomic sequence. Pseudogenes can be divided into ancient and modern subsets, based on the level of sequence identity with their closest matching homolog (within the same genome). Modern pseudogenes usually have a much closer sequence composition to genes than ancient pseudogenes. Collectively, our results indicate that the composition of pseudogenes that are under no selective constraints progressively drifts from that of coding DNA towards non-coding DNA. Therefore, we propose that the degree to which pseudogenes approach a random sequence composition may be useful in dating different sets of pseudogenes, as well as in assessing the rate at which intergenic DNA accumulates mutations. Our compositional analyses with the interactive viewer are available over the web at http://genecensus.org/pseudogene.
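The notion of an "intermediate" composition can be made concrete with a small sketch. The Python below is an illustration only, not the procedure used in the paper: it assumes the three sequence sets have already been translated to amino acid strings, and the function names `composition` and `intermediacy`, the `min_gap` threshold, and the toy sequences are all hypothetical. For each amino acid it reports where the pseudogene frequency falls on a scale from 0 (equal to genes) to 1 (equal to translated intergenic DNA), skipping residues whose gene and intergenic frequencies are too close to compare.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seqs):
    """Return the fraction of each of the 20 standard amino acids in seqs."""
    counts = Counter()
    for seq in seqs:
        # Ignore stop symbols, gaps and ambiguous residues.
        counts.update(ch for ch in seq.upper() if ch in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def intermediacy(gene, pseudo, intergenic, min_gap=0.01):
    """Place each pseudogene frequency between gene (0) and intergenic (1).

    min_gap is an assumed cutoff: residues whose gene and intergenic
    frequencies differ by less than this are skipped, since the ratio
    would be dominated by noise.
    """
    scores = {}
    for aa in AMINO_ACIDS:
        gap = intergenic[aa] - gene[aa]
        if abs(gap) >= min_gap:
            scores[aa] = (pseudo[aa] - gene[aa]) / gap
    return scores


# Toy sequences, invented purely for illustration (not real data).
genes = ["AAAAKKKKLL"]
pseudogenes = ["AAAKKKLLLL"]
intergenic = ["AAKKLLLLLL"]

scores = intermediacy(
    composition(genes), composition(pseudogenes), composition(intergenic)
)
for aa, score in sorted(scores.items()):
    print(f"{aa}: {score:.2f}")
```

A score between 0 and 1 means the pseudogene frequency lies between the other two; values outside that range would indicate that the pseudogene set is not intermediate for that residue. The same idea could be applied to codon frequencies by counting codons instead of residues.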
* To whom correspondence should be addressed. Tel: +1 203 432 6105; Fax: +1 360 838 7861; Email: mark.gerstein@yale.edu



