Nucleic Acids Research, 2002, Vol. 30, No. 11 2515-2523
© 2002 Oxford University Press

Comprehensive analysis of amino acid and nucleotide composition in eukaryotic genomes, comparing genes and pseudogenes

Nathaniel Echols, Paul Harrison, Suganthi Balasubramanian, Nicholas M. Luscombe, Paul Bertone, Zhaolei Zhang and Mark Gerstein*

Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, Box 208114, New Haven, CT 06520-8114, USA

Based on searches for disabled homologs of known proteins, we have identified a large population of pseudogenes in four sequenced eukaryotic genomes: the worm, yeast, fly and human (chromosomes 21 and 22 only). Each of our nearly 2500 pseudogenes is characterized by one or more disablements, such as premature stop codons and frameshifts, in the middle of a protein domain. Here, we perform a comprehensive survey of the amino acid and nucleotide composition of these pseudogenes in comparison with that of functional genes and intergenic DNA. We show that pseudogenes invariably have an amino acid composition intermediate between that of genes and that of translated intergenic DNA. Although the degree of intermediacy varies among the four organisms, in all cases it is most evident for the amino acid types whose occurrence differs most between genes and intergenic regions. The same intermediacy also applies to codon frequencies, especially in the worm and human. Moreover, pseudogenes remain intermediate in composition even though the composition of genes differs markedly among the four organisms and correlates strongly with the overall A/T content of each genome. Pseudogenes can be divided into ‘ancient’ and ‘modern’ subsets based on their level of sequence identity with their closest matching homolog within the same genome. Modern pseudogenes usually have a composition much closer to that of genes than ancient pseudogenes do. Collectively, our results indicate that the composition of pseudogenes that are under no selective constraint progressively drifts from that of coding DNA towards that of non-coding DNA. We therefore propose that the degree to which pseudogenes approach a random sequence composition may be useful for dating different sets of pseudogenes and for assessing the rate at which intergenic DNA accumulates mutations. Our compositional analyses, together with an interactive viewer, are available on the web at http://genecensus.org/pseudogene.
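The abstract does not state the exact statistic used to measure how "intermediate" a pseudogene's composition is. As a minimal sketch, assuming sequences have already been translated (with '*' marking premature stops), one way to place a pseudogene between genes and translated intergenic DNA is to project its amino acid composition onto the line joining the two reference compositions. The names `aa_composition`, `intermediacy` and `at_content`, and the projection score itself, are hypothetical illustrations, not the authors' published method.

```python
from collections import Counter

# One-letter codes for the 20 standard amino acids.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

Composition = dict[str, float]


def aa_composition(protein: str) -> Composition:
    """Return the fraction of each standard amino acid in `protein`.

    Symbols outside the 20 standard residues (e.g. '*' for a premature
    stop, 'X' for an unknown residue) are skipped, so disablements do
    not enter the composition.
    """
    counts = Counter(c for c in protein.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids in sequence")
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def intermediacy(pseudo: Composition, gene: Composition,
                 intergenic: Composition) -> float:
    """Project `pseudo` onto the gene -> intergenic composition axis.

    Hypothetical score: 0 means gene-like, 1 means like translated
    intergenic DNA. Because this is a least-squares projection, amino
    acids whose frequencies differ most between genes and intergenic
    DNA contribute most to the result.
    """
    axis = {aa: intergenic[aa] - gene[aa] for aa in AMINO_ACIDS}
    norm_sq = sum(d * d for d in axis.values())
    if norm_sq == 0:
        raise ValueError("gene and intergenic compositions are identical")
    dot = sum((pseudo[aa] - gene[aa]) * axis[aa] for aa in AMINO_ACIDS)
    return dot / norm_sq


def at_content(dna: str) -> float:
    """Fraction of A and T among the unambiguous bases A, C, G, T."""
    bases = [b for b in dna.upper() if b in "ACGT"]
    if not bases:
        raise ValueError("no A/C/G/T bases in sequence")
    return sum(b in "AT" for b in bases) / len(bases)


if __name__ == "__main__":
    gene = aa_composition("LLLLAAAA")        # toy gene-like composition
    intergenic = aa_composition("FFFFKKKK")  # toy intergenic-like composition
    pseudo = aa_composition("LLL*AAAFK")     # toy pseudogene with a stop
    print(intermediacy(pseudo, gene, intergenic))  # prints 0.25
```

Under this toy score, a value near 0 indicates a gene-like composition and a value near 1 an intergenic-like one; if the trend reported above holds, 'modern' pseudogenes would tend to score lower than 'ancient' ones. In practice the reference compositions would be pooled from each genome's annotated genes and translated intergenic regions, and `at_content` could be used to relate gene composition to genomic A/T content.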

* To whom correspondence should be addressed. Tel: +1 203 432 6105; Fax: +1 360 838 7861; Email: mark.gerstein{at}yale.edu


