Nucleic Acids Research, 2001, Vol. 29, No. 3 818-830
© 2001 Oxford University Press
Digging for dead genes: an analysis of the characteristics of the pseudogene population in the Caenorhabditis elegans genome
Department of Molecular Biophysics and Biochemistry, Yale University, 260 Whitney Avenue, PO Box 208114, New Haven, CT 06511-8114, USA
Pseudogenes are non-functioning copies of genes in genomic DNA, which may result either from reverse transcription of an mRNA transcript (processed pseudogenes) or from gene duplication and subsequent disablement (non-processed pseudogenes). As pseudogenes are apparently dead, they usually have a variety of obvious disablements (e.g., insertions, deletions, frameshifts and truncations) relative to their functioning homologs. We have derived an initial estimate of the size, distribution and characteristics of the pseudogene population in the Caenorhabditis elegans genome, performing a survey in molecular archaeology. Corresponding to the 18 576 annotated proteins in the worm (i.e., in Wormpep18), we have found an estimated total of 2168 pseudogenes, about one for every eight genes. Few of these appear to be processed. Details of our pseudogene assignments are available from http://bioinfo.mbb.yale.edu/genome/worm/pseudogene. The population of pseudogenes differs significantly from that of genes in a number of respects: (i) pseudogenes are distributed unevenly across the genome relative to genes, with a disproportionate number on chromosome IV; (ii) the density of pseudogenes is higher on the arms of the chromosomes; (iii) the amino acid composition of pseudogenes is midway between that of genes and (translations of) random intergenic DNA, with enrichment of Phe, Ile, Leu and Lys, and depletion of Asp, Ala, Glu and Gly relative to the worm proteome; and (iv) the most common protein folds and families differ somewhat between genes and pseudogenes: whereas the most common fold found in the worm proteome is the immunoglobulin fold, the most common pseudofold is the C-type lectin. In addition, the size of a gene family bears little overall relationship to the size of its corresponding pseudogene complement, indicating a highly dynamic genome. There are in fact a number of families associated with large populations of pseudogenes. For example, one family of seven-transmembrane receptors (represented by gene B0334.7) has one pseudogene for every four genes, and another uncharacterized family (represented by gene B0403.1) is approximately two-thirds pseudogenic. Furthermore, over a hundred apparent pseudogenic fragments do not have any obvious homologs in the worm.
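The proportions quoted in the abstract can be checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that "one pseudogene for every four genes" means a 1:4 pseudogene-to-gene ratio within the family and that "two-thirds pseudogenic" means two pseudogenes for every functional gene. Only the counts 18 576 and 2168 come from the text.

```python
def genes_per_pseudogene(n_genes: int, n_pseudogenes: int) -> float:
    """Number of annotated genes for each pseudogene."""
    return n_genes / n_pseudogenes


def pseudogenic_fraction(n_genes: int, n_pseudogenes: int) -> float:
    """Share of all members (genes plus pseudogenes) that are pseudogenes."""
    return n_pseudogenes / (n_genes + n_pseudogenes)


# Genome-wide counts stated in the abstract (Wormpep18 proteins, estimated pseudogenes).
WORM_GENES = 18576
WORM_PSEUDOGENES = 2168

genome_ratio = genes_per_pseudogene(WORM_GENES, WORM_PSEUDOGENES)
genome_fraction = pseudogenic_fraction(WORM_GENES, WORM_PSEUDOGENES)

# Family-level ratios, under the interpretation stated in the lead-in (assumption).
receptor_family_fraction = pseudogenic_fraction(n_genes=4, n_pseudogenes=1)
uncharacterized_family_fraction = pseudogenic_fraction(n_genes=1, n_pseudogenes=2)

print(f"genes per pseudogene, genome-wide: {genome_ratio:.2f}")
print(f"pseudogenic fraction, genome-wide: {genome_fraction:.3f}")
print(f"pseudogenic fraction, B0334.7-like family: {receptor_family_fraction:.2f}")
print(f"pseudogenic fraction, B0403.1-like family: {uncharacterized_family_fraction:.2f}")
```

On these numbers the genome-wide ratio is somewhat above eight genes per pseudogene, consistent with the abstract's rounded "about one for every eight", while the two example families sit well above the genome-wide pseudogenic fraction.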
* To whom correspondence should be addressed. Tel: +1 203 432 6105; Fax: +1 360 838 7861; Email: mark.gerstein{at}yale.edu