Nucleic Acids Research, 2002, Vol. 30, No. 5 1083-1090
© 2002 Oxford University Press
A question of size: the eukaryotic proteome and the problems in defining it
Department of Molecular Biophysics and Biochemistry, Yale University, 266 Whitney Avenue, PO Box 208114, New Haven, CT 06520-8114, USA
We discuss the problems in defining the extent of the proteomes for completely sequenced eukaryotic organisms (i.e. the total number of protein-coding sequences), focusing on yeast, worm, fly and human. (i) Six years after completion of its genome sequence, the true size of the yeast proteome is still not defined. New small genes are still being discovered, and a large number of existing annotations are being called into question, with these questionable ORFs (qORFs) comprising up to one-fifth of the current proteome. We discuss these in the context of an ideal genome-annotation strategy that considers the proteome as a rigorously defined subset of all possible coding sequences (the orfome). (ii) Despite the greater apparent complexity of the fly (more cells, more complex physiology, longer lifespan), the nematode worm appears to have more genes. To explain this, we compare the annotated proteomes of worm and fly, considering both genome-annotation and genome-evolution issues. (iii) The unexpectedly small size of the gene complement estimated for the complete human genome provoked much public debate about the nature of biological complexity. However, for the human genome, the relationship between gene number and proteome size is far from simple. We survey current estimates of the number of human genes and, from these, estimate a range for the size of the human proteome. This estimate is substantially hampered by the unknown extent of the cohort of pseudogenes (dead genes), combined with the prevalence of alternative splicing. (Further information relating to yeast is available at http://genecensus.org/yeast/orfome)
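The orfome is described above as the set of all possible coding sequences, of which the proteome is a rigorously defined subset. Purely as an illustration, the following Python sketch enumerates one simple version of an orfome: every ATG-initiated open reading frame that ends in a standard stop codon (TAA, TAG, TGA), on both strands, above a minimum length. The function name `find_orfs`, the 100-codon default cutoff, and the rule of taking only the first ATG after each stop are assumptions of this sketch, not the annotation procedure used in this work.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 100) -> List[Tuple[str, int, int]]:
    """Enumerate ATG...stop ORFs on both strands of seq (illustrative sketch).

    Returns (strand, start, end) tuples as 0-based, half-open coordinates on
    the forward strand; end includes the stop codon. min_codons (a hypothetical
    cutoff) counts codons from the ATG up to, but not including, the stop.
    ORFs that run off the end of the sequence without a stop are not reported.
    """
    seq = seq.upper()
    n = len(seq)
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None  # position of the first ATG since the last stop in this frame
            for i in range(frame, n - 2, 3):
                codon = s[i:i + 3]
                if start is None:
                    if codon == "ATG":
                        start = i
                elif codon in STOP_CODONS:
                    if (i - start) // 3 >= min_codons:
                        if strand == "+":
                            orfs.append((strand, start, i + 3))
                        else:
                            # map reverse-complement coordinates back to the forward strand
                            orfs.append((strand, n - (i + 3), n - start))
                    start = None  # nested ATGs inside an open ORF are ignored
    return orfs
```

For example, `find_orfs("ATGAAATAA", min_codons=2)` returns `[("+", 0, 9)]`. A genome-scale orfome would be filtered further (for example by length, conservation or expression evidence) to arrive at an annotated proteome; those filters are outside the scope of this sketch.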
* To whom correspondence should be addressed. Tel: +1 203 432 6105; Fax: +1 360 838 7861; Email: mark.gerstein{at}yale.edu