Nucleic Acids Research, 2003, Vol. 31, No. 15 4632-4638
© 2003 Oxford University Press
Protein families and TRIBES in genome sequence space
Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK
*To whom correspondence should be addressed. Tel: +44 1223 494653; Fax: +44 1223 494471; Email: ouzounis{at}ebi.ac.uk
Present address:
Anton J. Enright, Computational Biology Center, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, Box no. 460, New York, NY 10021, USA
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.
Accurate detection of protein families allows assignment of protein function and the analysis of functional diversity in complete genomes. Recently, we presented a novel algorithm called TribeMCL for the detection of protein families that is both accurate and efficient. This method allows family analysis to be carried out on a very large scale. Using TribeMCL, we have generated a resource called TRIBES that contains protein family information, comprising annotations, protein sequence alignments and phylogenetic distributions describing 311 257 proteins from 83 completely sequenced genomes. The analysis of at least 60 934 detected protein families reveals that, with the essential families excluded, paralogy levels are similar among prokaryotes, irrespective of genome size. The number of essential families is estimated to be between 366 and 426. We also show that the currently known space of protein families is scale-free and discuss the implications of this distribution. In addition, we show that smaller families are often formed by shorter proteins and discuss the reasons for this intriguing pattern. Finally, we analyse the functional diversity of protein families in entire genome sequences. The TRIBES protein family resource is accessible at http://www.ebi.ac.uk/research/cgg/tribes/.
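As an illustration of what a scale-free family-size distribution means in practice, the sketch below counts how many families have each size and fits a straight line to the counts on log-log axes; a roughly linear relationship with a negative slope is the usual signature of a power law. This is a rough diagnostic only, not the procedure used for TRIBES, which the abstract does not describe. The function names and the toy input are assumptions made for this example.

```python
import math
from collections import Counter


def size_histogram(family_sizes):
    """Map each family size to the number of families of that size."""
    return Counter(family_sizes)


def loglog_slope(histogram):
    """Least-squares slope of log(count) against log(size).

    A distribution close to a power law, count ~ size**(-gamma),
    gives a slope close to -gamma.
    """
    xs = [math.log(size) for size in histogram]
    ys = [math.log(count) for count in histogram.values()]
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two distinct family sizes")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy input (invented for illustration, not TRIBES data):
# 64 families of size 1, 16 of size 2, 4 of size 4, 1 of size 8.
toy_sizes = [1] * 64 + [2] * 16 + [4] * 4 + [8] * 1
toy_slope = loglog_slope(size_histogram(toy_sizes))
print(f"log-log slope: {toy_slope:.3f}")
```

The toy counts follow count = 64 / size² exactly, so the fitted slope comes out close to -2. On real data, a plain regression over raw counts is sensitive to the sparse tail of very large families, so logarithmic binning or a maximum-likelihood fit would be a more careful choice.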