
Nucleic Acids Research, 2003, Vol. 31, No. 15 4632-4638
© 2003 Oxford University Press

Protein families and TRIBES in genome sequence space

Anton J. Enright, Victor Kunin and Christos A. Ouzounis*

Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK

*To whom correspondence should be addressed. Tel: +44 1223 494653; Fax: +44 1223 494471; Email: ouzounis{at}ebi.ac.uk
Present address: Anton J. Enright, Computational Biology Center, Memorial Sloan Kettering Cancer Center, 1275 York Avenue, Box no. 460, New York, NY 10021, USA
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

Accurate detection of protein families allows assignment of protein function and the analysis of functional diversity in complete genomes. Recently, we presented a novel algorithm called TribeMCL for the detection of protein families that is both accurate and efficient. This method allows family analysis to be carried out on a very large scale. Using TribeMCL, we have generated a resource called TRIBES that contains protein family information, comprising annotations, protein sequence alignments and phylogenetic distributions describing 311 257 proteins from 83 completely sequenced genomes. The analysis of at least 60 934 detected protein families reveals that, with the essential families excluded, paralogy levels are similar between prokaryotes, irrespective of genome size. The number of essential families is estimated to be between 366 and 426. We also show that the currently known space of protein families is scale free and discuss the implications of this distribution. In addition, we show that smaller families are often formed by shorter proteins and discuss the reasons for this intriguing pattern. Finally, we analyse the functional diversity of protein families in entire genome sequences. The TRIBES protein family resource is accessible at http://www.ebi.ac.uk/research/cgg/tribes/.
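
As an informal illustration of what a scale-free distribution of family sizes means, the sketch below counts how many families have each size and fits a straight line to log(count) against log(size). A roughly linear relationship with negative slope is consistent with a power law. This is a minimal sketch and not the authors' analysis: the function names and the toy data are assumptions made up for this example, and a least-squares fit on log-log axes is only a crude check.

```python
import math
from collections import Counter


def size_frequency(family_sizes):
    """Map each family size to the number of families of that size."""
    return Counter(family_sizes)


def loglog_slope(freq):
    """Least-squares slope of log(count) against log(size).

    Needs at least two distinct family sizes.
    """
    xs = [math.log(size) for size in freq]
    ys = [math.log(count) for count in freq.values()]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Toy data, invented for illustration only (not from TRIBES):
# the number of families of a given size is exactly 1000 / size**2.
toy_sizes = []
for size in (1, 2, 5, 10):
    toy_sizes.extend([size] * (1000 // size ** 2))

slope = loglog_slope(size_frequency(toy_sizes))
print(f"log-log slope: {slope:.3f}")  # close to -2 by construction
```

With real family sizes the points scatter around the line, so the slope is only a rough estimate of the exponent.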




