Nucleic Acids Research, 2002, Vol. 30, No. 7 1575-1584
© 2002 Oxford University Press

An efficient algorithm for large-scale detection of protein families

A. J. Enright*, S. Van Dongen¹ and C. A. Ouzounis

Computational Genomics Group, The European Bioinformatics Institute, EMBL Cambridge Outstation, Cambridge CB10 1SD, UK and ¹Centrum voor Wiskunde en Informatica, Kruislaan 413, NL-1098 SJ Amsterdam, The Netherlands

Detection of protein families in large databases is one of the principal research objectives in structural and functional genomics. Protein family classification can significantly contribute to the delineation of functional diversity of homologous proteins, the prediction of function based on domain architecture or the presence of sequence motifs as well as comparative genomics, providing valuable evolutionary insights. We present a novel approach called TRIBE-MCL for rapid and accurate clustering of protein sequences into families. The method relies on the Markov cluster (MCL) algorithm for the assignment of proteins into families based on precomputed sequence similarity information. This novel approach does not suffer from the problems that normally hinder other protein sequence clustering algorithms, such as the presence of multi-domain proteins, promiscuous domains and fragmented proteins. The method has been rigorously tested and validated on a number of very large databases, including SwissProt, InterPro, SCOP and the draft human genome. Our results indicate that the method is ideally suited to the rapid and accurate detection of protein families on a large scale. The method has been used to detect and categorise protein families within the draft human genome and the resulting families have been used to annotate a large proportion of human proteins.
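
The abstract does not spell out how the MCL algorithm works, so the following is only a minimal sketch of the general Markov clustering idea: a column-stochastic matrix built from pairwise similarity scores is repeatedly expanded (squared) and inflated (raised element-wise to a power and renormalised) until the flow between groups dies out. The protein names, the scores, the inflation value of 2.0, the self-loop weighting and the connected-component read-out are all illustrative assumptions, not the TRIBE-MCL implementation or its parameters.

```python
# Toy Markov clustering (MCL) sketch. Names, scores and parameters are
# hypothetical; they are not taken from the TRIBE-MCL paper.

NAMES = ["A", "B", "C", "D", "E", "F"]
# Assumed precomputed pairwise similarity scores (symmetric, made up).
SCORES = {
    ("A", "B"): 90.0, ("A", "C"): 80.0, ("B", "C"): 85.0,
    ("D", "E"): 95.0, ("D", "F"): 70.0, ("E", "F"): 75.0,
    ("C", "D"): 5.0,  # weak link between the two groups
}


def build_matrix(names, scores):
    """Build a symmetric similarity matrix from pairwise scores."""
    index = {name: i for i, name in enumerate(names)}
    n = len(names)
    m = [[0.0] * n for _ in range(n)]
    for (a, b), s in scores.items():
        m[index[a]][index[b]] = s
        m[index[b]][index[a]] = s
    return m


def normalize_columns(m):
    """Scale every column so that it sums to 1 (column-stochastic)."""
    n = len(m)
    out = [[0.0] * n for _ in range(n)]
    for j in range(n):
        total = sum(m[i][j] for i in range(n))
        for i in range(n):
            out[i][j] = m[i][j] / total if total > 0 else 0.0
    return out


def expand(m):
    """Expansion step: square the matrix (two-step random walks)."""
    n = len(m)
    return [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def inflate(m, r):
    """Inflation step: element-wise power, then renormalise columns."""
    return normalize_columns([[x ** r for x in row] for row in m])


def mcl(similarity, inflation=2.0, max_iter=100, tol=1e-9):
    """Alternate expansion and inflation until the matrix stops changing."""
    n = len(similarity)
    m = [row[:] for row in similarity]
    for j in range(n):
        # Assumed convention: self-loop weight equals the column maximum.
        col_max = max(m[i][j] for i in range(n))
        m[j][j] = col_max if col_max > 0 else 1.0
    m = normalize_columns(m)
    for _ in range(max_iter):
        nxt = inflate(expand(m), inflation)
        change = max(abs(nxt[i][j] - m[i][j])
                     for i in range(n) for j in range(n))
        m = nxt
        if change < tol:
            break
    return m


def extract_clusters(m, names, threshold=1e-6):
    """Simplified read-out: connected components of the remaining flow."""
    n = len(m)
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(n):
            if m[i][j] > threshold:
                parent[find(i)] = find(j)
    groups = {}
    for idx in range(n):
        groups.setdefault(find(idx), []).append(names[idx])
    return sorted(sorted(g) for g in groups.values())


limit_matrix = mcl(build_matrix(NAMES, SCORES))
families = extract_clusters(limit_matrix, NAMES)
print(families)
```

In this toy graph the weak C-D link stands in for the kind of spurious connection that a shared promiscuous domain might create between two otherwise unrelated families; raising the inflation parameter generally yields finer-grained clusters, lowering it coarser ones.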

* To whom correspondence should be addressed. Tel: +44 1223 494452; Fax: +44 1223 494468; Email: anton{at}ebi.ac.uk



