Nucleic Acids Research, 2000, Vol. 28, No. 1 49-55
© 2000 Oxford University Press
ProtoMap: automatic classification of protein sequences and hierarchy of protein families
Department of Structural Biology, Fairchild Building D-109, Stanford University, CA 94305, USA; (1) Institute of Computer Science, Hebrew University, Jerusalem 91904, Israel; and (2) Department of Biological Chemistry, Institute of Life Sciences, Hebrew University, Jerusalem 91904, Israel
The ProtoMap site offers an exhaustive classification of all proteins in the SWISS-PROT database into groups of related proteins. The classification is based on an analysis of all pairwise similarities among protein sequences. The analysis makes essential use of transitivity to identify homologies among proteins: within each group of the classification, every two members are either directly or transitively related. However, transitivity is applied restrictively in order to prevent unrelated proteins from clustering together. The classification is carried out at different levels of confidence and yields a hierarchical organization of all proteins. The resulting classification splits the protein space into well-defined groups of proteins, which are closely correlated with natural biological families and superfamilies. Many clusters contain protein sequences that are not classified by other databases. The hierarchical organization suggested by our analysis may help in detecting finer subfamilies within families of known proteins. In addition, it brings forth interesting relationships between protein families, from which local maps of the neighborhood of a protein family can be sketched. The ProtoMap web server can be accessed at http://www.protomap.cs.huji.ac.il
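The idea of grouping proteins by transitive relatedness at several confidence levels can be illustrated with a small sketch. This is not the ProtoMap procedure: it applies plain, unrestricted transitive closure (single linkage), whereas the classification described above restricts transitivity. The use of e-values as the similarity score, the threshold values, the function names `cluster_at_threshold` and `hierarchy`, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def cluster_at_threshold(proteins, similarities, max_evalue):
    """Group proteins by unrestricted transitive closure.

    Two proteins end up in the same group if a chain of pairwise
    similarities, each with e-value <= max_evalue, connects them.
    (Assumption: similarity is expressed as an e-value, lower = stronger.)
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for (a, b), evalue in similarities.items():
        if evalue <= max_evalue:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b

    groups = defaultdict(set)
    for p in proteins:
        groups[find(p)].add(p)
    # Sort groups by their members so the output order is deterministic.
    return sorted((frozenset(g) for g in groups.values()), key=sorted)


def hierarchy(proteins, similarities, thresholds):
    """Cluster at each threshold, from most to least stringent."""
    return {t: cluster_at_threshold(proteins, similarities, t)
            for t in sorted(thresholds)}


# Hypothetical toy data: protein identifiers and pairwise e-values.
proteins = ["A", "B", "C", "D", "E"]
similarities = {
    ("A", "B"): 1e-50,
    ("B", "C"): 1e-20,
    ("C", "D"): 1e-3,
    ("D", "E"): 1e-40,
}

levels = hierarchy(proteins, similarities, [1e-30, 1e-10, 1e-1])
for threshold, groups in levels.items():
    print(threshold, [sorted(g) for g in groups])
```

In this toy example, A and C have no direct similarity score, yet they fall into one group at the intermediate threshold because both are related to B. At the most permissive threshold, the weak C-D link merges everything into a single group, which is the kind of over-merging that a restricted use of transitivity is meant to prevent.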
* To whom correspondence should be addressed. Tel: +1 650 725 0754; Fax: +1 650 723 8464; Email: golan@gimmel.stanford.edu






