Nucleic Acids Research, 2001, Vol. 29, No. 1 239-241
© 2001 Oxford University Press
DIP: The Database of Interacting Proteins: 2001 update
UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, Molecular Biology Institute, PO Box 951570, UCLA, Los Angeles, CA 90095-1570, USA and 1Protein Pathways, 1034 Gayley Avenue, Los Angeles CA 90024, USA
Received October 2, 2000; Revised and Accepted October 17, 2000.
| ABSTRACT |
|---|
The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein-protein interactions. Since January 2000 the number of protein-protein interactions in DIP has nearly tripled to 3472 and the number of proteins to 2659. New interactive tools have been developed to aid in the visualization, navigation and study of networks of protein interactions.
| INTRODUCTION |
|---|
The Database of Interacting Proteins (DIP) is a database that documents experimentally determined protein-protein interactions. During the last year, an effort has been underway to increase the number of interactions described by DIP and to link DIP to major sequence and knowledge databases. Tools have been developed that enable the user to traverse the interaction networks and to visualize the various networks of protein complexes and biochemical pathways.
The past year has brought an increased interest in databases presenting knowledge about proteins in the context of the entire cell. This is in part due to the explosion of genome sequence data and to the development of DNA chip methods that rapidly produce large sets of gene expression data. Knowledge databases such as DIP can be used to interpret whole cell expression data (1) and should substantially improve these analyses in the future.
| GROWTH OF THE DATABASE |
|---|
The core of the DIP database structure is composed of three linked tables: one of protein information, one of protein-protein interactions and one describing details of experiments (2). During this year, a table linking DIP to the YPD database provided by Proteome, Inc. has been added (3). DIP was expanded significantly by the addition of data from large-scale yeast two-hybrid experiments (4,5). Although many of these interactions are yet to be confirmed by other methods, the yeast two-hybrid studies offer a wealth of potential interactions. DIP can also be used to compare differences in yeast two-hybrid data from various sources.
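As a minimal sketch of how three such linked tables could be represented and joined, the following Python example may be helpful. The class names, field names, identifiers and records are all hypothetical and were invented for illustration; they do not reproduce the actual DIP schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Protein:
    protein_id: str  # hypothetical key, not DIP's real accession format
    name: str
    organism: str


@dataclass(frozen=True)
class Interaction:
    interaction_id: str
    protein_a: str  # Protein.protein_id of one partner
    protein_b: str  # Protein.protein_id of the other partner


@dataclass(frozen=True)
class Experiment:
    interaction_id: str  # links the experiment back to an Interaction
    method: str          # e.g. "two-hybrid" or "coIP"
    source: str          # placeholder for the article reporting the experiment


def methods_for_protein(protein_id, interactions, experiments):
    """Join the tables: map each partner of protein_id to its detection methods."""
    by_interaction = {}
    for exp in experiments:
        by_interaction.setdefault(exp.interaction_id, set()).add(exp.method)

    result = {}
    for inter in interactions:
        if protein_id == inter.protein_a:
            partner = inter.protein_b
        elif protein_id == inter.protein_b:
            partner = inter.protein_a
        else:
            continue
        result.setdefault(partner, set()).update(
            by_interaction.get(inter.interaction_id, set())
        )
    return result


# Toy records, invented for illustration only.
proteins = [
    Protein("P1", "protein one", "S. cerevisiae"),
    Protein("P2", "protein two", "S. cerevisiae"),
    Protein("P3", "protein three", "S. cerevisiae"),
]
interactions = [
    Interaction("I1", "P1", "P2"),
    Interaction("I2", "P1", "P3"),
    Interaction("I3", "P2", "P3"),
]
experiments = [
    Experiment("I1", "two-hybrid", "article-A"),
    Experiment("I1", "coIP", "article-B"),
    Experiment("I2", "two-hybrid", "article-A"),
]

print(methods_for_protein("P1", interactions, experiments))
```

Keeping experiments in their own table, keyed by interaction, lets one interaction accumulate several independent confirmations without duplicating the protein or interaction records.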
Since January 2000, the number of articles in DIP reporting interaction experiments has increased from 500 to 1020. Correspondingly, DIP has increased in size from 1500 to 2659 proteins, and the number of interactions has nearly tripled, rising to 3472.
| STATE OF THE DATABASE |
|---|
The methods used to detect the interactions reported in DIP are summarized in Figure 1A. The majority of interactions in DIP have been detected by the yeast two-hybrid method, but a significant fraction have been detected by co-immunoprecipitation (coIP). Our hypothesis is that many protein-protein interactions are first observed by the two-hybrid method, and then later confirmed by other methods. This type of hypothesis can be evaluated as DIP grows.
Some 16% of interactions have been detected by more than one method (Figure 1B). The majority of interactions (84%) have been detected by only a single experiment; of these, 25% were determined by the genome-wide yeast two-hybrid method (4,5). As new methods detect interactions already documented in DIP, we will add these confirmations. Although proteins from 79 organisms are present in DIP, some 65% of the interactions documented at present are between yeast proteins.
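The fraction of interactions supported by more than one method can be computed directly from a list of experiment records. The sketch below assumes a hypothetical input format of (interaction identifier, method) pairs and uses invented toy data; it is not the procedure used to produce Figure 1B.

```python
from collections import defaultdict


def method_support(experiments):
    """Return (multi-method count, total interactions, fraction multi-method).

    experiments: iterable of (interaction_id, method) pairs (assumed format).
    """
    methods = defaultdict(set)
    for interaction_id, method in experiments:
        methods[interaction_id].add(method)

    total = len(methods)
    multi = sum(1 for found in methods.values() if len(found) > 1)
    fraction = multi / total if total else 0.0
    return multi, total, fraction


# Toy records, invented for illustration only.
toy_experiments = [
    ("I1", "two-hybrid"),
    ("I1", "coIP"),
    ("I2", "two-hybrid"),
    ("I3", "coIP"),
    ("I4", "two-hybrid"),
    ("I4", "two-hybrid"),  # repeated method, still a single method
]

multi, total, fraction = method_support(toy_experiments)
print(multi, total, fraction)
```

Methods are collected in a set, so the same method reported twice for one interaction is not counted as independent confirmation.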
| CLUSTERS OF PROTEINS |
|---|
The DIP offers a large-scale picture of protein interaction networks. Perhaps not surprisingly given the homeostatic characteristics of cells, many of the proteins in DIP form a single connected network of interactions, accompanied by several smaller networks.
In total, 350 connected interaction networks are found in DIP; their size distribution is shown in Figure 2. The majority of interaction networks correspond to heterodimers (185) or homodimers (47), but larger networks range from 4 to 16 proteins in size, and the principal cluster contains 1495 proteins. A year ago, only 1089 proteins were contained in this network, and we suspect that as we increase the number of interactions in DIP, the smaller networks will merge with the principal network. The principal cluster of 1495 proteins is examined further in Figure 3B, where we show all interactions that are within three interaction steps from yeast actin.
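Connected interaction networks of this kind correspond to the connected components of an undirected graph whose nodes are proteins and whose edges are interactions. The following sketch shows one way to find them and tabulate their sizes; the input format and the toy protein names are assumptions made for illustration, and the code is not the software used to produce Figure 2.

```python
from collections import Counter, defaultdict


def connected_networks(interactions):
    """Return the connected components (sets of proteins), largest first.

    interactions: iterable of (protein_a, protein_b) pairs (assumed format).
    A homodimer (a, a) yields a network containing a single protein.
    """
    adjacency = defaultdict(set)
    for a, b in interactions:
        adjacency[a].add(b)
        adjacency[b].add(a)

    seen = set()
    networks = []
    for start in adjacency:
        if start in seen:
            continue
        # Depth-first traversal collecting everything reachable from start.
        stack = [start]
        seen.add(start)
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        networks.append(component)
    return sorted(networks, key=len, reverse=True)


def size_distribution(networks):
    """Map network size (number of proteins) to the number of such networks."""
    return Counter(len(network) for network in networks)


# Toy interactions, invented for illustration only.
toy = [("A", "B"), ("B", "C"), ("D", "E"), ("F", "F")]
networks = connected_networks(toy)
print([sorted(n) for n in networks])
print(size_distribution(networks))
```

Adding a single new interaction between two components merges them, which is why smaller networks would be expected to join the principal network as more interactions are entered.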
| VISUALIZATION OF PROTEIN NETWORKS AND THEIR SUPPORTING EXPERIMENTAL METHODS |
|---|
DIP now includes an interactive web page that enables the user to traverse the network of interactions from any protein in the database. As shown in Figure 3A, the page is composed of three different frames: the upper right frame contains the graphical representation of the network, the upper left frame contains the protein information and the bottom frame lists the proteins that interact with the selected protein. Each frame is interactive. For example, clicking on a protein in the graphical map changes the protein information display.
As illustrated in Figure 3B for yeast actin, the detecting experimental methods can be superimposed over the network of protein interactions. Here, one can see that the most popular experiment is the two-hybrid test, and the next most popular is co-immunoprecipitation, as already described in Figure 1A.
The goal of this new graphical representation is to allow users to grasp more easily the connection of their protein of interest with other proteins.
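A view such as the one described above, showing all proteins within a fixed number of interaction steps of a chosen protein together with the methods supporting each interaction, can be derived with a breadth-first search. The sketch below assumes a hypothetical edge format of (protein, protein, method) triples and uses invented data; it is not the code behind the DIP web page.

```python
from collections import Counter, defaultdict, deque


def neighborhood(edges, start, max_steps=3):
    """Return {protein: steps from start} for proteins within max_steps.

    edges: iterable of (protein_a, protein_b, method) triples (assumed format).
    """
    adjacency = defaultdict(set)
    for a, b, _method in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)

    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand beyond the step limit
        for neighbor in adjacency.get(node, ()):
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                queue.append(neighbor)
    return distance


def methods_in_view(edges, distance):
    """Count detection methods over edges whose two ends are both in view."""
    counts = Counter()
    for a, b, method in edges:
        if a in distance and b in distance:
            counts[method] += 1
    return counts


# Toy chain of interactions, invented for illustration only.
toy_edges = [
    ("actin", "A", "two-hybrid"),
    ("A", "B", "coIP"),
    ("B", "C", "two-hybrid"),
    ("C", "D", "two-hybrid"),
    ("D", "E", "coIP"),
]

view = neighborhood(toy_edges, "actin", max_steps=3)
print(view)
print(methods_in_view(toy_edges, view))
```

Breadth-first search visits proteins in order of increasing distance, so the step count recorded for each protein is the length of its shortest interaction path from the starting protein.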
| FUTURE DIRECTIONS |
|---|
Several approaches have been proposed for the automatic extraction of information on known protein-protein interactions from MEDLINE (National Library of Medicine, MD).
We have used the abstracts of articles present in DIP to train a Bayesian classifier (6) to extract abstracts from MEDLINE that potentially describe protein interactions. Aided by this automated approach, a curator then checks the articles and enters the interactions into DIP. We expect to extract information on thousands of protein interactions from the literature using this approach.
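To illustrate the general idea of ranking abstracts with a Bayesian classifier, the following sketch implements a generic multinomial naive Bayes text classifier with add-one smoothing. The details of the classifier described in reference 6 are not given here, so this should be read as a textbook stand-in rather than the authors' method; the class name, labels and training sentences are invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {}        # label -> Counter of word frequencies
        self.doc_counts = Counter()  # label -> number of training documents
        self.vocab = set()

    def fit(self, documents, labels):
        for text, label in zip(documents, labels):
            self.doc_counts[label] += 1
            counts = self.word_counts.setdefault(label, Counter())
            for word in tokenize(text):
                counts[word] += 1
                self.vocab.add(word)
        return self

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        words = [w for w in tokenize(text) if w in self.vocab]
        scores = {}
        for label, counts in self.word_counts.items():
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            denominator = sum(counts.values()) + len(self.vocab)
            for word in words:
                score += math.log((counts[word] + 1) / denominator)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training abstracts, invented for illustration only.
training = [
    ("Protein A binds protein B in a two-hybrid assay", "interaction"),
    ("Co-immunoprecipitation shows that kinase X interacts with adaptor Y",
     "interaction"),
    ("The crystal structure of enzyme Z was solved at high resolution", "other"),
    ("Gene expression levels were measured with DNA chips", "other"),
]
classifier = NaiveBayes().fit(
    [text for text, _ in training], [label for _, label in training]
)

print(classifier.predict("Receptor R interacts with kinase K in a two-hybrid screen"))
```

In a curation workflow of the kind described, the score would only rank candidate abstracts; the decision to enter an interaction would remain with the curator.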
Another planned improvement is to allow users to submit protein sequences and search for interactions involving homologous proteins, and to link DIP to predictions of interactions from the Rosetta Stone and Phylogenetic Profile methods (7).
Another planned improvement to DIP is to include protein modification states. This should allow users to examine interaction networks according to protein status (e.g. phosphorylation). We anticipate that this type of data will be useful for more complex modeling of the circuitry of interaction networks.
| DATA SUBMISSION AND CURATION |
|---|
We seek expert curators to screen entries into the DIP. Scientists are invited to contribute to this database by submitting interactions directly over the World Wide Web after obtaining a user account. To obtain an account, please contact us at dip@mbi.ucla.edu. Help for editing and submission is available online; questions can also be directed to dip@mbi.ucla.edu or to the fax number and address listed. Please feel free to send email containing published protein-protein interactions, and a curator will enter this information in the DIP.
| ACKNOWLEDGEMENTS |
|---|
The authors thank Thomas Graeber and Ken Goodwill for discussion and critical reading of the manuscript. We thank DOE and NIH for support of DIP. I.X. is a fellow of the Swiss National Fund.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 310 825 3754; Fax: +1 310 206 3914; Email: david@mbi.ucla.edu
| REFERENCES |
|---|
1 Zien,A., Kueffner,R., Zimmer,R. and Lengauer,T. (2000) Analysis of Gene Expression Data with Pathway Scores. Ismb, 407-417.
2 Xenarios,I., Rice,D.W., Salwinski,L., Baron,M.K., Marcotte,E.M. and Eisenberg,D. (2000) DIP: the database of interacting proteins. Nucleic Acids Res., 28, 289-291.
3 Costanzo,M.C., Hogan,J.D., Cusick,M.E., Davis,B.P., Fancher,A.M., Hodges,P.E., Kondu,P., Lengieza,C., Lew-Smith,J.E., Lingner,C., Roberg-Perez,K.J., Tillberg,M., Brooks,J.E. and Garrels,J.I. (2000) The yeast proteome database (YPD) and Caenorhabditis elegans proteome database (WormPD): comprehensive resources for the organization and comparison of model organism protein information. Nucleic Acids Res., 28, 73-76.
4 Ito,T., Tashiro,K., Muta,S., Ozawa,R., Chiba,T., Nishizawa,M., Yamamoto,K., Kuhara,S. and Sakaki,Y. (2000) Toward a protein-protein interaction map of the budding yeast: A comprehensive system to examine two-hybrid interactions in all possible combinations between the yeast proteins. Proc. Natl Acad. Sci. USA, 97, 1143-1147.
5 Uetz,P., Giot,L., Cagney,G., Mansfield,T.A., Judson,R.S., Knight,J.R., Lockshon,D., Narayan,V., Srinivasan,M., Pochart,P., Qureshi-Emili,A., Li,Y., Godwin,B., Conover,D., Kalbfleisch,T., Vijayadamodar,G., Yang,M., Johnston,M., Fields,S. and Rothberg,J.M. (2000) A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature, 403, 623-627.
6 Marcotte,E.M., Xenarios,I. and Eisenberg,D. (2001) Mining literature for protein-protein interaction. Bioinformatics, in press.
7 Eisenberg,D., Marcotte,E.M., Xenarios,I. and Yeates,T.O. (2000) Protein function in the post-genomic era. Nature, 405, 823-826.