Nucleic Acids Research, 2000, Vol. 28, No. 1, 289-291
© 2000 Oxford University Press
DIP: the Database of Interacting Proteins
UCLA-DOE Laboratory of Structural Biology and Molecular Medicine, Molecular Biology Institute, PO Box 951570, UCLA, Los Angeles, CA 90095-1570, USA
Received August 31, 1999; Revised and Accepted October 4, 1999.
| ABSTRACT |
|---|
The Database of Interacting Proteins (DIP; http://dip.doe-mbi.ucla.edu) is a database that documents experimentally determined protein-protein interactions. It is intended to provide the scientific community with a comprehensive, integrated tool for browsing and efficiently extracting information about protein interactions and the interaction networks involved in biological processes. Beyond cataloging details of protein-protein interactions, the DIP is useful for understanding protein function and protein-protein relationships, studying the properties of networks of interacting proteins, benchmarking predictions of protein-protein interactions, and studying the evolution of protein-protein interactions.
| INTRODUCTION |
|---|
The Database of Interacting Proteins (DIP) aims to integrate the diverse body of experimental knowledge about interacting proteins into a single, easily accessed database. Biological knowledge about protein-protein interactions is scattered across many scientific journals and archives such as MEDLINE (National Library of Medicine, MD, USA). Although the scientific community uses this literature and these archives daily, retrieving specialized data from them takes more effort than retrieving it from the DIP, which combines information from multiple observations and experimental techniques and also provides information about networks of interacting proteins.
The primary goal of the DIP is to extract and integrate the wealth of information about protein-protein interactions into a user-friendly environment. Organism-specific databases such as YPD (1) for yeast, EcoCyc (2) for Escherichia coli and FlyNet (3) for Drosophila often contain information on protein pathways and protein complexes, as do pathway databases such as KEGG (4) and CNSB (5). The DIP was created to complement these existing databases by including interacting proteins from many organisms, so that scientists can extend observations of protein-protein interactions in one organism with observations from other organisms.
| DESCRIPTION AND STRUCTURE OF THE DATABASE |
|---|
In its original conception (6), the DIP stored information on protein interactions in a single text file. To handle the growing body of data effectively, the DIP has now been implemented as a relational database in MySQL (TcX, Sweden), which is queried with the SQL language. This system handles diverse types of data efficiently and enables rapid sorting and analysis. The database can be extended as required by adding new fields and tables to the data structure, without altering the existing database content.
The DIP database is composed of three linked tables: a table of protein information, a table of protein-protein interactions, and a table describing the experiments that detected those interactions. These tables are shown schematically in Figure 1 and contain the following information.
(i) The protein information table contains protein identification codes from the SWISS-PROT (7), PIR (8) and GenBank (9) sequence databases, as well as each protein's gene name, description, enzyme code and cellular localization, when known.
(ii) The interaction table records which proteins in the protein information table interact, together with the amino acid ranges and protein domains involved in each interaction, when known.
(iii) The experimental article table details the experiments used to detect the interactions in the interaction table, together with their literature citations. This table includes the MEDLINE standard article code (PMID/UID), as well as the authors, title, journal and year of publication of the article. Over 20 different experimental techniques are represented in DIP, including co-immunoprecipitation, yeast two-hybrid and in vitro binding assays; for a complete list see http://dip.doe-mbi.ucla.edu/help.html. Where determined, a dissociation constant is also included.
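A minimal sketch of this three-table layout follows, using Python's built-in `sqlite3` module in place of MySQL so that it runs without a database server. All table and column names (`protein`, `interaction`, `experiment`, `kd_molar` and so on) are hypothetical illustrations, not DIP's actual schema. The final `ALTER TABLE` statement shows the kind of extension described earlier: a new field is added without altering existing rows.

```python
import sqlite3

SCHEMA = """
-- (i) protein information: sequence-database cross-references and annotation
CREATE TABLE protein (
    protein_id   INTEGER PRIMARY KEY,
    swissprot_ac TEXT,
    pir_id       TEXT,
    genbank_id   TEXT,
    gene_name    TEXT,
    description  TEXT,
    ec_number    TEXT,
    localization TEXT
);
-- (ii) pairwise interactions; ranges and domains are NULL when unknown
CREATE TABLE interaction (
    interaction_id INTEGER PRIMARY KEY,
    protein_a      INTEGER NOT NULL REFERENCES protein(protein_id),
    protein_b      INTEGER NOT NULL REFERENCES protein(protein_id),
    range_a        TEXT,
    range_b        TEXT,
    domain_a       TEXT,
    domain_b       TEXT
);
-- (iii) experiments and citations; one interaction may have many rows here
CREATE TABLE experiment (
    experiment_id  INTEGER PRIMARY KEY,
    interaction_id INTEGER NOT NULL REFERENCES interaction(interaction_id),
    technique      TEXT NOT NULL,
    pmid           INTEGER,
    authors        TEXT,
    title          TEXT,
    journal        TEXT,
    year           INTEGER,
    kd_molar       REAL   -- dissociation constant, NULL if not determined
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Later extension: add a field without touching existing content.
conn.execute("ALTER TABLE protein ADD COLUMN organism TEXT")

tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
protein_columns = [row[1] for row in conn.execute("PRAGMA table_info(protein)")]
print(tables)               # ['experiment', 'interaction', 'protein']
print(protein_columns[-1])  # organism
```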
Each interaction in the interaction table is linked to its interacting proteins in the protein information table and to one or more experiments in the experiment table, since some interactions have been detected by several different experimental methods. For example, the interaction between the human proto-oncogene h-ras-1 and the ras interactor RIN1 is documented in DIP by four experimental methods (10). Users can therefore evaluate the quality of an interaction from the particular experiments performed.
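The one-to-many link between an interaction and its supporting experiments can be illustrated with a short grouping routine. It assumes, hypothetically, that experiment records are available as `(interaction_id, technique)` pairs; the records shown are placeholders, not DIP data. For each interaction it reports the number of supporting experiments and the distinct techniques used, the kind of summary a user might consult when judging an interaction.

```python
from collections import defaultdict

def summarize_evidence(records):
    """Group (interaction_id, technique) records by interaction.

    Returns {interaction_id: (number of experiments, sorted distinct techniques)}.
    """
    techniques = defaultdict(list)
    for interaction_id, technique in records:
        techniques[interaction_id].append(technique)
    return {
        iid: (len(techs), sorted(set(techs)))
        for iid, techs in techniques.items()
    }

# Hypothetical records: interaction 1 is supported by three experiments
# using two techniques; interaction 2 by a single experiment.
records = [
    (1, "yeast two-hybrid"),
    (1, "co-immunoprecipitation"),
    (1, "yeast two-hybrid"),
    (2, "in vitro binding"),
]
for iid, (n, techs) in summarize_evidence(records).items():
    print(iid, n, techs)
```

Counting distinct techniques separately from the raw number of experiments keeps repeated use of one method from looking like independent confirmation.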
| SEARCHING THE DATABASE OF INTERACTING PROTEINS |
|---|
Currently, protein-protein interactions are entered into the DIP only after publication in peer-reviewed journals. Each entry is made manually by a curator and is then subjected to automated tests confirming that the proteins and citations exist. Interactions are then double-checked by a second curator and flagged accordingly in the database.
DIP can be searched in a variety of ways. To find interactions involving a specific protein, one can enter its gene name or its accession code from GenBank, PIR or SWISS-PROT. More general searches can be made by organism, protein superfamily, keyword, experimental technique or literature citation. Multiple fields can be searched simultaneously to narrow a query, and wildcards and regular expressions are supported. A search returns a list of protein-protein interactions, each hyperlinked to a DIP entry. Each DIP entry reports the two interacting proteins, the protein domains and range of amino acids involved, the curator, the dates of entry and update, and the articles describing the interaction together with the corresponding experiments. For example, a search on a single protein returns every interaction recorded in DIP in which that protein participates.
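The entry workflow just described (manual entry, automated existence checks, then a second curator's check) can be sketched as below. This is a hypothetical illustration: the sets `known_proteins` and `known_pmids` stand in for real lookups against the sequence databases and MEDLINE, and every class, function and identifier is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Entry:
    """A hypothetical curated interaction record."""
    protein_a: str
    protein_b: str
    pmid: int
    entered_by: str
    verified_by: Optional[str] = None
    problems: list = field(default_factory=list)

    @property
    def verified(self) -> bool:
        return self.verified_by is not None

def automated_checks(entry, known_proteins, known_pmids):
    """Check that both proteins and the citation exist; return True if all do."""
    entry.problems = [p for p in (entry.protein_a, entry.protein_b)
                      if p not in known_proteins]
    if entry.pmid not in known_pmids:
        entry.problems.append(f"PMID {entry.pmid}")
    return not entry.problems

def second_curator_check(entry, curator):
    """Flag the entry as double-checked by a curator other than its author."""
    if curator == entry.entered_by:
        raise ValueError("verification must come from a second curator")
    entry.verified_by = curator

# Usage with hypothetical identifiers
known_proteins = {"P00001", "P00002"}
known_pmids = {1000001}
entry = Entry("P00001", "P00002", 1000001, entered_by="curator_1")
if automated_checks(entry, known_proteins, known_pmids):
    second_curator_check(entry, "curator_2")
print(entry.verified)  # True
```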
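As a rough illustration of multi-field searching with wildcards and regular expressions, the sketch below filters in-memory interaction records. It assumes a hypothetical record layout (a dict per interaction with `gene` and `organism` fields for each partner) and uses the standard `fnmatch` module for shell-style wildcards and `re` for regular expressions; DIP's own query syntax may well differ. The records are illustrative, not DIP entries.

```python
import fnmatch
import re

def field_matches(value, pattern, regex=False):
    """Match one field: case-sensitive regex search, or case-insensitive wildcard."""
    if value is None:
        return False
    if regex:
        return re.search(pattern, value) is not None
    return fnmatch.fnmatchcase(value.lower(), pattern.lower())

def search(interactions, regex=False, **criteria):
    """Return interactions in which one partner satisfies every criterion.

    Supplying several criteria (e.g. gene and organism) narrows the query.
    """
    return [
        rec for rec in interactions
        if any(all(field_matches(p.get(f), pat, regex) for f, pat in criteria.items())
               for p in (rec["a"], rec["b"]))
    ]

# Illustrative records with a hypothetical layout
interactions = [
    {"id": 1, "a": {"gene": "CDC28", "organism": "S. cerevisiae"},
              "b": {"gene": "CLN2", "organism": "S. cerevisiae"}},
    {"id": 2, "a": {"gene": "HRAS", "organism": "H. sapiens"},
              "b": {"gene": "RIN1", "organism": "H. sapiens"}},
]
print([r["id"] for r in search(interactions, gene="cln*")])  # [1]
print([r["id"] for r in search(interactions, regex=True,
                                gene=r"^[HK]RAS$", organism="sapiens")])  # [2]
```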
| CURRENT STATE OF THE DIP |
|---|
As of August 1999, the DIP contains 1089 unique proteins and 1269 pairwise interactions. Many large networks of protein-protein interactions are represented in DIP, as illustrated in Figure 2 for 57 proteins controlling the cell cycle and transcription. The largest such network in DIP contains 514 proteins involved in the cell cycle and transcription, each connected to the network by at least one protein-protein interaction.
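Each network in this sense is a connected component of the interaction graph, with proteins as nodes and pairwise interactions as edges. A minimal breadth-first-search sketch for finding the components, and the largest one, is shown below; the edge list is a made-up toy example and does not reproduce DIP's data or the counts above.

```python
from collections import defaultdict, deque

def connected_components(edges):
    """Return the connected components of an undirected graph given as edge pairs."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for start in graph:
        if start in seen:
            continue
        # Breadth-first search collects every protein reachable from start.
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            node = queue.popleft()
            component.append(node)
            for neighbor in graph[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return components

# Toy interaction list with hypothetical protein names
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("E", "F")]
components = connected_components(edges)
largest = max(components, key=len)
print(len(components), sorted(largest))  # 2 ['A', 'B', 'C', 'D']
```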
| FUTURE DIRECTIONS |
|---|
Although the DIP has grown to its current state through manual entry of interaction data, we plan to implement automatic literature search and text extraction methods, such as those described by Blaschke et al. (11), to identify candidate interactions for inclusion in the database, followed by manual expert review. In the near future, we hope to include tools to visualize and analyze the properties of interaction networks, and to add a measure of interaction quality based on the number and type of experiments defining each interaction. Finally, we hope to link the DIP interactively with sequence databases such as SWISS-PROT, GenBank and PIR, as well as with other databases containing interacting proteins, such as KEGG, CNSB, EcoCyc, FlyNet and YPD.
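The text-extraction step is only named here, so the sketch below does not reproduce the method of Blaschke et al. It substitutes a deliberately simple co-occurrence heuristic: flag any sentence that mentions two known protein names together with an interaction verb, and pass the hits to a curator for manual review. The verb list, the naive sentence splitter and the protein names (`ABC1`, `XYZ2`) are hypothetical.

```python
import itertools
import re

# Hypothetical, far-from-complete list of interaction verbs
INTERACTION_VERBS = re.compile(
    r"\b(bind|binds|bound|interacts?|associates?|complex(?:es)?)\b", re.IGNORECASE)

def candidate_pairs(text, protein_names):
    """Return (name1, name2, sentence) triples for sentences that mention two
    known proteins and an interaction verb; candidates still need expert review."""
    hits = []
    # Naive sentence split on '.', '!' or '?' followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        if not INTERACTION_VERBS.search(sentence):
            continue
        present = sorted(n for n in protein_names
                         if re.search(rf"\b{re.escape(n)}\b", sentence))
        for a, b in itertools.combinations(present, 2):
            hits.append((a, b, sentence))
    return hits

# Made-up example text
text = "ABC1 binds directly to XYZ2 in vitro. XYZ2 levels rise in late G1."
for a, b, s in candidate_pairs(text, {"ABC1", "XYZ2"}):
    print(a, b, "|", s)  # ABC1 XYZ2 | ABC1 binds directly to XYZ2 in vitro.
```

A heuristic of this kind favors recall over precision, which is why the flagged sentences would still go through expert review before entering the database.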
| DATA SUBMISSION |
|---|
As with SWISS-PROT, an example of a well-curated database, we would like expert curators to screen each entry to the DIP. For this reason, scientists are invited to visit and contribute to this database, which can be edited directly over the World Wide Web with a user account. To obtain an account, please contact us at dip@mbi.ucla.edu. Help for editing and submission is available online, and questions can also be directed to dip@mbi.ucla.edu. Please feel free to send email describing published protein-protein interactions, and a curator will enter this information in the DIP.
| ACKNOWLEDGEMENTS |
|---|
We thank Rob Grothe for discussions at the beginning of the project and DOE for support. I.X. is a fellow of the Swiss National Fund for Research (SNFR).
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +1 310 825 3754; Fax: +1 310 206 3914; Email: david@mbi.ucla.edu
| REFERENCES |
|---|
1 Hodges,P.E., McKee,A.H., Davis,B.P., Payne,W.E. and Garrels,J.I. (1999) Nucleic Acids Res., 27, 69-73. Updated article in this issue: Nucleic Acids Res. (2000), 28, 73-76.
2 Karp,P.D., Riley,M., Paley,S.M., Pellegrini-Toole,A. and Krummenacker,M. (1999) Nucleic Acids Res., 27, 55-58. Updated article in this issue: Nucleic Acids Res. (2000), 28, 56-59.
3 Sanchez,C., Lachaize,C., Janody,F., Bellon,B., Roder,L., Euzenat,J., Rechenmann,F. and Jacq,B. (1999) Nucleic Acids Res., 27, 89-94.
4 Ogata,H., Goto,S., Sato,K., Fujibuchi,W., Bono,H. and Kanehisa,M. (1999) Nucleic Acids Res., 27, 29-34. Updated article in this issue: Nucleic Acids Res. (2000), 28, 27-30.
5 Takai-Igarashi,T., Nadaoka,Y. and Kaminuma,T. (1998) J. Comput. Biol., 5, 747-754.
6 Marcotte,E.M., Pellegrini,M., Ng,H.L., Rice,D.W., Yeates,T.O. and Eisenberg,D. (1999) Science, 285, 751-753.
7 Bairoch,A. and Apweiler,R. (1999) Nucleic Acids Res., 27, 49-54. Updated article in this issue: Nucleic Acids Res. (2000), 28, 45-48.
8 Barker,W.C., Garavelli,J.S., McGarvey,P.B., Marzec,C.R., Orcutt,B.C., Srinivasarao,G.Y., Yeh,L.S., Ledley,R.S., Mewes,H.W., Pfeiffer,F., Tsugita,A. and Wu,C. (1999) Nucleic Acids Res., 27, 39-43. Updated article in this issue: Nucleic Acids Res. (2000), 28, 41-44.
9 Benton,D. (1990) Nucleic Acids Res., 18, 1517-1520.
10 Han,L., Wong,D., Dhaka,A., Afar,D., White,M., Xie,W., Herschman,H., Witte,O. and Colicelli,J. (1997) Proc. Natl Acad. Sci. USA, 94, 4954-4959.
11 Blaschke,C., Andrade,M.A., Ouzounis,C. and Valencia,A. (1999) Proc. Int. Conf. Intell. Syst. Mol. Biol. (ISMB 99), 60-67.