Nucleic Acids Research, 2004, Vol. 32, Database issue D449-D451
© 2004 Oxford University Press
The Database of Interacting Proteins: 2004 update
1 Howard Hughes Medical Institute, 2 UCLA-DOE Institute for Genomics and Proteomics, 3 Department of Chemistry and Biochemistry and 4 Department of Biological Chemistry, Molecular Biology Institute, Box 951570, UCLA, Los Angeles, CA 90095-1570, USA
*To whom correspondence should be addressed. Tel: +1 310 825 3754; Fax: +1 310 206 3914; Email: david{at}mbi.ucla.edu
Received September 18, 2003; Revised and Accepted October 6, 2003
| ABSTRACT |
|---|
The Database of Interacting Proteins (http://dip.doe-mbi.ucla.edu) aims to integrate the diverse body of experimental evidence on protein–protein interactions into a single, easily accessible online database. Because the reliability of experimental evidence varies widely, methods of quality assessment have been developed and utilized to identify the most reliable subset of the interactions. This CORE set can be used as a reference when evaluating the reliability of high-throughput protein–protein interaction data sets, for development of prediction methods, as well as in the studies of the properties of protein interaction networks.
| INTRODUCTION |
|---|
The Database of Interacting Proteins (DIP) was initially developed (1) to store and organize information on binary protein–protein interactions that was retrieved from individual research articles. Over the last 4 years, progress in genome-scale experimental methods has resulted in rapid identification of binary protein–protein interactions (2,3) and multi-protein complexes (4,5). On the one hand, this progress prompted enhancements to the database schema that allow information on molecular interactions to be captured in greater detail. On the other hand, questions about the reliability of experiments conducted on a genome-wide scale stimulated the development of data quality assessment methods (6).
| STRUCTURE OF THE DATABASE |
|---|
The DIP database is implemented as a relational database using the open source PostgreSQL database management system (http://www.postgresql.org). A simplified version of the current database schema is shown in Figure 1. The key tables (PROTEIN, SOURCE and EVIDENCE) store, respectively, information on individual proteins, sources of experimental information and information on individual experiments. The information on protein–protein interactions is stored in two tables, INTERACTION and INT_PRT. This arrangement of the tables enables the description not only of binary interactions (two entries in the INT_PRT table for each INTERACTION entry) but also of multi-protein complexes (more than two entries in INT_PRT for each INTERACTION entry). The METHOD table provides a list of controlled vocabulary terms, together with references to the corresponding PSI ontology entries (7), which are used to annotate the experiments.
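The relationship between INTERACTION and INT_PRT can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL, and every column name, the sample proteins and the helper `classify_interactions` are assumptions made for illustration; the actual DIP schema is richer than shown here.

```python
import sqlite3

# Hypothetical, heavily simplified tables; column names are assumptions,
# and SQLite stands in for the PostgreSQL system used by DIP.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE interaction (id INTEGER PRIMARY KEY);
CREATE TABLE int_prt (
    interaction_id INTEGER NOT NULL REFERENCES interaction(id),
    protein_id     INTEGER NOT NULL REFERENCES protein(id)
);
""")

conn.executemany("INSERT INTO protein VALUES (?, ?)",
                 [(1, "A"), (2, "B"), (3, "C"), (4, "D"), (5, "E")])
conn.executemany("INSERT INTO interaction VALUES (?)", [(1,), (2,)])
# Interaction 1 links two proteins (A, B); interaction 2 links three (C, D, E).
conn.executemany("INSERT INTO int_prt VALUES (?, ?)",
                 [(1, 1), (1, 2), (2, 3), (2, 4), (2, 5)])


def classify_interactions(connection):
    """Label each interaction by its number of INT_PRT entries."""
    rows = connection.execute("""
        SELECT interaction_id, COUNT(*)
        FROM int_prt
        GROUP BY interaction_id
        ORDER BY interaction_id
    """).fetchall()
    labels = {}
    for interaction_id, n_entries in rows:
        if n_entries == 2:
            labels[interaction_id] = "binary"
        elif n_entries > 2:
            labels[interaction_id] = "complex"
        else:
            labels[interaction_id] = "incomplete"
    return labels


kinds = classify_interactions(conn)
```

Keeping the participants in a separate link table means the same two tables can hold an interaction of any size without schema changes.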
When available, information on the details of the topology of a molecular complex that was inferred from each experiment is stored in the TOPOLOGY and LOCATION tables. The LOCATION table describes regions of proteins participating in interactions whereas the TOPOLOGY table pairs them into records that describe observed binary interactions. It also specifies the type of interaction inferred from each experiment as one of aggregate (both partners shown to be present in the same complex but not necessarily in direct contact), contact or covalent bond.
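As a rough illustration of how LOCATION and TOPOLOGY records relate, the following sketch models them as Python dataclasses. The field names, the residue-range representation and the method `implies_direct_contact` are hypothetical; only the three interaction types are taken from the text.

```python
from dataclasses import dataclass
from enum import Enum


class InteractionType(Enum):
    # The three types named in the text.
    AGGREGATE = "aggregate"          # same complex, not necessarily in contact
    CONTACT = "contact"
    COVALENT_BOND = "covalent bond"


@dataclass(frozen=True)
class Location:
    """A region of a protein that participates in an interaction (assumed fields)."""
    protein: str
    start: int
    end: int


@dataclass(frozen=True)
class Topology:
    """Pairs two regions into one observed binary interaction (assumed fields)."""
    region_a: Location
    region_b: Location
    kind: InteractionType

    def implies_direct_contact(self) -> bool:
        # An aggregate only shows co-membership in a complex.
        return self.kind is not InteractionType.AGGREGATE


record = Topology(
    Location("A", 10, 85),
    Location("B", 120, 210),
    InteractionType.CONTACT,
)
co_purified = Topology(
    Location("C", 1, 300),
    Location("D", 1, 150),
    InteractionType.AGGREGATE,
)
```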
| DATABASE GROWTH |
|---|
Since our previous NAR report was published (8), the number of distinct binary protein–protein interactions has nearly doubled and, as of September 2003, exceeds 18 500. Even more importantly, the number of research articles referenced in DIP has grown to more than 2500, providing a broad perspective on the experimental approaches used to determine protein–protein interactions. This makes DIP an ideal starting point when comparing and assessing the reliability of different experimental methodologies, including high-throughput interaction screens.
In addition to the information extracted from the research literature, the database has been recently enriched with information obtained by analyzing the structures of protein complexes deposited in the Protein Data Bank (9). As of September 2003, analysis of protein hetero-complexes in the PDB database resulted in the identification of 2000 structures describing protein–protein interactions at the atomic level. We are in the process of entering this information into the database.
| QUALITY ASSESSMENT |
|---|
The recent development of high-throughput technologies for the detection of protein–protein interactions, such as large-scale yeast two-hybrid screens (2,3), protein microarrays (10) and mass spectrometric analysis of affinity purified multi-protein complexes (4,5), has resulted in a rapid accumulation of protein–protein interaction data. However, small overlaps between the high-throughput data sets and, often, lack of agreement with small-scale experiments (11) gave rise to questions about the reliability of high-throughput approaches and about the compatibility of their results with those obtained using conventional methods. As a result, a number of attempts have been made to assess the quality of the high-throughput data (6,12,13). These studies demonstrated large differences in quality between data sets, some of which can contain many erroneously identified interactions (false positives) (11).
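The notion of overlap between two interaction data sets can be made concrete with a small sketch. It treats each binary interaction as an unordered pair and reports the shared count and the Jaccard index; the protein identifiers and the function names are invented for illustration, and this is not one of the assessment methods cited above.

```python
def normalize(pairs):
    """Treat (A, B) and (B, A) as the same interaction."""
    return {frozenset(pair) for pair in pairs}


def overlap(pairs_a, pairs_b):
    """Return the number of shared interactions and the Jaccard index."""
    a, b = normalize(pairs_a), normalize(pairs_b)
    shared = a & b
    union = a | b
    jaccard = len(shared) / len(union) if union else 0.0
    return len(shared), jaccard


# Made-up identifiers standing in for two independent screens.
screen_1 = [("P1", "P2"), ("P2", "P3"), ("P4", "P5")]
screen_2 = [("P2", "P1"), ("P6", "P7"), ("P3", "P8")]

shared_count, jaccard_index = overlap(screen_1, screen_2)
```

In this toy case only one of five distinct interactions is found by both screens, which is the kind of small overlap that motivates independent quality assessment.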
To evaluate the reliability of individual interactions reported in DIP, a number of tests are used to identify the most reliable core subset of the interactions. The tests range from a simple evaluation based on the reliability of individual experimental methods to the analysis of the patterns of interactions between analogous proteins using the paralogous verification method (PVM) (6).
In addition to being applied to the data already present in the DIP database, the evaluation methods are implemented as publicly available services (http://dip.doe-mbi.ucla.edu/dip/Services.cgi) that can be used to evaluate the reliability of new experimental and predicted interactions. These services include our previously described PVM and expression profile reliability (EPR) methods (6) as well as the Domain Pair Verification (DPV) method, which analyzes domain–domain interaction preferences as described by Deng et al. (14).
| DATA ACCESS AND EXCHANGE |
|---|
All the DIP data can be accessed online in both interactive and batch modes. The interactive, Web-based interface allows users to query the database for a specific protein based on its name, annotation or species of origin. In case the protein of interest is not yet present in the database, it is also possible to perform sequence similarity (BLAST) and motif searches in order to identify closely related proteins. The pattern of interaction of these might provide insights into the potential but not yet identified interactions of the query protein.
In the batch mode, different subsets of the DIP database can be downloaded in a variety of formats ranging from the native XML-based XIN format to simple, tab-delimited text files that are ready to be imported into spreadsheet applications. The DIP data are also provided in the Molecular Interaction Format (MIF) developed under the auspices of the Human Proteome Organization (HUPO) Proteomics Standards Initiative (7). MIF is a community-developed data standard that provides a database-independent platform for the exchange of information on protein–protein interactions. It is expected to be supported by the major providers of protein interaction data, including the DIP, BIND (15) and MINT (16) databases.
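A tab-delimited download of this kind can be processed with a few lines of code. The sketch below is only an assumption about what such a file might look like: the column names (`interaction_id`, `protein_a`, `protein_b`, `method`), the sample rows and the helper functions are hypothetical and do not reproduce the actual DIP file layout.

```python
import csv
import io

# Hypothetical sample; the real DIP export columns may differ.
SAMPLE = "\n".join([
    "\t".join(["interaction_id", "protein_a", "protein_b", "method"]),
    "\t".join(["1", "P1", "P2", "two hybrid"]),
    "\t".join(["2", "P2", "P3", "affinity purification"]),
    "\t".join(["3", "P4", "P5", "two hybrid"]),
])


def load_interactions(handle):
    """Read binary interactions from a tab-delimited file-like object."""
    reader = csv.DictReader(handle, delimiter="\t")
    return [(row["protein_a"], row["protein_b"]) for row in reader]


def partners(pairs, protein):
    """Collect every protein reported to interact with the given one."""
    found = set()
    for a, b in pairs:
        if a == protein:
            found.add(b)
        elif b == protein:
            found.add(a)
    return found


pairs = load_interactions(io.StringIO(SAMPLE))
p2_partners = partners(pairs, "P2")
```

For a downloaded file, `io.StringIO(SAMPLE)` would be replaced by an open file handle.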
| FUTURE DIRECTIONS |
|---|
Progress in the development of high-throughput interaction detection methods will soon result in a rapid accumulation of large amounts of protein interaction data. Organizing these data and assessing their reliability will pose significant challenges to database providers. We foresee further development of quality assessment measures, most likely based on integration of the experimental interaction data with other sources of information, such as expression and functional data. Integration of the data will also play a key role in analyzing the topology and dynamics of protein interaction networks. Ultimately, it should lead to the construction of comprehensive models of protein–protein interactions amenable to computational analysis and simulation (17).
| ACKNOWLEDGEMENTS |
|---|
We thank the NIH and DOE for support of DIP.
| REFERENCES |
|---|
1. Xenarios,I., Rice,D.W., Salwinski,L., Baron,M.K., Marcotte,E.M. and Eisenberg,D. (2000) DIP: the Database of Interacting Proteins. Nucleic Acids Res., 28, 289–291.
2. Ito,T., Chiba,T., Ozawa,R., Yoshida,M., Hattori,M. and Sakaki,Y. (2001) A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc. Natl Acad. Sci. USA, 98, 4569–4574.
3. Uetz,P., Giot,L., Cagney,G., Mansfield,T.A., Judson,R.S., Knight,J.R., Lockshon,D., Narayan,V., Srinivasan,M., Pochart,P. et al. (2000) A comprehensive analysis of protein–protein interactions in Saccharomyces cerevisiae. Nature, 403, 623–627.
4. Gavin,A.C., Bosche,M., Krause,R., Grandi,P., Marzioch,M., Bauer,A., Schultz,J., Rick,J.M., Michon,A.M., Cruciat,C.M. et al. (2002) Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature, 415, 141–147.
5. Ho,Y., Gruhler,A., Heilbut,A., Bader,G.D., Moore,L., Adams,S.L., Millar,A., Taylor,P., Bennett,K., Boutilier,K. et al. (2002) Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature, 415, 180–183.
6. Deane,C.M., Salwinski,L., Xenarios,I. and Eisenberg,D. (2002) Protein interactions: two methods for assessment of the reliability of high throughput observations. Mol. Cell. Proteomics, 1, 349–356.
7. Hermjakob,H., Montecchi-Palazzi,L., Bader,G., Wojcik,J., Salwinski,L., Moore,S., Orchard,S., Sarkans,U., von Mering,C., Roechert,B. et al. (2004) The HUPO PSI molecular interaction format. A community standard for the representation of protein interaction data. Nat. Biotechnol., in press.
8. Xenarios,I., Salwinski,L., Duan,X.Q.J., Higney,P., Kim,S.M. and Eisenberg,D. (2002) DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res., 30, 303–305.
9. Westbrook,J., Feng,Z., Jain,S., Bhat,T.N., Thanki,N., Ravichandran,V., Gilliland,G.L., Bluhm,W., Weissig,H., Greer,D.S. et al. (2002) The Protein Data Bank: unifying the archive. Nucleic Acids Res., 30, 245–248.
10. Zhu,H., Bilgin,M., Bangham,R., Hall,D., Casamayor,A., Bertone,P., Lan,N., Jansen,R., Bidlingmaier,S., Houfek,T. et al. (2001) Global analysis of protein activities using proteome chips. Science, 293, 2101–2105.
11. Salwinski,L. and Eisenberg,D. (2003) Computational methods of analysis of protein–protein interactions. Curr. Opin. Struct. Biol., 13, 377–382.
12. Mrowka,R., Patzak,A. and Herzel,H. (2001) Is there a bias in proteome research? Genome Res., 11, 1971–1973.
13. von Mering,C., Krause,R., Snel,B., Cornell,M., Oliver,S.G., Fields,S. and Bork,P. (2002) Comparative assessment of large-scale data sets of protein–protein interactions. Nature, 417, 399–403.
14. Deng,M., Mehta,S., Sun,F. and Chen,T. (2002) Inferring domain–domain interactions from protein–protein interactions. Genome Res., 12, 1540–1548.
15. Bader,G.D., Betel,D. and Hogue,C.W. (2003) BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res., 31, 248–250.
16. Zanzoni,A., Montecchi-Palazzi,L., Quondam,M., Ausiello,G., Helmer-Citterich,M. and Cesareni,G. (2002) MINT: a Molecular INTeraction database. FEBS Lett., 513, 135–140.
17. Duan,X.J., Xenarios,I. and Eisenberg,D. (2002) Describing biological protein interactions in terms of protein states and state transitions: the LiveDIP database. Mol. Cell. Proteomics, 1, 104–116.