Nucleic Acids Research Advance Access originally published online on November 6, 2008
Nucleic Acids Research 2009 37(Database issue):D767-D772; doi:10.1093/nar/gkn892
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Human Protein Reference Database—2009 update
1Institute of Bioinformatics, International Tech Park, Bangalore 560 066, 2Department of Biotechnology, Kuvempu University, Shankaraghatta, Karnataka, India, 3McKusick-Nathans Institute of Genetic Medicine, 4Department of Biological Chemistry and 5Department of Pathology and Oncology, Johns Hopkins University, Baltimore, MD 21205, USA
*To whom correspondence should be addressed. Tel: +410 502 6662; Fax: +410 502 7544; Email: pandey@jhmi.edu
Correspondence may also be addressed to T. S. Keshava Prasad. Tel: (+91) 80-28416140; Fax: (+91) 80-28416132; Email: keshav@ibioinformatics.org
Received September 16, 2008. Revised October 20, 2008. Accepted October 22, 2008.
ABSTRACT
Human Protein Reference Database (HPRD; http://www.hprd.org/), initially described in 2003, is a database of curated proteomic information pertaining to human proteins. We have recently added a number of new features to HPRD. These include PhosphoMotif Finder, which allows users to search proteins of interest for over 320 experimentally verified phosphorylation motifs. Another new feature is a distributed protein annotation system, Human Proteinpedia (http://www.humanproteinpedia.org/), through which laboratories can submit their data, which are then mapped onto protein entries in HPRD. Over 75 laboratories involved in proteomics research have already participated in this effort by submitting data for over 15 000 human proteins. The submitted data include mass spectrometry and protein microarray-derived data, among other data types. Finally, HPRD is also linked to NetPath (http://www.netpath.org/), a compendium of human signaling pathways developed by our group, which currently contains annotations for several cancer and immune signaling pathways. Since the last update, more than 5500 new protein sequences have been added, making HPRD a comprehensive resource for studying the human proteome.
INTRODUCTION
Human Protein Reference Database (HPRD; http://www.hprd.org/) is a resource for experimentally derived information about the human proteome, including protein–protein interactions, post-translational modifications (PTMs) and tissue expression (1–4). The contents of several proteomic databases pertaining to human proteins, including HPRD, have recently been evaluated in terms of the number of nonredundant protein–protein interactions, the number of direct interactions per protein, the number of proteins with disease annotation and the number of linked citations (5). The curation and annotation process in HPRD involves entry of protein data through BioBuilder, a tool developed by our group for editing and managing data through a web browser (6). We have incorporated new features, such as PhosphoMotif Finder, links to a signaling pathway resource called NetPath, Human Proteinpedia for enhanced community participation and the use of BLAST for querying mRNA/protein data. Since the last update, we have added approximately 5500 new protein sequences and corresponding information to HPRD, which now contains information on most human proteins, including their isoforms.
PhosphoMotif Finder searches experimentally derived phosphorylation-based substrate and binding motifs
PhosphoMotif Finder contains experimentally characterized phosphorylation-based substrate and binding motifs derived from the literature (7) and has been integrated with HPRD. It searches a user-submitted protein sequence for the presence of any of the 320 phosphorylation-based motifs listed in the compendium. Figure 1 shows the presence of 30 known tyrosine kinase phosphorylation sites in microtubule-associated serine/threonine kinase-like protein (MASTL), which is implicated in thrombocytopenia, a blood disorder. In addition to the mapped motifs, PhosphoMotif Finder indicates potential enzymes (i.e. kinases or phosphatases) associated with these phosphorylation motifs. PhosphoMotif Finder should also be helpful in ascertaining the novelty of any motif that is described in the literature. Finally, it can be used in designing phosphorylation motif-specific antibodies and antibody-based arrays.
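The kind of search described here, scanning a submitted sequence for every occurrence of each motif in a compendium, can be illustrated with a minimal Python sketch. This is not the PhosphoMotif Finder implementation: the motif names and patterns below are placeholders assumed for illustration and are not entries from the curated compendium, and the function name `find_motifs` is hypothetical.

```python
import re

# Illustrative motif table. The names and patterns are placeholders assumed
# for this sketch; they are NOT copied from the PhosphoMotif Finder compendium.
MOTIFS = {
    "example_basophilic_motif": r"R.[ST]",
    "example_proline_directed_motif": r"[ST]P",
}


def find_motifs(sequence, motifs=MOTIFS):
    """Return (motif_name, 1-based start, matched text) for every hit.

    Overlapping occurrences are reported as separate hits.
    """
    hits = []
    for name, pattern in motifs.items():
        # Wrapping the pattern in a lookahead lets the scan advance one
        # residue at a time, so overlapping matches are not skipped.
        for match in re.finditer(f"(?=({pattern}))", sequence):
            hits.append((name, match.start() + 1, match.group(1)))
    # Order hits by position, then by motif name, for stable output.
    return sorted(hits, key=lambda hit: (hit[1], hit[0]))


for name, start, text in find_motifs("MRRSPTPK"):
    print(name, start, text)
```

A real tool would additionally carry, for each motif, the associated kinase or phosphatase and the supporting literature reference; those fields are omitted here to keep the sketch short.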
NetPath pathway resource
We have incorporated a compendium of human signaling pathways called NetPath (http://www.netpath.org/) through the Pathways tab in HPRD. NetPath contains information about protein interactions, catalytic reactions and protein translocation events that occur downstream of ligand–receptor interactions. Currently, the roles of 2732 and 1793 proteins are annotated in the context of cancer and immune signaling pathways, respectively. We have also cataloged genes that are upregulated or downregulated at the transcriptional level under the influence of these signaling pathways. Pathway data can be downloaded in standard international data exchange formats including BioPAX Level 2.0, PSI-MI version 2.5 and SBML version 2.1. The lists of transcriptionally upregulated and downregulated genes can be obtained as Excel sheets and tab-delimited text documents. Integration of NetPath data in HPRD will assist users in visualizing the probable role of proteins in diverse signaling networks. For example, Janus Kinase 2 (JAK2) is involved in diverse pathways including the EGFR1, Kit receptor, Notch, IL-2, IL-3, IL-4, IL-5 and IL-6 signaling pathways. NetPath provides the list of physical interactions and catalysis events of JAK2 with various proteins under different signaling pathways. Each interaction or catalysis event is linked to the PubMed abstract of the original article (Figure 2).
Annotation of proteomic information
Protein isoforms
We have included most of the human protein isoforms present in the RefSeq Database (8). Currently, 25 661 protein sequences encoded by 19 433 genes have been annotated in HPRD. Phosphodiesterase 9A, cAMP response element modulator, collagen type XIII alpha1 and dystrophin are examples of proteins with the highest numbers of isoforms, with 20, 20, 19 and 18 isoforms, respectively. However, only data pertaining to sequence, subcellular localization, mRNA/protein expression, biological motifs and domains are currently annotated as isoform specific, whereas protein–protein interactions and enzyme–substrate relationships are annotated as common to all isoforms. This is mainly due to the general lack of isoform-level experimental data for the latter.
Protein–protein interactions
Protein–protein interactions are among the most requested components of HPRD by users who download its data. We have added more than 5000 protein–protein interactions to HPRD since the previous update in 2006. Among the 38 167 protein–protein interactions documented in HPRD, 8958 interactions were based on yeast two-hybrid analysis alone, whereas 8827 interactions were based on in vitro and 7163 on in vivo methods. A further 2410 protein–protein interactions were confirmed by all three methods. Overall, in HPRD, 8710 proteins are annotated with at least one protein–protein interaction, whereas 2015 and 774 proteins have more than 5 or 10 protein–protein interactions, respectively. The 14-3-3 gamma protein has the largest number, with 173 protein–protein interactions. A total of 15 231 protein–protein interactions (Table 1) have been submitted to HPRD by the scientific community using Human Proteinpedia (9,10). Enzyme–substrate relationships determined through peptide/protein arrays are a new data type included in HPRD, as represented by the phosphorylation of Tyr 16 of RNA binding motif protein 10 by c-Src.
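Per-protein statistics of the kind quoted above (proteins with at least one interaction, or with more than 5 or 10 interactions) can be derived from a list of binary interactions. The following Python sketch shows one way to do so; it assumes that interactions are available as unordered pairs of protein identifiers, and the function names and toy identifiers are hypothetical rather than part of any HPRD download format.

```python
from collections import defaultdict


def partner_counts(pairs):
    """Map each protein to its number of distinct interaction partners.

    Pairs are treated as unordered, so ("A", "B") and ("B", "A") count once.
    """
    partners = defaultdict(set)
    for a, b in pairs:
        partners[a].add(b)
        partners[b].add(a)
    return {protein: len(found) for protein, found in partners.items()}


def proteins_with_more_than(counts, threshold):
    """Return the sorted proteins that have more than `threshold` partners."""
    return sorted(p for p, n in counts.items() if n > threshold)


# Toy input with made-up identifiers, including one duplicated pair.
example_pairs = [("A", "B"), ("B", "A"), ("A", "C"), ("A", "D")]
counts = partner_counts(example_pairs)
print(counts)
print(proteins_with_more_than(counts, 2))
```

Using a set of partners per protein removes duplicate reports of the same pair, which matters when the same interaction is supported by several experiments.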
PTMs and subcellular localization
HPRD currently contains information for 16 972 PTMs (Table 2), which belong to various categories, with phosphorylation (10 858), dephosphorylation (3118) and glycosylation (1860) forming the majority of the annotated PTMs. At least one responsible enzyme has been annotated for 8960 PTMs, which resulted in the documentation of 7253 enzyme–substrate relationships. Of these, 1277 PTMs have more than one enzyme annotated. Human Proteinpedia has contributed over 17 400 PTMs, which are mainly derived from mass spectrometry studies. One or more sites of subcellular localization have been annotated for 8620 proteins in HPRD, with 586 of them being isoform specific. In addition, scientific investigators have contributed 2906 entries pertaining to subcellular localization through Human Proteinpedia.
Community participation through Human Proteinpedia
We have developed a distributed annotation system called Human Proteinpedia and incorporated it into HPRD (9,10). Using Human Proteinpedia, proteomic investigators can directly contribute to HPRD protein data derived from diverse platforms, including yeast two-hybrid, mass spectrometry, peptide/protein array, immunohistochemistry, Western blot, coimmunoprecipitation and fluorescence microscopy. The protein features that can be mapped to corresponding entries in HPRD include PTMs, mRNA/protein expression in tissues or cell lines, subcellular localization, enzyme–substrate relationships and protein–protein interactions. These annotations are made available for viewing in a separate box beneath the HPRD annotation (Figure 3). Each entry is also linked to experimental evidence, such as mass spectra, images of Western blots and fluorescence micrographs. Figure 3 shows five serine phosphorylation sites for the Adducin 1 protein in HPRD, submitted through Human Proteinpedia. PTM sites are linked to the meta-annotation of mass spectrometry data in the Human Proteinpedia database as submitted by the investigator. The corresponding MS/MS spectrum can also be viewed by following a link in the meta-annotation page.
Investigators worldwide have already submitted 15 231 protein–protein interactions, 17 410 PTMs and 150 368 mRNA/protein expression annotations to HPRD through Human Proteinpedia. Human Proteinpedia has increased the quantity of HPRD data 2-fold in a relatively short span of time (Table 1). By involving investigators and experimentalists in the annotation of proteomic data, Human Proteinpedia has transformed HPRD into a true community database.
Usage of HPRD data by the community
Over the years, the biomedical community has provided valuable suggestions by interacting with the HPRD team through the Comments and Help buttons provided on HPRD pages. More than 8000 gene comments, expert suggestions and help requests have been received, and nearly 100 scientists have been designated as Molecule Authorities based on their expertise. We hope to further increase community participation by implementing a microattribution system, which provides citable credit to investigators. Web resources that display or have made use of HPRD data include Entrez-Gene, VisANT (11), Genes2Networks (12), Cerebral (13), BioNetBuilder (14), COXPRESdb (15), STRING 7 (16) and UniHI (17). The Molecular Signature Database (MSigDB) (18), used for Gene Set Enrichment Analysis of gene expression data, incorporates pathway gene sets curated from HPRD. Sequence analysis tools that use HPRD data include CompariMotif (19) and SLiMFinder (20). CutDB, a database of proteolytic events (21), PepBank, a database of peptides (22), and T1DBase, a database for type 1 diabetes research (23), are other resources that incorporate curated proteomic data from HPRD.
CONCLUSIONS
With the inclusion of most human protein sequences, HPRD has grown into an integrated knowledgebase for genomic and proteomic investigators. Incorporation of PhosphoMotif Finder and signaling pathways will help users to generate novel hypotheses or to identify molecules likely to be involved in a biological process of interest. Further, the implementation of Human Proteinpedia has transformed HPRD into a community-driven database, and we hope that this trend will continue so that each and every entry is directly or indirectly verified by individual experimentalists.
FUNDING
Funding for open access charge: Institute of Bioinformatics.
Conflict of interest statement. None declared.
ACKNOWLEDGEMENTS
We thank all investigators and Molecule Authorities who have provided valuable feedback about individual entries in this database.
REFERENCES
- Gandhi TK, Zhong J, Mathivanan S, Karthick L, Chandrika KN, Mohan SS, Sharma S, Pinkert S, Nagaraju S, Periaswamy B, et al. Analysis of the human protein interactome and comparison with yeast, worm and fly interaction datasets. Nat. Genet. (2006) 38:285–293.
- Mishra GR, Suresh M, Kumaran K, Kannabiran N, Suresh S, Bala P, Shivakumar K, Anuradha N, Reddy R, Raghavan TM, et al. Human protein reference database–2006 update. Nucleic Acids Res. (2006) 34:D411–D414.
- Peri S, Navarro JD, Amanchy R, Kristiansen TZ, Jonnalagadda CK, Surendranath V, Niranjan V, Muthusamy B, Gandhi TK, Gronborg M, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans. Genome Res. (2003) 13:2363–2371.
- Peri S, Navarro JD, Kristiansen TZ, Amanchy R, Surendranath V, Muthusamy B, Gandhi TK, Chandrika KN, Deshpande N, Suresh S, et al. Human protein reference database as a discovery resource for proteomics. Nucleic Acids Res. (2004) 32:D497–D501.
- Mathivanan S, Periaswamy B, Gandhi TK, Kandasamy K, Suresh S, Mohmood R, Ramachandra YL, Pandey A. An evaluation of human protein-protein interaction data in the public domain. BMC Bioinformatics (2006) 7(Suppl. 5):S19.
- Navarro JD, Talreja N, Peri S, Vrushabendra BM, Rashmi BP, Padma N, Surendranath V, Jonnalagadda CK, Kousthub PS, Deshpande N, Shanker K, et al. BioBuilder as a database development and functional annotation platform for proteins. BMC Bioinformatics (2004) 20:5–43.
- Amanchy R, Periaswamy B, Mathivanan S, Reddy R, Tattikota SG, Pandey A. A curated compendium of phosphorylation motifs. Nat. Biotechnol. (2007) 25:285–286.
- Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, Dicuccio M, Edgar R, Federhen S, et al. Database resources of the national center for biotechnology information. Nucleic Acids Res. (2008) 36:D13–D21.
- Kandasamy K, Keerthikumar S, Goel R, Mathivanan S, Patankar N, Shafreen B, Renuse S, Pawar H, Ramachandra YL, Acharya PK, et al. Human Proteinpedia: a unified discovery resource for proteomics research. Nucleic Acids Res. (2008) (in press).
- Mathivanan S, Ahmed M, Ahn NG, Alexandre H, Amanchy R, Andrews PC, Bader JS, Balgley BM, Bantscheff M, Bennett KL, et al. Human Proteinpedia enables sharing of human protein data. Nat. Biotechnol. (2008) 26:164–167.
- Hu Z, Snitkin ES, DeLisi C. VisANT: an integrative framework for networks in systems biology. Brief Bioinform. (2008) 9:317–325.
- Berger SI, Posner JM, Maayan A. Genes2Networks: connecting lists of gene symbols using mammalian protein interactions databases. BMC Bioinformatics (2007) 8:372.
- Barsky A, Gardy JL, Hancock RE, Munzner T. Cerebral: a Cytoscape plugin for layout of and interaction with biological networks using subcellular localization annotation. Bioinformatics (2007) 23:1040–1042.
- Avila-Campillo I, Drew K, Lin J, Reiss DJ, Bonneau R. BioNetBuilder: automatic integration of biological networks. Bioinformatics (2007) 23:392–393.
- Obayashi T, Hayashi S, Shibaoka M, Saeki M, Ohta H, Kinoshita K. COXPRESdb: a database of coexpressed gene networks in mammals. Nucleic Acids Res. (2008) 36:D77–D82.
- von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Kruger B, Snel B, Bork P. STRING 7–recent developments in the integration and prediction of protein interactions. Nucleic Acids Res. (2007) 35:D358–D362.
- Chaurasia G, Iqbal Y, Hanig C, Herzel H, Wanker EE, Futschik ME. UniHI: an entry gate to the human protein interactome. Nucleic Acids Res. (2007) 35:D590–D594.
- Subramanian A, Tamayo P, Mootha VK, Mukherjee S, Ebert BL, Gillette MA, Paulovich A, Pomeroy SL, Golub TR, Lander ES, et al. Gene set enrichment analysis: a knowledge-based approach for interpreting genome-wide expression profiles. Proc. Natl Acad. Sci. USA (2005) 102:15545–15550.
- Edwards RJ, Davey NE, Shields DC. CompariMotif: quick and easy comparisons of sequence motifs. Bioinformatics (2008) 24:1307–1309.
- Edwards RJ, Davey NE, Shields DC. SLiMFinder: a probabilistic method for identifying over-represented, convergently evolved, short linear motifs in proteins. PLoS ONE (2007) 2:e967.
- Igarashi Y, Eroshkin A, Gramatikova S, Gramatikoff K, Zhang Y, Smith JW, Osterman AL, Godzik A. CutDB: a proteolytic event database. Nucleic Acids Res. (2007) 35:D546–D549.
- Shtatland T, Guettler D, Kossodo M, Pivovarov M, Weissleder R. PepBank–a database of peptides based on sequence text mining and public peptide data sources. BMC Bioinformatics (2007) 8:280.
- Hulbert EM, Smink LJ, Adlem EC, Allen JE, Burdick DB, Burren OS, Cassen VM, Cavnor CC, Dolman GE, Flamez D, et al. T1DBase: integration and presentation of complex data for type 1 diabetes research. Nucleic Acids Res. (2007) 35:D742–D746.