IMP 2.0: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks

Wong, Aaron K.; Krishnan, Arjun; Yao, Victoria; Tadych, Alicja; Troyanskaya, Olga G.

doi:10.1093/nar/gkv486

Abstract

IMP (Integrative Multi-species Prediction), originally released in 2012, is an interactive web server that enables molecular biologists to interpret experimental results and to generate hypotheses in the context of a large cross-organism compendium of functional predictions and networks. The system provides biologists with a framework to analyze their candidate gene sets in the context of functional networks, expanding or refining their sets using functional relationships predicted from integrated high-throughput data. IMP 2.0 integrates updated prior knowledge and data collections from the last three years in the seven supported organisms (Homo sapiens, Mus musculus, Rattus norvegicus, Drosophila melanogaster, Danio rerio, Caenorhabditis elegans, and Saccharomyces cerevisiae) and extends function prediction coverage to include human disease. IMP identifies homologs with conserved functional roles for disease knowledge transfer, allowing biologists to analyze disease contexts and predictions across all organisms. Additionally, IMP 2.0 implements a new flexible platform for experts to generate custom hypotheses about biological processes or diseases, making sophisticated data-driven methods easily accessible to researchers. IMP does not require any registration or installation and is freely available for use at http://imp.princeton.edu.

INTRODUCTION

Biologists using modern experimental methods are generating massive amounts of genome-scale data. However, there continues to be a substantial gap between the avalanche of genomic data, which are abundant but not reliable, and our limited biological knowledge, which can only be discovered through careful, small-scale techniques. This disparity has been exacerbated by the development and popularity of next-generation technologies, such as RNA-seq, which enable genome-wide measurements at unprecedented resolution and cost (1). A paucity of biological knowledge (i.e. experimentally validated gene function) limits the coverage and accuracy of computational methods that require prior knowledge to learn novel biology, even when large-scale genomic data are available (2). Thus, these methods are limited to performing well on processes and pathways that are already well characterized in an organism. IMP (Integrative Multi-species Prediction) was originally developed to address the growing need to interpret and analyze results from genome-wide studies and generate hypotheses for experimental follow-up in the context of integrated functional gene networks, even when prior knowledge is limited in an organism or for a specific biological context (3).

IMP is an exploratory tool that provides a high-quality, interactive interface for functional prediction and interrogation. Researchers can incorporate IMP into their analysis workflow in several ways. For example, biologists can overlay their genes from a high-throughput experiment onto IMP's functional gene networks, expanding or contracting the network and identifying enriched, unifying functional themes. Alternatively, researchers can generate specific functional hypotheses by querying IMP's collection of gene-pathway predictions, identifying candidate genes for a biological context of interest. In all of these analyses, IMP systematically applies a previously developed network-based method that identifies functionally similar homologs to transfer annotations (i.e. gene-pathway membership) between organisms. This more specific homology detection method extends beyond simple annotation transfer by sequence similarity and enables accurate gene pathway predictions, even for processes that have few or no experimental annotations in an organism (2).

There are several successful web servers that allow researchers to analyze their gene sets in the context of gene networks (4–6); however, they are either constrained by the availability of prior knowledge in an organism and biological process of interest or limited to sequence-based transfers of input data (7,8). IMP is unique in its systematic incorporation of a functional genomics-based homology transfer of prior knowledge (9) in all of its analyses, improving the accuracy and coverage of functional interrogation (2).

IMP has been continuously maintained and developed since the original publication and here we describe major updates to the server. We have extensively updated the gene networks and functional predictions across all seven organisms, adding publicly available gene expression experiments from the subsequent years, and updating the already included data sources. Additionally, we extend IMP's functional coverage to include human diseases, allowing biologists to analyze disease contexts and predictions in human and across model organisms. Human disease gene knowledge is transferred to other organisms and predictions are made using each organism's gene network. By exploring disease gene predictions across the model organisms, biologists can find candidate genes to serve as targets for follow-up in human and in potential animal models for their disease of interest.

Additionally, we have created a flexible tool that furthers the original goal of the web server: to enable biologists to analyze their experimental results in the context of massive-scale integrated data compendia. We developed a prediction platform that allows biologists to bring their larger experimental result (for example, a list of hundreds of genes identified as over-expressed in a microarray experiment) and run a sophisticated machine-learning method for classification. This tool can be used to answer many pertinent questions, for example, identifying additional candidate disease genes from a microarray experiment, or additional players in a biological process of interest. Such an analysis might otherwise be infeasible due to biologists’ limited computational resources or expertise. The software is maintained and executed on IMP's servers and only requires a list of genes from the user. Genome-wide results are available by email, if provided, or directly on the web server.

IMP SYSTEM DATA UPDATES

IMP is a flexible tool that can be used to answer diverse biological questions posed in the form of a biological context (a process or a disease), a single gene, or a set of genes of interest. These questions can be broad and exploratory, for example, determining the shared pathways among a set of genes that are co-expressed in an mRNA experiment. Alternatively, researchers can generate specific experimentally testable hypotheses, such as identifying functional partners of a gene of interest or possible pathways that the gene participates in. As an exploratory tool, IMP provides interactive visualizations of gene-gene functional relationships, gene-process predictions and cross-organism network alignments. IMP is both a collection of gene-pathway predictions that users can query for specific targeted results and a suite of user-driven tools that can be customized for broad discovery.
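As a rough illustration of what a gene-set query against a functional network involves, the Python sketch below ranks genes outside a query set by their summed edge weight to the query genes. It is only a toy: the gene names `G1` to `G5`, the edge weights, and the function `expand_gene_set` are hypothetical, and IMP's own expansion procedure may score candidates differently.

```python
from collections import defaultdict

# Hypothetical toy network: (gene_a, gene_b) -> functional relationship confidence.
EDGES = {
    ("G1", "G2"): 0.9,
    ("G1", "G3"): 0.8,
    ("G2", "G3"): 0.7,
    ("G3", "G4"): 0.6,
    ("G2", "G5"): 0.1,
    ("G4", "G5"): 0.5,
}


def expand_gene_set(edges, query, top_k=2):
    """Rank genes outside the query by their summed edge weight to query genes."""
    query = set(query)
    scores = defaultdict(float)
    for (a, b), weight in edges.items():
        if a in query and b not in query:
            scores[b] += weight
        elif b in query and a not in query:
            scores[a] += weight
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_k]


candidates = expand_gene_set(EDGES, ["G1", "G2"])
for gene, score in candidates:
    print(gene, round(score, 2))
```

Here `G3` ranks first because it is strongly connected to both query genes, while `G4` is never proposed because it has no direct edge to the query set.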

All of IMP's diverse analyses leverage an organism's functional gene network, which integrates thousands of genome-wide experiments from an array of public data sources (10–13) and describes whether genes participate in similar biological processes. These networks are constructed using previously described methods (2,6,14) and have been extensively updated in the years since IMP was originally released. We use a new expert-curated set of Gene Ontology (GO) terms (15) to define the gold standard for learning gene–gene functional relationships and have updated the standard to include the latest experimental annotations. IMP networks now integrate 3540 data sets, a 49% increase over the number in IMP's original release (3), and include over 70 000 experimental conditions. In addition to adding gene expression experiments from the last three years, IMP networks have been updated with the most recent releases of popular functional genomic databases. For example, BioGRID (10) has been updated to include 196 909 additional protein–protein interactions, an increase of 78% from the original networks. A complete list of data sources is available directly on the web server.

DISEASE PREDICTIONS

Biologists can query IMP with a gene set or biological context of interest to retrieve putative gene-pathway assignments. We have extended IMP's biological contexts to include human diseases, in addition to GO biological processes. Biologists can now analyze disease contexts and predictions across organisms. IMP applies to diseases the same machine-learning method that it uses to predict genes for biological processes (2,3): a Support Vector Machine (SVM) that takes a functional network as input to classify genes (Figure 1). We showed previously that this method is accurate and competitive among state-of-the-art methods in predicting genes to biological processes (2,3). Disease gene predictions are inferred directly in human, from disease genes curated by Online Mendelian Inheritance in Man (OMIM) (16) and using the human functional network, and in the six model organisms. The disease predictions inferred in the other organisms leverage biological knowledge from human by transferring OMIM knowledge using our previously described method to identify the appropriate homologs for gene annotation transfer (2,9). These human-transferred gene-disease annotations are then used as training data for prediction with each organism's functional network, and the subsequent gene predictions are specific to that organism. By applying a model organism's functional network to predict disease genes, IMP can help biologists address an important challenge in the study of human disease: identifying the best model system for a given disease and the appropriate orthologs for a disease of interest.

Figure 1.

A schematic for IMP disease knowledge transfer and prediction. (A) IMP constructs a functional network for each of seven organisms by integrating heterogeneous genomic data. (B) Disease-gene annotations from human are transferred to the functionally similar homologs in other organisms. (C) Additional disease genes are predicted using the human-transferred disease genes in the organism-specific functional networks.
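The transfer step in panel (B) can be pictured as a lookup through a homolog mapping. The Python sketch below is illustrative only: the disease name, the gene identifiers and the `functional_homologs` table are hypothetical placeholders, and the mapping itself would come from the network-based homolog method of refs (2,9), which is not reproduced here.

```python
def transfer_annotations(human_disease_genes, functional_homologs):
    """Map each disease's human genes onto their homologs in a model organism.

    human_disease_genes: disease name -> set of human gene identifiers
    functional_homologs: human gene -> set of model-organism gene identifiers
                         (assumed to be precomputed; human genes without a
                         functionally similar homolog are simply absent)
    """
    transferred = {}
    for disease, human_genes in human_disease_genes.items():
        targets = set()
        for gene in human_genes:
            targets |= functional_homologs.get(gene, set())
        transferred[disease] = targets
    return transferred


# Hypothetical inputs for illustration.
human_disease_genes = {"disease_X": {"HUMAN_A", "HUMAN_B", "HUMAN_C"}}
functional_homologs = {
    "HUMAN_A": {"mouse_a"},
    "HUMAN_B": {"mouse_b1", "mouse_b2"},
    # HUMAN_C has no functionally similar homolog in this toy table.
}

mouse_training_genes = transfer_annotations(human_disease_genes, functional_homologs)
print(sorted(mouse_training_genes["disease_X"]))
```

The transferred set would then serve as the positive training examples for step (C), where the model organism's own functional network is used to predict additional genes.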

Using IMP, users search by Disease Ontology (DO) (17) term or by gene to retrieve gene-disease predictions. OMIM disease genes are mapped to DO, using the mapping provided by DO, to leverage the unified naming and hierarchical structure of the ontology. Figure 2 shows queries for hypertrophic cardiomyopathy (HCM) in both human (Figure 2A) and mouse (Figure 2B). Many of the top genes in the human query are known to be involved in the disease (highlighted rows), and the others are potential disease candidates. For example, the second novel gene prediction is TRIM63, which encodes an E3 ubiquitin ligase and plays a role in the atrophy of skeletal and cardiac muscle (18,19). The gene has recently been suggested (independent of IMP) as a candidate for HCM with several mutations observed in patients with the disease (20).

Figure 2.

Disease result pages for ‘hypertrophic cardiomyopathy’ in IMP. (A) Querying ‘hypertrophic cardiomyopathy’ in human returns a list of genes predicted to be involved in the disease, sorted by probability. IMP applies known hypertrophic cardiomyopathy genes in human (from OMIM) to predict additional genes from the human functional network. (B) The same disease query can be performed in mouse, returning predicted mouse genes. These predictions were learned using human disease genes transferred to mouse with the mouse functional network.

Figure 2B shows the same query for HCM in mouse. These gene predictions, which leverage human disease knowledge transferred to mouse, are potentially informative as a mouse model for the disease. In fact, the most confidently predicted gene, Csrp3, was a target in the first model for dilated cardiomyopathy with hypertrophy in a genetically manipulable organism. Csrp3-deficient mice reproduce the same morphologic and clinical features of the disease as in human (21). The Csrp3 mouse model serves as a valuable resource for understanding the pathophysiology of heart failure and for identifying potential therapies for the disease (22,23). Thus, in these example use cases, IMP independently, and in a data-driven predictive fashion, identifies a candidate human gene for HCM and a mouse gene that is already a model for understanding human HCM.

PLATFORM FOR CUSTOM PREDICTIONS

Many biological questions cannot be posed as a predefined gene set, such as a GO biological process or OMIM disease, or expressed as a small gene set (i.e. <50 genes), requiring more advanced and flexible data-mining techniques. For example, a researcher with results from a genetic screen may be interested in identifying additional candidate genes. Alternatively, a biologist may want to combine her private experimental result with public gene pathway annotations to make customized predictions. Most biologists lack the computational resources or expertise to implement and support the necessary machine learning software and data compendia for such an analysis. With IMP 2.0, we provide a flexible platform for researchers to run state-of-the-art machine learning methods and pose customized, sophisticated biological questions.

Users provide a gold standard, in the form of a set of relevant genes, or use IMP-provided gene sets, which include GO biological process and DO terms. IMP uses the same previously described and validated method for predicting GO function (2,3), which applies an SVM with features constructed from the functional gene network of the organism of interest. The SVM classifies every gene in the genome based on that gene's pattern of functional relationships with the provided genes of interest, up-weighting the parts of the network that are informative for membership in the gene set. This method has been previously shown to be accurate in predicting genes to biological processes and phenotypes, with corresponding estimates of prediction performance (2,24).
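To make the idea of network-derived features concrete, here is a minimal Python sketch under stated assumptions. Each gene's feature vector is taken to be its row of edge weights in the functional network (an assumption; the exact feature construction is described in refs (2,3)), and a linear SVM is trained by plain full-batch subgradient descent on the regularised hinge loss, standing in for whatever SVM solver IMP actually uses. The gene names, edge weights and hyperparameters are made up for illustration, and the output is a raw decision score rather than a probability.

```python
def network_features(genes, edges):
    """One feature vector per gene: its edge weights to every gene in the network."""
    index = {gene: i for i, gene in enumerate(genes)}
    rows = {gene: [0.0] * len(genes) for gene in genes}
    for (a, b), weight in edges.items():
        rows[a][index[b]] = weight
        rows[b][index[a]] = weight
    return rows


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def train_linear_svm(X, y, lam=0.01, eta=0.1, steps=500):
    """Full-batch subgradient descent on lam/2*|w|^2 + mean hinge loss (labels +1/-1)."""
    n, dim = len(X), len(X[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(steps):
        grad_w, grad_b = [lam * wj for wj in w], 0.0
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:  # example lies inside the margin
                grad_w = [g - yi * xj / n for g, xj in zip(grad_w, xi)]
                grad_b -= yi / n
        w = [wj - eta * g for wj, g in zip(w, grad_w)]
        b -= eta * grad_b
    return w, b


# Hypothetical network: two tightly connected groups with weak cross-links.
group_p, group_n = ["P1", "P2", "P3"], ["N1", "N2", "N3"]
genes = group_p + group_n
edges = {}
for i, a in enumerate(genes):
    for b_gene in genes[i + 1:]:
        same_group = (a in group_p) == (b_gene in group_p)
        edges[(a, b_gene)] = 0.9 if same_group else 0.1

features = network_features(genes, edges)
training = [("P1", 1), ("P2", 1), ("N1", -1), ("N2", -1)]  # P3 and N3 are unlabeled
w, b = train_linear_svm([features[g] for g, _ in training],
                        [label for _, label in training])
scores = {g: dot(w, features[g]) + b for g in genes}
print(sorted(genes, key=lambda g: -scores[g]))
```

In this toy network the unlabeled gene `P3` receives a positive score because its connectivity pattern resembles that of the positive examples, while `N3` scores negatively; this is the sense in which the classifier weights the informative parts of the network.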

Figure 3 shows the schematic for running a custom IMP function prediction. A user starts an analysis by specifying an organism and her genes of interest, either manually, from a user-saved gene set, or from a gene set pre-defined by IMP. Pre-defined gene sets can be from GO or DO, and can include annotations transferred from other organisms by selecting the corresponding checkbox. Figure 3A shows the input for an analysis of five user-provided breast cancer genes. These genes are treated as positive examples for classification, and IMP selects random genes as negative examples. The researcher runs the analysis on IMP's servers using the human functional gene network (Figure 3B). Each gene in the genome is assigned a probability based on its five-fold cross-validated SVM result, and results are sent by email, if provided, or viewed directly on the server through a result-specific URL (Figure 3C). Performance is evaluated as the area under the receiver operating characteristic curve (AUC) and provided with the genome-wide prediction results. As we continue to update IMP's collection of functional networks in the future, the prediction performance of this tool is expected to improve even further, and we encourage biologists to rerun their analyses. With these features, IMP enables biologists to both pose complex biological questions and easily run sophisticated machine-learning tools to help answer them.
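The evaluation loop described above (random negatives, five-fold cross-validation, AUC) can be sketched in a few lines of Python. This is a hedged illustration, not IMP's code: the helper names, the round-robin fold assignment, the fixed random seed and the one-dimensional threshold "model" used in the demonstration are all assumptions chosen to keep the example self-contained; any classifier with a fit step and a scoring step could be plugged in.

```python
import random


def sample_negatives(all_genes, positives, n, seed=0):
    """Draw n random genes that are not in the positive set (seeded for repeatability)."""
    pool = sorted(set(all_genes) - set(positives))
    return random.Random(seed).sample(pool, n)


def k_fold_indices(n_items, k=5):
    """Assign item indices to k test folds in round-robin order."""
    return [list(range(start, n_items, k)) for start in range(k)]


def cross_validated_scores(X, y, fit, score, k=5):
    """Score every example with a model trained on the other k-1 folds."""
    held_out = [None] * len(X)
    for fold in k_fold_indices(len(X), k):
        test = set(fold)
        train = [i for i in range(len(X)) if i not in test]
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in fold:
            held_out[i] = score(model, X[i])
    return held_out


def auc(labels, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label != 1]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Toy stand-in model: a threshold halfway between the class means of one feature.
def fit_threshold(X, y):
    pos = [x for x, label in zip(X, y) if label == 1]
    neg = [x for x, label in zip(X, y) if label != 1]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def score_threshold(threshold, x):
    return x - threshold


X = [0.9, 0.8, 0.7, 0.85, 0.75, 0.2, 0.1, 0.3, 0.15, 0.25]  # made-up feature values
y = [1] * 5 + [0] * 5
cv_scores = cross_validated_scores(X, y, fit_threshold, score_threshold, k=5)
print("cross-validated AUC:", auc(y, cv_scores))
```

Because every gene is scored by a model that never saw it during training, the resulting AUC is an estimate of how well the ranking generalises rather than how well it fits the supplied gene set.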

Figure 3.

Diagram for submitting custom user predictions. (A) The input form for entering a gene set of interest. Genes can be pasted, selected from a saved gene set, or chosen from a pre-defined set. (B) IMP applies an SVM with the provided gene set as positive examples and predicts additional genome-wide genes likely to be functionally related. (C) The output is a list of genome-wide genes, ranked by their probability of functional relationship with the provided gene set. This result can be emailed to the user or accessed directly on the web server.

SUMMARY

IMP is a flexible, user-friendly web server that serves as an intuitive and accessible resource for molecular biologists who want to leverage heterogeneous biological big data collections to explore predictions of gene function and disease association in human and model organisms. The described updates add substantial value to IMP as a unique resource and suite of analysis tools for biological researchers. In the future, we plan to add further organisms (Arabidopsis thaliana) and data sources for our functional gene networks. We continue to develop tools that leverage our cross-organism collection of networks and predictions, with the goal of making complex tools and analyses accessible to biological researchers.

FUNDING

National Science Foundation (NSF) CAREER [DBI-0546275]; National Institutes of Health [R01 GM071966, R01 HG005998, T32 HG003284]; National Institute of General Medical Sciences (NIGMS) Center of Excellence [P50 GM071508]. Funding for open access charge: Simons Foundation.

Conflict of interest statement. None declared.

REFERENCES

1. Ozsolak F., Milos P.M. RNA sequencing: advances, challenges and opportunities. Nat. Rev. Genet. 2011; 12:87–98.

2. Park C.Y., Wong A.K., Greene C.S., Rowland J., Guan Y., Bongo L.A., Burdine R.D., Troyanskaya O.G. Functional knowledge transfer for high-accuracy prediction of under-studied biological processes. PLoS Comput. Biol. 2013; 9:e1002957.

3. Wong A.K., Park C.Y., Greene C.S., Bongo L.A., Guan Y., Troyanskaya O.G. IMP: a multi-species functional genomics portal for integration, visualization and prediction of protein functions and networks. Nucleic Acids Res. 2012; 40:W484–W490.

4. Zuberi K., Franz M., Rodriguez H., Montojo J., Lopes C.T., Bader G.D., Morris Q. GeneMANIA prediction server 2013 update. Nucleic Acids Res. 2013; 41:W115–W122.

5. Guan Y., Gorenshteyn D., Burmeister M., Wong A.K., Schimenti J.C., Handel M.A., Bult C.J., Hibbs M.A., Troyanskaya O.G. Tissue-specific functional networks for prioritizing phenotype and disease genes. PLoS Comput. Biol. 2012; 8:e1002694.

6. Huttenhower C., Haley E.M., Hibbs M.A., Dumeaux V., Barrett D.R., Coller H.A., Troyanskaya O.G. Exploring the human genome with functional maps. Genome Res. 2009; 19:1093–1106.

7. Franceschini A., Szklarczyk D., Frankild S., Kuhn M., Simonovic M., Roth A., Lin J., Minguez P., Bork P., Von Mering C. et al. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013; 41:D808–D815.

8. Schmitt T., Ogris C., Sonnhammer E.L.L. FunCoup 3.0: database of genome-wide functional coupling networks. Nucleic Acids Res. 2014; 42:D380–D388.

9. Chikina M.D., Troyanskaya O.G. Accurate quantification of functional analogy among close homologs. PLoS Comput. Biol. 2011; 7:e1001074.

10. Chatr-Aryamontri A., Breitkreutz B.-J., Oughtred R., Boucher L., Heinicke S., Chen D., Stark C., Breitkreutz A., Kolas N., O'Donnell L. et al. The BioGRID interaction database: 2015 update. Nucleic Acids Res. 2015; 43:D470–D478.

11. Barrett T., Wilhite S.E., Ledoux P., Evangelista C., Kim I.F., Tomashevsky M., Marshall K.A., Phillippy K.H., Sherman P.M., Holko M. et al. NCBI GEO: archive for functional genomics data sets–update. Nucleic Acids Res. 2013; 41:D991–D995.

12. Licata L., Briganti L., Peluso D., Perfetto L., Iannuccelli M., Galeota E., Sacco F., Palma A., Nardozza A.P., Santonico E. et al. MINT, the molecular interaction database: 2012 update. Nucleic Acids Res. 2012; 40:D857–D861.

13. Kerrien S., Aranda B., Breuza L., Bridge A., Broackes-Carter F., Chen C., Duesbury M., Dumousseau M., Feuermann M., Hinz U. et al. The IntAct molecular interaction database in 2012. Nucleic Acids Res. 2012; 40:D841–D846.

14. Myers C.L., Robson D., Wible A., Hibbs M.A., Chiriac C., Theesfeld C.L., Dolinski K., Troyanskaya O.G. Discovery of biological networks from diverse functional genomic data. Genome Biol. 2005; 6:R114.

15. Ashburner M., Ball C.A., Blake J.A., Botstein D., Butler H., Cherry J.M., Davis A.P., Dolinski K., Dwight S.S., Eppig J.T. et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat. Genet. 2000; 25:25–29.

16. Amberger J.S., Bocchini C.A., Schiettecatte F., Scott A.F., Hamosh A. OMIM.org: online mendelian inheritance in man (OMIM®), an online catalog of human genes and genetic disorders. Nucleic Acids Res. 2014; doi:10.1093/nar/gku1205.

17. Kibbe W.A., Arze C., Felix V., Mitraka E., Bolton E., Fu G., Mungall C.J., Binder J.X., Malone J., Vasant D. et al. Disease ontology 2015 update: an expanded and updated database of human diseases for linking biomedical knowledge through disease data. Nucleic Acids Res. 2014; doi:10.1093/nar/gku1011.

18. Willis M.S., Rojas M., Li L., Selzman C.H., Tang R.-H., Stansfield W.E., Rodriguez J.E., Glass D.J., Patterson C. Muscle ring finger 1 mediates cardiac atrophy in vivo. Am. J. Physiol. Heart Circ. Physiol. 2009; 296:H997–H1006.

19. Kedar V., McDonough H., Arya R., Li H.-H., Rockman H.A., Patterson C. Muscle-specific RING finger 1 is a bona fide ubiquitin ligase that degrades cardiac troponin I. Proc. Natl. Acad. Sci. U.S.A. 2004; 101:18135–18140.

20. Chen S.N., Czernuszewicz G., Tan Y., Lombardi R., Jin J., Willerson J.T., Marian A.J. Human molecular genetic and functional studies identify TRIM63, encoding muscle RING finger protein 1, as a novel gene for human hypertrophic cardiomyopathy. Circ. Res. 2012; 111:907–919.

21. Arber S., Hunter J.J., Ross J., Hongo M., Sansig G., Borg J., Perriard J.C., Chien K.R., Caroni P. MLP-deficient mice exhibit a disruption of cardiac cytoarchitectural organization, dilated cardiomyopathy, and heart failure. Cell. 1997; 88:393–403.

22. Hambleton M., Hahn H., Pleger S.T., Kuhn M.C., Klevitsky R., Carr A.N., Kimball T.F., Hewett T.E., Dorn G.W., Koch W.J. et al. Pharmacological- and gene therapy-based inhibition of protein kinase Cα/β enhances cardiac contractility and attenuates heart failure. Circulation. 2006; 114:574–582.

23. Molkentin J.D., Robbins J. With great power comes great responsibility: Using mouse genetics to study cardiac hypertrophy and failure. J. Mol. Cell. Cardiol. 2009; 46:130–136.

24. Guan Y., Ackert-Bicknell C.L., Kell B., Troyanskaya O.G., Hibbs M.A. Functional genomics complements quantitative genetics in identifying disease-gene associations. PLoS Comput. Biol. 2010; 6:e1000991.

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
