Nucleic Acids Research Advance Access published online on May 7, 2007
Nucleic Acids Research, doi:10.1093/nar/gkm298
Web Server Issue
iHOP web services
1Structural Biology and Biocomputing Program, CNIO and 2Decentralized Information Group, Computer Science and Artificial Intelligence Laboratory, MIT.
*To whom correspondence should be addressed. Tel: +34 917 328 000; Fax: +34 912 246 976; Email: Jmfernandez{at}cnio.es
Received February 5, 2007. Revised April 5, 2007. Accepted April 12, 2007.
| ABSTRACT |
|---|
iHOP provides fast, accurate, comprehensive, and up-to-date summary information on more than 80 000 biological molecules by automatically extracting key sentences from millions of PubMed documents. Its intuitive user interface and navigation scheme have made iHOP extremely successful among biologists, counting more than 500 000 visits per month (iHOP access statistics: http://www.ihop-net.org/UniPub/iHOP/info/logs/). Here we describe a public programmatic API that enables the integration of main iHOP functionalities in bioinformatic programs and workflows.
| INTRODUCTION |
|---|
iHOP (1) (iHOP literature server, http://www.ihop-net.org) allows researchers to explore a network of gene and protein interactions by directly navigating the pool of published scientific literature. Rather than returning long lists of entire abstracts in response to keyword searches, iHOP selectively retrieves information that is specific to genes and proteins and summarizes their interactions and functions. The system adds value by filtering and ranking extracted sentences according to significance, impact factor, date of publication and syntax.
iHOP web content is pre-compiled and generated in a multi-step process to annotate biomedical texts with gene and protein names, chemical compounds and MeSH terms. This annotation task is computationally expensive because of the sheer number of entities, but more importantly, hindered by a high semantic overloading of abbreviations and synonyms in biomedicine. The continuous development and optimization of heuristics and machine learning algorithms to improve entity detection and synonym disambiguation is therefore a central effort in the maintenance of iHOP.
Given the complexity and effort that go into the development and maintenance of a text-mining pipeline, it makes sense to build upon the existing infrastructure of iHOP rather than reinventing the wheel. Numerous online resources already link to iHOP, and novel tools based on the iHOP resource are emerging, e.g. iHOPerator (2). The iHOP web service API has been tested in selected projects over the last 2 years and is now publicly available. Although the APIs of a biocomputing facility are less visible to the end user than its web interface, they are very important for the different omics disciplines, which usually depend on large-scale data set analysis. Such analyses run as distributed workflows that have to semantically integrate results from diverse biocomputing facilities and data sources. Other large-scale biocomputing facilities provide comparable programmatic environments, such as NCBI Entrez (3) (Entrez CGI services, http://www.ncbi.nlm.nih.gov/entrez/query/static/eutils_help.html; Entrez SOAP services, http://www.ncbi.nlm.nih.gov/entrez/query/static/esoap_help.html) or EBI WS (4) (EBI SOAP services, http://www.ebi.ac.uk/Tools/webservices/).
| METHODS |
|---|
To make the iHOP programmatic interface remotely accessible and easy to integrate into workflows in a way that is neutral to programming languages and vendor independent, we decided to implement the public API in the form of web services (5). Three popular web service API models have been implemented for iHOP: the REST model (6) (Wikipedia description of REST, http://en.wikipedia.org/wiki/Representational_State_Transfer), which the DAS (7) protocol also follows; SOAP + WSDL, in which a WSDL document describes the services and SOAP messages in XML carry the data; and BioMOBY (8), which focuses on building bioinformatics workflows (9). Table 1 contains a brief description of these API models. All three API implementations are based on a common internal library and common XML schemas to facilitate maintenance and future developments. The MOBY implementation required additional effort to integrate the iHOP web services into the MOBY ontology.
The schema design was driven by the iHOP functionalities that are directly useful for bioinformatics workflows (Figures 1 and 2). Table 2 contains a brief description of these functionalities, with their inputs and outputs.
A key issue in the development was the design of an XML schema rich enough to describe and integrate the valuable information that is already accessible through the iHOP user interface. For instance, annotated sentences are generated by the getSymbolDefinitions, getSymbolInteractions and getPubMed functionalities. Each sentence also carries information about the abstract, the journal and the journal impact factor. Basic symbol information is provided by getSymbolInfo, and it is also included in the getSymbolDefinitions and getSymbolInteractions results. The resulting XML Schema, along with its documentation, is available at the iHOP web services site.
Gene symbol disambiguation is usually a hard task that is ultimately left to the user, yet automating it is essential for a useful workflow. Using heuristics specific to these web services, we have created an additional functionality called guessSymbolIdFromSymbolText, which guesses the nearest unambiguous iHOP gene symbol id from free text input and an optional target organism. The concept is very similar to Google's "I'm Feeling Lucky" feature, and it speeds up workflow building. Workflow writers are not tied to this service and its heuristics, because anyone can implement their own symbol selection heuristics on top of the getRelatedSymbols output.
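As an illustration of such a user-defined heuristic, the following minimal Python sketch picks a symbol id only when the choice is unambiguous. The `Candidate` record, its fields and the sample ids are hypothetical simplifications introduced here; the real getRelatedSymbols output is an XML document whose structure is defined by the iHOP schema and would first have to be parsed into such records.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical, simplified view of one getRelatedSymbols hit."""
    symbol_id: str   # assumed: iHOP symbol id
    name: str        # assumed: symbol or synonym text
    organism: str    # assumed: organism label


def pick_symbol(query: str,
                candidates: Sequence[Candidate],
                organism: Optional[str] = None) -> Optional[str]:
    """Return the id of the only plausible candidate, or None if ambiguous."""
    # Restrict to the target organism when one is given.
    pool = [c for c in candidates if organism is None or c.organism == organism]
    # Prefer case-insensitive exact matches; otherwise fall back to the pool.
    exact = [c for c in pool if c.name.lower() == query.lower()]
    chosen = exact or pool
    # Refuse to guess when zero or several candidates remain.
    return chosen[0].symbol_id if len(chosen) == 1 else None


hits = [
    Candidate("id-1", "SNF1", "organism-A"),
    Candidate("id-2", "SNF1", "organism-B"),
    Candidate("id-3", "SNF1A", "organism-A"),
]
print(pick_symbol("snf1", hits, "organism-A"))  # unambiguous within organism-A
print(pick_symbol("snf1", hits))                # ambiguous without an organism
```

Returning nothing in the ambiguous case, rather than the first hit, keeps the decision with the workflow author, who can then ask the user or apply a further rule.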
Under the REST (Representational State Transfer) paradigm there is a CGI-XML service available for all functionalities described earlier. Special return cases have been modelled using standard HTTP codes: when there is no answer for a query in a CGI-XML service, a 404 Not Found error is returned; if an internal error happens, a 500 Internal Server Error is used; if no input parameter is specified, a 400 Bad Request error is returned.
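A client can map these status codes onto distinct outcomes, so that an empty answer is not mistaken for a failure. The sketch below uses only the Python standard library. The outcome labels and function names are our own, and the request URL must be taken from the iHOP web services documentation, since no concrete endpoint is assumed here.

```python
import urllib.error
import urllib.request
from typing import Optional


def interpret_status(status: int) -> str:
    """Translate an HTTP status into the cases described in the text."""
    if status == 200:
        return "ok"
    if status == 404:
        return "no_answer"       # query was valid, but nothing was found
    if status == 400:
        return "missing_input"   # no input parameter was specified
    if status == 500:
        return "server_error"    # internal error on the service side
    return "unexpected"


def fetch_xml(url: str, timeout: float = 30.0) -> Optional[bytes]:
    """Return the XML body, None when there is no answer, else raise."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        outcome = interpret_status(err.code)
        if outcome == "no_answer":
            return None
        raise RuntimeError(
            f"iHOP request failed ({outcome}, HTTP {err.code})"
        ) from err
```

Treating 404 as an ordinary empty result lets a workflow continue with the next symbol, while 400 and 500 still surface as errors.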
For SOAP (Simple Object Access Protocol), we created several variations of the same web service for each functionality, to simplify workflow building. SOAP services use the RPC/encoded WSDL style, so they can be used from Perl programs with any SOAP::Lite version. Critical errors (no input parameter, internal server error) are reported by the iHOP SOAP services using the standard SOAP fault mechanism. When there is no answer to return, the services return a specific XML structure (iHOPSOAPNotFound) designed for these SOAP services, instead of using the SOAP fault mechanism. This is important, because some workflow enactment tools (like Taverna) stop the whole workflow when a SOAP service returns a SOAP fault, an undesirable effect when a service invocation has not actually failed.
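On the client side, the "no answer" case can therefore be detected by inspecting the returned XML rather than by catching a fault. The following Python sketch checks a response payload for an iHOPSOAPNotFound element. Only the element name comes from the description above; its namespace and position in the response are not assumed, so the check matches on the local name anywhere in the document.

```python
import xml.etree.ElementTree as ET

NOT_FOUND = "iHOPSOAPNotFound"  # element name as given in the text


def local_name(tag: str) -> str:
    """Strip an ElementTree '{namespace}' prefix from a tag, if present."""
    return tag.rsplit("}", 1)[-1]


def is_not_found(payload: str) -> bool:
    """True if the XML payload contains the 'no answer' marker element."""
    root = ET.fromstring(payload)
    return any(local_name(el.tag) == NOT_FOUND for el in root.iter())


# Illustrative payloads only; real responses follow the iHOP XML schema.
print(is_not_found("<result><iHOPSOAPNotFound/></result>"))
print(is_not_found("<result><hit/></result>"))
```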
In the design of the BioMOBY services it was necessary to comply with the common object ontology on MOBY Central and with the portfolio of services that use this ontology. Although the main iHOP services take the same input parameters and use the same XML schema for their outputs as the CGI-XML and SOAP services, the true power of the iHOP MOBY services lies in the additional translation services. These services take as input the iHOP XML structures generated by the iHOP services and translate their content into a collection of usable MOBY objects. This way, other MOBY services that use the same ontology can be chained to this output.
CGI-XML services were tested using both web browsers and command-line HTTP retrieval tools (like wget). We tested and cross-validated the functionality of the iHOP SOAP web services with unit tests based on the Perl SOAP::Lite library and in the context of Taverna (10,11), a workflow enactment tool extensively used by the bioinformatics community. We found that SOAP::Lite 0.60 behaved better than earlier versions and than some of the later ones (the latest version is 0.69). Taverna 1.4 and 1.5 are discouraged, because they prune the results of SOAP services. Taverna 1.5.1 solves these and other issues, and it is recommended. Older versions, like Taverna 1.3.1, also work, but they have many limitations related to BioMOBY services.
| RESULTS AND DISCUSSION |
|---|
The functionality, completeness and usefulness of the iHOP web service APIs are demonstrated by a number of collaborative projects that make programmatic use of iHOP content. Table 3 contains a brief list of the projects where iHOP web services have been used, and Figure 3 shows a CARGO (Cases et al., submitted to NAR-WEB 2007) widget using information provided by iHOP CGI-XML services.
When using iHOP as a web service it is necessary to be aware of the current limitations of biological text mining. BioCreAtIvE (12) and other blind community assessments (13) have clearly shown that name identification, and in particular matching gene names in the literature with the corresponding database entries, is a hard problem and that the best systems are still far from perfect (14). Our own evaluation of iHOP in 2005 (15) shows that in model organisms the average precision is around 94% and the recall around 87%. Even though the inclusion of additional refinements and dictionaries is producing continuous progress, the poor adherence of the community to naming standards (16) will continue to create problems in this area.
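As a rough single-figure summary of the precision and recall quoted above, the two values can be combined into the balanced F-measure. This is only an illustration computed here from the two rounded averages, not a figure reported in the evaluation (15), and the F-measure of averaged values need not equal the average of per-organism F-measures:

$$F_1 = \frac{2PR}{P + R} = \frac{2 \times 0.94 \times 0.87}{0.94 + 0.87} = \frac{1.6356}{1.81} \approx 0.90$$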
Other obvious limitations of iHOP and all other current text mining systems are imposed by the limited availability of full text sources [the main reason for the common use of abstract collections (17)] and the still limited possibilities to incorporate effective Natural Language Processing techniques for the extraction of additional features from biomedical text. A more detailed description of the status of this fast developing field can be found in (18–20).
Despite these general limitations in the field, the iHOP web interface has become popular among biologists searching for information about the function and relations of the genes and proteins of their interest. To our knowledge, iHOP is the only large-scale text mining resource in biology that is offered as an open web service. We therefore expect that the novel possibilities described in this work will contribute to the use of iHOP as part of numerous high-throughput analysis environments.
| AVAILABILITY |
|---|
Information relevant to developers, such as detailed documentation of the iHOP web service XML file format, the URLs required to invoke the REST API, the WSDL document describing the SOAP services and usage examples in Perl and Taverna, is available at http://www.ihop-net.org/UniPub/iHOP/webservices/.
| ACKNOWLEDGEMENTS |
|---|
Funding to pay the Open Access publication charges for this article was provided by ENFIN Network of Excellence (LSHG-CT-2005-518254).
Conflict of interest statement. None declared.
| REFERENCES |
|---|
1. Hoffmann R, Valencia A. A gene network for navigating the literature. Nat. Genet. (2004) 36: 664.
2. Good BM, Kawas EA, Kuo BY, Wilkinson MD. iHOPerator: User-scripting a personalized bioinformatics Web, starting with the iHOP website. BMC Bioinformatics (2006) 7: 534.
3. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. (2007) 35: D5–D12.
4. Labarga A, Pilai S, Valentin F, Anderson M, Lopez R. Web services at EBI. EMBnet.news (2005) 11: 18–23.
5. Kreger H. Web Services Conceptual Architecture (WSCA) 1.0 (2001) IBM Software Group.
6. Fielding RT. Architectural Styles and the Design of Network-based Software Architectures (2000) Doctoral dissertation, University of California, Irvine.
7. Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics (2001) 2: 7.
8. Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Brief. Bioinform. (2002) 3: 331–341.
9. Wilkinson M, Schoof H, Ernst R, Haase D. BioMOBY successfully integrates distributed heterogeneous bioinformatics Web Services. The PlaNet exemplar case. Plant Physiol. (2005) 138: 5–17.
10. Oinn T, Addis M, Ferris J, Marvin D, Senger M, Greenwood M, Carver T, Glover K, Pocock MR. Taverna: a tool for the composition and enactment of bioinformatics workflows. Bioinformatics (2004) 20: 3045–3054.
11. Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res. (2006) 34: W729–W732.
12. Hirschman L, Yeh A, Blaschke C, Valencia A. Overview of BioCreAtIvE: critical assessment of information extraction for biology. BMC Bioinformatics (2005) 6 (Suppl. 1): S1.
13. Jin-Dong K, Ohta T, Tsuruoka Y, Tateisi Y, Collier N. Introduction to the bio-entity recognition task at JNLPBA. In: Proceedings of the International Workshop on Natural Language Processing in Biomedicine and its Applications (JNLPBA-04) (2004) 70–75.
14. Hirschman L, Colosimo M, Morgan A, Yeh A. Overview of BioCreAtIvE task 1B: normalized gene lists. BMC Bioinformatics (2005) 6 (Suppl. 1): S11.
15. Hoffmann R, Valencia A. Implementing the iHOP concept for navigation of biomedical literature. Bioinformatics (2005) 21 (Suppl. 2): ii252–ii258.
16. Tamames J, Valencia A. The success (or not) of HUGO nomenclature. Genome Biol. (2006) 7: 402.
17. Schuemie MJ, Weeber M, Schijvenaars BJ, van Mulligen EM, van der Eijk CC, Jelier R, Mons B, Kors JA. Distribution of information in biomedical abstracts and full-text publications. Bioinformatics (2004) 20: 2597–2604.
18. Krallinger M, Erhardt RA, Valencia A. Text-mining approaches in molecular biology and biomedicine. Drug Discov. Today (2005) 10: 439–445.
19. Hoffmann R, Krallinger M, Andres E, Tamames J, Blaschke C, Valencia A. Text mining for metabolic pathways, signaling cascades, and protein networks. Sci. STKE (2005) 2005: 21.
20. Krallinger M, Valencia A. Text-mining and information-retrieval services for molecular biology. Genome Biol. (2005) 6: 224.