Nucleic Acids Research Advance Access originally published online on May 5, 2007
Nucleic Acids Research 2007 35(Web Server issue):W16-W20; doi:10.1093/nar/gkm280
Nucleic Acids Research, 2007, Vol. 35, No. suppl_2 W16-W20
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
CARGO: a web portal to integrate customized biological information
¹SCOMPBio Group, ²Bioinformatics Unit (UBio), Structural Biology and Biocomputing Programme, Spanish National Cancer Research Centre (CNIO), Madrid, Spain and ³Spanish National Institute for Bioinformatics (INB), Spain
*To whom correspondence should be addressed: Tel.: +34-91-224-6900; Fax: +34-91-224-8006; Email: dgpisano{at}cnio.es
Received February 5, 2007. Revised April 5, 2007. Accepted April 11, 2007.
| ABSTRACT |
|---|
There is a huge quantity of information generated in Life Sciences, and it is dispersed in many databases and repositories. Despite the broad availability of the information, there is a great demand for methods that are able to look for, gather and display distributed data in a standardized and friendly way. CARGO (Cancer And Related Genes Online) is a configurable biological web portal designed as a tool to facilitate, integrate and visualize results from Internet resources, independently of their native format or access method. Through the use of small agents, called widgets, supported by a Rich Internet Application (RIA) paradigm based on AJAX, CARGO provides pieces of minimal, relevant and descriptive biological information. The tool is designed to be used by experimental biologists with no training in bioinformatics. In the current state, the system presents a list of human cancer genes. Available at http://cargo.bioinfo.cnio.es
| INTRODUCTION |
|---|
With the implementation of large-scale analysis initiatives, the amount of available biological data has become overwhelming, as reflected by the hundreds of databases (1) and web servers (2) described in the literature. This number is expected to grow year on year, increasing the range of resources available to the experimental research community. These resources are of great value to scientists for proposing novel hypotheses and delineating further research, but are sometimes difficult to access because of usability and maintenance issues.
In addition, without a strategy to efficiently extract it, the information may become unusable as data availability increases. Efforts to implement integration and standardization have been developed in different frameworks. To mention a few, the BioMOBY (3) (http://www.biomoby.org) and DAS (4) (http://www.biodas.org) projects aim to leverage the retrieval and integration of biological data served from distributed resources at the machine level through commonly agreed public conventions. The open approach and the potential of those initiatives have been well accepted by the scientific community (5), but still substantial improvement is needed to present them to end users. The creation of customized bioinformatics environments would greatly facilitate end user interaction with the information, and would enable its effective use. This is particularly important when the end users are experimentalists with no training in Bioinformatics. The task of developing systems for this type of user constitutes a challenge for developers and individuals trained in computing and bioinformatics.
The current trends to create specialized user interfaces fall into two different approaches: data aggregation, or super-specialization. Examples of the first approach are illustrated by the main biological data repositories like NCBI (6) (http://www.ncbi.nlm.nih.gov/) and Ensembl (7) (http://www.ensembl.org/), or by web servers like Harvester (8) (http://harvester.embl.de/), where exhaustive information about a queried entity can be retrieved and presented simultaneously to the user. The aggregation concept is an approach that suffers from certain issues: although it simplifies the task of searching over multiple disparate resources by providing a unique entry point, it may add complexity to the analysis of the results. This is in part caused by inherent problems such as representing heterogeneous findings into a single result format, or eliminating data redundancy if the same piece of information is retrieved from more than one source.
Certain super-specialized servers address some of these issues with strategies like presenting only domain specific data obtained from selected sources (9) (http://www.efamily.org.uk/software/dasclients/spice/). However, the high customization required for a particular task produces a lack of flexibility. The user might deal with complex data from multiple origins and this creates a myriad of specialized applications.
CARGO represents a new generation of visualization tools that aims to circumvent these two problems. In this work we present a system that aims to facilitate the visualization and the analysis of biological data. Its philosophy is to address the problem of aggregation by presenting only slices of core information in a unified interface, leaving it up to the user whether to look for more detail by following links to the original source.
The system uses a gene-based querying procedure that activates a number of small software agents (called widgets) to retrieve, relate and display concise, accurate and relevant information. As CARGO deals with disparate sources, and allows a great technological diversity, the problem of super-specialization is also addressed. In its current state, it provides updated information obtained from reliable data sources such as Ensembl (7), cPath (10), dbSNP (6), PDB (11), OMIM (12) or iHOP (13) via DAS (4) or Web services (3).
| THE SYSTEM |
|---|
The system has three main modules. The first is the query module (Figure 1a), where the user can search Ensembl (v41, October 2006) (7) gene descriptions either by entering free text or by choosing from precompiled lists of genes. As an example, we have included a list of candidate cancer genes for breast and colon cancer (14). The second is the core module (Figure 1b), which connects the results, in the form of a normalized gene list, to the widgets. When the user chooses a gene in the results list, this module broadcasts the gene identifier to the open widgets and activates their internal logic. The third consists of the widgets themselves, which retrieve and assemble the information (Figure 1c) in a process transparent to the user.
A widget is just a small web page, designed to show information in a visual and concise manner. This page is produced by a server-side program (a CGI, for instance), which the core module calls via an HTTP GET request, passing an Ensembl identifier as the only parameter. The server-side part of a widget, which retrieves and displays the information, is technology-agnostic; the current widgets use DAS (4), specialized APIs, specialized Web services (3) and other web technologies such as RSS. Since the widgets are called by standard HTTP requests, any widget can be hosted remotely with respect to the core module, so third-party widgets can easily be included in CARGO. Developers who want to display their widgets in CARGO need only provide a URL and make their server-side program respond to the HTTP GET request ?ensemblid=%, where % stands for the Ensembl identifier selected by the user. The core module calls the URL of each open widget and loads the valid HTML document returned by the widget into its window.
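The widget contract described above (an HTTP GET carrying a single ensemblid parameter, answered with an HTML document) can be illustrated with a minimal sketch. This is not CARGO's actual code: the function names `widget_url`, `broadcast`, `render_widget` and `serve`, the example host and port, and the placeholder HTML are all assumptions made for illustration. Only the `ensemblid` parameter name comes from the text.

```python
from html import escape
from http.server import BaseHTTPRequestHandler, HTTPServer
from typing import List, Tuple
from urllib.parse import parse_qs, urlencode, urlparse


def widget_url(base_url: str, ensembl_id: str) -> str:
    """Build the URL a core module could request for one widget (hypothetical helper)."""
    separator = "&" if "?" in base_url else "?"
    return f"{base_url}{separator}{urlencode({'ensemblid': ensembl_id})}"


def broadcast(ensembl_id: str, open_widget_urls: List[str]) -> List[str]:
    """Return one request URL per open widget for the selected gene."""
    return [widget_url(url, ensembl_id) for url in open_widget_urls]


def render_widget(path: str) -> Tuple[int, str]:
    """Return (HTTP status, HTML body) for a request path such as '/?ensemblid=...'."""
    ids = parse_qs(urlparse(path).query).get("ensemblid", [])
    if not ids:
        return 400, "<html><body><p>Missing ensemblid parameter.</p></body></html>"
    gene = escape(ids[0])
    # A real widget would query its data source here; this sketch only echoes the identifier.
    return 200, f"<html><body><h1>{gene}</h1><p>Placeholder summary.</p></body></html>"


class WidgetHandler(BaseHTTPRequestHandler):
    """Serves the widget page in response to GET requests."""

    def do_GET(self) -> None:
        status, body = render_widget(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int = 8000) -> None:
    """Run the example widget on localhost until interrupted."""
    HTTPServer(("localhost", port), WidgetHandler).serve_forever()
```

Calling `serve()` and opening `http://localhost:8000/?ensemblid=ENSG00000141510` in a browser would show the placeholder page. The identifier is only an example value, and any server-side technology that honours the same request format would serve equally well.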
We have developed a set of widgets that address different biological problems to illustrate the concept of CARGO's widget implementation. The literature mining widget, or iHOP widget (Figure 2), is the CARGO implementation of the popular iHOP tool (13), which provides information from PubMed abstracts. The widget uses Web services technologies (José M. Fernández, in press) to query iHOP. The output is a comprehensive collection of sentences mined from the literature, either defining a gene or describing its interactions with other genes. The 3D Coding SNPs widget (Figures 1c and 2) maps the non-synonymous SNPs found in proteins onto PDB structures (11). The widget retrieves information from the dbSNP database using Ensembl APIs, and maps the gene sequence to the retrieved PDB structures using a DAS alignment server (http://das.sanger.ac.uk/das/msdpdbsp); the process is transparent to the end user. The Disease Information module (Figure 2) is the widget implementation of the OMIM (12) database, providing summarized information about diseases associated with the gene. The Interactome widget extracts information from the protein-protein interactions database cPath (10), along with experimental evidence of the interaction and literature references. The Transcript annotations widget shows functional predictions for the query using the FunCUT pipeline (15). Because this pipeline requires large computational resources, its predictions have to be pre-calculated.
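The sequence-to-structure step of the 3D Coding SNPs widget can be pictured with a small sketch. It assumes that an alignment between a protein sequence and a structure is available as a list of gap-free blocks; the `AlignedBlock` type, the function names and the coordinates in the usage note are hypothetical, and they do not reproduce the format returned by the DAS alignment server.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass(frozen=True)
class AlignedBlock:
    """One gap-free stretch where sequence positions and structure residues correspond."""
    seq_start: int     # first protein sequence position (1-based, inclusive)
    seq_end: int       # last protein sequence position (inclusive)
    struct_start: int  # structure residue number aligned to seq_start


def map_to_structure(position: int, blocks: List[AlignedBlock]) -> Optional[int]:
    """Return the structure residue aligned to a sequence position, or None if uncovered."""
    for block in blocks:
        if block.seq_start <= position <= block.seq_end:
            return block.struct_start + (position - block.seq_start)
    return None


def map_snps(snp_positions: Iterable[int], blocks: List[AlignedBlock]) -> Dict[int, Optional[int]]:
    """Map every coding SNP position; SNPs outside the structure map to None."""
    return {pos: map_to_structure(pos, blocks) for pos in snp_positions}
```

With the illustrative blocks `[AlignedBlock(10, 20, 1), AlignedBlock(30, 40, 15)]`, a SNP at sequence position 15 falls on structure residue 6, one at position 35 on residue 20, and one at position 25 lies in a region the structure does not cover.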
| USAGE |
|---|
CARGO is designed to be extremely simple and clear to use. To best show its capabilities, we propose biological questions and answer them on the fly (Figure 2, background coloured boxes).
We can imagine a user asking a simple biological question: How many SNPs of P53 are coding and can be mapped onto solved structures? With the current tools available, the user would need to navigate at least two different Internet resources, PDB and dbSNP, to retrieve the information about the structure and the variations, respectively. Once this step is done, the user would have to manually find the correspondence between these two resources, which means mapping the sequence to the structure. A further step would require determining the localization of the variation mapped onto the correct structure, which is itself a tricky matter. To do this, the user would have to download and learn to use a specialized viewer.
Using CARGO, the user just searches for P53 (Figure 2, red star) and selects this gene in the results list. Since the SNP-3D widget integrates information from dbSNP, PDB, structure/sequence equivalences and a visualizer, all of the aforementioned steps are addressed at once (Figure 2, red background boxes). This widget exemplifies the integration level design concept in CARGO: the user can access three different kinds of information and integrate them in a minimal working space, because special attention has been paid to visually represent all the relevant data in a concise way. Additional visualization tools facilitate the interpretation of the results. For instance, the blue top bar in the widget represents the length of the protein and the grey area indicates the structural coverage in the sequence by the selected structure (in this case, 1TPU). The red vertical bars indicate all the residues mapped as SNPs. By choosing a particular SNP either in the bar or using the menus, the user can view the SNP mapped onto the structure (yellow balls in the structure). The user can select alternative structures and also links are provided to the original sources if more detailed information is desired.
Once the coding SNPs are mapped onto the structure, the user could ask how many of these coding SNPs are defined as allelic variants. To answer this without CARGO, the user would have to navigate the OMIM database and read all the free text. By activating the OMIM widget in CARGO, a small window shows all the results for the P53 gene. Searching in the allelic variants section, we can compare them with the SNPs shown by the SNP-3D widget. For instance, the allelic variant ARG249SER is a coding SNP (Figure 2). We can ask still further questions, such as whether this allelic variant is associated with any disease. As seen in the OMIM entry, it is linked to hepatocellular carcinoma.
One straightforward task would be to retrieve literature regarding this disease or this allelic variant. The usual way would be to navigate and perform advanced text searches in PubMed. These searches usually return many unwanted results. In CARGO, activation of the iHOP widget brings up a whole list of references involving the protein. Again, the visual design of the widget highlights the relevance of the term in the context of the sentence. Here, an entry relating p53 to hepatocellular carcinoma is shown, along with links to the original article. By activating the other widgets, the user can display alternative information. Any widget can be minimized, displayed or hidden (Figure 2, black circles).
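The comparison between allelic variants and coding SNPs is done by eye in CARGO, but the underlying check can be written down as a short sketch. It assumes variant labels of the form ARG249SER (three-letter residue, position, three-letter residue), as in the example above; the names `AllelicVariant`, `parse_variant` and `variants_at_snps` are hypothetical. Matching on position alone does not confirm that the amino acid change is the same, so a hit here is a candidate to inspect, not a confirmed equivalence.

```python
import re
from typing import Iterable, List, NamedTuple, Set


class AllelicVariant(NamedTuple):
    reference: str  # three-letter code of the original residue
    position: int   # residue position in the protein sequence
    variant: str    # three-letter code of the substituted residue


_VARIANT = re.compile(r"^([A-Z]{3})(\d+)([A-Z]{3})$")


def parse_variant(label: str) -> AllelicVariant:
    """Split a label such as 'ARG249SER' into residue, position and substitution."""
    match = _VARIANT.match(label.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized variant label: {label!r}")
    reference, position, variant = match.groups()
    return AllelicVariant(reference, int(position), variant)


def variants_at_snps(labels: Iterable[str], snp_positions: Set[int]) -> List[AllelicVariant]:
    """Keep the allelic variants whose position coincides with a coding SNP position."""
    return [v for v in map(parse_variant, labels) if v.position in snp_positions]
```

For example, `variants_at_snps(["ARG249SER", "ALA10VAL"], {249})` keeps only the ARG249SER entry; the second label and the SNP set are made-up inputs.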
| SUMMARY |
|---|
CARGO is capable of integrating information at different levels, ranging from visualization to mapping. The main strength of CARGO, as compared to alternative tools (8) (http://harvester.embl.de/), is its go-to-the-core capabilities. At the same time it aims to facilitate the interpretation and retrieval of information in a second-generation visualization framework. It features technology-agnostic widget creation, which allows the inclusion of as many disparate resources as desired to completely address a range of questions.
By definition, CARGO is an open platform, since it can be extended just by the addition of new widgets and searching modules. At the same time, the usefulness of CARGO depends strongly on the diversity and quality of the available widgets. We are already developing new widgets to mediate further information, such as the visualization of protein annotation at the residue level, or the visualization of the tissues in which a gene is expressed, as reported by the GNF Gene Atlas (16). Additional query modules are also under development, including one that allows the graphical generation of gene lists from genome coordinates. We hope CARGO will grow by community effort, so we are preparing documentation to facilitate other parties to develop their own widgets or searching modules, which can be easily integrated into the CARGO structure. Given the simplicity and flexibility of the CARGO concept, we expect that many of these widgets will be incorporated in the near future.
| SUPPORTED PLATFORMS |
|---|
A selected community of experimental researchers is currently testing the system. CARGO requires Firefox 1.5 and Java 1.4.1, and has been tested in Windows, Mac OS X and Linux. Additional support for alternative browsers is currently in progress.
| CONTRIBUTIONS |
|---|
I.C., J.M.R., J.M.F. and J.F.V. developed the widgets. A.C. and E.A. programmed the search modules. A.C. and J.F.V. worked out the main framework. J.F.V. designed the website. G.G.-L. provided scientific support and wrote the documentation. I.C., D.G.P. and A.M.R. developed the general concept and coordinated the project. D.G.P., A.M.R. and A.V. assembled the manuscript. A.V. did the general coordination.
| ACKNOWLEDGEMENTS |
|---|
I.C. is a recipient of a Ramon y Cajal programme, J.F.V. is supported by EU Marie Curie MIRG-CT-2005-016499, G.G.-L. is funded by the Biomedical Foundation of Vigo Hospitalary Complex (FICHUVI). This work has been partially financed by EU BIOSAPIENS (LSHC-CT-2003-505265) and EU EMBRACE (LSCG-CT-2004-512092), and by the National Institute for Bioinformatics (www.inab.org), a platform of Genoma España. Funding to pay the Open Access publication charges for this article was provided by EU BIOSAPIENS (LSHC-CT-2003-505265).
Conflict of interest statement. None declared.
| REFERENCES |
|---|
1. Bateman A. Editorial. Nucleic Acids Res (2007) 35:D1-D2.
2. Bateman A. Editorial. Nucleic Acids Res (2006) 34:W1.
3. Wilkinson MD, Links M. BioMOBY: an open source biological web services proposal. Brief Bioinform (2002) 3:331-341.
4. Dowell RD, Jokerst RM, Day A, Eddy SR, Stein L. The distributed annotation system. BMC Bioinformatics (2001) 2:7.
5. Stein L. Creating a bioinformatics nation. Nature (2002) 417:119-120.
6. Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, et al. Database resources of the National Center for Biotechnology Information. Nucleic Acids Res (2007) 35:D5-D12.
7. Hubbard TJ, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, et al. Ensembl 2007. Nucleic Acids Res (2007) 35:D610-D617.
8. Liebel U, Kindler B, Pepperkok R. Bioinformatic "Harvester": a search engine for genome-wide human, mouse, and rat protein resources. Methods Enzymol (2005) 404:19-26.
9. Prlic A, Down TA, Hubbard TJ. Adding some SPICE to DAS. Bioinformatics (2005) 21(Suppl 2):ii40-ii41.
10. Cerami EG, Bader GD, Gross BE, Sander C. cPath: open source software for collecting, storing, and querying biological pathways. BMC Bioinformatics (2006) 7:497.
11. Berman HM, Bhat TN, Bourne PE, Feng Z, Gilliland G, Weissig H, Westbrook J. The Protein Data Bank and the challenge of structural genomics. Nat Struct Biol (2000) 7(Suppl):957-959.
12. Hamosh A, Scott AF, Amberger JS, Bocchini CA, McKusick VA. Online mendelian inheritance in man (OMIM), a knowledgebase of human genes and genetic disorders. Nucleic Acids Res (2005) 33:D514-D517.
13. Hoffmann R, Valencia A. A gene network for navigating the literature. Nat Genet (2004) 36:664.
14. Sjoblom T, Jones S, Wood LD, Parsons DW, Lin J, Barber TD, Mandelker D, Leary RJ, Ptak J, et al. The consensus coding sequences of human breast and colorectal cancers. Science (2006) 314:268-274.
15. Abascal F, Valencia A. Automatic annotation of protein function based on family identification. Proteins (2003) 53:683-692.
16. Kent WJ, Sugnet CW, Furey TS, Roskin KM, Pringle TH, Zahler AM, Haussler D. The human genome browser at UCSC. Genome Res (2002) 12:996-1006.

