Nucleic Acids Research Pages 113-114  


Virgil database for rich links (1999 update)


Frédéric Achard1,*, Guy Vaysseix1,2, Philippe Dessen1 and Emmanuel Barillot1,2

1GIS Infobiogen, 7 rue Guy Môquet BP 8, 94801 Villejuif cedex, France and 2Généthon, 1 rue de l'Internationale BP 60, 91002 Évry cedex, France

Received October 1, 1998; Accepted October 7, 1998

ABSTRACT

With so many databases available for research in the Human Genome Project, it is crucial to relate information from different resources efficiently. For that purpose, we maintain Virgil, a database of rich links for data browsing, data analysis and database interconnection. The current version of Virgil contains more than 40 000 rich links from five major databases: SWISS-PROT, GenBank, PDB, GDB and OMIM. Materials described in this paper are available from http://www.infobiogen.fr/services/virgil/

BACKGROUND

In their day-to-day work, biologists conducting research for the Human Genome Project constantly need to relate data from heterogeneous resources. As usually found in databases, a link is simply an interconnection between two biological objects. Without any further indication, it is sometimes difficult to know what lies behind a link. Moreover, many of these links are difficult to retrieve, if not missing altogether. This is exemplified by Macauley et al. (1) in their study of GenBank and the human and mouse genome databases, where they found many erroneous or missing links.

Virgil was developed to collect, manage and distribute rich links, that is, the link itself and related pieces of information to document the nature of the link.

In the 1998 Database Issue of Nucleic Acids Research, we introduced Virgil (2), a database of rich links between GDB and GenBank. The current version of Virgil contains links between five major databases of the Human Genome Project, with a focus on human data. In addition to GDB and GenBank, Virgil now stores rich links from SWISS-PROT, PDB and OMIM (3-7).

RICH LINKS

The Virgil schema was designed to comprehensively describe a link between two biological objects. A Virgil link is a bi-directional connection between two remote database objects. Each of the two database objects is referred to by a unique identifier, prefixed with the database name. A link is characterized by means of annotations.
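
The description above can be sketched as a small data structure. This is only an illustration in Python, not the actual Virgil schema (which is held in an object-oriented database): the class names `RemoteObject` and `RichLink`, the helper `make_link`, the storage of the link type as an annotation named `type`, and the sample identifiers are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class RemoteObject:
    """A remote database object, written DATABASE:ACCESSION."""
    database: str
    accession: str

    @classmethod
    def parse(cls, identifier: str) -> "RemoteObject":
        # Split on the first colon: the prefix is the database name.
        database, sep, accession = identifier.partition(":")
        if not sep or not database or not accession:
            raise ValueError(f"expected DATABASE:ACCESSION, got {identifier!r}")
        return cls(database, accession)

    def __str__(self) -> str:
        return f"{self.database}:{self.accession}"


@dataclass
class RichLink:
    """A bi-directional link between two remote objects, plus annotations."""
    link_id: int
    ends: frozenset  # exactly two RemoteObject instances; order is irrelevant
    annotations: dict = field(default_factory=dict)

    def connects(self, obj: RemoteObject) -> bool:
        return obj in self.ends

    def other_end(self, obj: RemoteObject) -> RemoteObject:
        # Raises ValueError if obj is not one of the two ends.
        (other,) = self.ends - {obj}
        return other


def make_link(link_id: int, a: str, b: str, **annotations) -> RichLink:
    """Build a link from two identifier strings (hypothetical helper)."""
    ends = frozenset({RemoteObject.parse(a), RemoteObject.parse(b)})
    if len(ends) != 2:
        raise ValueError("a link needs two distinct objects")
    return RichLink(link_id, ends, dict(annotations))


# Made-up identifiers, for illustration only.
link = make_link(1, "SWISSPROT:P00001", "PDB:1ABC", type="VIRGIL.UNION")
```

Storing the two ends as an unordered set reflects the bi-directional nature of a link: the same object is found whichever end is used as the starting point.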

The current version of Virgil contains more than 40 000 rich link objects and about the same number of remote database objects.

The links were extracted from the remote databases using the SRS (8) query program in combination with shell and Perl scripts. The results are reported in Table 1.


Table 1. Number of rich links in Virgil
NA stands for non-assessed.

With the addition of links from other databases, Virgil defined new types of links, as described below.

(i) SWISS-PROT is a curated protein sequence database, known for the high level of its annotations. The DR field was queried to extract PDB, GenBank and OMIM identifiers. A link between a SWISS-PROT entry and a PDB entry is denoted as a 'VIRGIL.UNION', in the sense that both entries actually describe parts of the same object (the 1D and the 3D structure of the same biological entity). A link between a SWISS-PROT entry and a GenBank entry is a 'VIRGIL.RELATION_PRODUCT'; and a link between a SWISS-PROT entry and an OMIM entry is a 'VIRGIL.RELATION_REFERENCE'.

(ii) GenBank is the NIH's database of all known nucleotide and protein sequences including supporting bibliographic and biological information. Links from GenBank to SWISS-PROT and GDB were extracted from the FEATURES field (the /db_xref qualifier).

(iii) The Protein Data Bank (PDB) is an archive of experimentally determined 3D structures of biological macromolecules. The links were extracted from the DBREF field.

(iv) Online Mendelian Inheritance in Man (OMIM) is a catalog of human genes and genetic disorders authored and edited by Dr Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere. The OMIM databank does not contain links to external databases, although it is linked to other databases by means of the Entrez system.

(v) GDB is the official central repository for human genomic mapping data. Termination of GDB is scheduled for January 31, 1999 (http://www.gdb.org/shutdown/shutdown.html); the data will then be passed to the Oak Ridge National Laboratory (ORNL: http://compbio.ornl.gov). Due to the uncertain future of the data, we have temporarily ceased our development work on GDB.
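
As an illustration of the extraction described in (i), the sketch below reads the DR lines of a SWISS-PROT flat-file entry and assigns each cross-reference the link type named above. It is not the authors' pipeline, which relied on SRS with shell and Perl scripts. The function name, the `GENBANK:` prefix, the sample entry and its identifiers are hypothetical, and the sketch assumes that DR lines label nucleotide sequence entries as `EMBL` and OMIM entries as `MIM`.

```python
# Link types named in the text, keyed by the target database.
LINK_TYPE = {
    "PDB": "VIRGIL.UNION",
    "GENBANK": "VIRGIL.RELATION_PRODUCT",
    "OMIM": "VIRGIL.RELATION_REFERENCE",
}

# Assumption: DR labels "EMBL" and "MIM" stand for GenBank and OMIM entries.
DR_LABEL_TO_DATABASE = {"PDB": "PDB", "EMBL": "GENBANK", "MIM": "OMIM"}


def links_from_dr_lines(swissprot_accession, lines):
    """Yield (source, target, link_type) for each recognised DR line."""
    source = f"SWISSPROT:{swissprot_accession}"
    for line in lines:
        if not line.startswith("DR "):
            continue  # not a cross-reference line
        # Drop the line code and the final full stop, then split the fields.
        fields = [f.strip() for f in line[2:].rstrip().rstrip(".").split(";")]
        if len(fields) < 2:
            continue  # malformed line: no identifier present
        database = DR_LABEL_TO_DATABASE.get(fields[0].upper())
        if database is None:
            continue  # cross-reference to a database not handled here
        yield source, f"{database}:{fields[1]}", LINK_TYPE[database]


# Made-up entry fragment, for illustration only.
sample = [
    "ID   EXAMPLE_HUMAN",
    "DR   EMBL; X00001; CAA00001.1; -.",
    "DR   PDB; 1ABC; 15-JAN-93.",
    "DR   MIM; 100000; -.",
    "DR   PROSITE; PS00001; EXAMPLE; 1.",
]
extracted = list(links_from_dr_lines("P00001", sample))
```

For this sample, `extracted` holds three tuples, one per recognised target database; the PROSITE line is skipped because no link type is defined for it here.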

IMPLEMENTATION

Virgil is maintained under the EYEDB system, an object-oriented engine that provides a generic Web interface for data browsing. EYEDB was developed by Sysra informatique; technical documents are available from http://www.infobiogen.fr/services/eyedb

Simple access to Virgil is provided through a Web search form. Entering the unique identifier of a remote biological object (such as SWISSPROT:P02248 or OMIM:191320) retrieves all the links attached to that object. As shown in Figure 1, one can also enter a Virgil link identifier (such as VIRGIL:53685) to retrieve a single rich link. A list of hyperlinks to EYEDB object links is returned. For test purposes, the Virgil CORBA server is also available for querying rich links.


Figure 1. Virgil simple query.
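
The two kinds of simple query described above (by remote object identifier, or by Virgil link identifier) can be mimicked with an in-memory index. This is a rough sketch of the lookup behaviour only; it does not reproduce the EYEDB Web interface or the CORBA server, and the class `LinkIndex`, its methods and the sample identifiers are hypothetical.

```python
from collections import defaultdict


class LinkIndex:
    """In-memory stand-in for the simple query form (illustrative only)."""

    def __init__(self):
        self._by_link_id = {}
        self._by_object = defaultdict(list)

    def add(self, link_id, end_a, end_b, annotations=None):
        record = {
            "id": f"VIRGIL:{link_id}",
            "ends": (end_a, end_b),
            "annotations": annotations or {},
        }
        self._by_link_id[record["id"]] = record
        # Index the link under both ends, since links are bi-directional.
        self._by_object[end_a].append(record)
        self._by_object[end_b].append(record)

    def query(self, identifier):
        """Return the links matching a link identifier or an object identifier."""
        if identifier.startswith("VIRGIL:"):
            record = self._by_link_id.get(identifier)
            return [record] if record else []
        return list(self._by_object.get(identifier, []))


# Made-up links, for illustration only.
index = LinkIndex()
index.add(1, "SWISSPROT:P00001", "PDB:1ABC")
index.add(2, "SWISSPROT:P00001", "OMIM:100000")
```

Here `index.query("SWISSPROT:P00001")` returns both links, while `index.query("VIRGIL:2")` returns only the second.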

We refer the reader to Achard et al. (9) for a technical account of Virgil's implementation. That paper describes the computing issues of the Virgil database. It presents the arguments for a centralized database of links and emphasizes the importance of a controlled vocabulary to describe the semantics (meaning) of a link. A major goal is to facilitate programmed access to the data and, consequently, to build the framework that will allow database interoperation (10).

DISCUSSION

Maintainers of the major genome databases are making a great effort to cross-reference data via links of good quality. However, ensuring that a collection of links is complete and up to date is a very difficult task due to inconsistent naming conventions, different semantics and rapid evolution of biological knowledge. A central database of links is a first step that will help in the distribution of serviceable links. In addition, data access is easier because the links share the same data model.

Link consistency is difficult to enforce over independent databases. For example, database links are not systematically reciprocated. We found that only 14% of the links from SWISS-PROT to GenBank have corresponding links in GenBank, while 30% of the links from SWISS-PROT to PDB have corresponding links in PDB.
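
A reciprocity rate of this kind can be computed by checking, for each link from one database to another, whether the reverse link exists in the target database. The sketch below shows one way to do so; the function name and the sample links are made up, and the output does not reproduce the percentages reported above.

```python
def reciprocity(forward, backward):
    """Fraction of forward (source, target) links that have a matching
    (target, source) link in the backward collection."""
    forward = set(forward)
    if not forward:
        return 0.0
    backward = set(backward)
    matched = sum(1 for source, target in forward if (target, source) in backward)
    return matched / len(forward)


# Made-up links, for illustration only.
swissprot_to_genbank = [
    ("SWISSPROT:P00001", "GENBANK:X00001"),
    ("SWISSPROT:P00002", "GENBANK:X00002"),
    ("SWISSPROT:P00003", "GENBANK:X00003"),
    ("SWISSPROT:P00004", "GENBANK:X00004"),
]
genbank_to_swissprot = [
    ("GENBANK:X00001", "SWISSPROT:P00001"),
    ("GENBANK:X00009", "SWISSPROT:P00009"),
]
rate = reciprocity(swissprot_to_genbank, genbank_to_swissprot)
```

In this sample, one of the four SWISS-PROT links is reciprocated, so `rate` is 0.25.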

Our mid-term project is to enrich the Virgil database with links from other databases. Notably, we are working on adding Medline references. We are also planning to define a link submission protocol for expert annotation from individuals via a Web form and batch submission of a collection of links. For the time being, link submission is manual: any new links or link annotations should be sent to virgil@infobiogen.fr.

ACKNOWLEDGEMENTS

The authors wish to thank Eric Viara, Christophe Cussat-Blanc and Philippe Gesnouin for excellent computing support. We also would like to thank the authors and maintainers of the databases used to construct Virgil.

REFERENCES

1. Macauley,J., Wang,H. and Goodman,N. (1998) Bioinformatics, 14, 575-582.

2. Achard,F. and Barillot,E. (1998) Nucleic Acids Res., 26, 100-101.

3. Abola,E.E., Bernstein,F.C., Bryant,S.H., Koetzle,T.F. and Weng,J. (1987) Crystallographic Databases-Information Content. Chapter: Protein Data Bank, pp. 107-132. Data Commission of the International Union of Crystallography, Bonn/Cambridge/Chester. URL: http://www.pdb.bnl.gov

4. Bairoch,A. and Apweiler,R. (1998) Nucleic Acids Res., 26, 38-42. URL: http://www.ebi.ac.uk/ebi_docs/swissprot_db/swisshome.html

5. Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J. and Ouellette,B.F.F. (1998) Nucleic Acids Res., 26, 1-7. URL: http://www.ncbi.nlm.nih.gov/

6. Letovsky,S.I., Cottingham,R.W., Porter,C.J. and Li,P.W.D. (1998) Nucleic Acids Res., 26, 94-99. URL: http://www.gdb.org/

7. McKusick,V.A. (1994) Mendelian inheritance in man. Johns Hopkins University Press, Baltimore, USA. URL: http://www.ncbi.nlm.nih.gov/Omim/

8. Etzold,T., Ulyanov,A. and Argos,P. (1996) Methods Enzymol., 266, 114-128.

9. Achard,F., Cussat-Blanc,C., Viara,E. and Barillot,E. (1998) Bioinformatics, 14, 342-348.

10. Karp,P.D. (1996) Trends Biotechnol., 14, 273-279.


*To whom correspondence should be addressed. Tel: +33 1 49 58 36 82; Fax: +33 1 45 59 52 50; Email: fred@infobiogen.fr


This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.
