Nucleic Acids Research
Virgil database for rich links (1999 update)
Background
Rich Links
Implementation
Discussion
Acknowledgements
References
ABSTRACT
BACKGROUND
In their day-to-day work, biologists conducting research for the Human Genome Project constantly need to relate data from heterogeneous resources. In most databases, a link is simply a connection between two biological objects; without further indication, it is sometimes difficult to know what lies behind a link. Moreover, many links are difficult to retrieve, or are missing altogether. Macauley et al. (1) illustrated this in their study of GenBank and the human and mouse genome databases, in which they found many erroneous or missing links.
Virgil was developed to collect, manage and distribute rich links, that is, links together with the related information that documents their nature.
In the 1998 Database Issue of Nucleic Acids Research, we introduced Virgil (2), a database of rich links between GDB and GenBank. The current version covers five major databases of the Human Genome Project, with a focus on human data: in addition to GDB and GenBank, Virgil now stores rich links from SWISS-PROT, PDB and OMIM (3-7).
RICH LINKS
The Virgil schema was designed to describe comprehensively a link between two biological objects. A Virgil link is a bi-directional connection between two remote database objects, each referred to by a unique identifier prefixed with the database name (e.g. SWISSPROT:P02248). Links are characterized by means of annotations.
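A minimal Python sketch of the link model described above, assuming a simple in-memory representation. The names `RemoteObject`, `RichLink`, `parse_identifier`, `link_type` and `annotations` are hypothetical illustrations, not the actual EYEDB schema, and the identifier syntax (database name, colon, accession) is inferred from examples such as SWISSPROT:P02248.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class RemoteObject:
    """A remote database object (hypothetical class)."""
    database: str   # e.g. "SWISSPROT"
    accession: str  # e.g. "P02248"

    @property
    def identifier(self) -> str:
        # Unique identifier prefixed with the database name.
        return f"{self.database}:{self.accession}"


def parse_identifier(identifier: str) -> RemoteObject:
    """Split 'DATABASE:ACCESSION' at the first colon (assumed syntax)."""
    database, sep, accession = identifier.strip().partition(":")
    if not sep or not database or not accession:
        raise ValueError(f"not a database-prefixed identifier: {identifier!r}")
    return RemoteObject(database, accession)


@dataclass
class RichLink:
    """A bi-directional link between two remote objects (hypothetical class)."""
    left: RemoteObject
    right: RemoteObject
    link_type: Optional[str] = None  # e.g. "VIRGIL.UNION"
    annotations: Dict[str, str] = field(default_factory=dict)

    def other_end(self, obj: RemoteObject) -> RemoteObject:
        # The link can be traversed from either end.
        if obj == self.left:
            return self.right
        if obj == self.right:
            return self.left
        raise ValueError(f"{obj.identifier} is not an end of this link")
```

Annotations are kept here as free-form key/value pairs only for brevity; a fuller model would restrict link semantics to a controlled vocabulary.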
The current version of Virgil contains more than 40 000 rich link objects and about the same number of remote database objects.
The links were extracted from the remote databases with the SRS query program (8), combined with shell and Perl scripts. The results are reported in Table 1.
Table 1.
Adding links from other databases required Virgil to define new link types, as described below.
(i) SWISS-PROT is a curated protein sequence database known for its high level of annotation. Its DR field was queried to extract PDB, GenBank and OMIM identifiers. A link between a SWISS-PROT entry and a PDB entry is denoted `VIRGIL.UNION', in the sense that both entries describe parts of the same object (the 1D and 3D structures of the same biological entity). A link between a SWISS-PROT entry and a GenBank entry is a `VIRGIL.RELATION_PRODUCT', and a link between a SWISS-PROT entry and an OMIM entry is a `VIRGIL.RELATION_
(ii) GenBank is the NIH's database of all known nucleotide and protein sequences, including supporting bibliographic and biological information. Links from GenBank to SWISS-PROT and GDB were extracted from the FEATURES field (the /db_xref qualifier).
(iii) The Protein Data Bank (PDB) is an archive of experimentally determined 3D structures of biological macromolecules. The links were extracted from the DBREF field.
(iv) Online Mendelian Inheritance in Man (OMIM) is a catalog of human genes and genetic disorders authored and edited by Dr Victor A. McKusick and his colleagues at Johns Hopkins and elsewhere. The OMIM databank does not contain links to external databases, although it is linked to other databases by means of the Entrez system.
(v) GDB is the official central repository for human genomic mapping data. GDB is scheduled to shut down on January 31, 1999 (http://www.gdb.org/shutdown/shutdown.html), after which its data will be transferred to the Oak Ridge National Laboratory (ORNL: http://compbio.ornl.gov). Given the uncertain future of these data, we have temporarily halted our GDB-related development.
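The extraction step for the SWISS-PROT and GenBank sources can be sketched as follows. This is not the authors' SRS/Perl pipeline: it assumes flat-file input, assumes that SWISS-PROT records GenBank cross-references under the `EMBL` database token and OMIM under `MIM`, and assumes GenBank qualifiers of the form `/db_xref="SWISS-PROT:..."`. The `GENBANK:` prefix and both function names are hypothetical, and the OMIM link type is left unset because its full name is not given here.

```python
import re
from typing import Iterable, List, Optional, Sequence, Tuple

# Assumed mapping from DR-line database tokens to Virgil-style prefixes.
DR_PREFIX = {"PDB": "PDB", "EMBL": "GENBANK", "MIM": "OMIM"}

# Link types named in the text; the OMIM type is deliberately omitted.
LINK_TYPE = {"PDB": "VIRGIL.UNION", "GENBANK": "VIRGIL.RELATION_PRODUCT"}

Link = Tuple[str, str, Optional[str]]  # (source id, target id, link type)


def links_from_dr_lines(swissprot_ac: str, lines: Iterable[str]) -> List[Link]:
    """Turn the 'DR   DB; ID; ...' lines of one SWISS-PROT entry into links."""
    source = f"SWISSPROT:{swissprot_ac}"
    links: List[Link] = []
    for line in lines:
        if not line.startswith("DR   "):
            continue
        # Drop the line code and the trailing period, then split the fields.
        fields = [f.strip() for f in line[5:].rstrip(".").split(";")]
        if len(fields) < 2:
            continue
        prefix = DR_PREFIX.get(fields[0])
        if prefix is None:
            continue  # cross-reference to a database not covered here
        links.append((source, f"{prefix}:{fields[1]}", LINK_TYPE.get(prefix)))
    return links


# Matches /db_xref="DB:ID"; the ID part may itself contain colons.
DB_XREF = re.compile(r'/db_xref="([^":]+):([^"]+)"')


def db_xrefs(features: str,
             wanted: Sequence[str] = ("SWISS-PROT", "GDB")) -> List[Tuple[str, str]]:
    """Collect (database, id) pairs from /db_xref qualifiers in a FEATURES block."""
    return [(db, ident) for db, ident in DB_XREF.findall(features) if db in wanted]
```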
IMPLEMENTATION
Virgil is maintained under EYEDB, an object-oriented database engine developed by Sysra informatique that provides a generic Web interface for data browsing (technical documentation is available from http://www.infobiogen.fr/services/eyedb).
Virgil can be queried through a simple Web form, which retrieves all the links attached to a remote biological object given its unique identifier (such as SWISSPROT:P02248 or OMIM:191320). As shown in Figure
Figure 1. Virgil simple query.
We refer the reader to Achard et al. (9) for a technical account of Virgil's implementation. That paper describes the computing issues raised by the Virgil database, presents the arguments for a centralized database of links and emphasizes the importance of a controlled vocabulary for describing the semantics (meaning) of a link. A major goal is to facilitate programmatic access to the data and, consequently, to build a framework that will allow database interoperation (10).
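The lookup behind the simple query form can be pictured as an index from prefixed identifiers to links. A minimal in-memory sketch, assuming links are stored as `(identifier_a, identifier_b, link_type)` tuples; `LinkIndex` and its methods are hypothetical and merely stand in for the EYEDB query machinery.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple

# Assumed link layout: (identifier_a, identifier_b, link_type).
Link = Tuple[str, str, Optional[str]]


class LinkIndex:
    """Hypothetical in-memory stand-in for Virgil's simple query."""

    def __init__(self, links: Iterable[Link] = ()) -> None:
        self._by_object: Dict[str, List[Link]] = defaultdict(list)
        for link in links:
            self.add(link)

    def add(self, link: Link) -> None:
        a, b, _ = link
        # Index both ends, since a Virgil link is bi-directional.
        self._by_object[a].append(link)
        if b != a:
            self._by_object[b].append(link)

    def query(self, identifier: str) -> List[Link]:
        """Return every link attached to a database-prefixed identifier."""
        # .get() avoids inserting empty entries into the defaultdict.
        return list(self._by_object.get(identifier.strip(), []))
```

Because each link is indexed under both of its ends, querying either identifier returns it, which matches the bi-directional definition of a Virgil link.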
DISCUSSION
Maintainers of the major genome databases are making great efforts to cross-reference their data with high-quality links. However, ensuring that a collection of links is complete and up to date is very difficult because of inconsistent naming conventions, differing semantics and the rapid evolution of biological knowledge. A central database of links is a first step toward distributing serviceable links; it also makes data access easier, because all the links share the same data model.
Link consistency is difficult to enforce across independent databases. For example, database links are not systematically reciprocated: we found that only 14% of the links from SWISS-PROT to GenBank have corresponding links in GenBank, and 30% of the links from SWISS-PROT to PDB have corresponding links in PDB.
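The reciprocity percentages above are the share of forward links from A to B for which the target database also records a link from B back to A. A minimal sketch of that calculation, assuming both link sets have already been normalized to the same prefixed identifiers; the function `reciprocity` is hypothetical and is not the procedure the authors used. In practice, accession versions and secondary accessions would also need reconciling before comparison.

```python
from typing import Iterable, Set, Tuple

Pair = Tuple[str, str]  # (source identifier, target identifier)


def reciprocity(forward: Iterable[Pair], backward: Iterable[Pair]) -> float:
    """Percentage of distinct forward links A->B whose reverse B->A is recorded."""
    fwd: Set[Pair] = set(forward)  # duplicates are counted once
    if not fwd:
        return 0.0
    back: Set[Pair] = set(backward)
    matched = sum(1 for a, b in fwd if (b, a) in back)
    return 100.0 * matched / len(fwd)
```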
In the medium term, we plan to enrich the Virgil database with links from other databases; notably, we are working on adding Medline references. We also plan to define a link submission protocol that supports both expert annotation by individuals through a Web form and batch submission of link collections. For the time being, link submission is manual: any new links or link annotations should be sent to virgil@infobiogen.fr.
ACKNOWLEDGEMENTS
The authors wish to thank Eric Viara, Christophe Cussat-Blanc and Philippe Gesnouin for excellent computing support. We would also like to thank the authors and maintainers of the databases used to construct Virgil.
REFERENCES
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.