Nucleic Acids Research Pages 100-101  


Virgil: a database of rich links between GDB and GenBank
Background
Rich Links
Implementation
Data Model
Data Curation
Discussion
Acknowledgements
References


Frédéric Achard*, Emmanuel Barillot

GIS Infobiogen, 7 rue Guy Môquet, BP 8, 94801 Villejuif Cedex, France

Received September 2, 1997; Revised and Accepted October 3, 1997

ABSTRACT

Database interconnection requires the development of links between related objects from different databases. We built a database of links, called Virgil, to manage and distribute rich (documented) links between GDB genes and GenBank human sequences. Virgil contains 18 667 unique links. In addition to a simple Web form for ad-hoc queries, we propose a generic Web interface and a prototype CORBA server for link distribution. Materials described in this paper are available from http://www.infobiogen.fr/services/virgil/home.html

BACKGROUND

Links between biological objects are frequently used by individuals, e.g., for data browsing, and by software, e.g., for data analysis or database interconnection. However, the links found in the major databases are too often difficult to retrieve, inconsistent, insufficiently documented or poorly maintained. To address these problems, we propose Virgil, a database dedicated to the management and distribution of rich links.

RICH LINKS

Virgil focuses on storing rich links between GDB genes (1) and GenBank human sequences (2).

Virgil links are imported from GDB (10 155 links) and GenBank (3433 links). Virgil also contains 10 677 links that were automatically generated by the genXref system (3).

Once links found in more than one source are merged, this results in 18 667 unique links, each referred to by a unique Virgil identifier. From a random sample of 170 links, we estimate that 86% (±5%) of the links in Virgil are relevant.
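The quoted margin can be checked with a short calculation. The sketch below assumes a 95% normal-approximation (Wald) interval for a proportion; the paper does not state which method was used, so this is only a plausibility check, and the function name is ours.

```python
import math


def wald_half_width(p_hat: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation interval for a proportion.

    Assumption: a 95% Wald interval (z = 1.96); the source does not say
    how its +/-5% figure was obtained.
    """
    return z * math.sqrt(p_hat * (1.0 - p_hat) / n)


# Figures from the text: 86% relevant links in a random sample of 170.
half_width = wald_half_width(0.86, 170)
print(f"86% +/- {100 * half_width:.1f}%")
```

Under this assumption the half-width comes out at roughly 5 percentage points, which is consistent with the ±5% reported above.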

IMPLEMENTATION

Virgil uses an object-oriented engine, EYEDB, to model and manage the data. EYEDB was developed by Sysra informatique; technical documents are available from http://www.infobiogen.fr/services/eyedb .

Simple Virgil searches are available from a Web form, as shown in Figure 1a. The form retrieves all the links attached to a remote biological object when given its unique identifier (such as GDB:128600 or GENBANK:M61764). One can also enter a Virgil link identifier (such as VIRGIL:6661) to retrieve a single rich link. The ad-hoc query form returns a list of hyperlinks to EYEDB link objects. Such a link is shown in Figure 1b. Navigating through the generic EYEDB Web interface gives access to the data that constitute rich links, such as the objects shown in Figure 1c, d and e.


Figure 1 Example of a simple query: find all the validated links that contain the gene GDB:128600. The figure shows some of the objects attached to the link returned by the query.

Expert queries to Virgil can be performed directly by means of a generic Web interface, which accepts EYEDB OQL (Object Query Language) queries; facilities for building such queries are provided. We also provide a prototype CORBA server for programmed access. The services that the CORBA objects deliver to a client are publicly described by means of IDL (Interface Definition Language). As an illustration, we implemented two CORBA clients for querying Virgil. We refer the reader to 'Ubiquitous Distributed Objects with CORBA' (4) for an overview of CORBA.

DATA MODEL

The Virgil schema was designed to comprehensively describe a link between two biological objects.

A Virgil link is a bi-directional connection between two database objects. Each of the two objects is referred to by a unique identifier prefixed with the database name (for example, GDB:128600).
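As a rough illustration of this structure, the Python sketch below models a prefixed identifier and a bi-directional link. It is not the EYEDB schema: the class names ObjectRef and Link, their methods and the check that the two ends differ are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class ObjectRef:
    """Reference to a remote database object, e.g. GDB:128600."""
    database: str
    accession: str

    @classmethod
    def parse(cls, identifier: str) -> "ObjectRef":
        # Split on the first colon only: the prefix is the database name.
        database, sep, accession = identifier.partition(":")
        if not (sep and database and accession):
            raise ValueError(f"expected DATABASE:ACCESSION, got {identifier!r}")
        return cls(database.upper(), accession)

    def __str__(self) -> str:
        return f"{self.database}:{self.accession}"


@dataclass(frozen=True)
class Link:
    """Bi-directional link: the two ends are stored as an unordered set."""
    virgil_id: str
    ends: FrozenSet[ObjectRef]

    @classmethod
    def between(cls, virgil_id: str, a: str, b: str) -> "Link":
        ends = frozenset({ObjectRef.parse(a), ObjectRef.parse(b)})
        if len(ends) != 2:  # assumption: a link joins two distinct objects
            raise ValueError("a link must connect two distinct objects")
        return cls(virgil_id, ends)

    def other_end(self, ref: ObjectRef) -> ObjectRef:
        """Follow the link from one end to the other, in either direction."""
        if ref not in self.ends:
            raise KeyError(f"{ref} is not an end of {self.virgil_id}")
        (other,) = self.ends - {ref}
        return other


# Illustrative pairing only: the text gives these identifiers as examples
# but does not say that they are linked to each other.
link = Link.between("VIRGIL:6661", "GDB:128600", "GENBANK:M61764")
gene = ObjectRef.parse("GDB:128600")
print(link.other_end(gene))
```

Storing the two ends as an unordered set means that a link built as (gene, sequence) compares equal to the same link built as (sequence, gene), which is one simple way to express bi-directionality.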

Links are characterized by means of annotations. An annotation contains the name of the author (an individual, a database or a program), the method supporting the creation of the link, the status (VALIDATED, PUTATIVE or DELETED) and the belief value (a normalized value between 0 and 1). The latter two attributes give the author's judgment on the quality of the link.

The global status of a link is inferred from the statuses given by all its annotators. For example, if an annotator is known for the quality of its annotations, the status given by that annotator is passed on as the global status of the link.

In Virgil, much attention was given to describing the meaning of the objects. A controlled vocabulary (as opposed to free text) was used to facilitate programmed access. At present Virgil contains three object types (new object types can be created on demand). VIRGIL.UNION specifies a union link between two parts of the same biological object (this terminology is imported from ref. 5); GENBANK.NUCLEIC_SEQUENCE and GDB.GENE specify two types of objects from remote databases.
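The inference of a global status can be sketched as follows. The annotation fields and the three status values come from the text; everything else is an assumption for illustration: the set of trusted annotators, the tie-break by highest belief value, the fallback to PUTATIVE when no trusted annotator is present, and the method strings in the example.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable, Set


class Status(Enum):
    VALIDATED = "VALIDATED"
    PUTATIVE = "PUTATIVE"
    DELETED = "DELETED"


@dataclass(frozen=True)
class Annotation:
    author: str     # an individual, a database or a program
    method: str     # method supporting the creation of the link
    status: Status
    belief: float   # normalized value between 0 and 1

    def __post_init__(self) -> None:
        if not 0.0 <= self.belief <= 1.0:
            raise ValueError("belief must lie between 0 and 1")


def global_status(annotations: Iterable[Annotation],
                  trusted_authors: Set[str]) -> Status:
    """Pass on the status given by a trusted annotator, if there is one.

    Assumptions (not from the source): several trusted annotations are
    resolved by taking the one with the highest belief value, and a link
    without any trusted annotation stays PUTATIVE.
    """
    trusted = [a for a in annotations if a.author in trusted_authors]
    if trusted:
        return max(trusted, key=lambda a: a.belief).status
    return Status.PUTATIVE


# Hypothetical annotations; the method strings are placeholders.
annotations = [
    Annotation("genXref", "automatic generation", Status.PUTATIVE, 0.60),
    Annotation("GDB", "database import", Status.VALIDATED, 1.00),
]
print(global_status(annotations, trusted_authors={"GDB", "GenBank"}).value)
```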

DATA CURATION

The default status of a link is PUTATIVE. A link is VALIDATED when it is imported from a population where >95% of links are relevant. This is the case for links imported from GDB and GenBank. Links generated by genXref are VALIDATED when the belief value is >0.90 (this threshold corresponds to a population where 95% of the links are relevant). Virgil contains 11 195 VALIDATED links.

Methods to check data integrity are attached to the link objects. Virgil data will be updated every two months, with each new major GenBank release.
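The status rules above can be written as a small decision function. This is a sketch under stated assumptions: the function and constant names are ours, statuses are plain strings, and the source names are matched as literal strings.

```python
from typing import Optional

# Sources whose imported links are treated as VALIDATED, as stated in the text.
VALIDATED_SOURCES = {"GDB", "GenBank"}
# Belief threshold quoted in the text for genXref links (strictly greater than).
GENXREF_BELIEF_THRESHOLD = 0.90


def initial_status(source: str, belief: Optional[float] = None) -> str:
    """Return the status a newly loaded link would receive."""
    if source in VALIDATED_SOURCES:
        return "VALIDATED"
    if (source == "genXref" and belief is not None
            and belief > GENXREF_BELIEF_THRESHOLD):
        return "VALIDATED"
    # Default status of a link.
    return "PUTATIVE"


print(initial_status("GDB"))            # imported from GDB
print(initial_status("genXref", 0.95))  # genXref link above the threshold
print(initial_status("genXref", 0.50))  # genXref link below the threshold
```

Because the comparison is strict, a genXref link with a belief value of exactly 0.90 stays PUTATIVE in this sketch.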

DISCUSSION

A few projects make extensive use of links to build complex information systems dedicated to biological data. The getDB system (6) integrates information via linkDB, a bank of the links explicitly specified within any of the 16 molecular databases that compose getDB. Similarly, SRS (7) creates a virtual federation of genome databases. Its language allows one to describe the structure of a flat file library and to define means of extracting links between libraries. The program then processes indices to allow navigation through all the libraries. A limitation of the SRS system is that it applies only to flat file libraries, not to relational or object-oriented systems.

In parallel to these efforts, we propose with Virgil a service to distribute exhaustive collections of richer links between GDB genes and GenBank sequences. This is a necessary step to allow a seamless integration of data between a biomolecular and a genomic database.

ACKNOWLEDGEMENTS

Virgil benefited from the work of Ken Fasman and Stan Letovsky on the Biolinks project; we thank them for this contribution. We are also grateful to the Infobiogen staff for excellent computing support and valuable discussions.

REFERENCES

1. Fasman,K.H., Letovsky,S.I., Cottingham,R.W. and Kingsbury,D.T. (1996) Nucleic Acids Res. 24, 57-63 [see also this issue, Nucleic Acids Res. (1998) 26, 21-26].

2. Benson,D.A., Boguski,M., Lipman,D.J. and Ostell,J. (1996) Nucleic Acids Res. 24, 1-5 [see also this issue, Nucleic Acids Res. (1998) 26, 1-7].

3. Achard,F. and Dessen,P. (1995) In International IEEE Symposium on Intelligence in Neural and Biological Systems: pp. 78-83.

4. Achard,F. and Barillot,E. (1997) In Altman,R., Dunker,K., Hunter,L. and Klein,T. (eds) Pacific Symposium on Biocomputing '97. World Scientific. pp. 39-50.

5. Karp,P.D. (1996) Trends Biotechnol. 14, 273-279.

6. Akiyama,Y., Goto,S., Uchiyama,I. and Kanehisa,M. (1995) In Second Meeting on the Interconnection of Molecular Biology Databases: http://www.ai.sri.com/people/pkarp/mimbd/95/abstracts.html

7. Etzold,T., Ulyanov,A. and Argos,P. (1996) Methods Enzymol. 266, 114-128.


*To whom correspondence should be addressed. Tel: +33 1 49 58 35 97; Fax: +33 1 49 58 36 89; Email: frederic.achard@infobiogen.fr


This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals Comments and feedback: www-admin{at}oup.co.uk
Last modification: 17 Dec 1997
Copyright© Oxford University Press, 1998.
