Nucleic Acids Research
IXDB, an X chromosome integrated database
ABSTRACT
INTRODUCTION
Physical mapping is a complex task that essentially consists of detecting and analysing connections between thousands of genomic objects to gain further insight into the structure of the genome. Objects include clones of different types and hosts, and markers of various origins (e.g. microsatellites, anonymous DNA, transcripts). Clones are usually stored and distributed in large libraries, and those of interest are frequently re-arrayed into more specific collections, which usually involves renaming them. Probes, on the other hand, are often derived from other objects, such as the extremities of clones or parts of transcripts. Keeping track of the relationships between objects and their different names is necessary when analysing data stemming from more than one project. Hence, tight integration of the data is a prerequisite for assembling into one reference map the fragmented views that each separate data set currently provides.
IXDB is a database that strives to achieve this integration and to provide a comprehensive source of data for physical mapping of the human X chromosome. Its current content has been extracted from a number of laboratory-specific data sets, personal communications, public databases and publications. Specific problems arising during the integration of data from external sources strongly influence the architecture of the system. This is reflected in a flexible schema that allows new links and new object types to be established instantly. Importing new data has the potential to create contradictions in the database and requires extensive comparisons with the existing data set. Most of the importing tasks have therefore been automated via an intermediary data format and adapted software. The software checks new objects against existing ones, flags potential identities across tables and rejects semantically inconsistent data. Ultimately, the purpose of IXDB is to support the creation of new integrated maps.
In this process, we distinguish data integration from map integration: the former is a semantically difficult, but algorithmically simple task, while the latter is semantically clear, but computationally complex. Data integration, however, is an absolute prerequisite for efficient map integration, and is therefore the main focus of IXDB in its current phase.
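The import checks described above (compare new objects with existing ones, flag potential identities, reject inconsistent data) can be illustrated with a minimal Python sketch. It is not the IXDB import software: the class `GenomicRecord`, the function `check_import` and the two rejection rules are assumptions chosen only to show the three possible outcomes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GenomicRecord:
    # Hypothetical record layout, for illustration only.
    name: str
    category: str                      # e.g. "YAC", "STS"
    length_kb: Optional[float] = None
    synonyms: frozenset = frozenset()


def check_import(existing, incoming):
    """Sort incoming records into accepted, flagged and rejected lists."""
    # Index every known name (primary or synonym), ignoring case.
    known = {}
    for obj in existing:
        for name in (obj.name, *obj.synonyms):
            known[name.lower()] = obj

    accepted, flagged, rejected = [], [], []
    for obj in incoming:
        # Assumed consistency rules: a name is required, lengths are positive.
        if not obj.name.strip():
            rejected.append((obj, "empty name"))
        elif obj.length_kb is not None and obj.length_kb <= 0:
            rejected.append((obj, "non-positive length"))
        else:
            names = (obj.name, *sorted(obj.synonyms))
            match = next(
                (known[n.lower()] for n in names if n.lower() in known), None
            )
            if match is None:
                accepted.append(obj)
            else:
                # Potential identity: left for a curator to confirm.
                flagged.append((obj, match))
    return accepted, flagged, rejected
```

Flagged pairs are deliberately not merged automatically, since deciding whether two names denote the same object is the semantically difficult part of data integration.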
DATA REPRESENTATION
The data schema of IXDB is centred around genomic objects, their inter-object relationships and their position on physical, genetic or radiation hybrid maps. Each object can have arbitrary positions on arbitrary maps. Object categories include a variety of clone types (YAC, PAC, BAC, cosmids, etc.), marker types (amplimers, hybridisation probes, etc.), ESTs, loci and genes. Objects are annotated with details such as length, chimerism, maximum heterozygosity, etc. Each piece of information is tagged with its source, and different sources can provide contradicting values for the same object without harming the consistency of the database. WWW links to other databases such as GDB, GenBank, EMBL, OMIM or dbEST are extensively used.
Objects in IXDB are connected either by experimental evidence or by the fact that one was directly derived from another object in the database. Experimental evidence covers a number of different methods (hybridisations, gel fingerprints, PCR assays, etc.), and may be annotated to describe the strength of a link (e.g. probability of overlap by fingerprint, intensity of a hybridisation signal). Direct connections, on the other hand, describe the generation of one object from another; for instance, the preparation of an amplimer from the sequence of the extremity of a clone, or the subcloning of cosmids from a YAC. The number of relationship types is strictly controlled but can easily be extended when required by new data.
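As a rough illustration of this representation, the following Python sketch models source-tagged annotations and the two kinds of connection. All class and field names are hypothetical; the actual IXDB schema is relational and is not reproduced here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Annotation:
    key: str        # e.g. "length_kb", "chimeric"
    value: object
    source: str     # every value carries the source it came from


@dataclass(frozen=True)
class Connection:
    kind: str       # "experimental" or "direct"
    method: str     # e.g. "hybridisation", "subcloning"
    from_id: int
    to_id: int
    source: str
    strength: Optional[float] = None   # e.g. probability of overlap


@dataclass
class GenomicObject:
    object_id: int
    name: str
    category: str   # e.g. "YAC", "cosmid", "EST"
    annotations: list = field(default_factory=list)

    def annotate(self, key, value, source):
        self.annotations.append(Annotation(key, value, source))

    def values(self, key):
        """Return {source: value} for one key; conflicting values coexist."""
        return {a.source: a.value for a in self.annotations if a.key == key}


# Two sources report different lengths for the same (invented) clone.
yac = GenomicObject(1, "yac-A", "YAC")
yac.annotate("length_kb", 450, "lab-1")
yac.annotate("length_kb", 480, "lab-2")

# A cosmid subcloned from that YAC is linked by a direct connection.
cosmid = GenomicObject(2, "cosmid-B", "cosmid")
link = Connection("direct", "subcloning", from_id=1, to_id=2, source="lab-1")
```

Because each value is keyed by its source, storing a second length for the same clone adds information rather than overwriting the first.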
Situations where an object is given several different names occur frequently in genomics. IXDB uses synonyms to give additional names to an object, all of which can be used in queries. For some DNA objects, however, different names describe different versions of one original and unique object. For instance, a YAC clone may be distributed to a number of different laboratories, and evolve differently with each new culturing. We have defined a special relationship, which we call an alias, to represent this situation. Each alias corresponds to a different entry in the database, with its own annotations and relations to other objects. However, the aliases within a group are likely to yield the same results, since they are likely to contain similar biological material. Therefore, a query for any one alias always retrieves a report on all aliases in its group.
Objects can be annotated with contradictory information from different sources. For instance, different laboratories may measure different lengths or determine a different chimerism status for a clone. IXDB stores all such values with equal standing. This makes IXDB well suited for merging and integrating data from different database systems and nomenclatures without the need to make arbitrary decisions about which view is more adequate or reflects reality more accurately.
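The difference between synonyms (several names, one entry) and aliases (several entries, one original object) can be made concrete with a small Python sketch. The class `NameIndex` and its methods are illustrative assumptions, not part of IXDB.

```python
class NameIndex:
    """Toy index: synonyms share one entry, aliases link separate entries."""

    def __init__(self):
        self._by_name = {}      # any name (primary or synonym) -> entry id
        self._alias_group = {}  # entry id -> set of ids in its alias group

    def add_object(self, object_id, name, synonyms=()):
        # Synonyms are extra names for the same entry.
        for n in (name, *synonyms):
            self._by_name[n.lower()] = object_id
        self._alias_group.setdefault(object_id, {object_id})

    def add_alias(self, id_a, id_b):
        """Declare two entries to be versions of one original object."""
        merged = self._alias_group[id_a] | self._alias_group[id_b]
        for member in merged:
            self._alias_group[member] = merged

    def query(self, name):
        """Return the ids of the named entry and all of its aliases."""
        object_id = self._by_name.get(name.lower())
        if object_id is None:
            return []
        return sorted(self._alias_group[object_id])
```

A query by a synonym or by any alias name therefore returns the whole alias group, while each entry in the group keeps its own annotations.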
Table 1. Number of objects currently stored in the main classes of IXDB
Class                                    Number
YACs                                     17 153
Other clones                               2300
Genes                                       409
STSs                                       3129
ESTs                                       2655
Connections by experimental evidence     37 516
Direct connections                       15 142
Connected objects                        36 058
Objects positioned on maps                 4929


CONTENT
IXDB has integrated data from 16 regional and nine chromosome-scale projects. It contains seven chromosome-spanning maps, which include the consensus marker maps produced as a result of the X chromosome workshops of 1994 (1) and 1995 (2), the genetic map constructed by Généthon (3), the first X chromosome-specific YAC map assembled at the MPIMG (4), an EST map based on radiation hybrid mapping (5) and an STS map (6). Table 1 summarises the number of objects currently stored in the main classes of IXDB.
IXDB stores 40 000 different DNA objects related to the X chromosome, excluding aliases and synonyms. These are numerous: 80% of STSs have at least two names, and each YAC has an average of 2.7 aliases, none of which are counted in Table 1. IXDB is used as a repository for data on the cX YAC collection assembled at the MPIMG and distributed by the German Resource Centre (RZPD, http://www.rzpd.de). The cX YAC collection contains 9000 clones mapped on the human X chromosome and contributed by 14 laboratories world-wide (http://www.mpimg-berlin-dahlem.mpg.de/~xteam).

WWW ACCESS
Access to IXDB is provided via a WWW interface (Fig. 1; http://ixdb.mpimg-berlin-dahlem.mpg.de/). Queries can be performed in several ways: by name, keyword, internal or external identification number (e.g. an EMBL accession number) or by map position. A successful search dynamically generates a report on the matching object that contains all stored information, including general annotations, links to other databases, results of experiments it participated in, connected objects and map positions (Fig. 2). Once in IXDB, a user may also choose to navigate more intuitively by clicking on hyperlinks that retrieve information on objects related to the one on display. Every object is assigned a stable internal identifier, which is used for references. This identifier, together with the appropriate URL, can be used to generate links from other databases into IXDB.
Maps can be viewed either as a simple HTML table or via DerBrowser, a Java applet that turns this list into an interactive graphical representation (Fig. 3). Clicking on an object on the map queries the database and presents the object report in a new browser window. The applet also allows certain object types to be excluded and provides a zooming function. Entire maps can be viewed at once, or smaller regions can be specified using flanking objects (markers, genes, etc.) or coordinates in kilobase pairs. DerBrowser can also display two maps side by side to compare their order. Although the applet is platform independent, the size, position and font of the text and the zooming factor can be configured to better suit different environments.
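The two ways of specifying a map region mentioned above, by flanking objects or by coordinates in kilobase pairs, amount to the same selection once the flanking objects are resolved to positions. The Python sketch below assumes a map is a list of `(name, position_kb)` pairs with one position per object; the function names and this layout are assumptions, not DerBrowser code.

```python
def region_by_coordinates(positions, start_kb, end_kb):
    """Return the objects lying between two coordinates, in map order."""
    lo, hi = sorted((start_kb, end_kb))
    inside = [p for p in positions if lo <= p[1] <= hi]
    return sorted(inside, key=lambda p: p[1])


def region_by_flanking(positions, left_name, right_name):
    """Return the region delimited by two named objects (inclusive)."""
    coords = dict(positions)
    # A KeyError here means a flanking object is not placed on this map.
    return region_by_coordinates(
        positions, coords[left_name], coords[right_name]
    )
```

The flanking objects may be given in either order, since the coordinates are sorted before the region is selected.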
DATA SUBMISSION
Data are currently actively sought by scanning the literature and examining databases and laboratory servers. All pieces of information are tagged with their original source. In addition, we encourage the direct submission of data to IXDB. We have defined a simple yet powerful file format called IACE, for which documentation is available on the IXDB WWW site. IACE files can easily be generated from any well-structured format such as database entries or Excel spreadsheets. Help will be provided on request.
Since version 2.0, information can be kept confidential to a user or group of users. Private data are transparently included in the WWW reports for the owner, but are invisible to the rest of the world. For this purpose, a special, so-called private query mode has been introduced, which is maintained via a standard authentication procedure.
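The visibility rule for private data can be sketched in a few lines of Python. The record layout (`owner_group`, with `None` meaning public) and the function `visible_records` are assumptions made for this illustration; the authentication step that establishes a user's groups is outside its scope.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Record:
    object_name: str
    detail: str
    owner_group: Optional[str] = None   # None marks public data


def visible_records(records, user_groups=()):
    """Return the records an (already authenticated) user may see."""
    groups = set(user_groups)
    return [
        r for r in records
        if r.owner_group is None or r.owner_group in groups
    ]
```

An anonymous query passes no groups and sees only public records, while an owner sees private and public records merged in the same report.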
IMPLEMENTATION
IXDB is a relational database built on top of the commercial database management system ORACLE v7.3. The relational data schema consists of approximately 40 tables. Documentation can be obtained from our WWW site. The WWW interface and other tools are written in Tcl/Tk (7) and use the ORATCL package written by Tom Poindexter (http://www.nyx.net/~tpoindex/tcl.html).
We decided to use a relational rather than an object-oriented database for two main reasons. First, the schema of a scientific database such as IXDB, which reflects a rapidly evolving area of technology, cannot be designed perfectly from scratch. It evolves continuously as new experiments are carried out or new tasks are included, and this schema evolution is not well supported by current object-oriented database management systems. Second, our intention was to create a generic object representation, in which identity takes priority over classification. Objects frequently changed class as new information appeared, which would have required considerable programming in an object-oriented environment.
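The second design argument, identity before classification, can be illustrated with a small relational example. The sketch below uses Python's built-in `sqlite3` module purely as a stand-in for ORACLE and Tcl, and the two tables and their columns are hypothetical, not taken from the IXDB schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object (
    object_id INTEGER PRIMARY KEY,   -- stable internal identifier
    name      TEXT NOT NULL,
    category  TEXT NOT NULL          -- classification is only an attribute
);
CREATE TABLE connection (
    from_id INTEGER NOT NULL REFERENCES object(object_id),
    to_id   INTEGER NOT NULL REFERENCES object(object_id),
    method  TEXT NOT NULL
);
""")
conn.execute("INSERT INTO object VALUES (1, 'probe-A', 'anonymous DNA')")
conn.execute("INSERT INTO object VALUES (2, 'yac-B', 'YAC')")
conn.execute("INSERT INTO connection VALUES (1, 2, 'hybridisation')")

# New evidence reclassifies the probe: one UPDATE, identity and links intact.
conn.execute("UPDATE object SET category = 'EST' WHERE object_id = 1")

row = conn.execute("""
    SELECT o.category, c.method, t.name
    FROM object o
    JOIN connection c ON c.from_id = o.object_id
    JOIN object t ON t.object_id = c.to_id
    WHERE o.object_id = 1
""").fetchone()
```

Because connections refer to the identifier rather than to a class-specific table, reclassifying an object leaves its relationships untouched.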
WORK IN PROGRESS
IXDB receives over 1000 queries per month from the community at large, an indication that its current content is valuable to researchers seeking mapping data on the X chromosome. Integration of data into IXDB is, however, still the highest priority, both going back in time and keeping up with new results. A recently created European consortium to construct a transcript map of the human X chromosome should assist in the latter objective. In this project, IXDB plays a central role in pooling results and coordinating their access and dissemination.
In addition to updating and curating the data, several technical developments are in progress. DerBrowser is an essential part of the database and is being turned into a drag-and-drop interface that allows the graphical manipulation of object positions on maps via the WWW. This would allow external users to create, move or delete objects on their maps directly, and would greatly enhance community curation. IXDB offers a service by providing access to an integrated pool of information on the X chromosome. Our purpose is also to construct an integrated physical, genetic, transcript and sequence map that takes into account all the data stored in the database, and new software is being developed to perform the necessary analysis.
ACKNOWLEDGEMENT
This work is supported by the European Community under grant CT961134.
REFERENCES
Last modification: 17 Dec 1997
Copyright© Oxford University Press, 1998.