
Nucleic Acids Research Pages 108-111  


IXDB, an X chromosome integrated database

Ulf Leser+, Robert Wagner, Andrei Grigoriev, Hans Lehrach, Hugues Roest Crollius*

Max-Planck-Institut für Molekulare Genetik, Ihnestrasse 73, D-14195 Berlin, Germany

Received September 4, 1997; Revised and Accepted October 28, 1997

ABSTRACT

The integrated X chromosome database (IXDB) is a repository for physical mapping data of the human X chromosome. Its current content results from a strict integration of data from many different sources. The main features of IXDB include a flexible and extensible schema, a user-friendly and fully cross-referenced WWW interface (http://ixdb.mpimg-berlin-dahlem.mpg.de) and a graphical map viewer implemented in JAVA. The database stores objects used in physical mapping as well as the maps resulting from this work, but a strong emphasis is placed on recording the experiments that connect objects together. This should contribute greatly to one of the major goals of the database: supporting the construction of an integrated physical, genetic, transcript and sequence map of the human X chromosome.

INTRODUCTION

Physical mapping is a complex task that consists essentially of detecting and analysing connections between thousands of genomic objects in order to gain further insight into the structure of the genome. Objects include clones of different types and hosts, and markers of various origins (e.g. microsatellites, anonymous DNA, transcripts). Clones are usually stored and distributed in large libraries, and those of interest are frequently re-arrayed into more specific collections, a process that usually involves renaming them. Probes, on the other hand, are often derived from other objects, such as clone ends or parts of transcripts. Keeping track of the relationships between objects and of their different names is necessary when analysing data from more than one project. Hence, tight integration of the data is a prerequisite for assembling into one reference map the fragmented views that each separate data set currently provides. IXDB is a database that strives to achieve this integration and to provide a comprehensive source of data for physical mapping of the human X chromosome. Its current content has been extracted from a number of laboratory-specific data sets, personal communications, public databases and publications. Specific problems arising during the integration of data from external sources strongly influence the architecture of the system. This is reflected in a flexible schema that allows new links and new object types to be established instantly. Importing new data can create contradictions in the database and requires extensive comparison with the existing data set. Most import tasks have therefore been automated via an intermediary data format and dedicated software. The software checks new objects against existing ones, flags potential identities across tables and rejects semantically inconsistent data. Ultimately, the purpose of IXDB is to support the creation of new integrated maps.
In this process, we distinguish data integration from map integration: the former is semantically difficult but algorithmically simple, whereas the latter is semantically clear but computationally complex. Data integration, however, is an absolute prerequisite for efficient map integration and is therefore the main focus of IXDB in its current phase.
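The import check described above can be illustrated with a minimal Python sketch. It assumes that each incoming object arrives as a name, a class and a list of synonyms, and that allowed relationships are given as (type, source class, target class) triples. The vocabulary, the class names, the object names and the function `check_import` are hypothetical; this does not reproduce IXDB's actual loader or its intermediary format.

```python
from dataclasses import dataclass, field

# Hypothetical controlled vocabulary: which relationship type may link
# which classes of object, as (type, source class, target class).
ALLOWED_LINKS = {
    ("subclone_of", "cosmid", "YAC"),
    ("derived_from", "STS", "YAC"),
    ("hybridises_to", "probe", "YAC"),
}


@dataclass
class ImportReport:
    new_objects: list = field(default_factory=list)
    possible_identities: list = field(default_factory=list)  # (incoming, existing)
    rejected_links: list = field(default_factory=list)


def check_import(existing, incoming_objects, incoming_links):
    """Compare an incoming batch against the database before loading it.

    existing:         dict mapping object name -> class, already stored
    incoming_objects: list of (name, class, synonyms)
    incoming_links:   list of (relationship_type, source_name, target_name)
    """
    report = ImportReport()
    known = {name.lower(): name for name in existing}  # case-insensitive lookup
    classes = dict(existing)
    for name, cls, synonyms in incoming_objects:
        hits = {known[n.lower()] for n in (name, *synonyms) if n.lower() in known}
        if hits:
            # A shared name suggests the same object: flag it, do not merge.
            report.possible_identities.extend((name, hit) for hit in sorted(hits))
        else:
            report.new_objects.append(name)
        classes[name] = cls
    for rel, src, dst in incoming_links:
        # Reject links whose type is not allowed between these two classes.
        if (rel, classes.get(src), classes.get(dst)) not in ALLOWED_LINKS:
            report.rejected_links.append((rel, src, dst))
    return report
```

Shared names are only flagged here, not merged, because deciding whether two entries describe the same object remains a curation decision.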

DATA REPRESENTATION

The data schema of IXDB is centred on genomic objects, their inter-object relationships and their positions on physical, genetic or radiation hybrid maps. Each object can have any number of positions on any number of maps. Object categories include a variety of clone types (YAC, PAC, BAC, cosmids, etc.), marker types (amplimers, hybridisation probes, etc.), ESTs, loci and genes. Objects are annotated with details such as length, chimerism and maximum heterozygosity. Each piece of information is tagged with its source, and different sources can provide contradictory values for the same object without compromising the consistency of the database. WWW links to other databases such as GDB, GenBank, EMBL, OMIM or dbEST are used extensively. Objects in IXDB are connected either by experimental evidence or by the fact that one was directly derived from another object in the database. Experimental evidence covers a number of different methods (hybridisations, gel fingerprints, PCR assays, etc.), which may be annotated to describe the strength of a link (e.g. probability of overlap by fingerprint, intensity of a hybridisation signal). Direct connections, on the other hand, describe the generation of one object from another: for instance, the preparation of an amplimer from the end sequence of a clone, or the subcloning of cosmids from a YAC. The number of relationship types is strictly controlled but can easily be extended when new data require it.
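To make the schema description more concrete, the following sketch expresses a few of its ideas as relational tables, using Python's built-in `sqlite3` module as a stand-in for the ORACLE system IXDB actually runs on. All table, column, source and object names are hypothetical. The sketch only shows source-tagged annotations that may contradict each other, a controlled vocabulary of relationship types split into experimental and direct connections, and an optional strength value for experimental links.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript("""
CREATE TABLE source (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT NOT NULL,
                     class TEXT NOT NULL);            -- 'YAC', 'cosmid', 'STS', ...
-- Every annotation carries its source, so sources may disagree.
CREATE TABLE annotation (
    object_id INTEGER NOT NULL REFERENCES object(id),
    attribute TEXT NOT NULL,                          -- e.g. 'length_kb'
    value     TEXT NOT NULL,
    source_id INTEGER NOT NULL REFERENCES source(id),
    PRIMARY KEY (object_id, attribute, source_id));
-- Controlled vocabulary of relationship types; extending it is an INSERT.
CREATE TABLE relation_type (
    name TEXT PRIMARY KEY,
    kind TEXT NOT NULL CHECK (kind IN ('experimental', 'direct')));
CREATE TABLE relation (
    type      TEXT NOT NULL REFERENCES relation_type(name),
    from_id   INTEGER NOT NULL REFERENCES object(id),
    to_id     INTEGER NOT NULL REFERENCES object(id),
    strength  REAL,               -- e.g. overlap probability; NULL for direct links
    source_id INTEGER NOT NULL REFERENCES source(id));
""")

conn.executemany("INSERT INTO source VALUES (?, ?)", [(1, "lab_A"), (2, "lab_B")])
conn.executemany("INSERT INTO object VALUES (?, ?, ?)",
                 [(1, "681_F_6", "YAC"), (2, "cos_1", "cosmid")])
# Two sources report different lengths for the same YAC; both are stored.
conn.executemany("INSERT INTO annotation VALUES (?, ?, ?, ?)",
                 [(1, "length_kb", "450", 1), (1, "length_kb", "520", 2)])
conn.executemany("INSERT INTO relation_type VALUES (?, ?)",
                 [("fingerprint_overlap", "experimental"), ("subclone_of", "direct")])
# A direct connection: cos_1 was subcloned from the YAC.
conn.execute("INSERT INTO relation VALUES ('subclone_of', 2, 1, NULL, 1)")

lengths = conn.execute(
    "SELECT s.name, a.value FROM annotation a JOIN source s ON s.id = a.source_id "
    "WHERE a.object_id = 1 AND a.attribute = 'length_kb' ORDER BY s.name").fetchall()

# A relationship type outside the vocabulary violates the foreign key.
try:
    conn.execute("INSERT INTO relation VALUES ('unknown_type', 1, 2, NULL, 1)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

In this design, extending the vocabulary for a new kind of data is a single row insertion into `relation_type`, while the foreign key keeps unlisted types out.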

Table 1. Number of genomic objects and relationships37 516
YACs 17 153
Other clones 2300
Genes 409
STSs 3129
ESTs 2655
Connections by experimental evidence
Direct connections 15 142
Connected objects 36 058
Objects positioned on maps 4929


Figure 1 The WWW query page of IXDB. Objects can be searched by name, keyword or external identifier. The result can be either a general report about the object or a list of overlapping objects. Direct links are provided to all the maps and all the genes stored in IXDB. The complete content of IXDB can be reviewed and object counts calculated at any time.


Figure 2 A report about the YAC `681_F_6'. Aliases of this YAC are given explicitly. The report gathers information on all aliases. Annotations are provided by different sources and can be contradictory.

Situations where an object is given several different names occur frequently in genomics. IXDB uses synonyms to give additional names to an object, all of which can be used in queries. For some DNA objects, however, different names describe different versions of one original, unique object. For instance, a YAC clone may be distributed to a number of different laboratories and evolve differently with each new culturing. We have defined a special relationship, which we call an alias, to represent this situation. Each alias corresponds to a different entry in the database, with its own annotations and relations to other objects. However, all aliases within a group are likely to yield the same results, since they are likely to contain similar biological material. Therefore, a query for any one alias always retrieves a report on all aliases. Objects can also be annotated with contradictory information from different sources. For instance, different laboratories may measure different lengths or determine a different chimerism status for a clone. IXDB stores all such values with equal standing. This makes IXDB well suited to merging and integrating data from different database systems and nomenclatures without having to make arbitrary decisions about which view is more adequate or reflects reality more accurately.
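The difference between synonyms and aliases can be sketched as follows. In this hypothetical Python model, a synonym is simply another name pointing at the same entry, whereas an alias is a separate entry, with its own annotations, that belongs to an alias group; a query through any name returns the annotations of every entry in the group. The class `NameIndex`, the entry identifiers, the synonym `y681F6` and the alias name `681_F_6_labB` are illustrative assumptions, not part of IXDB.

```python
from collections import defaultdict


class NameIndex:
    """Hypothetical model: synonyms name one entry; aliases are separate
    entries grouped so that a query on any alias reports the whole group."""

    def __init__(self):
        self.entry_of_name = {}               # name or synonym -> entry id
        self.alias_group = {}                 # entry id -> group id
        self.members = defaultdict(list)      # group id -> [entry ids]
        self.annotations = defaultdict(list)  # entry id -> [(attribute, value, source)]

    def add_entry(self, entry_id, name, synonyms=(), group=None):
        # The main name and all synonyms resolve to the same entry.
        for n in (name, *synonyms):
            self.entry_of_name[n] = entry_id
        # Without an explicit group, an entry starts its own alias group.
        group = entry_id if group is None else group
        self.alias_group[entry_id] = group
        self.members[group].append(entry_id)

    def annotate(self, entry_id, attribute, value, source):
        # Source-tagged values are appended, never overwritten.
        self.annotations[entry_id].append((attribute, value, source))

    def report(self, name):
        """Annotations for every alias in the group of the queried name."""
        group = self.alias_group[self.entry_of_name[name]]
        return {e: list(self.annotations[e]) for e in self.members[group]}


idx = NameIndex()
idx.add_entry("E1", "681_F_6", synonyms=["y681F6"])
idx.add_entry("E2", "681_F_6_labB", group="E1")  # same YAC, cultured elsewhere
idx.annotate("E1", "length_kb", "450", "lab_A")
idx.annotate("E2", "length_kb", "520", "lab_B")
```

Here `idx.report("y681F6")` and `idx.report("681_F_6_labB")` both return both entries, each with its own, possibly contradictory, length.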

CONTENT

IXDB has integrated data from 16 regional and nine chromosome-scale projects. It contains seven chromosome-spanning maps, including the consensus marker maps produced by the 1994 (1) and 1995 (2) X chromosome workshops, the genetic map constructed by Généthon (3), the first X chromosome-specific YAC map assembled at the MPIMG (4), an EST map based on radiation hybrid mapping (5) and an STS map (6). Table 1 summarises the number of objects currently stored in the main classes of IXDB.


IXDB stores 40 000 different DNA objects related to the X chromosome, excluding aliases and synonyms, which are not accounted for in Table 1: for instance, 80% of STSs have at least two names, and each YAC has on average 2.7 aliases. IXDB is used as a repository for data on the cX YAC collection assembled at the MPIMG and distributed by the German Resource Centre (RZPD, http://www.rzpd.de). The cX YAC collection contains 9000 clones mapped to the human X chromosome and contributed by 14 laboratories worldwide (http://www.mpimg-berlin-dahlem.mpg.de/~xteam).


Figure 3 Graphical map of the DMD region displayed using `DerBrowser'. Objects are arranged in horizontal stripes. At the top, cytogenetic bands help orient the map. Genes are shown in green, followed by YAC clones in various colours reflecting the experimental data. EST bins from the human transcript map consortium have been integrated with this map and are represented as long blocks under the YACs. The markers and the scale are below and require scrolling down to be viewed. Any of these stripes can be removed instantly by unchecking it in the list currently superimposed on the map. Object names are written inside the objects when the zoom level permits. The lowest YAC clone is selected, and information similar to that in Figure 2 can be retrieved from IXDB by pressing the `About' button. This display can be reached, for instance, by querying for the DMD gene, clicking on any of the positions given on the integrated3 map and pressing `Show Map' on the next page, which allows the map slice to be enlarged or reduced before querying.

WWW ACCESS

Access to IXDB is provided via a WWW interface (Fig. 1; http://ixdb.mpimg-berlin-dahlem.mpg.de/). Queries can be performed in several ways: by name, keyword, internal or external identification number (e.g. an EMBL accession number) or by map position. A successful search dynamically generates a report on the appropriate object that contains all stored information, including general annotations, links to other databases, results of experiments it participated in, connected objects and map positions (Fig. 2). Once in IXDB, a user may also choose to navigate more intuitively by clicking on hyperlinks that retrieve information on objects related to the one on display. Every object is assigned a stable internal identifier, which is used for references. This identifier, together with the appropriate URL, can be used to generate links from other databases into IXDB. Maps can be viewed either as a simple HTML table or via DerBrowser, a JAVA applet that turns this list into an interactive graphical representation (Fig. 3). Clicking on an object on the map queries the database and presents the object report in a new browser window. The applet also allows certain object types to be excluded and provides a zooming function. Entire maps can be viewed at once, or smaller regions can be specified using flanking objects (markers, genes, etc.) or coordinates in kilobase pairs. DerBrowser can also compare the order of two maps side by side. The applet is platform independent, and the size, position and font of the text and the zoom factor can be configured to better suit different environments.
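Two of DerBrowser's map operations, selecting a region by flanking objects or by kilobase coordinates and comparing the order of two maps, can be sketched in a few lines. DerBrowser itself is a JAVA applet; this Python version is only an illustration that assumes a map is an ordered list of object names (or a name-to-kb dictionary) and that order disagreement is reported as the discordant pairs of shared objects. The function names and marker names are hypothetical.

```python
from itertools import combinations


def region_by_flanks(map_order, left, right):
    """Objects between two flanking objects (inclusive) on an ordered map."""
    i, j = sorted((map_order.index(left), map_order.index(right)))
    return map_order[i:j + 1]


def region_by_kb(positions, start_kb, end_kb):
    """Objects whose position (in kb) lies in [start_kb, end_kb], in map order."""
    hits = [o for o, p in positions.items() if start_kb <= p <= end_kb]
    return sorted(hits, key=positions.get)


def discordant_pairs(map_a, map_b):
    """Pairs of shared objects whose relative order differs between two maps."""
    in_b = set(map_b)
    shared = [o for o in map_a if o in in_b]  # kept in map_a order
    pos_b = {o: k for k, o in enumerate(map_b)}
    # x precedes y in map_a; the pair is discordant if map_b places y first.
    return [(x, y) for x, y in combinations(shared, 2) if pos_b[x] > pos_b[y]]


map_a = ["M1", "M2", "M3", "M4", "M5"]
map_b = ["M1", "M3", "M2", "M4", "M9"]
print(region_by_flanks(map_a, "M4", "M2"))  # ['M2', 'M3', 'M4']
print(discordant_pairs(map_a, map_b))       # [('M2', 'M3')]
```

Objects present on only one map (`M5`, `M9`) are ignored by the comparison, since their relative order cannot be checked.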

DATA SUBMISSION

Data are currently sought actively by scanning the literature and examining databases and laboratory servers. All pieces of information are tagged with their original source. In addition, we encourage the direct submission of data to IXDB. We have defined a simple yet powerful file format called IACE, for which documentation is available on the IXDB WWW site. IACE files can easily be generated from any well-structured format, such as database entries or EXCEL spreadsheets. Help will be provided on request. Since version 2.0, information can be kept confidential to a user or group of users. Private data are transparently included in the WWW reports for their owner but are invisible to everyone else. To support this, a special private query mode has been introduced, protected by a standard authentication procedure.
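The visibility rule for private data can be sketched as a simple filter. The sketch assumes that each stored record is either public or belongs to one owner group, and that the user's groups are already known once authentication has succeeded (the authentication step itself is not shown). `Record`, `visible_records`, the group names and `probe_7` are hypothetical, not IXDB code.

```python
from dataclasses import dataclass
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class Record:
    object_name: str
    text: str
    owner_group: Optional[str] = None  # None: public; otherwise private to that group


def visible_records(records, user_groups: FrozenSet[str] = frozenset()):
    """Public records plus private records owned by one of the user's groups."""
    return [r for r in records
            if r.owner_group is None or r.owner_group in user_groups]


records = [
    Record("681_F_6", "length 450 kb (lab_A)"),
    Record("681_F_6", "hybridises to probe_7", owner_group="lab_B"),
]
anonymous_view = visible_records(records)                   # public record only
lab_b_view = visible_records(records, frozenset({"lab_B"}))  # both records
```

Because the filter is applied when a report is built, the owner sees private and public data merged in one report, while other users never learn that the private record exists.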

IMPLEMENTATION

IXDB is a relational database built on top of the commercial database management system ORACLE v7.3. The relational data schema consists of approximately 40 tables; documentation can be obtained from our WWW site. The WWW interface and other tools are written in Tcl/Tk (7) and use the ORATCL package written by Tom Poindexter (http://www.nyx.net/~tpoindex/tcl.html). We chose a relational rather than an object-oriented database for two main reasons. First, the schemas of scientific databases such as IXDB, which reflect a rapidly evolving area of technology, cannot be designed perfectly from scratch. They evolve continuously as new experiments are carried out or new tasks are included, and such schema evolution is not well supported by current object-oriented database management systems. Second, our intention was to create a generic object representation in which identity takes priority over classification. Objects frequently changed class as new information became available, which would have required considerable programming in an object-oriented environment.
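The two arguments for a relational design can be illustrated with a small example, again using Python's `sqlite3` as a stand-in for ORACLE and hypothetical table, column and object names. Because an object's identity (its `id`) is stored separately from its classification (the `class` column), reclassifying an object is a single `UPDATE` that leaves every relationship pointing at it intact, and a new attribute can be added to the live schema with `ALTER TABLE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE object (id INTEGER PRIMARY KEY, name TEXT NOT NULL, class TEXT NOT NULL);
CREATE TABLE relation (type    TEXT NOT NULL,
                       from_id INTEGER NOT NULL REFERENCES object(id),
                       to_id   INTEGER NOT NULL REFERENCES object(id));
""")
conn.executemany("INSERT INTO object VALUES (?, ?, ?)",
                 [(1, "681_F_6", "YAC"), (2, "probe_7", "anonymous_probe")])
conn.execute("INSERT INTO relation VALUES ('hybridises_to', 2, 1)")

# New information: the anonymous probe turns out to be an EST.
# Identity (id) is separate from classification (class), so this is one
# UPDATE and the relation rows referring to id 2 remain valid.
conn.execute("UPDATE object SET class = 'EST' WHERE id = 2")

# Schema evolution: a new kind of annotation becomes available.
conn.execute("ALTER TABLE object ADD COLUMN genbank_acc TEXT")
columns = [c[1] for c in conn.execute("PRAGMA table_info(object)")]

row = conn.execute("""
    SELECT o.name, o.class, r.type, t.name
    FROM relation r JOIN object o ON o.id = r.from_id
                    JOIN object t ON t.id = r.to_id
""").fetchone()
print(row)  # ('probe_7', 'EST', 'hybridises_to', '681_F_6')
```

In an object-oriented design where each class is a separate type, the same change would typically mean creating a new object of the new class and re-pointing every reference to it.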

WORK IN PROGRESS

IXDB receives over 1000 queries per month from the community at large, an indication that its current content is valuable to researchers seeking mapping data on the X chromosome. Integrating data into IXDB nevertheless remains the highest priority, both for past results and for keeping up with new ones. A recently created European consortium to construct a transcript map of the human X chromosome should help with the latter objective. In this project, IXDB plays a central role in pooling results and coordinating their access and dissemination. In addition to updating and curating the data, several technical developments are in progress. DerBrowser, an essential part of the database, is being turned into a drag-and-drop interface that allows the graphical manipulation of object positions on maps via the WWW. This would allow external users to create, move or delete objects on their maps directly, and would greatly enhance community curation. IXDB offers a service by providing access to an integrated pool of information on the X chromosome. Our purpose is also to construct an integrated physical, genetic, transcript and sequence map that takes into account all the data stored in the database, and new software is being developed to perform the necessary analysis.

ACKNOWLEDGEMENT

This work is supported by the European Community under grant CT961134.

REFERENCES

1. Willard,H.F., Cremers,F., Mandel,J.L., Monaco,A.P., Nelson,D.L. and Schlessinger,D. (1994) Cytogenet. Cell Genet., 67, 295-358.

2. Nelson,D.L., Ballabio,A., Cremers,F., Monaco,A.P. and Schlessinger,D. (1995) Cytogenet. Cell Genet., 71, 307-342.

3. Dib,C., Faure,S., Fizames,C., Samson,D., Drouot,N., Vignal,A., Millasseau,P., Marc,S., Hazan,J., Seboun,E., et al. (1996) Nature, 380, 152-154.

4. Roest Crollius,H., Ross,M.T., Grigoriev,A., Knights,C.J., Holloway,E., Misfud,J., Li,K., Playford,M., Gregory,S.J., Humphray,S.J., et al. (1996) Genome Res., 6, 943-955.

5. Schuler,G.D., Boguski,M.S., Stewart,E.A., Stein,L.D., Gyapay,G., Rice,K., White,R.E., Rodriguez-Tome,P., Aggarwal,A., Bajorek,E., et al. (1996) Science, 274, 540-546.

6. Nagaraja,R., Macmillan,S., Kere,J., Jones,C., Griffin,S., Schmatz,M., Terrell,J., Shomaker,M., Jermak,C., Hott,C., et al. (1997) Genome Res., 7, 210-222.

7. Ousterhout,J.K. (1994) Tcl and the Tk toolkit, Addison-Wesley.


*To whom correspondence should be addressed at present address: Genoscope, Centre National de Séquençage, BP 191, 91000 Evry cedex, France. Tel: +33 1 60 87 25 64; Fax: +33 1 60 87 25 89; Email: hrc@genoscope.cns.fr
+Present address: Technische Universität Berlin, Fachbereich 13, CIS Group, Einsteinufer 17, D-10587 Berlin, Germany


Copyright© Oxford University Press, 1998.
