Nucleic Acids Research
GXD: a Gene Expression Database for the laboratory mouse
ABSTRACT
INTRODUCTION
Knowledge of gene expression patterns is crucial for understanding the function of genes and the molecular mechanisms that underlie normal development and disease. Gene expression analysis has therefore become a major focus of biomedical research. The mouse serves as a pivotal animal model in these studies, because it is closely related to humans and because tissues from many different mouse strains and mutants are readily available for detailed expression analysis.
To cope with the large volumes and the complexity of gene expression data for the laboratory mouse, we developed the Gene Expression Database GXD (1,2). GXD is designed as an open-ended system that can integrate many different types of expression data. Thus, as data accumulate, GXD can provide increasingly complete information about which transcripts and proteins are produced by which genes; where, when and in what amounts these gene products are expressed; and how their expression is affected in different mouse strains and mutants. Expression patterns are described using a comprehensive dictionary of anatomical terms, and the records are directly connected to digitized images of original expression data. The parts of the dictionary covering embryonic development were developed, and are being completed, by our colleagues in Edinburgh, UK (3), with whom we are collaborating on building the more comprehensive Mouse Gene Expression Information Resource. This resource will eventually provide integrated access to GXD, an anatomy database, and a 3D atlas/graphical gene expression database for mouse development that will enable 3D graphical storage, display and analysis of in situ expression patterns (1,3,4).
GXD addresses many issues that apply to gene expression information and to gene expression databases in general. Here, we illustrate the design and implementation of GXD, and describe the data sets and query capabilities that are currently available.
DESIGN AND IMPLEMENTATION OF GXD
Different types of expression assays reveal different but complementary gene expression information. cDNA clones and ESTs provide sequence information for transcripts, and expression information based on, but limited to, tissue derivation. High-throughput expression methods, such as the analysis of cDNA arrays with complex probes, hold the promise of providing large amounts of quantitative, dot-blot-type expression data (5-7).
Northern and western blot analyses reveal the number and sizes of transcripts and proteins. Methods such as RNase protection and RT-PCR can detect small quantities of transcripts as well as small structural differences between them. RNA in situ hybridization and immunohistochemistry furnish the most detailed spatial expression information, with resolution at the cellular level. Only by combining different expression analysis methods can we determine what all the different gene products of a given gene are, and when and where they are expressed.
Accordingly, GXD is designed to integrate different types of expression data. How this is achieved is illustrated in Figure 1.
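The assay-centred design summarized in the Figure 1 caption can be sketched as a small object model. The following is a minimal, hypothetical Python sketch. It is not GXD's Sybase schema, and the names `Detection`, `Probe`, `Band`, `Lane`, `GelAssay` and `detected_band_sizes` are illustrative assumptions. It shows three things: each assay hangs off a probe, each gel band is its own record, and `Absent' is stored as an explicit value rather than left implicit.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class Detection(Enum):
    ABSENT = "Absent"    # analyzed but not detected: kept as a negative result
    PRESENT = "Present"  # detected; strength may be qualified further


# Subjective qualifiers that may accompany "Present" when authors give them.
QUALIFIERS = ("Trace", "Weak", "Moderate", "Strong")


@dataclass(frozen=True)
class Probe:
    name: str
    kind: str                          # "nucleotide" or "antibody"
    seq_xrefs: Tuple[str, ...] = ()    # sequence-database accession numbers


@dataclass
class Band:
    """One gel band, stored separately because it stands for a distinct product."""
    size_kb: float
    detection: Detection
    qualifier: Optional[str] = None

    def __post_init__(self) -> None:
        # A qualifier only makes sense for a detected band, and must be a known one.
        if self.qualifier is not None and (
            self.detection is Detection.ABSENT or self.qualifier not in QUALIFIERS
        ):
            raise ValueError(f"invalid qualifier {self.qualifier!r} for {self.detection.value}")


@dataclass
class Lane:
    structure: str    # term from the anatomical dictionary
    stage: str        # developmental stage of the sample
    bands: List[Band] = field(default_factory=list)


@dataclass
class GelAssay:
    assay_type: str   # e.g. "Northern blot" or "RT-PCR"
    gene_symbol: str
    probe: Probe
    strain: str
    genotype: str
    lanes: List[Lane] = field(default_factory=list)

    def detected_band_sizes(self) -> List[float]:
        """Sorted sizes of every band reported as present, across all lanes."""
        return sorted(
            band.size_kb
            for lane in self.lanes
            for band in lane.bands
            if band.detection is Detection.PRESENT
        )


if __name__ == "__main__":
    # Invented example data, for illustration only.
    assay = GelAssay(
        "Northern blot", "GeneX", Probe("probe-1", "nucleotide"), "StrainA", "+/+",
        [Lane("kidney", "adult", [Band(4.1, Detection.PRESENT, "Strong"),
                                  Band(1.9, Detection.PRESENT)]),
         Lane("liver", "adult", [Band(4.1, Detection.ABSENT)])],
    )
    print(assay.detected_band_sizes())  # [1.9, 4.1]
```

Because each band is its own record, band sizes can be queried directly. The explicit `Absent' value keeps "not detected" distinct from "not analyzed".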
Figure 1. The GXD concept. The probe, whether an antibody or a nucleotide probe, is the key to each assay. Primary expression data are stored and can be combined to synthesize higher-order information. For all assay classes, the time and tissue of expression and the genetic origin of the sample are described. For gel assays, additional fields describe the number and size of bands detected. Each band is treated as a unique entity in the database, as it represents a distinct transcript or protein. Sequence information is stored by cross-reference to sequence databases.
GXD is implemented in the Sybase relational database system. It is integrated with the Mouse Genome Database (Blake et al., this issue) and provides links to many other resources, so that the expression data can be analyzed in a broader biological context. In the following, we give a brief account of the most important contents and fields of the database, and their significance for database queries.
The probe used to measure gene expression is described in detail, because it constitutes the molecular window for the assay. If the information is available, nucleotide probes, or the primary antigens against which an antibody was generated, are described at the sequence level by cross-reference to sequence databases, using accession numbers and standard location descriptors. These annotations enable integration with, and searches of, the rapidly accumulating information contained in sequence databases (8-12).
`Gene symbol' and `Gene name' of the gene whose expression is analyzed (not shown in Fig. 1) […]
The time and location of gene expression are described using an extensive Dictionary of Anatomical Terms, and digitized images of original expression data are directly connected with these records, so that the primary expression data become accessible to global searches. The anatomical dictionary names the tissues and structures for each developmental stage. The terms are organized hierarchically, from body region or system to tissue to tissue substructure (3). This model enables an integrated description of expression patterns for assays with differing spatial resolution, computational analysis of expression patterns at different levels of detail, and continuous extension of the anatomical hierarchy itself. The anatomy system can and will also be used for the standardized description of other anatomy-related phenomena, such as mutant phenotypes and diseases (see e.g., Blake et al., and Bult et al., this issue), and will thus enable an integrated analysis of these data sets.
Expression levels for the assay types currently captured by GXD are described qualitatively as `Absent' or `Present', with additional subjective qualifiers for `Present', such as `Trace', `Weak', `Moderate' and `Strong', when provided by the authors. `Absent' means that expression was analyzed but not detected. It is important to note that GXD stores such `negative expression data' and allows searches for this information.
Information about the `Mouse Strain' and the `Genotype' of the animal in which expression was studied enables a comparative analysis of expression patterns in wild-type and mutant animals, and in different genetic backgrounds. This will facilitate the dissection of molecular networks and the analysis of modifiers and complex traits.
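The hierarchical dictionary and the explicit `Absent' value together support queries such as "genes detected (or not detected) in a structure or any of its substructures". The Python sketch below assumes a toy child-to-parent term table and made-up results. The term names and the helpers `substructures`, `superstructures` and `genes_in` are illustrative assumptions, not the actual GXD anatomy data or query code.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Toy fragment of a hierarchical anatomy dictionary (child -> parent); hypothetical.
PARENT: Dict[str, str] = {
    "nervous system": "embryo",
    "brain": "nervous system",
    "forebrain": "brain",
    "hindbrain": "brain",
    "spinal cord": "nervous system",
}

# Invert the table so each term lists its direct children.
CHILDREN: Dict[str, List[str]] = defaultdict(list)
for child, parent in PARENT.items():
    CHILDREN[parent].append(child)

# Made-up results: (gene, anatomical term, expression detected?).
# False records an analyzed-but-not-detected ("Absent") result.
RESULTS: List[Tuple[str, str, bool]] = [
    ("GeneA", "forebrain", True),
    ("GeneB", "hindbrain", False),
    ("GeneC", "spinal cord", True),
    ("GeneD", "brain", True),
]


def substructures(term: str) -> Set[str]:
    """The term itself plus every term below it in the hierarchy."""
    seen: Set[str] = set()
    stack = [term]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(CHILDREN.get(current, []))
    return seen


def superstructures(term: str) -> List[str]:
    """Ancestors of the term, nearest first."""
    ancestors = []
    while term in PARENT:
        term = PARENT[term]
        ancestors.append(term)
    return ancestors


def genes_in(term: str, detected: bool = True, include_substructures: bool = True) -> List[str]:
    """Genes with a result of the requested kind in the term (and optionally below it)."""
    scope = substructures(term) if include_substructures else {term}
    return sorted({gene for gene, structure, found in RESULTS
                   if structure in scope and found == detected})


print(genes_in("brain"))                                # ['GeneA', 'GeneD']
print(genes_in("brain", detected=False))                # ['GeneB']
print(genes_in("brain", include_substructures=False))   # ['GeneD']
print(superstructures("forebrain"))                     # ['brain', 'nervous system', 'embryo']
```

Storing `False' as a real result, rather than omitting the record, is what lets the second query report genes that were looked for and not found.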
GXD DATA AND QUERY INTERFACES
During the last few years, we have developed GXD from initial prototypes into a robust resource that can be used by the community at large. Data are now being entered on a daily basis and are accessible to relational queries via HTML-based query forms. Query results are returned as summaries that can be sorted in different ways. They link to individual expression entries that, in turn, have comprehensive pointers to related information in GXD, MGD and other relevant databases, such as sequence databases and Medline. Many links (and all links between GXD and MGD) are bi-directional, providing researchers with multiple entry points into the integrated data sets. The following expression data and query forms are currently available:
The GXD Index
The GXD Index is a search tool for locating expression information in the literature. While GXD was under development, we identified all newly published articles documenting data on endogenous gene expression during mouse development. As of 29 August 1998, 37 365 articles had been surveyed and 3663 had been identified as containing relevant expression information. All these articles have been indexed with respect to Authors, Journals, Gene(s), Embryonic Age(s) analyzed and Expression Assays used. The Index currently contains 9881 entries covering expression information for 2565 genes. The index data are updated daily and are searchable via the GXD Index query form (http://www.informatics.jax.org/gxdindex.html, Fig. 2).
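The combinable criteria of the GXD Index query form (gene, age, assay, words in the abstract) amount to a conjunctive filter over index records. The following Python sketch assumes a simplified, hypothetical record layout (`IndexEntry`, `words`, `search`) and uses shell-style wildcards for gene symbols. It does not reproduce the real query form's matching rules. The gene symbols `Abc1`, `Abc2` and `Xyz1` and the references are invented, and they loosely mirror the kind of query shown in Figure 2.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class IndexEntry:
    """Hypothetical index record pairing one reference with one gene."""
    ref_id: str
    journal: str
    year: int
    abstract: str
    gene: str
    ages: FrozenSet[str]     # embryonic ages analyzed, e.g. "E10.5"
    assays: FrozenSet[str]   # expression assay types used


def words(text: str) -> FrozenSet[str]:
    """Lower-case word tokens of `text`, ignoring punctuation."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def search(entries: List[IndexEntry], gene: Optional[str] = None,
           age: Optional[str] = None, assay: Optional[str] = None,
           abstract_word: Optional[str] = None) -> List[str]:
    """IDs of references matching every given criterion (logical AND).

    A criterion left as None matches anything; `gene` accepts shell-style
    wildcards such as "Abc*".
    """
    hits = set()
    for e in entries:
        if gene is not None and not fnmatchcase(e.gene, gene):
            continue
        if age is not None and age not in e.ages:
            continue
        if assay is not None and assay not in e.assays:
            continue
        if abstract_word is not None and abstract_word.lower() not in words(e.abstract):
            continue
        hits.add(e.ref_id)
    return sorted(hits)


# Made-up entries: one reference may yield several gene-specific entries.
ISH = "RNA in situ (section)"
entries = [
    IndexEntry("ref-1", "J1", 1998, "Expression in the developing kidney.",
               "Abc1", frozenset({"E10.5"}), frozenset({ISH})),
    IndexEntry("ref-1", "J1", 1998, "Expression in the developing kidney.",
               "Abc2", frozenset({"E10.5"}), frozenset({ISH})),
    IndexEntry("ref-2", "J2", 1997, "Kidney and gonad expression.",
               "Abc1", frozenset({"E12.5"}), frozenset({"Northern blot"})),
    IndexEntry("ref-3", "J1", 1998, "Limb bud patterning.",
               "Xyz1", frozenset({"E10.5"}), frozenset({ISH})),
]
print(search(entries, gene="Abc*", assay=ISH, abstract_word="kidney"))  # ['ref-1']
```

Returning a set of reference IDs collapses the several gene-specific entries a single article can generate into one hit per publication.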
Figure 2. Querying the gene expression literature. The GXD Index query form enables queries such as: list all the publications that contain expression information for specific genes, days of mouse development or assay types, or for any combination of these parameters. Queries can include searches for bibliographic information (Authors, Journal, Year), and one can search for publications that contain specific words in the title or abstract. The query shown asks for publications that report RNA in situ hybridization studies on sections for Pax genes AND contain the word `kidney' in the abstract. The Query Results Summary form lists GXD entries matching these criteria. The Query Results Detail form displays the full information of a specific GXD Index entry and provides links to additional information on the respective reference and gene.
Mouse cDNA data
GXD and MGD collect cDNA data because cDNAs constitute expression information in and of themselves, and serve as molecular probes in further expression studies and in gene mapping experiments. We continue to acquire such data from the literature. However, the majority of mouse cDNA data are now obtained from the I.M.A.G.E. consortium (13) and the WashU/HHMI projects via incremental bulk data downloads. At present, the database includes data for over 300 000 I.M.A.G.E. mouse cDNA clones and their corresponding ESTs.
GXD provides access to mouse cDNA information via the `cDNA clone and EST expression query form' (http://www.informatics.jax.org/estclone.html). This query form is optimized for interrogating expression information deduced from the tissue or cell line from which cDNAs were derived. It enables queries such as `From which tissues or cell lines have cDNAs for a specified gene been isolated?', `What genes are, according to cDNA source information, expressed in brain at specified developmental stages?' or `For what genes on a specified chromosome have cDNAs been isolated from a particular tissue?'.
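Because cDNA-derived expression evidence rests only on the tissue or cell line a library was made from, the example questions above reduce to grouping clone and EST records by their source. The sketch below is a hypothetical Python illustration. `CdnaRecord`, its fields, and the functions `sources_for_gene` and `genes_from_source` are assumptions, and all data are invented. A gene with no record for a tissue means only that no cDNA was found there, not that the gene is not expressed.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set


@dataclass(frozen=True)
class CdnaRecord:
    clone_id: str                 # hypothetical identifier
    kind: str                     # "clone" or "EST"
    gene: str
    source: str                   # tissue or cell line the library was made from
    origin: str                   # "in vivo" (tissue) or "in vitro" (cell culture)
    stage: Optional[str] = None   # developmental stage of the source tissue, if known


def sources_for_gene(records: Iterable[CdnaRecord], gene: str,
                     kind: Optional[str] = None,
                     origin: Optional[str] = None) -> List[str]:
    """Tissues or cell lines from which cDNAs for `gene` were isolated."""
    return sorted({r.source for r in records
                   if r.gene == gene
                   and (kind is None or r.kind == kind)
                   and (origin is None or r.origin == origin)})


def genes_from_source(records: Iterable[CdnaRecord], source: str,
                      stages: Optional[Set[str]] = None) -> List[str]:
    """Genes with cDNAs from `source`, optionally restricted to some stages."""
    return sorted({r.gene for r in records
                   if r.source == source
                   and (stages is None or r.stage in stages)})


records = [
    CdnaRecord("c1", "clone", "GeneA", "brain", "in vivo", "E14.5"),
    CdnaRecord("e1", "EST", "GeneA", "stem cell line", "in vitro"),
    CdnaRecord("e2", "EST", "GeneB", "brain", "in vivo", "adult"),
    CdnaRecord("e3", "EST", "GeneC", "liver", "in vivo", "E14.5"),
]
print(sources_for_gene(records, "GeneA"))                      # ['brain', 'stem cell line']
print(sources_for_gene(records, "GeneA", origin="in vivo"))    # ['brain']
print(genes_from_source(records, "brain", stages={"E14.5"}))   # ['GeneA']
```

The optional `kind` and `origin` filters correspond to limiting a query to clones or ESTs, and to tissue or cell-culture origin.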
Queries can be limited to cDNA clones or ESTs only. One can search for in vivo or in vitro expression data (tissue or cell culture origin), or both. In addition to `type in fields', the query form provides selection lists for the tissues and cell lines from which the I.M.A.G.E. consortium and WashU/HHMI libraries were generated. This design reflects the fact that the vast majority of cDNA information in our database comes from these two efforts.
The Gene Expression Data query form
The Gene Expression Data query form (http://www.informatics.jax.org/expression.html, Fig. 3) […]
Figure 3. Querying gene expression data in GXD. The left side shows the GXD Data Query Form. At the top, users can choose to sort results in a number of ways, and obtain summaries of assays (not shown) or of assay results (right side). Searchable fields include gene name, gene symbol, map position, developmental stage and anatomical structure. Anatomical structures are named according to a controlled vocabulary, which can be examined by following the link `browse the Anatomical Dictionary'. Users can specify particular assay types, select results where expression was detected, not detected, or either, and determine whether anatomical substructures or superstructures should be included in the search. The sample query shown asks for all genes located within 3 cM of the Pltr6 locus that are expressed in muscle or in a substructure of muscle. On the right is the returned assay results summary, sorted by gene symbol. Assay IDs link to the detailed expression records. Examples of assay records are shown in Figures 4 and 5.
Figure 4. Expression record for in situ data. On the left is a sample of query result details for RNA in situ hybridization data. Each entry lists the bibliographic reference, the gene whose expression was analyzed and the molecular probe used, and provides links to the respective records. The first table (from the top) describes all the specimens used; subsequent tables display the detailed results obtained for each specimen. A sample result set is shown for specimen 1. The level of expression is provided if given by the author. A hypertext link to the raw image data is provided when those data are available in the database. On the right is a sample image of the in situ hybridization data that corresponds to specimen 1.
Figure 5. Expression record for gel-type expression data. On the left is a sample of query result details for RT-PCR experiments. As for in situ data (Fig. 4), hypertext links to the reference, gene and probe information are provided. Data reports are organized by lane, with result and sample information provided in a single row of the results table. An additional feature of gel assays is information about the size of the gene product (or products) detected, given as band size. Again, the level of expression is presented if given by the author. A hypertext link to the raw image data is provided when those data are available. On the right is the image of the RT-PCR data that corresponds to the experiment.
WORK IN PROGRESS-FUTURE DIRECTIONS
We will continue to annotate expression data from the literature, but it is our goal to make direct electronic submission from research laboratories the primary mode of data acquisition for GXD. We have already built an electronic data submission system that enables laboratories to describe their data, including images, in a much more comprehensive and standardized way than is possible in journal publications, and to send the data to GXD in electronic form (2). Prototypes of this system, referred to as the `Gene Expression Annotator', have been made available to a number of test laboratories in North America and Europe, with the aim of producing a robust data submission tool for the community at large.
Once the system is in place, we will provide accession numbers for each submitted entry that can be cited in journal publications in a manner similar to sequence data. Database entries can complement the publication with more comprehensive primary expression data that, importantly, will be accessible to global searches.
Submitted data will be subject to several levels of review. The first level will consist of electronic checks to ensure that the information in a submission is consistent with existing data in the database. Examples include the associations between genes, probes and sequence accession numbers, and correct nomenclature for genes, mutant alleles and mouse strains. The second level will be editorial review: database editors holding an M.Sc. or Ph.D. in the biological sciences, with specialization in pertinent areas, will examine submissions for content that does not make scientific sense. Contradictions and inconsistencies will be resolved by correspondence with the submitter. In the longer term, we plan, in collaboration with journals, to establish peer review as a third level of quality control. In this scenario, electronically submitted data cited in manuscripts would become part of the manuscript review process.
Different laboratories may obtain conflicting expression results because they use different experimental conditions. GXD will store these results as reported, together with the experimental conditions. All entries will be referenced to the author and, if available, to the journal publication. Importantly, and as mentioned above, GXD entries include digitized images of the original expression data, or pointers to the primary data if these are not stored in GXD. Thus, users can judge the quality of expression data and database annotations for themselves.
GXD will be expanded to accommodate new types of expression data.
Of immediate relevance are data derived from high-throughput expression methods, such as the analysis of high-density cDNA or oligonucleotide arrays with complex cDNA probes (5-7). We are working with laboratories employing these methods to define data formats and contents appropriate for a community database, and will develop the submission tools necessary for including these data in GXD.
Another important area will be the development of more sophisticated query interfaces. We will develop query forms with additional query options: both those based on data already captured by GXD, e.g. mouse strain and mutant information, and those based on additional biological descriptors that will be added to the database. In an ongoing collaboration with SWISS-PROT (12), we are establishing curated cross-references between `Genes' in GXD and MGD and protein entries in SWISS-PROT. These links will give access to biochemical and structural classification schemes that will help users sort through the large volumes of expression data. They will also provide other important benefits to the community, such as standardized gene nomenclature for mouse protein and nucleotide sequence entries in sequence databases, and increased interconnectivity with many additional resources.
Perhaps the most exciting future development will be the integration of GXD with the 3D graphical Anatomy Atlas/Gene Expression Database being developed by our Edinburgh collaborators (4,14). The 3D atlas of mouse development will provide a standard spatial framework to store, display and analyze in situ expression data and anatomy data in their true dimensions. GXD will contribute the supporting textual data to each entry in the 3D graphical database, and will interconnect the spatially mapped data with all the other data contained in GXD and MGD.
The integrated Mouse Gene Expression Information Resource will combine the power of text-based and image-based analysis methods, and will thus provide researchers with a completely new approach for understanding the molecular basis of development and disease.
ADDRESSES AND USER SUPPORT
GXD can be accessed at The Jackson Laboratory through the Mouse Genome Informatics (MGI) Web site at http://www.informatics.jax.org, which provides integrated access to GXD and MGD data, or directly at http://www.informatics.jax.org/exptools.html. Additional information about GXD and the Gene Expression Information Resource can be found at http://www.informatics.jax.org/doc/gxdgen.html. GXD is also accessible from the following MGI mirror sites:
UK: http://mgd.hgmp.mrc.ac.uk
Japan: http://mgd.niai.affrc.go.jp
France: http://www.pasteur.fr/Bio/MGD
Australia: http://mgd.wehi.edu.au:8080
Israel: http://bioinfo.weizmann.ac.il/mgd
GXD provides user support through on-line documentation and dedicated User Support Staff. User Support can be contacted by email (mgi-help{at}informatics.jax.org), phone (+1 207 288 6445) or fax (+1 207 288 6132).
CITING GXD
To reference the database itself, please cite this article as well as the other GXD references listed at http://www.informatics.jax.org/doc/citation.html. For referring to specific GXD data, we suggest the following format: `These data were retrieved from the Gene Expression Database (GXD), Mouse Genome Informatics, The Jackson Laboratory, Bar Harbor, Maine, USA, World Wide Web (URL: http://www.informatics.jax.org)'. Add the date (month, year) on which the cited data were retrieved.
ACKNOWLEDGEMENTS
We would like to thank Lori Corbani, Glenn Colby and John Gilbert for help in software development and database maintenance, Marjorie May and Jennifer Bobish for help in user support, Janice Ormsby for secretarial assistance, and all the MGD staff participating in the joint literature triage system. It is a particular pleasure to thank our colleagues Drs Jonathan Bard and Matthew Kaufman at the University of Edinburgh, and Drs Duncan Davidson and Richard Baldock at the MRC Human Genetics Unit in Edinburgh, for making the Dictionary of Mouse Developmental Anatomy available to us, and for many useful discussions. The Gene Expression Database is supported by NIH grants HD33745 and HD08435.
REFERENCES
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.