Nucleic Acids Research, 2001, Vol. 29, No. 1 270-274
© 2001 Oxford University Press
MethDBa public database for DNA methylation data
Institute for Human Genetics, CNRS UPR 1142, Laboratoire de Séquences Répétées et Centromères Humains, 141 Rue de la Cardonille, 34396 Montpellier, France and 1Friedrich-Schiller-University Jena, Department of Biology and Pharmacie, D-07743 Jena, Germany
Received August 30, 2000; Revised and Accepted October 31, 2000.
| ABSTRACT |
|---|
|
|
|---|
Methylation of cytosine in the 5 position of the pyrimidine ring is a major modification of the DNA in most organisms. In eukaryotes, the distribution and number of 5-methylcytosines (5mC) along the DNA is heritable but can also change with the developmental state of the cell and as a response to modifications of the environment. While DNA methylation probably has a number of functions, scientific interest has recently focused on the gene silencing effect methylation can have in eukaryotic cells. In particular, the discovery of changes in the methylation level during cancer development has increased the interest in this field. In the past, a vast amount of data has been generated with different levels of resolution ranging from 5mC content of total DNA to the methylation status of single nucleotides. We present here a database for DNA methylation data that attempts to unify these results in a common resource. The database is accessible via WWW (http://www.methdb.de). It stores information about the origin of the investigated sample and the experimental procedure, and contains the DNA methylation data. Query masks allow for searching for 5mC content, species, tissue, gene, sex, phenotype, sequence ID and DNA type. The output lists all available information including the relative gene expression level. DNA methylation patterns and methylation profiles are shown both as a graphical representation and as G/A/T/C/5mC-sequences or tables with sequence positions and methylation levels, respectively.
| INTRODUCTION |
|---|
|
|
|---|
In plants, vertebrates and certain fungi, DNA methylation is related to a number of epigenetic (i.e. heritable, but potentially reversible) phenomena (1). In vertebrates, the DNA methylation pattern is established early in embryonic development and in general the distribution of 5-methylcytosine (5mC) along the chromosomes is maintained during the life span of the organism (for review see 2). Methylation in particular regions of genes or in their vicinity can inhibit the expression of these gene (3). Artificial demethylation of genes has been shown to result in reactivation. Recent work has provided evidence that the gene silencing effect of methylated regions is accomplished through the interaction of methylcytosine binding proteins with other structural compounds of the chromatin (for review see 4). This interaction probably makes the DNA inaccessible to transcription factors through histone deacetylation and chromatin structure changes (for review see 5). Differentially methylated regions are also key elements in the transcriptional regulation of genes whose expression state depends on the sex of the parent from whom they were inherited (imprinted genes) (for review see 6). The initiation and the maintenance of the inactive X-chromosome in female eutherians was found to depend on methylation (for review see 7). Changes in the methylation patterns seem to occur during developmental and pathologic changes of cells. In particular, tumor tissue of mammalia was found to be characterized by a genome wide demethylation and a local hypermethylation of tumor-suppressor genes (for review see 8).
Despite a long history of DNA methylation studies and in spite of the increasing interest in DNA methylation patterns, so far no attempt has been made to collect and store the available methylation data in a public database. When DNA methylation patterns are published, the authors display them usually in a graphical way. This is convenient for pattern recognition and data representation, however it makes it difficult, if not impossible, to compare patterns from different sources or to analyze the data with appropriate software tools. The common bibliographic search engines will only deliver information that is contained in the title, the abstract and the keywords of a publication but they will fail when a systematic survey about the available data is necessary. For this reason we established a new database for DNA methylation patterns and propose to store all available data about DNA methylation in this database.
Catalogs and other forms of databases have always been indispensable tools for biologists (for review see 9). Nowadays the World Wide Web provides a simple way to make databases accessible to the public. MethDB, the DNA methylation database described in this paper, stores data about the degree of methylation in total DNA, subfractions of total DNA, DNA fragments and single nucleotide positions. It contains information about the nature and the origin of the investigated sample and data about the experiments. Methylation patterns and profiles are represented in a graphical and alphanumeric way. A particular objective of this database is to unify the heterogeneous data about DNA methylation into one common resource and to cross-relate the entries to other existing databases via HTML links.
Structure of the database and file formats
MethDB was designed to store heterogeneous data from different kinds of experiments. The database system has an open structure which consists of 18 tables (Table 1) and can easily be extended. The tables are related to each other (Fig. 1). The three central tables are sample, experiment and methylation. A sample record contains the description of the object, or target, of an experiment. In experiment the experimental conditions and procedures are stored. Finally, methylation contains the results of the experiments. For the moment we distinguish three sorts of results: methylation content, methylation profile and methylation pattern. Methylation content is the relative share of 5mC in a DNA sample without having information about the position of 5mC in the sequence. A methylation profile consists of a series of values for the methylation level at cytosine positions along a known sequence. Methylation patterns describe the sequence of the five nucleotide bases G,A,T,C and 5mC in individual DNA molecules. DNA sequences, methylation profiles and methylation patterns are stored in flat ASCII format.
|
|
Sequences. Sequences are simple text files in FASTA format (>name[new line]sequence[new line]) (10). The name is the sequence ID. These files contain the investigated sequences in the standard IUB notation. File-name is sequence_sequence ID. An example is shown in Figure 2a.
|
For methylation profiles and patterns we introduce two new file formats:
Methylation profiles. Methylation profiles are described by tables of two comma-separated values: position in a mother-sequence and degree of methylation at this site between 0 (no methylation) and 1 (complete methylation). na instead of the methylation level indicates that the position was not analyzed and that this fact has been mentioned in the original publication. Each pair of position/methylation-level values is followed by a [new line] character. The first line of the table contains the symbol > followed by the metpos ID. The table is separated from an optional comment by two slashes //. File-name is methpostbl_metpos ID. Figure 2b shows an example table.
Methylation patterns. Methylation patterns are sequence files in FASTA format with all nucleotide bases, except 5mC, written in lower case and 5-mC residues written in upper case C. The first line consists of the leading symbol > followed by the metpos ID. An arbitrary name can be entered after the symbol # as separator. Files carry the names methpos_metpos ID. An example is given in Figure 2c. For bisulfite genomic sequencing data (11), methylation patterns in the above file format can be conveniently generated with the MethTools software suite (12).
Description of the web interface
MethDB can be accessed via WWW (http://www.methdb.de) (Fig. 3). For administrative reasons the home page is served by a commercial ISP. The database itself has its own dedicated web server. We recommend not to access this server directly since it can not be excluded that its IP number might change from time to time. The search page offers three forms for a structured search and one for a simple quick-search field.
|
Search for patterns and profiles. Methylation patterns (G,A,T,C,5mC sequences) and methylation profiles (level of methylation at cytosine sites across a sequence) are sequence specific data. It is possible to search the database for species, sex, tissue and a gene or locus. A table is returned with a list of matching records. The [Details]-link brings the user to a page with specific information about the particular experiment and the methylation profile or a list of methylation patterns. By definition, one experiment can deliver several patterns but only one profile. Multiple experiments can be performed on one sample. If an experimental method is DNA strand specific, the complement DNA strands are treated as separate samples.
Search for 5mC content. The query form permits to search for species, sex, tissue, phenotype and DNA type of a sample. The database returns a table of matching records together with their 5mC content data. The [Proof]-link brings the user to the reference for the data, the [Details]-link provides information about the sample and the experimental procedure used to obtain the data.
Search for phenotypes. The form allows to enter 5mC content values and to search for phenotypes that have been found to be associated with these 5mC content data. Provided enough background data is present this form might be useful for diagnostic purposes.
Quick search. In this simple query form a single search term can be entered. When the query is submitted a full text search is performed in the following fields: comment, method name, sample name, restriction enzyme, author, phenotype, tissue, gene and species name. In the resulting table, the most recent entries are listed first.
MethDB contains a number of cross references to other databases, for instance the NCBI Taxonomy browser (http://www.ncbi.nlm.nih.gov/Taxonomy/), Entrez PubMed, Entrez GenBank (http://www.ncbi.nlm.nih.gov/Entrez/) and OMIM (http://www.ncbi.nlm.nih.gov/Omim/)) and independent web sites. The tables author, primer and PCR are not yet accessible via the web-interface.
Submission policy and prospects of the database
Currently, the related literature is screened and the available data are entered into the database. Each time literature data has been entered, the corresponding authors of the original research papers are informed and asked to verify these entries. Data from unpublished sources can be submitted by personal communication to the database administrator. We would like to invite and encourage the scientific community to submit their findings. Naturally, for every database entry the author who has submitted the data is indicated. During the last year, the amount of knowledge about DNA methylation has increased considerably. However, while the mechanism of DNA methylation and the way it affects chromatin structure and gene expression becomes clearer and clearer the biological role of DNA methylation is far from being well-defined. The short-term goal of the MethDB project is to collect as much data as possible and to make these data available to the public. We believe that as soon as a sufficient amount of data has been accumulated it will be possible to gain new insights about the function of DNA methylation in the living cell. For this purpose it will be certainly necessary in the future to develop new software tools for the analysis of the data. We are willing to do this by ourselves or in a collaborative effort.
| ACKNOWLEDGEMENTS |
|---|
The authors are grateful to Susan Clark, Peter Warnecke, John Melki (CSIRO, Sydney), Igor Progribny (NCTR, Jefferson) and Melanie Ehrlich (Tulane University Medical Center, New Orleans) who provided unpublished data in an early stage of the development of MethDB. The work presented here was made possible by a grant of the Klaus Tschira Foundation (Heidelberg) to C.G. The authors are grateful to Niels Jahn and Maik Schilling for the introduction into database design.
| FOOTNOTES |
|---|
* To whom correspondence should be addressed. Tel: +33 4 99 61 99 77; Fax: +33 4 99 61 99 01; Email: christoph.grunau{at}methdb.de
| REFERENCES |
|---|
|
|
|---|
-
1 Holliday,R. (1989) DNA methylation and epigenetic mechanisms. Cell Biophys., 15, 1520.
2 Razin,A. and Cedar,H. (1993) In Jost,J.P. and Saluz,H.P. (eds), DNA Methylation: Molecular Biology and Biological Significance. Birkhäuser Verlag, Basel, pp. 343359.
3 Chen,B., Kung,H.F. and Bates,R.R. (1976) Effects of methylation of the beta-galactosidase genome upon in vitro synthesis of beta-galactosidase. Chem. Biol. Interact., 14, 101111.[Web of Science][Medline]
4 Razin,A. (1998) CpG methylation, chromatin structure and gene silencing-a three-way connection. EMBO J., 17, 49054908.[Web of Science][Medline]
5 Bestor,T.H. (1998) Gene silencing. Methylation meets acetylation. Nature, 393, 311312.[Medline]
6 Sasaki,H., Allen,N.D. and Surani,M.A. (1993) In Jost,J.P. and Saluz,H.P. (eds), DNA: Molecular Biology and Biological Significance. Birkhäuser Verlag, Basel, pp. 469477.
7 Goto,T. and Monk,M. (1998) Regulation of X-chromosome inactivation in development in mice and humans. Microbiol. Mol. Biol. Rev., 62, 362378.
8 Schulz,W.A. (1998) DNA methylation in urological malignancies Int. J. Oncol., 13, 151167.[Web of Science][Medline]
9 Maurer,S.M., Firestone,R.B. and Scriver,C.R. (2000) Sciences neglected legacy. Nature, 405, 117120.[Medline]
10 Pearson,W.R. and Lipman,D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 24442448.
11 Frommer,M., McDonald,L.E., Millar,D.S., Collis,C.M., Watt,F., Grigg,G.W., Molloy,P.L. and Paul,C.L. (1992) A genomic sequencing protocol that yields a positive display of 5- methylcytosine residues in individual DNA strands. Proc. Natl Acad. Sci. USA, 89, 18271831.
12 Grunau,C., Schattevoy,R., Mache,N. and Rosenthal,A. (2000) MethToolsa toolbox to visualize and analyze DNA methylation data. Nucleic Acids Res., 28, 10531058.
This article has been cited by other articles:
![]() |
F. Fang, S. Fan, X. Zhang, and M. Q. Zhang Predicting methylation status of CpG islands in the human brain Bioinformatics, September 15, 2006; 22(18): 2204 - 2209. [Abstract] [Full Text] [PDF] |
||||
![]() |
A. Hermann, S. Schmitt, and A. Jeltsch The Human Dnmt2 Has Residual DNA-(Cytosine-C5) Methyltransferase Activity J. Biol. Chem., August 22, 2003; 278(34): 31717 - 31721. [Abstract] [Full Text] [PDF] |
||||
![]() |
C. Grunau, E. Renault, and G. Roizes DNA Methylation Database "MethDB": a User Guide J. Nutr., August 1, 2002; 132(8): 2435S - 2439. [Abstract] [Full Text] [PDF] |
||||
| ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||





