Nucleic Acids Research
The human gene mutation database
INTRODUCTION
The Human Gene Mutation Database (HGMD), maintained at the Institute of Medical Genetics in Cardiff, represents a comprehensive core collection of data on germline mutations underlying human inherited disease. HGMD comprises published single base-pair substitutions in coding, regulatory and splicing-relevant regions of human nuclear genes, as well as deletions, duplications, insertions, repeat expansions and 'indels', plus a number of complex rearrangements not covered by the above categories. Somatic gene mutations and mitochondrial genome mutations are not included.
The curators of HGMD have adopted a policy of entering each mutation only once in order to avoid confusion between recurrent and identical-by-descent lesions. Reliable discrimination between these two alternatives would require information available only for a very small proportion of known lesions. Therefore, although data on the regional, ethnic and haplotype context of mutations would be extremely useful in terms of epidemiological and population genetics research, any unselective accumulation of literature reports would have resulted in an inflation of references with little immediate scientific use.
Although originally established for the scientific study of mutational mechanisms in human genes (1), HGMD has acquired a much broader utility in that it provides information of practical importance to researchers in human molecular genetics, physicians interested in a particular inherited condition in a given patient or family, and genetic counsellors. In view of its potential usefulness, the curators of HGMD made the database publicly available (2) through the World Wide Web in April 1996.
DATA COVERAGE AND STRUCTURE
By September 1997, HGMD contained >11 900 different lesions in a total of 636 different genes (Table 1). Entries are accumulating at a rate of >2000 per annum (Fig. 1). Coverage is limited to original published reports, although some data are taken from 'Mutation Updates' or review articles. Mutations reported only in abstract form are not generally included. Data acquisition for HGMD has been accomplished by a combination of manual and computerised search procedures, scanning in excess of 250 journals on a weekly or monthly basis.
Figure 1

Table 1

Mutation type                                        No. of entries
Single base-pair substitutions, missense/nonsense              7282
Single base-pair substitutions, splicing                       1052
Single base-pair substitutions, regulatory                      102
Small deletions (≤20 bp)                                       1857
Small insertions (≤20 bp)                                       653
Small indels (≤20 bp)                                            82
Repeat expansions                                                15
Gross deletions (>20 bp)                                        736
Gross insertions and duplications (>20 bp)                      122
Complex rearrangements including inversions                      71
Total                                                         11972
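As a quick consistency check on Table 1, the per-category counts can be summed and expressed as shares of the total. The sketch below simply transcribes the published figures; the variable and function names are illustrative and are not part of HGMD.

```python
# Entry counts per mutation type, transcribed from Table 1 (September 1997).
TABLE_1 = {
    "Single base-pair substitutions, missense/nonsense": 7282,
    "Single base-pair substitutions, splicing": 1052,
    "Single base-pair substitutions, regulatory": 102,
    "Small deletions (<=20 bp)": 1857,
    "Small insertions (<=20 bp)": 653,
    "Small indels (<=20 bp)": 82,
    "Repeat expansions": 15,
    "Gross deletions (>20 bp)": 736,
    "Gross insertions and duplications (>20 bp)": 122,
    "Complex rearrangements including inversions": 71,
}
REPORTED_TOTAL = 11972  # the "Total" row of Table 1


def category_shares(counts):
    """Return each category's percentage share of all entries."""
    total = sum(counts.values())
    return {name: 100.0 * n / total for name, n in counts.items()}


computed_total = sum(TABLE_1.values())
shares = category_shares(TABLE_1)

# Combined count for the three single base-pair substitution categories.
substitutions = sum(
    n for name, n in TABLE_1.items() if name.startswith("Single base-pair")
)

print("computed total matches reported total:", computed_total == REPORTED_TOTAL)
for name, n in TABLE_1.items():
    print(f"{name}: {n} ({shares[name]:.1f}%)")
```

The same dictionary can be reused to group categories in other ways, for example small versus gross lesions.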

DATA ACCESS
HGMD is organised so that each gene is allocated one web page per mutation type, provided that data of that type are present. Since HGMD is partly dependent upon industrial funding and involves considerable editorial work over and above mere literature screening (e.g. to ensure the consistency of nucleotide sequence information, amino acid residue numbering and gene symbol usage), unsolved copyright problems have so far precluded HGMD from being downloadable in its entirety. However, once the closer cooperation with publicly funded bioinformatics institutions currently envisaged has been put in place, unrestricted access to the database will become possible. During its first 17 months on the Internet, HGMD has been accessed more than 70 000 times.
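The page layout described above (one page per gene and mutation type, created only when data exist) can be pictured as a simple grouping step. The following is a minimal sketch under stated assumptions: the gene symbols, entry labels and the function `build_pages` are hypothetical placeholders and do not reflect how HGMD is actually implemented.

```python
from collections import defaultdict

# Hypothetical entries: (gene symbol, mutation type, entry label).
# The symbols and labels are placeholders, not real HGMD records.
ENTRIES = [
    ("GENE1", "missense/nonsense", "entry A"),
    ("GENE1", "missense/nonsense", "entry B"),
    ("GENE1", "small deletions", "entry C"),
    ("GENE2", "splicing", "entry D"),
]


def build_pages(entries):
    """Group entries into one page per (gene, mutation type) pair.

    A page is created only when at least one entry of that type exists,
    so gene/type combinations without data never appear in the result.
    """
    pages = defaultdict(list)
    for gene, mutation_type, label in entries:
        pages[(gene, mutation_type)].append(label)
    return dict(pages)


PAGES = build_pages(ENTRIES)
for (gene, mutation_type), labels in sorted(PAGES.items()):
    print(f"{gene} / {mutation_type}: {len(labels)} entries")
```

Keying the pages by the pair (gene, mutation type) means that no empty pages need to be stored or filtered out afterwards.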
Meaningful integration of the data with phenotypic, structural and mapping information on human genes has been accomplished through bi-directional links between HGMD and both the Genome Database (GDB) and Online Mendelian Inheritance in Man (OMIM), Baltimore, USA. In addition, hypertext links have been established from HGMD references to Medline abstracts through Entrez. Hypertext links have also been set up to 'reference cDNA sequences' (458 to date), which are used for data checking. The links to GDB and OMIM have enforced the standardisation of disease and gene nomenclature in HGMD. Thus HGMD can be searched by HUGO-approved gene symbols, GDB accession numbers, or OMIM-compatible disease or gene names. For genes for which Locus-Specific Mutation Databases are available on the Internet, these databases (currently ~40) can be accessed either from the corresponding gene-specific HGMD pages or via the Locus-Specific Mutation Database page (3).
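Searching by any of several identifiers amounts to indexing each gene record under all of its keys. The sketch below illustrates the idea only; the record fields, the placeholder accession format and the functions `build_index` and `search` are assumptions made for illustration, not the HGMD search interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneRecord:
    """Hypothetical gene record; field names are illustrative only."""
    symbol: str          # gene symbol
    gdb_accession: str   # placeholder accession string
    names: tuple = ()    # disease or gene names


def build_index(records):
    """Map every searchable key (case-insensitive) to its records."""
    index = {}
    for record in records:
        keys = {record.symbol, record.gdb_accession, *record.names}
        for key in {k.casefold() for k in keys}:
            index.setdefault(key, []).append(record)
    return index


def search(index, query):
    """Return the records filed under the query, or an empty list."""
    return index.get(query.strip().casefold(), [])


# Placeholder data, not real database content.
RECORDS = [
    GeneRecord("GENE1", "GDB:0000001", ("example disease one",)),
    GeneRecord("GENE2", "GDB:0000002", ("example disease two", "gene two")),
]
INDEX = build_index(RECORDS)

print([r.symbol for r in search(INDEX, "gene1")])
print([r.symbol for r in search(INDEX, "Example Disease Two")])
```

Normalising keys with `casefold()` keeps the lookup insensitive to capitalisation, which is one plausible way to make symbol and name searches forgiving.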
CONCLUSIONS AND OUTLOOK
Being both comprehensive and fully integrated into the existing bioinformatics structures relevant to human genetics, HGMD has established itself as the central core database of inherited human gene mutations. In order to improve the accuracy, efficiency and rapidity of mutation publication, however, direct submission of mutation data to a central resource capable of (and responsible for) checking the novelty and consistency of data is both necessary and desirable. Although some Locus-Specific Databases have included mutations not published anywhere in the literature, even the close integration of these facilities will be inadequate to the task of meeting the demands likely to be made upon a central data repository.
Table 2 illustrates that a substantial proportion of published mutation data are derived from genes in which only a handful of lesions have so far been characterised. In such cases the establishment of a Locus-Specific Database is not warranted. Indeed, such a resource is currently accessible via the Internet for only 58/628 (9%) of the genes also referred to in HGMD. Although mutation data associated with these genes should comprise 48% of the mutations in HGMD (assuming the Locus-Specific Databases to be sufficiently comprehensive), the obvious lack of general coverage stresses the point that comprehensive collection of mutation data can only be performed in a generalised fashion.
To this end, HGMD has instituted a collaboration with Springer-Verlag GmbH, Heidelberg, to make online submission and electronic publication of human gene mutation data possible (4). These data will be published regularly by Springer's journal Human Genetics in both electronic and printed form. Once published, the data will be transmitted to Cardiff and deposited in HGMD. It is hoped that other journals may eventually follow suit.
Table 2

                              Number of entries per gene
Mutation type                     1    2    3  4-5  6-10  11-25  26-50  51-100  >100
Single bp substitutions
  Missense/nonsense             126   86   38   53    96     94     41      25     6
  Splicing                      100   58   30   17    31     20      3       2     0
  Regulatory                     19    4    3    3     4      2      0       0     0
Other lesions
  Small deletions (≤20 bp)      112   52   31   45    40     24     12       4     1
  Small insertions (≤20 bp)      94   44   14   18    26     11      2       0     0
  Small indels (≤20 bp)          45   12    1    1     1      0      0       0     0
  Repeat expansions              15    0    0    0     0      0      0       0     0
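The figures in Table 2 can be used to gauge how many genes carry only a few entries of each mutation type. The sketch below transcribes the table and computes, per mutation type, the share of genes with five or fewer entries. Reading "a handful" as five or fewer is an assumption made here for illustration, and the names used are not from the original paper. The last lines reproduce the 58/628 proportion quoted in the text.

```python
# Bin labels and gene counts transcribed from Table 2.
BINS = ["1", "2", "3", "4-5", "6-10", "11-25", "26-50", "51-100", ">100"]
TABLE_2 = {
    "Missense/nonsense": [126, 86, 38, 53, 96, 94, 41, 25, 6],
    "Splicing": [100, 58, 30, 17, 31, 20, 3, 2, 0],
    "Regulatory": [19, 4, 3, 3, 4, 2, 0, 0, 0],
    "Small deletions": [112, 52, 31, 45, 40, 24, 12, 4, 1],
    "Small insertions": [94, 44, 14, 18, 26, 11, 2, 0, 0],
    "Small indels": [45, 12, 1, 1, 1, 0, 0, 0, 0],
    "Repeat expansions": [15, 0, 0, 0, 0, 0, 0, 0, 0],
}

# Assumption: "a handful" means at most five entries, i.e. the first
# four bins ("1", "2", "3" and "4-5").
FEW_BINS = 4


def share_with_few_entries(gene_counts, n_bins=FEW_BINS):
    """Percentage of genes that fall into the first n_bins bins."""
    return 100.0 * sum(gene_counts[:n_bins]) / sum(gene_counts)


FEW_SHARES = {name: share_with_few_entries(c) for name, c in TABLE_2.items()}
for name, share in FEW_SHARES.items():
    print(f"{name}: {share:.1f}% of genes have five or fewer entries")

# Proportion of genes with a Locus-Specific Database, as quoted in the text.
LSDB_SHARE = 100.0 * 58 / 628
print(f"Genes with a Locus-Specific Database: {LSDB_SHARE:.0f}%")
```

Note that these shares count genes, not mutations, so they complement rather than reproduce the statement about the proportion of mutation data.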
ACKNOWLEDGEMENTS
The authors wish to thank SmithKline Beecham, Pfizer and the Deutsche Forschungsgemeinschaft for their financial support and Iain Fenton for computer assistance.
REFERENCES
Last modification: 17 Dec 1997
Copyright© Oxford University Press, 1998.