
Nucleic Acids Research Pages 120-125  


Update of MmtDB: a Metazoa mitochondrial DNA variants database


M. Attimonelli*, D. Calò, A. De Montalvo2, C. Lanave3, D. Sasanelli, M. Tommaseo Ponzetta1, C. Saccone

Dipartimento di Biochimica e Biologia Molecolare and 1Dipartimento di Zoologia e Anatomia Comparata, Università di Bari, 70126 Bari, Italy, 2Departamento de Biologia Molecular, Universidad de Cantabria, 39011 Santander, Spain and 3Centro Studi Mitocondri e Metabolismo Energetico, C.N.R., 70126 Bari, Italy

Received October 2, 1997; Accepted October 3, 1997

ABSTRACT

The present paper describes the improvements in MmtDB, a specialised database designed to collect Metazoa mitochondrial DNA variants. Priority in the data collection has been given to Metazoa for which a large number of variants is available, e.g., humans. Starting from the sequences available in the Nucleotide Sequence Databases, the redundant sequences have been removed and new sequences from other sources have been added. Value-added information is associated with each variant sequence, e.g., analysed region, experimental method, tissue and cell lines, population data, sex, age, family code and information about the variation events (nucleotide position, involved gene, restriction site gain or loss). Cross-references to the EMBL Data Library have been introduced, as well as internal cross-references among MmtDB entries according to tissue, heteroplasmy, family and haplotype correlation. Furthermore, MmtDB has a new section, AMmtDB: Aligned Metazoan mitochondrial biosequences. MmtDB can be accessed through the World Wide Web at URL http://WWW.ba.cnr.it/~areamt08/MmtDBWWW.htm

INTRODUCTION

Mitochondria are subcellular organelles under the control of both the nuclear and the mitochondrial genomes. The mitochondrion is the only organelle in Metazoa that contains its own DNA (1). The idea of creating a specialised database of Metazoa mtDNA variants (MmtDB) originated from the awareness that a large body of information associated with mtDNA sequences is not stored in the primary databases, which instead contain redundant information (e.g., bibliographic and taxonomic). MmtDB has therefore been designed and implemented as a subset of the primary databases, carefully revised and enriched with specific information on the biological features of each entry, with the aim of providing a new data structure and generating new cross-references between sets of data that were completely unlinked until now.

MmtDB is a collection of variants and not simply a collection of Metazoa mtDNA sequences. For each metazoan species, a variant is a fragment in which nucleotide differences (variations) are detected with respect to a reference sequence.

The database has been updated and several improvements have been implemented: (i) a more detailed codification for subject and pedigree; (ii) a new section of MmtDB, the Aligned Metazoan mitochondrial biosequences (AMmtDB), has been added; (iii) the human data have been implemented in SRS (2).

MmtDB DATA

MmtDB was originally designed as a Metazoan database, and our group started with the management of vertebrate and particularly human data. We then realised that a much greater effort was needed to cover all Metazoa, and collaboration was sought. MmtDB is now part of the MitBASE project, a comprehensive and integrated mitochondrial database funded by the EU BIOTECHNOLOGY Programme. MitBASE is developed by a network of six nodes, each collecting and editing data on a different group of organisms (protists, plants, fungi, vertebrates, invertebrates and humans), by a bioinformatics node (EBI) and by a node dealing with a pilot project on nuclear genes related to mitochondria.

The vertebrate mitochondrial genome is a closed circular molecule with a size between 15 000 and 20 000 bp (3). The peculiar features of this molecule are its high compactness and simplicity: mitochondrial DNA (mtDNA) is dense with information, does not have introns, contains only orthologous single-copy genes, and lacks recombination. It is generally composed of 13 genes coding for proteins, 22 tRNA genes and two rRNA genes. It shows a major non-coding region, called the D-loop in vertebrates, which is involved in regulatory processes. The D-loop is the most variable part of the genome and is thus used as a marker for human diversity studies (4,5).

The mtDNA has unusual genetics. It is maternally inherited (6) and is polyploid, i.e., it is present in both the cell and the organelle with a high copy number. Therefore the mtDNA in the same organelle, cell, tissue, organ or subject may not be homogeneous; indeed, when a new mutation occurs it creates a mixed intracellular population of mutant and normal mtDNAs, a condition known as heteroplasmy. When a heteroplasmic cell divides, the mutant and the normal mtDNAs are randomly distributed to the daughter cells (mitotic segregation process) (7). At some stages in oogenesis, the number of mtDNA molecules is reduced to a relatively small number (bottleneck hypothesis) (8). Following these stages, over-replication brings the number of molecules in each cell back to its normal high level, and this can lead to a relatively pure population of each genome that pre-existed in the original parental organelle (9).

Moreover, the location of mtDNA, which is attached to the mitochondrial inner membrane and close to the respiratory chain, its lack of protective proteins (e.g., histones) and a very poor DNA repair system (10) make mtDNA particularly prone to mutation.


Figure 1 MmtDB structure. MmtDB is structured into two large classes: SPECIES and VARIANTS. Each of these classes is further organised into subclasses. For each species n variants are possible. Each variant is an entry in MmtDB database.


Figure 2 Each class is organised into subclasses. The subclasses associated with each human variant are shown.


Figure 3 Example of an MmtDB entry in flatfile format. The fields in bold contain classes of information usually not stored in the flatfile of the primary databases, e.g., SM, reference sequence accession number; SO, source, i.e., tissue or cell lines from which the mtDNA was extracted; IN, information on the subjects, i.e., number of subjects for which the same type of source has produced the same variant with the same analysis, sex, age, pathological or normal status, family code; CP, classification of the population to which the subjects belong, Continental groups and Population groups; CL, linguistic classification to which the subjects belong; DR, cross-referencing to the primary databases and/or to MmtDB. In the entry shown, the DR line refers to a family correlation with another MmtDB entry (MmtDB_F:HSP0337); EE, experimental technique used (PCR RFLP); AR, analysed region based on the reference sequence; CC, other comments on the entry.


Because of its reduced size and the fact that mtDNA is used in several fields of applied biology (11), the number of sequences of mitochondrial genes and complete genomes is growing exponentially. In particular, much information on polymorphic regions of mtDNA is now available in the literature. For human mtDNA, several of the sequences managed in MmtDB are related to evolutionary studies and concern the hypervariable segments (HVI, HVII) of the D-loop (12-15), and as many others relate to pathology studies on alterations of the mtDNA (deletions, insertions and point mutations) (16,17).

The human data are coded using as reference the nucleotide sequence published by Anderson et al. in 1981 (18), which, despite being a hybrid (derived in part from placental mtDNA and in part from HeLa cell mtDNA), represents an important reference in human variability studies.

MmtDB DATA SOURCE

The Metazoa mitochondrial sequences are retrieved from the EMBL (19) and GenBank (20) primary databases.

The data are extracted from the primary databases using the GCG (21), ACNUC (22) and SRS packages. The comparison between a reference sequence and each potential variant is performed with the GCG program BESTFIT. Published sequence data that are not included in the primary databases are extracted from bibliographic databases (Medline, Current Contents), from Entrez or from other information systems. Conference proceedings and unpublished data kindly provided by the authors are also included.

MmtDB STRUCTURE

The data in MmtDB are organised into two large classes: SPECIES and VARIANTS (Fig. 1). Each of these classes is further organised into subclasses or objects (an example of the subclasses associated with each human variant is shown in Figure 2). Each species has n associated variants, and each variant is an entry in the MmtDB database. The SPECIES class refers to the items in the database that can be associated with a biological species of Metazoa for which mtDNA data are available.

To the class SPECIES the following objects are associated: the Reference Sequence, the Gene and Restriction Endonuclease Maps, the Taxonomic Classification and the Bibliography. The reference sequence(s) is represented by the nucleotide sequence of the complete mitochondrial genome, if the genome of that species has been fully sequenced, or otherwise if one or more fragments have been sequenced, by the longest sequence of each fragment. In any case, the reference sequence is also considered as a variant.

The VARIANTS class includes as objects the information items specific to the fragment under consideration, such as: (i) the location of the fragment in the reference sequence (analysed region); (ii) the experimental method used for the detection of the variant, e.g., Sanger, Maxam and Gilbert, RFLP, Southern or PCR; (iii) the pattern of the variation events with respect to the reference sequence, i.e., the nucleotide position in the reference sequence where the variation occurs, the type of variation (point mutations, deletions or insertions), the involved gene and the loss or gain of a restriction site following the variation; (iv) bibliographic references; (v) the tissue or cell lines from which the DNA was extracted; (vi) population data, relevant to the geographic and linguistic origin of the subjects from which the DNA was extracted; (vii) the age and the sex of the subjects and their pathological or normal status.

Population data are coded according to geographical (Continental groups) and anthropological (Population groups) classifications. A linguistic classification according to Ruhlen (23) (Linguistic phylum, Language group and Language) has also been added. These classifications are often limited by the scarcity of information reported in the original papers. A more precise coding of the subject origin, based on geographical coordinates, has been devised and will soon be implemented.

When a variant has been extracted from the primary databases, it is cross-referenced through the accession number and the entry name, which uniquely identify each entry in the primary databases (DR field in Fig. 3). The entries are then internally cross-referenced through their accession number in MmtDB (AC fields in Fig. 3) in order to link different but correlated entries. The correlation can be based on the Tissue type (T), the Haplotype (A), the Family (F) or the Heteroplasmic status (H).
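As an illustration of item (iii), the sketch below records variation events as small records and derives them from two sequences that have already been aligned. It is not the BESTFIT-based procedure used to build MmtDB: the `VariationEvent` type, the `variation_events` function and the use of `-` as a gap symbol are assumptions made for this example, and each gap column is reported as a separate single-base event.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VariationEvent:
    position: int   # 1-based position in the reference sequence
    kind: str       # "point mutation", "deletion" or "insertion"
    reference: str  # reference base ("" for an insertion)
    variant: str    # variant base ("" for a deletion)


def variation_events(ref_aligned, var_aligned, start=1):
    """List variation events between two pre-aligned sequences.

    Both strings must have the same length; '-' marks an alignment gap.
    `start` is the reference position of the first reference base.
    """
    if len(ref_aligned) != len(var_aligned):
        raise ValueError("aligned sequences must have equal length")
    events = []
    pos = start - 1  # last reference position consumed so far
    for r, v in zip(ref_aligned.upper(), var_aligned.upper()):
        if r != "-":
            pos += 1
        if r == v:
            continue
        if r == "-":
            # extra base in the variant, reported after reference position `pos`
            events.append(VariationEvent(pos, "insertion", "", v))
        elif v == "-":
            events.append(VariationEvent(pos, "deletion", r, ""))
        else:
            events.append(VariationEvent(pos, "point mutation", r, v))
    return events
```

The involved gene and the gain or loss of a restriction site could be attached to each record by looking the position up in the gene and restriction maps of the species; that step is left out here.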
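The internal cross-references can be read mechanically. The sketch below assumes that an internal reference is written as in the Figure 3 caption (`MmtDB_F:HSP0337`), i.e., the prefix `MmtDB_`, one correlation letter, a colon and the identifier of the linked entry; the function name and the returned labels are illustrative only.

```python
import re

# Correlation letters as listed in the text (A stands for the haplotype correlation).
CORRELATION_TYPES = {
    "T": "tissue",
    "A": "haplotype",
    "F": "family",
    "H": "heteroplasmy",
}

# Assumed token layout: MmtDB_<letter>:<entry identifier>
_XREF = re.compile(r"^MmtDB_([TAFH]):(\S+)$")


def parse_internal_xref(token):
    """Return (correlation type, entry identifier), or None if the token
    is not an internal MmtDB cross-reference."""
    match = _XREF.match(token.strip())
    if match is None:
        return None
    return CORRELATION_TYPES[match.group(1)], match.group(2)
```

For the reference quoted in the Figure 3 caption, `parse_internal_xref("MmtDB_F:HSP0337")` yields the pair `("family", "HSP0337")`.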


Figure 4 Number of aligned sequences in AMmtDB. The aligned complete sequences are divided into two tables: coding genes and tRNA genes.

The family correlation pertains mainly to human data, and specifically to mitochondrial disease studies. The family code defined in MmtDB has been improved by adding the subject pedigree, where known, in order to establish whether the pathology has been maternally inherited. Each studied pedigree is assigned a progressive family code, e.g., AB, followed by a Roman numeral identifying the generation and an Arabic numeral identifying the subject within the generation. The spouses of direct descendants have a different family code, e.g., AC, AD, because they belong to a different family. If no information on spouses is reported in the molecular study, no code is assigned.

Starting from the second generation, direct descendants are coded as follows: family code, the codes of the ancestors in brackets together with sex information (m for male and f for female), subject code, e.g. AB(m I.1/f I.2).II.1. Whenever the parents' mating is consanguineous, the / symbol in brackets is replaced by the = symbol. In further generations only the parent belonging to the family pedigree is reported with the sex code as follows: AB[(m I.1/f I.2):f II.4:m III.2)].IV.5. If the direct ancestor has been studied, the ancestor code is followed by * to mark that information on that subject is present in the database. The same rule is applied to any further generation and to any consanguineous mating in the pedigree.


Figure 5 MmtDB home page. Underlined words can be clicked to navigate in MmtDB.


For subjects whose pedigree information is not reported, the general code AA.I.1 is used. Whenever special notations, such as letters, are used in the paper, they are kept in the family code, for example AB.I.A. If information on the generation or on the number of the subject within the generation is missing, a question mark is used in the code.
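The coding rules for the first two generations can be sketched as a small formatter. This is only an illustration: the function names are hypothetical, the placement of the question mark for missing values is an assumption, and the notation for further generations and for the * marker is not covered.

```python
_ROMAN = [(1000, "M"), (900, "CM"), (500, "D"), (400, "CD"), (100, "C"),
          (90, "XC"), (50, "L"), (40, "XL"), (10, "X"), (9, "IX"),
          (5, "V"), (4, "IV"), (1, "I")]


def to_roman(n):
    """Convert a positive integer to a Roman numeral."""
    if n < 1:
        raise ValueError("generation numbers start at 1")
    parts = []
    for value, symbol in _ROMAN:
        count, n = divmod(n, value)
        parts.append(symbol * count)
    return "".join(parts)


def _position(generation, subject):
    # Missing generation or subject number is written as '?' (assumed placement).
    gen = "?" if generation is None else to_roman(generation)
    sub = "?" if subject is None else str(subject)
    return f"{gen}.{sub}"


def family_code(family, generation=None, subject=None,
                parents=None, consanguineous=False):
    """Build a subject code such as 'AB(m I.1/f I.2).II.1'.

    `parents` is an optional sequence of (sex, generation, subject) tuples;
    '=' replaces '/' between the parents when the mating is consanguineous.
    """
    ancestors = ""
    if parents:
        separator = "=" if consanguineous else "/"
        ancestors = "(" + separator.join(
            f"{sex} {_position(g, s)}" for sex, g, s in parents) + ")"
    return f"{family}{ancestors}.{_position(generation, subject)}"
```

For example, `family_code("AB", 2, 1, parents=[("m", 1, 1), ("f", 1, 2)])` reproduces the code AB(m I.1/f I.2).II.1 quoted above, and `family_code("AA", 1, 1)` gives the general code AA.I.1.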

Figure 3 shows an example of MmtDB entry in the flatfile format, which is commonly used by most of the biological databases such as the EMBL data library, GenBank, SWISS-PROT (24) and many others.
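A flatfile of this kind can be read with a few lines of code. The sketch below assumes an EMBL-style layout, in which every line starts with a two-letter field code followed by its content and each entry is closed by a `//` line; the exact layout of MmtDB entries is not reproduced in the text, and the sample entry is hypothetical apart from the DR and EE values quoted in the Figure 3 caption.

```python
from collections import defaultdict


def parse_flatfile(text):
    """Split EMBL-style flatfile text into entries.

    Each entry is returned as a dict mapping a two-letter field code
    to the list of content lines found under that code.
    """
    entries = []
    current = defaultdict(list)
    for raw in text.splitlines():
        line = raw.rstrip()
        if not line:
            continue
        if line.startswith("//"):  # assumed end-of-entry marker
            if current:
                entries.append(dict(current))
            current = defaultdict(list)
            continue
        code, value = line[:2], line[2:].strip()
        if not code.isalpha():
            code = "seq"  # lines without a field code, e.g. sequence data
        current[code].append(value)
    if current:  # tolerate a missing final '//'
        entries.append(dict(current))
    return entries


# Hypothetical entry for illustration only.
SAMPLE = """\
SO   blood
EE   PCR RFLP
DR   MmtDB_F:HSP0337
CC   hypothetical entry for illustration only
//
"""

if __name__ == "__main__":
    for entry in parse_flatfile(SAMPLE):
        print(entry["EE"], entry["DR"])
```

Keeping the content of each field as a list preserves fields that span several lines, which is common in this family of formats.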

AMmtDB stores carefully aligned Metazoa mitochondrial sequences of complete genes coding for proteins and tRNAs, which can be retrieved by gene name and species name. The sequences have been aligned with different programs, CLUSTAL (26) and PILEUP from the GCG package (25), and then optimised manually. Figure 4 shows the number of aligned sequences stored in AMmtDB to date. Alignments of both nucleotide and amino acid sequences can be viewed.

MmtDB DATA DISTRIBUTION AND DATABASE ACCESS

A World Wide Web site has been developed to allow easy access to the information in the MmtDB at the following address: http://WWW.ba.cnr.it/~areamt08/MmtDBWWW.htm . The MmtDB home page is shown in Figure 5.

MmtDBWWW is a query system that provides a point-and-click interface for selecting lists of entries, on which the following tasks can be performed:

  • flatfile generation for each variant entry (Fig. 3)
  • nucleotide sequence extraction of variant sequences based on a reference sequence with nucleotide variations in capital letters
  • analysis of variation events (Fig. 6).
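The second task in the list, showing a variant sequence with its variations in capital letters, can be sketched as follows. The function name and the input format (a mapping from reference position to variant base) are assumptions for illustration, and only point substitutions are handled.

```python
def render_variant(reference, substitutions, start=1):
    """Return the variant fragment derived from `reference`.

    Unchanged bases are written in lower case and substituted bases in
    capitals. `substitutions` maps reference positions to variant bases;
    `start` is the reference position of the first base of the fragment.
    """
    bases = list(reference.lower())
    for position, base in substitutions.items():
        index = position - start
        if not 0 <= index < len(bases):
            raise ValueError(f"position {position} is outside the fragment")
        bases[index] = base.upper()
    return "".join(bases)
```

With a hypothetical fragment, `render_variant("acgtacgt", {3: "a", 8: "c"})` returns `acAtacgC`.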


Figure 6 Results of an MmtDB_WWW query for the analysis of variation events. For each entry name the variation event, its position, the gene name, the amino acid change, the associated pathology, the pathological status and the lost or gained restriction enzyme sites are reported.

The selection can be performed using a Boolean combination of up to four of the following criteria: gene code, source, technique name, continental group, linguistic family, linguistic group, language, family code, sex code, age, pathology acronym, variation event code, variation position, analysed region, restriction enzyme name and all text, i.e., a search for a specific word in the entire flatfile entry.
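A selection of this kind can be pictured as a filter over entry records. The sketch below is not the MmtDBWWW implementation: the field names, the sample records and the restriction to a single operator (all criteria joined by AND, or all by OR) are assumptions made for illustration.

```python
def select(entries, criteria, combine="and"):
    """Select entries matching up to four (field, value) criteria.

    `entries` is a list of dicts; `combine` is "and" or "or" and is
    applied across all criteria.
    """
    if not 1 <= len(criteria) <= 4:
        raise ValueError("between one and four criteria are allowed")
    if combine not in ("and", "or"):
        raise ValueError("combine must be 'and' or 'or'")
    check = all if combine == "and" else any
    return [entry for entry in entries
            if check(entry.get(field) == value for field, value in criteria)]


# Hypothetical records for illustration only.
ENTRIES = [
    {"name": "E1", "sex": "f", "technique": "PCR RFLP"},
    {"name": "E2", "sex": "m", "technique": "PCR RFLP"},
    {"name": "E3", "sex": "f", "technique": "Sanger"},
]

if __name__ == "__main__":
    hits = select(ENTRIES, [("sex", "f"), ("technique", "PCR RFLP")])
    print([entry["name"] for entry in hits])
```

A free-text criterion ("all text") would need a substring test over the whole entry rather than an equality test on one field.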

Within the collaboration with the HmutDB (a federated single human mutation database), the human mtDNA variants have been implemented in SRS by Heikki Lehväslaiho in the section 'Mutation Databases'. In particular, the 'View' option allows data to be easily retrieved into tables, which can be processed by any graphics software to obtain statistical views of the data according to user requirements.

CONCLUSION

MmtDB is a complete database and its structure is suitable for the flexible organization of several different information items, which can then be easily retrieved.

In MmtDB redundancy has been minimised by comparing each new sequence against the whole set of stored sequences before it is entered as a new entry. Every effort has been made to ensure the accuracy of the data.

Database users are constantly encouraged to provide comments and possibly new data to include in the database. We believe that the contribution of the mitochondrion community, of biochemists, clinicians, geneticists and taxonomists is essential for allowing the implementation and growth of the project.

ACKNOWLEDGEMENTS

This work has been partially supported by MPI (Italy), and by CNR Research Area of Bari, Italy.

REFERENCES

1. Nass,S. and Nass,M.M.K. (1963) J. Cell Biol., 19, 593-628.

2. Etzold,T. and Argos,P. (1993) Comput. Appl. Biosci., 9, 49-57.

3. Wolstenholme,D.R. (1992) Int. Rev. Cytol., 141, 173-216.

4. Vigilant,L., Stoneking,M., Harpending,H., Hawkes,K. and Wilson,A. (1991) Science, 253, 1503-1507.

5. Ward,R.H., Frazier,B.L., Dew-Jager,K. and Paabo,S. (1991) Proc. Natl. Acad. Sci. USA, 88, 8720-8724.

6. Giles,R.E., Blanc,H., Cann,H.M. and Wallace,D.C. (1980) Proc. Natl. Acad. Sci. USA, 77, 6715-6719.

7. Wallace,D.C. (1986) Somatic Cell Mol. Genet., 12, 41-49.

8. Hauswirth,W.W. and Laipis,P.J. (1982) Proc. Natl. Acad. Sci. USA, 79, 4686-4690.

9. Hauswirth,W.W. and Laipis,P.J. (1985) In Quagliariello,E. et al. (eds), Achievements and Perspectives of Mitochondrial Research. Elsevier Science Publishers B.V., Vol. II: Biogenesis, pp. 49-59.

10. Schon,E.A. (1993) In DiMauro,S. and Wallace,D.C. (eds), Mitochondrial DNA in Human Pathology. Raven Press Ltd., New York, pp.1-7.

11. Saccone,C. (1994) Curr. Opin. Genet. Dev., 4, 875-881.

12. Cann,R.L., Stoneking,M. and Wilson,A.C. (1987) Nature, 325, 31-36.

13. Stoneking,M., Jorde,L.B., Bhatia,K. and Wilson,A.C. (1990) Genetics, 124, 717-733.

14. Torroni,A. and Wallace,D.C. (1994) J. Bioenerg. Biomem., 26, 261-271.

15. Ayala,F.J. and Escalante,A.A. (1995) Mol. Phylogenet. Evol., 5, 188-201.

16. Wallace,D.C. (1992) Annu. Rev. Biochem., 61, 1172-1212.

17. Wallace,D.C. (1992) Science, 256, 628-632.

18. Anderson,S., Bankier,A.T., Barrell,B.G., de Bruijn,M.H., Coulson,A.R., Drouin,J., Eperon,I.C., Nierlich,D.P., Roe,B.A., Sanger,F., Schreier,P.H., Smith,A.J., Staden,R. and Young,I.G. (1981) Nature, 290, 457-465.

19. Rice,C.M., Fuchs,R., Higgins,D.G., Stoehr,P.J. and Cameron,G.N. (1993) Nucleic Acids Res., 21, 2967-2971.

20. Benson,D., Lipman,D.J. and Ostell,J. (1993) Nucleic Acids Res., 21, 2963-2965. [See also this issue Nucleic Acids Res. (1998) 26, 1-7.]

21. Devereux,J., Haeberli,P. and Smithies,O. (1984) Nucleic Acids Res., 12, 387-395.

22. Gouy,M., Gautier,C., Attimonelli,M., Lanave,C. and DiPaola,G. (1985) Comput. Appl. Biosci., 1, 167-172.

23. Ruhlen,M. (1991) A Guide to the World's Languages, Vol. I. Edward Arnold.

24. Bairoch,A. and Boeckmann,B. (1991) Nucleic Acids Res., 19, 2247-2249.

25. Devereux,J., Haeberli,P. and Smithies,O. (1984) Nucleic Acids Res., 12, 387-395.

26. Higgins,D.G. and Sharp,P.M. (1989) Comput. Appl. Biosci., 5, 151-153.


* To whom correspondence should be addressed. Tel: +39 80 548 2180; Fax: +39 80 548 4467; Email: marcella@area.ba.cnr.it


This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals Comments and feedback: www-admin{at}oup.co.uk
Last modification: 17 Dec 1997
Copyright© Oxford University Press, 1998.
