| Nucleic Acids Research | Pages |
Update of AMmtDB: a database of multi-aligned metazoa mitochondrial DNA sequences
Introduction
Assembling AMmtDB Database
Data source
Data organization
Alignment of the data
Content Of AMmtDB
AMmtDB Flatfile
Availability Of AMmtDB
Conclusions And Perspectives
Acknowledgements
References
ABSTRACT
INTRODUCTION
Molecular evolution, including molecular systematics and phylogeny, is one of the fields in which mitochondrial genome sequences are most frequently used.
In vertebrates, as in all metazoa, the mitochondrial (mt) genome (1) is circular, about 16-17 kb long, and very compactly organized: there is no space between genes, in some cases genes overlap by a few bases, and there is only one major non-coding region, which generally contains the main regulatory elements. The map of the genes encoded in the vertebrate mitochondrial genome is reported in Figure 1.
Figure 1. Organization of the vertebrate mitochondrial genome; the variations in gene position found in the orders Galliformes and Marsupialia are shown. tRNA genes are identified by the one-letter code of the amino acid they carry.
Evolutionary studies of this kind have one very important prerequisite: the best possible alignment of the sequences under comparison. Within metazoa, protein coding genes generally do not present particular problems, whereas ribosomal and transfer RNA genes are not easy to align even between closely related species. For these RNA molecules the secondary structure requirements also have to be taken into account, and possibly the tertiary structures if known. The main non-coding regulatory regions, called D-loop regions in vertebrates, require particular attention. In our laboratory we have shown that this region evolves in a species-specific manner and can accommodate both long and short sequence repeats (2).
A database reporting the multi-aligned mtDNA genes, associated with a system that allows end-users to extract and manage the selected data according to their needs, may therefore be extremely useful. Here we describe the AMmtDB database, which in the previous database issue was published as part of the paper describing MmtDB (3), the metazoa mtDNA variants database whose data are now incorporated in MitBASE (4,5). At present AMmtDB collects the multi-aligned sequences of the vertebrate mitochondrial genes coding for proteins and tRNAs; an addition to the data presented last year is a new section of multi-aligned D-loop sequences. Mitochondrial ribosomal RNA genes have not been considered in AMmtDB because several compilations of small and large ribosomal RNAs (6,7) are already available.
ASSEMBLING AMmtDB DATABASE
Data source
Sequence data are mainly retrieved from the primary databases [EMBL data library (8) and GenBank (9)] using the ACNUC (10) retrieval system. The literature is a further source, for published sequence data not included in the primary databases. Unpublished data kindly provided by the authors are also entered. Not all partial sequences are included in AMmtDB, and only one sequence for each gene of a species has been entered, so as not to overload the database with incomplete and redundant information.
Data organization
The database is organized into three main sections: (i) the multi-aligned sequences of the protein coding genes (sequence class code: CDS); (ii) the multi-aligned sequences of the tRNA coding genes (sequence class code: tRNA); and (iii) the multi-aligned D-loop sequences (sequence class code: D-loop).
The genes coding for proteins are multi-aligned on the translated sequences and both the nucleotide and amino acid multi-alignments are provided. For genes coding for tRNAs the multi-alignments based on the primary structure are reported. Furthermore, for species with a completely sequenced mt genome the following multi-aligned supergenes have been produced: SUP, the multi-alignment of the supergenes constructed by joining the 13 coding genes; PSUP and SSUP, the multi-alignment of the supergenes constructed by joining the 22 tRNAs aligned on the basis of the primary and secondary structures, respectively.
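The two operations described above can be illustrated with a minimal Python sketch. It is not the procedure used to build AMmtDB; it only assumes that each alignment is held as a dictionary mapping a species label to its gapped sequence, that gaps are written as `-`, and that each coding sequence is supplied without its stop codon. The function names `thread_codons` and `build_supergene` are hypothetical.

```python
from typing import Dict, List

GAP = "-"  # assumed gap symbol


def thread_codons(aligned_protein: str, cds: str) -> str:
    """Project the gaps of an aligned protein onto its unaligned coding sequence.

    Each residue is replaced by its codon and each gap by three gap symbols,
    giving a nucleotide alignment consistent with the amino acid alignment.
    """
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length is not a multiple of 3")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    n_residues = sum(1 for aa in aligned_protein if aa != GAP)
    if n_residues != len(codons):
        raise ValueError("protein and coding sequence lengths disagree")
    remaining = iter(codons)
    return "".join(GAP * 3 if aa == GAP else next(remaining)
                   for aa in aligned_protein)


def build_supergene(gene_alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-gene alignments end to end.

    Only species present in every gene alignment are kept, mirroring the
    restriction to completely sequenced genomes.
    """
    if not gene_alignments:
        return {}
    shared = set(gene_alignments[0])
    for alignment in gene_alignments[1:]:
        shared &= set(alignment)
    return {species: "".join(a[species] for a in gene_alignments)
            for species in sorted(shared)}
```

For example, `thread_codons("M-K", "ATGAAA")` yields `"ATG---AAA"`, and `build_supergene` applied to two gene alignments drops any species that is missing from either of them.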
The D-loop section contains the multi-alignments of the conserved sequence regions in the 27 mammalian species considered in the studies by Sbisà et al. (2). These regions are: the extended termination associated sequences (ETAS1 and ETAS2), the conserved sequence blocks (CSB1, CSB2 and CSB3) and the central domain (Central).
The multi-aligned sequences are grouped according to taxonomic class. Each multi-alignment file is therefore identified by a name composed of a 3-letter class code and a gene name code, the latter preceded by a letter identifying nucleotide (N) or amino acid (A) multi-alignments for the CDS class, and primary (P) or secondary (S) structure multi-alignments for the tRNA class. The classes for the presently available vertebrate data are: Mammalia, Amphibia, Reptilia, Aves, Osteichthyes and Chondrichthyes.
A code-name uniquely identifies each multi-aligned sequence of AMmtDB. The first part (two or three characters) refers to the taxonomic scientific name of the species, the second part (three to six characters) to the gene.
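The naming rules above can be expressed as simple validity checks. The Python sketch below encodes only the constraints stated in the text (a 3-letter class code, the N/A and P/S letters, and the lengths of the two parts of an entry code-name). It does not assume any separator or exact spelling of the names, and the example values `MAM`, `COI` and `HS` are hypothetical placeholders rather than actual AMmtDB codes.

```python
from dataclasses import dataclass

# Letters allowed for each sequence class, as described in the text.
ALIGNMENT_KINDS = {
    "CDS": {"N": "nucleotide", "A": "amino acid"},
    "tRNA": {"P": "primary structure", "S": "secondary structure"},
}


@dataclass(frozen=True)
class AlignmentFileId:
    class_code: str  # 3-letter taxonomic class code
    seq_class: str   # "CDS" or "tRNA"
    kind: str        # N or A for CDS, P or S for tRNA
    gene: str        # gene name code

    def __post_init__(self) -> None:
        if len(self.class_code) != 3:
            raise ValueError("class code must have 3 letters")
        if self.seq_class not in ALIGNMENT_KINDS:
            raise ValueError(f"unknown sequence class: {self.seq_class}")
        if self.kind not in ALIGNMENT_KINDS[self.seq_class]:
            raise ValueError(f"{self.kind} is not valid for {self.seq_class}")

    def describe(self) -> str:
        kind = ALIGNMENT_KINDS[self.seq_class][self.kind]
        return f"{kind} alignment of {self.gene} ({self.class_code})"


def is_valid_entry_code(species_part: str, gene_part: str) -> bool:
    """Check the stated lengths: species 2-3 characters, gene 3-6 characters."""
    return 2 <= len(species_part) <= 3 and 3 <= len(gene_part) <= 6
```

With these placeholders, `AlignmentFileId("MAM", "CDS", "N", "COI").describe()` returns `"nucleotide alignment of COI (MAM)"`, while requesting a `P` alignment for a CDS raises a `ValueError`.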
Alignment of the data
The sequences have been aligned with two programs, CLUSTALV (11) and PILEUP (12) from the GCG package (13), and the alignments have then been optimized manually using SEAVIEW (14) and GeneDoc (15).
Table 1.
The multi-alignment based on the secondary structure of genes coding for tRNAs has been performed manually taking into account the published clover-leaf structures. The interest in making available tRNA multi-alignments based both on the primary and secondary structures rests on the still unsolved issue of whether the evolutionary process of RNA molecules depends on structural constraints (16). Furthermore, one of the main references in this work has been the paper of Sprinzl et al. (17).
The multi-alignment of complete D-loop sequences is particularly difficult due to heterogeneity in length and the presence of repeated sequences. As shown by Sbisà et al. (2), this region evolves in a species-specific manner.
For the D-loop region we report only the multi-alignment of the regions which are conserved in all mammalian species (ETASs, Central and CSBs). The identification of these blocks is the result of an extensive manual revision of the preliminary output obtained with the PILEUP program aimed at optimizing sequence similarity of the complete D-loop sequences.
All the multi-alignment files are stored in the MSF format.
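For readers who want to process these files programmatically, the following is a minimal Python sketch of an MSF reader. It relies only on the general layout of GCG MSF files (a header with `Name:` lines, a `//` divider, then interleaved sequence blocks in which `.` and `~` denote gaps); it ignores checksums and weights, and it assumes that no sequence name contains spaces. The function name `parse_msf` and the sample entry names are hypothetical.

```python
from typing import Dict, Iterable, List


def parse_msf(lines: Iterable[str]) -> Dict[str, str]:
    """Return {sequence name: gapped sequence} from the lines of an MSF file."""
    names: List[str] = []
    chunks: Dict[str, List[str]] = {}
    in_body = False
    for raw in lines:
        line = raw.strip()
        if not in_body:
            if line.startswith("Name:"):
                name = line.split()[1]      # token following "Name:"
                names.append(name)
                chunks[name] = []
            elif line == "//":              # divider between header and blocks
                in_body = True
            continue
        fields = line.split()
        # Coordinate lines and blank lines are skipped because their first
        # token is not a declared sequence name.
        if fields and fields[0] in chunks:
            chunks[fields[0]].extend(fields[1:])
    # Normalize both MSF gap symbols to "-".
    return {n: "".join(chunks[n]).replace(".", "-").replace("~", "-")
            for n in names}


sample = """\
 demo.msf  MSF: 12  Type: N  Check: 0 ..

 Name: HSCOI  Len: 12  Check: 0  Weight: 1.00
 Name: MMCOI  Len: 12  Check: 0  Weight: 1.00

//

HSCOI  ATGTTC.CCG AC
MMCOI  ATGTTCATCG ~~
"""

alignment = parse_msf(sample.splitlines())
```

Here `alignment["HSCOI"]` is `"ATGTTC-CCGAC"` and `alignment["MMCOI"]` is `"ATGTTCATCG--"`. Returning a plain dictionary keeps the sketch independent of any particular sequence analysis library.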
CONTENT OF AMmtDB
Table 2.
AMmtDB FLATFILE
An AMmtDB flatfile format (ff) has been defined. The schematic representation of the ff is reported in Figure 2.
Figure 2. AMmtDB flatfile format.
AVAILABILITY OF AMmtDB
AMmtDB can be retrieved on the Web through the SRS (19) server at the BioWWW site, on the basis of the ff described above. The SRS system allows remote public access through pre-existing and widely accessible client-server software, and thereby offers easy interactive browsing, query possibilities that are both sophisticated and intuitive, and easy downloading of single or multiple query results. The SRS query form allows searching by Entry Name, GENE_CLASS, GENE_NAME, PRODUCT_NAME, ORGANISM_SPECIES, ORGANISM_ORDER and ORGANISM_CLASS, as well as by EMBL/GenBank accession number through the cross-referencing lines and the multi-alignment sequence files. These files are displayed in MSF format; they can be retrieved through any Web browser and managed with various application programs. Table 3 lists software that can be freely downloaded for the different operating systems. By loading a file into any of these editing programs, the user can pick out the sequences of interest.
Table 3.
An example of the usage of AMmtDB through SRS and the GeneDoc program is shown in Figure 3.
Figure 3. Display of an AMmtDB entry retrieved through SRS, followed by the view of the multi-aligned file managed with the GeneDoc program.
CONCLUSIONS AND PERSPECTIVES
AMmtDB is useful for different research purposes: in the laboratory, to find consensus features, and, in theoretical studies, to outline the molecular evolution of the species. Unfortunately, AMmtDB reflects the same biases as the primary databases. The class of mammals has the greatest number of sequenced mt genes; furthermore, the entries of each class are unequally distributed across genes and across organisms. AMmtDB will be updated with the new EMBL/GenBank database releases.
Database users are encouraged to provide comments and, where possible, new data to include in the database. Users of this database are kindly requested to cite the present article.
ACKNOWLEDGEMENTS
This work is partly funded by the EU Biotech Program under contract BIO4 CT950160. We thank Vito Volpetti for his contribution to the multi-alignment of the tRNA genes.
REFERENCES
This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Last modification: 9 Dec 1998
Copyright©Oxford University Press, 1998.