Nucleic Acids Research Pages 134-137  


Update of AMmtDB: a database of multi-aligned metazoa mitochondrial DNA sequences

Cecilia Lanave, Marcella Attimonelli*, Mariateresa De Robertis, Flavio Licciulli², Sabino Liuni, Elisabetta Sbisà and Cecilia Saccone¹

Centro di Studio sui Mitocondri e Metabolismo Energetico, C.N.R., Via Amendola 165/A, 70126 Bari, Italy, ¹Dipartimento di Biochimica e Biologia Molecolare, Università di Bari, Via Orabona 4, 70126 Bari, Italy and ²Area di Ricerca di Bari, C.N.R., Via Amendola 166/5, 70126 Bari, Italy

Received October 5, 1998; Accepted October 8, 1998

ABSTRACT

The present paper describes AMmtDB, a database collecting the multi-aligned sequences of the vertebrate mitochondrial genes coding for proteins and tRNAs, as well as the multiple alignment of the sequences of the main regulatory region of mammalian mtDNA (the D-loop). The protein-coding genes are multi-aligned on the basis of their translated sequences, and both the nucleotide and the amino acid multi-alignments are provided. For the tRNA genes, multi-alignments based on both the primary and the secondary structure are provided. For the mammalian D-loop we report the multi-alignments of the conserved regions of the entire D-loop (CSB1, CSB2, CSB3, the central region, ETAS1 and ETAS2) as defined by Sbisà et al. [Gene (1997), 205, 125-140]. A flatfile format for AMmtDB has been designed, allowing its implementation in SRS (http://bio-www.ba.cnr.it:8000/BioWWW/#AMMTDB). Data selected through SRS can be managed with GeneDoc or other programs for handling multi-aligned data, depending on the user's operating system. The multiple alignments were produced with the CLUSTALV and PILEUP programs and then carefully optimized by hand.

INTRODUCTION

Molecular evolution, including molecular systematics and phylogeny, is one of the fields in which mitochondrial genome sequences are most frequently used.

In vertebrates, as in all metazoa, the mitochondrial (mt) genome (1) is a circular molecule of about 16-17 kb with a very compact gene organization: there is no space between genes, some genes overlap slightly, and there is only one major non-coding region, which generally contains the main regulatory elements. The map of the genes encoded in the vertebrate mitochondrial genome is shown in Figure 1. Because of its small size, the metazoan mt genome can be sequenced completely with relative ease, which makes comparative studies possible not only at the gene level but also at the genome level. This and other properties, such as maternal inheritance, lack of recombination and, notably, the presence of orthologous genes, explain the wide use of mtDNA in molecular evolution studies, both qualitative and quantitative.


Figure 1. Organization of the vertebrate mitochondrial genome; the variations in gene position found in the orders Galliformes and Marsupialia are shown. tRNA genes are indicated by the one-letter code of the amino acid they carry.

These studies depend on an essential prerequisite: the best possible alignment of the sequences being compared. Within metazoa, protein-coding genes generally pose no particular problems, whereas ribosomal and transfer RNA genes are difficult to align even between closely related species. For these RNA molecules the secondary structure must also be taken into account, and possibly the tertiary structure as well, where it is known.

The alignment of the main non-coding regulatory region, called the D-loop region in vertebrates, requires particular attention. In our laboratory we have shown that this region evolves in a species-specific manner and can accommodate both long and short sequence repeats (2).

A database of multi-aligned mtDNA genes, coupled with a system for extracting and managing the selected data according to the needs of end users, can therefore be extremely useful.

Here we describe the AMmtDB database, which in the previous issue was published as part of the paper describing MmtDB (3), the metazoan mtDNA variants database whose data are now incorporated in MitBASE (4,5).

At present AMmtDB collects the multi-aligned sequences of the vertebrate mitochondrial genes coding for proteins and tRNAs; new since last year's release is a section of multi-aligned D-loop sequences. Mitochondrial ribosomal RNA genes are not included in AMmtDB because several compilations of small and large subunit ribosomal RNAs are already available (6,7).

ASSEMBLING AMmtDB DATABASE

Data source

Sequence data are mainly retrieved from the primary databases [EMBL Data Library (8) and GenBank (9)] with the ACNUC retrieval system (10). Published sequences not included in the primary databases are collected from the literature, and unpublished data kindly provided by their authors are also entered. Not all partial sequences are included in AMmtDB, and only one sequence per gene per species is entered, so as not to overload the database with incomplete and redundant information.
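The one-sequence-per-gene-per-species rule can be sketched as a simple selection step. The text does not say how the retained sequence is chosen, so the preference used here (complete before partial, then longer before shorter) and the `MtSequence` record type are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MtSequence:
    """Hypothetical record for one retrieved mitochondrial gene sequence."""
    accession: str
    species: str
    gene: str
    sequence: str
    partial: bool  # True if the entry covers only part of the gene


def pick_one_per_gene(records):
    """Keep a single record for each (species, gene) pair.

    Hypothetical preference rule: a complete sequence beats a partial one,
    and among equals the longer sequence wins; ties keep the first record seen.
    """
    best = {}
    for rec in records:
        key = (rec.species, rec.gene)
        rank = (not rec.partial, len(rec.sequence))  # tuples compare left to right
        if key not in best or rank > best[key][0]:
            best[key] = (rank, rec)
    return [rec for _, rec in best.values()]
```

With this rule a short complete cytochrome b sequence would be preferred over a longer partial one for the same species.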

Data organization

The database is organized into three main sections: (i) the multi-aligned sequences of the protein coding genes (sequence class code: CDS); (ii) the multi-aligned sequences of the tRNA coding genes (sequence class code: tRNA); and (iii) the multi-aligned D-loop sequences (sequence class code: D-loop).

The protein-coding genes are multi-aligned on the basis of their translated sequences, and both the nucleotide and the amino acid multi-alignments are provided. For the tRNA genes, multi-alignments based on the primary structure are reported. In addition, for species whose mt genome has been completely sequenced, the following multi-aligned supergenes have been produced: SUP, the multi-alignment of the supergenes built by joining the 13 protein-coding genes; and PSUP and SSUP, the multi-alignments of the supergenes built by joining the 22 tRNA genes aligned on the basis of their primary and secondary structures, respectively.
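A common way to obtain a nucleotide alignment from an amino acid alignment is to thread each coding sequence through its aligned protein, turning every protein gap into a codon-sized gap. The sketch below does this and then concatenates per-gene alignments into a supergene. It is an illustration under assumptions, not the AMmtDB procedure: the CDS is assumed to have its stop codon removed, and the supergene keeps only species present in every gene alignment.

```python
def thread_codons(aligned_protein, cds):
    """Insert codon-sized gaps into a CDS following its aligned protein row.

    aligned_protein: amino acid row from the multi-alignment, '-' for gaps.
    cds: the ungapped coding sequence (assumed: stop codon already removed).
    """
    out, pos = [], 0
    for aa in aligned_protein:
        if aa == "-":
            out.append("---")
        else:
            out.append(cds[pos:pos + 3])
            pos += 3
    if pos != len(cds):
        raise ValueError("CDS length does not match the ungapped protein length")
    return "".join(out)


def build_supergene(gene_alignments, gene_order):
    """Concatenate per-gene aligned rows for species present in every gene.

    gene_alignments: {gene_name: {species: aligned_row}}
    gene_order: gene names in the order they should be joined.
    """
    shared = set.intersection(*(set(gene_alignments[g]) for g in gene_order))
    return {
        sp: "".join(gene_alignments[g][sp] for g in gene_order)
        for sp in sorted(shared)
    }
```

Aligning on the protein keeps every gap a multiple of three nucleotides, so the reading frame is preserved in the nucleotide multi-alignment.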

The D-loop section contains the multi-alignments of the conserved regions in the 27 mammalian species considered in the study by Sbisà et al. (2). These regions are the extended termination associated sequences (ETAS1 and ETAS2), the conserved sequence blocks (CSB1, CSB2 and CSB3) and the central domain (Central).

The multi-aligned sequences are grouped by taxonomic class. Each multi-alignment file is therefore identified by a name composed of a 3-letter class code and a gene name code, the latter preceded by a letter identifying nucleotide (N) or amino acid (A) multi-alignments for the CDS class, and primary (P) or secondary (S) structure multi-alignments for the tRNA class. The classes currently available for vertebrates are mammals, amphibians, reptiles, birds (Aves), bony fishes (Osteichthyes) and cartilaginous fishes (Chondrichthyes).

Each multi-aligned sequence in AMmtDB is uniquely identified by a code name: the first part (two or three characters) refers to the scientific name of the species, and the second part (three to six characters) to the gene.
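The two naming schemes can be captured as small validation helpers. This is a sketch under assumptions: the text gives the components and their lengths but not how they are joined, so plain upper-case concatenation is assumed, and the example codes used with it ("MAM", "HS", "COX1") are hypothetical.

```python
import re

# Type letters allowed in each sequence class, as described in the text.
TYPE_LETTERS = {"CDS": {"N", "A"}, "tRNA": {"P", "S"}}


def alignment_file_name(class_code, seq_class, type_letter, gene_code):
    """Compose a multi-alignment file name from its parts.

    Assumption: the parts are concatenated as class code + type letter + gene code.
    """
    if not re.fullmatch(r"[A-Z]{3}", class_code):
        raise ValueError("the class code must be three letters")
    if type_letter not in TYPE_LETTERS[seq_class]:
        raise ValueError(f"{type_letter!r} is not a valid letter for {seq_class}")
    return f"{class_code}{type_letter}{gene_code}"


def sequence_code_name(species_part, gene_part):
    """Compose a sequence code name: 2-3 species characters + 3-6 gene characters."""
    if not 2 <= len(species_part) <= 3:
        raise ValueError("the species part must have two or three characters")
    if not 3 <= len(gene_part) <= 6:
        raise ValueError("the gene part must have three to six characters")
    return species_part + gene_part
```

For example, under these assumptions the nucleotide alignment of a mammalian COX1 file would be named "MAMNCOX1", while a tRNA file cannot carry the N letter.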

Alignment of the data

Sequences were aligned with two programs, CLUSTALV (11) and PILEUP (12) from the GCG package (13), and the alignments were then optimized manually with SEAVIEW (14) and GeneDoc (15).
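Both CLUSTALV and PILEUP build a multiple alignment progressively from pairwise alignments. The sketch below shows only that pairwise building block, a Needleman-Wunsch global alignment; the match, mismatch and linear gap scores are illustrative simplifications, whereas the real programs use substitution matrices and separate gap opening and extension penalties.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment with a linear gap penalty.

    Returns (aligned_a, aligned_b, score).
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)

    # Trace back from the bottom-right cell to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + (
            match if a[i - 1] == b[j - 1] else mismatch
        ):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]
```

A progressive method repeats this kind of step along a guide tree, aligning sequences and then groups of already aligned sequences, which is why the automatic output can still benefit from the manual refinement described above.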


Table 1. Number of CDS sequences listed in AMmtDB for each gene and each class of organisms
The bottom row reports the total number of CDS genes present in the AMmtDB database.

The secondary-structure-based multi-alignment of the tRNA genes was performed manually, taking into account the published clover-leaf structures. Providing tRNA multi-alignments based on both primary and secondary structure is of interest because it is still unresolved whether the evolution of RNA molecules is shaped by structural constraints (16). The compilation by Sprinzl et al. (17) was one of the main references for this work.
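A structure-based alignment has to keep the paired stem positions of the clover leaf in register. As a sketch, assuming a tRNA's secondary structure is written in dot-bracket notation (the text does not specify a notation), the stem pairs can be recovered with a stack and checked for canonical or G-U pairing. Mitochondrial tRNAs often contain mismatched stem pairs, so such pairs are reported rather than rejected.

```python
# Watson-Crick pairs plus G-U wobble, on an RNA alphabet.
ALLOWED_PAIRS = {("G", "C"), ("C", "G"), ("A", "U"), ("U", "A"), ("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return sorted (i, j) index pairs from a dot-bracket string such as '((..))'."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def non_canonical_pairs(sequence, structure):
    """List stem pairs that are neither Watson-Crick nor G-U wobble."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    seq = sequence.upper().replace("T", "U")  # accept DNA or RNA letters
    return [(i, j) for i, j in base_pairs(structure)
            if (seq[i], seq[j]) not in ALLOWED_PAIRS]
```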

The multi-alignment of complete D-loop sequences is particularly difficult due to heterogeneity in length and the presence of repeated sequences. As shown by Sbisà et al. (2), this region evolves in a species-specific manner.

For the D-loop region we report only the multi-alignment of the regions which are conserved in all mammalian species (ETASs, Central and CSBs). The identification of these blocks is the result of an extensive manual revision of the preliminary output obtained with the PILEUP program aimed at optimizing sequence similarity of the complete D-loop sequences.
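The conserved D-loop blocks in AMmtDB were identified by manual revision, not by an automatic rule. Purely as an illustration of how candidate conserved stretches could be flagged in a preliminary alignment before such a review, the sketch below scores each column by the fraction of sequences sharing its most common residue and reports runs of high-scoring columns. The threshold and minimum length are arbitrary assumptions.

```python
def column_identity(alignment):
    """Fraction of sequences sharing the most common non-gap residue, per column."""
    n = len(alignment)
    scores = []
    for column in zip(*alignment):
        residues = [c.upper() for c in column if c not in "-.~"]
        if not residues:
            scores.append(0.0)
            continue
        most_common = max(residues.count(r) for r in set(residues))
        scores.append(most_common / n)
    return scores


def conserved_blocks(alignment, min_identity=0.8, min_length=5):
    """Return half-open (start, end) column ranges where every column passes.

    min_identity must be greater than 0 so that the trailing sentinel closes
    any block still open at the end of the alignment.
    """
    blocks, start = [], None
    for i, s in enumerate(column_identity(alignment) + [0.0]):  # 0.0 is a sentinel
        if s >= min_identity and start is None:
            start = i
        elif s < min_identity and start is not None:
            if i - start >= min_length:
                blocks.append((start, i))
            start = None
    return blocks
```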

All multi-alignment files are stored in MSF (GCG Multiple Sequence Format).
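MSF files consist of a header with one "Name:" line per sequence, a "//" separator, and then blocks of interleaved sequence lines. A minimal reader, sketched here without verifying the GCG checksums, collects the names from the header and appends every body line that starts with a known name; the GCG gap characters '.' and '~' are normalized to '-'.

```python
def read_msf(text):
    """Parse a GCG MSF multiple alignment into {name: aligned_sequence}.

    Checksums in the header are ignored; position-ruler lines are skipped
    because their first token is not a sequence name.
    """
    names, chunks = [], {}
    in_body = False
    for line in text.splitlines():
        stripped = line.strip()
        if not in_body:
            if stripped.startswith("Name:"):
                name = stripped.split()[1]
                names.append(name)
                chunks[name] = []
            elif stripped.startswith("//"):
                in_body = True  # sequence blocks start after the separator
            continue
        fields = stripped.split()
        if fields and fields[0] in chunks:
            # Residues may be split into groups of ten separated by spaces.
            chunks[fields[0]].append("".join(fields[1:]))
    gaps_to_dash = str.maketrans({".": "-", "~": "-"})
    return {n: "".join(chunks[n]).translate(gaps_to_dash) for n in names}
```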

CONTENT OF AMmtDB


Table 2. Number of tRNA sequences listed in AMmtDB for each gene and each class of organisms
The bottom row reports the total number of tRNA genes present in the AMmtDB database.

The content of AMmtDB is summarized in Tables 1 and 2, which report the numbers of CDS and tRNA entries, respectively, for each gene and each class of organisms. At present (August 1998 update), AMmtDB contains data from 888 species: 1121 protein-coding genes, 27 mammalian D-loop sequences and 1480 tRNA genes.

AMmtDB FLATFILE

An AMmtDB flatfile (ff) format has been defined; it is shown schematically in Figure 2. Each entry in the ff is associated with the multi-alignment for one organism class and gene, and takes a name composed of both the class and the gene. Cross-references to the primary databases, to the multi-alignment files and to the vertebrate MitBASE data (16,18) have been implemented.
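Because each flatfile entry is a set of named fields, local copies of entries can be filtered with very little code. The sketch below assumes a hypothetical layout, one "FIELD value" pair per line with entries terminated by "//"; the actual AMmtDB layout is the one given in Figure 2. The field names follow the SRS query fields listed in the next section, and "ENTRY" is a hypothetical name for the entry-name line.

```python
def parse_flatfile(text):
    """Split a flatfile into entries of {FIELD: [values]}.

    Hypothetical layout: 'FIELD value' on each line, '//' ends an entry.
    Repeated fields (e.g. several cross-reference lines) are kept as a list.
    """
    entries, current = [], {}
    for line in text.splitlines():
        line = line.rstrip()
        if line == "//":
            if current:
                entries.append(current)
            current = {}
        elif line:
            field, _, value = line.partition(" ")
            current.setdefault(field, []).append(value.strip())
    if current:  # tolerate a missing final terminator
        entries.append(current)
    return entries


def select(entries, **criteria):
    """Keep entries whose fields contain every requested value."""
    return [e for e in entries
            if all(v in e.get(k, []) for k, v in criteria.items())]
```

For instance, `select(entries, ORGANISM_CLASS="Aves", GENE_CLASS="CDS")` would return the bird protein-coding entries under this assumed layout.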


Figure 2. AMmtDB flatfile format.

AVAILABILITY OF AMmtDB

AMmtDB can be accessed on the Web through the SRS server (19) at the BioWWW site, using the flatfile described above. SRS provides remote public access through pre-existing, widely available client-server software, allowing easy interactive browsing, sophisticated yet intuitive queries, and easy downloading of single or multiple query results. The SRS query form allows searches by Entry Name, GENE_CLASS, GENE_NAME, PRODUCT_NAME, ORGANISM_SPECIES, ORGANISM_ORDER and ORGANISM_CLASS, as well as by EMBL/GenBank accession number through the cross-reference lines and the multi-alignment sequence files. These files are displayed in MSF format; they can be retrieved through any Web browser and handled with a variety of application programs. Table 3 lists freely downloadable software for the different operating systems. By loading a file into any of these editors, users can pick out the sequences of interest.
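Picking out sequences from a downloaded multi-alignment usually leaves columns that contain only gaps, because those gaps were needed only by the sequences that were dropped. A minimal sketch of that step, assuming the alignment has already been loaded as a name-to-row mapping with rows of equal length:

```python
GAP_CHARS = set("-.~")


def extract_subset(alignment, wanted):
    """Keep only the named rows and drop columns that become gap-only.

    alignment: {name: aligned_sequence}, all rows the same length.
    wanted: iterable of sequence names to keep.
    """
    rows = {name: alignment[name] for name in wanted}
    if not rows:
        return {}
    keep = [i for i, column in enumerate(zip(*rows.values()))
            if not all(c in GAP_CHARS for c in column)]
    return {name: "".join(seq[i] for i in keep) for name, seq in rows.items()}
```

Removing gap-only columns keeps the subset aligned exactly as in the full multi-alignment, only more compact.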


Table 3. Software freely available on the Web for handling the MSF-formatted multi-aligned files of the AMmtDB database

An example of the usage of AMmtDB through SRS and the GeneDoc program is shown in Figure 3.


Figure 3. Display of an AMmtDB entry retrieved through SRS, followed by a view of the multi-aligned file in the GeneDoc program.

CONCLUSIONS AND PERSPECTIVES

AMmtDB serves a range of research purposes: in the laboratory, to find consensus features, and in theoretical studies, to trace the molecular evolution of species.
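Finding consensus features typically starts from a consensus sequence of the multi-aligned rows. A minimal majority-rule sketch follows; the threshold, the ambiguity symbol ('N', appropriate for nucleotides; 'X' would suit proteins) and the choice to omit gap-dominated columns are assumptions, not AMmtDB conventions.

```python
from collections import Counter


def majority_consensus(alignment, threshold=0.5, ambiguous="N", gaps="-.~"):
    """Per-column majority-rule consensus of equally long aligned rows.

    A column gets its most frequent character when that character occurs in
    more than `threshold` of the rows; otherwise it gets `ambiguous`.
    Columns whose most frequent character is a gap are omitted.
    """
    n = len(alignment)
    out = []
    for column in zip(*alignment):
        residue, count = Counter(c.upper() for c in column).most_common(1)[0]
        if residue in gaps:
            continue  # gap-dominated column: leave it out of the consensus
        out.append(residue if count / n > threshold else ambiguous)
    return "".join(out)
```

Raising the threshold makes the consensus stricter, so only strongly conserved positions keep a defined residue.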

AMmtDB unavoidably reflects the same biases as the primary databases. Mammals have the largest number of sequenced mt genes, and the entries for each class are unevenly distributed across genes and organisms.

AMmtDB will be updated with each new EMBL/GenBank release. Users are encouraged to send comments and, where possible, new data for inclusion in the database.

Users of this database are asked to cite the present article.

ACKNOWLEDGEMENTS

This work was partly funded by the EU Biotech Program under contract BIO4 CT950160. We thank Vito Volpetti for his contribution to the multi-alignment of the tRNA genes.

REFERENCES

1. Saccone,C. (1994) Curr. Opin. Genet. Dev., 4, 875-881.

2. Sbisà,E., Tanzariello,F., Reyes,A., Pesole,G. and Saccone,C. (1997) Gene, 205, 125-140.

3. Attimonelli,M., Calò,D., De Montalvo,A., Lanave,C., Sasanelli,D., Tommaseo Ponzetta,M. and Saccone,C. (1998) Nucleic Acids Res., 26, 120-125.

4. Attimonelli,M., Altamura,N., Boyen,C., Benne,R., Brennicke,A., Carone,A., Cooper,J.M., D'Elia,D., De Montalvo,A., De Pinto,B., De Robertis,M., Golik,P.J., Grienenberger,M., Knoop,V., Lanave,C., Lazowska,J., Lemagnen,A., Malladi,S., Memeo,F., Monnerot,M., Pilbout,S., Schapira,A.H.V., Sloof,P., Slonimski,P., Stevens,K. and Saccone,C. (1999) Nucleic Acids Res., 27, 128-133.

5. Attimonelli,M., Calò,D., Cooper,J.M., De Montalvo,A., Licciulli,F., Sasanelli,D., Stevens,K., Malladi,B.S., Saccone,C. and Schapira,A.H.V. (1998) Nucleic Acids Res., 26, 116-119.

6. Van de Peer,Y., Caers,A., De Rijk,P. and De Wachter,R. (1998) Nucleic Acids Res., 26, 179-182.

7. De Rijk,P., Caers,A., Van de Peer,Y. and De Wachter,R. (1998) Nucleic Acids Res., 26, 183-186.

8. Stoesser,G., Moseley,M.A., Sleep,J., McGowran,M., Garcia-Pastor,M. and Sterk,P. (1998) Nucleic Acids Res., 26, 8-15.

9. Benson,D.A., Boguski,M.S., Lipman,D.J., Ostell,J. and Ouellette,B.F.F. (1998) Nucleic Acids Res., 26, 1-7.

10. Gouy,M., Gautier,C., Attimonelli,M., Lanave,C. and DiPaola,G. (1985) Comput. Applic. Biosci., 1, 167-172.

11. Higgins,D.G., Bleasby,A.J. and Fuchs,R. (1992) Comput. Applic. Biosci., 8, 189-191.

12. Higgins,D.G. and Sharp,P. (1989) Comput. Applic. Biosci., 5, 151-153.

13. Devereux,J., Haeberli,P. and Smithies,O. (1984) Nucleic Acids Res., 12, 387-395.

14. Galtier,N., Gouy,M. and Gautier,C. (1996) Comput. Applic. Biosci., 12, 543-548.

15. Nicholas,K.B., Nicholas,H.B.,Jr and Deerfield,D.W.,II (1997) EMBnet. News, 4, 14.

16. Morrison,D.A. and Ellis,J.T. (1997) Mol. Biol. Evol., 14, 428-441.

17. Sprinzl,M., Horn,C., Brown,M., Ioudovitch,A. and Steinberg,S. (1998) Nucleic Acids Res., 26, 148-153.

18. Carone,A., Malladi,S.B., Attimonelli,M. and Saccone,C. (1999) Nucleic Acids Res., 27, 150-152.

19. Etzold,T., Ulyanov,A. and Argos,P. (1996) Methods Enzymol., 266, 114-128.


*To whom correspondence should be addressed. Tel. +39 080 548 2130; Fax: +39 080 548 4467; Email: marcella@area.ba.cnr.it


Copyright © Oxford University Press, 1998.
