Nucleic Acids Research, 1999, Vol. 27, No. 1 138-142
© 1999
HvrBase: compilation of mtDNA control region sequences from primates
Max-Planck-Institut für evolutionäre Anthropologie, Inselstraße 22-26, D-04103 Leipzig, Germany Zoologisches Institut, Universität München, Luisenstraße 14, D-80333 München, Germany
* To whom correspondence should be addressed at: Zoologisches Institut, Universität München, Luisenstraße 14, D-80333 München, Germany. Tel: +49 89 5902 570; Fax: +49 89 5902 474; Email: meyers{at}zi.biologie.uni-muenchen.de
Received October 15, 1998. Accepted October 16, 1998.
| ABSTRACT |
|---|
HvrBase is a compilation of human and ape mtDNA control region sequences. Sequences and related information on individuals, such as where the sequences were obtained, are stored in three ASCII files as described previously. The collection is also available as a Mac/PC database application with a graphical user interface. It can be accessed through the WWW at URL http://www.eva.mpg.de/hvrbase. The current collection comprises 5846 human sequences from hypervariable region I (HVRI) and 2302 human sequences from hypervariable region II (HVRII). From apes, 295 HVRI sequences and 13 HVRII sequences are available.
| Introduction |
|---|
HvrBase is an updated and extended version of the compilation of human mtDNA control region sequences (1). It consists of a collection of aligned sequences from the hypervariable regions I and II (HVRI and HVRII) together with available information about the individuals (humans or apes) from whom the sequences were obtained. The compilation was created to simplify the coordination of collection, alignment and storage of the large amount of D-loop sequence data and relevant information that has accumulated in the course of various population studies during the last 10 years (1,2 and references therein). The vast majority of previously determined HVRI and HVRII sequences were human sequences and were consequently the focus of the first version of the compilation. However, since the number of available control region sequences from apes is steadily increasing (3–12), we have added available data from Pan troglodytes (common chimpanzee), Pan paniscus (pygmy chimpanzee), Gorilla gorilla (gorilla) and Pongo pygmaeus (orangutan) to HvrBase. Here we describe the status of the updated human data, the novel ape data and the changes necessary to include the ape data in HvrBase. We also introduce a new Mac/PC database application with a graphical user interface.
| Compilation of Sequences and Organization of the Data |
|---|
We adopted the same strategy for collecting and organizing the new sequence information as described in the previous compilation. New human sequences were collected from publications (13–19). Sequences from great apes were taken from available publications (3,6–12).
To accommodate the fact that HvrBase now comprises information from different species, an additional category, S: <genus, species, subspecies>, was introduced to allow retrieval of sequences by species.
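As an illustration of how the new S category might be read by a script, the following minimal Python sketch parses one annotation line. The exact line syntax of the ASCII files is described in the earlier compilation and is not reproduced here, so the comma-separated layout, the optional subspecies field and the names `Species` and `parse_species_line` are assumptions made for this example only.

```python
from typing import NamedTuple, Optional


class Species(NamedTuple):
    genus: str
    species: str
    subspecies: Optional[str]


def parse_species_line(line: str) -> Species:
    """Parse a hypothetical 'S: genus, species, subspecies' annotation line.

    Assumption: fields are comma-separated and the subspecies may be absent.
    """
    tag, _, value = line.partition(":")
    if tag.strip() != "S":
        raise ValueError(f"not a species line: {line!r}")
    parts = [p.strip() for p in value.split(",")]
    if len(parts) < 2 or not parts[0] or not parts[1]:
        raise ValueError(f"genus and species are required: {line!r}")
    subspecies = parts[2] if len(parts) > 2 and parts[2] else None
    return Species(parts[0], parts[1], subspecies)
```

A call such as `parse_species_line("S: Pan, paniscus")` would then yield a record whose `subspecies` field is `None`, which makes it straightforward to filter a collection by genus or species.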
| Programs |
|---|
We provide a simple C program that retrieves individual sequences matching a user-defined keyword. Search results are written to four files (for details see 1).
Moreover, we introduce a self-running FileMaker Pro database application that allows easy access to the data with the aid of a graphical user interface (Fig. 1). Users can query some or all of the information categories: original name in publication or GenBank (20), author and publication, species, continent, origin, population, language and language phylum, and presence or absence of the 9 bp deletion (21). In addition, sequences can be searched for certain motifs. A detailed user manual is included in the database.
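The motif search can be pictured with a short Python sketch. This is not the database application's own implementation, which is not documented here; it only shows one plausible way to search an aligned sequence. It assumes that gap characters (-) and padding question marks are ignored, that an N in the motif matches any single nucleotide, and that reported offsets are 0-based positions in the ungapped sequence. The function name `find_motif` is hypothetical.

```python
import re
from typing import List


def find_motif(aligned_seq: str, motif: str) -> List[int]:
    """Return 0-based start offsets of a motif in the ungapped sequence.

    Assumptions: '-' and '?' are alignment symbols and are dropped before
    searching; 'N' in the motif is a wildcard for A, C, G or T.
    """
    ungapped = aligned_seq.replace("-", "").replace("?", "").upper()
    pattern = "".join(
        "[ACGT]" if base == "N" else re.escape(base) for base in motif.upper()
    )
    # A lookahead is used so that overlapping occurrences are all reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", ungapped)]
```

Note that an undetermined nucleotide (N) in the sequence itself never matches under these assumptions; a more permissive search would have to decide how to treat such positions.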
| Description of the Compilation |
|---|
Alignment
The dot matrix program dotter (22) was used as a visual aid to produce a global alignment for the ape and human sequences. Figure 2 displays the dot plot of HVRI regions of two randomly selected modern humans, a modern human and a neanderthal, as well as the dot plots obtained when comparing a human to a common chimpanzee, pygmy chimpanzee, gorilla and orangutan, respectively. From this picture it is obvious that the alignment is not a major problem for HVRI. However, it is notable that the gorilla shows a large deletion in region 16162–16264 as already observed by Foran et al. (3). Using the dot plots as guides to align the sequences, the complete collection was manually aligned.
More specifically, for the HVRI region we aligned positions 16 001–16 408 according to Anderson et al. (23). If published sequences were longer than this alignment, they were truncated to the corresponding sites; if they were shorter, question marks were introduced to achieve the length required by the alignment. All non-determined nucleotides within a sequence are represented by N. A dash (–) indicates an insertion or deletion of a nucleotide.
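The truncation and padding rule can be sketched in Python as follows. The sketch assumes an input sequence without insertions or deletions relative to the reference whose first nucleotide corresponds to a known reference position; gap columns are ignored here. The function name `fit_to_window` and its parameters are illustrative and are not part of HvrBase.

```python
WINDOW_START = 16001  # first HVRI position covered by the alignment
WINDOW_END = 16408    # last HVRI position covered by the alignment


def fit_to_window(seq: str, seq_start: int,
                  start: int = WINDOW_START, end: int = WINDOW_END) -> str:
    """Truncate or pad a sequence to the reference window [start, end].

    Assumption: seq covers reference positions seq_start .. seq_start+len-1
    with no insertions or deletions. Missing flanks are filled with '?'.
    """
    seq_end = seq_start + len(seq) - 1
    lo = max(seq_start, start)   # first position that is kept
    hi = min(seq_end, end)       # last position that is kept
    if lo > hi:                  # no overlap with the window at all
        return "?" * (end - start + 1)
    core = seq[lo - seq_start: hi - seq_start + 1]
    return "?" * (lo - start) + core + "?" * (end - hi)
```

With the default window, every returned string has 408 characters, one per reference position from 16 001 to 16 408.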
The alignment of the HVRI sequences is 422 bp long and starts at position 16 001 according to the human reference sequence (23). Gaps of varying length were introduced at positions 16104.1, 16139.1, 16169.1, 16174.1–16174.2, 16183.1–16183.4, 16227.1, 16259.1, 16366.1, 16296.1 and 16386.1 (Fig. 3). Gaps at positions 16139.1, 16296.1 and 16174.2 were introduced to accommodate the great ape sequences.
The HVRII sequence alignment exhibits features similar to those of the HVRI alignment. It is not difficult to align chimpanzees and humans, but the alignment of humans against gorillas or orangutans is very problematic. The current alignment will certainly change in the future as more sequences become available.
The HVRII sequence alignment starts at position 1 and comprises 421 bp with gaps at positions 56.1–56.2, 174.1, 190.1, 291.1–291.2, 294.1, 302.1–302.4 and 315.1–315.2 (Fig. 3). Note that the alignment in HvrBase differs from the one in Handt et al. (1). We introduced four gaps at positions 56.2, 174.1, 190.1 and 315.1–315.2, and removed two previous gaps at positions 65.1 and 310.1–310.2.
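The relationship between reference positions, gap labels and alignment length can be checked with a small Python sketch. It assumes that a label such as 16183.2 denotes the second gap column inserted after reference position 16183. The HVRII end position of 408 is not stated in the text; it is inferred here from the 421 bp length minus the 13 listed gap columns. All names in the code are illustrative.

```python
from typing import Dict

# Reference position -> number of gap columns inserted after it (from the text).
HVRI_GAPS: Dict[int, int] = {
    16104: 1, 16139: 1, 16169: 1, 16174: 2, 16183: 4,
    16227: 1, 16259: 1, 16296: 1, 16366: 1, 16386: 1,
}
HVRII_GAPS: Dict[int, int] = {
    56: 2, 174: 1, 190: 1, 291: 2, 294: 1, 302: 4, 315: 2,
}


def alignment_length(start: int, end: int, gaps: Dict[int, int]) -> int:
    """Reference positions in [start, end] plus all inserted gap columns."""
    return (end - start + 1) + sum(gaps.values())


def column_of(label: str, start: int, gaps: Dict[int, int]) -> int:
    """1-based alignment column for a label such as '16200' or '16183.2'."""
    pos_str, _, ins_str = label.partition(".")
    pos = int(pos_str)
    ins = int(ins_str) if ins_str else 0
    if pos < start or ins > gaps.get(pos, 0):
        raise ValueError(f"label outside the alignment: {label}")
    preceding = sum(n for p, n in gaps.items() if p < pos)
    return (pos - start + 1) + preceding + ins
```

Under these assumptions, `alignment_length(16001, 16408, HVRI_GAPS)` gives 422 and `alignment_length(1, 408, HVRII_GAPS)` gives 421, in agreement with the lengths reported for the two alignments.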
Humans
The current collection of human data comprises 5846 HVRI sequences and 2302 HVRII sequences. From 2061 individuals, HVRI and HVRII were both sequenced. These numbers also include some unpublished sequences [Bauer,K., Geisert,H., Krings,M., Laan,M., Salem,A., Sajantila,A., Pääbo,S., Wiebe,V. and Wilson,J. (1998) in preparation; Parsons,T. (1998) in preparation] that will be made available once they are in press. Table 1 displays the populations, grouped according to continents, for which at least 20 individuals were sequenced. For some individuals, a population affiliation was not specified. These sequences could not be classified. However, the number of classified individuals outweighs the number of ungrouped individuals, with the exception of America. Note that the number of available HVRII sequences within a population is usually smaller than the available number of HVRI sequences.
Great Apes
For the first time, HvrBase includes sequences from the neanderthal (24), chimpanzees, gorillas and orangutans (3,6–12).
Table 2 gives an overview of the sequenced individuals. Again the number of HVRI sequences outweighs that of HVRII sequences. From P. troglodytes, 264 individual HVRI sequences were determined, whereas the gorilla HVRI and the orangutan HVRI are represented by 28 individuals and 3 individuals, respectively. For HVRII, a total of 13 sequences are collected in HvrBase.
| Quality and Completeness of the Data |
|---|
Our data have been largely compiled from published sequences. Although we have taken great pains to minimize mistakes, there may still be sequences in our collection that contain errors or whose annotations are not correct. To ensure a high quality of the data, we would be grateful if errors or ambiguities were pointed out to us. We invite everyone to submit new sequences via electronic mail together with the relevant information. We would also be grateful to receive already published sequences that are missing from our collection. While our collection is aimed at the control region sequences, there are other databases such as MITOMAP (25) which collect information about the variability of the entire human mitochondrial genome.
| Availability |
|---|
HvrBase can be retrieved free of charge over the internet from http://www.eva.mpg.de/hvrbase
| Acknowledgements |
|---|
We are grateful to all colleagues who provided their sequence data as a computer file and gave additional information when needed. We want to express our special thanks to Roland Fleißner, Tony Goldberg, Matthias Krings, Martin Richards, Antti Sajantila, Thomas Parsons, Svante Pääbo, Victor Wiebe and Cheryl Wise. Financial support from the DFG is gratefully acknowledged.
| References |
|---|
- Handt O., Meyer S., von Haeseler A. Nucleic Acids Res. (1997) 26:126–129.
- von Haeseler A., Sajantila A., Pääbo S. Nature Genet. (1996) 14:135–140.
- Foran D., Hixson J., Brown W. Nucleic Acids Res. (1988) 16:5841–5861.
- Bailey W.J., Hayasaka K., Skinner C.G., Kehoe S., Sieu L.C., Slightom J.L., Goodman M. Mol. Phylogenet. Evol. (1992) 1:97–135.
- Horai S., Satta Y., Hayasaka K., Kondo R., Inoue T., Ishida T., Takahata S.H. J. Mol. Evol. (1992) 35:32–43.
- Morin P.A., Moore J.J., Chakraborty R., Jin L., Goodall J., Woodruff D.S. Science (1994) 265:1193–1201.
- Xu X., Arnason U. Mol. Biol. Evol. (1996) 3:691–698.
- Xu X., Arnason U. J. Mol. Evol. (1996) 43:431–437.
- Arnason U., Xu X., Gullberg A. J. Mol. Evol. (1996) 42:145–152.
- Garner K.J., Ryder O.A. Mol. Phylogenet. Evol. (1996) 6:39–48.
- Goldberg T.L., Ruovolo M. Nucleic Acids Res. (1997) 25:1–6.
- Wise C.A., Sraml M., Rubinsztein D.C., Easteal S. Mol. Biol. Evol. (1997) 14:707–716.
- Bortolini M.C., Zago M.A., Salzano F.M., Silva-Junior W.A., Bonatto S.L., Silva M.C.B.O.D., Weimer T.A. Hum. Biol. (1997) 69:141–159.
- Calafell F., Underhill P., Tolun A., Angelicheva D., Kalaydjieva L. Ann. Hum. Genet. (1996) 60:35–49.
- Delghandi M., Utsi E., Krauss S. Hum. Hered. (1998) 48:108–114.
- Lee S.D., Shin C.H., Kim K.B., Lee Y.S., Lee J.B. For. Sci. Int. (1997) 87:99–116.
- Murray-McIntosh R.P., Scrimshaw B.J., Hatfield P.J., Penny D. Proc. Natl Acad. Sci. USA (1998) 95:9047–9052.
- Soodyall H., Vigilant L., Hill A.V., Stoneking M., Jenkins T. Am. J. Hum. Genet. (1996) 58:595–608.
- Ward R.H., Salzano F.M., Bonatto S.L., Hutz M.H., Coimbra C.E.A., Santos J.R., Santos R.V. Am. J. Hum. Biol. (1996) 8:317–323.
- Benson D.A., Boguski M.S., Lipman D.L., Ostell J. Nucleic Acids Res. (1997) 25:1–6.
- Wrischnik L.A., Higuchi R.G., Stoneking M., Erlich H.A., Arnheim N., Wilson A.C. Nucleic Acids Res. (1987) 15:529–542.
- Sonnhammer E.L.L., Durbin R. Gene (1995) 167:GC1–G10.
- Anderson S., Bankier A.T., Barell B.G., de Bruijn M.H.L., Coulson A.R., Drouin J., Eperon I.C., Nierlich D.P., Roe B.A., Sanger F., Schreier P.H., Smith A.J.H., Staden R., Young I.G. Nature (1981) 290:457–465.
- Krings M., Stone A., Schmitz R.W., Krainitzki H., Stoneking M., Pääbo S. Cell (1997) 90:19–30.
- Kogelnik A.M., Lott M.T., Brown M.D., Navathe S.B., Wallace D.C. Nucleic Acids Res. (1997) 25:196–199.