Complete sequence analysis of the genome of the bacterium Mycoplasma pneumoniae
Ralf Himmelreich, Helmut Hilbert+, Helga Plagens, Elsbeth Pirkl, Bi-Chen Li and
Richard Herrmann*
Zentrum für Molekulare Biologie Heidelberg, Mikrobiologie, Universität
Heidelberg, 69120 Heidelberg, Germany
Received August 22, 1996;
Revised and Accepted October 10, 1996
DDBJ/EMBL/GenBank accession no. U00089
ABSTRACT
The entire genome of the bacterium Mycoplasma pneumoniae M129 has been
sequenced. It has a size of 816 394 base pairs with an average G+C content of
40.0 mol%. We predict 677 open reading frames (ORFs) and 39 genes coding for
various RNA species. Of the predicted ORFs, 75.9% showed significant similarity
to genes/proteins of other organisms while only 9.9% did not reveal any
significant similarity to gene sequences in databases. This permitted us
tentatively to assign a functional classification to a large number of ORFs and
to deduce the biochemical and physiological properties of this bacterium. The
reduction of the genome size of M.pneumoniae during its reductive evolution
from ancestral bacteria can be explained by the loss of complete anabolic (e.g.
no amino acid synthesis) and metabolic pathways. Therefore, M.pneumoniae
depends in nature on an obligate parasitic lifestyle which requires the
provision of exogenous essential metabolites. All the major classes of cellular
processes and metabolic pathways are briefly described. For a number of
activities/functions present in M.pneumoniae according to experimental
evidence, the corresponding genes could not be identified by similarity search.
For instance we failed to identify genes/proteins involved in motility,
chemotaxis and management of oxidative stress.
INTRODUCTION
The bacterium Mycoplasma pneumoniae has a genome size of ~800 kb and completely
lacks a cell wall. The bacterium is surrounded by a cytoplasmic membrane only,
which contains cholesterol as an indispensable component. Mycoplasma pneumoniae
is a human pathogen, causing 'atypical pneumonia' (1), usually in older
children and young adults. As a surface parasite, it attaches to the host's
respiratory epithelium by means of a differentiated terminal structure termed
the attachment organelle or tip structure. For a long time, research activities
focused mainly on pathogenicity-related topics such as studies on cytadherence
(2), vaccination and diagnosis (3). Mycoplasma pneumoniae was not considered an
organism suitable for basic studies, partly because of its fastidious growth
requirements and partly because of the lack of established standard genetic
tools like conjugation or transformation with self-replicating vectors (4).
These disadvantages can now be compensated to a large extent by the methods of
molecular biology.
Morowitz pointed out in 1984 that mycoplasmas would be suitable candidates for
defining the genetic constitution of a minimal self-replicating cell (5). The
advantage of these bacteria for such studies (6,7), mainly due to their small
genome size, was so obvious that several initiatives were started to sequence
five different mycoplasma genomes: Mycoplasma genitalium (8,9), M.pneumoniae
(10), Mycoplasma capricolum (11), Mycoplasma mycoides (12) and a species from
the related genus Ureaplasma, Ureaplasma urealyticum (13). So far, only the
complete sequence of the M.genitalium genome has been published (9) which, with
580 070 bp, is the smallest bacterial genome known to date. In the genus
Mycoplasma, M.pneumoniae and M.genitalium are the most closely related species.
We report in this publication the complete nucleotide sequence of the genome of
M.pneumoniae, which thus provides information on a second small bacterial
genome. All M.pneumoniae genes which had already been sequenced were reanalyzed
except for the P1 operon (14). Our sequencing strategy, early results and a
detailed description of M.pneumoniae as an experimental system have been
recently published (10).
MATERIALS AND METHODS
Mycoplasma strain
The strain Mycoplasma pneumoniae M129 (ATCC 29342) in the 18th broth passage
was used to construct an ordered cosmid library containing the complete genome
(15). This cosmid library was the basis for the DNA sequence analysis. We
selected this specific bacterial strain because it has been used in
cytadherence and pathogenicity studies (2,16,17). The strain in the 20th broth
passage was still infectious in hamsters (H. Brunner, unpublished data).
DNA sequencing
Using the enzymatic dideoxy chain-termination method (18), the sequence data
for this study were generated exclusively on a fluorescence-based sequence-gel
reader (Model 373A, Applied Biosystems). Sequencing strategies and methods were
as described in Hilbert et al. (10).
Computer-assisted analysis
Sequence assembly, map drawing and multiple alignments were done with the
Lasergene program package (DNASTAR).
Other analyses were performed with the HUSAR (Heidelberg Unix Sequence Analysis
Resources) program package release 4.0 at the German Cancer Research Center,
Heidelberg, Germany. This package is based on the GCG program package version
Unix-8.1 of the Genetics Computer Group, Wisconsin. For searching the DNA and
protein databases [SWISS-PROT (19) and PIR (20)] the FASTA (21) and BLAST (22)
programs (BLASTX, BLASTN and BLASTP) were used. Conserved motifs in proteins
and peptides were identified by using the program PROSITE (23). Open reading
frames (ORFs) were calculated by the program FRAMES allowing AUG (or GUG, UUG)
as start codons using the Mycoplasma translation table where UGA codes for
tryptophan (24). The G+C content was calculated by the program WINDOW. Codon
usage was calculated with the program CODONFREQUENCY.
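The ORF search described above can be illustrated with a minimal Python sketch. It is not the FRAMES program: it only shows the stated rules (AUG, GUG or UUG as start codons; UGA read as tryptophan, so that only UAA and UAG terminate an ORF) applied to both strands of a DNA sequence. The function names and the `min_codons` length threshold are assumptions made for the illustration; the text does not state a minimum ORF length.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
CODON_TABLE["TGA"] = "W"  # Mycoplasma translation table: UGA codes for Trp

START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG"}  # TGA is deliberately absent


def reverse_complement(seq):
    """Return the reverse complement of an upper-case ACGT string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end, protein) for each ORF in the three frames of seq.

    Coordinates are 0-based, end-exclusive and include the stop codon.
    The first start codon after the previous in-frame stop is used, and
    the initiator codon is always translated as methionine.
    """
    for frame in range(3):
        start = None
        for pos in range(frame, len(seq) - 2, 3):
            codon = seq[pos:pos + 3]
            if start is None and codon in START_CODONS:
                start = pos
            elif start is not None and codon in STOP_CODONS:
                if (pos - start) // 3 >= min_codons:
                    body = "".join(
                        CODON_TABLE[seq[i:i + 3]]
                        for i in range(start + 3, pos, 3)
                    )
                    yield start, pos + 3, "M" + body
                start = None


def find_orfs(seq, min_codons=100):
    """Return (strand, start, end, protein) tuples in forward coordinates.

    min_codons is a hypothetical cut-off; sequences are assumed to
    contain only A, C, G and T.
    """
    seq = seq.upper()
    n = len(seq)
    hits = [("+", s, e, p) for s, e, p in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    hits += [("-", n - e, n - s, p) for s, e, p in orfs_in_strand(rc, min_codons)]
    return hits
```

With the standard genetic code the test sequence ATGTGAAAATAA would terminate at the second codon; with UGA read as tryptophan, `find_orfs("ATGTGAAAATAA", min_codons=1)` reports a single ORF spanning the whole sequence.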
The programs TopPred 1.1.1 (Manuel G. Claros, Ecole Normale Superieure,
Laboratoire de Genetique Moleculaire, Paris, France) and PSORT (25)
(http://psort.nibb.ac.jp/) were used for the prediction of transmembrane
domains and the membrane topology of proteins.
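The general idea behind hydrophobicity-based detection of transmembrane segments can be sketched as follows. This is not the TopPred or PSORT algorithm, whose details are not given here: the sketch substitutes a plain sliding-window average over the Kyte-Doolittle hydropathy scale, and the window length (19 residues), the threshold (1.6) and the function names are assumptions chosen for illustration.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein, window=19):
    """Return the mean hydropathy of every full window along the protein."""
    scores = [KYTE_DOOLITTLE[aa] for aa in protein.upper()]
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


def has_candidate_tm_segment(protein, window=19, threshold=1.6):
    """True if any window reaches the (assumed) hydropathy threshold."""
    return any(value >= threshold for value in hydropathy_profile(protein, window))
```

A protein flagged by such a scan is only a candidate membrane protein; dedicated programs additionally predict the orientation of each segment in the membrane.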
The analysis of each ORF is available in a File Maker Pro (Claris) database,
which can be accessed at our World Wide Web (WWW) site
(http://zmbh.uni-heidelberg.de/M_pneumoniae). It contains, besides the genome
and cosmid position of each ORF/gene, data about expression, availability of
antibodies, comments, literature, PROSITE patterns, amino acid composition and
database search homology scores. All the annotations in this paper were done on
the basis of the highest score values.
Accession number
The complete M.pneumoniae sequence has been annotated in GenBank (NCBI) with
the accession number U00089.
RESULTS AND DISCUSSION
We recently described the strategy and methodology for sequencing the complete
genome (10). A total of 2 415 202 nucleotides of primary sequence data were
generated by 6385 sequencing reactions. Each strand of the genome was
completely sequenced at least once. The direct sequencing approach, combining
primer walking with a limited shotgun strategy based on a complete cosmid and
plasmid library, considerably facilitated the assembly of the individual
sequences into the entire genome sequence. The average redundancy of the
sequencing was 2.95 (calculated for both strands). This very low redundancy was
achieved by the use of 5095 oligonucleotides.
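The redundancy figure can be checked from the numbers quoted in this paper. The short Python sketch below assumes that redundancy is simply the amount of primary sequence data divided by the genome size of 816 394 bp; the variable and function names are illustrative only. Under that assumption the ratio is close to the reported 2.95, and the same totals imply a mean read length of roughly 380 nucleotides per reaction.

```python
# Figures quoted in the text.
PRIMARY_NUCLEOTIDES = 2_415_202   # total primary sequence data
SEQUENCING_REACTIONS = 6385
GENOME_SIZE_BP = 816_394


def sequencing_redundancy(primary_nt, genome_bp):
    """Primary sequence data per genome position (assumed definition)."""
    return primary_nt / genome_bp


def mean_read_length(primary_nt, reactions):
    """Average number of nucleotides obtained per sequencing reaction."""
    return primary_nt / reactions


redundancy = sequencing_redundancy(PRIMARY_NUCLEOTIDES, GENOME_SIZE_BP)
read_length = mean_read_length(PRIMARY_NUCLEOTIDES, SEQUENCING_REACTIONS)

print(f"redundancy: {redundancy:.2f}")
print(f"mean read length: {read_length:.0f} nt")
```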
The complete M.pneumoniae genome has a size of 816 394 bp and a G+C content of
40.0 mol%. Altogether 677 open reading frames (ORFs) and 39 genes coding for
various RNA species were predicted. All ORFs were sorted into categories
according to their proposed functions (Tables 1 and 2; Fig. 1). Only 333 ORFs
(49.2%) were functionally assigned, based on significant sequence similarities
to genes or proteins from other organisms with known functions (e.g. ribosomal
proteins) or at least known categories of function (e.g. proteins involved in
cytadherence). Significant similarities to proteins without known function from
other bacteria, mostly M.genitalium, were shown for 181 proposed ORFs (26.7%).
We also included in this group those M.pneumoniae proteins which were
identified in protein extracts of M.pneumoniae by monospecific antibodies or by
the N-terminal amino acid sequences of enriched proteins (26,27). The group of
ORFs without significant similarity or without indication for their in vivo
expression comprised 109 members (16.1%); 42 of them carry characteristic
motifs, which are not sufficient for defining a function. Examples of such
motifs are the leucine zipper (29 cases; counts refer to all predicted ORFs),
the typical prokaryotic lipoprotein sequence pattern (46 cases) or ATP- and
GTP-binding sites (73 cases). In addition, all predicted gene products were
analyzed by programs for structure predictions, e.g. coiled-coil structures (29
cases) or transmembrane segments (275 cases). The latter result supports the
analysis of cell fractionation experiments which indicate that the membrane
fraction contains ~50% of the total proteins, as estimated by SDS-PAGE. About
8% of the genome is composed of the repetitive DNA elements RepMP1, RepMP2/3,
RepMP4 and RepMP5, while only 67 of all predicted ORFs (9.9%) code for a
product without any similarity to a known RNA or protein.
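The percentages quoted above follow directly from the ORF counts. The Python sketch below recomputes each one against the total of 677 predicted ORFs. The dictionary keys are shorthand labels introduced for this illustration, and each count is treated independently; the counts are not assumed to form a partition of the 677 ORFs.

```python
TOTAL_ORFS = 677

# ORF counts as quoted in the text; labels are shorthand for this sketch.
ORF_COUNTS = {
    "functionally assigned": 333,
    "similar to proteins without known function": 181,
    "no similarity or no indication of expression": 109,
    "no similarity to any known RNA or protein": 67,
}


def percent_of_total(count, total=TOTAL_ORFS):
    """Share of all predicted ORFs, rounded to one decimal place."""
    return round(100 * count / total, 1)


percentages = {label: percent_of_total(n) for label, n in ORF_COUNTS.items()}

# ORFs with significant similarity to genes/proteins of other organisms,
# with or without a known function.
similar_to_other_organisms = percent_of_total(333 + 181)

for label, value in percentages.items():
    print(f"{label}: {value}%")
print(f"similar to other organisms: {similar_to_other_organisms}%")
```

The combined value for the first two groups reproduces the 75.9% given in the Abstract.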
CONCLUSIONS
It is impossible to address each proposed M.pneumoniae gene in this paper. We
have tried to cover the most important categories of functions and point to
genes which should be present, but could not be found by our applied methods.
Typical examples are the missing diphosphonucleoside kinase for the conversion
of (d)NDPs to (d)NTPs, and the substrate binding domain (oppA) for the
oligopeptide ABC transporter. In addition, we could not find any indication for
a number of genes/proteins which should be present according to experimental
evidence. Mycoplasma pneumoniae has been shown to be motile and to exhibit
chemotactic behaviour (64). Motility genes are difficult to identify since the
motility in M.pneumoniae is independent of pili or flagella and it is not yet
known which genes are potential candidates. Therefore, any progress in this
field depends on the isolation of mutants. Furthermore, none of the components
of the chemotactic signal pathway, the Che proteins, which are well conserved
among bacteria, or any other 'two-component signal transduction system' could
be detected. Chemotactic behaviour in M.pneumoniae is difficult to study. While
it might be possible that these bacteria are chemotaxis negative, only
additional experiments will clarify this point.
It has been reported that M.pneumoniae produces hydrogen peroxide, which is
considered to be a pathogenicity factor (17). Therefore, to protect itself from
oxidative stress one would expect to find the standard enzymes dealing with
these stress factors like catalase, superoxide dismutase or peroxidase, but we
have no similarity-based evidence that these enzymes exist in M.pneumoniae.
Experimental data on this topic are also inconsistent (62).
The results of our sequence analysis explain quite well the kind of changes
which have led to the observed reduction of the genome size in M.pneumoniae
from the presumed genome size of several million base pairs of the ancestral
bacteria. The main cause is the loss of complete anabolic (no amino acid
synthesis) and metabolic pathways and of genes for the synthesis of complex
structures like the bacterial cell wall which requires a large number of genes.
In addition, for several processes like DNA repair, DNA recombination, cell
division or protein secretion, the number of genes involved is smaller than in
the more complex bacteria.
No significant changes were observed in the size of individual genes, which
more or less resemble their counterparts in E.coli or B.subtilis. The
occasionally observed smaller intergenic regions, like those found in the
ATPase operon, do not appear to contribute significantly to the overall genome
size reduction.
In contrast with the loss of complete pathways we frequently observed the
amplification of complete genes or segments of genes (see sections on
lipoprotein families or on the repetitive DNA sequences RepMP2/3, RepMP4 and
RepMP5). In these two instances the obvious advantage would be the potential of
expressing antigenic variants of surface-exposed proteins.
The various truncated genes which are also present in full-length copies, e.g.
arginine deiminase (H03_orf438 and H03_orf238), DNA primase (H91_orf620 and
D12_orf212) and the dihydrofolate reductase (H10_orf506 and F10_orf160), might
be relics of recombination events which took place in the course of evolution.
Finally among the many proposed proteins are a few which share the highest
similarity over their entire length with a eukaryotic protein. The most
prominent examples are the pre-B cell enhancing factor (pbeF, D09_orf451) and the carnitine
palmitoyltransferase II precursor (cpt2, C09_orf600). Both might be candidates
for examples of horizontal gene transfer, but at the present state of analysis
a definitive answer cannot be given.
It will be the main task of future studies to reconcile the experimental
evidence and the DNA sequence-based predictions, i.e. to identify the genes for
observed functions and vice versa, and to assign functions to proposed open
reading frames with hitherto unknown functions.
One obvious topic is the comparative analysis between the completely sequenced
genomes of the closely related species M.pneumoniae and M.genitalium (9). Since
the present paper is already very long, we decided to publish this analysis in
an additional paper (Himmelreich et al., in preparation).
ACKNOWLEDGEMENTS
We thank R. Frank and A. Bosserhoff for the synthesis of oligonucleotides, B.
Reiner for her expertise in computer data analysis, Raphael Mosbach for his
technical assistance concerning hardware problems, U. Leibfried for technical
assistance, I. Schmidt for preparing the manuscript, D. Hofmann and H. Göhlmann for reading the manuscript and H. Schaller for financial
assistance and his encouragement throughout our work. We thank S. Razin, A.
Wieslander, K. Dybvig, K. Sitaraman, R. Walker, H. Neimark and R. Miles who
read drafts of this publication. Their corrections, critical comments and
suggestions helped us very much. This research was supported by a grant from
the Deutsche Forschungsgemeinschaft (He 780/5-1-He 780/5-4) and by the Fonds der Chemischen Industrie.
REFERENCES
1 Chanock, R. M., Dienes, L., Eaton, M. D., Edward, D. G., Freundt, E. A., Hayflick, L., Hers, J. F. P., Jensen, K. E., Liu, C., Marmion, B. P., Morton, H. E., Mufson, M. A., Smith, P. F., Somerson, N. L. and Taylor-Robinson, D. (1963) Science, 140,662.
2 Krause, D. C. (1996) Mol. Microbiol., 20,247-253.
3 Jacobs, E. (1991) Rev. Med. Microbiol., 2,83-90.
5 Morowitz, H. J. (1984) Isr. J. Med. Sci., 20,750-753.
6 Razin, S. (1992) FEMS Microbiol Lett, 100,423-431.
7 Bove, J. M. (1993) Clin. Infect. Dis., 17 Suppl 1,10-31
8 Peterson, S. N., Hu, P. C., Bott, K. F. and Hutchison, C. A., III (1993) J. Bacteriol., 175, 7918-7930.
9 Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., Fleischmann, R. D., Bult, C. J., Kerlavage, A. R., Sutton, G., Kelley, J. M. et al. (1995) Science, 270,397-403.
10 Hilbert, H., Himmelreich, R., Plagens, H. and Herrmann, R. (1996) Nucleic Acids Res., 24,628-639.
11 Bork, P., Ouzounis, C., Casari, G., Schneider, R., Sander, C., Dolan, M., Gilbert, W. and Gillevet, P. M. (1995) Mol. Microbiol., 16,955-967.
12 Sterky, F., Holmberg, A. and Uhlen, M. (1996) HUGO'96, Heidelberg, Germany.
13 Glass, J. L., Glass, J. S., Lefkowitz, E. J., Chen, E. Y. and Cassel, G. H. (1996) IOM Letters, USA, Vol. 4, pp. 12., Proc. Meet. Int. Org. Mycoplasm., Orlando, Florida.
14 Inamine, J. M., Loechel, S. and Hu, P. C. (1988) Gene, 73,175-183.
15 Wenzel, R. and Herrmann, R. (1989) Nucleic Acids Res., 17,7029-7043.
16 Su, C. J., Chavoya, A. and Baseman, J. B. (1988) Infect Immunol., 56,3157-3161.
17 Almagor, M., Yatziv, S. and Kahane, I. (1983) Infect. Immunol., 41,251-256.
18 Sanger, F., Nicklen, S. and Coulson, A. R. (1977) Proc. Natl Acad. Sci. USA, 74,5463-5467.
19 Bairoch, A. and Boeckmann, B. (1991) Nucleic Acids Res., 19,2247-2249.
20 Barker, W. C., George, D. G., Mewes, H.-W., Pfeiffer, F. and Tsugita, A. (1993) Nucleic Acids Res., 21,3089-3092.
21 Pearson, W. R. and Lipman, D. J. (1988) Proc. Natl Acad. Sci. USA, 85,2444-2448.
22 Altschul, S., Gish, W., Miller, W., Myers, E. and Lipman, D. (1990) J. Mol. Biol., 215, 403-410.
24 Inamine, J. M., Ho, K. C., Loechel, S. and Hu, P. C. (1990) J. Bacteriol., 172,504-506.
25 Nakai, K. and Kanehisa, M. (1991) Proteins: Struct., Funct. Genet., 11,95-110.
26 Proft, T. and Herrmann, R. (1994) Mol. Microbiol., 13,337-348.
27 Proft, T., Hilbert, H., Layh Schmitt, G. and Herrmann, R. (1995) J. Bacteriol., 177, 3370-3378.
28 Razin, S. and Jacobs, E. (1992b) J. Gen. Microbiol., 138, 407-422.
29 Ruland, K., Wenzel, R. and Herrmann, R. (1990) Nucleic Acids Res., 18,6311-6317.
30 Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M. et al. (1995) Science, 269,496-512.
32 Baker, T. A. and Wickner, S. H. (1992) Annu. Rev. Genet., 26,447-477.
33 Mills, L. B., Stanbridge, E. J., Sedwick, W. D. and Korn, D. (1977) J. Bacteriol., 132, 641-649.
34 Barnes, M. H., Tarantino, P. M., Jr., Spacciapoli, P., Brown, N. C., Yu, H. and Dybvig, K. (1994) Mol. Microbiol., 13,843-54
35 Koonin, E. V. and Bork, P. (1996) Trends Biochem. Sci., 21,128-129.
36 Camerini-Otero, R. D. and Hsieh, P. (1995) Annu. Rev. Genet., 29, 509-552.
37 Demple, B. and Harrison, L. (1994) Annu. Rev. Biochem., 63,915-948.
38 Sancar, A. and Sancar, G. B. (1988) Annu. Rev. Biochem., 57,29-67.
39 Haldenwang, W. G. (1995) Microbiol. Rev., 59,1-30
40 Hyman, H. C., Gafny, R., Glaser, G. and Razin, S. (1988) J. Bacteriol., 170,3262-3268.
41 Moran, C. P. j., Lang, N., LeGrice, S. F. J., Lee, G., Stephens, M., Sonnenshein, A. L., Pero, J. and Losik, R. (1982) Mol. Gen. Genet., 186,339-346.
49 Proft, T., Hilbert, H., Plagens, H. and Herrmann, R. (1996) Gene, 171,79-82.
50 Kahane, I., Tucker, S., Leith, D. K., Morrison, P. J. and Baseman, J. B. (1985) Infect. Immunol., 50,944-946.
51 Su, C. J., Chavoya, A., Dallo, S. F. and Baseman, J. B. (1990) Infect. Immunol., 58,2669-2674.
52 Ruland, K., Himmelreich, R. and Herrmann, R. (1994) J. Bacteriol., 176,5202-5209.
53 Vicente, M. and Errington, J. (1996) Mol. Microbiol., 20,1-7.
54 Sankaran, K., Gupta, S. D. and Wu, H. C. (1995) Methods Enzymol, 250,683-697.
55 Gilson, E., Alloing, G., Schmidt, T., Claverys, J. P., Dudler, R. and Hofnung, M. (1988) EMBO J., 7,3971-3974.
56 Citti, C. and Wise, K. S. (1995) Mol. Microbiol., 18,649-660.
57 Higgins, C. F. (1992) Annu. Rev. Cell Biol., 8,67-113.
58 Postma, P. W., Lengeler, J. W. and Jacobson, G. R. (1993) Microbiol. Rev., 57,543-594
59 Fath, M. J. and Kolter, R. (1993) Microbiol. Rev., 57,995-1017.
60 Schatz, G. and Dobberstein, B. (1996) Science, 271,1519-1526.
61 Freundt, E. A. and Razin, S. (1984) In Krieg, N. R. and Holt, J. G. (eds), Bergey's Manual of Systematic Bacteriology, Vol. 1. Williams and Wilkins, Baltimore, pp. 742-770.
62 Pollack, J. D. (1992) In Maniloff, J., McElhaney, R. N., Finch, L. R. and Baseman, J. B. (eds), Mycoplasmas-Molecular Biology and Pathogenesis. American Society for Microbiology, Washington, DC, pp. 181-200.
63 Plackett, P., Marmion, B. P., Shaw, E. J. and Lemke, R. M. (1969) Aust. J. Exp. Biol. Med. Sci., 47,171-195.
64 Kirchhoff, H. (1992) In Maniloff, J., McElhaney, R. N., Finch, L. R. and Baseman, J. B. (eds), Mycoplasmas-Molecular Biology and Pathogenesis. American Society for Microbiology, Washington, DC, pp. 289-308.
65 Matic, I., Rayssiguier, C. and Radman, M. (1995) Cell, 80,507-515.
66 Atkins, J. F. and Gesteland, R. F. (1996) Nature, 379,769-771
67 Rottem, S., Adar, L., Gross, Z., Ne'Eman, Z. and Davis, P. J. (1986) J. Bacteriol., 167, 299-304.