Comparative analysis of the genomes of the bacteria Mycoplasma pneumoniae and Mycoplasma genitalium
Ralf Himmelreich, Helga Plagens, Helmut Hilbert+, Berta Reiner and Richard Herrmann*
Zentrum für Molekulare Biologie Heidelberg, Mikrobiologie, Universität Heidelberg, 69120 Heidelberg, Germany
Received December 9, 1996;
Revised and Accepted January 8, 1997
ABSTRACT
The sequenced genomes of the two closely related bacteria Mycoplasma genitalium and Mycoplasma pneumoniae were compared with emphasis on genome organization and coding capacity. All the 470 proposed open reading frames (ORFs) of the smaller M.genitalium genome (580 kb) were contained in the larger genome (816 kb) of M.pneumoniae. There were some discrepancies in annotation, but inspection of the DNA sequences showed that the corresponding DNA was always present in M.pneumoniae. The two genomes could be subdivided into six segments. The order of orthologous genes was well conserved within individual segments but the order of these segments in both bacteria was different. We explain the different organization of the segments by translocation via homologous recombination. The translocations did not disturb the continuous bidirectional course of transcription in both genomes, starting at the proposed origin of replication. The additional 236 kb in M.pneumoniae, compared with the M.genitalium genome, were coding for 209 proposed ORFs not identified in M.genitalium. Of these ORFs, 110 were specific to M.pneumoniae exhibiting no significant similarity to M.genitalium ORFs, while 76 ORFs were amplifications of ORFs existing mainly as single copies in M.genitalium. In addition, 23 ORFs containing a copy of either one of the three repetitive DNA sequences RepMP2/3, RepMP4 and RepMP5 were annotated in M.pneumoniae but not in M.genitalium, although similar DNA sequences were present. The M.pneumoniae-specific genes included a restriction-modification system, two transport systems for carbohydrates, the complete set of three genes coding for the arginine dihydrolase pathway and 14 copies of the repetitive DNA sequence RepMP1 which were part of several different translated genes with unknown function.
INTRODUCTION
Since the first publication of the complete nucleotide sequence of the genome of the bacterium Haemophilus influenzae (1), four more sequences of bacterial genomes have been published, namely Mycoplasma genitalium (2), Methanococcus jannaschii (3), Synechocystis sp. (4) and Mycoplasma pneumoniae (5), but many more are expected to appear within the next 1-2 years. The large amount of data produced has already initiated several studies on whole genome comparisons, between the genomes from Escherichia coli, Haemophilus influenzae and M.genitalium; bacteria which are only distantly related (6,7).
The complete sequencing of the M.pneumoniae genome enabled us to accomplish a comparative analysis of two genomes of closely related organisms: M.genitalium and M.pneumoniae. Both M.genitalium and M.pneumoniae are human surface parasites and in nature depend on the host which supplies them with essential nutrients, but both organisms can be propagated in vitro in a serum-enriched cell-free medium. Yet M.genitalium is much more difficult to cultivate in vitro; in fact only a few isolates of M.genitalium have been made so far (8). Mycoplasma pneumoniae is preferentially found in the respiratory tract (9) and M.genitalium in the urogenital tract (10), although exceptions are possible. The isolation of M.genitalium from the respiratory tract of M.pneumoniae-infected patients has been reported (11) and M.pneumoniae has recently been isolated from urogenital clinical specimens (12). This shows that both bacteria can exist in the same environment.
Mycoplasma pneumoniae is an established human pathogen causing atypical pneumonia, mostly in children and young adults (13,14). There is accumulating evidence that M.genitalium is one of the agents of nongonococcal urethritis in man (14). Both bacteria share a similar flask-like morphology and show serological cross reactions (14), but they differ in several important features, including a difference in G+C content (8 mol%) and genome size (236 kb), different tissue specificity and pathogenic effects for humans (13,14), and genomic DNA:DNA hybridizations show low values (15). Since the complete nucleotide sequences of both genomes have been established, it has become feasible to compare both bacteria at the nucleotide and protein level.
The genome of M.genitalium consists of only 580 070 base pairs (bp) representing the smallest bacterial genome presently known (2). The additional genetic information contained in the larger genome (816 394 bp) of M.pneumoniae (5) is probably the key for explaining and understanding the observed biological differences between both species. The comparison of these two closely related bacteria might also provide information for defining essential functions of a self-replicating minimal cell as well as dispensable functions on the way to smaller genomes.
Mycoplasma pneumoniae-specific ORFs with significant similarities to proteins in databases
This publication describes the results of our comparative analysis of these two
mycoplasma genomes with emphasis on
genome organization, coding capacity and gene to gene comparison.
METHODS
Computer assisted analysis
Analyses were performed with the HUSAR (Heidelberg Unix Sequence Analysis Resources) program package release 4.0 at the German Cancer Research Center, Heidelberg, Germany. This package is based on the GCG program package version Unix-8.1 of the Genetics Computer Group, Wisconsin. For the DNA and protein comparisons, the FASTA (16) and BLAST (17) programs (BLASTX, BLASTN and BLASTP) were used. Protein sequences were aligned by using either the program GAP (pairwise alignment) based on the algorithm of Needleman and Wunsch (18) or CLUSTAL (19) for multiple alignments. The G+C content was calculated by the program WINDOW. Codon usage was assessed with the program CODONFREQUENCY.
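To illustrate the kind of pairwise comparison described above, the following is a minimal sketch of a Needleman-Wunsch global alignment with a percent-identity calculation. It is not the GCG GAP program: the scoring values (match, mismatch, linear gap penalty) and the function names are assumptions chosen for brevity, whereas a production tool would use a substitution matrix and separate gap opening and extension penalties.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of two sequences; returns the two gapped strings.

    Scoring parameters are illustrative assumptions, not GAP defaults.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aln_a, aln_b):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


aligned = needleman_wunsch("MKV", "MV")
identity = percent_identity(*aligned)
```

Counting identity only over the overlapping (ungapped) columns is one of several possible conventions; other tools divide by the full alignment length instead.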
The annotated sequence data from M.genitalium (2) and M.pneumoniae (5) served as the basis for the comparative analyses. Corrections to the original paper on M.genitalium were only considered if they were published in a scientific journal.
Comparison of functional classification of proteins based on sequence similarity
Abbreviations: n.d., not detected; n.a., not annotated; +, according to our calculations; *, see refs. 5, 20, 21 and 50, numbers are different from those published by Fraser et al. (2); numbers in brackets, percentage of total ORFs. For comparison between H.influenzae and M.genitalium see http://www.ncbi.nlm.nih.gov./cgi-bin/complete_genomes
Our published data (5) can also be accessed on the World Wide Web (www) page (http://www.zmbh.uni-heidelberg.de/M_pneumoniae). The www pages contain the following additional information: differences from the annotations of M.genitalium ORFs, missing ORFs in M.genitalium, lists of direct length comparisons of the orthologous M.genitalium and M.pneumoniae ORFs, degree of identity between orthologous genes/proteins, and theoretical two-dimensional protein maps of both bacteria.
RESULTS AND DISCUSSION
Coding capacity
Originally, 470 ORFs were proposed for M.genitalium (2) and 677 for M.pneumoniae (5). After the publication of the M.genitalium sequence a few changes were introduced (20) and a number of functional corrections and new assignments were added [(21,22), see www]. Independent of these ambiguities, however, a comparison at the ORF and nucleotide sequence level shows clearly that all of the proposed M.genitalium ORFs are completely contained in M.pneumoniae. There are some discrepancies in annotation and functional prediction, but in all instances inspection of the M.genitalium DNA sequences showed that the corresponding DNA sequence was present in M.pneumoniae. An obvious example is the ORF MG468 in M.genitalium, which was assigned in the gene map (2) but originally misnamed in the table on the www pages (http://www.tigr.org/tdb/mdb/mgdb/mgdb.html). Another is the ortholog of the proposed ORF F10_orf357 of the p65 operon of M.pneumoniae, which was not annotated in M.genitalium, although the DNA sequence coding for a protein with a significant similarity is present between the ORFs MG218 and MG219 in M.genitalium (23). Further, we proposed for M.pneumoniae ORFs containing the repetitive DNA sequences RepMP2/3, RepMP4 and RepMP5 [Table 1, (24)]; this was not done for M.genitalium, but again the sequences were present at the DNA level. For more detailed information see also our www pages.
The difference in size between the two mycoplasma genomes amounts to 236 kb coding for 209 proposed ORFs [Table 1, www pages, (5)]. Among these ORFs (Table 1) there are two prominent groups: (i) ORFs showing similarities to functionally assigned proteins only present in M.pneumoniae and gene amplifications thereof (Tables 1 and 2); these ORFs may help to explain the biological differences between M.pneumoniae and M.genitalium. (ii) ORFs which are amplified and are also present in M.genitalium but with a smaller copy number; these ORFs contribute to the difference in genome size but not as much to the repertoire of new functions. The relatively small number of M.pneumoniae-specific proposed ORFs with significant similarities to proteins with known functions is summarized in Table 2.
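The size difference and the ORF total quoted here can be retraced from figures given elsewhere in the text (the genome lengths in the Introduction and the three ORF categories in the Abstract). The short sketch below only repeats that arithmetic; the variable names and category labels are illustrative.

```python
# Genome lengths in base pairs, as quoted in the Introduction
MP_GENOME_BP = 816_394  # M.pneumoniae
MG_GENOME_BP = 580_070  # M.genitalium

size_difference_bp = MP_GENOME_BP - MG_GENOME_BP
size_difference_kb = round(size_difference_bp / 1000)

# Additional proposed ORFs in M.pneumoniae, by category, as quoted in the Abstract
extra_orfs = {
    "M.pneumoniae-specific": 110,
    "amplifications of M.genitalium ORFs": 76,
    "ORFs containing RepMP2/3, RepMP4 or RepMP5": 23,
}
total_extra_orfs = sum(extra_orfs.values())

# Average amount of additional DNA per additional ORF (includes intergenic DNA)
bp_per_extra_orf = size_difference_bp / total_extra_orfs

print(size_difference_bp, size_difference_kb, total_extra_orfs)
```

The last value is only a rough density figure, since the additional 236 kb also contain non-coding sequence.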
The most important functions are: (i) an hsd-type restriction-modification (R-M) system; (ii) two phosphoenolpyruvate:carbohydrate phosphotransferase systems (PTS), one with a predicted transport specificity for mannitol and the other with an unknown specificity; (iii) an NADP-dependent alcohol dehydrogenase; (iv) the complete set of the enzymes involved in the arginine dihydrolase pathway consisting of arginine deiminase, ornithine carbamoyltransferase and carbamate kinase. Also included here are the predicted ORFs which contain sequences of the repetitive DNA sequence RepMP1, first described by Wenzel and Herrmann (25).
The proposed restriction-modification (R-M) system shares the highest similarity with the type I restriction-modification system from E.coli (26). This system consists of an enzyme complex with three different subunits: R (1033 amino acids) for restriction, M (520 amino acids) for modification and S (410 amino acids) for sequence specificity. This type of R-M system has already been identified in Mycoplasma pulmonis (27). The comparison of the sizes of the R subunits from M.pulmonis and M.pneumoniae, and also of the orthologs of E.coli and H.influenzae, strongly suggests that the three ORFs H91_orf376, H91_orf115 and H91_orf206, corresponding to the N-terminal, the middle and the C-terminal part of a complete R subunit, are the result of frameshifts. Since the repeated sequence analysis with PCR-amplified genomic M.pneumoniae DNA confirmed our original DNA sequence, we assume that our M.pneumoniae strain carries frameshift mutations in the hsdR gene. We also observed a number of gene amplifications of the hsdS gene coding for S subunits varying in length between 145 and 363 amino acids. Since the orthologs in E.coli, H.influenzae or M.pulmonis are ~400 amino acids long, we assume that the shorter ORFs in M.pneumoniae are truncated, inactive forms.
Codon usage by M.pneumoniae (MP) and M.genitalium (MG)
All values are calculated in thousands. The 'MP/MG' column contains the 458 M.pneumoniae ORFs with similarity to M.genitalium and the 'MG/MG' column represents the codon usage of the M.genitalium ORFs with similarity to M.pneumoniae. We compare here only ORFs with identical annotation. *The stop codons are not included in the calculation.
The G+C contents of M.pneumoniae (40 mol%) and M.genitalium (32 mol%) differ by 8 mol%. Plotting the G+C content of the first, second and third position of all the codons used for the proposed ORFs against the G+C content of the genomes of M.pneumoniae and M.genitalium reveals that the third codon position is, with almost 19 mol% difference, the most variable (M.pneumoniae, 41.9 mol%; M.genitalium, 23.1 mol%), the first position is the next most variable (M.pneumoniae, 46.9 mol%; M.genitalium, 41.6 mol%), and the second position is the most constrained (M.pneumoniae, 33.4 mol%; M.genitalium, 30.0 mol%). Changing the third positions in ~32 000 codons (corresponding to the 19 mol% difference) of a total of 170 400 codons from A or T to G or C would already cause a 5.6 mol% increase in the G+C content of the M.genitalium genome without affecting the amino acid composition. This indicates that the difference in G+C content is mainly caused by the difference in the third position of the codons (Table 5).
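The estimate above can be retraced with a few lines of arithmetic. The sketch below uses only numbers quoted in the text; taking the full M.genitalium genome length of 580 070 bp as the denominator is an assumption of this sketch, since the text does not state which length was used.

```python
# G+C content at the third codon position (mol%), as quoted in the text
gc3_mp = 41.9  # M.pneumoniae
gc3_mg = 23.1  # M.genitalium
gc3_difference = gc3_mp - gc3_mg  # "almost 19 mol%"

# Total number of codons considered, as quoted in the text
total_codons = 170_400

# Number of third positions that would have to change from A/T to G/C
changed_codons = total_codons * gc3_difference / 100

# Assumed denominator: the complete M.genitalium genome (580 070 bp)
MG_GENOME_BP = 580_070
genome_gc_increase = 100 * changed_codons / MG_GENOME_BP

print(round(gc3_difference, 1), round(changed_codons), round(genome_gc_increase, 1))
```

Depending on whether the unrounded difference or the rounded 19 mol% is used, the result falls between about 5.5 and 5.6 mol%, in line with the value given in the text.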
We still cannot explain why only M.pneumoniae has this relatively high G+C content among the Mollicutes (33). A nucleotide bias in the mutational mechanism (34) might be the cause. An unbalanced supply of the essential external precursors for nucleic acid synthesis and repair seems unlikely since both bacteria can grow in the same environment.
On comparing the similarity of orthologs in M.genitalium and M.pneumoniae measured as amino acid identities, we found a large spectrum reaching from 95% to only 20% identity (www pages). The average was ~67% identity. The highest scores were found for housekeeping proteins like ribosomal proteins, elongation factors or subunits of the FoF1 ATPase. Interesting among these is G12_orf109 (MG353) with a high identity score but no functional assignment (Table 6). The high score suggests that this is a protein with a function also well conserved among other bacteria. Proteins with low identity scores were the components of the cytoskeleton and the lipoproteins which are mostly surface-exposed and play a role in antigenic variation in other mycoplasmas. In analogy to other organisms, antigenic variation could be accomplished by differential expression of members of lipoprotein gene families (35). In general, functionally assigned proteins, which occur in many bacteria, showed high identities and most of the functionally unassigned proteins had the lower identity values. About 90% of orthologous genes were similar in size in both mycoplasmas. Only 52 proposed ORFs exhibited significant size deviations. The differences were frequently due to different localization of the start codon (ATG, TTG, GTG) causing an extension of the N-terminus mainly in conserved proteins (Table 6, atpD gene) and gaps in genes with lower identity scores such as the cytadherence accessory genes hmw1 and hmw3 (Table 6).
The comparative analysis was done with the program GAP. *These values consider only the overlapping regions.
The G+C content of the two genomes also influences the total content of amino acids assigned by codons with only A and/or T (Phe, Ile, Met, Tyr, Asn, Lys) or G and/or C (Pro, Ala, Arg, Gly) in the first and second position. We calculated for the 458 orthologous proteins (Table 5) the following values for the sum of AT codons: 31.78 mol% for M.pneumoniae and 36.1 mol% for M.genitalium, and for the GC codons, 18.52 mol% for M.pneumoniae and 14.43 mol% for M.genitalium. If we took for these calculations M.pneumoniae ORFs with a G+C content <35 mol% (5) a significant increase from 31.44 to 36.9 mol% was observed for AT codons and a decrease from 18.71 to 13.3 mol% for GC codons. These results support the findings of Sueoka (37) and many others (for review see ref. 38) that a relationship exists between the G+C content of a DNA and the amino acid composition.
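As a sketch of how such sums could be computed, the function below counts, over a set of protein sequences, the share of residues belonging to the two amino acid groups named in the text. The grouping follows the lists given above exactly; the function name, the one-letter encoding and the toy input are assumptions for illustration and are not data from either genome.

```python
# Amino acid groups as listed in the text (one-letter codes)
AT_CODON_AA = set("FIMYNK")  # Phe, Ile, Met, Tyr, Asn, Lys
GC_CODON_AA = set("PARG")    # Pro, Ala, Arg, Gly


def at_gc_amino_acid_content(proteins):
    """Return (AT-group mol%, GC-group mol%) over all residues in `proteins`."""
    total = sum(len(p) for p in proteins)
    if total == 0:
        return 0.0, 0.0
    at_count = sum(1 for p in proteins for aa in p if aa in AT_CODON_AA)
    gc_count = sum(1 for p in proteins for aa in p if aa in GC_CODON_AA)
    return 100.0 * at_count / total, 100.0 * gc_count / total


# Hypothetical toy input, not real ORF translations
toy_proteins = ["MKFP", "AGLL"]
at_percent, gc_percent_aa = at_gc_amino_acid_content(toy_proteins)
```

Applied to the translated ORF sets of the two genomes, the same calculation would yield values comparable to those reported in the paragraph above.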
Genome organization
One of the main conclusions derived from comparative analyses of bacterial genome organizations was that gene order is not conserved. Even the proposed origins of replication, normally located around the dnaA gene (39), are not uniform. In contrast, the genomes of M.pneumoniae and M.genitalium represent an example of two different species with a very conserved gene order (Fig. 1) and dnaA regions (36), revealing the same arrangement of genes and 69.4% identity at the nucleotide level from nucleotide position 196 519 to 217 156 (Fig. 1; 36).
The M.pneumoniae and M.genitalium genomes can be subdivided into six genomic segments (Figs 1 and 2). Within these segments, the order of genes was conserved with the only exception that additional genes were interspersed in the larger genome of M.pneumoniae (indicated by the white and light-coloured arrows in Fig. 1), but the order of the six fragments is different in both genomes. A closer inspection of the regions bordering these segments showed that in each case, one or more of the repetitive sequences RepMP1, RepMP2/3, RepMP4 and RepMP5 were present in M.pneumoniae (Fig. 3) and that relics of these sequences were still visible in M.genitalium. They were named MgPa repeats (2,40) and revealed strong sequence similarities to the above mentioned repetitive DNA sequences from M.pneumoniae, except for RepMP1, which could not be identified in M.genitalium. We concluded therefore that the reorganization of the M.genitalium genome took place by translocation of entire segments via homologous recombination between the repetitive DNA sequences. This conclusion is supported by the presence of the recA gene in both mycoplasmas. The proposed sites of translocation in M.genitalium were between the ORFs MG068/069, MG139/140, MG185/186, MG192/193 and MG207/208. Only between MG207 and MG208 is there no MgPa repeat (Fig. 2; 2).
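One way to find segments of conserved gene order computationally is to list the orthologs of one genome in the order in which they occur in the other and to split that list wherever consecutive numbering breaks. The sketch below does this for a made-up order; the function name and the toy indices are hypothetical and do not reproduce the actual MG numbering or the six segments described above.

```python
def collinear_segments(order):
    """Split a gene order into maximal runs of consecutive indices.

    `order` lists the ortholog indices of genome B in the order in which
    they appear along genome A. A run continues while neighbouring values
    differ by +1 (same orientation) or by -1 (inverted segment).
    """
    if not order:
        return []
    segments = [[order[0]]]
    step = None  # direction of the current run: +1, -1 or undecided
    for prev, cur in zip(order, order[1:]):
        delta = cur - prev
        if delta in (1, -1) and (step is None or delta == step):
            segments[-1].append(cur)
            step = delta
        else:
            segments.append([cur])
            step = None
    return segments


# Hypothetical example: three blocks, the last two swapped relative to genome B
toy_order = [1, 2, 3, 7, 8, 9, 4, 5, 6]
toy_segments = collinear_segments(toy_order)
breakpoints = [(seg[-1], nxt[0]) for seg, nxt in zip(toy_segments, toy_segments[1:])]
```

Because additional genes are interspersed in the larger genome, such a comparison would be run on the orthologous ORFs only; the breakpoints it reports would then be the candidates to inspect for flanking repetitive sequences.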
Figure 3. High resolution G+C plot combined with the gene map of the first 100 kb of the M.pneumoniae genome (5). This figure illustrates the proposed sites of genomic recombination and their correlation to the repetitive DNA sequences. The position of the repetitive DNA sequences can easily be recognized by an increase in the G+C content. The segments of conserved gene order are displayed as coloured bars. The magenta coloured bars indicate the areas of putative genomic recombinations. The thick coloured arrows indicate the functional categories of the proposed ORFs from M.pneumoniae (5).
Except for RepMP1, the repetitive DNA sequences of both genomes are characterized by their high G+C content, ~55% in M.pneumoniae and 43% in M.genitalium. Most of the peaks reaching above 50 mol% for M.pneumoniae and 40 mol% for M.genitalium in the plot of the G+C content represent the repetitive DNA sequences or the P1 gene and the ORF6 gene of the P1 operon (Figs 2 and 3). They contribute also to the uneven G+C distribution on the genome (Fig. 2).
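A G+C plot of this kind can be approximated with a simple sliding-window calculation. The following sketch is not the GCG WINDOW program; the function names, window size, step and the synthetic test sequence are assumptions chosen to show how windows exceeding a threshold (for example 50 mol% for M.pneumoniae) could be flagged.

```python
def gc_percent(seq):
    """G+C content of a nucleotide sequence in percent."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * sum(seq.count(base) for base in "GC") / len(seq)


def windowed_gc(seq, window, step):
    """Return (start, G+C percent) for each full window along the sequence."""
    return [(start, gc_percent(seq[start:start + window]))
            for start in range(0, len(seq) - window + 1, step)]


def windows_above(profile, threshold):
    """Start positions of windows whose G+C content exceeds the threshold."""
    return [start for start, gc in profile if gc > threshold]


# Synthetic 60-nt sequence with a G+C-rich block in the middle (illustration only)
toy_seq = "AT" * 10 + "GC" * 10 + "AT" * 10
toy_profile = windowed_gc(toy_seq, window=20, step=20)
toy_peaks = windows_above(toy_profile, threshold=50.0)
```

On a real genome one would typically use overlapping windows of a few hundred nucleotides and compare the flagged positions with the annotated repeat coordinates.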
It has been pointed out that in both the M.genitalium (2) and M.pneumoniae genomes (5) a remarkable uniformity of the direction of transcription is conserved (Fig. 4). We see a frequent switching of transcription only between nucleotide positions 520 000 and 608 000 on the M.pneumoniae map and between ORF MG291 and MG247 on the M.genitalium map (Fig. 1). In all other genome regions only ~15% of the proposed ORFs are transcribed against the general direction of transcription. The observed translocation of DNA segments which took place in M.genitalium did not change this uniform transcription pattern. One can see that in both genomes, the red-coloured region between nucleotide positions 100 000 and 675 000 of the M.pneumoniae genome and between ORFs MG068 and MG208 of the M.genitalium genome (Figs 1 and 2) has not been rearranged by translocations although the repetitive DNA sequences, the hypothetical sites for homologous recombination, were present as indicated for M.pneumoniae in Figures 1 and 2. The corresponding MgPa repetitive DNA sequences are located between the following pairs of ORFs: MG226/227, MG260/261, MG287/288 and MG339/340 (Figs 1 and 2; 2). We explain this observed genomic stability by a selection pressure which tolerates translocations only when the transcription of the genes on the translocated segment does not interfere with the general direction of transcription of the genomic environment. Most translocations of DNA segments from outside of the red-coloured area into this region, by homologous recombination between the repetitive DNA sequences bordering the DNA segments, would interrupt the direction of transcription. The proposed mechanism of translocation permits only insertion in such an orientation that the genes on this segment are transcribed in the opposite direction with respect to the general orientation of transcription of the cell. Such a conserved continuous bidirectional direction of transcription has not been seen in H.influenzae (1), Methanococcus jannaschii (3) or in Synechocystis sp. (4); either the genes were preferentially transcribed from one DNA strand (H.influenzae) or frequent switching of strands occurred (M.jannaschii). Assuming a bidirectional mode of DNA replication, it might be possible that transcription and DNA replication are coupled. This could be a way of regulating gene expression. Any reversal of this directionality might be disadvantageous to these bacterial cells.
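The share of ORFs transcribed against the general direction can be estimated once an origin and a switch point are fixed on the circular map. The sketch below is a simplified model under stated assumptions: ORFs on the arc from the origin to the switch point are expected on the '+' strand and all others on the '-' strand. The function name, the coordinates and the ORF list are hypothetical and are not taken from either genome.

```python
def fraction_against(orfs, origin, terminus, genome_length):
    """Fraction of ORFs whose strand opposes the expected direction.

    `orfs` is an iterable of (position, strand) with strand '+' or '-'.
    ORFs located on the arc from `origin` to `terminus` (in increasing
    coordinates, wrapping around the circle) are expected on '+';
    ORFs on the remaining arc are expected on '-'.
    """
    arc = (terminus - origin) % genome_length
    against = 0
    total = 0
    for position, strand in orfs:
        offset = (position - origin) % genome_length
        expected = "+" if offset < arc else "-"
        if strand != expected:
            against += 1
        total += 1
    return against / total if total else 0.0


# Hypothetical circular genome of 1000 nt with origin at 100 and switch point at 600
toy_orfs = [(150, "+"), (300, "+"), (500, "-"), (700, "-"), (900, "-"), (50, "+")]
toy_fraction = fraction_against(toy_orfs, origin=100, terminus=600, genome_length=1000)
```

In this toy case two of the six ORFs oppose the expected direction; for the real maps the region of frequent switching described above would need to be treated separately.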
Figure 4. Comparative presentation of the organization of DNA segments with conserved gene order and general direction of transcription in M.pneumoniae and M.genitalium. Segments with conserved gene order in both bacteria are shown in the same colour. Gaps within the coloured block and triangles in the red segments represent repetitive DNA sequences. The black arrows starting in two opposite directions from the proposed origin of replication (black dot) indicate the general direction of transcription; the two arrowheads mark the region on the genome where the direction of transcription is frequently changing (see also Fig. 1). The numbers inside the coloured circle indicate the first and last M.genitalium ORF (MG is omitted) of each of the six DNA segments with conserved gene order. For this illustration the M.genitalium sequence was reverse complemented and the putative origin of replication was oriented in the 3 o'clock position. This figure shows that the general transcription orientation is very conserved in these two bacteria and that the proposed genomic recombination took place only in one half of the genome. For clarity, the ORFs which are transcribed contrary to the general direction of transcription (~15%, Fig. 1) have not been indicated.
When we analyzed the sites where, compared with M.genitalium, the specific additional proposed ORFs of M.pneumoniae were located, we found a remarkable conformity because they were mapped in regions of low G+C content. Examples are given in Figure 3, roughly from nucleotide positions 60 000 to 67 000 and from 94 000 to 99 000. All the proposed ORFs represented by white arrows are M.pneumoniae-specific. This phenomenon, the correlation between additional M.pneumoniae-specific genes and a relatively low G+C content, appears throughout the entire genome (see www pages). In some instances, the segments with a low G+C content also have a lower coding density (Figs 1 and 3), such as the proposed origin of replication (Fig. 3; 36). Presently, we have no explanation for this observation. To understand these findings one has to wait until the direct ancestor of M.pneumoniae has been identified.
The difference in genome size between M.pneumoniae and M.genitalium can be explained by two processes: a reduction in size of the M.genitalium genome by deletion of genes and an increase in size of the M.pneumoniae genome by amplification of existing genes. The best example for the latter event is the significantly higher number of lipoproteins in M.pneumoniae, many of which probably arose via gene amplification. If we subtract the length of the genes amplified from the genome of M.pneumoniae (Table 1), we end up with an M.pneumoniae genome of ~710 kb. Unless the gene amplifications turn out to code for important genetic information for M.pneumoniae, it appears that the larger genome does not code for as many more functions as would be anticipated from its higher DNA content. Following the same argument the genome size of M.genitalium could also be further reduced, e.g. to ~560 000 bp by deleting MgPa repeats.
The minimal cell
Mycoplasma genitalium has the smallest presently known genome; it is therefore the most promising candidate for defining and constructing a minimal cell by genetic manipulations, e.g. inactivating or deleting genes. More importantly, the proposed or constructed minimal cell can be experimentally tested for its ability to survive and reduplicate under defined conditions. It is apparent that the minimal set of essential genes for an M.genitalium-derived minimal cell has to be different depending on growth conditions, e.g. whether the minimal cell is growing in vitro in a serum-enriched medium or in the respiratory or urogenital tracts of the host. It might grow well without adhesin proteins under laboratory conditions, but it would probably be unable to survive in the respiratory or urogenital tract without the ability to colonize following its adhesion to the epithelial surfaces. Therefore, when defining a minimal cell, one has also to define the environmental conditions for growth of this cell.
An obvious approach for defining a minimal cell is to start with M.genitalium, the smallest known existing cell, and gradually reduce its genetic complexity. The comparison with the larger M.pneumoniae genome provides hints for genetic information that may be deleted (see below). An alternative approach for defining the minimal cell was applied by Mushegian and Koonin (41). They identified all pairs of orthologous genes in the distantly related bacteria H.influenzae and M.genitalium and, on this basis, constructed a minimal gene set of 240 members complemented by a small number of non-orthologous genes, ending up with a final set of 256 genes. This approach has two disadvantages: essential functions could be missed, and the minimal cell is difficult to verify experimentally [see also the commentary of Maniloff (42) on the topic 'the minimal cell genome']. For instance a conventional bacterium possesses a cell wall which provides structural stability and protects it against osmotic stress. The wall-less M.genitalium and M.pneumoniae do not code for a single gene involved in cell wall formation but as a substitute they possess a cytoskeleton (43). The proteins which were proposed to participate in cytoskeleton formation do not share significant sequence similarities with proteins from other bacteria; therefore proteins which have the same function in both H.influenzae and M.genitalium but do not share sequence similarities might be eliminated as non-essential. In contrast, comparison of the two mycoplasma genomes permits several conclusions as to the probable number of genes involved in macromolecule synthesis, metabolic and anabolic pathways, transport and formation of structural elements. The same or a similar number of genes involved in both mycoplasmas in DNA replication, transcription and translation suggests that in these functional categories the minimal gene set has already been established. In other functional categories, like cell envelope and cytoskeletal proteins, energy metabolism or transport, more flexibility seems possible, since environmental conditions might strongly influence the number of genes/functions required to be supplied by the minimal cell itself. One has also to consider that a considerable proportion of gene functions in M.genitalium are either still unknown or insufficiently defined, and it might well be that among the hitherto unclassified genes, essential genes are hidden. In addition, it remains to be seen how much intergenic regions contribute to a functional chromosome, e.g. by influencing the chromosomal DNA topology.
The definition of the minimal cell might be considered a purely academic problem that cannot be answered satisfactorily, but it may be possible to define a minimal set, a core of genes, which has to be present in every self-replicating cell and which has to be complemented by additional genetic information depending on the growth conditions provided by the specific environment. Mycoplasmas in general, and M.genitalium and M.pneumoniae in particular, can serve as excellent model organisms to tackle experimentally the question of the essential functions for small self-replicating cells.
ACKNOWLEDGEMENTS
We thank E. Pirkl for excellent technical assistance, R. Mosbach for technical assistance with hardware and software problems, I. Schmid for preparing the manuscript, H. Neimark for discussion, and S. Razin and R. Walker for reading the paper and giving valuable
suggestions for its improvement. This research was supported by a grant from
the Deutsche Forschungsgemeinschaft (He 780/5-1-He 780/5-4) and by the Fonds der Chemischen Industrie.
REFERENCES
1 Fleischmann, R. D., Adams, M. D., White, O., Clayton, R. A., Kirkness, E. F., Kerlavage, A. R., Bult, C. J., Tomb, J. F., Dougherty, B. A., Merrick, J. M. et al. (1995) Science, 269,496-512.
2 Fraser, C. M., Gocayne, J. D., White, O., Adams, M. D., Clayton, R. A., Fleischmann, R. D., Bult, C. J., Kerlavage, A. R., Sutton, G., Kelley, J. M. et al. (1995) Science, 270,397-403.
3 Bult, C. J., White, O., Olsen, G. J., Zhou, L., Fleischmann, R. D., Sutton, G. G., Blake, J. A., FitzGerald, L. M., Clayton, R. A., Gocayne, J. D. et al. (1996) Science, 273,1058-1073.
4 Kaneko, T., Sato, S., Kotani, H., Tanaka, A., Asamizu, E., Nakamura, Y., Miyajima, N., Hirosawa, M., Sugiura, M., Sasamoto, S. et al. (1996) DNA Res., 3,109-136.
5 Himmelreich, R., Hilbert, H., Plagens, H., Pirkl, E., Li, B.-C. and Herrmann, R. (1996) Nucleic Acids Res., 24,4420-4449.
6 Tatusov, R. L., Mushegian, A. R., Bork, P., Brown, N. P., Hayes, W. S., Borodovsky, M., Rudd, K. E. and Koonin, E. V. (1996) Current Biol., 6,279-291.
7 Koonin, E. V., Mushegian, A. R. and Rudd, K. E. (1996) Current Biol., 6,404-416.
8 Jensen, J. S., Hansen, H. T. and Lind, K. (1996) J. Clin. Microbiol., 34,286-291.
9 Hu, P. C., Collier, A. M. and Baseman, J. B. (1977) J. Exp. Med., 145,1328-1343.
10 Tully, J. G., Taylor Robinson, D., Cole, R. M. and Rose, D. L. (1981) Lancet, 1,1288-1291.
11 Baseman, J. B., Dallo, S. F., Tully, J. G. and Rose, D. L. (1988) J. Clin. Microbiol., 26,2266-2269.
12 Goulet, M., Dular, R., Tully, J. G., Billowes, G. and Kasatiya, S. (1995) J. Clin. Microbiol., 33,2823-2825.
13 Jacobs, E. (1991) Rev. Med. Microbiol., 2,83-90.
14 Taylor-Robinson, D. (1996) Clin. Infect. Dis., 23,671-684.
15 Yogev, D. and Razin, S. (1986) Int. J. Syst. Bacteriol., 36,426-430.
16 Pearson, W. R. and Lipman, D. J. (1988) Proc. Natl. Acad. Sci. USA, 85,2444-2448.
17 Altschul, S., Gish, W., Miller, W., Myers, E. and Lipman, D. (1990) J. Mol. Biol., 215,403-410.
18 Needleman, S. B. and Wunsch, C. D. (1970) J. Mol. Biol., 48,443-453.
19 Higgins, D. G. and Sharp, P. M. (1988) Gene, 73,237-244.
22 Ouzounis, C., Casari, G., Valencia, A. and Sander, C. (1996) Mol. Microbiol., 20,898-900.
23 Krause, D. C., Proft, T., Hedreyda, C. T., Hilbert, H., Plagens, H. and Herrmann, R. (1997) J. Bacteriol., in press
24 Ruland, K., Wenzel, R. and Herrmann, R. (1990) Nucleic Acids Res., 18,6311-6317.
25 Wenzel, R. and Herrmann, R. (1988) Nucleic Acids Res., 16,8337-8350.
26 Bickle, T. A. and Kruger, D. H. (1993) Microbiol. Rev., 57,434-450.
27 Dybvig, K. and Yu, H. (1994) Mol. Microbiol., 12,547-560.
28 Postma, P. W., Lengeler, J. W. and Jacobson, G. R. (1993) Microbiol. Rev., 57,543-594.
29 Pollack, J. D. (1992) In Maniloff, J., McElhaney, R. N., Finch, L. R., and Baseman, J. B. (eds), Mycoplasmas - Molecular Biology and Pathogenesis. American Society for Microbiology, Washington, DC, pp. 181-200.
30 Proft, T. (1995), PhD thesis, Ruprecht-Karls-Universität Heidelberg.
31 Braun, V. and Wu, H. C. (1994) In Ghuysen, J.-M., and Hakenbeck, R. (eds), Bacterial Cell Wall. Elsevier Science B.V., Vol. 27., Chapter 14, pp. 319-341.
32 Radestock, U. and Bredt, W. (1977) J. Bacteriol., 129,1495-1501.
33 Herrmann, R. (1992) In Maniloff, J., McElhaney, R. N., Finch, L. R., and Baseman, J. B. (eds), Mycoplasmas - Molecular Biology and Pathogenesis. American Society for Microbiology, Washington, DC, pp. 157-168.
34 Cox, E. C. and Yanofsky, C. (1967) Proc. Natl. Acad. Sci. USA, 58,1895-1902.
35 Citti, C. and Wise, K. S. (1995) Mol. Microbiol., 18,649-660.
36 Hilbert, H., Himmelreich, R., Plagens, H. and Herrmann, R. (1996) Nucleic Acids Res., 24,628-639.
37 Sueoka, N. (1961) Proc. Natl. Acad. Sci. USA, 47,1141-1149.
38 Osawa, S., Jukes, T. H., Watanabe, K. and Muto, A. (1992) Microbiol. Rev., 56,229-264.
39 Ogasawara, N. and Yoshikawa, H. (1992) Mol. Microbiol., 6,629-634.
40 Peterson, S. N., Bailey, C. C., Jensen, J. S., Borre, M. B., King, E. S., Bott, K. F. and Hutchison, C. A. (1995) Proc. Natl. Acad. Sci. USA, 92,11829-11833.
41 Mushegian, A. R. and Koonin, E. V. (1996) Proc. Natl. Acad. Sci. USA, 93,10268-10273.
42 Maniloff, J. (1996) Proc. Natl. Acad. Sci. USA, 93,10004-10006.
43 Krause, D. C. (1996) Mol. Microbiol., 20,247-253.
44 Koonin, E. V. and Bork, P. (1996) Trends Biochem. Sci., 21,128-129.
45 Koonin, E. V., Mushegian, A. R. and Bork, P. (1996) Trends Genet., 12,334-336.
46 Simoneau, P., Li, C. M., Loechel, S., Wenzel, R., Herrmann, R. and Hu, P. C. (1993) Nucleic Acids Res, 21,4967-4974.
47 Atkins, J. F. and Gesteland, R. F. (1996) Nature, 379,769-771.
48 Schatz, G. and Dobberstein, B. (1996) Science, 271,1519-1526.
49 Saurin, W. and Dassa, E. (1996) Mol. Microbiol., 22,389-391.
50 Robinson, K., Gilbert, W. and Church, G. M. (1996) Science, 271,1302-1313.