Nucleic Acids Research
Molecular evolution of DNA-(cytosine-N4) methyltransferases: evidence for their polyphyletic origin
Received June 22, 1999; Revised and Accepted September 21, 1999
ABSTRACT
DNA N4-cytosine methyltransferases (N4mC MTases) are a family of S-adenosyl-L-methionine (AdoMet)-dependent MTases. Members of this family were previously found to share nine conserved sequence motifs, but the evolutionary basis of these similarities has never been studied in detail. We performed phylogenetic analysis of 37 known and potential new family members from the multiple sequence alignment using distance matrix, parsimony and maximum likelihood approaches to infer the evolutionary relationship among the N4mC MTases and classify them into groups of orthologs. All the treeing algorithms employed, as well as the results of exhaustive sequence database searching, support a scenario in which the majority of N4mC MTases, except for M.BalI and M.BamHI, arose by divergence from a common ancestor. Interestingly, MTases M.BalI and M.BamHI apparently originated from N6-adenine MTases and represent the most recent addition to the N4mC MTase family. In addition to the previously reported nine sequence motifs, two more conserved sequence patches were detected. Phylogenetic analysis also provided evidence for massive horizontal transfer of MTase genes, presumably with the whole restriction-modification systems, between Bacteria and Archaea.
INTRODUCTION
DNA methylation is catalyzed by DNA MTases, transferring the methyl group from the AdoMet molecule to certain N and C atoms of nucleotides. Modification of genomic DNA of most organisms plays a role in a variety of biological processes, including regulation of gene expression, DNA replication, mismatch repair and defense of the host against foreign DNA (reviewed in 1,2). DNA methylation leads to the formation of three kinds of products: N6-methyladenine (N6mA), N4-methylcytosine (N4mC) and 5-methylcytosine (5mC). Because of the chemical character of the reaction catalyzed by N4mC and N6mA DNA MTases (methylation of exocyclic -NH2 group), they are both grouped as one class, N-MTases (3). The methylation of 5mC is widespread in all branches of the tree of life. N6-adenine methylation, common to all bacteria, has been also reported in the ciliated protozoa (4). To our knowledge, however, N4mC has been found only in Prokaryota and Archaea. Moreover, contrary to the diversity of the biological function of DNA modification, N4-methylation seems to be primarily a component of restriction-modification systems (R-M) with the exception of M.NgoMXV MTase (5; former name M.NgoMV), for which no corresponding endonucleolytic activity has been found. Nowadays the number of known MTase sequences is difficult to estimate precisely as genome and other sequence data continue to pour into databases at a fast rate, but despite the growing number of putative N4mC MTases, this group remains minor compared to N6mA and 5mC MTases (6).
All DNA MTases share a common building plan, with a pattern of highly conserved amino acid sequence blocks. A set of ten motifs arranged in a constant linear order is found among most 5mC MTases along with a variable region, which confers sequence specificity (7). Although N-MTases seem to be a much less homogenous class than 5mC MTases, Malone et al. (8) were able to identify nine segments of similarity in the sequence alignment of 45 N-MTases (36 N6mA and only nine N4mC) corresponding to motifs I-VIII and X in 5mC MTases. Based on relative position of two most conserved of these motifs (I and IV) and the variable region N-MTases were classified as [alpha], [beta] and [gamma] (9). Group [alpha] is arranged in the order, motif I-variable region-motif IV; group [beta], motif IV-variable region-motif I; and group [gamma], motif I-motif IV-variable region (9). The N6mA MTases were found in all these classes, while the majority of N4mC MTases aggregated into the [beta] group with only one representative in the [alpha] group, and none in the [gamma] group (8). Only recently a N4mC MTase was described with an order of motifs similar to that of [gamma]-MTases, however lacking the typical variable region at the C-terminus (5,10).
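The [alpha]/[beta]/[gamma] scheme described above depends only on the linear order of motif I, motif IV and the variable region, so it can be written as a small lookup. The following Python sketch is an illustration only: the function name, the use of residue start positions as input and the 'unclassified' fallback are assumptions made here and are not part of the classification published in (9).

```python
def classify_n_mtase(motif_i: int, motif_iv: int, variable_region: int) -> str:
    """Classify an N-MTase from the start positions of three sequence elements.

    Positions are hypothetical residue indices; only their relative order matters.
    """
    # Sort the three elements by their position along the polypeptide chain.
    elements = sorted(
        [("I", motif_i), ("IV", motif_iv), ("VAR", variable_region)],
        key=lambda item: item[1],
    )
    order = tuple(name for name, _ in elements)

    # Orders as described in the text: alpha = I-VAR-IV, beta = IV-VAR-I,
    # gamma = I-IV-VAR. Any other order is reported as unclassified.
    groups = {
        ("I", "VAR", "IV"): "alpha",
        ("IV", "VAR", "I"): "beta",
        ("I", "IV", "VAR"): "gamma",
    }
    return groups.get(order, "unclassified")


if __name__ == "__main__":
    # Made-up positions, chosen only to exercise each branch.
    print(classify_n_mtase(motif_i=30, motif_iv=250, variable_region=120))
    print(classify_n_mtase(motif_i=250, motif_iv=30, variable_region=120))
    print(classify_n_mtase(motif_i=30, motif_iv=90, variable_region=200))
```

An enzyme such as the [gamma]-like N4mC MTase mentioned above, which lacks a typical C-terminal variable region, would need separate handling; the sketch assumes that all three elements are present.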
Crystal structures have been determined for a number of AdoMet-dependent MTases, including two 5mC, M.HhaI (11) and M.HaeIII (12); two N6mA, M.TaqI (13) and M.DpnM (14); and one N4mC DNA MTase, M.PvuII (15). All of these enzymes share a remarkably similar catalytic domain structure, resembling an [alpha]/[beta] Rossmann-fold with conserved binding patterns for the cofactor AdoMet and modified base corresponding mainly to conserved motifs I and IV (16). In all cases the substrate to be methylated is bound or expected to bind in a pocket adjacent to the AdoMet binding site, which is formed by different amino acids in different MTases. The binding mode of N-MTases for their DNA target (different in all examined enzymes) has been suggested from the relative orientation of either additional target recognition domains (TRD) or assemblies of flexible loops, and accumulation of positive electrostatic charge in certain regions of protein surface (14-16). The site of the flipped-out nucleotide binding has been also postulated which has suggested a possible reaction mechanism, different from that of 5mC MTases (15,16).
It has been proposed that N6mA and N4mC MTases, which closely resemble one another, derive from a common ancestor (17). Recently, Jeltsch et al. (18) demonstrated that the catalytic activities of these two families overlap to some degree. However, a phylogenetic analysis of MTases utilizing superposition of tertiary structures and resulting rmsd values along with a structure-guided sequence alignment, which included representatives of N4mC and N6mA families, argues against their close common origin (19). N4mC and N6mA MTases are found on distinct branches of a tree, suggesting very ancient divergence of both subfamilies of N-MTases and opening possibilities for subsequent functional convergence.
In this paper we investigate the phylogenetic history of the N4mC MTase family and ask whether discrepancies between their function and degree of sequence similarity arose by divergence, or are evidence for convergent evolution. We compare enzymes from different structural classes ([alpha], [beta] and [gamma]-like) and propose a non-trivial scheme describing their divergence from a common ancestor.
MATERIALS AND METHODS
Amino acid sequences of all previously characterized members of the N4mC family were taken from publicly available databases through the REBASE catalog (6). The PSI-BLAST program (20) was then used for iterative database searches with each of these sequences as a query. The databases used in this search were the non-redundant (NR) database and both the complete and unfinished genomes obtained through the BLAST interface (http://www.ncbi.nlm.nih.gov/BLAST/) at the NCBI. The assignment of putative protein sequences as members of the N4mC MTase family was based on high sequence similarity to known N4mC sequences according to the BLAST default cutoff values. All sequences were subsequently aligned using the CLUSTALX program (21). After the refinement of poorly aligned regions or subsets of sequences, manual adjustments were introduced based on the PSI-BLAST pairwise comparison and secondary structure prediction [carried out using the consensus JPRED approach (22), data not shown]. All sequences that appeared truncated, defective or only marginally similar to N4mC MTases were excluded from further analysis.
The phylogenetic trees were inferred from the sequence alignments using distance, parsimony and maximum likelihood algorithms implemented in programs available in the PHYLIP package (23 and references therein). In a distance matrix method, evolutionary distances (representing an estimate of the number of amino acid substitutions per site) were computed for all protein pairs, and a phylogenetic tree was reconstructed by using an algorithm of Fitch and Margoliash (24). According to the principle of maximum likelihood, for a possibly large set of trees a search for the maximum likelihood value was carried out for the patterns of amino acid differences among the sequences considering each site separately, and the tree with the largest value was chosen as the preferred one. Using a maximum parsimony method a tree was generated, which required the possibly smallest number of evolutionary changes to explain the differences observed among the sequences under study (methodology comprehensively reviewed in 25,26).
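For the distance matrix method, the goodness of fit of a candidate tree can be illustrated with the weighted least-squares criterion usually associated with Fitch and Margoliash (24), in which each squared deviation between an observed distance D and the corresponding path length d on the tree is divided by D squared. The Python sketch below assumes this weighting; the function name, the dictionary layout and the toy distances are hypothetical and do not reproduce the PHYLIP implementation.

```python
def fitch_margoliash_score(observed, fitted):
    """Weighted sum of squares: sum over sequence pairs of (D - d)^2 / D^2.

    `observed` maps a pair of sequence names to the estimated evolutionary
    distance D; `fitted` maps the same pair to the path length d between the
    two sequences on a candidate tree. Lower scores indicate a better fit.
    """
    total = 0.0
    for pair, d_obs in observed.items():
        d_fit = fitted[pair]
        total += (d_obs - d_fit) ** 2 / d_obs ** 2
    return total


if __name__ == "__main__":
    # Toy distances (substitutions per site) for three hypothetical sequences.
    observed = {("A", "B"): 0.4, ("A", "C"): 0.5, ("B", "C"): 0.3}
    # Path lengths read from two hypothetical candidate trees.
    tree_1 = {("A", "B"): 0.4, ("A", "C"): 0.5, ("B", "C"): 0.3}
    tree_2 = {("A", "B"): 0.2, ("A", "C"): 0.5, ("B", "C"): 0.3}
    print(fitch_margoliash_score(observed, tree_1))
    print(fitch_margoliash_score(observed, tree_2))
```

A tree search would evaluate this score over many topologies and branch-length assignments and keep the tree with the lowest value; that search is not shown here.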
Since in all methods employed, each alignment position is assumed to include residues sharing common ancestry, regions of ambiguous alignment and most extensive gaps were excluded from the phylogenetic analysis. The distances proportional to the number of amino acid replacements per sequence position separating each pair of sequences were estimated using the JTT model (27) and the phylogenetic tree that best fits the sequence-to-sequence distances was generated with the KITSCH program. The trees that best fit the parsimony and maximum likelihood criteria were generated with the PROTPARS and PROTML programs respectively. Multiple runs were conducted using up to 20 different input orders, with global rearrangements and the subreplicates options used wherever possible to find an optimal (or nearly optimal) tree. The length of branches in each consensus tree computed using the majority-rule method CONSENSE was calculated with the FITCH program. The consistency of each tree was evaluated by the bootstrap resampling of the original sequence data using the SEQBOOT program. In this technique all alignment positions were randomly sampled with replacement from the original sequence set (28). The process was repeated 100 times, and a set of randomized alignments was used for reconstruction of new phylogenetic trees. The clusters with high proportion of occurrence among all the trees were considered to be statistically significant (26).
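The bootstrap resampling step described above can be sketched in a few lines of Python. This is an illustration of the general technique (28), not of the SEQBOOT program itself: the function names, the dictionary representation of the alignment and the toy sequences are assumptions introduced for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-replicate: alignment columns sampled with replacement.

    `alignment` maps sequence names to aligned strings of equal length.
    The same column indices are used for every sequence, so each sampled
    column keeps its residues paired across sequences.
    """
    length = len(next(iter(alignment.values())))
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(alignment, n_replicates=100, seed=0):
    """Generate `n_replicates` resampled alignments (100 in the text)."""
    rng = random.Random(seed)  # fixed seed only to make the example repeatable
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


if __name__ == "__main__":
    # Toy alignment of three hypothetical sequences.
    toy_alignment = {"seq1": "SPPYAG", "seq2": "SPPYSG", "seq3": "DPPYAG"}
    replicates = bootstrap_replicates(toy_alignment)
    print(len(replicates), replicates[0])
```

Each replicate would then be passed to the tree-building programs, and the proportion of replicate trees containing a given cluster gives its bootstrap support.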
RESULTS
Taking advantage of all sequences deposited in databases and the 18 completed (four archaeal and 14 bacterial) and 34 (including three eukaryotic) unfinished genome sequences, we have identified 37 proteins and putative proteins with extensive amino acid sequence similarity to the N4mC MTases. Nine homologs of N4mC MTases from Archaea and 28 from Bacteria have been identified; their absence from eukaryotic sequences has also been confirmed (Table 1). Many sequences of new family members have been obtained by genome sequencing projects that do not provide any information about the biological function or biochemical activity of putative proteins, and even if such information exists it is often incomplete and sometimes incorrect (29). It is worth emphasizing that homologs of known MTases were found in only two of 14 completely sequenced bacterial genomes, but in two of four archaeal genomes.
Table 1. The 37 known and potential N4mC MTases analyzed in this study
Multiple sequence alignment
All retrieved sequences were aligned using computer programs and criteria described in Materials and Methods. The resulting multiple sequence alignment (Fig. 1) was analyzed from the point of view of conservation of sequence patterns specific for N4mC MTases and their closest relatives. Pairwise comparison of most N4mC MTase sequences indicated a moderate degree of sequence similarity restricted mainly to nine motifs composed of groups of conserved residues (8). However, only two residues are invariant, found not surprisingly in the two most conserved motifs: the second proline in the core of motif IV, the `SPPY' hallmark of the N4mC MTase active site (30), and the middle glycine in the `FxGxG' motif I, which is more generally conserved in all AdoMet-dependent MTases (31). This is due to the relatively large number of protein sequences used in the alignment and the inclusion of atypical (i.e. other than `SPPY'-bearing) and hypothetical proteins in initial calculations of the consensus sequence. The difficulty in obtaining unambiguous alignment of several regions, including for example the segment of M.PvuII for which the structure could not be solved, suggests either the presence of structural or functional features unique to each protein (such as specific sequence recognition determinants) or some degree of structural plasticity and lack of amino acid sequence constraints in these regions (15; our unpublished data).
Figure 1. Multiple sequence alignment of 37 members of the N4mC MTase family classified as `[alpha]', `[beta]' or `[gamma]-like' (Materials and Methods). The order is as in Table 1. # indicates the site of deletion in the loop regions or the topological breakpoint introduced into the alignment. The secondary structure of M.PvuII (15) is shown at the bottom. Conserved motifs are outlined. Sequence blocks used for phylogenetic calculations are delineated using black bars above the alignment.
Our results comparing N4mC MTases presented here in the form of the multiple sequence alignment and phylogenetic trees are more complete than previous studies, as they are based on all 37 sequences available to date. In addition, recent crystallographic results for M.DpnM (14) and M.PvuII (15) showed that several sequence motifs and local supersecondary structure predictions assigned by Malone et al. (8) as common features of all DNA N-MTases were in fact inconsistent between analyzed subfamilies. Therefore in our analysis, we attempted to rationalize the classification of conserved motifs of N4mC MTases based on similarities to the motifs of other classes of DNA MTases in respect to the common supersecondary structural and functional elements. The assignment of conserved motifs I-VIII and X in our final alignment differs slightly from the widely cited results of Malone et al. (8), especially in respect to the position of weakly conserved motif III, but is essentially identical to the structure-based alignment presented by Gong et al. (15) (Fig. 1). Many residues are conserved throughout the sequence, most of them forming common structural features: both Rossmann-fold-like core (16,32) and several conserved loops with catalytic or ligand-binding functions, as inferred from M.PvuII structure (15). We suggest a modification in nomenclature regarding motif VIII, building an antiparallel [beta]-hairpin localized at the `edge' of the common core of AdoMet-dependent MTases (33). In all structurally characterized DNA and RNA MTases this region forms a part of a target nucleotide binding pocket (16); however, the length of the loop between antiparallel [beta]-strands may dramatically vary even between proteins belonging to the same class (Fig. 1). For that reason different locations of motif VIII were proposed, in either one of the [beta]-strands (14,15) or the intervening loop (8,16). 
This discrepancy is clearly caused by the inability to bridge two conserved patches. For convenience we suggest referring to the C-terminal part of motif VIII as submotif VIII', so that parts VIII and VIII' would each correspond to one of the two [beta]-strands (Fig. 1). We have also localized a previously overlooked, weakly conserved sequence patch present in most N4mC MTase sequences. This patch (N/Q/D-V/I-W-N/E/D-I/V) can be found after motif VIII, between the variable region and motif X. In M.PvuII MTase this region precedes helix F, postulated as a DNA-binding element similar to 5mC MTases and not present in N6mA MTases (15,34). However, there is no significant sequence similarity between motif IX in 5mC MTases and the newly described region in N4mC MTases, and to avoid confusion we labeled it as motif IX-N4. In M.PvuII this region forms a short [beta]-strand, while in M.HhaI it forms a loop and an [alpha]-helix. Moreover, the presented alignment suggests that helix F of M.PvuII might not be conserved among many N4mC MTases (Fig. 1; J.M.Bujnicki, unpublished structure predictions). Due to low sequence conservation in this region (e.g. M.PvuII lacks the central Trp residue), structure prediction is ambiguous and would certainly benefit from further experimental investigation.
Phylogenetic trees
In phylogenetic inference there are two computational steps: estimation of the topology (branching pattern of a tree) and estimation of branch lengths for that topology (35). While the statistical estimation of branch lengths is relatively simple for a known topology, the number of possible topologies for a sizeable number of sequences is enormously large (for 37 sequences the number of bifurcating unrooted trees is on the order of $10^{49}$, given by $N = (2n - 5)!/[2^{n-3}(n - 3)!]$) (26). We therefore had to resort to a heuristic search to estimate a good tree.
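The size of the search space quoted above can be checked directly from the formula for the number of bifurcating unrooted trees on n sequences, N = (2n - 5)!/[2^(n-3) (n - 3)!]. The short Python sketch below evaluates it with exact integer arithmetic; the function name is chosen here for illustration.

```python
from math import factorial


def unrooted_bifurcating_trees(n: int) -> int:
    """Number of distinct bifurcating unrooted trees for n labelled sequences.

    Implements N = (2n - 5)! / (2**(n - 3) * (n - 3)!), valid for n >= 3.
    """
    if n < 3:
        raise ValueError("the formula requires at least three sequences")
    # The quotient is always an integer, so floor division is exact.
    return factorial(2 * n - 5) // (2 ** (n - 3) * factorial(n - 3))


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, unrooted_bifurcating_trees(n))  # 1, 3 and 15 trees
    # For the 37 sequences analyzed here the count is on the order of 10**49.
    print(f"{unrooted_bifurcating_trees(37):.2e}")
```

Because this number grows faster than exponentially with n, an exhaustive evaluation of all topologies is out of reach for 37 sequences, which is why a heuristic search is needed.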
In the absence of a priori knowledge, the ultimate criterion for determining phylogenetic reliability rests on tests of congruence among results of different algorithms, which enable detection and minimization of systematic errors caused by the partially false assumptions of the implemented methods. Because the issue of phylogenetic reconstruction is controversial, with some disagreement coming even from personal preference or philosophy of researchers in the field, we decided to use methods that rely on substantially different assumptions about the molecular evolutionary process and have different limitations. Phylogenetic trees of the N4mC MTases were inferred from the alignment using distance, maximum likelihood and parsimony methods (Materials and Methods). The distance method is based on a probabilistic model of amino acid transitions, which does not take explicit account of the genetic code or differences in preferred directions of substitutions of residues from different secondary structures. Its performance depends on the linearity of the relationship between the distance measure and the number of substitutions, and on the standard error of the distance estimate. The maximum parsimony procedure is the only one that can easily take care of insertions and deletions, which may carry important phylogenetic information, but when the rate of multiple substitutions per site in the alignment is relatively high, it can be expected to converge onto the wrong tree. Under the assumption that all amino acid residues diverge at the same expected rate, maximum likelihood estimation yields quite robust trees, but it is computationally the most expensive method: to reduce the number of likelihood calculations over all alternative trees, it requires heuristics that introduce the greatest simplifications. All phylogenetic algorithms that we used assume correct alignment of positional homologs.
For this reason, areas of questionable alignment including regions with gaps in >50% of sequences have been omitted from consideration prior to the process of tree inference (Fig. 1). Such regions, where the sequences appear randomized with respect to evolutionary history are evolving at rates too high for effective phylogenetic analysis (26). Therefore restriction of our analysis to regions that are likely to have the highest signal-to-noise ratio seems justified. Due to the possibility of processes such as domain swapping and recombination with genes coding for MTases other than N4mC-specific (which would generate a hybrid with mosaic similarity to N4mC and other MTase subfamilies), we inferred and compared the evolutionary trees based solely on regions forming the catalytic (motifs III-VIII) and cofactor-binding (motifs X, I and II) subdomains. For each method, the topologies of both trees were nearly identical and the separate alignment of sequences from classes [alpha] and [beta] also gave similar distribution of branches in corresponding subtrees (data not shown). This congruence strongly suggests that both subdomains coevolved and that the recombination events leading to the permutation of the catalytic and AdoMet binding regions in the N4mC MTase family did not involve `domain stealing' (36) from any other family of MTases. This justifies the approach of artificial unification of the order of conserved motifs in sequences from different classes to base the phylogenetic inference on one alignment (Fig. 1). The subtopologies of most branches of the evolutionary trees obtained by maximum parsimony, maximum likelihood and distance criteria are nearly identical (Fig. 2). These topologies are reliable by the criterion of bootstrap and even the removal of putative MTases does not significantly alter the relationship between other lineages (data not shown). 
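The gap-based part of the column exclusion described above (omitting positions with gaps in >50% of sequences) can be expressed as a simple filter. The Python sketch below is a minimal illustration under stated assumptions: it works column by column rather than on whole regions, it does not attempt to detect ambiguously aligned regions, and the function name, gap character and toy alignment are hypothetical.

```python
def drop_gappy_columns(alignment, max_gap_fraction=0.5, gap_char="-"):
    """Remove alignment columns in which more than `max_gap_fraction`
    of the sequences carry a gap.

    `alignment` maps sequence names to aligned strings of equal length.
    """
    names = list(alignment)
    n_seqs = len(names)
    length = len(alignment[names[0]])

    # Keep a column only if its gap fraction does not exceed the threshold.
    kept = [
        i
        for i in range(length)
        if sum(alignment[name][i] == gap_char for name in names) / n_seqs
        <= max_gap_fraction
    ]
    return {name: "".join(alignment[name][i] for i in kept) for name in names}


if __name__ == "__main__":
    # Toy alignment: the third column has gaps in three of four sequences.
    toy_alignment = {
        "seqA": "SP-PY",
        "seqB": "SP-PY",
        "seqC": "SPAPY",
        "seqD": "S--PY",
    }
    print(drop_gappy_columns(toy_alignment))
```

In the toy example only the third column is removed; the second column, with a gap in one of four sequences, is retained.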
This suggests that the markedly different assumptions used by the three algorithms were in agreement with the nature of evolutionary processes governing the divergence of N4mC MTases, and that the small differences most likely come from unequal efficiency of the algorithms in the exploration of the huge space of possible results (see above).
Figure 2. Dendrograms representing the relationship between N4mC MTases inferred using (a) distance method, (b) parsimony and (c) maximum likelihood approaches. Conserved subfamilies are shown in color, thermophilic enzymes are indicated by asterisks. For clarity of the presentation M.Hpy99ORF244P, M.Hpy99ORF629P and M.Hpy99ORF248P have been labeled as M.Hpy244P, M.Hpy629P and M.Hpy248P, respectively. Branches with bootstrap values below 50% are shown as broken lines. The bars at the bottom of each phylogeny are scaled to an amino acid replacement distance of 1 (corrected for multiple substitutions).
A clear correlation exists between the distribution of the proteins into clusters and the nature of the recognized sequence. All MTases recognizing the same target form individual branches with the subtopology unchanged between trees and strongly supported by bootstrap values, suggesting that they recently diverged from a common ancestor. Homologs of M.SmaI and M.NgoMXV form coherent clades with bootstrap values close to 100 in all trees and with low estimated branch lengths (Fig. 2). The group of [alpha]-MTases bearing the `SPPY' version of motif IV (M.MvaI homologs) and three MTases from the thermophilic Archaea (M.MthZI, M.PhoIIIP and M.MjaI) also form separate clades in all trees, but with rather longer branches with respect to the common stem. MTases from Helicobacter pylori, M.HpyAXIIBP and M.Hpy99ORF244P (certainly a pair of orthologs) and M.HpyAIIP usually group together, but the subtopology is not congruent between trees.
Both `DPPY' MTases, namely M.BamHI and M.BalI, despite their different motif permutations (typical for [alpha] and [beta] classes, respectively), are usually found together, branching out at the central part of the tree. The sequence database searches using sequences of these proteins as queries returned almost exclusively N6mA MTases and putative proteins assigned to this family based on sequence similarity (Table 2). These results taken together led us to the conclusion that both M.BamHI and M.BalI MTases diverged relatively recently from N6mA MTases.
Table 2. Proteins homologous to M.BalI and M.BamHI MTases
The evolutionary relationships among subfamilies are less strongly resolved than those within the subfamilies; bootstrap values for the nodes that define the deep branching pattern are low, indicating that changes in the sampling of alignment positions used to generate the trees affect the inferred relationships among subfamilies, especially in the parsimony-based tree. This can be explained as an indication of their simultaneous differentiation from the common ancestor. The mutual position of the M.NgoMXV, M.MvaI and M.SmaI clades is ambiguous, as is the position of several single MTases, e.g. M.PvuII. More accurate determination of their relationship should be possible after identification of further members of each subfamily; however, we believe that most of the overall topology of the relationships among subfamilies will not change significantly from that presented here. In our opinion, it would be most reasonable to root the main branches based on more unequivocal data, e.g. comparison of atomic coordinates, if they were available for more N4mC MTases than only M.PvuII.
DISCUSSION
Protein families are often categorized as the result of the possession of conserved motifs. The N4mC MTases share nine weakly conserved regions with N6mA MTases and to some degree with other AdoMet-dependent MTases.
The amino acid sequences of N4mC MTases exhibit great divergence, including permutation of structural and functional modules within a common three-dimensional fold, a feature characteristic for all DNA MTases (15). Therefore the issue of evolutionary history and phylogenetic origin of these enzymes is not straightforward. Traditionally, N4mC and N6mA DNA MTases have been considered to be very similar (3,8,9,30). However, the determination of three crystal structures (two for N6mA and one for N4mC) did not fully clarify their relationships, showing incompatibility of target DNA recognition determinants between classes and presenting features common also to N4mC and 5mC MTases and absent from N6mA MTases (11,13-15). The inference of evolutionary relationship among different MTases based on similarity of the three-dimensional fold of their catalytic domains directly supports the scenario, in which the bulk of N4mC and N6mA MTases diverged prior to the specialization of N6mA MTases into DNA and RNA-specific subfamilies (19). At least in certain cases equivalence at the level of target specificity and local sequence similarity between members of different classes could be explained by subsequent convergence. The simplest assumption would be that all genes of the known N4mC MTases evolved from one or several recombining common precursor genes, similarly to 5mC MTases (37). Considering the limited number of proteins in the family and their fairly unique role, namely protection of bacterial DNA against digestion by the restriction endonuclease (ENase) from its `own' R-M system, but also non-cognate ENases (38), one might expect a high level of sequence conservation. However, in striking contrast to the 5mC MTases, the N4mC MTase family encompasses extensive diversity. 
The phylogenetic trees inferred with different methods suggest that the N4mC family underwent a radical restructuring, leading to inversion of the linear order of two main subdomains and establishing two major highly diverged branches: [alpha] and [beta]. In addition, all data support the relationship of M.BamHI and M.BalI not with other N4mC, but rather with N6mA MTases, suggesting a polyphyletic origin of the N4mC MTase subfamily (Table 2, Fig. 3). Recently it has been speculated that M.NgoMXV (and presumably its homologs) might be related to the common ancestor of both N6mA and N4mC MTases, as it shows a comparable degree of similarity to representatives of both N-MTase subfamilies (10). Modeling of M.NgoMXV, which exhibits relaxed sequence specificity, indicated its single-domain structure and lack of extended loops (10), which are properties usually assigned to the ancestors of modern enzyme families (39). The `ancient' character of M.NgoMXV would be consistent with the hypothesis that the most highly specific MTases evolved later in the history of this family by acquiring additional target-recognizing determinants (40).
Figure 3. Proposed schematic phylogeny of N4mC MTases. The branch lengths are arbitrary and indicate only relative time of divergence of different lineages. The present data do not exclude alternative rooting, e.g. with M.NgoMXV group radiated soon after the major N6mA/N4mC bifurcation.
In Figure 3 we propose a general model of polyphyletic evolution of the N4mC MTase family, in which after separation of two main lineages a few widely diverged enzymes narrowed or switched their preference for a methylated base to N4mC-specificity. Recently, Jeltsch et al. (18) demonstrated that certain N6mA MTases are able to methylate mismatched cytosines in artificial substrates.
The authors argued that this result supports the hypothesis of independent origin of the [alpha] and [beta] subfamilies of N4mC MTases from [alpha] and [beta] N6mA MTases, respectively, considered by Malone et al. (8). Jeltsch et al. (18) suggested that the permutation events must have been so rare that simultaneous use of the same `topological switchpoint' in two families should be considered improbable. However, they did not support this conjecture by any of the established methods of phylogenetic inference, and their biochemical data might equally support our model of late convergence and specificity switching between the N4mC and N6mA MTase families. By contrast, the significantly higher degree of overall sequence similarity between [alpha] and [beta] N4mC MTases than between N4mC and N6mA MTases within the [alpha] or [beta] groups (our unpublished data) clearly argues for independence of permutation events in the N4mC and N6mA lineages. Even if the N4mC and N6mA MTase families were in fact more closely related to one another than to other MTases (a hypothesis not supported by either structure- or sequence-based trees; 19), we suggest that the ancestral N4mC MTases would rather have evolved from the relatively most similar [beta]-N6mA lineage. In other words, the [alpha]-topology would have appeared independently among N6mA and N4mC MTases. We believe that solving the structure of any [beta]-N6mA MTase and/or [alpha]-N4mC MTase and including it in a recalculated carbon-[alpha] distance-based tree might help to resolve that controversy. The distribution of `modern specificities' among 5mC or N6mA MTases could result from shuffling of `mobile' TRD units between independently evolving catalytic domains (37,40). However, the analysis of the structure and the docking model of M.PvuII MTase (15) shows that this and related N4mC MTases do not maintain potential DNA-recognizing determinants in one distinct domain (neither in amino acid sequence nor three-dimensional structure).
Instead, they seem to be embedded in several loops protruding from between conserved segments of the structural scaffold common to all AdoMet-dependent MTases. Therefore, contrary to 5mC and probably also N6mA MTases, which presumably gained target specificity primarily through fusion with distinct domains, modern specificities of most of N4mC MTases bearing `DPPH' and `SPPY' versions of motif IV may have arisen by extension of flexible loops accommodating substrate nucleic acids in a V-shaped cleft (10,15). Our results indicate that particular specificities evolved only once in the evolutionary history of N4mC MTases. Proteins with similar target recognition properties usually display significant sequence homology and form coherent branches of the evolutionary tree, suggesting that they derive from a common ancestor. The sole exception is a pair of M.BamHI and M.BamHII MTases, extremely diverged at the amino acid sequence level, but recognizing identical target DNA sequence (Fig. 1, Table 1). Docking the substrate DNA onto the three-dimensional models of these two MTases suggests that the possible determinants of sequence specificity are located within dissimilar secondary structural elements, further supporting the case of functional convergence or at least `domain shuffling' (our unpublished data). If sequence specificity is conserved to some degree within subfamilies, then the specificity of some of the uncharacterized proteins in the N4mC family can be predicted by comparison with other members of the same subfamily (Table 1). The presence of N4mC MTases both in Bacteria and Archaea indicates that these enzymes had their origins in the common ancestor of these kingdoms or that one of them acquired the N4mC MTase gene(s) from another by horizontal transfer of genetic material. It also suggests that there is something specific that prevented or at least did not support the diversification of this protein family in the higher organisms. 
It is hypothesized that the last common ancestor of all cellular organisms was a hyperthermophilic prokaryote (41). However, even if an ancient N4mC MTase was present in the thermophilic cenancestor, many thermophilic enzymes are more closely related to their mesophilic homologs than to each other (e.g. M.PspGI to M.MvaI and M.MjaV to M.BglI; see Fig. 2), indicating that hyperthermophilicity or hyperthermostability of N4mC MTases evolved relatively late and independently from various mesophilic lineages. Likewise, the N4mC MTases from psychrophilic (M.CsyAIP and M.CsyBIP) or halophilic (M.PhiHII) Archaea seem to originate from a mesophile (the bulk of MTases in the 'blue' clade in Fig. 2), suggesting multiple events of horizontal transfer of genetic material from already diverged Bacteria.
We are aware that the accuracy of our analysis depends on the assumption that the sequences and functional annotations of putative proteins are correct. However, we hope that it will stimulate and help to advance the experimental verification of the presented premises. As additional complete genome sequences become available, especially from eukaryotic and archaeal genome projects, and the relation of the phylogeny of the N4mC MTase family to the organismal phylogeny becomes less obscure, it should be possible to answer the question of whether Eukaryota lost the N4mC activity during evolution or whether it evolved in Bacteria and/or Archaea after the establishment of the main branches of the tree of life. We believe that combining genomics, molecular phylogeny and comparative biochemistry of DNA MTases will help to solve the problem of the last universal common ancestor, whilst highlighting the pitfalls of horizontal transfer and molecular convergence between paralogs, which, when unnoticed, may obscure the organismal phylogeny inferred from sequence data.
ACKNOWLEDGEMENTS
We wish to thank Dr David T. F. Dryden for critical reading of the manuscript, provision of unpublished materials and useful discussions, and two anonymous referees for helpful suggestions. We also thank all genome-sequencing groups, especially the Pathogen Sequencing Group at the Sanger Centre, for making their preliminary data publicly available.
*To whom correspondence should be addressed. Tel: +1 313 874 61 28; Fax: +1 313 876 23 80; Email: iamb{at}ibbrain.ibb.waw.pl
Permanent address: Monika Radlinska, Institute of Microbiology, University of Warsaw, Nowy Swiat 67, 00-046 Warsaw, Poland
Copyright © Oxford University Press, 1999.