Nucleic Acids Research, 2001, Vol. 29, No. 18 3705-3727
© 2001 Oxford University Press
Survey and Summary
Structure and function of type II restriction endonucleases
Institut für Biochemie (FB 08), Justus-Liebig-Universität, Heinrich-Buff-Ring 58, D-35392 Giessen, Germany
Received February 5, 2001; Revised March 23, 2001; Accepted June 7, 2001.
| ABSTRACT |
|---|
More than 3000 type II restriction endonucleases have been discovered. They recognize short, usually palindromic, sequences of 4–8 bp and, in the presence of Mg2+, cleave the DNA within or in close proximity to the recognition sequence. The orthodox type II enzymes are homodimers which recognize palindromic sites. Subtypes are classified according to particular features. All structures of restriction enzymes show a common structural core comprising four β-strands and one α-helix. Furthermore, two families of enzymes can be distinguished which are structurally very similar (EcoRI-like enzymes and EcoRV-like enzymes). Like other DNA binding proteins, restriction enzymes are capable of non-specific DNA binding, which is the prerequisite for efficient target site location by facilitated diffusion. Non-specific binding usually does not involve interactions with the bases but only with the DNA backbone. In contrast, specific binding is characterized by an intimate interplay between direct (interaction with the bases) and indirect (interaction with the backbone) readout. Typically ~15–20 hydrogen bonds are formed between a dimeric restriction enzyme and the bases of the recognition sequence, in addition to numerous van der Waals contacts to the bases and hydrogen bonds to the backbone, which may also be water mediated. The recognition process triggers large conformational changes of the enzyme and the DNA, which lead to the activation of the catalytic centers. In many restriction enzymes the catalytic centers, one in each subunit, are represented by the PD . . . D/EXK motif, in which the two carboxylates are responsible for binding Mg2+, the essential cofactor for the great majority of enzymes. The precise mechanism of cleavage has not yet been established for any enzyme; the main uncertainty concerns the number of Mg2+ ions directly involved in cleavage. Cleavage in the two strands usually occurs in a concerted fashion and leads to inversion of configuration at the phosphorus. The products of the reaction are DNA fragments with a 3'-OH and a 5'-phosphate.
| INTRODUCTION |
|---|
Restriction endonucleases occur ubiquitously among prokaryotic organisms (1,2). Their principal biological function is the protection of the host genome against foreign DNA, in particular bacteriophage DNA (3). Other functions are still being discussed, such as an involvement in recombination and transposition (4–7). In addition, there is evidence that the genes for restriction and modification enzymes may act together as selfish elements (8).
By definition, restriction endonucleases are parts of restriction–modification (RM) systems, which comprise an endonuclease and a methyltransferase activity. Whereas the substrate of the restriction enzyme is foreign DNA, which is cleaved in response to defined recognition sites, that of the modification enzyme is the DNA of the host which is modified at the recognition sequence and, thereby, protected against attack by the restriction endonuclease. Three types of RM systems have been found and were classified according to their subunit composition, cofactor requirement and mode of action (9). The distinction between type I, II and III systems is still useful, but it is becoming apparent that there are intermediate cases (vide infra).
The present review will deal with the type II restriction endonucleases, which, because of their extraordinary importance for gene analysis and cloning work, have been studied in great detail. Moreover, they have proven to be excellent model systems to study highly specific protein–nucleic acid interactions, to investigate structure–function relationships and, last but not least, to understand the mechanisms of evolution within a large family of functionally related enzymes.
The last comprehensive reviews on the structure and function of type II restriction endonucleases appeared in 1993 (1) and 1997 (10). Since then about 1000 new type II restriction enzymes [compare entry numbers in (11) and (12)] were identified, eight more crystal structures determined (giving a total of 12 structures) and many biochemical studies published (http://rebase.neb.com).
From these structural and functional studies it is clear that the family of type II restriction endonucleases is more heterogeneous than originally thought. To point out the common features of these enzymes and the peculiarities of some of them will be the main focus of this review.
| THE DIVERSITY OF RESTRICTION ENDONUCLEASES |
|---|
The main criterion for classifying a restriction endonuclease as a type II enzyme is that it cleaves specifically within or close to its recognition site and that it does not require ATP hydrolysis for its nucleolytic activity.
The orthodox type II restriction endonuclease is a homodimer of ~2 × 30 kDa molecular mass, which recognizes a palindromic sequence 4–8 bp in length and, in the presence of Mg2+, cleaves the two strands of the DNA within or immediately adjacent to the recognition site to give a 5'-phosphate and a 3'-OH end. Typical representatives (Table 1) are EcoRI (which produces sticky ends with 5'-overhangs) (13), EcoRV (which produces blunt ends) (14) and BglI (which produces sticky ends with 3'-overhangs) (15).
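The relationship between a palindromic site, the position of the cut and the resulting end type can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names are invented here, the cut is assumed to be placed symmetrically in both strands, and the EcoRV site (GAT↓ATC) is taken from general knowledge rather than from this text, whereas the EcoRI and BglI sites are quoted later in the review.

```python
# Complement table for the IUPAC codes used in this review (N = any base,
# R = purine, Y = pyrimidine); only these seven symbols are handled.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G",
              "N": "N", "R": "Y", "Y": "R"}


def reverse_complement(site: str) -> str:
    """Return the reverse complement of a (possibly degenerate) site."""
    return "".join(COMPLEMENT[base] for base in reversed(site))


def is_palindromic(site: str) -> bool:
    """A site is palindromic if it reads the same 5'->3' on both strands."""
    return site == reverse_complement(site)


def end_type(site: str, cut: int) -> str:
    """Classify the ends left by a symmetric double-strand cut.

    `cut` is the number of top-strand bases 5' of the scissile bond.
    By 2-fold symmetry the bottom strand is cut at len(site) - cut
    (expressed in top-strand coordinates).
    """
    if not is_palindromic(site):
        raise ValueError("symmetric-cut logic needs a palindromic site")
    stagger = len(site) - 2 * cut
    if stagger > 0:
        return f"5'-overhang of {stagger} nt"
    if stagger < 0:
        return f"3'-overhang of {-stagger} nt"
    return "blunt"


if __name__ == "__main__":
    examples = [("EcoRI", "GAATTC", 1),
                ("EcoRV", "GATATC", 3),       # site assumed, see lead-in
                ("BglI", "GCCNNNNNGGC", 7)]
    for name, site, cut in examples:
        print(name, site, end_type(site, cut))
```

The sign of the stagger is all that distinguishes the three classes of orthodox enzymes named above, which is why a single integer per enzyme suffices in this sketch.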
Many type II restriction endonucleases do not conform to this narrow definition, making it necessary to define subdivisions. A new nomenclature for these heterodox type II restriction endonucleases (Table 1) has recently been proposed (R.Roberts, personal communication).
Type IIS restriction endonucleases recognize asymmetric sequences and cleave these sequences at a defined distance (reviewed in 16), for example FokI. Until recently it was believed that these enzymes function as monomers. However, it is now clear from studies on FokI that it dimerizes on the DNA and this may be a more general phenomenon (17).
Type IIE restriction endonucleases interact with two copies of their recognition sequence, one being the target for cleavage, the other serving as allosteric effector (18), for example NaeI.
Type IIF restriction endonucleases are similar to type IIE enzymes, in as much as they interact with two copies of their recognition sequence. They differ from the type IIE enzymes in that they cleave both sequences in a concerted reaction (19), for example NgoMIV.
Type IIT restriction endonucleases are composed of two different subunits, for example Bpu10I and BslI. Bpu10I recognizes an asymmetric sequence and functions as a heterodimer (αβ) in which both subunits presumably have one active site (20). BslI recognizes a palindromic sequence and functions as a heterotetramer (α2β2) (21).
Type IIB restriction endonucleases cleave DNA at both sides of the recognition sequence, for example BcgI which recognizes an asymmetric sequence, or BplI which recognizes a symmetric sequence. These enzymes are composed of different subunits (BcgI, α2β; BplI, αβ) and have restriction and modification activity. They require the presence of AdoMet for restriction (22,23). For BcgI, it was shown that the catalytic centers for restriction and modification are located in the α-subunit, whereas the β-subunit harbors the target recognition domain (24).
Type IIG restriction endonucleases like IIB enzymes are stimulated by AdoMet but have both restriction and modification activity present in a single polypeptide chain (25), for example Eco57I.
Type IIM restriction endonucleases recognize methylated DNA (26), for example DpnI.
Restriction endonucleases like McrBC also require a methylated DNA substrate. They resemble type I and type III enzymes in as much as they are dependent on nucleoside triphosphate hydrolysis (GTP in the case of McrBC) for DNA cleavage. For cleavage, Escherichia coli McrBC requires two C5- or N4-methylated (or C5-hydroxymethylated) PuC sites (Pu = A or G), carrying at least one methyl group per half-site, at a distance of 40 to ~2000 bp (27). Cleavage occurs somewhere between the two sites (28). Whereas the McrB subunit is responsible for DNA recognition (29) and GTP cleavage (30), the McrC subunit harbors the catalytic center for phosphodiester bond hydrolysis (U.Pieper and A.Pingoud, submitted). The fact that McrBC requires GTP hydrolysis for cleavage would also justify classifying it as a variant of the type III enzymes. These restriction endonucleases have not been included in Table 1 because they are dependent on nucleoside triphosphate hydrolysis.
It is clear that this nomenclature does not do justice to borderline cases. Consider for example FokI, the archetypal IIS enzyme, which according to recent investigations could also be considered a type IIE enzyme (192), as it requires binding to a second recognition sequence. The recently discovered restriction enzyme HaeIV like BcgI cleaves double-stranded DNA on both sides of its recognition sequence, which means that it should be classified as a type IIB enzyme. On the other hand it harbors restriction and modification activity in one polypeptide chain, making it similar to type IIG enzymes but, in contrast, is not stimulated by AdoMet (31). Type IIT enzymes were originally classified as similar to type IIS enzymes that recognize an asymmetric sequence, but consist of two different subunits. Only last year a type IIT enzyme was discovered, BslI (21), that recognizes a palindromic sequence. Of course, restriction endonucleases that do not fit into any of these subdivisions will continue to be discovered. Eventually this will lead to new subdivisions.
| THE SIMILARITY OF RESTRICTION ENDONUCLEASE STRUCTURES |
|---|
With a few obvious exceptions of closely related isoschizomers, like EcoRI and RsrI (recognizing G↓AATTC), MthTI, FnuDI and NgoPII (recognizing GG↓CC), XmaI and Cfr9I (recognizing C↓CCGGG), BanI and HgiCI (recognizing G↓GYRCC), TaqI and TthHB8I (recognizing T↓CGA), BsoBI and AvaI (recognizing C↓YCGRG), to name a few that share between 50 and 80% identical amino acid residues, type II restriction enzymes display little, if any, sequence homology, which had been interpreted to mean that these enzymes are evolutionarily unrelated (4,32). This conviction began to lose credibility with the observation that there is a statistically highly significant correlation between the genotype (amino acid sequence) and the phenotype (recognition sequence, site of cleavage) of restriction enzymes (33).
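Several of the recognition sequences quoted above are degenerate (Y = C or T, R = A or G, N = any base). As a minimal sketch of how such a site can be located in a sequence, the Python below translates the IUPAC codes into a regular expression. The function names and the test sequence are hypothetical and introduced only for illustration; only one strand is scanned, which is sufficient for palindromic sites.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
}


def site_to_regex(site: str) -> re.Pattern:
    """Compile a recognition site into a pattern.

    The lookahead wrapper lets overlapping occurrences be reported too.
    """
    body = "".join(IUPAC[base] for base in site.upper())
    return re.compile("(?=(" + body + "))")


def find_sites(site: str, dna: str) -> list:
    """Return 0-based start positions of every match of `site` in `dna`."""
    return [m.start() for m in site_to_regex(site).finditer(dna.upper())]


if __name__ == "__main__":
    # Hypothetical test sequence containing GGTACC and GGCGCC, both of
    # which fit the BanI/HgiCI pattern GGYRCC.
    dna = "TTGGTACCAAGGCGCCTT"
    print(find_sites("GGYRCC", dna))
```

Because isoschizomers share a recognition sequence by definition, the same pattern serves for every enzyme of a pair such as BanI and HgiCI.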
With the determination of more crystal structures it became clear that all restriction endonuclease structures so far known (Fig. 1) have a very similar core (34), including orthodox restriction enzymes producing sticky ends with a 5'-overhang (BamHI, BglII, EcoRI, MunI, BsoBI), sticky ends with a 3'-overhang (BglI) or blunt ends (EcoRV, PvuII), as well as members of the type IIS (FokI), type IIE (NaeI) and type IIF (Cfr10I, NgoMIV) subdivisions. This core consists of a five-stranded mixed β-sheet flanked by α-helices, as first recognized by a comparison of the structures of EcoRI and EcoRV (35). Intriguingly, this core is also present in four other proteins with an endonuclease function, namely λ-exonuclease (34,36), MutH (37), which is involved in methyl-directed mismatch repair, Vsr endonuclease (38), which is involved in the repair of TG mismatches, and TnsA (39), one of two subunits of the Tn7 transposase. The conserved core harbors the catalytic center: it brings into spatial proximity two carboxylates, typically one aspartate and one glutamate or aspartate residue, and one lysine residue.
The structural similarity of the type II restriction endonucleases suggests that they indeed have a common, although distant, ancestor. On the basis of a comparison of protein structures a phylogeny of the restriction endonuclease superfamily was proposed (40), with two main branches, one comprising BglI, EcoRV and PvuII (as well as MutH and λ-exonuclease), the other BamHI, Cfr10I, EcoRI and FokI. The distinction between an EcoRI-like family and an EcoRV-like family had been made before and was associated not only with similarities in structure but also with similarities in function: EcoRI, like BamHI, binds the DNA from the major groove side and produces sticky ends with 5'-overhangs, whereas EcoRV, like PvuII, approaches the DNA from the minor groove side and produces blunt ends. This has consequences for the positioning of the two active sites and, therefore, for the arrangement of the two subunits in the homodimer. Thus, the nature of the cleavage pattern, rather than the DNA sequence recognized, appears to be the most important constraint on the mode of dimerization of restriction endonucleases (41). Within the common core, characteristic for type II restriction endonucleases, only four β-strands are absolutely conserved. Two of these strands (β2 and β3 in EcoRI; βd and βe in EcoRV) contain the amino acid residues directly involved in catalysis; the remaining ones may be critical for formation of the β-sheet and the hydrophobic core. The other secondary structure elements of the common core could have been altered during divergent evolution (42). In this context, it was also observed that the EcoRI and EcoRV families differ in the orientation of a β-strand (β5 in EcoRI; βh in EcoRV), as noted before based on a smaller data set (43).
Several restriction enzymes function as homotetramers. The crystal structures of two of them, Cfr10I and NgoMIV, the latter as an enzyme–product complex, were determined (44,45). As expected from their function as type IIF enzymes, they can be considered as dimers of dimers, with a back-to-back orientation, which puts the DNA binding sites of the primary dimers at opposite ends of the tetramer. In the NgoMIV tetramer, the dimers are rotated relative to each other by ~60° around their 2-fold axis; in Cfr10I the angle is more like 90°. In both cases, dimer–dimer contacts are extensive, the total contact surface area between primary dimers being 3200 Å² (NgoMIV) and 2300 Å² (Cfr10I), respectively. As shown for Cfr10I, tetramerization nevertheless can be easily disrupted by a single amino acid substitution at a strategic position in a loop at the tetramer interface (44). This argues for a continuous transition between type IIF enzymes and orthodox type II enzymes, some of which, e.g. EcoRI (46), also tend to be homotetramers at higher concentrations.
All restriction enzymes are composed of subdomains, one of which constitutes the common core with the catalytic center. The other subdomains, which are in part responsible for DNA binding and dimerization, are more diverse in structure than the catalytic core. Consider for example the related proteins EcoRV (47) and PvuII (48). Both have an N-terminal dimerization subdomain which in EcoRV is formed by a short α-helix, a two-stranded antiparallel β-sheet, followed by a long α-helix, while in PvuII it consists of a long α-helix connected via a loop to a shorter α-helix. In spite of the difference in size of EcoRV and PvuII, the dimerization interface is of similar size (2300 Å²). BglI, which belongs to the same family as EcoRV and PvuII, but recognizes an interrupted sequence (GCCNNNN↓NGGC) and cleaves the DNA to produce sticky ends with 3'-overhangs, has an unusually large dimer interface (3100 Å²) in which one side of each subunit is involved (49). In contrast, EcoRI (50) and BamHI (51) have a very similar dimerization module, two α-helices which in the dimer form a four-helix bundle. EcoRI in addition has a small two-stranded antiparallel β-sheet, which interacts with the symmetry related β-sheet of the other subunit. Altogether, BamHI has a considerably smaller subunit interface than EcoRI (800 versus 2600 Å²). BsoBI, which is closely related to BamHI and EcoRI (as well as the tetrameric Cfr10I) but, with a molecular mass of 36.7 kDa per subunit, is the largest of the three, has a large all-helical subdomain fused to the cleavage domain. This subdomain is closely associated with the symmetry related other subdomain: between them 3500 Å² of surface area is buried, whereas between the pair of catalytic subdomains only 1000 Å² is buried. With 4800 Å², BsoBI has the largest subunit–subunit interface among the dimeric restriction enzymes whose structure is known so far (52). It must be emphasized that the principal functions of restriction enzymes, namely dimerization, DNA binding and DNA cleavage, are interwoven, which means that regions involved in one function are often also of importance for another function (see also Fig. 4).
The type IIS restriction endonuclease FokI has a two-domain structure (53), a recognition domain comprising three smaller subdomains which are structurally related to the helix–turn–helix motif-containing DNA binding domain of the catabolite gene activator protein, and a cleavage domain which is similar to a BamHI monomer. In the crystal, FokI is a dimer (54), in which dimerization is mediated by the cleavage domain. The total surface area buried in the dimer interface is unusually small (800 Å²), which may explain why FokI is a monomer in solution. Dimerization is required for DNA cleavage: presumably, a FokI monomer binds DNA at its recognition site and then recruits a second FokI monomer bound to another recognition site to form a dimer which catalyzes cleavage at the first site (17,54).
Another restriction endonuclease with a two-domain structure is the homodimeric type IIE enzyme NaeI (42). One domain (Endo domain) is structurally very similar to other type II restriction endonucleases and is responsible for substrate binding and cleavage as well as for dimerization; the other domain (Topo domain) contains a helix–turn–helix motif, similar to the catabolite gene activator protein, and presumably harbors the effector DNA binding site of NaeI (42). It is likely that this domain is also responsible for the topoisomerase activity of the L43K variant of NaeI (55).
The unusually large amino acid sequence of some type II restriction endonucleases suggests that they are composed of more than one domain. EcoRII, for example, is a homodimer with a subunit molecular mass of 45.6 kDa (56). Its enzymatic activity depends on the simultaneous binding of two copies of the recognition sequence (57), which means that it must have two DNA binding sites: indeed, it was shown recently that EcoRII, like NaeI (58), induces loops in DNA containing two recognition sites (59). This could be interpreted to mean that EcoRII has a similar structural organization to NaeI, with one active site and one allosteric site (60) or, although less likely, two tightly coupled active sites as normally observed with type IIF enzymes (61,62), which are homotetramers. Another example of a large restriction endonuclease is Sau3AI, which is a monomer with a molecular mass of 56.5 kDa (63). Biochemical experiments demonstrate that it dimerizes on the DNA and, like a type IIE or IIF enzyme, requires two recognition sites for efficient DNA cleavage (193). A remote sequence similarity between the N- and C-terminal halves of Sau3AI suggests that Sau3AI is a pseudodimer which dimerizes in the presence of DNA and thus could be considered to be a pseudotetramer in its active form. The gene for HgiDII codes for a protein of 68 kDa (64); inspection of the sequence revealed that it contains in its N-terminal half all consensus elements typically found in the GHKL family of ATPases, the significance of this observation being unclear (P.Friedhoff, personal communication).
| THE INTERACTION OF RESTRICTION ENDONUCLEASES WITH DNA |
|---|
Restriction endonucleases interact with DNA in a complex manner. Because of the large size of a normal DNA substrate the reaction of a restriction enzyme with DNA cannot be simply formulated as a sequence of two or three steps. Figure 2 presents a minimal scheme for the individual steps involved in DNA cleavage by a type II restriction endonuclease. The reaction cycle starts with non-specific binding to the macromolecular DNA, which is followed by a random diffusional walk of the restriction endonuclease on the DNA. If a recognition site is not too far away from the initial site of contact it will most likely be located within one binding event. At the recognition site, conformational changes take place that constitute the recognition process and lead to the activation of the catalytic centers. After phosphodiester bond cleavage in both strands the product is released, either by direct dissociation of the enzyme–product complex or by a transfer of the enzyme to non-specific sites on the same DNA molecule. Often this step is rate limiting for DNA cleavage by restriction enzymes under multiple turnover conditions. In the following sections we will deal with the individual steps of this reaction cycle.
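The steps just described can be written compactly as a linear scheme. This is only a shorthand for the verbal description, not a reproduction of Figure 2; the symbols (E for enzyme, D for DNA, P for product, subscripts ns and s for non-specific and specific complexes, the asterisk for the activated enzyme) are notation assumed here.

$$
E + D \;\rightleftharpoons\; E\cdot D_{\mathrm{ns}} \;\rightleftharpoons\; E\cdot D_{\mathrm{s}} \;\rightleftharpoons\; E^{*}\cdot D_{\mathrm{s}} \;\longrightarrow\; E\cdot P \;\longrightarrow\; E + P
$$

The second equilibrium stands for linear diffusion from a non-specific site to the recognition site, and the third for the conformational changes that activate the catalytic centers. The alternative route of product release, transfer of the enzyme from E·P to a non-specific site on the same DNA molecule, would appear as a branch from E·P back to E·D_ns.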
| DNA BINDING AND TARGET SITE LOCATION |
|---|
All restriction endonucleases bind DNA not only specifically but also, with considerably weaker affinity, non-specifically, similar to other proteins that recognize a specific DNA sequence (65). Upon non-specific complex formation, counterions and water molecules are released from the protein–DNA interface (66), which because of the associated favorable entropy changes balances the unfavorable loss of translational and rotational entropies of the protein and DNA upon complex formation. Protein–phosphate contacts on the other hand will lead to positive enthalpy changes. For EcoRI (67) and EcoRV (68) it has been shown by analyzing osmotic pressure effects on DNA binding that non-specific complex formation is accompanied by a release of 70–80 water molecules.
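How a water count can be obtained from osmotic pressure effects is not spelled out here. A common approach, assumed in the sketch below and not necessarily the exact analysis of refs 67 and 68, is the osmotic-stress relation in which the slope of ln K (association constant) against solute osmolality, multiplied by the molality of pure water (about 55.5 mol/kg), gives the number of water molecules released on binding. The function names and the three data points are hypothetical and serve only to show the arithmetic.

```python
WATER_MOLALITY = 55.5  # mol of water per kg of water


def least_squares_slope(xs, ys):
    """Slope of the ordinary least-squares line through the points (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def waters_released(osmolality, ln_k_assoc):
    """Estimate waters released on binding from d(ln K_a)/d(osmolality).

    Assumes the idealized osmotic-stress relation; a positive slope
    (tighter binding at higher osmolality) means net water release.
    """
    return WATER_MOLALITY * least_squares_slope(osmolality, ln_k_assoc)


if __name__ == "__main__":
    # Hypothetical, made-up titration (NOT data from refs 67 or 68).
    osmolal = [0.0, 0.5, 1.0]
    ln_ka = [20.0, 20.675, 21.35]
    print(round(waters_released(osmolal, ln_ka)))
```

In this idealized picture a release of 70–80 water molecules corresponds to a slope of roughly 1.3–1.4 per osmolal, which indicates the size of the effect such experiments need to resolve.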
It is likely that upon non-specific DNA binding conformational changes occur, mainly in the protein, which will lead to an adaptation of the surface of the two macromolecules as is apparent from structural studies on EcoRV (47) and BamHI (69). Figure 3 shows the structures of EcoRV and BamHI together with the structures of their non-specific and specific DNA complexes. For EcoRV it is obvious that the enzyme has to open its DNA-binding site, which requires a conformational transition that presumably is triggered by a transient contact between the outer sides of the C-terminal arms of EcoRV and the DNA (70) and that allows the DNA (non-specific or specific) to enter the binding cleft. A similar mechanism of DNA binding has been discovered recently for the T7 helicase–primase protein (71). A region at the floor of the DNA binding site of EcoRV (the Q-loop), which is disordered in the free enzyme, becomes ordered in the non-specific complex. The stable non-specific complex differs from the specific complex by being less compact and having a much smaller protein–DNA contact surface: 1370 versus 2173 Å² (47). No base contacts are seen in the non-specific complex; DNA backbone contacts are fewer in number and differ substantially from those observed for the specific complex. For BamHI, it seems as if major conformational transitions are not required to allow access of the DNA to the DNA binding cleft. Nevertheless, DNA binding is accompanied by an induced fit, as a large segment at the floor of the DNA binding site (residues 79–92), which is disordered in the free enzyme, becomes ordered in the non-specific complex. It is intriguing to note that no base-specific contacts and no direct DNA backbone contacts are seen in the non-specific complex, only a few water-mediated contacts, even though the non-specific DNA used in the co-crystallization experiment differed only in one base pair from the specific DNA sequence (69). As observed with EcoRV, the non-specific BamHI–DNA complex is more open and less compact than the specific complex.
Non-specific DNA binding is the prerequisite for one-dimensional diffusion of proteins along DNA (72). The structures of the non-specific complexes of EcoRV and BamHI provide remarkable snapshots of enzymes poised for linear diffusion (rather than cleavage) (69), the enzymes being only loosely bound to the DNA and their catalytic centers at a safe distance from the phosphodiester backbone. One-dimensional diffusion is defined as translocation along a DNA molecule, which does not involve a true free state of the protein: it includes sliding (i.e. a helical movement due to tracking along a groove of the DNA), hopping (i.e. a movement more or less parallel to the DNA, during which the protein does not leave the DNA domain) as well as intersegment transfer (which requires two DNA binding sites on the protein) (72–74). It has been shown for EcoRI and EcoRV that sliding is the most important process in target site location (75,76). Leaving the target site after DNA cleavage might involve either sliding or hopping (77,78). The biological significance of linear diffusion is obvious. It can accelerate target site location, as shown for EcoRI (75,79,80), BamHI (80,81), HindIII (80), EcoRV (76,82,83) and BssHII (84), it can increase processivity as for example shown for EcoRI (85) or EcoRV (78) and it can accelerate the dissociation from the specific site after cleavage, as is the case for EcoRI (79). Under optimum conditions, restriction endonucleases can scan ~10⁶ bp in one binding event. As this scan is a random walk, the effective sliding distance is much shorter, ~1000 bp, as shown for EcoRI and EcoRV (75,76,80). During linear diffusion, EcoRI follows the helical pitch of the DNA, does not overlook any recognition site on its route and pauses at sites that resemble the recognition site; proteins firmly bound to DNA or unusual DNA structures constitute road blocks (75). The ionic milieu, in particular the Mg2+ concentration, has a strong influence on the effective sliding distance, as shown for EcoRI (85) and EcoRV (76). It must be emphasized that linear diffusion is not just a test tube curiosity but a process of importance in vivo (83,86), because the biological function of many enzymes acting on DNA requires fast target site location.
| DNA RECOGNITION |
|---|
Restriction endonucleases, while linearly diffusing along the DNA, must constantly scan the major groove, possibly also the minor groove, for recognition elements at the edges of the bases. Coming into contact with some idiosyncratic features of the DNA backbone and the bases, characteristic for the recognition sequence, triggers the highly cooperative conversion of a non-specific to a specific complex, which requires major conformational changes of both the protein and the DNA, as well as the expulsion of solvent molecules from the interface to allow for more intimate contacts. For EcoRI it was shown that altogether about 150 water molecules are released upon specific DNA binding (67), much more than upon non-specific binding (87). Interestingly, binding of BamHI to its cognate sequence is accompanied by the release of a somewhat smaller number of solvent molecules (88). Whereas non-specific DNA binding by EcoRI and BamHI has a ΔCp° ≈ 0 and is enthalpy driven, specific DNA binding by these enzymes has a ΔCp° < 0. Depending on temperature, specific binding is enthalpy or entropy driven (89).
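Why the driving force can change with temperature follows from elementary thermodynamics. The relations below are a sketch that assumes a temperature-independent ΔCp°; the reference temperatures T_H and T_S, at which ΔH° and ΔS° pass through zero, are notation introduced here for illustration and are not taken from ref. 89.

$$
\Delta H^{\circ}(T) = \Delta C_p^{\circ}\,(T - T_H), \qquad
\Delta S^{\circ}(T) = \Delta C_p^{\circ}\,\ln\frac{T}{T_S}, \qquad
\Delta G^{\circ}(T) = \Delta H^{\circ}(T) - T\,\Delta S^{\circ}(T)
$$

With ΔCp° < 0, ΔH° is positive (unfavorable) below T_H and negative (favorable) above it, whereas ΔS° is positive (favorable) below T_S and negative (unfavorable) above it. Binding that remains favorable over the whole range is therefore entropy driven at low temperature and enthalpy driven at high temperature. For ΔCp° ≈ 0 neither term changes appreciably with temperature, so a reaction that is enthalpy driven at one temperature remains so at others.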
The EcoRV system provides an excellent and so far unique example among type II restriction endonucleases of a major protein-induced conformational change of the DNA. In the specific complex the DNA is bent by ~50° (compared with little if any bending in the non-specific complex), as determined both in the crystal (47) and in solution (90–92). This angle varies somewhat, depending on the crystal form, the particular oligodeoxynucleotide and EcoRV variant used for the co-crystallization (93) (see also Fig. 6). Bending of the DNA is largest at the central TA step, which leads to an unstacking of the bases, widening of the minor groove with a concomitant compression of the major groove, which most importantly brings the scissile phosphates deeper into the active site. It is interesting that the DNA bend is preserved in the product complex (94) as well as in a quasi-product complex in which the 5'-phosphate is missing at the site of cleavage (95), indicating that the continuity of the phosphodiester bond is not required for bending. In this context it is worth mentioning that chemically modified oligodeoxynucleotide substrates (in which G is replaced by inosine and C by 5-methyl cytosine) are bent to a similar extent as the corresponding unmodified oligodeoxynucleotide (96) and that in the crystal, bending is also observed in the absence of divalent cations (47). This means that bending is required but not sufficient for DNA recognition.
Another example of DNA bending in the specific complex, although not as pronounced as with EcoRV, is provided by the EcoRI (97) and MunI systems (98). For both enzymes, which recognize the same AATT core sequence in their hexanucleotide recognition sequence (G↓AATTC and C↓AATTG, respectively), a central kink is observed, accompanied by unwinding of the DNA. A similar but more localized unwinding and a similar overall bending but without a central kink of the DNA has been observed in the specific BglII–DNA complex (99). In contrast, BamHI, which recognizes the same GATC core sequence within its hexanucleotide recognition sequence as BglII (G↓GATCC and A↓GATCT, respectively), does not bend, kink or unwind the DNA significantly (100). Whereas no major DNA distortion is observed for PvuII (43), which like EcoRV is a blunt end cutter, BglI (49), which is a sticky end cutter leaving 3'-overhangs, bends the DNA by ~20°, more or less smoothly without major kinks, the largest deviations from B-form DNA being seen in the two recognition half-sites of the interrupted GCCN4↓NGGC recognition sequence. Also, for the most recently reported structure of a specific restriction enzyme–DNA complex, BsoBI (52), no pronounced DNA bending is observed; however, slight deviations from canonical B-form exist: the DNA is extended and undertwisted, making the major and minor groove wider and more shallow. Taken together, no generalization can be made for the kind and extent of distortion type II restriction enzymes induce in their DNA substrate. In general, however, the local helical parameters of the DNA in the specific complex differ from ideal B-DNA parameters (or, where it is known, from the helical parameters of the specific oligodeoxynucleotide used in the co-crystallization). It is important to note that distortions are an intimate part of the recognition process. In a few instances this has been experimentally verified by facilitating such a distortion using chemically modified substrates or substrate analogs, for example EcoRV (101), and demonstrating that they are bound more firmly than the natural (undistorted in the free state) substrate.
For EcoRV (47,93–95,102,103) and BamHI (69,100) the structural changes occurring in the protein during the transition from the non-specific complex to the specific complex are known from detailed crystallographic analyses (Fig. 3). In EcoRV correlated movements of the protein occur in concert with the binding and unwinding of the DNA (93). These movements are characterized by a translation and rotation of the long B-helices as well as a rotation of the DNA binding domains by ~25°; they lead to an induced fit of the protein around the DNA substrate. Of particular importance is the ordering of the recognition loop, which appears to be largely unstructured in the non-specific complex and becomes structured only in the specific complex. It is responsible for making all base-specific contacts in the major groove and presumably is part of a communication network between the two identical subunits which have to act in concert to achieve double-strand cleavage in one binding event (104). The specific complex is more compact than the non-specific one, mainly because of the rotation of the DNA binding domains which brings these two domains closer together and allows them to encircle the DNA almost completely (Fig. 3).
The conformational changes that BamHI undergoes in the transition from non-specific to specific DNA binding are very different from those observed for EcoRV, even though in both cases the binding cleft is wider in the non-specific complex than in the specific complex and the specific complex is more compact than the non-specific complex. The more compact structure of the specific BamHI–DNA complex is in part due to the fact that a segment of the protein is pushed back into the protein core by the specific DNA, while the same segment is located in the binding cleft in the non-specific DNA complex. Whereas the non-specific BamHI–DNA complex (69) preserves the 2-fold symmetry of the free enzyme (51), the specific complex is characterized by a pronounced asymmetry (100). This asymmetry is produced by the unfolding of the C-terminal α-helix in both subunits: in one subunit only, the unfolded polypeptide segment inserts in an extended conformation into the minor groove of the DNA, while in the other subunit it makes a side-by-side contact with the phosphodiester backbone (Fig. 3). Furthermore, whereas in the non-specific complex the DNA is only loosely bound within the cleft formed by the two subunits such that it protrudes out of the cleft (more so than with EcoRV), in the specific complex it is almost surrounded by the enzyme (as in EcoRV). Another remarkable difference between the specific and the non-specific complex concerns the orientation of the DNA relative to the two subunits of BamHI. In both complexes, the 2-fold axis of the dimeric protein coincides with the 2-fold axis of the DNA. However, compared with the non-specific complex, the enzyme is tilted about this axis by ~20°, resulting in a different contact area at the periphery of the DNA binding site in the non-specific and the specific complex (69).
Given that at present a comparison between the non-specific and the specific complex can be made in only two systems, EcoRV and BamHI, only very general statements can be made regarding the structural changes accompanying the transition from non-specific to specific binding. Because of the similarities in function it is likely for all type II restriction endonucleases that the specific complex will be more compact than the non-specific one, in order to allow for a tighter contact between enzyme and substrate. Presumably, this will be achieved by a reorientation of the two subunits towards each other and the DNA, which will lead to a compaction of the DNA binding site and a more or less complete encircling of the DNA. The reorientation can be substantial, as is apparent from the comparison of the structures of BglII in the free (105) and bound state (98): to bind DNA, the enzyme has to open by a scissor-like motion of the subunits parallel to the DNA helix axis, which is accompanied by a complete rearrangement of the α-helices at the dimer interface. In contrast, in EcoRV and BamHI opening the binding cleft is achieved essentially by a motion of the subunits in a direction perpendicular to the helix axis.
There is an interesting difference between EcoRV on one side and BamHI as well as most of the other restriction endonucleases on the other side, including EcoRI. EcoRV requires the presence of Mg2+ (106) or Ca2+ (91) for specific binding. In the presence of EDTA, EcoRV in a gel electrophoretic mobility shift assay produces multiple bands, whose concentration-dependent distribution demonstrates that this enzyme binds all DNA sequences with similar affinity (82), a conclusion that was challenged (107), and then confirmed by binding studies with oligodeoxynucleotides in solution (108). In a more recent study, the preference of EcoRV for its cognate sequence in the absence of divalent cations was shown to be within a factor of 10 at neutral pH (109), in agreement with results obtained previously for wild-type EcoRV (110) and an EcoRV variant (111). Similar results were reported for PaeR7 (112), TaqI (113), Cfr9I (114), BcgI (115), MunI (116), Cfr10I (117) and BglI (118), which also require Ca2+ (as a substitute for Mg2+) for specific binding. For MunI it was shown that this requirement could be relaxed by protonation or substitution of the active site carboxylates (119), indicating that the divalent cation is required to decrease the electrostatic repulsion between the protein and the DNA at the active center. For EcoRV, additional Mg2+ binding sites outside the catalytic center are required for specific binding, as the substitution of the active site carboxylates does not alleviate the Mg2+-dependence of specific binding (111). We suggest, therefore, that for some restriction endonucleases Mg2+ (or other divalent cations) is involved in the recognition process, not only in the transition state, where its contribution is obvious, but also for preferential and strong (i.e. specific) binding of the recognition sequence. 
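To put the "factor of 10" preference quoted above on an energy scale, the ratio of binding constants can be converted into a free-energy difference with the standard relation ΔΔG = RT ln(ratio). The following is a minimal sketch only: the function name and the temperature of 25 °C (298.15 K) are assumptions for illustration, since the studies cited above are not described here in that detail. Under these assumptions a 10-fold preference corresponds to roughly 5.7 kJ/mol.

```python
import math

R_GAS = 8.314462618  # molar gas constant, J mol^-1 K^-1


def binding_preference_to_ddg(ratio: float, temperature_k: float = 298.15) -> float:
    """Convert a ratio of binding constants into a free-energy difference.

    ratio         -- K(specific) / K(non-specific), must be positive
    temperature_k -- absolute temperature; 298.15 K is an assumed default
    Returns the difference in binding free energy in kJ/mol.
    """
    if ratio <= 0:
        raise ValueError("ratio of binding constants must be positive")
    return R_GAS * temperature_k * math.log(ratio) / 1000.0


if __name__ == "__main__":
    # A 10-fold preference, the upper bound quoted in the text for EcoRV
    # in the absence of divalent cations.
    print(f"{binding_preference_to_ddg(10.0):.2f} kJ/mol")
```

A preference this small is comparable to the energy of a single hydrogen bond, which illustrates why binding in the absence of divalent cations is regarded as essentially non-specific.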
That restriction endonucleases have additional divalent metal ion binding sites already in the absence of DNA has been shown by metal ion mapping experiments for TaqI (120) and by crystallography for PvuII near Tyr94 (M.Kokkinidis, personal communication). This residue has been discussed previously as being involved in metal ion positioning on the basis of a PvuII mutant–DNA co-crystal structure (121). The Tyr94 site of PvuII is only seen with Mg2+ soaked into the crystals and not with Mn2+, which may explain why this site was not seen in the metal ion mapping experiments carried out with Fe2+ (122). In the presence of the DNA substrate more divalent metal ion binding sites may appear, as has been shown by crystallography and biochemical studies for EcoRV at position His71, His193 and a phosphodiester group within the recognition site (GpATATC), respectively (93,111) (F.Winkler, personal communication). Figure 6 gives a compilation of all metal ion binding sites observed in EcoRV so far, illustrating that the interaction of a restriction enzyme with metal ions must be considered a very complicated issue. It is interesting to note that restriction enzymes that do not require divalent cations for specific DNA binding, like EcoRI, can be made dependent on divalent cations by introducing amino acid substitutions in critical positions. The EcoRI K130A or E and R131E variants behave like EcoRV in requiring Ca2+ for specific binding (123). This argues against a fundamental difference between enzymes that achieve specificity already at the binding step (in the absence of Mg2+) or only in the catalytic step (in the presence of Mg2+).
Since 1997, when we discussed the recognition process for EcoRI, EcoRV, BamHI and PvuII (for details see 10), six more co-crystal structures of specific restriction endonuclease–DNA complexes have been determined: FokI (53), BglI (49), MunI (98), BglII (99), NgoMIV (45) and BsoBI (52).
FokI
FokI, a type IIS enzyme, recognizes the asymmetric sequence GGATG and makes a staggered cut 9 and 13 nt, respectively, downstream of the recognition sequence, after dimerization on the DNA via its cleavage domain (17,54). FokI approaches the DNA from the major groove side and appears to surround the DNA. The recognition domain consists of three subdomains (D1, D2 and D3), which all contain a helix–turn–helix motif and are similar to the DNA binding domain of the catabolite gene activator protein. DNA recognition is based on two modules: subdomain D1, which covers the major groove at the 3'-end of the recognition sequence (GGATG), and subdomain D2, which contacts the 5'-end of the recognition sequence (GGATG). Subdomain D3 is not involved in protein–DNA but rather in protein–protein interactions. FokI, like all other restriction endonucleases, makes extensive interactions with all bases of the recognition sequence: almost all hydrogen-bond acceptors and donors at the edges of the bases in the major groove are involved in direct contacts with the protein.
BglI
BglI, an orthodox type II enzyme, recognizes the sequence GCCN5GGC and cleaves between the fourth and fifth unspecified nucleotide to produce 3'-overhanging ends. BglI approaches the DNA from the minor groove side (Fig. 1), similarly to EcoRV and PvuII, with which it shares many structural features, even though the two subunits are arranged differently from those in these two proteins in order to accommodate the unspecified sequence between the two recognition half-sites and to produce the different cleavage pattern (3'-overhangs versus blunt ends). Because of the long distance between the two recognition half-sites, each subunit of BglI contacts only one half-site and cleaves close to it: there is no cross-over mode of recognition, as observed for most of the other type II restriction endonucleases and argued to be beneficial for concerted double-strand cleavage (10). Concerted cleavage might be achieved in this case solely by the extensive hydrogen-bonding network that connects the catalytic centers of the two subunits. BglI makes base contacts predominantly in the major groove. The unspecified 5 bp between the two half-sites are contacted at the sugar–phosphate backbone. The base contacts in the major groove involve amino acid residues located on or near a small three-stranded β-sheet (recognition sheet), in a topologically similar location to that observed for EcoRV and also PvuII (Fig. 4). Per recognition site there are 16 direct hydrogen bonds and two water-mediated ones, which saturate the hydrogen-bonding potential in the major groove. Moreover, there is one direct and several indirect, i.e. water-mediated, contacts to the bases from the minor groove side. In addition to these base contacts (direct readout), there are numerous backbone contacts (indirect readout): altogether 17 direct and 21 water-mediated hydrogen bonds per subunit to the DNA phosphates.
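The statement that 16 direct plus two water-mediated hydrogen bonds saturate the major groove can be checked by simple counting. The sketch below assumes the standard textbook assignment of major-groove hydrogen-bonding groups in Watson–Crick base pairs (N7 and N6 of A, O4 of T, N7 and O6 of G, N4 of C), which is not taken from this review; the function and table names are hypothetical. Each base pair then offers three groups, so the six specified base pairs of the BglI site (GCC and GGC) offer 18, matching the 16 + 2 bonds reported.

```python
# Major-groove hydrogen-bonding groups per base (assumed textbook assignment,
# not taken from the review itself).
MAJOR_GROOVE_GROUPS = {
    "A": ("N7 acceptor", "N6 donor"),
    "T": ("O4 acceptor",),
    "G": ("N7 acceptor", "O6 acceptor"),
    "C": ("N4 donor",),
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def major_groove_potential(specified_bases: str) -> int:
    """Count major-groove donor/acceptor groups over the specified base pairs.

    specified_bases -- one strand of the specified positions only
                       (unspecified N positions must be left out).
    """
    total = 0
    for base in specified_bases.upper():
        if base not in COMPLEMENT:
            raise ValueError(f"unspecified or unknown base: {base!r}")
        partner = COMPLEMENT[base]
        total += len(MAJOR_GROOVE_GROUPS[base]) + len(MAJOR_GROOVE_GROUPS[partner])
    return total


if __name__ == "__main__":
    # BglI: the two specified half-sites GCC and GGC, N5 spacer omitted.
    print(major_groove_potential("GCC" + "GGC"))
```

The same count applies to any fully specified hexanucleotide site, because A·T and G·C pairs each contribute three groups in this assignment.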
MunI
MunI recognizes the sequence C↓AATTG. The core sequence AATT as well as the cleavage pattern is the same as that for EcoRI. This and the identification of local sequence similarities, which concern structural elements of EcoRI involved in recognition and cleavage, led to the suggestion that MunI might employ a similar mechanism for DNA recognition and cleavage (124). The determination of the co-crystal structure of the specific MunI–DNA complex confirmed this proposition (Figs 1 and 4) and thereby provides the first example in which two restriction enzymes contact common parts of their recognition sequence by homologous structural elements. MunI, like EcoRI, approaches the DNA from the major groove side and distorts the DNA in a manner similar to EcoRI. MunI makes base contacts only in the major groove. There are altogether 16 hydrogen bonds to the edges of the bases and six van der Waals contacts per hexanucleotide recognition site. The outer GC base pair is contacted by Arg115, which has no counterpart in EcoRI. The AATT core sequence is recognized by amino acid residues located on one segment (Arg115 to Arg121), which in its topological location and function has a correlate in EcoRI, where it is responsible for the recognition of the same core sequence (AATT). In addition to base-specific contacts, numerous contacts exist between the sugar–phosphate backbone of the DNA and the protein, extending to two phosphate residues outside of the recognition sequence. These contacts come from several regions of the protein, which in part are also involved in base contacts. Thus, direct and indirect readout are interwoven. Some of these contacts are very similar to those observed previously in the EcoRI–DNA complex and they are considered to stabilize the distorted DNA conformation (50,125). Thus, not only are there common features in base recognition between EcoRI and MunI, but also in backbone recognition (see also 126). Deibert et al. (98) suggest that this finding may eventually be extended to ApoI, which recognizes and cleaves the sequence Pu↓AATTPy.
BglII
BglII recognizes and cleaves the sequence A↓GATCT, which closely resembles the recognition sequence of BamHI (G↓GATCC). The determination of the co-crystal structure (Fig. 1) of a specific BglII–DNA complex (99) allowed for a comparison of the strategies employed for recognition (see also 126). Although the enzymes have a similar core structure, there are remarkable differences in the way these two enzymes interact with their substrate. The most obvious difference regarding the mode of recognition is that in BglII the core structure is augmented by a β-sandwich subdomain that fully encircles the DNA and is responsible for the minor groove contacts as well as some of the backbone contacts. In contrast to the EcoRI/MunI systems, BamHI and BglII, which also share a common tetranucleotide in their respective hexanucleotide recognition sequences, interact with this tetranucleotide sequence differently and, with one exception (Asn140 and Ser141 in BglII correspond to Asp154 and Asp155 in BamHI, both recognizing the respective outer base pairs), use different structural elements for recognition (Fig. 4). Although BglII (like BamHI) approaches the DNA from the major groove side, contacts are also made to the edges of the bases in the minor groove. Three loops are responsible for all base contacts: Asn140 and Ser141 (loop C) recognize via their side chain functions the first TA base pair and the C of the second CG base pair. The G of the second base pair is contacted by water-mediated bidentate hydrogen bonds from Asn98 (loop B). The T of the third TA base pair is recognized by Tyr190 of one subunit (loop D), and the A by Ser97 of the other subunit (loop B). In addition, there are four water-mediated hydrogen bonds between the minor groove face of the bases and Tyr190 and Arg192. Altogether there are 14 hydrogen bonds to the major groove and five hydrogen bonds to the minor groove. There is a pronounced intertwining of the recognition of the two strands/two halves of the recognition sequence on one side and the two subunits on the other side. Numerous interactions exist between the protein and the DNA; they extend by two phosphate residues beyond the recognition sequence. Altogether there are 28 backbone contacts, 20 of which are water-mediated.
BsoBI
BsoBI recognizes the degenerate sequence C↓PyCGPuG. A remarkable feature of the co-crystal structure of the specific BsoBI–DNA complex is the complete encirclement of the DNA by the protein to form a 20 Å long tunnel (52). Approximately 3800 Å² of the solvent accessible surface of the enzyme and the DNA are buried in the protein–DNA interface (Fig. 1 and Table 2). As expected from its mode of cleavage, BsoBI approaches the DNA from the major groove side. Each subunit interacts with each recognition half-site and makes base-specific contacts in the major and minor grooves. The outer and inner CG base pairs are involved in several hydrogen bonds to the protein. Of particular interest is how this enzyme manages to accept a CG or TA and GC or AT base pair in the second and fourth position of the recognition sequence. This is now understood in structural terms because this PyPu base pair is involved in only one direct hydrogen bond between Lys81 and the N7 of the purine (A in the co-crystal structure), and in one water-mediated bidentate hydrogen bond between Asp246 and N7 of the purine as well as the substituent in position 6 of the purine (N6 of A in the co-crystal structure). In addition to these 22 hydrogen bonds to the bases, several van der Waals contacts to the bases exist as well as 64 hydrogen bonds (24 water-mediated) to the backbone per site. The backbone contacts extend to two residues to the left and right of the recognition sequence.
NgoMIV
For NgoMIV only the structure of the enzyme–product complex has been determined (45). It is likely that many of the sequence-specific contacts required for the recognition of the substrate are preserved in the enzyme–product complex (as is the case in the EcoRV and BamHI systems). NgoMIV approaches the DNA from the major groove side and makes most of the base-specific contacts in the major groove of the target sequence recognized (G↓CCGGC). One subunit forms hydrogen bonds to the GCC half-site in the major groove, while the neighboring subunit forms a hydrogen bond to the C of the outer GC base pair in the minor groove. Base-specific contacts come from three structural elements, namely loops preceding α-helix 2, 7 and 8 (Fig. 4). It is noteworthy that three neighboring amino acids (Arg191, Asp193, Arg194) make all possible hydrogen bonds to the two adjacent GC base pairs in the major groove. Altogether there are 18 direct and two water-mediated hydrogen bonds to the bases of the NgoMIV recognition sequence. Interestingly, there is a hydrogen bond contact from Ser36 to the C on the 5'-side of the sequence, which may explain the flanking sequence preference of NgoMIV. In addition to the base-specific contacts, numerous contacts to the sugar–phosphate backbone exist, mainly from the other subunit, such that direct readout, for which one subunit is responsible, is interwoven with indirect readout, for which the other subunit is responsible. Altogether, there are six direct and eight water-mediated contacts to the DNA phosphates.
The region from Arg191–Arg194 (RSDR) has a structural equivalent in Cfr10I (RPDR), which recognizes a sequence similar to that of NgoMIV (compare Pu↓CCGGPy with G↓CCGGC) and also has a Glu residue six residues away which is part of the catalytic center of NgoMIV (45). It is very likely that Cfr10I recognizes the adjacent GG sequence using the residues equivalent to those in NgoMIV. One could possibly extend this suggestion to other restriction endonucleases that recognize adjacent GC base pairs (Table 3). In the absence of structural data or a detailed mutational analysis this is speculative. For some of these enzymes, for example SsoII (V.Pingoud, personal communication), biochemical evidence exists that the RXXR motif plays an important role in DNA binding.
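The relationships between the recognition sequences discussed in this section (for example, that the NgoMIV site is a special case of the degenerate Cfr10I site) can be made concrete with a short sketch. It assumes that Pu, Py and N are written with the IUPAC codes R, Y and N; the function names and the dictionary of sites are illustrative only, and cleavage positions are deliberately not modelled.

```python
import re

# IUPAC codes used here: R = purine (A/G), Y = pyrimidine (C/T), N = any base.
IUPAC_REGEX = {"A": "A", "C": "C", "G": "G", "T": "T",
               "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}
IUPAC_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C",
                    "R": "Y", "Y": "R", "N": "N"}

# Recognition sequences as quoted in the text, rewritten with IUPAC codes.
SITES = {
    "FokI": "GGATG", "BglI": "GCCNNNNNGGC", "MunI": "CAATTG",
    "ApoI": "RAATTY", "BglII": "AGATCT", "BamHI": "GGATCC",
    "BsoBI": "CYCGRG", "NgoMIV": "GCCGGC", "Cfr10I": "RCCGGY",
}


def site_to_regex(site: str) -> re.Pattern:
    """Compile a (possibly degenerate) recognition sequence to a regex."""
    return re.compile("".join(IUPAC_REGEX[code] for code in site.upper()))


def is_palindromic(site: str) -> bool:
    """True if the site equals its own reverse complement."""
    site = site.upper()
    return "".join(IUPAC_COMPLEMENT[code] for code in reversed(site)) == site


def find_sites(site: str, dna: str) -> list[int]:
    """Return 0-based start positions of all (overlapping) matches in dna."""
    lookahead = re.compile("(?=" + site_to_regex(site).pattern + ")")
    return [match.start() for match in lookahead.finditer(dna.upper())]


if __name__ == "__main__":
    for name, site in SITES.items():
        print(name, site, "palindromic" if is_palindromic(site) else "asymmetric")
```

With this representation, `site_to_regex(SITES["Cfr10I"]).fullmatch(SITES["NgoMIV"])` succeeds, reflecting that every NgoMIV site is also a Cfr10I site, and FokI is the only asymmetric entry in the table.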
The increasing number of co-crystal structures available for specific restriction endonuclease–DNA complexes, together with complementary biochemical studies, allows us to make generalizations regarding the mechanism of DNA recognition. (i) Enzymes that produce blunt ends or sticky ends with 3'-overhangs approach the DNA from the minor groove side, whereas enzymes that produce sticky ends with 5'-overhangs contact the DNA from the major groove side. (ii) DNA binding is accompanied by more or less pronounced distortions of the DNA and conformational adaptations of the enzyme, which in many cases lead to a partial encircling of the DNA by the protein. (iii) Specific DNA binding is accompanied by the release of counter ions and partial dehydration of the enzyme and the DNA at the protein–DNA interface. (iv) Enzymes that produce blunt ends or sticky ends with 3'-overhangs mainly use a β-strand and β-like turn for DNA recognition. In contrast, enzymes that produce sticky ends with 5'-overhangs mainly use an α-helix and a loop. (v) Recognition is achieved by direct and indirect readout, i.e. base contacts and backbone contacts, respectively. Contacts to the bases are predominantly in the major groove and usually exhaust the hydrogen bonding potential in the major groove. This means that a hexanucleotide sequence is recognized by ~20 hydrogen bonds to the bases of the recognition sequence. Interactions with the backbone are often water-mediated. (vi) Individual recognition modules (short sequence motifs) begin to show up that are used by different restriction endonucleases to recognize common parts in similar recognition sites.
| COUPLING BETWEEN RECOGNITION AND CATALYSIS |
|---|
Specific DNA binding by restriction endonucleases is defined as strong and, more importantly, preferential binding to the recognition site. Its outcome is what we usually see in the co-crystal structures or what we measure in binding experiments (including footprinting and crosslinking experiments). Specific binding does not necessarily mean recognition, which is defined operationally, i.e. by the reaction that follows. By this definition the co-crystal structures of the specific restriction endonuclease–DNA complexes only mimic the recognition complex. By a similar argument, the results of binding experiments only address the mechanism of specific binding and not, in the strictest sense, the mechanism of recognition. Nevertheless, there is no doubt that the investigation of specific binding helps to understand recognition, presumably because the enzyme–substrate complex (studied in the absence of Mg2+ or in the presence of Ca2+) as well as the enzyme–product complex (studied in the presence of Mg2+, after turnover) is very similar to the ground state complex in the presence of Mg2+. In this context it is important to note that the discrimination between specific and non-specific sites requires multiple contacts to be formed between enzyme and substrate. In order to prevent these contacts, formed in the ground state complex, from impairing the catalytic efficiency (given by the difference in the Gibbs free energy of the transition state complex and the ground state complex), these interactions must also stabilize the transition state (127). Therefore, the ground state complex is likely to resemble the transition state complex very much, differences being localized to the site of phosphodiester bond cleavage.
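The argument that ground-state contacts must also stabilize the transition state can be written out explicitly. The following is a sketch under stated assumptions: the rate expression is the standard Eyring relation of transition state theory, which is not invoked in the text itself, and the symbols δ_GS and δ_TS (the stabilization that a given set of contacts contributes to the ground state and the transition state, respectively) are introduced here only for illustration.

```latex
% Activation free energy and rate constant (Eyring relation, assumed here)
\Delta G^{\ddagger} = G_{\mathrm{TS}} - G_{\mathrm{GS}}, \qquad
k = \frac{k_{\mathrm{B}} T}{h}\,
    \exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right)

% Contacts lower the ground state by \delta_{GS} and the transition state by \delta_{TS}
\Delta G^{\ddagger\prime}
  = (G_{\mathrm{TS}} - \delta_{\mathrm{TS}}) - (G_{\mathrm{GS}} - \delta_{\mathrm{GS}})
  = \Delta G^{\ddagger} + (\delta_{\mathrm{GS}} - \delta_{\mathrm{TS}})

% Resulting change in rate
\frac{k'}{k} = \exp\!\left(-\frac{\delta_{\mathrm{GS}} - \delta_{\mathrm{TS}}}{RT}\right)
```

If the contacts are retained in full in the transition state (δ_TS = δ_GS), the barrier and hence the rate are unchanged; if they were lost entirely (δ_TS = 0), every unit of binding energy gained in the ground state would be paid back as an equal increase in the activation barrier.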
The coupling of specific binding, recognition and catalysis is poorly understood. There have been many attempts to understand how the catalytic machinery is activated during the recognition process. Ideally one would like to see this by time-resolved crystallography or NMR. This, however, has not yet been achieved. Instead, crystallographic studies of wild-type and mutant enzymes with canonical and chemically modified substrates, in the absence and presence of divalent metal ion cofactors, have been carried out and their results interpreted together with the results of single and multiple turnover cleavage studies. The best studied system in this respect is the EcoRV system, for which the structures of different enzyme–substrate complexes in different crystal lattices were determined and for which detailed biochemical data were obtained. Recognition can be formally divided into direct and indirect readout. In EcoRV the recognition (R)-loop, comprising residues 182–187, whose importance for recognition has been confirmed by a mutational analysis (128), makes all the base-specific contacts in the major groove. A hydrogen bond network links the R-loops to the scissile phosphates and the catalytic centers via Asn188 and Lys92 (94). Base recognition in the minor groove is accomplished by the glutamine (Q)-loop, comprising residues 68–70. Gln69 is in close proximity to one catalytic center and via Thr37 also to the other catalytic center (94); these two residues are very important for catalysis (128–132). Thr37 is also one of the key amino acid residues involved in indirect readout (94,96,130,131,133). The results of these investigations concerning coupling of recognition and catalysis assign a critical role to the symmetry related B-helices and Q-loops, which connect β-strands c and d. This region is located at the floor of the DNA binding site, vis-à-vis the phosphodiester bonds to be cleaved. It is known that this region adopts slightly different conformations in various co-crystal structures of EcoRV (93), which makes it likely that it has sufficient conformational freedom to be involved in activation of the catalytic centers. Residues whose position is affected by these conformational changes include Asp36 and Lys38, which also have been shown by mutational analyses to be essential for EcoRV (134,135).
A major aspect of the mechanism of activation of the catalytic centers of restriction enzymes concerns the positioning of the divalent metal ion cofactors and the water molecules, one of which in each catalytic center must take up a position in-line with the phosphodiester bond to be cleaved. For EcoRV it has been shown that Asp74 and Asp90, as well as the scissile phosphate and its 3'-neighbor, which all cooperate in Mg2+ and water binding at the catalytic centers, take up slightly different positions in different co-crystal structures of EcoRV (93). This is not unexpected for catalytically relevant residues (128,129,136) in a complex that is not active in the crystal. Of course, one would like to know how all amino acid residues in specific contact with the bases and the backbone communicate with the catalytic centers. The fact that this communication must be highly cooperative will make it very difficult to identify an intramolecular signal transduction pathway. It must be admitted, therefore, that at present it is at best partially understood how the catalytic centers of EcoRV or any other restriction endonuclease are activated during the recognition process. This is probably the main reason why all efforts to change or expand the specificity of restriction endonucleases by rational, i.e. structure-guided, design have so far failed (137,138) or were not as successful as one had hoped (139).
Coupling of recognition to catalysis not only concerns intrasubunit but also intersubunit communication, as restriction endonucleases in general catalyze a concerted double-strand cut. This means that the information of recognition must be passed on from one subunit to the other. As pointed out above, with few exceptions each subunit of a restriction enzyme makes contacts to both halves of the recognition sequence, which integrates the recognition process. This has been demonstrated directly for EcoRV, using artificial heterodimers (104,131,140) and for PvuII using a single chain variant (194): substitution of a residue involved in base-specific contacts in only one subunit affects cleavage in both strands, whereas substitution of a catalytic residue in one active center does not affect the other catalytic center and allows for cleavage in one strand (nicking).
| THE MECHANISM OF CATALYSIS OF PHOSPHODIESTER BOND CLEAVAGE BY RESTRICTION ENDONUCLEASES |
|---|