Homing endonucleases: keeping the house in order
Marlene Belfort* and Richard J. Roberts1
Molecular Genetics Program, Wadsworth Center, New York State Department of Health, and School of Public Health, State University of New York at Albany, PO Box 22002, Albany, New York 12201-2002, USA and 1New England Biolabs, 32 Tozer Road, Beverly, MA 01915-5599, USA
Received May 27, 1997; Revised and Accepted July 18, 1997
ABSTRACT
Homing endonucleases are rare-cutting enzymes encoded by introns and inteins. They have striking structural and functional properties that distinguish them from restriction enzymes. Nomenclature conventions analogous to those for restriction enzymes have been developed for the homing endonucleases. Recent progress in understanding the structure and function of the four families of homing enzymes is reviewed. Of particular interest are the first reported structures of homing endonucleases of the LAGLIDADG family. The exploitation of the homing enzymes in genome analysis and recombination research is also summarized. Finally, the evolution of homing endonucleases is considered, both at the structure-function level and in terms of their persistence in widely divergent biological systems.
INTRODUCTION
Several endonucleases encoded by introns and inteins in the three biological kingdoms have been shown to promote the homing of their respective genetic elements into allelic intronless and inteinless sites (reviewed in 1 -3 ). By making a site-specific double-strand break in the intronless or inteinless alleles, these nucleases create recombinogenic ends which engage in a gene conversion process that duplicates the intron or intein (Fig. 1 ). The homing enzymes that initiate the mobility process can be grouped into families, which share structural and functional properties with each other and with some freestanding, intergenic endonucleases. Regardless of whether these enzymes have been shown to be involved in DNA rearrangements, they are collectively termed homing endonucleases.
DISTINGUISHING CHARACTERISTICS OF HOMING ENDONUCLEASES
Although most homing endonucleases share with restriction enzymes the ability to make a site-specific double-strand break in the DNA target, they differ in structure, recognition properties, and genomic location (Table 1 ). First, the vast majority of homing endonucleases fall within one of four families, characterized by the sequence motifs LAGLIDADG, GIY-YIG, H-N-H and His-Cys box (1 ). In contrast, restriction enzymes do not fall within easily recognizable families. Although the PDX9-18 (E/D)XK motif has been associated with the catalytic center of several restriction enzymes (4 -6 ), its occurrence by chance is so frequent that it alone cannot be considered indicative of endonuclease function. Type II restriction enzymes only share significant sequence similarity if they are isoschizomers (recognize identical DNA sequences).
Second, homing endonucleases have recognition sequences that span 12-40 bp of DNA, whereas restriction enzymes recognize much shorter stretches of DNA, in the 3-8 bp range (7 -9 ). Although the homing endonucleases are rather tolerant of single-base-pair changes in their lengthy DNA interaction sites, the restriction enzymes are highly sensitive to single-site mutations in their short recognition sequences (8 -10 ). Furthermore, general asymmetry of homing endonuclease target sequences contrasts with the characteristic dyad symmetry of most restriction enzyme recognition sites (7 -9 ).
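The contrast between dyad-symmetric and asymmetric sites, and between strict and tolerant recognition, can be illustrated with a short sketch. The function names, the mismatch threshold and the 18-bp example sequence are illustrative assumptions, not taken from the reviewed literature; only the EcoRI site GAATTC is a standard textbook example.

```python
# Illustrative sketch only: names and the 18-bp example site are hypothetical.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(site: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return site.upper().translate(COMPLEMENT)[::-1]


def has_dyad_symmetry(site: str) -> bool:
    """True if the site reads identically on both strands (perfect palindrome)."""
    return site.upper() == reverse_complement(site)


def mismatches(site: str, candidate: str) -> int:
    """Count positions at which two equal-length sequences differ."""
    if len(site) != len(candidate):
        raise ValueError("sequences must have the same length")
    return sum(a != b for a, b in zip(site.upper(), candidate.upper()))


def is_recognized(site: str, candidate: str, max_mismatches: int = 0) -> bool:
    """Crude model: a candidate is recognized if it has few enough mismatches."""
    return mismatches(site, candidate) <= max_mismatches


ECORI_SITE = "GAATTC"                      # short, dyad-symmetric restriction site
HYPOTHETICAL_SITE = "ACGTTGCAAGGCTTACGA"   # made-up 18-bp asymmetric site

print(has_dyad_symmetry(ECORI_SITE))         # True
print(has_dyad_symmetry(HYPOTHETICAL_SITE))  # False
# A single substitution abolishes recognition under the strict model...
print(is_recognized(ECORI_SITE, "GAATTA", max_mismatches=0))  # False
# ...but is accepted under a tolerant model with one allowed mismatch.
print(is_recognized(HYPOTHETICAL_SITE, "ACGTTGCAAGGCTTACGT", max_mismatches=1))  # True
```

Real enzymes do not weight all positions equally, so a simple mismatch count is only a first approximation of sequence tolerance.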
Third, the enzymes have different molecular associations. Homing endonucleases act as monomers or homodimers and, while some function independently of accessory molecules, others require associated proteins to regulate their activity (11 ). Yet other homing endonucleases form ribonucleoprotein (RNP) complexes, wherein RNA molecules are integral components of the catalytic apparatus (12 ). Restriction enzymes can also function either alone, as monomers or homodimers (the Type II enzymes) (10 ), or with additional protein subunits (the Type I and Type III enzymes) (13 ), but the accessory subunits are quite different from those of the homing endonucleases. Thus, the Type I enzymes require restriction, modification, and specificity subunits for their action, while the Type III enzymes require only modification subunits for cleavage to occur (13 ).
Finally, the phylogenetic distribution of the two types of enzymes differs. Homing endonucleases have been found in all three biological kingdoms (the archaea, bacteria, and eukarya), whereas restriction enzymes occur only in archaea, bacteria and certain eukaryotic viruses (7-9). In addition to being phylogenetically widespread, homing endonucleases are expressed in different compartments of the eukaryotic cell: nuclei, mitochondria, and chloroplasts. Their genomic microenvironment also differs. The homing endonuclease open reading frames occur in introns, inteins, and in freestanding form between genes, whereas restriction enzyme genes have been found only in freestanding form, almost always in close association with genes encoding cognate modifying enzymes (14). Thus, while the restriction enzymes and homing endonucleases share the function of cleaving double-stranded DNA, they appear to have evolved independently.
Table 1. Comparison of homing endonucleases and restriction enzymes
Property | Homing endonuclease | Restriction enzyme
1. Conserved protein motifs | Four: i. LAGLIDADG; ii. GIY-YIG; iii. H-N-H; iv. His-Cys | None definitive1
2. Recognition sequences | a. Lengthy (12-40 bp) | a. Short (3-8 bp)
 | b. Asymmetric | b. Symmetric and asymmetric
 | c. Sequence-tolerant | c. Sequence-specific
3. Accessory molecules | Some require protein or RNA components for full activity | Some require methyltransferase or specificity subunits
4. Genomic location | a. Intron, intein, or intergenic | a. Flanking modification gene
 | b. All three biological kingdoms | b. Confined to archaea, bacteria and some eukaryotic viruses
1The loosely-defined PDX9-18 (E/D)XK motif may be present (see text).
NOMENCLATURE CONVENTIONS
RECENT STRUCTURE-FUNCTION INSIGHTS INTO HOMING ENDONUCLEASES
Dissection of protein structure and DNA-protein interaction is proceeding apace for the four families of homing endonucleases. For classification and common sequence features of the four enzyme families the reader is referred to ref. 1 and citations therein. LAGLIDADG endonucleases under intensive study include intron-derived enzymes I-CreI, I-CpaII, I-DmoI, and I-PorI, and the intein endonuclease PI-SceI. Also the subject of active investigation are intron endonucleases I-TevI, I-PpoI, and I-SceV, members of the GIY-YIG, His-Cys and H-N-H endonuclease families, respectively. The origin and characteristics of these endonucleases are summarized in Tables 2 and 3 , and reviewed in refs. 1 -3 ,8 ,9 ,19 . Following is a brief review focusing on the literature of the past year. Although there are some clear differences in the architecture and action between the best-studied LAGLIDADG and GIY-YIG families of endonucleases, presumably reflecting their independent ancestry, some common themes also emerge in the properties of these distinctive endonucleases.
Endonuclease and homing-site structure and properties
LAGLIDADG endonucleases. LAGLIDADG enzymes exist with either one or two LAGLIDADG motifs (Table 2) (1), which have been implicated in endonuclease function (20-22). In a major breakthrough for the field, the structures of a single-motif enzyme, I-CreI, and a two-motif enzyme, PI-SceI, have recently been determined (23,24). I-CreI, solved at 3.0 Å resolution, forms a homodimer, with dimensions consistent with its ability to recognize its lengthy homing site, variously estimated at 19-24 bp (24). The 163-residue I-CreI monomers form an elongated protein with a half-cylindrical groove of ~25 × 25 × 35 Å (Fig. 2). The homodimer forms an extended saddle of ~70 Å for DNA binding, with its undersurface consisting of four antiparallel β-strands, which are likely to contact the substrate. The LAGLIDADG motifs are proposed to form the dimer interface, while simultaneously positioning conserved aspartate residues (Asp20 of each monomer) adjacent to the scissile phosphates (Fig. 2). These residues may function to coordinate Mg2+, while conserved arginines are also implicated in catalysis. Considering the symmetry of the dimer, one aspartate from each monomer could allow simultaneous attack across the minor groove to generate 4-nt 3′ overhangs (23).
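The statement that the dimensions of the I-CreI dimer are consistent with the length of its homing site can be checked with simple arithmetic. The sketch below assumes straight, ideal B-form DNA with a rise of about 3.4 Å per base pair; that value is a textbook approximation introduced here, not a figure from the structural work, and the function name is hypothetical.

```python
# Rough consistency check, assuming ideal straight B-form DNA.
B_DNA_RISE_ANGSTROM_PER_BP = 3.4  # assumed textbook value, not from the review


def base_pairs_spanned(length_angstrom: float,
                       rise: float = B_DNA_RISE_ANGSTROM_PER_BP) -> float:
    """Number of base pairs of straight B-DNA covered by a given length."""
    return length_angstrom / rise


saddle_length = 70.0  # approximate length of the I-CreI DNA-binding saddle (from the text)
span = base_pairs_spanned(saddle_length)
print(f"{span:.1f} bp")  # 20.6 bp

# The homing site is variously estimated at 19-24 bp (from the text).
print(19 <= span <= 24)  # True
```

Because homing endonucleases can bend their substrates, the straight-DNA assumption gives only an order-of-magnitude comparison.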
Dynamic interactions and catalysis
The degree to which the details of enzymatic catalysis will be similar for the different homing endonuclease families is unknown. However, interesting themes are apparent in the dynamic properties of the enzyme-DNA complex. These include distortions of the homing site upon endonuclease binding and catalysis, distance sensing as well as site specificity of the enzymes, and product retention by some of the endonucleases.
In general, the distortions induced by the homing endonucleases are in the 40-90° range. Directed bending can promote catalysis in at least three different ways. First, a distortion can facilitate contact between the relatively small proteins and two separated regions of interaction on the DNA as a prerequisite to the transition state, as has been proposed for PI-SceI and I-TevII (27,39). Second, distortion of the substrate can position the scissile phosphates in the active site, as is the case for the restriction enzyme EcoRV (40), and as also suggested for PI-SceI (26). Third, a bend toward the major groove can widen the minor groove and facilitate catalysis by allowing access of minor-groove binding enzymes to the scissile phosphates, as has been proposed for I-TevI (30).
Two distortions have been observed for PI-SceI, as well as for I-TevI and I-TevII, although for I-TevI one of these distortions is subtle (26,27,30,39). Interestingly, while DNA is structurally intact in each of the PI-SceI bent complexes, one of the distortions induced by both I-TevI and I-TevII is associated with a nick in one strand. For I-TevI the nick-associated distortion is a directed bend close to the cleavage site. Nicking, which occurs in the absence of added Mg2+ for both I-TevI and I-TevII, implies a sequential cleavage mechanism, although cleavage of both strands appears to occur concomitantly in the presence of Mg2+. Preferential nicking of one strand is a property shared by some LAGLIDADG enzymes, as for example I-SceI (28), I-ChuI, I-CeuI and I-CpaII (41). Others exhibit concerted cleavage, as for example PI-SceI (27). Enzymes I-TevI, I-TevII, and I-CpaII may resemble EcoRV, for which the preference for sequential or concerted cleavage has been linked to divalent cation availability (30,41,42). It is unclear for the homing enzymes whether sequential versus concerted cleavage represents a mechanistic or simply a kinetic difference. The ways in which divalent cations and DNA distortions promote catalysis in these different enzyme systems must await further biochemical analysis and structure determination of the enzyme-substrate complexes.
Another feature of the GIY-YIG protein I-TevI that may be common to the LAGLIDADG enzymes is manifest once binding has been established. Genetic experiments indicate that I-TevI selects its cut site by both distance sensing and sequence discrimination (31 ,43 ). Based on a comparison of homing sites, a similar cleavage-site selection mechanism has been proposed for PI-SceI (26 ). Furthermore, both enzymes, like I-SceI, F-SceII and I-TevII, remain bound to one cleavage product (27 ,39 ,44 ,45 ). In contrast, both I-PorI and I-PpoI appear to be released after catalysis (cited in 27 ). Persistent binding of the endonuclease to one of the cleavage products can have genetic consequences in the ensuing recombination events (44 ).
It is noteworthy that I-TevI, I-TevII, F-TevI, and F-TevII, all GIY-YIG endonucleases, generate 2-nt 3′ extensions, whereas all LAGLIDADG enzymes characterized to date leave 4-nt 3′ extensions. A classification of restriction endonuclease structures on the basis of cleavage pattern (i.e., the nature and length of single-strand extensions) has been proposed (4,46,47). Given the foregoing, it will be of interest to see whether for the GIY-YIG, LAGLIDADG, and other families of nucleases, the position of scissile phosphates is also a correlative feature with the structure and catalytic properties of the homing enzymes.
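The cleavage patterns discussed above follow directly from the relative positions of the two scissile phosphates. The sketch below shows one way to derive the nature and length of the single-strand extension from the cut positions. The coordinate convention, the function name and the cut coordinates used for the homing-enzyme examples are assumptions made for illustration; the EcoRI and EcoRV patterns are standard textbook cases.

```python
# Convention (an assumption of this sketch): both cuts are given in top-strand
# coordinates. top_cut = number of top-strand nucleotides 5' of the top-strand cut;
# bottom_cut = number of top-strand positions whose partners lie to the left of
# the bottom-strand cut.
from typing import Tuple


def classify_overhang(top_cut: int, bottom_cut: int) -> Tuple[str, int]:
    """Return the kind of end ("3'", "5'" or "blunt") and the overhang length in nt."""
    if top_cut > bottom_cut:
        # The top strand extends past the bottom-strand cut: its 3' end protrudes.
        return "3'", top_cut - bottom_cut
    if top_cut < bottom_cut:
        # The bottom strand extends past the top-strand cut: 5' ends protrude.
        return "5'", bottom_cut - top_cut
    return "blunt", 0


# EcoRI (G^AATTC on both strands): top cut after 1 nt, bottom cut at position 5.
print(classify_overhang(1, 5))   # ("5'", 4)
# EcoRV (GAT^ATC): both strands cut at the same position.
print(classify_overhang(3, 3))   # ('blunt', 0)
# Hypothetical coordinates giving the 4-nt 3' extension typical of LAGLIDADG enzymes.
print(classify_overhang(13, 9))  # ("3'", 4)
# Hypothetical coordinates giving the 2-nt 3' extension typical of GIY-YIG enzymes.
print(classify_overhang(10, 8))  # ("3'", 2)
```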
UTILITY OF HOMING ENDONUCLEASES IN GENE MANIPULATION AND RECOMBINATION RESEARCH
Rare-cutting endonucleases allow one to introduce one or a few double-strand breaks into complex genomes. This capability makes the homing enzymes useful tools for analyzing and manipulating genomes for mapping, gene cloning and targeting, and for studying double-strand-break (DSB) repair in diverse biological systems.
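Why long recognition sites translate into rare cutting can be seen from a back-of-the-envelope estimate. The sketch below assumes a random sequence with equal base frequencies and exact matching in a single orientation; both are simplifying assumptions, and the genome sizes are round illustrative numbers rather than values from the cited studies.

```python
# Expected number of exact occurrences of an n-bp site in a random genome.
# Assumptions: equal base frequencies, independent positions, one orientation.


def expected_sites(genome_length: int, site_length: int) -> float:
    """Expected count of a specific site_length-bp sequence in genome_length bp."""
    windows = max(genome_length - site_length + 1, 0)
    return windows / 4 ** site_length


bacterial_genome = 4_600_000      # illustrative, roughly bacterial-sized
mammalian_genome = 3_000_000_000  # illustrative, roughly mammalian-sized

for n in (6, 8, 12, 18):
    print(n,
          f"{expected_sites(bacterial_genome, n):.3g}",
          f"{expected_sites(mammalian_genome, n):.3g}")

# A 6-bp site is expected more than a thousand times in the bacterial-sized genome,
# whereas an 18-bp site is expected less than once even in the mammalian-sized one.
print(expected_sites(bacterial_genome, 6) > 1000)  # True
print(expected_sites(mammalian_genome, 18) < 1)    # True
```

Real genomes are not random, and the sequence tolerance of homing endonucleases increases the number of cleavable sites relative to this exact-match estimate.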
Genome analysis and gene manipulation
Mapping. Genome mapping strategies have been based both on naturally available cleavage sites and on the introduction of cleavage sites as chromosomal landmarks. The homing endonucleases have been used in combination with rare-cutting restriction enzymes to map a variety of bacterial genomes, and have been particularly useful for analyzing chromosomal organization. I-CeuI, for example, cleaves only in the rRNA genes of many bacterial strains, which harbor multiple copies of the gene. Mapping I-CeuI fragments therefore allows one to probe genome configuration in the rDNA region. Such approaches have underscored chromosomal plasticity in several bacterial species (48 ,49 ).
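The logic of mapping with an enzyme that cuts only in the rRNA genes can be sketched as follows: on a circular chromosome, each pair of neighbouring cut sites defines one fragment, so a rearrangement between sites changes the fragment pattern. The genome length and cut positions below are invented for illustration, and the function name is hypothetical.

```python
# Fragment sizes produced by cutting a circular chromosome at given positions.
# All coordinates below are made-up illustrative values.
from typing import Iterable, List


def circular_fragments(genome_length: int, cut_positions: Iterable[int]) -> List[int]:
    """Return fragment lengths, in map order, for a cut circular chromosome."""
    cuts = sorted({p % genome_length for p in cut_positions})
    if not cuts:
        raise ValueError("at least one cut site is required")
    # Fragments between consecutive cut sites...
    fragments = [b - a for a, b in zip(cuts, cuts[1:])]
    # ...plus the fragment that wraps around the origin of the map.
    fragments.append(genome_length - cuts[-1] + cuts[0])
    return fragments


reference = circular_fragments(1000, [100, 400, 950])
print(reference)       # [300, 550, 150]
print(sum(reference))  # 1000

# A hypothetical strain in which one site lies at a different map position
# yields a different fragment pattern, signalling a rearrangement.
variant = circular_fragments(1000, [100, 600, 950])
print(sorted(variant) != sorted(reference))  # True
```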
Mapping has also been achieved by introduction of novel sites into genomes, with transposons engineered to contain cleavage sites. These approaches have been used for the study of genome organization, and for chromosome fragmentation for cloning large DNA fragments in bacteria and yeast. Available systems include mini-Tn10::I-SceI and Tn5::I-SceI cassettes for use in Gram-negative bacteria (50-52) and a retrotransposon-based Ty1::I-DmoI cartridge for use in yeast (53). The Ty1 system has been useful in analyzing both native yeast chromosomes, and mouse genes in yeast artificial chromosomes (YACs). Genetically engineered I-SceI sites in yeast have also been helpful for physical mapping of yeast contigs for yeast genome sequencing (54).
Cloning. Vectors have been developed for cloning large fragments generated by homing endonucleases. These include plasmid vectors with a multiple cloning site containing homing-endonuclease cleavage sites for PI-SceI, I-PpoI, I-CeuI, and PI-TliI (55). Cosmid vectors with an I-SceI site are also available (56).
Targeting. Double-strand breaks (DSBs) are recombinogenic, facilitating both homologous and non-homologous recombination events. It has been shown that homologous recombination can be stimulated 10- to 1000-fold in both pro- and eukaryotic systems by a DSB. This phenomenon provides a means for targeting integration events from a transformed or transfected sequence to the chromosome by introduction of a DSB in a homologous sequence on the genome. Such gene-targeting strategies have been facilitated by the expression of homing endonucleases in fungal, plant, and mammalian cells (reviewed in 57) or by electroporation of purified enzyme into cells (58). The existence or introduction of sites at defined positions within genomes therefore creates the ability to engineer targeted deletions or insertions into genomes in many different biological systems. Indeed, gene targeting has been achieved in embryonic stem cells expressing I-SceI, paving the way to create transgenic animals by homing endonuclease-directed gene targeting (reviewed in 57).
Homing endonucleases in studies of DSB repair
DSBs must be repaired in all organisms to maintain chromosomal integrity and viability. DSB repair also plays a role in DNA rearrangements such as intron mobility, in both prokaryotic and eukaryotic systems (reviewed in 1,9). Furthermore, mating-type switching in fungi (59), transposition in flies (60), and V(D)J recombination in mammalian cells (61) are all DSB-dependent events. The highly specific homing endonucleases provide the ability to direct discrete breaks into genomes, generating isolated foci for study of both homologous and non-homologous DSB-repair events.
Homology-dependent events. Homing endonucleases have been used to study homology-dependent DSB-repair pathways in phage T4 (62,63), yeast (2,64,65), plants (66,67), and mammalian cells (57,68). Not only have these studies shed light on the functional requirements of the repair events, but they have illuminated different recombination pathways. One emerging theme from these studies has been the tight coupling of DNA replication and DSB repair in gene conversion events in which foreign sequences are used to repair the breaks, as in group I intron homing. These studies have taken advantage of I-TevI in the phage system and the HO endonuclease, F-SceII, in yeast (62-64).
Another area in which we have gained new insight is in RNA-dependent group II intron homing, also called retrohoming, with I-SceV and I-SceVI. As part of RNP complexes, these remarkable intron-encoded proteins, which have reverse transcriptase and RNA maturase function in addition to endonuclease activity, are physically associated with the excised intron RNA. The RNA cleaves the sense strand of the DNA homing site by reverse splicing while the protein cleaves the antisense strand, to generate a primer for reverse transcription of the intron RNA (12,69). Alternatively, the intron RNA can insert itself directly into double-stranded DNA (70). In either event, repair occurs via a cDNA copy of the intron RNA.
Homology-independent events. Homing endonucleases have also been used to study homology-independent events in bacterial/phage (71), fungal (72,73), and mammalian systems (57,68). An interesting finding emerged from studies with the HO-endonuclease, F-SceII, to initiate DSB repair in S. cerevisiae in which homologous recombination had been inhibited (72,73). Under such conditions most of the DSBs were repaired by end-joining events similar to those found in mammalian cells, whereas ~1% of the events reflected capture of cDNAs corresponding to Ty1 retrotransposon mRNA. While such events have some features in common with group II retrohoming, they provide a possible mechanism for insertion of pseudogenes and short and long interspersed nuclear sequences (SINEs and LINEs) in eukaryotic genomes. Clearly, the repair of DSBs by foreign DNAs, including endogenous retroelements, is important in the evolution of genomes.
EVOLUTION OF HOMING ENDONUCLEASES
In considering the evolution of homing endonucleases one must address questions at both the structure-function level, and at the level of the persistence of these apparently discretionary elements in biological systems.
Evolution of endonuclease structure
It has been proposed that the double LAGLIDADG motif homing endonucleases evolved from the single-motif enzymes by a gene duplication event. In support of this argument are protein footprinting experiments of the two-motif archaeal endonucleases I-DmoI and I-PorI. The results suggest that these enzymes consist of two repeats with each containing one LAGLIDADG motif (74 ). Furthermore, single-motif enzymes like I-CreI act as dimers on pseudopalindromic substrates (23 ), whereas double-motif enzymes like PI-SceI bind as monomers on asymmetric substrates (27 ). The gene duplication hypothesis is lent further credence by the structure of these enzymes. The substrate-binding surface of I-CreI is created by the symmetric juxtaposition of the monomers about the LAGLIDADG motifs; in much the same way, the DNA-binding structure in the nuclease domain of PI-SceI has a pseudo 2-fold symmetry about its two LAGLIDADG motifs (23 ,24 ). It has been postulated that the derived two-motif duplicated monomers have evolved a relaxed requirement for symmetry, thereby allowing the enzymes to acquire an expanded substrate repertoire (26 ,74 ).
Whereas in the above scenario binding and catalytic regions would be interdigitated in each half of the double-motif LAGLIDADG enzymes, they are separated by a flexible tether in the monomeric GIY-YIG enzyme I-TevI (Fig. 3 ). Although the GIY-YIG motif that forms the catalytic domain of I-TevI is conserved in different GIY-YIG proteins, the DNA-binding region is variant. It has therefore been proposed that the GIY-YIG domain is a catalytic cartridge that can be combined with different DNA-binding proteins to evolve nucleases with altered specificities (32 ).
Endonuclease persistence in diverse organisms
Homing endonuclease genes have been considered highly invasive elements that gain access into genomes by virtue of the ability of their products to make DSBs and promote recombination. Their propagation is ensured when these parasitic elements find refuge in introns and inteins (reviewed in 8). While their sheer invasiveness would secure the persistence and dissemination of endonuclease genes in biological systems, there have also been examples of the selective advantage to organisms with intron-encoded endonucleases. These include the ability of the phage SP82 intron endonuclease to exclude genetic markers of related phage in mixed infections (38), and the selective advantage of an archaeal rDNA intron to Sulfolobus acidocaldarius (75).
While restriction enzymes may similarly be of advantage to bacteria through their ability to limit phage infection, restriction-modification systems have also been shown to behave in a selfish manner (76 ). Why then do the intron and intein endonucleases engage in self-propagating homing reactions, whereas restriction enzymes do not? One possibility is related to the rarity of homing endonuclease cut sites. On the one hand, the action of the frequently cutting restriction enzymes is precluded in vivo by their cognate modifying enzymes, except with foreign unmodified DNA, which is likely to be degraded by the enzyme and unable to perpetuate a homing event. On the other hand, the recognition site of the homing endonucleases is so large that there is likely to be only one site per genome; once cleaved and occupied by the endonuclease-encoding ORF, further cleavage at this site would be prevented, while homing to similar unoccupied sites would be ensured. A second possibility is related to the tendency of homing endonucleases to tenaciously bind their cleavage products via their lengthy recognition sequences and thereby influence subsequent recombination events (44 ). Regardless of why homing endonucleases promote DNA rearrangements, their ability to do so is an important factor in the evolution of genomes.
ACKNOWLEDGEMENTS
We are thankful to Barry Stoddard, Fred Gimble and Florante Quiocho for providing endonuclease structures before publication, and Shmuel Pietrokovski and Richard Waring for sharing unpublished data. We are also grateful to Mary Bryk, Elaine Davis, Vicky Derbyshire, Fred Gimble, Debbie Hinton, Claude Jacq, Claude Lemieux, Alan Lambowitz, Ray Monat, Fran Perler, Phil Perlman, Shmuel Pietrokovski, Alfred Pingoud, David Shub, Barry Stoddard, Jeremy Thorner, Monique Turmel and members of the Belfort Laboratory for their constructive suggestions on this review. Thanks also to Maureen Belisle, George Silva and Patrick VanRoey for help with the figures, and to Maryellen Carl for preparing the manuscript. Work in the authors' laboratories is supported by grants from the NIH, GM39422 and GM44844 to MB, and LM04971 to RR.
REFERENCES
1 Belfort, M. and Perlman, P.S. (1995) J. Biol. Chem. 270, 30237-30240.
6 Pingoud, A. and Jeltsch, A. (1997) Eur. J. Biochem. 246, 1-22.
7 Roberts, R.J. and Macelis, D. (1997) Nucleic Acids Res. 25, 248-262.
8 Mueller, J.E., Bryk, M., Loizos, N. and Belfort, M. (1993) In Linn, S.M., Lloyd, R.S. and Roberts, R.J., eds, Nucleases 2nd edn. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, pp. 111-143.
9 Lambowitz, A.M. and Belfort, M. (1993) Annu. Rev. Biochem. 62, 587-622.
10 Roberts, R.J. and Halford, S.E. (1993) In Linn, S.M., Lloyd, R.S. and Roberts, R.J., eds, Nucleases 2nd edn. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, pp. 35-88.
11 Shibata, T., Nakagawa, K. and Morishima, N. (1995) Adv. Biophys. 31, 77-91.
12 Zimmerly, S., Guo, H., Eskes, R., Yang, J., Perlman, P.S. and Lambowitz, A. (1995) Cell 83, 529-538.
13 Bickle, T.A. (1993) In Linn, S.M., Lloyd, R.S. and Roberts, R.J., eds, Nucleases 2nd edn. Cold Spring Harbor: Cold Spring Harbor Laboratory Press, pp. 89-109.
43 Bell-Pedersen, D., Quirk, S.M., Bryk, M. and Belfort, M. (1991) Proc. Natl. Acad. Sci. USA 88, 7719-7723.
44 Mueller, J.E., Smith, D. and Belfort, M. (1996) Genes Dev. 10, 2158-2166.
45 Jin, Y., Binkowski, G., Simon, L.D. and Norris, D. (1997) J. Biol. Chem. 272, 7352-7359.
46 Athanasiadis, A., Vlassi, M., Kotsifaki, D., Tucker, P.A., Wilson, K.S. and Kokkinidis, M. (1994) Struct. Biol. 1, 469-475.
47 Newman, M., Strzelecka, T., Dorner, L.F., Schildkraut, I. and Aggarwal, A.K. (1994) Nature 368, 660-664.
48 Liu, S.L. and Sanderson, K.E. (1996) Proc. Natl. Acad. Sci. USA 93, 10303-10308.
49 Toda, T. and Itaya, M. (1995) Microbiology 141, 1937-1945.
50 Bloch, C.A., Rode, C.K., Obreque, V.H. and Mahillon, J. (1996) Biochem. Biophys. Res. Commun. 223, 104-111.
51 Mahillon, J., Rode, C.K., Leonard, C. and Bloch, C.A. (1997) Gene 187, 273-279.
52 Jumas-Bilak, E., Maugard, C., Michaux-Charachon, S., Allardet-Servent, A., Perrin, A., O'Callaghan, D. and Ramuz, M. (1995) Microbiology 141, 2425-2432.
53 Dalgaard, J.Z., Banerjee, M. and Curcio, M.J. (1996) Genetics 143, 673-683.
54 Thierry, A., Gaillon, L., Galibert, F. and Dujon, B. (1995) Yeast 11, 121-135.
55 Asselbergs, F.A.M. and Rival, S. (1996) BioTechniques 20, 558-562.