ABSTRACT
The archaeal intron-encoded homing enzymes I-PorI and I-DmoI belong to a family of endonucleases that contain two copies of a characteristic LAGLIDADG motif. These endonucleases cleave their intron⁻ or intein⁻ alleles site-specifically, and thereby facilitate homing of the introns or inteins which encode them. The protein structure and the mechanism of DNA recognition of these homing enzymes are largely unknown. Therefore, we examined these properties of I-PorI and I-DmoI by protein footprinting. Both proteins were susceptible to proteolytic cleavage within regions that are equidistant from each of the two LAGLIDADG motifs. When complexed with their DNA substrates, a characteristic subset of the exposed sites, located in regions immediately after and 40-60 amino acids after each of the LAGLIDADG motifs, was protected. Our data suggest that the enzymes are structured into two tandemly repeated domains, each containing both the LAGLIDADG motif and two putative DNA binding regions. The latter contains a potentially novel DNA binding motif conserved in archaeal homing enzymes. The results are consistent with a model where the LAGLIDADG endonucleases bind to their non-palindromic substrates as monomeric enzymes, with each of the two domains recognizing one half of the DNA substrate.
Homing enzymes are site-specific DNases, encoded by introns or inteins. They specifically cleave intron⁻ or intein⁻ alleles of their genes and, thereby, facilitate homing of the introns or inteins that encode them (reviewed in refs 1,2). Most homing enzymes belong to one of four protein families named according to the presence of one or two, often highly degenerate, LAGLIDADG motifs, a GIY-YIG motif (2), a His-Cys motif (3) or an H-N-H motif (4,5). The LAGLIDADG-like motif was first recognized in yeast maturases (6), and later found in endonucleases encoded by group I introns, archaeal introns, inteins and by separate open reading frames in yeast (reviewed in ref. 2). Four archaeal introns have been found which encode putative LAGLIDADG-type proteins, one in Desulfurococcus mobilis 23S rDNA (7,8), two in Pyrobaculum organotrophum 23S rDNA (9) and one in Pyrobaculum aerophilum 16S rDNA (10).
The archaeal introns all generate 'bulge-helix-bulge' motifs at the exon-intron junctions of their RNA transcripts that are cut by an archaeal-specific endonuclease (8,11-14). The larger RNA introns (600-700 nucleotides), which encode the LAGLIDADG-type proteins, circularize and generate stable structures (8,9,14). The rDNA intron of D.mobilis encodes I-DmoI, which has been shown to cleave intron⁻ DNA in vitro (15). Furthermore, it has been demonstrated that the intron can move inter-cellularly, and home, in a Sulfolobus acidocaldarius culture, conferring a selective advantage on the intron⁺ cells (16). I-PorI from intron 1 of the rDNA of P.organotrophum is also a homing-type enzyme capable of cleaving intron⁻ DNA in vitro, whereas the LAGLIDADG-type protein encoded by intron 2 of the rDNA of the same organism exhibits no DNA cleavage activity in vitro (17). It remains to be investigated whether the LAGLIDADG-type protein encoded by the P.aerophilum 16S rDNA intron is an endonuclease. Three archaeal inteins which contain the LAGLIDADG-like motif have been described (18,19). At least one of these, PI-TliI, shows site-specific endonuclease activity (18).
LAGLIDADG-type homing enzymes cleave intron⁻/intein⁻ alleles near the site of intron/intein insertion, generating 3'-overhangs of 4 nucleotides with 5'-phosphates. They are highly specific for their cleavage sites, generally recognizing non-palindromic DNA sequences of ~20 bp (2). However, mutational analyses have shown that limited sequence redundancy is tolerated (17,20-24). The functional domain structures of these proteins are less well understood. Mutagenesis of the LAGLIDADG motifs has shown that these are involved in catalysis but not in DNA binding (25,26). No other conserved sequences have been recognized in these endonucleases, and it is still not known which parts of the proteins participate in the binding of the DNA substrate. Therefore, we have employed a proteolytic protein footprinting approach on I-PorI and I-DmoI to identify regions involved in DNA binding. This method has previously been used successfully to map amino acid sequences involved in protein-RNA and protein-protein interactions (27,28). The approach involves limited proteolysis of the endonucleases, specifically labeled at the N- or C-terminus, in the absence or presence of their DNA ligand. The data indicate that the homing enzymes consist of two structurally similar, and tandemly repeated, domains, each containing a LAGLIDADG motif and two potential DNA binding regions. Sequence alignment with other archaeal LAGLIDADG-type proteins revealed a conserved sequence, which may constitute a novel DNA binding motif.
In order to generate fusion protein expression vectors for the two LAGLIDADG-type proteins, the polymerase chain reaction (PCR) was performed on a plasmid containing intron 1 of P.organotrophum 23S rDNA (14), using oligodeoxynucleotide primers 5'-GAGGATCCATGGATATATTCCAGTATG and 5'-GAGAATTCCGAGGTCAAGATAATGGC, and on a plasmid containing the D.mobilis rDNA intron (8), using oligodeoxynucleotide primers 5'-GAGGATCCATGCATAATAATtoGAGAATG and 5'-GAGAATTCCCTCGGGGGGCAGGGGGTT. These primers generated PCR products with BamHI and EcoRI restriction sites at the upstream and downstream ends, respectively. The PCR products were cleaved with BamHI and EcoRI (intron 1 of P.organotrophum was cleaved partially with EcoRI to avoid an internal site), and the resulting DNA fragments of 522 bp (I-PorI) and 564 bp (I-DmoI) were purified from a 1.5% agarose gel, and ligated with BamHI/EcoRI-cleaved pET-HTG and pGEX-GTH vectors, described in Jensen et al. (29), yielding pET-HTG-PorI, pGEX-GTH-PorI, pET-HTG-DmoI and pGEX-GTH-DmoI.
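As a quick sanity check on the primer design described above, the following Python sketch locates the BamHI (GGATCC) and EcoRI (GAATTC) recognition sequences in the two I-PorI primers quoted in the text. The function name `find_sites` and the dictionary layout are illustrative assumptions and are not part of the original protocol.

```python
# Recognition sequences of the two restriction enzymes used for cloning.
SITES = {"BamHI": "GGATCC", "EcoRI": "GAATTC"}

# I-PorI primers exactly as quoted in the text (written 5'->3').
PRIMERS = {
    "PorI_upstream": "GAGGATCCATGGATATATTCCAGTATG",
    "PorI_downstream": "GAGAATTCCGAGGTCAAGATAATGGC",
}


def find_sites(seq):
    """Return {enzyme: 0-based start index} for every site present in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

Each primer carries a single site two nucleotides from its 5' end, which is consistent with the upstream BamHI and downstream EcoRI arrangement stated in the text.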
Cleavage assays were performed using pUC-P.isl and pUC-D.muc as substrates for I-PorI and I-DmoI, respectively. pUC-P.isl contains a 205 bp PCR fragment of P.islandicum 23S rDNA [positions 1955-2159, D.mobilis numbering (30)], and pUC-D.muc carries a 245 bp PCR fragment of D.mucosus 23S rDNA [positions 1915-2159, D.mobilis numbering (30)], both inserted into the HincII site of pUC19. The rDNA sequences of P.islandicum and D.mucosus do not contain introns, but exhibit the same rDNA sequences in the vicinity of the intron-insertion sites as P.organotrophum and D.mobilis, respectively. Synthetic 25 bp DNA substrates were used in the protein footprinting experiments. They were generated by mixing two complementary 25 nucleotide oligodeoxynucleotides at 20 µM in 10 mM Tris-HCl (pH 8.0), 100 mM KCl, heating at 95°C for 30 s, followed by incubation at 65°C for 5 min and slow cooling to room temperature. The synthetic oligodeoxynucleotides used were 5'-GCGAGCCCGTAAGGGTGTGTACGGG and 5'-GCCTTGCCGGGTAAGTTCCGGCGCG and their complementary sequences, for I-PorI and I-DmoI, respectively.
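The complementary strands used for annealing are not written out in the text. A minimal Python sketch for deriving them from the two quoted oligodeoxynucleotides is given here; the helper name `reverse_complement` and the dictionary `TOP_STRANDS` are illustrative assumptions.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Top-strand oligodeoxynucleotides as quoted in the text (5'->3').
TOP_STRANDS = {
    "I-PorI": "GCGAGCCCGTAAGGGTGTGTACGGG",
    "I-DmoI": "GCCTTGCCGGGTAAGTTCCGGCGCG",
}


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    for enzyme, top in TOP_STRANDS.items():
        print(enzyme, len(top), "nt; bottom strand 5'-" + reverse_complement(top))
```

Both quoted oligodeoxynucleotides are 25 nucleotides long, matching the 25 bp substrates described above.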
Overnight cultures of E.coli BL21/DE3 containing plasmids pET-HTG-PorI, pGEX-GTH-PorI, pET-HTG-DmoI or pGEX-GTH-DmoI were diluted 1:50 in 500 ml LB media containing ampicillin at 100 µg/ml, and incubated at 30°C to an A₆₀₀ value of ~0.8. Isopropyl-1-thio-β-D-galactopyranoside was added to 0.5 mM, followed by a 2 h incubation. Cells were harvested and resuspended in 25 ml of PBS buffer [140 mM NaCl, 2.7 mM KCl, 10 mM Na₂HPO₄, 1.8 mM KH₂PO₄ (pH 7.3)] containing 0.2 mM phenylmethylsulfonyl fluoride, 0.5 µg/ml leupeptin and 2 µg/ml aprotinin. After sonication, 2.5 ml 10% Triton X-100 was added, and cell debris was removed by centrifuging at 11 000 r.p.m. for 15 min. A volume of 250 µl glutathione Sepharose (Pharmacia) was added to the supernatant, and it was incubated at room temperature for 1 h with gentle shaking. Sepharose beads were collected by centrifugation, washed five times with 4 ml PBS buffer, resuspended in 3.2 ml PBS buffer, 0.8 ml 80% glycerol and frozen at -80°C in aliquots of 100 µl.
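The final glycerol concentration of the frozen bead suspension follows from the volumes given above. The short Python sketch below works it out; it assumes that the volume of the beads themselves is negligible, which the text does not state, and the function name `final_percent` is hypothetical.

```python
def final_percent(stock_volume_ml, stock_percent, diluent_volume_ml):
    """Concentration (%) after mixing a stock solution with a diluent.

    Assumes volumes are additive and ignores the volume of the beads.
    """
    total_volume_ml = stock_volume_ml + diluent_volume_ml
    return stock_volume_ml * stock_percent / total_volume_ml


if __name__ == "__main__":
    # 0.8 ml of 80% glycerol mixed with 3.2 ml PBS buffer.
    print(final_percent(0.8, 80.0, 3.2), "% glycerol")
```

Under that assumption the beads are stored in roughly 16% glycerol in a total volume of 4 ml.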
Ten microlitres of Sephadex G75 (Pharmacia, 75 mg/ml in H₂O) was added as carrier to 100 µl fusion protein bound to glutathione Sepharose, prepared as described above. The mixture was washed three times with 1 ml of HMK buffer [20 mM Tris-HCl (pH 7.5), 100 mM NaCl, 12 mM MgCl₂], and resuspended in 100 µl HMK buffer. Ten units bovine heart muscle kinase (Sigma) and 33 µCi [γ-³²P]ATP (ICN, 7000 Ci/mmol) were added, and the mixture was incubated at room temperature for 1 h with gentle shaking. Sepharose beads were washed five times with 1 ml of 10 mM Tris-HCl (pH 8.0), 50 mM KCl, resuspended in 80 µl elution buffer [20 mM glutathione, 100 mM Tris-HCl (pH 8.0), 120 mM NaCl], and incubated at room temperature for 30 min with gentle shaking. Beads were removed by centrifugation, 0.2 U thrombin (Sigma) was added to the supernatant and the mixture was incubated at 4°C for 2 h. Fusion protein from the pGEX-GTH-PorI vector needed 2.0 U thrombin for 64 h to remove the GST-tag. Bovine serum albumin (BSA) and Triton X-100 were added to final concentrations of 0.5 mg/ml and 0.1%, respectively. The mixture was applied to a 1 ml Sephadex G75 gel filtration column pre-equilibrated with, and run in, TKT buffer [10 mM Tris-HCl (pH 8.0), 100 mM KCl, 0.1% Triton X-100] and 20 µl fractions were collected. Fractions containing full length endonuclease were pooled, adjusted to 35% glycerol, and stored at -20°C.
I-PorI and I-DmoI were mixed with PvuII-cleaved pUC-P.isl and PvuII-cleaved pUC-D.muc, respectively, in 10 µl binding buffer [50 mM HEPES-KOH (pH 8.0), 100 mM KCl, 1 mM DTT, 2% glycerol, 0.05% Triton X-100]. MgCl₂ was added to 10 mM and reaction mixes were incubated at 80°C (I-PorI) or 65°C (I-DmoI) for 15 min. DNA was precipitated with ethanol, redissolved in glycerol loading buffer [10 mM Tris-HCl (pH 8.0), 0.1 mM EDTA, 5% glycerol, 0.05% bromophenol blue] and run on an agarose gel.
About 20 000 d.p.m. ³²P-labeled endonuclease (~1 pmol) was incubated in a siliconized micro-titer plate with 20 pmol of the 25 bp DNA substrate in 10 µl binding buffer containing 0.2 µg/µl BSA at 65°C for 5 min. Samples were transferred to a 50°C bath and incubated for 5 min, followed by addition of 10 µl of proteinase in binding buffer. After incubation at 50°C for 15 min, 4 µl of 100 mM Tris-HCl (pH 8.3), 4% sodium dodecylsulfate (SDS), 50 mM DTT, 0.2% bromophenol blue, 40% glycerol was added, and the samples were denatured at 95°C for 2 min, and run on a 20 × 40 × 0.04 cm 7% stacking/20% separation polyacrylamide/SDS-Tricine gel (31). Gels were subjected to autoradiography.
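Because only one terminus of the protein carries the ³²P label, each band on the autoradiogram corresponds to the fragment that runs from the labeled end to a proteinase cut. The Python sketch below illustrates how a fragment length could be converted into a cleavage position. The function name, the example protein length of 180 residues and the decision to ignore residues contributed by the tag are all assumptions made for illustration and are not taken from the text.

```python
def cleavage_position(fragment_length, protein_length, labeled_end):
    """Return the residue (1-based) after which the proteinase cut.

    fragment_length: number of residues in the labeled fragment.
    protein_length:  number of residues in the full-length protein.
    labeled_end:     "N" or "C", the terminus carrying the label.
    """
    if not 0 < fragment_length < protein_length:
        raise ValueError("fragment must be shorter than the full-length protein")
    if labeled_end == "N":
        # The fragment spans residues 1..fragment_length.
        return fragment_length
    if labeled_end == "C":
        # The fragment spans the last fragment_length residues.
        return protein_length - fragment_length
    raise ValueError("labeled_end must be 'N' or 'C'")


if __name__ == "__main__":
    # Hypothetical 180-residue protein and a 60-residue labeled fragment.
    print(cleavage_position(60, 180, "N"))
    print(cleavage_position(60, 180, "C"))
```

Labeling each terminus in turn gives two independent readings of the same cut, which is one reason both N- and C-terminally labeled proteins are useful.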
Protein domains involved in DNA binding are likely to become less susceptible to proteinase digestion upon DNA binding. In order to identify the DNA binding domains of I-PorI and I-DmoI, N- and C-terminal ³²P-labeled enzymes were incubated at 65°C, in the absence of Mg²⁺, with 25 bp DNA substrates containing the respective cleavage sites. Co-precipitation experiments, using biotinylated DNA substrate on streptavidin coated magnetic beads, showed that under these conditions the endonucleases bind to, but do not cleave, their substrates (data not shown). The temperature was reduced to 50°C and the accessibility of the protein polypeptide chain was assessed by probing with 10 proteinases of different specificities (Table 1). As a negative control, for each experiment, the substrate was replaced by a similar sized unrelated DNA fragment. Controls with no DNA gave proteinase cleavages identical to the controls containing unrelated DNA, showing that only the cognate DNA bound to the protein (data not shown). Protein fragments were separated on SDS/polyacrylamide gels, which were subjected to autoradiography (Figs 3 and 4). The bands were assigned to specific amino acids on the basis of band patterns generated with the site-specific proteinases. As a test for correct assignment, the mobilities of the protein fragments were plotted against the logarithm of their molecular weights, and they generally produced a straight line for masses above 5 kDa (Fig. 5), in good agreement with earlier reports (27,31).
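The consistency test described above amounts to a linear fit of gel mobility against the logarithm of fragment mass. A minimal Python sketch of such a fit, using ordinary least squares, is given below. The calibration values in the example are invented for illustration only and do not come from the paper, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def estimate_mass_kda(mobility, slope, intercept):
    """Invert mobility = slope * log10(mass) + intercept."""
    return 10 ** ((mobility - intercept) / slope)


if __name__ == "__main__":
    # Hypothetical calibration bands: mass in kDa and relative mobility.
    masses_kda = [5.0, 8.0, 12.0, 20.0]
    mobilities = [0.82, 0.70, 0.61, 0.48]
    slope, intercept = fit_line([math.log10(m) for m in masses_kda], mobilities)
    print("slope", round(slope, 3), "intercept", round(intercept, 3))
    print("mass at mobility 0.65:", round(estimate_mass_kda(0.65, slope, intercept), 1), "kDa")
```

Bands whose assigned masses fall far from the fitted line would be candidates for re-examination, in the spirit of the check described in the text.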
The proteinase footprinting approach used in this study has proven a simple and effective method for determining ligand binding sites on proteins. Previously it was applied successfully to studying the protein structure of the HIV-1 Rev protein and its interactions with the RNA substrate, cognate monoclonal antibodies and cellular proteins (27,28). Here we have used this approach to establish the DNA binding regions of two intron-encoded homing enzymes, I-PorI and I-DmoI, from hyperthermophilic archaea.
I-PorI and I-DmoI were probed with 10 different proteinases, either alone or bound to their DNA substrates. The proteinase cleavages clustered, most clearly for I-PorI, within proteinase sensitive regions ~0-20 and 40-60 amino acids after each of the LAGLIDADG motifs, interrupted by a relatively proteinase resistant region of ~20 amino acids (Fig. 6). In the presence of the DNA substrate, a similar protection pattern occurred relative to the position of the two LAGLIDADG motifs in both proteins. For I-PorI, protections were observed at a distance of 4-10 and 41-45 amino acids C-terminal to both of the LAGLIDADG motifs (Figs 6A and 7A). In I-DmoI, protections were observed six amino acids after the first LAGLIDADG motif and 39-55 amino acids after the second motif. The protected regions are likely to be involved directly in DNA binding, although some of the protections may be caused by conformational changes induced by the DNA. In addition, enhancements were observed 74 amino acids after the first, and 36 and 51 amino acids after the second LAGLIDADG motif (Figs 6B and 7A).
Almost all proteins belonging to the LAGLIDADG family contain two copies of the LAGLIDADG motif, and it has previously been noted that the distance between the two motifs is similar to the distance between the second motif and the C-terminus of the protein (32). An alignment of the regions C-terminal to the LAGLIDADG motifs revealed some sequence similarities, coinciding with the regions affected by DNA binding (Fig. 7A). The significance of this conservation is strengthened by sequence comparisons with other known archaeal LAGLIDADG-type proteins (Fig. 7A). Each protein contains a motif with the consensus sequence K/R,K/R-(3 aa)-Y/F-(6-7 aa)-K/R,E/D,K/R (based on at least 70 and 50% identity at the bold face and normal face positions, respectively) located 40-68 amino acids after the first LAGLIDADG motif, and repeated 37-60 amino acids after the second motif. The only exception, for the latter, is the protein encoded by intron 2 of P.organotrophum 23S rDNA (Por-I2, Fig. 7A), which exhibits no DNA endonuclease activity either in total cell extracts or when expressed in vitro (17). Furthermore, the intein-encoded LAGLIDADG-type proteins, which have extended C-terminal sequences as compared with the intron-encoded species, contain two to three copies of the putative DNA binding motif in their C-terminus (Fig. 7). A sequence search within the eukaryotic LAGLIDADG-type homing enzymes revealed no such motif in the corresponding region. Based on these results, we propose a putative DNA binding motif, K/R,K/R-(3 aa)-Y/F-(6-7 aa)-K/R,E/D,K/R, which, together with the LAGLIDADG motif, defines the borders of a repetitive domain in archaeal homing enzymes (Fig. 7B).
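The proposed consensus can be written as a regular expression for scanning protein sequences. The Python sketch below is one possible strict encoding. Because the consensus is defined by partial identity (at least 70 or 50% per position), a strict pattern of this kind is only an approximation and will miss degenerate copies. The test sequence in the example is invented for illustration and is not taken from any of the proteins discussed.

```python
import re

# Strict encoding of K/R,K/R-(3 aa)-Y/F-(6-7 aa)-K/R,E/D,K/R.
# The lookahead lets overlapping candidates be reported as well.
MOTIF = re.compile(r"(?=([KR][KR].{3}[YF].{6,7}[KR][ED][KR]))")


def find_motifs(protein_seq):
    """Return a list of (0-based start, matched text) for each motif copy."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(protein_seq)]


if __name__ == "__main__":
    # Hypothetical sequence containing one copy with a 6-residue spacer.
    example = "MAKRAAAYGGGGGGKEKLL"
    print(find_motifs(example))
```

A position-weighted scoring scheme would be a closer match to an identity-based consensus than a strict pattern, but the regular expression is enough to show the spacing constraints.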
The cleavage sites of >10 different LAGLIDADG homing enzymes have been characterized, and the general picture to emerge is that the recognition sequence spans ~20 bp of DNA, surrounding the centrally located 4 bp 3'-staggered cleavage site (17,20,23,24,33,34). Mutational studies have shown that the sequence recognition is flexible and that both sides of the cleavage sites are important (17,20-24). All the enzymes, except I-CreI, contain non-palindromic recognition sequences with respect to their cleavage sites (ref. 2 and references therein). It is therefore conceivable that the homing enzymes have evolved to recognize the non-palindromic DNA substrate as monomeric enzymes, allowing each of the two repetitive domains to interact with one of the two halves, symmetrically positioned around the DNA cleavage site. A similar mode of binding to non-palindromic DNA substrates has earlier been demonstrated for the specificity subunit of type I restriction enzymes (35-37). It is possible that I-CreI, which is exceptionally short compared with other eukaryotic homing enzymes and contains only one recognizable LAGLIDADG motif (38), is a one-domain protein that binds as a homodimer to its palindromic substrate, as is the case for type II restriction enzymes. During the evolution of the homing enzymes, recombination of two different LAGLIDADG protein genes, or duplication of a single gene, could have given rise to monomeric enzymes. As a consequence, the requirement for symmetry at the intron homing site could be relieved, providing a biological advantage by expanding the possible homing site repertoire.
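The notion of a palindromic substrate can be made concrete: a double-stranded site is palindromic when one strand equals its own reverse complement. The Python sketch below applies this test to the two 25-nucleotide substrate strands quoted in the methods. Treating the whole 25-mer as the comparison window is an assumption, since the recognition sequence is described as ~20 bp around the cleavage site, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def symmetry_fraction(seq):
    """Fraction of positions at which seq matches its reverse complement."""
    partner = reverse_complement(seq)
    return sum(a == b for a, b in zip(seq, partner)) / len(seq)


def is_palindromic(seq):
    """True when the duplex reads identically on both strands."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    substrates = {
        "I-PorI": "GCGAGCCCGTAAGGGTGTGTACGGG",
        "I-DmoI": "GCCTTGCCGGGTAAGTTCCGGCGCG",
        "EcoRI site": "GAATTC",  # a type II restriction site, for contrast
    }
    for name, seq in substrates.items():
        print(name, is_palindromic(seq), round(symmetry_fraction(seq), 2))
```

The EcoRI site serves as a reference case of the perfectly symmetric sites typical of type II restriction enzymes, in contrast to the two homing enzyme substrates.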
We thank Torben H. Jensen and Thomas Ø. Tange for technical advice on protein footprinting experiments. The work was supported in part by grants from Novo Nordic Fund and Danish Natural Science
Research Council. J.L. was supported by Copenhagen University.
REFERENCES