| Nucleic Acids Research | Article |
Recombination and chimeragenesis by in vitro heteroduplex formation and in vivo repair
Received January 28, 1999; Revised May 30, 1999; Accepted June 14, 1999
ABSTRACT
We describe a simple method for creating libraries of chimeric DNA sequences derived from homologous parental sequences. A heteroduplex formed in vitro is used to transform bacterial cells where repair of regions of non-identity in the heteroduplex creates a library of new, recombined sequences composed of elements from each parent. Heteroduplex recombination provides a convenient addition to existing DNA recombination methods (`DNA shuffling') and should be particularly useful for recombining large genes or entire operons. This method can be used to create libraries of chimeric polynucleotides and proteins for directed evolution to improve their properties or to study structure-function relationships. We also describe a simple test system for evaluating the performance of DNA recombination methods in which recombination of genes encoding truncated green fluorescent protein (GFP) reconstructs the full-length gene and restores its characteristic fluorescence. Comprising seven truncated GFP constructs, this system can be used to evaluate the efficiency of recombination between mismatches separated by as few as 24 bp and as many as 423 bp. The optimized heteroduplex recombination protocol is quite efficient, generating nearly 30% fluorescent colonies for recombination between two genes containing stop codons 423 bp apart (compared to a theoretical limit of 50%).
INTRODUCTION
The directed or in vitro evolution of polynucleotides and polypeptides generally involves repetitive cycles of mutagenesis and screening or selection. Recombination can accelerate the improvement of a target function by accumulating beneficial mutations and by eliminating deleterious ones in the evolving population (1). Furthermore, screening or selection of chimeric libraries made by recombining homologous genes may give rise to improved, or even completely novel, functions (2,3).
Several methods by which a set of DNA sequences can be recombined to create these chimeric sequence libraries have been reported. The PCR-based in vitro recombination methods (`DNA shuffling'), DNA fragmentation and reassembly (1,4), random priming recombination (5) and the staggered extension process (StEP) (6,7) all require synthesis of significant amounts of DNA during the assembly/recombination step and subsequent amplification of the final products. Although PCR in theory can be used to amplify very long sequences, in practice the efficiency of amplification decreases significantly for very long sequences. Another drawback of PCR amplification, the introduction of unwanted mutations, is particularly problematic for long sequences.
Yeast cells possess a very active system for recombination of linear, partially overlapping double-stranded DNA fragments. Cells transformed with a vector and partially overlapping inserts can efficiently join them together in the regions of homology and restore a functional, covalently closed plasmid (8,9). Depending on the ability of the cells to take up large pieces of DNA, this in vivo recombination approach is free from the size limitation associated with the PCR-based approaches. However, the number of crossovers introduced in one recombination experiment is rather low. The low efficiency of transformation with multiple inserts also means that recombination will likely be largely pairwise. Two other pairwise in vivo recombination methods utilize two parental genes cloned on the same plasmid in tandem. In the first method, homologous recombination in bacterial cells produces chimeric genes in which the first gene provides the N-terminal part of the target protein and the second the C-terminal part (10). In the second method the plasmid is linearized by endonuclease digestion at a position between the parental sequences before transformation into Escherichia coli (11). Recombination is performed in vivo by the enzymes responsible for double-strand break repair (12). Both methods generate only one crossover.
Here we describe a convenient hybrid in vitro-in vivo DNA recombination method which generates multiple crossovers and which neither suffers the limitations of PCR-based approaches nor requires transformation with multiple gene fragments. This method, which we call in vitro heteroduplex formation and in vivo repair, or heteroduplex recombination for short, is based on the ability of host cells to repair mismatched heteroduplexes. Heteroduplexes of the parental sequences are prepared and transformed into an appropriate host. To the extent that each mismatch or region of non-identity between the parents is repaired independently, libraries of chimeric genes are generated (Fig. 1). Heteroduplex constructs have been widely used in studying mechanisms of DNA mismatch repair (13-15). However, we found that preparing the heteroduplexes from whole plasmids using standard protocols is relatively ineffective for recombination of some DNA sequences. We therefore identified improvements to preparation of the heteroduplexes in vitro that help to achieve significantly higher recombination efficiency.
Figure 1. To the extent that regions of non-identity are repaired independently, in vivo repair of a heteroduplex gives rise to a library of parent and recombined (bold) sequences.
We also describe a simple and convenient fluorescence-based test system suitable for optimizing recombination conditions or for comparing different recombination methods with respect to crossover frequency and recombination efficiency. This test system uses recombination between truncated variants of Aequorea victoria green fluorescent protein (GFP) to generate full-length GFP and restore fluorescence. The efficiency of recombination between any two parental sequences can be determined from the fraction of bacterial clones that are fluorescent.
It should be noted that the recombination process we refer to here does not involve physical exchange between two double-stranded DNA molecules, but is rather a `functional recombination', i.e. generating the same kind of chimeric products. The mechanism for this functional recombination can be mismatch repair of two or more separated mutations, which gives rise to a set of sequences containing different combinations of those mutations. Not all mismatch repair events give rise to functional recombination, however, as some repair events lead to formation of parent sequences (Fig. 1).
MATERIALS AND METHODS
Preparation of recombination templates
Truncated GFP DNA sequences for recombination experiments were created by site-directed mutagenesis of plasmid pGFP (GenBank accession no. U17997; Clontech, CA). Mutations were introduced using the Chameleon site-directed mutagenesis kit (Stratagene, CA). Table 1 lists mutagenic primers, mutation positions and restriction endonucleases associated with each mutant.
Table 1. GFP recombination templates and mutagenic primers used to construct them
| GFP recombination template (position of stop codon) | Mutagenic primer sequence | Restriction endonuclease site |
| 205 | ctttctcttatggtgttTAAtgctagctcaagatacccagatc | NheI |
| 313 | caaagatgacgggTAAtagatctacacgtgctgaagtc | BglII |
| 421 | gaaacattcttggacacaaaTAAtatgcataactataactcacacaatg | NsiI |
| 529 | gaagatggaagcgttTAAtggatccgaccattatcaacaaaatactc | BamHI |
| 604 | gacaaccattacctgTAAtggtacccctgccctttcg | KpnI |
| 637 | cgaaagatcccaacTAAttctagagaccacatggtcc | XbaI |
Heteroduplex formation directly from parent plasmids
Approximately 2-4 µg of each parent plasmid was used in each recombination experiment. One parent plasmid was digested with PstI endonuclease, the other with EcoRI. Linearized plasmids were mixed together and purified using the Qiagen PCR purification kit. An aliquot of 20× SSPE buffer was added to the eluted plasmids (50-100 µl) to a final concentration of 1× SSPE (180 mM NaCl, 1 mM EDTA, 10 mM NaH2PO4, pH 7.4). The reaction mixture was heated at 96°C for 4 min, immediately placed on ice for 4 min and incubated for 2 h at 68°C. Products were kept on ice until transformation.
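The buffer adjustment in this step is a simple dilution calculation. The sketch below is only an illustration: the function name `stock_volume_to_add` is hypothetical, and it assumes the eluate itself contributes no SSPE components (the salts of the elution buffer are ignored).

```python
def stock_volume_to_add(sample_ul, stock_x=20.0, final_x=1.0):
    """Volume of concentrated stock (ul) to add to a sample to reach final_x.

    Solves stock_x * v = final_x * (sample_ul + v) for v.
    Assumption: the sample contains none of the buffer components.
    """
    if stock_x <= final_x:
        raise ValueError("stock must be more concentrated than the target")
    return sample_ul * final_x / (stock_x - final_x)


# Eluate volumes quoted in the protocol: 50-100 ul
for eluate_ul in (50.0, 100.0):
    volume = stock_volume_to_add(eluate_ul)
    print(f"{eluate_ul:.0f} ul eluate: add {volume:.2f} ul of 20x SSPE")
```

Under these assumptions, roughly 2.6-5.3 µl of 20× SSPE brings 50-100 µl of eluate to 1× SSPE.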
Insert heteroduplex formation and ligation into vector
GFP recombination templates were amplified in separate PCR reactions with primers directed to the pGFP vector (in the proximity of the protein coding sequence). The forward primer was GFP13 (5′-CCGACTGGAAAGCGGGCAGTG-3′), the reverse primer GFP14 (5′-CCGCATAGTTAAGCCAGCCCCG-3′). PCR reaction conditions: 1× PCR (10 mM Tris-HCl, 1.5 mM MgCl2, 50 mM KCl, pH 8.3), 0.2 mM each dNTP, 30-50 pmol of each primer, 1-10 ng of template and 5 U of Taq polymerase in a final volume of 50 µl. Amplification was performed for 30 cycles of 30 s at 94°C, 30 s at 56°C and 1 min at 72°C. PCR products were mixed together and purified using a Qiagen PCR purification kit. Purified products were mixed with 20× SSPE buffer and used to form heteroduplexes, according to the procedure described above. Annealed products were either precipitated with ethanol or purified on Qiagen PCR purification columns and digested with PstI and EcoRI. Digested products were ligated into the gel-purified PstI-EcoRI fragment of pGFP.
Transformation and determination of recombination efficiency
The products of heteroduplex formation were transformed into E.coli strain XL10 by a modified chemical transformation method (SuperComp protocol; Bio101 Inc., CA). Bacteria were plated on LB agar with 100 µg/ml ampicillin and grown overnight at 37°C. Agar plates containing grown colonies were further incubated at room temperature or at 4°C until fluorescence developed. Plates were illuminated with UV light (366 nm) for counting fluorescent and non-fluorescent colonies. The relationship between fraction of colonies that are fluorescent and recombination efficiency depends on the number of stop codons in the templates and whether or not parent homoduplexes are removed. If homoduplexes are removed and two single stop codon templates are recombined (Figs 1 and 3a) then as many as 50% of the colonies could be fluorescent (assuming no mutation and 100% recombination efficiency).
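The relationship between the fluorescent colony fraction and recombination efficiency can be written down for the simplest case of two single stop codon templates. The sketch below is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, the two reciprocal recombinants (wild-type and double stop) are assumed to arise equally often, and homoduplexes, when present, are assumed to transform as efficiently as heteroduplexes.

```python
def fluorescent_fraction(recombination_efficiency, homoduplex_fraction=0.0):
    """Expected fraction of fluorescent colonies for two single-stop templates.

    recombination_efficiency: fraction of heteroduplex-derived colonies that
        carry a recombined sequence (0.0 to 1.0).
    homoduplex_fraction: fraction of transforming molecules that are parental
        homoduplexes (assumed to give only non-fluorescent parent colonies).
    Assumption: half of the recombinants are wild-type, half carry both stops.
    """
    return (1.0 - homoduplex_fraction) * recombination_efficiency / 2.0


def recombinant_fraction(fluorescent):
    """Recombinant fraction implied by a fluorescent fraction (same assumption)."""
    return 2.0 * fluorescent


print(fluorescent_fraction(1.0))        # upper limit with homoduplexes removed
print(fluorescent_fraction(1.0, 0.5))   # same, with 50% parental homoduplexes
```

With homoduplexes removed and every transformant recombinant, the function returns the 50% limit stated above; a 50% homoduplex background halves it.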
RESULTS
Recombination test system
To develop a convenient test system for recombination, seven truncated GFP recombination templates were prepared (Table 1). Templates 205, 313, 421, 529, 604 and 637 contain a single stop codon at the designated position, while 421-637 contains two stop codons ~200 bp apart. The stop codons interrupt translation and result in the synthesis of truncated products that are not fluorescent. The TAA stop codon was followed by an additional T introduced to shift the reading frame and prevent occasional read-through events that might lead to synthesis of full-length GFP and interfere with identification of true recombination events. Immediately after the TAAT sequence, each template also has a unique restriction endonuclease recognition site (Table 1) introduced for identification of the GFP variants during mutagenesis.
The single stop codon templates can be recombined pairwise with mismatches separated by distances ranging from 24 to 423 bp (Fig. 2). For example, if template 637 is recombined with each of the others, restoration of fluorescence requires a single recombination between the following mismatch distances: 24, 99, 207, 315 and 423 bp. When double mutant 421-637 is recombined with single stop codon template 529, two recombination events are required to restore the wild-type sequence, each one to occur within 99 bp (Fig. 2).
Figure 2. Recombination between two truncated GFP variants can lead to the restoration of full-length protein. Stop codons are indicated by small rectangles.
Figure 3. (a) Heteroduplex recombination using heteroduplexes prepared directly from parent plasmids. Circular parental plasmids are linearized on either side of the target gene using restriction endonucleases with unique cutting sites. Linear plasmids are mixed, denatured and annealed. Possible products of this reaction are parent homoduplexes (linear) and recombinant heteroduplexes (circular). Only circular plasmids efficiently transform bacterial cells. (b) Heteroduplex recombination using heteroduplexes formed by insert hybridization and ligation. Target genes are amplified in a PCR reaction, mixed and annealed together. After digestion with appropriate restriction endonucleases, the annealing products are ligated into a vector. Asymmetric synthesis should be used to suppress parent homoduplexes.
Heteroduplex plasmid recombination
The in vivo repair of mismatches present in a heteroduplex plasmid can generate new, chimeric sequences provided the mismatched regions are repaired independently and repair is either not complete or not directed entirely to one strand (Fig. 1). Fully random repair of a heteroduplex with two mismatch sites should generate 25% full-length GFP and 75% truncated sequences. If heteroduplexes are formed directly from the parent plasmids, as illustrated in Figure 3a, the fluorescent colony fraction could reach as high as 50% (100% of the transformants are recombinants), since the linear, parent sequences will not efficiently transform. Initial attempts to generate chimeric sequences by this approach, however, showed very low recombination efficiency. Only a few percent of the bacterial colonies carried active GFP, even for recombination of sequences in which two stop codons were separated by hundreds of base pairs (see below).
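The 25% figure for fully random repair can be checked by enumerating every repair outcome. The following is a minimal sketch, assuming each mismatch site is repaired independently and to either strand with equal probability; the function names and the `"stop"`/`"wt"` labels are hypothetical conveniences, not part of the original work.

```python
from fractions import Fraction
from itertools import product


def repair_outcomes(strand_a, strand_b):
    """Map each possible repaired sequence to its probability.

    Each site is copied from strand_a or strand_b with probability 1/2,
    independently of the other sites (idealized random repair).
    """
    sites = len(strand_a)
    outcomes = {}
    for choice in product((0, 1), repeat=sites):
        seq = tuple((strand_a, strand_b)[c][i] for i, c in enumerate(choice))
        outcomes[seq] = outcomes.get(seq, Fraction(0)) + Fraction(1, 2 ** sites)
    return outcomes


def full_length_fraction(strand_a, strand_b):
    """Probability that the repaired sequence carries no stop codon."""
    outcomes = repair_outcomes(strand_a, strand_b)
    return sum((p for seq, p in outcomes.items() if "stop" not in seq), Fraction(0))


# Two single-stop parents, e.g. templates 205 and 637
print(full_length_fraction(("stop", "wt"), ("wt", "stop")))                 # 1/4
# Single-stop parent 529 against double-stop parent 421-637
print(full_length_fraction(("wt", "stop", "wt"), ("stop", "wt", "stop")))   # 1/8
```

The same enumeration with three mismatch sites gives 1/8, the 12.5% limit that applies to recombination of a single mutant with a double mutant.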
One factor we felt might be responsible for the low recombination efficiency is the existence of two nicks in the heteroduplex plasmids (Fig. 3a). To test this hypothesis the heteroduplex plasmids were treated with DNA ligase to close single-strand breaks. Recombination efficiency was measured after transformation with the ligated heteroduplex and an unligated sample for comparison. As shown in Table 2, heteroduplex recombination using the ligated whole plasmid heteroduplexes shows improved recombination efficiency (up to ~7-fold for mutations separated by 99 bp). The maximum fraction of fluorescent colonies was never >10%, however, and very little recombination was observed for the most closely spaced mutations. The hypothesis that single-strand breaks reduce recombination efficiency was also supported by our observation that dUTP incorporation into target genes decreased recombination efficiency (data not shown). Uracil N-glycosylase of the host cells recognizes and initiates repair of incorporated dUMP nucleotides, which results in the introduction of single-strand breaks (16).
Table 2. Effect of DNA ligation of plasmid heteroduplexes on efficiency of recombination
| Recombination templates | Distance between mutations (bp) | Heteroduplex plasmid with single-strand breaks | Heteroduplex plasmid treated with DNA ligase |
| 205 and 637 | 423 | 3% | 10% |
| 313 and 637 | 315 | 3% | 9% |
| 421 and 637 | 207 | 1.3% | 7% |
| 529 and 637 | 99 | 1.1% | 8% |
| 604 and 637 | 24 | 0 | <1% |
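The fold improvement on ligation can be recomputed directly from the Table 2 percentages. This is a small illustrative calculation only; the variable and function names are hypothetical, and the 24 bp row is left out because its unligated value is 0 and its ligated value is reported only as <1%.

```python
# (templates, distance in bp, % fluorescent with nicks, % fluorescent after ligation)
table2 = [
    ("205 and 637", 423, 3.0, 10.0),
    ("313 and 637", 315, 3.0, 9.0),
    ("421 and 637", 207, 1.3, 7.0),
    ("529 and 637", 99, 1.1, 8.0),
]


def fold_improvement(nicked_pct, ligated_pct):
    """Ratio of fluorescent fractions, ligated over unligated."""
    return ligated_pct / nicked_pct


for name, distance, nicked, ligated in table2:
    print(f"{name} ({distance} bp): {fold_improvement(nicked, ligated):.1f}-fold")

# Row with the largest improvement
best = max(table2, key=lambda row: fold_improvement(row[2], row[3]))
print("largest improvement at", best[1], "bp")
```

The largest ratio is found for the 99 bp pair, in line with the ~7-fold improvement quoted in the text.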
Effect of plasmid size on heteroduplex formation
Plasmid heteroduplex formation results in a mixture of products, including linearized and circular heteroduplexes. Only circular plasmids efficiently transform bacterial cells. Plasmid size was found to have a large effect on the yield of transformation-competent plasmid. We compared the efficiency of circularization for two plasmids, pGFP (3.3 kb) and Bacillus shuttle vector pCT1 (K.Miyazaki, unpublished results) (~9 kb). (Because only one parental plasmid was used, actual heteroduplexes were not formed.) Two aliquots of the plasmid were digested with different enzymes to produce linearized vectors. Plasmid pGFP was digested with HindIII and SpeI endonucleases, while pCT1 was digested with BamHI and Bsu36I. The digested plasmids were mixed, purified and annealed as described in Materials and Methods. Under these conditions the 3.3 kb pGFP reformed ~30-40% circular plasmid, while the 9 kb vector yielded <10% circular form (data not shown). Even the relatively small pGFP plasmid forms a significant amount of linearized product, forcing the researcher to start the experiment with large amounts of DNA in order to obtain large numbers of transformants.
Insert heteroduplex recombination
Both problems (the single-strand nicks and the poor efficiency of circular heteroduplex formation) can be solved by forming a heteroduplex of only the target sequences and ligating that into the vector, as illustrated in Figure 3b. The ligation automatically takes care of all nicks, and annealing depends only on the size of the target sequences and their level of identity and is no longer affected by the vector.
Using this approach, different templates were recombined pairwise to determine recombination efficiency for different distances between repaired regions. Parental genes were amplified by PCR with primers GFP13 and GFP14, mixed, annealed and ligated back into the pGFP vector. Table 3 shows the results of three separate recombination trials using the different templates. As expected, recombination becomes less efficient when the distance between mutations decreases. The heteroduplex method can nonetheless recombine mutations separated by only 24 bp, albeit at low efficiency. Recombination between a single (529) and a double mutant (421-637) also takes place, generating ~2% fluorescent clones (versus 12.5% theoretical). Wild-type GFP can only be restored in this case in the event of double recombination with each event occurring within 99 bp.
Table 3. Insert heteroduplex recombination (Fig. 3b): relationship between mutation distance and recombination efficiency
| Recombination templates | Distance between mutations (bp) | Insert heteroduplex (three trials) | Insert heteroduplex, asymmetric strand synthesis |
| 205 and 637 | 423 | 15-18% | 29% |
| 313 and 637 | 315 | 12-14% | 25% |
| 421 and 637 | 207 | 8-10% | 18% |
| 529 and 637 | 99 | 10-11% | 16% |
| 604 and 637 | 24 | 0.7-1.0% | 1.2% |
| 421-637 and 529 | 99 + 99 | 1.4-2.1% | 2.8% |
Elimination of parental homoduplexes from heteroduplex preparations
Conventional whole plasmid heteroduplex formation offers one important advantage over the insert heteroduplex protocol: it eliminates the parental (homoduplex) plasmids from the resulting library. Complementary strands originating from a single parent and annealed together form linear molecules (Fig. 3a) which are normally extremely inefficient in transformation. Only heteroduplexes, having strands from different parents, can form a circular plasmid. In the insert heteroduplex protocol (Fig. 3b), parental sequences are formed during insert annealing with 50% probability and create a significant background of non-recombinant clones.
Several approaches can be used to eliminate the parental homoduplexes. Most of them require selective removal or degradation of one strand in each parent. We used an equally effective, but technically less complex and more reliable, approach. Two asymmetric PCR reactions were performed, each containing only one recombination template and one PCR primer (GFP13 for one parent and GFP14 for the other). Each reaction synthesized only one strand. PCR reactions seeded with a previously amplified and purified gene fragment (~5 ng) were run for 100 cycles with single primers ensuring significant excess of one strand over another. The products of these asymmetrical reactions were mixed together and annealed, producing only a minor amount of non-recombinant homoduplexes. The products were purified, digested with PstI and EcoRI restriction endonucleases and cloned into the PstI-EcoRI fragment of pGFP. The last column in Table 3 shows the fraction of fluorescent clones obtained from these enriched heteroduplexes. Asymmetric synthesis of the parental strands provides a significant improvement in the fraction of colonies having recombined sequences. For mismatches separated by 423 bp, the fraction of fluorescent clones is now nearly 30% (60% recombinants).
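The effect of single-primer synthesis on homoduplex background can be estimated with an idealized model. The sketch below is an assumption-laden illustration, not a description of the measured reactions: it assumes linear accumulation of one new strand per seed duplex per cycle (adjustable through a hypothetical `efficiency` parameter), equal seed amounts for both parents, and random annealing of top and bottom strands after mixing.

```python
def strand_pools(cycles, efficiency=1.0):
    """Relative copies of (primed strand, unprimed strand) per seed duplex.

    With a single primer the product cannot serve as template for that primer,
    so accumulation is linear rather than exponential (idealized).
    """
    primed = 1.0 + cycles * efficiency
    unprimed = 1.0
    return primed, unprimed


def duplex_fractions(cycles, efficiency=1.0):
    """Expected (homoduplex, heteroduplex) fractions after mixing and annealing.

    Parent A is amplified with the primer that makes top strands, parent B
    with the primer that makes bottom strands; annealing is assumed random.
    """
    major, minor = strand_pools(cycles, efficiency)
    total = major + minor
    top_a, top_b = major / total, minor / total
    bottom_a, bottom_b = minor / total, major / total
    homoduplex = top_a * bottom_a + top_b * bottom_b
    heteroduplex = top_a * bottom_b + top_b * bottom_a
    return homoduplex, heteroduplex


for n in (0, 10, 100):
    homo, hetero = duplex_fractions(n)
    print(f"{n:3d} cycles: homoduplex {homo:.3f}, heteroduplex {hetero:.3f}")
```

With no asymmetric synthesis the model returns the 50% homoduplex background of symmetric insert annealing; after 100 ideal cycles the predicted homoduplex fraction falls to about 2%. Real reactions will deviate, since per-cycle yield is unlikely to stay constant over 100 cycles.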
DISCUSSION
Although there is more than one mechanism for mismatch repair in E.coli, `long patch' repair (17-19) is probably responsible for the effective recombination in this heteroduplex repair approach. Mismatches are recognized by proteins MutS and MutL, which activate the endonuclease MutH. MutH binds to GATC sequences and introduces a single-strand break in one of the DNA strands. In hemimethylated duplexes, the endonuclease preferentially cuts the unmethylated strand. In duplexes in which both strands are methylated or both are unmethylated, MutH cuts either strand. Helicase II recognizes the break and starts unwinding DNA preferentially in the direction of the mismatch, clearing the path for DNA polymerase III to fill the gap by copying the second strand of DNA. The MutH protein is also known to have low intrinsic activity in the absence of MutS and MutL (15). This background activity may be responsible for repair of relatively large loops (up to 38 nt) which are not recognized by MutS (20,21). It should therefore be possible to recombine heteroduplexes containing large regions of non-identity using this approach.
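Because MutH acts at GATC sequences, it can be useful to see where such sites lie relative to the mismatches in a target gene. The helper below is a hypothetical sketch (the function name and the example sequence are illustrative, not taken from pGFP); since GATC is its own reverse complement, the positions found on one strand mark the same sites on the other.

```python
def gatc_sites(sequence):
    """Return 0-based start positions of every GATC in a DNA sequence."""
    seq = sequence.upper()
    sites = []
    start = seq.find("GATC")
    while start != -1:
        sites.append(start)
        # Resume one base further on so no later occurrence is skipped
        start = seq.find("GATC", start + 1)
    return sites


# Illustrative sequence only
print(gatc_sites("ttGATCaaGATC"))
```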
Whole plasmid heteroduplex construction (Fig. 3a) generates heteroduplexes carrying nicks or small gaps which we have found to significantly decrease recombination efficiency. The initial events in DNA mismatch repair are the recognition of a mismatch by MutS and MutL, activation of MutH endonuclease and formation of a nick on one of the strands (17). Whole plasmid heteroduplexes already have two nicks and present a perfect substrate for the remaining part of the repair mechanism. The pre-existing nicks may serve as primary initiation points for repair or as a shunt, suppressing initiation at other regions. The location of the nicks outside the target sequence means that productive recombination can occur only when repair polymerization terminates within the target sequence.
The use of covalently closed heteroduplexes without any breaks or gaps increases the yield of chimeric constructs. In whole plasmid heteroduplexes, gaps and nicks can be filled in and sealed. However, parental linearized plasmids would also be ligated, significantly increasing the non-recombinant background. We therefore constructed heteroduplexes from the target sequences and ligated those into the vector. Because insert hybridization reintroduces the large non-recombinant background from parent sequences, asymmetric or single primer target sequence synthesis was used to suppress formation of parental duplexes. By amplifying each parent template in a PCR reaction with a single primer, parents contribute only one strand each to the resulting heteroduplex, which greatly limits the possibility of homoduplex formation. Asymmetric synthesis, however, cannot completely prevent homoduplex formation. Some strands can come from the seeded template, but by running the synthesis for many cycles one can ensure that asymmetrically synthesized strands are the predominant species.
Insert annealing solves another problem associated with preparing heteroduplexes from whole plasmids. Efficient transformation into bacterial cells requires that the plasmid be circular. It is not sufficient to anneal the vector sequence and leave the single-strand ends unpaired. Although these linear heteroduplexes could still enter the cells, they would be repaired by the double-strand break repair system, resulting in a single crossover. The efficiency of the circularization reaction decreases rapidly with increasing plasmid size. For large vectors, the amount of the circular form may be so low that the background from uncut molecules becomes significant. Increasing plasmid size decreases the concentration of the ends and makes annealing of relatively long (>0.8 kb) single-strand ends more difficult. Even though insert ligation is a bimolecular reaction and may seem to be less favorable than intramolecular annealing in whole plasmids, in practice it works significantly better.
Insert annealing offers a further convenience with respect to whole plasmid annealing. The cloning vector, prepared once, can last for hundreds of experiments, replacing the lengthy procedure of digesting each parent for each experiment using whole plasmid heteroduplexes. Also, the quality of the vector needs to be controlled only once. In contrast, two control transformations are required to estimate the amount of uncut plasmid in each experiment using whole plasmid heteroduplexes.
The E.coli DNA repair system is capable of resolving multiple and closely spaced mismatch sites to form effectively recombined sequences. In part this effect may be explained by the bi-directional nature of DNA repair (22) and the fact that both initiation and termination of repair synthesis can potentially lead to recombination. Other researchers have in fact reported multiple crossovers as a result of in vivo repair (13).
DNA polymerase III has very high processivity and can incorporate >1000 nt per binding event in the course of mismatch repair. It creates a potential for recombining large DNA constructs or entire operons. Heteroduplex recombination may be suitable for recombining DNA sequences that are too big for the PCR-based DNA recombination method yet too small for efficient recombination using the cellular homologous recombination machinery. For the relatively small (0.7 kb) GFP gene, the asymmetric PCR reaction was a logical choice. When recombining significantly larger genes one would like to avoid PCR reactions. In these cases asymmetric PCR should be replaced by selective degradation or removal of one strand.
CONCLUSIONS
The optimized insert heteroduplex recombination protocol efficiently generates chimeric DNA libraries with sequence components from two parent sequences. Multiple homologous sequences can be recombined and `shuffled' by this approach, simply by repeating the heteroduplex formation and transformation steps with additional sequences. We have described several improvements to the heteroduplex formation step (insert annealing, ligation of the insert heteroduplexes into a cloning vector and asymmetric synthesis of parental strands) that provide significant increases in the fraction of clones containing recombined sequences. This method should be effective for recombining relatively large target sequences.
ACKNOWLEDGEMENT
This research was supported in part by the Department of Energy.
REFERENCES
*To whom correspondence should be addressed. Tel: +1 626 395 4162; Fax: +1 626 568 8743; Email: frances{at}cheme.caltech.edu The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors
Copyright© Oxford University Press, 1999.