ABSTRACT
Very complex mutant libraries of the dihydrofolate reductase (DHFR) gene encoded
by the Escherichia coli plasmid R67 were created using hypermutagenic PCR with
biased deoxynucleotide triphosphate (dNTP) concentrations. Exploiting the
particular stability of the G:T mismatch, the DHFR gene could be enriched in A+T
by employing biased deoxypyrimidine triphosphate concentrations, i.e. [dTTP] >
[dCTP]. A sizeable fraction of hypermutants were functional. A combination of
[dTTP] > [dCTP] and [dGTP] > [dATP] biases generated mutations at unexpectedly
low frequencies. This could be overcome by the addition of Mn²⁺ cations. Overall
mutation frequencies of 10% per amplification (range 4-18% per clone) could be
attained. All four transitions and a smaller number of transversions were
produced throughout the gene. PCR mutagenesis could be so extensive as to
inactivate all amplified versions of the gene.

Although the mutation rates of DNA-based organisms vary, they are considerably
less than one per genome per cycle. Those of the RNA viruses may approach two
to four substitutions per genome per cycle (1). Such rates must represent the
upper end of the spectrum compatible with viability as they may be only slightly
increased by chemical mutagenesis (2). Higher mutation rates almost certainly
result in extinction. However, apart from this obvious restriction there is
nothing per se to prohibit higher mutation rates in vitro or hypermutation
restricted to small regions of a genome or gene segment in vivo (3). Perhaps the
most startling example of this is retroviral G → A hypermutation where hundreds
of templated Gs may be copied into As (4-6). This is a particular trait of the
lentiviral family of retroviruses, which includes human immunodeficiency virus
(HIV), and results from cDNA synthesis in the presence of highly biased
[dTTP]/[dCTP] ratios (6).

G → A hypermutation can be reproduced in vitro using RNA, biased dNTP
concentrations and preferentially the HIV-1 reverse transcriptase (7-9).
Referred to as RNA hypermutagenesis, this method delivers elevated mutation and
mutant frequencies, ≤0.1 per G per cycle and >0.9 per DHFR gene per cycle
respectively. The complexity of the resulting libraries of hypermutated
sequences was limited by the monotony of G → A hypermutation. Despite this,
iterative hypermutagenesis of a bacterial antibiotic resistance gene, the
Escherichia coli R67 DHFR, resulted in substitution of up to 23% of amino acids
without loss of phenotype (10).

Genes and genomes exhibit G+C- or A+T-rich segments so that it would be useful
to have a method capable of enriching any sequence in either. Just as dNTP
biases are mutagenic for reverse transcription (7,11) so they are for PCR
(12-16), although the magnitude of the bias has to be less to allow reasonably
efficient amplification. PCR has the advantage that both strands may be mutated.
A [dTTP] > [dCTP] bias would allow enrichment in A and T while a [dGTP] > [dATP]
bias would permit the converse. These biases generate Gt:T and Tt:G mismatches
respectively (the subscript t denoting the template base), which are the most
stable of the 12 possible (17). By combining both deoxypyrimidine and
deoxypurine triphosphate biases, it is shown here that PCR can be hypermutagenic
to an unprecedented degree.

The oligonucleotides used for amplification of the R67 DHFR gene have been
described (10). PCR reactions were carried out in the following reaction
mixture: 10 mM Tris-HCl pH 8.3, 50 mM KCl, 2.5 mM MgCl₂, 100 pmol of each primer
and 5 U Taq polymerase (Roche). The dNTP concentrations are described in the
tables and legends. Input was ~5 ng plasmid DNA. The cycling parameters were:
50 × (95°C, 30 s; 60°C, 30 s; 72°C, 10 min). Long elongation times were used to
favour elongation after mismatches. Vent (Biolabs) and rTth (Roche) DNA
polymerases were used at 2 and 2.5 U per reaction respectively. MnCl₂ and dNTPs
were purchased from Sigma and Pharmacia. PCR products were cloned via SacI and
BamHI restriction sites and individual colonies picked, grown up and sequenced
as described (10). A few products were cloned into the SacI and BamHI sites of
M13mp18 RF DNA. Recombinants were sequenced using thermosequenase (USB
Amersham).

Unlike its E. coli chromosomal counterpart, the R67 DHFR gene confers resistance
to trimethoprim (trimR). As the pTrc99A (Stratagene) cloning vector confers
resistance to ampicillin (ampiR), the ratio of the number of colonies on
trimethoprim plus ampicillin plates to the number on ampicillin-only plates
yields the proportion of functional genes post-PCR. The plating efficiencies of
the wild-type DHFR construct on trimethoprim and ampicillin plates were
comparable. Greater than 90% of ampiR colonies had DHFR inserts.

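The plate-count ratio described above can be sketched as a short calculation.
This is only an illustration under stated assumptions: the function names are
hypothetical, the 0 trimR/600 ampiR figures are the ones reported further on for
the manganese reaction, and the upper bound for the case where no trimR colony
is seen assumes that colonies behave as independent draws (a simple binomial
model that the text itself does not invoke).

```python
def functional_fraction(trim_amp_colonies: int, amp_colonies: int) -> float:
    """Proportion of cloned genes that still confer trimethoprim resistance."""
    if amp_colonies <= 0:
        raise ValueError("need at least one ampicillin-resistant colony")
    if trim_amp_colonies < 0:
        raise ValueError("colony counts cannot be negative")
    return trim_amp_colonies / amp_colonies


def upper_bound_when_none_seen(amp_colonies: int, confidence: float = 0.95) -> float:
    """One-sided upper bound on the functional fraction when 0 colonies grow.

    Assumption: binomial sampling, so the bound p solves
    (1 - p) ** n = 1 - confidence.
    """
    if amp_colonies <= 0:
        raise ValueError("need at least one ampicillin-resistant colony")
    return 1.0 - (1.0 - confidence) ** (1.0 / amp_colonies)


if __name__ == "__main__":
    # Counts quoted in the text for the double-bias reaction with manganese.
    print(functional_fraction(0, 600))
    print(upper_bound_when_none_seen(600))
```

Under this model, a result of zero functional clones among 600 still leaves
room for a functional fraction of roughly half a percent.
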
Given the modified amplification protocol, PCR conditions were first optimized
for primer, magnesium and Taq DNA polymerase concentrations and number of
cycles. As usual there was a strong Mg²⁺ dependence for all the thermostable
polymerases used, the 2.5-5 mM range proving satisfactory. Particularly with
large dNTP biases, 30 cycles of PCR yielded relatively little product. Fifty
cycles allowed adequate recovery for all but one reaction involving a 1000-fold
[dTTP]/[dCTP] bias. In this case a further 25 cycles with equimolar dNTPs were
performed as a chase. The efficiency of a standard amplification with equimolar
50 µM dNTPs was not affected by the addition of 1 mM ATP, indicating that any
increase in the ionic strength resulting from the addition of millimolar
triphosphate did not alter PCR yields (data not shown).

Table 1 gives viable mutant frequencies following DNA hypermutagenesis with
increasing [dTTP] > [dCTP] biases. The decline in the proportion of trimR
colonies among the total (i.e. ampiR) colonies with increasing bias reflects the
extent of DNA hypermutation. The overall mutation frequency for the entire
amplification increased with the dNTP bias and attained values as high as
2.9 × 10⁻² substitutions per base per reaction for the ampiR clones (Table 2). A
collection of hypermutated trimR sequences is given in Figure 1A. Up to five
amino acid substitutions per functional clone (6.5%) were obtained, which were
generally well distributed throughout the sequence. Among the most hypermutated
ampiR clones up to 15 (6.5%) nucleotides and 11 (14%) amino acids respectively
were replaced (not shown). The vast majority of substitutions were GC → AT
transitions, as predicted from Gt:T mispairing on both strands due to the
[dTTP] > [dCTP] bias. A small proportion (6%) of transversions were noted,
uniquely A → T and T → A, to be expected from what is known about the ability of
Taq DNA polymerase to elongate after mismatches (18,19).

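The per-clone figures quoted above (substitutions per base, transitions versus
transversions) can be tallied from aligned sequences along the following lines.
This is a minimal sketch, not the analysis used for the tables: the function
names are hypothetical, the sequences in the usage example are invented toy
strings rather than the R67 DHFR sequence, and insertions or deletions are
assumed to be absent so that positions can be compared one to one.

```python
from collections import Counter

PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")


def substitution_class(ref: str, alt: str) -> str:
    """Return 'transition' or 'transversion' for a single base change."""
    if ref == alt or ref not in "ACGT" or alt not in "ACGT":
        raise ValueError("expected two different bases from A, C, G, T")
    # Purine<->purine or pyrimidine<->pyrimidine changes are transitions.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


def summarize_clone(wild_type: str, clone: str) -> dict:
    """Count substitutions between an aligned wild-type and clone sequence."""
    if not wild_type or len(wild_type) != len(clone):
        raise ValueError("sequences must be non-empty, aligned and equal in length")
    spectrum = Counter(
        (r, c) for r, c in zip(wild_type.upper(), clone.upper()) if r != c
    )
    classes = Counter()
    for (ref, alt), n in spectrum.items():
        classes[substitution_class(ref, alt)] += n
    total = sum(spectrum.values())
    return {
        "substitutions": total,
        "frequency_per_base": total / len(wild_type),
        "transitions": classes["transition"],
        "transversions": classes["transversion"],
        "spectrum": {f"{r}->{c}": n for (r, c), n in spectrum.items()},
    }


if __name__ == "__main__":
    # Toy sequences for illustration only.
    print(summarize_clone("GGCCAATT", "AGCTATTT"))
```
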
The particularities of the Gt:T mismatch ensured A+T enrichment of the R67 DHFR
gene. Alternatively a [dGTP] > [dATP] bias would have generated Tt:G mismatches
with resulting G+C enrichment. Yet if all four base transitions could be
generated during a single reaction the resulting mutant libraries would be among
the most complex possible, accessing an even greater proportion of sequence
space. This is in principle possible if both a [dTTP] > [dCTP] and a
[dGTP] > [dATP] bias are used during PCR. However, no product whatsoever was
obtained with 1000- or 300-fold biases in both ratios. Only with <200-fold
biases was this possible. Sequencing of the trimR hypermutated products yielded
unexpectedly low mutation frequencies (Table 2).

Transition metal ions such as manganese (Mn²⁺) and cobalt (Co²⁺) may decrease
the fidelity of DNA synthesis including PCR (12,20,21). Addition of MnCl₂ to a
final concentration of 0.5 mM in a reaction with both [dTTP]/[dCTP] =
[dGTP]/[dATP] = 1000 µM/30 µM overcame the enhanced fidelity noted above. The
overall base mutation frequency could be increased from ~10⁻³ to ~10⁻¹ per site
per amplification (Table 2). In fact, the PCR was so error prone that no trimR
colonies (0 trimR/600 ampiR) were identified. A collection of 34 clones is given
in Figure 1B, mutants starting with a minimum of 10 substitutions (4%) per
clone. The maximum number was 41 (18%) per clone. The proportion of
transversions (31%) was greatly enhanced by the addition of Mn²⁺ and was
accompanied by a few deletions and even fewer single base insertions (Fig. 1B).
There was no correlation between the proportion of synonymous (s) to
non-synonymous (ns) base substitutions within this or any other data set (not
shown).

Figure 1C collates amino acid replacements from all the data sets and indicates
that hypermutagenic PCR may introduce between one and seven (mean 3.7) different
amino acids per residue. The large 755 mutation data set resulting from
manganese mutagenesis was analyzed for substitution biases. The mutation matrix,
normalized for base composition effects, showed almost perfect strand symmetry
(i.e. G → C ≈ C → G, etc.) (Fig. 2). However, there was a bias for AT → GC
transitions which may be attributable to subtle differences between Gt:T and
Tt:G mismatches in the Taq DNA polymerization site. Once again, A → T and T → A
were the most frequent transversions.

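One way to picture the strand-symmetry check is sketched here. The
normalization used for Figure 2 is not detailed in the text, so this example
makes its own assumption: each X → Y count is divided by the number of X bases
in the reference sequence. The function names and the counts in the usage
example are hypothetical and are not taken from the 755 mutation data set.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def normalized_rates(counts: dict, base_counts: dict) -> dict:
    """Divide each (X, Y) substitution count by the number of X bases."""
    return {(x, y): n / base_counts[x] for (x, y), n in counts.items()}


def strand_partner(change: tuple) -> tuple:
    """The same event read on the opposite strand, e.g. G->A becomes C->T."""
    x, y = change
    return (COMPLEMENT[x], COMPLEMENT[y])


def symmetry_ratios(rates: dict) -> dict:
    """Ratio rate(X->Y) / rate(partner) for each complementary pair.

    Values close to 1 indicate strand symmetry. Each pair is reported once,
    keyed by (change, partner) with the alphabetically smaller change first.
    """
    ratios = {}
    for change, rate in rates.items():
        partner = strand_partner(change)
        if change < partner and rates.get(partner, 0) > 0:
            ratios[(change, partner)] = rate / rates[partner]
    return ratios


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    counts = {("G", "A"): 30, ("C", "T"): 60, ("A", "T"): 5, ("T", "A"): 5}
    base_counts = {"A": 50, "C": 60, "G": 30, "T": 50}
    print(symmetry_ratios(normalized_rates(counts, base_counts)))
```
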
Balanced DNA precursor concentrations are clearly crucial to the fidelity of
cellular DNA or retroviral cDNA synthesis in vivo and in vitro (22-25). The same
is true of PCR, the present findings reproducing and extending earlier work
(12,13,16). The nature of the dNTP bias generally produced the substitution
expected from G:T mispairing, once again highlighting the importance of this
most stable of base mismatches to hypermutation (7,8). Perhaps surprisingly, the
fidelity of amplification was enhanced many fold when both deoxypyrimidine and
deoxypurine triphosphate biases were used (Table 2). This might result from the
fact that although G:T mismatches were being forced, so were G:G and T:T
mispairs. From what is known of Taq DNA polymerase elongation beyond mismatches,
G:G represents one of the most substantial blocks to elongation and consequently
amplification (18,19). By contrast T:T mismatches pose fewer problems. The
addition of Mn²⁺ ions, known to be mutagenic for DNA synthesis by a variety of
mechanisms including modification of the relative Km values of mismatches and
matches (20,21), overcame this problem. The 100-fold enhanced overall mutation
frequency was indeed so great that no trimR clones could be derived.

With a double dNTP bias and manganese ions there was an excess of transitions
towards G+C which was not strand-specific (Fig. 2). Clearly this could be
countered by increasing [dTTP] or decreasing [dGTP] in the reaction. There was
evidence that the distribution of mutations was not completely random. However,
significant deviations from the expected values were noted for only a few
substitutions.

A comparison of RNA and DNA hypermutagenesis is telling (7,8). The HIV-1 reverse
transcriptase error rate per pass is clearly greater than that of Taq DNA
polymerase. Among the hundreds of RNA molecules hypermutated in vitro by the
HIV-1 reverse transcriptase, up to 32% of G targets were substituted in one
clone, with a best mean of 11%, all in a single cycle of cDNA synthesis (7).
However, given the monotony (i.e. G → A) of RNA hypermutagenesis these numbers
translate into best and average overall mutation frequencies of ~7 and 3%
respectively. To date, DNA hypermutagenesis has produced up to 18% base
substitution per clone with a best mean of 10%, involving copying of both
strands.

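The step from a per-G frequency to an overall frequency is a multiplication by
the fraction of template positions that are G, since only G sites are targets.
The sketch below makes that arithmetic explicit. The G content of the RNA
templates is not given here, so instead of assuming one the example
back-calculates the fraction implied by the rounded figures quoted above. The
function names are hypothetical and the result is only as precise as "~7 and
3%" allows.

```python
def overall_frequency(per_target_frequency: float, target_fraction: float) -> float:
    """Overall substitutions per base when only one base type is targeted."""
    return per_target_frequency * target_fraction


def implied_target_fraction(overall: float, per_target: float) -> float:
    """Fraction of positions that must be targets to reconcile two figures."""
    if per_target <= 0:
        raise ValueError("per-target frequency must be positive")
    return overall / per_target


if __name__ == "__main__":
    # (label, per-G frequency, overall frequency), as quoted in the text.
    for label, per_g, overall in (
        ("best clone", 0.32, 0.07),
        ("best mean", 0.11, 0.03),
    ):
        print(label, round(implied_target_fraction(overall, per_g), 2))
```

Both rows point to a template in which roughly a quarter of the positions are
G, which is the reason a monotonous G → A process caps the overall frequency
well below the per-G value.
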
Despite the intrinsic properties of the HIV-1 RT, the advantages of DNA
hypermutagenesis by PCR are manifold. First, the complexity of the mutant
libraries is incomparably greater, providing access to an even larger fraction
of sequence space. Secondly, the procedure is faster, being reduced to a single
reaction. Thirdly, as the PCR step is mutagenic there is in principle no need to
clone before undertaking a second cycle of DNA hypermutagenesis. However, the
power of DNA hypermutagenesis is now so great that iteration without some sort
of phenotypic selection is probably unwise because the information threshold can
be crossed. In addition, preliminary work suggests that primer dimers and
deleted molecules may be preferentially amplified upon cycling without
phenotypic selection or purification of the DNA band. The conditions can surely
be refined to purge the present AT → GC bias.

The extent of mutation described above, as well as the complexity of the mutant
libraries, exceeds that generated by any biological method to date. A recent
paper described hypermutagenic PCR using modified dCTP and dGTP substrates (26).
The best and average mutation frequencies described here (0.18 and 0.1 per base
per reaction) are highly comparable with those reported, namely 0.19 and 0.1 per
base per reaction. The modified bases generally produced AT → GC transitions and
a small percentage (<10%) of transversions. The present protocol uses standard
bases, generates all four transitions at high frequencies and, given the
presence of manganese cations, approximately one third transversions. Clearly
there is considerable flexibility and choice in the production of hypermutants,
which could be tailored to the desires or needs of the experimentalist.

Although DNA hypermutagenesis allows huge leaps through sequence space, viable
hypermutants are to be had. The diversity currently accessible is so great that
any screening procedure will explore only a minute fraction of the sequence
space accessed. The simplicity and efficiency of DNA hypermutagenesis transfers
the burden of work in protein evolution in vitro onto analytical procedures. The
potential of the method is such that, after iterative DNA hypermutagenesis, the
historical information content of a sequence might be annihilated, defying
recognition.

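A rough count shows why screening can sample only a minute fraction of the
accessible sequence space. The number of sequences that differ from a reference
of length L at exactly k sites is C(L, k) × 3^k. The sketch below evaluates it
for the substitution levels reported here. The gene length of 234 bases is an
assumption (78 codons, consistent with 15 substitutions corresponding to 6.5%
of nucleotides and 11 to 14% of amino acids), and the function names are
hypothetical.

```python
import math


def variants_at_distance(length: int, k: int) -> int:
    """Number of sequences differing from a reference at exactly k sites."""
    if length < 0 or not 0 <= k <= length:
        raise ValueError("require 0 <= k <= length")
    # Choose the k mutated positions, then one of 3 alternative bases at each.
    return math.comb(length, k) * 3 ** k


def log10_variants(length: int, k: int) -> float:
    """Order of magnitude of the count above."""
    return math.log10(variants_at_distance(length, k))


if __name__ == "__main__":
    GENE_LENGTH = 234  # assumption: 78 codons, see the lead-in
    for k in (10, 23, 41):  # minimum, ~10% mean and maximum substitutions
        print(k, round(log10_variants(GENE_LENGTH, k), 1))
```

Even at the lowest substitution level considered, the count dwarfs the few
hundred colonies that are typically plated and sequenced.
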
The choice of the small DHFR gene was particularly propitious. PCR product yield
decreases with dNTP pool bias and is further reduced upon addition of Mn²⁺
cations. This can be alleviated to some extent by a chase PCR with equimolar
dNTPs. Alternatively, cycling the product from an agarose gel purified band
should allow one to extensively hypermutate larger genes. Yet as the probability
of introducing deleterious mutations increases with target DNA length,
hypermutation of such genes might not prove as informative unless some form of
biological selection is used. A further reservation concerns the nature of the
transversions observed. That A → T and T → A transversions were the most common
may be attributed to the ability of Taq DNA polymerase to elongate after T:T
mismatches (18,19). Inversely, the dearth of a number of transversions
correlates well with the relative inefficiency of the enzyme in elongating after
At:G, Gt:A, Gt:G and Ct:C mismatches. Thus the mutation spectrum is shaped to
some extent by Taq DNA polymerase. It is possible that different thermostable
enzymes might show subtle differences. Alternatively, modifications to the
reaction mix might be introduced in an attempt to alleviate such preferences.

DNA hypermutation accelerates what may occur under more physiological
circumstances over much longer time periods. Indeed there is a wealth of
experimental data associating dNTP pool biases, mutation and cancer (22,23,25).
The consequences of an intracellular [dTTP] > [dCTP] bias are particularly
intriguing. Among eukaryotic cells the intracellular dNTP concentrations are
invariably [dATP] ≥ [dTTP] > [dCTP] ≥ [dGTP] or, in other words, [dTTP] > [dCTP]
and [dATP] > [dGTP] (25). Given the particular properties of the G:T mismatch
any increase in the deoxypyrimidine triphosphate bias would help enrich the
sequence in A+T. The potential mutagenic effects resulting from fluctuations in
the deoxypurine triphosphate bias would have to be even more substantial as they
would need to invert the natural [dATP] > [dGTP] bias (25). From this it might
be surmised that any exacerbation of the natural [dTTP] > [dCTP] bias should
have more long term impact on the genome. In this context it is interesting to
note that among vertebrate cells non-coding segments are generally A+T rich.

It is salutary to realize that DNA synthesis can be so error prone. It might be
supposed that during the evolution of primitive DNA-based replicons, and before
highly integrated dNTP metabolism, biased dNTP concentrations alone, or in
conjunction with dilute solutions of some transition metal ions, might have
contributed to the genesis of DNA sequence diversity upon which natural
selection could work.

We would like to thank Drs Fredj Tekaia and Christophe Terzian for statistical
analyses. This work was supported by grants from Institut Pasteur and l'Agence
Nationale pour la Recherche sur le SIDA.
REFERENCES