ABSTRACT
We have studied a novel class of DNA sequences that cause DNA polymerases to
pause. These sequences have the central consensus Py-G-C and are not necessarily adjacent to hairpins in the DNA template.
Since most consensus sequences do not cause pauses under standard conditions,
additional template features must exist that make it difficult to incorporate
nucleotides at these positions. We believe that these pauses result from
constraints that make the conformation change involved in nucleotide selection
more difficult. These pauses can obscure parts of DNA sequencing ladders and
prevent DNA amplification by the polymerase chain reaction. The addition of
betaine, and some related compounds, relieves these pauses.
DNA polymerases need to elongate rapidly and accurately to function effectively in vivo and in vitro, yet certain DNA regions appear to interfere with their progress. One common problem is pause sites, at which DNA polymerase molecules cease elongation for varying lengths of time. Many strong DNA polymerase pauses are at the beginnings of regions of strong secondary structure such as template hairpins. These pauses can be eliminated by methods that destabilize hairpins, but other pauses are not affected by these modifications (1-9).
Most of the studies on elongation by DNA polymerases have focused on the normal elongation process rather than the situations that lead to pauses. Evidence from studies on various DNA polymerases and HIV reverse transcriptase suggests that all of these enzymes follow a similar six-step elongation cycle consisting of nucleic acid binding, nucleotide binding, a conformation change that aids in nucleotide discrimination, phosphodiester bond formation, a conformation change that permits pyrophosphate release, and translocation or nucleic acid release. Although the initial conformation change is typically the slowest step during processive elongation, almost any of these steps can become rate limiting under appropriate conditions (reviewed in 10).
A variety of studies have been done on T7 DNA polymerase that make it a particularly useful system for studying pausing. T7 DNA polymerase consists of two proteins, the virally encoded gene 5 product and the host thioredoxin protein. Thioredoxin confers processivity on the gene 5 product, allowing it to synthesize thousands of nucleotides without releasing the template (11). By comparison, estimates for the processivity of the Klenow fragment of Escherichia coli DNA polymerase I range from tens to hundreds of nucleotides (10-12), giving it a higher background of incompletely extended products.
Modified versions of T7 DNA polymerase have been produced that lack the normal 3'→5' exonuclease activity (12-13). This activity usually provides an extra opportunity for removing misincorporated nucleotides, but also inconveniently removes chain-terminating dideoxynucleotides (ddNTPs) during DNA sequencing, and would complicate analysis of polymerase pauses. Neither the mutated enzyme nor the native version has a strong nucleotide bias, accepting ddNTPs almost as well as dNTPs; this leads to relatively uniform band intensities on sequencing gels and reduced pausing at minor pause sites (13). Remaining pause positions appear as bands in all four lanes on a sequencing gel, since fragments of the lengths corresponding to the distance of pause positions from the primer accumulate regardless of whether a chain-terminating ddNTP has been incorporated. This system provides an excellent opportunity to study the mechanism of DNA polymerase pausing in a situation with potential practical applications.
Betaine monohydrate, trimethylamine N-oxide (TMANO), N,N-dimethylglycine, sarcosine, tetraethylammonium acetate (TEAAc) and torula yeast RNA were purchased from Sigma (St Louis, MO). Betaine was stored at -20°C as a 5.5 M stock (pH 7.0). Nucleotide mixes and T7 DNA polymerase (Sequenase 2.0, genetically modified to lack exonuclease activity) were purchased from US Biochemicals (Cleveland, OH). The Klenow fragment of E. coli DNA polymerase I was purchased from Pharmacia (Piscataway, NJ).
Sequencing procedures generally followed the protocol of Del Sal et al. (14). Approximately 1 pmol of supercoiled double-stranded plasmid DNA [purified using a QIAGEN (Chatsworth, CA) column] was mixed with 2-4 pmol of an oligonucleotide primer in a volume of 10 µl containing 0.1 N NaOH. After 10 min at 68°C, the reaction was moved to room temperature and 4 µl TDMN (200 mM NaCl, 50 mM DTT, 120 mM HCl, 280 mM TES) were added. After an additional 10 min, 2 µl labeling mix (7.5 µM dCTP, 7.5 µM dGTP, 7.5 µM dTTP) and 5 µCi [35S]dATP were added, followed by 3 U T7 DNA polymerase. This mix was incubated for 5 min at room temperature, then 3.5 µl aliquots were added to four tubes preheated to 37°C containing 2.5 µl of one of the termination mixes (80 µM each dNTP, 8 µM one ddNTP, 50 mM NaCl). When testing pause suppressors, additional chemicals were added to these tubes before preheating. For example, 3.5 µl of 5.5 M betaine were added to achieve a final concentration of 2 M. After 5 min at 37°C, the reactions were stopped with 4 µl stop solution (80% deionized formamide, 1× TBE, 0.05% xylene cyanol, 0.05% bromphenol blue) and electrophoresed on a 6% polyacrylamide-7 M urea-1× TBE gel.
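The stated final betaine concentration can be checked with a simple dilution calculation. The sketch below assumes that the final reaction volume is the sum of the betaine stock, the termination mix and the aliquot of the labeling reaction; the function name and parameter names are illustrative and do not come from the protocol.

```python
def final_concentration(stock_molar, stock_volume_ul, other_volumes_ul):
    """Concentration after mixing a stock into other solutions (C1*V1 = C2*V2)."""
    total_volume_ul = stock_volume_ul + sum(other_volumes_ul)
    return stock_molar * stock_volume_ul / total_volume_ul


# Volumes are taken from the protocol text; treating their sum as the
# final reaction volume is an assumption of this sketch.
betaine_final = final_concentration(
    stock_molar=5.5,
    stock_volume_ul=3.5,
    other_volumes_ul=[3.5, 2.5],  # labeling-reaction aliquot, termination mix
)
print(f"Final betaine concentration: {betaine_final:.2f} M")
```

Under these assumptions the result is close to the nominal 2 M quoted in the protocol.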
For the experiments on impurities in DNA preparations, plasmid DNA was prepared by a standard 'mini-prep' procedure involving lysing the cells in SDS/alkali then precipitating the DNA with isopropanol, without RNase digestion or additional treatment (15).
Forty units of T7 DNA polymerase were incubated in a volume of 80 µl of diluent (10 mM Tris-HCl pH 7.5, 5 mM DTT, 0.5 mg/ml BSA) containing 2 M betaine, 2 M proline or no osmoprotectant. After various periods at 37°C, 10 µl samples were removed and stored on ice. The remaining DNA polymerase activity in each sample was measured by adding 90 µl assay mix (which varied to equalize the osmoprotectant concentration) to obtain reactions containing 300 µM each dNTP, 40 mM Tris-HCl pH 7.5, 10 mM MgCl2, 5 mM DTT, 200 mM proline, 200 mM betaine, 75 c.p.m./pmol [32P]dATP and 1.5 pmol of plasmid DNA with annealed primer. The reactions were incubated for 3 min at 37°C, then stopped with 200 µl YEP (0.5 mg/ml torula yeast RNA, 50 mM EDTA, 50 mM NaPPi). TCA-precipitable counts were determined and compared with a standard curve to determine remaining units (the DNA:polymerase ratio approached 1:1 at the highest polymerase concentrations, so this curve was not linear, but was reproducible). The osmoprotectants had no detectable effect on overall enzyme activity, as determined by comparing units present in their presence and absence without preincubation. Most incorporation occurred during the first 1-2 min, so the 3 min incubation allowed completion of one round of synthesis per polymerase (11).
Single-stranded circular DNA was prepared according to the protocol of Russel et al. (16) as modified by Promega (17). Sequencing was performed as described for double-stranded DNA.
Sequencing was performed according to the above protocol with the following modifications. Labeling mix contained 20 µCi [35S]dATP and 25 mM DTT. Five units of Klenow polymerase were used in place of T7 DNA polymerase. Instead of performing complete sets of sequencing reactions, all reactions were terminated with the 'A' termination mix (250 µM ddATP, 25 µM dATP, 250 µM dCTP, 250 µM dGTP, 250 µM dTTP). Chases were performed with 1 µl chase mix (250 µM each dNTP) for 10 min at 37°C.
During supercoiled double-stranded DNA sequencing reactions using a standard protocol (see Materials
and Methods), T7 DNA polymerase pauses at certain positions, leading to bands
in all four lanes on the resulting denaturing polyacrylamide gel (Fig. 1A). We have selected 26 examples of pause sites at random, including pauses on
several different templates at a variety of distances from the primer. We
classified these pause sequences as mild, moderate or severe based on the
extent of pausing: mild pauses have significant bands visible in multiple
lanes, moderate pauses have at least one other band approximately as intense as
the band corresponding to the nucleotide actually present at that position, and
severe pauses have strong bands in all four lanes, making it impossible to
determine what nucleotide is actually present at the corresponding template
position.
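The three severity classes can be written as a small decision rule. The sketch below is only an illustration: the text defines the classes qualitatively, so the numeric cut-offs (`similar`, `significant`), the function name and the intensity values are hypothetical and were not used in this study.

```python
def classify_pause(intensities, true_base, similar=0.8, significant=0.2):
    """Classify one gel position from its per-lane band intensities.

    intensities: dict mapping lane ('A', 'C', 'G', 'T') to band intensity.
    true_base:   lane of the nucleotide actually present at that position
                 (its band intensity is assumed to be greater than zero).
    similar, significant: hypothetical cut-offs, expressed as fractions
                 of the intensity of the true band.
    """
    true_band = intensities[true_base]
    ratios = [
        value / true_band
        for lane, value in intensities.items()
        if lane != true_base
    ]
    if all(r >= similar for r in ratios):
        return "severe"    # strong bands in all four lanes
    if any(r >= similar for r in ratios):
        return "moderate"  # at least one other band about as intense
    if any(r >= significant for r in ratios):
        return "mild"      # a significant extra band is visible
    return "none"


example = {"A": 1.0, "C": 0.9, "G": 0.1, "T": 0.1}
print(classify_pause(example, "A"))
```

In practice the classification was made by eye from the autoradiographs; a rule of this kind would only be meaningful with quantitated band intensities.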
Since DNA polymerase pause sites often occur at positions where the DNA
polymerase first encounters a hairpin, we first searched the regions containing
these pause sites for secondary structures using the MFOLD program from the
Genetics Computer Group (Madison, WI) software package. We found that pause
sites are often located near putative hairpins, but that there is no consistent
positioning of the pauses: some are located at the beginnings of stems, some in
loops and some at the ends of stems (data not shown). These secondary
structures may contribute to elongation problems, but the lack of a consistent
relationship suggests that they are not uniquely responsible. This conclusion
is supported by the existence of positions at which T7 DNA polymerase pauses on
both strands: if the polymerase is pausing on one strand as it first encounters
a stable secondary structure, it must be just completing the synthesis of DNA
complementary to that secondary structure as it elongates on the other strand.
We next attempted to locate common features in the primary structure of these
pauses. As can be seen from an alignment of these sequences (Fig. 2), most of the pauses occur just after the incorporation of deoxyguanosine, in
the middle of a sequence which is most commonly Py-G-C, and which essentially never differs at more than one position
from this consensus. The region surrounding this trinucleotide tends to be GC-rich, with a particular tendency for a G or C to appear 3 nt after the
last incorporated nucleotide. The only pause that differs by >1 nt from this
consensus is pause 11, which is in a cluster of pauses in the middle of an
extremely GC-rich region. There appears to be an overall tendency for pauses to occur
in the vicinity of other pauses (Fig. 2; refs 4, 8, 9), suggesting some global or cumulative impediment to elongation in these
regions.
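The statement that a pause site differs from the Py-G-C consensus at no more than one position can be made concrete by counting mismatches. The following is a minimal sketch; the function name and the input trinucleotides are illustrative and are not the aligned pause sites of Figure 2.

```python
PYRIMIDINES = {"C", "T"}


def consensus_mismatches(trinucleotide):
    """Count the positions at which a 3-nt sequence differs from Py-G-C."""
    first, second, third = trinucleotide.upper()
    return (
        (first not in PYRIMIDINES)  # position 1 must be a pyrimidine
        + (second != "G")           # position 2 must be G
        + (third != "C")            # position 3 must be C
    )


for seq in ("TGC", "CGC", "TCC", "TGT", "AAA"):
    print(seq, consensus_mismatches(seq))
```

A count of 0 is a perfect match and a count of 1 is a sequence within 1 nt of the consensus.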
The short consensus sequence for pauses described in the above section occurs
every 32 base pairs by chance, yet major pauses are far less common in
practice. While the consensus sequence may determine the exact position of a
pause, other factors clearly affect the likelihood of pausing and the severity
of pauses. As noted above, the region surrounding the consensus tends to be GC-rich (58%), but there is not a direct correlation between GC-richness and pause severity. In fact, the least GC-rich pause (pause 14) is a severe pause. However, this pause
has a run of five thymidine nucleotides that might contribute to DNA bending,
which may also exacerbate pauses (pauses 15 and 25 have similar runs, an
abnormally high frequency for a sample of this size). There is also a tendency
for pauses to occur with higher frequency farther from primers, ultimately completely obscuring the DNA sequence (Fig. 3A), but pauses can occur at any distance from the primer (Fig. 2).
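The chance frequency of the consensus follows from a short enumeration. The sketch below assumes that the four bases are equally frequent and independent at each position and considers a single strand; these are simplifying assumptions of the example, not properties of the templates used here.

```python
from fractions import Fraction
from itertools import product

BASES = "ACGT"


def matches_consensus(trinucleotide):
    """True if the sequence is an exact Py-G-C match (CGC or TGC)."""
    first, second, third = trinucleotide
    return first in "CT" and second == "G" and third == "C"


# Enumerate all 64 trinucleotides and keep those matching the consensus.
all_trinucleotides = ["".join(p) for p in product(BASES, repeat=3)]
matching = [t for t in all_trinucleotides if matches_consensus(t)]

# Probability of a match at a random position, and its reciprocal.
probability = Fraction(len(matching), len(all_trinucleotides))
expected_spacing = 1 / probability
print(matching, probability, expected_spacing)
```

Two of the 64 possible trinucleotides match, so under these assumptions the consensus is expected about once every 32 positions, in agreement with the figure given above.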
Since there are no obvious specific causes of most of these pauses, we decided
to look at external factors that influence the frequency and severity of pauses
in general. The effects of these factors have been reduced over the last 20
years as DNA sequencing conditions have been optimized. When sequencing
reactions were run at 30°C instead of 37°C, there was a substantial increase in the severity of some pauses (data not shown). Similarly, using DNA templates that had been prepared by a mini-prep protocol (see Materials and Methods) instead of a more thorough purification led to an increase in both the frequency and severity of pauses (Fig. 4A). Finally, substituting the nucleotide analog deoxyinosine for deoxyguanosine also made pauses more numerous and more severe (Fig. 5A).
Since it is clear that some feature of the template is having a significant effect on whether T7 DNA polymerase pauses, we experimented with the inclusion of betaine (N,N,N-trimethylglycine) in our sequencing reactions. Betaine is a zwitterionic osmoprotectant found in many halophilic organisms (20) that has been found to alter DNA stability such that GC-rich regions melt at temperatures more similar to AT-rich regions (21). We felt that this compound might alter the structure of the DNA at pause sites, possibly affecting the tendency of DNA polymerases to pause.
To determine whether the results described above are generally true of DNA polymerases, we examined the tendency of other DNA polymerases to pause while replicating DNA, and the effect of betaine on these other polymerases. Our initial studies used the Klenow fragment of E. coli DNA polymerase I, since this enzyme has been used in pausing studies in the past. Pausing by this polymerase is not identical to pausing by T7 DNA polymerase (Fig. 9). The Klenow fragment does not pause at pause 25, which is a severe pause for T7 DNA polymerase. It does, however, pause at three sites that are weak pauses for T7 DNA polymerase. These three pauses are all within 1 nt of the consensus sequence described above: TCC, CGC and TGT. In each case, the pause is not chased by the addition of more nucleotides (Fig. 9, lane 2), but is prevented by the presence of 2 M betaine (lane 3). These data suggest that Klenow stops at a similar consensus, but that it has difficulty elongating through somewhat different regions. The nature of the pauses is probably similar, however, since betaine still prevents them. Intriguingly, betaine also seems to alter nucleotide selection, at least at some positions (making it easier, for example, to incorporate a dideoxyadenosine in the second position of the sequence AA; Fig. 9).
While initiating trial experiments with Taq DNA polymerase, we discovered that the Griffith group were working on a template that provided a perfect natural test of pausing by this polymerase at the Py-G-C consensus. Wang and Griffith (University of North Carolina at Chapel Hill, unpublished data) were examining a region of the myotonic dystrophy gene containing 75 repeats of the triplet TGC. They were unable to amplify a 500 base pair region containing this sequence using normal PCR conditions, but were able to do so using our modified conditions containing 2 M betaine. We have also found higher yields of some longer fragments amplified in the presence of 1-1.5 M betaine (data not shown). These experiments suggest that Taq DNA polymerase will also stop at the consensus sequence, and that this pausing can also be prevented by the addition of betaine, demonstrating the generality of this type of pause.
In order to learn more about pausing and mechanisms of relieving pausing, we
examined how betaine might be affecting the elongation process by testing other
chemicals that might have similar effects, to determine how betaine is
functioning and which of its properties are critical. We looked at two classes of chemicals: those that are physically similar to betaine and those that have some
functional similarity to betaine.
Numerous chemicals are known to affect the melting of DNA, but most of them are ionic and tend to interfere with enzyme activity (27). Nevertheless, we examined the effects of TEAAc on T7 DNA polymerase pausing at concentrations ranging from 0.25 to 2 M. At lower concentrations, there is substantially more pausing (particularly at normal pause sites), with higher concentrations completely blocking elongation (Fig. 10D). When betaine was added in addition to TEAAc, much of the pausing was overcome (Fig. 11), indicating that betaine and this DNA melting agent are acting inversely, but on similar targets. Proline, which appears to act solely as a polymerase stabilizer, was not able to counteract the effect of TEAAc (data not shown). These experiments indicate that interactions with DNA are important for pause suppression, supporting our prior conclusions that there are additional features of the DNA that help determine the location of pauses.
Finally, we tried the osmoprotectant TMANO. In addition to stabilizing proteins (28), TMANO has the trimethylamine group that appeared important for betaine (compared with less amino-substituted derivatives), and has been shown to affect the melting of DNA (29). TMANO proved capable of eliminating the pauses by T7 DNA polymerase in our standard system (Fig. 10E), confirming that the trimethylamine moiety is a critical part of this molecule.
In this report, we have examined the tendency of DNA polymerases to pause at nonclassic pause sites and the ability of a class of chemicals to
overcome these pauses. The pauses we examined have a central consensus Py-G-C, but other factors must also be involved, since not all Py-G-C sites are pauses. One possible explanation might be
that anything that slows the polymerase, including both intrinsic features of
the template and factors like temperature and contaminants, leads to increases
in pausing at positions where nucleotide incorporation is difficult; this would
explain why pauses often occur in clusters, and why the pause sites differ
somewhat from polymerase to polymerase.
Abbotts et al. (19) have recently examined the tendency of reverse transcriptases to 'terminate' during processive elongation on a single-stranded M13mp2 template. They quantitated termination at every position over a 255 nt region, and looked for patterns in the positions with the highest termination frequencies, finding some polymerase-dependent tendencies that differ from those we have reported here, but little that was predictable. However, many of the sites they included in their analyses were very weak pauses, and may constitute a different class or classes of pause sequence; we would expect the class of pauses we have described here to occur at a far lower frequency than the ones they analyzed. If one only examines the strongest three sites in their study, which are on average 3-fold stronger than the next strongest site and 15-fold stronger than some other pauses included as strong pauses, one finds that they form a cluster at the farthest point from the primer (~140 nt away) in which the first site is at a TGC sequence and the more distant two sites are 1 nt past TGC and CGC sites respectively, and may be mislocated. These sites may be of the same sort that we have examined in this manuscript.
Our evidence suggests that the problem leading to pauses occurs at the conformation change preceding phosphodiester bond formation. The fact that betaine alters nucleotide incorporation [Taq DNA polymerase: C. Robinett (University of California, Berkeley), unpublished observation; Klenow: Fig. 9; T7 DNA polymerase, AMV and HIV reverse transcriptases: 30] indicates an effect on either nucleotide binding or the conformation change involved in proofreading, and the more pronounced differences with G and C (21, 30; nucleotides that have less in common structurally than, e.g., purines or pyrimidines, but function together as a base pair) suggest that the effect occurs at the geometric analysis of the base pair that accompanies the conformation change rather than at the chemical interactions with the polymerase and the template that occur during nucleotide binding. Published links between misincorporation and pausing (31-33) further implicate this part of the elongation cycle, and the significant effect of a moderate temperature change suggests a problem at the conformation change.
Although pausing appears to be a problem at the polymerase active site, many of the problems that exacerbate pausing primarily affect the surrounding template structure. However, Boyer et al. have seen that nucleotides up to six positions away from the active site can affect nucleotide selection, indicating a profound effect of the overall template structure (18). Similar results have been obtained with E. coli RNA polymerase, where surrounding sequences can significantly affect the ease of nucleotide incorporation at specific positions (34). Betaine also appears to be affecting nucleotide incorporation and polymerase pausing by altering the overall DNA template conformation.
Rees and von Hippel have suggested that betaine acts by contacting A-T base pairs in the major groove, particularly the methyl group of thymine (21). However, the interaction with that group was specifically precluded by earlier studies that they based their conclusions on (27, 35). We feel that the differences we observed when dITP was substituted for dGTP suggest that the major effect may be in the minor groove, and that betaine may be altering the hydration of this region (the water spine) to affect the local structure of the DNA molecule (36); previous studies have demonstrated that betaine is able to alter hydration in other systems (24, 37). It is known that AT-rich DNA is normally more hydrated (38), and a change that increases the hydration of GC-rich regions may also increase their flexibility, since GC-rich regions are generally more rigid (39). This increased flexibility could make the conformation change involved in nucleotide selectivity more favorable.
Betaine has numerous practical applications in the laboratory. It can improve
sequencing reactions, permitting longer and more accurate reads under a variety
of conditions [including in cycle sequencing; N. Salama and C. Robinett
(University of California, Berkeley), personal communications]. It can also
improve PCR amplification of some difficult sequences, and may be of some use
for long PCR. Trial experiments have suggested that betaine may even allow
direct sequencing of cell pellets lysed by the addition of NaOH, which would normally yield no readable sequence.
This work was supported by NIH grant GM12010 to M.J.C. and an NSF graduate
fellowship to D.S.M. Some of this work is covered by pending US or foreign
patents. We thank Charles Richardson, Zoè Weaver and Karen Christie for helpful comments on drafts of this paper.


The two chemicals most structurally similar to betaine that we tested were N,N-dimethylglycine and N-monomethylglycine (sarcosine). Like betaine, sarcosine has some DNA melting ability (25), and dimethylglycine is probably intermediate between the two chemicals in this regard. All three of these chemicals have comparable abilities to stabilize proteins (24), though betaine is best at overcoming the sensitivity of cells to high osmolarity and sarcosine is ineffective (26). When we examined the effects of these chemicals on T7 DNA polymerase pausing, we found that dimethylglycine was less effective than betaine, and that sarcosine was largely ineffective (Fig. 10B and C). This supports our previous hypothesis that betaine-like compounds are acting on the DNA (which these compounds affect differentially) rather than on the enzyme (on which they seem to act similarly). Furthermore, it suggests that the trimethylamine group is an important functional group for pause suppression (and perhaps for osmotolerance).
REFERENCES





