Enrichment of oligo(dG)·oligo(dC)-containing fragments from human genomic DNA by Mg2+-dependent triplex affinity capture
Naoko Nishikawa, Naotoshi Kanda1, Michio Oishi+ and Ryoiti Kiyama*
Institute of Molecular and Cellular Biosciences, University of Tokyo, Yayoi, Bunkyo-ku, Tokyo 113, Japan and 1Department of Veterinary Anatomy, School of Veterinary Medicine, Tokyo University of Agriculture and Technology, Fuchu-city, Tokyo 183, Japan
Received February 13, 1997; Revised and Accepted March 19, 1997
DDBJ/EMBL/GenBank accession nos D88106-D88112
ABSTRACT
Oligo(dG)·oligo(dC)- or short poly(dG)·poly(dC)-containing fragments were enriched and cloned by means of Mg2+-dependent triplex affinity capture and subsequent cloning procedures. A library constructed after three cycles of enrichment showed that ~80% of the clones in the supercoiled form formed a complex with labeled oligonucleotide (dG)34. However, when linearized, 90.9% of these clones failed to form a complex, whereas the rest (type I clones) retained the ability. The former group was abundant in the genomic DNA, although it showed only ~3-fold enrichment by one cycle of affinity capture. It was further classified into two species (types II and III) based on complex formation ability after phenol extraction. Type II clones retained the complex formation ability after this treatment, while the human telomere [(TTAGGG)n] and telomere-like [(TGGAA)n] or [(TGGAG)n] sequences belonging to type III clones did not. Serial deletion experiments and binding assays using oligonucleotides confirmed that the repetitive units containing T(G)nT (n = 3-5) tracts or (G)n-motifs (n ≥ 3) were the sites of complex formation for type II and III clones. On the other hand, type I clones contained poly(dG)·poly(dC) tracts at least 10 nt long, and DNase I footprinting analysis indicated that these tracts were the sites of complex formation.
INTRODUCTION
A group of repetitive sequences consisting of short unit sequences, usually up to a few bases, called microsatellites, appears very frequently in the genomic DNA of eukaryotes (1-4). Unlike satellite DNA, however, they usually exist in the euchromatic region and the numbers of repeats are, in most cases, 10-100. Therefore, they do not show a satellite profile on ultracentrifugation in the presence of CsCl. Generally, their frequencies in the genomic DNA are higher than expected, suggesting that several mechanisms are responsible for their active formation and stable maintenance (5-7). Furthermore, their presence in regions close to genes has been shown to influence gene expression (8-10). For example, a (CA)n repeat located in the promoter region of the rat prolactin (rPRL) gene changes the superhelical density of the region, thereby inhibiting transcription (11). The recent finding of (CAG)n repeats in genes responsible for neural genetic diseases has especially focused attention on microsatellites (7,12).
Since genomic DNA of higher eukaryotes is generally A+T rich, guanine and cytosine bases are underrepresented. Therefore, unlike poly(dA)·poly(dT) and poly(dG-dA)·poly(dT-dC), poly(dG)·poly(dC) and other G+C-rich microsatellites appear infrequently. Their frequencies in human genomic DNA are 0.3% for poly(dA)·poly(dT), 0.2% for poly(dG-dA)·poly(dT-dC) and 0.0002% for poly(dG)·poly(dC) (2). Unit sequences containing the dinucleotide CG appear very infrequently due to the transition of cytosine to thymine (13). As a result, TG dinucleotides appear at a higher frequency than the theoretical value.
In previous reports, we enriched two types of microsatellites, poly(dA)·poly(dT) and poly(dG-dA)·poly(dT-dC), from human genomic DNA by Mg2+-dependent triplex affinity capture (14,15). More than half of the clones in these libraries contained perfect repeats of these microsatellites. Their numbers of repetitions ranged from 14 to 37 for poly(dA)·poly(dT) and from 15 to 42 for poly(dG-dA)·poly(dT-dC), indicating that the complexes formed by these repeated sequences are relatively stable. However, we observed polymorphism associated with the repeat length between individuals and, in some cases, between normal and cancer tissues from the same individual (15). Such instability of microsatellites has been reported for colon cancers and other genetic diseases caused by deficiencies in mismatch repair genes (16,17).
While increasing numbers of microsatellites have been utilized for genetic and genome analysis (6,18), it has been laborious to enrich fragments containing these microsatellites. Affinity capture of duplex DNA by triplex DNA formation with biotin-labeled oligonucleotides as the third strand was first reported by Ito et al. (19). We and others have modified this method for broader use by introducing Mg2+-dependent association and dissociation of the triplex (14,15,20). While C+GC triplet bases are stabilized at a low pH without metal ions, several triplexes, poly(dT)·poly(dA)·poly(dT), poly(dG-dA)·poly(dG-dA)·poly(dT-dC) and poly(dG)·poly(dG)·poly(dC) for example, require metal ions for their formation. Triplexes with mixed bases could also be stabilized with metal ions (21-24). However, the efficiencies of the procedure for other microsatellites and the types of microsatellites that could be obtained by specific probes have not been examined in detail. Therefore, for broader use of triplex affinity capture, it has been a prerequisite to characterize the method by first using simple repeats. In this study, we explored the method using poly(dG)·poly(dC)-containing sequences by forming poly(dG)·poly(dG)·poly(dC) triplexes in the presence of Mg2+.
MATERIALS AND METHODS
Materials
A pair of oligonucleotides, oligo A (GATCCGCGGCCGCCCGAT) and oligo B (ATCGGGCGGCCGCG), for the adaptor were annealed and then ligated to AluI-digested human DNA. Oligo A was used as a primer for PCR. Biotinylated (dG)34 used for Mg2+-dependent triplex affinity capture was synthesized by Greiner (Japan). Streptavidin-coated magnetic beads were purchased from Promega (USA). Plasmid pGC19 was constructed from pUC19 by inserting a (dG)34 sequence between the SacI and EcoRI sites. The clone containing the telomere sequence was TT6 (44). The oligonucleotides used for the binding assays were CTGTTCCAGGCTGTCAGATGCTCACCTGGGGGTGTGGGTG and (TGGGGGTGTGGGTG)2TGGGGGT from pHGC32, and (GGAGT)6GGAG from pHGC39, which were annealed with their complementary oligonucleotides.
Mg2+-dependent triplex affinity capture
This procedure was performed as described previously (15), except that oligonucleotide (dG)34 was used as the third strand. Approximately 1 µg of plasmid DNA or genomic DNA was mixed with 200 nM biotinylated (dG)34 in 25 µl of triplex buffer (10 mM Tris-HCl, pH 8.0, 10 mM MgCl2, 50 mM NaCl) and incubated at 37°C for 30 min. The sample was adsorbed onto 0.3 mg of streptavidin-coated magnetic beads. After incubation with 0.5 ml of the triplex buffer at room temperature for 30 min, the beads were separated with magnets. After extensive washing with the triplex buffer (seven times with 0.5 ml: FT fractions), the bound DNA was eluted with 2.5 ml of the elution buffer (10 mM Tris-HCl, pH 8.0, 5 mM EDTA, 50 mM NaCl: EDTA eluates).
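The molar amounts implied by these conditions can be estimated with a short calculation. The following is only an illustrative sketch: the average mass of 650 g/mol per base pair is a common approximation rather than a value from this work, and the 2686 bp length (that of pUC19) is used merely as an example plasmid size. The function names are hypothetical.

```python
AVG_BP_MASS = 650.0  # g/mol per base pair; common approximation, not from the paper


def probe_pmol(conc_nM: float, volume_ul: float) -> float:
    """Amount of oligonucleotide in pmol (nM x ul gives fmol, so divide by 1000)."""
    return conc_nM * volume_ul / 1000.0


def duplex_pmol(mass_ug: float, length_bp: int) -> float:
    """Approximate number of duplex DNA molecules, expressed in pmol."""
    grams = mass_ug * 1e-6
    mol = grams / (length_bp * AVG_BP_MASS)
    return mol * 1e12


probe = probe_pmol(200, 25)        # (dG)34 in the 25 ul binding reaction
plasmid = duplex_pmol(1.0, 2686)   # 1 ug of a pUC19-sized plasmid (assumed size)
print(f"probe: {probe:.1f} pmol, plasmid: {plasmid:.2f} pmol, "
      f"probe excess: {probe / plasmid:.1f}-fold")
```

Under these assumptions the probe is in molar excess over a plasmid target; for genomic DNA fragments the ratio would depend on the fragment length distribution, which is not estimated here.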
Construction of the library
Chromosomal DNA from HeLa cells was digested with AluI and ligated with the adaptor. After PCR amplification (a total of 29 cycles with a 1/10 dilution after the 17th cycle) with the primer (oligo A), the amplified fragments were purified using a Wizard PCR Preps kit (Promega, USA). One µg of the amplified DNA was used for the first cycle of affinity capture. The DNA fragments recovered from the first cycle were amplified by PCR and used for the next cycle. After the third cycle, the amplified DNA was digested with NotI and ligated into pBluescript SK(-). The ligation products were used for transformation of Escherichia coli DH5α. The clones of the library were screened by triplex DNA gel assay for complex formation with (dG)34. PCR was performed for appropriate cycles (a total of 38 cycles with a 1/20 dilution after the 20th cycle for the first, 31 cycles with a 1/20 dilution after the 15th cycle for the second and 30 cycles with a 1/20 dilution after the 15th cycle for the third cycle of the enrichment) consisting of 94°C for 1 min, 40°C for 2 min and 72°C for 3 min, with a final 10 min extension at 72°C.
DNase I footprinting
DNase I footprinting was performed as described previously (25). After the plasmid DNA was digested with HindIII followed by treatment with bacterial alkaline phosphatase, its 5' ends were labeled with 32P by T4 polynucleotide kinase. The labeled DNA was digested with ScaI and the 293 bp fragment was recovered from 1% low-melting agarose with Wizard PCR Preps. After aliquots of 40 µl were incubated at 37°C for 30 min in the presence of 0, 0.2, 2 or 20 µM (dG)34 in triplex buffer (10 mM Tris-HCl, pH 8.0, 10 mM MgCl2, 50 mM NaCl), 60 µl of a solution containing 10 mM Tris-HCl (pH 8.0), 2.5 mM CaCl2 and 10 mM MgCl2 was added and the reaction was initiated with 2 µl of 1.4 × 10⁻³ U of DNase I. After incubation for 2 min at room temperature, the samples were precipitated with ethanol and electrophoresed on a 6% polyacrylamide-7 M urea gel under denaturing conditions. The G+A marker was prepared according to the Maxam-Gilbert method (26).
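The final concentrations in the digestion mixture follow from the volumes given above. The sketch below is a simple volume-weighted average; it assumes, for illustration only, that the 2 µl of DNase I contributes no MgCl2, CaCl2 or oligonucleotide, which is not stated in the text. The helper name is hypothetical.

```python
def final_conc(parts):
    """parts: list of (volume_ul, concentration) pairs; returns the concentration after mixing."""
    total = sum(volume for volume, _ in parts)
    return sum(volume * conc for volume, conc in parts) / total


# Order of components: 40 ul triplex aliquot, 60 ul added solution, 2 ul DNase I.
mg = final_conc([(40, 10.0), (60, 10.0), (2, 0.0)])     # MgCl2, mM
ca = final_conc([(40, 0.0), (60, 2.5), (2, 0.0)])       # CaCl2, mM
probe = final_conc([(40, 20.0), (60, 0.0), (2, 0.0)])   # (dG)34 at the highest dose, uM
print(f"MgCl2 {mg:.2f} mM, CaCl2 {ca:.2f} mM, (dG)34 {probe:.2f} uM")
```

The point of the calculation is that the oligonucleotide is diluted roughly 2.5-fold at the digestion step, whereas the Mg2+ concentration required for the triplex is almost unchanged.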
Triplex DNA gel assay
Triplex DNA assay using oligonucleotides
Labeled duplex oligonucleotides (1 nM) derived from type II clone pHGC32 and type III clone pHGC39 were incubated with various concentrations of (dG)34 in triplex buffer for 30 min at 37°C. The duplex DNA and the complex formed with (dG)34 were resolved on 10% polyacrylamide gels. Duplex oligonucleotides were labeled with T4 polynucleotide kinase.
Construction of the deletion mutants
Each clone, pHGC32 or pHGC39, was digested with PstI and XbaI. After ethanol precipitation, the samples were incubated with 1.8 U/µl of exonuclease III (Takara, Kyoto) in 50 mM Tris-HCl (pH 8.0), 5 mM MgCl2, 10 mM 2-mercaptoethanol at 37°C. Aliquots of 10 µl were removed from the reaction mixture at 1 min intervals up to 10 min and, after phenol extraction, 0.5 U/µl mung bean nuclease (Takara) was added to the reaction mixture containing 10 mM Tris-HCl (pH 8.0), 10 mM MgCl2, 50 mM NaCl, 1 mM dithiothreitol, 1 mM ZnSO4 and the mixture was incubated for 1 h at 37°C. After phenol extraction and ethanol precipitation, the samples were incubated with 0.04 U/µl of Klenow fragment (New England Biolabs) in 10 mM Tris-HCl (pH 8.0), 5 mM MgCl2, 0.5 mM dithiothreitol, 1.25 mM dNTPs for 30 min at 37°C. They were then self-ligated with T4 DNA ligase (New England Biolabs) and used for transformation.
RESULTS
Enrichment of poly(dG)·poly(dC)-containing fragments by Mg2+-dependent triplex affinity capture
Figure 1 shows a summary of enrichment of poly(dG)·poly(dC)-containing fragments from human genomic DNA by Mg2+-dependent triplex affinity capture. In a separate experiment using pUC19 and pGC19, a pUC19 derivative containing a (dG)34·(dC)34 tract, ~30-fold enrichment was achieved by a single cycle of the procedure (data not shown). This method was applied for multiple cycles to human genomic DNA. Figure 1A shows the results of a gel assay for triplex formation using 32P-labeled (dG)34. After three cycles of the procedure, fragments of up to 2-3 kb in length, but mainly <1 kb, were amplified (Fig. 1A, lane 4). Since there was little enrichment after the third cycle (data not shown), we constructed a library with the products after the third cycle. The low intensity of the signals was presumably due to the inefficient complex formation of certain sequences under the conditions of affinity capture (discussed below).
In the library, 33 of 42 clones examined (78.6%, data not shown), including 14 randomly selected clones shown in Figure 1B (lanes 1-14), formed complexes with 32P-labeled (dG)34 when in the form of supercoiled DNA. Since none of 14 clones randomly selected from the fragments before enrichment (cloned from the sample used in Fig. 1A, lane 1) showed complex formation ability (data not shown), the procedure indeed enriched the fragments of interest. Table 1 summarizes the clones which formed a complex with (dG)34 and whose sequences were determined. Most of these clones contained multiple stretches of (G)n (n ≥ 3) tracts, while others were derived from satellite DNA containing the human telomere-like sequences. When these clones were linearized, 30 of 33 clones examined (90.9%) failed to form a complex with the oligonucleotide (details below).
Table 1. Summary of the clones that showed triplex formation

Clone no.  Insert length (bp)  Triplex formation (a)  Category (b)  Pattern of repeats (c)
2          550                 +                      II            AGGGGCGTGGGC; CGGGGTGTG; AGGGGACCGGGGAG
7          150                 +                      III           (TCCAC)n(TGGAG)m
11         1050                ++                     II            CGGGGGCGGGGATGGGT; TGGGGCGGGGA; AGCGGGAGGGGGA
14         550                 +                      II            TGGGGTGGGC; AGGGGA; AGGGGC
15         950                 +++                    II            AGGGGGGGGGGA; AGGGCAGGGGA; CGGGGA
17         598                 +++                    I             CGGGGTGGGGGGGGGGC; CGGGGGC; TGGGGGA
19         1000                ++                     II            A(GGGCC)3GGGGGGA; CGGGA
30         581                 +++                    I             TGGGGGGGGGGC; TGGGC; AGGGTGGGA
32         1048                +++                    II            TGGGGGTGTGGGT; TGGGGTGCAGGGT; TGGGGGTGTGTGGGT
34         750                 ++                     III           TGGGGGGA; TGGGGGC
36         849                 +++                    II            AGGGGAGGGGGGT; TGGGGTGGGT; AGGGTCTGTGGGGA
37         1050                +                      III           (TGGAA)n(TGGAG)m
39         1020                ++                     III           (TGGAA)n(TGGAG)m
40         900                 +                      III           AGGGGGT; AGGGGGC
42         1050                +                      III           (TGGAA)n(TGGAG)m

(a) Relative intensity of triplex formation detected by triplex DNA gel assay with the cloned DNA in the supercoiled form.
(b) Classified as types I, II or III (see text).
(c) Typical sequences containing poly(dG) tracts. The repeats of (G)n (n ≥ 3) are underlined.
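The (G)n (n ≥ 3) tracts referred to in footnote c can be located in a sequence with a simple scan. The following is a minimal sketch, not the procedure used to prepare the table; the function name g_tracts and its min_len parameter are hypothetical, and the example string is the first pattern listed for clone 32.

```python
import re


def g_tracts(seq: str, min_len: int = 3):
    """Return (start, length) for every run of at least min_len consecutive Gs."""
    pattern = re.compile("G{%d,}" % min_len)
    return [(m.start(), len(m.group())) for m in pattern.finditer(seq.upper())]


# TGGGGGTGTGGGT, written out explicitly so the run lengths are easy to check.
example = "T" + "G" * 5 + "TGT" + "G" * 3 + "T"
for start, length in g_tracts(example):
    print(f"G-tract at position {start}, length {length}")
```

Positions are zero-based and refer to the strand supplied; the isolated G between the two tracts is ignored because it is shorter than min_len.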
Characterization of the cloned DNA fragments
The nucleotide sequence data as well as the gel assay indicated that (i) most of the clones contained relatively short stretches of poly(dG) tracts, (ii) some were apparently derived from satellite DNA, containing multiple repeats of short (mostly 5 bp) sequences, (iii) there was no apparent correlation between the intensity of the signals and the length of the fragments, and (iv) some lost the signal when they were linearized. These observations indicated that the clones in the library could be classified into several types. As summarized in Figure 2, they were tentatively classified into three types according to complex formation ability under various conditions. The first type (type I; pHGC17 and pHGC30, for example) of clones formed complexes with labeled (dG)34 in both supercoiled and linear forms, and these clones contained relatively long stretches of poly(dG) tracts. The second type (type II; pHGC32 and pHGC36) contained short, usually 3 bp, stretches of poly(dG) tracts which were flanked by Ts. These clones lost the signal when they were linearized. The third type (type III; pHGC39 and pHGC42) were derived from satellite DNA and contained multiple repeats of (TGGAG)n and/or (TGGAA)n. The clones belonging to this type also lost the signal after linearization (Fig. 2). Types II and III were separated according to the differences after phenol treatment (details below). The frequencies of each type in the library were 8.3% (for type I), 25.0% (for type II) and 66.7% (for type III).
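The tentative classification can be summarized as a decision rule over three yes/no observations of complex formation with (dG)34. The sketch below only restates the criteria given in the text; the function name and argument names are hypothetical, and the form of the DNA during the phenol test is left unspecified, as it is here.

```python
def classify_clone(supercoiled: bool, linear: bool, after_phenol: bool) -> str:
    """Assign a clone type from complex-formation results under three conditions."""
    if not supercoiled:
        return "no complex"   # clone does not bind (dG)34 at all
    if linear:
        return "I"            # complex survives linearization
    if after_phenol:
        return "II"           # lost on linearization, retained after phenol extraction
    return "III"              # lost on linearization and after phenol extraction


for observations in [(True, True, True), (True, False, True), (True, False, False)]:
    print(observations, "->", classify_clone(*observations))
```

The rule checks linearization first, so a type I clone is identified regardless of its behavior after phenol extraction.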
To explain the enrichment of the fragments that did not contain a long stretch of poly(dG)·poly(dC), we examined the degrees of enrichment, from mixtures with pUC19, of plasmids pHGC17 (type I) and pHGC32 (type II), and of pH8, which showed no complex formation, as a control (Table 2). The clone pHGC17 was enriched ~16.6-fold after one cycle of affinity capture. In contrast, pHGC32 was enriched only ~3-fold even in the form of supercoiled DNA. No enrichment was observed for the control clone pH8. This suggests that type II and III clones, represented by pHGC32, comprised a large portion (90.9%) of the library because, although they were enriched inefficiently, they were abundant in the genomic DNA. On the other hand, type I clones, represented by pHGC17, appeared infrequently in the genomic DNA, although they can be enriched efficiently.
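A degree of enrichment of this kind can be expressed as the change in the ratio of test plasmid to pUC19 across one cycle of capture. The sketch below assumes that definition, which is not spelled out in the text, and uses hypothetical colony counts chosen only to show the arithmetic; they are not data from Table 2.

```python
def fold_enrichment(target_before, reference_before, target_after, reference_after):
    """Target:reference ratio after capture divided by the same ratio before capture."""
    if min(target_before, reference_before, reference_after) <= 0:
        raise ValueError("input counts and the reference count after capture must be positive")
    return (target_after / reference_after) / (target_before / reference_before)


# Hypothetical counts: 10 target and 90 pUC19 colonies before, 50 and 50 after capture.
print(fold_enrichment(target_before=10, reference_before=90,
                      target_after=50, reference_after=50))
```

Normalizing to the co-captured reference plasmid makes the value independent of the overall recovery of DNA from the beads.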
Complex formation of poly(dG)·poly(dC)-containing sequences
DISCUSSION
Enrichment of poly(dG)·poly(dC)-containing fragments
We reported previously the enrichment of microsatellites poly(dA)·poly(dT) and poly(dG-dA)·poly(dT-dC) from human genomic DNA by Mg2+-dependent triplex affinity capture (14,15). In both cases, the libraries exhibited substantial enrichment of these microsatellites whose lengths were 14-84 bp. In contrast, the library of poly(dG)·poly(dC)-containing fragments constructed after three cycles of affinity enrichment consisted of two groups of clones: one containing perfect poly(dG)·poly(dC) sequences (type I clones) and another containing either the consensus T(G)nT (n = 3-5) or telomere-like sequences. The latter group was further classified into two types (types II and III) according to the complex formation ability after phenol treatment. Although the library contained 8.3, 25.0 and 66.7% type I, II and III clones, respectively, the degrees of affinity enrichment varied among these sequences. Affinity enrichment using two clones, one from each group, indicated that those with a perfect poly(dG)·poly(dC) sequence (type I) showed 16.6-fold enrichment, while those with T(G)nT (types II and III) showed only 3-fold enrichment. Therefore, we assumed that sequences with T(G)nT are fairly abundant in human genomic DNA, although they could have been lost during affinity enrichment because of their low affinity to (dG)34. Although we used 34-base long oligonucleotides for affinity enrichment, it might be possible to increase the stringency for perfect poly(dG)·poly(dC) sequences by raising the temperature at the affinity capture or by shortening the length of the probe.
Complex formation of enriched fragments
DNase I footprinting experiments (Fig. 4) revealed that the poly(dG)·poly(dC) tract was the site of complex formation between the type I clones and 32P-labeled (dG)34. However, we could not determine the site of interaction between T(G)nT-containing fragments and the labeled probe by this approach (data not shown). This was probably because the interaction between them was weak. However, since the specific sequence feature appearing in both pHGC32 and pHGC36 was T(G)nT, this tract was most likely to be the site of interaction. In particular, the sequence was repeated throughout the cloned pHGC32 fragment, which consisted of multiple copies of a 40 bp consensus; the repetition of this sequence could stabilize the complex in the gel assay. This was also observed for the other clone, pHGC36, where T(G)nT appeared ~30 times. It has been suggested that DNA fragments with interrupted (G)n-motifs can still interact with single-stranded poly(dG) (28). Thus, the repetition of sequences containing such motifs could also stabilize the complex. A large excess of (dG)34 (at least 10-100 µM) was needed to see the complex formation with the repeat units containing these motifs (Fig. 7). However, note that there should be some interaction between the type II or III fragments and the probe (dG)34 even at 200 nM probe concentration.
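Occurrences of the T(G)nT (n = 3-5) tract in a given sequence can be counted with a regular expression. The following is an illustrative sketch rather than the analysis used for the clones; the function name is hypothetical, only the strand supplied is scanned, and the example is the pHGC32-derived oligonucleotide (TGGGGGTGTGGGTG)2TGGGGGT listed under Materials.

```python
import re

# A lookahead is used so that tracts sharing a flanking T (e.g. TGGGTGGGT) are both counted.
TGNT = re.compile(r"(?=(TG{3,5}T))")


def count_tgnt(seq: str) -> int:
    """Count T(G)nT tracts with n = 3-5 on the given strand."""
    return len(TGNT.findall(seq.upper()))


oligo = "TGGGGGTGTGGGTG" * 2 + "TGGGGGT"
print(count_tgnt(oligo))
```

Runs of six or more Gs are deliberately not counted, since the pattern requires a T immediately after three to five Gs.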
A structural transition caused by phenol treatment was reported for A+T-rich large inverted repeats (29). This transition occurs through partial melting of duplex DNA facilitated by hydrophobic interaction which could be energetically favored by releasing the torsional stress caused by superhelicity of supercoiled DNA. This transition should be completed by forming a cruciform or hairpin structure when there is an (AT)n repeat or an A+T-rich inverted repeat in the cloned fragment, or by D-loop formation in the presence of interacting oligonucleotides. Complex formation with (dG)34 would, therefore, release torsional stress, and could be stabilized extensively by forming hydrogen bonds between the G-rich sequences and (dG)34 as shown here.
Biological significance of poly(dG)·poly(dC)
As G+C-rich sequences are underrepresented in human genomic DNA, most of the sequences obtained here by affinity capture did not contain long and perfect poly(dG)·poly(dC) tracts. Most consisted of guanines and other bases and they contained sequences with short (G)n-motifs and/or G-rich sequences. We tentatively classified these sequences into three types (types I-III). Since these three types of clones showed different biophysical behaviors, they may be related to different biological phenomena in vivo.
Apparently, some of these sequences were satellite DNAs, a group of DNA elements consisting of a unit sequence a few to hundreds of base pairs long, repeated a hundred times or more. One of the satellite DNAs (pHGC39), which had homology to the telomere sequence (TTAGGG)n, was localized at the centromeric heterochromatin region of several chromosomes. The other satellite DNA, represented by the clone pHGC32, was localized at the telomeric region of chromosome 1. These satellite DNAs would be associated with functions that have been postulated for other satellite DNAs (5,30,31).
Poly(dG)·poly(dC)-containing sequences tend to adopt non-B-DNA conformations (21,28,32,33). This is mainly due to the hydrogen bonds formed between guanine bases in the tract and guanine or other bases in other regions (34). In duplex DNA, guanine can accommodate three hydrogen donor or acceptor sites (N1, N2 and O6 positions) for base pairing, and the N7 position of purines can participate in an additional hydrogen bond with other bases while maintaining the G·C pair. Therefore, guanine can afford multiple, stable interactions with other bases and thus the G-rich sequences tend to adopt 'unusual' DNA conformations. These unusual structures containing G-tracts have been reported in association with many biological functions (35-41). For example, telomere sequences contain (G)n motifs and they adopt unusual DNA conformations, quadruplex structures (42,43). We obtained such sequences, (TGGAA)n and (TGGAG)n, by affinity capture with (dG)34 (Table 1). Although they do not contain long stretches of poly(dG)·poly(dC), the complexes can be stabilized by hydrogen bonds between perfect or interrupted (G)n motifs located in other chromosomal regions.
ACKNOWLEDGEMENT
This work was supported by a grant-in-aid from the Ministry of Education of Japan to R.K.
REFERENCES
1 Hamada,H., Petrino,M.G. and Kakunaga,T. (1982) Proc. Natl. Acad. Sci. USA 79, 6465-6469.
2 Tautz,D. and Renz,M. (1984) Nucleic Acids Res. 12, 4127-4138.
3 Tautz,D., Trick,M. and Dover,G.A. (1986) Nature 322, 652-656.
4 Beckmann,J.S. and Weber,J.L. (1992) Genomics 12, 627-631.