Nucleic Acids Research Advance Access originally published online on October 2, 2007
Nucleic Acids Research 2007 35(22):7429-7455; doi:10.1093/nar/gkm711
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Survey and Summary
Human telomere, oncogenic promoter and 5'-UTR G-quadruplexes: diverse higher order DNA and RNA targets for cancer therapeutics
1Structural Biology Program, Memorial Sloan-Kettering Cancer Center, New York, NY 10021, USA and 2Division of Physics and Applied Physics, School of Physical and Mathematical Sciences, Nanyang Technological University, Singapore
*To whom correspondence should be addressed. Tel: +1 212 639 7207; Fax: +1 212 717 3066; Email: pateld{at}mskcc.org
Received July 25, 2007. Revised August 24, 2007. Accepted August 27, 2007.
ABSTRACT
Guanine-rich DNA sequences can form G-quadruplexes stabilized by stacked G–G–G–G tetrads in monovalent cation-containing solution. The length and number of individual G-tracts and the length and sequence context of linker residues define the diverse topologies adopted by G-quadruplexes. The review highlights recent solution NMR-based G-quadruplex structures formed by the four-repeat human telomere in K+ solution and the guanine-rich strands of c-myc, c-kit and variant bcl-2 oncogenic promoters, as well as a bimolecular G-quadruplex that targets HIV-1 integrase. Such structure determinations have helped to identify unanticipated scaffolds such as interlocked G-quadruplexes, as well as novel topologies represented by double-chain-reversal and V-shaped loops, triads, mixed tetrads, adenine-mediated pentads and hexads and snap-back G-tetrad alignments. The review also highlights the recent identification of guanine-rich sequences positioned adjacent to translation start sites in 5'-untranslated regions (5'-UTRs) of RNA oncogenic sequences. The activity of the enzyme telomerase, which maintains telomere length, can be negatively regulated through G-quadruplex formation at telomeric ends. The review evaluates progress related to ongoing efforts to identify small molecule drugs that bind and stabilize distinct G-quadruplex scaffolds associated with telomeric and oncogenic sequences, and outlines progress towards identifying recognition principles based on several X-ray-based structures of ligand–G-quadruplex complexes.
INTRODUCTION
DNA can adopt structures other than the Watson–Crick duplex when actively participating in replication, transcription, recombination and damage repair. Of particular interest are guanine-rich regions, which can adopt a non-canonical four-stranded topology called the G-quadruplex. Such architectures are adopted in several key biological contexts, including DNA telomere ends, the purine-rich DNA strands of oncogenic promoter elements, and within RNA 5'-untranslated regions (UTR) in close proximity to translation start sites. Therefore, elucidation of the sequence-based diversity of G-quadruplex scaffolds could provide insights into the distinct biology of guanine-rich sequences within the genome.
Guanine-rich DNA G-quadruplexes
G-quadruplexes are built from the stacking of successive G–G–G–G tetrads (G-tetrads) and stabilized by bound monovalent Na+ and K+ cations (1). The G-tetrad is a cyclic hydrogen-bonded square planar alignment of four guanines (Figure 1a), with the guanines adopting either anti or syn alignments about glycosidic bonds (Figure 1b and c, respectively). G-quadruplexes are very stable, with their large diameter and four grooves defining a unique architecture (2) that is distinct from duplex DNA.
The backbone strands (or columns) that constitute the stacked G-tetrad core of the G-quadruplex can adopt different directionalities. Furthermore, the relative strand directionalities are geometrically related to the glycosidic conformations of the guanines. There are four possibilities: (i) All four strands are oriented in the same direction; the glycosidic angles around the G-tetrad are anti–anti–anti–anti (3–5), and occasionally syn–syn–syn–syn (6). (ii) Three strands are oriented in one direction and the fourth is oriented in the opposite direction; the glycosidic angles are syn–anti–anti–anti or anti–syn–syn–syn (7). (iii) Two neighboring strands are oriented in one direction and the two remaining strands in the opposite direction (so that each strand has one parallel and one anti-parallel adjacent neighbor); the glycosidic angles are syn–syn–anti–anti (8–10). (iv) Each strand has two adjacent anti-parallel neighbors; the glycosidic angles are syn–anti–syn–anti (11–14).
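The four orientation classes can be captured in a small lookup. The sketch below is illustrative only: it assumes a hypothetical encoding in which the four strands are listed in cyclic order around the G-tetrad, with +1 and −1 marking the two possible strand directions, and it reports the glycosidic patterns listed above for each class.

```python
from typing import Sequence

# Glycosidic patterns for each strand-orientation class, as listed in the text.
GLYCOSIDIC_PATTERNS = {
    "i": "anti-anti-anti-anti (occasionally syn-syn-syn-syn)",
    "ii": "syn-anti-anti-anti or anti-syn-syn-syn",
    "iii": "syn-syn-anti-anti",
    "iv": "syn-anti-syn-anti",
}


def classify_strand_orientations(directions: Sequence[int]) -> str:
    """Return class 'i'-'iv' for four strands given in cyclic order.

    Hypothetical encoding: +1 and -1 mark the two strand directions.
    """
    if len(directions) != 4 or any(d not in (1, -1) for d in directions):
        raise ValueError("expected four strand directions, each +1 or -1")
    n_up = sum(1 for d in directions if d == 1)
    if n_up in (0, 4):
        return "i"  # all four strands point the same way
    if n_up in (1, 3):
        return "ii"  # three strands one way, the fourth opposite
    # Two strands each way: strict alternation around the tetrad gives (iv);
    # otherwise each strand has one parallel and one anti-parallel neighbor.
    alternating = all(directions[k] != directions[(k + 1) % 4] for k in range(4))
    return "iv" if alternating else "iii"


for dirs in [(1, 1, 1, 1), (1, 1, 1, -1), (1, 1, -1, -1), (1, -1, 1, -1)]:
    cls = classify_strand_orientations(dirs)
    print(dirs, cls, GLYCOSIDIC_PATTERNS[cls])
```

In this encoding, the (3+1) folds discussed later for the human telomere fall into class (ii).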
Loops in G-quadruplexes are linkers connecting the G-rich tracts that make up the stacked G-tetrad core. The loops can be classified into four major families that depend in part on the size and sequence of the linkers: (i) Edge-wise or lateral loops connect two adjacent anti-parallel strands (Figure 2a) and are generally composed of two or more residues (9,15). (ii) Diagonal loops connect two opposing anti-parallel strands (Figure 2b) (8–10) and are generally composed of three or more residues. (iii) Double-chain-reversal or propeller loops connect adjacent parallel strands (Figure 2c) (7,16,17) and can range from one to six or more residues. The adenine in single-residue double-chain-reversal loops that bridge two G-tetrad planes can form hydrogen bonds with one edge of the G-tetrad, resulting in A–(G–G–G–G) pentad formation (18), or with two opposing edges of the G-tetrad, resulting in A–(G–G–G–G)–A hexad formation (16). (iv) V-shaped loops connect two corners of a G-tetrad core in which a support column is missing (Figure 2d) (18).
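The first three loop families follow from two geometric questions: are the connected columns adjacent or diagonally opposite, and are the connected strands parallel or anti-parallel? A minimal sketch, assuming a hypothetical encoding of columns as indices 0–3 in cyclic order and directions as +1/−1; V-shaped loops, which require a missing support column, are deliberately not modeled.

```python
def classify_loop(col_a: int, dir_a: int, col_b: int, dir_b: int) -> str:
    """Name the loop joining strand columns col_a and col_b (0-3, cyclic order).

    dir_a and dir_b are +1/-1 strand directions (hypothetical encoding).
    V-shaped loops, which need a missing support column, are not modeled.
    """
    separation = (col_b - col_a) % 4
    if separation == 0:
        raise ValueError("a loop must connect two different columns")
    adjacent = separation in (1, 3)  # separation 2 means diagonally opposite
    parallel = dir_a == dir_b
    if adjacent and not parallel:
        return "edge-wise (lateral)"
    if not adjacent and not parallel:
        return "diagonal"
    if adjacent and parallel:
        return "double-chain-reversal (propeller)"
    return "unclassified: diagonal parallel connection"


print(classify_loop(0, 1, 1, -1))  # adjacent, anti-parallel
print(classify_loop(0, 1, 2, -1))  # opposite, anti-parallel
print(classify_loop(0, 1, 3, 1))   # adjacent, parallel
```

The families are not exhaustive in this encoding: a connection between diagonally opposite parallel strands falls outside the three families described above and is reported as unclassified.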
Furthermore, loop residues can form base-pairing alignments, which in turn stack with the terminal G-tetrads, further stabilizing G-quadruplex structures. These include three bases in a plane, which can be classified either as base triples, where all three bases are non-contiguous in the sequence, or as base triads (19), where two adjacent bases from one strand are involved in the pairing alignment with a base from a second strand (20). Loop conformations can adopt diverse topologies (21,22) making them attractive targets for small molecule-based ligand recognition.
The G-quadruplex topology is defined by four grooves whose dimensions (depth and width) and accessibility vary with both the overall topology and the loop type (edge-wise or diagonal on the one hand, double-chain-reversal on the other). G-quadruplex formation requires monovalent cations, which are positioned within the central channel of stacked G-tetrads, thereby neutralizing the strong electrostatic potential associated with the inwardly pointing guanine O6 oxygens (23). The dehydrated cations are positioned either in a tetragonal bipyramidal coordination between G-tetrad planes (K+) (Figure 1d) (10), or in a range of geometries spanning positions within the G-tetrad planes to out-of-plane alignments (Na+) (24). It has been shown that G-quadruplexes in general prefer K+ over Na+, and that this reflects in part the much greater energetic penalty for Na+ dehydration (25). Finally, the same sequence can adopt different G-quadruplex conformations in Na+ (14) and K+ (26) solution, as determined by NMR and as monitored by fluorescently labeled oligonucleotides (27).
The subject of G-quadruplexes has been extensively reviewed in the literature (28–38). Despite a wealth of crystal and solution structures, it has proved difficult to define a comprehensive set of rules that specify the folding propensity of G-quadruplexes. Therefore, each new guanine-rich telomeric and oncogenic promoter sequence has to be individually structurally characterized as a function of monovalent cation type and, in addition, checked for conformational heterogeneity between two or more topologies in solution.
This review presents a structural biology perspective of recent advances in structures of G-quadruplexes formed by human telomeric and oncogenic promoter G-rich tracts, as well as the potential of small molecules to target specific G-quadruplex folds, thereby setting the stage for structure-based design of new classes of cancer therapeutics. The review also highlights the increasing attention being focused on G-quadruplexes formed by G-rich RNA sequences and their role in mRNA regulation and processing.
Biology of guanine-rich genomic sequences
Guanine-rich tracts are observed in critical segments of eukaryotic and prokaryotic genomes, including promoter regions, short microsatellite and longer minisatellite repeats, and ribosomal DNAs, as well as in telomeres in eukaryotes and immunoglobulin heavy chain switch regions of higher vertebrates. These guanine-rich tracts have the potential to form G-quadruplexes following transient destabilization of the duplex, a process that accompanies transcription, replication and recombination. Systematic algorithmic searches of bacterial and human genomes for guanine-rich tracts (restricted to a minimum of four GGG segments separated by short linkers) (39–41) have noted that such putative G-quadruplex-forming sequences are prevalent in proto-oncogenes (which promote cell proliferation) and essentially lacking in tumor-suppressor genes (which maintain genomic stability) (42).
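Searches of this kind are commonly expressed as a regular expression over the sequence. The sketch below assumes one widely used definition, at least four runs of three or more guanines separated by loops of 1 to 7 nt; the loop-length limit is an assumption for illustration, and the cited surveys may use different parameters. It scans only the strand it is given; a genome-wide search would also scan the complementary strand for the corresponding C-runs.

```python
import re

# Assumed motif: >= 4 G-runs (each >= 3 G) separated by loops of 1-7 nucleotides.
PQS_RE = re.compile(r"(?:G{3,}[ACGTN]{1,7}){3,}G{3,}")


def find_putative_quadruplexes(seq: str) -> list[tuple[int, int, str]]:
    """Return (start, end, match) for non-overlapping motifs; 0-based, end-exclusive."""
    seq = seq.upper()
    return [(m.start(), m.end(), m.group()) for m in PQS_RE.finditer(seq)]


telomere = "TTAGGG" * 4
print(find_putative_quadruplexes(telomere))
print(find_putative_quadruplexes("TTAGGTTAGGTTAGGTTAGG"))  # runs of only two G
```

Because the loop class `[ACGTN]` also admits G, neighboring G-runs can be absorbed into loops; this simple pattern flags candidates and says nothing about which topology, if any, a sequence actually adopts.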
An increasing number of proteins have been identified that bind, promote or non-catalytically disrupt G-quadruplex formation (43–46). Both the β-subunit of the Oxytricha telomere end-binding protein (βTBP) (47) and repressor activator protein 1 (RAP1) in Saccharomyces cerevisiae (48) promote intermolecular G-quadruplex formation. In addition, the MutSα protein, involved in mismatch repair, targets G-quadruplex DNA in G-loop segments and promotes synapsis of transcriptionally activated immunoglobulin switch regions (49). Activation-induced cytosine deaminase (AID) also targets G-quadruplex DNA and plays a role in immunoglobulin class switch recombination (50). On the other hand, binding of POT1, a protein conserved from fission yeast to humans (51), disrupts G-quadruplex formation at telomeric G-rich overhangs (52), thereby promoting telomere extension by telomerase (53).
In addition, helicases catalytically unwind and nucleases cleave G-quadruplexes (43–46). RecQ DNA helicase family members are associated with genomic instability and predisposition to malignancies. The Bloom and Werner syndrome RecQ helicases bind to (54) and unwind intermolecular G-quadruplex scaffolds with a 3' to 5' polarity in the presence of ATP and Mg cations (55,56). Furthermore, G-quadruplex-specific nucleases cut within single-stranded DNA several nucleotides upstream of the G-quadruplex using a structure-specific mode of action (57–60). Gene disruption of such nucleases can lead to cellular senescence and telomere shortening (61). Such cleavage may also be required for DNA recombination and suggests that DNA quadruplexes may play a role in the formation of interchromosomal synapsis.
Strong evidence supporting G-quadruplex formation in vivo comes from the demonstration that in vitro generated single-chain antibody fragments specific for intermolecular telomeric G-quadruplex DNA react with macronuclei, but not the corresponding micronuclei, of the ciliated protozoan Stylonychia lemnae (62). Additional evidence comes from the observation that telomere end-binding proteins control the formation of G-quadruplex DNA structures in vivo (63), and that intracellular transcription of G-rich DNAs induces formation of G-loops, novel structures containing G-quadruplex DNA on the non-template G-rich strand, as verified from nucleolin binding and sensitivity to G-quadruplex-specific nucleases (64). In addition, attempts have been made to monitor G-quadruplex formation at telomere proximal regions of chromosomal DNA using the G-quadruplex-specific fluorescent probe 3,6-bis(1-methyl-4-vinylpyridinium) carbazole diiodide (BMVC) (65). Furthermore, it has been proposed that gene function correlates with the potential for G-quadruplex formation in the human genome (42). Both inter- and intramolecular G-quadruplex formation have also been demonstrated for the diabetes susceptibility locus in the promoter region of the human insulin gene (66). Finally, guanine-rich tracts containing sequences capable of G-quadruplex formation have been shown to induce apoptosis in tumor cells (67–69).
In addition, guanine-rich RNA sequences capable of G-quadruplex formation have been identified in the vicinity of polyadenylation regions (70) involved in regulating 3'-end processing of mammalian pre-mRNAs (71). Such guanine-rich motifs can interact with hnRNP H protein subfamily members, thereby potentially mediating alternative, tissue-specific splicing events. There also appears to be a combinatorial code for splicing silencing which includes a combination of RNA UAGG and GGGG motifs (72). There are several examples of RNA G-quadruplex complexes that impact on pathways ranging from RNA processing, as in the case of the exoribonuclease mXRN1p (73), to translational repression, as in the case of the fd gene 5 protein (74). Guanine-rich tracts have also been observed within neuronal RNAs that bind the RGG-rich domain of the fragile X mental retardation (FXMR) protein (75,76).
HUMAN TELOMERIC G-QUADRUPLEXES
Telomeres, nucleoprotein complexes located at the ends of eukaryotic chromosomes, are composed of tandem DNA repeats of guanine-rich sequences (77). Telomeres are essential for chromosomal stability and genomic integrity, provide sites for recombination events and transcriptional silencing, and appear to play a critical role in cellular aging and cancer (43,78–81). Telomeric DNA ends are composed of both duplex and guanine-rich 3'-overhang segments, with the former progressively decreasing in length after each round of cell division in somatic cells (82). By contrast, telomeric overhangs can be elongated by the enzyme telomerase, a ribonucleoprotein complex with reverse transcriptase activity (83), which is expressed in the majority of cancer cells, thereby helping to maintain telomere length (84).
The pairing of homologous chromatids at their telomere ends can be mediated through bimolecular quadruplex formation (11). Such quadruplex structures may also play a role in chromosome synapsis and recombination during meiosis (85).
The guanine-rich 3'-overhangs of telomeres, such as the TTAGGG repeats in humans, can equilibrate between single-stranded and monovalent cation-mediated G-quadruplex folds, with the latter inhibiting the activity of telomerase. The telomeric ends are maintained in single-stranded form by hPOT1 (86), and disruption of this interaction leads to quadruplex formation. Thus, ligand-induced stabilization of telomeric G-quadruplex scaffolds in humans constitutes a promising strategy for anti-cancer drug development (87–91). Therefore, much effort has been devoted to the structural characterization of G-quadruplex topologies formed by one, two, three and four human telomeric TTAGGG repeats as a function of monovalent cation, so as to define the scaffolds for anti-cancer drug discovery.
Though extensive studies have been undertaken on both ciliate (Tetrahymena and Oxytricha) and other eukaryotic (yeast and human) telomeres, the emphasis in this review will be primarily on human telomeres. Single-molecule fluorescence resonance energy transfer (FRET) studies of the structure and unfolding kinetics of the intramolecular human telomere G-quadruplex revealed two stable folded conformations in both K+ and Na+ buffers (92). Both folded conformations can be opened by addition of a complementary oligonucleotide, with temperature-dependent studies indicating that unfolding is entropically driven in K+ buffers (ΔH = 6.4 kcal mol⁻¹ and ΔS = –52.3 cal mol⁻¹ K⁻¹), while unfolding in Na+ buffers exhibits a more significant enthalpic barrier (ΔH = 14.9 kcal mol⁻¹ and ΔS = –23.0 cal mol⁻¹ K⁻¹). Single-molecule FRET spectroscopy has also been used to probe the dynamics of human telomeric DNA containing four guanine-tracts in K+ solution. Interconversion was detected between three FRET values, interpreted in terms of an unfolded and two folded G-quadruplex states, each of which was further subdivided into long- and short-lived species (93). The short-lived species were shown to determine the overall dynamics, apparently because they bridge transitions between the long-lived G-quadruplex states.
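The contrast between the two buffers can be made concrete with ΔG = ΔH − TΔS. The sketch below treats the quoted ΔH and ΔS values as the enthalpic and entropic terms of the unfolding process, takes both entropy values in cal mol⁻¹ K⁻¹, and evaluates them at an illustrative 298.15 K; that temperature is an assumption, not one stated in the cited study.

```python
T_KELVIN = 298.15  # illustrative temperature (25 C); assumption, not from the study


def barrier_terms(dh_kcal: float, ds_cal_per_k: float, t_kelvin: float = T_KELVIN):
    """Return (dH, -T*dS, dG) in kcal/mol, using dG = dH - T*dS."""
    minus_t_ds = -t_kelvin * ds_cal_per_k / 1000.0  # convert cal to kcal
    return dh_kcal, minus_t_ds, dh_kcal + minus_t_ds


for cation, dh_in, ds_in in [("K+", 6.4, -52.3), ("Na+", 14.9, -23.0)]:
    dh, minus_t_ds, dg = barrier_terms(dh_in, ds_in)
    larger = "entropic (-T*dS)" if minus_t_ds > dh else "enthalpic (dH)"
    print(f"{cation}: dH={dh:.1f}, -T*dS={minus_t_ds:.1f}, "
          f"dG={dg:.1f} kcal/mol; larger term: {larger}")
```

Under these assumptions, −TΔS is about 15.6 kcal mol⁻¹ against ΔH = 6.4 kcal mol⁻¹ in K+, but only about 6.9 kcal mol⁻¹ against ΔH = 14.9 kcal mol⁻¹ in Na+. This is consistent with the entropic versus enthalpic character described above, even though the total ΔG is similar (about 22 kcal mol⁻¹) in both cases.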
Single-repeat sequences
The earliest structural information on the human telomere came from NMR studies of the single-repeat d(TTAGGGT) human telomere sequence in K+ cation solution (4). The NMR data established that the single-repeat human telomere sequence tetramerizes to form an all-parallel-stranded G-quadruplex composed of three stacked G-tetrads, with all guanines adopting anti glycosidic torsion angles.
Two-repeat sequences
The X-ray structure of d(TAGGGTTAGGGT) crystals grown from K+-containing solution defined the architecture of the G-quadruplex formed by the two-repeat human telomere sequence (17). The structure contained an unanticipated all-parallel-stranded G-quadruplex following bimolecular association of the two-repeat human telomere sequences, with the TTA segments forming double-chain-reversal (or propeller) loops (Figure 3a). In addition, the end segments also participate in formation of a T–A–T–A tetrad, through pairing of the major groove edges of Watson–Crick A–T pairs (17).
NMR studies on the two-repeat human telomere sequence d(TAGGGTTAGGGT) demonstrate interconversion between two dimeric G-quadruplex conformers, each consisting of three stacked G-tetrads, in K+ solution (94). One of these conformers adopts a symmetric all-parallel-stranded G-quadruplex with double-chain-reversal loops and all anti guanines (Figure 3b), similar to that observed in the crystal structure (17). This conformer predominates for an analog (designated U6) containing a single dU for T substitution at position 6:
5'-T1AGGG5(dU)TAGG10GT-3'
The other conformer adopts an asymmetric anti-parallel G-quadruplex with edge-wise loops, containing six syn and six anti guanines (Figure 3c). This conformer predominates for an analog (designated U1,brU7) containing dU and dbrU for T substitutions at positions 1 and 7, respectively:
5'-(dU)1AGGG5T(dbrU)AGG10GT-3'
NMR-based complementary-strand trap, concentration-jump and temperature-jump methods have been used to monitor the kinetics of interconversion and the activation barriers between the parallel and anti-parallel G-quadruplex conformers (94). For the U1,brU7 sequence, the equilibrium shifts towards the anti-parallel G-quadruplex (Figure 3c) at low temperature and towards the parallel G-quadruplex (Figure 3b) at high temperature, with a corresponding enthalpy of 18.5 kcal mol⁻¹. Furthermore, at temperatures below 40°C the anti-parallel G-quadruplex folds faster, but unfolds more slowly, than the parallel G-quadruplex.
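The temperature dependence of this equilibrium follows the integrated van't Hoff equation, ln(K₂/K₁) = −(ΔH/R)(1/T₂ − 1/T₁). The sketch below assumes that K = [parallel]/[anti-parallel], that the reported 18.5 kcal mol⁻¹ is the enthalpy of the anti-parallel to parallel conversion (a positive value consistent with the shift towards the parallel form on heating), and that ΔH does not vary with temperature; the 25°C and 40°C endpoints are chosen for illustration only.

```python
import math

R_KCAL = 1.98720e-3  # gas constant, kcal mol^-1 K^-1


def vant_hoff_ratio(dh_kcal: float, t1_kelvin: float, t2_kelvin: float) -> float:
    """K(T2)/K(T1) from the integrated van't Hoff equation, assuming constant dH."""
    return math.exp(-dh_kcal / R_KCAL * (1.0 / t2_kelvin - 1.0 / t1_kelvin))


# K = [parallel]/[anti-parallel]; dH = +18.5 kcal/mol is taken (assumed sign
# convention) as the enthalpy of the anti-parallel -> parallel conversion.
ratio = vant_hoff_ratio(18.5, 298.15, 313.15)  # 25 C -> 40 C, illustrative
print(f"K increases {ratio:.1f}-fold between 25 and 40 C")
```

Under these assumptions the parallel/anti-parallel ratio rises roughly 4.5-fold over this 15°C interval, which illustrates how a positive enthalpy favors the parallel conformer at higher temperature.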
A related conformational equilibrium has also been observed between a pair of bimolecular G-quadruplexes formed by the d(TGGGGTTGGGGT) two-repeat Tetrahymena sequence in Na+-containing solution (95).
Three-repeat sequences
NMR-based studies have defined the folding topology (Figure 4a) and solution structure (Figure 4b) of the three-repeat human telomere sequence d[G3(T2AG3)2T] in Na+ solution (96):
5'-G1GGTT5AGGGT10TAGGG15T-3'
This sequence forms a unique asymmetric bimolecular quadruplex, in which the core, composed of three stacked G-tetrads, involves all three G-tracts from one strand and only the last G-tract of the second strand. In this (3+1) G-quadruplex assembly, there are one syn–syn–syn–anti and two anti–anti–anti–syn G-tetrads, two edge-wise loops, and three G-tracts oriented in one direction with the fourth oriented in the opposite direction (Figure 4a).
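The compact notation used for these sequences writes repeat counts after a base or a parenthesized group (subscripts in the printed form), so d[G3(T2AG3)2T] stands for GGG followed by two TTAGGG repeats and a final T. The hypothetical helpers below (names are illustrative, Python 3.9+) expand that shorthand and strip the 5'/3' labels and residue numbers from a numbered sequence, so the two representations can be cross-checked.

```python
import re


def expand_notation(notation: str) -> str:
    """Expand shorthand such as 'd[G3(T2AG3)2T]' into 'GGGTTAGGGTTAGGGT'.

    Digits after a base or a parenthesized group are repeat counts.
    """
    s = notation.strip()
    if s.startswith("d[") and s.endswith("]"):
        s = s[2:-1]

    def parse(i: int) -> tuple[str, int]:
        parts = []
        while i < len(s) and s[i] != ")":
            if s[i] == "(":
                unit, i = parse(i + 1)
                i += 1  # step past the closing ')'
            elif s[i] in "ACGTU":
                unit, i = s[i], i + 1
            else:
                raise ValueError(f"unexpected character {s[i]!r} at index {i}")
            j = i
            while j < len(s) and s[j].isdigit():
                j += 1
            parts.append(unit * (int(s[i:j]) if j > i else 1))
            i = j
        return "".join(parts), i

    expanded, end = parse(0)
    if end != len(s):
        raise ValueError("unbalanced parentheses")
    return expanded


def strip_numbering(numbered: str) -> str:
    """Remove the 5'-/-3' labels and embedded residue numbers from a sequence."""
    core = numbered.strip().removeprefix("5'-").removesuffix("-3'")
    return re.sub(r"\d+", "", core)


short = expand_notation("d[G3(T2AG3)2T]")
print(short, short == strip_numbering("5'-G1GGTT5AGGGT10TAGGG15T-3'"))
```

The same check applies to the four-repeat sequences discussed below, for example d[AG3(T2AG3)3] against its numbered 22-nt form.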
The (3+1) G-quadruplex topology adopted by the three-repeat human telomere sequence establishes how a segment containing three G-tracts can bind to the 3'-end G-tract of another segment. Such quadruplex formation could occur within the 3'-end overhang of human telomeres or when the 3'-end invades the adjacent double-stranded segment of the telomere to form the so-called t-loop (see schematic in Figure 4c) (97).
Earlier studies on four-repeat sequences
In 1993, the NMR-based folding topology (Figure 5a) and solution structure (Figure 5b) of the four-repeat human telomeric sequence d[AG3(T2AG3)3] were determined in Na+ cation solution (9):
5'-A1GGGT5TAGGG10TTAGG15GTTAG20GG-3'
The intramolecular fold contained three stacked G-tetrads connected by successive edge-wise, diagonal and edge-wise TTA loops. Each guanine-tract had both parallel and anti-parallel aligned neighboring strands around the G-quadruplex, with guanines adopting syn–syn–anti–anti glycosidic torsion alignments around each G-tetrad. The grooves were accessible for further recognition within this topology, while the connecting loops restricted access to the outward-directed faces of the terminal G-tetrads at both ends. Finally, the 5'- and 3'-termini project toward the same end of the G-quadruplex (Figure 5a).
The X-ray structure of d[AG3(T2AG3)3] crystals grown from K+ cation solution exhibited a completely different and unanticipated fold (Figure 5c) and structure (Figure 5d) for the intramolecular G-quadruplex (17). The G-quadruplex was composed of three stacked G-tetrads, such that all strands are parallel, all guanines adopt anti conformations and all three loops are of the double-chain-reversal (or propeller) type. The double-chain-reversal loops restrict access to three of the grooves, while access is available to the outward-directed faces of the terminal G-tetrads at both ends. Finally, the 5'- and 3'-termini project toward opposite ends of the G-quadruplex (Figure 5c), thereby facilitating potential end-to-end alignments of successive G-quadruplexes.
These very different conformers reported for the four-repeat human telomeric sequence in Na+-containing aqueous solution (9) and in K+-containing crystals (17) appear to highlight the polymorphic character of G-quadruplex scaffolds (93) as a function of medium and/or monovalent cation type. Nevertheless, accumulating evidence, including biophysical measurements (98), implied that the intramolecular parallel-stranded G-quadruplex observed in K+-containing crystals is unlikely to be the major form of the human telomere in K+-containing aqueous solution. Three groups have therefore recently systematically investigated the solution structure(s) of four-repeat human telomeric sequences in K+ cation solution, while keeping in mind that the more crowded environment of the crystal may more closely reflect the crowded situation in the cell nucleus.
More recent studies on four-repeat sequences
The imino proton NMR spectrum of d[AG3(T2AG3)3] in K+ cation solution is indicative of multiple conformations in equilibrium and hence this sequence context is not readily amenable to structural characterization. Three research groups (those of Hiroshi Sugiyama, Danzhou Yang and our group) have taken somewhat different approaches to overcome this limitation and recently contributed to determination of the solution structure(s) of four-repeat human telomeres in K+ solution. Our group's approach is outlined in detail below and these results are placed in the context of independent contributions from the other two groups.
Imino proton NMR spectra showing a distinct predominant conformer, together with one or more minor conformers, were observed in K+ cation solution for the d[TAG3(T2AG3)3] sequence, in which a T was added at the 5'-end (99), and for the d[TAG3(T2AG3)3TT] sequence, in which a T was added at the 5'-end and TT at the 3'-end (100). In both cases the sequence context of the TTAGGG human telomere repeat was maintained.
5'-T1AGGG5TTAGG10GTTAG15GGTTA20GGG(TT)-3'
The NMR-based folding topology was determined for the predominant conformer of the d[TAG3(T2AG3)3] sequence in K+ cation solution (Figure 6a), and the solution structure was determined for an analog with terminal modifications, d[TTG3(T2AG3)3A] (A2 replaced by T and an A added at the 3'-end), which yielded exceptionally well-resolved NMR spectra reflecting a single conformer, with the same 2D spectral characteristics as the unmodified sequence (99). Similarly, insertion of a single 8-bromoguanine at position G16 in the d[TAG3(T2AG3)3] sequence to enforce a syn glycosidic bond at this position also resulted in NMR spectra corresponding to a single conformer with all the spectral characteristics of the unmodified sequence (101). The solution structure has been determined for the d[TAG3(T2AG3)3] G-quadruplex (designated human telomere G-quadruplex form-1) (Figure 6b) (101), whose (3+1) topology differs from the folds reported previously in Na+ solution (Figure 5a) (9) and in K+-containing crystals (Figure 5c) (17). This G-quadruplex instead contains three G-tracts oriented in one direction and the fourth in the opposite direction, one anti–syn–syn–syn and two syn–anti–anti–anti G-tetrads, and a double-chain-reversal loop followed by two edge-wise loops (99).
The same G-quadruplex folding topology (Figure 6a) has been independently reported for four-repeat human telomere sequences in K+-containing solution by two other laboratories, one of which used NMR (102,103), while the other used both CD (104) and NMR (105). The NMR investigation by the former group focused on the sequence d[AAAG3(T2AG3)3AA], whose (3+1) topology (102) is stabilized by a stacked A–A–A triple (103) associated with the additional adenines introduced at either end of the sequence. The latter group's research avoided terminal modifications and was based on judicious positioning of four or five 8-bromoguanine substitutions, which enforce syn glycosidic conformations at the corresponding guanines in the sequence (104,105).
The NMR-based folding topology has also been determined for the predominant conformer of the d[TAG3(T2AG3)3TT] sequence in K+ cation solution (100). This sequence adopts the same (3+1) G-quadruplex core topology as the predominant conformer of d[TAG3(T2AG3)3] in K+ cation solution (99) outlined above, except that the first two linkers are edge-wise loops and the last linker is a double-chain-reversal loop (designated human telomere G-quadruplex form-2) (Figure 6c). Insertion of a single 8-bromoguanine at position G15 to enforce a syn glycosidic bond at this position resulted in NMR spectra corresponding to a single conformer with all the spectral characteristics of the unmodified sequence (101). The solution structure of the d[TAG3(T2AG3)3TT] G-quadruplex form-2 is shown in Figure 6d (101). An independent NMR-based study (106) has reached the same conclusions regarding the folding topology (100) and solution structure (101) of form-2.
The demonstration of G-quadruplex forms 1 (Figure 6a) and 2 (Figure 6c) for the four-repeat human telomere in K+, together with the all-parallel-stranded, propeller-groove-linked G-quadruplex observed in crystals grown from K+ solution (Figure 5c) (17), support the view that multiple human telomeric G-quadruplex conformers can coexist in K+-containing solution, a conclusion reached from single molecule FRET studies of the four-repeat human telomere sequence (92). Furthermore, these studies establish that even small changes to flanking sequences perturb the equilibrium between different coexisting (3+1) G-quadruplex forms. More recent research has attempted to monitor G-quadruplex formation by the four-repeat human telomere in K+ solution under polyethylene glycol-induced crowding conditions (107) that perhaps mimic crystallization conditions.
(3 + 1) G-quadruplex fold
The (3 + 1) G-quadruplex scaffold is unique in that three strands are oriented in one direction and the fourth is oriented in the opposite direction. Furthermore, two of the three G-tetrads adopt anti–anti–anti–syn alignments, while the remaining G-tetrad adopts a syn–syn–syn–anti alignment. This topology was first reported in 1994 for the four-repeat Tetrahymena telomere sequence, d(T2G4)4, in Na+ solution (7), and was observed a decade later for a four-guanine-repeat variant bcl-2 promoter sequence in K+ solution in which two guanines were replaced by thymines (108) (see bcl-2 sequence section).
The adoption of the (3 + 1) core G-quadruplex by the three-repeat human telomere dimeric G-quadruplex in Na+ solution (Figure 4a) (96), as well as by the four-repeat human telomere G-quadruplexes form-1 (Figure 6a) and form-2 (Figure 6c) in K+ solution, establishes it as a robust folding topology, thereby highlighting its candidacy as an important platform for structure-based drug design.
ONCOGENIC PROMOTER G-QUADRUPLEXES
Bioinformatics sequence analysis indicates that guanine-rich tracts capable of G-quadruplex formation are prevalent in the human genome (39–41). In addition, it has recently been shown that promoter regions spanning 1 kb upstream of gene transcription start sites are significantly enriched in putative G-quadruplex-forming motifs, and that these putative promoter G-quadruplex-forming regions are strongly associated with nuclease hypersensitivity sites (109). It has been suggested that such promoter-based G-quadruplexes may be directly involved in gene regulation at the level of transcription (110). This has led to extensive investigations of the role of promoter-mediated G-quadruplex formation in transcriptional regulation of the oncogenic promoters of c-myc (111), VEGF (112), HIF-1α (113), bcl-2 (114) and c-kit (115,116). Since promoter regions are part of DNA duplexes, they would need to be unwound, for example during replication, before a G-quadruplex can form. Support for this concept has emerged from single-molecule FRET studies on the c-kit promoter (117). This process could be facilitated by formation of single-stranded tracts during transcription and further stabilized through addition of G-quadruplex-stabilizing ligands (118).
Earlier studies on c-myc sequence
Human c-myc is a transcription factor that is central to regulation of cell growth, proliferation, differentiation and apoptosis (119–121). The c-myc gene that encodes this protein is tightly regulated in normal cells, and its aberrant overexpression is associated with the progression of many cancers (122). c-myc can be deregulated as a result of translocation, mutation and/or amplification. An important element in the c-myc promoter region, termed the nuclease hypersensitivity element III₁ (NHE III₁), controls up to 90% of total c-myc transcription (123). The 27-nt purine-rich strand of this element contains six guanine-tracts:
5'-T1GGGG5AGGGT10GGGGA15GGGTG20GGGAA25GG-3'
This strand has the capacity to form alternate G-quadruplex folds depending on which tracts participate in scaffold formation (111,124,125). Guanine to adenine mutants within the 27-nt c-myc segment that destabilize G-quadruplex formation result in increased c-myc transcription, while ligands such as the porphyrin TMPyP4 that stabilize G-quadruplex formation result in decreased c-myc transcription (111).
The imino proton NMR spectrum of the 27-nt c-myc NHE III₁ segment containing six guanine-tracts exhibited characteristics of multiple G-quadruplex folds in equilibrium, including a broad envelope characteristic of aggregated species, precluding structural characterization. Therefore, systematic NMR studies have been restricted to four- and five-guanine-tract sequences as part of an effort towards understanding the underlying principles contributing to c-myc G-quadruplex formation.
Initial efforts focused on G-quadruplexes that can form through the involvement of four of the six guanine-tracts of the 27-mer c-myc NHE III₁ element. Over 50 sequence variants were screened before two were identified whose imino proton spectra reflected distinct single conformers and so justified further structural characterization (126). One of these involved the second, third, fourth and fifth guanine-tracts (designated c-myc-2345), as reflected in the sequence
TG5AGGGT10GGGGA15GGGTG20GGGAA25
while the other involved the first, second, fourth and fifth guanine-tracts (designated variant c-myc-1245), with the guanines of the third tract replaced by thymines (in bold, below), as reflected in the sequence
T1GGGG5AGGGT10TTTTA15GGGTG20GGGA
The resulting NMR-based intramolecular G-quadruplex folding topologies in K+ solution for both c-myc-2345 and the thymine-substituted variant c-myc-1245 contain a core of three stacked G-tetrads formed by four parallel G-tracts with all-anti guanines, and three double-chain-reversal loops bridging the G-tetrad layers (126). The c-myc-2345 fold is shown in Figure 7a and the variant c-myc-1245 fold in Figure 7b. These studies establish that single-residue (A or T) double-chain-reversal loops can bridge three G-tetrad layers. Indeed, systematic studies of DNA quadruplexes with different arrangements of short and long loops confirm that single-residue loops favor parallel-stranded topologies (127). Of the two G-quadruplex folds, c-myc-2345, which has a two-residue central loop (Figure 7a), is more stable in K+ solution by 15°C than variant c-myc-1245, which has a six-residue central loop (Figure 7b). Consistent with this, the imino proton exchange lifetimes of the central G-tetrads are longer for c-myc-2345 than for variant c-myc-1245, suggesting slower unfolding kinetics for the former G-quadruplex (126).
An NMR-based solution structure has been reported for a variant c-myc-2345 sequence in which guanines G14 and G23 have been replaced by thymines (in bold, below)
TG5AGGGT10GGGTA15GGGTG20GGTAA25
The solution structure of this variant c-myc-2345 (Figure 7c) (128) adopts the same topology established previously for unmodified c-myc-2345 (Figure 7a) (126).
The NMR-based G-quadruplex topologies of c-myc-2345 (Figure 7a) and variant c-myc-1245 (Figure 7b) (126), together with the related solution structure of variant c-myc-2345 (Figure 7c) (128), correct earlier conclusions, in an otherwise highly cited contribution (111), regarding c-myc folding topologies that had been proposed solely on the basis of footprinting data.
More recent studies on c-myc sequence
The variant c-myc-1245 (126) and variant c-myc-2345 (128) sequences replace guanines by thymines within G-rich tracts. Unlike inosine, thymine is not a close analog of guanine, so thymine-for-guanine substitutions represent a significant perturbation of the wild-type c-myc sequence. Structural studies were therefore extended to the c-myc sequence containing five of the six guanine-tracts of the 27-mer c-myc NHE III₁ element, while avoiding any thymine-for-guanine substitutions. This sequence (designated c-myc-23456)
5'-TG5AGGGT10GGGGA15GGGTG20GGGAA25GG-3'
is composed of the second, third, fourth, fifth and sixth guanine-tracts. The NMR-based folding topology (Figure 8a) and solution structure (Figure 8b) of the c-myc-23456 G-quadruplex in K+ solution reveal three stacked G-tetrads formed by four parallel guanine-tracts with all-anti guanines, together with a snap-back 3'-end syn guanine (129). The guanines involved in G-tetrad formation are highlighted in bold below
5'-TG5AGGGT10GGGGA15GGGTG20GGGAA25GG-3'
and involve guanines from each of the five tracts. This snap-back configuration is facilitated by a stable diagonal loop containing a G–(A–G) triad that stacks on and caps the G-tetrad core at one end of the G-quadruplex. The 5'- and 3'-ends of the sequence are at opposite ends of the snap-back c-myc-23456 G-quadruplex (Figure 8a) (129), as they are for the c-myc-2345 (Figure 7a) and variant c-myc-1245 (Figure 7b) G-quadruplexes (126).
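All four c-myc constructs discussed above are related to the wild-type 27-mer by truncation and base substitution. The minimal Python sketch below (the helpers `derive` and `strip_labels` are hypothetical) rebuilds each construct from the wild-type sequence using the 1-based numbering of the text and checks it against the sequence as printed. The printed c-myc-2345, variant c-myc-2345 and c-myc-23456 sequences begin with a T numbered 4; modeling that T as replacing wild-type G4 is an assumption read off the printed numbering.

```python
import re

# c-myc NHE III1 27-mer, positions 1-27, as printed in the text.
WILD_TYPE = "TGGGGAGGGTGGGGAGGGTGGGGAAGG"


def derive(seq, first, last, substitutions=None):
    """Apply {position: base} substitutions (1-based), then keep positions first..last."""
    bases = list(seq)
    for position, base in (substitutions or {}).items():
        bases[position - 1] = base
    return "".join(bases[first - 1:last])


def strip_labels(printed):
    """Drop the position numbers embedded in the printed sequences."""
    return re.sub(r"\d", "", printed)


# Assumption: the 5'-T numbered 4 is modeled as replacing wild-type G4,
# which lies in the first G-tract (not used by these constructs).
constructs = {
    "c-myc-2345": (derive(WILD_TYPE, 4, 25, {4: "T"}),
                   "TG5AGGGT10GGGGA15GGGTG20GGGAA25"),
    "c-myc-1245": (derive(WILD_TYPE, 1, 24, {11: "T", 12: "T", 13: "T", 14: "T"}),
                   "T1GGGG5AGGGT10TTTTA15GGGTG20GGGA"),
    "variant c-myc-2345": (derive(WILD_TYPE, 4, 25, {4: "T", 14: "T", 23: "T"}),
                           "TG5AGGGT10GGGTA15GGGTG20GGTAA25"),
    "c-myc-23456": (derive(WILD_TYPE, 4, 27, {4: "T"}),
                    "TG5AGGGT10GGGGA15GGGTG20GGGAA25GG"),
}

for name, (derived, printed) in constructs.items():
    print(f"{name:<20} {derived:<25} matches text: {derived == strip_labels(printed)}")
```

Writing each construct as a position range plus a substitution map makes explicit how far it departs from the wild-type sequence, which is the concern raised above about thymine-for-guanine substitutions.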
c-kit sequences
The proto-oncogene c-kit encodes a tyrosine kinase receptor and thereby regulates signal transduction cascades that control cell growth and proliferation (130). Oncogenic cellular transformations involving c-kit are associated with mutations in structurally important regions: human gastrointestinal stromal tumors (GIST) are associated with mutations around the two main autophosphorylation sites in the juxtamembrane region (131), whereas myeloid leukemias and human germ cell tumors are associated with kinase domain mutants (132). The drug Gleevec (imatinib) is an effective in vitro and in vivo inhibitor of c-kit kinase activity and is widely used clinically against GIST (133). As with other small-molecule drugs targeted against kinases, however, new resistance mutations within the active site diminish drug binding and clinical effectiveness (134).
Selective gene regulation at the transcription level provides an alternate approach to c-kit inhibition. This can be achieved by induction of G-quadruplex structures within G-rich tracts of the c-kit promoter and their potential stabilization by bound ligands. Recently, imino proton NMR spectral studies established that the c-kit1 22-mer sequence
5'-A1GGGA5GGGCG10CTGGG15AGGAG20GG-3'
positioned between –87 and –109 nt upstream of the transcription start site of the human c-kit gene, forms a single G-quadruplex scaffold in K+ solution (115). The expectation that this sequence, which contains four GGG tracts (underlined, above), would form a conventional G-quadruplex was called into question when mutations within the linker segments were found to be detrimental to G-quadruplex formation (115). A second highly conserved guanine-rich sequence has also been identified recently in the c-kit gene, at a site critical for core promoter activity (116).
The NMR-based solution structure of the 22-mer c-kit1 sequence has been determined in K+ solution (135). The c-kit1 sequence, which exhibits an exceptionally well-resolved NMR spectrum (115), adopts a G-quadruplex topology (Figure 8c) and solution structure (Figure 8d) composed of three stacked G-tetrads and four connecting loops. The guanines involved in G-tetrad formation (in bold, below) include the isolated guanine G10 but exclude G20 of the last G-tract.
5'-A1GGGA5GGGCG10CTGGG15AGGAG20GG-3'
Two single-residue linkers (A5 and C9) form two double-chain-reversal loops that bridge three G-tetrad layers, the two-residue linker connects two adjacent corners (G10 and G13), and the five-residue linker allows the terminal G21–G22 step to insert back into the G-quadruplex core. The loops are stabilized by a Watson–Crick A–T pair that stacks over the top of the G-quadruplex and by two non-canonical G–A pairs that stack over the bottom.
This structure establishes a new folding principle: an isolated guanine (G10 in the present case) within a non-G-tract segment can participate in forming the structured G-quadruplex core (135). This result calls for caution in using programs that predict G-quadruplex folding topologies from sequence data when such programs rely solely on the participation of guanines within G-tracts. Another notable feature is the snap-back parallel-stranded G-quadruplex core, in which the last two guanines insert back into the core to complete adjacent G-tetrad alignments (Figure 8c). The 5'- and 3'-ends of the sequence are at opposite ends of the snap-back c-kit1 G-quadruplex, thereby allowing the DNA to continue in both directions without significant steric hindrance.
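To make this caution concrete, the minimal Python sketch below applies a G3+N1–7 consensus pattern (four runs of at least three guanines separated by loops of one to seven residues, of the kind used by G-tract-based predictors; the exact rules of any particular program are assumed, not reproduced) to the c-kit1 sequence. The helper `predicted_core_guanines` is hypothetical. The pattern places G20–G22 in the fourth tract and leaves the isolated G10 in a loop, whereas the NMR structure incorporates G10 into the core and excludes G20.

```python
import re

# c-kit1 22-mer as printed in the text, position labels removed (A1 ... G22).
C_KIT1 = "AGGGAGGGCGCTGGGAGGAGGG"

# Assumed consensus: four runs of >= 3 G separated by loops of 1-7 nt.
# Each G-run is captured as its own group.
G4_PATTERN = re.compile(
    r"(G{3,})[ACGT]{1,7}(G{3,})[ACGT]{1,7}(G{3,})[ACGT]{1,7}(G{3,})"
)


def predicted_core_guanines(seq):
    """Return 1-based positions of guanines in the four G-runs of the first match."""
    match = G4_PATTERN.search(seq)
    if match is None:
        return set()
    positions = set()
    for group in range(1, 5):
        start, end = match.span(group)  # 0-based start, exclusive end
        positions.update(range(start + 1, end + 1))
    return positions


predicted = predicted_core_guanines(C_KIT1)
print("predicted core guanines:", sorted(predicted))
# NMR structure (135): G10 is in the core and G20 is not.
print("G10 in predicted core:", 10 in predicted)  # False
print("G20 in predicted core:", 20 in predicted)  # True
```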
Both the c-myc-23456 (Figure 8a) (129) and c-kit1 (Figure 8c) (135) scaffolds contain distinct, pronounced clefts, and their unique surface topologies make them attractive targets for site-selective drugs.
bcl-2 sequence
The bcl-2 gene is involved in the t(14;18) chromosomal translocation associated with the onset of lymphomas (136,137). The bcl-2 gene is overexpressed in several human cancers; its gene product functions as an apoptosis inhibitor and thereby impairs the therapeutic action of cancer treatment regimens in the clinic (138). Both the bcl-2 gene and its gene product therefore constitute rational targets for anticancer therapy.
Transcriptional initiation of bcl-2 is controlled by the major promoter P1, which contains a guanine-rich strand upstream of the initiation site and proximal to a nuclease hypersensitive region (114). This bcl-2 promoter region contains six guanine-tracts of three or more contiguous guanines (underlined)
5'-GGGGCG1GGCG5CGGGA10GGAAG15GGGGC20GGGAG25CGGGG-3'
with non-denaturing gel, footprinting and cd data interpreted in terms of a mixture of at least three G-quadruplex conformers in K+ solution. The second to fifth G-tracts (designated bcl-2 2345) form the most stable G-quadruplex (114), and an attempt has been made to investigate structurally this sequence composed of the four central guanine-tracts. The NMR studies were undertaken on a variant in which guanines G15 and G16 were replaced by thymines (in bold, below) (108).
5'-G1GGCG5CGGGA10GGAAT15TGGGC20GGG-3'
This thymine-substituted variant of bcl-2 2345 adopts a (3 + 1) G-quadruplex topology (108) (Figure 9a) and solution structure (139) (Figure 9b).
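In the printed bcl-2 sequence, residue 1 appears to be the first guanine of the second G-tract, so the 5'-terminal GGGGC is unnumbered (a reading of the printed labels). Under that assumption, the minimal Python sketch below locates the six tracts of three or more guanines, extracts the segment spanning the second to fifth tracts (bcl-2 2345), applies the G15T and G16T substitutions and compares the result with the variant sequence printed above. The constant `OFFSET` is hypothetical.

```python
import re

# bcl-2 P1 promoter G-rich strand as printed in the text, labels removed.
BCL2 = "GGGGCGGGCGCGGGAGGAAGGGGGCGGGAGCGGGG"
OFFSET = 5  # assumed string index of residue 1 (5'-GGGGC is unnumbered)

tracts = [(m.start(), m.end()) for m in re.finditer(r"G{3,}", BCL2)]
print("tracts of >= 3 G:", len(tracts))

# Segment spanning the second to fifth tracts (bcl-2 2345).
start, end = tracts[1][0], tracts[4][1]
segment = list(BCL2[start:end])
first_residue = start - OFFSET + 1
print("bcl-2 2345 spans residues", first_residue, "to", end - OFFSET)

# Thymine-for-guanine substitutions at G15 and G16.
for residue in (15, 16):
    assert segment[residue - first_residue] == "G"  # sanity check on numbering
    segment[residue - first_residue] = "T"
variant = "".join(segment)

printed_variant = re.sub(r"\d", "", "G1GGCG5CGGGA10GGAAT15TGGGC20GGG")
print(variant, "matches text:", variant == printed_variant)
```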
The same (3 + 1) G-quadruplex scaffold was first reported over a decade ago for the four-repeat Tetrahymena telomere sequence, d(T2G4)4,
5'-T1TGGG5GTTGG10GGTTG15GGGTT20GGGG-3'
in Na+ solution; its unanticipated folding topology containing a double-chain-reversal loop (Figure 9c) and its solution structure (Figure 9d) (7) were considered an anomaly at that time.
Replacement of single guanines by inosines, in which the exocyclic amino group is replaced by a proton, has been used previously in NMR-based studies of G-quadruplex formation to improve spectral quality (96). By contrast, replacement of two guanines by thymines in variant bcl-2 2345 constitutes a much more serious perturbation, especially within an internal guanine-tract, since it prevents these two guanines from potentially participating in G-tetrad formation. Opportunities therefore exist for structurally investigating unperturbed bcl-2 oncogenic promoter sequences, perhaps involving five of the six guanine-tracts, as was accomplished previously for c-myc-23456 (129).
VEGF and HIF-1α sequences
Vascular endothelial growth factor (VEGF) stimulates the formation of new blood vessels that supply oxygen and nutrients to primary tumor sites, thereby facilitating the proliferation of cancer cells. VEGF-mediated tumor angiogenesis has stimulated interest in the VEGF gene as a potential target for cancer therapy (140). Elevated VEGF expression in cancer is regulated primarily at the transcription level, and the VEGF promoter contains a purine-rich strand composed of five guanine-tracts of at least three guanines each (underlined)
5'-G1GGGC5GGGCC10GGGGG15CGGGG20TCCCG25GCGGG30G-3'
that also contains binding sites for the Sp1 and Egr-1 transcription factors. The guanine-rich VEGF sequence forms G-quadruplex structures in monovalent cation solution (as monitored by cd and footprinting measurements), which are stabilized by the G-quadruplex-interacting agents TMPyP4 and telomestatin (112). In addition, a DNase I and S1 nuclease hypersensitive site was identified on the 3'-side of the G-quadruplex-forming region, but not in mutant sequences that inhibit quadruplex formation. Finally, the cd spectrum of the guanine-rich VEGF sequence in K+ solution is consistent with formation of a parallel-stranded G-quadruplex. Overall, the results point to the importance of structural transitions in enhancing open promoter complex formation, thereby facilitating transcriptional regulation (112).
Hypoxia inducible factor-1α (HIF-1α) is activated in many common human tumors and is associated with local invasion and metastasis (141). The HIF-1α promoter contains five guanine-rich tracts of at least three guanines each (underlined)
5'-G1GGCG5CGCGG10GGAGG15GGAGA20GGGGG25CGGG-3'
capable of forming an all-parallel-stranded G-quadruplex in K+ solution, as indicated by chemical probing, cd and DNA polymerase arrest assays (113). Considerable effort has gone into targeting HIF-1α in cancer therapy (142).
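Both promoter strands are described as containing five guanine-tracts of at least three guanines each. The minimal Python sketch below counts such runs in the two sequences as printed (position labels removed); the helper `count_g_tracts` is hypothetical. A simple run count says nothing about which tracts actually participate in a G-quadruplex, which remains to be established structurally.

```python
import re

# Promoter G-rich strands as printed in the text (position labels removed).
PROMOTERS = {
    "VEGF": "GGGGCGGGCCGGGGGCGGGGTCCCGGCGGGG",
    "HIF-1alpha": "GGGCGCGCGGGGAGGGGAGAGGGGGCGGG",
}


def count_g_tracts(seq, min_len=3):
    """Count runs of at least min_len consecutive guanines."""
    return len(re.findall("G{%d,}" % min_len, seq))


for name, seq in PROMOTERS.items():
    print(f"{name}: {len(seq)} nt, {count_g_tracts(seq)} tracts of >= 3 G")
```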
To date, no systematic structural investigations have been undertaken to determine the G-quadruplex structures adopted by the guanine-rich tracts of either the VEGF or HIF-1α promoters.
| TRIPLET REPEAT DISEASE G-QUADRUPLEXES |
|---|
A series of nucleotide (or repeat) expansion disorders, caused by the dynamic intergenerational expansion of the triplet repeats d(CGG)n–d(CCG)n, d(CAG)n–d(CTG)n and d(GAA)n–d(TTC)n, are associated with neurological, neuromuscular and neurodegenerative disease (143,144). These diseases exhibit genetic anticipation, whereby in successive generations the symptoms appear at an earlier age of onset and with increased severity and penetrance. The expandable repeats are found in diverse settings, ranging from coding segments to 5'- and 3'-UTRs, promoter regions and introns. The pathogenesis of these debilitating diseases, and their disruption of the cellular replication, repair and recombination machinery, likely reflects unusual DNA conformations adopted by long repeats, for which several secondary structure models have been proposed in the literature (145–149). These guanine-containing repeats within the complementary repetitive strands of the duplex can form slip-out hairpin-like folds (150), which in turn could form higher-order architectures, including quadruplexes following bimolecular association. One repeat expansion model proposes that such higher-order structures stall the replication fork, allowing time for the addition of extra repeats before the replication fork restarts (151).
Although early work on triplet expansion diseases focused on the DNA template, more recent analysis has brought RNA repeats to the forefront, emphasizing gain-of-function contributions at the RNA level (152). Structural studies therefore need to be undertaken on both triplet repeat-containing DNAs and RNAs.
CGG triplet repeats
There has been considerable interest in the molecular basis for the expansion of d(CGG)n–d(CCG)n tracts in genomic DNA that results in the onset of the FXMR syndrome (153,154), the single most common inherited cause of mental retardation (155). The d(CGG)n triplet repeat (which can be designated a CGG, GGC or GCG repeat depending on the phase of the readout) is found within the first exon of the FMR-1 gene, with n < 30 in normal individuals. This number increases to up to 200 repeats in premutation carriers and expands further to as many as 2000 repeats in individuals afflicted with fragile X syndrome. The genetic instability associated with the expansion of d(CGG)n repeats to the diseased state is facilitated by hypermethylation of cytosine residues (156) and results in suppression of FMR-1 gene transcription (154) and delayed replication in patients with the FMR-1 syndrome (157). It was initially shown that the fragile X syndrome d(CGG)n repeat forms a stable G-quadruplex in the presence of monovalent cations when n = 7, and also when n = 5 for its methylated cytosine counterpart (158). In addition, d(CGG)n repeats form structures that block DNA synthesis in vitro (159), a block that is overcome by the Werner syndrome (WRN) helicase (160). Interestingly, the cationic porphyrin TMPyP4 (161) and the hnRNP-related protein CBF-A (162,163) both destabilize this quadruplex, in contrast to their stabilization of the human telomere G-quadruplex.
Very high-quality NMR spectra recorded in Na+ solution for d(GCGGT3GCGG), a sequence that embeds CGG and GCG steps, defined a distinct folding topology (Figure 10a) and solution structure (Figure 10b) (164).
5'-G1CGGT5TTGCG10G-3'
The sequence forms a bimolecular quadruplex in solution containing G–C–G–C tetrads (Figure 10c) flanked by G–G–G–G tetrads. The loops adopt edge-wise conformations and are aligned at opposite ends of the bimolecular quadruplex, the strand directionalities alternate around the quadruplex, and the G-tetrads adopt anti–syn–anti–syn alignments (Figure 10a). These studies establish pairing alignments that could potentially be used by sequences containing the fragile X syndrome d(CGG)n triplet repeat to form quadruplex structures. Such quadruplexes, stabilized by a mixture of G–C–G–C and G–G–G–G tetrads [see also (165) for an alternative, but not structurally characterized, quadruplex model], could serve as blockage sites for the progress of replication forks and account for the blockage at the fragile X locus observed experimentally (157).
GAA triplet repeats
The d(GAA)n repeat is of considerable biological interest because expansion of d(GAA)n–d(TTC)n triplet repeats within the first intron of the frataxin gene contributes to Friedreich's ataxia, an autosomal recessive neurodegenerative disease (166). The non-G–C-rich nature of the sequence, together with its intronic localization and the requirement that both alleles be affected, makes Friedreich's ataxia unique among the triplet repeat diseases. Expansion of the d(GAA)n triplet repeat leads to reduced levels of frataxin mRNA transcripts, which has been shown to reflect impeded transcription elongation in a length- and supercoil-dependent manner (167). This impediment could reflect formation of a stable nucleic acid architecture (168), and several models have been proposed, ranging from triplexes (169) to parallel-stranded duplexes (170). The parallel-stranded duplex model for d(GAA)n triplet repeats is intriguing, since further bimolecular pairing could result in quadruplex formation.
| NOVEL QUADRUPLEX FOLDS AND TETRAD PAIRING ALIGNMENTS |
|---|
G-quadruplexes can contain pairing alignments beyond the G-tetrad, and considerable effort has gone into defining them. These include other homo- and mixed-tetrad alignments, as well as triads, pentads, hexads and heptads. Triads and triples are generally observed within edge-wise and diagonal loop regions, where they stack on terminal G-tetrads. By contrast, mixed tetrads, pentads and hexads are observed both at the ends of and within G-quadruplexes.
Double-chain-reversal loops
Early structural studies identified edge-wise (9,15) and diagonal (8,10) loops that bridge anti-parallel-aligned columns around the G-quadruplex. An unanticipated development was the identification of double-chain-reversal loops that bridge adjacent parallel-aligned columns within the four-repeat Tetrahymena G-quadruplex (7); in this case, two thymine residues span three stacked G-tetrad planes. It was next demonstrated that single-residue double-chain-reversal loops can span two G-tetrad planes in an all-parallel-stranded G-quadruplex (16). Double-chain-reversal loops moved center-stage following the structure determination of the four-repeat human telomere from crystals grown in K+ solution (17), in which all three TTA loops were of the double-chain-reversal (or hairpin) type and each spanned three stacked G-tetrad planes (Figure 5c). The next discovery was that single-residue double-chain-reversal loops could span three (126), and not just two (16), stacked G-tetrads. This result was most unexpected but was confirmed in subsequent studies on additional G-quadruplex folds (128,129,171,172).
Mixed tetrads
The standard view of G-quadruplex formation involves a scaffold stabilized by stacked G–G–G–G tetrads. Nevertheless, mixed tetrads can also stabilize G-quadruplex formation and these include major groove-aligned G–C–G–C tetrads of the direct (Figure 10c) (14,164,173,174) and slipped (Figure 10d) (26) type and major groove-aligned A–T–A–T tetrads of the direct (17) and slipped (173) type. Minor groove-aligned mixed G–G–G–G and A–T–A–T tetrads have also been structurally characterized, but the bases deviate significantly from the tetrad plane (175,176).
A–A–A–A tetrads
NMR studies on d(AGGGT) in K+ solution are consistent with formation of a parallel-stranded G-quadruplex (177). Somewhat unexpectedly, nuclear Overhauser enhancement (NOE) cross peaks were observed between the adenine amino protons and the non-exchangeable H8 and H2 protons. This led to the proposal of A–A–A–A tetrad formation, with rapid interconversion between N6H···N7 and N6H···N3 hydrogen-bonding alignments. Furthermore,









