ABSTRACT
GPAT and AIRC encode enzymes for step one and for steps six plus seven, respectively, in the pathway for de novo purine nucleotide synthesis in vertebrates. The human GPAT and AIRC genes are divergently transcribed from a 558 bp intergenic promoter region. Cis-acting sites and transcription factors important for bidirectional expression were identified. A cluster of sites between nt 215 and 260 is essential, although not sufficient, for expression of both genes. Two proteins from HepG2 cell nuclear extract, identified as NRF-1 and Sp1, bound to the promoter at sites within the 215-260 region. NRF-1 was required for stable binding of Sp1. Deletion of a 5' promoter region including nt 215-260 resulted in decreased expression of GPAT and AIRC in transfected HepG2 cells. The decreased expression was accounted for by point mutations in an NRF-1 site and either of two flanking sites for Sp1. These transcription factors account in part for the coordinated expression of human GPAT and AIRC.
INTRODUCTION
De novo synthesis of purine nucleotides proceeds by a 10-step pathway to the branch point intermediate IMP. AMP and GMP are each derived in two steps from IMP. Six genes encode the enzymes for IMP synthesis. Three of these genes in vertebrates, GART, AIRC and IMPS, code for multifunctional enzymes (1). GPAT, which encodes the enzyme for step one and is the key regulatory enzyme of the pathway, has been found to be closely linked on human chromosome 4 to AIRC (steps six and seven) (2), whereas the remaining genes for the pathway are all on different chromosomes. GPAT and AIRC, along with GART and ADSS1 (3), are the only vertebrate genes of the pathway thus far cloned. GPAT and AIRC have been isolated from chicken (4), rat (5) and human (2). In these animals GPAT and AIRC are divergently transcribed from an intergenic promoter region of ~230-625 bp. Whereas exon/intron organization and coding sequences are highly conserved in these mammalian and avian genes, there is only limited nucleotide sequence similarity in the intergenic region of the human and rat genes and there is no recognizable nucleotide sequence similarity between the comparable intergenic regions in the chicken and mammalian GPAT-AIRC. Yeast artificial chromosomes have been isolated that contain human GART, which encodes the trifunctional enzyme for steps two, three and five in the purine pathway (6). However, there is presently no further analysis of the GART gene.
In rat fibroblasts GPAT and AIRC mRNAs both showed ~5- to 6-fold increases in the G1/S phase of the cell cycle (5), consistent with co-expression. There is no information currently about factors that are utilized to coordinate expression of genes for the de novo pathway. It is known, however, that one or a small number of cis-acting sites are sufficient for co-expression of other divergently transcribed genes (see for example 7-13) and thus could provide a mechanism to coordinate production of the enzymes encoded by GPAT and AIRC. In this report we have identified cis-acting sequences and transcription factors that are necessary for bidirectional transcription of the human GPAT-AIRC locus. This provides a starting point for determining how the six genes required for IMP synthesis are coordinately expressed.
MATERIALS AND METHODS
The GPAT transcription start site was determined by RNase protection (14) using a kit from Ambion. Total RNA was isolated from HepG2 cells by the CsCl ultracentrifugation method (15). Two antisense RNA probes, nt 410-773 and 518-773 (Fig. 1), were synthesized in vitro by T7 RNA polymerase and radiolabeled with [α-32P]ATP.
The human GPAT-AIRC intergenic region nt 2-773 (Fig. 1) was cloned into the pLUC/CAT-3 bireporter promoter probe vector (16) to give the two possible orientations of the intergenic region in plasmids pHLC-1 and pHLC-2 (see Fig. 3). A series of deletions was introduced into the intergenic region as described previously (16). In the course of verifying the deletions both strands of the entire intergenic region were sequenced multiple times.
HepG2 cells were grown to 60-80% confluency in Eagle's minimum essential medium (MEM) supplemented with 10% fetal bovine serum at 37°C with 5% CO2. Cells were transfected by the calcium phosphate procedure (17) using 8 µg CAT-LUC reporter plasmid and 2 µg RSV-lacZ plasmid (18). All transfections were repeated at least twice. Two thirds of the cells from a 35 mm dish were used to prepare extract for the chloramphenicol acetyltransferase (CAT) assay and one third for luciferase (LUC) and β-galactosidase assays. CAT was assayed by the liquid scintillation counting procedure (19). LUC and β-galactosidase activities were determined by chemiluminescent assays as described by the suppliers of reagents (Promega and Tropix). Light emission was measured as relative light units (RLUs) using a Monolight 2010 luminometer (Analytical Luminescence Laboratory). All enzyme assays were in duplicate and results averaged. Protein concentration was determined by the Bradford procedure (20). CAT and LUC specific activities from at least three separate transfections were normalized for β-galactosidase activity.
Incubations for gel shift assay of protein-DNA binding contained 25 mM HEPES, pH 7.6, 50 mM KCl, 0.1 mM EDTA, 0.1% NP40, 10% glycerol, 10 mM MgCl2, 2 µg BSA, 10 fmol 32P-labeled DNA probe, binding protein (25 µg nuclear extract, 2 ng affinity purified protein or 20 ng Escherichia coli extract) and 4 µg non-specific DNA [sonicated salmon sperm DNA or poly(dI·dC)]. The final volume was 20 µl. Sperm DNA was used as non-specific DNA for detection of binding to site N1 and poly(dI·dC) was used for binding to GC boxes. With purified DNA binding proteins, the same pattern of binding was obtained with or without non-specific DNA, so non-specific DNA was omitted. The binding mixture was incubated for 15 min at room temperature. For supershift experiments, 1 µl antiserum or control serum was added after the 15 min incubation. The mixture was incubated for an additional 15 min at room temperature prior to electrophoresis on a 5% polyacrylamide gel (17). The following antisera were used: goat anti-NRF-1 raised against recombinant NRF-1 was a generous gift of Richard Scarpulla (Northwestern University Medical School); non-immune goat serum was from an unrelated goat; rabbit anti-TLS raised against recombinant TLS was provided by David Ron (Department of Medicine, New York University Medical Center).
DNase I footprinting was carried out according to standard procedures (17) with incubations for protein-DNA binding similar to those for gel shift. DNA probes of 167 (nt 180-346) or 276 (nt 2-277) bp containing GC boxes gc-4, gc-5, gc-6 and NRF-1 site N1 (Fig. 1) were made by PCR. The fragments were labeled with [32P]dCTP at one end.
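The normalization described above (duplicate assays averaged, converted to specific activity, then divided by β-galactosidase activity from the same transfection) can be sketched as follows. This is a minimal illustration only: the function name, the dictionary layout and every numeric reading are hypothetical and are not taken from the experiments reported here.

```python
from statistics import mean


def normalized_reporter(reporter_reads, reporter_protein_mg,
                        bgal_reads, bgal_protein_mg):
    """Return reporter specific activity divided by beta-galactosidase
    specific activity for one transfection.

    Each *_reads argument holds the duplicate assay values, which are
    averaged first. Protein amounts are those of the extract assayed.
    """
    reporter_sa = mean(reporter_reads) / reporter_protein_mg
    bgal_sa = mean(bgal_reads) / bgal_protein_mg
    return reporter_sa / bgal_sa


# Hypothetical readings from three separate transfections (made-up numbers).
transfections = [
    {"luc": (1200.0, 1300.0), "bgal": (500.0, 500.0), "protein_mg": 0.5},
    {"luc": (1100.0, 1150.0), "bgal": (450.0, 450.0), "protein_mg": 0.5},
    {"luc": (1400.0, 1350.0), "bgal": (550.0, 550.0), "protein_mg": 0.5},
]

normalized = [
    normalized_reporter(t["luc"], t["protein_mg"], t["bgal"], t["protein_mg"])
    for t in transfections
]
mean_normalized_luc = mean(normalized)
```

Dividing by the β-galactosidase value from the co-transfected RSV-lacZ plasmid corrects each dish for differences in transfection efficiency before the separate transfections are averaged.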
HeLa cells were grown in spinner culture in MEM with 10% calf serum at 37°C to a cell density of ~10⁶ cells/ml. Nuclear extract was prepared from 15-20 l batches of cells (21) and stored at -80°C. For purification of the DNA binding activity, nuclear extract from a 20 l batch of cells was precipitated with 50% (NH4)2SO4 and the resulting pellet dissolved in 4 ml TM buffer containing 50 mM Tris-HCl, pH 7.9, 12.5 mM MgCl2, 1 mM EDTA, 1 mM dithiothreitol (DTT), 10% glycerol. The entire solution was applied to a 2.5 × 88 cm column of Sephacryl S-300 equilibrated in TM buffer plus 0.1 M KCl. Fractions containing DNA binding activity were pooled, 200 µg/ml sonicated salmon sperm DNA added and the glycerol concentration increased to 20%. Aliquots were applied to three 1 ml DNA affinity columns equilibrated with buffer Z (25 mM HEPES, pH 7.6, 0.1 M KCl, 12.5 mM MgCl2, 1 mM DTT, 20% glycerol, 0.1% NP40). Each column was washed four times with 2 ml buffer Z containing 0.2 M KCl and proteins were then eluted batchwise with buffer Z containing increasing concentrations of KCl: 1 ml 0.3 M, 1 ml 0.4 M, 3 ml 0.5 M, 1 ml 0.6 M. Fractions eluted by 0.5 and 0.6 M KCl containing DNA binding activity were pooled and the salt concentration was reduced to 0.2 M KCl with buffer Z. Sonicated salmon sperm DNA was added to the diluted fraction as before, the solution was incubated on ice for 10 min and applied to the same three DNA affinity columns that had been used previously, stripped in buffer Z containing 1 M KCl and equilibrated with 0.2 M KCl. Elution was by the same batch method as the first time. Fractions with DNA binding activity in buffer Z plus 0.5 M KCl were pooled and stored at -80°C. To determine the protein concentration of the affinity purified binding proteins, 10% trichloroacetic acid was added to a 100 µl aliquot and the sample was frozen overnight at -80°C.
After thawing, the precipitated protein was electrophoresed on an SDS-7.5% polyacrylamide gel with known amounts of bovine serum albumin alongside. After silver staining (17), the quantity of binding protein was estimated by comparison with protein standards. Data from three 15-20 l preparations were combined for the purpose of summarizing the results of protein purification, as given in Table 1.
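The step in which pooled 0.5 and 0.6 M KCl eluates are brought back to 0.2 M KCl before re-loading is a simple mixing calculation. The sketch below works it through for one column, assuming (this is not stated in the text) that the entire 3 ml 0.5 M and 1 ml 0.6 M fractions are pooled and that the diluent is buffer Z at its standard 0.1 M KCl. The function names are illustrative.

```python
def pool(fractions):
    """Combine (volume_ml, kcl_molar) fractions.

    Returns the pooled volume and its KCl concentration.
    """
    total_volume = sum(volume for volume, _ in fractions)
    total_salt = sum(volume * conc for volume, conc in fractions)
    return total_volume, total_salt / total_volume


def diluent_volume(volume_ml, conc, target, diluent_conc):
    """Volume of diluent needed to bring a solution from conc to target.

    Derived from the mass balance
    volume*conc + x*diluent_conc = (volume + x)*target.
    """
    if not diluent_conc < target < conc:
        raise ValueError("target must lie between diluent and starting concentrations")
    return volume_ml * (conc - target) / (target - diluent_conc)


# Assumed pool from one column: all of the 0.5 M and 0.6 M eluates.
pooled_volume, pooled_kcl = pool([(3.0, 0.5), (1.0, 0.6)])

# Buffer Z as defined in the text contains 0.1 M KCl.
buffer_z_needed = diluent_volume(pooled_volume, pooled_kcl,
                                 target=0.2, diluent_conc=0.1)
```

Under these assumptions the 4 ml pool is at 0.525 M KCl and needs roughly 13 ml of buffer Z per column; if only part of each fraction contained binding activity, the volumes scale accordingly.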
Table 1
The DNA affinity columns were prepared by annealing 34mer oligonucleotides 5'-GATCCCCGCCGCGCAGGCGCAGAGACGCGACCCC and 5'-GATCGGGGTCGCGTCTCTGCGCCTGCGCGGCGGG, ligating the double-stranded oligomers end-to-end with T4 ligase and coupling the ligated dsDNA to CNBr-activated Sepharose by the method of Kadonaga (22).
Affinity purified protein was concentrated by centrifugation with a Centricon-10 membrane and precipitated with 10% trichloroacetic acid at -80°C overnight. Approximately 70 pmol protein purified from 100 l HeLa cells was electrophoresed on an SDS-7.5% acrylamide gel and stained with Coomassie blue. The stained protein band was excised, cut into pieces and digested with 0.6 ng protease Lys-C (Wako Quality Research Products) in 50 mM Tris-HCl, pH 9.0, 0.02% Tween 80 at 37°C for 15 h. Digested peptides were recovered by two extractions, each with 200 µl 60% acetonitrile, 0.1% trifluoroacetic acid at 37°C for 20 min. The combined extracts were concentrated to 50 µl in a rotary evaporator under vacuum and peptides separated by reverse phase HPLC using a C18 column and peptide detection at 214 nm. Peptide sequencing was carried out on an Applied Biosystems gas phase sequenator following standard operating procedures.
A plasmid having full-length NRF-1 cDNA under control of the T7 promoter was provided by Richard Scarpulla (Northwestern University Medical School). Overexpression was obtained in E. coli B834(DE3) induced by 1 mM IPTG at 21°C in LB medium (23). Cells were grown for 6 h after induction. An extract was obtained by breaking cells in a French press and centrifugation at 27 000 g for 30 min. NRF-1 accounted for ~10% of the soluble protein. Affinity-purified HeLa Sp1 was purchased from Promega. Binding of Sp1 was assayed by gel shift using a 25 bp synthetic oligonucleotide or with fragments of the GPAT-AIRC promoter.
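The design of the two 34mers can be checked computationally. The short sketch below tests whether the oligonucleotides anneal over 30 bp while leaving 4 nt 5'-GATC overhangs at each end, which would be the feature that permits end-to-end ligation. The overhang length of 4 is inferred from the sequences and is an assumption, as are the helper names.

```python
# Translation table mapping each base to its Watson-Crick partner.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

TOP = "GATCCCCGCCGCGCAGGCGCAGAGACGCGACCCC"
BOTTOM = "GATCGGGGTCGCGTCTCTGCGCCTGCGCGGCGGG"
OVERHANG = 4  # assumed length of the single-stranded 5' extension


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def anneals_with_5prime_overhangs(top, bottom, overhang):
    """True if the strands pair everywhere except their first `overhang` nt."""
    return reverse_complement(top[overhang:]) == bottom[overhang:]


duplex_ok = anneals_with_5prime_overhangs(TOP, BOTTOM, OVERHANG)

# A self-complementary overhang lets any duplex end pair with any other.
overhang_is_palindromic = reverse_complement(TOP[:OVERHANG]) == TOP[:OVERHANG]
```

Both checks hold for the sequences as printed: the strands pair over 30 bp and each duplex end carries a palindromic GATC extension, consistent with concatemer formation by T4 ligase.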
RESULTS
The GPAT transcription start site was estimated previously from the position of a pseudogene (2). We have now determined the 5'-end of the GPAT mRNA by RNase protection using RNA from HepG2 cells and two different RNA probes. High expression of GPAT in liver (24) dictated the use of HepG2 cells. The probes correspond to nt 412-773 and 518-773 (Fig. 1). With each RNA probe a single protected fragment of 140 nt was obtained (Fig. 2), indicative of a transcription start site at nt 634. This transcription start site extends the GPAT 5' untranslated region (5'UTR) 72 nt from that estimated previously from the pseudogene. The 558 bp intergenic region between start sites for transcription of AIRC and GPAT has a GC content of 66% and contains no TATA or CAAT boxes. The positions of nine GC boxes, potential sites for Sp1, are marked in Figure 1, along with N1, a binding site for the transcription factor NRF-1. Several errors in the previously reported sequence (accession no. U00239) were corrected.
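The mapping of the start site from the protected fragment length is a matter of inclusive coordinate arithmetic. The sketch below reproduces it, assuming that nucleotide numbering follows Figure 1, that GPAT is transcribed toward increasing nucleotide numbers and that both probes share their downstream end at nt 773. The function names are illustrative.

```python
PROBE_DOWNSTREAM_END = 773   # shared end of both antisense probes
PROBE_UPSTREAM_ENDS = (412, 518)
PROTECTED_LENGTH = 140       # nt, observed with each probe


def start_site_from_protection(downstream_end_nt, protected_nt):
    """Map a protected fragment back to the position of the mRNA 5' end.

    With inclusive numbering, a fragment of n nt ending at
    downstream_end_nt begins at downstream_end_nt - n + 1.
    """
    return downstream_end_nt - protected_nt + 1


def probe_spans_start(upstream_end_nt, start_site_nt):
    """True if the probe extends upstream of (or to) the start site."""
    return upstream_end_nt <= start_site_nt


tss = start_site_from_protection(PROBE_DOWNSTREAM_END, PROTECTED_LENGTH)
both_probes_span_tss = all(probe_spans_start(p, tss) for p in PROBE_UPSTREAM_ENDS)
```

Because both probes extend upstream of the calculated start site, each is expected to yield the same protected fragment, which matches the single 140 nt band obtained with either probe.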
DISCUSSION
The human GPAT and AIRC genes are divergently transcribed from a 558 bp intergenic promoter region. In previous work the human intergenic promoter region was isolated, transcription start sites for AIRC determined and promoter function estimated by cloning the promoter in both orientations into a single reporter vector (2). We have now determined the transcription start site for GPAT by RNase protection and identified cis-acting sites and transcription factors that are used for bidirectional transcription. Mutation of sites N1 and gc4, gc5 or gc6, between nt 215 and 260, resulted in decreased bidirectional expression. N1 is a key site for bidirectional transcription of these genes. It is the only site in the promoter for high affinity binding of nuclear proteins under the conditions that were used for gel shift and footprinting. Because N1 was not identified in a transcription database (25,26), it was necessary to purify the binding protein for identification as NRF-1. Two lines of evidence support a central role of NRF-1 in bidirectional expression. First, binding of NRF-1 to site N1 increased the affinity of Sp1 for flanking GC sites. Second, mutation of N1 indicates a major role of this site for AIRC transcription and a requirement together with gc4 or gc6 for concomitant GPAT transcription in transfected HepG2 cells. It should be noted, however, that the data in Figure 5, while supporting roles of NRF-1 and Sp1 for bidirectional expression in HepG2 cells, do not provide in vivo evidence to support the idea that Sp1 binding to GC sites is dependent upon NRF-1 binding to site N1. The data in Figure 5 show that interaction of Sp1 at gc4 and gc5 supported partial GPAT expression, although not AIRC expression, when binding of NRF-1 to site N1 was blocked by mutation. Therefore, the conclusion that Sp1 binding to GC sites is dependent upon NRF-1 is derived solely from in vitro gel shift and footprinting experiments.
NRF-1 was affinity purified together with an RNA binding protein, TLS. Since TLS is not known to bind to DNA and was not required for binding of recombinant NRF-1 to the promoter, we assume that co-purification was a result of non-specific interactions.
A potential site for NRF-1 binding is also found in the rat GPAT-AIRC promoter and it remains to be determined whether NRF-1 has a role in co-expression of these genes in rat fibroblasts (5). An NRF-1 site is not found in the promoter region of the divergently transcribed chicken GPAT-AIRC genes (4). This may reflect different requirements for de novo purine nucleotide synthesis in birds and mammals. The pathway in mammals functions solely for biosynthesis of purine nucleotides, whereas in avian species there is the added function of synthesizing uric acid for excretion of excess nitrogen.
Analyses of cytochrome c and cytochrome oxidase promoters led to the previous identification of nuclear respiratory factors (NRF), one of which was designated nuclear respiratory factor 1 (NRF-1) (29,30). Functional NRF-1 sites have been identified in nuclear genes encoding a number of mitochondrial respiratory proteins, genes for mitochondrial DNA replication and transcription and genes encoding enzymes for protein synthesis and rate limiting enzymes in biosynthesis and catabolism (31). Thus NRF-1 is predicted to play a role in coordinating the expression of >50 mammalian nuclear and mitochondrial genes. To this list should be added the human GPAT and AIRC genes for de novo purine biosynthesis. GPAT-encoded glutamine PRPP amidotransferase is the key regulatory enzyme of the de novo purine nucleotide biosynthetic pathway. These results support the proposal by Scarpulla and co-workers that NRF-1 may help to coordinate respiratory metabolism with other biosynthetic and degradative pathways (32).
There is a potential functional link between mitochondrial respiration and GPAT-encoded glutamine PRPP amidotransferase. Regulation of glutamine PRPP amidotransferase turnover is linked to aerobic metabolism in Bacillus subtilis (33). According to the current model, decreased growth resulting from nutrient limitation leads to an elevated cellular oxygen level as a consequence of decreased respiratory chain activity. Oxidation of a labile glutamine PRPP amidotransferase Fe-S center initiates a change in conformation that triggers enzyme degradation. In this way enzyme turnover and purine nucleotide synthesis are regulated by the availability of nutrients and the capacity for growth. Human glutamine PRPP amidotransferase contains an Fe-S center and has the same properties of oxygen lability as the Bacillus enzyme (34). Although the detailed steps and signals surely differ in mammals, it will be interesting to evaluate the possibility of an NRF-1-mediated link between mitochondrial respiration and the rate limiting oxygen labile enzyme for purine biosynthesis. Chickens, which apparently do not utilize NRF-1 for GPAT expression, still contain a glutamine PRPP amidotransferase with an oxygen-labile Fe-S cluster, suggesting that this putative mechanism for enzyme regulation has been retained.
A number of examples of bidirectional transcription of closely linked genes in vertebrates have been described (see for example 7-13). In some of these cases both of the transcribed genes are known, whereas in others opposite strand transcription has been detected but the gene not identified. GPAT-AIRC and genes for human collagen type IV are the two best examples of divergent transcription of defined genes having closely related functions. The COL4A1 and COL4A2 genes code for the α1(IV) and α2(IV) chains of collagen IV. A nuclear protein designated CTC box binding factor (CTCBF) binds to a CTC box within the 127 bp intergenic promoter and is required for bidirectional transcription (7). This transcription factor is homologous or identical to Ku antigen (35). Similar to the GPAT-AIRC locus, mutations in the CTC box reduce transcription only partially and to different extents in the two directions, suggesting that other elements contribute to promoter function. Indeed, CTC box motifs located within the first introns of COL4A1 and COL4A2, as well as intergenic CCAAT and GC boxes (36), may contribute to bidirectional transcription. Although the GPAT-AIRC promoter does not contain CCAAT or TATA motifs, additional cis-acting control elements in the intergenic promoter, as well as in downstream positions, likely have roles in expression which remain to be determined. NRF-1 and Sp1 binding to N1 and flanking gc sites are thus necessary, but not sufficient, for GPAT-AIRC bidirectional transcription. This report provides the first evidence for factors that are used to coordinate expression of genes for de novo purine synthesis in higher eukaryotes.
ACKNOWLEDGEMENTS
We thank Richard Scarpulla (Northwestern University Medical School) for providing NRF-1 antiserum and an NRF-1 cDNA clone; David Ron (New York University Medical Center) for TLS antiserum and other materials not used in the work described, as well as advice and stimulating discussions; Harry Charbonneau for expert advice on micro scale techniques for peptide isolation; Steven Broyles for advice on growth of HeLa cells and important discussions; and Yongting Cai for technical assistance. Oligonucleotides were synthesized and peptides sequenced by Mary Bower and Alan Mahrenholz in the Purdue Laboratory for Macromolecular Structure, supported by the Diabetes Research and Training Center (NIH grant P60 DK20524). This research was supported by NIH grant GM 46466. This is journal paper number 15365 from the Purdue University Agricultural Research Station.
*To whom correspondence should be addressed. Tel: +1 765 494 1618; Fax: +1 765 494 7897; Email: zalkin@biochem.purdue.edu
Fraction          Protein (mg)   Specific activity (U/mg)   Total activity (U)   Purification factor   Yield (%)
Nuclear extract   390            70                         27 300               1.0                   100
Gel filtration    48             404                        19 400               5.8                   71
DNA affinity 1    0.012          13 × 10⁵                   15 600               18 600                57
DNA affinity 2    0.024          35 × 10⁵                   8400                 50 000                31
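The purification factor and yield columns of Table 1 follow from the specific and total activities by the usual definitions: each is expressed relative to the starting nuclear extract. The sketch below recomputes them from the tabulated activities; the row labels and function name are illustrative, and the two affinity rows are listed in table order as the first and second passes.

```python
# (fraction, specific activity in U/mg, total activity in U), from Table 1.
ROWS = [
    ("Nuclear extract", 70.0, 27_300.0),
    ("Gel filtration", 404.0, 19_400.0),
    ("DNA affinity, pass 1", 13e5, 15_600.0),
    ("DNA affinity, pass 2", 35e5, 8_400.0),
]


def purification_summary(rows):
    """Return (fraction, fold purification, percent yield) for each row.

    Fold purification is the specific activity relative to the first row;
    yield is the total activity relative to the first row, in percent.
    """
    _, start_specific, start_total = rows[0]
    return [
        (name, specific / start_specific, 100.0 * total / start_total)
        for name, specific, total in rows
    ]


summary = purification_summary(ROWS)
for name, fold, percent in summary:
    print(f"{name:22s} {fold:10.1f}-fold {percent:6.1f}% yield")
```

Rounded to the precision used in the table, the recomputed values agree with the tabulated purification factors and yields.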
REFERENCES

