DDBJ/EMBL/GenBank accession nos AF008653-55
ABSTRACT
The mouse glycinamide ribonucleotide formyltransferase (GART) locus is known to produce two functional proteins, one by recognition and use of an intronic polyadenylation site and the other by downstream splicing. We now report a similar intronic polyadenylation mechanism for the human GART locus. The human GART gene has two potential polyadenylation signals within the intron corresponding to the one involved in intronic polyadenylation in the mouse gene. Each of the potential polyadenylation signals in the human gene was followed by an extensive polyT-rich tract, but only the downstream signal was preceded by a GT tract. Only the downstream signal was utilized. The polyT-rich tract that followed the functional polyadenylation site in the human GART gene was virtually identical in sequence to a similarly placed region in the mouse gene. An exact inverted complement to the polyT-rich stretch following the active polyadenylation signal was found in the upstream intron of the human gene, suggesting that a hairpin loop may be involved in this intronic polyadenylation.
Multifunctional enzymes have been found in several metabolic pathways in eukaryotes, e.g., purine and pyrimidine synthesis and the interconversion of folates. A comparison of the sequence of the genes encoding these multifunctional proteins across species suggests that they arose by fusion of genomic regions encoding single domain proteins with the loss of individual promoters (1,5,8). In mammals and birds, the GART locus encodes an enzyme of purine synthesis which catalyzes three steps of this pathway [glycinamide ribonucleotide synthetase (GARS), glycinamide ribonucleotide formyltransferase (GART) and aminoimidazole ribonucleotide synthetase (AIRS)] (1,4,8), each of which is expressed as a separate protein in Escherichia coli and Bacillus subtilis (6,13-15). Paradoxically, the mouse GART gene has also maintained the ability to produce a second mRNA which translates to a monofunctional protein with GARS activity (8). It is not clear why a mechanism to express a separate, monofunctional GARS activity has been retained after the evolutionary selection of a trifunctional protein which also catalyzes the GARS reaction. However, the structure of the mouse GART gene suggests that the smaller protein has been actively retained and that it is not just a vestigial remnant (9).
Our previous studies on the mouse GART locus demonstrated that the mRNAs for monofunctional GARS were produced by the utilization of several polyadenylation signals present in intron 11 (9). In the mouse, the transcripts for the trifunctional protein use the same promoter and the same initial 11 exons of this gene as does the monofunctional GARS, but processing of these transcripts ignores the intron 11 polyadenylation signals and splices in the downstream exons 12-22. Hence, processing of the primary transcripts from the GART gene represents a case in which two functional proteins are produced from a single gene, but by alternative intronic polyadenylation and splicing rather than by alternative exon usage. The mechanism whereby intronic polyadenylation occurs for the GART gene remains to be established, although recent studies have directly addressed this issue in the immunoglobulin heavy chain gene (16). In this study, we isolated the human GART genomic locus and searched for cis elements common between the human and mouse (9) genes that might be involved in this mechanism. From the patterns of sequence conservation and analysis of frequency of utilization of polyadenylation signals, we noted that an extensive polyT tract located in the intron distal to the last utilized intronic cleavage/polyadenylation signal and a weak 5' splice donor site were common to both mouse and human GART genes. The GART gene in each species also had a backup motif which could accentuate the effectiveness of these intronic sequence motifs on cleavage/polyadenylation while still allowing downstream splicing.
Human genomic clones for the GART locus were isolated from a λ FixII library constructed from male placental DNA (Stratagene). The library was screened with a cDNA probe, derived by PCR from poly(A)+ selected RNA isolated from CEM human lymphoblastic leukemia cells. This probe was clone HR24 (Fig. 3B) derived by 3' RACE; the sequence has been deposited in GenBank. The library screen was performed using Denhardt's solution hybridization conditions as previously described (12).
Human GART cDNA sequences (1) that correspond to exons 11-14 of the mouse gene (9) were mapped to restriction fragments of λ clones by Southern blot hybridization to the cDNA probe used for the library screen and to 25mer oligonucleotides corresponding to this region. Hybridizing restriction fragments were subcloned into pBluescript SKII(+). Double-stranded plasmid DNA was isolated by the alkaline lysis method and partially sequenced using Sequenase 2.0 (Amersham).
Two independent methods were used to determine the use of polyadenylation signals: 3' RACE and ribonuclease protection assays. 3' RACE was performed as previously described (7,9) using poly(A)+ selected RNA from CEM cells, reverse transcribed using an oligo dT primer containing the sequence of an anchor primer. The gene-specific primers used for 3' RACE were 5'-ctcaagctctaggactggaggtgttccatgcaggc-3' (P1), 5'-cttcatgatagcgtaagtttgg-3' (P2) and 5'-actgaagatgagaatactggtc-3' (P3). The PCR products obtained were gel purified and subcloned into the pCRII vector (Invitrogen). Constructs used for ribonuclease protection assays were two adjacent Sau3AI fragments of 0.9 and 0.4 kb from the genomic subclone HJK32 ligated into the BamHI site of pBluescript SKII(+) and named HS1 and HS13, respectively (Fig. 3B). The subclone HJ1 was created by ligating the RsaI fragment from clone HS13 into the EcoRV site of pBluescript SK(+). Templates for in vitro transcription were linearized with either FokI, Bst71I, NdeI or EcoRV (Fig. 3B) and were transcribed using either T3 or T7 RNA polymerase. Ribonuclease protection assays were performed essentially as described (2). Briefly, 30 µg of total RNA from CEM cells was hybridized overnight at 50°C with an in vitro transcribed RNA probe. Approximately 5 × 10^5 c.p.m. of labeled RNA probe was used for each 30 µl reaction. The hybridized RNA was treated with 100 µg/ml of RNase A at 4°C for 30 min followed by incubation in 250 µg/ml of proteinase K at 42°C for 15 min. The samples were extracted with phenol-chloroform, then precipitated with ethanol. The precipitated RNA was resuspended in loading buffer and fractionated on a 6% polyacrylamide sequencing gel.
A total of 1.5 × 10^6 λ plaques from a human male placenta λ FixII genomic library were screened using a radiolabeled probe containing human cDNA sequence corresponding to the end of the GARS domain and the beginning of the AIRS domain. Twenty-two genomic clones were isolated. The alignment of four clones which overlapped in the region of interest is depicted in Figure 1A. Restriction mapping and Southern blot analysis using an oligonucleotide probe corresponding to the 3'-segment of the GARS domain identified a hybridizing BglII-BamHI fragment on clone λHGAG1 which was then subcloned into the BamHI-StyI sites of pBluescript SKII(+). A BamHI-HindIII subfragment (Fig. 1A) which contained human GART exon 14 sequence was also subcloned. The sizes of exons 11-14 and the junctional sequences in this region of the human GART gene were determined by limited sequence analysis (Table 1). The exon numbering presented here was arbitrarily set to the assignments previously found for the mouse GART locus (9) to permit a facile comparison of these two genes. Exons 11-14 in the human GART gene were identical in size, and in the exact site of interruption of the coding region by introns, to those previously reported for the mouse gene (9). In the human GART gene, as in the mouse, exon 11 constituted the end of the coding region for the GARS domain. For the smaller (monofunctional GARS) transcript, the stop codon, taa, was part of the 5'-splice donor site used to generate the trifunctional transcript, and the 3'-untranslated region of the monofunctional GARS message was contiguous with the end of exon 11 (see below). Thus, as was previously found to be the case in the mouse, the monofunctional GARS message was produced from primary transcripts from this human gene by cleavage and polyadenylation within what constitutes intronic sequence for the longer transcript encoding the trifunctional GARS-AIRS-GART protein.
We have previously determined that multiple polyadenylation signal sequences were present in intron 11 of the mouse GART gene, most of which could be used to generate the heterogeneous 3'-termini of the monofunctional GARS mRNAs (9). There was also a striking 24 nt polyT tract in the mouse gene immediately downstream of the most 3' polyadenylation signal. Detailed sequence analysis of intron 11 in the human GART locus was performed to determine whether any or all of these structural features were conserved between mouse and human genes. In the first 900 nt of intron 11, immediately downstream of the last GARS domain-encoding exon in the human gene, there were two candidate polyadenylation signal sequences, which are referred to below as poly(A) site I (ATTAAA) and II (AATAAA) (Fig. 1B). Interestingly, there was a polyT tract downstream of each of these potential polyadenylation signals, although the GT-rich tract associated with polyadenylation sites was present only for site II, located immediately upstream of the polyT tract.
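The kind of motif search described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the 60 nt search window and the minimum T-run length of 8 are hypothetical choices, not parameters used in the study, and the toy sequence is invented rather than taken from intron 11.

```python
import re

# The two polyadenylation hexamers named in the text (sites I and II).
POLYA_SIGNALS = ("AATAAA", "ATTAAA")


def find_polya_signals(seq, window=60, min_t_run=8):
    """Return sorted (position, hexamer, has_downstream_T_run) tuples.

    `window` and `min_t_run` are illustrative thresholds (assumptions),
    not values taken from the paper.
    """
    seq = seq.upper()
    hits = []
    for signal in POLYA_SIGNALS:
        # A lookahead pattern reports every occurrence, including overlaps.
        for match in re.finditer("(?=" + signal + ")", seq):
            start = match.start()
            downstream = seq[start + 6:start + 6 + window]
            has_t_run = re.search("T{%d,}" % min_t_run, downstream) is not None
            hits.append((start, signal, has_t_run))
    return sorted(hits)


# Hypothetical toy sequence, NOT the real intron 11 sequence.
toy_intron = ("GTAAGTTTGG" + "ATTAAA" + "CACACACA" + "AATAAA"
              + "CAGTGTTTTGATTTC" + "T" * 24 + "G")

for position, signal, has_t_run in find_polya_signals(toy_intron):
    print(position, signal, has_t_run)
```

A search of this kind only flags candidate signals; as the results show, the presence of a hexamer and a downstream polyT tract does not by itself establish that a site is used.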
Chicken, mouse and human tissues express two sets of transcripts related to GART, a 1.7-1.9 kb message encoding a monofunctional GARS, and an ~3.4 kb transcript encoding three domains that catalyze the second (GARS), third (GART) and fifth (AIRS) reactions of de novo purine synthesis (1,8). We have previously reported that the protein encoded by the smaller set of messages in the mouse is a functional enzyme, in spite of the fact that this GARS reaction is also carried out by the trifunctional enzyme (8). We also showed that both classes of messages come from the same genomic locus (9). The monofunctional GARS transcript was generated by use of the first 11 exons of the mouse GART gene followed by cleavage and polyadenylation within intron 11, whereas for the larger transcript, the intronic polyadenylation signals present in this intron are ignored and exon 11 is spliced to exons downstream. In this study, we demonstrate that intronic polyadenylation is also responsible for generation of the monofunctional GARS transcripts in human cells and we examine the structural features of the human gene to search for common features with the mouse GART locus that might explain this phenomenon.
The most striking commonality between the mouse and human GART genes surrounding the intronic polyadenylation signals is a nearly identical polyT tract:
mouse:GTGTTTTGATTTT(T)24T
human:GTGTTTTGATTTC(T)24G
This sequence motif is immediately downstream of the only active polyadenylation signal in the human GART intron 11 and is also present close downstream of the most distal intron 11 polyadenylation signal used in the mouse gene. This polyT motif is immediately downstream of the GT tract (apparently required for efficient polyadenylation; 11) and the polyadenylation signal (AATAAA). This degree of conservation of a polyT region within intronic sequence would seem unlikely to be fortuitous, and may reflect an important component in the mechanism for generation of the smaller transcript in human and mouse GART genes.
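The similarity of the two motifs can be checked position by position. The sketch below assumes that the notation (T)24 denotes a run of 24 consecutive T residues; the helper names are hypothetical and introduced only for illustration.

```python
def expand_motif(prefix, t_count, suffix):
    """Expand the shorthand prefix(T)n suffix into a full sequence string."""
    return prefix + "T" * t_count + suffix


def mismatch_positions(a, b):
    """Return 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Motifs as printed in the text, with (T)24 read as 24 consecutive T's
# (an assumption about the notation).
mouse = expand_motif("GTGTTTTGATTTT", 24, "T")
human = expand_motif("GTGTTTTGATTTC", 24, "G")

mismatches = mismatch_positions(mouse, human)
identity = 1 - len(mismatches) / len(mouse)
print(len(mouse), mismatches, "{:.1%}".format(identity))
```

Under that reading, the two 38 nt motifs differ at only two positions, the residue just before the T run and the residue just after it.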
Intron 10 of the human GART locus was partially sequenced during localization of the intron/exon borders of this region of the gene. These studies revealed the presence of a 20 nt sequence which was an exact inverted complement to a segment of the intron 11 polyT-rich tract. This suggested that the polyT tract might influence the choice between splicing of exon 11 to downstream exons and intronic cleavage/polyadenylation by the formation of a hairpin structure in the unprocessed primary transcript. This finding prompted us to determine the sequence of the intron immediately upstream of exon 11 in the mouse gene. Somewhat to our surprise, a complement to the mouse polyT tract was not found, at least within this intron, although it might be present elsewhere in the gene. However, for the human GART gene, there is only a single active polyadenylation signal, whereas several intronic polyadenylation signals are present and are utilized in the mouse gene. Hence, it seems entirely likely that the formation of a hairpin loop accentuates the effects of the presence of the single active intronic polyadenylation site in the human gene, whereas the multiple active signals are sufficient to bring about intronic transcript termination in the mouse gene.
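A search for an exact inverted complement of the kind described here can be sketched as follows. The function names and both toy sequences are hypothetical; the real intron 10 and intron 11 sequences are not reproduced, and the 20 nt window simply mirrors the length reported in the text.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def inverted_complement_sites(upstream, tract, length=20):
    """List (tract_offset, upstream_offset) pairs where a `length`-nt window
    of `tract` has its exact reverse complement in `upstream`."""
    upstream = upstream.upper()
    tract = tract.upper()
    sites = []
    for i in range(len(tract) - length + 1):
        target = reverse_complement(tract[i:i + length])
        j = upstream.find(target)
        if j != -1:
            sites.append((i, j))
    return sites


# Hypothetical toy sequences, NOT the real intron 10 / intron 11 sequences.
toy_upstream_intron = "CCTT" + "A" * 20 + "TTCC"
toy_polyt_tract = "GATTTC" + "T" * 24 + "G"

print(inverted_complement_sites(toy_upstream_intron, toy_polyt_tract))
```

An exact match of this kind shows only that base pairing is possible in the primary transcript; whether a hairpin actually forms and affects processing is a separate experimental question.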
The 5' splice donor site immediately upstream of the site of intronic polyadenylation in both mouse and human GART genes differs from the consensus sequence by the presence of an A nucleotide in the +5 position, a deviation from the G nucleotide normally at this position. The mouse and human immunoglobulin mu heavy chain (IgM) locus, and other immunoglobulin genes that exhibit intronic polyadenylation, all share this deviation from consensus at the 5' splice donor site immediately upstream of the intronic polyadenylation signal (10). Previous studies on the immunoglobulins have demonstrated that mutation of this A nucleotide to the consensus G nucleotide markedly altered the pattern of splicing versus polyadenylation (10), directly implicating this 'weak' splice site in the phenomenon and suggesting that it is needed to ensure downstream splicing in at least a fraction of the processed transcripts. This similarity suggests that the control of splicing and polyadenylation in the GART and immunoglobulin genes may share a common molecular mechanism.
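Deviation from the donor consensus can be tabulated with a short helper. This sketch assumes the commonly cited intronic consensus GTRAGT for positions +1 to +6 (R = A or G); the function name and both example sequences are hypothetical and are not the GART donor sites themselves.

```python
# Commonly cited consensus for intron positions +1..+6 (assumption: GTRAGT).
DONOR_CONSENSUS = "GTRAGT"
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG"}


def donor_deviations(intron_start):
    """Return (position, observed, expected) for each mismatch in +1..+6.

    `intron_start` is the intron sequence beginning at position +1.
    """
    site = intron_start.upper()[:6]
    if len(site) < 6:
        raise ValueError("need at least 6 intronic nucleotides")
    return [
        (i + 1, base, expected)
        for i, (base, expected) in enumerate(zip(site, DONOR_CONSENSUS))
        if base not in IUPAC[expected]
    ]


# Hypothetical examples: a consensus donor and one with A at +5.
print(donor_deviations("GTAAGTCC"))  # matches the consensus
print(donor_deviations("GTAAATCC"))  # deviates at +5 (A instead of G)
```

A single-position comparison like this is a coarse measure; it does not capture the overall strength of a splice site, which depends on flanking exonic positions as well.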
During the course of our studies on the human GART gene, a splice variant was found that used a polyadenylation signal in exon 14 (Fig. 3A). (The AATAAA found in exon 14 of the human sequence was not preserved in the mouse sequence so that analogous mouse transcripts were not possible.) Transcripts which used the 3'-splice acceptor site of exon 14 and the AATAAA within exon 14 were not found, nor were transcripts using exon 14A without recognition of the exonic polyadenylation signal sequence, although either would have been detected by our experiments. Nevertheless, although this class of transcripts was frequently isolated by 3' RACE, the message proved to be of low abundance as evidenced by its absence in both Northern blots and ribonuclease protection assays. We concluded that it represented a rare transcript which was overemphasized by PCR.
There is a short but growing list of genes recognized to produce different forms of the same protein by a choice of intronic polyadenylation versus downstream splicing. The two transcripts generated from the GART locus represent an extreme case with two functional proteins being produced, one of which has only GARS activity, the other capable of three of the steps of purine synthesis including the GARS reaction. To our knowledge, all of the other cases analyzed to date involve a switch from one isotype of a protein to another in response to the needs of a tissue-specific or developmental program of gene expression. Although it is not clear whether there is a developmental role for the GART gene, others have implicated this locus in Down's syndrome (3). Given the striking sequence similarities and differences in the mouse and human GART genes near the region involved in intronic polyadenylation, this locus seems to be an interesting and informative model system to dissect the mechanisms involved in the choice of intronic polyadenylation versus downstream splicing. Determining whether, and to what extent, the mechanism of this choice in the GART gene is shared with other genes which utilize intronic cleavage/polyadenylation will prove useful in understanding the steps involved at a biochemical level.
We thank Mr Scott Smith for his excellent technical assistance. The study was supported in part by grant CA-27605 from the National Institutes of Health, DHHS and by training grant T32-CA-09564.
*To whom correspondence should be addressed at: Massey Cancer Center, Medical College of Virginia, MCV Box 980230, Richmond, VA 23298, USA. Tel: +1 804 828 5783; Fax: +1 804 828 5782; Email: rmoran@gems.vcu.edu
+Present address: Howard Hughes Medical Institute, Program in Molecular Medicine and University of Massachusetts Medical Center, 373 Plantation Street, Suite 309, Worcester, MA 01605, USA
REFERENCES
