ABSTRACT
A distinct nuclear form of human uracil-DNA glycosylase [UNG2, open reading frame (ORF) 313 amino acid residues] from the UNG gene has been identified. UNG2 differs from the previously known form (UNG1, ORF 304 amino acid residues) in the 44 amino acids of its N-terminal sequence, which is not necessary for catalytic activity. The rest of the sequence, including the catalytic domain, altogether 269 amino acids, is identical. The alternative N-terminal sequence in UNG2 arises by splicing of a previously unrecognized exon (exon 1A) into a consensus splice site after codon 35 in exon 1B (previously designated exon 1). The UNG1 sequence starts at codon 1 in exon 1B and thus has 35 amino acids not present in UNG2. Coupled transcription/translation in rabbit reticulocyte lysates demonstrated that both proteins are catalytically active. Similar forms of UNG1 and UNG2 are expressed in mouse, which has an identical organization of the homologous gene. Constructs that express fusion products of UNG1 or UNG2 and enhanced green fluorescent protein (EGFP) were used to study the significance of the N-terminal sequences of UNG1 and UNG2 for subcellular targeting. After transient transfection of HeLa cells, the pUNG1-EGFP-N1 product colocalizes with mitochondria, whereas the pUNG2-EGFP-N1 product is targeted exclusively to nuclei.
Uracil-DNA glycosylase is the first enzyme in base excision repair (BER) for removal of uracil from DNA, and its main function is probably to remove mutagenic uracil residues resulting from deamination of cytosine in DNA (1). A catalytically fully active recombinant form of human uracil-DNA glycosylase lacking an N-terminal segment encoded by the open reading frame (ORF) (2) has been used to study structure-function relationships by site-directed mutagenesis and X-ray crystallography (3). These studies identified this form of human uracil-DNA glycosylase as a single-domain structure with a positively charged DNA-binding groove and a deep uracil-binding pocket (3,4), which is accessed by flipping the uracil, together with its deoxyribose and 5'-phosphate, out of the DNA helix (5). The complete sequence of the UNG gene, which encodes human uracil-DNA glycosylase (UNG) (6), and comparison with UNG cDNA from human placenta (7) indicated that the gene contains six exons and a TATA-less promoter. This cDNA contains an ORF of 304 amino acids (7), but a homogeneous form of uracil-DNA glycosylase purified from human placenta (8) lacked the 77 N-terminal amino acids predicted from the ORF (7). Subsequent work demonstrated that the UNG gene encodes both the mitochondrial and the nuclear forms of UNG, and transfection studies showed that this N-terminal sequence was required for transport of the enzyme to mitochondria, but not for nuclear import (9). The similarity of the mitochondrial and nuclear forms was also demonstrated by subcellular fractionation and Western blotting, as well as by immunostaining (9,10). It was thus assumed that proteolytic removal of the N-terminal sequence could explain the differential transport of UNG to either mitochondria or nuclei, although artificial cleavage during purification could not be excluded.
Here we report a distinct form of human nuclear UNG. We designate this form UNG2 and the previously known human form (7) UNG1. Both forms are encoded by the UNG gene, but have different N-terminal sequences. In addition, we have isolated cDNAs for the mouse homologs of UNG1 and UNG2. Transient transfection experiments with constructs directing expression of fusion products of either UNG1 or UNG2 and a variant of green fluorescent protein indicated that the N-terminal sequences of UNG1 and UNG2 direct targeting to mitochondria and nuclei, respectively.
Mouse embryonic carcinoma, human liver and NT2 neuronal precursor cell cDNA libraries were from Stratagene (La Jolla, CA, USA). All libraries were propagated in the Uni-ZAP XR vector using XL1-Blue as host. [α-32P]dCTP, [35S]methionine, the Rediprime random labelling kit and Hybond N+ filters were all from Amersham (UK). Sequencing primers were from MedProbe (Oslo, Norway). The Dye terminator cycle sequencing ready reaction kit was from Applied Biosystems (Foster City, CA). The Dynazyme PCR kit was purchased from Finnzymes Oy (Espoo, Finland). The TNT in vitro transcription/translation rabbit reticulocyte lysate system, the pGEM-T TA cloning kit, the Altered Sites II in vitro Mutagenesis System, primers for sequencing from the T3 and T7 promoters, and T3 RNA polymerase were from Promega (Madison, WI). The plasmid encoding the red-shifted variant of green fluorescent protein (pEGFP-N1) was from Clontech (Palo Alto, CA, USA). Restriction enzymes were from New England Biolabs Inc. (Beverly, MA, USA).
All libraries were screened as recommended by the manufacturer, using 32P-labeled UNG40 cDNA (7) as probe. Hybridization was carried out at 65°C overnight in 6× SSC, 5× Denhardt's solution and 0.1% SDS. Filters were washed in 0.1× SSC/0.5% SDS at 65°C and autoradiographed. Three rounds of screening were done.
In vivo excision of pBluescript phagemids from the Uni-ZAP XR vector was performed as recommended by the manufacturer.
Sequencing was performed on an Applied Biosystems Model 373A DNA Sequencing
System using the Dye terminator cycle sequencing ready reaction kit as
recommended by the manufacturer. The sequences were analyzed using the Auto
Assembler software (Applied Biosystems).
In vitro transcription/translation was performed with the TNT system and [35S]methionine as recommended by the manufacturer, using 200 ng of each expression construct per 10 μl reaction volume. The mouse UNG1-pBluescript construct was transcribed from the T3 promoter in the pBluescript vector. The insert of mouse UNG2-pBluescript was amplified by the polymerase chain reaction using the Dynazyme PCR kit, ligated into the pGEM-T vector and transcribed from the T7 promoter. The human UNG2-pBluescript construct was transcribed from the T3 promoter after SacI/NheI excision of a 79 bp fragment from the polylinker and the 5'-end of the UNG2 cDNA. Human UNG1 cDNA was transcribed from the T7 promoter as previously described (2). The samples were run on a 12% denaturing SDS-polyacrylamide gel (SDS-PAGE). The gel was dried, autoradiographed overnight and scanned on an LKB Ultroscan XL Enhanced Laser Densitometer. Uracil-DNA glycosylase activity was measured in parallel samples of the in vitro transcription/translation assay mixture containing unlabelled amino acids (2).
UNG15 cDNA, which encodes UNG1, in pGEM7Zf+ (pUNG15) (2,7) was digested with BclI, which cuts at bp 1019 in UNG15 cDNA, blunted with DNA polymerase I (Klenow fragment), and ligated to an AgeI linker prepared from the oligonucleotide 5'-ACCGGTGCC-3' and its complementary strand. The religated pUNG15 containing the AgeI linker correctly ligated into the BclI site (verified by sequencing) was digested with RsrII, which cuts at bp 49 in UNG15 cDNA (7), blunted as above and finally digested with AgeI. The fragment was then ligated into pEGFP-N1 digested with SmaI (blunt) and AgeI. The construct was sequenced to verify that the UNG1 coding sequence was in frame with the EGFP coding sequence. The TGA stop codon of pUNG15 was then changed to GGA by site-directed mutagenesis according to the manufacturer's procedure, using ssDNA prepared with R408 helper phage. Potential pUNG1GGA-EGFP-N1 constructs were screened by digestion with BclI (which digests only unmutated plasmids) and verified by sequencing. The correct construct was named pUNG1-EGFP-N1.
cDNA for UNG2 (this article) in pBluescript was digested with NheI, which cuts 54 bp upstream of the ATG, and EcoNI, which cleaves within the sequence shared by the cDNAs for UNG1 and UNG2 (positions 529 and 520, respectively). The resulting 501 bp fragment was isolated and ligated to the 5155 bp fragment of NheI/EcoNI-digested pUNG1-EGFP-N1 to obtain pUNG2-EGFP-N1.
Transient transfections of HeLa cells were done with the calcium phosphate method (ProFection, Promega) according to the manufacturer's recommendations. Confocal microscopy (Bio-Rad MRC-600) of HeLa cells and staining of mitochondria with mouse anti-human mitochondria antibody (MAB 1273, Chemicon) and Texas Red anti-mouse IgG (Vector) were performed as previously described (10). HeLa cells transfected with the expression plasmids pEGFP-N1, pUNG1-EGFP-N1 or pUNG2-EGFP-N1 were examined 16 h after transfection, using an excitation wavelength of 488 nm and an emission wavelength >515 nm.
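The restriction-site logic of the pUNG1-EGFP-N1 construction can be checked with a few lines of Python. This is a minimal sketch, not part of the original protocol: the AgeI (A^CCGGT) and BclI (T^GATCA) recognition sequences are standard and the linker is the oligonucleotide given above, but the six-base stop-codon context is a hypothetical illustration. It assumes, as the BclI screen implies, that the TGA stop codon forms the first three bases of a BclI site. Under that assumption it shows that the annealed linker carries an AgeI site, and that changing TGA to GGA (glycine) both allows read-through into EGFP and abolishes BclI cleavage.

```python
# Sketch of the restriction-site checks behind pUNG1-EGFP-N1 (illustrative only).
# AgeI (A^CCGGT) and BclI (T^GATCA) recognition sequences are standard; the
# stop_context sequence is a HYPOTHETICAL six-base window, assuming the TGA
# stop codon of UNG15 forms the first three bases of a BclI site.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

AGEI_SITE = "ACCGGT"
BCLI_SITE = "TGATCA"

# Only the two codons relevant here: TGA is a stop codon, GGA encodes glycine.
CODON_MEANING = {"TGA": "stop", "GGA": "Gly"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def has_site(seq: str, site: str) -> bool:
    """True if the recognition site occurs on either strand of seq."""
    return site in seq or site in reverse_complement(seq)


# Linker oligonucleotide from the Methods and its complementary strand (5'->3').
linker_top = "ACCGGTGCC"
linker_bottom = reverse_complement(linker_top)

# Hypothetical stop-codon context before and after the TGA -> GGA mutation.
stop_context = "TGATCA"
mutated_context = "GGA" + stop_context[3:]

print("linker bottom strand:", linker_bottom)
print("AgeI site in linker:", has_site(linker_top, AGEI_SITE))
for label, seq in (("unmutated", stop_context), ("mutated", mutated_context)):
    codon = seq[:3]
    cut = has_site(seq, BCLI_SITE)
    print(f"{label}: {codon} ({CODON_MEANING[codon]}), BclI cuts: {cut}")
```

Both AgeI and BclI sites are palindromic, so the two-strand check in `has_site` is redundant here. It is kept so that the helper stays correct for non-palindromic recognition sequences.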
We have screened a human NT2 neuronal precursor cell cDNA library and a mouse embryonic carcinoma cDNA library and discovered a new form of human uracil-DNA glycosylase (human UNG2) encoded by the UNG gene, as well as the homologous cDNA from mouse (mouse UNG2). In addition, we have isolated the cDNA for the mouse homologue (mouse UNG1) of human UNG1 (7). cDNA for human UNG2 has an ORF encoding 44 N-terminal amino acids not found in human UNG1, whereas cDNA for human UNG1 has an ORF encoding 35 amino acids not found in human UNG2 (Fig. 1). The two forms are identical in the rest of the N-terminal sequence, which is not required for enzyme activity, as well as in the catalytic domain, altogether 269 identical consecutive amino acids. cDNAs for human UNG2 and its mouse homologue appear to be as abundant as those for UNG1 in cDNA libraries from proliferating cells: among 20 cDNA clones that were sequenced, 10 were of the UNG2 type and 10 were similar to the previously known UNG1 type, and among four mouse cDNAs sequenced, three were of the UNG2 type and one was of the UNG1 type. In contrast, screening of a human hepatocyte cDNA library with UNG40 cDNA (7) yielded 80 strongly hybridizing clones, and sequencing of 14 of these demonstrated that they were all similar to the previously characterized cDNA for UNG1 or the cDNA UNG40 (7).
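The length bookkeeping stated above can be verified with simple arithmetic. The sketch below is illustrative and relies only on numbers given in the text. It assumes, as the abstract describes, that UNG1 starts at codon 1 of exon 1B and that UNG2 replaces the first 35 of those codons with 44 codons supplied via exon 1A. It also tabulates the clone counts reported in this section. The variable names are hypothetical.

```python
# Illustrative bookkeeping for the two UNG isoforms, using only numbers from the text.

UNG1_ORF = 304        # residues; translation starts at codon 1 of exon 1B
UNG1_SPECIFIC = 35    # exon 1B codons upstream of the splice site used by UNG2
UNG2_SPECIFIC = 44    # N-terminal residues unique to UNG2 (assumed exon 1A-derived)

shared = UNG1_ORF - UNG1_SPECIFIC     # residues common to both isoforms
ung2_orf = UNG2_SPECIFIC + shared     # predicted UNG2 ORF length

print(f"shared residues: {shared}, predicted UNG2 ORF: {ung2_orf}")

# Clone counts reported above: (UNG2-type clones, clones sequenced)
clone_counts = {
    "human, proliferating-cell library": (10, 20),
    "mouse embryonic carcinoma library": (3, 4),
    "human liver library": (0, 14),
}

for library, (ung2_type, sequenced) in clone_counts.items():
    print(f"{library}: {ung2_type}/{sequenced} UNG2-type ({ung2_type / sequenced:.0%})")
```

The agreement of 304 - 35 + 44 = 313 with the UNG2 ORF length is a quick sanity check on the exon model. The tabulation makes the contrast between the liver library and the proliferating-cell libraries easy to see.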
To examine whether human UNG1 and UNG2 were translocated to different subcellular compartments, we prepared constructs expressing fusion proteins of the UNG proteins and a red-shifted variant of green fluorescent protein (EGFP-N1). These were used for transient transfection of HeLa cells. The major advantage of green fluorescent protein over antibodies is that detection relies solely on the autofluorescence of the protein, so possible cross-reaction of an antibody with epitopes in irrelevant proteins is not a problem. In the control (pEGFP-N1), green fluorescent protein alone is distributed homogeneously over the cells (Fig. 4A). In contrast, the UNG2-EGFP-N1 fusion protein is located exclusively in the nuclei (Fig. 4C), and the UNG1-EGFP-N1 fusion protein (Fig. 4D) is mainly, if not exclusively, located in extranuclear spots that have the same appearance as mitochondria stained with Texas Red (Fig. 4B).
The few spots of green fluorescence visible over the nucleus in Figure 4D are probably due to mitochondria above and below the nuclei in this summation image of optical sections. These results provide convincing experimental evidence that UNG2 is a nuclear protein and UNG1 a mitochondrial protein.
In the present paper we describe a distinct form of human nuclear uracil-DNA glycosylase (UNG2) encoded by the UNG gene, as well as the mouse homologs of UNG1 and UNG2. UNG1 (the mitochondrial form) and UNG2 have identical catalytic domains but very different N-terminal sequences. The two forms of uracil-DNA glycosylase from the human UNG gene are generated by use of two promoters and by splicing of an additional exon (exon 1A), transcribed from the putative upstream promoter PA, into the first exon transcribed from the downstream promoter PB. This observation requires an update of the recently published organization of the human UNG gene (6). In a previous study from our laboratory (9), we considered alternative splicing a less likely mechanism for generating nuclear and mitochondrial forms because only one transcript was observed (7,13). However, the cDNAs for human UNG1 (2061 bp) and UNG2 (2058 bp) are of very similar sizes, and the corresponding mRNAs would probably not be separated on standard gels for Northern analysis.
It has previously been documented by several methods that the nuclear and the mitochondrial forms of human uracil-DNA glycosylase are strongly related (5,10). Transfection studies demonstrated that the N-terminal sequence in UNG1 directed transport to mitochondria, whereas the absence of this sequence resulted in nuclear transport (9). Therefore, the putative nuclear localization signal within the catalytic domain is presumably sufficient to direct nuclear transport in the absence of the mitochondrial localization signal in the N-terminal sequence. Alternatively, this small basic protein may not require a signal for entering the nucleus. The discovery of UNG2, whose N-terminal sequence probably contains a nuclear localization signal, nevertheless indicates that this new form may represent the true nuclear form. Formally, however, it cannot be excluded that UNG2 localizes to the nucleus simply because it lacks a mitochondrial targeting signal. Furthermore, it is possible that the N-terminal sequence and the putative nuclear localization signal in the catalytic domain are both required for nuclear import. The predicted size of UNG2 (~36 kDa) corresponds to the size reported for a highly purified nuclear form of rat uracil-DNA glycosylase (14), whereas others have reported a size corresponding to the catalytic domain without the N-terminal sequence for uracil-DNA glycosylase from human tumor cells (15,16) and human placenta (8). Proteolytic cleavage of the N-terminal sequence to generate the nuclear form of uracil-DNA glycosylase may have a physiological role, but it is also possible that this is an artificial situation seen only in tumor cells, which are generally rich in proteases, and in other protease-rich tissues such as placenta. Alternatively, the proteolytic removal of the N-terminal sequence may be an artefact of protein purification. We have recently found that treatment of a purified recombinant form of human UNG1 lacking the 28 N-terminal amino acid residues (Δ28UNG1) with proteinase K (or cell extracts) removes approximately 50 additional amino acid residues, leaving the catalytic domain intact and fully active (unpublished data). Thus, the different sizes previously reported for the nuclear form of UNG may be explained by protease activity released when cell extracts are prepared for purification.
Two other cDNAs claimed to encode human nuclear uracil-DNA glycosylase have been reported (17,18). These are unrelated to uracil-DNA glycosylases of the UNG family, as well as to each other. Since polyclonal antibodies raised against a recombinant form of human UNG (UNGΔ84) essentially completely inhibit total uracil-DNA glycosylase activity in crude cell extracts (2), the unrelated uracil-DNA glycosylases are unlikely to contribute much, if at all, to uracil-DNA glycosylase activity in human cells. Moreover, deletion of carboxy-terminal amino acids of a human T(U)/G-mismatch DNA glycosylase results in an enzyme that has lost thymine-DNA glycosylase activity but retains uracil-DNA glycosylase activity for U/G mispairs. Bacterial proteins with homology to the core of the human glycosylase have also been identified (19). This indicates that this mismatch uracil-DNA glycosylase is an ancient enzyme that might serve as a backup for the uracil-DNA glycosylase encoded by the UNG gene. Consistent with this idea, we have found that when crude cell extracts are incubated with neutralizing antibodies to the UNG proteins, removal of uracil from oligonucleotides containing U/A pairs is completely abolished, whereas a small residual activity (<5%) remains for removal of U from U/G mismatches (unpublished data).
The UNG gene in the fish Xiphophorus is strongly related to mammalian UNG genes and is similarly organized (20; R. B. Walter, personal communication). The N-terminal sequence of a putative Xiphophorus UNG2 protein predicted from the gene structure is homologous to mammalian UNG2, but much shorter, whereas the N-terminal sequence of the putative Xiphophorus UNG1 protein is only distantly related (data not shown). The Xiphophorus UNG gene has consensus splice sites located in exactly the same positions as those verified to be splice sites in the human UNG gene (6) and in the mouse gene (data not shown). It was therefore originally suggested that the UNG gene of Xiphophorus has six exons (20), but the present work strongly indicates that it contains seven. Other repair gene structures are also highly conserved from fish to man: 22 of 23 exons of ERCC2/XPD in Chinese hamster ovary and Xiphophorus are of identical sizes. Since fish and mammals separated more than 450 million years ago, this indicates a very high degree of conservation of coding sequences and splicing patterns for DNA repair genes in vertebrates (6,20).
In conclusion, the human UNG gene encodes two forms of uracil-DNA glycosylase, generated by transcription from alternative start sites and by alternative splicing that includes an exon specific for the N-terminal end of the nuclear form. The different N-terminal sequences generated by this mechanism result in one form that enters the nucleus (UNG2) and another that enters the mitochondria (UNG1).
This work was supported by the Norwegian Cancer Society, the Research Council of Norway and the Cancer Fund at the Regional Hospital in Trondheim. We would like to thank Dr Geir Slupphaug for helpful discussions and Dr Ronald B. Walter of Southwest Texas State University, San Marcos, Texas, for permission to discuss sequence information on the UNG gene from Xiphophorus.
*To whom correspondence should be addressed. Tel: +47 73 598680; Fax: +47 73
598705; Email: hans.krokan@unigen.unit.no
+ Accession nos X08975, X15653, X89398, X99018 and Y09008.


REFERENCES
