ABSTRACT
The C6 zinc cluster family of fungal regulatory proteins shares as its DNA-binding motif the
C6 zinc cluster, also known as the Zn(II)2Cys6 binuclear cluster. This family includes
transcriptional activators like Gal4p, Leu3p, Hap1p, Put3p and Cha4p from Saccharomyces
cerevisiae, qutA and amdR from Aspergillus, nit4 from Neurospora and Ntf1 from
Schizosaccharomyces pombe. Seventy-nine proteins were retrieved from databases by homology to
the C6 zinc cluster. All were fungal and 56 were found in the entire genome sequence of
S.cerevisiae. Sequence analysis suggests that 60 of the 79 proteins possess one or more
coiled-coil dimerization regions succeeding the C6 zinc cluster. Previous comparisons of Gal4p
and seven other C6 zinc cluster proteins identified an additional region with weak homology.
This region, designated the middle homology region (MHR), was shown to be present in 50 of the
79 proteins. Although reported mutation and deletion analyses suggest a role of MHR in
regulation of protein activity, no function has yet been assigned specifically to this region.
We find that the family of MHR sequences is confined to C6 zinc cluster proteins and
hypothesize that one MHR function is to assist the C6 zinc cluster in DNA target
discrimination.
A family of fungal transcriptional regulatory proteins shares in their DNA-binding domain a
motif containing six cysteines which complex two Zn2+ ions, the C6 zinc cluster (1,2). A
well-known representative of this protein family is one of the best-studied eukaryotic
transcriptional activators, the Saccharomyces cerevisiae Gal4p, which is required for the
transcriptional activation, seen in the presence of galactose, of the genes coding for
galactose-metabolizing enzymes (3). In spite of the immense work on Gal4p, the picture of its
function is far from complete. Its DNA-binding domain, activation domain and galactose
regulatory domain have all been described (4-6). Gal4p binds as a dimer to DNA sites
characterized by the presence of two 5'-CGG-3' triplets separated by 11 base pairs (7,8).
However, X-ray analysis of the DNA-Gal4p(1-65) complex and in vitro binding studies, using
only the 74 N-terminal amino acids, indicate that the DNA-binding domain alone has
insufficient specificity for its UAS (9-11), suggesting that another protein or some other
part of the protein has a supplementary role in this respect. The main activation domain was
originally described as an acidic amphipathic α-helix (12), but later work suggested a β-sheet
structure and also showed that the acidic residues are not required for activation (13,14).
The Gal4p main regulatory domain is the target for the repressor protein Gal80p, the activity
of which is inhibited by galactose (15). The activation potential of Gal4p is, however, also
moderately regulated by glucose, and the central part of the protein seems to be involved in
this regulation (16). A short coiled-coil dimerization motif is present close to the C6 zinc
cluster in the C-terminal direction and contributes to the dimerization of Gal4p (7,9).
Some information is also available about the functional domains of other C6 zinc cluster
proteins. For example, as for Gal4p, the activation domain has been localized to acidic
C-terminal regions in the cases of Hap1p (2), Leu3p (17), Pdr3p (18), nit4Ncr (19), qa1FNcr
(20) and Cha4p (21). Furthermore, among eight of the C6 zinc cluster proteins an internal
domain has been shown to exhibit sequence conservation, but no function has yet been assigned
to this domain (22-24). Whatever function(s) this region may have, it is tempting to speculate
that it is closely related to that of the C6 zinc cluster.
A general means of delineating the functional domains of a regulatory protein is mutation and
deletion analysis. However, sequence comparisons among similar proteins may link observations
on functions in one protein to observations made on the other proteins. In addition,
homologies can define regions of importance for the function of the proteins. In the present
report we have conducted a comparative study of all published amino acid sequences of putative
C6 zinc cluster proteins in order to define the locations of possible functional domains and
to discuss their function.
The EMBL, TREMBL, SWISSPROT and PIR databases were searched employing the GCG programs TFASTA,
FASTA, PROFILEMAKE and PROFILESEARCH (25-27). The searches were performed on September 11,
1996. For PROFILEMAKE the parameter table used was BLOSUM62 (28). To search the entire
S.cerevisiae genome with PROFILESEARCH a database of peptides had to be constructed. The
chromosome sequences, retrieved from MIPS, were translated in all six reading frames. All
amino acid sequences >70 residues from stop codon to stop codon were extracted. A database of
these peptides (~30 000) was then searched in parallel with the searches of the other
databases.
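The construction of the peptide database can be illustrated with a minimal Python sketch. It is not the procedure actually used (the original tools are not described beyond the text above); the function names, the use of the standard genetic code, and the treatment of sequence ends as if they were stop codons are all assumptions made for illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumption: nuclear code).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate complete codons; codons with unknown bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_peptides(dna, min_len=71):
    """Return all stop-to-stop peptides of more than 70 residues in six frames.

    Hypothetical helper: segments that run into a sequence end are kept as
    well, although they are not bounded by a stop codon on that side.
    """
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    peptides = []
    for strand in (dna, reverse_complement):
        for frame in range(3):
            for segment in translate(strand[frame:]).split("*"):
                if len(segment) >= min_len:
                    peptides.append(segment)
    return peptides
```

Applied to each chromosome sequence in turn, a function of this kind would yield the peptide collection that is subsequently searched with the profile.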
Sequences were aligned using the GCG program PILEUP (25), followed by visual adjustment.
Coiled-coils were predicted with the EGCG program PEPCOIL (29). For Table 2 the prediction
program Paircoil (30) was also used.
Within the C-terminal 100 amino acids of each protein the window giving the sum of
charges with the most negative value was determined. The range of the window
size was between 1 and 30 residues. The amino acids lysine, arginine and
histidine were each assigned one positive charge, aspartate and glutamate were
each assigned one negative charge, whereas no charge was assigned to the
remaining amino acids.
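The window search described above can be sketched as follows. This is a minimal illustration under the stated charge assignments; the function name, the return format and the tie-breaking rule (the first and shortest window reaching the minimum is reported) are assumptions, not details taken from the original analysis.

```python
# Charges as assigned in the text: K, R, H = +1; D, E = -1; all others 0.
CHARGE = {"K": 1, "R": 1, "H": 1, "D": -1, "E": -1}


def most_acidic_window(protein, tail=100, max_window=30):
    """Find the window (1..max_window residues) with the most negative
    charge sum inside the C-terminal `tail` residues.

    Returns (charge_sum, start, end) with 1-based inclusive positions in the
    full protein, or None for an empty sequence.
    """
    region = protein[-tail:]
    offset = len(protein) - len(region)
    charges = [CHARGE.get(aa, 0) for aa in region]
    best = None
    for size in range(1, min(max_window, len(region)) + 1):
        window_sum = sum(charges[:size])
        for start in range(len(region) - size + 1):
            if start > 0:
                # Slide the window one residue to the right.
                window_sum += charges[start + size - 1] - charges[start - 1]
            if best is None or window_sum < best[0]:
                best = (window_sum, offset + start + 1, offset + start + size)
    return best
```

For the made-up sequence `MKKAAADDEDDAAK`, the call `most_acidic_window("MKKAAADDEDDAAK")` returns `(-5, 7, 11)`, i.e. the stretch DDEDD.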
The C6 zinc cluster motif is a common DNA-binding domain and contains six cysteines in the
pattern CX2CX6CX6CX2CX6C (2) which complex two Zn2+ ions (1,31,32). This cysteine pattern was
used to select C6 zinc cluster protein sequences present in the databases. From this first
search keratins and other proteins with a very high cysteine content were discarded. The
remaining proteins were aligned to generate a profile using PROFILEMAKE (26). This profile was
used for a second search of the databases by PROFILESEARCH (26). Again, a profile was
generated including the new proteins and the procedure repeated until no new proteins were
found. Seventy-nine proteins with the C6 zinc cluster motif were identified (Table 3). The
motif is only found in fungi, ranging from baker's yeast to filamentous fungi and even in the
very distantly related Schizosaccharomyces pombe. In the entire genome of S.cerevisiae 58
different proteins were found. The genes for two of the 58 proteins (MAL28 and MAL63) were not
present in the strain used for the sequencing project.
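The initial selection by cysteine pattern can be illustrated with a regular expression. The sketch below is not the search actually performed; the names are hypothetical, and the relaxed pattern is an assumption: its range of six to nine residues before the last cysteine follows the spacing reported in this article, whereas the loop range of 5 to 12 residues between the third and fourth cysteine is chosen arbitrarily for illustration.

```python
import re

# Canonical pattern CX2CX6CX6CX2CX6C, where X is any residue.
C6_STRICT = re.compile(r"C.{2}C.{6}C.{6}C.{2}C.{6}C")

# Relaxed variant (assumed ranges, see lead-in): variable loop and
# variable spacing before the sixth cysteine.
C6_RELAXED = re.compile(r"C.{2}C.{6}C.{5,12}C.{2}C.{6,9}C")


def find_c6_clusters(protein, pattern=C6_STRICT):
    """Return (start, end) of each non-overlapping match, 1-based inclusive."""
    return [(m.start() + 1, m.end()) for m in pattern.finditer(protein.upper())]


# Synthetic test sequences (not real proteins).
canonical = "MS" + "CAACAAAAAACAAAAPACAACAAAAAAC" + "DE"
long_tail = "CAACAAAAAACAAAAPACAACAAAAAAAAAC"

print(find_c6_clusters(canonical))              # [(3, 30)]
print(find_c6_clusters(long_tail))              # []
print(find_c6_clusters(long_tail, C6_RELAXED))  # [(1, 31)]
```

A pattern search of this kind is deliberately permissive, which is why cysteine-rich proteins such as keratins have to be removed before a profile is built.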
An alignment of the C6 zinc clusters found is shown in Figure 1. The metal-binding domain
consists of two substructures, each containing three cysteines, joined by a region of variable
length. On both sides of the cysteine cluster there is a high occurrence of basic amino acids.
The three N-terminal cysteines form a relatively conserved structure with a basic region
between cysteine two and cysteine three, and the structure is most often preceded by an
alanine or serine and succeeded by an aspartate. The very conserved proline, present in the
region between the two substructures, is important to avoid strain in the loop region (9),
which correlates with the observation that the proline is only absent if there are additional
residues in the loop region. The spacing between cysteine five and cysteine six is variable,
ranging from six to nine residues with six being the most common.
Witte and Dickson (39) proposed that Lac9p and Gal4p form an α-helical structure in a region
18-27 amino acids following the C6 zinc cluster. X-ray analysis of the Gal4p-DNA complex
showed that this region is involved in dimerization through a coiled-coil structure (9). Later
X-ray analysis of the Ppr1p-DNA complex showed that Ppr1p also dimerizes via a coiled-coil
structure located C-terminally to the C6 zinc cluster (33). The C6 zinc cluster itself
interacts with the CGG triplets (34), while domain-swapping experiments have shown that the
linker region between the C6 zinc cluster and the coiled-coil element is responsible for
additional target site specificity (8). However, some C6 zinc cluster proteins bind non-repeat
DNA sequences, suggesting that they do not dimerize (Table 1).
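The notion of a binding site built from two CGG triplets with a defined spacing can be made concrete with a small search routine. The sketch below is illustrative only: the function name is hypothetical, only the given strand is scanned, and only two arrangements are considered, an inverted repeat (CGG-Nn-CCG, as for the Gal4p site with n = 11) and a direct repeat (CGG-Nn-CGG).

```python
import re


def find_cgg_sites(dna, spacer):
    """Find CGG triplet pairs separated by `spacer` base pairs.

    Returns sorted (start, kind, site) tuples with 1-based start positions;
    kind is "IR" (inverted repeat) or "DR" (direct repeat). Overlapping
    sites are reported because the patterns are wrapped in a lookahead.
    """
    dna = dna.upper()
    patterns = {
        "IR": f"(?=(CGG[ACGT]{{{spacer}}}CCG))",
        "DR": f"(?=(CGG[ACGT]{{{spacer}}}CGG))",
    }
    hits = []
    for kind, pattern in patterns.items():
        for match in re.finditer(pattern, dna):
            hits.append((match.start() + 1, kind, match.group(1)))
    return sorted(hits)


# Synthetic example: an inverted repeat with an 11 bp spacer.
site = "AT" + "CGG" + "A" * 11 + "CCG" + "TA"
print(find_cgg_sites(site, 11))  # [(3, 'IR', 'CGGAAAAAAAAAAACCG')]
```

Direct repeats on the opposite strand (CCG-Nn-CCG) would require scanning the reverse complement as well, which is omitted here for brevity.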
Table 1. Published consensus binding sites for C6 zinc cluster proteins in vivo and in vitro
To investigate whether potential dimerization via coiled-coil regions is a general feature of
C6 zinc cluster DNA-binding, all known C6 zinc cluster proteins (Table 3) were analyzed for
the presence of potential coiled-coil regions. A strategy developed by Lupas et al. (29)
assigns, within a window of 28 residues (four heptad repeats), a score to every amino acid
residue depending on its likelihood of residing in a coiled-coil. However, a window of 28
residues failed to localize the coiled-coils identified in Gal4p and Ppr1p by the X-ray
analysis (data not shown) and gave very few positive predictions. Similar problems arose with
a window size of 21 residues (three heptad repeats, data not shown). Although the method was
developed for prediction of long coiled-coil stretches and is less reliable on very short
stretches, a window size of 14 residues predicted the known coiled-coils. The coiled-coils
identified by the X-ray analysis are very short: 15 residues in the case of Gal4p and 19
residues in the case of Ppr1p. This makes prediction difficult, and previous reports have used
alignments to the known coiled-coils of Gal4p and Ppr1p rather than prediction tools
(8,40,41).
Thus, using a 14 amino acid residue window, a profile was calculated for each of the 79 C6
zinc cluster proteins and the resulting profiles were aligned with respect to the last
cysteine in the C6 zinc cluster motif. At every position all residues having a score >1.35 or
1.5 [reported to yield 50 or 95% correct prediction when a 28 residue window is used (26,27)]
were counted and the result plotted relative to the distance from the sixth cysteine of the C6
zinc cluster motif (Fig. 2). A high peak reaching ~40% of the sequences is seen immediately
following the C6 zinc cluster, from position 5 to 41. Also shown is the result of choosing a
score >1.8 as being very significant. This resulted in a similar curve in which the peak
adjacent to the C6 zinc cluster is even more pronounced compared with other peaks. The result
shows that the occurrence of coiled-coils after the C6 zinc cluster is a rather general
feature of the C6 zinc cluster proteins. However, the two curves also suggest that the
prediction is not sufficient to eliminate all noise, since reduction of the presumed noise by
choosing a higher cut-off level for the scores also drastically reduces the presumed real
predictions. In Table 2 the positions and scores for predicted coiled-coils within the first
150 amino acids following the C6 zinc cluster are shown. Sixty of the 79 proteins have
coiled-coil structure as determined by this method, and in many cases the proteins have
several short coiled-coils. The predictions of coiled-coils for Gal4p and Ppr1p match the
locations of their known coiled-coils (9,33), but also suggest that additional coiled-coils
are positioned further towards the C-terminus (Table 2). This region was not included in the
peptides used in the X-ray analyses but was shown earlier to possess dimerization activity
(7).
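The counting step behind the curves in Figure 2 can be sketched in a few lines. The sketch assumes that per-residue coiled-coil scores have already been computed by a prediction program; the input format, the function name and the toy scores are hypothetical and serve only to show how profiles are aligned on the sixth cysteine and summarized per position.

```python
def fraction_above_cutoff(profiles, cutoff, span=150):
    """Fraction of proteins scoring above `cutoff` at each distance
    (1..span) from the last cysteine of the zinc cluster.

    `profiles` is a list of (scores, last_cys_index) pairs, where `scores`
    holds one value per residue and `last_cys_index` is the 0-based index
    of the sixth cysteine (assumed input format).
    """
    counts = [0] * span
    for scores, last_cys in profiles:
        for distance in range(1, span + 1):
            position = last_cys + distance
            if position < len(scores) and scores[position] > cutoff:
                counts[distance - 1] += 1
    total = len(profiles)
    return [count / total for count in counts] if total else counts


# Toy data: two short "proteins" with made-up scores.
toy = [([0, 0, 2.0, 2.0, 0], 1), ([0, 1.4, 0, 0], 0)]
print(fraction_above_cutoff(toy, cutoff=1.35, span=3))  # [1.0, 0.5, 0.0]
print(fraction_above_cutoff(toy, cutoff=1.8, span=3))   # [0.5, 0.5, 0.0]
```

Raising the cut-off lowers the curve everywhere, which is the trade-off between noise and genuine predictions noted above.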
Many of the selected proteins function as transcriptional activators. However, activating
regions have been defined only in the case of Gal4p (5), Leu3p (17), Hap1p (2), Pdr3p (18),
nit4Ncr (19), qa1FNcr (20) and Cha4p (21). In all seven proteins the main transcriptional
activation domain is C-terminally located. In the case of Gal4p and Pdr3p an additional
activation domain is located near the C6 zinc cluster domain (5,18). Although there is no
strict relationship between negative charge and strength of activation, the activation domain
generally correlates with an acidic part of the protein (5,14). Thus, the C-termini of the C6
zinc cluster proteins were searched for parts enriched in acidic amino acids (Table 3). The
degree of acidity ranges from a single negative charge (priBLed and Ybl066p) to -23 (Ybr150p),
with most figures in the range of -3 to -10.
Table 3. Summary of data for the C6 zinc cluster proteins
a Name of sequence used in this article (S.cerevisiae names are followed by a 'p' and the rest by an abbreviation of the species name).
b SWISSPROT (SW:), PIR (PIR:), or MIPS ( ) name.
c Size of the published protein sequence.
d C6 zinc cluster location (position 10-49 in Fig. 1).
e Distance between the C6 zinc cluster and the first coiled-coil.
f Sum of coiled-coil sizes from Table 2.
g MHR location as shown in Figure 3.
h Region of maximum acidic charge in the C-terminus.
i Binding site; IR, inverted repeat; DR, direct repeat; NR, no repeat.
j Number of base pairs between potential CGG triplets.
k Sequence from intron two has been included.
l Sequence from SW:P38781 has been used due to frameshift.
m The entire ORF is not reported.
Besides the C6 zinc cluster, a second region of weak homology has been found among eight of
the C6 zinc cluster proteins (22-24). We performed a thorough comparison and alignment of the
79 C6 zinc cluster proteins identified in an attempt to define the extent of the domain and to
determine which of the proteins contain this conserved domain structure. Thus, as many as
possible of the protein sequences were aligned to the original alignment of six proteins by
Chasman and Kornberg (22). Afterwards the comparison was expanded sideways using a combination
of automatic and manual approaches and including, when possible, additional proteins (Fig. 3).
It was possible to align 50 of the 78 C6 zinc cluster proteins for which the full sequence is
available. This region of homology is mainly located in the middle of the proteins (Table 3)
and we have designated it the middle homology region (MHR).
To address whether the MHR is confined to C6 zinc cluster proteins, we made two new profiles.
First, an MHR profile was made from 15 proteins with high homology to the MHR consensus. The
15 proteins were chosen such that closely related sequences, like Gal4p and Lac9p, were only
represented once. The chosen proteins were amdRAor, Cha4p, Gal4p, Hap1p, Mal63p, nirAEni,
ntf1Spo, Pdr3p, Put3p, uaYEni, yakBSpo, yao7Spo, Yer184p, Yhr178p and Yol089p. This profile
was employed to search the databases for other proteins containing the MHR. Among the
sequences selected, only C6 zinc cluster proteins had a high homology to this profile. Since
the MHR is not a very highly conserved domain, the search also picked up non-C6 zinc cluster
proteins fitting the profile better than some of the sequences aligned in Figure 3. It is
noteworthy, however, that 43 of the 50 aligned C6 zinc cluster proteins had a better fit than
any of the non-C6 zinc cluster proteins. In the second approach, an MHR profile was generated
using the five proteins with the most conserved MHR (Cha4p, Gal4p, uaYEni, Yer184p and
Yol089p). A search of the databases with this MHR profile gave the result that 25 of the 50
aligned C6 zinc cluster proteins had a better fit than any non-C6 zinc cluster protein. We
were unable to align the best fitting non-C6 zinc cluster sequences to the MHR alignment
employing the same strategy as used in constructing the alignment. In addition, the best
fitting non-C6 zinc cluster proteins identified in both searches were functionally and
evolutionarily unrelated.
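The comparison used to express these search results (how many aligned C6 zinc cluster proteins fit the profile better than any other protein) amounts to a simple ranking. The following sketch shows the bookkeeping only; the function name, the hit format and the scores are hypothetical, and no actual search output is reproduced.

```python
def count_better_than_outsiders(hits):
    """Count C6 zinc cluster proteins scoring above the best non-C6 hit.

    `hits` is a list of (name, score, is_c6) tuples taken from a profile
    search (assumed format). If there are no non-C6 hits, every C6 protein
    is counted.
    """
    outsider_scores = [score for _, score, is_c6 in hits if not is_c6]
    best_outsider = max(outsider_scores, default=float("-inf"))
    return sum(1 for _, score, is_c6 in hits if is_c6 and score > best_outsider)


# Made-up scores for illustration; not results from the searches.
example_hits = [
    ("protein_A", 52.0, True),
    ("protein_B", 47.5, True),
    ("unrelated_X", 40.1, False),
    ("protein_C", 38.9, True),
    ("unrelated_Y", 30.0, False),
]
print(count_better_than_outsiders(example_hits))  # 2
```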
The results described above show that the MHR is a very common motif in C6 zinc cluster
proteins. In contrast, our searches have failed to locate this motif in any other group of
proteins, strongly suggesting that the MHR has a structural or functional role confined to C6
zinc cluster proteins.
Our comparison of C6 zinc cluster proteins has shown that the general protein structure
consists of an N-terminal C6 zinc cluster followed by interrupted coiled-coils. In the middle
part of the protein there is a region of common homology, the MHR, and an acidic region in the
C-terminus is responsible for activation.
Our search for C6 zinc cluster proteins has shown that the motif is very common in the fungal
world. In S.cerevisiae alone there are ~56 proteins, depending on the strain. On the other
hand, the motif has been impossible to detect outside the fungal world. This is in contrast
with other DNA-binding domains, like the C2H2 zinc finger, the leucine zipper and the
helix-loop-helix, which are utilized by a much more diverse array of organisms (42). We
speculate that the lack of the C6 zinc cluster motif in eukaryotes with a higher DNA content
might reflect limitations in discrimination capacity.
The prediction of coiled-coils indicated that many of the C6 zinc cluster proteins contain
short coiled-coil regions following the zinc cluster. Although each coiled-coil region is very
short, the combined action of the small coiled-coil regions might provide sufficient stability
for the proteins to be able to dimerize. We speculate that the different interruptions of the
coiled-coils might serve as a key to ensure that only the correct (homo)dimers are formed.
What is the function of the MHR? All available data suggest a role in regulation of
transcriptional activity. For example, deletion of large parts of the middle region, including
the MHR, of Leu3p and qa1FNcr converted the proteins into constitutive activators (20,43,44),
and the internal part of Gal4p including the MHR has been shown to be involved in glucose
repression of Gal4p activity (16). Likewise, a deletion in amdREni covering approximately the
MHR abolished regulation and reduced the activity of the protein to the uninduced wild-type
level (45). However, our observation that the MHR is found only in C6 zinc cluster proteins
suggests a functional relation between the C6 zinc cluster and the MHR. The C6 zinc
cluster-DNA complexes of Gal4p and Ppr1p both have a relatively open structure (9,33).
Furthermore, the in vivo target site specificity of Gal4p is more restricted than the in vitro
specificity of the Gal4p DNA-binding domain. This suggests that one or more mechanisms operate
in vivo that enhance the intrinsic DNA-binding specificity of the protein (9,10,33). By
itself, the C6 zinc cluster is only able to recognize the CGG triplet on both flanks of the
binding sequence and the spacing between them (9,33), so one possibility is that an additional
protein is necessary for in vivo recognition of the binding site. However, extensive genetic
analysis of the galactose metabolism pathway has so far failed to uncover likely candidates
for proteins assisting Gal4p in DNA binding. Gal11p has been proposed as a candidate (9) but
is now considered to be an enhancer of basal-level transcription (46,47). We hypothesize that
the assisting factor is not another protein but the MHR domain of the C6 zinc cluster
proteins. This function of the MHR would explain the low degree of homology in the domain
sequence, since a specific MHR has evolved together with its specific C6 zinc cluster.
We wish to thank Morten Kielland-Brandt for critical reading of the manuscript. This work was supported by
the NOVO Foundation, the Danish Center of Microbiology and the Danish Research
Councils.
*To whom correspondence should be addressed. Tel: +45 3532 2119; Fax: +45 3532
2113; Email: gensteen@biobase.dk


REFERENCES



