ABSTRACT
Cp1 is a polymorphic short interspersed repeat (SINE) which is distributed over
the whole genome of the dipteran Chironomus pallidivittatus, and is particularly abundant in the centromeres. It contains two different
sequence modules, one of which, the B module, has a polymerase III internal
control region (ICR) typical for tRNA genes (A and B box). Such sequence motifs
are common in SINEs and assumed to function in RNA-mediated transposition. In the present case, however, several structural
features speak for another role. An investigation of the transcription of the B
module shows that it encodes a 99 nt RNA species
in vivo,
Cp1-RNA, terminating within the module. The transcription unit is likely to
have evolved from a pre-tRNA gene and the transcript has sequence similarities to non-processed pre-tRNA. Most of the
in vitro
transcription is eliminated by deletion or substitution mutation of an upstream
TATA box, present within the B module, as well as by changing either the A or B
box. The properties of the transcript suggest that it does not have a role in
transposition but may have some other function, perhaps in the centromere.
SINEs are transposable elements derived from RNA polymerase III (pol III)
transcribed small structural RNAs which often carry functional internal
promoters (ICR) for their own transposition. The
Alu
family in primates is the best characterized member of this group. It consists
of an ~280 bp long dimeric DNA element derived from a 7SL RNA gene (
1
) and constitutes ~5% of the human genome (
2
). Most SINEs are, however, derived from tRNAs (reviewed in
3
) and have been seen in many species, like mammals (
4
-
6
), fish (
7
) and plants (
8
).
We have described a new type of SINE, termed Cp1, in chironomid insects (
9
). This element is interspersed between arrays of centromeric tandem repeats (
10
) and is also present at extracentromeric sites. Cp1 is polymorphic and modular
in design. It consists of two modules, the SCA and the B module, each ~200 bp long. The latter contains ICR consensus sequences (A and B box) and
termination signals typical for pol III genes, and it has other sequence
similarities with tRNA genes. Most SINEs only have internal transcription
control elements and probably use stop signals downstream of the element for
transposition. The unit in the B module has, however, properties suggesting
transcription for other purposes than transposition. There is a TATA box within
the SINE, upstream of the ICR region and there are typical pol III termination
signals within the element. Centromeric tRNA genes have previously been
observed in
Schizosaccharomyces pombe
(
11
,
12
). We were therefore interested in learning whether this component of the B
module is transcriptionally active
in vivo
and
in vitro
and, if so, how
cis
-acting control elements can be defined in relation to the transcription
unit.
Here we show that the B module is indeed able to direct pol III transcription
in vitro
and
in vivo.
Transcription
in vitro
requires the upstream TATA box in addition to both boxes in the ICR. The
in vitro
transcript is only 4 nucleotides (nt) larger than the
in vivo
form and some processing is therefore not excluded. The transcript is, however,
more similar to pre-tRNA than tRNA and starts 3 nt upstream of the 5'-end and extends 23 nt beyond the 3'-end of a processed tRNA. A predictive computer
program suggests that the product differs from tRNA in secondary structure and
adopts a hairpin structure. The transcript is unlikely to be of importance for
transposition of Cp1 since it only represents a minor part of the element. The
well defined transcriptional control elements which have survived evolution
from a tRNA gene indicate some function for this transcription unit. We propose
that this role is exerted in the centromere.
Total RNA was extracted from
C.pallidivittatus
embryos and salivary glands by standard methods (
13
). For RNA sequencing 10 µg total RNA were annealed to 10 pmol 5'-end-labelled primer (5'-AATGACTCTTCCCGAGC-3'). The cDNA strand was
polymerized with 3 U AMV reverse transcriptase (Boehringer Mannheim) per 20 µl buffer at 42°C in a mixture of deoxy/dideoxynucleotides.
For Northern analysis total RNA was used as well as the poly(A)
+
fraction, isolated with the Dynabeads biomagnetic separation system (Dynal).
The RNAs were fractionated in a 6% polyacrylamide denaturing gel (19:1)
containing 8 M urea. The RNA was then electrophoretically transferred (Trans
Blot Cell, BioRad) onto Hybond
+
nylon membranes (Amersham) in 0.5× Tris-borate buffer at 20 V/cm overnight in the cold. The filters were
hybridized with the clone pCp254 (
10
) and rehybridized with the salivary gland secretory protein gene sp12 (
14
).
Total RNA (10 µg) was mixed with 5 pmol of the 5'-end-labelled primer described above in hybridization buffer
(0.15 M KCl, 10 mM Tris-HCl pH 8.3, 1 mM EDTA). The mixture was incubated in boiling water for 15
min and then annealed at 48°C for 1 h.
The annealing reaction was precipitated and resuspended in 6 µl 5× AMV reverse transcriptase buffer (250 mM Tris-HCl, 40 mM MgCl2, 150 mM KCl, 5 mM DTT; pH 8.5 at 20°C), 3 µl deoxynucleotides (5 mM each), 3 U AMV reverse transcriptase
(Boehringer Mannheim) and water to 30 µl. The reaction was incubated at 42°C for 1 h. After precipitation cDNA was analyzed on an 8% polyacrylamide-7 M urea sequencing gel (19:1).
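The final concentrations in the 30 µl primer-extension reaction follow from the stated volumes by simple dilution. The sketch below works them out, assuming that no salt is carried over from the precipitated annealing mix; the helper name `final_conc` is illustrative and not part of the original protocol.

```python
# Final concentrations in the 30 µl primer-extension reaction.
# Stock concentrations and volumes are taken from the text; carry-over
# from the precipitated annealing reaction is assumed to be negligible.

def final_conc(stock, added_ul, total_ul):
    """Dilute a stock concentration by the ratio of added to total volume."""
    return stock * added_ul / total_ul

TOTAL_UL = 30.0
BUFFER_UL = 6.0   # 5x AMV reverse transcriptase buffer
DNTP_UL = 3.0     # 5 mM of each deoxynucleotide

buffer_5x_mM = {"Tris-HCl": 250.0, "MgCl2": 40.0, "KCl": 150.0, "DTT": 5.0}

final_mM = {name: final_conc(c, BUFFER_UL, TOTAL_UL)
            for name, c in buffer_5x_mM.items()}
final_mM["each dNTP"] = final_conc(5.0, DNTP_UL, TOTAL_UL)

for name, conc in final_mM.items():
    print(f"{name}: {conc:g} mM")
```

Under these assumptions the 5× buffer is diluted to 1× (6 µl in 30 µl) and each deoxynucleotide ends up at one tenth of its stock concentration.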
The 3'-end of the transcript was mapped according to Frohman et al. (15). Total embryonic RNA was treated with E.coli poly(A) polymerase (Pharmacia) in the presence of ATP. Polyadenylated RNA (~5 µg) was heated at 70°C and cooled on ice. The RNA was then added to a 20 µl reverse transcription mixture containing 2 pmol 3'-poly(dT) primer: 5'-AAGGATCCGTCGACATCGATAATACGACTCACTATAGGGAT(17)-3'. After the addition of 10 U AMV reverse transcriptase the reaction
was incubated for 2 h at 42°C. The resulting cDNA pool was diluted to 1 ml with water, 5 µl of which were used for PCR amplification with the following primer
pair: (i) a 17mer primer present in the adapter poly(dT) primer (5'-AAGGATCCGTCGACATC-3') containing a BamHI restriction site, (ii) a 17mer internal to the B module and complementary to
the 17mer used for primer extension and RNA sequencing (5'-GCTCGGGAAGAGTCATT-3'). The cDNA was then amplified by 30 cycles of 94°C for 1 min, 50°C for 45 s and 72°C for 45 s. A single low
molecular weight band was amplified. This product was isolated from agarose
gel, blunt-ended with T4 polymerase and finally digested with BamHI. The resulting fragment was cloned into a BamHI/HindII digested pUC18 plasmid and sequenced by the dideoxy-termination method.
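The relationships between the primers stated above can be checked directly from their sequences. The following is a minimal sketch using only the sequences quoted in the text; the variable and function names are illustrative, and GGATCC is the standard BamHI recognition sequence rather than something stated in the text.

```python
# Consistency checks on the 3'-RACE primers quoted in the text.

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return seq.translate(COMPLEMENT)[::-1]

extension_primer = "AATGACTCTTCCCGAGC"   # primer extension / RNA sequencing
internal_primer = "GCTCGGGAAGAGTCATT"    # PCR primer (ii)
adapter_primer = "AAGGATCCGTCGACATC"     # PCR primer (i)
# Adapter poly(dT) primer: quoted sequence ending in a run of 17 T residues.
poly_dt_primer = "AAGGATCCGTCGACATCGATAATACGACTCACTATAGGGA" + "T" * 17

BAMHI_SITE = "GGATCC"  # standard BamHI recognition sequence (assumed)

is_complementary = reverse_complement(extension_primer) == internal_primer
adapter_is_prefix = poly_dt_primer.startswith(adapter_primer)
has_bamhi = BAMHI_SITE in adapter_primer

print("primer (ii) is reverse complement of extension primer:", is_complementary)
print("primer (i) is the 5' part of the poly(dT) adapter:", adapter_is_prefix)
print("primer (i) contains a BamHI site:", has_bamhi)
```

All three checks hold for the sequences as printed, which agrees with the description of primer (ii) as complementary to the primer extension 17mer and of primer (i) as part of the adapter.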
Chironomus tentans
tissue cultured cells (
16
) were used for cytoplasmic cell extracts (
17
). All 40 µl transcription reactions contained 20 µl cell extract and were carried out in 80 mM KCl, 3 mM MgCl2, 3 mM DTT, 30 mM HEPES-KOH pH 8.0, 0.5 mM each of unlabelled UTP, ATP and CTP, 0.050 mM [α-32P]GTP (5 Ci/mmol), 8 mM creatine phosphate, 0.5 U creatine phosphokinase and 1
U/ml RNAsin (Promega). Supercoiled plasmid (500 ng) was used as template. All
Cp1 constructs were cloned in pUC18 and most of them have previously been
described: pCp627 and pCp254 (
10
), pCp116, pCp125, pCp413 and pCt2 (
9
). The pCt2.2 clone is the fragment of pCt2 between the left
Eco
RI site in pCt2 and the
Eco
RI site between the S and C fragments of the SCA module. The pCp1A clone is an
unpublished PCR generated fragment where the B module is preceded by a SCA
module and followed by a 155 bp centromeric repeat (
10
). The reactions were incubated at room temperature for 2 h, phenol/chloroform
extracted and precipitated prior to analysis in 6% polyacrylamide-8 M urea gels (19:1).
The insert of the pCp627 clone (
10
) was isolated with
Eco
RI and treated with T4 polymerase. It was then cleaved with
Hin
dIII and a fragment containing the B module preceded by the A segment of the SCA
module (which is the one present in pCp254, a subclone of pCp627) subcloned
into the
Hin
dIII/
Hin
dII sites of pUC18. This plasmid was then linearized with
Xba
I/
Sac
I and digested with Exo III nuclease (Erase-a-base system, Promega) at 16°C. Mutants were recircularized, transformed and sequenced by
the dideoxy-termination method.
A series of plasmids with substitutions of the TATA box, the transcription start site, and the A
and B boxes, respectively, was created by PCR with the clone pCp254 as
template. The target regions were substituted by the introduction of
restriction endonuclease sites with oligonucleotides containing the desired
sequences in conjunction with primers flanking the cloning site of pUC18. The
oligonucleotides used are shown in Table
1
.
The PCR products were digested with
Spe
I for the TATA box mutant,
Cla
I for the transcription start site mutant and
Xho
I and
Sty
I for the A and B box mutants, respectively, and either
Eco
RI or
Hin
dIII and ligated into the polylinker of pUC18.
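As a cross-check on the cloning strategy, each oligonucleotide in Table 1 should carry the recognition sequence of the enzyme used to cut the corresponding PCR product. The sketch below tests this for the sequences as printed in Table 1. The recognition sequences are the standard ones for these enzymes and are an assumption here, since the text does not state them; all names are illustrative.

```python
import re

# Standard recognition sequences, written as regular expressions
# (assumed, not given in the text; StyI recognizes CCWWGG, W = A or T).
SITES = {
    "SpeI": "ACTAGT",
    "ClaI": "ATCGAT",
    "XhoI": "CTCGAG",
    "StyI": "CC[AT][AT]GG",
}

# Oligonucleotide pairs from Table 1 and the enzyme used for each mutant.
OLIGOS = {
    "TATA box": ("SpeI", ["CCTGCACTAGTTTGGCATTCTGCCAAT",
                          "CGATCGACTAGTTGAACCACTCTTTTTTC"]),
    "transcription start site": ("ClaI", ["GATGCCATCGATAGCCCGATAGCTCAG",
                                          "CATAGCATCGATTTGGCATTGGCAGAATG"]),
    "A box": ("XhoI", ["CGATTCTCGAGTCTGAGCACTTGACCGGC",
                       "TTTTCCTCGAGTTATGTCGGGCTTTGCCATTGGC"]),
    "B box": ("StyI", ["TTAGCCTCCTTGGCGCTCGGGAAGAGTCATTGG",
                       "GCTATTCCAAGGACGTATCGCACCTCTCGATTGCC"]),
}

def site_present(enzyme, oligo):
    """True if the enzyme's recognition sequence occurs in the oligonucleotide."""
    return re.search(SITES[enzyme], oligo) is not None

results = {mutant: [site_present(enzyme, oligo) for oligo in oligos]
           for mutant, (enzyme, oligos) in OLIGOS.items()}

for mutant, flags in results.items():
    print(f"{mutant}: {flags}")
```

Every oligonucleotide in the table contains the expected site, so each mutant pair can be joined at the introduced restriction site as described.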
Table 1
Northern analysis of RNA from embryos or salivary glands showed a low molecular
weight RNA species in the 100 nt range in total RNA, absent in the poly(A)
+
fraction when pCp254 or a clone containing only the B module was used as probe
(pCp254, with the whole A segment of the SCA module and the initial 29 bases of
the B module deleted; Fig.
1
). The product, designated Cp1-RNA, contained 99 nt as determined by sequencing. Its 5'-end was mapped by primer extension with the help of a 17mer
primer hybridizing downstream of the B box (positions 151-168 in the pCp254 sequence shown in Fig.
6
). The main extension product corresponded to an 81 nt fragment, mapping the 5'-end of Cp1-RNA to 10 positions upstream of the first base pair of the A
box (Fig.
2
A). We sequenced the transcript in a total RNA mixture using the same 17mer as
for primer extension. The 3'-end of Cp1-RNA was investigated with 3'-RACE PCR. A single electrophoretic band was
obtained. The PCR product was cloned in pUC18 and sequenced. The RNA derived
sequences agreed with those obtained directly from DNA within the stretch of 99
bp and are shown in Figure
2
B (in predicted secondary structure form). The localization within the B module
of DNA corresponding to the transcript is shown in Figure
3
.
Previously we found that much of the sequence can be aligned to tRNA genes from
D.melanogaster
(
9
). We have extended the search and found high scores in comparisons with several
tRNA genes from different species, including rat tRNA
Lys
and tRNA
Thr
from prokaryotes (
Chlamydia
and
Micrococcus
). The identity ranges from 98% compared with the
Chlamydia
tRNA
Thr
5'-end (30 starting nt) to 86% with rat tRNA
Lys
(for a stretch of 26 nt). However, the sequence did not give a typical tRNA-like cloverleaf with the RNAFold software but a hairpin (Fig.
2
B).
To examine transcriptional properties of the B module we tested different Cp1
containing plasmid clones in a cell free extract made from a
C.tentans
epithelial cell culture. These clones represent the B module in different
naturally occurring DNA environments, schematically illustrated in Figure
5
. The sequences of the B modules in these clones, which have some mutational
differences, are given in Figure
6
. It can be seen in Figure
7
A that the three centromeric clones (no. 1, 2 and 7) and the extracentromeric
transpositions (no. 3, 4 and 5) transcribed a 103 nt long RNA (exact size
determined in sequencing gel) whereas the extracentromeric transposition no. 6
was inactive. A shorter 78 nt byproduct of the cell extract (used in the
experiments shown in Fig.
7
A and B) was also labelled; it was present even when no template was included (not
shown). There was thus a minor size difference between
in vitro
and
in vivo
transcribed RNA. We do not yet know the start point for the primary transcript
which is likely to be that of the
in vitro
transcript and, consequently, within 4 nt of the start of the
in vivo
product.
The transcriptionally inactive B module (Fig.
7
A, lane 6) contains ICR and TATA boxes matching the consensus. Three insertions
are, however, present upstream of the A box at -2, -1 and +6 relative to the start of the
in vivo
transcript (Fig.
6
). In another B module the A box contains a duplication of the C in the fourth
box position and has a C substituted for a T in the first position.
Nevertheless, transcription of this clone was not significantly lower than that
of other clones (Fig.
7
A, lane 5).
Several gene classes transcribed by RNA pol III are known to either require or
be influenced by both 5'- and 3'-flanking regions. In the experiment described above the
templates contain B modules flanked at their 5'- and 3'-ends by DNA sequences of different kinds. At the 5'-side transcriptionally active units
border on a complete SCA module (nos 1 and 3), the terminal A segment of
such a module (nos 2 and 7), an inverted, initial part of the SCA module
(no. 5), or the terminal part of another B module, in turn preceded by a
large segment of an inverted SCA module (no. 4). At the 3'-side there are centromeric repeats for nos 1 and 2, the upstream
part of the SCA module in no. 5, target site duplication with adjoining host
genomic DNA in nos 3 and 4 and vector sequence in no. 7. These differences did
not affect the transcriptional efficiency of the templates.
Further support that Cp1-RNA is a pol III product was obtained when α-amanitin was used in vitro during transcription of pCp254 (no. 7), which was resistant to the drug at the
highest concentration tested, 500 µg/ml (results not shown), in agreement with previous observations of
insensitivity of insect pol III (Bombyx mori) to the drug (19). A Krüppel pol II promoter construct (Stratagene) was completely inactive in the cell free
extract from the C.tentans epithelial cell culture and, therefore, could not be used to control the
efficiency of α-amanitin in this extract. In parallel experiments we could,
however, show that the pol II construct was transcribed in a Drosophila embryonic nuclear extract (Stratagene) and was completely sensitive to α-amanitin down to 5 µg/ml, as expected. Furthermore, this extract promoted weak
transcriptional activity from pCp254 (tested in the presence of the Krüppel construct) and this transcription was completely insensitive to α-amanitin even at the highest concentration tested, 250 µg/ml (results not shown). An additional important aspect of
these experiments is that they show that our extract from the
C.tentans
epithelial cell culture is specific for pol III.
Since transcription of the B module was independent of components outside of the
unit, we focused our attention on possible regulatory elements within the
module, in particular the region delimited by the transcription start and the 5'-end of the module. Within this 80 bp long region there is a sequence that
conforms with the TATA box consensus (positions -31 to -25 in relation to the first nucleotide of the
in vivo
transcript). To determine whether upstream elements influence transcription of
the B module, we created a series of partially deleted B modules by progressive
digestions of the pCp254 insert (Fig.
5
, no. 7) with Exo III nuclease. Deletion end points are shown in Figure
6
and results of
in vitro
transcription in Figure
7
B. One mutant, extending to position -99 (in relation to the first nucleotide of the
in vivo
transcript) only eliminates part of the A segment of the SCA module upstream of
the B module which is intact. The mutants used in lanes 2 and 3, with deletions
to -61 and -50, affecting the 5' part of the B module, directed transcription at the same
rate as the intact module in lane 1. With larger deletions, passing the TATA
box, transcription was completely abolished (lanes 4 and 5). No detectable
activity was found even after long exposure times.
The B module of pCp254 was mutated by substitutions that eliminated the TATA
box, the initiation region and the A and B boxes, respectively, as shown in
Figure
6
. DNA from the constructs was transcribed together with control DNA (Fig.
7
C). The control contained a 38 bp insertion (Fig.
6
) into the
Mbo
II site between the B box and the first stop signal (between position 167 and
168). Transcription was largely dependent on the integrity of the TATA box and
the two boxes of the ICR. There were weak 103 nt signals for the TATA (lane 2)
and A box (lane 4) constructs but incorporation directed by the box B mutant
(lane 5) could be seen only after prolonged exposure. When, however, the
initiation region was mutated (lane 3), there was no effect on transcription.
The relative template efficiency of the control construct was somewhat lower
than that of the non-mutated template, and it can therefore not be excluded that the region
around the
Mbo
II site has some effect on transcription efficiency. The 78 nt byproduct was not
seen with the extract used in these experiments.
Chironomus
centromeres have accumulated a polymorphic SINE-like transposable element, Cp1, present also in extracentromeric positions
(
9
). It contains two sequence modules, with variable arrangements in different
elements. Both modules start with a 22 bp sequence similar to the integration
site for the non-LTR retrotransposon, R2, in the pre-ribosomal gene (
20
). One part of the B module is likely to be tRNA derived, and contains consensus
sequences for pol III internal control regions. The origin of the remainder of
Cp1, i.e. less than half of B and most of SCA has not been traced. ICRs with A
and B boxes are common in SINEs, reflecting their origin, and do not
necessarily imply cellular function (
21
). The B module has, however, structural features unusual for a SINE, possibly
due to a cellular role other than transposition. Firstly, the ICR box sequences
agree, with one exception, with the established consensus; secondly, distinct pol III stop signals are present within the element, which is not the
case for SINEs in general; and thirdly, the transcription unit is preceded by a
TATA box. Furthermore, DNA motifs for pol III transcription are not localized
to the 5'-part of the element, which is probably necessary for transposition.
We therefore suspected that the B module contains a complete pol III
transcription unit, entirely controlled within the module, probably unrelated
to a role in transposition.
There was pronounced transcription
in vivo
resulting in a product within the range of pre-tRNA, i.e. 99 nt, designated Cp1-RNA.
In vitro, a slightly larger transcript of 103 nt was obtained. The small difference
between
in vivo
and
in vitro
results could have alternative explanations, such as processing in the nuclear
compartment, requiring factors not present in the cytoplasmic extract,
readthrough at the stop signal
in vitro
or the possible presence of a master gene for a 99 nt transcript. There was no
evidence for a processed product of a size more typical for mature tRNAs. The
difference in size between a mature tRNA and Cp1-RNA is due to differences at both the 5'- and 3'-ends. Cp1-RNA starts 3 nt upstream and terminates 23
nt downstream of a typical tRNA. The similarities with pre tRNA
ala
2
of
Bombyx mori
are striking, where the processed parts of the 5' leader and 3' trailer contain 3 and 22 nt, respectively (
22
).
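The size relationships discussed above reduce to a few subtractions, sketched here with the values reported in the text. The "tRNA-like core" is simply what remains of Cp1-RNA after removing the 5' and 3' extensions; it is an illustrative quantity, not a measured product, and the constant names are hypothetical.

```python
# Transcript sizes reported in the text (nt).
IN_VIVO_NT = 99      # Cp1-RNA in vivo
IN_VITRO_NT = 103    # in vitro transcript
CP1_LEADER_NT = 3    # Cp1-RNA starts 3 nt upstream of a typical tRNA 5'-end
CP1_TRAILER_NT = 23  # and terminates 23 nt downstream of its 3'-end

# Processed leader and trailer of Bombyx mori pre-tRNA Ala (ref. 22).
BOMBYX_LEADER_NT = 3
BOMBYX_TRAILER_NT = 22

size_difference = IN_VITRO_NT - IN_VIVO_NT
trna_like_core = IN_VIVO_NT - CP1_LEADER_NT - CP1_TRAILER_NT
extension_total_cp1 = CP1_LEADER_NT + CP1_TRAILER_NT
extension_total_bombyx = BOMBYX_LEADER_NT + BOMBYX_TRAILER_NT

print("in vitro minus in vivo:", size_difference, "nt")
print("tRNA-like core of Cp1-RNA:", trna_like_core, "nt")
print("leader + trailer, Cp1-RNA vs Bombyx pre-tRNA:",
      extension_total_cp1, "vs", extension_total_bombyx, "nt")
```

The remaining core of 73 nt is in the size range of a tRNA body, and the combined extensions differ from those of the Bombyx precursor by a single nucleotide.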
The transcription unit in the B module is thus likely to have evolved from a
tRNA gene, during which mutations have taken place that result in loss of
processing and may have changed the secondary structure from a cloverleaf to a
hairpin.
Some transcriptional control features probably also have evolved.
Cis-
acting control elements were studied with natural B modules, experimental
deletions and substitution mutants. We found that
cis
elements required for basal transcription are contained within a stretch
extending to -50, still within the B-module. This region has a TATA box starting at -31. The TATA box is indeed an essential element, since a
substitution mutation of this motif eliminated almost all transcription. Each
component of the ICR was also, separately, essential for transcription, as
shown by other experimental mutations. A naturally occurring variant of the A
box with only two base changes was nevertheless transcribed. Our experiments do not
exclude other control elements within the transcription unit and its upstream
region extending to -50.
We found no evidence for regulatory sequences downstream of the B module (which
terminates ~20 bp downstream of the transcription unit). In this respect transcription
control may differ from that of some insect pol III transcribed genes which may
be influenced by relatively large 3' flanking regions (
23
,
24
).
Interestingly a natural variant which had three extra base pairs between the
TATA box and the A box (pCt2.2), in the initiator region, was completely
inactive
in vitro
. Without additional experimental support this result must be interpreted with
caution. Since, however, a substitution mutation in the same region showed full
activity, it may well be that it is the change in tilt of the DNA double helix
between the two boxes due to the insertions, almost one third of a turn, that
inactivates transcription. This could be because TFIIIC, binding the ICR, is
interacting with TFIIIB and has a role in attaching it to the upstream control
region (
25
). A more remote alternative is that it is the host genomic DNA upstream of the
transposition that has an inhibitory influence.
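The estimate of "almost one third of a turn" can be reproduced with a short calculation. The sketch below assumes a helical repeat of 10.5 bp per turn for B-form DNA, a textbook value that is not stated in the text; the function name is illustrative.

```python
# Rotational offset introduced by inserting extra base pairs between
# two promoter elements, assuming B-form DNA with 10.5 bp per turn
# (assumed textbook value, not taken from the text).

BP_PER_TURN = 10.5

def rotational_offset(insert_bp, bp_per_turn=BP_PER_TURN):
    """Return (fraction of a helical turn, angle in degrees) for an insertion."""
    turns = insert_bp / bp_per_turn
    degrees = (turns * 360.0) % 360.0
    return turns, degrees

turns, degrees = rotational_offset(3)
print(f"3 bp insertion: {turns:.2f} turn, about {degrees:.0f} degrees")
```

With this assumption three extra base pairs rotate the TATA box relative to the A box by roughly 0.29 of a turn, consistent with the figure given above, while a substitution of the same region leaves the spacing and hence the relative orientation unchanged.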
In conclusion the Cp1-RNA gene shows a unique combination of pol III control elements. Although
several tRNA genes have upstream regulatory motifs, some of which are AT rich (
26
-
31
) there is usually no TATA box, and the upstream elements are modulatory rather
than essential. An exception is the
Xenopus laevis
tRNA
(Ser)Sec
gene with an essential upstream TATA-box in addition to other regulatory sequences (
32
,
33
). In that case the A box is, however, non-functional. The promoter of the U6 gene in
S.cerevisiae
(
34
) is similar to the Cp1-RNA gene in having a TATA box and an internal split promoter although
there the B box has an unusual distal position. The U6 TATA box is,
furthermore, modulatory rather than essential. In summary, the Cp1-RNA gene appears to be unique in its tripartite promoter, in which all
three components are essential or nearly so, at least
in vitro
.
Our results raise the following questions. First, if Cp1-RNA cannot have a role in transposition, what then promotes the mobility
of Cp1? Secondly, is there any evidence that Cp1-RNA is functional at all?
Cp1 occurs both in centromeres and outside of them. In the former localization
Cp1 is integrated between centromeric 155 bp tandem repeats such that each
integration site is followed by a short partial duplication of the 155 bp
sequence situated upstream of the element. Even if this could be a target site
duplication, it is not necessarily of recent origin. This is because 155 bp
sequences might be exposed to parallel evolution and the duplication thus
conserved irrespective of age. It is possible, on the other hand, that
extracentromeric elements transposed more recently, since they are surrounded
by perfect target site duplications. Extracentromeric elements are highly
polymorphic and may be functionless like the element cloned in pCt2.2.
Furthermore, there are also large numbers of highly degenerate Cp1 fragments in
the extracentromeric genome (
9
). Therefore it appears unlikely that Cp1 could have an important role in
extracentromeric loci.
It is possible that Cp1 originally transposed into the
centromeres from a site in the pre-ribosomal transcription unit, judging by the nature of the DNA at the 5'-end of each of its sequence modules. Extracentromeric
elements might also have this origin, but another possibility is that they are
continuously acquired by transpositions from intracentromeric units.
Cp1 might have a physiological role, but not necessarily in extracentromeric
positions. This view is supported by the functional control elements in the Cp1-RNA gene and by the fact that a new, well defined secondary structure has
arisen during evolution from a tRNA gene. It is also of considerable interest
that the
S.pombe
centromeres contain clustered tRNA genes reported to be functionally active (
11
,
12
).
Cp1 may show interesting similarities with the
S.cerevisiae
centromere. When CEN 4 DNA from this organism was inserted into negatively
supercoiled plasmid, it obtained properties suggesting that DNA unwinds to form
RNA-like foldback structures, which may also exist
in vivo
(
35
). When individual strands from intracentromeric Cp1 are allowed to form
secondary structures by the M-fold program V.8, they give highly base paired, relatively symmetrical
structures with remarkably low free energy (our unpublished data). The
transcription unit in the B module is at the right end not only of this module
but of entire Cp1 elements (the transcriptionally inactive element in clone no.
6, Fig.
5
, being an exception). This transcription is directed away from the element.
Transcription might create negative superhelical tension in the region upstream
of the transcription unit (
36
,
37
), i.e. within the larger part of Cp1. This in turn should favour transitions to
single-strand structures. Cp1 is inserted into the AT-rich conserved region of the 155 bp repeat (
38
). These tandem repeats might function as spacers, periodically separating
functionally important Cp1 insertions. AT-rich regions could facilitate structural transitions (
39
). We might speculate that structural-functional change in the centromere could
be regulated in this way by control of pol III transcription.
The investigation was supported by grants from the Swedish Cancer Society, the
Philip-Sörensen Foundation and the Crafoord Foundation. We are indebted to Joakim Galli and Lars Wieslander, Karolinska Institutet, Stockholm for the sp12 DNA clone.
Oligonucleotide
TATA box mutant: 5'-CCTGC ACTAGT TTGGCATTCTGCCAAT-3' and 5'-CGATCG ACTAGT TGAACCACTCTTTTTTC-3'
Transcription start site mutant: 5'-GATGCC ATCGAT AGCCCGATAGCTCAG-3' and 5'-CATAGC ATCGAT TTGGCATTGGCAGAATG-3'
A box: 5'-CGATT CTCGAG TCTGAGCACTTGACCGGC-3' and 5'-TTTTC CTCGAGTTATG TCGGGCTTTGCCATTGGC-3'
B box: 5'-TTAGCC TCCTTGG CGCTCGGGAAGAGTCATTGG-3' and 5'-GCTATT CCAAGGACGTA TCGCACCTCTCGATTGCC-3'





REFERENCES
