ABSTRACT
Pre-mRNA transcripts in a variety of organisms, including plants, Drosophila and Caenorhabditis elegans, contain introns which are significantly richer in adenosine and uridine residues than their flanking exons. Previous analyses using exonic and intronic replacements between two nonequivalent 5' splice sites in the 469 nt long rbcS3A intron 1 provided the first evidence indicating that, in both tobacco and Drosophila nuclei, 5' splice site selection is strongly influenced by the position of that site relative to the AU transition point between exon and intron. To differentiate between two potential models for 5' splice site recognition, we have expressed a completely different set of intronic and exonic replacement constructs containing identical 5' splice sites upstream of [beta]-conglycinin intron 4 (115 nt). Mutagenesis and deletion of the upstream 5' splice site demonstrate that intronic AU-rich sequences function by promoting recognition of the most upstream 5' splice site rather than by masking the downstream 5' splice site. Sequence insertions define a role for AG-rich exonic sequences in plant pre-mRNA splicing by demonstrating that an AG-rich element is capable of promoting downstream 5' splice site recognition. We conclude that AU-rich intronic sequences, AG-rich exonic sequences and the 5' splice site itself collectively define 5' intron boundaries in dicot nuclei.
Despite tremendous advances in the understanding of pre-mRNA splicing over the past several years, the mechanism for splice site recognition remains elusive. Several parameters have been implicated in 5' splice site selection. These include the degree of splice site complementarity to the U1, U5 and U6 snRNAs (1-5), the nuclear concentration of SR splicing factors (6-9) and the concentration of hnRNP A1 protein (10). Some of the factors which affect 3' splice site recognition include the degree of branchpoint complementarity to U2 snRNA (11,12) and the affinity of the 3' pyrimidine tract for U2AF65 and polypyrimidine tract binding protein (13,14). The brevity of the intron border and branchpoint sequences, however, suggests that the splicing machinery utilizes other pre-mRNA features in the process of splice site definition.
Mammalian and Drosophila genes often contain exonic and intronic splicing enhancer elements that facilitate recognition of exons that have suboptimal splice sites and/or lengths (15-23). Plant and invertebrate genes contain biased adenosine- and uridine-rich compositions in their introns (compared to exons) (24,25) that facilitate intron recognition. These transitions in AU-richness are required for efficient splicing in plants (25,26) and capable of modulating splice site selection patterns in Nicotiana benthamiana (tobacco) (27-29), Caenorhabditis elegans (30,31) and Drosophila melanogaster (32).
Based on our analysis of 5' and 3' splice site selection schemes in tobacco, we have proposed a model for intron recognition in dicot nuclei suggesting that the 5' and 3' splice sites are selected in a position-dependent manner relative to AU-rich elements spread throughout the intronic sequences (27-29). The experimental support for this model is derived from our demonstration that AU-rich elements upstream from the 3' end of maize Adh1 intron 3 are essential for defining the 3' splice site (27,28) and that AU transitions strongly modulate 5' splice site selection in pea rbcS3A intron 1 (29).
This model has now been extended in several ways. First, using derivatives of soybean [beta]-conglycinin intron 4 containing two identical 5' splice sites and AU-rich intronic or AU-moderate exonic replacement sequences entirely different from those used in our previous studies (29,32), we demonstrate that the relative activities of two equivalent 5' splice sites are strongly modulated by the composition of the sequence between the cis-competing splice sites. Insertion of heterologous AU-moderate exonic sequences between the competing sites allows for selection of the downstream 5' splice site; insertion of AU-rich intronic sequences between the competing sites allows for selection of the upstream 5' splice site. Second, we provide evidence that the intronic AU-rich sequences in these transcripts function by stimulating recognition of the upstream 5' splice site rather than by masking the downstream site. The 5' splice site buried within AU-rich intronic sequence is efficiently used when the functional 5' splice site at the upstream AU transition is deleted or severely mutated. Third, we define a role for AG-rich exonic sequences in plant pre-mRNA splicing by demonstrating that an AG-rich element is capable of promoting downstream 5' splice site recognition. These results indicate that AU-rich intronic sequences, AG-rich exonic sequences and the 5' splice site itself collectively define 5' intron boundaries in dicot nuclei.
The constructs used in this study contain the entire fourth and fifth exons and the intervening intron of the soybean (Glycine max L.) [beta]-conglycinin [alpha]' subunit gene (GenBank accession no. M26128) inserted into the unique BglII site of pMON458 (33) as previously described (34). The substitution constructs were generated using an inverse PCR strategy to replace the sequences between -75 and -3 of upstream [beta]-conglycinin exon 4 with an XhoI restriction site and to mutate the cryptic 5' splice site (
Recombinant pMON458 constructs were introduced into N. benthamiana leaf discs, samples were harvested 3 days after transfection and total RNA samples were prepared as follows. Tissue (10-12 leaf discs) was ground for 1 min using a Mini-Beadbeater (Biospec Products) in 1 ml of an RNA isolation buffer containing 50% phenol-chloroform (1:1), 50 mM LiCl, 50 mM Tris-HCl (pH 8.0), 5 mM EDTA and 0.5% SDS. Samples were centrifuged for 5 min at 12 000 g to separate the aqueous and organic phases, after which each aqueous phase was re-extracted with an equal volume of phenol-chloroform (1:1). The aqueous phases were adjusted to 2 M LiCl by the addition of 8 M LiCl and high molecular weight RNAs were precipitated on wet ice for 8-12 h. RNA was collected by centrifugation at 12 000 g for 15 min and residual contaminating DNA was removed by digestion with 5 U RNase-free DNase (Promega) for 60 min at 37°C. Total RNA from 10-12 leaf discs was resuspended in 100 [mu]l sterile water and 1 [mu]l was used for RT-PCR analysis.
First strand cDNA synthesis and PCR amplifications were done in a single reaction mixture containing 1 [mu]g total RNA, 50 mM KCl, 10 mM Tris-HCl (pH 8.4), 2.5 mM MgCl2, 200 [mu]g/ml gelatin, 200 [mu]M each dNTP, 5 U AMV reverse transcriptase (Promega), 2.5 U Taq DNA polymerase (BRL), 20 U RNasin (Promega) and 50 pmol of the 115 5' full and 115 3' full primers complementary to the 5' and 3' ends of [beta]-conglycinin exons 4 and 5, respectively (Fig. 1). First strand cDNA was synthesized for 30 min at 50°C and subsequently amplified by 15 cycles of PCR. Each PCR cycle consisted of 94°C denaturation for 1 min, 55°C annealing for 2 min and 72°C extension for 2 min. PCR products were fractionated on 1.5% agarose gels containing 1× TBE buffer, transferred to Genescreen (DuPont) and probed with a random-hexamer 32P-labeled [beta]-conglycinin exon 5 DNA probe. Blots were hybridized in 50% formamide, 5× SSC, 25 mM sodium phosphate buffer (pH 6.5), 0.5% SDS (w/v), 5× Denhardt's solution for 16 h at 42°C and membranes were washed twice in 0.2× SSC, 0.1% SDS for 60 min at 68°C. The hybridization signals were quantified with a Phosphorimager (Molecular Dynamics). The absence of contaminating vector DNA, which might generate the same size product as unspliced transcript, was verified by performing parallel RT-PCR reactions lacking AMV reverse transcriptase. Coamplification of spliced and unspliced rbcS3A and other transcripts has indicated that this assay quantitatively amplifies precursor and spliced products over a range of RNA concentrations and precursor/product ratios (32; data not shown). The overall splicing efficiency cited for each transcript was defined as spliced/(precursor plus spliced) transcript. The percentage of transcript spliced at a particular 5' splice site was defined as the (amount of a single spliced transcript)/(sum of all spliced transcripts). Each reported splicing efficiency represents the average of at least four independent transfection experiments with their corresponding standard errors (SE). Splice site selection patterns were defined by cloning PCR products into pBluescript II SK+ (Stratagene) using restriction sites in the PCR primers and sequencing using T7 DNA polymerase (US Biochemicals) and the 115 3' oligonucleotide primer (Fig. 1).
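The two ratios defined above can be made concrete with a short Python sketch. It is only an illustration under stated assumptions: the function names and all signal values are hypothetical and are not data from this study. Only the definitions follow the text: spliced/(precursor plus spliced), (single spliced transcript)/(sum of all spliced transcripts), and a mean with standard error over independent transfections.

```python
from math import sqrt
from statistics import mean, stdev


def splicing_efficiency(precursor, spliced_by_site):
    """Overall efficiency = spliced / (precursor + spliced)."""
    total_spliced = sum(spliced_by_site.values())
    return total_spliced / (precursor + total_spliced)


def site_usage(spliced_by_site):
    """Fraction of all spliced transcript that used each 5' splice site."""
    total_spliced = sum(spliced_by_site.values())
    return {site: signal / total_spliced
            for site, signal in spliced_by_site.items()}


def mean_and_se(values):
    """Mean and standard error (sample SD / sqrt(n)) of replicate values."""
    return mean(values), stdev(values) / sqrt(len(values))


# Hypothetical phosphorimager signals (arbitrary units), NOT data from the paper.
signals = {"distal": 600.0, "proximal": 150.0}
precursor_signal = 250.0

efficiency = splicing_efficiency(precursor_signal, signals)
usage = site_usage(signals)

# Hypothetical efficiencies from four independent transfections.
avg, se = mean_and_se([0.74, 0.76, 0.73, 0.77])
print(f"efficiency={efficiency:.2f}, usage={usage}, mean={avg:.3f}, SE={se:.4f}")
```

Keeping the per-site signals in a dictionary means the same functions apply whether a construct yields one, two or more spliced products.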
The 115 nucleotide fourth intron of the soybean [beta]-conglycinin gene used in these experiments is 72% AU-rich and seven of nine nucleotides at the 5' splice site are consensus nucleotides complementary to sequences conserved in the 5' ends of mammalian, yeast and plant U1 snRNAs. The parent substrate for the cis-competition experiments described below was generated using site-directed mutagenesis to alter the sequence (
For analysis of pre-mRNA splicing in plant nuclei, this intron and its flanking exons were expressed in Nicotiana benthamiana leaf disc cells using the autonomously replicating tomato golden mosaic virus (TGMV) vector expression system described in McCullough et al. (34). The splice site selection patterns for each construct were analyzed by reverse transcriptase-polymerase chain reaction (RT-PCR) gel blot analysis and verified by sequencing the cloned PCR product(s) corresponding to each spliced transcript.
In a competition between two identical 5' splice sites, our model predicts that a site at the transition between AU-moderate exonic and AU-rich intronic sequences should be selected preferentially over a site buried within AU-rich sequences. To test this prediction, intronic sequences derived from intron 5 of the [beta]-conglycinin gene were inserted between the equivalent strength -81 (distal) and +1 (proximal) 5' splice sites in either the sense (int5S) or antisense (int5AS) orientation and expressed in tobacco nuclei. RT-PCR analysis (Fig. 2) indicated that the distal 5' splice site, located upstream of the AU-rich intronic sequences in both of these transcripts, was used almost exclusively. In contrast, the proximal site at +1, which is buried in AU-rich sequences, was not used at any detectable level.
Our model predicts that the proximal 5' splice site, which is inactive when buried in AU-rich sequences, will be active when AU-moderate exonic sequences are placed upstream from it. To test this prediction, exonic sequences derived from exon 6 of the [beta]-conglycinin gene were inserted between the two competing 5' splice sites in the sense (ex6S) and antisense (ex6AS) orientations. RT-PCR analysis indicated that the proximal site was efficiently activated in both the ex6S and ex6AS transcripts, albeit to a higher degree in ex6S (Fig. 2). The overall splicing efficiency of the ex6AS replacement construct (56%, SE = 2.4%) is significantly lower than that of the ex6S replacement (75%, SE = 1.1%) or either of the intron replacements (82%, SE = 1.9% for int5S; 73%, SE = 2.1% for int5AS).
In the cis-competition experiments described above, the 5' splice site positioned at the AU transition between AU-moderate exonic and AU-rich intronic sequences is favored over an identical 5' splice site buried in exonic or intronic sequences (Fig. 2). Two possible mechanistic interpretations exist for these results which cannot be differentiated on the basis of previous pre-mRNA splicing studies conducted by this or any other laboratory. The first postulates that a 5' splice site buried within exonic or intronic sequences is not used because it is masked from recognition. The second postulates that recognition of a 5' splice site at the transition between AU-moderate and AU-rich sequences is actively promoted by recognition of the sequences on both sides of the AU transition point.
To differentiate between these two mechanistic models, we deleted the distal 5' splice site from the int5S construct as shown in Figure 3. If the proximal site is masked by the presence of upstream AU-rich sequences, the proximal site should remain inactive in this deletion mutant ([Delta]-81/int5S/+1wt). [Mutants are designated as (distal site/replacement sequence/proximal site).] RT-PCR analysis of RNA isolated from transfected leaf discs indicated, however, that the proximal site is efficiently selected when the distal site is deleted (Fig. 4, lane 3). To determine whether the proximal site is also active when the distal site is inactivated rather than deleted, single +1A (-81.+1A/int5S/+1wt) or double -2T,+5A (-81.-2T,+5A/int5S/+1wt) mutations were introduced at the distal site. The proximal site is used exclusively in both of these mutants (Fig. 4, lanes 4 and 7), indicating that the 69 nucleotide AU-rich sequence positioned upstream of the proximal site is not capable of blocking its usage in the absence of a functional upstream site.
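The position labels used for the distal site mutants (+1A; -2T,+5A) can be read as substitutions within the nine-nucleotide 5' splice site window, which spans positions -3 to -1 of the exon and +1 to +6 of the intron (there is no position 0). The Python sketch below applies such labels to the generic U1-complementary consensus CAG/GTAAGT. This consensus is used purely as a stand-in, because the actual sequence of the -81 site is not reproduced in this section, and the function names are hypothetical.

```python
# Generic 5' splice site consensus in DNA notation: CAG | GTAAGT.
# Stand-in only; the real -81 site matches this at seven of nine positions.
CONSENSUS = "CAGGTAAGT"


def position_to_index(pos):
    """Map a splice site position (-3..-1, +1..+6) to a 0-based string index."""
    if pos == 0 or not -3 <= pos <= 6:
        raise ValueError(f"position {pos} is outside the -3..+6 window")
    return pos + 3 if pos < 0 else pos + 2


def mutate(site, *changes):
    """Apply substitutions written like '+1A' or '-2T' to a 9-nt site."""
    bases = list(site)
    for change in changes:
        pos, new_base = int(change[:-1]), change[-1]
        bases[position_to_index(pos)] = new_base
    return "".join(bases)


def consensus_matches(site):
    """Count positions at which the site agrees with the consensus."""
    return sum(a == b for a, b in zip(site, CONSENSUS))


single = mutate(CONSENSUS, "+1A")          # G -> A at the first intron base
double = mutate(CONSENSUS, "-2T", "+5A")   # A -> T at -2 and G -> A at +5
print(single, consensus_matches(single))
print(double, consensus_matches(double))
```

In this numbering the +1A change alters the invariant G at the start of the intron, which is consistent with the text treating it as an inactivating mutation, whereas -2T alone is described later as only weakening the site.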
The experiments described above and in McCullough et al. (29) have demonstrated that 5' splice sites located at the AU transition are favored over other sites in the upstream exon or downstream intron. To determine if AU-moderate exonic sequences contribute to recognition of the downstream 5' splice site, truncated transcripts containing a single 5' splice site preceded by either AU-moderate exonic or AU-rich intronic sequences were expressed in tobacco leaf disc nuclei. All of these transcripts begin 12 nucleotides (nt) downstream from the distal 5' splice site and have 69 nucleotides of int5S or ex6S sequence (Fig. 1) preceding the proximal splice site. RT-PCR analysis of these truncated transcripts (Fig. 5) indicated that transcripts containing AU-moderate exonic sequence preceding the +1wt site (lane 1) are spliced substantially better (70% splicing efficiency, SE = 3.3%) than those containing a similar length of AU-rich sequence (51% splicing efficiency, SE = 2.0%) (lane 2). These results indicate that, although AU-moderate exonic sequences are not essential for recognition of the downstream +1 splice site, they significantly improve its splicing activity.
To more clearly define the importance of exonic sequences in 5' splice site recognition, AG-rich and AU-rich sequence elements were inserted in the truncated int5S
transcript upstream from the +1 proximal site. The particular AG-rich element chosen for this analysis was derived from a previous study
showing that mutation of a 27 nt AU-rich block (AAAG
To further test this hypothesis, we inserted the short purine-rich motif (GGAGAGGCAG) corresponding to the left subsection of the longer 27 nt element into int5S AU-rich sequences separating the distal and proximal 5' splice sites (int5S+AG; Fig. 3). Because this insertion increases the distance between the distal and proximal sites, we also generated a control construct with an AU-rich insertion (int5S+AU; Fig. 3). For cloning purposes, both insertions are preceded by an SphI restriction site. As a result of these insertions, the AU compositions of the sequences between the distal and proximal sites are 68% for int5S, 61% for int5S+AG and 69% for int5S+AU. The RT-PCR analysis of these constructs (Fig. 7) indicated that the inserted AG-rich element, but not the AU-rich element, activated the proximal 5' splice site in the int5S transcript to some extent in the presence of the wildtype (-81wt) distal site and to a greater extent in the presence of a weakened (-81.-2T) distal site. Usage of the proximal site occurred in 13% of the spliced transcripts containing equivalent 5' splice sites (Fig. 7, lane 3) and in 55% of the spliced transcripts containing a weakened (-2T) distal site (Fig. 7, lane 6). Note that the weakened -2T distal site is used exclusively in the int5S (lane 4) and int5S+AU (lane 5) constructs. This series of constructs demonstrates that this AG-rich element can positively activate selection of a proximal 5' splice site buried within AU-rich intronic sequences. The variable degrees to which the proximal site was activated indicate that 5' splice site selection in these transcripts is modulated by a balance between the relative strengths of the exonic AG-rich element, the competing 5' splice sites and the intronic AU-rich sequences.
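As a rough check on the compositions quoted above, the following Python sketch computes AU and purine (AG) fractions and estimates the AU content of a region after an insertion. It rests on two assumptions that are labeled in the comments: that the region between the sites is 69 nt long (the length given for the AU-rich sequence upstream of the proximal site), and that the int5S+AG insertion consists only of an SphI site (GCATGC) followed by the GGAGAGGCAG motif. The exact inserted sequence is not given in this section, so the result is an estimate and not the authors' calculation.

```python
def au_fraction(seq):
    """Fraction of A plus U residues (T is counted as U for DNA input)."""
    seq = seq.upper()
    return sum(base in "AUT" for base in seq) / len(seq)


def purine_fraction(seq):
    """Fraction of A plus G residues."""
    seq = seq.upper()
    return sum(base in "AG" for base in seq) / len(seq)


def au_after_insertion(region_len, region_au, insert):
    """Estimated AU fraction of a region once `insert` has been added to it."""
    au_count = region_len * region_au + au_fraction(insert) * len(insert)
    return au_count / (region_len + len(insert))


MOTIF = "GGAGAGGCAG"   # purine-rich motif quoted in the text
SPHI_SITE = "GCATGC"   # SphI recognition sequence

# Assumptions: 69-nt region at 68% AU; insertion = SphI site + motif only.
estimate = au_after_insertion(69, 0.68, SPHI_SITE + MOTIF)

print(f"motif AU = {au_fraction(MOTIF):.0%}, AG = {purine_fraction(MOTIF):.0%}")
print(f"estimated AU after insertion = {estimate:.1%}")
```

Under these assumptions the estimate comes out at about 61%, in line with the value reported for int5S+AG. The motif itself is 90% purine but only 30% AU, which is why it lowers the AU content of the region it is placed in.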
To further test the ability of this purine-rich element to act as a positive splicing regulator, the GGAGAGGCAG motif was introduced into the truncated int5S construct (Fig. 5). Consistent with its activity in the full length constructs described above, it increases recognition of the +1 site in the truncated int5S transcript. Truncated int5S transcripts containing the AG-rich element are spliced at an efficiency (67% splicing efficiency, SE = 2.6%) not significantly different from truncated ex6S transcripts containing wildtype exonic sequences (70% splicing efficiency, SE = 3.3%) (Fig. 5).
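The text does not state which statistical test underlies the word "significantly". One simple, approximate way to compare two means reported with standard errors is to divide their difference by the combined standard error; ratios well above about 2 are conventionally taken to suggest a real difference. The Python sketch below applies this heuristic, which is an assumption of this illustration and not the authors' stated method, to efficiencies reported for the truncated transcripts.

```python
from math import sqrt


def se_ratio(mean_a, se_a, mean_b, se_b):
    """|difference of means| divided by the combined standard error."""
    return abs(mean_a - mean_b) / sqrt(se_a ** 2 + se_b ** 2)


# Values (in percent) quoted in the text for the truncated transcripts.
ex6s = (70.0, 3.3)      # AU-moderate exonic sequence upstream of +1
int5s = (51.0, 2.0)     # AU-rich intronic sequence upstream of +1
int5s_ag = (67.0, 2.6)  # int5S carrying the GGAGAGGCAG motif

exon_vs_intron = se_ratio(*ex6s, *int5s)
exon_vs_ag = se_ratio(*ex6s, *int5s_ag)

print(f"ex6S vs int5S:    {exon_vs_intron:.2f}")
print(f"ex6S vs int5S+AG: {exon_vs_ag:.2f}")
```

With these inputs the first ratio is close to 5 and the second is below 1, which agrees with the description of the truncated int5S transcript as spliced substantially worse than ex6S, and of the motif-containing transcript as not significantly different from it.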
Pre-mRNA introns found in a variety of organisms including Tetrahymena, Drosophila, C. elegans and plants are significantly richer in adenosine and uridine than their flanking exons (24). This intronic AU-richness has been shown to be important for efficient splicing in dicot plant nuclei (25,26) and capable of compensating for weak splice sites in monocot plant nuclei (26). We originally hypothesized that the plant splicing machinery used these variations in exonic and intronic base composition to delineate intron boundaries and to distinguish authentic from cryptic splice sites. This premise was supported by experiments in which AU-rich intronic or AU-moderate exonic sequences were inserted between competing but nonequivalent 5' splice sites upstream of pea rbcS3A intron 1 (29). In these experiments, weak 5' splice sites upstream of AU-rich intronic sequences were selected in favor of perfect 5' splice site consensus sequences buried within AU-rich sequences. Cis-competition experiments between 3' splice sites in maize Adh1 intron 3 indicated that selection of these sites was also significantly modulated by altering the nucleotide composition between competing 3' splice sites (27,28). These results led us to propose a model for intron recognition in dicot nuclei suggesting that intron boundaries are initially established by recognition of the transition points between AU-moderate (exon) and AU-rich (intron) sequences and subsequently defined by the snRNAs and other splicing factors.
We have now tested this model for intron recognition using an independent set of transcripts which allow us to separate the effects of sequence composition and position from splice site strength. In agreement with our previous replacement substitutions between nonequivalent 5' splice sites upstream from the longer (469 nt) rbcS3A intron 1 (29), placement of AU-rich intronic sequences (in their sense orientation) between two identical 5' splice sites allows for selection of the distal site located upstream of the AU-rich intronic sequences; placement of AU-moderate exonic sequences (in their sense orientation) allows for strong activation of the downstream proximal site. Exon antisense replacement sequences activate the proximal site as efficiently (80% of spliced product) as in a similar rbcS3A1 ex3 construct containing a perfect 5' splice consensus sequence at the position of the distal site and a weaker sequence at the position of the proximal site. The consistency of these results and the strong activation of the proximal site regardless of the strength of the upstream 5' splice site clearly indicate that sequence transition points play a dominant role in defining the 5' intron boundary. Minimal consensus requirements determine the functionality of a 5' splice site at this boundary but, once these minimal requirements are met, 5' splice site selection becomes dependent on its proximity to this boundary rather than its agreement with the 5' splice site consensus.
We have, for the first time, differentiated between two models which might
explain the 5' splice site selection patterns reported here and in a number of other
plant pre-mRNA splicing studies. One model supposes that 5' splice sites buried within AU-rich sequences are not used for splicing because they are
sterically masked and cannot be bound by splicing factors. The second model
supposes that sites at the exon/intron boundary are preferentially activated
for splicing by recognition of sequences on both sides of the AU transition
point. Collectively, our results support the second transition-enhancement model. First, the proximal AU-buried 5' splice site is efficiently used when the distal site at the
exon/intron transition is deleted or severely mutated. The contrasting steric-masking model would predict that the proximal AU-buried site should remain unspliced in the absence of a competing
site. Second, the weakened -81.-2T and -81.+5A distal 5' splice sites at the AU transition outcompete
wildtype proximal 5' splice sites buried in AU-rich sequences. These data argue strongly for a 'transition-enhancement' model in which factors capable of recognizing
sequence composition promote splicing at the weaker 5' splice site at the AU transition point by the recruitment of splicing
factors or hnRNP proteins to the transition site or by preferential promotion
of interactions between the transition 5' splice site and the 3' splice site.
Our data, especially those obtained with the truncated and AG-rich insertion constructs, indicate that the exact sequence of the upstream 'exonic' sequence is crucial for its recognition. Individual sense and antisense exon replacements have slightly different abilities to activate the proximal 5' splice site: the ex6S exon sense replacement promotes exclusive usage of the proximal site, whereas the ex6AS antisense replacement promotes usage of the proximal site in 80% of the transcripts. Likewise, the rbcS3A1 ex3 sense replacement between non-equivalent 5' splice sites promotes usage of the proximal +1wt site in 80% of the spliced transcripts, whereas natural rbcS3A exon 1 sequences in the same construct promote usage of the proximal site in 40% of the transcripts and the enhanced distal site in 60% of the transcripts (29). In the truncated series of constructs presented here, AU-rich truncated first exons are recognized but not as efficiently (51% splicing efficiency, SE = 2%) as normal AU-moderate plant exons (70% splicing efficiency, SE = 3.3%). We have concluded that, although normal exonic sequences are not essential for splicing, sequence components within them positively enhance their recognition as exonic sequences.
The splicing of transcripts containing AG-rich insertions in the int5S replacement sequence has indicated that the GGAGAGGCAG motif represents one distinct sequence element that is capable of modulating exon recognition in plant nuclei. Placement of this particular AG-rich element downstream from a variety of distal 5' splice sites indicates that it actively promotes recognition of the proximal site to varying degrees depending on the strength of the distal site and the context in which it is placed. Even in the presence of a functional distal site, the element is capable of promoting recognition of AU-rich intron sequences as exonic sequences. [Further structural analysis on this sequence (35) indicates that the effect mediated by this short sequence element is not simply an effect of decreasing AU-richness, since sequences reducing AU-richness without generating AG-richness fail to have similar effects (ATb2 in Fig. 6).] In its ability to effect 5' exon recognition, this element functionally resembles the purine-rich splicing enhancers found in vertebrate 3' exons. By analogy with these vertebrate elements which enhance 3' splice site recognition as a result of their binding SR proteins (36-38), purine-rich elements such as the one identified here may be proposed to play a general role in plant exon recognition, possibly by interacting with some of the recently identified plant SR proteins (39,40).
It has become increasingly evident that RNA processing signals are distributed throughout the exons and introns of many genes in most, if not all, classes of organisms (15-23). The functionality of these exonic and intronic elements in a diverse array of constitutive and alternatively spliced transcripts suggests that they serve to distinguish specific splice sites in transcripts which contain multiple potential cryptic sites, possibly by 'tagging' intron or exon sequences for the subsequent assembly of splicing factors. Our current evidence suggests that, in plants, introns are tagged by AU-rich elements and exons are tagged by AG-rich elements. We speculate that splicing occurs at sites properly positioned between recognition complexes associated with these elements and is modulated by the concentration of nuclear factors binding to them.
The authors acknowledge Mr Cesar Egoavil and Dr Hua Lou for valuable discussions
throughout this project. This work was supported by National Institutes of
Health grant R01 GM39025 (MAS) and US Department of Agriculture Competitive
Research grant AG92-37301-7964 (AJM).
*To whom correspondence should be addressed. Tel: +1 217 333 8784; Fax: +1 217
244 1336; Email: maryschu@uiuc.edu



REFERENCES


