
© 1995 Oxford University Press 1531-1540

Experimentally determined weight matrix definitions of the initiator and TBP binding site elements of promoters

Richard J. Kraus, Elizabeth E. Murray +, Steven R. Wiley w, Nancy M. Zink, Karla Loritz, Gregory W. Gelembiuk and Janet E. Mertz*

McArdle Laboratory for Cancer Research, University of Wisconsin Medical School, Madison, WI 53706-1599, USA

Received December 4, 1995; Revised and Accepted February 28, 1996

ABSTRACT

The basal elements of class II promoters are: (i) a -30 region, recognized by TATA binding protein (TBP); (ii) an initiator (Inr) surrounding the start site for transcription; (iii) frequently a downstream (+10 to +35) element. To determine the sequences that specify an Inr, we performed a saturation mutagenesis of the Inr of the SV40 major late promoter (SV40-MLP). The transcriptional activity of each mutant was determined both in vivo and in vitro. An excellent correlation between transcriptional activity and closeness of fit to the optimal Inr sequence, 5'-CAG/TT-3', was found both in vivo and in vitro. Employing a neural network technique, we generated from these data a weight matrix definition of an Inr that can be used to predict the activity of a given sequence as an Inr. Using saturation mutagenesis data of TBP binding sites, we likewise generated a weight matrix definition of the -30 region element. We conclude the following: (i) Inrs are defined by the nucleotides immediately surrounding the transcriptional start site; (ii) most, if not all, Inrs are recognized by the same general transcription factor(s). We propose that the mechanism of transcription initiation is fundamentally conserved, with the formation of pre-initiation complexes involving the concurrent binding of general transcription factors to the -30, Inr and, possibly, downstream elements of class II promoters.

INTRODUCTION

Considerable progress has been made toward elucidating the mechanism of transcription initiation by RNA polymerase II (1-5). For class II genes of higher eukaryotes containing an A+T-rich sequence (TATA box) ~30 bp upstream of the initiation site, formation of the pre-initiation complex is believed to occur in a stepwise fashion. The first step is recognition of this element by TATA binding protein (TBP), a DNA binding component of TFIID. This factor then recruits the other general transcription factors to form the pre-initiation complex (PIC). Lastly, the addition of ATP promotes an open structure in the promoter region of the DNA and synthesis of RNA begins (6-8).

However, many class II genes lack an obvious TATA box 30 bp upstream of their transcription start site. It has been proposed that the formation of pre-initiation complexes on these TATA-less promoters may occur via alternative pathways involving other sequence elements and factors that functionally replace the TATA box and TBP to accurately select the initiation site ( 9 - 12 ).

One candidate for an alternative positioning element is the sequence immediately surrounding the transcription initiation site, referred to as the initiator (Inr). Numerous workers have shown that this element is genetically important for efficient transcription from a variety of promoters (for example 13 - 18 ). Several groups have identified proteins that bind Inrs ( 19 - 23 ). However, Means et al . ( 24 ) and Javahery et al . ( 25 ) have reported that the binding of HIP1/E2F-1 and YY1 to the Inrs of the dihydrofolate reductase ( dhfr ) and adeno-associated virus (AAV) P5 promoters respectively does not correlate with the transcriptional activities of these promoters. On the other hand, Usheva and Schenk ( 26 ) found that YY1, TFIIB and polymerase II were sufficient for basal transcription from the AAV P5 promoter. Thus it remains controversial whether any Inr binding proteins are truly novel factors that direct the formation of initiation complexes or simply regulatory factors that act by binding to sites situated at or near Inrs.

We have been investigating the mechanism of transcription initiation from the naturally occurring, TATA-less major late promoter of simian virus 40 (SV40-MLP). This promoter contains three genetically important proximal sequence elements located at approximately -30, +1 and +30 relative to the start site of transcription (27 and citations therein). The -30 region element functions as a binding site for TBP despite lack of a consensus TATA box sequence and it cooperatively interacts with the Inr to determine the transcriptional start site (28). The +30 region element binds a cellular factor, DAP (29), that may play a role in anti-termination of transcription (T.E.Eisenbraun, F.Zuo, R.J.Kraus and J.E.Mertz, manuscript in preparation). The +1 region element binds several members of the steroid/thyroid hormone receptor superfamily (e.g. hERR1 and TRα1/RXRα; 30, 31); however, these factors act as sequence-specific repressors, not activators, of late transcription when viral template copy number is low.

To precisely determine the nucleotides that define an Inr we performed a saturation mutagenesis of the bases surrounding the +1 site of the SV40-MLP. We found that an excellent correlation exists between transcriptional activity and closeness of fit to the optimal Inr sequence, 5'-CAG/TT-3'. Using these data we derived a weight matrix that can be used to predict the activity of a sequence as an Inr. We conclude that transcription initiation at the SV40-MLP probably occurs via a mechanism similar, if not identical, to that used by TATA box-containing promoters, rather than by a mechanism involving a novel initiator factor. We propose that transcription initiation is fundamentally conserved among class II promoters, with the functions of the Inr being to bind TFIID/TFIIB/pol II/TFIIF and to serve as a site at which RNA polymerase II can initiate transcription.

MATERIALS AND METHODS

Plasmid DNAs

Plasmid DNAs were constructed by standard recombinant DNA techniques ( 32 ). Plasmid pSV1773(WT) contains the pseudo-wild-type SV40 used in all in vivo experiments except where indicated otherwise. It is a variant of pSVS ( 33 ) lacking SV40 nucleotides (nt) 1629-1635 inclusive ( 34 ).

Plasmid pSV1790 is a derivative of pSV1773 in which SV40 nt 319-336 inclusive have been replaced with the sequence 5'-CTGGGCAGGTCTCGAGACCTGCCCAG-3' (28), containing two BspMI sites. Cleavage with BspMI yields a vector containing non-complementary single-stranded ends that can be used for cassette mutagenesis of SV40 nt 319-336. Plasmids pSV4501(-6/-4TCT), pSV4502(-3/-1TAA), pSV4503(+1/+3GGG), pSV4504(+4/+6GAC), pSV4505(+7/+9CCT), pSV4506(+10/+12GCG) and pSV4528(-1C,+2G) were generated by insertion of appropriate synthetic oligonucleotides into BspMI-cut pSV1790 DNA. Plasmids pSV4507(-3A), pSV4516(+1G), pSV4517(+1C), pSV4518(+1T), pSV4519(+2A), pSV4520(+2G), pSV4521(+2C), pSV4523(+3G), pSV4524(+3C), pSV4526(+4G) and pSV4527(+4C), each containing the indicated single base pair change, were generated likewise.

Plasmids pSV4508(-3C), pSV4509(-3T), pSV4510(-2A), pSV4511(-2G), pSV4512(-2C), pSV4513(-1A), pSV4514(-1G), pSV4515(-1C), pSV4522(+3A), pSV4525(+4A), pSV4529 (-3C, +3G), pSV4530(+2C,+4A,+5A), pSV4531 (-4A,-3C,+3A,+4A) and pSV4538 (-4A,-3C,+3A) were generated by a variation of the mutagenesis procedure of Chen and Struhl ( 35 ), starting with a pair of 55 bp oligonucleotides synthesized to contain 2% random bases in each of the nucleotides corresponding to SV40 nt -4 to +4 relative to the major late initiation site. These oligonucleotides were annealed via their complementary 5'- and 3'-ends, end filled with Klenow polymerase, cleaved with Kpn I and ligated into Kpn I (SV40 nt 294)- and Nae I (SV40 nt 345)-cut pSV1773 DNA.

Plasmid pXS13 is a derivative of pSVS lacking the 72 bp repeat region (i.e. SV40 nt 115-272 inclusive) of the SV40 promoter (33). Plasmids pSV4532, pSV4533, pSV4536 and pSV4537 were constructed by substitution of the smaller KpnI-EcoRV (SV40 nt 294-770) fragment of plasmids pSV4503, pSV4516, pSV4518 and pSV4528 respectively for the corresponding fragment in pXS13.

Cell-free transcription assays

Transcription reactions (50 µl) were performed at 26°C with 0.5 µg circular plasmid DNA as template and 16 µl HeLa cell nuclear extract (18-20 mg protein/ml). At these protein:DNA ratios repression of the SV40-MLP by initiator binding proteins (IBPs) does not occur (30). The quantities of the 5'-ends of the SV40 late and early RNAs were analyzed by primer extension as previously described (34). Synthetic oligonucleotides corresponding to SV40 nt 394-369 and nt 5136-5160 served as primers for detecting the late and early RNAs respectively. The relative amount of SV40 late RNA synthesized from the MLP of each mutant was determined by normalization to both: (i) the amount of SV40 early RNA synthesized from the same plasmid in the same reaction; (ii) the amount of SV40 late RNA synthesized in a parallel reaction from the MLP of WT SV40.
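The two-step normalization described above can be sketched as a ratio of ratios. This is a minimal illustration only: the text does not give the exact arithmetic, so dividing the mutant late/early ratio by the WT late/early ratio is an assumption, and the function name and band intensities are hypothetical.

```python
def relative_late_activity(late_mut, early_mut, late_wt, early_wt):
    """Relative late-RNA level of a mutant.

    Assumed arithmetic (not stated in the text): the late signal is first
    divided by the early signal from the same reaction, and that ratio is
    then divided by the same ratio measured for WT in a parallel reaction.
    """
    if early_mut <= 0 or early_wt <= 0 or late_wt <= 0:
        raise ValueError("signals used as denominators must be positive")
    mutant_ratio = late_mut / early_mut  # internal control: early RNA
    wt_ratio = late_wt / early_wt        # external reference: WT template
    return mutant_ratio / wt_ratio


# Hypothetical band intensities in arbitrary units, for illustration only.
activity = relative_late_activity(
    late_mut=350.0, early_mut=100.0, late_wt=100.0, early_wt=200.0
)
print(activity)
```

Normalizing to the early RNA first should cancel reaction-to-reaction differences in template recovery, so the second division compares promoters rather than reactions.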

In vivo transcription assays

All assays were performed utilizing CV-1PD cells grown in Dulbecco's modified Eagle's medium supplemented with 5% fetal bovine serum. Viral recombinant DNA (3.5 µg/10 cm dish) was excised from the vector sequences and ligated to form monomer circles prior to transfection. The cells were transfected and whole cell RNA was isolated 42 h post-transfection as described previously (34). By these post-transfection times expression of the SV40-MLP is unaffected by IBPs (30, 31). The relative amount of steady-state SV40 late RNA synthesized from each mutant was determined by quantitative primer extension analysis as described above, with normalization to both: (i) the amount of SV40 late RNA present in cells transfected in parallel with WT SV40 DNA; (ii) the relative amount of replicated viral DNA present in these cells (determined by quantitative Southern blot analysis; 36).

Generation of a weight matrix definition of an Inr

We used a neural network learning algorithm as described by Rumelhart et al. (37, 38), along with our experimentally determined data, to generate a weight matrix definition of an Inr. In brief, a neural network with 4 × 8 input units and 1 output unit was created to represent the possible nucleotides located at positions -4 to +4 relative to the major late transcription start site. Each nucleotide position was coded as four bits, with each bit representing the presence [1] or absence [0] of the nucleotide A, C, G or T at that position (39, 40). The network was trained on our in vivo data of the 21 single base pair Inr mutants (Fig. 3B) and mutant SV4538(-4A,-3C,+3A) (relative activity 0.3). In mutant SV4525(+4A) two nearby start sites of similar intensity were used (+1 and +4); the output of each of these Inrs was doubled to compensate for apparent competition between them. Where transcription from a mutant promoter was observed to initiate predominantly at a position other than +1 [i.e. mutant SV4513(-1A), which initiates from -1; relative activity 1.0] the 8 bp input sequence was shifted such that the experimentally determined start site remained aligned. The output of the neural network was offset by 0.010 and normalized to a scale ranging from 0.010 to 0.869. In addition, 24 random nucleotide sequences, pre-screened to eliminate any sequences resembling a functional initiator, were included in the training set as examples of nulls. Back propagation was performed without hidden units using the function $y = 1/[1 + e^{-2(x + b)}]$, a learning parameter of 0.2 and a momentum parameter of 0.8; $b$ signifies a 'bias' term representing the weight for an additional input unit which was always set to unity.
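The encoding and training scheme described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the per-example (online) update order, the squared-error objective, the weight initialization, the epoch count and the three training examples are all hypothetical. Only the 4-bit coding, the output function, the learning parameter (0.2) and the momentum parameter (0.8) come from the text.

```python
import math
import random

BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as four bits per position (order A, C, G, T)."""
    bits = []
    for base in seq.upper():
        if base not in BASES:
            raise ValueError(f"unexpected base: {base!r}")
        bits.extend(1.0 if base == b else 0.0 for b in BASES)
    return bits


def output(weights, bias, bits):
    """Output unit: y = 1 / (1 + exp(-2 (x + b))), x = weighted input sum."""
    x = sum(w * v for w, v in zip(weights, bits))
    return 1.0 / (1.0 + math.exp(-2.0 * (x + bias)))


def train(examples, epochs=2000, rate=0.2, momentum=0.8, seed=0):
    """Fit a network without hidden units by gradient descent with momentum.

    Assumptions: squared-error objective, updates after every example.
    """
    rng = random.Random(seed)
    n = len(one_hot(examples[0][0]))
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n)]
    bias = 0.0
    dw = [0.0] * n  # previous weight changes, reused by the momentum term
    db = 0.0
    for _ in range(epochs):
        for seq, target in examples:
            bits = one_hot(seq)
            y = output(weights, bias, bits)
            # error times dy/dx, where dy/dx = 2 y (1 - y) for this function
            delta = (target - y) * 2.0 * y * (1.0 - y)
            for i, v in enumerate(bits):
                dw[i] = rate * delta * v + momentum * dw[i]
                weights[i] += dw[i]
            db = rate * delta + momentum * db  # bias input is always 1
            bias += db
    return weights, bias


# Hypothetical toy training set: (8 bp sequence for -4 to +4, scaled activity).
examples = [("TGGCATTG", 0.87), ("TGGTATTG", 0.15), ("ATCGTACA", 0.01)]
weights, bias = train(examples)
for seq, target in examples:
    print(seq, target, round(output(weights, bias, one_hot(seq)), 3))
```

Because there are no hidden units, the 32 trained weights can be read directly as a 4 × 8 weight matrix, which is what makes the result interpretable position by position.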

Generation of a weight matrix definition of a TATA box

A similar analysis was performed for the TATA box. In this case we used a 4 × 7 input unit matrix (coded as above). Training was performed using the data sets of Wobbe and Struhl (41) and Mukumoto et al. (42) for cell-free transcription in HeLa cell nuclear extracts of mutants of the Saccharomyces cerevisiae his3 and Arabidopsis TC7 promoters respectively. To create a single training set these two sets of data were first aligned by eye and the alignments confirmed by the method of Bucher (43); their relative outputs were calibrated by first training on each dataset individually. Output was offset by 0.010 and normalized to a scale ranging from 0.10 to 0.70. In addition, 39 random nucleotide sequences, pre-screened to eliminate any sequences resembling a functional TATA box, were included as examples of nulls. Back propagation was performed using a hidden layer containing one neuron. A learning parameter of 0.2 and a momentum parameter of 0.8 were used for both the hidden and output layers. The hidden layer employed the function $y = (e^{x/1.5} - e^{-x/1.5})/(e^{x/1.5} + e^{-x/1.5})$; the output layer used the function $y = 1/[1 + e^{-2(mx + b)}]$. The single hidden unit was used to appropriately adjust the gain and threshold parameters of the output function.

RESULTS

Transcriptional activity of cluster point mutants in the SV40-MLP Inr

Previously Ayer and Dynan ( 27 ) showed that a linker-scanning substitution mutant spanning nt -5 to +7 relative to the transcriptional initiation site of the SV40-MLP was defective in the synthesis of RNA from this promoter. To identify more precisely the bases that define the Inr of this promoter we constructed a set of cluster point substitution mutants spanning this region of the SV40-MLP (Fig. 1 B). Each mutant was assayed for transcriptional activity both in a cell-free transcription system (Fig. 1 C) and in vivo (Fig. 1 E). These data, summarized in Figure 1 B, indicated that only mutations in the bases at or immediately surrounding the start site (i.e. nt -3 to +3) significantly affect transcription from nt +1. Taken together with the finding of Ayer and Dynan ( 27 ) that mutants spanning nt +10 to +19 and -25 to -3 are not defective in transcription from the SV40-MLP, we conclude that only the bases immediately surrounding the initiation site are critical for defining initiator function in the context of this promoter.


Figure 1. Transcriptional analysis of cluster point mutants in the SV40-MLP Inr. (A) Schematic diagram of the promoter region of the SV40 genome. The numbers below the line indicate the nucleotide residues in the SV40 numbering system (80). The arrow labeled +1 at nt 325 indicates the direction and location of transcription initiation from the MLP. The rectangles within the 21 bp repeats depict Sp1 binding sites. (B) Sequences of the Inr regions of the cluster point mutants and a summary of the data obtained with them. The columns on the right are means ± SEM of three assays of the in vitro and in vivo data obtained from experiments similar to the ones shown in (C) and (E) respectively. (C) Autoradiogram of primer extension analyses showing quantities and locations of the 5'-ends of the SV40 late RNAs synthesized from the indicated mutant SV40 genomes in a cell-free transcription system. Lane 1, no plasmid DNA; lane 2, pSV1773(WT); lane 3, pSV4501(-6/-4TCT); lane 4, pSV4502(-3/-1TAA); lane 5, pSV4503(+1/+3GGG); lane 6, pSV4504(+4/+6GAC); lane 7, pSV4505(+7/+9CCT); lane 8, pSV4506(+10/+12GCG). (D) Autoradiogram of primer extension analyses showing quantities and locations of the 5'-ends of the SV40 early RNAs from the same experiment shown in (C). (E) Autoradiogram of primer extension analyses showing the quantities and locations of the 5'-ends of the SV40 late RNAs present in CV-1PD cells 42 h after transfection with the indicated mutants of SV40. Lane 1, no DNA; lane 2, SV1773(WT); lane 3, SV4501; lane 4, SV4502; lane 5, SV4503; lane 6, SV4504; lane 7, SV4505; lane 8, SV4506.

In vitro transcription of single base pair substitution mutants in the SV40-MLP Inr

To further define the Inr motif of the SV40-MLP we performed a saturation mutagenesis of nt -3 to +4. In the cell-free transcription system mutations at either nt -3 or -2 had little effect on either the efficiency or location of the major start site of transcription (data not shown and Fig. 2 A, lanes 2-5; summarized in Fig. 2 B). On the other hand, the nucleotide in the -1 position had major effects on transcription: a T -> A change resulted in a dramatic reduction in +1 initiated transcription, as well as an increase in the frequency of initiation from other nearby sites (Fig. 2 A, lane 6 versus 2); substitution of a G led to a significant reduction in transcription initiation (Fig. 2 A, lane 7); placement of a C in this position led to a 7-fold increase in +1 initiated transcription (Fig. 2 A, lane 8 versus 2). The sequence of the initiating nucleotide (+1) was equally critical: alteration of the A to any other base resulted in significant reduction in transcription initiation from this site.


Figure 2. In vitro transcriptional analyses of single base pair mutants in the SV40-MLP Inr. (A) Autoradiogram of primer extension analyses showing the relative amounts of +1 initiated RNA synthesized from the indicated point mutants in a cell-free transcription system. (The microheterogeneity surrounding the start site is not reproducible and is likely the result of primer extension artifacts.) Lane 1, no DNA; lane 2, pSV1773(WT); lane 3, pSV4510(-2A); lane 4, pSV4511(-2G); lane 5, pSV4512(-2C); lane 6, pSV4513(-1A); lane 7, pSV4514(-1G); lane 8, pSV4515(-1C); lane 9, pSV4516(+1G); lane 10, pSV4517(+1C); lane 11, pSV4518(+1T); lane 12, pSV4519(+2A); lane 13, pSV4520(+2G); lane 14, pSV4521(+2C); lane 15, pSV4522(+3A); lane 16, pSV4523(+3G); lane 17, pSV4524(+3C); lane 18, MspI-cut pBR322 DNA. (B) Summary of the effects of single base pair alterations in the nt -3 to +4 region on +1 initiated transcription in vitro. Shown are the means ± SEM of data from three experiments.


Figure 3. In vivo transcriptional analysis of single base mutants in the SV40-MLP Inr. (A) Autoradiogram of primer extension analyses showing the relative amounts of +1 initiated RNA synthesized from the indicated point mutants in transfected CV-1PD cells. Lane 1, no DNA; lane 2, SV1773(WT); lanes 3-17, SV4510-SV4524; lane 18, MspI-cut pBR322 DNA. (B) Summary of the effects of single base pair alterations in the nt -3 to +4 region on +1 initiated transcription in vivo. Shown are the means ± SEM of data from three experiments.

The precise nucleotides present at positions +2 and +3 also significantly affected Inr activity (Fig. 2A, lanes 12-17 and data not shown; summarized in Fig. 2B). For example, a T -> G change at nt +2 resulted in a 4-fold increase in the synthesis of +1 initiated RNA, while substitution of the T at nt +3 with any other nucleotide resulted in +1 initiated transcription being reduced to 20-50% of the level observed with WT. Interestingly, a T -> G change at nt +3 led to the utilization of an alternative Inr-like sequence, 5'-GAGGTT-3', situated 5 bp upstream of the natural start site (Fig. 2A, lane 16). At nt +4 a T -> A change, creating the sequence 5'-TTATTAAGGC-3' from nt -2 to +8, was the only sequence alteration found to lower +1 initiated transcription; however, in this case transcription initiated at significant levels at nt +4 (data not shown). These data indicate that in the cell-free system the SV40-MLP Inr is defined largely by the sequences from nt -1 to +3, relative to the transcriptional start site, with the nucleotides 5'-CAGT-3' at these positions being optimal.

In vivo transcription of single base pair substitution mutants in the SV40-MLP Inr

Except for a few quantitative differences, similar results were obtained for transcription of these mutants after transfection into monkey cells (Fig. 3 ). Again, changes in the nucleotides outside of nt -1 to +3 affected transcription at most 2-fold (summarized in Fig. 3 B). Transcription from the +1 site was lower in the nt +4 T -> A substitution mutant, because of the creation of a new functional Inr (data not shown). Similarly, the trends and qualitative effects of the mutations in the -1 to +3 positions were nearly identical to those observed in vitro (Fig. 3 B versus 2 B). For example, the T -> A change at nt -1 produced nearly wild-type levels of RNA, but starting from the -1 position (Fig. 3 A, lane 6).

Three significant quantitative differences from the in vitro data were observed: (i) the WT genome produced amounts of +1 initiated RNA similar to the substitution mutant containing a G at the +2 position; (ii) the nt -1 T -> C substitution resulted in only a 2-fold increase in transcription; (iii) the nt +1 A -> C substitution decreased RNA synthesis only 20%. Overall, while the trends were similar, the extents of the quantitative effects of the sequence alterations on transcription in vivo were less dramatic than they were in vitro .

We conclude from these experimental data that an Inr in the context of the SV40-MLP is largely defined by nt -1 to +3 relative to the transcription initiation site, with the nucleotides at positions -1, +1 and +3 having the most profound effects on determining the strength of the sequence as an Inr. If the Inr with maximal activity is the one in which each of these positions contains the nucleotide that individually yielded maximal activity, we predict that the functionally optimal Inr sequence should be 5'-CAG/TT-3'. This prediction was confirmed by the analysis of mutant pSV4528(-1C,+2G) in both our cell-free (Fig. 7 B below; summarized in Fig. 7 A) and in vivo assay systems (data not shown): 7- and 3-fold respectively more late RNA was synthesized from this mutant than from the wild-type promoter.

Weight matrix definition of an Inr

Genetic elements can also be identified by comparison of functionally similar sequences. By mathematically analyzing the sequences of the Inrs of 502 naturally existing polymerase II promoters Bucher ( 43 ) concluded that the Inr consensus sequence is 5'-TCAGT-3', with initiation occurring at the A and the dinucleotide CA being most prevalent. We also concluded that this consensus sequence is optimal for transcriptional activity.

Using the in vivo data on the transcriptional activities of our complete set of Inr point mutants and a neural network learning algorithm ( 37 ) we generated a weight matrix definition of an Inr that indicates true transcriptional activity, rather than frequency of occurrence in nature (Fig. 4 A). This matrix indicates the weights of a representative net connecting the input units to the output unit. The columns correspond to the positions in the sequence relative to the transcription initiation site; the rows correspond to the nucleotide at each of these positions. By summing the weights shown in this matrix that correspond to the 8 bp of a given sequence and plugging this sum into the formula presented in the legend to Figure 4 one can calculate a predicted relative activity for that sequence as an Inr. The excellent convergence we obtained between predicted and actual transcriptional activities (open circles in Fig. 4 B) indicates that this matrix successfully assimilated this SV40-MLP point mutant data set on which the net was trained.


Figure 4. Experimentally determined weight matrix definition of an Inr and its value in predicting the activity of a given sequence as an Inr. (A) Weight matrix definition of an Inr, derived as described in Materials and Methods. This matrix can be used to calculate the predicted activity of a given sequence as an Inr relative to the optimal Inr by: (i) summing the weights shown in this matrix that correspond to the 8 bp of the sequence; (ii) substituting this sum for $x$ in the equation 'relative transcriptional activity' $= 1.025\{1/[1 + e^{-2(x - 0.674)}] - 0.01\}$. The constants -0.01 and 1.025 have been introduced to offset and scale the activity respectively, causing it to range from 1 to ~0. Negative values should be interpreted as equivalent to 0. (B) Predicted versus experimentally measured transcriptional activities for various sequences as Inrs relative to the optimal Inr sequence, 5'-TGGCATTG-3', in the context of the indicated promoter. Open circles, point mutants of the SV40-MLP on which the net was trained; solid squares, Inr mutants of the TdT promoter tested in vivo (25); solid triangles, Inr mutants of the mdr-1 promoter tested in vivo (44); solid diamonds, Inr mutants of the Ad-MLP promoter tested in vitro (45).
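The two-step calculation in the legend to Figure 4 can be sketched as below. The constants 1.025, 0.674 and 0.01 are taken from the legend; the matrix itself appears only in Figure 4A and is not reproduced in the text, so the `placeholder` weights here are hypothetical values chosen purely to make the example run, and the function names are illustrative.

```python
import math

BASES = "ACGT"
POSITIONS = (-4, -3, -2, -1, 1, 2, 3, 4)  # there is no position 0


def summed_weight(seq, matrix):
    """Step (i): sum the matrix entries selected by an 8 bp sequence.

    `matrix` maps each base to a list of 8 weights, one per position.
    """
    if len(seq) != len(POSITIONS):
        raise ValueError("expected an 8 bp sequence spanning -4 to +4")
    return sum(matrix[base][i] for i, base in enumerate(seq.upper()))


def inr_relative_activity(x):
    """Step (ii): activity = 1.025 * (1 / (1 + exp(-2 (x - 0.674))) - 0.01).

    Negative results are reported as 0, as the legend instructs.
    """
    raw = 1.025 * (1.0 / (1.0 + math.exp(-2.0 * (x - 0.674))) - 0.01)
    return max(raw, 0.0)


# Placeholder weights (hypothetical, NOT the values of Figure 4A).
placeholder = {base: [0.0] * len(POSITIONS) for base in BASES}
placeholder["C"][3] = 1.0  # hypothetical weight for C at position -1
placeholder["A"][4] = 1.0  # hypothetical weight for A at position +1

x = summed_weight("TGGCATTG", placeholder)
print(x, inr_relative_activity(x))
```

To score real sequences, the placeholder dictionary would need to be replaced with the weights read from Figure 4A.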

Repeated runs of this algorithm with this data set always generated matrices whose general features were quite similar, but not identical, to that shown in Figure 4 A (data not shown). Noteworthy is the fact that our weight matrix has a window of 8 input units, even though we had not systematically mutagenized the base in the -4 position of the Inr. Because some of our mutants exhibited alterations in the initiating nucleotide [e.g. mutant pSV4513(-1A); Figs 2 and 3 , lane 6], some input data were, nevertheless, available with alterations involving this position. We chose to employ an 8 bp window because we needed `hidden units' (i.e. additional units between the input and output units) to produce a good convergence with a 7 bp window (data not shown).

Use of the Inr weight matrix to predict transcriptional activity

Having trained the neural net on experimentally derived data, we next used this weight matrix as described in the legend to Figure 4 to predict the relative Inr activity of any given nt -4 to +4 sequence. We initially compared the predicted and in vivo determined transcriptional activities of several SV40 mutants that contained multiple mutations in the MLP Inr (data not shown). These mutants, SV4529(-3C,+3G), SV4530(+2C,+4A,+5A) and SV4531(-4A,-3C,+3A,+4C), had not been part of the original data set used to train the net. Nevertheless, the relative experimentally determined activities of each of these mutant promoters (0.12, 0.09 and 0.3 respectively) correlated reasonably well with their predicted activities (0.11, 0.04 and 0.20 respectively).
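As a rough illustration of how agreement between predicted and measured activities can be quantified, the sketch below computes a Pearson correlation coefficient for the three multiple-mutant values quoted above. It assumes that the r values reported for other promoters are Pearson coefficients, which the text does not state; with only three points the result is illustrative rather than statistically meaningful.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


# Values quoted in the text for SV4529, SV4530 and SV4531 respectively.
measured = [0.12, 0.09, 0.3]
predicted = [0.11, 0.04, 0.20]
r = pearson_r(measured, predicted)
print(round(r, 2))
```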

To test more generally the value of our weight matrix we also examined the predicted versus experimentally determined transcriptional activities of mutants in the Inrs of several other polymerase II promoters (Fig. 4B). For the mdr-1 promoter a correlation of r = 0.68 was obtained. The mdr-1 promoter also has a genetically defined tripartite proximal sequence element structure, as well as a weak TATA box (44). In the case of the TATA-less TdT promoter a reasonable correlation (r = 0.57) between calculated and actual transcriptional activity was observed; however, for several of the TdT mutants the experimentally determined activity was somewhat lower than predicted.

To examine the predictive value of our Inr weight matrix for a promoter containing a strong TATA box we compared the predicted transcriptional activities of Inr mutants of a hybrid promoter consisting of the TATA box of the hsp 70 promoter linked to the initiation site region of the Ad-MLP with the activities of these mutants in a cell-free transcription system ( 45 ). Once again, mutations in the nt -3 to +3 region had the qualitative effects predicted ( r = 0.75; Fig. 4 B). However, as expected, the quantitative effects on transcription were somewhat smaller than those observed with the promoters containing a weak TATA box.

Thus we conclude that the weight matrix presented here can be used to predict the qualitative effects on transcriptional activity of alterations in the sequence of the -4 to +4 region of a promoter when this region does not also contain an overlapping binding site for a non-general transcription factor. When the -30 region of the promoter has a weak TATA box this matrix can also be used to predict quantitative effects.

Weight matrix definition of the -30 region element

A similar neural network approach was employed to generate an experimentally determined weight matrix definition of the -30 region (TBP binding site) of polymerase II promoters (Fig. 5 A). In this case we used as our data sets the in vitro transcriptional analyses performed by Wobbe and Struhl ( 41 ) and Mukumoto et al. ( 42 ) of -30 region mutants of the his 3 and TC7 promoters respectively. The columns in Figure 5 A correspond to the alignment of the given -30 region sequence with respect to the experimentally derived optimal -30 region sequence, 5'-TATAAAA-3'; the rows correspond to the nucleotides at these positions. The excellent correlation obtained between predicted and experimentally derived transcriptional activities for these mutants (depicted by open circles in Fig. 5 B) indicates that the matrix successfully assimilated these training data.


Figure 5. Experimentally determined weight matrix definition of a TBP binding site and its value in predicting the ability of a given sequence to function as the -30 region element of an RNA polymerase II promoter. (A) Weight matrix definition of a TBP binding site, derived as described in Materials and Methods. This matrix can be used to calculate the predicted activity of a given sequence as a -30 region element of a polymerase II promoter relative to the optimal TATA box sequence, 5'-TATAAAA-3', by: (i) summing the weights shown in this matrix that correspond to the 7 bp of the sequence; (ii) substituting this sum for $x$ in the equation $z = [e^{(1.2061 - x)/1.5} - e^{-(1.2061 - x)/1.5}]/[e^{(1.2061 - x)/1.5} + e^{-(1.2061 - x)/1.5}]$ (representing the network's hidden layer); (iii) calculating 'relative transcriptional activity' $= 3.0439\{1/[1 + e^{-2(0.02102 - 1.1709z)}] - 0.1\}$ (corresponding to the network's output layer). The constants -0.1 and 3.0439 have been introduced to offset and scale the activity respectively, allowing it to range from 1 to ~0. (B) Predicted versus experimentally measured transcriptional activities for various -30 region sequences relative to the optimal TATA box sequence in the context of the indicated promoter. Open circles, point mutants of the his3 and TC7 promoters on which the net was trained; solid diamonds, mutants of the SV40-EES1 promoter tested in vitro (48); solid triangles, mutants of the β-globin promoter tested in vivo (46); solid squares, mutants of the SV40-MLP promoter tested in vitro (47). (C) Predicted transcriptional activity for -30 region sequences from eight promoters versus the observed relative TBP binding affinity of these sequences (28).
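Steps (ii) and (iii) of the legend to Figure 5 can be sketched as a single function of the summed weight. All constants are taken from the legend; the hidden-layer expression is written with `math.tanh`, which is algebraically identical to the ratio of exponentials given there. The function name is illustrative, and because the matrix weights appear only in Figure 5A, the example evaluates the function at arbitrary sums rather than at real sequences.

```python
import math


def tbp_relative_activity(x):
    """Predicted relative activity of a -30 region from its summed weight x.

    Hidden layer: z = tanh((1.2061 - x) / 1.5)
    Output layer: 3.0439 * (1 / (1 + exp(-2 (0.02102 - 1.1709 z))) - 0.1)
    """
    z = math.tanh((1.2061 - x) / 1.5)
    y = 1.0 / (1.0 + math.exp(-2.0 * (0.02102 - 1.1709 * z)))
    return 3.0439 * (y - 0.1)


# Arbitrary example sums (hypothetical inputs, not taken from Figure 5A).
for x in (-2.0, 0.0, 1.0):
    print(x, round(tbp_relative_activity(x), 3))
```

Because z falls as x rises and the output rises as z falls, the predicted activity increases monotonically with the summed weight.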

Using this -30 region weight matrix as described in the legend to Figure 5A, we tested whether one could predict the ability of a given sequence to function as the -30 region element of a polymerase II promoter (Fig. 5B). Correlations of r = 0.74, 0.98 and 0.69 were obtained for the predicted versus experimentally determined transcriptional activities of mutants in the -30 region of the human β-globin (46), SV40 late (47) and SV40 early (48) promoters respectively. Remarkably, a correlation of 0.98 was observed between predicted transcriptional activity and experimentally determined binding of recombinant human TBP to a set of eight sequences that included the -30 regions of the promoters for Ad-ML, human dhfr, rpL32, SV40-ML, TdT and IRF-1 (28). Thus we conclude that the weight matrix presented in Figure 5A can be used to predict the ability of a given sequence both: (i) to bind TBP; (ii) to function as the -30 region element of a polymerase II promoter.

Analysis of -30 regions and Inrs of natural promoters

Figure 6 shows the relative activities predicted from our weight matrices of the -30 regions and Inrs of a variety of naturally occurring polymerase II promoters. A striking finding is that the activities predicted for either of these two basal elements vary greatly from one promoter to the next, spanning the complete range from very strong to quite inactive. In addition, the combined predicted activities of these two basal sequence elements on any given promoter also span the entire range from highly active to barely functional.


Figure 6. Calculated closeness of match to the optimal of the -30 region and Inr sequence elements of various natural promoters. The columns on the left show the sequences of the -30 regions and Inrs of the indicated promoters. These sequences were used to calculate, as described in the legends to Figures 4 and 5, the numbers shown in the columns on the right. The predicted value of the minimal output initiator and TATA box sequences, 5'-ATCGTACA-3' and 5'-CGACCCC-3' respectively, served as base lines. In each case the sequence shown for the -30 region element is that within the window -30 ± 3 bp relative to the initiation site that gave the best match to the optimal -30 region sequence. The sequences shown are from the following genes: (A) baculovirus Orgyia pseudotsugata multicapsid nuclear polyhedrosis virus glycoprotein 64 and immediate early 1; (B) Drosophila heat shock protein 70 and 26; (C) various viral genes expressed in mammalian cells, AAV P5 (adeno-associated virus P5), AdE1b (adenovirus E1B), Ad-IVa2 (adenovirus type 2 IVa2), Ad-ML (adenovirus type 2 major late), HIV-1 (human immunodeficiency virus type 1) and SV40-ML (simian virus 40 major late); (D) cellular genes expressed in mammalian cells, CAD [carbamyl-phosphate synthase (glutamine-hydrolyzing)/aspartate carbamoyltransferase/dihydroorotase], dhfr (dihydrofolate reductase), HPRT (hypoxanthine phosphoribosyltransferase), Ig κ (immunoglobulin κ), IRF-1 (interferon regulatory factor 1), Ki-ras, mdr-1 (multiple drug resistance), PGK (3-phosphoglycerate kinase), rpS16 (ribosomal protein S16), sparc/osteonectin and TdT (terminal deoxynucleotidyltransferase).
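The window search described in the legend to Figure 6 (choosing, within -30 ± 3 bp, the 7 bp sequence that best matches the optimal -30 region element) might be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, `score` is any caller-supplied scoring function (for example, the predicted activity from the Figure 5A matrix), and `nominal_start` is taken to be the 0-based index at which the 7 bp window would begin if it were positioned exactly at -30. The legend does not specify which base of the window is aligned to -30, so that convention is left to the caller.

```python
def best_in_window(upstream, nominal_start, score, half_width=3, length=7):
    """Return (offset, subsequence, score) for the best-scoring window.

    Windows starting at nominal_start - half_width .. nominal_start + half_width
    are scored; windows that run off either end of `upstream` are skipped.
    Returns None if no window fits.
    """
    best = None
    for offset in range(-half_width, half_width + 1):
        start = nominal_start + offset
        if start < 0 or start + length > len(upstream):
            continue
        candidate = upstream[start:start + length]
        value = score(candidate)
        # Keep the first window seen with the highest score.
        if best is None or value > best[2]:
            best = (offset, candidate, value)
    return best
```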


Figure 7. Effects of upstream Sp1 binding sites on location and efficiency of transcriptional initiation from promoters with a strong versus weak Inr. (A) Sequences of the Inr regions of the mutants studied here and a summary of the in vitro data obtained with them. The columns on the right show the means ± SEM of data from three experiments of +1 initiated SV40 late RNA synthesized in a cell-free system. For mutant pSV5418(+1T), indicated by (*), the nt -3 to +1 initiated RNA was quantitated. (B and C) Autoradiograms of primer extension analyses of the SV40 late and early RNAs respectively synthesized in a cell-free system from pSV1773(WT) and the optimal Inr mutant pSV4528(-1C,+2G). (D and E) Autoradiograms of primer extension analyses of the SV40 late and early RNAs respectively synthesized in a cell-free system from the indicated Sp1-activated Inr mutants of the SV40-MLP.

As a control, we evaluated the output of our Inr and -30 region matrices across all possible 8 and 7 bp sequences respectively (data not shown). These data indicated that substantial specificity is associated with these two sequence motifs: only 2.4% of random 7 bp sequences yielded scores with our -30 region matrix that exceeded 10% of a consensus -30 element; only 20% of random 8 bp sequences yielded scores with our Inr matrix that exceeded 10% of the output of a consensus Inr element. Thus, even in the absence of other cis -acting elements, the presence of the -30 region and Inr elements spaced appropriately apart is predicted to provide fairly good selectivity in defining a site for transcription initiation.
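A control of this kind can be sketched as an exhaustive enumeration over all sequences of a given length. The sketch below is illustrative only: the function name is hypothetical, and `score` stands for any caller-supplied function that returns the matrix output for a sequence (the published matrices are not reproduced here). It reports the fraction of all 4^n sequences whose score exceeds a chosen fraction of the score of a reference (consensus) sequence.

```python
from itertools import product

def fraction_above(length, score, reference, threshold=0.10):
    """Fraction of all 4**length sequences scoring above threshold * score(reference)."""
    cutoff = threshold * score(reference)
    total = 0
    hits = 0
    for bases in product("ACGT", repeat=length):
        total += 1
        if score("".join(bases)) > cutoff:
            hits += 1
    return hits / total
```

For the -30 region element this enumerates 4^7 = 16 384 sequences, and for the Inr 4^8 = 65 536, so exhaustive evaluation is inexpensive.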

Role of upstream activating elements

What sequence(s) determines the transcription initiation site in promoters such as CAD in which both basal elements are weak? It has been suggested that upstream Sp1 binding sites may affect the location as well as the efficiency of transcription initiation from the human dhfr and CAD promoters ( 49 , 50 ).

To determine whether such upstream activating sequences can determine the initiation site when both the -30 region and Inr motifs are weak, we constructed mutants of the SV40-MLP in which the six Sp1 binding sites of the SV40 promoter region (Fig. 1A) were relocated to ~75 bp upstream of the major late start site of transcription (Fig. 7). As expected, substitution of the Sp1 binding sites for the wild-type activating sequences led to significant enhancement in the efficiency of transcription initiation regardless of the sequence of the Inr (Figs 1 and 2 and 7B versus 7D; summarized in Fig. 7A). For example, transcription from nt +1 in the cell-free system was at least 10- to 15-fold higher when the +1/+3GGG and +1G Inr mutant promoters were activated by Sp1 than when they were present in the WT background. However, the major site of initiation from these weak Inr promoters remained unchanged by the presence of the Sp1 binding sites (Fig. 7D, lanes 3 and 4), in agreement with the observations of Smale et al. (51).

Likewise, a low level of RNA was synthesized from the +1T Inr mutant in the wild-type background, with heterogeneous 5'-ends mapping to approximately nt -3 to +1, rather than +1 (Figs 2A and 3A, lane 11); when placed downstream of the Sp1 sites, transcription increased 10- to 15-fold (Fig. 7D, lane 5; summarized in Fig. 7A), again starting at approximately nt -3 to +1. Therefore, the site of transcription initiation is predominantly determined by the Inr, not by its distance to a binding site for an upstream activator.

DISCUSSION

Experimental determination of an optimal Inr

Using a natural, TATA-less promoter under assay conditions free of the effects of overlapping regulatory factors we investigated the role Inr sequences play in determining the efficiency and selection of the start site of transcription. Reported here is the first systematic mutagenesis of an Inr element (Figs 1 - 3 ). We found that only the region from -2 to +3 relative to the +1 start site is genetically important, with the nucleotides at positions -1, +1 and +3 being critical. We determined that the functionally optimal Inr sequence is 5'-(T/G)CA(G/T)T-3' (Figs 2 , 3 and 7 ) and correlated transcriptional activity with similarity to this sequence (Fig. 4 ). This finding concurs with the partial genetic analyses of the Inrs of other promoters ( 52 - 57 ). Thus the optimal sequence of an Inr is probably universal, with non-general Inr binding factors functioning as transcriptional regulators, not as novel `selectors'.

Weight matrix definitions of basal elements

Using a neural net algorithm ( 37 ) and our experimental mutagenesis data of the SV40-MLP Inr (Fig. 3 ) we generated a weight matrix that should be of general use to predict the relative strength of any given sequence as an Inr (Fig. 4 A). For most non-training set sequences examined a good correlation was found to exist between predicted and actual transcriptional activity (Fig. 4 B). Analysis of our data by the methodology of Stormo et al. ( 58 ) generated a weight matrix with comparable predictive value, i.e. r = 0.36, 0.55 and 0.78 for the mdr -1, Ad-MLP and TdT Inr data sets respectively. We prefer the neural net method, since it minimizes the differences between the predicted and actual activities, rather than their natural logs, and thus more likely reflects the biological systems being considered here.
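The distinction drawn here between the two fitting criteria can be written out as follows. This is a sketch under an assumption: a sum-of-squares error is taken for both methods, which is the standard objective for back-propagation (37) but is not stated explicitly in the text. Here $a_i$ is the measured relative activity of sequence $i$ and $\hat{a}_i$ the predicted value; both symbols are introduced only for this illustration.

```latex
E_{\mathrm{activity}} = \sum_i \left( \hat{a}_i - a_i \right)^2
\qquad \text{versus} \qquad
E_{\log} = \sum_i \left( \ln \hat{a}_i - \ln a_i \right)^2
```

Minimizing the first quantity weights errors on highly active sequences most heavily, whereas minimizing the second weights equal fold-differences equally, including those among very weak sequences.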

Divergence between predicted and actual activity could be attributed to any of several factors. (i) The number of sequences we sampled in training the neural net may have been insufficient. (ii) The importance of the Inr is probably dependent upon its context with respect to the strengths of other basal and regulatory elements present in the promoter. (iii) Bases within the Inr may also be part of overlapping elements recognized by non-general regulatory factors (see for example 19 , 24 , 30 ). In fact, significant deviation between measured and predicted transcriptional activity might indicate the existence of a binding site for a regulatory factor.

Using mutagenesis data of others ( 41 , 42 ) we likewise generated a weight matrix for the -30 region of polymerase II promoters (Fig. 5 A). This matrix was found to have excellent predictive value, not only for transcriptional activity (Fig. 5 B), but also for binding of TBP to the -30 region (Fig. 5 C). However, a major limitation of the usefulness of these matrices remains in that they fail to incorporate information concerning the effects of the distance and interactions between the -30 and Inr elements.

Function of the Inr

Several recent studies have indicated that the Inr is recognized by components of holoTFIID (59-66). Especially noteworthy is the finding of Purnell et al. (63) that the optimal sequence in the Inr region of the hsp70 promoter that binds Drosophila (d) TFIID is 5'-CA(T/G)T(T/G)-3', the same sequence we found to be optimal for transcriptional activity of the SV40-MLP (Figs 2, 3 and 7). This match between binding of TFIID and transcriptional activity provides compelling evidence that a component(s) of TFIID is a primary basal factor that functionally recognizes the Inr. Furthermore, Tjian and colleagues (64-66) have demonstrated that `TAFII150', a component of dTFIID, can specifically interact with and functionally recognize proximal elements, including the Inr.

The Inr likely also plays a direct role in promoter recognition and initiation of transcription by RNA polymerase. Carcamo et al. (67) reported that highly purified RNA polymerase II by itself can weakly bind to the Inr of the Ad-MLP and preferentially initiate transcription from sequences resembling Inrs. Escherichia coli RNA polymerase also recognizes sites of transcription initiation (68), with the optimal start site being 5'-CAGT-3'. Interestingly, sequence alterations in the Inr that were deleterious to transcriptional activity of the SV40-MLP are known to interfere with binding by prokaryotic RNA polymerase (68). Furthermore, as has been documented with E. coli polymerase (69), we also noted a phenomenon consistent with `slippage' by eukaryotic polymerase II when multiple T residues were present at the initiation site (Fig. 7D, lane 5). Thus we conclude that components of both RNA polymerase II and holoTFIID probably recognize the Inr in a sequence-specific manner. Most interestingly, recognition of the Inr by components of the basal machinery appears to be highly conserved from prokaryotes to higher eukaryotes.

Role of upstream activating sequences

To examine whether the location of binding sites for an activator protein can affect the site of transcription initiation we repositioned the multiple Sp1 binding sites naturally present in SV40 directly upstream of the basal elements of the SV40-MLP. Contrary to the conclusions reached with the dhfr ( 49 ) and CAD ( 50 ) promoters, we found that the locations of the transcriptional initiation sites were unaltered by the location of the Sp1 binding sites (Fig. 7 ). Thus we conclude that initiation site location is probably determined primarily by the binding of general transcription factors to multiple, appropriately spaced, basal elements of promoters.

The mechanism of basal transcription initiation is fundamentally conserved among class II promoters

Until recently a prevailing view has been that polymerase II promoters can be divided into two subclasses in which the early steps in the formation of a pre-initiation complex occur either by a TATA-dependent or an Inr-dependent mechanism (9, 11, 67). We consider this hypothesis unlikely to be valid. First, in higher eukaryotes the distance between the TBP binding site and the Inr is fixed to within a few bases (28, 70). Second, the strengths of both the -30 region and Inr elements vary enormously in natural promoters, ranging from optimal to very weak (12; Fig. 6). Third, a single TBP binding site or Inr region does not provide sufficient sequence specificity to determine either the site or direction of transcription initiation. Fourth, most, if not all, class II promoters require TBP for transcription, whether they appear to contain a TBP binding site or not (3, 15, 51, 71-73). Finally, cooperation between multiple elements is necessary for accurate start site selection (15, 28, 74, 75).

As an alternative hypothesis we propose that the mechanism of transcription initiation is fundamentally conserved among class II promoters. The initial event is recognition by the holoTFIID complex of the multiple basal elements of the promoter existing in a fairly strict spatial alignment to each other. Next, TFIIB binds via protein-protein interactions with TBP and recruits polymerase II/TFIIF into the complex ( 71 , 73 , 76 , 77 ). Alternatively, these factors pre-exist and bind concurrently to the multiple basal elements in the form of a holocomplex ( 78 , 79 ). Finally, within a small window, sequence recognition of the Inr by RNA polymerase II (or another member of this complex) accurately defines the start site of transcription.

This genetic arrangement meets the requirement that start site sequences occur uniquely and rather infrequently in the genome. Within this spatial context each of the basal elements can possess individual binding capacities. Nevertheless, sufficient total binding capacity exists to permit the stable formation of pre-initiation complexes. Biochemical evidence supporting this hypothesis includes the recent studies of Aso et al. ( 73 ) that a variety of both TATA and TATA-less promoters require the same general factors for basal transcription and that basal transcriptional activity correlates with the binding affinities of these factors for the promoter.

ACKNOWLEDGEMENTS

We thank Peggy Farnham for HeLa S-3 cells and Dick Burgess, Peggy Farnham and members of their laboratories and our laboratory for helpful discussions. We especially thank Paul Lambert, Gary Stormo, Grace Wahba and Nancy Thompson for helpful comments on the manuscript. This research was supported by US Public Health Service Research grants CA-07175, CA-09075, CA-09135 and CA-22443 from the National Cancer Institute.

REFERENCES

1 Zawel,L. and Reinberg,D. (1992) Curr. Opin. Cell Biol., 4, 488-495.

2 Conaway,R.C. and Conaway,J.W. (1993) Annu. Rev. Biochem., 62, 161-190.

3 Hernandez,N. (1993) Genes Dev., 7, 1291-1308.

4 Zawel,L. and Reinberg,D. (1993) In Cohn,W.E. and Moldave,K. (Eds), Progress in Nucleic Acid Research and Molecular Biology. Academic Press, San Diego, CA, Vol. 44, pp. 67-108.

5 Buratowski,S. (1994) Cell, 77, 1-3.

6 Jiang,Y., Smale,S.T. and Gralla,J.D. (1993) J. Biol. Chem., 268, 6535-6540.

7 Goodrich,J.A. and Tjian,R. (1994) Cell, 77, 145-156.

8 Maxon,M.E., Goodrich,J.A. and Tjian,R. (1994) Genes Dev., 8, 515-524.

9 Roeder,R.G. (1991) Trends Biochem. Sci., 16, 402-408.

10 Weis,L. and Reinberg,D. (1992) FASEB J., 6, 3300-3309.

11 Roy,A.L., Malik,S., Meisterernst,M. and Roeder,R.G. (1993) Nature, 365, 355-359.

12 Kollmar,R. and Farnham,P.J. (1993) Proc. Soc. Exp. Biol. Med., 203, 127-139.

13 Grosschedl,R. and Birnstiel,M.L. (1980) Proc. Natl. Acad. Sci. USA, 77, 1432-1436.

14 Piatak,M., Ghosh,P.K., Norkin,L.C. and Weissman,S.M. (1983) J. Virol., 48, 503-520.

15 Concino,M.F., Lee,R.F., Merryweather,J.P. and Weinmann,R. (1984) Nucleic Acids Res., 12, 7423-7433.

16 Somasekhar,M.B. and Mertz,J.E. (1985) J. Virol., 56, 1002-1013.

17 Smale,S.T. and Baltimore,D. (1989) Cell, 57, 103-113.

18 Du,H., Roy,A.L. and Roeder,R.G. (1993) EMBO J., 12, 501-511.

19 Blake,M.C. and Azizkhan,J.C. (1989) Mol. Cell. Biol., 9, 4994-5002.

20 Garfinkel,S., Thompson,J.A., Jacob,W.F., Cohen,R. and Safer,B. (1990) J. Biol. Chem., 265, 10309-10319.

21 Means,A.L. and Farnham,P.J. (1990) Mol. Cell. Biol., 10, 653-661.

22 Roy,A.L., Meisterernst,M., Pognonec,P. and Roeder,R.G. (1991) Nature, 354, 245-248.

23 Seto,E., Shi,Y. and Shenk,T. (1991) Nature, 354, 241-245.

24 Means,A.L., Slansky,J.E., McMahon,S.L., Knuth,M.W. and Farnham,P.J. (1992) Mol. Cell. Biol., 12, 1054-1063.

25 Javahery,R., Khachi,A., Lo,K., Zenzie-Gregory,B. and Smale,S.T. (1994) Mol. Cell. Biol., 14, 116-126.

26 Usheva,A. and Shenk,T. (1994) Cell, 76, 1115-1121.

27 Ayer,D.E. and Dynan,W.S. (1988) Mol. Cell. Biol., 8, 2021-2033.

28 Wiley,S.R., Kraus,R.J. and Mertz,J.E. (1992) Proc. Natl. Acad. Sci. USA, 89, 5814-5818.

29 Ayer,D.E. and Dynan,W.S. (1990) Mol. Cell. Biol., 10, 3635-3645.

30 Wiley,S.R., Kraus,R.J., Zuo,F.R., Murray,E.E., Loritz,K. and Mertz,J.E. (1993) Genes Dev., 7, 2206-2219.

31 Zuo,F. and Mertz,J.E. (1995) Proc. Natl. Acad. Sci. USA, 92, 8586-8590.

32 Sambrook,J., Fritsch,E.F. and Maniatis,T. (1989) Molecular Cloning: A Laboratory Manual, 2nd Edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY.

33 Fromm,M. and Berg,P. (1982) J. Mol. Appl. Genet., 1, 457-481.

34 Good,P.J., Welch,R.C., Ryu,W.-S. and Mertz,J.E. (1988) J. Virol., 62, 563-571.

35 Chen,W. and Struhl,K. (1988) Proc. Natl. Acad. Sci. USA, 85, 2691-2695.

36 Hertz,G.Z. and Mertz,J.E. (1986) Mol. Cell. Biol., 6, 3513-3522.

37 Rumelhart,D.E., Hinton,G.E. and Williams,R.J. (1986) Nature, 323, 533-536.

38 Rumelhart,D.E., Hinton,G.E. and Williams,R.J. (1986) Learning Internal Representations by Error Propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition. MIT Press, Cambridge, MA, Vol. 1, pp. 318-362.

39 Demeler,B. and Zhou,G.W. (1991) Nucleic Acids Res., 19, 1593-1599.

40 O'Neil,M.C. (1991) Nucleic Acids Res., 19, 313-318.

41 Wobbe,C.R. and Struhl,K. (1990) Mol. Cell. Biol., 10, 3859-3867.

42 Mukumoto,F., Hirose,S., Imaseki,H. and Yamazaki,K. (1993) Plant Mol. Biol., 23, 995-1003.

43 Bucher,P. (1990) J. Mol. Biol., 212, 563-578.

44 van Groenigen,M.L., Valentijn,J. and Baas,F. (1993) Biochim. Biophys. Acta, 1172, 138-146.

45 Wang,J.C. and Van Dyke,M.W. (1993) Biochim. Biophys. Acta, 1216, 73-80.

46 Myers,R.M., Tilly,K. and Maniatis,T. (1986) Science, 232, 613-618.

47 Nandi,A., Das,G. and Salzman,N.P. (1985) Mol. Cell. Biol., 5, 591-594.

48 Pauly,M., Treger,M., Westhof,E. and Chambon,P. (1992) Nucleic Acids Res., 20, 975-982.

49 Blake,M.C., Jambou,R.C., Swick,A.G., Kahn,J.W. and Azizkhan,J.C. (1990) Mol. Cell. Biol., 10, 6632-6641.

50 Kollmar,R., Sukow,K.A., Sponagle,S.K. and Farnham,P.J. (1994) J. Biol. Chem., 269, 2252-2257.

51 Smale,S.T., Schmidt,M.C., Berk,A.J. and Baltimore,D. (1990) Proc. Natl. Acad. Sci. USA, 87, 4509-4513.

52 Beaupain,D., Eleouet,J.F. and Romeo,P.H. (1990) Nucleic Acids Res., 18, 6509-6515.

53 Chung,S. and Perry,R.P. (1991) Gene, 100, 173-180.

54 Hariharan,N. and Perry,R.P. (1990) Proc. Natl. Acad. Sci. USA, 87, 1526-1530.

55 Blissard,G.W., Kogan,P.H., Wei,R. and Rohrmann,G.F. (1992) Virology, 190, 783-793.

56 Lieber,A., Teppke,M., Herrmann,G. and Strauss,M. (1991) FEBS Lett., 282, 225-227.

57 Lescure,A., Murgo,S., Carbon,P. and Krol,A. (1992) Nucleic Acids Res., 20, 1573-1578.

58 Stormo,G.D., Schneider,T.D. and Gold,L. (1986) Nucleic Acids Res., 14, 6661-6679.

59 Emanuel,P.A. and Gilmour,D.S. (1993) Proc. Natl. Acad. Sci. USA, 90, 8449-8453.

60 Kaufman,J. and Smale,S.T. (1994) Genes Dev., 8, 821-829.

61 Pugh,B.F. and Tjian,R. (1991) Genes Dev., 5, 1935-1945.

62 Purnell,B.A. and Gilmour,D.S. (1993) Mol. Cell. Biol., 13, 2593-2603.

63 Purnell,B.A., Emanuel,P.A. and Gilmour,D.S. (1994) Genes Dev., 8, 830-842.

64 Verrijzer,C.P., Yokomori,K., Chen,J.L. and Tjian,R. (1994) Science, 264, 933-941.

65 Verrijzer,C.P., Chen,J.L., Yokomori,K. and Tjian,R. (1995) Cell, 81, 1115-1125.

66 Hansen,S.K. and Tjian,R. (1995) Cell, 82, 565-575.

67 Carcamo,J., Buckbinder,L. and Reinberg,D. (1991) Proc. Natl. Acad. Sci. USA, 88, 8052-8056.

68 Simpson,R.B. (1979) Cell, 18, 277-285.

69 Guo,H.C. and Roberts,J.W. (1990) Biochemistry, 29, 10702-10709.

70 O'Shea-Greenfield,A. and Smale,S.T. (1992) J. Biol. Chem., 267, 1391-1402.

71 Pugh,B.F. and Tjian,R. (1990) Cell, 61, 1187-1197.

72 Zhou,Q., Lieberman,P.M., Boyer,T.G. and Berk,A.J. (1992) Genes Dev., 6, 1964-1974.

73 Aso,T., Conaway,J.W. and Conaway,R.C. (1994) J. Biol. Chem., 269, 26575-26583.

74 Zenzie-Gregory,B., O'Shea-Greenfield,A. and Smale,S.T. (1992) J. Biol. Chem., 267, 2823-2830.

75 Zenzie-Gregory,B., Khachi,A., Garraway,I.P. and Smale,S.T. (1993) Mol. Cell. Biol., 13, 3841-3849.

76 Pinto,I., Ware,D.E. and Hampsey,M. (1992) Cell, 68, 977-988.

77 Ha,I., Roberts,S., Maldonado,E., Sun,X., Kim,L.U., Green,M. and Reinberg,D. (1993) Genes Dev., 7, 1021-1032.

78 Koleske,A.J. and Young,R.A. (1994) Nature, 368, 466-469.

79 Ossipow,V., Tassan,J.P., Nigg,E.A. and Schibler,U. (1995) Cell, 83, 137-146.

80 Buchman,A.R., Burnett,L. and Berg,P. (19??) In Tooze,J., (ed.), DNA Tumor Viruses., 2nd Edn. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY, pp. 799-841.


* To whom correspondence should be addressed

Present addresses: + Promega Corporation, Madison, WI 53711, USA and § Immunex Corporation, Seattle, WA 98101, USA.