Nucleic Acids Research
Using sequence logos and information analysis of Lrp DNA binding sites to investigate discrepancies between natural selection and SELEX
ABSTRACT
INTRODUCTION
Genetic control is exerted when proteins bind to specific nucleic acid sequences. Traditionally, these sequences have been collected from the naturally evolved sites. More recently, protein-binding motifs have been characterized by using in vitro selection procedures. It is often assumed that in vitro results accurately reflect natural binding sites, but a quantitative comparison of the two approaches has usually been lacking. In this paper we make this comparison for the leucine-responsive regulatory protein (Lrp).
Lrp is a pleiotropic DNA-binding protein in Escherichia coli and Salmonella typhimurium that consists of two 18.8 kDa subunits (1) and forms a homodimer in solution (2). Lrp binds to multiple sites in a number of operons, including dad, fanABC, papBA and ilvIH (3-5). Leucine can invoke either positive or negative transcriptional control by Lrp (1,6).
Cui et al. investigated Lrp by using the SELEX (systematic evolution of ligands by exponential enrichment) procedure (7), an in vitro method that is used to identify binding motifs. In the SELEX procedure, a specific protein is used to select binding sequences from random synthetic sequences (8). Since its introduction, the SELEX technique has been used to study a variety of systems (9,10).
Since Lrp has many natural binding sites, a reasonably accurate model for in vivo binding sequences can be created and compared with sites produced by SELEX. Based on Claude Shannon's information theory (11,12), molecular information theory (13,14) is a mathematical approach to explaining molecular interactions. Using information theory, we constructed two separate models of Lrp binding sequences for comparison. These quantitative models, called sequence logos (15), graphically represent Lrp binding in both the natural and synthetic environments. Comparison of the models allowed us to test whether the sites selected in vitro had evolved to simulate natural binding sites.
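As an illustration of how the sequence conservation in a sequence logo can be computed, the following is a minimal Python sketch, not the programs used in this work. It assumes the commonly used form R(l) = 2 - (H(l) + e(n)) for DNA, with an approximate small-sample correction e(n) = 3/(2 ln 2 · n); the function names and the toy alignment are hypothetical and are not Lrp data.

```python
import math
from collections import Counter


def small_sample_correction(n, alphabet_size=4):
    """Approximate small-sample correction e(n) in bits (assumed form)."""
    return (alphabet_size - 1) / (2 * math.log(2) * n)


def position_information(sites):
    """Information in bits at each position of equal-length aligned sites.

    Uses R(l) = 2 - (H(l) + e(n)), where H(l) is the Shannon uncertainty
    of the observed base frequencies at position l.
    """
    n = len(sites)
    e_n = small_sample_correction(n)
    info = []
    for l in range(len(sites[0])):
        counts = Counter(site[l] for site in sites)
        uncertainty = -sum((c / n) * math.log2(c / n) for c in counts.values())
        info.append(2.0 - (uncertainty + e_n))
    return info


def letter_heights(sites, position):
    """Height of each letter in one logo stack: frequency times stack height."""
    n = len(sites)
    stack_height = position_information(sites)[position]
    counts = Counter(site[position] for site in sites)
    return {base: (c / n) * stack_height for base, c in counts.items()}


# Hypothetical toy alignment, for illustration only.
toy_sites = ["ACGT", "ACGA", "ACTT", "ACCG"]
heights = position_information(toy_sites)
```

With only four toy sequences the approximate correction is large, so weakly conserved positions can come out slightly negative; the approximation is better suited to larger sets of sites.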
MATERIALS AND METHODS
Twenty-seven Lrp binding sites were aligned for analysis of Lrp binding patterns (Fig. 1).
Figure 1. Aligned listing of 27 E.coli Lrp binding sites. Columns from left to right indicate gene or operon name; Lrp activation (A), repression (R) or whether its effect is unknown (?); GenBank accession number; zero coordinate in GenBank entry; the orientation of the sequence relative to the GenBank entry; sequence number; the binding sequence and Ri of each site in bits for the range -1 to +12. Twenty-five footprinted sites [ilvIH (5,27), trxB (36), micF and ompC (37), gcv (38), gltBDF (33), lysU (39), papBA (4,40,41), faeA (42), daaAB and sfaBA (43)], and two mutated sites that affected binding [dad (3) and tdh (44)] are shown. The alignment is derived from Fraenkel (16). These sites are summarized as sequence logos in Figure 2.
The delila, alist, encode, rseq, dalvec and makelogo programs were used as described previously to create both natural (Fig. 2) ...
Figure 2. Sequence logos of natural Lrp sites. (A) Sequence logo of the 27 wild-type Lrp binding sites shown in Figure 1. Sequence conservation, measured in bits of information, is depicted by the height of a stack of letters for each position in the binding sites. The relative heights of the letters within a stack are proportional to their frequencies. Circles were placed below guanines protected from DMS attacks by Lrp (27). Open circles are guanines protected on the top strand, and filled circles are guanines protected on the bottom strand. Triangles denote DNase I hypersensitive sites (43). (B) Sequence logo of 17 Lrp-activation binding sites (A in Fig. 1). (C) Sequence logo of nine Lrp-repression binding sites (R in Fig. 1). The cosine wave represents the 10.6 base twist of B-form DNA (17,26).
The information content of individual genetic sequences (Ri) can be determined to identify potential binding sites (20-25). The programs ri, scan, search, live and lister were used to identify and map the Lrp sites relative to the footprint data, and sites were displayed by the sequence walker method (20,21). First, we used ri to create an Riw (b, l) weight matrix from the aligned set of sites. Then we scanned each sequence with the natural Lrp weight matrix using scan. Next, we used the search program to identify and mark the footprinted regions on the map. The live program was used to create a spectrum color strip to indicate protein binding site orientation on B-form DNA (Figs 3 and 4).
Figure 3. Lrp sequence walkers compared to in vivo DMS footprinting data. Six in vivo DMS footprinted ilvIH sites are marked by dashed lines (5). Beneath these sites are sequence walkers along with the Ri of each site given in bits. The height of each letter in a walker is the sequence conservation that that base contributes to the average sequence conservation shown in the sequence logo (Fig. 2A). The green rectangles mark the zero coordinate of each walker and provide a scale from -3 to +2 bits. All letters of the walkers are rotated 90° counter-clockwise, indicating that all Lrp ilvIH sites have the same orientation. Walker location was determined by the scan program for Ri > 4 bits, which includes all known natural sites in Figure 1. The asterisks and numbers above the sequence indicate the position on the E.coli genome, GenBank entry U00096 (45). The color strip above the sequence has a 10.6 base cycle, representing the helical structure of B-form DNA. Sites 1, 2 and 4 are on the same face because their zero coordinates fall under the same color. Sites 3, 5 and 6 are on the opposite face.
Figure 4. Predicted Lrp sites in the fimA regulatory region. Two Lrp binding sites previously predicted (31) are marked with dashes, while (OP)2Cu2+ and two-stage methidiumpropyl-EDTA footprinting data shows extended protection (31,32), that corresponds to a series of Lrp walkers. To include the 3.3 bit site predicted by Gally et al. (31), all sequence walkers with Ri > 3 bits are shown. The black rectangle indicates a base not observed in the original data set (Fig. 1). If this site were included in the model, it would become 7.2 bits (20). All predicted Lrp sites have the same orientation and the four sites having 10.1, 5.1, 4.8 and 5.5 bits would be on the same face of the DNA since their zero positions are at nearly the same color on the spectrum, while the 4.0, 4.6 and 7.5 bit sites are approximately on the opposite face. The sequence is from GenBank accession U00096 (45).
Figure 5. Sequence logos of SELEX-generated Lrp binding sites. (A) Sequence logo of 30 sites that made up SELEX experiment 1 of Cui et al. (7), in which there was no leucine added. (B) Sequence logo of 25 sites that made up experiment 2, without leucine. (C) Sequence logo for experiments 1 and 2 combined, without leucine. (D) Sequence logo of 12 sites that made up experiment 3, with leucine. (E) Sequence logo for experiments 1, 2 and 3 combined. (E) has two cosine waves to show possible major groove binding on two different faces. The SELEX and natural coordinate systems were chosen to facilitate comparison with each other.
To compare binding energy with individual information (Fig. 6) ...
Figure 6. Comparing the relative binding strength of Cui et al. with Ri. (A) Using the Riw (b, l) weight matrix from the natural sites (Fig. 1), we scanned the SELEX-generated sequences in Cui et al. (7). The highest Ri for each SELEX-generated site was chosen for the 62 sites reported; no correlation was observed between binding strength and Ri (r = 0.15). Also, the sum of all positive Ri values in each sequence was compared with the reported binding energy, but no correlation was found by this second approach (r = 0.06). (B) An Riw (b, l) matrix was made from the SELEX sequences and used to evaluate the same sequences. The single outlier, referred to as 7 in Figure 3 of Cui et al. (7), contains a T at +2 that is not observed in any other SELEX sequence and is therefore rated with a low value (20) (r = 0.43 without the outlier).
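The weight matrix and scanning steps described in this section can be sketched in Python as follows. This is a hedged illustration rather than the ri and scan programs themselves: it assumes Riw(b, l) = 2 - (-log2 f(b, l) + e(n)) with an approximate correction e(n) = 3/(2 ln 2 · n), it assigns an assumed frequency of 1/(n + 2) to bases never observed at a position, and it reports window start indices rather than zero coordinates. All function names and the toy sequences are hypothetical.

```python
import math

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def build_weight_matrix(sites):
    """Individual information weight matrix from equal-length aligned sites."""
    n = len(sites)
    e_n = 3 / (2 * math.log(2) * n)  # approximate small-sample correction
    matrix = []
    for l in range(len(sites[0])):
        column = {}
        for base in BASES:
            count = sum(1 for site in sites if site[l] == base)
            # Assumption of this sketch: unobserved bases get frequency 1/(n + 2).
            freq = count / n if count else 1 / (n + 2)
            column[base] = 2.0 - (-math.log2(freq) + e_n)
        matrix.append(column)
    return matrix


def site_information(matrix, seq):
    """Ri of one candidate site: the sum of its per-position weights."""
    return sum(column[base] for column, base in zip(matrix, seq))


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def scan(matrix, sequence, threshold):
    """Return (start, strand, Ri) for every window scoring above threshold."""
    width = len(matrix)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        candidates = (("+", window), ("-", reverse_complement(window)))
        for strand, candidate in candidates:
            ri = site_information(matrix, candidate)
            if ri > threshold:
                hits.append((start, strand, ri))
    return hits


# Hypothetical toy data, not Lrp sites.
toy_sites = ["AAGT", "AAGT", "AAGA", "AAGT"]
toy_matrix = build_weight_matrix(toy_sites)
toy_hits = scan(toy_matrix, "CCAAGTCC", threshold=4.0)
```

Scanning both strands mirrors the fact that a site can occur in either orientation; the threshold plays the role of the Ri cutoffs (for example Ri > 4 bits) quoted in the figure captions.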
Further information on programs is available at http://www-lecb.ncifcrf.gov/~toms/
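The color strip in Figures 3 and 4 encodes where a zero coordinate falls within the 10.6 base helical repeat of B-form DNA. A minimal Python sketch of that idea is given below; it is not the live program. The function names and the tolerance of a quarter turn for calling two sites "on the same face" are assumptions made for this illustration only.

```python
HELICAL_REPEAT = 10.6  # bases per turn of B-form DNA, as stated in the text


def helical_phase(position, repeat=HELICAL_REPEAT):
    """Fraction of a helical turn (0 to 1) at a given coordinate."""
    return (position % repeat) / repeat


def phase_difference(pos_a, pos_b, repeat=HELICAL_REPEAT):
    """Smallest circular difference in phase between two coordinates."""
    diff = abs(helical_phase(pos_a, repeat) - helical_phase(pos_b, repeat))
    return min(diff, 1.0 - diff)


def same_face(pos_a, pos_b, tolerance=0.25):
    """True if two zero coordinates lie within `tolerance` turns of each other.

    The tolerance is a hypothetical cutoff chosen for this sketch.
    """
    return phase_difference(pos_a, pos_b) <= tolerance


# Hypothetical coordinates: about two turns apart, then about half a turn apart.
two_turns_apart = same_face(0, 21)
half_turn_apart = same_face(0, 5)
```

Two zero coordinates separated by close to a whole number of turns give a small phase difference, which corresponds to falling under the same color on the strip.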
RESULTS
Natural Lrp sites
The Lrp sequence logo (Fig.
Lrp is known to both activate and repress transcription (6), so sequence logos for both Lrp activation and repression sites were made (Fig.
To see if activation sites can be distinguished from repression sites, activation and repression Riw (b, l) weight matrices were assembled for individual informational analysis of all footprinted Lrp sites. Activation sites were given higher Ri evaluation by the activation model and the repression sites were favored by the repression model. To test the predictive capability of these matrices, we repeated this analysis but excluded each site from its own matrix. We found that we could not predict repression versus activation. The failure of this bootstrap test, for all sites, suggests that either the activation and repression sites are essentially identical or that more examples are needed to distinguish between them.
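The test described above, in which each site is excluded from its own matrix before being evaluated, can be sketched as a leave-one-out procedure. The Python below is an illustrative assumption of how such a test might be set up, not the analysis actually performed: the weight matrix form, the 1/(n + 2) frequency for unobserved bases, the function names and the toy sequences are all hypothetical.

```python
import math

BASES = "ACGT"


def weight_matrix(sites):
    """Riw(b, l)-style weights, with an approximate small-sample correction."""
    n = len(sites)
    e_n = 3 / (2 * math.log(2) * n)
    matrix = []
    for l in range(len(sites[0])):
        column = {}
        for base in BASES:
            count = sum(1 for site in sites if site[l] == base)
            freq = count / n if count else 1 / (n + 2)  # assumed floor
            column[base] = 2.0 - (-math.log2(freq) + e_n)
        matrix.append(column)
    return matrix


def score(matrix, seq):
    """Individual information of seq under the given matrix."""
    return sum(column[base] for column, base in zip(matrix, seq))


def leave_one_out_accuracy(activation, repression):
    """Fraction of sites assigned to their own class when held out."""
    labelled = [(s, "A") for s in activation] + [(s, "R") for s in repression]
    correct = 0
    for idx, (site, label) in enumerate(labelled):
        rest = labelled[:idx] + labelled[idx + 1:]  # hold out this one site
        act = [s for s, lab in rest if lab == "A"]
        rep = [s for s, lab in rest if lab == "R"]
        act_score = score(weight_matrix(act), site)
        rep_score = score(weight_matrix(rep), site)
        predicted = "A" if act_score >= rep_score else "R"
        correct += predicted == label
    return correct / len(labelled)


# Hypothetical, deliberately well-separated toy classes (not Lrp data).
toy_activation = ["AAAA", "AAAT", "AAAA"]
toy_repression = ["CCCC", "CCCG", "CCCC"]
toy_accuracy = leave_one_out_accuracy(toy_activation, toy_repression)
```

An accuracy near chance on real data would be consistent with the outcome reported above, where activation and repression sites could not be told apart once each site was excluded from its own matrix.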
Evidence of model accuracy
To test our model's accuracy, we scanned the complete 27 site individual information weight matrix across the six in vivo footprinted ilvIH sites (5) and displayed the results using sequence walkers (Fig.
Informational predictions of possible sites
To test our Lrp binding site model, we excluded the fimA regulatory region from our data set (Fig.
SELEX-generated sites
Cui et al. used the SELEX procedure (8) to obtain sequences that bind Lrp (7). Two SELEX experiments that had been performed in the absence of leucine are represented by sequence logos in Figure
DISCUSSION
Experimentally characterized natural Lrp binding sites (Fig.
Surprisingly, the sequence logos for natural Lrp binding sites determined by footprints or mutations (Fig.
To explain these major discrepancies between the natural and the SELEX sites, we suggest that three proteins are binding in the SELEX experiment, since there are three bulges of sequence conservation that rise above the 1 bit mark in the SELEX logo (Fig.
A possible explanation for differences between the in vivo and in vitro results is that Lrp naturally forms a trimer with only the central molecule specifically binding to the DNA. A homodimer flanking a central monomer would not only be consistent with sequence conservation in the center of both natural and SELEX logos (-1 to +3 in Figs
When in vitro selections for OxyR and TrpR were analyzed by information theory, they were also found to give results that differ from those obtained from naturally selected sites (17,34). The differences between the in vivo and in vitro Lrp logos might be attributed to unnatural experimental conditions. Inappropriate salt levels or temperatures, the absence of spermidine (35), or selection of band-shifted DNA with the highest molecular weight (i.e., a triplet complex), among many other possibilities, might be reasons for these results. Sequence logos could be used to quantitatively investigate the effects of such varying conditions (17).
In vitro selection procedures do not always mimic natural evolution (10). The strongest sites, such as those found by SELEX, are not 'optimal' when viewed on an information theory scale (20). Instead, natural sites are observed to have a Gaussian distribution that peters out at the high end. From this viewpoint, the strongest possible sites are seen as abnormal. When SELEX is pushed to obtain the strongest possible binding sites, the resulting sequence logo should show more sequence conservation than the natural sites and, as shown in this paper, may be radically different from the natural logo. When the in vitro selections are milder, the logos may resemble each other if the conditions are comparable. If one's goal is to obtain the strongest binder, as has been the emphasis for most of the work with SELEX (10), then strong selection is appropriate and sequence logos can be used to characterize the strong sites. If, instead, the goal is to learn more about the binding pattern of natural ligands, then weaker selection under various conditions could be guided by using sequence logos.
ACKNOWLEDGEMENTS
We thank Paul Shultzaberger for providing R.K.S. with a car, Debby Shultzaberger for re-typing sequences for comparison, Kay Kennedy and Emily Moler for running the Werner H. Kirsten Student Intern Program at NCI, Brenda Deener for scientific guidance and instruction, Mike Miller for bringing attention to grammatical errors, Elaine Bucheimer, Karen Lewis, Paul N. Hengen and Peter K. Rogan for their critique. We thank an anonymous referee for suggesting Figure
REFERENCES
This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Last modification: 15 Jan 1999
Copyright©Oxford University Press, 1999.
Articles by Shultzaberger, R. K.
Articles by Schneider, T. D.