Nucleic Acids Research Advance Access published online on July 24, 2008
Nucleic Acids Research, doi:10.1093/nar/gkn477
Gene Regulation, Chromatin and Epigenetics
A high-throughput percentage-of-binding strategy to measure binding energies in DNA–protein interactions: application to genome-scale site discovery
1Department of Molecular Virology & Microbiology, Baylor College of Medicine, Houston, TX 77030, 2Institute for Environmental Genomics, Department of Botany and Microbiology, University of Oklahoma, Norman, OK 73019, USA, 3College of Life Sciences, Zhejiang University, Hangzhou, Zhejiang 310029, P. R. China and 4Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX 77030, USA
*To whom correspondence should be addressed. Tel: +1 713 798 5609; Fax: +1 713 798 7375; Email: timothyp{at}bcm.tmc.edu
Received May 8, 2008. Revised June 11, 2008. Accepted July 8, 2008.
| ABSTRACT |
|---|
Quantifying the binding energy in DNA–protein interactions is of critical importance to understand transcriptional regulation. Based on a simple computational model, this study describes a high-throughput percentage-of-binding strategy to measure the binding energy in DNA–protein interactions between the Shewanella oneidensis ArcA two-component transcription factor protein and a systematic set of mutants in an ArcA-P (phosphorylated ArcA) binding site. The binding energies corresponding to each of the 4 nt at each position in the 15-bp binding site were used to construct a position-specific energy matrix (PEM) that allowed a reliable prediction of ArcA-P binding sites not only in Shewanella but also in related bacterial genomes.
| INTRODUCTION |
|---|
Transcription factor proteins can recognize and bind a collection of similar DNA sequences with various affinities (1). This degenerate binding ability renders cells capable of controlling thousands of genes with relatively few regulatory proteins (2). Degenerate binding, however, poses a significant challenge for understanding the mechanism of transcriptional regulation, especially in terms of identifying new binding sites (i.e. site discovery).
During the past two decades, numerous computational site-discovery methods have been developed. However, it is still a challenge to predict transcription factor binding sites (3–5). One explanation for the difficulty is that computational predictions are usually based on sequence conservation of transcription factor binding sites rather than thermodynamic parameters that govern DNA–protein interactions. Experimental data, such as that obtained from footprinting assays and transcriptional profiling, can greatly increase the accuracy of computational predictions (6). Obtaining sufficient high quality experimental data, however, is a work-intensive task. For high-throughput experimental methods such as various ChIP-based approaches (7–10), a high-quality antibody and multiple experimental steps are usually necessary.
Although the in vivo occupancy of cis-regulatory elements may be affected by many factors (11), the occurrence and strength of a DNA–protein interaction are ultimately determined by whether it is a thermodynamically favored reaction. Thus, measuring a DNA–protein binding constant, and thereby the binding energy of an interaction, represents a crucial step towards understanding transcriptional regulation. Although experimental approaches for measuring these thermodynamic parameters are well established, high-throughput methods have not yet been extensively developed. The few available medium- or high-throughput experimental methods, including SPR (surface plasmon resonance) (12), microwell-based assays (13), displacement of DNA binding dye (14), MITOMI (mechanically induced trapping of molecular interactions) (15,16) and competition assays (17–20), are limited by various factors, such as cost, sensitivity and the need for special protein/DNA constructs. To date, the most commonly used experimental approach remains the time-consuming curve-fitting method.
The goal of this study was to develop an effective general approach for rapid and accurate genome-scale prediction of transcription factor binding sites using ArcA as a model system. The ArcA transcription factor belongs to the canonical ArcA/B two-component system in which ArcB is a membrane-associated histidine kinase and ArcA is a downstream transcription response regulator (21). As a major oxygen response regulator, the ArcA protein is well conserved in many Gram-negative bacteria including Shewanella oneidensis MR-1, which is a model organism for bioremediation studies (22). Recent studies, however, indicate that the ArcA protein may regulate a different set of genes in S. oneidensis than those regulated in Escherichia coli (23,24). In order to determine the sequence requirements for ArcA-P binding, systematic mutagenesis of an ArcA-P binding site and subsequent quantitation of the binding energy of each mutant were performed. Mathematical modeling indicated that, in principle, the binding energy in DNA–protein interactions can be determined using a simple percentage-of-binding approach instead of curve fitting. This method was applied to the traditional electrophoretic mobility shift assay (EMSA) and to a recently developed protein binding microarray (PBM) technology (25), so that the DNA sequence requirements and associated binding energies for the ArcA-P protein were systematically determined by a simple one-step binding assay. This experimental information was then used to construct a position-specific energy matrix (PEM) for genome-scale prediction of binding sites.
| MATERIALS AND METHODS |
|---|
Computational model of DNA–protein interactions
In a DNA–protein interaction, [L] + [P] ⇌ [LP], where [L], [P] and [LP] represent the concentrations of free (or unbound) DNA ligand, free protein and the DNA–protein complex, respectively. If it is assumed that the pressure and temperature do not change in the binding reaction, the Gibbs binding energy ΔG can be determined by Equation (1) and the dissociation constant Kd by Equation (2) once the binding reaction reaches equilibrium, where R, T, x and 1–x represent the gas constant, the absolute temperature, the fraction of DNA ligand bound to protein and the fraction of free DNA ligand, respectively (26,27). By combining these equations, ΔG can be calculated according to Equation (3).
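$$\Delta G = RT \ln K_d \tag{1}$$

$$K_d = \frac{[L][P]}{[LP]} = \frac{(1-x)\,[P]}{x} \tag{2}$$

$$\Delta G = RT \ln\frac{1-x}{x} + RT \ln [P] \tag{3}$$

$$\Delta G_1 = RT \ln\frac{1-x_1}{x_1} + RT \ln [P]_1 \tag{4}$$

$$\Delta G_{\mathrm{Ref}} = RT \ln\frac{1-x_{\mathrm{Ref}}}{x_{\mathrm{Ref}}} + RT \ln [P]_{\mathrm{Ref}} \tag{5}$$

$$\Delta\Delta G = \Delta G_1 - \Delta G_{\mathrm{Ref}} = RT \ln\frac{(1-x_1)\,x_{\mathrm{Ref}}}{x_1\,(1-x_{\mathrm{Ref}})} + RT \ln\frac{[P]_1}{[P]_{\mathrm{Ref}}} \tag{6}$$

$$\Delta\Delta G = RT \ln\frac{(1-x_1)\,x_{\mathrm{Ref}}}{x_1\,(1-x_{\mathrm{Ref}})} \tag{7}$$

$$f(x) = \frac{\Delta \ln[(1-x)/x]}{\Delta \ln [P]} = \frac{[P]_0 - x\,[L]_0}{[L]_0\,x\,(1-x)}, \qquad [P] = [P]_0 - x\,[L]_0 \tag{8}$$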
For two separate binding reactions, designated 1 and Ref, the binding energies ΔG1 and ΔGRef can be determined using Equations (4) and (5), respectively. The relative binding energy (ΔΔG) between the two reactions can then be calculated using Equation (6). If the free protein concentration is kept constant in these two binding reactions (i.e. [P]1 = [P]Ref), Equation (6) simplifies to Equation (7), in which ΔΔG is determined solely by the percentage of DNA ligand bound in each reaction. In actual binding reactions, it is very difficult to keep the free protein concentration constant. It is possible, however, to achieve an approximately constant free protein concentration by performing assays with a protein concentration that is much higher than the DNA ligand concentration.
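The percentage-of-binding calculation reduces to a few lines of arithmetic. The sketch below assumes that Equation (7) has the form ΔΔG = RT ln{[(1–x1)/x1]/[(1–xRef)/xRef]} and that free protein is in large excess over DNA; the function name, the kcal/mol units and the 298.15 K default temperature are illustrative assumptions, not details given in this study.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def relative_binding_energy(x_test: float, x_ref: float,
                            temperature_k: float = 298.15) -> float:
    """Relative binding energy (kcal/mol) of a test site versus a reference site.

    x_test and x_ref are the fractions of DNA bound in the two reactions.
    The free protein concentration is assumed equal in both reactions,
    so the RT*ln([P]1/[P]Ref) term is dropped.
    """
    for x in (x_test, x_ref):
        if not 0.0 < x < 1.0:
            raise ValueError("fraction bound must lie strictly between 0 and 1")
    rt = R_KCAL * temperature_k
    return rt * (math.log((1.0 - x_test) / x_test)
                 - math.log((1.0 - x_ref) / x_ref))


if __name__ == "__main__":
    # Hypothetical fractions: reference site 50% bound, mutant site 10% bound.
    print(round(relative_binding_energy(0.10, 0.50), 2))
```

A positive value means the test site binds less favorably than the reference. Fractions of exactly 0 or 1 are rejected because the logarithm diverges there, which is also why sites with no detectable shift can only be assigned a lower bound.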
EMSA with systematically mutated SO1661 promoters
The purification of His-tagged Shewanella ArcA and E. coli ArcB78-778, as well as the ArcA phosphorylation, was performed as described previously (24,28,29). ArcA protein is labeled as ArcA-PC or ArcA-PE when phosphorylated by carbamoyl phosphate or E. coli ArcB78-778, respectively. In this study, a total of 46 oligonucleotides of 48 bp each were synthesized. Forty-five of these primers contained a single mutation in the 15-bp ArcA-P binding site region in the SO1661 promoter (Figure 1A and Table 1). Radiolabeled promoters (144 bp each) were generated from these 46 primers by PCR amplification with a common P33 5'-end labeled SO1661-328 primer (5'-CCACACCATACCGATAAAGAAGC). The interaction of each radiolabeled DNA with ArcA-P (phosphorylated ArcA) was then tested using EMSAs containing ~100–250 fmol (~10–20 nM) labeled probe and 1.5 µM ArcA-PE as previously described (24), in which the active amount of ArcA-P was estimated to be 100 nM (Supplementary Method 1). These binding assays were repeated three times, and the fraction of promoter DNA bound to the ArcA-P protein was quantified by measuring the density of both shifted and nonshifted DNA bands using ImageQuant TL (GE Healthcare Life Sciences, Piscataway, NJ, USA).
Protein binding microarray
For protein binding microarray studies, a series of 48-bp mutant promoters were constructed by synthesis of oligonucleotide pairs that were annealed, rather than using PCR for second strand synthesis. The annealed promoter DNAs each contained a single mutation in the conserved 15-bp ArcA-P site (Figure 1A and Table 1). In addition, an amine group was synthesized onto the 5'-end of one of the paired oligonucleotides. These promoter DNAs were printed at 50 pmol/µl in 50 mM PBS (pH 8.5) and covalently immobilized onto Codelink activated slides (Amersham Biosciences, Piscataway, NJ, USA) according to the manufacturer's instructions.
The PBM experiment was performed by first incubating the microarray slides with blocking buffer (10 mM Tris/HCl, 150 mM NaCl, 5 mM MgCl2, 0.05% Tween-20, 5% milk, pH 7.4) for 30 min at room temperature, and then rinsing briefly with 1× TBS (10 mM Tris/HCl, 150 mM NaCl, 5 mM MgCl2, pH 7.4). The on-slide DNA–protein interaction was initiated by covering the slides with 300 µl binding solution consisting of 100 mM Tris/HCl (pH 7.4), 20 mM KCl, 10 mM MgCl2, 2 mM DTT, 10% glycerol, 0.1 µg/µl poly(dI·dC) and 2 µM carbamoyl phosphate-phosphorylated ArcA protein [895 nM active protein concentration (Supplementary Method 1)]. The binding reaction was performed at room temperature for 1 h. After washing off unbound protein, the microarray slides were incubated with an anti-His-tag antibody for 1 h at room temperature. The anti-His-tag antibody was purchased from Qiagen, Valencia, CA, USA (Cat# 34660) and diluted 1:1200 in blocking buffer. The unbound anti-His-tag antibody was then washed off and the slides were incubated with Cy5-conjugated secondary antibody for 1 h at room temperature. The secondary antibody was purchased from Chemicon, Temecula, CA, USA (Cat# AP160s) and diluted 1:1200 in blocking buffer. After washing away unbound secondary antibody, the slides were dried and quantified using a microarray scanner. To reduce errors in data analysis, the signal intensity of each spot was normalized to the average signal intensity of all 48 spots. In total, 27 binding replicates were performed with the promoter DNAs.
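The normalization step can be illustrated with a short sketch. It divides each spot by the mean intensity of its slide, as described above, and then averages the normalized values across replicate slides; the replicate averaging, the spot names and the intensity values are assumptions made for illustration only.

```python
from statistics import mean
from typing import Dict, List


def normalize_slide(intensities: Dict[str, float]) -> Dict[str, float]:
    """Divide every spot intensity by the mean intensity of the slide."""
    slide_mean = mean(intensities.values())
    if slide_mean <= 0:
        raise ValueError("mean slide intensity must be positive")
    return {spot: value / slide_mean for spot, value in intensities.items()}


def average_replicates(slides: List[Dict[str, float]]) -> Dict[str, float]:
    """Average the normalized intensity of each spot over replicate slides.

    Assumes every slide carries the same set of spot names.
    """
    normalized = [normalize_slide(slide) for slide in slides]
    return {spot: mean(n[spot] for n in normalized) for spot in normalized[0]}


if __name__ == "__main__":
    # Two hypothetical replicate slides scanned at different overall brightness.
    replicate_a = {"SO1661wt": 300.0, "SO1661-mut": 100.0}
    replicate_b = {"SO1661wt": 30.0, "SO1661-mut": 10.0}
    print(average_replicates([replicate_a, replicate_b]))
```

Normalizing within each slide before averaging removes slide-to-slide differences in overall signal, so that only the relative intensities of the spots are compared.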
Genome-scale ArcA-P binding site discovery method
In this study, the upstream intergenic DNA for each S. oneidensis MR-1 ORF (open reading frame) (www.ncbi.nlm.nih.gov), including the first 100 bp of coding sequence, was first obtained. These DNA sequences were then scanned with a sliding 15-bp DNA motif window, and the score of each motif was calculated based on the energy-based ArcA-P PEM (15 × 4) (Table 2) with the assumption that each base contributes independently to the total score of the 15-bp motif (1). These scores represent the predicted binding energies for each DNA site with ArcA-P. The motif with the lowest score (most favorable binding energy) was selected for each intergenic DNA region, and the selected motifs were ranked according to their ΔΔG scores (Supplementary Table 1). Using the same sliding window approach with the same energy matrix, potential ArcA-P binding sites were also predicted in the promoter regions (intergenic region + the first 100-bp coding sequence) of the E. coli K12 MG1655 and Haemophilus influenzae Rd KW20 genomes (Supplementary Tables 2 and 3).
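The sliding-window scan can be sketched as follows. The code assumes the PEM is stored as one dictionary per motif position mapping each base to its energy contribution, and it scans only the given strand; strand handling is not described in this section, and the three-position matrix used in the example is a toy with hypothetical values, not Table 2.

```python
from typing import Dict, List, Optional, Tuple

BASES = "ACGT"
Pem = List[Dict[str, float]]  # one {base: energy} dict per motif position


def score_site(site: str, pem: Pem) -> float:
    """Sum the per-position energies, assuming independent base contributions."""
    return sum(pem[i][base] for i, base in enumerate(site))


def best_site(region: str, pem: Pem) -> Optional[Tuple[float, int, str]]:
    """Return (score, start, site) for the lowest-scoring window in a region.

    Windows containing characters other than A, C, G or T are skipped.
    Returns None when the region holds no scorable window.
    """
    width = len(pem)
    region = region.upper()
    best: Optional[Tuple[float, int, str]] = None
    for start in range(len(region) - width + 1):
        window = region[start:start + width]
        if any(base not in BASES for base in window):
            continue
        score = score_site(window, pem)
        if best is None or score < best[0]:
            best = (score, start, window)
    return best


def rank_regions(regions: Dict[str, str], pem: Pem) -> List[Tuple[str, float, int, str]]:
    """Keep the best site per region and sort regions from lowest to highest score."""
    hits = []
    for name, sequence in regions.items():
        hit = best_site(sequence, pem)
        if hit is not None:
            hits.append((name, *hit))
    return sorted(hits, key=lambda item: item[1])


if __name__ == "__main__":
    # Toy 3-position matrix (hypothetical energies; the favored site is ACG).
    toy_pem = [
        {"A": 0.0, "C": 1.0, "G": 1.0, "T": 1.0},
        {"A": 1.0, "C": 0.0, "G": 1.0, "T": 1.0},
        {"A": 1.0, "C": 1.0, "G": 0.0, "T": 1.0},
    ]
    print(rank_regions({"geneA": "TTACGTT", "geneB": "TTTTTT"}, toy_pem))
```

Because the lowest score marks the most favorable predicted binding energy, a cut-off can be applied afterwards by keeping only the regions whose best score falls below the chosen threshold.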
| RESULTS |
|---|
Model for determining binding energies of mutant promoter DNA to ArcA-P
According to the computational model implemented in Equation (7) (see Materials and Methods section), the relative binding energy difference (ΔΔG) for wild-type versus a mutant DNA with respect to a DNA–protein interaction can be determined by performing two separate binding reactions and measuring the fraction of DNA ligand bound (percentage-of-binding) for each reaction at equilibrium. This model assumes that the concentration of free active protein ([P]) remains constant in both binding reactions. In an actual experiment, [P] varies to differing extents according to the binding conditions. Based on Equation (3) or (6), the overall ΔΔG is determined solely by two variable factors, Δln[(1–x)/x] and Δln[P]. This suggests that the error associated with using Equation (7) to determine ΔΔG can be estimated according to the relative weights of Δln[(1–x)/x] and Δln[P]. The ratio of Δln[(1–x)/x] to Δln[P] is shown as the function f(x) in Equation (8), in which [P], [P]0 and [L]0 represent the concentrations of free protein, total input protein and total input DNA ligand, respectively. The plot generated using Equation (8) is shown in Supplementary Figure 1. The results indicate that the minimal value of f(x) is 9.9, 17.9 or 37.9 if [P]0 is 3, 5 or 10 times that of [L]0, respectively. Therefore, when [P]0 is 3, 5 or 10 times that of [L]0, the weight of Δln[P] in the estimation of total ΔΔG is less than 9.2%, 5.3% or 2.6%, respectively. These data suggest that ΔΔG can be determined accurately using Equation (7) with a ratio of [P]0/[L]0 above 5 (error < 5.3%).
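The reported minima can be checked numerically. The sketch below assumes that Equation (8) takes the form f(x) = ([P]0 – x[L]0)/([L]0 x (1–x)), which follows from [P] = [P]0 – x[L]0 by comparing how ln[(1–x)/x] and ln[P] change with x. The grid search and the function names are illustrative choices.

```python
def f(x: float, ratio: float) -> float:
    """Assumed form of Equation (8), with ratio = [P]0/[L]0 and 0 < x < 1."""
    return (ratio - x) / (x * (1.0 - x))


def min_f(ratio: float, steps: int = 100_000) -> float:
    """Smallest value of f(x) on an evenly spaced grid over 0 < x < 1."""
    return min(f(i / steps, ratio) for i in range(1, steps))


def max_weight_of_ln_p(ratio: float) -> float:
    """Upper bound on the share of the protein term, 1 / (1 + min f)."""
    return 1.0 / (1.0 + min_f(ratio))


if __name__ == "__main__":
    for ratio in (3, 5, 10):
        print(ratio, round(min_f(ratio), 2),
              round(100.0 * max_weight_of_ln_p(ratio), 1))
```

Under this assumed form the grid minima come out at roughly 9.90, 17.94 and 37.97 for ratios of 3, 5 and 10, in line with the values of 9.9, 17.9 and 37.9 quoted above, and the corresponding upper bounds on the weight of the protein term are about 9.2%, 5.3% and 2.6%.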
Relative binding energies determined using comparative EMSAs (EMSA-ΔΔG)
The SO1661 promoter has been shown to be under the direct control of ArcA (24). Based on footprinting assays and mutational analyses (data not shown), the ArcA-P binding site within the SO1661 promoter was determined to be a 15-bp DNA motif (Figure 1A). To understand the role of ArcA in Shewanella, each position of the 15-bp ArcA-P binding motif within the SO1661 promoter was systematically mutated and the effect on the binding of ArcA-PE [ArcA protein phosphorylated by E. coli ArcB protein is labeled ArcA-PE (see Materials and Methods section)] was examined by EMSAs (Figure 1B). In these EMSAs, the molar ratio of active ArcA-PE versus DNA ligand was kept above 5 (see Materials and Methods section and Supplementary Method 1). The percentage of DNA bound by ArcA-PE [equal to x in Equation (7)] for wild-type and various mutant SO1661 promoters was determined by measuring the band intensity of shifted and nonshifted DNA. The fraction-bound data were then used to determine the relative binding energy of the mutant promoters (i.e. ΔGmu – ΔGwt, referred to as ΔΔG hereafter) using Equation (7). These EMSA-ΔΔG values (Table 1) represent the contribution, relative to wild-type, of each nucleotide at a given position within the 15-bp binding sequence to the total binding energy (ΔG). A PEM was generated by placing the ΔΔG values of the promoter DNA with each mutant nucleotide at the corresponding position within the 15-bp DNA motif (Table 2). The information in the matrix can be summarized as a sequence logo using the enoLOGOS program (30) (Figure 2). The results indicate that two repeating GTTA units are very important for binding ArcA-P (Figure 2). This pattern is similar in sequence but significantly different in position weighting to a consensus revealed by searching for a common motif in 11 E. coli ArcA-P interacting promoters (31). The importance of both GTTA sites may be related to the fact that the active form of ArcA-P is a dimer (32).
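Assembling the matrix from single-mutant measurements can be sketched in a few lines. The code assumes each measurement is keyed by (position, mutant base) and that the wild-type base at every position scores 0, which follows from defining ΔΔG relative to wild-type. The two-position site and its energies are hypothetical and are not taken from Table 1.

```python
from typing import Dict, List, Tuple

BASES = "ACGT"


def build_pem(wild_type: str,
              mutant_ddg: Dict[Tuple[int, str], float]) -> List[Dict[str, float]]:
    """Build a position-specific energy matrix from single-mutant energies.

    wild_type  : sequence of the reference binding site
    mutant_ddg : {(position, mutant_base): relative binding energy}
    Raises KeyError if any of the three mutants at a position is missing.
    """
    pem = []
    for position, wt_base in enumerate(wild_type.upper()):
        row = {}
        for base in BASES:
            if base == wt_base:
                row[base] = 0.0  # reference base, by definition
            else:
                row[base] = mutant_ddg[(position, base)]
        pem.append(row)
    return pem


if __name__ == "__main__":
    # Hypothetical two-position site "GT" with made-up energies (kcal/mol).
    toy = {
        (0, "A"): 1.2, (0, "C"): 1.5, (0, "T"): 0.9,
        (1, "A"): 0.4, (1, "C"): 1.1, (1, "G"): 1.8,
    }
    for row in build_pem("GT", toy):
        print(row)
```

For a 15-bp site this layout needs 15 × 3 = 45 single mutants, which matches the number of mutant promoters analysed here.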
Relative binding energies determined using PBM assays (PBM-ΔΔG)
PBM technology is a recently developed high-throughput method to study DNA–protein interactions (25). To test whether Equation (7) can also be used to determine ΔΔG in microarray-based DNA–protein interaction measurements, SO1661 promoter microarrays were generated using a series of synthesized 48-bp promoter DNAs (Table 1). The PBM binding reactions were performed with either ArcA-PC [ArcA protein phosphorylated by carbamoyl phosphate is labeled ArcA-PC (see Materials and Methods section)] or ArcA-PE. However, the results with ArcA-PE were not as reproducible, possibly due to the low efficiency of E. coli ArcB78-778 in phosphorylating the Shewanella ArcA protein (data not shown), and therefore the ArcA-PE results were not used for further analysis. The PBM results indicated that ArcA-PC exhibited varied binding affinities with different mutant promoters, consistent with the results from the EMSAs (Figure 1C), while the unphosphorylated ArcA protein did not exhibit detectable DNA binding activity (data not shown).
The percentage-of-binding values for the various mutant promoters in the PBM assays were determined indirectly by comparing their signal intensity with the SO1661wt promoter DNA signal intensity and with the percentage of SO1661wt DNA bound in an EMSA performed under identical conditions (Supplementary Method 2). The ΔΔG of different mutant promoters relative to wild-type was then calculated using Equation (7) (Table 1). With these PBM-ΔΔG scores, an energy-based sequence logo was created for ArcA-PC using enoLOGOS (Figure 2). The sequence logos generated using ΔΔG values determined by EMSA and PBM assays are similar, suggesting that the percentage-of-binding approach can be used to estimate binding energies with either assay.
Binding energies determined using a curve-fitting method (Curve-ΔG)
In order to validate the binding energy values obtained by the percentage-of-binding approach, several mutant SO1661 promoter DNAs (48-bp synthesized DNAs) were selected and their binding constants (Kd) with the ArcA-P protein were determined by a competitive EMSA using a curve-fitting method (33). In these assays, the input ArcA-P was held constant and the binding of labeled promoters (radioligand) to ArcA-P was subjected to competition with various amounts of unlabeled promoter DNA (Supplementary Method 3). By fitting the EMSA data to a sigmoidal dose-response curve (Figure 3A and B), the IC50 value for SO1661wt was determined to be 234 ± 24 nM for ArcA-PC and 216 ± 21 nM for ArcA-PE, with corresponding Kd values of 154 ± 24 nM and 136 ± 21 nM, respectively (according to the formula IC50 = Kd + [radioligand]) (33) (Table 3). The Kd values of several selected mutant promoters including SO1661-15, SO1661-17, SO1661-19 and SO1661-20 were also determined using the same method with ArcA-PC (Table 3). The binding energy (Curve-ΔG) of these promoters was then determined from the Kd values according to Equation (1) (Table 3).
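The conversion from IC50 to Kd and then to a binding energy can be written out as a short sketch. It uses the relation IC50 = Kd + [radioligand] quoted above and assumes Equation (1) has the form ΔG = RT ln Kd with Kd in mol/l. The 80 nM radioligand concentration is inferred here from the difference between the reported IC50 and Kd values, and the 298.15 K temperature is an assumption; neither is stated in this section.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def kd_from_ic50(ic50_nm: float, radioligand_nm: float) -> float:
    """Dissociation constant (nM) from IC50 = Kd + [radioligand]."""
    kd = ic50_nm - radioligand_nm
    if kd <= 0:
        raise ValueError("IC50 must exceed the radioligand concentration")
    return kd


def binding_energy(kd_nm: float, temperature_k: float = 298.15) -> float:
    """Binding energy (kcal/mol) as RT * ln(Kd), with Kd converted to mol/l."""
    return R_KCAL * temperature_k * math.log(kd_nm * 1e-9)


if __name__ == "__main__":
    kd_wt = kd_from_ic50(234.0, 80.0)  # wild-type with ArcA-PC, assumed 80 nM probe
    print(kd_wt, round(binding_energy(kd_wt), 2))
```

With these assumptions the wild-type Kd of 154 nM corresponds to a binding energy of about –9.3 kcal/mol; a different assay temperature would scale this value in proportion to T.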
Comparison of the thermodynamic parameters (ΔΔG, ΔG or Kd) determined using EMSA, PBM and curve-fitting methods
In order to evaluate the relationship between the ΔΔG values determined by the EMSA and PBM methods, the values obtained for different mutant promoters were compared by linear regression (Figure 4, solid line). The resulting Pearson correlation coefficient is 0.97 (Figure 4), indicating that the results are in close agreement in terms of the trend of the ΔΔG values, i.e. the methods give very similar results on the relative importance of a position. Consistent with this finding, the sequence logos generated using the EMSA and PBM data are also very similar (Figure 2). However, the EMSA-ΔΔG values are usually higher than the corresponding PBM-ΔΔG values, as indicated by the position of the data points in Figure 4 relative to the dotted line that depicts a perfect correlation between the absolute values of ΔΔG. A possible explanation is that the ratio of [P]0/[L]0 in the EMSAs was relatively low (5–10:1) and the larger ΔΔG values may be a result of neglecting the contribution of Δln[P]. In addition, the EMSA values in Table 1 were determined using ArcA-PE and the PBM values were determined using ArcA-PC. To test these possibilities, several mutant promoters (48-bp synthesized DNAs) were selected (Table 3) and their interaction with ArcA-PC was determined using the same comparative EMSA but at an increased ratio of [P]0/[L]0 (~100:1). The resulting EMSA-ΔΔG values agree more closely with the corresponding PBM-ΔΔG values (Table 3).
As stated earlier, the binding energy (Curve-ΔG) of the SO1661-16, SO1661-17, SO1661-19 and SO1661-20 mutant DNAs was determined by a curve-fitting method. As a comparison, the EMSA-ΔΔG and PBM-ΔΔG values of these four mutant promoters were converted into binding energies (EMSA- or PBM-ΔG) according to Equation (6) (ΔΔG = ΔGmu – ΔGwt) (Table 3). The average difference between the Curve-ΔG, EMSA-ΔG and PBM-ΔG of the four selected promoters is <5.7% (Table 3). Since ΔG is relatively insensitive to changes in binding affinity, these ΔG scores were then converted into binding constants (Kd) using Equation (1) (ΔG = RT lnKd). The results show that the average difference of these Kd scores determined by different methods is within a 2.6-fold range (Table 3). Considering that it is not uncommon to observe 2- to 3-fold variations when determining Kd even by curve-fitting methods (1), these results suggest that ΔG and Kd can be reliably obtained using the comparative EMSA and PBM assays described in this study.
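The remark that binding energy is relatively insensitive to changes in affinity follows from the logarithmic relation between the two. The sketch below converts a relative binding energy into a fold change in Kd, assuming Kd(mutant) = Kd(reference) × exp(ΔΔG/RT), which follows from ΔG = RT ln Kd. The function names, the kcal/mol units and the 298.15 K temperature are assumptions for illustration.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def kd_from_ddg(ddg_kcal: float, kd_ref_nm: float,
                temperature_k: float = 298.15) -> float:
    """Kd (nM) of a mutant site from its energy relative to a reference site."""
    return kd_ref_nm * math.exp(ddg_kcal / (R_KCAL * temperature_k))


def ddg_for_fold_change(fold: float, temperature_k: float = 298.15) -> float:
    """Energy difference (kcal/mol) that corresponds to a given fold change in Kd."""
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold)


if __name__ == "__main__":
    print(round(ddg_for_fold_change(2.6), 2))    # energy equivalent of a 2.6-fold Kd change
    print(round(kd_from_ddg(1.0, 154.0)))        # Kd for a site 1 kcal/mol weaker than a 154 nM site
```

At the assumed temperature a 2.6-fold difference in Kd amounts to only about 0.57 kcal/mol, which is small compared with a total binding energy of several kcal/mol.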
Genome-scale prediction of ArcA-P binding sites
The EMSA results indicated that any mutant binding site with a ΔΔG score above 1.77 kcal/mol interacted with ArcA-PE weakly (Table 1 and Figure 1B). This energy score, which yields an estimated binding constant of ~1 µM, was used as the cut-off ΔΔG value to predict the ArcA-P binding sites with a PEM based on the values in Table 2. In total, 45 ArcA-P binding sites with a ΔΔG score below 1.77 kcal/mol were identified within the Shewanella genome (Supplementary Table 1). Because a single binding site can be situated between divergently transcribed genes, there are 61 genes directly associated with the 45 binding sites. In a previous study, several promoters with varying binding energies were selected and their interactions with the ArcA-P protein were examined by EMSAs (24). Of the 14 promoters that contain a binding site with predicted ΔΔG values ranging from 0.64 to 1.77 kcal/mol, 13 exhibited clear binding with ArcA-P when tested by in vitro EMSAs with radiolabeled PCR products (24). The promoter (SO3659) that bound weakly has a relatively high ΔΔG value (1.60 kcal/mol). Of the nine promoters that contain sites with predicted ΔΔG values above 3.09 kcal/mol, none exhibited noticeable binding with ArcA-P (24). These results suggest that the predicted binding energies are strongly associated with the ability to interact with ArcA-P. To date, microarray gene expression data are available for the genes encoded at 43 of the 45 predicted ArcA-P sites corresponding to 61 potentially regulated genes. Of these 43 sites, 27 (63%) encode genes exhibiting >2-fold regulation and 35 (81%) exhibit >1.5-fold regulation by ArcA (Supplementary Table 1). Because the microarray study only examined transcriptional regulation at a given condition and a given time, the accuracy of the site-discovery approach described in this study is likely >81% with regard to in vivo gene regulation. Thus, the number of false positive predictions appears to be low. False negatives, however, may be more frequent, since a total of ~300 genes were identified as under ArcA regulation (>2-fold regulation) in the microarray study (24).
The Shewanella ArcA protein shares high sequence identity with ArcA from several other Gram-negative bacteria, such as E. coli (81%) and H. influenzae (79%). The role of ArcA has been extensively studied in E. coli, thus providing an ideal system to examine the accuracy of the site-discovery approach described in this study. For this purpose, a genome-scale prediction of ArcA-P binding sites in E. coli was performed using the same ArcA-PE-derived PEM as that used above for S. oneidensis (Table 2). Using the same 1.77 kcal/mol energy threshold, a total of 57 putative ArcA-P binding sites were identified, including seven of the nine canonical ArcA-P regulated promoters (21,31) (Supplementary Table 2). Among the 57 predicted sites, 27 (47%) have been reported to encode genes exhibiting >2-fold regulation by ArcA (Supplementary Table 2). This number could increase as additional gene regulatory data become available. To date, footprinting assay results have been published for a total of 15 E. coli ArcA-P interacting DNAs including 14 promoters (Supplementary Table 4). For these ArcA-P footprinted DNAs, a total of 27 ArcA-P binding sites (up to two sites per footprinted DNA) were predicted using the Shewanella ArcA-PE PEM with no preset threshold (Table 2), among which 24 sites are located exactly within the ArcA-P footprinting regions and three are within a region that was not examined by footprinting or any other assays (Supplementary Table 4). Of the 14 E. coli ArcA-P footprinted promoters, eight contain a site with a predicted ΔΔG value below 1.82 kcal/mol and these promoters are all strongly regulated by ArcA-P (>5-fold regulation), including the lctPRD promoter, which contains the ArcA-P site with the most favorable predicted ΔΔG score and which also exhibits the most significant level of regulation by ArcA-P (~90- to 100-fold) (34). Of the six promoters containing sites with predicted ΔΔG values >2.95 kcal/mol, five are weakly regulated by ArcA-P (<3-fold regulation). Taken together, these results indicate a strong correlation of predicted ΔΔG scores with the strength of ArcA-P binding and regulation.
Genome-scale predictions of H. influenzae ArcA-P binding sites were also performed using the ArcA-PE PEM (Table 2). By using the 1.77 kcal/mol cut-off ΔΔG score used for the S. oneidensis and E. coli predictions, a total of 22 ArcA-P target binding sites were identified (Supplementary Table 3). This is a somewhat smaller number of predictions than those for the S. oneidensis (45) and E. coli (57) genomes, but it is consistent with a recent microarray study in which only 23 genes exhibited >2-fold regulation by ArcA in H. influenzae (35). Among these 23 genes identified in the microarray study, 12 were predicted to contain an ArcA-P binding site using the 1.77 kcal/mol threshold. Interestingly, the eight predicted ArcA-P binding sites with the most favorable energy scores (ΔΔG ≤ 1.13 kcal/mol) were all within promoter regions for genes exhibiting >2-fold regulation by ArcA-P in the microarray study (35) (Supplementary Table 3). These results, in addition to the S. oneidensis and E. coli results, suggest that false positive predictions are rare among the binding sites with favorable energy scores.
| DISCUSSION |
|---|
In this study, a simple model was used to examine binding energy in DNA–protein interactions using electrophoretic gel shift and PBM assays. With this approach, the importance of each position within the ArcA-P binding site was quantitatively established by characterizing the interaction between Shewanella ArcA-P and a series of mutant promoter DNAs, whereby each position in the binding site was systematically mutated to all possible single nucleotide changes. The results of the fine mapping were used to create a PEM that was used for a genome-scale prediction of 45 ArcA-P sites in Shewanella. A further examination suggests that this prediction is >81% consistent with in vivo gene regulation according to microarray studies and >92% (13/14) accurate in terms of published in vitro gel shift validation binding assays (24). In addition, this study predicted 27 ArcA-P sites for 15 published E. coli ArcA-P footprinted DNAs, and 24 of them were found exactly within the footprinting protected regions and the other three sites fall into the regions that were not examined by footprinting assays (Supplementary Table 4). This is the first report showing that footprinting protected regions can be effectively predicted by starting from a single known transcription factor binding site. Finally, the predicted H. influenzae ArcA-P sites correlate well with in vivo regulation determined by a microarray analysis in that the eight predicted binding sites with the most favorable ΔΔG scores all exhibit ArcA dependent gene regulation (Supplementary Table 3) (35).
As indicated earlier, the available validation data suggest the identification of binding sites using binding energies is highly accurate in terms of very few false positives but that false negatives clearly occur. There are several possible explanations for false negative predictions. One obvious contributor to false negatives is the ΔΔG threshold chosen for the scores obtained from the genome scan. False negative predictions may also occur due to cooperative protein binding to multiple weak binding sites present in a promoter region. It has been shown that ArcA protein multimerizes upon phosphorylation and that the multimeric protein can bind to multiple sites within a promoter region (36,37).
The one-step percentage-of-binding strategy described in this study provides a rapid approach to examine binding energy in DNA–protein interactions via systematic mutation of the DNA binding site. Since most cis-regulatory sites are ~6–12 bp long (38), the one-step EMSA described here provides an efficient means of generating a PEM for genome-scale site discovery. Compared with other site-discovery approaches, the method described in this study requires little previously known experimental data (only a single known binding site is necessary). Compared with the few available high-throughput methods (12–20) to measure DNA–protein binding energies, the percentage-of-binding approach represents a simple yet effective method. In addition, the application of the percentage-of-binding strategy to microarray-based DNA–protein interactions could result in a low-cost and high-throughput genome-scale site-discovery approach for many other transcription factors.
| SUPPLEMENTARY DATA |
|---|
Supplementary data are available at NAR Online.
| ACKNOWLEDGEMENTS |
|---|
We thank Dr Hiram F. Gilbert of Baylor College of Medicine for discussions and comments on the article, and Dr Dimitris Georgellis of the Universidad Nacional Autonoma de Mexico for the gift of E. coli ArcB expressing plasmid pQE30-ArcB78-778. This research was supported by the U.S. Department of Energy under the Genomics: GTL Program through Shewanella Federation, Office of Biological and Environmental Research, Office of Science. Funding to pay the Open Access publication charges for this article was provided by Baylor College of Medicine.
Conflict of interest statement. None declared.
| REFERENCES |
|---|
- Stormo GD, Fields DS. Specificity, free energy and information content in protein-DNA interactions. Trends Biochem. Sci (1998) 23:109–113.[CrossRef][Web of Science][Medline]
- Bulyk ML. Computational prediction of transcription-factor binding site locations. Genome Biol (2003) 5:201.[CrossRef][Medline]
- Kim TH, Ren B. Genome-wide analysis of protein-DNA interactions. Annu Rev. Genomics Hum. Genet (2006) 7:81–102.[CrossRef][Web of Science][Medline]
- Tompa M, Li N, Bailey TL, Church GM, De Moor B, Eskin E, Favorov AV, Frith MC, Fu Y, Kent WJ, et al. Assessing computational tools for the discovery of transcription factor binding sites. Nat. Biotechnol (2005) 23:137–144.[CrossRef][Web of Science][Medline]
- Li N, Tompa M. Analysis of computational approaches for motif discovery. Algorithms Mol. Biol (2006) 1:8.[CrossRef][Medline]
- Tronche F, Ringeisen F, Blumenfeld M, Yaniv M, Pontoglio M. Analysis of the distribution of binding sites for a tissue-specific transcription factor in the vertebrate genome. J. Mol. Biol (1997) 266:231–245.[CrossRef][Web of Science][Medline]
- Ren B, Robert F, Wyrick JJ, Aparicio O, Jennings EG, Simon I, Zeitlinger J, Schreiber J, Hannett N, Kanin E, et al. Genome-wide location and function of DNA binding proteins. Science (2000) 290:2306–2309.
- Iyer VR, Horak CE, Scafe CS, Botstein D, Snyder M, Brown PO. Genomic binding sites of the yeast cell-cycle transcription factors SBF and MBF. Nature (2001) 409:533–538.
- Kim J, Bhinge AA, Morgan XC, Iyer VR. Mapping DNA-protein interactions in large genomes by sequence tag analysis of genomic enrichment. Nat. Methods (2005) 2:47–53.
- Johnson DS, Mortazavi A, Myers RM, Wold B. Genome-wide mapping of in vivo protein-DNA interactions. Science (2007) 316:1497–1502.
- Audic S, Claverie JM. Visualizing the competitive recognition of TATA-boxes in vertebrate promoters. Trends Genet (1998) 14:10–11.
- Brockman JM, Frutos AG, Corn RM. A multistep chemical modification procedure to create DNA arrays on gold surfaces for the study of protein-DNA interactions with surface plasmon resonance imaging. J. Am. Chem. Soc (1999) 121:8044–8051.
- Hallikas O, Palin K, Sinjushina N, Rautiainen R, Partanen J, Ukkonen E, Taipale J. Genome-wide prediction of mammalian enhancers based on analysis of transcription-factor binding affinity. Cell (2006) 124:47–59.
- Boger DL, Fink BE, Brunette SR, Tse WC, Hedrick MP. A simple, high-resolution method for establishing DNA binding affinity and sequence selectivity. J. Am. Chem. Soc (2001) 123:5878–5891.
- Thorsen T, Maerkl SJ, Quake SR. Microfluidic large-scale integration. Science (2002) 298:580–584.
- Maerkl SJ, Quake SR. A systems approach to measuring the binding energy landscapes of transcription factors. Science (2007) 315:233–237.
- Zhang L, Kasif S, Cantor CR. Quantifying DNA-protein binding specificities by using oligonucleotide mass tags and mass spectroscopy. Proc. Natl Acad. Sci. USA (2007) 104:3061–3066.
- Fields DS, Stormo GD. Quantitative DNA sequencing to determine the relative protein-DNA binding constants to multiple DNA sequences. Anal. Biochem (1994) 219:230–239.
- Fields DS, He Y, Al-Uzri AY, Stormo GD. Quantitative specificity of the Mnt repressor. J. Mol. Biol (1997) 271:178–194.
- Luo B, Perry DJ, Zhang L, Kharat I, Basic M, Fagan JB. Mapping sequence specific DNA-protein interactions: a versatile, quantitative method and its application to transcription factor XF1. J. Mol. Biol (1997) 266:479–492.
- Lynch AS, Lin ECC. Responses to molecular oxygen. In: Escherichia coli and Salmonella: Cellular and Molecular Biology. Neidhardt FC, Curtiss R III, Ingraham JL, Lin ECC, Low KB, Magasanik B, Reznikoff WS, Riley M, Schaechter M, Umbarger HE, eds. (1996) 2nd edn. Washington, DC: American Society for Microbiology. 1526–1538.
- Heidelberg JF, Paulsen IT, Nelson KE, Gaidos EJ, Nelson WC, Read TD, Eisen JA, Seshadri R, Ward N, Methe B, et al. Genome sequence of the dissimilatory metal ion-reducing bacterium Shewanella oneidensis. Nat. Biotechnol (2002) 20:1118–1123.
- Gralnick JA, Brown CT, Newman DK. Anaerobic regulation by an atypical Arc system in Shewanella oneidensis. Mol. Microbiol (2005) 56:1347–1357.
- Gao H, Wang X, Yang ZK, Palzkill T, Zhou J. Probing the regulation of ArcA in Shewanella oneidensis MR-1 by integrated genomic analyses. BMC Genomics (2008) 9:42.
- Mukherjee S, Berger MF, Jona G, Wang XS, Muzzey D, Snyder M, Young RA, Bulyk ML. Rapid analysis of the DNA-binding specificities of transcription factors with DNA microarrays. Nat. Genet (2004) 36:1331–1339.
- Pyle AM, McSwiggen JA, Cech TR. Direct measurement of oligonucleotide substrate binding to wild-type and mutant ribozymes from Tetrahymena. Proc. Natl Acad. Sci. USA (1990) 87:8187–8191.
- Del Carmine R, Molinari P, Sbraccia M, Ambrosio C, Costa T. Induced-fit mechanism for catecholamine binding to the beta2-adrenergic receptor. Mol. Pharmacol (2004) 66:356–363.
- Iuchi S, Lin EC. Purification and phosphorylation of the Arc regulatory components of Escherichia coli. J. Bacteriol (1992) 174:5617–5623.
- Georgellis D, Lynch AS, Lin EC. In vitro phosphorylation study of the arc two-component signal transduction system of Escherichia coli. J. Bacteriol (1997) 179:5429–5435.
- Workman CT, Yin Y, Corcoran DL, Ideker T, Stormo GD, Benos PV. enoLOGOS: a versatile web tool for energy normalized sequence logos. Nucleic Acids Res (2005) 33:W389–W392.
- Liu X, De Wulf P. Probing the ArcA-P modulon of Escherichia coli by whole genome transcriptional analysis and sequence recognition profiling. J. Biol. Chem (2004) 279:12588–12597.
- Toro-Roman A, Mack TR, Stock AM. Structural analysis and solution studies of the activated regulatory domain of the response regulator ArcA: a symmetric dimer mediated by the alpha4-beta5-alpha5 face. J. Mol. Biol (2005) 349:11–26.
- Bylund DB, Murrin LC. Radioligand saturation binding experiments over large concentration ranges. Life Sci (2000) 67:2897–2911.
- Iuchi S, Lin ECC. arcA (dye), a global regulatory gene in Escherichia coli mediating repression of enzymes in aerobic pathways. Proc. Natl Acad. Sci. USA (1988) 85:1888–1892.
- Wong SM, Alugupalli KR, Ram S, Akerley BJ. The ArcA regulon and oxidative stress resistance in Haemophilus influenzae. Mol. Microbiol (2007) 64:1375–1390.
- Jeong JY, Kim YJ, Cho N, Shin D, Nam TW, Ryu S, Seok YJ. Expression of ptsG encoding the major glucose transporter is regulated by ArcA in Escherichia coli. J. Biol. Chem (2004) 279:38513–38518.
- Lynch AS, Lin EC. Transcriptional control mediated by the ArcA two-component response regulator protein of Escherichia coli: characterization of DNA binding at target promoters. J. Bacteriol (1996) 178:6238–6249.
- Berger MF, Philippakis AA, Qureshi AM, He FS, Estep PW III, Bulyk ML. Compact, universal DNA microarrays to comprehensively determine transcription-factor binding site specificities. Nat. Biotechnol (2006) 24:1429–1435.