ABSTRACT
A computer program, GelExplorer, which uses a new methodology for obtaining
quantitative information about electrophoresis has been developed. It provides
a straightforward, easy-to-use graphical interface, and includes a number of features which
offer significant advantages over existing methods for quantitative gel
analysis. The method uses curve fitting with a nonlinear least-squares optimization to deconvolute overlapping bands. Unlike most curve
fitting approaches, the data are treated in two dimensions, fitting all the data
across the entire width of the lane. This allows for accurate determination of
the intensities of individual, overlapping bands, and in particular allows
imperfectly shaped bands to be accurately modeled. Experiments described in
this paper demonstrate empirically that the Lorentzian lineshape reproduces the
contours of an individual gel band and provides a better model than the
Gaussian function for curve fitting of electrophoresis bands. Results from
several fitting applications are presented and a discussion of the sources and
magnitudes of uncertainties in the results is included. Finally, the method is
applied to the quantitative analysis of a hydroxyl radical footprint titration
experiment to obtain the free energy of binding of the λ repressor protein to the OR1 operator DNA sequence.
Chromatographic techniques, such as HPLC and FPLC, are invaluable not only
because of the relative ease with which separation of substances can be
achieved, but also because of established methods for quantifying amounts of
resolved material. In biochemistry and molecular biology, gel electrophoresis
is employed for the separation of proteins and nucleic acids. Methods available
for quantitation of electrophoresis data require a digital image of the gel,
obtained via phosphorimagery or densitometry, followed by analysis of the digital image to
obtain positions and intensities of bands in the gel. However, the approaches
which have been used most often to quantify electrophoresis results have
suffered from a variety of limitations. As a result, no standard for high-resolution, quantitative analysis of electrophoretograms has been adopted.
In order to realize the full potential of the electrophoresis technique, an
easy, reliable method for quantitative analysis is needed.
The simplest quantitative approach involves the integration of intensity in a
spot or rectangle drawn on the gel image. While this technique has been
successfully applied in numerous experiments [e.g., refs (1-3)], it can only be used to determine individual band intensity in cases for
which bands are extremely well-resolved. As a result, this approach is often limited to the integration
of a group of closely spaced bands.
A second type of approach is generally applied to an average peak profile, or
linegraph, representation of the data. It involves simple integration of peak
area between selected boundaries, chosen to be the minima between adjacent
peaks (4-6). This approach is also limited to cases in which peaks are extremely well-separated, or when the area under several adjacent peaks is sought. For
overlapping bands, the determination of minima to divide the peaks is
unreliable and the technique is therefore unable to quantify individual band
intensities with accuracy.
Because most electrophoresis data consist of a series of overlapping bands,
meaningful quantitative results can only be obtained for individual bands if
the contribution to the area of each peak by its neighbors is accounted for. To
address this point, a number of programs have been described which model each
band in an average linegraph peak profile with an analytic function and use
least-squares optimization to fit the data (6-11). The fitted curves are then integrated to provide information about bands in
the data. The Gaussian function has been the most commonly used function in
this type of approach (12-15). However, the Gaussian, which is characterized by low intensity in the tailing
regions of the function, may not be an accurate model for electrophoresis
bandshapes (16). While the Gaussian may fit the top half of the band profile, in cases for
which the tails of gel peaks are visible, the data are much wider than the
Gaussian. Some authors have suggested that electrophoresis bands might be
better modeled by asymmetric functions (17,18) and/or functions having broader tailing regions than the Gaussian (19).
Many of the approaches which have used analytic functions for band fitting
require subtraction from the raw data of a background, which increases towards
the top of a gel lane, in order to produce a good fit to the data (6,19). Such a procedure has the unfortunate effect of subtracting off the tailing
regions of the bands. Other methods have also been used to determine an
appropriate background subtraction of the raw data for analysis (17,18). However, in many cases the method for determination of a background, to allow
for valid comparisons between lanes or peaks in a lane, is not clear.
We have developed a new computer program, GelExplorer, to accomplish
quantitative analysis of electrophoresis data. It provides a straightforward,
easy-to-use approach, and includes a number of features which offer significant
advantages over existing methods for quantitative gel analysis. Our method uses
curve fitting with a nonlinear least-squares optimization to deconvolute overlapping bands. Unlike most curve
fitting approaches, however, we treat the data in two dimensions, fitting all
the data across the entire width of the lane. This allows for accurate
determination of the intensities of individual, overlapping bands, and in
particular allows imperfectly shaped bands to be accurately modeled.
In this paper we describe experiments which demonstrate empirically that the
Lorentzian lineshape reproduces the contours of an individual gel band and
provides a better model than the Gaussian function for curve fitting of
electrophoresis bands. We introduce the strategy employed by GelExplorer for
curve fitting analysis. Results from several fitting applications are presented
and a discussion of the sources and magnitudes of uncertainties in the results
is included. Finally, the method is applied to the quantitative analysis of a
hydroxyl radical footprint titration experiment to obtain the free energy of
binding of the λ repressor protein to the OR1 operator DNA sequence.
A gel image with a single, isolated band was obtained using the following
procedures. The plasmid pUC18 was amplified in the DH5α strain of Escherichia coli, isolated using the alkaline lysis method, and purified by ultracentrifugation
through a cesium chloride gradient (20). The DNA was 3'-radiolabeled by standard methods (20) at the BamHI site using [α-32P]dGTP (Amersham, Arlington Heights, IL) and ddATP. After a second cut at the PvuII site, the desired 123 bp fragment was separated from the 199 bp fragment on
an 8% native polyacrylamide gel, and recovered using the 'crush and soak'
method (21). A sample of the 123 bp fragment (100 d.p.m./lane) was ethanol precipitated,
rinsed, and lyophilized. The pellet was dissolved in formamide loading dye,
heated at 90 °C for 5 min, and loaded onto a 6% denaturing polyacrylamide sequencing gel
[acrylamide:bisacrylamide ratio of 19:1]. After electrophoresis, the gel was
dried onto filter paper and exposed to an imaging phosphor plate for 36 h.
The construction of the pUC18 plasmid containing an A5N5 insert has been described previously (22). A 260 bp AccI-PvuII restriction fragment, 3'-radiolabeled at the AccI site by standard methods (20) with [α-32P]dCTP (Amersham), was used for hydroxyl radical cleavage reactions (23,24). Each 100 µl hydroxyl radical cleavage reaction contained radiolabeled DNA
(10 000 d.p.m.) and reagents at the following final concentrations: 10 mM Tris-HCl (pH 8), 10 mM NaCl, 50/100 µM Fe(II)/EDTA, 0.3% H2O2, and 1 mM sodium ascorbate. The reaction was stopped after 2 min by the
addition of 100 µl of a solution of 13.5 mM thiourea, 13.5 mM EDTA, and 0.6 M sodium
ascorbate. DNA was precipitated, rinsed, and lyophilized. The pellet was
dissolved in formamide loading dye, heated at 90 °C for 5 min, and loaded onto an 8% denaturing polyacrylamide sequencing gel
[acrylamide:bisacrylamide ratio of 19:1]. After electrophoresis, the gel was
dried onto filter paper and exposed to an imaging phosphor plate for 48 h.
The λ cI repressor used in these studies was the generous gift of Professor
Gary Ackers and was prepared as previously described (1). A stock solution of 6.92 µM protein in storage buffer [10 mM Tris (pH 8.0), 0.2 M KCl, 2 mM CaCl2, 0.1 mM DTT, 0.1 mM EDTA, 5% glycerol] was made and stored at -70 °C. Dilutions in a 5:8 ratio were made starting with 10 µl stock plus 6 µl dilution buffer [10 mM Tris (pH 8.0), 200 mM KCl, 2 mM
CaCl2, 0.1 mM EDTA, 5% glycerol]. Subsequent 5:8 dilutions were made from each
protein dilution. A DNA binding activity of 80% and a dimer dissociation
constant of 27.7 nM (25) were used to calculate the concentration of λ repressor dimer present in solution. Total monomer concentrations were
corrected for the reduced activity prior to calculation of the dimer
concentration (26).
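The dilution series and the dimer calculation can be illustrated with a short Python sketch. It is not the procedure of refs (25,26); it assumes a simple monomer-dimer equilibrium, Kd = [M]^2/[D], with mass balance Mt = [M] + 2[D], and it assumes that the activity correction is a simple multiplication of the total monomer concentration by 0.80. The function names are hypothetical, and further dilution of the protein into the binding reaction is not included.

```python
import math

KD_DIMER_NM = 27.7       # dimer dissociation constant quoted in the text (nM)
ACTIVITY = 0.80          # DNA binding activity quoted in the text
DILUTION = 10.0 / 16.0   # 10 ul protein + 6 ul buffer, i.e. a 5:8 dilution


def dilution_series(stock_nm, n_points, factor=DILUTION):
    """Concentrations (nM) of the stock and of each successive dilution."""
    return [stock_nm * factor ** i for i in range(n_points)]


def dimer_conc_nm(total_monomer_nm, kd_nm=KD_DIMER_NM, activity=ACTIVITY):
    """Dimer concentration (nM) for an assumed monomer-dimer equilibrium.

    Solves 2*M**2 + Kd*M - Kd*Mt = 0 for the free monomer M, where Mt is
    the activity-corrected total monomer, then returns D = (Mt - M) / 2.
    """
    mt = activity * total_monomer_nm
    free_monomer = (-kd_nm + math.sqrt(kd_nm ** 2 + 8.0 * kd_nm * mt)) / 4.0
    return (mt - free_monomer) / 2.0


if __name__ == "__main__":
    for conc in dilution_series(6920.0, 5):  # 6.92 uM stock, treated as total monomer
        print(round(conc, 1), round(dimer_conc_nm(conc), 1))
```

At concentrations far above the dissociation constant nearly all active protein is dimeric, whereas at the dilute end of the series the monomer fraction becomes significant, which is why the correction matters for the low-concentration titration points.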
A 31 bp insert containing the OR1 binding site was cloned into pUC18 at the PstI restriction site. The plasmid was amplified in the DH5α strain of E. coli, isolated using the alkaline lysis method, purified by ultracentrifugation
through a cesium chloride gradient (20), and stored in TE buffer at -20 °C. A 231 bp EcoRI/BglII restriction fragment was 3'-radiolabeled at the EcoRI site according to published methods, using [α-32P]dATP and [α-32P]dTTP (Amersham) to 'fill in' the site, followed by a 'cold chase' of dNTPs (27). The desired labeled fragment was gel-purified and isolated by overnight 'crush and soak' treatment at 4 °C (21).
The DNA-repressor binding reaction was performed as follows. Each 35 µl binding reaction mixture contained 3.5 µl binding buffer [0.1 M HEPES buffer (pH 7.0), 0.5 M KCl, 10 mM
CaCl2, and 0.1 mM EDTA], 2 µl calf thymus DNA (0.1 mg/ml), 14.5 µl TE buffer, 5 µl radiolabeled DNA (~27 000 c.p.m. total), and 5 µl λ repressor of appropriate concentration. The
binding reactions were allowed to come to equilibrium in a water bath at 22 °C for 30 min. Hydroxyl radical footprinting (28) involved addition of 5 µl each of 2/4 mM Fe(II)/EDTA, 10 mM sodium ascorbate, and 0.3% H2O2. The Fe(II)/EDTA and H2O2 solutions were made fresh; the ascorbate solution was stored at -20 °C. The footprinting reaction was stopped after 1 min by addition of
a stop solution to give final concentrations of 7.5 mM thiourea and 0.3 M
NaOAc. DNA was precipitated, rinsed, and lyophilized. The pellet was dissolved
in 3 µl of formamide loading dye, heated at 90 °C for 5 min, and loaded on a 10% denaturing polyacrylamide sequencing
gel [acrylamide:bisacrylamide ratio of 19:1]. The wells in the gel were 6 mm wide and separated by 3 mm to maximize separation of the
lanes of data for fitting. After electrophoresis, the gel was transferred to
Whatman filter paper, dried, and exposed to an imaging phosphor plate for >9
days. The long exposure time was necessary to maximize the signal-to-noise ratio for curve fitting analysis.
For each of the above data sets, the exposed imaging phosphor plate (Molecular
Dynamics, Sunnyvale, CA) was scanned with a Model 400E PhosphorImager
(Molecular Dynamics). An image of each lane for curve fitting was cropped from
the gel image using the ImageQuant™ software package.
GelExplorer, the software package developed in our laboratory for quantitative
analysis of electrophoresis data, utilizes the IRIS Explorer™ (version 3.0) programming environment (29) for data visualization and analysis. Nonlinear least-squares fits to the data utilize the Levenberg-Marquardt algorithm as
coded in Numerical Recipes in C (30). The fitting routine has been adapted to output confidence limits (one
standard deviation) and a correlation matrix of the parameters from the
covariance matrix calculated in the fitting algorithm. The code for these
additions was adapted from GnuPlot fit.c.
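The relationship between a covariance matrix and the reported quantities can be sketched as follows. This is a minimal Python illustration, not the GelExplorer code: the function name is hypothetical, and whether the covariance matrix is first scaled by the reduced χ² of the fit is not stated in the text, so no scaling is applied here.

```python
import math


def confidence_and_correlation(cov):
    """Return one-standard-deviation limits and the correlation matrix.

    cov is a square covariance matrix given as a list of lists.
    sigma[i] is the square root of the i-th diagonal element;
    corr[i][j] = cov[i][j] / (sigma[i] * sigma[j]).
    """
    n = len(cov)
    sigma = [math.sqrt(cov[i][i]) for i in range(n)]
    corr = [[cov[i][j] / (sigma[i] * sigma[j]) for j in range(n)]
            for i in range(n)]
    return sigma, corr


if __name__ == "__main__":
    # Made-up 2 x 2 covariance matrix, for illustration only.
    sigma, corr = confidence_and_correlation([[4.0, 1.0], [1.0, 1.0]])
    print(sigma, corr)
```

Off-diagonal correlation coefficients close to +1 or -1 indicate pairs of parameters (for example the width and amplitude of one peak) that the data cannot determine independently.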
Fitting experiments were performed on a Silicon Graphics Indigo R3000 (33 MHz)
or Indigo2 R4400 (200 MHz) workstation; an average fit took ~12 h or 2 h, respectively (75-80 slices, 60-70 peaks/slice).
Binding isotherms obtained from footprint titration data were fit using NONLIN,
a program for non-linear least-squares analysis (31). The free energy of λ repressor dimer binding to the OR1 site was determined according to published methods (27) with modifications described below.
The use of curve fitting to obtain reliable quantitative information about the
intensities of bands in a lane of electrophoresis data requires that the
modeling function be an accurate representation of the bandshape of the data.
Because most electrophoresis data are a series of bands with overlapping
intensities, the true shape of a single band can be difficult to determine. To
overcome this difficulty, we produced a lane of electrophoresis data containing
a single band. The image of this band is shown in Figure 1a. This band was modeled using our quantitative two-dimensional curve fitting approach (vide infra) with Gaussian and Lorentzian lineshapes. The fits are compared to the
linegraph of the band in Figure 1b and c. The fitting results clearly show that the Lorentzian lineshape is a
better model for the data than the Gaussian. Our results are consistent with
other studies which have found that functions having greater intensity in the
'tailing' regions of the peak provide a better approximation of electrophoresis
band intensity than the Gaussian (19). While the peak in Figure 1
demonstrates some asymmetry, the symmetric Lorentzian function is a very good
approximation of the peak intensity. In tests using a wide variety of
electrophoretic data, we have found that gel bands are reproduced very well by
the Lorentzian function, without the inclusion of an additional parameter
allowing for peak asymmetry.
The GelExplorer program uses curve fitting to obtain a quantitative description
of gel electrophoresis data, one lane at a time. The bands in a gel lane are
deconvoluted by simultaneously fitting a set of Lorentzian lineshapes, one for each
band in the lane. The optimized Lorentzians provide accurate information about
the integrated intensity and position of each band in the lane.
Each lane is treated as a two-dimensional image of pixels. The data are analyzed as a set of neighboring
slices. Each slice is one pixel wide and extends the length of the lane,
parallel to the direction of electrophoresis. Each peak in each slice is
modeled with a Lorentzian function. Nonlinear least-squares optimization to the data is performed separately for each slice of data in a lane. Because each slice is optimized separately,
variations in the bandshape across the lane are reproduced and a detailed
description of all data present in the lane is obtained.
Here we briefly describe the steps involved in the use of GelExplorer for
quantitative analysis of electrophoresis data. GelExplorer runs under the IRIS
Explorer™ programming environment. IRIS Explorer™ was first implemented on Silicon Graphics workstations, and has
since been ported to Sun, Hewlett Packard, IBM, DEC, and Cray computers.
Individual modules in IRIS Explorer™ are linked together to perform specific program functions with an easy-to-use, graphical interface.
Quantitative analysis of electrophoresis data first requires a digital image of
each lane of a gel. We have found that phosphorimager-generated data are superior to densitometer data because of the larger
dynamic range and because long exposures of phosphorimager plates allow for
increased signal-to-noise ratios of the image. A constant background is subtracted from
the entire image to account for the plate (or film) background. The average
pixel intensity from a gel region without any data serves as the background
value.
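The constant background subtraction described above can be sketched in a few lines of Python. The image is represented as a list of pixel rows, and the rectangle `blank_region` marking a data-free part of the gel is a hypothetical input chosen by the user; neither reflects the actual data structures of GelExplorer.

```python
def subtract_background(image, blank_region):
    """Subtract the mean intensity of a data-free region from every pixel.

    image        : list of rows, each a list of pixel intensities
    blank_region : (row_start, row_stop, col_start, col_stop), stop-exclusive
    Returns the corrected image and the background value used.
    """
    r0, r1, c0, c1 = blank_region
    blank = [image[r][c] for r in range(r0, r1) for c in range(c0, c1)]
    background = sum(blank) / len(blank)
    corrected = [[value - background for value in row] for row in image]
    return corrected, background


if __name__ == "__main__":
    # Tiny made-up image: the left two columns contain no data.
    image = [[10.0, 10.0, 50.0],
             [10.0, 10.0, 70.0]]
    corrected, background = subtract_background(image, (0, 2, 0, 2))
    print(background, corrected)
```

Because a single value is subtracted from the whole image, every lane cropped from the same gel receives the same correction.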
Next, the region of a given lane to be fit is defined. The top and bottom
boundaries define the least- and best-resolved bands in the lane which will be modeled, respectively. The
left and right boundaries define how much of the width of the lane will be fit
(how many pixel slices will be included). The criteria which determine the
choice of these boundaries will be described in further detail below.
The Lorentzian function used to model each peak in a lane of electrophoresis
data is given by equation 1, where C = amplitude, x_0 = position, and γ = full-width at half-height.
$$ y(x) = \frac{C\gamma}{(x - x_0)^2 + \frac{\gamma^2}{4}} \qquad (1) $$
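Two properties of equation 1 are worth making explicit, and the following Python sketch checks them numerically. With this parameterization the peak height is 4C/γ and the area under the whole curve is 2πC, so the amplitude C is proportional to the integrated intensity of a band regardless of its width. The function names are illustrative only.

```python
import math


def lorentzian(x, amplitude, x0, gamma):
    """Equation 1: y(x) = C*gamma / ((x - x0)**2 + gamma**2 / 4)."""
    return amplitude * gamma / ((x - x0) ** 2 + gamma ** 2 / 4.0)


def peak_height(amplitude, gamma):
    """Value of equation 1 at x = x0, i.e. 4*C/gamma."""
    return 4.0 * amplitude / gamma


def area_between(a, b, amplitude, x0, gamma):
    """Exact integral of equation 1 from a to b.

    The antiderivative is 2*C*atan(2*(x - x0)/gamma), so the area over
    the whole axis is 2*pi*C, independent of gamma.
    """
    upper = math.atan(2.0 * (b - x0) / gamma)
    lower = math.atan(2.0 * (a - x0) / gamma)
    return 2.0 * amplitude * (upper - lower)


if __name__ == "__main__":
    c, x0, gamma = 5.0, 100.0, 20.0
    print(lorentzian(x0, c, x0, gamma), peak_height(c, gamma))
    print(lorentzian(x0 + gamma / 2.0, c, x0, gamma))  # half of the peak height
    print(area_between(-1e9, 1e9, c, x0, gamma), 2.0 * math.pi * c)
```

A practical consequence is that a narrower band with the same C is taller, which is relevant when comparing peak heights between well-resolved and poorly resolved regions of a lane.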
Starting values for three parameters must be specified for each Lorentzian to be
included in the fit. Peaks are specified and positions are chosen by clicking
on the image of the data at each band position at which a Lorentzian lineshape
should be modeled. Widths are given a default value of 20 (pixels) and starting
amplitudes are guessed automatically in each slice so that the height of the starting Lorentzian matches the pixel intensity of
the data at the center designated for the Lorentzian. Thus, the standard set of
starting parameters defines a set of peaks; each peak has the same starting
width and position in all slices, but has a different amplitude in each slice
(to account for variability in the band over the width of the lane). All
starting parameters can be conveniently edited. Nonlinear least-squares optimization to the data is performed separately for each slice of
data in a lane. The criteria for convergence are defined by the user such that
the χ² function of the optimization changes by less than a specified value (the
tolerance) for n successive iterations.
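The construction of the standard starting parameters can be sketched as below. Because equation 1 has a peak height of 4C/γ, matching the height of the starting Lorentzian to the pixel intensity I at the chosen position gives a starting amplitude of C = Iγ/4. The data layout (a list of one-dimensional slice profiles, and a list of clicked positions in pixels) and the dictionary keys are hypothetical; this is an illustration of the rule stated in the text, not the GelExplorer implementation.

```python
DEFAULT_WIDTH = 20.0  # default starting width in pixels, as stated in the text


def starting_parameters(slices, positions, width=DEFAULT_WIDTH):
    """Build one list of starting Lorentzian parameters per slice.

    slices    : list of 1-D intensity profiles, one per pixel-wide slice
    positions : band positions (pixel indices) chosen on the image
    Every peak gets the same position and width in all slices; the
    amplitude is set per slice so that 4*C/width equals the pixel
    intensity at the chosen position.
    """
    all_params = []
    for profile in slices:
        slice_params = []
        for x0 in positions:
            intensity = profile[int(round(x0))]
            slice_params.append({
                "amplitude": intensity * width / 4.0,
                "x0": float(x0),
                "gamma": width,
            })
        all_params.append(slice_params)
    return all_params


if __name__ == "__main__":
    # Two made-up slices containing one band centred at pixel 2.
    slices = [[0.0, 2.0, 8.0, 2.0, 0.0],
              [0.0, 1.0, 4.0, 1.0, 0.0]]
    for slice_params in starting_parameters(slices, [2]):
        print(slice_params)
```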
Figure 2a highlights a single slice within an image of a gel lane to be modeled by a
series of Lorentzian curves. Linegraphs depicting the total fit to the slice
and the individual optimized Lorentzian contributions to the fit are shown in
Figure 2b and c, respectively. In a full fitting analysis of the lane, each slice will
be modeled by such a sum of Lorentzian contributions.
GelExplorer has been applied to quantify the intensities of bands produced by
hydroxyl radical treatment of a restriction fragment containing four phased A
tract [A5TG3C] sequences (22). We have chosen this DNA molecule as a test case because A tracts have
characteristic hydroxyl radical cleavage patterns (32). Having four repeats of the same sequence in a DNA molecule allows for
evaluation of the consistency and reproducibility of the curve fitting
procedure to model the experimental cleavage pattern. Shown in Figure 5a is an image of the background-subtracted data. The hydroxyl radical cleavage pattern, which reflects
structural variations in the A tracts (22), shows a repeating sinusoidal pattern. There is an apparent increase in
overall intensity towards the top of the gel lane, which is often attributed to
an unspecified 'background' in hydroxyl radical cleavage data. GelExplorer
fitting was undertaken to determine the source of the signal variations over
the length of the lane. Figure 5b shows the image of the fit, generated from the sum of 70 optimized Lorentzian
curves. Linegraphs comparing the average data to the average fit (across the
width of all slices in the fit) are shown in Figure 6a. A subset of the Lorentzian contributions to the total fit is highlighted in
Figure 6b. The fitting procedures reproduce the contours of the data extremely well. The
amplitudes for the peaks in the four A5N5 sequences are fairly consistent over the length of the lane. This is
demonstrated by a histogram plot of the fitted amplitudes in Figure 6c. While the individual Lorentzians have somewhat higher peak heights at the top
of the lane, the widths of the peaks also decrease towards the top of the lane,
and, as a result, the amplitudes are relatively invariant. Thus, the increase
in total intensity towards the top of the lane (Fig. 6a) is a result of overlapping bands which are less well-resolved than those at the bottom of the lane, not of a change in the background or in the intensities of bands. This example
shows how deconvolution of gel bands by curve fitting allows quantitative
comparison of cleavage at nucleotides throughout a DNA molecule.
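The effect of band overlap on the apparent signal can be illustrated with synthetic numbers. The Python sketch below sums five Lorentzians of identical amplitude and width, first widely spaced and then crowded together; all values are invented for illustration and the widths are held constant for simplicity, unlike the real data.

```python
def lorentzian(x, amplitude, x0, gamma):
    """Lorentzian of equation 1."""
    return amplitude * gamma / ((x - x0) ** 2 + gamma ** 2 / 4.0)


def summed_profile(xs, peaks):
    """Total intensity at each x from a list of (amplitude, x0, gamma) peaks."""
    return [sum(lorentzian(x, *peak) for peak in peaks) for x in xs]


def evenly_spaced_peaks(n, spacing, amplitude=10.0, gamma=8.0, start=100.0):
    """n identical peaks separated by `spacing` pixels (synthetic data)."""
    return [(amplitude, start + i * spacing, gamma) for i in range(n)]


xs = [float(i) for i in range(400)]
resolved = summed_profile(xs, evenly_spaced_peaks(5, 40.0))  # well separated
crowded = summed_profile(xs, evenly_spaced_peaks(5, 6.0))    # strongly overlapping

if __name__ == "__main__":
    print(max(resolved), max(crowded))
```

The maximum of the crowded profile is considerably higher than that of the resolved profile even though every band has the same amplitude, because the tails of neighboring Lorentzians add to each peak. Only deconvolution recovers the fact that the individual band intensities are unchanged.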
There are several sources of uncertainty in the fitting method. For example, the
error bars in Figure 6c reflect the uncertainty in the amplitude parameter introduced by the fitting
procedure itself. This uncertainty is generally ≤1% of the amplitude value for a given peak, with uncertainties up to ~2% in the least well-resolved peaks in the fit. However, the uncertainty in the fitted
amplitudes determined by the fitting procedure is not the only source of
uncertainty in the quantitative results. Other factors which contribute to
uncertainty include the choice of baseline subtraction, the boundaries chosen
for the fit, the dependence of the fit on starting parameters, and the
convergence criteria. We have performed a series of fitting experiments on
several sets of data to determine the magnitude of uncertainty introduced by
the method.
The background subtracted from the raw data reflects only the imaging phosphor
plate background. GelExplorer will, however, successfully fit both raw data and
data from which more than the plate background has been subtracted. The fitting
results vary as expected: lower amplitude values are obtained for fits to data
with higher background subtractions. As a result, it is important to use the
same criteria for background-subtraction for all fitting procedures. For comparisons of different lanes
within the same gel, identical background values are subtracted and the
background subtraction does not contribute uncertainty to comparisons between
these lanes.
The top and bottom boundaries of the fit define which peaks will be included in
the fit. The bottom boundary is chosen so that the fits include the best-resolved band in the lane. The top boundary of the fit is limited by the
fact that, in a single slice of data, the valleys between poorly-resolved peaks are often not well-defined. Thus, at the top boundary of the fit, an artificial
endpoint must be imposed. The top-most peaks defined by the boundary will have intensity contributions from
peaks above them in the lane which are not modeled by the fit. As a result, the
amplitudes of the top-most peaks in the fit are not an accurate reflection of the intensities of
those bands. Comparison of a 70 peak fit to a 65 peak fit for the A-tract cleavage pattern in Figure 6a reveals that the top four peaks in the 65 peak fit have amplitudes which
differ from the analogous peaks in the 70 peak fit by more than the
uncertainties determined by the fits. All other differences between the two
fits are less than the uncertainty in the fit. For this reason, one must always
fit beyond the peaks in the lane which are of interest for quantitative
analysis. It is our practice to fit at least five peaks beyond (preferably 10 peaks beyond) the region for which reliable fit
parameters are sought. Further, peaks at the top of the fit are not always well
behaved because of the artificially-defined boundary. Fixing the Lorentzian widths and positions at reasonable
values for the top three peaks in a fit solves this problem.
The left and right boundaries of the fit define the width of the lane (in pixel
slices) over which the fit will be performed. The pixel-width of images obtained for different lanes (even within the same gel)
can differ because of variations in the shapes of wells or in the amount of
salt in a lane. In order to compare all the data in one lane to all the data in
another, then, it is necessary to fit from one edge of the lane to the other.
The left and right boundaries of our fits were chosen on the basis that, in a
single slice at the left or right extreme of the lane, the peak shapes in the
data must be visibly discernible above the noise level. In practice,
GelExplorer has difficulty converging if peaks are not obviously above the
noise. At the edges of a lane, the intensities of bands drop off gradually, and
at the boundaries chosen by our criteria, the fitted amplitudes of peaks are
approximately one-half of the maximum observed for peaks in slices in the central part of
the lane. To appropriately compare fitted amplitudes of bands in different
lanes it is important to compare the sum of amplitudes for a given peak over
all slices in the fit. Average amplitudes are less appropriate for comparison
because the weight accorded to the data in a particular slice depends on how
many slices are included.
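The difference between summed and averaged amplitudes can be seen in a small Python sketch. The nested-list layout `fit[s][p]` (amplitude of peak p in slice s) and the numbers are hypothetical.

```python
def summed_amplitudes(fit):
    """Sum each peak's fitted amplitude over all slices in the fit."""
    n_peaks = len(fit[0])
    return [sum(slice_amps[p] for slice_amps in fit) for p in range(n_peaks)]


def mean_amplitudes(fit):
    """Average each peak's fitted amplitude over the slices in the fit."""
    n_slices = len(fit)
    return [total / n_slices for total in summed_amplitudes(fit)]


if __name__ == "__main__":
    # One peak, fit over the central four slices of a lane ...
    narrow = [[1.0], [2.0], [2.0], [1.0]]
    # ... and over the same lane with one weaker edge slice added on each side.
    wide = [[0.5], [1.0], [2.0], [2.0], [1.0], [0.5]]
    print(summed_amplitudes(narrow), summed_amplitudes(wide))
    print(mean_amplitudes(narrow), mean_amplitudes(wide))
```

Including the weak edge slices increases the sum, as expected when more of the band is counted, but lowers the mean, so the mean depends on where the lane boundaries were drawn rather than on the amount of material in the band.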
For comparisons of peak amplitudes within a lane, it is important that the lane
of data be approximately the same width over its entire length. While most
lanes are somewhat wider at the top than in the most well-resolved region, the difference must be minimal to ensure that all of the
data in each peak in the lane is being reflected in the fit results.
In some cases, the extreme right or left slice in the fit did not result in a
well-behaved fit (e.g., peaks had unreasonable widths) and had to be excluded
from the total summed result. Further, it may be possible to apply the above
criteria for choice of the right and left fit boundaries and arrive at a
slightly different choice of limits. We have found that omission of one slice
on either the left or the right of the lane introduces an average variation in
the summed amplitudes (over all peaks in a lane) of 0.6-1%, depending on the lane tested. A conservative estimate of the degree
to which the choice of left and right pixel boundaries might be different is
four slices (two on each edge). This introduced a variation in the resultant
amplitudes of 2.6-3.8%, depending on the lane tested.
Different sets of reasonable starting parameters (e.g., positions chosen several
times, different starting widths) were used for a series of fits and the fitted
amplitudes were compared. Generally, the variation in amplitude was <0.5%, with a few peaks in each fit varying as much as 1-2%. The highest variations were generally observed for the least well-resolved peaks in the lane.
Convergence is defined such that the χ² function of the optimization changes by less than the tolerance for n
successive iterations. Our fitting experiments have shown that the results are
virtually independent of the tolerance, and that most large changes in
parameter values occur within three iterations. All fits were thus performed
with a tolerance of 0.1 and three iterations. Since identical criteria are
applied, we have assumed the convergence criteria do not contribute significant
uncertainty to comparisons between lanes.
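The convergence test can be expressed compactly. The Python sketch below assumes that the tolerance is compared with the absolute change in χ² between consecutive iterations; the text does not say whether an absolute or a relative change is used, and the function name is hypothetical.

```python
def has_converged(chi2_history, tolerance=0.1, n_iterations=3):
    """True if chi-squared changed by less than `tolerance` in each of
    the last `n_iterations` consecutive iterations.

    chi2_history lists the chi-squared value after every iteration, in
    order. The defaults are the tolerance and iteration count used in
    the text.
    """
    if len(chi2_history) < n_iterations + 1:
        return False
    recent = chi2_history[-(n_iterations + 1):]
    changes = [abs(recent[i + 1] - recent[i]) for i in range(n_iterations)]
    return all(change < tolerance for change in changes)


if __name__ == "__main__":
    history = [100.0, 50.0, 49.95, 49.92, 49.91]  # made-up values
    print(has_converged(history[:4]), has_converged(history))
```

Requiring several successive small changes, rather than one, guards against stopping on a single iteration in which the optimizer happened to make little progress.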
In reporting optimized peak amplitudes we have added, to the uncertainty
calculated by the fitting routine, additional uncertainties of 3% for the choice
of fitting boundaries and 1% for the choice of starting parameters. These errors are carried through in analyses which utilize
the fitting results.
The uncertainty in the optimized peak positions is <0.03% as calculated by the fit and varies by <0.05% for fits performed with different starting parameters. Values for peak
widths are related to amplitudes and thus vary in a similar manner.
One of the most important applications requiring quantitative analysis of
electrophoresis band intensities is the determination of a protein-DNA binding constant from a footprint titration experiment (1). In this experiment, a DNA-bound protein protects the backbone of the DNA from cleavage by hydroxyl
radical (33) or another cleaving agent (1). A series of reactions is performed in which protein concentrations are
varied systematically. The relative amount of cleavage at a particular
nucleotide position is a measure of the fraction of DNA molecules that do not
have protein bound. The protein concentration-dependent protection can be analyzed to obtain thermodynamic information
for protein binding (1). We have generated a hydroxyl radical footprint titration of the λ cI repressor bound to the OR1 operator sequence. This is a very well-understood system (1,26-28,34,35) and provides a clear demonstration of the utility of GelExplorer for
quantitative analysis.
The titration experiment was conducted over 22 protein concentration points.
Images of the data for the hydroxyl radical reference lane (reaction without
protein) and the hydroxyl radical footprint lane containing the highest
concentration of repressor are shown in Figure 7a and c, respectively. Figure 7b and d show the images of the 60-peak fits to these data, respectively. The average linegraphs of the data
and fit (across the width of the lane) for the reference and footprint data are
shown in Figure 7e and f, respectively. The footprint shows three regions of protection by the
protein, labeled a', b' and c' (28). The regions a' and b' correspond to the edges of the major groove within which the
repressor is thought to make sequence specific contacts with the DNA bases of
the operator (36-39). The exact nature of the protection in the c' region, which is across the minor groove from the main binding region of
the protein, has not been elucidated. The fitting procedure clearly reproduces
the data very well. The peak numbers used in the fit, which correspond to
sequences which show protections, are summarized in Table 1.
In a thermodynamic analysis of protein binding, the amplitudes of gel bands
serve as a reflection of relative rates of cleavage (or relative degrees of
protection). The fitted amplitudes in each lane must be normalized before
reliable comparisons of peak intensities can be made. The amplitudes for a set
of 18 peaks (10 peaks below the footprint and 8 peaks above the footprint) in
each lane were summed. These peaks were chosen on the basis that they showed
very little variation over the series of 22 lanes. For each lane, the amplitude
of each peak was multiplied by a factor such that the summed amplitudes for the
normalizing peaks had the same value as that for the reference lane.
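The normalization step amounts to a single scale factor per lane, as the following Python sketch shows. The list layout (one summed amplitude per peak) and the peak indices are hypothetical.

```python
def normalize_lane(amplitudes, reference_amplitudes, normalizing_peaks):
    """Scale a lane so its normalizing peaks sum to the reference lane's sum.

    amplitudes           : fitted amplitudes for every peak in the lane
    reference_amplitudes : fitted amplitudes for the reference lane
    normalizing_peaks    : indices of peaks outside the footprint that
                           vary little over the titration
    """
    lane_sum = sum(amplitudes[p] for p in normalizing_peaks)
    reference_sum = sum(reference_amplitudes[p] for p in normalizing_peaks)
    factor = reference_sum / lane_sum
    return [amplitude * factor for amplitude in amplitudes]


if __name__ == "__main__":
    reference = [10.0, 20.0, 30.0, 10.0]  # made-up amplitudes
    lane = [5.0, 4.0, 15.0, 5.0]          # peak 1 lies within the footprint
    print(normalize_lane(lane, reference, [0, 2, 3]))
```

In this toy example the lane was loaded with half as much material as the reference, so all of its amplitudes are doubled; the peak inside the footprint remains lower than in the reference after normalization, which is the protection signal.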
The normalized amplitudes were converted to fractional protection (p_i) according to equation 2, where A_N(n, site) is the normalized amplitude of nucleotide n from a lane containing protein, and A_N(n, ref) is the normalized amplitude of nucleotide n from the reference lane.
$$ p_i = 1 - \frac{A_N(n, \mathrm{site})}{A_N(n, \mathrm{ref})} \qquad (2) $$
Fractional protections were converted to fractional saturations Y as has been described for other footprint titration analyses (1,27).
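Equation 2 applied across a titration series can be sketched in Python as follows. The data layout (one list of normalized amplitudes per lane) is hypothetical, and the subsequent conversion of protection to fractional saturation follows the cited references and is not reproduced here.

```python
def fractional_protection(a_site, a_ref):
    """Equation 2: 1 - A_N(n, site) / A_N(n, ref)."""
    if a_ref <= 0.0:
        raise ValueError("reference amplitude must be positive")
    return 1.0 - a_site / a_ref


def protection_series(normalized_lanes, reference_lane, peak):
    """Fractional protection of one nucleotide in every lane of a titration."""
    return [fractional_protection(lane[peak], reference_lane[peak])
            for lane in normalized_lanes]


if __name__ == "__main__":
    reference = [10.0, 20.0]  # made-up normalized amplitudes
    lanes = [[10.0, 20.0],    # no protein: no protection
             [10.0, 10.0],    # intermediate protein concentration
             [10.0, 2.0]]     # high protein concentration
    print(protection_series(lanes, reference, peak=1))
```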
The relationship between the fractional protection at a given nucleotide and the
protein concentration is the binding isotherm. Protein binding constants are
obtained by fitting the Langmuir expression, given in equation 3, to each nucleotide's binding isotherm. The microscopic equilibrium binding
constant is k and [P] is the concentration of unliganded protein, active to bind DNA.
$$ Y = \frac{k[\mathrm{P}]}{1 + k[\mathrm{P}]} \qquad (3) $$
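Equation 3 and its connection to a free energy of binding can be sketched in Python. The conversion uses the standard relation ΔG = -RT ln k. The temperature of 22 °C is taken from the binding reaction conditions and is an assumption here, since the text does not state the temperature used in the free energy calculation; the function names are illustrative.

```python
import math

R_KCAL = 1.987e-3   # gas constant in kcal/(mol K)
T_KELVIN = 295.15   # 22 degrees C, assumed from the binding reaction conditions


def langmuir(protein_molar, k):
    """Equation 3: fractional saturation Y = k[P] / (1 + k[P])."""
    return k * protein_molar / (1.0 + k * protein_molar)


def delta_g_kcal(k, temperature=T_KELVIN):
    """Free energy of binding (kcal/mol) from a binding constant in 1/M."""
    return -R_KCAL * temperature * math.log(k)


def k_from_delta_g(delta_g, temperature=T_KELVIN):
    """Binding constant (1/M) corresponding to a free energy in kcal/mol."""
    return math.exp(-delta_g / (R_KCAL * temperature))


if __name__ == "__main__":
    k = k_from_delta_g(-12.6)  # free energy of the order reported for this system
    print(k, langmuir(1.0 / k, k))
```

The isotherm reaches half saturation when [P] = 1/k, so a free energy near -12.6 kcal/mol corresponds to half saturation at a free protein concentration of well under 1 nM under these assumptions.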
In contrast to analyses which quantify protections by integrating the intensity
of bands in a rectangle drawn around the entire binding site, thereby obtaining
a single binding constant for the entire site, our approach provides
quantitative binding isotherm curves for each nucleotide individually.
Representative fits to binding isotherms for footprinted nucleotide positions
are shown in Figure 8. Binding isotherms for 12 positions, including all of the positions previously reported to
show protection from hydroxyl radical cleavage (28), could be fit to obtain binding constants and
free energies of binding (ΔG). The results for the different nucleotide positions range from ΔG = -11.4 ± 0.3 to -12.7 ± 0.4 kcal/mol and are summarized in
Table 1. These results are in good agreement with the previously reported value of -12.6 kcal/mol for this system (1). An analysis was also performed for the sum of all the normalized amplitudes
of peaks within the footprint region (peaks 23-45). Fits to the single isotherm gave ΔG = -11.6 ± 0.2 kcal/mol. For individual nucleotides, no
systematic variations were observed in ΔG values for the different regions of the footprint in our data. In
particular, nucleotides in footprint region c' exhibit titration behavior similar to that of the main portion of the
footprint (see Table 1).
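The conversion between a fitted binding constant and a free energy of binding can be sketched with the standard relation ΔG = -RT ln k. The temperature used below (293.15 K) is an assumption chosen for illustration, since the experimental temperature is not stated in this section, and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def delta_g_from_k(k, temperature_k=293.15):
    """Free energy of binding (kcal/mol) from an association constant (M^-1)."""
    return -R_KCAL * temperature_k * math.log(k)


def k_from_delta_g(delta_g, temperature_k=293.15):
    """Association constant (M^-1) from a free energy of binding (kcal/mol)."""
    return math.exp(-delta_g / (R_KCAL * temperature_k))


# Round trip for the previously reported value quoted in the text.
k_reported = k_from_delta_g(-12.6)
delta_g_back = delta_g_from_k(k_reported)
```

Because ΔG depends on the logarithm of k, the differences of a few tenths of a kcal/mol between nucleotide positions correspond to modest multiplicative differences in k.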
Table 1
In this paper, we have described GelExplorer, which uses a new methodology for
obtaining quantitative information about electrophoretic band intensities. The
program uses a novel two-dimensional curve fitting approach to deconvolute band intensities and to
account for variations across the width of a lane of electrophoresis data. The
Lorentzian lineshape has been demonstrated to successfully model
electrophoresis bandshapes and is appropriate for use in curve fitting
analysis. Because reasonably close initial parameter values are of vital
importance for a successful nonlinear least-squares optimization, the program is designed to provide an excellent set
of starting parameters.
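The overlapping-band model for a single slice of a lane can be sketched as a sum of Lorentzians. The parameterization below, C·γ / ((x - x0)² + (γ/2)²), is an assumption chosen to be consistent with the correspondence C = N/2 and γ = 2d given in the appendix; the function names and band parameters are hypothetical, and the sketch does not reproduce GelExplorer's two-dimensional fitting procedure.

```python
def lorentzian(x, c, x0, gamma):
    """Lorentzian band: scale c, center x0, full width at half height gamma."""
    return c * gamma / ((x - x0) ** 2 + (gamma / 2.0) ** 2)


def lane_profile(x, bands):
    """Model intensity at position x as a sum of overlapping Lorentzian bands."""
    return sum(lorentzian(x, c, x0, gamma) for c, x0, gamma in bands)


# Two hypothetical overlapping bands given as (c, x0, gamma).
bands = [(1.0, 10.0, 4.0), (0.5, 13.0, 4.0)]
peak = lorentzian(10.0, 1.0, 10.0, 4.0)
half = lorentzian(12.0, 1.0, 10.0, 4.0)  # gamma / 2 away from the center
total_at_peak = lane_profile(10.0, bands)
```

The tail of the second band raises the observed intensity at the first band's center, which is why overlapping bands have to be fit together rather than read off individually.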
High quality data are required for successful fitting by GelExplorer. In
particular, the signal-to-noise ratio for the data must be very good because curves are
optimized to a single slice of data for which no averaging or smoothing has
been applied to reduce noise. As a result, faint bands, which can be difficult
to fit, may require additional criteria in the choice of starting Lorentzian
parameter values or fixing of some parameters. Further, lanes must be
reasonably straight, as curved lanes are not easily treated. The fitting method
is generally not limited by the resolution or separation of bands, and is not
limited by variability in band shape or lane width.
Because GelExplorer consists of a set of modules linked together in maps, it is
very flexible. It is currently equipped to read images generated by ImageQuant™ software. Expansion or adaptation of the program for use with images
from other programs, for special applications, or for additional analysis of
fitting results, is easily accomplished by the user within the IRIS Explorer™ programming environment.
GelExplorer includes a calculation of uncertainties in the fit parameter
outputs, allowing the reliability of the fitting results to be evaluated. In
addition, other sources of uncertainty in the fitting results have been
evaluated. The overall uncertainty in the fit results is very small, yielding
well-determined values for peak intensities and positions. The uncertainties
reported here have been evaluated for fitting of relatively low-percentage (6-10%) polyacrylamide sequencing gel data. Fits to other types of
electrophoresis data will require additional experiments to evaluate
uncertainties in the results.
The application of the program to the determination of free energy of binding of
λ repressor to the OR1 binding site has demonstrated the utility of GelExplorer for quantitative
analysis. The values for ΔG obtained from our analysis are in very good agreement with those
previously reported. The observed differences likely arise because the actual protein
activity and/or concentration was lower than the value used in the analysis presented here,
which is plausible given the age of the protein sample (>6 years). For example, a
decrease in the concentration of protein active to bind DNA would result in a
more negative value of ΔG at each nucleotide. Importantly, the results demonstrate the level of
detail that can be reliably obtained from this type of analysis. In
particular, the protection pattern in region c' of the footprint has titration behavior similar to the other regions of
the footprint, which demonstrates that the observed c' protections are related to the same protein binding event that causes
protections in regions a' and b'. These results are possible only as a result of high-resolution,
quantitative curve fitting analysis. Details such as these, at the level of
individual nucleotide binding, will provide further insight into protein-DNA
binding events. We expect that the methodology employed by GelExplorer will
prove successful in the analysis of a wide variety of problems requiring
quantitative analysis of electrophoresis data.
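The effect of an overestimated active protein concentration on the fitted free energy, noted in the paragraph above, can be made explicit. The following is a sketch, assuming the standard relation ΔG = -RT ln k and introducing a hypothetical factor f (not defined in the paper) for the ratio of the true active protein concentration to the value used in the analysis. The numerical example further assumes T = 293.15 K, an illustrative choice rather than a stated experimental condition.

```latex
% Equation 3 depends on k and [P] only through the product k[P], so a fit
% made with an overestimated concentration returns an underestimated k.
\begin{align*}
[\mathrm{P}]_{\mathrm{true}} &= f\,[\mathrm{P}]_{\mathrm{used}}, \qquad 0 < f \le 1 \\
k_{\mathrm{true}}\,[\mathrm{P}]_{\mathrm{true}} &= k_{\mathrm{fit}}\,[\mathrm{P}]_{\mathrm{used}}
  \;\Rightarrow\; k_{\mathrm{true}} = k_{\mathrm{fit}} / f \\
\Delta G_{\mathrm{true}} &= -RT \ln k_{\mathrm{true}} = \Delta G_{\mathrm{fit}} + RT \ln f \\
% Assumed example: f = 0.5, T = 293.15 K, R = 1.987e-3 kcal/(mol K)
RT \ln f &\approx 0.5826 \times (-0.6931) \approx -0.40\ \mathrm{kcal/mol}
\end{align*}
```

Under these assumptions, a sample retaining half of its nominal activity would shift each fitted ΔG by about 0.4 kcal/mol toward more negative values, the same direction as the correction described in the text.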
GelExplorer software is available upon request from Prof. Tom Tullius by
anonymous file transfer protocol (FTP). For users from academic institutions
there is no charge to obtain the program. However, users must be licensed to
use IRIS Explorer™ version 3.0 (29). Detailed instructions for the use of GelExplorer are described in an on-line manual, which is available at http://dna.chm.jhu.edu.
This research was supported by PHS National Research Service Award Grant 5 F32
GM 16828-02 (S.E.S.) and by PHS grant GM 40894 (T.D.T.). We gratefully acknowledge
the use of phosphorimagery instrumentation maintained by the Institute for Biophysical Research on
Macromolecular Assemblies at Johns Hopkins, which was supported by an NSF
Biological Research Centers Award (DIR-8721059) and by a grant from the W.M. Keck Foundation. We thank Ruth M.
Ganunis for generation of the A tract hydroxyl radical cleavage electrophoresis
data, Lori M. Ottinger for construction and purification of plasmid DNA
containing the OR1 binding site, Prof. Gary Ackers for the λ repressor protein, and Dr Michael Brenowitz for the NONLIN fitting
program and instruction in its use. We also appreciate help from John Chandler,
Computer Science Department, Oklahoma State University, in implementing the
least-squares error analysis in GelExplorer.
Lorentzian lineshapes are intrinsic to autoradiographic detection
Jeremy M. Berg
Department of Biophysics and Biophysical Chemistry, The Johns Hopkins University School of Medicine, Baltimore, MD 21205, USA
Consider the detection of a point radiation source using autoradiography. The source will emit radiation in all directions with equal probability. For the emission and detection of each photon, the geometrical arrangement shown below applies:
Here, d is the distance between the source and the image plate or film (hereafter, the term film will be used), θ is the angle between the normal to the film and the emitted radiation, and x is the distance from the point directly above the source to the point at which the radiation strikes the film. The quantities are related by x = d tan θ.
Since emission at any angle θ is equally likely but x increases more rapidly at larger values of θ, the density of detected radiation as a function of x will be proportional to the inverse of the rate of change of x with respect to θ.
\[ \frac{dx}{d\theta} = \frac{d}{\cos^2\theta} \]
But $\cos\theta = d / \sqrt{x^2 + d^2}$, so that
\[ \cos^2\theta = \frac{d^2}{x^2 + d^2} \]
Thus,
\[ \frac{dx}{d\theta} = \frac{x^2 + d^2}{d} \]
and the density on the film will be given by
\[ \rho = N \, \frac{1}{dx/d\theta} = N \, \frac{d}{x^2 + d^2} \]
where N is a scale factor.
Thus, ρ, the density of radiation detection as a function of x, will be Lorentzian with full width at half height of 2d, corresponding to C = N/2 and γ = 2d in equation (
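The derivation above can be checked numerically. The sketch below is a simple Monte Carlo illustration under the same planar geometry: emission angles are drawn uniformly, each ray is projected onto the film, and the fraction of hits within one half-width (|x| ≤ d) is compared with the value implied by a Lorentzian of full width 2d. The function names, the value of d, and the sample size are assumptions made for illustration.

```python
import math
import random


def simulate_hits(d, n, seed=0):
    """Draw n emission angles uniformly and return where each ray hits the film."""
    rng = random.Random(seed)
    hits = []
    for _ in range(n):
        theta = rng.uniform(-math.pi / 2, math.pi / 2)
        hits.append(d * math.tan(theta))  # x = d tan(theta)
    return hits


def lorentzian_fraction_within(half_width, d):
    """Fraction of the density d / (x^2 + d^2) lying in |x| <= half_width."""
    return 2.0 * math.atan(half_width / d) / math.pi


d = 1.5  # source-to-film distance, arbitrary units
hits = simulate_hits(d, 200_000)
observed = sum(1 for x in hits if abs(x) <= d) / len(hits)
expected = lorentzian_fraction_within(d, d)
```

For a Lorentzian, half of the total density lies within the full width at half height, so agreement between `observed` and `expected` supports the claim that the width is set by the source-to-film distance.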
*To whom correspondence should be addressed. Tel: +1 410 516 7449; Fax: +1 410
516 8468; Email: tom@radical.chm.jhu.edu
Present addresses: +Department of Chemistry, Boise State University, Boise, ID 83725, USA, §Stratus Computer, Vienna, VA 22182, USA, ‡Department of Chemistry, SUNY Geneseo, Geneseo, NY 14454, USA and ¶Molecular Dynamics, Sunnyvale, CA 94086, USA




Table 1
Sequence    Footprint    Peak(b)    ΔG
label(a)    region                  (kcal/mol)(c)
T*          c'           23         -11.7 ± 0.2
A*                       24         -12.0 ± 0.2
G*          a'           32         -11.4 ± 0.3
A*                       33         -12.5 ± 0.3
C*                       34         -12.7 ± 0.4
C*                       35         -11.5 ± 0.2
G                        36         -11.8 ± 0.2
C                        37         -11.6 ± 0.2
A*          b'           42         -11.5 ± 0.3
T*                       43         -11.6 ± 0.2
T*                       44         -11.7 ± 0.2
A                        45         -12.3 ± 0.2
REFERENCES

