Nucleic Acids Research Advance Access originally published online on February 24, 2008
Nucleic Acids Research 2008 36(7):2379-2394; doi:10.1093/nar/gkn082
Nucleic Acids Research, 2008, Vol. 36, No. 7 2379-2394
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Towards a molecular dynamics consensus view of B-DNA flexibility
Alberto Pérez1,2,3,
Filip Lankas4,5,
F. Javier Luque3 and
Modesto Orozco1,2,6,7,*
1Joint IRB-BSC Program on Computational Biology, Institute of Research in Biomedicine, Parc Científic de Barcelona, Josep Samitier 1-5, Barcelona 08028, 2Barcelona Supercomputing Centre, Jordi Girona 31, Edifici Torre Girona. Barcelona 08034, 3Departament de Fisicoquímica, Facultat de Farmàcia, Avgda Diagonal sn, Barcelona 08028, Spain, 4Laboratory for Computation and Visualization in Mathematics and Mechanics, Swiss Federal Institute of Technology (EPFL), CH-1015 Lausanne, Switzerland, 5Centre for Complex Molecular Systems and Biomolecues, Institute of Organic Chemistry and Biochemistry Flemingovo nam. 2, 166 10 Praha 6, Czech Republic, 6National Institute of Bioinformatics, Parc Científic de Barcelona, Josep Samitier 1-5 and 7Departament de Bioquímica, Facultat de Biología, Avgda Diagonal 647, Barcelona 08028, Spain
*To whom correspondence should be addressed. Tel: +34 93 403 71 55; Fax: +34 93 403 71 57; Email: modesto{at}mmb.pcb.ub.es
Received December 3, 2007. Revised February 7, 2008. Accepted February 8, 2008.
 |
ABSTRACT
|
|---|
We present a systematic study of B-DNA flexibility in aqueous
solution using long-scale molecular dynamics simulations with
the two more recent versions of nucleic acids force fields (CHARMM27
and parmbsc0) using four long duplexes designed to contain several
copies of each individual base pair step. Our study highlights
some differences between pambsc0 and CHARMM27 families of simulations,
but also extensive agreement in the representation of DNA flexibility.
We also performed additional simulations with the older AMBER
force fields parm94 and parm99, corrected for non-canonical
backbone flips. Taken together, the results allow us to draw
for the first time a consensus molecular dynamics picture of
B-DNA flexibility.
 |
INTRODUCTION
|
|---|
Sequencing programs (
1–6) have increased dramatically
our knowledge on the primary structure of nucleic acids and
proteins for different species (
6–8) and even for entire
ecosystems (
9). However, the intrinsic limitations of the sequencing
projects became clear when researchers realized that the mechanisms
allowing the supramolecular organization of the genome and the
control of its expression were not directly coded in the sequence,
but depend on the chromatin structure and flexibility (
6). For
example, we and others have found that promoter and regulatory
regions display unusual physical properties, both in prokaryotes
(
10,
11) and eukaryotes (
12–19). Furthermore, nucleosome
condensation of DNA, which is crucial for gene regulation is
tightly related to DNA structure and flexibility, and unusual
DNA structures are known to play a key role in DNA recombination
(
20–24). Finally, the mechanisms used by the cell to maintain
sequence integrity are dependent on DNA mechanical properties
(
25–27). The emerging idea behind these findings is the
existence of a hidden structural and deformation code, which
has been conserved throughout the evolution and that helps the
cell to complement the primary information coded in the DNA
sequence (
19,
28–30).
The general structure of B-DNA is known since the 1950s (31), but only since the 1980s it is available in atomistic detail. Unfortunately, the dependence of structure on sequence is not so well determined, since certain nucleotide sequences have problems to crystallize, or do not produce well-defined NMR maps. The polymorphism of DNA has introduced additional problems: changes in crystallization buffer or the presence of ligands can generate artefactual structures for some sequences, as is the case of many A-forms deposited in the PDB database for duplex and triplex DNA and DNA–RNA hybrids. As a result, despite the large amount of structural information available, there is still no complete experimentally derived map of the conformational space of B-DNA (Table 1). Theoretical techniques (32,33), in particular molecular dynamics (MD) simulations (34–36) then emerge as a useful tool to complement the already existing structural information on physiological DNA (37–39).
View this table:
[in this window]
[in a new window]
|
Table 1. Number of unique sequences in antiparallel DNA duplexes of different lengths and number of cases for which experimental data are available
|
|
Different experimental techniques can be used to obtain macroscopic
information of DNA harmonic deformability (
40,
41). Unfortunately,
the microscopic analysis is much more difficult, and in fact
most of the experimental microscopic information about
flexibility is based on the analysis of structural databases
(mostly X-ray structures), and on the assumption that flexible
molecules display larger structural diversity than rigid ones.
This approach has been successfully used to obtain rough descriptions
of DNA deformability (
30,
39), but presents two main problems:
(i) there is no guarantee that the type of deformation induced
by crystal lattice or ligands on the DNA will be the same as
the one spontaneously followed by relaxed DNA and (ii) statistical
quality of the stiffness estimates requires a large number of
structures for a given sequence, and there are many sequences
for which no such wide structural information is available (
Table 1).
Once again, MD appears as an excellent complementary tool to
experimental techniques.
MD simulations on DNA started in the mid-80s (42), but only after the mid-90s unrestrained calculations were possible in the nanosecond time scale (43–45). Since then, MD has been widely used to study a variety of normal and unusual forms of DNA and other nucleic acids (34–36, 46–50). Many MD studies of DNA have been focused on structural aspects, but Lankas and others (51–53), and Gonzalez and Maddocks (54) pointed to the possibility to use MD trajectories to obtain information on the flexibility of DNA, which can be then incorporated into pseudoharmonic mesoscopic models to evaluate properties of very long fragments of DNA (36,55,56) or to improve the prediction of protein–DNA binding affinities (57,58). The pseudoharmonic mesoscopic approach assumes that global DNA deformability can be described as a combination of six local harmonic deformabilities (three translations and three rotations) at the base pair level and despite the neglect of non-harmonic terms, which can be important to describe large localized structural changes, such as the kinks (59) provides a reasonable representation of DNA flexibility.
In this article, using state-of-the-art simulation, techniques we will re-visit the field of DNA flexibility, trying to obtain for the first time a global picture of DNA deformability using four long duplexes containing different copies of each non-redundant dinucleotide step. Based on our previous microsecond simulation (60), the trajectories were extended to 100 ns, which is expected to be enough to capture the most important dynamic properties of these oligomers. Furthermore, to obtain a robust picture of DNA flexibility, and mimicking our recent studies in proteins (61), all simulations were repeated using the two more recently developed nucleic acids force-fields: CHARMM27 (62,63) and parmbsc0 (64). This theoretical effort provides for the first time a rather complete picture of the structural and flexibility properties of B-DNA.
 |
METHODS
|
|---|
Selection of sequences to study
In a previous paper (
53) two duplexes were designed as models
for the study of flexibility of duplex B-DNA: SEQ1: d(GCCTATAAACGCCTATAA)
and SEQ2: d(CTAGGTGGATGACTCATT). The sequences were selected
to: (i) yield normal B-DNA, (ii) be long enough to fully represent
a complete DNA turn and (iii) contain copies of the 10 unique
dinucleotide steps: d(AA)·d(TT), d(AC)·d(GT),
d(AG)·d(CT), d(AT)·d(AT), d(CA)·d(TG),
d(CC)·d(GG), d(CG)·d(CG), d(GA)·d(TC),
d(GC)·d(GC) and d(TA)·d(TA). In this article,
to obtain a more complete picture and to have more examples
of all the different steps we add two more sequences, selected
under similar premises: SEQ3: d(CACGGAACCGGTTCCGTG) and SEQ4:
d(GGCGCGCACCACGCGCGG).
System set-up and simulation conditions
Starting structures were created using standard B-DNA fibre coordinates and were then immersed in a box containing
10 600 water molecules adding then Na+ to obtain neutral systems using CMIP calculations (65) to optimize the original positions of the ions. All systems were then optimized, thermalized (298 K) and equilibrated using our standard multi-step protocol (66,67), doubling the simulation length at every window. Final structures were further re-equilibrated for 10 ns prior to data collection. Simulations were extended for 100 ns in the isothermic isobaric ensemble (P = 1 atm, T = 298 K) using periodic boundary conditions and Particle Mesh Ewald calculations (44). A time step of 1 fs was used in conjunction with SHAKE (68) or RATTLE (69) algorithms for maintaining bonds involving hydrogen atoms at equilibrium distances. Atomic interactions were represented using parmbsc0 (64) or CHARMM27 (62,63) force fields and the TIP3P water model combined with standard parmbsc0 and CHARMM27 ion models. For comparison, additional simulations (20 ns each) were performed using previous AMBER force fields parm94 (45) and parm99 (70,71) for SEQ1 and SEQ2, implementing an information-bias procedure to avoid flips of
/
backbone dihedral angles to non-canonical substates: every time an
/
transition occurred, the trajectory was restarted using the coordinates and velocities found
100 ps before the transition; the flip then never occurred again at the same time point. These simulations are called parm94* and parm99*.
All CHARMM27 simulations were carried out using NAMD (72,73) computer program, while the pmemd module of AMBER8.1 (74) computer program was used for parmbs0 and parm94*/99* calculations [previous work (61) demonstrated that the two computer programs provide equivalent results for identical force fields]. CHARMM27/NAMD calculations gives an output of 0.03 s/step using 16 Myrinet-connected Power PC processors, while with the same hardware parmbsc0/pmemd gives 0.07 s/step.
Analysis of trajectories
Standard geometrical descriptors and helical analysis tools (see below) were used to describe the main structural characteristics of the duplexes. Unless otherwise stated, analysis was performed considering the central 16-mer portion of the duplexes. The potential for interactions of the duplexes was analysed by computing the classical molecular interaction potentials (CMIP; at –5 kcal/mol) taking Na+ as a probe.
The essential dynamics of the different duplexes was derived by diagonalization of the covariance matrix (36,75), which yields a set of eigenvectors {
i} and eigenvalues {
i} describing the nature and amplitude of the different essential movements. Similarly, the eigenvalues correspond to frequencies (
i) if they are obtained by diagonalization of the mass-weighted covariance matrix. Such frequencies can be manipulated to yield entropies (76) in the pseudo-harmonic limit [see Equation (1)].
| (1) |
where

, and the sum extends to all the non-trivial vibrations (all
the other symbols have the standard physical meaning).
In order to determine the similarity in the set of essential movements determined by any two trajectories we compared the corresponding eigenvectors using absolute and relative similarity indexes [see Equations (2) and (3) and references (75 and 77)] and the associated Z-scores [see Equation (4)]. The absolute similarity index is defined by
| (2) |
where the indexes A, B refer to the two trajectories
and
i is the eigenvalue (in Å
2) associated with eigenvector
i. The sum is extended to the important modes (i.e. those explaining

90% variance, in our case
z 
30). The
x is set to a standard
value for DNA duplexes (
77). Note that the similarity index
ranges from 0 (no similarity) to 1 (identity). The relative
similarity index is defined as
| (3) |
where the self-similarity indexes

stand for the value obtained by comparing the
first and second half of the same trajectory. The associated
Z-score value is computed as:
| (4) |
where the random models were obtained by diagonalization
of a pseudo-covariance matrix obtained by random permutation
of the atoms for each snapshot. The standard deviation (SD)
was obtained by considering 500 different random models (
78).
Note that the
Zscore determines the statistical significance
of a given

-value.
Z-scores above 1 indicate a significant dissimilarity
between model and background similarities.
As described elsewhere (53), elastic force constants associated with helical deformation at the base pair step level were determined by inversion of the covariance matrix in helical space, which yields stiffness matrices [
h; see Equation (5)] whose diagonal elements provide the stiffness constants associated with pure rotational (twist, roll and tilt) and translational (rise, shift and slide) deformations within the given step:
| (5) |
where
Ch is the covariance matrix in helical
space.
Standard geometrical and energetic analysis was done using the ptraj module of AMBER8.1 and 3DNA (79) programs as well as in-house programs. Essential dynamics was done with the analysis modules in the PCAZIP program (which can be downloaded from the following MMB and CCPB websites: http://mmb.pcb.ub.es/software/pcasuite.html and http://www.ccpb.ac.uk/events/workshops/previous/analysis/), and other in house programs. The different trajectories collected here are available in compressed format (95% variance threshold) at http://mmb.pcb.ub.es/~raist/CONSENSUS and can be decompressed with the PCAZIP program (80).
Structural database analysis
X-ray naked DNA dataset was built according to Perez et al. (30). For the NMR dataset we followed Olson's et al. culling of base pair step parameters and those steps outside the 3 SD limit from the average for any of the six parameters was excluded from the study. Averaging of the steps was done by using both strands in each oligo. Generic averages and standard deviations were prepared using equal weight for each of the 16 possible base pairs. The effective temperature used to describe force constants using Olson et al.apos;s (39) data are 295 K as described in Lankas et al. (53).
 |
RESULTS AND DISCUSSION
|
|---|
General structure
The two force fields studied here yield stable trajectories
in the 100-ns simulation time, sampling regions of conformation
space expected for canonical B-DNA (
Figure 1). Detailed analysis
of root mean square deviation (RMSD) plots demonstrates that
CHARMM27 always yields conformations

1 Å closer to the
fibre B-DNA structure than parmbsc0 (no major differences are
in general found if the reference structure is created by using
crystal-averaged helical parameters for each step;
Figure 1).
In all cases, oscillations in RMSD are smaller for CHARMM27
than for parmbsc0 calculations, suggesting a stiffer simulation
in the first case.

View larger version (67K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 1. RMSd (in Å) of sampled structures from fibre conformation (in red). Smoothed RMSd (in black) from ideal structures created from the average helical crystal parameters for each sequence is also displayed.
|
|
The distribution of helical parameters averaged over the central
16mer of the four sequences show Gaussian-like profiles centred
in positions rather close to those expected from average values
derived from experimental structures (
Figure 2 and
Table 2).
On average, all parameters, except rise, are closer to the values
derived from NMR experiments in solution. Shift, tilt and rise
distributions are almost identical in both force fields, while
small differences are found for the other three helical parameters.
Thus, parmbsc0 calculations show slide distributions centred

–0.5 Å (close to the value obtained by analysis
of NMR structures), while CHARMM27 distributions are centred
at 0.0 Å, closer to the estimate obtained from X-ray data.
Both force fields give roll distributions centred at positive
values: parmbsc0 around 3.5°, and CHARMM27 around 6°,
parmbsc0 being closer to the experimental estimates. Finally,
MD simulations yield average twist around 33 (parmbsc0) or 34°
(CHARMM27), while experimental data suggest slightly larger
values (35.5 from X-ray data and 34.5 from NMR experiments).
Considering the range of uncertainties in the averaged experimental
measures (arising from packing effect, low resolution, incomplete
sampling, reduced hydration, etc.) noted in the standard deviations
(
Table 2) we can conclude that both force fields provide a quite
reasonable distribution of helical parameters.

View larger version (50K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 2. Distribution of base pair step helical parameters averaged, for each snapshot, over the central 16-bp portion of each sequence. Translational parameters are in angstroms and rotational parameters in degrees. Values for parmbsc0 simulations are represented by lines; CHARMM27 values correspond to lines with points. For comparison, sequence-dependent average values based on crystal data (black) and values averaged over all available nmr data (magenta) are shown as straight vertical lines with 1 SD confidence intervals (dotted lines). Histograms were constructed using 50 bins between the maximum and minimum values for each parameter.
|
|
In order to complement the study of the global shape of the
duplexes we analysed the groove geometries (
81) obtained in
both series of simulations (
Figure 3). Groove geometry is especially
important since it determines the ability of the duplex to be
recognized by ligands (either small drugs or proteins). Analysis
of the average groove distribution for the four sequences illustrates
probably the most significant difference between parmbsc0 and
CHAMM27 calculations. Thus, parmbsc0 groove widths are centred
at 19 Å (major) and 12–13 Å (minor), with
a major–minor width difference around 6 Å. In CHARMM27
calculations, the major groove is narrower (around 17 Å)
and the minor is wider (around 13–14 Å), reducing
then major–minor asymmetry to only 3–4 Å.
It is worth to note that X-ray data suggest widths around 17
(major) and 11 Å (minor), values that are enlarged by

1 Å if NMR data are considered; irrespective of the source
of experimental information, the major–minor difference
is around 6 Å, closer to parmbsc0 values. Quite surprisingly,
the different geometry of the grooves in parmbsc0 and CHARMM27
simulations leads to only small changes in the predicted pattern
of interactions of the duplexes with cations (
Figure 4), which
suggest that in general reasonably similar information on DNA
interactions could be obtained for parmbsc0 and CHARMM27 simulations.

View larger version (69K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 3. Evolution of groove widths (in angstroms) in time for parmbsc0 and CHARMM27 simulations. Average experimental values are shown as horizontal straight lines (crystal: black, NMR: blue and magenta). Groove widths were calculated as P-P distances according to El Hassan and Calladine's work (81).
|
|

View larger version (136K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 4. Classical interaction potential (CMIP, in purple) of average parmbsc0 and CHARMM27 DNA conformations. Grid was constructed with a 0.5-Å spacing. The contour shown corresponds to –5 kcal/mol level. CMIP distributions obtained for idealized conformations built by using sequence-dependent average base pair step parameters from the X-ray ensemble are shown as reference. SEQ1: top left, SEQ2: top right, SEQ3: bottom left, SEQ4: bottom right.
|
|
In summary, MD trajectories performed using either parmbsc0
or CHARMM27 force field yield to a reasonable representation
of the expected solution geometry of B-DNA duplexes. The subtle
differences found favour in some cases CHARMM27 (average twist
and slide closer to current available experimental data) and
in others to parmbsc0 (roll and groove asymmetry), but in any
case we must emphasize that the differences between force fields
are rather small, and taking all the results together, we should
conclude that both force fields are giving general duplex geometries
of enough quality for most modelling studies.
Sequence-dependent structural properties
We computed helical coordinates for all the 10 unique steps in the four sequences in both parmbsc0 and CHARMM27 trajectories. Results (Table 2) show a generally good agreement in the sequence trends for the different helical coordinates. This is especially remarkable for the two helical parameters, which on average display the largest divergence between force fields: roll and twist (Figure 5). Also worth noting is the general agreement between force-field estimates and experimental trends (Table 2 and Figure 5), which reinforces our confidence in simulations with current force fields. Not surprisingly, the most significant differences between parmbsc0, CHARMM27 and experimental values are found for d(CA) and d(TA), the pyrimidine–purine steps known for their exceptional flexibility.
Backbone geometry and flexibility
A reasonable helical structure does not always guarantee correct
backbone geometry. Thus, we analysed in detail the three major
elements of flexibility in the duplex: (i)
sugar puckering,
(ii)
rotations around
/
torsions and (iii)
rotations around
/
torsions. Both force fields suggest the South and South-East
(puckering annotation was done by dividing the pseudo-rotational
circle in four equivalent sections; North: [(315:45°], East:
[45:135°], South: [135:225°], West: [225:315°] conformations
as those dominating sugar conformational space (
Figure 6), in
agreement with all available experimental data (
82). CHARMM27
and parmbsc0 distributions of the phase or the

angles show
maxima at identical positions, and the only difference is that
CHARMM27 distributions are slightly narrower and that temporary
visits to the North-puckering region are even less common in
CHARMM27 than in parmbsc0 simulations. The concerted rotation
around

/

torsions generates two major conformers: B
I and B
II (
83), which are experimentally known to co-exist in a ratio
around 80(B
I):20(B
II) in B-DNA (
84) [for definition of these
two substates see Hartmann
et al. (
83)]. As seen in
Figure 7 both force fields detect the existence of these two subpopulations
showing quite fast and frequent B
I
B
II transitions. Detailed
comparison of population plots demonstrates that again CHARMM27
is more rigid and leads on average to a smaller population of
B
II conformer [around 90(BI):10(BII)] than parmbsc0 [85(BI):15(BII)].
Finally, the third degree of flexibility for DNA backbone originates
from concerted

/

rotations, which generate non-canonical local
conformations leading to a reduced twist and which are important
in the formation of several protein–DNA complexes (
85).
Our parmbsc0 calculations suggest that non-canonical

/

conformers
represent around 1.5% of the population of dinucleotide backbone
conformations, in good agreement with experimental measurements
(
86), which means that for the central 16-mer duplex we can
expect frequent (around 40%) snapshots showing at least one

/

pair of torsions in the non-canonical region. On the contrary,
CHARMM27 trajectories are quite fixed at canonical values (
Figure 8)
and the population of non-canonical

/

conformers is much smaller
than experimentally predicted. However, we should bear in mind
that in any case non-canonical forms represent only a small
fraction of the

/

conformational space, and accordingly the
impact of this CHARMM27/parmbsc0 discrepancy is expected to
be moderate in the study of relaxed duplex geometry (it might
be not so irrelevant in the study of protein–DNA complexes).

View larger version (48K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 6. Left: population (in percentage) of nucleotides with South puckering for the four oligonucleotides in parmbsc0 and CHARMM27 simulations. Right: histogram of the backbone torsion angle using 72 bins between –180 and 180°. Results corresponding to parmbsc0 simulations are drawn with lines and CHARMM27 ones with lines and points. Crystal values are shown as reference (dotted black line).
|
|
In summary, distributions of backbone torsions obtained in CHARMM27
and parmbsc0 simulations are quite similar, except for a general
tendency of the CHARMM27 force field to stick more firmly to
regions around canonical values, removing or making less prevalent
non-canonical transitions of potential biological impact. However,
we can affirm again that for most studies, both force fields
provide a close enough picture of the backbone conformational
space and that such a picture is of a quality close to what
can be experimentally derived.
Global and sequence-dependent deformability
The global flexibility of DNA is dominated by untwisting and bending movements which are present in the first essential deformation modes of DNA (see deformation movies at: http://mmb.pcb.ub.es/~raist/CONSENSUS). Comparison of frequencies assigned to the first modes of the four sequences show that sequence-induced and force –field-induced variability are not much different (Figure 9) and that in general, for a given sequence the frequencies assigned to the first modes are slightly lower in parmbsc0 than in CHARMM27 calculations. As expected from the frequency profile, the average of polymer entropies computed with parmbsc0 are slightly larger (5%) than those obtained from CHARMM27 samplings (Table 3), but the difference is not much larger than that introduced by the sequence and probably in the range of the noise introduced by the limited length of current trajectories (60). Thus, we can conclude that while parmbsc0 is more flexible in terms of lower frequency movements, the overall conformational space visited in CHARMM27 simulations is not dramatically smaller than that explored in parmbsc0 calculations.

View larger version (34K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 9. Frequency (in cm–1) of the first 35 collective modes for the four sequences considered here. Results corresponding to parmbsc0 simulations are drawn with lines and CHARMM27 ones with lines and points. For extension of the graph to higher modes see Figure S1.
|
|
View this table:
[in this window]
[in a new window]
|
Table 3. Entropies (in kcal/mol) for 100-ns simulation time for all atoms and backbone of the four sequences considered here (central 16mer)
|
|
Once the relative stiffness of the softer deformation modes
in CHARMM27 and parmbsc0 is determined, we need to verify whether
or not the type of essential deformations sampled spontaneously
by the two force fields is similar. For this purpose, we computed
relative similarity indexes and the associated
Z-scores considering
both all atoms and backbone-only
representations.
Tables 4 and
5 show high similarity indexes
(around 0.7), with very large associated
Z-scores, which emphasizes
the statistical confidence in the similarity between both sets
of trajectories. More interestingly, if only backbone atoms
are considered, similarity indexes rise to 0.8 and become of
the same order than those obtained when trajectories for different
sequences obtained with the same force field are compared. In
summary, we can conclude that despite differences in the stiffness
of the first modes, the general deformation pattern of B-DNA
is described quite similarly by both force fields, being dominated
by global bending and untwisting movements (
77).
View this table:
[in this window]
[in a new window]
|
Table 4. Relative similarity indexes [ ; see Equation (6)] between parmbsc0 and CHARMM27 trajectories considering all atoms
|
|
View this table:
[in this window]
[in a new window]
|
Table 5. Relative similarity indexes [ ; see Equation (6)] between indexes obtained by comparing different sequences (only backbone)
|
|
Stiffness analysis at the base pair step level allowed us to
obtain sequence-dependent helical stiffness parameters (see
Methods section) which can then be used for mesoscopic simulations
of very long pieces of B-DNA (
87) or to estimate the indirect
readout component of protein–DNA interactions (
57,
58).
Results in
Table 6 illustrate the deformation parameters obtained
by averaging individual estimates for base pair steps of the
same sequence in the different oligomers. Quite interestingly,
despite CHARMM27 force constants being in general higher than
those obtained by parmbsc0 (illustrating the stiffer nature
of backbone torsional profiles), the average difference is only
15%, and for many helical parameters CHARMM27 and parmbsc0 quite
close to each other. In relative terms, the two force fields
suggest the following decreasing order of stiffness for translational
movements: rise>>slide>shift. The values which Olson
and co-workers (
39) derived from crystal structures of protein-bound
DNAs exhibit rise stiffness much greater than that of slide
and shift, but no clear ordering between shift and slide. The
rigidity ordering of rotational distortions is less clearly
defined, since parmbsc0 predicts the stiffness order: tilt

twist

roll, CHARMM27 found no simple relationship between tilt and
twist but yields tilt

roll, twist

roll. Olson's data did not
establish a clear ordering between twist and roll, but suggest:
tilt

twist, tilt

roll, which seems to be more consistent with
parmbsc0 ordering (
Table 6).
View this table:
[in this window]
[in a new window]
|
Table 6. Sequence-dependent dinucleotide force constants associated with the deformation of a single helical degree of freedom, computed from parmbsc0 and CHARMM27 simulations
|
|
Analysis of helical stiffness parameters for each unique base
pair step confirms the very remarkable similarity between parmbsc0
and CHARMM27 estimates of the sequence-dependent stiffness of
B-DNA (
Table 6 and
Figure 10) in aqueous solution and the quite
unexpected similarity with Olson's estimates, which were obtained
from the analysis of a reduced set (which warns against the
statistical quality in estimates of some steps) of protein-induced
distortions in crystals (which grew and were diffracted at typically
low or very low temperatures). Analysis of MD simulation data
suggests a quite significant dependence of stiffness on sequence,
which again highlights the intrinsic shortcomings of sequence-independent
macroscopic models of DNA flexibility. In general, our simulations
strongly suggest that some steps like CG, CA and TA appear as
ideal hinge points for global DNA bending and
twisting, while AT and GC are on the contrary quite stiff points
for these deformations. However, caution is needed in using
these general concepts of rigidity or flexibility
since different base pair steps show different stiffness against
different helical deformations and one step which might be very
difficult to unwind by twist deformation might be on the contrary
easily deformed by modifying the slide of shift. Again, we should
note that the concept of flexibility should be
linked to an exact definition of the deformation explored.

View larger version (21K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 10. Sequence-dependent force constants for helical deformation (translational ones in kcal/mol Å2, rotational ones in kcal/mol degree2) for the different unique steps obtained by averaging individual data for the four sequences in parmbsc0 and CHARMM27 simulations (data from Table). Olson's values are shown for comparison.
|
|
Local nucleobase dynamics
While DNA flexibility mostly manifests itself in movements involving
base pairs as basic units, the individual flexibility of nucleobases
is required for some biological processes like DNA methylation
or repair (
26,
27,
88,
89). Accordingly, we complement our analysis
of DNA geometry by inspecting the pattern of hydrogen bonding
between nucleobases. Fraying effects leading to the break of
one or both terminal pairs are found in most of the simulations,
even when the terminal pairs are G-C, but the pattern of canonical
hydrogen bonds (defined as the distance between the heteroatoms
involved being <3.5 Å) in the central portion or the
duplexes is quite well preserved (see
Figure 11). Thus, the
three H-bonds of each G-C pair are conserved during 93% (parmbsc0)
or 74% (CHARMM27) of the time, while the two H-bonds of the
A-T pairs are slightly less conserved [74% (parmbsc0) and 67%
(CHARMM27)]. The significant higher fragility of G-C hydrogen
bonds in CHARM27 compared to parmbsc0 can be understood by analysing
the hydrogen bond energy in d(G-C) steps in MD-averaged structures
[–23.5 kcal/mol for CHARMM27, –27.4 kcal/mol for
parmbsc0; CCSD(T)/CBS estimate –28.8 kcal/mol from reference
90]. On average, we found a reversible loss of one of the Watson–Crick
hydrogen bonds every 80 (parmbsc0) and 70 (CHARMM27) ps for
A-T and every 170 (parmbsc0) and 55 (CHARMM27) ps for G-C. Most
disruptions of the ideal pattern of hydrogen bonding is due
to fast opening (<10 ps), which do not yield stable opened
forms and which do not perturb the general structure of the
duplex yielding to co-operative openings. A few long (>1
ns) opening events are found throughout the trajectories (sequence
1 for both parmbsc0 and CHARMM27 and sequence 2 for CHARMM27);
in all cases they correspond to A-T pairs contiguous to terminal
base pairs which are disrupted during the simulation. In one
case (for SEQ2), CHARMM27 yields a base flipping which is not
reversible in the simulation time considered here. This flipping
movement is expected to occur in the millisecond time scale
and accordingly can be considered an artefact of the simulation,
probably originated in the perturbing effect of the fully disrupted
terminal base pair and without much impact on the global dynamics
of the duplex.

View larger version (36K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 11. Number of conserved canonical Watson and Crick hydrogen bonds (end pairs were excluded from this analysis) for the four sequences (TOP to BOTTOM: SEQ1 to SEQ4) in parmbsc0 and CHARMM27. Red: total number of hydrogen bonds. Blue: A-T hydrogen bonds. Green: C-G hydrogen bonds.
|
|
In summary, both force fields preserve reasonably well the pattern
of canonical hydrogen bonding in the central portion of the
duplex. Hydrogen bonds detected by parmbsc0 seem to be stronger
than those detected by CHARMM27, the difference being especially
noticeable for the G-C pairs. Temporary reversible loss of hydrogen
bonding is common and fast according to both force fields, but
no base flipping transitions are detected except for one CHARMM27
simulation, where the disrupted pair is contiguous to a terminal
base pair.
Comparison with previous AMBER force fields
Most MD simulations in the literature have been performed with older AMBER force fields, in particular with parm94 [including an extensive recent study (37,38)], and parm99 (53,64). Unfortunately, as noted elsewhere (64), both force fields lead to severely distorted geometries in the >10-ns range, which implies that their use should be avoided in studies such as the present one. However, for the sake of completeness we analysed the behaviour of parm94/99 force fields in the ideal limit of no
/
transitions (see Methods section). The removal of
/
transitions produces parm99* trajectories with average helical properties similar to parmbsc0 and not two different groove geometries; on the contrary, parm94* calculations lead to slide and twist distributions which are farther from experimental values, yielding wider than expected grooves and an overestimation of groove asymmetry (see Supplementary Table 1 and Figures S1 and S2). Backbones show reduced flexibility compared to parmbsc0 in terms of puckering (parm99*, see Figure S3) and BI/BII equilibrium (parm94*, see Figure S4). Interestingly, very similar stiffness parameters are in general found for parmbsc0 and parm99*, while parm94* mostly predicts the twist stiffness greater than that of tilt (53), in qualitative disagreement with parmbsc0 and Olson's estimates (see Supplementary Table 2 and Figure S5). The ordering of stiffness for translational deformations, rise>>slide>shift, is uniformly followed by parmbsc0, charmm27, parm94*, parm99* and older simulation data (53), but only partially by Olson's values. Finally, we emphasize the similarity in sequence-dependent stiffness profiles between parmbsc0 and the two other corrected AMBER force fields. In summary, our results here suggest that in the absence of
/
transitions (either fortuitous, due to the limited length of simulation, or to the use of a posteriori correction) old AMBER force fields, if corrected for
/
transitions provide also a reasonable representation of B-DNA structure and dynamics, supporting then the quality of a large number of previous published works (see above).
Present model is based on the dinucleotide model which assumes that elastic properties can be modelled considering only near neighbours, neglecting then environment-effects. This can be a source or potential uncertainties, since elastic properties of a dinucleotide step d(XY) might depend on the nature of the flanking sequences d(A ... ZXYA' ... Z'). In order to have a rough estimate of the level of error implicit to our averaged stiffness parameters we explored the dispersion of stiffness parameters for step d(XY) depending on the nature of the flanking bases A and B in the tetramers (37,38) d(AXYB) sampled during trajectories. Results displayed in Supplementary Table S3 demonstrate that in general the impact of neighbouring steps is not dramatic as standard deviations (associated to variation of flanking bases) account typically for <10% the value of the stiffness parameter. Twist and slide seem to be the helical coordinates whose stiffness parameters are more dependent on neighbouring effects, which seem to affect not equally to all steps [for example stiffness of d(CC) steps seems to be particularly dependent on flanking bases]. Table S3 suggests that in general CHARMM27 has a slightly lower neighbour dependence than parmbsc0, but in any case values are quite close (see Supplementary Table S3), confirming again the remarkable robustness of MD simulations to changes in the force field.
 |
FINAL REMARKS
|
|---|
The field of MD simulations of nucleic acids is reaching its
maturity and microsecond-long simulations are becoming possible,
which allows us to obtain a more complete dynamic picture of
DNA flexibility. The area has been dominated by two families
of force fields, AMBER and CHARMM. Comparison studies (typically
based on very short trajectories) are scarce and often overemphasize
small differential points which can favour one force field over
the other. However, in this paper, based on very long trajectories
on a significant number of long duplexes, we found that there
is much more agreements than differences between the structural
and dynamical view of B-DNA provided by both force fields. We
demonstrate that it is possible to arrive at a high-quality
consensus picture of the basic structural dynamics characteristics
of B-DNA, drawing an atlas which can help experimentalists in
areas such as molecular biophysics, molecular biology and bio-nanotechnology.
 |
SUPPLEMENTARY DATA
|
|---|
Supplementary Data are available at NAR Online.
 |
ACKNOWLEDGEMENTS
|
|---|
Calculations were performed on the
MareNostrum supercomputer
at the Barcelona Supercomputer Center, and on EPFL clusters.
The authors thank Leesa Heffler for help with the parm94*/99*
simulations. This work has been supported by the Spanish Ministry
of Education and Science (BIO2006-01602 and Consolider supercomputation
for E-Science), the National Institute of Bioinformatics (Structural
Bioinformatics Node) and the Fundación Marcelino Botín.
Further support came from the Swiss National Science Foundation
(project no. 205320-103833/1). Funding to pay the Open Access
publication charges for this article was provided by the University
of Barcelona.
Conflict of interest statement. None declared.
 |
REFERENCES
|
|---|
- Bajic VB, Seah SH. Dragon Gene Start Finder: an advanced system for finding approximate locations of the start of gene transcriptional units. Genome Res (2003) 13:1923–1929.[Abstract/Free Full Text]
- Down TA, Hubbard TJP. Computational detection and location of transcription start sites in mammalian genomic DNA. Genome Res (2002) 12:458–461.[Abstract/Free Full Text]
- Gross SS, Brent MR. Using multiple alignments to improve gene prediction. J. Comput. Biol (2006) 13:379–393.[CrossRef][ISI][Medline]
- Knudsen S. Promoter2.0: for the recognition of PolII promoter sequences. Bioinformatics (1999) 15:356–361.[Abstract/Free Full Text]
- Ponger L, Mouchiroud D. CpGProD: identifying CpG islands associated with transcription start sites in large genomic mammalian sequences. Bioinformatics (2002) 18:631–633.[Abstract/Free Full Text]
- Solovyev VV, Shahmuradov IA. PromH: promoters identification using orthologous genomic sequences. Nucleic Acids Res (2003) 31:3540–3545.[Abstract/Free Full Text]
- Korf I, Flicek P, Duan D, Brent MR. Integrating genomic homology into gene structure prediction. Bioinformatics (2001) 17(Suppl 1):S140–S148.[Abstract]
- Brown RH, Gross SS, Brent MR. Begin at the beginning: predicting genes with 5' UTRs. Genome Res (2005) 15:742–747.[Abstract/Free Full Text]
- Feder ME, Mitchell-Olds T. Evolutionary and ecological functional genomics. Nat. Rev. Genet (2003) 4:651–657.[CrossRef][ISI][Medline]
- Pedersen AG, Jensen LJ, Brunak S, Staerfeldt HH, Ussery DW. A DNA structural atlas for Escherichia coli. J. Mol. Biol (2000) 299:907–930.[CrossRef][ISI][Medline]
- Ponomarenko JV, Ponomarenko MP, Frolov AS, Vorobyev DG, Overton GC, Kolchanov NA. Conformational and physicochemical DNA features specific for transcription factor binding sites. Bioinformatics (1999) 15:654–668.[Abstract/Free Full Text]
- Pedersen AG, Baldi P, Chauvin Y, Brunak S. The biology of eukaryotic promoter prediction - a review. Comput. Chem (1999) 23:191–207.[CrossRef][ISI][Medline]
- Pedersen AG, Baldi P, Chauvin Y, Brunak S. DNA structure in human RNA polymerase II promoters. J. Mol. Biol (1998) 281:663–673.[CrossRef][ISI][Medline]
- Ohler U, Niemann H, Liao G, Rubin GM. Joint modeling of DNA sequence and physical properties to improve eukaryotic promoter recognition. Bioinformatics (2001) 17(Suppl 1):S199–S206.[Abstract]
- Kanhere A, Bansal M. Structural properties of promoters: similarities and differences between prokaryotes and eukaryotes. Nucleic Acids Res (2005) 33:3165–3175.[Abstract/Free Full Text]
- Florquin K, Saeys Y, Degroeve S, Rouze P, Van de Peer Y. Large-scale structural analysis of the core promoter in mammalian and plant genomes. Nucleic Acids Res (2005) 33:4255–4264.[Abstract/Free Full Text]
- Goni JR, de la Cruz X, Orozco M. Triplex-forming oligonucleotide target sequences in the human genome. Nucleic Acids Res (2004) 32:354–360.[Abstract/Free Full Text]
- Goni JR, Perez A, Torrents D, Orozco M. Determining promoter location based on first-principles calculations. Genome Biol (2007) 8:R263.[CrossRef][Medline]
- Goni JR, Vaquerizas JM, Dopazo J, Orozco M. Exploring the reasons for the large density of triplex-forming oligonucleotide target sequences in the human regulatory regions. BMC Genomics (2006) 7:1–10.[CrossRef][ISI][Medline]
- Dempsey LA, Sun H, Hanakahi LA, Maizels N. G4 DNA binding by LR1 and its subunits, nucleolin and hnRNP D, a role for G-G pairing in immunoglobulin switch recombination. J. Biol. Chem (1999) 274:1066–1071.[Abstract/Free Full Text]
- Wasserman SA, Cozzarelli NR. Biochemical topology - applications to DNA recombination and replication. Science (1986) 232:951–960.[Abstract/Free Full Text]
- Oleksi A, Blanco AG, Boer R, Uson I, Aymami J, Rodger A, Hannon MJ, Coll M. Molecular recognition of a three-way DNA junction by a metallosupramolecular helicate. Angew. Chem. Int. Ed (2006) 45:1227–1231.[CrossRef]
- Lilley DMJ. The inverted repeat as a recognizable structural feature in supercoiled DNA-molecules. Proc. Natl Acad. Sci. USA Biol. Sci (1980) 77:6468–6472.[CrossRef]
- Lilley DMJ, Gough GW, Hallam LR, Sullivan KM. The physical-chemistry of cruciform structures in aupercoiled DNA-molecules. Biochimie (1985) 67:697–706.[Medline]
- Peter BJ, Ullsperger C, Hiasa H, Marians KJ, Cozzarelli NR. The structure of supercoiled intermediates in DNA replication. Cell (1998) 94:819–827.[CrossRef][ISI][Medline]
- Sancar A. DNA excision repair. Annu. Rev. Biochem (1996) 65:43–81.[CrossRef][ISI][Medline]
- Vanhouten B. Nucleotide excision repair in Escherichia-coli. Microbiol. Rev (1990) 54:18–51.[Abstract/Free Full Text]
- Birney E, Stamatoyannopoulos JA, Dutta A, Guigo R, Gingeras TR, Margulies EH, Weng ZP, Snyder M, Dermitzakis ET, Stamatoyannopoulos JA, et al. Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature (2007) 447:799–816.[CrossRef][Medline]
- Feingold EA, Good PJ, Guyer MS, Kamholz S, Liefer L, Wetterstrand K, Collins FS, Gingeras TR, Kampa D, Sekinger EA, et al. The ENCODE (ENCyclopedia of DNA elements) Project. Science (2004) 306:636–640.[Abstract/Free Full Text]
- Perez A, Noy A, Lankas F, Luque FJ, Orozco M. The relative flexibility of B-DNA and A-RNA duplexes: database analysis. Nucleic Acids Res (2004) 32:6144–6151.[Abstract/Free Full Text]
- Watson JD, Crick FHC. Molecular structure of nucleic acids – a structure for deoxyribose nucleic acid. Nature (1953) 171:737–738.[CrossRef][Medline]
- Prevost C, Louisemay S, Ravishanker G, Lavery R, Beveridge DL. Persistence analysis of the static and dynamic helix deformations of DNA oligonucleotides - application to the crystal-structure and molecular-dynamics simulation of D(Cgcgaattcgcg)2. Biopolymers (1993) 33:335–350.[CrossRef][ISI][Medline]
- Zhurkin VB, Ulyanov NB, Gorin AA, Jernigan RL. Static and statistical bending of DNA evaluated by Monte-Carlo simulations. Proc. Natl Acad. Sci. USA (1991) 88:7046–7050.[Abstract/Free Full Text]
- Beveridge DL, McConnell KJ. Nucleic acids: theory and computer simulation, Y2K. Curr. Opin. Struct. Biol (2000) 10:182–196.[CrossRef][ISI][Medline]
- Cheatham TE III, Kollman PA. Molecular dynamics simulation of nucleic acids. Annu. Rev. Phys. Chem (2000) 51:435–471.[CrossRef][ISI][Medline]
- Orozco M, Perez A, Noy A, Luque FJ. Theoretical methods for the simulation of nucleic acids. Chem. Soc. Rev (2003) 32:350–364.[CrossRef][Medline]
- Beveridge DL, Barreiro G, Byun KS, Case DA, Cheatham TE, Dixit SB, Giudice E, Lankas F, Lavery R, Maddocks JH, et al. Molecular dynamics simulations of the 136 unique tetranucleotide sequences of DNA oligonucleotides. I. Research design and results on d(C(p)G) steps. Biophys. J (2004) 87:3799–3813.[Abstract/Free Full Text]
- Dixit SB, Beveridge DL, Case DA, Cheatham TE, Giudice E, Lankas F, Lavery R, Maddocks JH, Osman R, Sklenar H, et al. Molecular dynamics simulations of the 136 unique tetranucleotide sequences of DNA oligonucleotides. II: Sequence context effects on the dynamical structures of the 10 unique dinucleotide steps. Biophys. J (2005) 89:3721–3740.[Abstract/Free Full Text]
- Olson WK, Gorin AA, Lu XJ, Hock LM, Zhurkin VB. DNA sequence-dependent deformability deduced from protein-DNA crystal complexes. Proc. Natl Acad. Sci. USA (1998) 95:11163–11168.[Abstract/Free Full Text]
- Hagerman PJ. Straightening out the bends in curved DNA. Biochim. Biophys. Acta (1992) 1131:125–132.[Medline]
- Crothers DM, Drak J, Kahn JD, Levene SD. DNA bending, flexibility, and helical repeat by Cyclization Kinetics. Methods Enzymol (1992) 212:3–29.[ISI][Medline]
- Levitt M. Computer simulation of DNA double-helix dynamics. Cold Spring Harb. Symp. Quant. Biol (1983) 47(Pt 1):251–262.[ISI][Medline]
- Cheatham TE, Miller JL, Fox T, Darden TA, Kollman PA. Molecular-dynamics simulations on solvated biomolecular systems - the particle mesh Ewald method leads to stable trajectories of DNA, RNA, and proteins. J. Am. Chem. Soc (1995) 117:4193–4194.[CrossRef][ISI]
- Darden T, York D, Pedersen L. Particle mesh Ewald - an N.Log(N) method for Ewald sums in large systems. J. Chem. Phys (1993) 98:10089–10092.[CrossRef]
- Cornell WD, Cieplak P, Bayly CI, Gould IR, Merz KM, Ferguson DM, Spellmeyer DC, Fox T, Caldwell JW, Kollman PA. A 2Nd generation force-field for the simulation of proteins, nucleic-acids, and organic-molecules. J. Am. Chem. Soc (1995) 117:5179–5197.[CrossRef][ISI]
- Cheatham TE, Young MA. Molecular dynamics simulation of nucleic acids: successes, limitations, and promise. Biopolymers (2000) 56:232–256.[CrossRef][Medline]
- Cheatham TE. Simulation and modeling of nucleic acid structure, dynamics and interactions. Curr. Opin. Struct. Biol (2004) 14:360–367.[CrossRef][ISI][Medline]
- Giudice E, Lavery R. Simulations of nucleic acids and their complexes. Acc. Chem. Res (2002) 35:350–357.[CrossRef][ISI][Medline]
- Sponer J, Lankas F, eds. Computational Studies of RNA and DNA (2006) Dordrecht: Springer.
- Fadrna E, Spackova N, Stefl R, Koca J, Cheatham TE, Sponer J. Molecular dynamics simulations of guanine quadruplex loops: Advances and force field limitations. Biophys. J (2004) 87:227–242.[Abstract/Free Full Text]
- Lankas F, Cheatham TE, Spackova N, Hobza P, Langowski J, Sponer J. Critical effect of the N2 amino group on structure, dynamics, and elasticity of DNA polypurine tracts. Biophys. J (2002) 82:2592–2609.[Abstract/Free Full Text]
- Lankas F, Sponer J, Hobza P, Langowski J. Sequence-dependent elastic properties of DNA. J. Mol. Biol (2000) 299:695–709.[CrossRef][ISI][Medline]
- Lankas F, Sponer J, Langowski J, Cheatham TE. DNA basepair step deformability inferred from molecular dynamics simulations. Biophys. J (2003) 85:2872–2883.[Abstract/Free Full Text]
- Gonzalez O, Maddocks JH. Extracting parameters for base-pair level models of DNA from molecular dynamics simulations. Theor. Chem. Acc (2001) 106:76–82.
- Coleman BD, Olson WK, Swigon D. Theory of sequence-dependent DNA elasticity. J. Chem. Phys (2003) 118:7127–7140.[CrossRef]
- Becker NB, Everaers R. From rigid base pairs to semiflexible polymers: coarse-graining DNA. Phys. Rev. E (2007) 76. 021923.
- Becker NB, Wolff L, Everaers R. Indirect readout: detection of optimized subsequences and calculation of relative binding affinities using different DNA elastic potentials. Nucleic Acids Res (2006) 34:5638̵