Nucleic Acids Research Advance Access originally published online on June 13, 2008
Nucleic Acids Research 2008 36(Web Server issue):W216-W222; doi:10.1093/nar/gkn367
Nucleic Acids Research, 2008, Vol. 36, No. suppl_2 W216-W222
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
LocalMove: computing on-lattice fits for biopolymers
Y. Ponty,
R. Istrate,
E. Porcelli and
P. Clote*
Department of Biology, Boston College, Chestnut Hill, MA 02467, USA
*To whom correspondence should be addressed. Tel: +1 617 552 1332; Fax: +1 617 552 2011; Email: clote{at}bc.edu
Received February 29, 2008. Revised May 21, 2008. Accepted May 22, 2008.
 |
ABSTRACT
|
|---|
Given an input Protein Data Bank file (PDB) for a protein or
RNA molecule,
LocalMove is a web server that determines an on-lattice
representation for the input biomolecule. The web server implements
a Markov Chain Monte-Carlo algorithm with simulated annealing
to compute an approximate fit for either the coarse-grain model
or backbone model on either the cubic or face-centered cubic
lattice.
LocalMove returns a PDB file as output, as well as
dynamic movie of 3D images of intermediate conformations during
the computation. The
LocalMove server is publicly available
at
http://bioinformatics.bc.edu/clotelab/localmove/.
 |
INTRODUCTION
|
|---|
Predicting the structure of biopolymers is one of the most important
and well-studied computational problems of the 20th century—a
problem, that despite enormous advances, remains only partially
solved. In an effort to minimize the number of conformations
to be explored, coarse-grain lattice models (beads on a string)
have been studied by many authors (
1–4), while coarse-grain
off-lattice models have been used in discrete molecular dynamics
(
5). In this article, we present the
LocalMove web server, which
implements a Markov Chain Monte-Carlo (MCMC) algorithm to compute
an approximate cubic or face-centered cubic lattice fit of either
the coarse-grain or backbone model for an input Protein Data
Bank (PDB) (
6) file for a protein or RNA molecule.
Finding a self-avoiding walk on the cubic lattice that minimizes the coordinate root mean square deviation (given sequences
and
of 3D points, the coordinate root mean square deviation, denoted rms or cRMS, is
) with the original PDB file, after normalization to ensure unit distance between successive monomers, is known to be NP-complete (7). Thus various heuristic approaches (8–13) have been proposed to approximately solve this problem, including Hopfield nets, self-consistent field optimization, integer programming (the application of integer programming (13) provides an optimal, not just approximate, solution, however with exponential run time), etc. Unfortunately, none of these methods is publicly available, so that LocalMove is the only publicly available tool for on-lattice fit of biopolymers, allowing users to postprocess certain threading energies (aka knowledge-based potentials) for structure classification and prediction.
The method LocalMove, presented in this article, performs a Monte-Carlo exploration of the on-lattice conformational landscape through a sequence of local moves, which generalize the single-monomer end and corner moves, and the 2-monomer crankshaft moves used in ref. (14) for the cubic lattice. At each step, a measure of similarity, distance root mean square deviation [dRMS (distance root mean square deviation (dRMS) between two conformations
and
is defined by
where
D(
P) = (
di,j) and
D(
Q) = (
ei,j) are the corresponding
distance matrices, where
di,j = ||
pi –
pj|| and
ei,j =
||
qi –
qj||)] is evaluated and the candidate move is either
accepted or stochastically rejected, according to the Metropolis
criterion. Different levels of representation are supported
by
LocalMove, scaling from the coarse-grain monomer model (
C
for amino acids,
C1' for RNA nucleotides or alternatively nucleotide
centers of mass), to all backbone atoms.
LocalMove supports
the cubic and face-centered cubic (FCC) lattices. Various termination
conditions can be defined for the walk.
There appears to be little data on the quality, in terms of coordinate root mean square deviation, cRMS, of on-lattice fits, an exception being the data of Reva et al. (15) for approximate cubic lattice fits of C
-atom traces of proteins from a small representative sample. See Tables 1 and 2 for a comparison of LocalMove with the method of Reva et al. (15,16) on the only published data of on-lattice fits that we could find.
 |
MATERIALS AND METHODS
|
|---|
LocalMove addresses the problem of finding the best on-lattice
fit for the coarse-grain model or backbone model for proteins
and RNA, with a number of parameter choices for the user. Lattice
type can be either the cubic or FCC lattice, described later.
LocalMove applies the Monte-Carlo algorithm (17,18), where energy is defined as follows. Given a conformation P = p1,..., pn, where each pi
R3, define the distance matrix D(P) = (di,j), where di,j is the Euclidean distance between pi and pj. Define the dRMS between two conformations P = p1,...,pn and Q' = q1,...,qn by
where
D(
P) = (
di,j)
and
D(
Q) = (
ei,j) are the corresponding distance matrices. To
determine approximate on-lattice fit, define the energy
E(
C)
of a given lattice conformation by

, where
P0 is the normalized conformation of monomers
C
or
C1' in the coarse-grain model, or backbone atoms, as depicted
in
Figure 1 in the backbone model. The off-lattice conformation
P0 is normalized so that distance between successive atoms is
1.
In
LocalMove, if
C' denotes the temporary conformation obtained
by replacing a
k-monomer segment in the current conformation
C, then
C' becomes the next configuration, provided that
C'
is a self-avoiding walk and either
E(
C')
E(
C) or a random real
z is less than
e– (E(C') – E(C))/RT, i.e. the Metropolis
criterion holds. Details and parameter choices for the user
are suggested below. Algorithmic details, computational experiments
for various parameters and extensive benchmarking will appear
in a companion methods paper in preparation.
Models
Backbone representation
For protein, on-lattice models have historically considered the coarse-grain representation where each residue is represented by a single point, yielding the C
-trace. For proteins, this level of granularity seems reasonable, since the average distance between consecutive C
carbons in proteins extracted from the Nucleic Acid Database (NDB) (19) yields an average of 3.8 Å with a low SD of 0.04 Å. In the case of RNA, a coarse-grain model is less able to capture the essence of an RNA conformation, since the average distance between successive C4' atoms is 6.1 Å with a SD of 0.46 Å. In the case of RNA, the backbone model thus appears to be a better representative of the conformation than is the coarse-grain model.
While it is beyond the scope of the current article to answer the question of choosing the best representation of biopolymers backbone for general on-lattice applications, we tried to offer the user the choice of a suitable representation. Namely, our algorithm extracts a subset of the atoms in the model/chain of interest, and performs its search for the best fit of this selection. The different levels of representation currently supported by LocalMove are:
|
Proteins |
RNA |
|
| Full backbone |
N-C -C |
P-O5' – C5' – C4' – C3' – O3' |
| Coarse-grain |
C or µ |
C1', N, P or µ |
|
where in the RNA coarse-grain model, the user can select among the carbon C1'- or nitrogen N-atom, both adjacent to the glycosidic bond, the backbone phosphorus or the center of mass of the nucleotide, denoted by µ.
Lattices
LocalMove supports the cubic and FCC lattice. The latter, well-known to crystallographers as one of the Bravais lattices, has contact number 12, meaning that each lattice point has 12 immediate neighbors; see Figure 2. Covell and Jernigan have shown that the FCC lattice is the most appropriate 3D lattice for fitting protein C
-atoms as a self-avoiding walk; i.e. cRMS values are smaller for the FCC than for the cubic, body-centered cubic and tetrahedral lattices.

View larger version (15K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 2. Neighbors of a point under various lattice models: (left) 3D cubic lattice, (middle) 3D FCC, (right) numbers of self-avoiding walks of various sizes on FCC. The FCC lattice can be represented as the set of all integral coordinates (x,y,z), such that . If p = (x,y,z) and q = (a,b,c), then p,q are immediate neighbors if , and |x – a|,|y – b|,|z – c| 1. Note that immediate neighbors on the FCC lattice are at Euclidean distance from each other, hence comparisons with PDB data are made after normalization that ensures unit distance between successive monomers.
|
|
Algorithm
Simulated annealing
LocalMove implements the MCMC algorithm, as well as simulated
annealing, and the user can set an initial temperature, terminal
threshold temperature and temperature scaling factor
c (i.e.
temperature is periodically decreased by
T =
c·
T). Alternatively,
a greedy descent (no Metropolis step) and a Fixed Metropolis
probability strategy are implemented.
Three strategies are implemented in LocalMove to choose an initial self-avoiding configuration: Random, a random 3D self-avoiding walk is generated; Straight line; Rounded (greedy). By rounding, we mean a greedy, iterative procedure to place the next monomer (or atom) of a growing chain on the closest lattice point to the previous monomer (or atom), while guaranteeing a self-avoiding walk. If this strategy does not produce a self-avoiding walk, which sometimes happens, then LocalMove chooses a random self-avoiding walk as the initial on-lattice conformation.
LocalMove performs local k-monomer moves, generalizing the move set of
ali et al. (14). Given a current self-avoiding walk p1,...,pn, LocalMove randomly chooses positions i,j, and replaces the intermediate k-monomer walk pi + 1,...,pj – 1, where k = j – i – 1, by a different k-monomer walk p';i + 1,...,p';j – 1 having the same vector difference. Three types of strategies are proposed regarding self-avoidance: strict, where the self-avoidance of the resulting walk is tested in linear time and the move is rejected if the test is failed; local, where only a subset of points adjacent to the insertion point are tested; none, where self-avoidance is not enforced, depending on the option. The relevant parameters handled by LocalMove for such moves are the local move size, the self-avoidance strategy and the strategy for picking a new local move at random.
LocalMove simulations can be stopped for some of the different following reasons: either a limit temperature is bypassed during the simulated annealing; a distance threshold is reached; the maximal number of steps have been performed or the simulation is stalled for too long, leaving few hope for improvement. In the latter, the required improvement over a user-defined period of time can be either relative or absolute.
Additional features
In addition to the features described earlier, our webserver gives its user the possibility to follow in realtime the lattice fitting process. After the beginning of the lattice fitting process, the user's; browser is redirected to a webpage featuring an experiment player based on the popular JMol. Additionally, an email is sent to the user, featuring an unique identifier for the ongoing experiment. By entering this identifier at any time during or after completion of the experiment, the user can access its results or follow its progress. Results are kept until about 1 week after the end of the experiment, and are then deleted. Even if such is the case, the user is proposed to repeat the experiment, using the same parameters or is allowed to modify them in a prefilled version of the webserver form. This allows for a quick and easy modification of an already run experiment.
Finally, movies can be generated automatically after the lattice fitting process is over. To that purpose, snapshots of the molecule are rendered using PyMol each 500 steps of the Monte-Carlo algorithm, and assembled using FFMpeg.
 |
RESULTS
|
|---|
Preliminary results are given in
Tables 1 and
2, to compare
LocalMove (greedy strategy, rounded initial conformation, self-avoiding
walk test for intermediate conformations) with the method of
Reva
et al. (15) (optimal parameters A=10,
T
0.1 – see
p. 7 of ref. (
15)). Although the method of Reva
et al. is clearly
superior for cubic lattice fits, it is not publicly available.
In contrast,
LocalMove provides acceptable approximate lattice
representations for cubic and FCC lattices, for various coarse
grain and backbone models of both protein and RNA.
Table 1 and 2 respectively list the best scores and average scores for cubic lattice fits of 17 protein chains of various sizes. Scores for the method of Reva et al. (15) are values of RMS in lattice units, while those of LocalMove are values of cRMS in lattice units—i.e. PDB files are scaled to have distance 1.0 between successive monomers (or atoms) when superimposing structures. RMS, as measured in ref. (15), is approximately the same as cRMS; however, there is a technical difference, explained as follows. In Reva's; method, a cubic orthonormal lattice is projected onto the C
-trace of a protein, self-consistent field is approximated, followed by dynamic programming. It is unclear from ref. (15), whether the (stochastic) cubic orthonormal lattice is defined from any three randomly chosen orthogonal basis vectors emanating from origin (0,0,0), or whether the origin is randomly chosen as well. In contrast, given the C
traces p1,...,pn and q1,...,qn, BioPython computes cRMS by superimposing the centers of mass, then computing optimal rotation matrix to return the C
-trace r1,...,rn obtained by q1,...,qn by the computed translation and rotation. The value of cRMS is then
|
To illustrate
the stability of our approach, we ran
LocalMove on all RNA models/chains
found in the NDB (
Figure 3). Namely, we fit the backbone atoms
O5',
P,
O3',
C5',
C4',
C3' on the FCC lattice. We rescaled the resulting
models and superimposed them with the original NDB backbone
data, normalized so that adjacent atoms were at distance

, the distance between adjacent lattice points
in the FCC lattice. Superimposition was performed using Biopython
http://biopython.org/. After removal of 17 spurious values,
the
cRMS values obtained when superimposing the 1735 on-lattice
RNA models/chains on (normalized) backbone data from the NDB,
we obtain mean
cRMS is 0.554169 with SD of 0.145392. Similarly,
we obtained
LocalMove fits of backbone atoms (
N,
C
,
C) of monochain
proteins from PDBselect25](
20), a nonredundant protein database,
where pairwise sequence identity is at most 25%. When on-lattice
fits were superimposed on original (normalized) backbone off-lattice
data from PDBselect25, the
cRMS had mean of 0.612181 and SD
of 0.161009.

View larger version (5K):
[in this window]
[in a new window]
[Download PowerPoint slide]
|
Figure 3. (Left) Distribution of cRMS for LocalMove best on-lattice fits for the backbones (O5',P,O3',C5',C4',C3') of 1735 RNA models/chains from the NDB, superimposed with the (normalized) NDB files. Statistics for cRMS: mean is 0.554169, SD is 0.145392, both measured in lattice units. (Right) Distribution of cRMS for LocalMove best on-lattice fits for the backbone (N,C ,C) of 1733 (monochain) proteins from PDBselect25 (20), a nonredundant protein database (pairwise, proteins have at most 25% sequence identity), superimposed with the (normalized) original PDB files. Statistics for cRMS: mean is 0.612181, SD is 0.161009.
|
|
View this table:
[in this window]
[in a new window]
|
Table 1. Comparison of best scores out of 100 runs. Scores are RMS for the optimized method of Reva et al. (15) (A = 10,T 0.1,shells1,2), while remaining scores are cRMS using various strategies LocalMove (See text for distinction between RMS and cRMS.)
|
|
View this table:
[in this window]
[in a new window]
|
Table 2. Comparison of average scores out of 100 runs, for method of Reva et al. (15) and the four strategies of LocalMove, as explained in Table 1
|
|
For these experiments with both NDB and PDBselect25,
LocalMove was run for one million steps, using the greedy (Monte Carlo
with zero probability for Metropolis moves) strategy with at
most three-monomer moves. In this case, the greedy strategy
attempts to minimize
dRMS; i.e.
LocalMove accepts a randomly
proposed
k-monomer move, for
k 
3, provided that the
dRMS score
of the proposed move is lower. The initial on-lattice structure
determined by
rounding. We allow early termination when relative
improvement is <0.01%; i.e. after every 15 000 steps, if
the relative difference between best score and that of an ancestor
15 000 steps prior to current step is less than 0.0001, (recall
that score means
dRMS) then computation terminates. Technically,
this means that we compute whether s
0 –
s1/s
0 < 0.0001,
where
s0 denotes the ancestor score 15 000 steps before and
s1 denotes the current move.
 |
DISCUSSION
|
|---|
In this article, we present a new web server,
LocalMove, capable
of determining approximate on-lattice fits of protein and RNA
3D conformations on the cubic and the FCC lattice (
Figure 4).
LocalMove returns the PDB file of the approximate on-lattice
fit, and interactively displays a dynamic movie of 3D images
of intermediate conformations during the computation. In
Tables 1 and
2, we benchmark
LocalMove against what appears to be the
only publicly available data set for previous on-lattice fits.
To the best of our knowledge, no other method is publicly available
to compute on-lattice fits of protein and RNA molecules. Reva's;
method and most of the earlier methods handle only the cubic
lattice, known not to be optimal for biopolymer folding, while
LocalMove handles cubic and FCC lattices with a variety of coarse
grain and backbone models for both protein and RNA.
We believe that the new server,
LocalMove (
Figure 5), as well
as our previous 3D RNA motif detection server,
DIAL, described
in Ferrè
et al. (
21), will contribute to better detection
and classification of RNA motifs, essential ultimately for predicting
tertiary structure, catalytic sites and function of RNA.
 |
ACKNOWLEDGEMENTS
|
|---|
We would like to thank anonymous referees for helpful comments
and suggestions. The initial MCMC algorithm
LocalMove was designed
by P.C. and implemented in C++ for the undergraduate thesis
of R.I. and E.P., directed by P.C., who then entirely re-wrote
the core of
LocalMove, while Y.P. extended the algorithm and
built the web server, requiring an interface using Python, JavaScript
Jmol and mySQL. Invaluable technical help was provided by staff
bioinformatics programmer, Jason Persampieri.
LocalMove is somewhat
related to the software MoCaPro, written by Sebastian Will under
the direction of P.C. at Ludwig-Maximilians-Universität
München. The latter software generalized the work of

ali
et al. (14); MoCaPro is not publicly available and cannot perform
the computations supported by
LocalMove.
The research of P.C. and Y.P. was supported by National Science Foundati grant DBI-0543506 with PI P.C. Funding to pay the Open Access publication charges for this article was provided by NSF grant DBI-0543506.
Conflict of interest statement. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation.
 |
Footnotes
|
|---|
The authors wish it to be known that, in their opinion, the
first two authors should be regarded as joint First Authors
 |
REFERENCES
|
|---|
- Abe H, Go N. Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers (1981) 20:1013–1031.[CrossRef][Web of Science][Medline]
- Lau K, Dill KA. A lattice statistical mechanics model of the conformational and sequence spaces of proteins. J. Am. Chem. Soc. (1989) 22:3986–3997.
- Hart WE, Istrail SC. Fast protein folding in the hydrophobic-hydrophilic model within three-eighths of optimal. J. Comput. Biol (1996) 3:53–96.[Web of Science][Medline]
- Kihara D, Lu H, Kolinski A, Skolnick J. TOUCHSTONE: an ab initio protein structure prediction method that uses threading-bases tertiary restraints. Proc. Natl Acad. Sci. USA (2001) 98:10125–10130.[Abstract/Free Full Text]
- Dokholyan NV, Buldyrev SV, Stanley HE, Shakhnovich EI. Discrete molecular dynamics studies of the folding of a protein-like model. Fold. Des (1998) 3:577–587.[CrossRef][Web of Science][Medline]
- Berman HM, Battistuz T, Bhat TN, Bluhm WF, Bourne PE, Burkhardt K, Feng Z, Gilliland GL, Iype L, Jain S, et al. The Protein Data Bank. Acta Crystallogr. D. Biol. Crystallogr (2002) 58:899–907.[CrossRef][Medline]
- Manuch J, Gaur DR. Fitting protein chains to cubic lattice is np-complete. J. Bioinform. Comput. Biol (2008) 6:93–106.[CrossRef][Medline]
- Godzik A, Kolinski A, Skolnick J. Lattice representations of globular proteins: How good are they? J. Comput. Chem (1993) 14:1194–1202.[CrossRef][Web of Science]
- Rabow AA, Scheraga HA. Lattice neural network minimization. Application of neural network optimization for locating the global-minimum conformations of proteins. J. Mol. Biol (1993) 232:1157–1168.[CrossRef][Web of Science][Medline]
- Rykunov DS, Reva BA, Finkelstein AV. Accurate general method for lattice approximation of three-dimensional structure of a chain molecule. Proteins Struct. Funct. Genet (1995) 22:100–109.[CrossRef][Web of Science][Medline]
- Park BH, Levitt M. The complexity and accuracy of discrete state models of protein structure. J. Mol. Biol (1995) 249:493–507.[CrossRef][Web of Science][Medline]
- Reva BA, Rykunov DS, Olson AJ, Finkelstein AV. Constructing lattice models of protein chains with side groups. J. Comput. Biol (1995) 2:527–535.[Medline]
- Huang X. Fitting protein chains to lattice using integer programming approach. M.Sc. thesis (2007) School of Computing Science, Simon Fraser University.
- Sali A, Shakhnovich E, Karplus M. How does a protein fold? Nature (1994) 369:248–251.[CrossRef][Medline]
- Reva B, Finkelstein A, Rykunov D, Olson A. Building self-avoiding latice models of proteins using a self-consistent field optimization. Proteins Struct. Funct. Genet (1996) 26:1–8.[CrossRef][Web of Science][Medline]
- Reva BA, Rykunov DS, Finkelstein AV, Skolnick J. Optimization of protein structure on lattices using a self-consistent field approach. J. Comput. Biol (1998) 5:531–538.[Web of Science][Medline]
- Metropolis N, Rosenbluth A, Rosenbluth M, Teller A, Teller E. Equation of state calculations by fast computing machines. J. Chem. Phys (1953) 21:1087–1092.[CrossRef]
- Clote P, Backofen R. Computational Molecular Biology: An Introduction (2000) John Wiley & Sons Ltd. Chichester, pp. 279.
- Berman HM, Olson WK, Beveridge DL, Westbrook J, Gelbin A, Demeny T, Hsieh SH, Srinivasan AR, Schneider B. The nucleic acid database. A comprehensive relational database of three-dimensional structures of nucleic acids. Biophys. J (1992) 63:751–759.[Web of Science][Medline]
- Hobohm U, Sander C. Enlarged representative set of protein structures. Protein Sci (1994) 3:522.[Web of Science][Medline]
- Ferre F, Ponty Y, Lorenz WA, Clote P. DIAL: a web server for the pairwise alignment of two RNA three-dimensional structures using nucleotide, dihedral angle and base-pairing similarities. Nucleic Acids Res (2007) 35. W659–W668.

CiteULike
Connotea
Del.icio.us What's this?