Nucleic Acids Research
A database of macromolecular motions
Introduction
Overall Organization Of The Database
Unique motion identifier
Attributes of a motion
Hierarchical Classification Scheme Based On Size Then Packing
Size classification: fragment, domain, subunit
Packing classification: hinge and shear
Other classification
Annotation Of Evidence Related To The Motion
Levels of annotation and types of experimental information
Inferred motions
Computer Implementation As A Relational Database
Representing Motion Pathways As 'Morph Movies'
Conclusion And Future Directions
Acknowledgements
References
ABSTRACT
INTRODUCTION
Motions of macromolecules (proteins and nucleic acids) are often the essential link between structure and function; that is, motion is frequently the way a structure actually carries out a particular function. Protein motions, in particular, are involved in many basic functions such as catalysis, regulation of activity, transport of metabolites, formation of large assemblies and cellular locomotion. Highly mobile proteins have, in fact, been implicated in a number of diseases, e.g., the motion of gp41 in AIDS and that of the prion protein in scrapie (19,27,45,79,111).
Macromolecular motions are also of intrinsic interest because of their fundamental relationship to the principles of protein and nucleic acid structure and stability. They are, however, among the most complicated biological phenomena that can be studied in great quantitative detail, involving concerted changes in thousands of precisely specified atomic coordinates. Moreover, the time scales of macromolecular motions span more than nine orders of magnitude (from sub-nanosecond loop closures to refoldings lasting more than a second; 26,71,74), which places their study beyond the reach of any single experimental technique or numerical simulation.
Fortunately, it is now possible to study these motions in a database framework, by analyzing and systematizing many of the instances of protein structures solved in multiple conformations. We present here a comprehensive database of macromolecular motions, intended to be of use to those studying structure-function relationships (e.g. as in rational drug design; 64) and also to those involved in large-scale proteome or genome surveys (33,37,59). There are a number of reasons why it is favorable (and feasible) at present to construct such a database. (i) The amount of raw data (known protein and nucleic acid structures and sequences homologous to them) is rapidly increasing (15,48,78), and an increasing fraction of new structures have non-trivial motions (see below). (ii) The graphical and interactive nature of a database is particularly well suited for presenting macromolecular motions, which are often difficult to represent on a static journal page. [This is particularly true because many published papers about interesting motions do not precisely describe the relationship between the motion and specific publicly accessible coordinate files and viewing orientations. That is, many papers do not tell you that, say, the atomic coordinates for the open form have identifier 6LDH and those for the closed form, 1LDM, and that the motion is best viewed when looking down the crystallographic 3-fold after fitting residues 5-90.] (iii) A loose infrastructure of federated databases has emerged in the structural community, allowing the motions database to connect to a variety of information sources (114) (see the list in the legend to Fig. 1).
[Figure 1 panels: (left) database home page; (right) query with 'Calmodulin']
Figure 1. The motions database on the web. (Left) World Wide Web 'home page' of the database. One can type keywords into the small box at the top to retrieve entries. (Right) An entry retrieved by such a keyword search (the entry for calmodulin). Graphics and movies are accessed by clicking on an entry page. (These have been deliberately segregated from the textual parts of the database, since the interface was designed to be easy to use on a low-bandwidth, text-only browser, e.g. lynx or the original www_3.0.) An example of a segregated graphic for calmodulin is the movie shown in Figure 5. The main URL for the database is http://bioinfo.mbb.yale.edu/MolMovDB . Beneath this are pages listing all the current movies, graphics illustrating the use of VRML to represent endpoints, and an automated submission form to add entries to the database. The database has direct links to the PDB for current entries (http://www.pdb.bnl.gov ); the obsolete database for out-of-date entries (http://pdbobs.sdsc.edu ); scop for structure classification (http://scop.mrc-lmb.cam.ac.uk ); Entrez/PubMed for literature citations (http://www.ncbi.nlm.nih.gov/PubMed ); LPFC for core structures (Library of Protein Family Core Structures, http://smi-web.stanford.edu/projects/helix/LPFC ); and GeneCensus for information related to structural genomics (http://bioinfo.mbb.yale.edu/census ) (3,75,95,96). Through these links one can easily connect to other common protein databases, such as Swiss-Prot, PROSITE, CATH, RiboWeb and FSSP (4,7,8,21,47,78). For all these links, PDB identifiers or PubMed unique IDs are used as foreign keys. External databases may also link to entries in the motions database by using PDB identifiers as foreign keys, via the following URL convention: http://bioinfo.mbb.yale.edu/MolMovDB/search.cgi?pdb=1abc , where 1abc is a PDB structure identifier referenced in the motions database.
Furthermore, information on the database's public interface and on linking external resources to it may be obtained at http://bioinfo.mbb.yale.edu/MolMovDB/linkhelp.txt . We are developing transaction-processing features that allow authorized remote experts to serve as database editors and anticipate that these will become an important part of the interface in the future. (This figure, as well as Figs 2-5, is adapted directly from the web presentation of the database, which is copyright, Gerstein and Krebs, 1998).
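Because external resources link in by using a PDB identifier as a foreign key, a linking site only needs to build one URL per structure. The following is a minimal sketch of that, assuming the search.cgi convention given in the Figure 1 legend and the classic four-character PDB identifier format (one digit followed by three alphanumeric characters); the function name motion_search_url is hypothetical, and the 1998 server itself may no longer be reachable.

```python
import re

# Query convention taken from the Figure 1 legend (historical 1998 server).
BASE_SEARCH_URL = "http://bioinfo.mbb.yale.edu/MolMovDB/search.cgi"

# Classic PDB identifiers: one digit followed by three alphanumeric characters.
PDB_ID_PATTERN = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def motion_search_url(pdb_id: str) -> str:
    """Build the motions-database lookup URL for a PDB identifier (hypothetical helper)."""
    pdb_id = pdb_id.strip()
    if not PDB_ID_PATTERN.match(pdb_id):
        raise ValueError(f"not a 4-character PDB identifier: {pdb_id!r}")
    # The identifier is passed through unchanged as the 'pdb' query parameter.
    return f"{BASE_SEARCH_URL}?pdb={pdb_id}"
```

For example, motion_search_url("6LDH") yields the lookup URL for the open form of lactate dehydrogenase mentioned in the Introduction. Validating the identifier before building the link avoids emitting dead links for malformed keys.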
Only one previous attempt has been made at the systematic classification of protein motions. Boutonnet et al. (14) do not present a database but rather develop an automatic tool for classifying proteins. In less directly related work, a data set of protein interfaces has also been developed (108).
OVERALL ORGANIZATION OF THE DATABASE
A public interface to the database exists on the World Wide Web at http://bioinfo.mbb.yale.edu/MolMovDB . Presently, this consists of a set of coupled hypertext pages with graphic images and a simple query box, though more sophisticated interfaces are planned. As shown in Figure
Unique motion identifier
Each entry is organized around a unique motion identifier rather than around individual proteins or nucleic acids, because a single macromolecule can have several motions and the same essential motion can be shared among different macromolecules (see below). (The motion identifier is a short string, such as 'igelbow', that attempts to evoke some characteristic of the motion or protein, in the mnemonic style of the SwissProt identifiers; 7.)
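Keying entries on motions rather than on molecules makes the link between motions and structures many-to-many: one PDB entry can take part in several motions, and one motion can reference several PDB entries. The sketch below models this with a hypothetical record layout whose field names loosely follow the attributes listed in this section (classification, structures, literature, blurb, related motions); it is an illustration, not the actual database schema, and the identifiers in the usage note are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MotionEntry:
    """One entry keyed by a mnemonic motion identifier (hypothetical layout)."""
    motion_id: str                       # short mnemonic string, e.g. 'igelbow'
    classification: str                  # place in the size/packing scheme
    pdb_ids: list[str]                   # structures showing the motion (e.g. open, closed)
    pubmed_ids: list[int] = field(default_factory=list)
    blurb: str = ""                      # free-text description of the motion
    # (relationship, other motion_id), e.g. ('similar-to', ...) or ('part-of', ...)
    related: list[tuple[str, str]] = field(default_factory=list)


def index_by_structure(entries):
    """Invert entries into a PDB id -> set of motion ids map (many-to-many)."""
    index = defaultdict(set)
    for entry in entries:
        for pdb_id in entry.pdb_ids:
            index[pdb_id].add(entry.motion_id)
    return index
```

Usage with placeholder identifiers: MotionEntry("motion_a", "D-h-2", ["1ABC", "2ABC"]) and MotionEntry("motion_b", "F-h-2", ["1ABC"]) share structure 1ABC, so the index maps "1ABC" to both motions and "2ABC" to motion_a only.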
Attributes of a motion
In addition to the motion identifier, each entry has the following information.
(i) Classification. A classification number gives the place of a motion in the size and packing classification scheme for motions described below. In addition to its basic classification, a motion can be annotated as being 'similar-to' or 'sharing-characteristics-with' a motion in a different protein, or as 'part-of' or 'containing' another motion in the same protein. For instance, the motions in all the different bacterial sugar-binding proteins are similar to each other (98,110), and the domain closure in aspartate carbamoyltransferase is clearly part of, and driven by, a larger allosteric transition involving the motion of subunits (103,104).
(ii) Structures. Databank identifiers are given for the various conformations of the macromolecule (e.g. open and closed). These act as foreign keys into other databases. In particular, they have been used to link directly to the entries in the main protein and nucleic acid databases (PDB and NDB), to sequence and journal cross-references via Entrez and MMDB, and to related structures via the Structural Classification of Proteins (SCOP) (3,11,28,46,51,75,96). In the more highly annotated entries, residue selections are given for the main rigid core, for other secondary cores moving rigidly relative to the main core, and for the flexible hinge regions linking the cores.
(iii) Literature. Literature references are given, where possible via Medline unique identifiers, allowing a link to be made into the PubMed database (28,96).
(iv) Blurb. Each entry has a paragraph or so of plain-text documentation. While this is, in a sense, the least precisely defined field, it is the heart of each entry, describing the motion in intelligible prose and referring to figures where appropriate. The rationale behind each motion's classification is discussed here, at least implicitly.
(v) Standardized nomenclature.
For many entries we describe the overall motion using standardized numeric terminology, such as the maximum displacement (overall and of backbone atoms only) and the degree of rotation around the hinge. These statistics are summarized in Table 1. We also attempt to give the transformations [from (ii)] needed to optimally superimpose and orient each coordinate set to best see the motion (i.e. looking down the screw axis), and the selections of residues with large changes in torsion angles, packing efficiency or neighbor contacts.
HIERARCHICAL CLASSIFICATION SCHEME BASED ON SIZE THEN PACKING
Size classification: fragment, domain, subunit
In the classification scheme currently in use, the most basic division is between proteins and nucleic acids. There are far fewer motion entries for nucleic acids than for proteins, reflecting the much larger number of known protein structures. [At the time of writing, the PDB contained more than 6600 protein structures but fewer than 600 nucleic acid structures.]
Table 1.
| Statistic | No. of entries | Minimum | Maximum | Average |
| Maximum Cα displacement (Å) |   | 1.5 | 60 | 12 |
| Maximum atomic displacement (Å) | 3 | 8.8 | 10 | 9.3 |
| Maximum rotation (°) | 12 | 5 | 148 | 24 |
| Maximum translation (Å) | 2 | 0.7 | 2.7 | 1.7 |
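The displacement statistics in Table 1 are, in their simplest form, maxima of per-atom distances between two conformations. A minimal sketch, assuming the two coordinate sets list the same atoms in the same order and have already been superimposed on the motionless core (the fitting step is not shown); the function names and the (atom_name, (x, y, z)) input layout are hypothetical.

```python
import math


def max_displacement(coords_a, coords_b):
    """Largest distance moved by any atom between two superimposed conformations.

    coords_a and coords_b are equal-length lists of (x, y, z) tuples in the
    same atom order.
    """
    if len(coords_a) != len(coords_b):
        raise ValueError("conformations must list the same atoms")
    return max(math.dist(a, b) for a, b in zip(coords_a, coords_b))


def max_ca_displacement(atoms_a, atoms_b):
    """Same statistic restricted to C-alpha atoms; atoms are (name, (x, y, z)) pairs."""
    ca_a = [xyz for name, xyz in atoms_a if name == "CA"]
    ca_b = [xyz for name, xyz in atoms_b if name == "CA"]
    return max_displacement(ca_a, ca_b)
```

For three atoms at (0, 0, 0), (1, 0, 0) and (2, 0, 0) that move to (0, 0, 0), (1, 1, 0) and (2, 0, 3), the per-atom distances are 0, 1 and 3, so max_displacement returns 3.0. Without a prior fit on the core, the result would also include the arbitrary overall placement of the two crystal structures.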
Currently, the database includes the nucleic acid motions evident from comparing various conformations of the known structures of catalytic RNAs and tRNAs (specifically, the hammerhead ribozyme, the P4-P6 domain of the group I intron and Asp-tRNA; 18,81,85,91,97).
The classification scheme for proteins has a hierarchical layout, shown in Figure 2. Nearly all large proteins are built from domains, and domain motions, such as those observed in hexokinase or citrate synthase (10,86), provide the most common examples of protein flexibility (9,39,53). The motion of fragments smaller than domains usually refers to the motion of surface loops, such as those in triose phosphate isomerase or lactate dehydrogenase, but it can also refer to the motion of secondary structures, such as the helices in insulin (2,24,113). Domain and fragment motions often involve portions of the protein closing around a binding site, with a bound substrate stabilizing a closed conformation. They consequently provide a specific mechanism for induced fit in protein recognition (61,62). In enzymes, this closure around a binding site has been analyzed in particular detail (6,57,58,92,106). It serves to position important chemical groups around the substrate, shielding it from water and preventing the escape of reaction intermediates.
Figure 2. Schematic showing the overall classification scheme for motions. (Top) The database is organized around a hierarchical classification scheme, based on size (fragment, domain, subunit) and then packing (hinge or shear). Currently, the hierarchy also contains a third level for whether or not the motion is inferred. (Bottom) Schematic showing the difference between shear (sliding) and hinge motions. This figure is adapted from the database and refs 38 and 39. It is important to realize that the hinge-shear classification in the database reflects only the predominant mechanism, so a motion classified as shear can contain a newly formed interface, and one classified as hinge can have a preserved interface across which there is motion.
The essential characteristics of the various motions are summarized below. To annotate a macromolecule's classification succinctly, a three-letter short-hand code is used.
It designates the major classification (fragment, domain, subunit, complex or nucleic acid), the sub-classification (hinge, shear, allosteric, non-allosteric, RNA or DNA), and whether or not the motion has been solved structurally in at least two conformations. For example, 'D-h-2' indicates a domain hinge motion with at least two conformations solved.
Subunit motion is distinctly different from fragment or domain motion. It affects two large sections of polypeptide that are not covalently connected, and it is often part of an allosteric transition tied to regulation (29,80). For instance, the relative motions of the subunits in the transport protein hemoglobin and the enzyme glycogen phosphorylase change the affinity with which these proteins bind their primary substrates (30,54).
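The three-letter short-hand code described earlier in this section can be split mechanically into its three fields. The following is a minimal parsing sketch. Only 'D' (domain), 'h' (hinge) and '2' (at least two conformations solved) are confirmed by the 'D-h-2' example in the text. The other letters below are assumed initials chosen for illustration, and treating any third field other than '2' as "not solved in two conformations" is also an assumption.

```python
from typing import NamedTuple

# Only 'D' and 'h' are confirmed by the text; the other letters are
# hypothetical initials used for illustration.
MAJOR = {"F": "fragment", "D": "domain", "S": "subunit",
         "C": "complex", "N": "nucleic acid"}
SUB = {"h": "hinge", "s": "shear", "a": "allosteric",
       "n": "non-allosteric", "r": "RNA", "d": "DNA"}


class MotionCode(NamedTuple):
    major: str
    sub: str
    two_conformations_solved: bool


def parse_motion_code(code: str) -> MotionCode:
    """Split a short-hand code such as 'D-h-2' into its three fields."""
    major, sub, solved = code.split("-")  # raises ValueError if not 3 fields
    # Assumption: '2' marks at least two solved conformations; anything else does not.
    return MotionCode(MAJOR[major], SUB[sub], solved == "2")
```

For instance, parse_motion_code("D-h-2") returns MotionCode(major='domain', sub='hinge', two_conformations_solved=True). An unknown letter raises KeyError rather than being silently accepted.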
|   | Shear mechanism | Hinge mechanism |
| Well-packed interfaces | Maintained throughout motion | Not maintained; rather created, burying surface |
| Mainchain packing | Constrained by close packing | Free to kink at hinge |
| Mainchain torsions | Many small changes | A few large changes |
| Motion overall | Concatenation of small local motions | Identical to twisting at hinge |
| Motion at interface | Parallel to plane of interface (shear) | Perpendicular to interface |
| Sidechain packing | Same packing in both forms | New contacts; packing at base of hinge crucial |
| Sidechain torsions | Mostly small changes | Some large changes |
| Simple example | Trp repressor, insulin | Lactoferrin, calmodulin |
Packing classification: hinge and shear
We have systematized the motions of protein domains and smaller units on the basis of packing, using an expanded version of a scheme developed previously (39). We use packing because the tight packing of atoms inside proteins provides one of the most fundamental constraints on protein structure (42,44,68,87-89): it is usually impossible for an atom inside a protein to move much without colliding with a neighboring atom, unless there is a cavity or packing defect (49,50).
Internal interfaces between different parts of a protein are packed very tightly (35,38,39). Furthermore, they are not smooth, but are formed from interdigitating sidechains. Common sense consideration of these aspects of interfaces places strong constraints on how a protein can move and still maintain its close packing. Specifically, maintaining packing throughout a motion implies that the sidechains at the interface must maintain their same relative orientation and pattern of inter-sidechain contacts in both conformations (e.g. open and closed).
These straightforward constraints on the types of motions possible at interfaces allow an individual movement within a protein to be described in terms of two basic mechanisms, shear and hinge, depending on whether or not it involves sliding over a continuously maintained interface (39) (Fig. 2).
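Since the deciding criterion is whether an interface is continuously maintained, one way to make it concrete is to compare the inter-part residue contacts in the two conformations. The sketch below is a hypothetical heuristic, not the procedure used for the database, whose classifications are made by careful manual analysis. It assumes contacts are given as residue-number pairs, measures the fraction of contacts shared by both forms, and uses an arbitrary threshold of 0.5 to pick the predominant mechanism.

```python
def interface_contacts_preserved(contacts_open, contacts_closed):
    """Fraction of inter-part residue contacts present in both conformations.

    Contacts are (residue_i, residue_j) pairs; pair order is normalized so
    (10, 50) and (50, 10) count as the same contact.
    """
    open_set = {tuple(sorted(c)) for c in contacts_open}
    closed_set = {tuple(sorted(c)) for c in contacts_closed}
    union = open_set | closed_set
    if not union:
        return 0.0
    return len(open_set & closed_set) / len(union)


def predominant_mechanism(contacts_open, contacts_closed, threshold=0.5):
    """Label a motion 'shear' if its interface is largely maintained, else 'hinge'.

    The 0.5 threshold is a hypothetical choice for illustration only.
    """
    kept = interface_contacts_preserved(contacts_open, contacts_closed)
    return "shear" if kept >= threshold else "hinge"
```

With a sliding interface whose contacts are almost all shared (3 of 4 contacts present in both forms), the function reports 'shear'; with an interface that is mostly created on closure (1 of 3 shared), it reports 'hinge'. As the Figure 2 legend stresses, this only captures the predominant mechanism.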
Figure 3. Close-up on the shear mechanism. This figure illustrates shear motion in one protein, citrate synthase (39,66). (Top) Representative shear motions between close-packed helices. Note how the mainchain shifts by only a small amount and the sidechains stay in the same rotamer configuration. (Centre, left) Diagram of one subunit of citrate synthase (1CTS), giving an overall view of the protein and showing that it is composed of many helices. The adjacent subunit is related by the 2-fold axis shown. (The small two-stranded sheet is omitted for clarity.) α-Helices are represented by cylinders. The small domain contains helices N, O, P, Q and R. The mobile OP helix is highlighted. (Centre, right) Details of the mobile interfaces. The orientation is perpendicular to the 2-fold axis. The particular section is indicated by the dotted line on the centre-left subfigure. Selected helices from both subunits are shown. (Upper-case letters are for one subunit and lower-case letters for the other.) The helices shown with white lettering on a black background are motionless, while those shown in black on white move appreciably. Edges indicate the existence of helix-helix packing in both the open and closed forms. Double edges denote nearly parallel packing (0-30°); single edges, intermediate packing (30-60°); and dotted edges, crossed packing (60-90° and on-end packing). There is no packing between helices L and N because helices L, M, G and F are much higher (coming out of the page) than O, N, Q, P, R and K. S and I are long and make contacts with both sets. Note in the diagram how the dimer neatly divides into six layers, with the active site, indicated by a star, at the intersection between layers. This is representative of how proteins undergoing shear motions can be divided into layers. Part of one subunit is enlarged at the bottom of the diagram and shows the relative movements of the principal helices in citrate synthase.
The shifts (in Å) and rotations (in degrees) show local changes in the positions of pairs of packed helices (i.e. the movement of one helix in a pair relative to the other). Clearly, larger relative movements tend to be associated with more crossed helix-helix packing. (Bottom) Depiction of how these small motions add together to produce a large overall motion. Specifically, many small motions add up to shift helix O by 10.1 Å and rotate it by 28°. The incremental motion in shear domain closure is shown by Cα traces of the whole protein and of a close-up of the OP loop. Black is the apo form; white, the holo form; gray, the cumulative effect of motion over the K, P and then Q helix-helix interfaces. (The apo form was fit to the holo form, first on the core and then on the K, P and Q helices.)
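The central idea of the shear mechanism, that many small relative motions across successive interfaces compose into a large overall motion, can be illustrated by composing rotation matrices. The following toy sketch simplifies strongly by assuming every step rotates about the same axis (z). Real interfaces rotate about different axes, and then the composition no longer reduces to a simple sum of angles. The step angles used below are hypothetical and were picked only so that they total the 28° overall rotation quoted in the legend.

```python
import math


def rot_z(deg):
    """3x3 rotation matrix for a rotation of `deg` degrees about the z axis."""
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def matmul(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def cumulative_angle(step_angles):
    """Compose successive small rotations and read back the overall angle (degrees)."""
    total = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for deg in step_angles:
        total = matmul(rot_z(deg), total)  # apply the next interface's rotation
    return math.degrees(math.atan2(total[1][0], total[0][0]))
```

With hypothetical steps of 8°, 10° and 10°, cumulative_angle([8.0, 10.0, 10.0]) recovers an overall rotation of 28° (to floating-point precision). No single step is large, yet their concatenation is.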
(ii) Hinge. As shown in Figure
Figure 4. Close-up on the hinge mechanism. The figure shows the hinge motion in lactoferrin (38,39). (Left) Ribbon drawing of the protein in the open conformation. The view is down the screw axis, which is indicated in the figure by the circle with the dot in it. The screw axis passes very close to the hinge region, which occurs in the middle of two β strands (highlighted in bold). (Center left and center right) Open and closed conformations in terms of space-filling slices. A thick black line highlights the hinge region. Note how few packing constraints there are on the hinge in contrast to the other atoms in the protein. (Right) A close-up of the hinge region. (The numbered residues correspond to the open circles in the ribbon drawing.) (Figure adapted from the database and ref. 38.)
Gerstein et al. (36,38,40) analyzed the hinged domain and loop motions in specific proteins (lactate dehydrogenase, adenylate kinase, lactoferrin). These studies emphasized how critical the packing at the base of a protein hinge is, in the same sense that the 'packing' at the base of an everyday door hinge determines whether or not the door can close. Protein hinges are special regions of mainchain in that they are exposed and have few packing constraints on them, and are thus free to kink sharply (Fig. 4). It is important to emphasize that most shear motions do, in fact, contain hinges (joining the various sliding parts), and that the existence of a hinge is not the salient difference between the two basic mechanisms; rather, it is the existence of a continuously maintained interface.
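The hinge and shear mechanisms are distinguished in part by their mainchain torsion signatures: a hinge shows a few large changes, whereas a shear shows many small ones. A first pass at locating candidate hinge residues is therefore to compare backbone φ/ψ angles between the two conformations and flag large changes. The sketch below assumes parallel lists of (phi, psi) pairs in degrees for the same residues, and its 60° cutoff is a hypothetical choice. It handles the 360° periodicity of torsion angles so that, for example, 170° and -170° differ by 20°, not 340°.

```python
def torsion_change(before: float, after: float) -> float:
    """Smallest absolute difference between two angles in degrees (wrap-around aware)."""
    diff = (after - before) % 360.0
    return min(diff, 360.0 - diff)


def large_torsion_changes(phi_psi_a, phi_psi_b, cutoff=60.0):
    """Indices of residues whose phi or psi changes by more than `cutoff` degrees.

    Inputs are parallel lists of (phi, psi) pairs for the same residues in two
    conformations; the default cutoff is a hypothetical value.
    """
    flagged = []
    for i, ((phi1, psi1), (phi2, psi2)) in enumerate(zip(phi_psi_a, phi_psi_b)):
        if max(torsion_change(phi1, phi2), torsion_change(psi1, psi2)) > cutoff:
            flagged.append(i)
    return flagged
```

Residues flagged this way are only candidates; the text's emphasis on packing at the base of the hinge means that a real assignment would also look at how the surrounding atoms pack.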
Other classification
Most of the fragment and domain motions in the database fall within the hinge-shear classification. However, there are a number of exceptions, and we have created special categories to deal with them.
(i) A special mechanism that is clearly neither hinge nor shear accounts for the motion. An example is the immunoglobulin ball-and-socket joint (67), where the motion involves sliding over a continuously maintained interface (like a shear motion) but, because the interface is smooth rather than interdigitating, the motion can be large (like a hinge). (ii) Motion involves a partial refolding of the protein. This usually results in dramatic changes in the overall structure. Examples where both endpoints are known include the motions in the serpins and influenza virus haemagglutinin (17,102). Also included in this category are order-to-disorder transitions (as when a DNA recognition domain becomes ordered upon binding DNA), protein domains that only become structured upon oligomerization (e.g. the leucine zipper dimerization domain), and pro-enzymes that dramatically change shape upon cleavage. (iii) Motion cannot yet be classified. An example is the β-sheet deformation in the TATA-box binding protein (20,56).
For the motions of subunits, a different division (other than hinge or shear) is made:
(i) Allosteric. Examples include hemoglobin and aspartate carbamoyltransferase (30,103,104).
Table 2.
| Mechanism / size | Domain | Fragment | Subunit | Complex | Total |
| Hinge | 38 (51%) | 16 (59%) |   |   | 54 (44%) |
| Shear | 14 (19%) | 3 (11%) |   |   | 17 (14%) |
| Partial refolding | 5 (7%) |   |   |   | 5 (4%) |
| Allosteric |   |   | 8 (57%) |   | 8 (7%) |
| Other/non-allosteric | 2 (3%) | 1 (4%) | 6 (43%) |   | 9 (7%) |
| Unclassifiable | 15 (20%) | 7 (26%) |   | 3 (50%) | 25 (20%) |
| Notably motionless |   |   |   |   | 1 (1%) |
| Nucleic acid |   |   |   | 3 (50%) | 3 (2%) |
| Known[a] / % of category | 53 (72%) | 25 (93%) | 11 (79%) | 5 (83%) | 94 (77%) |
| Suspected / % of category | 21 (28%) | 2 (7%) | 3 (21%) | 1 (17%) | 28 (23%) |
| Total / % of database | 74 (61%) | 27 (22%) | 14 (11%) | 6 (5%) | 122 (100%) |
A breakdown of the categorization of entries in the current database is given in Table 2. At the time of this writing (version 1.71), the database describes 122 macromolecular motions, which reference 249 PDB structures. The hinge mechanism is the most common classification in the database, accounting for 44% of the entries, and over 60% of the motions are classified as domain motions. Interestingly, a greater percentage of fragment motions than domain motions have structures solved in multiple conformations (93% versus 72%), probably reflecting the greater ease with which these smaller motions can be studied experimentally.
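The percentages quoted in this paragraph can be recomputed directly from the counts in Table 2, which is a quick way to keep prose and table consistent as the database grows. A minimal sketch follows. The counts are copied from Table 2, the dictionary layout is hypothetical, and percentages are rounded to the nearest whole number as in the table.

```python
# Counts copied from Table 2 (database version 1.71).
TOTAL_ENTRIES = 122
MECHANISM_TOTALS = {
    "hinge": 54, "shear": 17, "partial refolding": 5, "allosteric": 8,
    "other/non-allosteric": 9, "unclassifiable": 25,
    "notably motionless": 1, "nucleic acid": 3,
}
SIZE_TOTALS = {"domain": 74, "fragment": 27, "subunit": 14, "complex": 6}
KNOWN = {"domain": 53, "fragment": 25, "subunit": 11, "complex": 5}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number, as in the table."""
    return round(100 * part / whole)


def known_share_by_size():
    """Percentage of entries per size class with two or more solved conformations."""
    return {size: percent(KNOWN[size], SIZE_TOTALS[size]) for size in SIZE_TOTALS}
```

Here percent(54, 122) gives 44 for the hinge share, percent(74, 122) gives 61 for domain motions, and known_share_by_size() gives 72 for domains and 93 for fragments, matching the table.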
ANNOTATION OF EVIDENCE RELATED TO THE MOTION
Levels of annotation and types of experimental information
For each entry in the database, we have tried to indicate the evidence behind its description and classification: i.e. is it based on careful manual analysis of two conformations, automatic output of a conformation comparison program, inference based on structure comparison or inference based on sequence comparison? Thus, a clear distinction is made between the carefully documented, 'gold-standard' motion in lactoferrin (i.e. as shown in Fig.
At present, nearly all entries in the motions database are the result of careful manual analysis and classification; thus, the current database is intended to serve as an accurate 'core' around which a much larger, semi-automatically populated database may be constructed. We hope that this attention to the evidence behind each motion will allow the database to grow rapidly without becoming corrupted with false assertions. [It is worth noting that this approach to evidence is not always taken in the annotation of the sequence databanks, which is now leading to problems with the advent of large-scale genome sequencing. For instance, the following often happens: a scientist biochemically and structurally characterizes a particular motif, say a zinc finger, in one protein (protein A). This is added to the database and annotated as a zinc finger. A second investigator sequences another protein (B), does a databank similarity search and finds that this protein is similar to protein A. On this basis, protein B is annotated in the database as a zinc finger. Now a third investigator sequences protein C. This is found to be similar to B and is, consequently, thought to be a zinc finger. Clearly, the chain of evidence is getting much weaker.]
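The zinc-finger scenario above shows why each annotation should record where it came from. The following is a minimal sketch of such provenance tracking, with a hypothetical record layout: every similarity-based annotation keeps a pointer to the annotation it was transferred from, so the number of inference hops back to direct experimental evidence can always be computed and shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Annotation:
    """A label on a protein together with its evidence (hypothetical layout)."""
    protein: str
    label: str                                  # e.g. 'zinc finger'
    evidence: str                               # 'experimental' or 'similarity'
    inferred_from: Optional["Annotation"] = None


def annotate_by_similarity(protein: str, source: Annotation) -> Annotation:
    """Transfer a label from a similar protein, recording where it came from."""
    return Annotation(protein, source.label, "similarity", inferred_from=source)


def inference_depth(annotation: Annotation) -> int:
    """Number of similarity-based hops back to direct experimental evidence."""
    depth = 0
    while annotation.inferred_from is not None:
        depth += 1
        annotation = annotation.inferred_from
    return depth
```

Following the text's example, protein A is characterized experimentally (depth 0), B inherits the label from A (depth 1), and C inherits it from B (depth 2). The weakening chain of evidence is thus explicit rather than hidden.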
Experimental information on macromolecular movements comes from a number of sources: X-ray structures of particular proteins and nucleic acids in different conformational states (typically 'open' and 'closed', but other configurations occur, e.g. in allostery and order-disorder transitions), NMR studies (e.g. Pf1 coat protein; 99), time-resolved studies (e.g. ras, PYP, bacteriorhodopsin; 32,94,107), fluorescence techniques and small-angle scattering. There is much less information on the time scales of the motions than on the coordinate changes. Some 95% of entries in the database have been studied by traditional X-ray crystallography and 7% by NMR (Table 3). A smaller number have been investigated by other techniques, such as time-resolved crystallography.
Inferred motions
Thus far, the discussion has focused only on 'well-documented' motions, where high-resolution structures of at least two conformations (i.e. open and closed) are known. However, one may know only a single conformation of a protein (A) that is similar in structure to another protein (B) with a well-documented motion. In this case, one can reasonably infer that protein A has a motion similar to that in protein B. Inferred motions are principally added to the database by finding sequence or structure homologues of a protein or nucleic acid already in the database. The inference is currently expressed at the top level in the preliminary classification scheme (Fig. 2).
Table 3.
| Experimental technique | Entries studied by this technique | Fraction of database (%) |
| All techniques | 122 | 100 |
| Traditional X-ray crystallography | 116 | 95 |
| NMR | 9 | 7 |
| Molecular dynamics simulations | 4 | 3 |
| Time-resolved crystallography | 3 | 2 |
| Circular dichroism (CD) | 2 | 2 |
| Fourier transform infrared | 1 | <1 |
| Molecular biology studies of motion | 1 | <1 |
Motions can also be inferred from a single known conformation together with evidence based on requirements for the macromolecule's function, careful calculations or small-angle scattering experiments. Examples include the motions in myosin (84), plasminogen (70) and acetylcholinesterase (41). In total, ~77% of the motions have solved structures available for two or more conformations; for the remaining 23%, the motions are inferred.
COMPUTER IMPLEMENTATION AS A RELATIONAL DATABASE
Standard tools and approaches are currently used in the implementation of the database. A free relational database server engine, mini-SQL (52), is used with a schema that contains ~20 tables. Data entry has been done through a variety of methods: a web form, Microsoft Access and Excel (using ODBC connectivity or the dbf2msql program), or the emacs text editor (101) (using a custom 'mode' written in elisp). Initially, the web pages were generated 'on the fly' in response to each query, but we later decided to pre-build most of them. This proved an unexpectedly good move, as it allowed on-line search engines (e.g. AltaVista) to index the pages automatically, making the database easy to query from outside. Because it is built with very standard tools, the database has been easily ported into a variety of programs (e.g. Oracle) and into a variety of PC mail-merge programs (for nicely formatted output). Although we plan to maintain pre-built pages in the future, we are investigating the use of high-speed web-database connectivity software (such as Informix's Web datablade) to allow instantaneous updates to the database's Web presence while keeping performance comparable to static pages.
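To make the relational layout concrete, the sketch below shows how a motion table and a link table keyed on PDB identifiers could support the PDB-keyed lookups described in the Figure 1 legend. It uses Python's built-in sqlite3 module as a stand-in for mini-SQL. The two tables and their column names are hypothetical and are far smaller than the actual ~20-table schema.

```python
import sqlite3

# SQLite stands in for mini-SQL; table and column names are hypothetical.
SCHEMA = """
CREATE TABLE motion (
    motion_id      TEXT PRIMARY KEY,   -- mnemonic identifier
    classification TEXT NOT NULL,      -- short-hand code, e.g. 'D-h-2'
    blurb          TEXT
);
CREATE TABLE motion_structure (
    motion_id    TEXT NOT NULL REFERENCES motion(motion_id),
    pdb_id       TEXT NOT NULL,        -- foreign key into the PDB
    conformation TEXT,                 -- e.g. 'open' or 'closed'
    PRIMARY KEY (motion_id, pdb_id)
);
"""


def motions_for_pdb(conn: sqlite3.Connection, pdb_id: str) -> list[str]:
    """Return the identifiers of motions that reference a given PDB entry."""
    rows = conn.execute(
        "SELECT m.motion_id FROM motion m "
        "JOIN motion_structure ms ON ms.motion_id = m.motion_id "
        "WHERE ms.pdb_id = ? ORDER BY m.motion_id",
        (pdb_id,),
    )
    return [motion_id for (motion_id,) in rows]


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO motion VALUES ('motion_a', 'D-h-2', 'placeholder entry')")
conn.executemany(
    "INSERT INTO motion_structure VALUES ('motion_a', ?, ?)",
    [("1ABC", "open"), ("2ABC", "closed")],
)
print(motions_for_pdb(conn, "1ABC"))  # ['motion_a']
```

Keeping structures in a separate link table, instead of as columns of the motion table, is what lets one motion reference any number of conformations and one PDB entry appear in several motions.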
Figure 5. Interpolated motion pathways. A preliminary pathway of the hinge motion in the protein calmodulin is shown (73). This was constructed by a variant of the second method of interpolation; it involves Cartesian interpolation with minimization of the intermediate structures using both stereochemical and packing terms. This and >30 other movies are available at http://bioinfo.mbb.yale.edu/MolMovDB/movie . To generate the representations, one orientation is currently chosen (i.e. down the screw axis), and the animated intermediates are then drawn in a variety of 2D movie formats (MPEG, QuickTime, SGI movie format, MultiGIF and so on). Preliminary 3D animation has been implemented using the new VRML-2 specification (100); however, we have encountered some compatibility problems because VRML 2.0 browser software is presently in a great state of flux.
Calmodulin, which is shown in Figure 1 as well as in Figure 5, is one of the more highly annotated motions in the database, and it provides a good example of how the overall annotation process works. A motion is initially brought to our attention either directly by researchers solving particular structures or indirectly by surveying the literature. Once we decide to add it to the database, we do a comprehensive literature search, usually via Medline, and retrieve from the original publications the statistics associated with the motion. Reconciling the many different terms used to describe motion and creating truly standardized statistics (such as a well-defined maximum atomic displacement or precise selections of hinge residues) is in itself quite a complex nomenclature problem. This is one aspect of the larger problem of nomenclature that is becoming increasingly important in bioinformatics (1,83). Next, we fetch coordinate sets from the PDB and run various comparison programs on these structures (e.g. to calculate torsion angle differences, do least-squares fits, evaluate packing, etc.).
Part of the process of conformation comparison is the generation of a 'morph movie', such as the one shown in Figure 5. Our server (W. Krebs and M. Gerstein, in preparation) can produce a morph completely automatically. Typically, two structures are selected as being representative of the endpoints of the motion. Intermediate conformations are generated from these endpoints by linear interpolation, with restraints applied at each interpolated time point to ensure realism. (For the case of calmodulin, bond length and angle restraints were applied.) The interpolated coordinates are joined into an animation using any of a number of widespread molecular rendering packages (e.g. Molscript or Rasmol; 63,93). Morphing and automatic conformation comparison generate a second, more standardized set of statistics, which can be compared against those culled from the literature. Finally, based on running programs and reading the literature, we decide on the motion classification and write the entry. Presently, much of this process is done manually, but we hope to automate large parts of it in the future. The automatic classification tool developed by Boutonnet et al. (14) may be useful in this regard. Because our database schema is flexible, it can readily accommodate different types of automatic and manual annotation. In total, the database presently contains many disparate types of information: standardized annotation values, literature references, large blocks of free text, three-dimensional structures and motion pathways. This presents a particular challenge in terms of integrating the information in a comprehensible format. At present, many of the elements (e.g. movies) are stored outside the central database (and accessed via stored pointers) or in the actual tables as binary large objects ('BLOBs').
We are presently migrating the database to an object-relational system made by Informix, a commercial product that traces its roots to the postgres database project at Berkeley (60,90,105). The object-relational database model supports the referencing of complex data types in relational tables and sophisticated querying of these complex types through user-defined functions. There are also plans to develop a data dictionary for the database around mmCIF (13).
REPRESENTING MOTION PATHWAYS AS 'MORPH MOVIES'
One of the most interesting complex data types kept in the database is the 'morph movie', which gives a plausible representation of the pathway of a motion. These movies can immediately give the viewer an idea of whether the motion is a rigid-body displacement or involves significant internal deformation (as in tomato bushy stunt virus versus citrate synthase). Pathway movies were pioneered by Vonrhein et al. (109), who used them to connect the many solved conformations of adenylate kinase.
Conventional molecular dynamics simulations (without special techniques such as high-temperature simulation or Brownian dynamics; 55,71,112) cannot currently reach the time scales of most of the motions in the database, which are estimated to range from several nanoseconds (loop closure) to several seconds (slow refolding) (26,71,74). Consequently, a pathway movie cannot be generated directly by molecular simulation alone. Instead, it is constructed as an interpolation between known endpoints (usually two crystal structures). This interpolation can be done in several ways.
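The simplest of these approaches, straight Cartesian interpolation after superposing the endpoints on a motionless core (method (i) below), can be sketched in Python with NumPy. The fitting step uses the standard SVD-based Kabsch superposition, which is an assumption and may differ from the fitting procedure of ref. 34; the core atom indices must be supplied by the user, and the function names are hypothetical.

```python
import numpy as np


def fit_on_core(mobile: np.ndarray, target: np.ndarray, core: np.ndarray) -> np.ndarray:
    """Superpose all atoms of `mobile` onto `target`, using only `core` atoms for the fit.

    Coordinates are (n_atoms, 3) arrays; `core` holds indices of the motionless core.
    Uses the SVD-based Kabsch algorithm for the least-squares rotation.
    """
    p, q = mobile[core], target[core]
    p_center, q_center = p.mean(axis=0), q.mean(axis=0)
    h = (p - p_center).T @ (q - q_center)  # 3x3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so the result is a proper rotation.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    return (mobile - p_center) @ rotation.T + q_center


def cartesian_morph(start: np.ndarray, end: np.ndarray, core: np.ndarray,
                    n_frames: int = 10) -> list:
    """Straight Cartesian interpolation between the start and the core-fitted end structure."""
    end_fitted = fit_on_core(end, start, core)
    return [start + t * (end_fitted - start) for t in np.linspace(0.0, 1.0, n_frames)]
```

Because the end structure is fitted on the core before interpolating, the core stays fixed in every frame and only the mobile region appears to move, which is what makes such movies easy to read.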
(i) Straight Cartesian interpolation. The difference in each atomic coordinate between the known endpoint structures is simply divided into a number of evenly spaced steps, and an intermediate structure is generated for each step. This was the method used by Vonrhein et al. It is easy to implement, requiring only that the beginning and ending structures be sensibly positioned by fitting on a motionless core (34). However, it produces intermediates with clearly distorted geometry.
(ii) Interpolation with restraints. This is the above method with each intermediate structure restrained to have correct stereochemistry and/or valid packing. One simple approach is to energy-minimize each intermediate (with only selected energy terms) using a molecular mechanics program such as X-PLOR (16). This technique will be described more fully in a forthcoming paper (W.Krebs and M.Gerstein, in preparation). The database also hosts an experimental server that applies this interpolation technique to any two structures, generating a movie.
CONCLUSION AND FUTURE DIRECTIONS
We have constructed a database of macromolecular motions, which currently documents more than 120 motions. To describe each motion, we have developed a classification scheme based on size and then packing (i.e. whether or not there is motion across a well-packed interface), together with a standardized nomenclature and statistics such as maximum atomic displacement and degrees of rotation. We have also developed a way of annotating and categorizing inferred motions.
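Two of the standardized statistics named above, maximum atomic displacement and degrees of rotation, could be computed along the following lines. This sketch assumes NumPy, assumes the atoms of the two conformations are matched one-to-one, and assumes they have already been superposed on a common core before the displacement is measured. The rotation angle is taken from the least-squares (Kabsch) rotation of the domain via trace(R) = 1 + 2 cos θ. The function names are hypothetical, not the database's actual tools.

```python
import numpy as np


def max_atomic_displacement(a: np.ndarray, b: np.ndarray) -> float:
    """Largest per-atom distance between two already-superposed (n, 3) coordinate sets."""
    return float(np.linalg.norm(b - a, axis=1).max())


def rotation_angle_deg(a: np.ndarray, b: np.ndarray) -> float:
    """Angle in degrees of the least-squares rotation taking domain coords `a` onto `b`."""
    a0, b0 = a - a.mean(axis=0), b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a0.T @ b0)
    # Correct for a possible reflection so the result is a proper rotation.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    # For a proper rotation, trace(R) = 1 + 2 cos(theta).
    cos_theta = (np.trace(rotation) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(cos_theta, -1.0, 1.0))))
```

Clipping the cosine guards against rounding pushing it slightly outside [-1, 1] for near-zero or near-180° rotations.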
At present, many of the standardized statistics are culled from the literature, and most of the classification is done by eye. However, in the future much of the annotation will be done automatically with software tools. In particular, we are developing tools to objectively determine standardized statistics for a motion, produce 'morph movies,' locate flexible linkers using amino-acid composition or crystallographic temperature factors, classify motions, and cross-reference new motions to manually annotated 'gold-standards' (using sequence and structure comparison).
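A crude version of the linker-finding idea mentioned above, combining amino-acid composition with crystallographic temperature factors, might look like the following sliding-window scan. The set of "linker-like" residues, the window length and both thresholds are hypothetical placeholders chosen for illustration, not values used by the database.

```python
import statistics
from typing import List, Sequence, Tuple

# Assumption: residues treated as "linker-like" (one-letter codes); illustrative only.
LINKER_RESIDUES = frozenset("GPST")


def candidate_linkers(sequence: str, b_factors: Sequence[float], window: int = 5,
                      min_fraction: float = 0.6, min_z: float = 1.0) -> List[Tuple[int, int]]:
    """Return merged (start, end) 0-based, end-exclusive residue ranges whose windows are
    rich in linker-like residues and have a high mean normalized B-factor."""
    if len(sequence) != len(b_factors):
        raise ValueError("need one B-factor per residue")
    mean = statistics.fmean(b_factors)
    sd = statistics.pstdev(b_factors) or 1.0  # avoid division by zero for uniform B
    z = [(b - mean) / sd for b in b_factors]
    hits: List[Tuple[int, int]] = []
    for s in range(len(sequence) - window + 1):
        fraction = sum(aa in LINKER_RESIDUES for aa in sequence[s:s + window]) / window
        mean_z = sum(z[s:s + window]) / window
        if fraction >= min_fraction and mean_z >= min_z:
            if hits and s <= hits[-1][1]:
                hits[-1] = (hits[-1][0], s + window)  # extend an overlapping hit
            else:
                hits.append((s, s + window))
    return hits
```

Requiring both signals at once is a deliberate design choice: a glycine-rich loop that is well ordered in the crystal, or a high-B surface region with ordinary composition, is not flagged on its own.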
We anticipate that the database will constitute an important resource for the molecular biology community. In fact, we expect the number of known macromolecular motions to increase greatly in the future, making a database of motions increasingly valuable. Our reasoning is as follows. The number of new structures continues to rise rapidly (nearly exponentially). However, the number of folds is increasing much more slowly and is expected to level off as we find more and more of the limited number of folds in nature, estimated to be as few as 1000 (15,23). Each newly solved structure with the same fold as one already in the database represents a potential new motion; that is, it is often a structure in a different liganded state or a structurally perturbed homologue. Thus, as we find more and more of the finite number of folds, crystallography and NMR will increasingly provide information about the variability and mobility of a given fold, rather than identify new folding patterns.
ACKNOWLEDGEMENTS
The authors gratefully acknowledge the financial support of the National Science Foundation (Grant DBI-9723182) and the numerous people who have either contributed entries or information to the database or have given us feedback on what the user community wants. The authors also wish to thank Informix Software, Inc. for providing a grant of its database software.
REFERENCES
This page is run by Oxford University Press, Great Clarendon Street, Oxford OX2 6DP, as part of the OUP Journals
Last modification: 29 Aug 1998
Copyright©Oxford University Press, 1998.