Skip Navigation


Nucleic Acids Research Advance Access originally published online on May 6, 2008
Nucleic Acids Research 2008 36(Web Server issue):W42-W46; doi:10.1093/nar/gkn197
This Article
Right arrow Abstract Freely available
Right arrow Print PDF (4906K) Freely available
Right arrow Screen PDF (545K) Freely available
Right arrowOA All Versions of this Article:
36/suppl_2/W42    most recent
gkn197v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Mosca, R.
Right arrow Articles by Schneider, T. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Mosca, R.
Right arrow Articles by Schneider, T. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Nucleic Acids Research, 2008, Vol. 36, No. suppl_2 W42-W46
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Articles

RAPIDO: a web server for the alignment of protein structures in the presence of conformational changes

Roberto Mosca1,3 and Thomas R. Schneider1,2,3,*

1IFOM, the FIRC Institute for Molecular Oncology Foundation, Via Adamello 16, 20139, 2European Institute of Oncology, Via Ripamonti 435, 20141 Milan, Italy and 3European Molecular Biology Laboratory Hamburg, c/o DESY Notkestrasse 85 22603 Hamburg, Germany

*To whom correspondence should be addressed. Tel: +49 40 89902 190; Fax: +49 40 89902 149; Email: thomas.schneider{at}embl-hamburg.de

Received January 31, 2008. Revised March 12, 2008. Accepted April 2, 2008.


    ABSTRACT
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 AN EXAMPLE: BIOTIN CARBOXYLASE
 CONCLUSION
 REFERENCES
 
Rapid alignment of proteins in terms of domains (RAPIDO) is a web server for the 3D alignment of crystal structures of different protein molecules in the presence of conformational change. The structural alignment algorithm identifies groups of equivalent atoms whose interatomic distances are constant (within a defined tolerance) in the two structures being compared and considers these groups of atoms as rigid bodies. In addition to the functionalities provided by existing tools, RAPIDO can identify structurally equivalent regions also when these consist of fragments that are distant in terms of sequence and separated by other movable domains. Furthermore, RAPIDO takes the variation in the reliability of atomic coordinates into account in the comparison of distances between equivalent atoms by employing weighting-functions based on the refined B-values. The regions identified as equivalent by RAPIDO furnish reliable sets of residues for the superposition of the two structures for subsequent detailed analysis. The RAPIDO server, with related documentation, is available at http://webapps.embl-hamburg.de/rapido.


    INTRODUCTION
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 AN EXAMPLE: BIOTIN CARBOXYLASE
 CONCLUSION
 REFERENCES
 
Structural alignment, i.e. the definition of an equivalence map between residues in different structures based on their relative position in space, is a key step in protein structure analysis. The comparison of a protein structure with other structures of the same or similar proteins reveals differences and similarities between related molecules and allows inferring how functional properties are implemented. In the context of a crystallographic structure determination, the alignment of structures of related proteins can identify structurally conserved fragments to be used in molecular replacement (1).

A large number of tools have been developed both for the pairwise and the multiple alignment of structures (2–4). Computer programs for structural alignment can be divided into two main categories depending on whether the molecules under comparison are considered as rigid entities or whether molecular flexibility is taken into account. The first group of computer programs includes DALI (5), CE (6) and MAMMOTH (7) for pairwise alignment and CEMC (8), SSM (9) and MAMMOTH-Mult (10) for multiple alignment. However, it is well known that protein molecules can undergo internal movements, in particular, between their domains and subdomains (11,12). To take molecular flexibility into account, tools for the flexible alignment of protein structures have been implemented. These include FlexProt (13) and FATCAT (14) for pairwise alignment and MultiProt (15) and POSA (16) for multiple alignment.

In this article, we introduce a new web server, named RAPIDO (for rapid alignment of proteins in terms of domains), implementing a new algorithm for the 3D alignment of protein structures in the presence of conformational changes. The web server accepts a set of protein structures and aligns all structures against a reference structure in a pairwise fashion. The algorithm is capable of aligning models of two proteins also in cases of large structural changes such as hinge motions between domains. Furthermore, it is able to identify conformationally invariant regions (rigid bodies) and to produce superpositions.

Among the tools mentioned before, the ones providing the most closely related facilities are FATCAT (http://fatcat.burnham.org/) and FlexProt (http://bioinfo3d.cs.tau.ac.il/FlexProt/). In comparison to these services, RAPIDO has the additional capability of identifying conformational invariant regions when they are not sequential in the residue chain (e.g. when a rigid body contains regions at the N- and C-terminus of a protein separated by another movable rigid body in between). Furthermore, RAPIDO takes into account the variation in the reliability of atomic coordinates by using a B-factor-based weighting scheme. On output, various scripts for displaying the results with PyMOL (http://www.pymol.org) and RasMOL (http://www.umass.edu/microbio/rasmol/index2.htm) are produced.


    MATERIALS AND METHODS
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 AN EXAMPLE: BIOTIN CARBOXYLASE
 CONCLUSION
 REFERENCES
 
Input data
As input, the web server accepts coordinate files in PDB (17) format. The user can either provide the PDB-IDs of structures that are already present in the Protein Data Bank or upload a tarball containing a set of PDB files. The PDB files are parsed and subdivided into chains, which are then called conformers. From the list of conformers, the user can then select a subset for alignment.

Processing method
The structural alignment algorithm consists of four steps:

  1. Detection of short structurally similar fragments, so-called matching fragment pairs (MFPs) (6).
  2. Chaining of the MFPs by a graph-based algorithm.
  3. Identification of rigid bodies with a genetic algorithm (18).
  4. Refinement of the alignment.
At first, the algorithm searches for pairs of structurally similar fragments in the two structures where a fragment is defined as an ungapped stretch of residues and the similarity between fragments is measured by a difference score. The difference score used is the sum over the absolute values of all elements of the difference distance matrix between the C{alpha}-atom positions of the fragments being compared. Pairs of fragments whose difference score is below a defined threshold, are stored as MFPs. In other publications (6,14,19), the term aligned fragment pairs (AFPs) has been used instead of MFPs. In the context of the RAPIDO aligner, we prefer to use the notation of MFPs in order to clarify that in a later stage of the alignment algorithm, a subset of the MFPs forming the initial set is selected to assemble the actual alignment, and the selected MFPs thus become AFPs. In order to do that, the MFPs are represented as nodes of a graph and two MFPs (two nodes) are connected by an edge if they are topologically ordered, i.e. if they are composed of two pairs of fragments that appear in the same order in the two residue sequences. A path in this graph corresponds to a subset of MFPs representing a structural alignment between the two proteins structures. To take into account the varying degree of similarity and size of the MFPs, the gaps between them and their relative displacement, a weight is attached to each edge of the graph in a way inspired by ref. (14). A standard dynamic programing algorithm is then employed to identify the longest path in the graph, which can then be translated into a structural alignment. Further details on the alignment algorithm can be found in (Mosca, Brannetti, Schneider, manuscript in preparation).

Finally, a genetic algorithm originally designed for the identification of conformationally invariant regions in different conformations of the same protein molecule (18) is applied in order to find rigid bodies and the alignment is refined through the application of several heuristics.

Output of the web server
A dot plot of the alignment is provided together with statistics (Figure 1). A textual representation of the alignment is displayed on the web page and can be downloaded in FASTA format. It should be noted that, even if the textual representation of the alignment is referring to the sequence of residues, the equivalent pairs of residues are determined purely on the 3D information contained in two structures. Through a Jmol applet (http://www.jmol.org/), the user can have an immediate overview of the alignment-based superposition. Different types of superpositions are available: rigid superposition on all aligned atoms, superpositions on individual rigid bodies, etc. A particularly revealing way of superposition is the ‘flexible superposition’ of structures. For this type of superposition, the rigid bodies identified in the structural alignment are superimposed separately. For display, parts of the structures falling between the boundaries of two rigid bodies are moved together with the rigid body closest in sequence. The RMSD for a flexible superposition (RMSDf) is calculated as the RMSD over all C{alpha}-atoms of the individual rigid bodies superimposed separately. The superimposed structures in PDB format together with the PyMOL or RasMOL scripts for displaying the superpositions can be downloaded. All output information is color-coded consistently with respect to the rigid body assignments so that conformationally invariant parts can be easily analyzed.


Figure 1
View larger version (45K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 1. Output of RAPIDO. (a) The front page displays information about the input data (PDB files and the conformers extracted from them) and a summary of the alignments performed. The selection of conformers or the value of ‘low limit’ can be modified and a new calculation can be submitted by pressing the update button in the lower right corner of the page. (b) Pressing ‘Details and superpositions’ in the table summarizing the alignments will launch a page with more details about the alignment of a particular structure to the reference structure. The page provides various statistics for the alignment, a dot-plot, a Jmol-applet for immediate graphical inspection and a table with a textual description of the results. A consistent color scheme is used for the presentation of all results. Residues assigned to rigid bodies are colored in blue, green, cyan, magenta, etc. (see legend besides the Jmol-window), while residues that were aligned but which were too different to be assigned to a rigid body are colored in red. Residues that were not aligned are colored gray.

 

    AN EXAMPLE: BIOTIN CARBOXYLASE
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 AN EXAMPLE: BIOTIN CARBOXYLASE
 CONCLUSION
 REFERENCES
 
Biotin carboxylase (BC) is a component of enzymes such as pyruvate carboxylase (PC) and acetyl-CoA carboxylase (ACC) mediating the transfer of a carboxyl group through biotin. BCs typically have the ATP-grasp fold (20) and are composed of three sub-domains named A, B and C. The A and C domains form a cylindrical structure and the B domain is positioned at the top of this cylinder creating a pocket in which the active site is located (Figure 2). When ATP binds to the protein, the molecule undergoes a large conformational transition from an open to a closed state in which the B domain moves towards the A and C domains (20,21). Here, we selected two BCs from different organisms, PC from Aquifex aeolicus (22) (PDB-id 1ULZ) and ACC from Escherichia coli (21) (PDB-id 1DV2), in different states (ATP-bound 1DV2 versus apo-1ULZ) to demonstrate the function of RAPIDO.


Figure 2
View larger version (53K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 2. Schematic view of BC. Domains A, B and C are color-coded in orange, yellow and cyan, a bound molecule of ATP is shown in stick representation. The figure was created with PyMOL (http://pymol.sourceforge.net).

 
To start the alignment of the two structures, the PDB-ids of the two crystal structures are filled into the user interface together with an email address to which results will be sent.

After the submission, the web server analyzes the PDB files and subdivides each PDB file into conformers each consisting of one chain. The subset of conformers to be subjected to the alignment procedure is then specified by the user. When the calculations are finished, a link to the URL where the results are stored is sent to the email address provided. This URL contains a randomly generated alphanumeric code to protect the results from unauthorized access. The results remain accessible on the server for 24 hours after the completion of the alignment job. Figure 1 shows the results of the alignment as displayed by the web server. The front page contains a summary table with various statistics: the length (number of residues aligned, #al.), the RMSD of the global superposition (RMSDr), the number of residues belonging to rigid bodies (#rb) and the RMSDf.

Clicking the link on the right side of the rows describing individual alignments in the summary table launches a web page providing more details of the respective alignment. On this page, the first item is a color-coded dot-plot representation (23) of the structural alignment (Figure 1). The 3D superpositions based on the derived alignments can be interactively inspected via a Jmol applet; a set of buttons allows to change the visualization styles, selection of different superpositions modes, the color scheme and the structures actually being displayed.

At the top of the page links are provided for downloading RasMol and PyMOL scripts for the superposition of the structures. Separate PyMOL scripts (pml extension) are generated for each pair of structures and are named with the PBD-id of the two structures followed by a suffix. The suffix indicates the type of superposition: flexible superposition (_flex), rigid superposition (_rigid) and rigid superposition on the i-th rigid body (_rbi). For all the PyMOL scripts rigid bodies and aligned residue can be highlighted by pressing the function keys from F1 to F5 from the PyMOL interface.

For this example, the first rigid body corresponds to domains A and C and consists of 339 residues that can be superimposed with an RMSD of 0.84 Å. This rigid body is continuous in space but not in polypeptide sequence, containing the N and C terminus but not the central part of the protein sequence. In the center of the polypeptide chain, a short fragment of 46 residues forms a second rigid body, which can be superimposed independently of the rest of the molecule with an RMSD of 0.94 Å and corresponds to a part of the B domain. The flexible superposition (Figure 1) clearly shows that both rigid bodies are structurally very similar although they originate from different conformational states of homologues molecules from different organisms. Superposition of the entire molecules on the C{alpha}-atoms of the first rigid body clearly reveals the displacement between the two conformations of the B domain (Figure 3) depending on the presence or absence of ATP.


Figure 3
View larger version (46K):
[in this window]
[in a new window]
[Download PowerPoint slide]
 
Figure 3. Superpositions of aligned structures of BC from Aquifex aeolicus (1ULZ, light colors) and ACC in E. coli (1DV2, dark colors) (a) Superposition based on 339 C{alpha}-atoms belonging to the first rigid body (in blue). The closure of domain B onto domains A and C is clearly visible. (b) Superposition based on 42 C{alpha}-atoms belonging to the second rigid body (in green). The movement of the helix on top of the B domain (red) with respect to the rest of the B domain (green) becomes visible.

 
Adjustable parameters
The only user adjustable parameter of the web server is the ‘low limit’—the value for this parameter can be modified in a box displayed at the end of the summary table (Figure 1). This parameter controls to what extent equivalent distances are allowed to change between different models while the corresponding atoms are still counted as belonging to a rigid body (in which in principle all interatomic distances should remain identical). The ‘low limit’ corresponds to the parameter {varepsilon}l used in the comparison of different conformers of the same molecule via a genetic algorithm (18). However, in the present implementation it does not relate to a coordinate uncertainty estimated via Cruickshank's formula (as in ref. 18), but to a more crude weighting function based on B factors only. This choice was made to allow for a fully automatic processing of many PDB-files. The default value for ‘low-limit’ is 2.0 and was optimized for detection of typical domain motions; lower values for ‘low-limit’ will enforce a stricter similarity criterion for distances within rigid bodies leading to smaller rigid bodies, while larger values will do the opposite resulting in fewer rigid bodies of larger size.


    CONCLUSION
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 AN EXAMPLE: BIOTIN CARBOXYLASE
 CONCLUSION
 REFERENCES
 
We have presented a new server for the 3D alignment of protein structures in the presence of conformational changes. The server is able to identify conformational invariant regions between the two structures and to produce superpositions on different rigid bodies separately. Application to a pair of homologues structures of BC from different organisms has shown how the automatic determination of rigid bodies and the distinction between rigid and flexible regions by RAPIDO highlights important functional features of the two analyzed structures. Furthermore, the superposition of the structures on each rigid body separately helps the user identify and quantify the relative movements between conformationally invariant regions.

The choice of the residues for the superimposition is done automatically and based on a sound physical definition of conformationally invariant regions (18) and is not biased by manual intervention.


    ACKNOWLEDGEMENTS
 
This work was supported by grants from Associazione Italiana per la Ricerca sul Cancro (R.M., T.R.S.). Funding to pay the Open Access publication charges for this article was provided by European Molecular Biology Laboratory.

Conflict of interest statement. None declared.


    REFERENCES
 TOP
 ABSTRACT
 INTRODUCTION
 MATERIALS AND METHODS
 AN EXAMPLE: BIOTIN CARBOXYLASE
 CONCLUSION
 REFERENCES
 

  1. Schwarzenbacher R, Godzik A, Jaroszewski L. The JCSG MR pipeline: optimized alignments, multiple models and parallel searches. Acta Crystallogr. (2008) 64:133–140.[Medline]

  2. Lemmen C, Lengauer T. Computational methods for the structural alignment of molecules. J. Comput. Aided. Mol. Des. (2000) 14:215–232.[CrossRef][Web of Science][Medline]

  3. Sierk ML, Kleywegt GJ. Deja vu all over again: finding and analyzing protein structure similarities. Structure (2004) 12:2103–2111.[Medline]

  4. Kolodny R, Koehl P, Levitt M. Comprehensive evaluation of protein structure alignment methods: scoring by geometric measures. J. Mol. Biol. (2005) 346:1173–1188.[CrossRef][Web of Science][Medline]

  5. Holm L, Sander C. Protein structure comparison by alignment of distance matrices. J. Mol. Biol. (1993) 233:123–138.[CrossRef][Web of Science][Medline]

  6. Shindyalov IN, Bourne PE. Protein structure alignment by incremental combinatorial extension (CE) of the optimal path. Protein Eng. (1998) 11:739–747.[Abstract/Free Full Text]

  7. Ortiz AR, Strauss CE, Olmea O. MAMMOTH (matching molecular models obtained from theory): an automated method for model comparison. Protein Sci. (2002) 11:2606–2621.[CrossRef][Web of Science][Medline]

  8. Guda C, Lu S, Scheeff ED, Bourne PE, Shindyalov IN. CE-MC: a multiple protein structure alignment server. Nucleic Acids Res. (2004) 32:W100–W103.[Abstract/Free Full Text]

  9. Krissinel E, Henrick K. Secondary-structure matching (SSM), a new tool for fast protein structure alignment in three dimensions. Acta Crystallogr. (2004) 60:2256–2268.[Web of Science]

  10. Lupyan D, Leo-Macias A, Ortiz AR. A new progressive-iterative algorithm for multiple structure alignment. Bioinformatics (2005) 21:3255–3263.[Abstract/Free Full Text]

  11. Gerstein M, Lesk AM, Chothia C. Structural mechanisms for domain movements in proteins. Biochemistry (1994) 33:6739–6749.[CrossRef][Web of Science][Medline]

  12. Gerstein M, Krebs W. A database of macromolecular motions. Nucleic Acids Res. (1998) 26:4280–4290.[Abstract/Free Full Text]

  13. Shatsky M, Nussinov R, Wolfson HJ. Flexible protein alignment and hinge detection. Proteins (2002) 48:242–256.[CrossRef][Web of Science][Medline]

  14. Ye Y, Godzik A. Flexible structure alignment by chaining aligned fragment pairs allowing twists. Bioinformatics (2003) 19(Suppl. 2):II246–II255.[Medline]

  15. Shatsky M, Nussinov R, Wolfson HJ. A method for simultaneous alignment of multiple protein structures. Proteins (2004) 56:143–156.[CrossRef][Web of Science][Medline]

  16. Ye Y, Godzik A. Multiple flexible structure alignment using partial order graphs. Bioinformatics (2005) 21:2362–2369.[Abstract/Free Full Text]

  17. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat TN, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Res. (2000) 28:235–242.[Abstract/Free Full Text]

  18. Schneider TR. A genetic algorithm for the identification of conformationally invariant regions in protein molecules. Acta Crystallogr. (2002) 58:195–208.[Web of Science]

  19. Menke M, Berger B, Cowen L. Matt: local flexibility aids protein multiple structure alignment. PLoS Comput. Biol. (2008) 4:e10.[CrossRef][Medline]

  20. Tong L, Harwood H.J. Jr. Acetyl-coenzyme A carboxylases: versatile targets for drug discovery. J. Cell Biochem. (2006) 99:1476–1488.[CrossRef][Web of Science][Medline]

  21. Thoden JB, Blanchard CZ, Holden HM, Waldrop GL. Movement of the biotin carboxylase B-domain as a result of ATP binding. J. Biol. Chem. (2000) 275:16183–16190.[Abstract/Free Full Text]

  22. Kondo S, Nakajima Y, Sugio S, Yong-Biao J, Sueda S, Kondo H. Structure of the biotin carboxylase subunit of pyruvate carboxylase from Aquifex aeolicus at 2.2 A resolution. Acta Crystallogr. (2004) 60:486–492.[CrossRef][Web of Science]

  23. Maizel JV Jr, Lenk RP. Enhanced graphic matrix analysis of nucleic acid and protein sequences. Proc. Natl Acad. Sci. USA (1981) 78:7665–7669.[Abstract/Free Full Text]


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
Nucleic Acids ResHome page
T. Margraf, G. Schenk, and A. E. Torda
The SALAMI protein structure search server
Nucleic Acids Res., July 1, 2009; 37(suppl_2): W480 - W484.
[Abstract] [Full Text] [PDF]


This Article
Right arrow Abstract Freely available
Right arrow Print PDF (4906K) Freely available
Right arrow Screen PDF (545K) Freely available
Right arrowOA All Versions of this Article:
36/suppl_2/W42    most recent
gkn197v1
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Mosca, R.
Right arrow Articles by Schneider, T. R.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Mosca, R.
Right arrow Articles by Schneider, T. R.
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?