
Nucleic Acids Research 2006 34(Web Server issue):W235-W238; doi:10.1093/nar/gkl163

© The Author 2006. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org


Article

RosettaDesign server for protein design

Yi Liu and Brian Kuhlman*

Department of Biochemistry and Biophysics, University of North Carolina Chapel Hill, NC 27599, USA

*To whom correspondence should be addressed. Tel: +1 919 843 0188; Fax: +1 919 966 2852; Email: bkuhlman{at}email.unc.edu

Received January 30, 2006. Revised February 22, 2006. Accepted March 20, 2006.


    ABSTRACT
 
The RosettaDesign server identifies low energy amino acid sequences for target protein structures (http://rosettadesign.med.unc.edu). The client provides the backbone coordinates of the target structure and specifies which residues to design. The server returns to the client the sequences, coordinates and energies of the designed proteins. The simulations are performed using the design module of the Rosetta program (RosettaDesign). RosettaDesign uses Monte Carlo optimization with simulated annealing to search for amino acids that pack well on the target structure and satisfy hydrogen bonding potential. RosettaDesign has been experimentally validated and has been used previously to stabilize naturally occurring proteins and design a novel protein structure.


    INTRODUCTION
 
Recently, there have been many successes in the area of computational protein design. Protein design software has been used to stabilize naturally occurring proteins, perturb protein binding specificity, design novel biosensors and enzymes and create novel protein structures [for a review see (1–3)]. In most cases, these studies have been performed by laboratories that specialize in computational design and have direct access to the software and the source code (4–6). To make this technology more accessible to the large number of molecular biology laboratories that regularly use amino acid mutagenesis to probe protein structure and function, we have established a web server for protein design that uses the design module of the Rosetta program (RosettaDesign) (7,8).

Given a target protein structure or complex, RosettaDesign searches for amino acid sequences that pack well, bury their hydrophobic atoms and satisfy the hydrogen bonding potential of polar atoms. RosettaDesign has been parameterized to return sequences with amino acid frequencies comparable to those found in naturally occurring proteins, and to partition the hydrophobic and polar residues between the surface and the core at naturally occurring frequencies. In general, when a naturally occurring protein is redesigned, ~65% of the residues will mutate. As expected, more sequence variability is seen on the surface of the protein, where there are fewer packing constraints. In the core of the protein, 45% of the residues mutate on average. RosettaDesign has been experimentally validated. It has been used to stabilize naturally occurring proteins (9), enhance protein binding affinities (10), design a protein that can switch between two folds (11) and create a protein with a novel structure (12).


    METHOD USED
 
The RosettaDesign server uses the design module of the Rosetta program to perform fixed backbone protein design simulations. The algorithm has been described previously (7,8). Like other protein design programs, RosettaDesign has two primary components: an energy function for evaluating the relative favorability of a sequence and an optimization procedure for searching through sequence space. All atoms in the protein, including hydrogen, are explicitly modeled. The energy function consists of (i) a Lennard–Jones potential that favors close packed residues, (ii) the Lazaridis–Karplus implicit solvation model, which favors hydrophobic amino acids in the interior of proteins and polar amino acids on the surface (13), (iii) an explicit orientation dependent hydrogen bonding term (14), (iv) torsion potentials derived from the PDB (15), (v) a unique reference value for each amino acid type and (vi) an additional term that models electrostatic interactions between charged residues and is based on the probability of seeing two amino acid types near each other in the PDB (16). The last of these is a relatively weak term in the energy function.
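As a rough illustration of how such an energy function can be assembled, the sketch below combines the six terms as a weighted linear sum. The linear combination, the term names and the weight values are assumptions made for illustration; the article does not list the weights RosettaDesign actually uses.

```python
from typing import Dict

# Hypothetical weights, chosen only for illustration. The article does not
# give the real RosettaDesign weights.
WEIGHTS: Dict[str, float] = {
    "lennard_jones": 1.0,
    "solvation": 1.0,
    "hydrogen_bond": 1.0,
    "torsion": 1.0,
    "reference": 1.0,
    "pair": 0.5,  # the residue-pair term is described as relatively weak
}


def total_energy(terms: Dict[str, float],
                 weights: Dict[str, float] = WEIGHTS) -> float:
    """Return the weighted sum of the supplied per-term energies."""
    return sum(weights[name] * value for name, value in terms.items())
```

With these placeholder weights, a pair energy contributes half as much to the total as a Lennard–Jones energy of the same magnitude.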

To simplify the optimization procedure and favor low energy designs, amino acid side chains are only allowed to adopt a discrete set of favorable conformations, typically referred to as rotamers. RosettaDesign uses Dunbrack's backbone dependent rotamer library (15). To allow for relaxation away from the most preferred side chain conformations, additional rotamers are created for buried residues by varying chi1 and chi2 one standard deviation (~10°) away from the most preferred values. Rotamers are also created for the alternate positions hydrogen can adopt on serine, threonine and tyrosine. To find low energy sequences, RosettaDesign uses Monte Carlo optimization with simulated annealing. Starting from a random sequence, single amino acid substitutions or rotamer switches are accepted based on the Metropolis criterion. The simulation starts at a very high temperature where almost all substitutions are accepted and finishes at 0°. Approximately 1 million rotamer substitutions are attempted per 100 residues being varied. Independent simulations in which every residue in the protein is allowed to vary generally converge to sequences that are 70–80% identical to each other.
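The search procedure described above can be sketched in a few lines. This is a minimal toy version under stated assumptions, not the RosettaDesign implementation: it varies amino acid identity only (no rotamers), uses a linear cooling schedule that ends at zero, and takes the energy function as a user-supplied callable. The function names, the default step count and the starting temperature are all hypothetical.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def metropolis_accept(delta_e: float, temperature: float,
                      rng: random.Random) -> bool:
    """Metropolis criterion: accept every move that does not raise the energy;
    accept an uphill move with probability exp(-delta_e / temperature).
    At zero temperature no uphill move is accepted."""
    if delta_e <= 0.0:
        return True
    if temperature <= 0.0:
        return False
    return rng.random() < math.exp(-delta_e / temperature)


def anneal_sequence(length: int,
                    energy: Callable[[Sequence[str]], float],
                    steps: int = 20000,
                    t_start: float = 2.0,
                    seed: int = 0) -> Tuple[str, float]:
    """Simulated annealing over sequences, starting from a random sequence."""
    rng = random.Random(seed)
    seq: List[str] = [rng.choice(AMINO_ACIDS) for _ in range(length)]
    current = energy(seq)
    for step in range(steps):
        # Assumed schedule: linear cooling from t_start down to 0.
        temperature = t_start * (1.0 - step / max(1, steps - 1))
        pos = rng.randrange(length)
        old = seq[pos]
        seq[pos] = rng.choice(AMINO_ACIDS)  # single amino acid substitution
        new = energy(seq)
        if metropolis_accept(new - current, temperature, rng):
            current = new
        else:
            seq[pos] = old  # reject the move and restore the previous residue
    return "".join(seq), current
```

A toy energy such as the number of positions that differ from a fixed target string is enough to exercise the function. Because the sampling is stochastic, runs with different seeds can end on different sequences.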


    SERVICES
 
Protein design
The RosettaDesign server returns low energy sequences for target protein structures. The protein backbone remains fixed during the simulation.

Side chain conformation prediction
Given a protein structure and sequence, the RosettaDesign server can be used to predict the lowest energy conformations of the side chains.


    INPUTS, OUTPUTS AND JOB OPTIONS
 
Registration
To receive results via email, users must register. Alternatively, users can access the web server as a ‘guest’, in which case they must return to the web site to retrieve results.

Input files
PDB file: users must submit a file with the atomic coordinates of the protein that will be the template for design. The coordinates must be in PDB format. There can be gaps in the structure, but each residue must have a complete set of backbone heavy atoms: N', C', O and Cα. The residues can be missing side chain atoms.
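Before uploading, a user may want to confirm that every residue has its backbone heavy atoms. The following is a minimal sketch of such a check, not the server's own validation code. It assumes the standard fixed-column PDB ATOM record layout and the standard PDB atom names N, CA, C and O; the function name is hypothetical, and HETATM records are ignored.

```python
from typing import Dict, List, Set, Tuple

# Standard PDB names for the backbone heavy atoms (an assumption about how
# the input file labels them).
BACKBONE_ATOMS = {"N", "CA", "C", "O"}

ResidueKey = Tuple[str, str, str]  # (chain ID, residue number + insertion code, residue name)


def residues_missing_backbone(pdb_text: str) -> List[ResidueKey]:
    """Return the residues that lack one or more backbone heavy atoms."""
    seen: Dict[ResidueKey, Set[str]] = {}
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atom_name = line[12:16].strip()   # columns 13-16
        res_name = line[17:20].strip()    # columns 18-20
        chain_id = line[21]               # column 22
        res_number = line[22:27].strip()  # columns 23-27 (number + insertion code)
        seen.setdefault((chain_id, res_number, res_name), set()).add(atom_name)
    # Dictionaries preserve insertion order, so residues are reported in file order.
    return [key for key, atoms in seen.items() if not BACKBONE_ATOMS <= atoms]
```

An empty list means every residue found in the ATOM records has all four backbone atoms. Gaps in the chain are not reported, since the server allows them.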

Resfile: the resfile specifies which sequence positions will be varied, and which amino acids will be considered at each position. Users can also request that the native amino acid be kept at a particular sequence position, but allow the side chain to adopt a new conformation. The resfile can be created on the web site using point-and-click operations (Figure 1) or a user can upload his or her own resfile. The server will check the integrity of the uploaded resfile to ensure the correct format. The resfile created on the web site with point-and-click operations can also be saved for future use. A full description of the format for a resfile is provided in the documentation section of the web site.


 
Figure 1 Interface for choosing which sequence positions to vary.

 
Job options
Users can choose either to redesign the whole protein with all 20 amino acids considered at each sequence position, or to redesign part of the target protein as specified in a resfile. Because RosettaDesign uses a stochastic sampling algorithm to identify low energy sequences, different simulations will not necessarily give identical results. Users can choose to repeat the same simulation up to 10 times with a single job submission.

Output files
The simulation results are compressed as a zip file that unzips into three files: a log file indicating what commands were used for the simulation, a text file with a list of the mutations that were made, and a third file that provides the coordinates in PDB format along with the energies of the redesigned protein. If a run does not finish, the server will email the user the suspected reason for failure.

There are three sections of the PDB file pertaining to the energy of the redesigned protein:

The first section is a list of scores. Except for the reference energies, a lower score is better. The second section is a table with the energy of each residue in the protein (Table 1). In cases in which an energy depends on two atoms in separate residues (for instance the Lennard–Jones energy), half of the energy is assigned to each residue. The third section is a table of measured energies minus expected energies. Expected energies are derived by calculating the average energies of the different amino acids as a function of buriedness in a large set of proteins from the PDB. For instance, in the PDB, leucines with 20 neighbors (residues within 10 Å) have an average Lennard–Jones score of –3.79 kcal/mol. If a leucine in the redesigned protein has 20 neighbors and a Lennard–Jones energy of –4.2 kcal/mol, this indicates that the leucine is more tightly packed than the average leucine in the PDB. In general, we have found this table especially useful in the design of new protein structures, as it allows one to estimate how much the designed protein resembles proteins found in nature.
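The two bookkeeping rules in this paragraph (splitting a pairwise energy evenly between two residues, and subtracting the expected energy from the measured one) can be written out as a short sketch. The function names and the data layout are hypothetical and are not taken from the server's output format.

```python
from collections import defaultdict
from typing import Dict, Tuple


def per_residue_energies(pair_energies: Dict[Tuple[int, int], float]) -> Dict[int, float]:
    """Assign half of each pairwise energy to each of the two residues involved."""
    totals: Dict[int, float] = defaultdict(float)
    for (res_i, res_j), energy in pair_energies.items():
        totals[res_i] += energy / 2.0
        totals[res_j] += energy / 2.0
    return dict(totals)


def deviation_from_expected(measured: float, expected: float) -> float:
    """Measured energy minus expected energy; negative means better than average."""
    return measured - expected
```

For the leucine example in the text, `deviation_from_expected(-4.2, -3.79)` gives about –0.41 kcal/mol, a negative value, consistent with tighter than average packing.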


 
Table 1 The scores relevant to protein design

 

    SERVER PERFORMANCE
 
Since March 2005, 3000 jobs have been submitted to the RosettaDesign web server by more than 320 clients. The server can accept proteins as large as 1000 residues and can redesign up to 200 residues in one simulation. The web site is set up as an Apache server with a daemon that automatically invokes the Rosetta++ executable with the user's input file and the options obtained from the web interface. The user's input files, job options and results are recorded in a MySQL database via a php-http module. A maximum of two jobs can be run at the same time. The daemon checks the MySQL database for pending jobs every minute. For proteins between 100 and 200 residues, the simulation typically finishes in 5–30 min.
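The scheduling rule described here (at most two concurrent jobs, with the queue checked once a minute) amounts to a small decision made on every poll. The sketch below shows only that decision; it is not the server's daemon. The function name and the oldest-first ordering of the pending list are assumptions, while the two constants come from the text.

```python
from typing import List

MAX_CONCURRENT_JOBS = 2     # stated in the text
POLL_INTERVAL_SECONDS = 60  # the daemon checks for pending jobs every minute


def jobs_to_start(pending: List[str], running: List[str],
                  max_concurrent: int = MAX_CONCURRENT_JOBS) -> List[str]:
    """Return the pending job IDs that can be launched on this poll.

    `pending` is assumed to be ordered oldest first.
    """
    free_slots = max(0, max_concurrent - len(running))
    return pending[:free_slots]
```

A daemon built on this would call `jobs_to_start` once per `POLL_INTERVAL_SECONDS`, launch the returned jobs and update their status in the database.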

Accuracy of the RosettaDesign server
In a large scale test of RosettaDesign, the program was used to completely redesign nine naturally occurring proteins (9). The redesigned sequences were on average 35% identical to the wild-type sequence. Five out of the nine proteins were well-folded as evidenced by NMR and thermal and chemical denaturation experiments. All five of the well-folded proteins had higher thermal unfolding midpoints than the wild-type sequence. RosettaDesign has also been used to redesign small regions of a protein to increase protein stability or binding affinities (10,17,18). In many of these cases, lower free energies were obtained by building additional hydrophobic interactions. RosettaDesign has had less success with creating buried hydrogen bond networks. This is presumably because hydrogen bonds are very sensitive to small changes in distance and orientation, and desolvation penalties are difficult to calculate accurately.
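Sequence identity figures such as the 35% quoted above are simple to compute for fixed backbone designs, because the designed and wild-type sequences have the same length and no alignment is needed. A minimal sketch, with a hypothetical function name:

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of positions at which two equal-length sequences agree."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    if not seq_a:
        raise ValueError("sequences must not be empty")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)
```

The same function can be used to compare the results of independent design runs on the same backbone.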

Because the RosettaDesign energy function favors like amino acids being near each other (polars with polars, hydrophobics with hydrophobics), it will in some cases design large patches of hydrophobic amino acids on the surface of a protein. Although this may be favorable for protein stability, it can lead to aggregation of the protein. In this event, the user can force a small set of residues in the center of the patch to be polar, which in general will encourage RosettaDesign to put polar residues at the neighboring positions as well.

Possible uses for the RosettaDesign server
Over the last 10 years protein design software has been applied to a large number of interesting problems. Several laboratories have used sequence optimization algorithms to explore the size and characteristics of sequence space compatible with a particular fold. In a few cases, this information has been used to help detect remote homologs (19,20). In general, protein structures and complexes can be stabilized by identifying mutations that increase buried hydrophobic surface area. Towards this end, the RosettaDesign server can be used to search for holes in proteins that can be filled with larger hydrophobic residues, or partially buried polar residues that can be replaced by hydrophobic residues.

RosettaDesign can be used to search for second-site suppressor mutations. In this scenario, the user has a priori knowledge of a mutation that destabilizes a protein or protein–protein complex. Using a resfile, the user can force the destabilizing mutation and use RosettaDesign to search for mutations that will compensate for the first mutation. A similar approach was recently used to redesign a protein–protein interface so that the redesigned proteins still bind each other, but no longer bind their other naturally occurring binding partners (21). These types of redesigns are useful for probing signal transduction pathways.

In cases where a protein can adopt multiple conformations, RosettaDesign can be used to identify sequences that are specifically optimized for one of the conformations. Mayo and colleagues used this approach to increase the affinity between a receptor protein and its ligand (22). More ambitiously, RosettaDesign can be used to help design new protein structures or portions of proteins. In this case, the user must supply the backbone coordinates of the target structure. The challenge is that many arbitrarily chosen protein backbones will not be designable. This is generally reflected in poor LJatr and SASApack values (see Table 1) for the redesigned protein. In the future, as our computational resources grow, we plan to modify the RosettaDesign server so that the backbone coordinates and the sequence can be optimized simultaneously to allow for tight packing between side chains.


    ACKNOWLEDGEMENTS
 
This work was supported by a grant from the NIH (1RO1 GM073151-01). Funding to pay the Open Access publication charges for this article was provided by the NIH.

Conflict of interest statement. None declared.


    REFERENCES
 

  1. Pokala, N. and Handel, T.M. (2001) Review: protein design–where we were, where we are, where we're going. J. Struct. Biol., 134, 269–281.

  2. Park, S., Yang, X., Saven, J.G. (2004) Advances in computational protein design. Curr. Opin. Struct. Biol., 14, 487–494.

  3. Butterfoss, G.L. and Kuhlman, B. (2005) Computer-based design of novel protein structures. Annu. Rev. Biophys. Biomol. Struct., 35, 49–65.

  4. Dahiyat, B.I. and Mayo, S.L. (1997) De novo protein design: fully automated sequence selection. Science, 278, 82–87.

  5. Dwyer, M.A. and Hellinga, H.W. (2004) Periplasmic binding proteins: a versatile superfamily for protein engineering. Curr. Opin. Struct. Biol., 14, 495–504.

  6. Harbury, P.B., Plecs, J.J., Tidor, B., Alber, T., Kim, P.S. (1998) High-resolution protein design with backbone freedom. Science, 282, 1462–1467.

  7. Rohl, C.A., Strauss, C.E., Misura, K.M., Baker, D. (2004) Protein structure prediction using Rosetta. Meth. Enzymol., 383, 66–93.

  8. Kuhlman, B. and Baker, D. (2000) Native protein sequences are close to optimal for their structures. Proc. Natl Acad. Sci. USA, 97, 10383–10388.

  9. Dantas, G., Kuhlman, B., Callender, D., Wong, M., Baker, D. (2003) A large scale test of computational protein design: folding and stability of nine completely redesigned globular proteins. J. Mol. Biol., 332, 449–460.

  10. Eletr, Z.M., Huang, D.T., Duda, D.M., Schulman, B.A., Kuhlman, B. (2005) E2 conjugating enzymes must disengage from their E1 enzymes before E3-dependent ubiquitin and ubiquitin-like transfer. Nature Struct. Mol. Biol., 12, 933–934.

  11. Ambroggio, X.I. and Kuhlman, B. (2006) Computational design of a single amino acid sequence that can switch between two distinct protein folds. J. Am. Chem. Soc., 128, 1154–1161.

  12. Kuhlman, B., Dantas, G., Ireton, G.C., Varani, G., Stoddard, B.L., Baker, D. (2003) Design of a novel globular protein fold with atomic-level accuracy. Science, 302, 1364–1368.

  13. Lazaridis, T. and Karplus, M. (1999) Effective energy function for proteins in solution. Proteins, 35, 133–152.

  14. Kortemme, T., Morozov, A.V., Baker, D. (2003) An orientation-dependent hydrogen bonding potential improves prediction of specificity and structure for proteins and protein–protein complexes. J. Mol. Biol., 326, 1239–1259.

  15. Dunbrack, R.L., Jr and Cohen, F.E. (1997) Bayesian statistical analysis of protein side-chain rotamer preferences. Protein Sci., 6, 1661–1681.

  16. Simons, K.T., Ruczinski, I., Kooperberg, C., Fox, B.A., Bystroff, C., Baker, D. (1999) Improved recognition of native-like protein structures using a combination of sequence-dependent and sequence-independent features of proteins. Proteins, 34, 82–95.

  17. Korkegian, A., Black, M.E., Baker, D., Stoddard, B.L. (2005) Computational thermostabilization of an enzyme. Science, 308, 857–860.

  18. Nauli, S., Kuhlman, B., Baker, D. (2001) Computer-based redesign of a protein folding pathway. Nature Struct. Biol., 8, 602–605.

  19. Pei, J., Dokholyan, N.V., Shakhnovich, E.I., Grishin, N.V. (2003) Using protein design for homology detection and active site searches. Proc. Natl Acad. Sci. USA, 100, 11361–11366.

  20. Saunders, C.T. and Baker, D. (2005) Recapitulation of protein family divergence using flexible backbone protein design. J. Mol. Biol., 346, 631–644.

  21. Kortemme, T., Joachimiak, L.A., Bullock, A.N., Schuler, A.D., Stoddard, B.L., Baker, D. (2004) Computational redesign of protein–protein interaction specificity. Nature Struct. Mol. Biol., 11, 371–379.

  22. Shimaoka, M., Shifman, J.M., Jing, H., Takagi, J., Mayo, S.L., Springer, T.A. (2000) Computational design of an integrin I domain stabilized in the open high affinity conformation. Nature Struct. Biol., 7, 674–678.




This article has been cited by other articles:


S. Lyskov and J. J. Gray
The RosettaDock server for local protein-protein docking
Nucleic Acids Res., July 1, 2008; 36(suppl_2): W233–W238.

J. S. Sparks, E. F. Donaldson, X. Lu, R. S. Baric, and M. R. Denison
A Novel Mutation in Murine Hepatitis Virus nsp5, the Viral 3C-Like Proteinase, Causes Temperature-Sensitive Defects in Viral Growth and Protein Processing
J. Virol., June 15, 2008; 82(12): 5999–6008.
