Nucleic Acids Research Advance Access published online on June 21, 2007
Nucleic Acids Research, doi:10.1093/nar/gkm319
© 2007 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Pcons.net: protein structure prediction meta server
Björn Wallner1,*,
Per Larsson2 and
Arne Elofsson2
1Department of Biochemistry, University of Washington, Box 357350, Seattle, WA 98195, USA and 2Center for Biomembrane Research, Stockholm University, SE-106 91 Stockholm, Sweden
*To whom correspondence should be addressed. Tel: +1 206 616 4396; Fax: +1 206 685 1792; Email: bjornwa{at}u.washington.edu
Received January 26, 2007. Revised March 19, 2007. Accepted April 17, 2007.
ABSTRACT
The Pcons.net Meta Server (http://pcons.net) provides improved automated tools for protein structure prediction and analysis using consensus. It essentially implements all the steps necessary to produce a high quality model of a protein. The whole process is fully automated and a potential user only submits the protein sequence. For PSI-BLAST detectable targets, an accurate model is generated within minutes of submission. For more difficult targets the sequence is automatically submitted to publicly available fold-recognition servers that use more advanced approaches to find distant structural homologs. The results from these servers are analyzed and assessed for structural correctness using Pcons and ProQ, and the user is presented with a ranked list of possible models. In addition, if the protein sequence contains more than one domain, these are automatically parsed out and resubmitted to the server as individual queries.
INTRODUCTION
Reliable and accurate predictions of protein structure are important for many biologists. For many years it was believed that human experts significantly outperformed all automatic methods. However, since consensus-based approaches (1) were introduced, it has been found that at most a handful of experts in the world can outperform the community of web servers. It has also been shown consistently in CASP that consensus methods are superior to individual methods in predicting the structure of a protein sequence (2–4). Pcons has been among the top performing automated predictors since CASP5 and was the best method for assessing model quality in CASP7 (5).
Here, we introduce the Pcons.net meta server (http://pcons.net), which provides improved automated tools for protein structure prediction and analysis using consensus. The whole process is fully automated and a potential user only submits the protein sequence. This makes it easy to acquire structural information without any prior knowledge of remote homology detection, model building and model quality assessment. Pcons has previously been available as a downloadable program as well as through several other meta servers (genesilico.pl and bioinfo.pl). The Pcons.net meta server provides significant improvements over these servers: it has an improved web interface and prediction accuracy, the local accuracy for each residue is also provided, and for easy targets an accurate 3D model is built within minutes of submission.
SERVER DESCRIPTION
The Pcons.net Meta Server (http://pcons.net) essentially implements all the steps necessary to produce a high quality model of a protein sequence:
- Finding the best possible template.
- Aligning the template to the query sequence.
- Building a 3D structure based on the alignment.
- Assessing the quality of the model.
An overview of the method is shown in Figure 1. In the first step, domains are assigned using Pfam (6) and a quick database search against known protein structures (PDB90) is performed using BLAST (7) and RPS-BLAST (8). This also establishes the difficulty of the submitted sequence. If a significant hit is found using RPS-BLAST, an all-atom model is produced using Pfrag, a novel rapid homology modeling program based on segment matching and assembly. If the sequence identity is above 50%, this model will be quite close to the native structure, comparable to low-resolution X-ray and NMR structures (9,10). The whole process from sequence to all-atom model takes about 30 s, making it one of the fastest comparative modeling servers available.
RPS-BLAST is also used to parse the sequence into structural domains by analyzing the significance and span of the best RPS-BLAST alignment. If (i) the hit is significant ($10^{-5}$) and (ii) the alignment contains more than 30 unaligned residues, the unaligned residues are parsed out and resubmitted to the servers as a separate submission. In many cases, these domains agree well with the domains obtained using Pfam.
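The parsing rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the server's code: the function names are hypothetical, the significance threshold is assumed to be an E-value cutoff, and the 30-residue criterion is assumed to apply to each unaligned flank separately.

```python
from typing import List, Tuple

EVALUE_CUTOFF = 1e-5   # assumption: the significance threshold is an E-value
MIN_UNALIGNED = 30     # residues, as stated in the text


def unaligned_flanks(query_len: int, aln_start: int, aln_end: int) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) query ranges outside the alignment."""
    flanks = []
    if aln_start > 1:
        flanks.append((1, aln_start - 1))
    if aln_end < query_len:
        flanks.append((aln_end + 1, query_len))
    return flanks


def segments_to_resubmit(query_len: int, aln_start: int, aln_end: int,
                         evalue: float) -> List[Tuple[int, int]]:
    """Apply both criteria: a significant hit and a long unaligned flank."""
    if evalue > EVALUE_CUTOFF:
        return []  # hit not significant, no automatic parsing
    # Assumption: each flank must itself exceed the 30-residue limit.
    return [(s, e) for s, e in unaligned_flanks(query_len, aln_start, aln_end)
            if e - s + 1 > MIN_UNALIGNED]
```

For a 200-residue query whose best hit covers residues 1 to 120, the sketch returns the single range (121, 200) as a candidate for resubmission.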
Only if no significant hits are found using RPS-BLAST is the sequence submitted to the publicly available, more advanced fold-recognition servers (Table 1). The user has the possibility to force the submission of sequences that have clear RPS-BLAST hits. However, we strongly discourage overuse of this possibility in order not to overload the external servers with trivial queries.
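The routing decision described here amounts to a single test. The sketch below is hypothetical: the `Hit` type, the function name and the reuse of the $10^{-5}$ cutoff for this decision are assumptions made for illustration only.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Hit:
    template: str   # hypothetical template identifier
    evalue: float   # assumption: significance is reported as an E-value


def needs_external_servers(hits: Iterable[Hit], force: bool = False,
                           cutoff: float = 1e-5) -> bool:
    """Use external fold-recognition servers only for hard targets,
    unless the user explicitly forces the submission."""
    has_significant_hit = any(h.evalue <= cutoff for h in hits)
    return force or not has_significant_hit
```

Keeping the `force` flag off by default mirrors the stated policy of sparing the external servers from trivial queries.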
Table 1. Internal and external servers utilized by the Pcons.net Meta Server. For similar servers, e.g. bas_b and bas_c, only one of them is used in the consensus analysis.
The alignments from the initial BLAST and RPS-BLAST searches, as well as the alignments from the fold-recognition servers, are collected as they finish and all-atom models are built using Pfrag. When the model building is finished, the quality of the models is assessed using Pcons (1,2,11). Pcons benefits from the use of as many individual servers as possible. Thus, it is important not to put too much weight on a consensus analysis that is based on the results from only a few servers. In parallel to the consensus analysis, the model quality is also assessed purely based on structural features using ProQ (12). Both Pcons and ProQ give an overall quality score to each model as well as a local quality score to each individual residue (13). In CASP7, Pcons was one of the best methods for assessing the overall quality of protein models and the best method for assessing the local quality of residues (5).
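The idea behind a consensus analysis can be illustrated with a small Python sketch. This is not the Pcons scoring function; it only shows the general principle that a model resembling many models from other servers receives a higher score. The pairwise similarity values are placeholders for whatever structural comparison is used, and all names are hypothetical.

```python
from typing import Dict, List, Tuple


def consensus_scores(models: List[str],
                     similarity: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Score each model by its mean similarity to all other models."""
    scores = {}
    for m in models:
        others = [o for o in models if o != m]
        if not others:
            scores[m] = 0.0  # a single model carries no consensus information
            continue
        total = 0.0
        for o in others:
            # The similarity table is assumed symmetric; look up either order.
            total += similarity.get((m, o), similarity.get((o, m), 0.0))
        scores[m] = total / len(others)
    return scores
```

With only one or two models the averages are dominated by individual comparisons, which is one way to see why a consensus based on few servers deserves less weight.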
In summary, the major advances over other web servers are:
- For PSI-BLAST detectable targets a quite accurate homology model is generated within minutes.
- A query sequence with PSI-BLAST detectable domains is automatically parsed into domains.
- A novel approach to display alignment similarity makes it easy to quickly select the best model.
- The overall as well as local quality of the model is assessed, using state-of-the-art methods.
SERVER INPUTS AND OUTPUTS
The server takes a protein sequence in one-letter amino acid format as input. The user has the possibility to name the sequence and to give their e-mail address. Both the name and e-mail address can be used to filter the results in the job queue (http://pcons.net/index.php?queue).
Results for a specific job are provided through the web interface by clicking on the job id listed in the job queue table (Figure 2). This page is updated continuously as more predictions are finished. If an e-mail is provided the top 10 ranked model coordinates are e-mailed after 46 h. The 46 h time limit is set to allow for as many fold-recognition servers as possible to finish and provide the basis for the consensus analysis. However, if a significant hit indeed is found using the locally run RPS-BLAST, an accurate model should be ready within minutes of submission.
In addition to the web interface, the Pcons.net meta server will also be made available as a web service using the Web Services Description Language (WSDL) (14). The idea behind web services is to allow applications to communicate with each other in a standardized way. WSDL is used to conceptually describe the operations available at the service, and expresses the data formats using XML Schema definitions. Communication between web services and clients is done using SOAP (Simple Object Access Protocol) (15). For Pcons.net this will mean that a user who has access to a web service client, such as Taverna (16), will be able to make submissions to the meta server and also build these submissions into more complex analysis workflows.
ALIGNMENT REPRESENTATION
An additional novel feature is the representation of the different alignments (Figure 3), which enables a quick overview of the alignment quality and facilitates comparisons of many alternative alignments.
The alignment is represented as a line that is color-coded according to the secondary structure. For the template structure, STRIDE (17) is used to assign secondary structure based on the coordinates; for the target sequence, PSIPRED (18) is used to predict secondary structure and assign it to each residue. Both the target and the template sequence are represented as full-length sequences, making it possible to see which parts of the target and template are covered, and whether the alignment spans only a part of the whole template structure.
Here, the user also has the possibility to submit unaligned regions that did not fulfill the criteria for automatic domain resubmission (see above).
MODEL BUILDING
The model building based on the target–template alignment is performed using Pfrag, a reimplementation of the SegMod (19) homology modeling program. It builds models based on segment matching. By searching a database of highly refined protein structures, structural fragments are found that match the template structure as closely as possible. Criteria for evaluating individual fragments are the degree of amino acid sequence homology between the target and the template, the RMSD between a fragment and the template structure, and the Lennard–Jones interaction energy between fragments and the structure. Initial screening of fragments is done using the methodology of distance matching by Jones and Thirup (20). The all-atom models are then energy minimized using the ENCAD force field (21) to enforce proper stereochemistry.
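A superposition-free comparison in the spirit of distance matching can be sketched as follows. This is a hedged illustration, not the Pfrag or the Jones and Thirup implementation: the function name is hypothetical, and the assumption is that a candidate fragment and a template segment are compared through their internal CA–CA distances, so that no structural alignment is needed for the initial screen.

```python
import math
from typing import Sequence, Tuple

Coord = Tuple[float, float, float]


def distance_matrix_rms(fragment: Sequence[Coord], segment: Sequence[Coord]) -> float:
    """RMS difference between the internal distance matrices of two
    equally long coordinate lists (e.g. CA atoms). The value is invariant
    to rotation and translation of either set."""
    if len(fragment) != len(segment):
        raise ValueError("fragment and segment must have the same length")
    n = len(fragment)
    if n < 2:
        return 0.0
    squared_sum = 0.0
    pairs = 0
    for i in range(n):
        for j in range(i + 1, n):
            diff = (math.dist(fragment[i], fragment[j])
                    - math.dist(segment[i], segment[j]))
            squared_sum += diff * diff
            pairs += 1
    return math.sqrt(squared_sum / pairs)
```

Fragments with a low value would then be passed on to the more expensive criteria listed above, such as RMSD after superposition and interaction energy.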
QUALITY SCORES
A key component for any successful protein structure protocol
is the ability to assign quality scores to the created models.
Pcons.net scores models using the best methods currently available.
For each model three global quality scores are provided, one
based on consensus (Pcons), one based solely on structure (ProQ)
and one using a combination of the two (Pmodeller). All are
presented in the job summary page. The reason for providing
more than one score is that they contain complementary information.
The Pcons score, for instance, is only meaningful if a sufficient
number of models are available. If this is not the case, a structural
evaluation using ProQ might be more suitable and for other cases
the ProQ score might be a useful aid in the process of choosing
the best model.
From a user perspective it is important to know when to trust a certain score. Based on results from the quality assessment category in CASP7 (5), the Pcons score correlates well with the correct quality of the models as measured by LGscore (22) (R = 0.96). Moreover, a Pcons score above 1.1 separates correct from incorrect models almost perfectly (only 2.5% false predictions). The ProQ and Pmodeller scores are the predicted LGscore, and score values above 1.5 correspond to P-values better than $10^{-3}$.
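The cutoffs quoted above can be collected in a small helper. The sketch below is only a reading aid: the function and key names are hypothetical, and the thresholds are taken directly from the text (1.1 for Pcons, 1.5 for ProQ and Pmodeller).

```python
PCONS_CUTOFF = 1.1     # from the text: separates correct from incorrect models
LGSCORE_CUTOFF = 1.5   # from the text: applies to ProQ and Pmodeller scores


def flag_model(pcons: float, proq: float, pmodeller: float) -> dict:
    """Report which of the three global scores pass their quoted cutoffs."""
    return {
        "pcons_likely_correct": pcons > PCONS_CUTOFF,
        "proq_above_cutoff": proq > LGSCORE_CUTOFF,
        "pmodeller_above_cutoff": pmodeller > LGSCORE_CUTOFF,
    }
```

Because the Pcons score is only meaningful when enough models are available, a caller would typically consult the ProQ flag when few servers have returned results.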
In addition to the global quality scores, each amino acid in the models is given an estimate of the CA–CA error as measured by the local S-score, $S = 1/(1 + \mathrm{error}^2/5)$. The S-score varies between 0 and 1, corresponding to high and low error, respectively; e.g. if the S-score is larger than 0.5 the error is predicted to be <2.24 Å ($\sqrt{5}$). The advantage of this type of score is that it focusses on the regions that have low error and gives the same score value for regions that are wrong. As for the global scores, the local quality is predicted using either consensus (Pcons) or structural features (ProQres). In terms of performance, Pcons is superior to ProQres (13). In fact, no non-consensus-based approach is nearly as good as consensus-based approaches (5). However, ProQres still provides some additional value as a complement when there is no clear consensus or when the consensus is weak. The local quality predictions are accessible by clicking either on the Pcons score or on the ProQ score in the job summary page (Figure 2). The local quality scores predicted by Pcons are also added to the B-factor column of all models for easy visualization in any coordinate viewing program (Figure 4).
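The S-score relation and its inverse are easy to check numerically. The Python sketch below also shows one way a per-residue value could be written into the B-factor field of a PDB ATOM record (columns 61–66 of the fixed-width format). The function names are hypothetical and this is not the server's own code.

```python
import math

D0_SQUARED = 5.0  # the constant in S = 1 / (1 + error^2 / 5), in Å^2


def s_score(error: float) -> float:
    """Convert a distance error in Å to an S-score in (0, 1]."""
    return 1.0 / (1.0 + error ** 2 / D0_SQUARED)


def error_from_s(s: float) -> float:
    """Invert the S-score to recover the predicted error in Å."""
    if not 0.0 < s <= 1.0:
        raise ValueError("S-score must be in (0, 1]")
    return math.sqrt(D0_SQUARED * (1.0 / s - 1.0))


def with_bfactor(atom_line: str, value: float) -> str:
    """Return a PDB ATOM/HETATM record with columns 61-66 set to value."""
    padded = atom_line.rstrip("\n").ljust(66)
    return padded[:60] + f"{value:6.2f}" + padded[66:]
```

An S-score of exactly 0.5 maps back to an error of $\sqrt{5} \approx 2.24$ Å, which is the threshold quoted in the text.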

Figure 4. Local quality prediction using Pcons. (A) Predicted quality plotted for each residue in the sequence. (B) The structure color-coded from red to blue using the predicted quality, corresponding to poor and good, respectively [picture made using PyMOL (33)]. In this particular example, Pcons has identified a region around residue number 100 and the C-terminus as incorrect. Although these two regions are far apart in sequence, they end up on the same side of the protein. Since the rest of the protein is correct, this suggests that the C-terminal residues make some interactions with residues in the other region that are not captured by this model. With this information it might be possible to improve the model.
THROUGHPUT
The throughput of Pcons.net depends to a large degree on the difficulty of the target. For the easy targets, the meta server could easily handle more than 1000 requests per day. For the harder targets, however, it can only handle about 50 requests per day, due to the throughput of the external servers it uses. To avoid overloading the external servers there is also a limit on the number of pending external server jobs the meta server can have. If this limit is reached, the meta server will queue the jobs locally until the number of pending jobs decreases.
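The throttling behaviour described above can be sketched as a small bookkeeping class. This is a hypothetical illustration of the policy, not the server's scheduler; the class name, method names and the in-memory data structures are assumptions.

```python
from collections import deque


class ExternalJobThrottle:
    """Cap the number of jobs pending at external servers and hold
    the overflow in a local first-in, first-out queue."""

    def __init__(self, max_pending: int):
        self.max_pending = max_pending
        self.pending = set()        # jobs currently at external servers
        self.local_queue = deque()  # jobs waiting for a free slot

    def submit(self, job_id: str) -> bool:
        """Return True if the job was dispatched, False if queued locally."""
        if len(self.pending) < self.max_pending:
            self.pending.add(job_id)
            return True
        self.local_queue.append(job_id)
        return False

    def finish(self, job_id: str) -> None:
        """Mark a job as done and dispatch queued jobs into free slots."""
        self.pending.discard(job_id)
        while self.local_queue and len(self.pending) < self.max_pending:
            self.pending.add(self.local_queue.popleft())
```

With a limit of two, a third submission waits locally and is dispatched as soon as one of the first two jobs finishes.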
ACKNOWLEDGEMENTS
First of all we want to thank all developers of servers. Without these the consensus approach would not have any value. The success of consensus-based methods should really be attributed to the whole collective force of fold-recognition method developers and we encourage users of Pcons.net to cite the individual servers as well. We would also like to thank Michael Levitt for kindly providing the source code to SegMod and Erik Lindahl for scientific advice.
This work was supported by grants from the Swedish Research Councils. The EU 6th Framework Program is gratefully acknowledged for support to the GeneFun project, contract LSHG-CT-2004-503567, and to the EMBRACE project, contract LHSG-CT-2004-512092. Funding to pay the Open Access publication charges for this article was provided by the EMBRACE project.
Conflict of interest statement. None declared.
REFERENCES
1. Lundström J., Rychlewski L., Bujnicki J., Elofsson A. Pcons: a neural-network-based consensus predictor that improves fold recognition. Protein Sci. (2001) 10:2354–2362.
2. Wallner B., Fang H., Elofsson A. Automatic consensus-based fold recognition using Pcons, ProQ, and Pmodeller. Proteins (2003) 53(Suppl. 6):534–541.
3. Moult J., Fidelis K., Zemla A., Hubbard T. Critical assessment of methods of protein structure prediction (CASP)-round V. Proteins (2003) 53(Suppl. 6):334–339.
4. Kryshtafovych A., Venclovas C., Fidelis K., Moult J. Progress over the first decade of CASP experiments. Proteins (2005) 61(Suppl. 7):225–236.
5. Wallner B., Elofsson A. Assessment of global and local quality model in CASP7 using Pcons. Manuscript in preparation (2007).
6. Sonnhammer E., Eddy S., Durbin R. Pfam: a comprehensive database of protein domain families based on seed alignments. Proteins (1997) 28:405–420.
7. Altschul S., Madden T., Schaffer A., Zhang J., Zhang Z., Miller W., Lipman D. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. (1997) 25:3389–3402.
8. Marchler-Bauer A., Panchenko A.R., Shoemaker B.A., Thiessen P.A., Geer L.Y., Bryant S.H. CDD: a database of conserved domain alignments with links to domain three-dimensional structure. Nucleic Acids Res. (2002) 30:281–283.
9. Marti-Renom M., Stuart A., Fiser A., Sánchez R., Melo F., Sali A. Comparative protein structure modeling of genes and genomes. Annu. Rev. Biophys. Biomol. Struct. (2000) 29:291–325.
10. Baker D., Sali A. Protein structure prediction and structural genomics. Science (2001) 294:93–96.
11. Wallner B., Elofsson A. All are not equal: a benchmark of different homology modeling programs. Protein Sci. (2005) 14:1315–1327.
12. Wallner B., Elofsson A. Can correct protein models be identified? Protein Sci. (2003) 12:1073–1086.
13. Wallner B., Elofsson A. Identification of correct regions in protein models using structural, alignment, and consensus information. Protein Sci. (2006) 15:900–913.
14. Web services description language. http://www.w3.org/TR/wsdl.
15. Simple object access protocol. http://www.w3.org/TR/soap.
16. Hull D., Wolstencroft K., Stevens R., Goble C., Pocock M.R., Li P., Oinn T. Taverna: a tool for the composition and enactment of bioinformatics workflow. Bioinformatics (2004) 20:3045–3054.
17. Frishman D., Argos P. Knowledge-based protein secondary structure assignment. Proteins (1995) 23:566–579.
18. Jones D. Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. (1999) 292:195–202.
19. Levitt M. Accurate modeling of protein conformation by automatic segment matching. J. Mol. Biol. (1992) 226:507–533.
20. Jones T.A., Thirup S. Using known substructures in protein model building and crystallography. EMBO J. (1986) 5:819–822.
21. Levitt M. Molecular dynamics of native protein. I. Computer simulation of trajectories. J. Mol. Biol. (1983) 168:595–617.
22. Cristobal S., Zemla A., Fischer D., Rychlewski L., Elofsson A. A study of quality measures for protein threading models. BMC Bioinformatics (2001) 2(5).
23. Jaroszewski L., Rychlewski L., Li Z., Li W., Godzik A. FFAS03: a server for profile-profile sequence alignments. Nucleic Acids Res. (2005) 33(Web Server issue):W284–W288.
24. Ginalski K., von Grotthuss M., Grishin N.V., Rychlewski L. Detecting distant homology with meta-BASIC. Nucleic Acids Res. (2004) 32(Web Server issue):W576–W581.
25. Ginalski K., Pas J., Wyrwicz L.S., von Grotthuss M., Bujnicki J.M., Rychlewski L. ORFeus: detection of distant homology using sequence profiles and predicted secondary structure. Nucleic Acids Res. (2003) 31:3804–3807.
26. Karplus K., Karchin R., Draper J., Casper J., Mandel-Gutfreund Y., Diekhans M., Hughey R. Combining local-structure, fold-recognition, and new fold methods for protein structure prediction. Proteins (2003) 53(Suppl. 6):491–496.
27. McGuffin L.J., Jones D.T. Improvement of the GenTHREADER method for genomic fold recognition. Bioinformatics (2003) 19:874–881.
28. Shi J., Blundell T., Mizuguchi K. Fugue: sequence-structure homology recognition using environment-specific substitution tables and structure-dependent gap penalties. J. Mol. Biol. (2001) 310:243–257.
29. Zhou H., Zhou Y. Fold recognition by combining sequence profiles derived from evolution and from depth-dependent structural alignment of fragments. Proteins (2005) 58:321–328.
30. Fischer D. 3D-SHOTGUN: a novel, cooperative, fold-recognition meta-predictor. Proteins (2003) 51:434–441.
31. Tomii K., Akiyama Y. FORTE: a profile-profile comparison tool for protein fold recognition. Bioinformatics (2004) 20:594–595.
32. Soding J., Biegert A., Lupas A.N. The HHpred interactive server for protein homology detection and structure prediction. Nucleic Acids Res. (2005) 33(Web Server issue):W244–W248.
33. DeLano W. The PyMOL molecular graphics system. (2002) http://www.pymol.org.
