Nucleic Acids Research 2005 33(Web Server Issue):W230-W232; doi:10.1093/nar/gki412
© The Author 2005. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions{at}oupjournals.org


Article

DiANNA: a web server for disulfide connectivity prediction

F. Ferrè1 and P. Clote1,2,*

1Department of Biology, Boston College, Chestnut Hill, MA 02467, USA and 2Department of Computer Science (courtesy appointment), Boston College, Chestnut Hill, MA 02467, USA

*To whom correspondence should be addressed. Tel: +1 617 552 1332; Fax: +1 617 552 2011; Email: clote{at}bc.edu

Received February 13, 2005. Revised March 21, 2005. Accepted March 21, 2005.


    ABSTRACT
Correctly predicting the disulfide bond topology of a protein is crucial for understanding protein function and can be of great help to tertiary structure prediction methods. The web server http://clavius.bc.edu/~clotelab/DiANNA/ takes a protein sequence as input and outputs a disulfide connectivity prediction. The procedure is as follows. First, PSIPRED is run to predict the protein's secondary structure; then PSIBLAST is run against the non-redundant SwissProt to obtain a multiple alignment of the input sequence. The predicted secondary structure and the profile arising from this alignment are used in the training phase of our neural network. Next, the cysteine oxidation state is predicted, and each pair of cysteines in the protein sequence is assigned a likelihood of forming a disulfide bond; this is done with a novel architecture (the diresidue neural network). Finally, Rothberg's implementation of Gabow's maximum weighted matching algorithm is applied to the diresidue neural network scores to produce the final connectivity prediction. Our novel neural network-based approach achieves results that are comparable to, and in some cases better than, those of current state-of-the-art methods.


    INTRODUCTION
Disulfide bonds are covalent bonds between the sulfur atoms of nonadjacent cysteine residues; they stabilize the protein structure and are often found in extracytoplasmic proteins. Knowledge of cysteine connectivity (i.e. which, if any, pairs of cysteines form a bond in a given protein sequence) can greatly reduce the conformational space for protein structure prediction algorithms. Moreover, as shown by Chuang and co-workers (1), a similar disulfide connectivity pattern frequently implies structural similarity even when sequence similarity is undetectable. Nevertheless, only a few attempts have been made to solve this problem. In contrast, many methods have been developed for the related but simpler problem of cysteine oxidation state prediction, i.e. determining which cysteines are involved in a disulfide bond without predicting the connectivity pattern. Recent methods based on machine learning techniques have reached an outstanding accuracy of 90% on certain test data (2–5). In spite of this, accuracy for the disulfide connectivity problem remains modest. The reason is simple: amino acids that flank half-cystines (disulfide-bonded cysteines) are quite different from those that flank free cysteines (non-bonded cysteines) (6,7). In contrast, the residues that flank two incorrectly paired half-cystines are quite similar to those that flank the half-cystines in a disulfide bond. Two recent and remarkable papers based on different approaches (8,9) outperform early attempts by Fariselli and co-workers (10,11). The Vullo and Frasconi method (9) uses recursive neural networks (12) to score undirected graphs that represent cysteine connectivity. The method of Zhao and co-workers (8) is based on recurrent patterns of sequence separation between bonded half-cystines.
Web servers that allow online disulfide connectivity prediction are available for Vullo/Frasconi (http://cassandra.dsi.unifi.it/cysteines) and, as a prototype, for Fariselli/Casadio (http://gpcr.biocomp.unibo.it/cgi/predictors/cys-cys/pred_dconcgi.cgi). Here, we describe a web server for disulfide connectivity prediction that implements our novel approach, whose results are comparable to, and sometimes better than, those of the state-of-the-art methods (8,9). Algorithm details and performance of the method have been described previously by Ferrè and Clote (13).


    METHODS
The stand-alone program for disulfide connectivity prediction, implemented in our web server DIANNA (for DiAminoacid Neural Network Application), uses a three-step procedure. First, a neural network is trained to recognize cysteines in an oxidized state (sulfur covalently bonded) as distinct from cysteines in a reduced state (sulfur occurring in the reactive sulfhydryl group SH), based on the previous work by Fariselli et al. (14). Only those monomers that have at least two predicted half-cystines are submitted to the second step. The neural network input is a window of size w centered at each cysteine in the sequence. This first filtering step is called Module A. Then, a second neural network (Module B) is used to score each pair of symmetric windows of size w, each one centered at a cysteine in the input sequence. The network input contains evolutionary information, i.e. each residue is encoded by 20 input units corresponding to the PSIBLAST-computed profile row (obtained from the multiple alignment of the input sequence against the non-redundant SwissProt), and secondary structure information, computed using PSIPRED (15) and encoded in unary format by the addition of three input units (e.g. helix is encoded 1 0 0, coil is 0 1 0 and sheet is 0 0 1). Using secondary structure information leads to a marked improvement and is justified by the bias in the secondary structure preference of free cysteines and half-cystines (16). The architecture of the Module B neural network is as follows. Given an encoded input containing secondary structure information, thus having w × 23 input units, we designed a first hidden layer containing w (w – 1)/2 units, one for each pair 1 ≤ i < j ≤ w of positions, with connections to the input units representing the profile for the residues at positions i and j and the secondary structures at those positions. Thus, each of the w (w – 1)/2 hidden units in the first hidden layer (the diresidue layer) is connected to 2 (20 + 3) = 46 input units (Figure 1).
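The per-residue encoding described above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the one-letter secondary structure labels (H, C, E) and the placeholder profile values are hypothetical and are not taken from the DIANNA code; only the 20 + 3 layout, the unary codes and the w (w – 1)/2 count come from the text.

```python
# Illustrative sketch only: helper names and the H/C/E labels are assumptions.
# The unary codes follow the text: helix 1 0 0, coil 0 1 0, sheet 0 0 1.
SS_CODES = {"H": (1, 0, 0), "C": (0, 1, 0), "E": (0, 0, 1)}


def encode_residue(profile_row, ss_label):
    """Return 23 input values: 20 profile values plus 3 unary structure units."""
    if len(profile_row) != 20:
        raise ValueError("expected 20 profile values per residue")
    return list(profile_row) + list(SS_CODES[ss_label])


def encode_window(profile_rows, ss_labels):
    """Concatenate the encodings of the w residues in a window (w x 23 values)."""
    if len(profile_rows) != len(ss_labels):
        raise ValueError("need one structure label per profile row")
    encoded = []
    for row, label in zip(profile_rows, ss_labels):
        encoded.extend(encode_residue(row, label))
    return encoded


def diresidue_unit_count(w):
    """Number of position pairs 1 <= i < j <= w, i.e. w (w - 1) / 2."""
    return w * (w - 1) // 2


# Placeholder window of w = 5 residues with a flat, made-up profile.
w = 5
window = encode_window([[0.05] * 20 for _ in range(w)], "HHCEE")
```

With w = 5 the encoded window holds 5 × 23 = 115 values, and the diresidue layer would contain 10 units.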
A second hidden layer, containing five units, all fully connected with those of the first hidden layer, is then fully connected to the single output unit. We designed this unusual neural network architecture with the aim of emphasizing the signal that arises when using diresidue position-specific scoring matrices (13): for all windows of length w, for positions 1 ≤ i < j ≤ w and amino acids a, b, we consider the frequency of occurrence of amino acid a in position i when amino acid b is found in position j. Moreover, although there are many hidden units, the training phase is still reasonably fast because the diresidue layer is not fully connected with the input layer.



Figure 1 A toy example of the diresidue neural network architecture. Six input units (named 1, ..., 6) are connected to the units of the first hidden layer (7, ..., 21), called the diresidue layer. Each pair of input units is connected to a distinct unit in the diresidue layer. The units of the diresidue layer are then fully connected to the five units (22, ..., 26) of the second hidden layer, which are fully connected to the single output unit. Using the second hidden layer provided a better performance than connecting the diresidue layer units directly to the output unit. In the DIANNA application, each residue is encoded by 23 input units (20 encoding the evolutionary information and 3 for the secondary structure information); therefore, each unit in the diresidue layer is connected to 23 + 23 = 46 input units that code a pair of residues.
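The toy network of Figure 1 can be written out as a short forward pass. The sketch below is not the DIANNA implementation: the sigmoid activations, the absence of bias terms and the random weights are assumptions made purely for illustration, and the class name is hypothetical. Only the layer sizes and the sparse pairwise wiring follow the figure.

```python
import math
import random
from itertools import combinations


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class ToyDiresidueNet:
    """Toy version of Figure 1: 6 inputs, 15 diresidue units, 5 hidden units, 1 output.

    Activations, missing biases and random weights are illustrative assumptions.
    """

    def __init__(self, n_inputs=6, n_hidden2=5, seed=0):
        rng = random.Random(seed)
        # One diresidue unit per unordered pair of input units.
        self.pairs = list(combinations(range(n_inputs), 2))
        # Sparse layer: each diresidue unit has exactly two incoming weights.
        self.w1 = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in self.pairs]
        # Second hidden layer is fully connected to the diresidue layer.
        self.w2 = [[rng.uniform(-1, 1) for _ in self.pairs] for _ in range(n_hidden2)]
        # Single output unit, fully connected to the second hidden layer.
        self.w3 = [rng.uniform(-1, 1) for _ in range(n_hidden2)]

    def forward(self, x):
        h1 = [sigmoid(wa * x[i] + wb * x[j])
              for (i, j), (wa, wb) in zip(self.pairs, self.w1)]
        h2 = [sigmoid(sum(w * h for w, h in zip(row, h1))) for row in self.w2]
        return sigmoid(sum(w * h for w, h in zip(self.w3, h2)))

    def n_sparse_connections(self):
        """Connections between the input layer and the diresidue layer."""
        return 2 * len(self.pairs)


net = ToyDiresidueNet()
score = net.forward([0.1, 0.9, 0.3, 0.5, 0.7, 0.2])
```

In this toy case the sparse wiring uses 30 input-to-diresidue connections, whereas a fully connected layer of the same size would need 6 × 15 = 90, which illustrates why training remains reasonably fast despite the large number of hidden units.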

 
Finally, following Fariselli and Casadio (10), our algorithm applies the Edmonds–Gabow maximum weight matching algorithm (17,18), using Ed Rothberg's implementation wmatch (http://elib.zib.de/pub/Packages/mathprog/matching/weighted), to the weighted complete graph, whose nodes are half-cystines and whose weights are values output from the neural network of Module B. This last step is called Module C.
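To make the role of Module C concrete, the sketch below finds a maximum weight matching on a small complete graph by exhaustive search. It is a swapped-in brute-force method, not the Edmonds–Gabow algorithm and not Rothberg's wmatch; it is practical only for a handful of half-cystines. The function names and the pair scores are made up for illustration; the cysteine positions are those of the Figure 2 example.

```python
from itertools import combinations


def _best_perfect_matching(nodes, weight):
    """Exhaustively search perfect matchings of an even-sized node list."""
    if not nodes:
        return 0.0, []
    first, rest = nodes[0], nodes[1:]
    best_score, best_pairs = float("-inf"), []
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        score, pairs = _best_perfect_matching(remaining, weight)
        score += weight[(first, partner)]
        if score > best_score:
            best_score, best_pairs = score, [(first, partner)] + pairs
    return best_score, best_pairs


def max_weight_matching(nodes, weight):
    """Brute-force stand-in for Module C (assumes non-negative weights).

    nodes: sorted half-cystine positions (even count);
    weight: dict mapping (a, b) with a < b to a pair score.
    """
    if len(nodes) % 2:
        raise ValueError("this sketch expects an even number of half-cystines")
    return _best_perfect_matching(list(nodes), weight)


# Made-up scores for the six cysteines of the Figure 2 example.
cysteines = [6, 16, 33, 44, 58, 72]
scores = {pair: 0.1 for pair in combinations(cysteines, 2)}
scores.update({(6, 16): 0.95, (33, 44): 0.95, (58, 72): 0.95, (16, 33): 0.97})

total, bonds = max_weight_matching(cysteines, scores)
```

The decoy pair (16, 33) has the single highest score here, yet the matching still selects 6–16, 33–44 and 58–72 because their combined weight is larger. This is the reason for using a global matching rather than greedily picking the best-scoring pairs.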


    SERVER DESCRIPTION
The web server takes as input a protein sequence in FASTA format and can output the following: (i) oxidation state prediction for all the cysteines in the input sequence, using our implementation of the neural network described in (14) (Module A); (ii) a score for each pair of cysteines in the input, obtained by our diresidue neural network (Module B); (iii) the disulfide connectivity prediction obtained using the maximum weighted matching algorithm (Module C) applied to the scores of Module B. The user is warned if Module A predicts fewer than two half-cystines in the input sequence. A statistical evaluation of the connectivity prediction is not attempted. A sample output is shown in Figure 2.
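As a rough illustration of the input handling, the sketch below reads a single FASTA record, locates the cysteines and lists the cysteine pairs that a pairwise scorer such as Module B would need to evaluate. It does not reproduce the server's code: the function names and the toy sequence are hypothetical, and the oxidation state filter of Module A is not modeled.

```python
from itertools import combinations


def read_fasta(text):
    """Parse a single FASTA record; return (header, sequence)."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                raise ValueError("this sketch expects a single sequence")
            header = line[1:]
        else:
            chunks.append(line)
    return header, "".join(chunks).upper()


def cysteine_positions(sequence):
    """1-based positions of all cysteines in the sequence."""
    return [i + 1 for i, residue in enumerate(sequence) if residue == "C"]


def candidate_pairs(positions):
    """All unordered cysteine pairs, i.e. the pairs to be scored."""
    return list(combinations(positions, 2))


# Made-up toy record, not a real protein.
header, sequence = read_fasta(">toy\nACDCGHC\nKLC\n")
positions = cysteine_positions(sequence)
pairs = candidate_pairs(positions)
```

For n cysteines there are n (n – 1)/2 candidate pairs, so the toy sequence with four cysteines yields six pairs.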



Figure 2 Output from DIANNA when given as input the sequence for human growth hormone receptor (SwissProt ID GHR_HUMAN, PDB code 1kf9, chain F). This protein has 6 cysteines that form 3 disulfide bonds, with connectivity pattern 1–2, 3–4, 5–6 (between cysteines 6 and 16, 33 and 44, 58 and 72). The upper portion of the output page reports the Module B score (see text) for each pair of cysteines, ranging from 0 to 1 (scores >0.9 are highlighted). In the lower portion, the proposed connectivity (i.e. the Module C output) is shown.

 

    DISCUSSION
Trained and tested on a list of proteins having at least two and at most five bonds, equivalent to those used in (9,11), the software achieves a rate Qp of 49% for perfect predictions (i.e. the fraction of proteins for which there are no false-positive or false-negative predictions made), 86% accuracy and 51% Matthews' correlation coefficient (13). For proteins having two and four bonds, the fraction of perfect predictions improves to 62 and 55%, respectively. Although further improvement in disulfide connectivity prediction is still desired, our approach is nonetheless reliable when used on proteins having a relatively small number of disulfide bonds.
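The three measures quoted above can be computed as in the following sketch. It assumes that accuracy and the Matthews correlation coefficient are taken over all cysteine pairs pooled across proteins (bonded versus non-bonded); this convention, the function names and the two toy proteins are assumptions for illustration and are not data from the evaluation in (13).

```python
import math
from itertools import combinations


def pair_counts(cysteines, true_bonds, predicted_bonds):
    """Count TP, FP, TN, FN over all cysteine pairs of one protein."""
    tp = fp = tn = fn = 0
    for pair in combinations(sorted(cysteines), 2):
        is_true, is_pred = pair in true_bonds, pair in predicted_bonds
        if is_true and is_pred:
            tp += 1
        elif is_pred:
            fp += 1
        elif is_true:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def evaluate(proteins):
    """proteins: list of (cysteines, true_bonds, predicted_bonds).

    Bonds are sets of (a, b) tuples with a < b. Returns (Qp, accuracy, MCC).
    """
    tp = fp = tn = fn = perfect = 0
    for cysteines, true_bonds, predicted_bonds in proteins:
        a, b, c, d = pair_counts(cysteines, set(true_bonds), set(predicted_bonds))
        tp, fp, tn, fn = tp + a, fp + b, tn + c, fn + d
        # Perfect prediction: no false positives and no false negatives.
        perfect += int(b == 0 and d == 0)
    qp = perfect / len(proteins)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return qp, accuracy, mcc


# Two made-up proteins: one predicted perfectly, one with both bonds wrong.
toy = [
    ([1, 2, 3, 4], {(1, 2), (3, 4)}, {(1, 2), (3, 4)}),
    ([5, 6, 7, 8], {(5, 6), (7, 8)}, {(5, 7), (6, 8)}),
]
qp, accuracy, mcc = evaluate(toy)
```

Because most cysteine pairs are non-bonded, true negatives dominate the pairwise counts, which is one reason accuracy can be much higher than Qp or the correlation coefficient.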


    ACKNOWLEDGEMENTS
 
Funding to pay the Open Access publication charges for this article was provided by Boston College.

Conflict of interest statement. None declared.


    REFERENCES

  1. Chuang, C.C., Chen, C.Y., Yang, J.M., Lyu, P.C., Hwang, J.K. (2003) Relationship between protein structures and disulfide-bonding patterns. Proteins, 53, 1–5.

  2. Martelli, P.L., Fariselli, P., Casadio, R. (2004) Prediction of disulfide-bonded cysteines in proteomes with a hidden neural network. Proteomics, 4, 1665–1671.

  3. Martelli, P.L., Fariselli, P., Malaguti, L., Casadio, R. (2002) Prediction of the disulfide bonding state of cysteines in proteins with hidden neural networks. Protein Eng., 15, 951–953.

  4. Mucchielli-Giorgi, M.H., Hazout, S., Tuffery, P. (2002) Predicting the disulfide bonding state of cysteines using protein descriptors. Proteins, 46, 243–249.

  5. Chen, Y.C., Lin, Y.S., Lin, C.J., Hwang, J.K. (2004) Prediction of the bonding states of cysteines using the support vector machines based on multiple feature vectors and cysteine state sequences. Proteins, 55, 1036–1042.

  6. Muskal, S.M., Holbrook, S.R., Kim, S.H. (1990) Prediction of the disulfide-bonding state of cysteine in proteins. Protein Eng., 3, 667–672.

  7. Fiser, A., Cserzo, M., Tudos, E., Simon, I. (1992) Different sequence environments of cysteines and half cystines in proteins. Application to predict disulfide forming residues. FEBS Lett., 302, 117–120.

  8. Zhao, E., Liu, H.L., Tsai, C.H., Tsai, H.K., Chan, C.H., Kao, C.Y. (2004) Cysteine separations profiles (CSP) on protein sequences infer disulfide connectivity. Bioinformatics, 20, 653–659.

  9. Vullo, A. and Frasconi, P. (2004) Disulfide connectivity prediction using recursive neural networks and evolutionary information. Bioinformatics, 20, 653–659.

  10. Fariselli, P. and Casadio, R. (2001) Prediction of disulfide connectivity in proteins. Bioinformatics, 17, 957–964.

  11. Fariselli, P., Martelli, P.L., Casadio, R. (2002) A neural network based method for predicting the disulfide connectivity in proteins. In Damiani, E. et al. (eds), Knowledge Based Intelligent Information Engineering Systems and Allied Technologies (KES). IOS Press, Amsterdam, pp. 464–468.

  12. Frasconi, P., Gori, M., Sperduti, A. (1998) A general framework for adaptive processing of data structures. IEEE Trans. Neural Netw., 9, 768–786.

  13. Ferrè, F. and Clote, P. (2005) Disulfide connectivity prediction using secondary structure information and diresidue frequencies. Bioinformatics, in press.

  14. Fariselli, P., Riccobelli, P., Casadio, R. (1999) Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins. Proteins, 36, 340–346.

  15. Jones, D.T. (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol., 292, 195–202.

  16. Petersen, M.T., Jonson, P.H., Petersen, S.B. (1999) Amino acid neighbours and detailed conformational analysis of cysteines in proteins. Protein Eng., 12, 535–548.

  17. Gabow, H. (1973) Implementation of Algorithms for Maximum Matching on Nonbipartite Graphs. PhD Thesis, Stanford University, CA.

  18. Lovasz, L. and Plummer, M. (1985) Matching Theory. North Holland Mathematical Studies, Vol. 121. Elsevier Science Publishers B.V.




