Nucleic Acids Research Advance Access originally published online on April 4, 2008
Nucleic Acids Research 2008 36(9):3025-3030; doi:10.1093/nar/gkn159
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology
Using support vector machine combined with auto covariance to predict protein–protein interactions from protein sequences
¹College of Chemistry, Sichuan University, Chengdu 610064 and ²State Key Laboratory of Biotherapy, Sichuan University, Chengdu 610041, P.R. China
*To whom correspondence should be addressed. Tel: +86 28 89005151; Fax: +86 28 85412356; Email: liml{at}scu.edu.cn
Received January 10, 2008. Revised March 3, 2008. Accepted March 20, 2008.
Compared with the number of protein sequences available for different organisms, the number of known protein–protein interactions (PPIs) is still very limited, so many computational methods have been developed to facilitate the identification of novel PPIs. Methods that use only protein sequence information are more universally applicable than those that depend on additional information or predictions about the proteins. In this article, a sequence-based method is proposed that combines a new feature representation based on auto covariance (AC) with a support vector machine (SVM). AC accounts for the interactions between residues a certain distance apart in the sequence, so the method adequately takes the neighbouring effect into account. When applied to the PPI data of the yeast Saccharomyces cerevisiae, the method achieved a very promising prediction result. When the prediction model was evaluated on an independent data set of 11 474 yeast PPIs, the prediction accuracy was 88.09%. The performance of this method is superior to that of existing sequence-based methods, so it can be a useful supplementary tool for future proteomics studies. The prediction software and all data sets used in this article are freely available at http://www.scucic.cn/Predict_PPI/index.htm.
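To illustrate how an AC feature representation captures interactions between residues a fixed distance apart, here is a minimal Python sketch. It rests on several assumptions not stated in the abstract: it uses the standard lagged auto covariance formula, a single residue descriptor (the Kyte-Doolittle hydropathy scale, chosen only as a stand-in and not necessarily a descriptor the authors used), and the hypothetical helper names `auto_covariance` and `pair_features`. It is not the authors' software.

```python
from statistics import fmean

# Kyte-Doolittle hydropathy values: an illustrative stand-in descriptor,
# NOT necessarily one of the descriptors used in the article.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def auto_covariance(seq, scale, max_lag):
    """Return [AC(1), ..., AC(max_lag)] for one sequence and one descriptor.

    AC(lag) = 1/(n - lag) * sum_i (x_i - mean) * (x_{i+lag} - mean),
    where x_i is the descriptor value of residue i and n = len(seq).
    """
    values = [scale[residue] for residue in seq]
    n = len(values)
    if max_lag >= n:
        raise ValueError("max_lag must be smaller than the sequence length")
    mean = fmean(values)
    centred = [v - mean for v in values]
    return [
        sum(centred[i] * centred[i + lag] for i in range(n - lag)) / (n - lag)
        for lag in range(1, max_lag + 1)
    ]


def pair_features(seq_a, seq_b, scale, max_lag):
    """Concatenate the AC vectors of two proteins into one fixed-length vector."""
    return auto_covariance(seq_a, scale, max_lag) + auto_covariance(seq_b, scale, max_lag)
```

Because the output length depends only on `max_lag` (and on the number of descriptors, here one), proteins of different lengths map to vectors of equal size, which is what a classifier such as an SVM requires as input. The choice of `max_lag` and of descriptors is left open here; the abstract does not specify them.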