Published online 7 July 2004
Nucleic Acids Research, Vol. 32 No. 12 © Oxford University Press 2004; all rights reserved
Sequence-based prediction of protein domains
1 CUBIC, Department of Biochemistry and Molecular Biophysics, 2 Columbia University Center for Computational Biology and Bioinformatics (C2B2) and 3 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA
* To whom correspondence should be addressed at CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, New York, NY 10032, USA. Tel: +1 212 305 4018; Fax: +1 212 305 7932; Email: liu{at}cubic.bioc.columbia.edu
Received January 20, 2004; Revised April 18, 2004; Accepted June 16, 2004
Guessing the boundaries of structural domains has been an important and challenging problem in experimental and computational structural biology. Previous predictions were based on intuition, biochemical properties, statistics, sequence homology and other aspects of predicted protein structure. Here, we introduced CHOPnet, a de novo method that predicts structural domains in the absence of homology to known domains. Our method was based on neural networks and relied exclusively on information available for all proteins. Evaluating sustained performance through rigorous cross-validation on proteins of known structure, we correctly predicted the number of domains in 69% of all proteins. For 50% of the two-domain proteins, the centre of the predicted boundary was closer than 20 residues to the boundary assigned from three-dimensional (3D) structures; this was about eight percentage points better than predictions by equal split. Our results appeared to compare favourably with those from previously published methods. CHOPnet may help to restrict the experimental testing of different fragments for structure determination in the context of structural genomics.
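The boundary criterion quoted above can be made concrete with a small scoring sketch. It is an illustration only, not the evaluation code behind CHOPnet. It assumes that "equal split" means cutting a two-domain protein at the midpoint of its sequence. The function names and the toy proteins are hypothetical and serve only to show how a hit rate under the 20-residue criterion could be compared against that baseline.

```python
from typing import Iterable, Tuple

TOLERANCE = 20  # residues; mirrors the "closer than 20 residues" criterion


def is_hit(predicted: int, observed: int, tolerance: int = TOLERANCE) -> bool:
    """True if the predicted boundary centre is closer than `tolerance` residues
    to the boundary assigned from the 3D structure (strict inequality)."""
    return abs(predicted - observed) < tolerance


def equal_split(length: int) -> int:
    """Baseline (assumed interpretation): place the boundary at the sequence midpoint."""
    return length // 2


def hit_rate(cases: Iterable[Tuple[int, int]], tolerance: int = TOLERANCE) -> float:
    """Fraction of (predicted, observed) boundary pairs that count as hits."""
    cases = list(cases)
    if not cases:
        raise ValueError("no cases to score")
    return sum(is_hit(p, o, tolerance) for p, o in cases) / len(cases)


# Hypothetical two-domain proteins:
# (sequence length, observed boundary, predicted boundary centre)
toy = [(200, 95, 110), (300, 90, 100), (250, 180, 120), (180, 60, 85)]

method_rate = hit_rate((pred, obs) for _, obs, pred in toy)
baseline_rate = hit_rate((equal_split(n), obs) for n, obs, _ in toy)
print(f"method: {method_rate:.2f}, equal split: {baseline_rate:.2f}")
```

The difference between the two rates, expressed in percentage points, is the kind of quantity the abstract reports when it compares predictions with the equal-split baseline.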