Published online 20 September 2005
Gibbs sampling and helix-cap motifs
¹NEC Laboratories America, Inc., 4 Independence Way, Princeton, NJ 08544, USA
²Department of Biopharmaceutical Sciences, California Institute for Quantitative Biomedical Research, UCSF Box 2540, University of California San Francisco, San Francisco, CA 94143-2540, USA
³Department of Biochemistry and Biophysics, California Institute for Quantitative Biomedical Research, UCSF Box 2540, University of California San Francisco, San Francisco, CA 94143-2540, USA
⁴Department of Molecular Biology, Princeton University, Princeton, NJ 08544-1014, USA
*To whom correspondence should be addressed. Tel: +1 609 951 2628; Fax: +1 609 951 2482; Email: kruus{at}nec-labs.com
Received April 13, 2005. Revised August 8, 2005. Accepted August 30, 2005.
Protein backbones have characteristic secondary structures, including α-helices and β-sheets. Which structure is adopted locally is strongly biased by the local amino acid sequence of the protein. Accurate (probabilistic) mappings from sequence to structure are valuable for both secondary-structure prediction and protein design. For the case of α-helix caps, we test whether the information content of the sequence-structure mapping can be self-consistently improved by using a relaxed definition of the structure. We derive helix-cap sequence motifs using database helix assignments for proteins of known structure. These motifs are refined using Gibbs sampling in competition with a null motif. Then Gibbs sampling is repeated, allowing for frameshifts of ±1 amino acid residue, in order to find sequence motifs of higher total information content. All helix-cap motifs were found to have good generalization capability, as judged by training on a small set of non-redundant proteins and testing on a larger set. For overall prediction purposes, frameshift motifs using all training examples yielded the best results. Frameshift motifs using a fraction of all training examples performed best in terms of true positives among top predictions. However, motifs without frameshifts also performed well, despite a roughly one-third lower total information content.
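As a rough illustration of the procedure summarized above, the following Python sketch shows one way to compute a motif's total information content and to run a single Gibbs-style sweep in which each training example is reassigned to a frameshift of -1, 0 or +1, or to a null model. It is not the authors' implementation: the function names, the additive pseudocount, the uniform background, the equal prior over the four states, and the choice to score the null motif with a likelihood ratio of 1 are all assumptions made for the sake of a small, runnable example.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def column_probs(windows, width, pseudocount=1.0):
    """Per-position residue probabilities with additive pseudocounts (assumed)."""
    total = len(windows) + pseudocount * len(AMINO_ACIDS)
    probs = []
    for pos in range(width):
        counts = Counter(w[pos] for w in windows)
        probs.append({a: (counts[a] + pseudocount) / total for a in AMINO_ACIDS})
    return probs


def information_content(probs, background):
    """Total relative entropy (in bits) of the motif versus the background."""
    return sum(p[a] * math.log2(p[a] / background[a])
               for p in probs for a in AMINO_ACIDS)


def window_at(seq, center, shift, width):
    """Extract the motif window around `center`, displaced by `shift` residues."""
    start = center + shift - width // 2
    return seq[start:start + width]


def gibbs_sweep(seqs, centers, states, width, background, rng):
    """Resample each example's state once.

    A state is a frameshift in {-1, 0, +1}, or None when the example is
    assigned to the null motif (hypothetical encoding).
    """
    options = [-1, 0, 1, None]
    for i in range(len(seqs)):
        # Rebuild the motif from every other example currently in the motif.
        others = [window_at(seqs[j], centers[j], states[j], width)
                  for j in range(len(seqs))
                  if j != i and states[j] is not None]
        probs = column_probs(others, width)
        weights = []
        for s in options:
            if s is None:
                weights.append(1.0)  # null motif: background vs. background
            else:
                w = window_at(seqs[i], centers[i], s, width)
                weights.append(math.prod(probs[k][w[k]] / background[w[k]]
                                         for k in range(width)))
        states[i] = rng.choices(options, weights=weights)[0]
    return states


if __name__ == "__main__":
    # Toy sequences and cap positions, invented purely for illustration.
    background = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}
    seqs = ["GGSPEELL", "AASPEEKK", "LLTPEDAA", "KKGSPEEL"]
    centers = [3, 3, 3, 4]
    states = [0, 0, 0, 0]
    rng = random.Random(0)
    for _ in range(20):
        states = gibbs_sweep(seqs, centers, states, 3, background, rng)
    kept = [window_at(s, c, st, 3)
            for s, c, st in zip(seqs, centers, states) if st is not None]
    print(states, information_content(column_probs(kept, 3), background))
```

In this sketch, a frameshift that aligns an example better with the remaining examples receives a larger weight, so repeated sweeps tend to raise the total information content of the motif, while examples that fit no frame can drift to the null state.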
Correspondence may also be addressed to Ned S. Wingreen. Tel: +1 609 258 8476; Fax: +1 609 258 8616; Email: wingreen{at}molbio.princeton.edu