Nucleic Acids Research, 2003, Vol. 31, No. 13 3833-3835
© 2003 Oxford University Press
NORSp: predictions of long regions without regular secondary structure
1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA 2 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St Nicholas Avenue, New York, NY 10032, USA 3 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA 4 Department of Pharmacology, Columbia University, 630 West 168th Street, New York, NY 10032, USA
*To whom correspondence should be addressed. Tel: +1 2123054018; Fax: +1 2123057932; Email: rost{at}columbia.edu
Received February 14, 2003; Revised and Accepted March 17, 2003
| ABSTRACT |
|---|
Many structurally flexible regions play important roles in biological processes. It has been shown that extended loopy regions are very abundant in the protein universe and that they have been conserved through evolution. Here, we present NORSp, a publicly available predictor for disordered regions in proteins. Specifically, NORSp predicts long regions with NO Regular Secondary structure. Upon user submission of a protein sequence, NORSp analyses the protein for its secondary structure and for the presence of transmembrane helices and coiled-coil regions. It then emails the user a report on the presence and position of disordered regions. NORSp can be accessed from http://cubic.bioc.columbia.edu/services/NORSp/.
| INTRODUCTION |
|---|
Irregular structures mediate function
The three-dimensional (3D) structure of a protein is assumed to largely determine its biological function. The first decades of rapid progress in the experimental determination of 3D structures by X-ray crystallography (1) focused on determining rigid structures at high resolution. Recently, a new type of structure has emerged with very long regions that appear to adopt regular structure only upon binding to substrates or other proteins (2); they are referred to as floppy, natively disordered, natively unfolded or loopy (3,4–7). It seems that these irregular regions are important for function.
Predicting irregular structures
Structural irregularity can be studied from several aspects. One class of natively disordered regions was defined as the regions invisible in electron density maps of X-ray diffraction, presumably because their flexibility keeps them from crystallising into well-ordered structures. These regions are sometimes associated with regions of compositional bias or low sequence complexity (8–10). Another class is characterised by proteins that appear unfolded by CD measurements (5). Previously, we investigated the problem of disordered proteins from a structure-oriented perspective and studied extended regions of very low regular secondary structure (helix or strand) content (NORS) (3). We showed that NORS regions are particularly abundant in eukaryotic proteomes, conserved during evolution, over-represented in the regulatory function category and important in protein–protein interactions. These results were in agreement with studies that predicted natively disordered regions through neural networks (11).
Here, we introduce a web-based interface to make our method of predicting NORS regions publicly accessible. The method can be useful for biologists in several ways. For example, crystallographers can check whether their proteins contain NORS regions before deciding whether to proceed with experiments, since NORS proteins may be difficult to crystallise, as demonstrated by their low occurrence in PDB (3). Biologists interested in protein structure–function relationships may also find it interesting to verify whether protein–protein interaction sites coincide with NORS regions.
| DESIGN AND IMPLEMENTATION |
|---|
Definition of NORS
We define NORS regions as segments of >70 consecutive residues with <12% of the residues in helix, strand or coiled-coil regions and with at least one segment of 10 adjacent residues exposed to solvent. We identify such NORS regions by merging predictions of secondary structure, transmembrane helices and coiled-coil regions. We pre-calculate this information as well as NORS regions for each protein in >60 completely sequenced genomes (Fig. 1), and have included them in our PEP database (12) through a searchable SRS (13) interface (http://cubic.bioc.columbia.edu/db/PEP/). NORS information has also been used in our target selection process for the North East Structural Genomics Consortium (14) to exclude proteins likely to pose problems for crystallisation.
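The definition above can be illustrated with a minimal Python sketch that tests a single candidate segment. This is not the NORSp code: the function names, the boolean per-residue encoding and the choice to accept a segment of exactly 70 residues (the default window size listed under the advanced options) are assumptions made for illustration.

```python
def longest_true_run(flags):
    """Length of the longest run of consecutive True values."""
    longest = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    return longest


def is_nors_window(structured, exposed, min_length=70,
                   max_content=0.12, min_exposed_run=10):
    """Check one segment against the NORS criteria (illustrative only).

    structured: list of bools, True where a residue is predicted to be in
                a helix, strand, transmembrane helix or coiled-coil
                (hypothetical encoding).
    exposed:    list of bools, True where a residue is predicted to be
                solvent exposed (hypothetical encoding).
    """
    if len(structured) != len(exposed):
        raise ValueError("structured and exposed must have equal length")
    if len(structured) < min_length:
        return False  # too short to be a NORS region
    # Fraction of residues with regular structure must stay below the cutoff.
    content = sum(structured) / len(structured)
    # At least one stretch of adjacent exposed residues must be long enough.
    return content < max_content and longest_true_run(exposed) >= min_exposed_run
```

For example, an 80-residue segment with 5 structured residues (6.25%) and a 12-residue exposed stretch passes, while the same segment with 10 structured residues (12.5%) does not.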
Prediction by NORSp
Protein sequences submitted to our web site are subjected to the following steps. (a) A sequence profile is built through a database search with an automated, iterated PSI-BLAST (15). (b) Secondary structure and solvent accessibility are predicted by PROFphd (16), and membrane helices are predicted by PHDhtm (17), both using the PSI-BLAST profiles. (c) Coiled-coil regions are predicted by COILS (18). (d) The secondary structure, membrane helix and coiled-coil information are then combined to calculate the structural content for each sequence window of a given length, and NORS regions are identified when the structural content is below the given threshold; overlapping NORS regions are joined. Technically, to obtain most of these intermediate results, NORSp uses the same engine that is behind the PredictProtein server (19) (http://cubic.bioc.columbia.edu/predictprotein/).
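Step (d) can be sketched in Python as a sliding-window scan followed by joining of overlapping hits. The sketch assumes that the predictions from steps (a) to (c) have already been reduced to two per-residue boolean lists; that encoding, the function names, the step size of one residue and the half-open 0-based coordinates are all assumptions, not details taken from the NORSp implementation.

```python
def has_exposed_run(exposed, min_run):
    """True if `exposed` contains at least `min_run` consecutive True values."""
    run = 0
    for flag in exposed:
        run = run + 1 if flag else 0
        if run >= min_run:
            return True
    return False


def find_nors_regions(structured, exposed, window=70,
                      max_content=0.12, min_exposed_run=10):
    """Scan all windows and join overlapping qualifying windows.

    structured / exposed: per-residue bool lists (hypothetical encoding of
    the combined structure predictions and the accessibility prediction).
    Returns a list of (start, end) tuples, 0-based, end exclusive.
    """
    if len(structured) != len(exposed):
        raise ValueError("structured and exposed must have equal length")
    regions = []
    # Sequences shorter than the window yield an empty range and no regions.
    for start in range(len(structured) - window + 1):
        end = start + window
        content = sum(structured[start:end]) / window
        if content >= max_content:
            continue
        if not has_exposed_run(exposed[start:end], min_exposed_run):
            continue
        if regions and start < regions[-1][1]:
            # Overlaps the previous hit: extend it instead of adding a new one.
            regions[-1] = (regions[-1][0], end)
        else:
            regions.append((start, end))
    return regions
```

Recomputing the sum for every window keeps the sketch short; a running count would avoid the repeated work on long sequences.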
| INPUT, OUTPUT AND ADVANCED OPTIONS |
|---|
Input
The input to NORSp is a protein sequence; proteins shorter than 70 residues are returned unprocessed. Currently, the valid input formats are a sequence in one-letter residue code or FASTA format. The sequence can be entered into the sequence text box or uploaded from the user's local disk.
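A client preparing a submission could pre-check its input along the following lines. This Python sketch is only an illustration of the two accepted formats and the 70-residue minimum: the function names, the set of accepted residue letters and the restriction to a single FASTA record are assumptions, and the server's own validation may differ.

```python
# Hypothetical alphabet: the 20 standard residues plus common ambiguity codes.
VALID_RESIDUES = set("ACDEFGHIKLMNPQRSTVWYXBZU")


def parse_sequence(text):
    """Parse a raw one-letter sequence or a single FASTA record.

    Returns (header, sequence); header is None for raw input.
    """
    lines = [line.strip() for line in text.strip().splitlines()]
    header = None
    if lines and lines[0].startswith(">"):
        header = lines[0][1:].strip()
        lines = lines[1:]
    # Drop whitespace inside lines and normalise to upper case.
    sequence = "".join("".join(line.split()) for line in lines).upper()
    unexpected = set(sequence) - VALID_RESIDUES
    if unexpected:
        raise ValueError("unexpected characters: " + "".join(sorted(unexpected)))
    return header, sequence


def is_long_enough(sequence, min_length=70):
    """Proteins shorter than the minimum are returned unprocessed."""
    return len(sequence) >= min_length
```

A second `>` line in the input raises an error here, because the sketch deliberately handles one record at a time.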
Output
Users have the option of receiving succinct output, which only shows the position of the NORS region in the context of the submitted sequence, or verbose output, which includes the intermediate data used by NORSp: secondary structure, solvent accessibility, transmembrane helix and coiled-coil predictions. By default, the results are in plain text (ASCII) format. HTML-formatted results, which can be displayed in any web browser, can also be requested. Due to concerns about file size and user mailbox overflow, the results are normally made available for download from our website and only URLs are sent to users by email, unless users request that the full results be sent directly.
Recommendation and advanced options
We chose the particular threshold used to define NORS regions to minimise the false positive rate, as determined by manually inspecting PDB proteins (3). This conservative choice implies that the vast majority of NORS regions that we detect are likely to constitute structurally irregular, floppy, loopy or natively disordered regions. However, we presumably miss many such regions in our predictions. Users who are aware of this may be interested in changing the threshold to see which regions are good candidates for irregular regions although not detected by our default. We provide three options for advanced users: the size of the sequence window for calculating secondary structure content (default=70), the maximum secondary structure content (default=12%) and the minimum length of consecutive exposed residues (default=10).
| ACKNOWLEDGEMENTS |
|---|
We are grateful to Hepan Tan (Columbia) for his help in developing the tool. This work was supported by grants 1-P50-GM62413-01 and RO1-GM63029-01 from the National Institutes of Health (NIH). Last, but not least, thanks go to all those who deposit their experimental data in public databases and to those who maintain these databases.
| REFERENCES |
|---|
1. Hendrickson,W.A. (1991) Determination of macromolecular structures from anomalous diffraction of synchrotron radiation. Science, 254, 51–58.
2. Wright,P.E. and Dyson,H.J. (1999) Intrinsically unstructured proteins: re-assessing the protein structure-function paradigm. J. Mol. Biol., 293, 321–331.
3. Liu,J., Tan,H. and Rost,B. (2002) Loopy proteins appear conserved in evolution. J. Mol. Biol., 322, 53–64.
4. Dunker,A.K. and Obradovic,Z. (2001) The protein trinity-linking function and disorder. Nat. Biotechnol., 19, 805–806.
5. Uversky,V.N., Gillespie,J.R. and Fink,A.L. (2000) Why are natively unfolded proteins unstructured under physiologic conditions? Proteins, 41, 415–427.
6. Zetina,C.R. (2001) A conserved helix-unfolding motif in the naturally unfolded proteins. Proteins, 44, 479–483.
7. Namba,K. (2001) Roles of partly unfolded conformations in macromolecular self-assembly. Genes Cells, 6, 1–12.
8. Dunker,A.K., Garner,E., Guilliot,S., Romero,P., Albrecht,K., Hart,J., Obradovic,Z., Kissinger,C. and Villafranca,J.E. (1998) Protein disorder and the evolution of molecular recognition: theory, predictions and observations. Pac. Symp. Biocomput., 473–484.
9. Wootton,J.C. and Federhen,S. (1996) Analysis of compositionally biased regions in sequence databases. Methods Enzymol., 266, 554–571.
10. Dunker,A.K., Lawson,J.D., Brown,C.J., Williams,R.M., Romero,P., Oh,J.S., Oldfield,C.J., Campen,A.M., Ratliff,C.M., Hipps,K.W. et al. (2001) Intrinsically disordered protein. J. Mol. Graph. Model., 19, 26–59.
11. Romero,P., Obradovic,Z., Li,X., Garner,E.C., Brown,C.J. and Dunker,A.K. (2001) Sequence complexity of disordered protein. Proteins, 42, 38–48.
12. Carter,P., Liu,J. and Rost,B. (2003) PEP: predictions for entire proteomes. Nucleic Acids Res., 31, 410–413.
13. Etzold,T. and Argos,P. (1993) SRS - an indexing and retrieval tool for flat file data libraries. Comput. Appl. Biosci., 9, 49–57.
14. Liu,J. and Rost,B. (2002) Target space for structural genomics revisited. Bioinformatics, 18, 922–933.
15. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
16. Rost,B. (2001) Review: protein secondary structure prediction continues to rise. J. Struct. Biol., 134, 204–218.
17. Rost,B., Casadio,R. and Fariselli,P. (1996) Topology prediction for helical transmembrane proteins at 86% accuracy. Protein Sci., 5, 1704–1718.
18. Lupas,A. (1996) Prediction and analysis of coiled-coil structures. Methods Enzymol., 266, 513–525.
19. Rost,B. and Liu,J. (2003) The PredictProtein server. Nucleic Acids Res., 31, 3300–3304.