Nucleic Acids Research, 2003, Vol. 31, No. 13 3789-3791
© 2003 Oxford University Press
UniqueProt: creating representative protein sequence sets
1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
2 Institute of Physical Biochemistry, University Witten/Herdecke, Stockumer Strasse 10, 58448 Witten, Germany
3 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St Nicholas Avenue, New York, NY 10032, USA
4 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
*To whom correspondence should be addressed. Tel: +1 2123054018, Fax: +1 2123057932; Email: mika{at}cubic.bioc.columbia.edu
UniqueProt is a practical, easy-to-use web service for creating representative, unbiased data sets of protein sequences. It finds the largest possible representative sets with a simple greedy algorithm that uses the HSSP-value to establish sequence similarity. UniqueProt is not a true clustering program: its representatives do not lie at the centres of well-defined clusters, because the definition of such clusters is problem-specific. Overall, UniqueProt is a reasonably fast solution for removing bias from data sets. The service is accessible at http://cubic.bioc.columbia.edu/services/uniqueprot; a command-line version for Linux can be downloaded from the same web site.
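The greedy selection described above can be illustrated with a minimal Python sketch. This is not the UniqueProt implementation, and the abstract does not specify which greedy strategy the service uses. The sketch shows one common heuristic: repeatedly keep the sequence with the fewest remaining similar neighbours, then discard those neighbours. The function names, the toy pairwise scores, and the rule that a pair counts as similar when its HSSP-value exceeds a threshold of 0 are all assumptions made for illustration.

```python
from typing import Callable, Dict, FrozenSet, Hashable, Iterable, List, Set


def greedy_representatives(
    ids: Iterable[Hashable],
    similar: Callable[[Hashable, Hashable], bool],
) -> List[Hashable]:
    """Pick a set of items in which no two are similar (greedy heuristic).

    Hypothetical helper for illustration; not the UniqueProt code.
    """
    ordered = list(ids)

    # Build an undirected "is similar to" graph over all pairs.
    neighbours: Dict[Hashable, Set[Hashable]] = {i: set() for i in ordered}
    for idx, a in enumerate(ordered):
        for b in ordered[idx + 1:]:
            if similar(a, b):
                neighbours[a].add(b)
                neighbours[b].add(a)

    remaining: Set[Hashable] = set(ordered)
    chosen: List[Hashable] = []
    while remaining:
        # Keep the item with the fewest similar items still in play;
        # ties are broken by input order.
        best = min(
            (i for i in ordered if i in remaining),
            key=lambda i: len(neighbours[i] & remaining),
        )
        chosen.append(best)
        # Discard the kept item and everything similar to it.
        remaining -= neighbours[best] | {best}
    return chosen


# Toy pairwise HSSP-values (invented numbers, for illustration only).
pair_scores: Dict[FrozenSet[str], float] = {
    frozenset(("A", "B")): 12.0,
    frozenset(("B", "C")): 5.0,
    frozenset(("C", "D")): -3.0,
}


def similar(a: str, b: str, threshold: float = 0.0) -> bool:
    # Assumption: a pair is "similar" when its HSSP-value exceeds the
    # threshold; pairs without a score are treated as dissimilar.
    return pair_scores.get(frozenset((a, b)), float("-inf")) > threshold


reps = greedy_representatives(["A", "B", "C", "D"], similar)
print(reps)  # ['D', 'A', 'C']
```

In this toy input, B is similar to both A and C, so dropping B lets three of the four sequences remain. Preferring the sequence with the fewest neighbours tends to yield larger sets than picking in arbitrary order, but a greedy heuristic does not guarantee the maximum in general.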