
Nucleic Acids Research, 2003, Vol. 31, No. 13 3789-3791
© 2003 Oxford University Press

UniqueProt: creating representative protein sequence sets

Sven Mika*,1,2 and Burkhard Rost1,3,4

1 CUBIC, Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA
2 Institute of Physical Biochemistry, University Witten/Herdecke, Stockumer Strasse 10, 58448 Witten, Germany
3 Columbia University Center for Computational Biology and Bioinformatics (C2B2), Russ Berrie Pavilion, 1150 St Nicholas Avenue, New York, NY 10032, USA
4 North East Structural Genomics Consortium (NESG), Department of Biochemistry and Molecular Biophysics, Columbia University, 650 West 168th Street BB217, New York, NY 10032, USA

*To whom correspondence should be addressed. Tel: +1 2123054018, Fax: +1 2123057932; Email: mika{at}cubic.bioc.columbia.edu

UniqueProt is a practical, easy-to-use web service for creating representative, unbiased data sets of protein sequences. It aims to find the largest possible representative set through a simple greedy algorithm that uses the HSSP-value to measure sequence similarity. UniqueProt is not a true clustering program: the selected ‘representatives’ are not the centres of well-defined clusters, because how such clusters should be defined depends on the problem at hand. Overall, UniqueProt offers a reasonably fast way to remove bias from data sets. The service is accessible at http://cubic.bioc.columbia.edu/services/uniqueprot; a command-line version for Linux can be downloaded from this web site.
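The greedy selection idea can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from this abstract: pairwise alignments are assumed to be computed elsewhere and supplied as (percentage identity, alignment length); the HSSP-value is assumed to be the percentage identity minus a length-dependent HSSP curve, in the form coded below; and a pair is treated as redundant when its HSSP-value exceeds a cutoff. The greedy rule shown, which repeatedly discards the sequence with the most redundant partners, is one common heuristic and is not necessarily UniqueProt's exact procedure. The function names and the `cutoff` default are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def hssp_threshold(length: int) -> float:
    """Length-dependent percentage-identity curve (assumed HSSP curve form)."""
    if length <= 11:
        return 100.0
    if length <= 450:
        return 480.0 * length ** (-0.32 * (1.0 + math.exp(-length / 1000.0)))
    return 19.5


def hssp_value(percent_identity: float, aligned_length: int) -> float:
    """Distance of a pair above (positive) or below (negative) the curve."""
    return percent_identity - hssp_threshold(aligned_length)


def greedy_representatives(
    ids: List[str],
    pairs: Dict[Tuple[str, str], Tuple[float, int]],
    cutoff: float = 0.0,  # hypothetical default, not UniqueProt's setting
) -> List[str]:
    """Return a subset of ids in which no pair has an HSSP-value above cutoff."""
    # Undirected redundancy graph: an edge joins two sequences that are too similar.
    neighbours = {i: set() for i in ids}
    for (a, b), (pide, lali) in pairs.items():
        if a != b and hssp_value(pide, lali) > cutoff:
            neighbours[a].add(b)
            neighbours[b].add(a)

    remaining = set(ids)
    while True:
        # Pick the remaining sequence with the most redundant partners;
        # max() keeps the first such sequence in input order on ties.
        worst = max(
            (i for i in ids if i in remaining),
            key=lambda i: len(neighbours[i]),
            default=None,
        )
        if worst is None or not neighbours[worst]:
            break  # no redundant pairs are left
        # Drop the sequence and delete all of its edges.
        for other in neighbours[worst]:
            neighbours[other].discard(worst)
        neighbours[worst].clear()
        remaining.discard(worst)

    return [i for i in ids if i in remaining]


if __name__ == "__main__":
    ids = ["A", "B", "C", "D"]
    pairs = {
        ("A", "B"): (90.0, 200),
        ("B", "C"): (85.0, 150),
        ("C", "D"): (15.0, 300),
    }
    print(greedy_representatives(ids, pairs))  # prints ['A', 'C', 'D']
```

In this toy input the A/B and B/C pairs lie well above the curve while C/D lies below it, so B, the sequence with the most redundant partners, is removed and A, C and D remain. Because the rule is greedy, it yields a large non-redundant set quickly but is not guaranteed to find the largest one.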






