Nucleic Acids Research Advance Access published online on October 26, 2006

Nucleic Acids Research, doi:10.1093/nar/gkl731
All versions of this article: 34/20/5966 (most recent), gkl731v1

Published by Oxford University Press 2006

Computational Biology

Retrieval accuracy, statistical significance and compositional similarity in protein sequence database searches

Yi-Kuo Yu, E. Michael Gertz, Richa Agarwala, Alejandro A. Schäffer and Stephen F. Altschul*

National Center for Biotechnology Information, National Library of Medicine, NIH, DHHS, Bethesda, MD 20894, USA

*To whom correspondence should be addressed. Tel: +301 435 7803; Fax: +301 480 2288; Email: altschul@ncbi.nlm.nih.gov

Received July 27, 2006. Revised September 15, 2006. Accepted September 21, 2006.

Protein sequence database search programs may be evaluated both for their retrieval accuracy—the ability to separate meaningful from chance similarities—and for the accuracy of their statistical assessments of reported alignments. However, methods for improving statistical accuracy can degrade retrieval accuracy by discarding compositional evidence of sequence relatedness. This evidence may be preserved by combining essentially independent measures of alignment and compositional similarity into a unified measure of sequence similarity. A version of the BLAST protein database search program, modified to employ this new measure, outperforms the baseline program in both retrieval and statistical accuracy on ASTRAL, a SCOP-based test set.
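
As an illustration of what combining two essentially independent measures into one can look like, the sketch below merges two P-values with the standard result for the product of two independent uniform variables, P = x(1 - ln x) with x = p1 * p2 (Fisher's method for two tests). This is an assumption made for illustration only: the abstract does not state which combination rule the modified BLAST uses, and the function and parameter names (combine_independent_pvalues, p_align, p_comp) are hypothetical.

```python
import math


def combine_independent_pvalues(p_align: float, p_comp: float) -> float:
    """Combine two independent P-values into one.

    Hypothetical helper, not taken from the paper. If both inputs are
    independent and uniform on (0, 1] under the null hypothesis, the
    probability that their product is at most x is x * (1 - ln x).
    """
    for p in (p_align, p_comp):
        if not 0.0 < p <= 1.0:
            raise ValueError("P-values must lie in the interval (0, 1]")
    x = p_align * p_comp
    # Tail probability of the product of two independent uniforms.
    return x * (1.0 - math.log(x))


if __name__ == "__main__":
    # Two individually modest P-values give a stronger combined P-value.
    print(combine_independent_pvalues(0.05, 0.05))
```

Under this rule, weak compositional evidence (a P-value near 1) raises the combined P-value above the alignment P-value alone, while strong compositional evidence lowers it, which is the qualitative behaviour a unified measure of this kind is meant to capture.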

