Published online 8 July 2005
A sequence sub-sampling algorithm increases the power to detect distant homologues
Department of Clinical Pharmacology, Bioinformatics Group, Royal College of Surgeons in Ireland, 123 St Stephen's Green, Dublin 2, Ireland
*To whom correspondence should be addressed. Tel: +353 1 4022790; Fax: +353 1 4022453; Email: kjohnston@rcsi.ie
Received January 12, 2005. Revised June 14, 2005. Accepted June 14, 2005.
Searching databases for distant homologues using alignments instead of individual sequences increases the power of detection. However, most methods assume that protein evolution proceeds in a regular fashion, with the inferred tree of sequences providing a good estimate of the evolutionary process. We investigated combining the HMMER search results obtained from random subsets of the parent alignment, each containing three sequences (the Rand-shuffle algorithm), using the SCOP structural classification to define true similarities. At a false-positive rate of 5%, the Rand-shuffle algorithm was 37.5% more sensitive than HMMER alone when easily identified similarities (those detectable by BLAST) were excluded from consideration. An extension of the Rand-shuffle algorithm (Ali-shuffle) weights subset selection towards more informative sequence subsets. This approach improved performance over both HMMER alone and PSI-BLAST, particularly at higher false-positive rates. The improved performance of these sequence sub-sampling methods may reflect lower sensitivity to alignment error and to irregular evolutionary patterns. The Ali-shuffle and Rand-shuffle sequence homology search programs are available from the authors on request.
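The sub-sampling idea behind Rand-shuffle can be illustrated with a short Python sketch. This is not the authors' program, only a minimal sketch under stated assumptions: the abstract does not say how per-subset results are merged, so keeping each database hit's lowest E-value is an assumption, the default of 50 subsets is arbitrary, and `toy_search` is a hypothetical stand-in for a HMMER profile search. The names `sample_subsets`, `rand_shuffle_search` and `toy_search` are illustrative only.

```python
import random
from typing import Callable, Dict, List, Mapping, Sequence

# Hypothetical search interface: takes a sub-alignment (list of aligned rows)
# and a database (id -> sequence), returns {hit_id: E-value}; lower is better.
SearchFn = Callable[[List[str], Mapping[str, str]], Dict[str, float]]


def sample_subsets(alignment: Sequence[str], n_subsets: int,
                   subset_size: int = 3, seed: int = 0) -> List[List[int]]:
    """Draw n_subsets random sets of row indices, each without repeated rows."""
    if subset_size > len(alignment):
        raise ValueError("subset_size exceeds the number of aligned sequences")
    rng = random.Random(seed)
    return [sorted(rng.sample(range(len(alignment)), subset_size))
            for _ in range(n_subsets)]


def rand_shuffle_search(alignment: Sequence[str], database: Mapping[str, str],
                        search: SearchFn, n_subsets: int = 50,
                        subset_size: int = 3, seed: int = 0) -> Dict[str, float]:
    """Search with each random sub-alignment and merge the hits.

    Assumption: hits are merged by keeping each database sequence's best
    (lowest) E-value across all subsets; the source does not state the rule.
    """
    best: Dict[str, float] = {}
    for rows in sample_subsets(alignment, n_subsets, subset_size, seed):
        sub_alignment = [alignment[i] for i in rows]
        for hit_id, evalue in search(sub_alignment, database).items():
            if evalue < best.get(hit_id, float("inf")):
                best[hit_id] = evalue
    return best


def toy_search(sub_alignment: List[str],
               database: Mapping[str, str]) -> Dict[str, float]:
    """Hypothetical stand-in for a profile-HMM search (not HMMER).

    Scores each database sequence by its best fractional identity to any row
    of the sub-alignment and reports 1 - identity as a pseudo E-value.
    """
    results: Dict[str, float] = {}
    for seq_id, seq in database.items():
        identity = max(sum(a == b for a, b in zip(row, seq)) / len(seq)
                       for row in sub_alignment)
        results[seq_id] = 1.0 - identity
    return results


if __name__ == "__main__":
    alignment = ["ACDE", "ACDF", "GCDE", "ACHE", "WWWW"]
    database = {"q1": "ACDE", "q2": "YYYY"}
    hits = rand_shuffle_search(alignment, database, toy_search, n_subsets=20)
    for hit_id, evalue in sorted(hits.items(), key=lambda kv: kv[1]):
        print(hit_id, evalue)
```

In a real pipeline, `toy_search` would be replaced by a function that builds a profile HMM from each three-sequence sub-alignment (for example with HMMER's `hmmbuild`), searches the database with it (for example with `hmmsearch`) and parses the reported E-values. Taking the minimum over many subsets favours sensitivity; a practical version would probably need some correction for the number of subsets searched, which this sketch does not attempt.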