Skip Navigation

This Article
Right arrow Full Text Freely available
Right arrow Print PDF (140K) Freely available
Right arrow Alert me when this article is cited
Right arrow Alert me if a correction is posted
Services
Right arrow Email this article to a friend
Right arrow Similar articles in this journal
Right arrow Similar articles in ISI Web of Science
Right arrow Similar articles in PubMed
Right arrow Alert me to new issues of the journal
Right arrow Add to My Personal Archive
Right arrow Download to citation manager
Right arrow Search for citing articles in:
ISI Web of Science (10)
Right arrowRequest Permissions
Right arrow Commercial Re-use Guidelines
for Open Access NAR Content
Google Scholar
Right arrow Articles by Edgar, R. C.
Right arrow Search for Related Content
PubMed
Right arrow PubMed Citation
Right arrow Articles by Edgar, R. C.
Related Collections
Right arrow Computational methods
Social Bookmarking
 Add to CiteULike   Add to Connotea   Add to Del.icio.us  
What's this?

Published online 16 January 2004

Nucleic Acids Research, 2004, Vol. 32, No. 1 380-385
© 2004 Oxford University Press

Local homology recognition and distance measures in linear time using compressed amino acid alphabets

Robert C. Edgar*

195 Roque Moraes Drive, Mill Valley, CA 94941, USA

*Email: bob{at}drive5.com

Methods for discovery of local similarities and estimation of evolutionary distance by identifying k-mers (contiguous subsequences of length k) common to two sequences are described. Given unaligned sequences of length L, these methods have O(L) time complexity. The ability of compressed amino acid alphabets to extend these techniques to distantly related proteins was investigated. The performance of these algorithms was evaluated for different alphabets and choices of k using a test set of 1848 pairs of structurally alignable sequences selected from the FSSP database. Distance measures derived from k-mer counting were found to correlate well with percentage identity derived from sequence alignments. Compressed alphabets were seen to improve performance in local similarity discovery, but no evidence was found of improvements when applied to distance estimates. The performance of our local similarity discovery method was compared with the fast Fourier transform (FFT) used in MAFFT, which has O(L log L) time complexity. The method for achieving comparable coverage to FFT is revealed here, and is more than an order of magnitude faster. We suggest using k-mer distance for fast, approximate phylogenetic tree construction, and show that a speed improvement of more than three orders of magnitude can be achieved relative to standard distance methods, which require alignments.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
BioinformaticsHome page
S. A. Shiryev, J. S. Papadopoulos, A. A. Schaffer, and R. Agarwala
Improved BLAST searches using longer words for protein seeding
Bioinformatics, November 1, 2007; 23(21): 2949 - 2951.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
A. E. Greenstein, N. Echols, T. N. Lombana, D. S. King, and T. Alber
Allosteric Activation by Dimerization of the PknD Receptor Ser/Thr Protein Kinase from Mycobacterium tuberculosis
J. Biol. Chem., April 13, 2007; 282(15): 11427 - 11435.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
R. C. Edgar
MUSCLE: multiple sequence alignment with high accuracy and high throughput
Nucleic Acids Res., March 19, 2004; 32(5): 1792 - 1797.
[Abstract] [Full Text] [PDF]



Disclaimer:
Please note that abstracts for content published before 1996 were created through digital scanning and may therefore not exactly replicate the text of the original print issues. All efforts have been made to ensure accuracy, but the Publisher will not be held responsible for any remaining inaccuracies. If you require any further clarification, please contact our Customer Services Department.