Published online 1 August 2005
miBLAST: scalable evaluation of a batch of nucleotide sequence queries with BLAST
Department of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI 48109, USA; ¹Michigan Center for Biological Information, University of Michigan, 3600 Green Court, Ann Arbor, MI 48109, USA
*To whom correspondence should be addressed. Tel: +1 734 647 1806; Fax: +1 734 763 8094; Email: jignesh{at}eecs.umich.edu
Received March 18, 2005. Revised May 26, 2005. Accepted July 11, 2005.
A common task in many modern bioinformatics applications is to match a set of nucleotide query sequences against a large sequence dataset. Existing tools, such as BLAST, are designed to evaluate a single query at a time and can be unacceptably slow when the query set contains many sequences. In this paper, we present a new algorithm, called miBLAST, that evaluates such batch workloads efficiently. At its core, miBLAST employs q-gram filtering and an index join to detect similarity between the query sequences and the database sequences efficiently. This set-oriented technique, which indexes both the query set and the database set, yields substantial performance improvements over existing methods. Our results show that miBLAST is significantly faster than BLAST in many cases. For example, miBLAST aligned 247 965 oligonucleotide sequences in the Affymetrix probe set against the Human UniGene in 1.26 days, compared with 27.27 days with BLAST (an improvement by a factor of about 22). The relative performance of miBLAST increases with larger word sizes but decreases with longer queries. miBLAST employs the familiar BLAST statistical model and output format, guaranteeing the same accuracy as BLAST and facilitating a seamless transition for existing BLAST users.
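The set-oriented idea in the abstract, indexing both the query set and the database set by q-grams and then joining the two indexes, can be illustrated with a minimal Python sketch. This is not the miBLAST implementation: the function names `qgram_index` and `candidate_pairs`, the in-memory dictionaries, and the "share at least one q-gram" filter are assumptions chosen for illustration, and the abstract does not specify the actual index structures or filtering criteria.

```python
from collections import defaultdict


def qgram_index(seqs, q):
    """Map each q-gram (length-q substring) to the ids of sequences containing it.

    `seqs` is a dict of {sequence_id: sequence_string} (hypothetical layout).
    """
    index = defaultdict(set)
    for seq_id, seq in seqs.items():
        # Sequences shorter than q contribute no q-grams (empty range).
        for i in range(len(seq) - q + 1):
            index[seq[i:i + q]].add(seq_id)
    return index


def candidate_pairs(queries, database, q):
    """Join the query index with the database index on shared q-grams.

    Returns the set of (query_id, database_id) pairs that share at least
    one q-gram. Only these pairs would need a full alignment step; that
    threshold is an assumption of this sketch.
    """
    query_index = qgram_index(queries, q)
    db_index = qgram_index(database, q)
    pairs = set()
    for gram, query_ids in query_index.items():
        db_ids = db_index.get(gram)  # .get avoids creating empty entries
        if not db_ids:
            continue
        for query_id in query_ids:
            for db_id in db_ids:
                pairs.add((query_id, db_id))
    return pairs
```

Calling `candidate_pairs({"q1": "ACGTAC", "q2": "TTTTTT"}, {"d1": "GGACGTGG", "d2": "CCCCCCCC"}, 4)` keeps only the pair whose sequences share a 4-gram, so the batch as a whole is filtered in one pass over the two indexes rather than one database scan per query.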