Published online 6 September 2005
The Gumbel pre-factor k for gapped local alignment can be estimated from simulations of global alignment
National Center for Biotechnology Information, National Library of Medicine Bethesda MD 20894, USA
*To whom correspondence should be addressed. Tel: +301 402 9310; Fax: +301 480 2288; Email: spouge{at}ncbi.nlm.nih.gov
Received May 12, 2005. Revised July 13, 2005. Accepted August 12, 2005.
The optimal gapped local alignment score of two random sequences follows a Gumbel distribution. The Gumbel distribution has two parameters, the scale parameter λ and the pre-factor k. Presently, the basic local alignment search tool (BLAST) programs (BLASTP (BLAST for proteins), PSI-BLAST, etc.) all use time-consuming computer simulations to determine the Gumbel parameters. Because the simulations must be done offline, BLAST users are restricted in their choice of alignment scoring schemes. The ultimate aim of this paper is to speed the simulations, to determine the Gumbel parameters online, and to remove the corresponding restrictions on BLAST users. Simulations for the scale parameter λ can be as much as five times faster if they use global instead of local alignment [R. Bundschuh (2002) J. Comput. Biol., 9, 243–260]. Unfortunately, the acceleration does not extend to determining the Gumbel pre-factor k, because k has no known mathematical relationship to global alignment. This paper relates k to global alignment and exploits the relationship to show that, for the BLASTP defaults, 10 000 realizations with sequences of average length 140 suffice to estimate both Gumbel parameters λ and k within the required errors (λ, 0.8%; k, 10%). For the BLASTP defaults, simulations for both Gumbel parameters now take less than 30 s on a 2.8 GHz Pentium 4 processor.
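The following is a rough illustration of how the two Gumbel parameters enter a significance calculation. The sketch evaluates the standard Gumbel tail approximation P(S ≥ x) ≈ 1 − exp(−k·m·n·e^(−λx)) for the optimal local alignment score S of random sequences of lengths m and n. It then perturbs λ and k by the error tolerances quoted above to see how the resulting E-value moves. This is a minimal sketch, not the authors' method. The function names are hypothetical, the finite-length edge corrections that BLAST applies are ignored, and the values λ = 0.267, k = 0.041 are assumed illustrative figures (close to those commonly quoted for BLOSUM62 with gap costs 11/1), not results from this paper.

```python
import math


def gumbel_tail(score: float, lam: float, k: float, m: int, n: int) -> tuple[float, float]:
    """Return (E-value, P-value) for an optimal local alignment score.

    Uses the Gumbel approximation E = k*m*n*exp(-lam*score) and
    P(S >= score) ~= 1 - exp(-E). Hypothetical helper; no edge correction.
    """
    e_value = k * m * n * math.exp(-lam * score)
    p_value = -math.expm1(-e_value)  # 1 - exp(-E), numerically accurate for small E
    return e_value, p_value


def relative_evalue_error(score, lam, k, m, n, rel_err_lam, rel_err_k):
    """Relative change in the E-value when lam or k alone is off by a relative amount."""
    base, _ = gumbel_tail(score, lam, k, m, n)
    with_lam_err, _ = gumbel_tail(score, lam * (1 + rel_err_lam), k, m, n)
    with_k_err, _ = gumbel_tail(score, lam, k * (1 + rel_err_k), m, n)
    return with_lam_err / base - 1, with_k_err / base - 1


if __name__ == "__main__":
    LAM, K = 0.267, 0.041  # assumed illustrative values, not taken from the paper
    m = n = 300            # hypothetical sequence lengths
    e, p = gumbel_tail(60, LAM, K, m, n)
    print(f"E = {e:.3g}, P = {p:.3g}")
    d_lam, d_k = relative_evalue_error(60, LAM, K, m, n, 0.008, 0.10)
    print(f"0.8% error in lambda -> {d_lam:+.1%} in E; 10% error in k -> {d_k:+.1%} in E")
```

Because k multiplies E directly, a 10% error in k shifts E by exactly 10%. An error in λ, by contrast, is amplified by the factor λx. At a score of 60 under these assumed values, the 0.8% λ error shifts E by about 12%. This may help explain why the tolerance on λ is so much tighter than on k, though the paper's own justification for those tolerances is not reproduced here.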
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.