Nucleic Acids Research, 2002, Vol. 30, No. 5 1268-1277
© 2002 Oxford University Press
BALSA: Bayesian algorithm for local sequence alignment
¹The Wadsworth Center for Laboratories and Research, New York State Department of Health, Albany, NY 12201, USA, ²Department of Decision Sciences and Engineering Systems, Rensselaer Polytechnic Institute, Troy, NY 12180, USA, ³Department of Statistics, Harvard University, Cambridge, MA 02138, USA and ⁴Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA
The Smith–Waterman algorithm yields a single alignment, which, albeit optimal, can be strongly affected by the choice of the scoring matrix and the gap penalties. Additionally, the scores obtained depend on the lengths of the aligned sequences, requiring a post-analysis conversion. To overcome some of these shortcomings, we developed a Bayesian algorithm for local sequence alignment (BALSA) that takes into account the uncertainty associated with all unknown variables by incorporating in its forward sums a series of scoring matrices, gap parameters and all possible alignments. The algorithm can return both the joint and the marginal optimal alignments, samples of alignments drawn from the posterior distribution, and the posterior probabilities of gap penalties and scoring matrices. Furthermore, it automatically adjusts for variations in sequence lengths. BALSA was compared with SSEARCH, to date the best-performing dynamic programming algorithm in the detection of structural neighbors. Using the SCOP databases PDB40D-B and PDB90D-B, BALSA detected 19.8 and 41.3% of remote homologs, respectively, whereas SSEARCH detected 18.4 and 38%, in each case at an error rate of 1% errors per query over the databases.
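The idea of a forward sum over all local alignments, combined over several parameter settings, can be illustrated with a minimal Python sketch. It is not the BALSA implementation. It assumes a simple match/mismatch score in place of a real scoring matrix, a linear gap penalty, a uniform prior over parameter settings, and it treats the forward sum as an unnormalized marginal likelihood. The names `forward_sum`, `parameter_posterior`, `lam` and the example settings are hypothetical.

```python
import math


def forward_sum(x, y, match, mismatch, gap, lam=1.0):
    """Sum exp(lam * score) over all local alignment paths.

    A path starts and ends with an aligned pair and moves through the
    alignment lattice by diagonal (aligned pair) or horizontal/vertical
    (gap) steps. Assumption of this sketch: linear gap penalty `gap` > 0.
    """
    n, m = len(x), len(y)
    g = math.exp(-lam * gap)  # weight of a single gap position
    # P[i][j]: total weight of paths that end at cell (i, j) by any step.
    P = [[0.0] * (m + 1) for _ in range(n + 1)]
    total = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if x[i - 1] == y[j - 1] else mismatch
            w = math.exp(lam * s)
            # Paths ending with x[i-1] aligned to y[j-1]: either a new
            # local alignment starts here (the 1.0) or an earlier path
            # is extended diagonally.
            m_ij = w * (1.0 + P[i - 1][j - 1])
            total += m_ij  # a local alignment may end at any aligned pair
            P[i][j] = m_ij + g * (P[i - 1][j] + P[i][j - 1])
    return total


def parameter_posterior(x, y, settings, lam=1.0):
    """Posterior over parameter settings under a uniform prior (assumed)."""
    weights = {
        name: forward_sum(x, y, *params, lam=lam)
        for name, params in settings.items()
    }
    z = sum(weights.values())
    return {name: w / z for name, w in weights.items()}


if __name__ == "__main__":
    # Hypothetical settings: (match, mismatch, gap penalty).
    settings = {"strict": (2, -3, 4), "lenient": (1, -1, 1)}
    print(parameter_posterior("HEAGAWGHEE", "PAWHEAE", settings))
```

Because every path contributes to the sum, the result reflects all alignments rather than only the optimal one. For long sequences the plain sums above would overflow, so a practical version would work in log space or rescale each row.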
* To whom correspondence should be addressed at: Wadsworth Center, New York State Department of Health, Empire State Plaza, PO Box 509, Albany, NY 12201-0509, USA. Tel: +1 518 473 3853; Fax: +1 518 473 2900; Email: lawrence{at}wadsworth.org


