Published online 16 December 2005
Automatic assessment of alignment quality
Center for Genomics and Bioinformatics, Karolinska Institutet S-17177 Stockholm, Sweden
*To whom correspondence should be addressed. Tel: +46 8 5248 6372; Fax: +46 8 337983; Email: timo.lassmann{at}cgb.ki.se
Received September 8, 2005. Revised October 21, 2005. Accepted November 30, 2005.
Multiple sequence alignments play a central role in the annotation of novel genomes. Given the biological and computational complexity of this task, the automatic generation of high-quality alignments remains challenging. Since multiple alignments are usually employed at the very start of data analysis pipelines, it is crucial to ensure high alignment quality. We describe a simple, yet elegant, solution to assess the biological accuracy of alignments automatically. Our approach is based on the comparison of several alignments of the same sequences. We introduce two functions to compare alignments: the average overlap score and the multiple overlap score. The former identifies difficult alignment cases by expressing the similarity among several alignments, while the latter estimates the biological correctness of individual alignments. We implemented both functions in the MUMSA program and demonstrate the overall robustness and accuracy of both functions on three large benchmark sets.
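The abstract names the two scores but does not define them, so the following is only a minimal sketch of how such a comparison could work, not the MUMSA implementation. It assumes that each alignment is a list of equal-length gapped strings using `-` for gaps, that two alignments are compared through the sets of residue pairs they place in the same column, and that the scores are normalised as shown. The function names and both formulas are assumptions made for illustration.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    Each residue is identified as (sequence index, ungapped position),
    so pairs are comparable across different alignments of the same sequences.
    """
    positions = [0] * len(alignment)  # next ungapped index per sequence
    pairs = set()
    for col in range(len(alignment[0])):
        residues = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != "-":
                residues.append((seq_idx, positions[seq_idx]))
                positions[seq_idx] += 1
        pairs.update(combinations(residues, 2))
    return pairs


def overlap(pairs_a, pairs_b):
    """Assumed pairwise overlap: shared pairs over the mean number of pairs."""
    total = len(pairs_a) + len(pairs_b)
    return 2 * len(pairs_a & pairs_b) / total if total else 0.0


def average_overlap_score(alignments):
    """Assumed form: mean pairwise overlap over all pairs of alignments."""
    pair_sets = [aligned_pairs(a) for a in alignments]
    scores = [overlap(x, y) for x, y in combinations(pair_sets, 2)]
    return sum(scores) / len(scores) if scores else 0.0


def multiple_overlap_score(target, others):
    """Assumed form: average fraction of the other alignments that
    support each residue pair of the target alignment."""
    target_pairs = aligned_pairs(target)
    other_sets = [aligned_pairs(a) for a in others]
    if not target_pairs or not other_sets:
        return 0.0
    support = sum(sum(p in s for s in other_sets) for p in target_pairs)
    return support / (len(target_pairs) * len(other_sets))


# Three hypothetical alignments of the sequences ACGT and AGT.
aln1 = ["ACGT", "A-GT"]
aln2 = ["ACGT", "AG-T"]
aln3 = ["ACGT", "A-GT"]

difficulty = average_overlap_score([aln1, aln2, aln3])
support_for_aln1 = multiple_overlap_score(aln1, [aln2, aln3])
support_for_aln2 = multiple_overlap_score(aln2, [aln1, aln3])
```

Under these assumptions, a low average overlap flags a set of sequences on which the alignments disagree, while the per-alignment score ranks individual alignments by how well the others support them. In this toy example `aln1` receives more support than `aln2` because it agrees with `aln3`.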