Nucleic Acids Research, Vol 27, Issue 13 2682-2690, Copyright © 1999 by Oxford University Press
JD Thompson, F Plewniak and O Poch
In recent years, improvements to existing programs and the introduction of
new iterative algorithms have changed the state of the art in protein
sequence alignment. This paper presents the first systematic study of the
most commonly used alignment programs using BAliBASE benchmark alignments
as test cases. Even below the 'twilight zone' at 10-20% residue identity,
the best programs were capable of correctly aligning on average 47% of the
residues. We show that iterative algorithms often offer improved alignment
accuracy, though at the expense of computation time. A notable exception was
the effect of introducing a single divergent sequence into a set of closely
related sequences, causing the iteration to diverge away from the best
alignment. Global alignment programs generally performed better than local
methods, except in the presence of large N/C-terminal extensions and
internal insertions. In these cases, a local algorithm was more successful
in identifying the most conserved motifs. This study enables us to propose
appropriate alignment strategies, depending on the nature of a particular
set of sequences. The employment of more than one program based on
different alignment techniques should significantly improve the quality of
automatic protein sequence alignment methods. The results also indicate
guidelines for improvement of alignment algorithms.
A comprehensive comparison of multiple sequence alignment programs
Laboratoire de Biologie Structurale, Institut de Genetique et de Biologie Moleculaire et Cellulaire, (CNRS/INSERM/ULP), BP 163, 67404 Illkirch Cedex, France.
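The abstract reports accuracy as the share of residues "correctly aligned" relative to the BAliBASE reference alignments, but does not say how that share is computed. One common way to quantify it is a sum-of-pairs score: the fraction of residue pairs aligned in the reference that are also aligned in the test alignment. The sketch below is a minimal illustration under that assumption; the function names `aligned_pairs` and `sum_of_pairs_score` are hypothetical and are not taken from the paper or from any BAliBASE tool.

```python
from itertools import combinations


def aligned_pairs(alignment, gap="-"):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings, one per sequence.
    Each residue is identified by (sequence index, ungapped position).
    """
    counters = [0] * len(alignment)  # next ungapped position per sequence
    pairs = set()
    for column in zip(*alignment):
        present = []
        for seq_idx, symbol in enumerate(column):
            if symbol != gap:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # Every pair of residues in the same column counts as aligned.
        pairs.update(combinations(present, 2))
    return pairs


def sum_of_pairs_score(test, reference):
    """Fraction of reference residue pairs reproduced by the test alignment.

    Assumes both alignments contain the same sequences in the same order.
    """
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Toy example (made-up sequences, not BAliBASE data).
reference = ["AC-D", "ACED"]
test = ["ACD-", "ACED"]
score = sum_of_pairs_score(test, reference)
```

Here the reference aligns three residue pairs and the test alignment reproduces two of them, so `score` is 2/3. A stricter alternative is to count only columns that are reproduced in full, which penalizes a single misplaced sequence more heavily.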