Published online 20 January 2005
MAFFT version 5: improvement in accuracy of multiple sequence alignment
1 Bioinformatics Center, Institute for Chemical Research, Kyoto University, Uji, Kyoto 611-0011, Japan
2 Biohistory Research Hall, Takatsuki, Osaka 569-1125, Japan
3 Department of Electrical Engineering and Bioscience, Science and Engineering, Waseda University, Tokyo 169-8555, Japan
4 Department of Biophysics, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan
*To whom correspondence should be addressed. Tel: +81 774 38 3119; Fax: +81 774 38 3059; Email: kkatoh{at}kuicr.kyoto-u.ac.jp
Received October 14, 2004. Revised November 16, 2004. Accepted December 29, 2004.
The accuracy of the multiple sequence alignment program MAFFT has been improved. The new version (5.3) of MAFFT offers new iterative refinement options, H-INS-i, F-INS-i and G-INS-i, in which pairwise alignment information is incorporated into the objective function. These new options showed higher accuracy than currently available methods, including TCoffee version 2 and CLUSTAL W, in benchmark tests consisting of alignments of >50 sequences. Like the previously available options, the new options can handle hundreds of sequences on a standard desktop computer. We also examined the effect of the number of homologues included in an alignment. For a multiple alignment consisting of ~8 sequences with low similarity, accuracy improved by 2–10 percentage points when the sequences were aligned together with dozens of their close homologues (E-value < 10^-5 to 10^-20) collected from a database. Such improvement was observed for most methods but was remarkably large for the new options of MAFFT proposed here. We therefore made a Ruby script, mafftE.rb, which aligns the input sequences together with their close homologues collected from SwissProt using NCBI-BLAST.
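The homologue-collection step described in the abstract can be illustrated with a minimal sketch. This is not the mafftE.rb script, whose source is not shown here; it is a Python illustration under stated assumptions. The function names, the `(subject_id, evalue)` hit format, the default cutoff of 1e-10 (one value inside the threshold range quoted above) and the cap of 50 homologues (standing in for "dozens") are all hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def select_close_homologues(
    hits: Iterable[Tuple[str, float]],
    input_ids: Set[str],
    evalue_cutoff: float = 1e-10,   # assumed value, not taken from mafftE.rb
    max_homologues: int = 50,       # assumed cap standing in for "dozens"
) -> List[str]:
    """Pick database hits to align together with the input sequences.

    `hits` is a hypothetical list of (subject_id, evalue) pairs, e.g. parsed
    from a BLAST search of each input sequence against a protein database.
    """
    best: Dict[str, float] = {}
    for subject_id, evalue in hits:
        if subject_id in input_ids:
            continue  # already part of the input set
        if evalue >= evalue_cutoff:
            continue  # keep only hits strictly below the cutoff
        # A subject may be hit by several queries; keep its best E-value.
        if subject_id not in best or evalue < best[subject_id]:
            best[subject_id] = evalue
    # Most significant hits first; ties broken by ID for determinism.
    ranked = sorted(best, key=lambda sid: (best[sid], sid))
    return ranked[:max_homologues]


def to_fasta(records: Dict[str, str]) -> str:
    """Serialize {name: sequence} as FASTA text for an aligner's input."""
    return "".join(f">{name}\n{seq}\n" for name, seq in records.items())


if __name__ == "__main__":
    example_hits = [
        ("P1", 1e-30), ("P2", 1e-3), ("P3", 1e-12),
        ("P1", 1e-25), ("query1", 0.0),
    ]
    chosen = select_close_homologues(example_hits, {"query1"})
    print(chosen)
    print(to_fasta({"query1": "MKTAYIAK", "P1": "MKTAYLAK"}), end="")
```

The combined FASTA text (input sequences plus the selected homologues) would then be passed to the aligner, and the homologue rows could be dropped from the result if only the original sequences are of interest.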