Nucleic Acids Research, 2002, Vol. 30, No. 14 3059-3066
© 2002 Oxford University Press

MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform

Kazutaka Katoh, Kazuharu Misawa1, Kei-ichi Kuma and Takashi Miyata*

Department of Biophysics, Graduate School of Science, Kyoto University, Kyoto 606-8502, Japan and 1 Institute of Molecular Evolutionary Genetics, Pennsylvania State University, University Park, PA 16802, USA

*To whom correspondence should be addressed. Tel: +81 75 753 4220; Fax: +81 75 753 4223; Email: miyata@biophys.kyoto-u.ac.jp

Received April 8, 2002; Revised and Accepted May 24, 2002


    ABSTRACT
A multiple sequence alignment program, MAFFT, has been developed. The CPU time is drastically reduced as compared with existing methods. MAFFT includes two novel techniques. (i) Homologous regions are rapidly identified by the fast Fourier transform (FFT), in which an amino acid sequence is converted to a sequence composed of volume and polarity values of each amino acid residue. (ii) We propose a simplified scoring system that performs well for reducing CPU time and increasing the accuracy of alignments even for sequences having large insertions or extensions as well as distantly related sequences of similar length. Two different heuristics, the progressive method (FFT-NS-2) and the iterative refinement method (FFT-NS-i), are implemented in MAFFT. The performances of FFT-NS-2 and FFT-NS-i were compared with other methods by computer simulations and benchmark tests; the CPU time of FFT-NS-2 is drastically reduced as compared with CLUSTALW with comparable accuracy. FFT-NS-i is over 100 times faster than T-COFFEE, when the number of input sequences exceeds 60, without sacrificing the accuracy.


    INTRODUCTION
Multiple sequence alignment is a basic tool in various aspects of molecular biological analyses ranging from detecting key functional residues to inferring the evolutionary history of a protein family. It is, however, difficult to align distantly related sequences correctly without manual inspection by an expert. Many efforts have been made on the problems concerning the optimization of sequence alignment. Needleman and Wunsch (1) presented an algorithm for sequence comparison based on dynamic programming (DP), by which the optimal alignment between two sequences is obtained. The generalization of this algorithm to multiple sequence alignment (2) is not applicable to a practical alignment that consists of dozens or hundreds of sequences, since it requires huge CPU time proportional to $N^K$, where $K$ is the number of sequences each with length $N$. To overcome this difficulty, various heuristic methods, including progressive methods (3) and iterative refinement methods (4–6), have been proposed to date. They are mostly based on various combinations of successive two-dimensional DP, which takes CPU time proportional to $N^2$.

Even if these heuristic methods successfully provide the optimal alignments, there remains the problem of whether the optimal alignment really corresponds to the biologically correct one. The accuracy of resulting alignments is greatly affected by the scoring system. Thompson et al. (7) developed a complicated scoring system in their program CLUSTALW, in which gap penalties and other parameters are carefully adjusted according to the features of input sequences, such as sequence divergence, length, local hydropathy and so on. Nevertheless, no existing scoring system is able to process correctly global alignments for various types of problems including large terminal extensions or internal insertions (8). Considerable improvements in the accuracy have recently been made in CLUSTALW (7) version 1.8, the most popular alignment program with excellent portability and operability, and T-COFFEE (9), which provides alignments of the highest accuracy among known methods to date.

On the other hand, few improvements have been made successfully to reduce the CPU time, since the proposal of the progressive method by Feng and Doolittle (3). A high-speed computer program applicable to large-scale problems is becoming more important with the rapid increase in the number of protein and DNA sequences. In order to improve the speed of DP, it is effective to use highly homologous segments in the procedure of multiple sequence alignment (10). There are well-known homology search programs, such as FASTA (11) and BLAST (12), based on string matching algorithms.

In this report, we developed a novel method for multiple sequence alignment based on the fast Fourier transform (FFT), which allows rapid detection of homologous segments. In spite of its great efficiency, FFT has rarely been used practically for detecting sequence similarities (13,14). We also propose an improved scoring system, which performs well even for sequences having large insertions or extensions as well as distantly related sequences of similar length. The efficiency (CPU time and accuracy) of the method was tested by computer simulations and the BAliBASE (15) benchmark tests in comparison with several existing methods. These tests showed that the CPU time has been drastically reduced, whereas the accuracy of the resulting alignments is comparable with that of the most accurate methods among existing ones.


    METHODS
Group-to-group alignments by FFT
The frequency of amino acid substitutions strongly depends on the difference of physico-chemical properties, particularly volume and polarity, between the amino acid pair involved in the substitution (16). Substitutions between physico-chemically similar amino acids tend to preserve the structure of proteins, and such neutral substitutions have been accumulated in molecules during evolution (17). It is therefore reasonable to represent an amino acid $a$ by a vector whose components are the volume value $v(a)$ and the polarity value $p(a)$ introduced by Grantham (18). We use the normalized forms of these values: $\hat{v}(a) = [v(a) - \bar{v}]/\sigma_v$ and $\hat{p}(a) = [p(a) - \bar{p}]/\sigma_p$, where an overbar denotes the average over 20 amino acids, and $\sigma_v$ and $\sigma_p$ denote the standard deviations of volume and polarity, respectively. An amino acid sequence is converted to a sequence of such vectors.

Calculation of the correlation between two amino acid sequences. We define the correlation $c(k)$ between two sequences of such vectors as

$$c(k) = c_v(k) + c_p(k), \tag{1}$$

where $c_v(k)$ and $c_p(k)$ are, as defined below, the correlations of the volume component and the polarity component, respectively, between two amino acid sequences to be aligned. The correlation $c(k)$ represents the degree of similarity of two sequences with the positional lag of $k$ sites. A high value of $c(k)$ indicates that the sequences may have homologous regions.

The correlation $c_v(k)$ of the volume component between sequence 1 and sequence 2 with the positional lag of $k$ sites is defined as

$$c_v(k) = \sum_{1 \le n \le N,\; 1 \le n+k \le M} \hat{v}_1(n)\,\hat{v}_2(n+k), \tag{2}$$

where $\hat{v}_1(n)$ and $\hat{v}_2(n)$ are the volume components of the $n$th site of sequence 1 with the length of $N$ and that of sequence 2 with the length of $M$, respectively. Considering $N \approx M$ in many cases, equation 2 takes $O(N^2)$ operations. The FFT reduces the CPU time of this calculation to $O(N \log N)$ (19). If $V_1(m)$ and $V_2(m)$ are the Fourier transforms of $\hat{v}_1(n)$ and $\hat{v}_2(n)$, i.e.

$$\hat{v}_1(n) \Leftrightarrow V_1(m) \tag{3}$$

$$\hat{v}_2(n) \Leftrightarrow V_2(m), \tag{4}$$

it is known that the correlation $c_v(k)$ is expressed as

$$c_v(k) \Leftrightarrow V_1^{*}(m) \cdot V_2(m), \tag{5}$$

where $\Leftrightarrow$ represents transform pairs, and the asterisk denotes complex conjugation.

The correlation $c_p(k)$ of the polarity component between two sequences,

$$c_p(k) = \sum_{1 \le n \le N,\; 1 \le n+k \le M} \hat{p}_1(n)\,\hat{p}_2(n+k), \tag{6}$$

is calculated in the same manner.
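The relation between the direct correlation sum and its Fourier-domain form can be checked numerically. The following is a minimal sketch, not the MAFFT implementation (which takes its FFT code from Press et al.): it uses a small recursive radix-2 FFT written for illustration, 0-based site indices, and toy input values rather than normalized Grantham values. The function names `fft`, `correlation_direct` and `correlation_fft` are hypothetical.

```python
import cmath

def fft(x, inverse=False):
    """Unnormalized recursive radix-2 FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0j] * n
    for m in range(n // 2):
        twiddle = cmath.exp(sign * 2j * cmath.pi * m / n) * odd[m]
        out[m] = even[m] + twiddle
        out[m + n // 2] = even[m] - twiddle
    return out

def correlation_direct(v1, v2):
    """c(k) = sum over n of v1[n] * v2[n + k], term by term: O(N*M)."""
    return {k: sum(v1[n] * v2[n + k]
                   for n in range(len(v1)) if 0 <= n + k < len(v2))
            for k in range(-(len(v1) - 1), len(v2))}

def correlation_fft(v1, v2):
    """Same c(k) via conj(V1) * V2 and an inverse transform: O(N log N)."""
    size = 1
    while size < len(v1) + len(v2) - 1:   # zero-pad so lags do not wrap around
        size *= 2
    f1 = fft([complex(x) for x in v1] + [0j] * (size - len(v1)))
    f2 = fft([complex(x) for x in v2] + [0j] * (size - len(v2)))
    product = [a.conjugate() * b for a, b in zip(f1, f2)]
    c = [z.real / size for z in fft(product, inverse=True)]
    # Negative lags are stored at the end of the circular result.
    return {k: c[k % size] for k in range(-(len(v1) - 1), len(v2))}

# Toy profiles sharing the motif (1, 2, 3): at index 2 in v1 and index 0 in v2.
v1 = [0, 0, 1, 2, 3, 0]
v2 = [1, 2, 3, 0, 0, 0, 0]
c = correlation_fft(v1, v2)
best_lag = max(c, key=c.get)   # lag at which the two profiles match best
```

Zero-padding to at least N + M − 1 points is what makes the circular correlation returned by the transform equal to the linear sum over overlapping sites.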

Finding homologous segments. If two sequences compared have homologous regions, the correlation c(k) has some peaks corresponding to these regions (Fig. 1A). By the FFT analysis, however, we can know only the positional lag k of a homologous region in two sequences but not the position of the region. As shown in Figure 1B, to determine the positions of the homologous region in each sequence, a sliding window analysis with the window size of 30 sites is carried out, in which the degree of local homologies is calculated for each of the highest 20 peaks in the correlation c(k). We identify a segment of 30 sites with score value exceeding a given threshold (0.7 per site in our program, see below for details of the scoring system) as a homologous segment. If two or more successive segments are identified as homologous segments, they are combined into one segment of larger length. If the length of the combined segment is longer than 150 sites, the segment is divided into several segments with 150 sites each.
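The sliding window step can be sketched as follows for a single lag. This is an illustration under stated assumptions, not the program's code: windows are assumed to advance one site at a time, the per-site score is a toy identity score supplied by the caller rather than the normalized similarity matrix, and when a long segment is split the last piece may be shorter than the maximum. The function name `homologous_segments` and its parameters are hypothetical.

```python
def homologous_segments(seq1, seq2, lag, score,
                        window=30, threshold=0.7, max_len=150):
    """Return (start in seq1, start in seq2, length) for segments on one lag.

    Site n of seq1 is paired with site n + lag of seq2 (0-based).
    """
    start = max(0, -lag)
    stop = min(len(seq1), len(seq2) - lag)
    site = [score(seq1[n], seq2[n + lag]) for n in range(start, stop)]

    # Windows whose mean score per site exceeds the threshold.
    hits = [i for i in range(len(site) - window + 1)
            if sum(site[i:i + window]) / window > threshold]

    # Combine successive (overlapping or touching) windows into one segment.
    merged = []
    for i in hits:
        if merged and i <= merged[-1][1]:
            merged[-1][1] = i + window
        else:
            merged.append([i, i + window])

    # Divide segments longer than max_len into pieces of at most max_len sites.
    segments = []
    for lo, hi in merged:
        for s in range(lo, hi, max_len):
            e = min(s + max_len, hi)
            segments.append((start + s, start + s + lag, e - s))
    return segments

# Toy usage with a small window, as in Figure 1B where the window is 4.
identity = lambda a, b: 1.0 if a == b else 0.0
segs = homologous_segments("QQQQACDEFGHIKLQQQQ", "WWACDEFGHIKLWW",
                           lag=-2, score=identity, window=4)
```

In practice the function would be called once for each of the highest peaks of c(k), with the lag taken from the peak position.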



Figure 1. (A) A result of the FFT analysis. There are two peaks corresponding to two homologous blocks. (B) Sliding window analysis is carried out and the positions of homologous blocks are determined. Note that window size is 30 (see text) but the window size is set to 4 in (B) for simplicity.

 

Dividing a homology matrix. To obtain an alignment between two sequences, the homologous segments must be arranged consistently in both sequences. A matrix $S_{ij}$ ($1 \le i, j \le n$, where $n$ is the number of homologous segments) is constructed in the following manner. If the $i$th homologous segment on sequence 1 corresponds to the $j$th homologous segment on sequence 2, $S_{ij}$ has the score value of the homologous segment calculated above; otherwise $S_{ij}$ is set to 0. By applying the standard DP procedure to matrix $S_{ij}$, we obtain the optimal path, which corresponds to the optimal arrangement of homologous segments. Figure 2A shows an example in which five homologous segments exist. The order of segments in sequence 1 differs from that in sequence 2. The optimal path depends on $S_{23}$ and $S_{32}$; if $S_{23} > S_{32}$, the path with bold arrows is optimal.
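A segment-level DP of this kind can be sketched as below. The text does not spell out the recurrence, so the one used here (a longest-common-subsequence style recurrence that sums segment scores along a strictly increasing path) is an assumption, as are the function name `best_segment_chain` and the toy scores.

```python
def best_segment_chain(S):
    """Return (total score, chain of 1-based (i, j) pairs) for matrix S."""
    n = len(S)
    # P[i][j]: best total score using the first i segments of sequence 1
    # and the first j segments of sequence 2.
    P = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, n + 1):
            P[i][j] = max(P[i - 1][j], P[i][j - 1],
                          P[i - 1][j - 1] + S[i - 1][j - 1])

    # Trace back the segments that lie on the optimal path.
    chain = []
    i = j = n
    while i > 0 and j > 0:
        s = S[i - 1][j - 1]
        if s > 0 and P[i][j] == P[i - 1][j - 1] + s:
            chain.append((i, j))
            i, j = i - 1, j - 1
        elif P[i][j] == P[i - 1][j]:
            i -= 1
        else:
            j -= 1
    return P[n][n], chain[::-1]

# Five segments; segments 2 and 3 appear in swapped order in sequence 2,
# so only one of S23 and S32 can be kept. Scores are illustrative only.
S = [[3, 0, 0, 0, 0],
     [0, 0, 5, 0, 0],   # S23 = 5
     [0, 4, 0, 0, 0],   # S32 = 4
     [0, 0, 0, 2, 0],
     [0, 0, 0, 0, 6]]
score, chain = best_segment_chain(S)
```

With these toy values S23 exceeds S32, so the chain keeps the pair (2, 3) and drops (3, 2), mirroring the situation described for Figure 2A.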



Figure 2. (A) An example of the segment-level DP; (B) Reducing the area for DP on a homology matrix.

 
The overall homology matrix is divided into sub-matrices at the boundaries corresponding to the centers of homologous segments, as illustrated in Figure 2B. As a result, the shaded area in Figure 2B is excluded from the calculation. The more homologous segments are detected by FFT, the more the CPU time is reduced.

Extension to group-to-group alignments. The procedure described above can be easily extended to group-to-group alignment by considering equations 2 and 6 as a special case with one sequence in each group. These equations are extended to group-to-group alignment by replacing $\hat{v}_1(n)$ with $\hat{v}_{\mathrm{group1}}(n)$, which is the linear combination of the volume components of the members belonging to group 1:

$$\hat{v}_{\mathrm{group1}}(n) = \sum_{i \in \mathrm{group1}} w_i \cdot \hat{v}_i(n),$$

where $w_i$ is the weighting factor for sequence $i$, which is calculated in the same manner as CLUSTALW (7) for the progressive method, or in the same manner as Gotoh’s (20) weighting system for the iterative refinement method. Similarly, the polarity component is calculated as:

$$\hat{p}_{\mathrm{group1}}(n) = \sum_{i \in \mathrm{group1}} w_i \cdot \hat{p}_i(n).$$

This method is applicable to nucleotide sequences by converting a sequence to a sequence of four-dimensional vectors whose components are the frequencies of A, T, G and C at each column, instead of volume and polarity values. In this case, the correlation between two nucleotide sequences is:

$$c(k) = c_A(k) + c_T(k) + c_G(k) + c_C(k).$$
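The group profiles described above are weighted column sums, which can be sketched as follows. The treatment of gap characters (they contribute zero) is an assumption, and the names `group_profile` and `nucleotide_profile` are hypothetical.

```python
def group_profile(rows, weights, value):
    """Weighted sum over group members of a per-residue value at each column.

    rows    : aligned sequences of equal length ('-' marks a gap)
    weights : one weighting factor w_i per sequence
    value   : function mapping a residue (or '-') to a number
    """
    length = len(rows[0])
    return [sum(w * value(row[n]) for row, w in zip(rows, weights))
            for n in range(length)]

def nucleotide_profile(rows, weights):
    """Four components per column: weighted frequencies of A, T, G and C."""
    return {base: group_profile(rows, weights,
                                lambda r, b=base: 1.0 if r == b else 0.0)
            for base in "ATGC"}

# Two aligned nucleotide sequences with equal weights (toy input).
profile = nucleotide_profile(["AC-T", "AG-T"], [0.5, 0.5])
```

Each of the four component profiles would then be correlated separately and the results summed to give c(k); for amino acids the same `group_profile` call applies with the normalized volume or polarity as `value`.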

Scoring system
Similarity matrix. In order to increase the efficiency of alignment, the scoring system (similarity matrix and gap penalties) was also modified. Vogt et al. (21) suggested that the Needleman–Wunsch (NW) algorithm performs well with all-positive matrices, in which all elements have positive values. CLUSTALW (7) and other methods use such all-positive matrices by default. Since Vogt et al. (21) examined only the cases in which members of each protein family are similar in length, it is not clear whether such all-positive matrices are suitable for various alignment problems, particularly those involving sequences of different lengths. Accordingly, contrary to existing methods, we adopted a normalized similarity matrix $\hat{M}_{ab}$ ($a$ and $b$ are amino acids) that has both positive and negative values:

$$\hat{M}_{ab} = \frac{M_{ab} - \mathrm{average2}}{\mathrm{average1} - \mathrm{average2}} + S^{a}, \tag{7}$$

where $\mathrm{average1} = \sum_{a} f_a M_{aa}$, $\mathrm{average2} = \sum_{a,b} f_a f_b M_{ab}$, $M_{ab}$ is the raw similarity matrix, $f_a$ is the frequency of occurrence of amino acid $a$, and $S^{a}$ is a parameter that functions as a gap extension penalty. Under this similarity matrix $\hat{M}_{ab}$, the score per site between two random sequences is $S^{a}$, and the score per site between two identical sequences is $1.0 + S^{a}$. If $S^{a}$ is much smaller than unity, gaps are scored virtually equivalent to random amino acid sequences.

The default parameters of our program are: $M_{ab}$ is the 200 PAM log-odds matrix by Jones et al. (22), $f_a$ is the frequency of occurrence for amino acid $a$ calculated by Jones et al. (22), $S^{op}$ (gap opening penalty, defined below) is 2.4 and $S^{a}$ is 0.06, for amino acid sequences. For nucleic acid sequences, $M_{ab}$ is the 200 PAM log-odds matrix calculated from Kimura’s two parameter model (23) with transition/transversion ratio of 2.0, $f_a$ is 0.25, $S^{op}$ is 2.4 and $S^{a}$ is 0.06.
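The normalization of equation 7 and its two stated properties (expected score per site equal to the gap extension parameter for random sequences, and one plus that parameter for identical sequences) can be verified on a toy example. The two-letter alphabet, raw matrix values and frequencies below are invented for illustration and are not the 200 PAM matrix or the frequencies of Jones et al.; `normalize_matrix` is a hypothetical name.

```python
def normalize_matrix(M, f, s_a):
    """Apply equation 7 to a raw similarity matrix M with frequencies f."""
    average1 = sum(f[a] * M[a][a] for a in f)                     # identical pairs
    average2 = sum(f[a] * f[b] * M[a][b] for a in f for b in f)   # random pairs
    return {a: {b: (M[a][b] - average2) / (average1 - average2) + s_a
                for b in f}
            for a in f}

# Toy two-letter alphabet (illustrative values only).
M = {"A": {"A": 2.0, "B": -1.0},
     "B": {"A": -1.0, "B": 3.0}}
f = {"A": 0.6, "B": 0.4}
N = normalize_matrix(M, f, s_a=0.06)

# Expected score per site for random and for identical sequences.
random_score = sum(f[a] * f[b] * N[a][b] for a in f for b in f)
identical_score = sum(f[a] * N[a][a] for a in f)
```

Because the frequencies sum to one, `random_score` reduces to the gap extension parameter and `identical_score` to one plus that parameter, up to rounding, whatever raw matrix is supplied.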

Gap penalty. The homology matrix $H(i, j)$ between two amino acid sequences $A(i)$ and $B(j)$ is constructed from the similarity matrix as $H(i, j) = \hat{M}_{A(i)B(j)}$, where $i$ and $j$ are positions in the sequences. When two groups of sequences are aligned, the homology matrix between group 1 and group 2 is calculated as:

$$H(i, j) = \sum_{n \in \mathrm{group1}} \sum_{m \in \mathrm{group2}} w_n\, w_m\, \hat{M}_{A(n,i)B(m,j)},$$

where $A(n, i)$ indicates the $i$th site of the $n$th sequence in group 1, $B(m, j)$ is the $j$th site of the $m$th sequence in group 2, and $w_n$ is the weighting factor, defined previously, for the $n$th sequence.

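In the NW algorithm (1), the optimal alignment between two groups of sequences is calculated as:

$$P(i, j) = H(i, j) + \max \begin{cases} P(i-1, j-1) \\ P(x, j-1) - G_1(i, x) & (1 \le x < i-1) \\ P(i-1, y) - G_2(j, y) & (1 \le y < j-1) \end{cases}$$

where $P(i, j)$ is the accumulated score for the optimal path from $(1, 1)$ to $(i, j)$, and $G_1(i, x)$ and $G_2(j, y)$ are gap penalties defined below.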

Each group of sequences may contain the gaps already introduced at previous steps. If a gap is newly introduced at the same position as one of such existing gaps, the new gap should not be penalized, because these new and existing gaps probably result from a single insertion or deletion event. Gotoh (6) and Thompson et al. (7) developed position-specific gap penalties depending on the pattern of existing gaps. The method used in this report is simpler than theirs:

$$G_1(i, x) = S^{op} \cdot \{1 - [g^{1}_{\mathrm{start}}(x) + g^{1}_{\mathrm{end}}(i)]/2\},$$

where $S^{op}$ corresponds to a gap opening penalty, $g^{1}_{\mathrm{start}}(x)$ is the number of the gaps that start at the $x$th site, and $g^{1}_{\mathrm{end}}(i)$ is the number of the gaps that end at the $i$th site. That is,

$$g^{1}_{\mathrm{start}}(x) = \sum_{m \in \mathrm{group1}} w_m \cdot a_m(x) \cdot z_m(x+1),$$

$$g^{1}_{\mathrm{end}}(i) = \sum_{m \in \mathrm{group1}} w_m \cdot z_m(i-1) \cdot a_m(i),$$

where $z_m(i) = 1$ and $a_m(i) = 0$, if the $i$th site of sequence $m$ is a gap; otherwise $z_m(i) = 0$ and $a_m(i) = 1$; $w_m$ is the weighting factor for sequence $m$. The other penalty $G_2(j, y)$ is calculated in the same manner. Because this formulation is simpler than existing ones (6,7), the CPU time is considerably reduced, but the accuracy of resulting alignments is comparable with that by existing scoring systems (see Results).
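The position-specific gap opening penalty can be illustrated with a short sketch. The index convention is an assumption made here for illustration: a gap is taken to start at site x when site x holds a residue and site x + 1 is a gap, and to end at site i when site i − 1 is a gap and site i holds a residue, so that the new gap spans sites x + 1 to i − 1. Sites are 1-based as in the text, boundary sites are not handled, and the name `gap_open_penalty` is hypothetical.

```python
def gap_open_penalty(rows, weights, x, i, s_op=2.4):
    """G1(i, x) for a new gap spanning sites x+1 .. i-1 of a group alignment.

    rows    : aligned sequences of the group ('-' marks an existing gap)
    weights : weighting factor w_m per sequence (assumed to sum to 1)
    """
    def z(row, site):              # 1 if the site is a gap, else 0
        return 1 if row[site - 1] == "-" else 0

    def a(row, site):              # 1 if the site is a residue, else 0
        return 1 - z(row, site)

    # Weighted number of existing gaps that start just after site x.
    g_start = sum(w * a(r, x) * z(r, x + 1) for r, w in zip(rows, weights))
    # Weighted number of existing gaps that end just before site i.
    g_end = sum(w * z(r, i - 1) * a(r, i) for r, w in zip(rows, weights))
    return s_op * (1 - (g_start + g_end) / 2)

# One of two equally weighted sequences already has a gap at sites 3-4.
half = gap_open_penalty(["AC--GT", "ACTTGT"], [0.5, 0.5], x=2, i=5)
# Both sequences already have that gap: opening it again costs nothing.
free = gap_open_penalty(["AC--GT", "AC--GT"], [0.5, 0.5], x=2, i=5)
```

Under this convention a new gap that coincides with a gap present in every member of the group is not penalized, and a gap with no counterpart pays the full opening penalty.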

Computer programs
We have developed a program package MAFFT, which incorporates the new techniques described above. The source code for the FFT algorithm has been taken from Press et al. (19). In MAFFT, the progressive method (3,7) (FFT-NS-1, FFT-NS-2) and the iterative refinement method (4–6) (FFT-NS-i) are implemented with some slight modifications described below.

FFT-NS-1. Using the FFT algorithm and the normalized similarity matrix described above, input sequences are progressively aligned following the branching order of sequences in the guide tree. This method is hereafter referred to as FFT-NS-1. This method requires a guide tree based on the all-pairwise comparison, whose CPU time is $O(K^2)$, where $K$ is the number of sequences. Rapid calculation of a distance matrix is important for the case of large $K$. Thus we adopted the method of Jones et al. (22) with two modifications: 20 amino acids are grouped into six physico-chemical groups (24), and the number $T_{ij}$ of 6-tuples shared by sequence $i$ and sequence $j$ is counted. This value is converted to a distance $D_{ij}$ between sequence $i$ and sequence $j$ as

$$D_{ij} = 1 - \frac{T_{ij}}{\min(T_{ii}, T_{jj})}.$$

The guide tree is constructed from this distance matrix using the UPGMA method (25).
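The 6-tuple distance can be sketched as follows. Two points are assumptions, since the text does not give them: the six groups used here follow a Dayhoff-style classification, and shared tuples are counted with multiplicity (the smaller of the two occurrence counts for each tuple). The names `GROUPS`, `tuple_counts`, `shared_tuples` and `tuple_distance` are hypothetical, and sequences shorter than six residues are not handled.

```python
from collections import Counter

# Assumed six-class grouping of the 20 amino acids (not taken from the text).
GROUPS = ["AGPST", "C", "DENQ", "FWY", "HKR", "ILMV"]
CLASS_OF = {aa: str(g) for g, members in enumerate(GROUPS) for aa in members}

def tuple_counts(seq, k=6):
    """Count overlapping k-tuples of the sequence in the reduced alphabet."""
    reduced = "".join(CLASS_OF[aa] for aa in seq)
    return Counter(reduced[p:p + k] for p in range(len(reduced) - k + 1))

def shared_tuples(ci, cj):
    """Number of tuples shared by two count tables, with multiplicity."""
    return sum(min(n, cj[t]) for t, n in ci.items())

def tuple_distance(seq_i, seq_j, k=6):
    """D_ij = 1 - T_ij / min(T_ii, T_jj)."""
    ci, cj = tuple_counts(seq_i, k), tuple_counts(seq_j, k)
    t_ij = shared_tuples(ci, cj)
    return 1 - t_ij / min(shared_tuples(ci, ci), shared_tuples(cj, cj))

# Substitutions within a group (here I, L, M, V) leave the distance at zero.
d_same_groups = tuple_distance("AILMVAG", "ALIVMAG")
```

Working in a six-letter alphabet makes the tuple comparison tolerant of conservative substitutions while keeping the cost of each pairwise comparison roughly linear in sequence length.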

FFT-NS-2. Input sequences are realigned along the guide tree inferred from the alignment by FFT-NS-1. It is expected that more reliable alignments are obtained on the basis of more reliable guide trees (26). This method is referred to as FFT-NS-2.

FFT-NS-i. An alignment obtained by FFT-NS-2 is subjected to further improvement, in which the alignment is divided into two groups and realigned (4–6). We employ a technique called tree-dependent restricted partitioning (27). This process is repeated until no better scoring alignment is obtained with respect to the score described above. This method is referred to as FFT-NS-i.

To test the effect of the FFT algorithm or the normalized similarity matrix described above, we compared these three methods with several methods in which these newly developed techniques are not used.

NW-NS-1/NW-NS-2. We examined a method that uses the standard NW algorithm, instead of the FFT algorithm, with the normalized similarity matrix described above. This method is referred to as NW-NS-1 or NW-NS-2. Concerning the guide trees, NW-NS-1 and NW-NS-2 are identical to FFT-NS-1 and FFT-NS-2, respectively.

NW-AP-2. To test the effect of the normalized similarity matrix described above, we examined a method with a conventional all-positive similarity matrix (21), which is made positive by subtracting the smallest number in the matrix from all elements. This is equivalent to setting $S^{a}$ in equation 7 to 0.82 for the similarity matrix we use. This method is referred to as NW-AP-2. Except for the similarity matrix, the procedure of NW-AP-2 is identical to that of NW-NS-2.


    RESULTS
Computer simulations
In order to evaluate the performance of the present methods, we have conducted computer simulations focusing on the CPU time and the accuracy. Using the sequences generated by a simulation program ROSE (28), the CPU times of the present methods and two existing methods, CLUSTALW version 1.82 and T-COFFEE, were compared for various sequence lengths and various numbers of sequences. Two types of sequence sets were used; one is composed of highly conserved sequences with ~35–85% identities (average distance is 100 PAM), and the other is a group of distantly related sequences with ~15–65% identities (average distance is 250 PAM). We also estimated the order of CPU time [$Y$ of $O(X^Y)$, where $X$ is the length or the number of input sequences] by the power regression analysis.
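A power regression of this kind amounts to a least-squares line fit in log-log space, whose slope is the estimated exponent. The sketch below assumes ordinary least squares on the logarithms; the function name `power_law_exponent` and the timing values are invented for illustration and are not measurements from this study.

```python
import math

def power_law_exponent(xs, ts):
    """Estimate Y in t ~ a * x**Y by a least-squares fit of log t on log x."""
    lx = [math.log(x) for x in xs]
    lt = [math.log(t) for t in ts]
    mean_x = sum(lx) / len(lx)
    mean_t = sum(lt) / len(lt)
    covariance = sum((a - mean_x) * (b - mean_t) for a, b in zip(lx, lt))
    variance = sum((a - mean_x) ** 2 for a in lx)
    return covariance / variance   # slope of the log-log regression line

# Synthetic timings that grow exactly quadratically with sequence length.
lengths = [100, 200, 400, 800]
times = [3e-6 * n ** 2 for n in lengths]
exponent = power_law_exponent(lengths, times)   # close to 2 for these data
```

A fitted exponent near 2 corresponds to the quadratic behaviour expected of standard DP, and an exponent near 1 to time roughly proportional to the input size.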

Figure 3 shows the dependence of CPU time on sequence length. The regression coefficient of each method is also shown. The standard NW-based methods, CLUSTALW and NW-NS-2, require the CPU time proportional to the square of sequence length (the regression coefficients are close to 2 for both methods) independently of the degrees of sequence similarities, as expected. In contrast, the CPU times of FFT-based methods, FFT-NS-2 and FFT-NS-i, depend on the degree of similarities of input sequences; the CPU times of FFT-NS-2 and FFT-NS-i are virtually proportional to the sequence length for highly conserved sequences (regression coefficients are close to 1 in Fig. 3A), whereas the CPU time of FFT-NS-2 is close to that of NW-NS-2 for distantly related sequences (Fig. 3B).



Figure 3. The plot of CPU time versus the average lengths of input sequences for three methods described in the text, FFT-NS-2, FFT-NS-i and NW-NS-2, and two existing methods, CLUSTALW and T-COFFEE. The average percent identities among input sequences are ~35–85% (A) and ~15–65% (B). The number of sequences is 40. The regression coefficient calculated from the power regression analysis is shown for each method. For all cases, default parameters were used, except for CLUSTALW, for which both the default setting (CLW18d) and the ‘quicktree’ option (CLW18q) were examined. All of the calculations were performed on a Linux operating system (Intel Xeon 1.7 GHz with 1 GB of memory). The gcc version 2.96 compiler was used with the optimization option ‘-O3’.

 
Figure 4A and B show the dependence of CPU times on the number ($K$) of input sequences. The time consumption of T-COFFEE is $O(K^3)$ for alignments of a relatively large number of sequences, as Notredame et al. (9) estimated. CLUSTALW (default), which requires the all-pairwise comparison by the standard NW algorithm, consumes $O(K^2)$ CPU time. Other methods require CPU times of approximately $O(K)$.



Figure 4. The plot of CPU time versus the number of input sequences for three methods described in the text, FFT-NS-2 and FFT-NS-i, and two existing methods, CLUSTALW and T-COFFEE. The average percent identities among input sequences are ~35–85% (A) and ~15–65% (B). The average length of input sequences is 300. The regression coefficient calculated from the power regression analysis is shown for each method. For all cases, default parameters were used, except for CLUSTALW, for which both the default setting (CLW18d) and the ‘quicktree’ option (CLW18q) were examined. All of the calculations were performed on a Linux operating system (Intel Xeon 1.7 GHz with 1 GB of memory). The gcc version 2.96 compiler was used with the optimization option ‘-O3’.

 
To test the accuracy, five newly developed methods, FFT-NS-1, FFT-NS-2, NW-NS-1, NW-NS-2 and FFT-NS-i, were applied to the sequences of various homology levels generated by ROSE (28). The accuracy of each method was measured by sum-of-pairs score, where a reconstructed alignment is compared with the simulated (‘correct’) alignment and the ratio of correctly aligned pairs is calculated from all possible pairs (8). The simulations were repeated 100 times and averaged for each method (Fig. 5).
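The sum-of-pairs score described above can be sketched as the fraction of residue pairs aligned in the reference that are also aligned in the reconstructed alignment. The representation below (each residue identified by its sequence index and its position among the non-gap characters of that sequence) and the names `aligned_pairs` and `sum_of_pairs_score` are choices made for this illustration.

```python
from itertools import combinations

def aligned_pairs(alignment):
    """Set of residue pairs placed in the same column of an alignment.

    Each residue is identified as (sequence index, residue index), where the
    residue index counts non-gap characters from the start of the sequence.
    """
    counters = [0] * len(alignment)
    pairs = set()
    for col in range(len(alignment[0])):
        present = []
        for s, row in enumerate(alignment):
            if row[col] != "-":
                present.append((s, counters[s]))
                counters[s] += 1
        pairs.update(combinations(present, 2))
    return pairs

def sum_of_pairs_score(test, reference):
    """Ratio of reference residue pairs that the test alignment reproduces."""
    ref = aligned_pairs(reference)
    return len(aligned_pairs(test) & ref) / len(ref)

# Toy example: the final G of the first sequence is misplaced in the test.
reference = ["AC-G", "ACTG"]
test = ["ACG-", "ACTG"]
score = sum_of_pairs_score(test, reference)   # 2 of 3 reference pairs kept
```

Identifying residues by their ungapped position makes the comparison independent of how many gap columns either alignment contains.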



Figure 5. The plot of sum-of-pairs score (8) versus the average distance of input sequences for five methods, FFT-NS-1, FFT-NS-2, FFT-NS-i, NW-NS-1 and NW-NS-2. The number of input sequences is 40, and sequence lengths are 200 sites on average. Vertical lines indicate the standard deviations of the scores. For all cases, default parameters were used.

 
The accuracy of FFT-based methods (FFT-NS-1 and FFT-NS-2) is almost equivalent to that of standard NW-based methods (NW-NS-1 and NW-NS-2). This result indicates that the FFT algorithm does not sacrifice the accuracy. FFT-NS-2 performs better than FFT-NS-1 as expected. FFT-NS-i has an advantage in accuracy over FFT-NS-1 and FFT-NS-2 for distantly related sequences.

Benchmarks using BAliBASE
Thompson et al. (8) have published a systematic comparison of widely distributed alignment programs using the BAliBASE benchmark alignment database (15), a database of ‘correct’ alignments based on three-dimensional structural superimpositions. The BAliBASE database is categorized into five different types of references. The first category is made up of phylogenetically equidistant members of similar length. In the second category, each alignment contains up to three orphan sequences with a group of close relatives. The third category contains up to four distantly related groups, while the fourth and fifth categories involve long terminal and internal insertions, respectively. These references will be referred to as categories 1–5 hereafter.

We have applied four methods described in Methods, NW-AP-2, NW-NS-2, FFT-NS-2 and FFT-NS-i, to this database to compare their efficiencies with those of five existing methods, DIALIGN (29,30), PIMA (31), CLUSTALW (7) version 1.82, PRRP (32) and T-COFFEE (9). The sum-of-pairs scores (see above) and the column scores [the ratio of correctly aligned columns (8)] were calculated and averaged in each category. The Wilcoxon matched-pair signed-rank test and the t-test were carried out to test the significance of the difference in the accuracy of each method. These tests give P-values, i.e. the probabilities that the observed differences may be due to chance.

Table 1 shows the results of this benchmark test together with the CPU time of each method for performing this test. Unlike the simulation above, FFT-NS-2 (FFT-based method) takes CPU time almost equivalent to NW-NS-2. This is because the FFT algorithm is not efficient for distantly related sequences such as those in these tests. NW-NS-2 takes less CPU time than CLUSTALW does, possibly because of the simple calculation procedure of the former. FFT-NS-i takes less CPU time than T-COFFEE does.


Table 1. Sum-of-pairs scores and column scores of various alignment methods for the BAliBASE benchmark tests
 
The accuracy of NW-AP-2, which contains neither the improved scoring system described above nor the FFT algorithm, is comparable with that of the previous version (1.7) of CLUSTALW (data not shown). By using the improved scoring system shown in equation 7, NW-NS-2 and FFT-NS-2 perform considerably better than NW-AP-2. T-COFFEE achieved the highest average accuracy, but the accuracy of FFT-NS-i is comparable with that of T-COFFEE. P-values by the Wilcoxon matched-pair signed-rank test are 0.13 for the sum-of-pairs score and 0.43 for the column score, and P-values by the t-test are 0.10 for the sum-of-pairs score and 0.23 for the column score. Thus the difference is not significant.

Applications to the LSU rRNA and RNA polymerase sequences
BAliBASE is biased toward alignments composed of a small number of short sequences; the number of sequences in each alignment is 9.2 and sequence length is 251.1 on average. To illustrate the power of our approach to practical sequence analyses, we selected two examples of relatively large data sets: the nucleotide sequences of LSU rRNA and the amino acid sequences of the RNA polymerase largest subunit.

LSU rRNA. The Ribosomal Database Project (RDP-II) (33) contains 72 LSU rRNA sequences from Bacteria, Archaea and Eucarya. This alignment was used as a reference alignment. We also use another reference alignment of 59 sequences in which fragment sequences were excluded from the full 72 sequences set (the reference alignments are available at http://www.biophys.kyoto-u.ac.jp/~katoh/align/example/lsu). The CPU times and the sum-of-pairs and column scores (8) of NW-AP-2, NW-NS-2, FFT-NS-2 and FFT-NS-i were compared with those of two existing methods, CLUSTALW (version 1.82) and T-COFFEE using these two data sets (Table 2). The FFT-based methods (FFT-NS-2 and FFT-NS-i) are efficient for such relatively large data sets.


Table 2. Comparison of several methods using the LSU rRNA sequences
 
The largest subunit of RNA polymerase. We used a reference alignment of the largest subunit sequences of RNA polymerase by Iwabe et al. (34), which includes 11 highly conserved blocks. Two data sets, one (large) composed of 76 sequences and the other (small) composed of 24 sequences, were compiled. Both of them contain amino acid sequences from Bacteria, Archaea and three major classes (I, II and III) from Eucarya (the reference alignments are available at http://www.biophys.kyoto-u.ac.jp/~katoh/align/example/rpol). Table 3 shows the CPU time and the number of correctly detected conserved blocks of sequences by six methods: NW-AP-2, FFT-NS-2, NW-NS-2, FFT-NS-i, CLUSTALW version 1.82 and T-COFFEE. T-COFFEE, FFT-NS-2, FFT-NS-i and NW-NS-2 successfully detected all of the 11 blocks, although the CPU times differ for different methods. The CPU time of FFT-NS-2 (FFT-based method) is about one-third of that of NW-NS-2 (standard NW-based method).


Table 3. Comparison of several methods using the largest subunit sequences of RNA polymerase
 

    DISCUSSION
It has been supposed that the appropriate alignment algorithm depends on the nature of the sequences to be aligned (8,35); the NW algorithm produces accurate and reliable alignments for categories 1, 2 and 3 in BAliBASE, whereas the Smith–Waterman (SW) algorithm (36), a method for detecting local homology, is successful for categories 4 and 5. It may be quite impractical to select properly among these different algorithms, depending on the nature of input sequences; actual sequence data contain various types of sequences, i.e. fragment sequences, fusion proteins, orphan sequences, over-representation of some members and so on.

On the basis of such considerations, Notredame et al. (9) formulated a combination of NW and SW alignment procedures in T-COFFEE. This attempt is successful in improving the accuracy at the expense of computational simplicity. Thus, this method may be applicable to short and small data sets like those in BAliBASE, as Karplus and Hu (37) pointed out. In contrast, the present methods employ a simple NW algorithm (NW-NS-2) or a more rapid algorithm based on FFT (FFT-NS-2 and FFT-NS-i). Nevertheless, the BAliBASE benchmark tests show that the present methods with the normalized similarity matrix perform well also for categories 4 and 5. As a result, the accuracy of FFT-NS-i is comparable with that of T-COFFEE. This result indicates that the accuracy of alignments can be considerably improved without complicating any computational process, contrary to the conventional thought that a combination of the NW and SW algorithms was necessary for computing high-quality alignments (8,9,35). The improvement in accuracy was achieved simply by normalizing the similarity matrix.

This suggests the importance of parameter choice, as Thompson et al. (7,8) pointed out. However, there is a large difference between their strategy and ours. The scoring system used in CLUSTALW is complicated and time consuming; many parameters in the scoring system dynamically vary depending on input sequences. In contrast, the present scoring system is simple; the similarity matrix is fixed for any input sequences, and even the gap extension penalty is not explicitly included in the DP algorithm. Nevertheless, the accuracy of NW-NS-2/FFT-NS-2 is comparable with that of CLUSTALW.

In all cases tested above, the present methods consume generally less CPU time than existing methods of comparable accuracy do. It is remarkable that the order of CPU time is reduced from $O(N^2)$ to $O(N)$ by the FFT algorithm for highly conserved sequences (Fig. 3A), where $N$ is sequence length. Such a rapid multiple alignment method is suitable for automated high-throughput analysis of genomic sequences. At the same time, biologists’ expertise is still of particular importance and, consequently, a user-friendly alignment workbench is required, which provides easy access to the various information collected by database searches, alignment analyses and the predictions obtained by non-homology methods (38). The method presented here is also useful as a core component of such an integrated alignment workbench.

The MAFFT program package is freely available at http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/mafft. It has been tested on the Linux operating system. A graphical user interface, written by H. Suga, K. Katoh, Y. Yamawaki, K. Kuma, D. Hoshiyama, N. Iwabe and T. Miyata, on the X Window System is also available at http://www.biophys.kyoto-u.ac.jp/~katoh/programs/align/xced.


    ACKNOWLEDGEMENTS
 
We thank Drs N. Iwabe, H. Suga and D. Hoshiyama for helpful comments. This work was supported by grants from the Ministry of Education, Culture, Sports, Science and Technology of Japan.


    REFERENCES
 

  1. Needleman,S.B. and Wunsch,C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J. Mol. Biol., 48, 443–453.

  2. Sankoff,D. and Cedergren,R.J. (1983) Simultaneous comparison of three or more sequences related by a tree. In Sankoff,D. and Kruskal,J.B. (eds), Time Warps, String Edits and Macromolecules: The Theory and Practice of Sequence Comparison. Addison-Wesley, London, UK, pp. 253–264.

  3. Feng,D.F. and Doolittle,R.F. (1987) Progressive sequence alignment as a prerequisite to correct phylogenetic trees. J. Mol. Evol., 25, 351–360.

  4. Barton,G.J. and Sternberg,M.J. (1987) A strategy for the rapid multiple alignment of protein sequences. Confidence levels from tertiary structure comparisons. J. Mol. Biol., 198, 327–337.

  5. Berger,M.P. and Munson,P.J. (1991) A novel randomized iterative strategy for aligning multiple protein sequences. Comput. Appl. Biosci., 7, 479–484.

  6. Gotoh,O. (1993) Optimal alignment between groups of sequences and its application to multiple sequence alignment. Comput. Appl. Biosci., 9, 361–370.

  7. Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.

  8. Thompson,J.D., Plewniak,F. and Poch,O. (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res., 27, 2682–2690.

  9. Notredame,C., Higgins,D.G. and Heringa,J. (2000) T-Coffee: a novel method for fast and accurate multiple sequence alignment. J. Mol. Biol., 302, 205–217.

  10. Delcher,A.L., Kasif,S., Fleischmann,R.D., Peterson,J., White,O. and Salzberg,S.L. (1999) Alignment of whole genomes. Nucleic Acids Res., 27, 2369–2376.

  11. Pearson,W.R. and Lipman,D.J. (1988) Improved tools for biological sequence comparison. Proc. Natl Acad. Sci. USA, 85, 2444–2448.

  12. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.

  13. Felsenstein,J., Sawyer,S. and Kochin,R. (1982) An efficient method for matching nucleic acid sequences. Nucleic Acids Res., 10, 133–139.

  14. Rajasekaran,S., Jin,X. and Spouge,J.L. (2002) The efficient computation of position-specific match scores with the fast Fourier transform. J. Comput. Biol., 9, 23–33.

  15. Thompson,J.D., Plewniak,F. and Poch,O. (1999) BAliBASE: a benchmark alignment database for the evaluation of multiple alignment programs. Bioinformatics, 15, 87–88.

  16. Miyata,T., Miyazawa,S. and Yasunaga,T. (1979) Two types of amino acid substitutions in protein evolution. J. Mol. Evol., 12, 219–236.

  17. Kimura,M. (1983) The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK.

  18. Grantham,R. (1974) Amino acid difference formula to help explain protein evolution. Science, 185, 862–864.

  19. Press,W.H., Teukolsky,S.A., Vetterling,W.T. and Flannery,B.P. (1995) Numerical Recipes in C: The Art of Scientific Computing, 2nd Edn. Cambridge University Press, Cambridge, UK.

  20. Gotoh,O. (1995) A weighting system and algorithm for aligning many phylogenetically related sequences. Comput. Appl. Biosci., 11, 543–551.

  21. Vogt,G., Etzold,T. and Argos,P. (1995) An assessment of amino acid exchange matrices in aligning protein sequences: the twilight zone revisited. J. Mol. Biol., 249, 816–831.

  22. Jones,D.T., Taylor,W.R. and Thornton,J.M. (1992) The rapid generation of mutation data matrices from protein sequences. Comput. Appl. Biosci., 8, 275–282.

  23. Kimura,M. (1980) A simple method for estimating evolutionary rates of base substitutions through comparative studies of nucleotide sequences. J. Mol. Evol., 16, 111–120.

  24. Dayhoff,M.O., Schwartz,R.M. and Orcutt,B.C. (1978) A model of evolutionary change in proteins. In Dayhoff,M.O. and Ech,R.V. (eds), Atlas of Protein Sequence and Structure. National Biomedical Research Foundation, MD, pp. 345–352.

  25. Sokal,R.R. and Michener,C.D. (1958) A statistical method for evaluating systematic relationships. University of Kansas Scientific Bulletin, 28, 1409–1438.

  26. Tateno,Y., Ikeo,K., Imanishi,T., Watanabe,H., Endo,T., Yamaguchi,Y., Suzuki,Y., Takahashi,K., Tsunoyama,K., Kawai,M., Kawanishi,Y., Naitou,K. and Gojobori,T. (1997) Evolutionary motif and its biological and structural significance. J. Mol. Evol., 44 (Suppl. 1), S38–S43.

  27. Hirosawa,M., Totoki,Y., Hoshida,M. and Ishikawa,M. (1995) Comprehensive study on iterative algorithms of multiple sequence alignment. Comput. Appl. Biosci., 11, 13–18.

  28. Stoye,J., Evers,D. and Meyer,F. (1997) Generating benchmarks for multiple sequence alignments and phylogenetic reconstructions. Proc. Int. Conf. Intell. Syst. Mol. Biol., 5, 303–306.

  29. Morgenstern,B., Dress,A. and Werner,T. (1996) Multiple DNA and protein sequence alignment based on segment-to-segment comparison. Proc. Natl Acad. Sci. USA, 93, 12098–12103.

  30. Morgenstern,B. (1999) DIALIGN2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics, 15, 211–218.

  31. Smith,R.F. and Smith,T.F. (1992) Pattern-induced multi-sequence alignment (PIMA) algorithm employing secondary structure-dependent gap penalties for use in comparative protein modelling. Protein Eng., 5, 35–41.

  32. Gotoh,O. (1996) Significant improvement in accuracy of multiple protein sequence alignments by iterative refinement as assessed by reference to structural alignments. J. Mol. Biol., 264, 823–838.

  33. Maidak,B.L., Cole,J.R., Lilburn,T.G., Parker,C.T.,Jr, Saxman,P.R., Farris,R.J., Garrity,G.M., Olsen,G.J., Schmidt,T.M. and Tiedje,J.M. (2001) The RDP-II (ribosomal database project). Nucleic Acids Res., 29, 173–174.

  34. Iwabe,N., Kuma,K., Kishino,H., Hasegawa,M. and Miyata,T. (1991) Evolution of RNA polymerases and branching patterns of the three major groups of archaebacteria. J. Mol. Evol., 32, 70–78.

  35. McClure,M.A., Vasi,T.K. and Fitch,W.M. (1994) Comparative analysis of multiple protein-sequence alignment methods. Mol. Biol. Evol., 11, 571–592.

  36. Smith,T.F. and Waterman,M.S. (1981) Identification of common molecular subsequences. J. Mol. Biol., 147, 195–197.

  37. Karplus,K. and Hu,B. (2001) Evaluation of protein multiple alignments by SAM-T99 using the BAliBASE multiple alignment test set. Bioinformatics, 17, 713–720.

  38. Lecompte,O., Thompson,J.D., Plewniak,F., Thierry,J. and Poch,O. (2001) Multiple alignment of complete sequences (MACS) in the post-genomic era. Gene, 270, 17–30.


Add to CiteULike CiteULike   Add to Connotea Connotea   Add to Del.icio.us Del.icio.us    What's this?


This article has been cited by other articles:


Home page
J. Bacteriol.Home page
U. Krauss, B. Q. Minh, A. Losi, W. Gartner, T. Eggert, A. von Haeseler, and K.-E. Jaeger
Distribution and Phylogeny of Light-Oxygen-Voltage-Blue-Light-Signaling Proteins in the Three Kingdoms of Life
J. Bacteriol., December 1, 2009; 191(23): 7234 - 7242.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
T. A. Leski, C. C. Caswell, M. Pawlowski, D. J. Klinke, J. M. Bujnicki, S. J. Hart, and S. Lukomski
Identification and Classification of bcl Genes and Proteins of Bacillus cereus Group Organisms and Their Application in Bacillus anthracis Detection and Fingerprinting
Appl. Envir. Microbiol., November 15, 2009; 75(22): 7163 - 7172.
[Abstract] [Full Text] [PDF]


Home page
J. Gen. Virol.Home page
S. M. Rodriguez, M. D. Golemba, R. H. Campos, K. Trono, and L. R. Jones
Bovine leukemia virus can be classified into seven genotypes: evidence for the existence of two novel clades
J. Gen. Virol., November 1, 2009; 90(11): 2788 - 2797.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
R. Carter and G. Drouin
The Evolutionary Rates of Eukaryotic RNA Polymerases and of Their Transcription Factors Are Affected by the Level of Concerted Evolution of the Genes They Transcribe
Mol. Biol. Evol., November 1, 2009; 26(11): 2515 - 2520.
[Abstract] [Full Text] [PDF]


Home page
JEMHome page
T. Graef, A. K. Moesta, P. J. Norman, L. Abi-Rached, L. Vago, A. M. Older Aguilar, M. Gleimer, J. A. Hammond, L. A. Guethlein, D. A. Bushnell, et al.
KIR2DS4 is a product of gene conversion with KIR3DL2 that introduced specificity for HLA-A*11 while diminishing avidity for HLA-C
J. Exp. Med., October 26, 2009; 206(11): 2557 - 2572.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
Y. Ogura, T. Ooka, A. Iguchi, H. Toh, M. Asadulghani, K. Oshima, T. Kodama, H. Abe, K. Nakayama, K. Kurokawa, et al.
Comparative genomics reveal the mechanism of the parallel evolution of O157 and non-O157 enterohemorrhagic Escherichia coli
PNAS, October 20, 2009; 106(42): 17939 - 17944.
[Abstract] [Full Text] [PDF]


Home page
Gen Biol EvolHome page
F. Burki, Y. Inagaki, J. Brate, J. M. Archibald, P. J. Keeling, T. Cavalier-Smith, M. Sakaguchi, T. Hashimoto, A. Horak, S. Kumar, et al.
Large-Scale Phylogenomic Analyses Reveal That Two Enigmatic Protist Lineages, Telonemia and Centroheliozoa, Are Related to Photosynthetic Chromalveolates
Gen Biol Evol, October 19, 2009; 2009(0): 231 - 238.
[Abstract] [Full Text] [PDF]


Home page
J. Gen. Virol.Home page
F. H. Leendertz, M. Deckers, W. Schempp, F. Lankester, C. Boesch, L. Mugisha, A. Dolan, D. Gatherer, D. J. McGeoch, and B. Ehlers
Novel cytomegaloviruses in free-ranging and captive great apes: phylogenetic evidence for bidirectional horizontal transmission
J. Gen. Virol., October 1, 2009; 90(10): 2386 - 2394.
[Abstract] [Full Text] [PDF]


Home page
J. Bacteriol.Home page
X. Y. Han, K. C. Sizer, E. J. Thompson, J. Kabanja, J. Li, P. Hu, L. Gomez-Valero, and F. J. Silva
Comparative Sequence Analysis of Mycobacterium leprae and the New Leprosy-Causing Mycobacterium lepromatosis
J. Bacteriol., October 1, 2009; 191(19): 6067 - 6074.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
R. R. Stocsits, H. Letsch, J. Hertel, B. Misof, and P. F. Stadler
Accurate and efficient reconstruction of deep phylogenies from structured RNAs
Nucleic Acids Res., October 1, 2009; 37(18): 6184 - 6193.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
E. van den Born, A. Bekkelund, M. N. Moen, M. V. Omelchenko, A. Klungland, and P. O. Falnes
Bioinformatics and functional analysis define four distinct groups of AlkB DNA-dioxygenases in bacteria
Nucleic Acids Res., September 28, 2009; (2009) gkp774v1.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
L. Zhao, H. Dong, C. C. Zhang, L. Kinch, M. Osawa, M. Iacovino, N. V. Grishin, M. Kyba, and L. J.-s. Huang
A JAK2 Interdomain Linker Relays Epo Receptor Engagement Signals to Kinase Activation
J. Biol. Chem., September 25, 2009; 284(39): 26988 - 26998.
[Abstract] [Full Text] [PDF]


Home page
MycologiaHome page
D. Pavlic, B. Slippers, T. A. Coutinho, and M. J. Wingfield
Molecular and phenotypic characterization of three phylogenetic species discovered within the Neofusicoccum parvum/N. ribis complex
Mycologia, September 1, 2009; 101(5): 636 - 647.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
M. Roettger, W. Martin, and T. Dagan
A Machine-Learning Approach Reveals That Alignment Properties Alone Can Accurately Predict Inference of Lateral Gene Transfer from Discordant Phylogenies
Mol. Biol. Evol., September 1, 2009; 26(9): 1931 - 1939.
[Abstract] [Full Text] [PDF]


Home page
J. Bacteriol.Home page
G. A. Cromie
Phylogenetic Ubiquity and Shuffling of the Bacterial RecBCD and AddAB Recombination Complexes
J. Bacteriol., August 15, 2009; 191(16): 5076 - 5084.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
G. Fourie, E. T. Steenkamp, T. R. Gordon, and A. Viljoen
Evolutionary Relationships among the Fusarium oxysporum f. sp. cubense Vegetative Compatibility Groups
Appl. Envir. Microbiol., July 15, 2009; 75(14): 4770 - 4781.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
T. D. O'Connor and N. I. Mundy
Genotype-phenotype associations: substitution models to detect evolutionary associations between phenotypic variables and genotypic evolutionary rate
Bioinformatics, June 15, 2009; 25(12): i94 - i100.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
J. Hawkins, C. Grant, W. S. Noble, and T. L. Bailey
Assessing phylogenetic motif models for predicting transcription factor binding sites
Bioinformatics, June 15, 2009; 25(12): i339 - i347.
[Abstract] [Full Text] [PDF]


Home page
GlycobiologyHome page
G. Michel, T. Barbeyron, B. Kloareg, and M. Czjzek
The family 6 carbohydrate-binding modules have coevolved with their appended catalytic modules toward similar substrate specificity
Glycobiology, June 1, 2009; 19(6): 615 - 623.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
H. Chen, S. Kshirsagar, I. Jensen, K. Lau, R. Covarrubias, S. F. Schluter, and J. J. Marchalonis
Characterization of arrangement and expression of the T cell receptor {gamma} locus in the sandbar shark
PNAS, May 26, 2009; 106(21): 8591 - 8596.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
T. Rausch, S. Koren, G. Denisov, D. Weese, A.-K. Emde, A. Doring, and K. Reinert
A consistency-based consensus algorithm for de novo and reference-guided sequence assembly of short reads
Bioinformatics, May 1, 2009; 25(9): 1118 - 1124.
[Abstract] [Full Text] [PDF]


Home page
J. Gen. Virol.Home page
D. J. McGeoch
Lineages of varicella-zoster virus
J. Gen. Virol., April 1, 2009; 90(4): 963 - 969.
[Abstract] [Full Text] [PDF]


Home page
J. Gen. Virol.Home page
T. Nabeshima, H. T. K. Loan, S. Inoue, M. Sumiyoshi, Y. Haruta, P. T. Nga, V. T. Q. Huoung, M. del Carmen Parquet, F. Hasebe, and K. Morita
Evidence of frequent introductions of Japanese encephalitis virus from south-east Asia and continental east Asia to Japan
J. Gen. Virol., April 1, 2009; 90(4): 827 - 832.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. Takeuchi, D. Schmitt, C. Chapple, E. Babaylova, G. Karpova, R. Guigo, A. Krol, and C. Allmang
A short motif in Drosophila SECIS Binding Protein 2 provides differential binding affinity to SECIS RNA hairpins
Nucleic Acids Res., April 1, 2009; 37(7): 2126 - 2141.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
N. Mine, J. Guglielmini, M. Wilbaux, and L. Van Melderen
The Decay of the Chromosomally Encoded ccdO157 Toxin-Antitoxin System in the Escherichia coli Species
Genetics, April 1, 2009; 181(4): 1557 - 1566.
[Abstract] [Full Text] [PDF]


Home page
J. Immunol.Home page
A. K. Moesta, L. Abi-Rached, P. J. Norman, and P. Parham
Chimpanzees Use More Varied Receptors and Ligands Than Humans for Inhibitory Killer Cell Ig-Like Receptor Recognition of the MHC-C1 and MHC-C2 Epitopes
J. Immunol., March 15, 2009; 182(6): 3628 - 3637.
[Abstract] [Full Text] [PDF]


Home page
Mol. Cell. ProteomicsHome page
C. Suss, C. Czupalla, C. Winter, T. Pursche, K.-P. Knoch, M. Schroeder, B. Hoflack, and M. Solimena
Rapid Changes of mRNA-binding Protein Levels following Glucose and 3-Isobutyl-1-methylxanthine Stimulation of Insulinoma INS-1 Cells
Mol. Cell. Proteomics, March 1, 2009; 8(3): 393 - 408.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
K. M. Eriksson, A. K. Clarke, L.-G. Franzen, M. Kuylenstierna, K. Martinez, and H. Blanck
Community-Level Analysis of psbA Gene Sequences and Irgarol Tolerance in Marine Periphyton
Appl. Envir. Microbiol., February 15, 2009; 75(4): 897 - 906.
[Abstract] [Full Text] [PDF]


Home page
Eukaryot CellHome page
M. M. Cockell, L. Lo Presti, L. Cerutti, E. Cano Del Rosario, P. M. Hauser, and V. Simanis
Functional Differentiation of tbf1 Orthologues in Fission and Budding Yeasts
Eukaryot. Cell, February 1, 2009; 8(2): 207 - 216.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
Y. Pauchet, D. Freitak, H. M. Heidel-Fischer, D. G. Heckel, and H. Vogel
Immunity or Digestion: GLUCANASE ACTIVITY IN A GLUCAN-BINDING PROTEIN FAMILY FROM LEPIDOPTERA
J. Biol. Chem., January 23, 2009; 284(4): 2214 - 2224.
[Abstract] [Full Text] [PDF]


Home page
Brief BioinformHome page
M. R. Aniba, S. Siguenza, A. Friedrich, F. Plewniak, O. Poch, A. Marchler-Bauer, and J. D. Thompson
Knowledge-based expert systems and a proof-of-concept case study for multiple sequence alignment construction and analysis
Brief Bioinform, January 1, 2009; 10(1): 11 - 23.
[Abstract] [Full Text] [PDF]


Home page
Am. J. Bot.Home page
P. S. Soltis, S. F. Brockington, M.-J. Yoo, A. Piedrahita, M. Latvis, M. J. Moore, A. S. Chanderbali, and D. E. Soltis
Floral variation and floral genetics in basal angiosperms
Am. J. Botany, January 1, 2009; 96(1): 110 - 128.
[Abstract] [Full Text] [PDF]


Home page
GeneticsHome page
F. C. Almeida and R. DeSalle
Orthology, Function and Evolution of Accessory Gland Proteins in the Drosophila repleta Group
Genetics, January 1, 2009; 181(1): 235 - 245.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
D. Sarkar, R. DeSalle, and P. B. Fisher
Evolution of MDA-5/RIG-I-dependent innate immunity: Independent evolution by domain grafting
PNAS, November 4, 2008; 105(44): 17040 - 17045.
[Abstract] [Full Text] [PDF]


Home page
Antimicrob. Agents Chemother.Home page
A. J. Schmidtke and N. D. Hanson
Role of ampD Homologs in Overproduction of AmpC in Clinical Isolates of Pseudomonas aeruginosa
Antimicrob. Agents Chemother., November 1, 2008; 52(11): 3922 - 3927.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
S. Ramirez-Flandes and O. Ulloa
Bosque: integrated phylogenetic analysis software
Bioinformatics, November 1, 2008; 24(21): 2539 - 2541.
[Abstract] [Full Text] [PDF]


Home page
Plant CellHome page
Y. Hiwatashi, M. Obara, Y. Sato, T. Fujita, T. Murata, and M. Hasebe
Kinesins Are Indispensable for Interdigitation of Phragmoplast Microtubules in the Moss Physcomitrella patens
PLANT CELL, November 1, 2008; 20(11): 3094 - 3106.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
M. Koyanagi, K. Takano, H. Tsukamoto, K. Ohtsu, F. Tokunaga, and A. Terakita
Jellyfish vision starts with cAMP signaling mediated by opsin-Gs cascade
PNAS, October 7, 2008; 105(40): 15576 - 15580.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
V. Ahola, T. Aittokallio, M. Vihinen, and E. Uusipaikka
Model-based prediction of sequence alignment quality
Bioinformatics, October 1, 2008; 24(19): 2165 - 2171.
[Abstract] [Full Text] [PDF]


Home page
Appl. Environ. Microbiol.Home page
K. Igarashi, T. Ishida, C. Hori, and M. Samejima
Characterization of an Endoglucanase Belonging to a New Subfamily of Glycoside Hydrolase Family 45 of the Basidiomycete Phanerochaete chrysosporium
Appl. Envir. Microbiol., September 15, 2008; 74(18): 5628 - 5634.
[Abstract] [Full Text] [PDF]


Home page
Proc R Soc BHome page
L. Bocak, M. Bocakova, T. Hunt, and A. P Vogler
Multiple ancient origins of neoteny in Lycidae (Coleoptera): consequences for ecology and macroevolution
Proc R Soc B, September 7, 2008; 275(1646): 2015 - 2023.
[Abstract] [Full Text] [PDF]


Home page
Int. J. Syst. Evol. Microbiol.Home page
T. Barbeyron, S. L'Haridon, G. Michel, and M. Czjzek
Mariniflexile fucanivorans sp. nov., a marine member of the Flavobacteriaceae that degrades sulphated fucans from brown algae
Int J Syst Evol Microbiol, September 1, 2008; 58(9): 2107 - 2113.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
A. R. Gruber, C. Kilgus, A. Mosig, I. L. Hofacker, W. Hennig, and P. F. Stadler
Arthropod 7SK RNA
Mol. Biol. Evol., September 1, 2008; 25(9): 1923 - 1930.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
G. H. Gile and P. J. Keeling
Nucleus-Encoded Periplastid-Targeted EFL in Chlorarachniophytes
Mol. Biol. Evol., September 1, 2008; 25(9): 1967 - 1977.
[Abstract] [Full Text] [PDF]


Home page
Biol LettHome page
F. Burki, K. Shalchian-Tabrizi, and J. Pawlowski
Phylogenomics reveals a new 'megagroup' including most photosynthetic eukaryotes
Biol Lett, August 23, 2008; 4(4): 366 - 369.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
T. Rausch, A.-K. Emde, D. Weese, A. Doring, C. Notredame, and K. Reinert
Segment-based multiple sequence alignment
Bioinformatics, August 15, 2008; 24(16): i187 - i192.
[Abstract] [Full Text] [PDF]


Home page
RNAHome page
S. Cordey, D. Gerlach, T. Junier, E. M. Zdobnov, L. Kaiser, and C. Tapparel
The cis-acting replication elements define human enterovirus and rhinovirus species
RNA, August 1, 2008; 14(8): 1568 - 1578.
[Abstract] [Full Text] [PDF]


Home page
Plant CellHome page
J. Azimzadeh, P. Nacry, A. Christodoulidou, S. Drevensek, C. Camilleri, N. Amiour, F. Parcy, M. Pastuglia, and D. Bouchez
Arabidopsis TONNEAU1 Proteins Are Essential for Preprophase Band Formation and Interact with Centrin
PLANT CELL, August 1, 2008; 20(8): 2146 - 2159.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
B. G. Hall
How Well Does the HoT Score Reflect Sequence Alignment Accuracy?
Mol. Biol. Evol., August 1, 2008; 25(8): 1576 - 1580.
[Abstract] [Full Text] [PDF]


Home page
Brief BioinformHome page
K. Katoh and H. Toh
Recent developments in the MAFFT multiple sequence alignment program
Brief Bioinform, July 1, 2008; 9(4): 286 - 298.
[Abstract] [Full Text] [PDF]


Home page
J. Clin. Microbiol.Home page
M. C. Oosthuizen, E. Zweygarth, N. E. Collins, M. Troskie, and B. L. Penzhorn
Identification of a Novel Babesia sp. from a Sable Antelope (Hippotragus niger Harris, 1838)
J. Clin. Microbiol., July 1, 2008; 46(7): 2247 - 2251.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
H.-M. Bourbon
Comparative genomics supports a deep evolutionary origin for the large, four-module transcriptional mediator complex
Nucleic Acids Res., July 1, 2008; 36(12): 3993 - 4008.
[Abstract] [Full Text] [PDF]


Home page
MycologiaHome page
E. M. de Meyer, Z. W. de Beer, R. C. Summerbell, A.M. Moharram, G. S. de Hoog, H. F. Vismer, and M. J. Wingfield
Taxonomy and phylogeny of new wood- and soil-inhabiting Sporothrix species in the Ophiostoma stenoceras-Sporothrix schenckii complex
Mycologia, July 1, 2008; 100(4): 647 - 661.
[Abstract] [Full Text] [PDF]


Home page
J. Cell Sci.Home page
M. Kirkham, S. J. Nixon, M. T. Howes, L. Abi-Rached, D. E. Wakeham, M. Hanzal-Bayer, C. Ferguson, M. M. Hill, M. Fernandez-Rojo, D. A. Brown, et al.
Evolutionary analysis and molecular dissection of caveola biogenesis
J. Cell Sci., June 15, 2008; 121(12): 2075 - 2086.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
J. Orlowski and J. M. Bujnicki
Structural and evolutionary classification of Type II restriction enzymes based on theoretical and experimental analyses
Nucleic Acids Res., June 1, 2008; 36(11): 3552 - 3569.
[Abstract] [Full Text] [PDF]


Home page
Genome ResHome page
J. Schmitz, A. Zemann, G. Churakov, H. Kuhl, F. Grutzner, R. Reinhardt, and J. Brosius
Retroposed SNOfall--A mammalian-wide comparison of platypus snoRNAs
Genome Res., June 1, 2008; 18(6): 1005 - 1010.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
M. Komatsu, M. Tsuda, S. Omura, H. Oikawa, and H. Ikeda
Identification and functional analysis of genes controlling biosynthesis of 2-methylisoborneol
PNAS, May 27, 2008; 105(21): 7422 - 7427.
[Abstract] [Full Text] [PDF]


Home page
J. Gen. Virol.Home page
M. Quan, M. van Vuuren, P. G. Howell, D. Groenewald, and A. J. Guthrie
Molecular epidemiology of the African horse sickness virus S10 gene
J. Gen. Virol., May 1, 2008; 89(5): 1159 - 1168.
[Abstract] [Full Text] [PDF]


Home page
ScienceHome page
K. M. Wong, M. A. Suchard, and J. P. Huelsenbeck
Alignment Uncertainty and Genomic Analysis
Science, January 25, 2008; 319(5862): 473 - 476.
[Abstract] [Full Text] [PDF]


Home page
J. Bacteriol.Home page
S. Moslavac, K. Nicolaisen, O. Mirus, F. Al Dehni, R. Pernil, E. Flores, I. Maldener, and E. Schleiff
A TolC-Like Protein Is Required for Heterocyst Development in Anabaena sp. Strain PCC 7120
J. Bacteriol., November 1, 2007; 189(21): 7887 - 7895.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
M.A. Larkin, G. Blackshields, N.P. Brown, R. Chenna, P.A. McGettigan, H. McWilliam, F. Valentin, I.M. Wallace, A. Wilm, R. Lopez, et al.
Clustal W and Clustal X version 2.0
Bioinformatics, November 1, 2007; 23(21): 2947 - 2948.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
T. Golubchik, M. J. Wise, S. Easteal, and L. S. Jermiin
Mind the Gaps: Evidence of Bias in Estimates of Multiple Sequence Alignments
Mol. Biol. Evol., November 1, 2007; 24(11): 2433 - 2442.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
E. Benavides, R. Baum, D. McClellan, and J. W. Sites
Molecular Phylogenetics of the Lizard Genus Microlophus (Squamata:Tropiduridae): Aligning and Retrieving Indel Signal from Nuclear Introns
Syst Biol, October 1, 2007; 56(5): 776 - 797.
[Abstract] [Full Text] [PDF]


Home page
Mol Biol EvolHome page
Y. Hashiguchi and M. Nishida
Evolution of Trace Amine Associated Receptor (TAAR) Gene Family in Vertebrates: Lineage-Specific Expansions and Degradations of a Second Class of Vertebrate Chemosensory Receptors Expressed in the Olfactory Epithelium
Mol. Biol. Evol., September 1, 2007; 24(9): 2099 - 2107.
[Abstract] [Full Text] [PDF]


Home page
Syst BiolHome page
G. Talavera and J. Castresana
Improvement of Phylogenies after Removing Divergent and Ambiguously Aligned Blocks from Protein Sequence Alignments
Syst Biol, August 1, 2007; 56(4): 564 - 577.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
F.-C. Chen, C.-J. Chen, and T.-J. Chuang
INDELSCAN: a web server for comparative identification of species-specific and non-species-specific insertion/deletion events
Nucleic Acids Res., July 13, 2007; 35(suppl_2): W633 - W638.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
T. J. Wheeler and J. D. Kececioglu
Multiple alignment by aligning alignments
Bioinformatics, July 1, 2007; 23(13): i559 - i568.
[Abstract] [Full Text] [PDF]


Home page
J. Immunol.Home page
L. Abi-Rached, K. Dorighi, P. J. Norman, M. Yawata, and P. Parham
Episodes of Natural Selection Shaped the Interactions of IgA-Fc with Fc{alpha}RI and Bacterial Decoy Proteins
J. Immunol., June 15, 2007; 178(12): 7943 - 7954.
[Abstract] [Full Text] [PDF]


Home page
Plant CellHome page
M. J. Baumann, J. M. Eklof, G. Michel, A. M. Kallas, T. T. Teeri, M. Czjzek, and H. Brumer III
Structural Evidence for the Evolution of Xyloglucanase Activity from Xyloglucan Endo-Transglycosylases: Biological Implications for Cell Wall Metabolism
PLANT CELL, June 1, 2007; 19(6): 1947 - 1963.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
D. Przybylski and B. Rost
Consensus sequences improve PSI-BLAST through mimicking profile-profile alignments
Nucleic Acids Res., April 1, 2007; 35(7): 2238 - 2246.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
K. Katoh and H. Toh
PartTree: an algorithm to build an approximate tree from a large number of unaligned sequences
Bioinformatics, February 1, 2007; 23(3): 372 - 374.
[Abstract] [Full Text] [PDF]


Home page
J. Biol. Chem.Home page
R. Bredemeier, T. Schlegel, F. Ertel, A. Vojta, L. Borissenko, M. T. Bohnsack, M. Groll, A. von Haeseler, and E. Schleiff
Functional and Phylogenetic Properties of the Pore-forming beta-Barrel Transporters of the Omp85 Family
J. Biol. Chem., January 19, 2007; 282(3): 1882 - 1890.
[Abstract] [Full Text] [PDF]


Home page
BioinformaticsHome page
X. Zhang and T. Kahveci
QOMA: quasi-optimal multiple alignment of protein sequences
Bioinformatics, January 15, 2007; 23(2): 162 - 168.
[Abstract] [Full Text] [PDF]


Home page
SIMHome page
K.A. Seifert, S.J. Hughes, H. Boulay, and G. Louis-Seize
Taxonomy, nomenclature and phylogeny of three cladosporium-like hyphomycetes, Sorocybe resinae, Seifertia azaleae and the Hormoconis anamorph of Amorphotheca resinae
Stud Mycol, January 1, 2007; 58(1): 235 - 245.
[Abstract] [Full Text] [PDF]


Home page
Am. J. Physiol. Regul. Integr. Comp. Physiol.Home page
G. Nishimoto, G. Sasaki, E. Yaoita, M. Nameta, H. Li, K. Furuse, H. Fujinaka, Y. Yoshida, A. Mitsudome, and T. Yamamoto
Molecular characterization of water-selective AQP (EbAQP4) in hagfish: insight into ancestral origin of AQP4
Am J Physiol Regulatory Integrative Comp Physiol, January 1, 2007; 292(1): R644 - R651.
[Abstract] [Full Text] [PDF]


Home page
Proc. Natl. Acad. Sci. USAHome page
S. Deshmukh, R. Huckelhoven, P. Schafer, J. Imani, M. Sharma, M. Weiss, F. Waller, and K.-H. Kogel
Colloquium Paper: The root endophytic fungus Piriformospora indica requires host cell death for proliferation during mutualistic symbiosis with barley
PNAS, December 5, 2006; 103(49): 18450 - 18457.
[Abstract] [Full Text] [PDF]


Home page
Protein Eng Des SelHome page
H. Matsumura, K. Izui, and K. Mizuguchi
A novel mechanism of allosteric regulation of archaeal phosphoenolpyruvate carboxylase: a combined approach to structure-based alignment and model assessment
Protein Eng. Des. Sel., September 1, 2006; 19(9): 409 - 419.
[Abstract] [Full Text] [PDF]


Home page
MycologiaHome page
F. Oberwinkler, R. Kirschner, F. Arenal, M. Villarreal, V. Rubio, D. Begerow, and R. Bauer
Two new pycnidial members of the Atractiellales: Basidiopycnis hyalina and Proceropycnis pinicola.
Mycologia, July 1, 2006; 98(4): 637 - 649.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
A. Biegert, C. Mayer, M. Remmert, J. Soding, and A. N. Lupas
The MPI Bioinformatics Toolkit for protein sequence analysis.
Nucleic Acids Res., July 1, 2006; 34(Web Server issue): W335 - W339.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
I. M. Wallace, O. O'Sullivan, D. G. Higgins, and C. Notredame
M-Coffee: combining multiple sequence alignment methods with T-Coffee
Nucleic Acids Res., March 23, 2006; 34(6): 1692 - 1699.
[Abstract] [Full Text] [PDF]


Home page
Biol. Bull.Home page
A. Schulze
Phylogeny and genetic diversity of palolo worms (palola, eunicidae) from the tropical north pacific and the Caribbean.
Biol. Bull., February 1, 2006; 210(1): 25 - 37.
[Abstract] [Full Text] [PDF]


Home page
Nucleic Acids ResHome page
S. Chakrabarti, C. J. Lanczycki, A. R. Panchenko, T. M. Przytycka, P. A. Thiessen, and S. H. Bryant
Refining multiple sequence alignments with conserved core regions.
Nucleic Acids Res., January 1, 2006; 34(9): 2598 - 2606.

M. Gryzenhout, H. Myburg, C. S. Hodges, B. D. Wingfield, and M. J. Wingfield
Microthia, Holocryphia and Ursicollum, three new genera on Eucalyptus and Coccoloba for fungi previously known as Cryphonectria.
Stud Mycol, January 1, 2006; 55: 35 - 52.

G. Nakabonge, M. Gryzenhout, J. Roux, B. D. Wingfield, and M. J. Wingfield
Celoporthe dispersa gen. et sp. nov. from native Myrtales in South Africa.
Stud Mycol, January 1, 2006; 55: 255 - 267.

X. Zhou, Z. W. de Beer, and M. J. Wingfield
DNA sequence comparisons of Ophiostoma spp., including Ophiostoma aurorae sp. nov., associated with pine bark beetles in South Africa.
Stud Mycol, January 1, 2006; 55: 269 - 277.

Z. W. de Beer, D. Begerow, R. Bauer, G. S. Pegg, P. W. Crous, and M. J. Wingfield
Phylogeny of the Quambalariaceae fam. nov., including important Eucalyptus pathogens in South Africa and Australia.
Stud Mycol, January 1, 2006; 55: 289 - 298.

S. Hartmann, D. Lu, J. Phillips, and T. J. Vision
Phytome: a platform for plant comparative genomics
Nucleic Acids Res., January 1, 2006; 34(suppl_1): D724 - D730.

T. Lassmann and E. L. L. Sonnhammer
Automatic assessment of alignment quality
Nucleic Acids Res., December 16, 2005; 33(22): 7120 - 7128.

A. Pavlicek, R. House, A. J. Gentles, J. Jurka, and B. E. Morrow
Traffic of genetic information between segmental duplications flanking the typical 22q11.2 deletion in velo-cardio-facial syndrome/DiGeorge syndrome
Genome Res., November 1, 2005; 15(11): 1487 - 1495.

D. G. Higgins, G. Blackshields, and I. M. Wallace
Mind the gaps: Progress in progressive alignment
PNAS, July 26, 2005; 102(30): 10411 - 10412.

A. Loytynoja and N. Goldman
From The Cover: An algorithm for progressive multiple alignment of sequences with insertions
PNAS, July 26, 2005; 102(30): 10557 - 10562.