Nucleic Acids Research, 2003, Vol. 31, No. 13 3525-3526
© 2003 Oxford University Press
MAVID multiple alignment server
Department of Mathematics, 970 Evans Hall, UC Berkeley, Berkeley, CA 94720, USA
*To whom correspondence should be addressed. Tel: +1 510 642 2028; Fax: +1 510 642 8204; Email: lpachter@math.berkeley.edu
Received February 15, 2003; Revised April 4, 2003; Accepted April 16, 2003
ABSTRACT
MAVID is a multiple alignment program suitable for many large genomic regions. The MAVID web server allows biomedical researchers to quickly obtain multiple alignments for genomic sequences and to subsequently analyse the alignments for conserved regions. MAVID has been successfully used for the alignment of closely related species such as primates and also for the alignment of more distant organisms such as human and fugu. The server is fast, capable of aligning hundreds of kilobases in less than a minute. The multiple alignment is used to build a phylogenetic tree for the sequences, which is subsequently used as a basis for identifying conserved regions in the alignment. The server can be accessed at http://baboon.math.berkeley.edu/mavid/.
INTRODUCTION
The comparison of the mouse and human genomes (1) has demonstrated the power of comparative genomics in inferring the evolutionary history of species and in identifying functional regions in genomes. The possibilities for identifying regions under selection are enhanced with the addition of more sequences and this observation has led to numerous focused sequencing projects which seek to obtain sequence for a small region of a genome in numerous other organisms (2).
Biologists who seek to analyse conserved regions among homologous sequences are faced with the daunting task of aligning large genomic regions and subsequently sifting through massive amounts of data. In order to facilitate the discovery process without requiring biologists to download and install complex software, a number of web servers for alignment and analysis have been set up in recent years (3,4). These servers align submitted sequences and then generate plots or graphs designed to help researchers identify conserved regions. A major drawback of existing servers which support the alignment of large genomic sequences is that they can only perform pairwise sequence comparisons (some servers allow for the input of multiple sequences, but only perform pairwise comparisons).
We have developed a web server to allow researchers to obtain multiple alignments for homologous sequences from multiple genomes and to extract meaningful information from the alignments. The web server uses the MAVID alignment program, which is able to quickly and accurately align large genomic regions. The program is also efficient in processing large numbers of sequences and is therefore also suitable for the alignment of mitochondrial sequences, viral genomes and other data sets for which there are many sequences.
We have designed the server with biomedical researchers in mind. At times, flexibility has been sacrificed for clarity and transparency, but the advantage is that different sequences can be easily aligned without the need to set parameters and otherwise interact with the server. For example, human, mouse and rat BACs can be aligned just as easily as chicken and fugu sequences, or hundreds of HIV sequences. The output is organized in such a way that conserved regions between subsets of sequences can be quickly identified for further investigation. Multiple alignments are provided in a variety of convenient formats which facilitate visual checking of the alignments and at the same time enable more sophisticated checking of consistency and accuracy.
The web server is freely accessible and privacy of users is ensured by not requesting user email addresses or other information.
METHOD
MAVID is a progressive alignment program (Bray and Pachter, manuscript submitted). The program works by recursively aligning the alignments at ancestral nodes of the guide tree. At each internal node, ancestral sequences are inferred from the existing alignments using maximum likelihood and these alignments are then aligned using the AVID program (5).
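The recursion over the guide tree can be illustrated with a minimal Python sketch. It is not the MAVID implementation: a majority-vote consensus stands in for maximum-likelihood ancestral inference, a textbook Needleman-Wunsch aligner stands in for AVID, and all function names, scoring parameters and example sequences are hypothetical.

```python
from collections import Counter

def consensus(alignment):
    """Stand-in for ancestral inference: majority non-gap base per column."""
    out = []
    for col in zip(*alignment.values()):
        counts = Counter(c for c in col if c != "-")
        out.append(counts.most_common(1)[0][0] if counts else "N")
    return "".join(out)

def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Stand-in for AVID: global alignment returned as (i, j) column pairs,
    where None marks a gap in that sequence."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    path, i, j = [], n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            path.append((i - 1, j - 1)); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            path.append((i - 1, None)); i -= 1
        else:
            path.append((None, j - 1)); j -= 1
    path.reverse()
    return path

def merge(left, right, path):
    """Combine two alignments column by column along the pairwise path."""
    merged = {name: [] for name in list(left) + list(right)}
    for i, j in path:
        for name, row in left.items():
            merged[name].append(row[i] if i is not None else "-")
        for name, row in right.items():
            merged[name].append(row[j] if j is not None else "-")
    return {name: "".join(chars) for name, chars in merged.items()}

def progressive_align(tree, sequences):
    """Guide tree is a nested tuple; leaves are sequence names."""
    if isinstance(tree, str):
        return {tree: sequences[tree]}
    left = progressive_align(tree[0], sequences)
    right = progressive_align(tree[1], sequences)
    path = needleman_wunsch(consensus(left), consensus(right))
    return merge(left, right, path)

# Hypothetical toy input; real inputs are large genomic regions.
seqs = {"human": "ACGTACGT", "mouse": "ACGTCGT", "fugu": "ACTACGT"}
aln = progressive_align((("human", "mouse"), "fugu"), seqs)
```

Each internal node reduces its two child alignments to single representative sequences, aligns those, and uses the resulting column pairing to merge the children, so the cost per node is that of one pairwise alignment.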
The server goes through a number of steps:
1. Sequences are repeat-masked using the DUST program (Tatusov and Lipman, unpublished).
2. A random (almost complete) binary guide tree is generated for alignment of the sequences using the progressive alignment method.
3. The sequences are aligned using MAVID.
4. A phylogenetic tree is inferred from the multiple alignment using the neighbor joining method.
5. Steps 3 and 4 are repeated for a total of three iterations.
6. Pairwise alignments are generated from the multiple alignment with respect to all of the sequences and these are used to generate conservation plots and to identify conserved regions.
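The guide-tree and iteration steps above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the server's code: "almost complete" is interpreted here as a balanced tree whose leaf depths differ by at most one, the tree inferred in each round is assumed to serve as the guide tree for the next round, and `align` and `infer_tree` are hypothetical callables standing in for MAVID and neighbor joining.

```python
import random

def random_guide_tree(names, rng=None):
    """Shuffle the sequence names and split them recursively into halves,
    giving a balanced binary tree as nested tuples (assumed interpretation)."""
    rng = rng or random.Random()
    names = list(names)
    if not names:
        raise ValueError("at least one sequence name is required")
    rng.shuffle(names)

    def build(items):
        if len(items) == 1:
            return items[0]
        mid = (len(items) + 1) // 2
        return (build(items[:mid]), build(items[mid:]))

    return build(names)

def iterate_alignment(sequences, align, infer_tree, iterations=3, rng=None):
    """Alternate alignment and tree inference for a fixed number of rounds.

    align(tree, sequences) -> alignment   (hypothetical stand-in for MAVID)
    infer_tree(alignment)  -> tree        (hypothetical stand-in for NJ)
    """
    tree = random_guide_tree(sequences, rng)
    alignment = None
    for _ in range(iterations):
        alignment = align(tree, sequences)
        tree = infer_tree(alignment)  # assumed to become the next guide tree
    return alignment, tree
```

Starting from a random tree means no prior phylogeny is needed; the repeated rounds give the alignment a chance to be rebuilt on a tree estimated from the data.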
FEATURES
The MAVID server supports a number of functions that are useful for biomedical researchers. Input to the server consists solely of a set of sequences in multi-FASTA format; prior knowledge of the phylogenetic tree relating the sequences is not required. Output (Fig. 1) includes the multiple alignment, phylogenetic tree and ATV applet for visualization (6), visualization of the induced pairwise alignments in VISTA format (3) and conserved (not just similar) regions. Results are saved on a web page which users can bookmark for future reference; sequences can also be submitted anonymously.
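To make the input format and the induced pairwise alignments concrete, the following Python sketch parses multi-FASTA text, projects a multiple alignment onto two of its sequences, and computes percent identity in fixed windows. The function names, the window size and the toy data are assumptions for illustration; they do not describe the server's internals or the exact VISTA scoring.

```python
def parse_multi_fasta(text):
    """Read multi-FASTA text into {name: sequence}; the name is the first
    whitespace-delimited token after '>'."""
    records, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields:
                raise ValueError("empty FASTA header")
            name = fields[0]
            if name in records:
                raise ValueError("duplicate sequence name: " + name)
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before first header")
        else:
            records[name].append(line.upper())
    return {n: "".join(parts) for n, parts in records.items()}

def induced_pairwise(alignment, ref, other):
    """Project a multiple alignment onto two rows, dropping columns in
    which both rows contain a gap."""
    pairs = [(x, y) for x, y in zip(alignment[ref], alignment[other])
             if not (x == "-" and y == "-")]
    return "".join(x for x, _ in pairs), "".join(y for _, y in pairs)

def window_identity(row_a, row_b, window=100):
    """Percent identity per non-overlapping window of alignment columns."""
    if len(row_a) != len(row_b):
        raise ValueError("rows must have equal length")
    result = []
    for start in range(0, len(row_a), window):
        cols = list(zip(row_a[start:start + window],
                        row_b[start:start + window]))
        matches = sum(1 for x, y in cols if x == y and x != "-")
        result.append((start, 100.0 * matches / len(cols)))
    return result

# Hypothetical toy data.
records = parse_multi_fasta(">human example\nACGT\nACGT\n>mouse\nACGTAC\n")
toy = {"a": "AC-GT", "b": "AC-G-", "c": "ACTGT"}
pair = induced_pairwise(toy, "a", "b")
profile = window_identity(pair[0], pair[1], window=2)
```

Windows whose identity exceeds a chosen threshold would be candidates for conserved regions; the threshold itself is left to the reader, since the source does not state one.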
REFERENCES
1. Waterston,R.H., Lindblad-Toh,K., Birney,E., Rogers,J., Abril,J.F., Agarwal,P., Agarwala,R., Ainscough,R., Alexandersson,M., An,P. et al. (2002) Initial sequencing and comparative analysis of the mouse genome. Nature, 420, 520-562.
2. Boffelli,D., McAuliffe,J., Ovcharenko,D., Lewis,K., Ovcharenko,I., Pachter,L. and Rubin,E.M. (2003) Phylogenetic shadowing of primate sequences to find functional regions in the human genome. Science, 299, 1391-1394.
3. Mayor,C., Brudno,M., Schwartz,J.R., Poliakov,A., Rubin,E.M., Frazer,K.A., Pachter,L. and Dubchak,I. (2000) VISTA: visualizing global DNA sequence alignments of arbitrary length. Bioinformatics, 16, 1046-1047.
4. Schwartz,S., Zhang,Z., Frazer,K.A., Smit,A., Riemer,C., Bouck,J., Gibbs,R., Hardison,R. and Miller,W. (2000) PiPMaker: a web server for aligning two genomic DNA sequences. Genome Res., 10, 577-586.
5. Bray,N., Dubchak,I. and Pachter,L. (2003) AVID: a global alignment program. Genome Res., 13, 97-102.
6. Zmasek,C.M. and Eddy,S.R. (2001) ATV: display and manipulation of annotated phylogenetic trees. Bioinformatics, 17, 383-384.