Nucleic Acids Research Advance Access originally published online on October 26, 2006
Nucleic Acids Research 2006 34(20):5932-5942; doi:10.1093/nar/gkl511
Nucleic Acids Research, 2006, Vol. 34, No. 20 5932-5942
© 2006 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology
Multiple alignment of protein sequences with repeats and rearrangements
1 Department of Computer Science, Posts & Telecommunications Institute of Technology, Hanoi, Vietnam
2 Department of Computer Science, Stanford University, Stanford, CA, USA
3 195 Roque Moraes Drive, Mill Valley, CA 94941, USA
*To whom correspondence should be addressed. Email: serafim{at}cs.stanford.edu
Received April 25, 2006. Revised June 28, 2006. Accepted July 5, 2006.
Multiple sequence alignments are the usual starting point for analyses of protein structure and evolution. For proteins with repeated, shuffled and missing domains, however, traditional multiple sequence alignment algorithms fail to provide an accurate view of homology between related proteins, because they either assume that the input sequences are globally alignable or require locally alignable regions to appear in the same order in all sequences. In this paper, we present ProDA, a novel system for automated detection and alignment of homologous regions in collections of proteins with arbitrary domain architectures. Given an input set of unaligned sequences, ProDA identifies all homologous regions appearing in one or more sequences, and returns a collection of local multiple alignments for these regions. On a subset of the BAliBASE benchmarking suite containing curated alignments of proteins with complicated domain architectures, ProDA performs well in detecting conserved domain boundaries and clustering domain segments, achieving the highest accuracy to date for this task. We conclude that ProDA is a practical tool for automated alignment of protein sequences with repeats and rearrangements in their domain architecture.
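To illustrate why domain rearrangements defeat order-preserving alignment, the following minimal sketch uses a plain Smith-Waterman local alignment on two toy sequences whose domains appear in opposite order. This is not ProDA's algorithm, which the abstract does not describe. The scoring values (match +2, mismatch -1, linear gap -2), the function name `smith_waterman`, and the toy sequences are all assumptions chosen for illustration; practical tools would typically use a substitution matrix and affine gap penalties.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b under a simple identity scoring.

    Returns (score, (a_start, a_end), (b_start, b_end)) with half-open
    coordinates. Illustrative only: scoring parameters are assumptions.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    # Fill the dynamic-programming matrix; scores are clamped at zero so an
    # alignment may start anywhere in either sequence.
    for i in range(1, rows):
        for j in range(1, cols):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,      # gap in b
                          H[i][j - 1] + gap)      # gap in a
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until the score drops to zero.
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        sub = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + sub:
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            i -= 1
        else:
            j -= 1
    return best, (i, best_pos[0]), (j, best_pos[1])


# Hypothetical toy "domains" with disjoint alphabets, placed in opposite order.
DOMAIN_A = "MKTAYIAKQRQI"
DOMAIN_B = "GHWDELVNPS"
seq1 = DOMAIN_A + "TTTT" + DOMAIN_B   # architecture A-B
seq2 = DOMAIN_B + "CCCC" + DOMAIN_A   # architecture B-A

score, span1, span2 = smith_waterman(seq1, seq2)
print(score, span1, span2)  # 24 (0, 12) (14, 26)
```

The shared domain is recovered at offset 0 in the first sequence and offset 14 in the second. No single alignment that keeps residues in the same order can pair both domains at once, which is the situation the abstract describes for shuffled domain architectures.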
*Correspondence may also be addressed to Tu Minh Phuong. Tel: 844 8692133; Fax: 8434 511408; Email: phuongtm{at}fpt.com.vn