Published online 5 January 2006
Application of a superword array in genome assembly
Department of Computer Science, Iowa State University, Ames, IA 50011, USA; ¹Genome Sequencing Center, Washington University School of Medicine, St Louis, MO 63108, USA
*To whom correspondence should be addressed at Department of Computer Science, Iowa State University, 226 Atanasoff Hall, Ames, IA 50011-1040, USA. Tel: +1 515 294 2432; Fax: +1 515 294 0258; Email: xqhuang{at}cs.iastate.edu
Received October 3, 2005. Revised November 9, 2005. Accepted December 13, 2005.
We introduce a data structure called a superword array for quickly finding matches between DNA sequences. The superword array possesses some desirable features of both the lookup table and the suffix array. We describe simple algorithms for constructing a superword array and using it to find pairs of sequences that share a unique superword. The algorithms are implemented in a genome assembly program called PCAP.REP to compute overlaps between reads. Experimental results on a whole-genome dataset show that PCAP.REP produced a more accurate and more contiguous assembly than PCAP.
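The abstract does not define a superword or give the construction algorithm, so the following Python sketch is only a rough illustration of the general idea, not the method implemented in PCAP.REP. It assumes that a superword is a fixed-length substring made of several consecutive words, that the array holds read positions sorted by the superword starting at each position, and that "unique" means the superword occurs in exactly two reads. The function names, parameters, and default lengths are all hypothetical.

```python
from itertools import groupby


def build_superword_array(reads, word_len=3, words_per_superword=2):
    """Return (positions, span): positions are (read_index, offset) pairs
    sorted by the superword that starts there.

    Assumption: a superword is `words_per_superword` consecutive words of
    `word_len` bases each, i.e. a substring of length `span`.
    """
    span = word_len * words_per_superword
    positions = [
        (r, i)
        for r, seq in enumerate(reads)
        for i in range(len(seq) - span + 1)
    ]
    # A plain comparison sort is used here for brevity; it is a stand-in,
    # not the construction algorithm from the paper.
    positions.sort(key=lambda p: reads[p[0]][p[1]:p[1] + span])
    return positions, span


def pairs_sharing_unique_superword(reads, word_len=3, words_per_superword=2):
    """Return sorted pairs of read indices that share a superword.

    Assumption: a superword counts as "unique" when it occurs in exactly
    two distinct reads, so superwords seen in more reads are skipped.
    """
    positions, span = build_superword_array(reads, word_len, words_per_superword)

    def superword_at(p):
        return reads[p[0]][p[1]:p[1] + span]

    pairs = set()
    # Equal superwords are adjacent in the sorted array, so one linear
    # scan is enough to group them.
    for _, group in groupby(positions, key=superword_at):
        read_ids = sorted({r for r, _ in group})
        if len(read_ids) == 2:
            pairs.add((read_ids[0], read_ids[1]))
    return sorted(pairs)


reads = ["ACGTACGGA", "TTACGGATC", "GGGGGGGGG"]
print(pairs_sharing_unique_superword(reads))
```

In this toy input the first two reads share the six-base string `TACGGA`, so they are reported as a candidate overlapping pair, while the third read shares no superword with either. Keeping sorted positions rather than a table indexed by every possible superword is what makes the structure resemble a suffix array, and grouping by fixed-length keys is what makes it resemble a lookup table.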
