Nucleic Acids Research, 2003, Vol. 31, No. 15, pp. 4663-4672
© 2003 Oxford University Press
Correcting errors in shotgun sequences
Center for Genomics and Bioinformatics, Karolinska Institutet, Berzelius väg 35, 171 77 Stockholm, Sweden
*To whom correspondence should be addressed. Tel: +46 8 728 3986; Fax: +46 8 311620; Email: martti.tammi{at}cgb.ki.se
Sequencing errors combined with repeated regions cause major problems in shotgun sequencing, mainly because assembly programs fail to distinguish single-base differences between repeat copies from erroneous base calls. This paper presents a new strategy for correcting errors in shotgun sequence data using defined nucleotide positions (DNPs). The method distinguishes single-base differences from sequencing errors by analyzing multiple alignments, each consisting of a read and all of its overlaps with other reads. These multiple alignments are constructed with a novel pattern-matching algorithm that exploits the symmetry between indices that can be computed for similar words of the same length. This allows multiple alignments to be built rapidly, without any prior pairwise matching of sequence reads. Results from a C++ implementation of the method show that up to 99% of sequencing errors can be corrected, while up to 87% of the single-base differences are retained and up to 80% of the corrected reads contain at most one error. The results also show that the method outperforms the error correction method used in the EULER assembler. The prototype software, MisEd, is freely available from the authors for academic use.
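The abstract does not spell out MisEd's decision rules, so the following Python sketch illustrates only the general idea of telling sequencing errors apart from single-base differences between repeat copies, using one read and its overlaps. Several simplifications are assumptions, not the published method. The alignments are given as gapless `(offset, sequence)` pairs, so neither the pattern-matching step nor indels are modelled. The function name `correct_read` and the threshold `min_support` are hypothetical. The rule itself is also an assumption: a column where two different bases are each seen in at least `min_support` reads is flagged as a DNP-like position and left alone, while a base that no other read confirms is replaced by a well-supported consensus base.

```python
from collections import Counter


def correct_read(read, overlaps, min_support=2):
    """Correct `read` against gapless, pre-aligned overlapping reads.

    Hypothetical simplified model, not MisEd's actual algorithm:
    - `overlaps` is a list of (offset, sequence) pairs; offset is the position
      in `read` where the overlapping sequence starts (may be negative).
    - A column with two or more distinct bases, each supported by at least
      `min_support` reads (counting `read` itself), is reported as a
      DNP-like candidate and left unchanged.
    - A base in `read` seen in no other read is replaced by the consensus
      base when exactly one base reaches `min_support`.
    Returns (corrected_read, dnp_positions).
    """
    corrected = list(read)
    dnps = []
    for i, base in enumerate(read):
        # Build the alignment column at position i of the read.
        column = Counter([base])
        for offset, seq in overlaps:
            j = i - offset
            if 0 <= j < len(seq):
                column[seq[j]] += 1
        supported = [b for b, n in column.items() if n >= min_support]
        if len(supported) >= 2:
            # Several well-supported variants: likely a difference between repeat copies.
            dnps.append(i)
        elif column[base] == 1 and supported:
            # Unconfirmed base against a supported consensus: likely a sequencing error.
            corrected[i] = supported[0]
    return "".join(corrected), dnps


if __name__ == "__main__":
    read = "ACGTTCGTAC"  # position 4 carries a simulated error (T instead of A)
    overlaps = [(0, "ACGTACGTAC"), (1, "CGTACGTGC"), (2, "GTACGTGC")]
    print(correct_read(read, overlaps))  # ('ACGTACGTAC', [8])
```

In this toy example, position 4 is corrected because three overlapping reads agree on `A` and nothing else supports the read's `T`. Position 8 is kept and reported because `A` and `G` each appear in two reads, which is the pattern expected from two nearly identical repeat copies rather than from a random miscall. Columns covered only by the read itself are never changed, since there is no evidence either way.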