Published online 30 September 2004
Nucleic Acids Research, Vol. 32 No. 17 © Oxford University Press 2004; all rights reserved
Adjust quality scores from alignment and improve sequencing accuracy
Computational Biology, University of Southern California, Los Angeles, CA, USA
* To whom correspondence should be addressed. Tel: +1 213 740 2407; Fax: +1 213 740 2437; Email: lilei{at}usc.edu
Received July 9, 2004; Revised and Accepted September 8, 2004
In shotgun sequencing, statistical reconstruction of a consensus from an alignment requires a model of measurement error. Churchill and Waterman proposed one such model, together with an expectation-maximization (EM) algorithm that estimates sequencing error rates for each assembly matrix. Ewing and Green defined Phred quality scores for base-calling from sequencing traces by training a model on a large amount of data. In practice, however, sample preparations and sequencing machines may operate under different conditions, so quality scores need to be adjusted. Moreover, the information given by quality scores is incomplete in the sense that they do not describe error patterns. We observe that each nucleotide base has its own error pattern, which varies across the range of quality values. We develop models of measurement error for shotgun sequencing by combining the two perspectives above. We propose a logistic model that takes quality scores as covariates. The model is trained by a procedure that combines an EM algorithm with model selection techniques. The training calibrates the quality values and leads to a more accurate construction of the consensus. Besides Phred scores obtained from ABI sequencers, we apply the same technique to calibrate the quality values reported by Beckman sequencers.
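To make the idea of calibration concrete, the following is a minimal sketch, not the model fitted in the article. It relies on the standard Phred definition, Q = -10 log10(p), and on a deliberately simplified binary logistic form in which the log-odds of a base-calling error are linear in the reported quality score. The function names and the `intercept` and `slope` parameters are hypothetical placeholders. In the article the coefficients are estimated with an EM procedure and the model also describes base-specific error patterns, neither of which is reproduced here.

```python
import math


def phred_to_error_prob(q):
    """Nominal error probability implied by a Phred score: p = 10^(-Q/10)."""
    return 10.0 ** (-q / 10.0)


def error_prob_to_phred(p):
    """Inverse of the Phred definition: Q = -10 * log10(p)."""
    return -10.0 * math.log10(p)


def calibrated_error_prob(q, intercept, slope):
    """Simplified logistic model with the quality score as the only covariate.

    logit(p) = intercept + slope * q

    `intercept` and `slope` are hypothetical values that would have to be
    estimated from data; they are not taken from the article.
    """
    z = intercept + slope * q
    return 1.0 / (1.0 + math.exp(-z))


def calibrated_phred(q, intercept, slope):
    """Re-express the calibrated error probability on the Phred scale."""
    return error_prob_to_phred(calibrated_error_prob(q, intercept, slope))
```

For example, `phred_to_error_prob(20)` gives a nominal error probability of about 0.01. With a negative `slope`, `calibrated_error_prob` decreases as the reported quality increases, which is the qualitative behaviour one would expect of any sensible calibration.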