Nucleic Acids Research, 1989, Vol. 17, No. 10 3951-3957
© 1989
MOLECULAR BIOLOGY
Sequence errors described in GenBank: a means to determine the accuracy of DNA sequence interpretation
Department of Molecular Biology and Genetics and The Center for Molecular Biology, Wayne State University, 4th Floor MCHT, Laboratory 13, 2727 2nd Avenue, Detroit, MI 48201, USA
Received November 14, 1988. Revised April 18, 1989. Accepted April 18, 1989.
The accuracy of nucleic acid sequence data interpretation was determined by assessing and quantifying the discrepancies reported in the GenBank database. This permitted the calculation of an Error Rate (ER) for nucleic acid sequence determination. If one assumes that most entries (TB, Total Bases) were independently verified, or that those without reported discrepancies were correct, the ER is 0.368 errors per 1000 bases. However, if one assumes that only those sequences with reported discrepancies (TBIQ, Total Bases from entries In Question) were verified and are thus correct, the ER is 2.887 errors per 1000 bases. These two values establish the first lower and upper bounds on the ER for sequence interpretation and for sequence errors within the GenBank database, and they provide a foundation for future assessments and for monitoring the accumulation of sequence data. In addition, the ER provides a basis for evaluating the efficiency and merit of present and future automated nucleic acid sequencing technologies, which will have a direct impact upon the final outcome of the "Human Genome Initiative".
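The two bounds differ only in the denominator used, so the arithmetic can be sketched in a few lines. The following is a minimal illustration, not the authors' procedure: the function names are hypothetical, the abstract does not give the underlying error or base counts, and the code assumes that both reported rates share the same numerator (the number of reported discrepancies). Under that assumption, the ratio of the two rates gives the fraction of all bases that belong to entries in question.

```python
def error_rate_per_kb(errors: int, bases: int) -> float:
    """Return the error rate expressed as errors per 1000 bases."""
    if bases <= 0:
        raise ValueError("bases must be a positive count")
    return 1000.0 * errors / bases


def implied_fraction_in_question(er_tb: float, er_tbiq: float) -> float:
    """Return TBIQ / TB implied by the two rates.

    Assumption (not stated in the abstract): both rates use the same
    error count E, so er_tb = 1000*E/TB and er_tbiq = 1000*E/TBIQ,
    which gives TBIQ/TB = er_tb / er_tbiq.
    """
    if er_tbiq <= 0:
        raise ValueError("er_tbiq must be positive")
    return er_tb / er_tbiq


# Rates reported in the abstract (errors per 1000 bases).
ER_TB = 0.368    # denominator: Total Bases in all entries
ER_TBIQ = 2.887  # denominator: Total Bases from entries In Question

fraction = implied_fraction_in_question(ER_TB, ER_TBIQ)
print(f"Implied share of bases in questioned entries: {fraction:.1%}")

# Made-up counts, for illustration only: 368 errors in 1,000,000 bases.
example_rate = error_rate_per_kb(368, 1_000_000)
print(f"Example rate: {example_rate:.3f} errors per 1000 bases")
```

If the shared-numerator assumption holds, roughly one eighth of the bases surveyed would lie in entries with a reported discrepancy; the abstract itself does not state this figure.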