Nucleic Acids Research, 1992, Vol. 20, No. 11 2741-2747
© 1992
MOLECULAR BIOLOGY
Corruption of genomic databases with anomalous sequence
Department of Neurology, Children's Hospital, Harvard Medical School, Boston, MA 02115, USA and ¹Molecular Biology Computer Research Resource (MBCRR), Dana-Farber Cancer Institute, Harvard School of Public Health, Boston, MA 02115, USA
*To whom correspondence should be addressed at: Department of Neurology, Enders 250, Children's Hospital, 300 Longwood Avenue, Boston, MA 02115, USA
Received March 3, 1992. Revised May 8, 1992. Accepted May 8, 1992.
We describe evidence that DNA sequences from vectors used for cloning and sequencing have been incorporated accidentally into eukaryotic entries in the GenBank database. These incorporations were not restricted to one type of vector or to a single mechanism. Many minor instances may have been the result of simple editing errors, but some entries contained large blocks of vector sequence that had been incorporated by contamination or other accidents during cloning. Some cases involved unusual rearrangements and areas of vector distant from the normal insertion sites. Matches to vector were found in 0.23% of 20,000 sequences analyzed in GenBank Release 63. Although the possibility of anomalous sequence incorporation has been recognized since the inception of GenBank and should be easy to avoid, recent evidence suggests that this problem is increasing more quickly than the database itself. The presence of anomalous sequence may have serious consequences for the interpretation and use of database entries, and will have an impact on issues of database management. The incorporated vector fragments described here may also be useful for a crude estimate of the fidelity of sequence information in the database. In alignments with well-defined ends, the matching sequences showed 96.8% identity to vector; when poorer matches with arbitrary limits were included, the aggregate identity to vector sequence was 94.8%.
+Present address: MBCRR, Molecular Engineering Research Center, Boston University, 36 Cummington St., Boston, MA 02215, USA
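The identity figures quoted in the abstract can be illustrated with a short sketch. It is not the authors' method: it assumes that identity means matching columns divided by total aligned columns, that a gap in either sequence counts as a mismatch, and that the aggregate figure pools matches and columns across alignments. The function names and example sequences are hypothetical.

```python
def count_matches(aligned_a: str, aligned_b: str) -> int:
    """Count aligned columns where both sequences carry the same base.

    Assumption: a column containing a gap ('-') never counts as a match.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    return sum(
        1
        for x, y in zip(aligned_a.upper(), aligned_b.upper())
        if x == y and x != "-"
    )


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity of one pairwise alignment (matches / columns)."""
    if not aligned_a:
        raise ValueError("alignment is empty")
    return 100.0 * count_matches(aligned_a, aligned_b) / len(aligned_a)


def aggregate_identity(alignments) -> float:
    """Pooled percent identity over several alignments.

    Assumption: 'aggregate' means total matches over total columns,
    so longer alignments weigh more than shorter ones.
    """
    total_matches = 0
    total_columns = 0
    for a, b in alignments:
        total_matches += count_matches(a, b)
        total_columns += len(a)
    if total_columns == 0:
        raise ValueError("no aligned columns")
    return 100.0 * total_matches / total_columns


if __name__ == "__main__":
    # Hypothetical database fragment aligned against a hypothetical vector fragment.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
    # 0.23% of 20,000 screened sequences, in integer arithmetic.
    print(23 * 20000 // 10000)
```

Pooling columns rather than averaging per-alignment percentages keeps short, poor matches from dominating the aggregate; a different weighting would give a different figure.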

