Nucleic Acids Research, 1984, Vol. 12, No. 13 5471-5474
© 1984
MOLECULAR BIOLOGY
Fast computer search for similar DNA sequences

*University of Cambridge, Department of Zoology, Downing Street, Cambridge, CB2 3EJ
University of Cambridge, Department of Pure Mathematics and Mathematical Statistics, Statistical Laboratory, 16 Mill Lane, Cambridge, CB2 1SB, UK
Received April 2, 1984. Accepted June 11, 1984.
An extremely fast method of searching a nucleic acid sequence database for similarity to a probe sequence is described. The method considers sub-sequences that are unique within a database sequence and shared between that sequence and the probe, and is based on detecting two signals among them: a deviation from their expected number, and a deviation from a random spatial distribution. On an IBM 3081 computer, a total search of an encoded form of the EMBL nucleic acid sequence database with a 1 kbase probe sequence is completed in a few seconds. The best previous methods for a similar task required a few minutes.
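The idea summarised above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the word length `k`, the function names, the uniform random-base model used for the expected count, and the use of the most populated diagonal as a measure of spatial clustering are all assumptions made for illustration only.

```python
from collections import Counter


def unique_kmers(seq, k):
    """Map each length-k word that occurs exactly once in seq to its position."""
    words = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(words)
    return {w: i for i, w in enumerate(words) if counts[w] == 1}


def shared_unique_hits(target, probe, k=6):
    """Return sorted (target_pos, probe_pos) pairs for words that are
    unique within target and also occur somewhere in probe."""
    probe_pos = {}
    for j in range(len(probe) - k + 1):
        probe_pos.setdefault(probe[j:j + k], []).append(j)
    hits = []
    for word, i in unique_kmers(target, k).items():
        for j in probe_pos.get(word, []):
            hits.append((i, j))
    return sorted(hits)


def expected_shared(target_len, probe_len, k):
    """Crude expected number of word matches between two random sequences
    with equiprobable bases (assumed model; ignores the uniqueness filter)."""
    n_target = max(target_len - k + 1, 0)
    n_probe = max(probe_len - k + 1, 0)
    return n_target * n_probe / 4 ** k


def diagonal_concentration(hits):
    """Fraction of hits lying on the most populated diagonal (i - j).
    Values near 1 indicate hits that are far from randomly scattered."""
    if not hits:
        return 0.0
    diagonals = Counter(i - j for i, j in hits)
    return max(diagonals.values()) / len(hits)
```

In this sketch, a database sequence would be flagged as a candidate when `len(shared_unique_hits(...))` is well above `expected_shared(...)`, or when `diagonal_concentration(...)` is high; the thresholds are left open because the abstract does not state them.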