Nucleic Acids Research, 1984, Vol. 12, No. 1, Part 1, 215-226
© 1984
On the statistical significance of nucleic acid similarities
1 Mathematical Research Branch, NIADDK, National Institutes of Health, Bethesda, MD 20205
2 Departments of Mathematics and Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA
Received July 22, 1983.

When evaluating sequence similarities among nucleic acids by the usual methods, statistical significance is often found when the biological significance of the similarity is dubious. We demonstrate that the known statistical properties of nucleic acid sequences strongly affect the statistical distribution of similarity values when calculated by standard procedures. We propose a series of models which account for some of these known statistical properties. The utility of the method is demonstrated in evaluating high relative similarity scores in four specific cases in which there is little biological context by which to judge the similarities. In two of the cases we identify the statistical properties which are responsible for the apparent similarity. In the other two cases the statistical significance of the similarity persists even when the known statistical properties of sequences are modelled. For one of these cases biological significance is likely, while the other case remains an enigma.
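The general idea of judging a similarity score against a null model can be illustrated with a minimal sketch. It is not the procedure or the models of this paper: it assumes a simple local alignment score with arbitrary match, mismatch and gap values, and a null model that preserves only base composition by shuffling each sequence. The function names `local_score`, `shuffled` and `z_score` and all parameter values are hypothetical choices made for illustration.

```python
import random
import statistics


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score (linear gap cost); scoring values are assumed."""
    prev = [0] * (len(b) + 1)
    best = 0
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            diag = prev[j - 1] + (match if x == y else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


def shuffled(seq, rng):
    """Random permutation of seq; keeps base composition, nothing else."""
    bases = list(seq)
    rng.shuffle(bases)
    return "".join(bases)


def z_score(a, b, trials=200, seed=0):
    """Standardize the observed score against scores of shuffled pairs."""
    rng = random.Random(seed)
    observed = local_score(a, b)
    null = [
        local_score(shuffled(a, rng), shuffled(b, rng))
        for _ in range(trials)
    ]
    mean = statistics.mean(null)
    sd = statistics.pstdev(null)
    if sd > 0:
        return (observed - mean) / sd
    # Degenerate null distribution: every shuffled pair scored the same.
    return float("inf") if observed > mean else 0.0
```

A large value of `z_score(a, b)` under this composition-only null says only that the score is unusual for sequences of that composition. A null model that also preserves further statistical properties of the sequences may give a less extreme value, which is the kind of effect the abstract describes.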