Nucleic Acids Research Advance Access originally published online on July 7, 2009
Nucleic Acids Research 2009 37(17):e117; doi:10.1093/nar/gkp559
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods Online
Hybridization modeling of oligonucleotide SNP arrays for accurate DNA copy number estimation
¹School of Mathematical Sciences, Peking University, Beijing 100871, China, ²The Computational Genomics Lab, Department of Epidemiology, Michigan State University, East Lansing, MI 48824, ³Department of Biochemistry, Michigan State University, East Lansing, MI 48824, ⁴Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824 and ⁵Department of Epidemiology and Biostatistics, Case Western Reserve University, Cleveland, OH 44106, USA
*To whom correspondence should be addressed. Tel: +1 (517) 353 8623 ext. 113; Fax: +1 (517) 432 1130; Email: fuw@msu.edu
Received February 4, 2009. Revised June 15, 2009. Accepted June 16, 2009.
Affymetrix SNP arrays have been widely used for single-nucleotide polymorphism (SNP) genotype calling and for inferring DNA copy number variation. Although numerous methods have achieved high accuracy in these tasks, most studies have paid little attention to modeling the hybridization of probes to off-target allele sequences, which can greatly affect accuracy. In this study, we address this issue and demonstrate that hybridization with mismatch nucleotides (HWMMN) occurs in all SNP probe-sets and has a critical effect on the estimation of allelic concentrations (ACs). We study sequence binding first through binding free energy and then through binding affinity, and develop a probe intensity composite representation (PICR) model. The PICR model allows the ACs at a given SNP to be estimated through statistical regression. Furthermore, using cell-line data with known true copy numbers, we demonstrate that the PICR model achieves reasonable accuracy in copy number estimation at a single SNP locus, by taking the ratio of each sample's estimated AC to that of the reference sample, and that it can reveal subtle genotype structure of SNPs at abnormal loci. We also demonstrate with HapMap data that the PICR model yields accurate SNP genotype calls consistently across samples, laboratories and even array platforms.
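The abstract describes two steps: estimating ACs at a SNP by regression on probe intensities, and estimating copy number as the ratio of a sample's AC to a reference sample's. The Python sketch below illustrates that idea under stated assumptions only. Each probe's intensity is assumed to be a linear combination of the two allelic concentrations, weighted by per-probe affinities for the targeted allele and for the off-target (HWMMN) allele. The affinity values, the lack of a background term, the diploid reference and the function names `estimate_acs` and `copy_number` are all hypothetical; this is not the PICR model as specified in the paper.

```python
import numpy as np

# Hypothetical linear composite model (an assumption for illustration, not the
# paper's PICR specification): for probe j in a SNP probe-set,
#   intensity_j = aff_a[j] * c_A + aff_b[j] * c_B + noise,
# where aff_a / aff_b are the probe's affinities for allele A / allele B.
# A nonzero affinity for the off-target allele stands in for HWMMN.


def estimate_acs(intensities, aff_a, aff_b):
    """Estimate allelic concentrations (c_A, c_B) by ordinary least squares."""
    X = np.column_stack([aff_a, aff_b]).astype(float)
    y = np.asarray(intensities, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    # Concentrations cannot be negative, so clip any negative fit to zero.
    return np.clip(coef, 0.0, None)


def copy_number(sample_acs, reference_acs, reference_cn=2.0):
    """Ratio of total estimated AC to the reference's, scaled by the
    reference copy number (assumed diploid by default)."""
    return reference_cn * float(np.sum(sample_acs)) / float(np.sum(reference_acs))


# Hypothetical probe-set: probes 0-1 target allele A, probes 2-3 target allele B,
# and every probe also binds the other allele more weakly (HWMMN).
aff_a = np.array([1.0, 0.9, 0.3, 0.25])
aff_b = np.array([0.2, 0.3, 1.0, 0.8])

# Noise-free synthetic intensities: reference genotype AB, sample genotype AAB.
ref_int = aff_a * 1.0 + aff_b * 1.0
smp_int = aff_a * 2.0 + aff_b * 1.0

ref_acs = estimate_acs(ref_int, aff_a, aff_b)
smp_acs = estimate_acs(smp_int, aff_a, aff_b)
cn = copy_number(smp_acs, ref_acs)
b_fraction = smp_acs[1] / smp_acs.sum()  # allele-B share of the sample's total AC
print(ref_acs, smp_acs, cn, b_fraction)
```

Because every probe responds to both alleles, a single probe cannot separate c_A from c_B; the regression across the whole probe-set can. Here nonnegativity is enforced only by clipping, so a constrained solver such as non-negative least squares would be a more principled choice on noisy data. The allele-B fraction hints at how allele-specific estimates can expose genotype structure (for example AAB versus ABB) at an abnormal locus.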