Nucleic Acids Research Advance Access originally published online on July 6, 2009
Nucleic Acids Research 2009 37(16):5365-5377; doi:10.1093/nar/gkp493
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Genomics
Integrated study of copy number states and genotype calls using high-density SNP arrays
1Department of Biostatistics, 2Department of Genetics, University of North Carolina, Chapel Hill, NC, USA, 3Department of Genetics, Institute for Cancer Research, Oslo University Hospital-Radiumhospitalet, Oslo, Norway, 4Department of Molecular and Developmental Genetics, Vlaams Instituut voor Biotechnologie, 5Department of Human Genetics, Katholieke Universiteit Leuven, Leuven, Belgium, 6Department of Biostatistics and Bioinformatics, Emory University, Atlanta, GA and 7Lineberger Comprehensive Cancer Center, University of North Carolina, Chapel Hill, NC, USA
*To whom correspondence should be addressed. Tel: 919-966-7266; Fax: 919-966-3804; Email: wsun{at}bios.unc.edu
Correspondence may also be addressed to Charles Perou. Tel: 919-843-5740; Fax: 919-843-5718; Email: cperou{at}med.unc.edu
Received February 17, 2009. Revised April 21, 2009. Accepted May 21, 2009.
We propose a statistical framework, named genoCN, to simultaneously dissect copy number states and genotypes using high-density SNP (single nucleotide polymorphism) arrays. There are at least two types of genomic DNA copy number differences: copy number variations (CNVs) and copy number aberrations (CNAs). While CNVs are naturally occurring and inheritable, CNAs are acquired somatic alterations, most often observed only in tumor tissues. CNVs tend to be shorter and more sparsely distributed across the genome than CNAs. GenoCN consists of two components, genoCNV and genoCNA, designed for CNV and CNA studies, respectively. In contrast to most existing methods, genoCN is more flexible in that the model parameters are estimated from the data instead of being fixed a priori. GenoCNA also incorporates two important strategies for CNA studies. First, the effects of tissue contamination are explicitly modeled. Second, if SNP arrays are run on both tumor and normal tissue from the same individual, the genotype calls from the normal tissue are used to inform the study of CNAs in the tumor tissue. We evaluated genoCN by applying it to 162 HapMap individuals and a brain tumor (glioblastoma) dataset, and showed that our method can successfully identify both types of copy number differences and produce high-quality genotype calls.
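The two genoCNA strategies named above can be illustrated with a minimal sketch. It is not the genoCN implementation or its parameterization. It assumes a simple linear mixture of tumor and diploid normal cells, and it assumes that somatic copy number changes do not create alleles absent from the normal genotype. The function names `expected_baf` and `allowed_tumor_b_counts`, and the `purity` parameter, are hypothetical and chosen only for illustration.

```python
def expected_baf(tumor_b, tumor_total, normal_b, purity):
    """Expected B allele frequency at one SNP in a tumor/normal mixture.

    Assumption (not taken from the paper): the signal is a linear mixture
    of tumor cells (fraction `purity`) and diploid normal cells.
    tumor_b / tumor_total: B-allele copies and total copies in tumor cells.
    normal_b: B-allele copies (0, 1 or 2) in the diploid normal cells.
    """
    if not 0.0 <= purity <= 1.0:
        raise ValueError("purity must lie in [0, 1]")
    if not 0 <= tumor_b <= tumor_total:
        raise ValueError("tumor_b must lie in [0, tumor_total]")
    b_copies = purity * tumor_b + (1.0 - purity) * normal_b
    total_copies = purity * tumor_total + (1.0 - purity) * 2
    if total_copies == 0:
        # Pure tumor sample with a homozygous deletion: no DNA at this SNP.
        return float("nan")
    return b_copies / total_copies


def allowed_tumor_b_counts(normal_b, tumor_total):
    """B-allele counts a tumor can carry, given the normal genotype.

    Assumption: copy number changes only gain or lose existing alleles,
    so a homozygous normal genotype stays homozygous in the tumor.
    """
    if normal_b == 0:        # normal genotype AA
        return [0]
    if normal_b == 2:        # normal genotype BB
        return [tumor_total]
    return list(range(tumor_total + 1))  # normal genotype AB


# Copy-neutral loss of heterozygosity (tumor BB, normal AB) at 70% purity:
# contamination pulls the expected frequency from 1.0 toward 0.5.
print(expected_baf(tumor_b=2, tumor_total=2, normal_b=1, purity=0.7))

# With a normal genotype of AA, a three-copy tumor state can only be AAA.
print(allowed_tumor_b_counts(normal_b=0, tumor_total=3))
```

Under these assumptions, the first call gives a value close to 0.85 rather than 1.0, which is why ignoring contamination can blur the distinction between copy number states. The second call shows how a normal-tissue genotype call narrows the set of tumor genotypes that need to be considered.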