Nucleic Acids Research Advance Access originally published online on January 10, 2008
Nucleic Acids Research 2008 36(2):e13; doi:10.1093/nar/gkm1143
Nucleic Acids Research, 2008, Vol. 36, No. 2 e13
© 2008 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods Online
Identification of cancer genes using a statistical framework for multiexperiment analysis of nondiscretized array CGH data
1 Netherlands Cancer Institute, Division of Molecular Biology, Plesmanlaan 121, 1066 CX Amsterdam and 2 Delft University of Technology, Information and Communication Theory group, Faculty of Electrical Engineering, Mathematics and Computer Science, Delft, The Netherlands
*To whom correspondence should be addressed. Tel: +31-205122000; Fax: +31-206691383; Email: j.jonkers{at}nki.nl
Correspondence may also be addressed to Lodewyk Wessels. Tel: +31-205127987; Fax: +31-206691383; Email: l.wessels{at}nki.nl
Received August 20, 2007. Revised December 7, 2007. Accepted December 10, 2007.
Tumor formation is in part driven by DNA copy number alterations (CNAs), which can be measured using microarray-based Comparative Genomic Hybridization (aCGH). Multiexperiment analysis of aCGH data from tumors allows discovery of recurrent CNAs that are potentially causal to cancer development. Until now, multiexperiment aCGH data analysis has depended on discretizing the measurement data into gain, loss or no-change states. Valuable biological information is lost when a heterogeneous system such as a solid tumor is reduced to these states. We have developed a new approach that takes nondiscretized aCGH data as input and identifies regions that are significantly aberrant across an entire tumor set. Our method is based on kernel regression and accounts for the strength of a probe's signal, its local genomic environment and the signal distribution across multiple tumors. In an analysis of 89 human breast tumors, our method showed enrichment for known cancer genes in the detected regions and identified aberrations that are strongly associated with breast cancer subtypes and clinical parameters. Furthermore, we identified 18 recurrent aberrant regions in a new dataset of 19 p53-deficient mouse mammary tumors. These regions, combined with gene expression microarray data, point to known cancer genes and novel candidate cancer genes.
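To make the idea of kernel regression on nondiscretized data concrete, the following is a minimal sketch and not the implementation described in this paper. It assumes a Gaussian kernel, a Nadaraya-Watson estimator, a fixed bandwidth and invented toy log-ratios; the function names (`gaussian_kernel`, `kernel_smooth`, `mean_profile`) are hypothetical. It smooths each tumor's probe signals along the genome and averages the smoothed profiles across tumors, so that a region with a consistent signal in many tumors stands out. The statistical assessment of significance is not shown.

```python
import math

def gaussian_kernel(distance, bandwidth):
    """Unnormalized Gaussian weight for a genomic distance (assumed kernel)."""
    return math.exp(-0.5 * (distance / bandwidth) ** 2)

def kernel_smooth(positions, log_ratios, grid, bandwidth):
    """Nadaraya-Watson estimate of the log-ratio at each grid position."""
    smoothed = []
    for g in grid:
        weights = [gaussian_kernel(p - g, bandwidth) for p in positions]
        total = sum(weights)
        if total > 0:
            value = sum(w * y for w, y in zip(weights, log_ratios)) / total
        else:
            value = 0.0  # no probe close enough to contribute
        smoothed.append(value)
    return smoothed

def mean_profile(positions, tumors, grid, bandwidth):
    """Average the smoothed profiles of all tumors at each grid position."""
    profiles = [kernel_smooth(positions, t, grid, bandwidth) for t in tumors]
    return [sum(column) / len(profiles) for column in zip(*profiles)]

# Toy data (invented for illustration): probe positions in Mb and
# continuous log2 ratios for three tumors, with a shared gain near 5 Mb.
positions = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0]
tumors = [
    [0.0, 0.1, 0.0, 0.6, 0.9, 0.7, 0.0, -0.1, 0.0],
    [0.1, 0.0, 0.1, 0.5, 0.8, 0.6, 0.1, 0.0, 0.0],
    [0.0, 0.0, -0.1, 0.2, 0.4, 0.3, 0.0, 0.0, 0.1],
]
grid = positions          # evaluate at the probe positions themselves
bandwidth = 1.0           # assumed kernel width in Mb

profile = mean_profile(positions, tumors, grid, bandwidth)
peak = grid[profile.index(max(profile))]
print("position of strongest recurrent signal (Mb):", peak)
```

Because no gain or loss calls are made, the weak gain in the third tumor still contributes to the averaged profile in proportion to its strength instead of being thresholded away.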