Published online 1 March 2006
A data-driven clustering method for time course gene expression data
Department of Statistics, Harvard University, Cambridge, MA 02138, USA
*To whom correspondence should be addressed. Tel: +1 617 495 1600; Fax: +1 617 496 8057; Email: jliu{at}stat.harvard.edu
Received December 1, 2005. Revised January 29, 2006. Accepted February 13, 2006.
Gene expression over time is, biologically, a continuous process and can thus be represented by a continuous function, i.e. a curve. Individual genes often share similar expression patterns (functional forms). However, the shape of each function, the number of such functions, and the genes that share similar functional forms are typically unknown. Here we introduce an approach that discovers related patterns of gene expression, together with their underlying functions (curves), directly from data, without a priori specification of either the number of clusters or the functional form. Smoothing spline clustering (SSC) models natural properties of gene expression over time, taking into account variation in expression among genes within a cluster of similarly expressed genes, experimental measurement error, and missing data. Furthermore, SSC provides a visual summary of each cluster's gene expression function and goodness-of-fit by way of a mean curve and its associated confidence bands. We apply this method to gene expression data over the life cycle of Drosophila melanogaster and Caenorhabditis elegans and discover 17 and 16 distinct patterns of gene expression, respectively. Both new and previously described expression patterns are found in each species, the majority of which are biologically meaningful and exhibit statistically significant gene function enrichment. Software and source code implementing the algorithm, SSCLUST, are freely available (http://genemerge.bioteam.net/SSClust.html).
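The core idea summarized above, grouping genes around smooth mean curves while tolerating missing time points, can be illustrated with a deliberately simplified Python/NumPy sketch. This is not the SSC model or the SSCLUST implementation. It is a hypothetical hard-assignment, k-means-style loop in which each cluster's mean curve is a penalized least-squares fit with a discrete second-difference roughness penalty. That penalty is a stand-in for a true smoothing spline and assumes equally spaced time points. The function names `second_diff_penalty`, `smooth_mean_curve`, `init_centers` and `cluster_curves`, the smoothing parameter `lam`, and the fixed cluster count `k` are all illustrative assumptions. SSC itself determines the number of clusters from the data and explicitly models within-cluster gene variation and measurement error, which this sketch does not do.

```python
import numpy as np


def second_diff_penalty(n):
    """Return D^T D, where D is the (n-2) x n second-difference matrix.

    Hypothetical discrete stand-in for a smoothing-spline roughness penalty;
    assumes equally spaced time points.
    """
    D = np.diff(np.eye(n), n=2, axis=0)
    return D.T @ D


def smooth_mean_curve(Y, lam):
    """Fit one smooth mean curve f to the rows of Y (genes x time points).

    Minimizes sum over observed (g, t) of (y_gt - f_t)^2 + lam * ||D f||^2.
    NaN entries are treated as missing and get zero weight. Assumes the
    cluster has observations at two or more distinct time points, so the
    linear system below is nonsingular.
    """
    observed = ~np.isnan(Y)
    counts = observed.sum(axis=0).astype(float)    # observations per time point
    sums = np.where(observed, Y, 0.0).sum(axis=0)  # sum of observed values per time point
    A = np.diag(counts) + lam * second_diff_penalty(Y.shape[1])
    return np.linalg.solve(A, sums)


def init_centers(Y, k, rng):
    """Farthest-point initialization on NaN-filled profiles (hypothetical choice)."""
    # Fill each gene's missing values with that gene's own observed mean.
    filled = np.where(np.isnan(Y), np.nanmean(Y, axis=1, keepdims=True), Y)
    centers = [filled[rng.integers(Y.shape[0])]]
    for _ in range(1, k):
        # Distance of every gene to its nearest already-chosen center.
        d = np.min([np.mean((filled - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(filled[d.argmax()])
    return np.array(centers, dtype=float)


def cluster_curves(Y, k, lam=1.0, n_iter=50, seed=0):
    """Hard-assignment clustering of gene profiles around smooth mean curves."""
    rng = np.random.default_rng(seed)
    centers = init_centers(Y, k, rng)
    labels = None
    for _ in range(n_iter):
        # Assignment: nearest curve by mean squared residual over observed points.
        dist = np.nanmean((Y[:, None, :] - centers[None, :, :]) ** 2, axis=2)
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break
        labels = new_labels
        # Update: refit a smooth mean curve for each non-empty cluster.
        for j in range(k):
            members = Y[labels == j]
            if len(members) > 0:
                centers[j] = smooth_mean_curve(members, lam)
    return labels, centers


if __name__ == "__main__":
    # Synthetic example data (invented for illustration, not from the article).
    rng = np.random.default_rng(1)
    t = np.linspace(0.0, 1.0, 8)                             # 8 equally spaced time points
    up = 3 * t + 0.1 * rng.standard_normal((20, 8))          # 20 rising profiles
    down = 3 * (1 - t) + 0.1 * rng.standard_normal((20, 8))  # 20 falling profiles
    Y = np.vstack([up, down])
    Y[rng.random(Y.shape) < 0.1] = np.nan                    # ~10% missing measurements
    labels, centers = cluster_curves(Y, k=2, lam=1.0)
    print(labels)
    print(np.round(centers, 2))
```

In this sketch, missing measurements (NaN) get zero weight both when a mean curve is fitted and when a gene's distance to a curve is measured. This is one simple way to handle incomplete time courses. The sketch leaves out two things: choosing `k` from the data (for example, by comparing a penalized likelihood across several candidate values) and computing confidence bands around each mean curve.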
The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint first authors.