Nucleic Acids Research Advance Access originally published online on June 24, 2009
Nucleic Acids Research 2009 37(17):e113; doi:10.1093/nar/gkp536
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Methods Online
Extracting transcription factor targets from ChIP-Seq data
Department of Genetics and Institute of Diabetes, Obesity and Metabolism, University of Pennsylvania School of Medicine, Philadelphia, PA 19104, USA
*To whom correspondence should be addressed. Tel: +215 898 8759; Fax: +215 573 5892; Email: kaestner{at}mail.med.upenn.edu
Received May 7, 2009. Revised June 8, 2009. Accepted June 8, 2009.
ChIP-Seq technology, which combines chromatin immunoprecipitation (ChIP) with massively parallel sequencing, is rapidly replacing ChIP-on-chip for the genome-wide identification of transcription factor binding events. Identifying bound regions from the large number of sequence tags produced by ChIP-Seq is a challenging task. Here, we present GLITR (GLobal Identifier of Target Regions), which accurately identifies enriched regions in target data by calculating a fold-change based on random samples of control (input chromatin) data. GLITR uses a classification method to identify regions in ChIP data whose peak height and fold-change do not resemble those of regions in an input sample. We compare GLITR to several recent methods and show that GLITR has improved sensitivity for identifying bound regions closely matching the consensus sequence of a given transcription factor, and can detect bona fide transcription factor targets missed by other programs. We also use GLITR to address the issue of sequencing depth, and show that sequencing biological replicates identifies far more binding regions than re-sequencing the same sample.
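The idea of a fold-change computed against random samples of control data can be illustrated with a minimal Python sketch. This is not the GLITR implementation: the function name `fold_change_vs_control`, the parameters `n_samples` and `pseudocount`, and the representation of tags as integer positions on a single chromosome are all assumptions made for illustration. The sketch repeatedly draws a random subset of input tags equal in size to the ChIP library, counts how many fall in the region, and divides the ChIP count by the mean of those counts.

```python
import random
from statistics import mean


def fold_change_vs_control(chip_count, control_positions, region,
                           n_chip_tags, n_samples=100, pseudocount=1.0,
                           seed=0):
    """Hypothetical helper: fold-change of a region's ChIP tag count over
    the mean count observed in random, depth-matched control subsamples."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    start, end = region
    sample_counts = []
    for _ in range(n_samples):
        # Draw as many control tags as there are ChIP tags (without replacement).
        sample = rng.sample(control_positions, n_chip_tags)
        # Count sampled control tags inside the half-open region [start, end).
        sample_counts.append(sum(start <= p < end for p in sample))
    # The pseudocount (an assumption) avoids division by zero in empty regions.
    return (chip_count + pseudocount) / (mean(sample_counts) + pseudocount)


# Toy data: 10,000 evenly spread input tags and a ChIP library of 1,000 tags.
control = list(range(10_000))
enriched = fold_change_vs_control(50, control, (100, 200), n_chip_tags=1_000)
background = fold_change_vs_control(10, control, (100, 200), n_chip_tags=1_000)
print(f"enriched region: {enriched:.2f}, background-like region: {background:.2f}")
```

In this toy setting a region holding 50 ChIP tags scores well above a region holding 10, which is roughly what the depth-matched control samples contain. A classifier of the kind the abstract describes would then take such a fold-change together with peak height as its features; that step is not shown here.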