Nucleic Acids Research Advance Access originally published online on August 6, 2009
Nucleic Acids Research 2009 37(18):5943-5958; doi:10.1093/nar/gkp625
© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Computational Biology
Predicting eukaryotic transcriptional cooperativity by Bayesian network integration of genome-wide data
1 Bioinformatics Program, Department of Chemistry, Boston University, Boston, MA 02215, USA and 2 Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China
*To whom correspondence should be addressed. Tel: 617 358 2302; Fax: 617 353 4814; Email: yuxia@bu.edu
Received March 12, 2009. Revised July 13, 2009. Accepted July 13, 2009.
Transcriptional cooperativity among several transcription factors (TFs) is believed to be the main mechanism of complexity and precision in transcriptional regulatory programs. Here, we present a Bayesian network framework to reconstruct a high-confidence whole-genome map of transcriptional cooperativity in Saccharomyces cerevisiae by integrating a comprehensive list of 15 genomic features. We design a Bayesian network structure to capture the dominant correlations among features and TF cooperativity, and introduce a supervised learning framework with a well-constructed gold-standard dataset. This framework allows us to assess the predictive power of each genomic feature, validate the superior performance of our Bayesian network compared to alternative methods, and integrate genomic features for optimal TF cooperativity prediction. Data integration reveals 159 high-confidence predicted cooperative relationships among 105 TFs, most of which are subsequently validated by literature search. The existing and predicted transcriptional cooperativities can be grouped into three categories based on the combination patterns of the genomic features, providing further biological insights into the different types of TF cooperativity. Our methodology is the first supervised learning approach for predicting transcriptional cooperativity, compares favorably to alternative unsupervised methodologies, and can be applied to other genomic data integration tasks where high-quality gold-standard positive data are scarce.
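
To make the idea of supervised evidence integration concrete, the following is a minimal sketch and not the authors' implementation. It uses a naive Bayes model, the simplest Bayesian network, in which features are assumed conditionally independent given the class; the network described above additionally models correlations among features, which this sketch omits. The three binary features, the toy gold-standard labels, the pseudocount, and the function names `fit_likelihood_ratios` and `posterior` are all hypothetical and chosen only for illustration.

```python
import math


def fit_likelihood_ratios(examples, labels, n_features, pseudocount=1.0):
    """Estimate, for each binary feature, the likelihood ratio
    P(feature = v | cooperative) / P(feature = v | non-cooperative)
    from a labeled gold-standard set, with additive smoothing."""
    pos = [x for x, y in zip(examples, labels) if y == 1]
    neg = [x for x, y in zip(examples, labels) if y == 0]
    ratios = []
    for j in range(n_features):
        table = {}
        for v in (0, 1):
            p_pos = (sum(1 for x in pos if x[j] == v) + pseudocount) / (
                len(pos) + 2 * pseudocount
            )
            p_neg = (sum(1 for x in neg if x[j] == v) + pseudocount) / (
                len(neg) + 2 * pseudocount
            )
            table[v] = p_pos / p_neg
        ratios.append(table)
    return ratios


def posterior(x, ratios, prior):
    """Combine per-feature likelihood ratios with the prior odds
    (naive Bayes assumption) and return P(cooperative | features)."""
    log_odds = math.log(prior / (1.0 - prior))
    for j, v in enumerate(x):
        log_odds += math.log(ratios[j][v])
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical toy data: each tuple describes one TF pair by three
# made-up binary features (e.g. shared targets, physical interaction,
# co-expression); label 1 marks a gold-standard cooperative pair.
examples = [
    (1, 1, 1), (1, 1, 0), (1, 0, 1),             # gold-standard positives
    (0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 0, 0), (1, 0, 0),  # negatives
]
labels = [1, 1, 1, 0, 0, 0, 0, 0]

ratios = fit_likelihood_ratios(examples, labels, n_features=3)
high = posterior((1, 1, 1), ratios, prior=0.5)
low = posterior((0, 0, 0), ratios, prior=0.5)
print(f"all features present: {high:.3f}; all absent: {low:.3f}")
```

In this sketch, a likelihood ratio above 1 for a feature value means that value is more common among gold-standard cooperative pairs, which is one simple way to rank the predictive power of individual features. In practice the prior would be far below 0.5, since cooperative pairs are rare among all TF pairs.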