Nucleic Acids Research Advance Access published online on July 3, 2009

Nucleic Acids Research, doi:10.1093/nar/gkp554

© 2009 The Author(s)
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.


Computational Biology

Discovery of protein–DNA interactions by penalized multivariate regression

Leonid Zamdborg1,2 and Ping Ma2,3,*

1Center for Biophysics and Computational Biology, 2Institute for Genomic Biology and 3Department of Statistics, University of Illinois at Urbana-Champaign, IL, USA

*To whom correspondence should be addressed. Tel: +1 217 244 7095; Fax: +1 217 244 7190; Email: pingma{at}illinois.edu

Received May 15, 2009. Revised June 11, 2009. Accepted June 14, 2009.

Discovering which regulatory proteins, especially transcription factors (TFs), are active under certain experimental conditions and identifying the corresponding binding motifs is essential for understanding the regulatory circuits that control cellular programs. The experimental methods used for this purpose are laborious. Computational methods have proven extremely effective in identifying TF-binding motifs (TFBMs). In this article, we propose a novel computational method called MotifExpress for discovering active TFBMs. Unlike existing methods, which either use only DNA sequence information or integrate sequence information with a single-sample measurement of gene expression, MotifExpress integrates DNA sequence information with gene expression measured in multiple samples. By selecting TFBMs that are significantly associated with gene expression, we can identify the TFBMs that are active under specific experimental conditions and thus provide clues for the construction of regulatory networks. Compared with existing methods, MotifExpress substantially reduces the number of spurious results. Statistically, MotifExpress uses a penalized multivariate regression approach with a composite absolute penalty, which is highly stable and can effectively find the globally optimal set of active motifs. We demonstrate the excellent performance of MotifExpress by applying it to synthetic data and to real Saccharomyces cerevisiae data. MotifExpress is available at http://www.stat.illinois.edu/~pingma/MotifExpress.htm.
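
As a rough illustration of the kind of penalized multivariate regression described in the abstract, the sketch below regresses a genes-by-samples expression matrix on a genes-by-motifs score matrix and penalizes each motif's entire row of coefficients, so that a motif is either selected for all samples or dropped. It uses a row-wise group-lasso penalty (L1 across motifs, L2 within a motif), which is one member of the composite absolute penalty family. It is not the MotifExpress implementation: the function names, the proximal-gradient solver, the simulated data and the value of `lam` are all assumptions made for illustration.

```python
import numpy as np


def row_group_lasso(X, Y, lam, n_iter=500):
    """Minimize (1/2n)||Y - XB||_F^2 + lam * sum_j ||B[j, :]||_2.

    X: genes x motifs score matrix; Y: genes x samples expression matrix.
    Solved by proximal gradient descent (an assumed, illustrative solver).
    """
    n, p = X.shape
    B = np.zeros((p, Y.shape[1]))
    # Step size 1/L, where L = sigma_max(X)^2 / n bounds the gradient's
    # Lipschitz constant.
    step = n / (np.linalg.norm(X, 2) ** 2)
    for _ in range(n_iter):
        grad = X.T @ (X @ B - Y) / n
        Z = B - step * grad
        # Row-wise soft-thresholding: each motif's coefficient row is shrunk
        # as a block and set exactly to zero when its norm is small.
        norms = np.linalg.norm(Z, axis=1, keepdims=True)
        scale = np.maximum(0.0, 1.0 - step * lam / np.maximum(norms, 1e-12))
        B = scale * Z
    return B


def demo(seed=0):
    """Simulate 30 hypothetical motifs, of which only the first 3 are active."""
    rng = np.random.default_rng(seed)
    n_genes, n_motifs, n_samples = 200, 30, 5
    # Hypothetical motif counts per promoter, centered column-wise.
    X = rng.poisson(1.0, size=(n_genes, n_motifs)).astype(float)
    X -= X.mean(axis=0)
    B_true = np.zeros((n_motifs, n_samples))
    B_true[:3] = 2.0  # active motifs affect expression in every sample
    Y = X @ B_true + 0.1 * rng.normal(size=(n_genes, n_samples))
    B_hat = row_group_lasso(X, Y, lam=0.3)
    # A motif counts as selected when its coefficient row is nonzero.
    return np.flatnonzero(np.linalg.norm(B_hat, axis=1) > 0)


if __name__ == "__main__":
    print("Selected motif indices:", demo())
```

Because the penalty acts on whole rows, evidence from all samples is pooled when deciding whether a motif is active, which is the intuition behind combining sequence information with multi-sample expression rather than fitting each sample separately.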

