
Nucleic Acids Research 2006 34(5):1608-1619; doi:10.1093/nar/gkl047

Published online 20 March 2006

© The Author 2006. Published by Oxford University Press. All rights reserved
The online version of this article has been published under an open access model. Users are entitled to use, reproduce, disseminate, or display the open access version of this article for non-commercial purposes provided that: the original authorship is properly and fully attributed; the Journal and Oxford University Press are attributed as the original place of publication with the correct citation details given; if an article is subsequently reproduced or disseminated not in its entirety but only in part or as a derivative work this must be clearly indicated. For commercial re-use, please contact journals.permissions@oxfordjournals.org


V1 Comp methods and X1 microarray

Microarray missing data imputation based on a set theoretic framework and biological knowledge

Xiangchao Gan1, Alan Wee-Chung Liew2,* and Hong Yan1,3

1Department of Electronic Engineering, City University of Hong Kong, 83 Tat Chee Avenue, Kowloon, Hong Kong
2Department of Computer Science and Engineering, The Chinese University of Hong Kong, Shatin, Hong Kong
3School of Electrical and Information Engineering, University of Sydney, NSW 2006, Australia

*To whom correspondence should be addressed. Tel: 852 26098419; Fax: 852 26035024; Email: wcliew{at}cse.cuhk.edu.hk

Received October 31, 2005. Revised February 22, 2006. Accepted February 22, 2006.

Gene expression data measured using microarrays usually suffer from missing values, yet many data analysis methods require a complete data matrix. Existing missing value imputation algorithms handle missing values well in many cases, but they have limitations. For example, some algorithms perform well only when strong local correlation exists in the data, while others provide the best estimate when the data is dominated by global structure. In addition, these algorithms do not take any biological constraints into account in their imputation. In this paper, we propose a set theoretic framework based on projection onto convex sets (POCS) for missing data imputation. POCS allows us to incorporate different types of a priori knowledge about missing values into the estimation process. The main idea of POCS is to formulate every piece of prior knowledge as a corresponding convex set and then use a convergence-guaranteed iterative procedure to obtain a solution in the intersection of all these sets. In this work, we design several convex sets that take into consideration the biological characteristics of the data: the first set mainly exploits the local correlation structure among genes in microarray data, while the second set captures the global correlation structure among arrays. The third set (actually a series of sets) exploits synchronization loss, a biological phenomenon that is common in cyclic systems and that appears in microarray experiments. Experiments show that our algorithm can achieve a significant reduction of error compared to the KNNimpute, SVDimpute and LSimpute methods.
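
The POCS idea described in the abstract, alternating projections onto convex sets until the iterate settles in their intersection, can be illustrated with a small sketch. The three sets below (agreement with observed entries, a value range, and a fixed linear subspace) are stand-ins chosen for illustration only; they are not the local-correlation, global-correlation or synchronization-loss sets designed in the paper, and all function names are hypothetical.

```python
import numpy as np

def project_observed(x, mask, values):
    """Project onto the affine set of vectors that match the observed entries."""
    y = x.copy()
    y[mask] = values[mask]
    return y

def project_box(x, lo, hi):
    """Project onto the box {x : lo <= x_i <= hi} by clipping each entry."""
    return np.clip(x, lo, hi)

def project_subspace(x, q):
    """Project onto the span of the orthonormal columns of q."""
    return q @ (q.T @ x)

def pocs(x0, projections, n_iter=500, tol=1e-12):
    """Cycle through the projections until the iterate stops changing."""
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(n_iter):
        x_prev = x
        for project in projections:
            x = project(x)
        if np.linalg.norm(x - x_prev) < tol:
            break
    return x

def pocs_impute_demo():
    # Toy profile [1, 2, ?, 4]; the third entry is treated as missing
    # and initialised to 0 (an assumption made for this example).
    values = np.array([1.0, 2.0, 0.0, 4.0])
    mask = np.array([True, True, False, True])
    # Assumed prior knowledge: the profile is a constant plus a linear trend.
    design = np.column_stack([np.ones(4), np.arange(1.0, 5.0)])
    q, _ = np.linalg.qr(design)  # orthonormal basis for that subspace
    projections = [
        lambda x: project_subspace(x, q),
        lambda x: project_box(x, 0.0, 10.0),
        lambda x: project_observed(x, mask, values),  # applied last
    ]
    return pocs(values, projections)

estimate = pocs_impute_demo()
print(estimate)  # the missing third entry approaches 3.0
```

Applying the observed-entry projection last means the returned vector always reproduces the measured values exactly, and only the missing entry is driven by the other constraints. In this toy case the intersection of the three sets is a single point, so the iteration has a unique limit.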






