
Published online 20 February 2004

Nucleic Acids Research, 2004, Vol. 32, No. 3 e34
© 2004 Oxford University Press

LSimpute: accurate estimation of missing values in microarray data with least squares methods

Trond Hellem Bø*,1, Bjarte Dysvik1 and Inge Jonassen1,2

1 Department of Informatics and 2 Computational Biology Unit, BCCS, University of Bergen, HIB, N5020 Bergen, Norway

*To whom correspondence should be addressed. Tel: +47 55584067; Fax: +47 55584199; Email: trondb{at}ii.uib.no

Microarray experiments generate data sets with information on the expression levels of thousands of genes in a set of biological samples. Unfortunately, such experiments often produce multiple missing expression values, normally due to various experimental problems. As many algorithms for gene expression analysis require a complete data matrix as input, the missing values have to be estimated in order to analyze the available data. Alternatively, genes and arrays can be removed until no missing values remain. However, for genes or arrays with only a small number of missing values, it is desirable to impute those values. For the subsequent analysis to be as informative as possible, it is essential that the estimates for the missing gene expression values are accurate. Even a small number of badly estimated missing values in the data might be enough for clustering methods, such as hierarchical clustering or K-means clustering, to produce misleading results. Thus, accurate methods for missing value estimation are needed. We present novel methods for estimation of missing values in microarray data sets that are based on the least squares principle, and that utilize correlations between both genes and arrays. We refer to this set of methods collectively as LSimpute. We compare the estimation accuracy of our methods with that of the widely used KNNimpute on three complete data matrices from public data sets by randomly knocking out data (labeling values as missing). From these tests, we conclude that our LSimpute methods produce estimates that are consistently more accurate than those obtained using KNNimpute. Additionally, we examine a more classic approach to missing value estimation based on expectation maximization (EM). We refer to our EM implementations as EMimpute, and the estimation errors of the EMimpute methods are compared with those of our novel methods. The results indicate that, on average, the estimates from our best performing LSimpute method are at least as accurate as those from the best EMimpute algorithm.
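
The knock-out evaluation described above can be illustrated with a minimal sketch. This is not the authors' LSimpute implementation: the data are synthetic, all function names (`knock_out`, `row_mean_impute`, `ls_gene_impute`, `rmse`) are hypothetical, and the estimator is a simplified stand-in that averages, without weights, the single-regression least squares predictions from the k most correlated genes. A row-mean baseline is included only for comparison.

```python
import numpy as np


def knock_out(data, fraction, rng):
    """Copy `data` and label a random fraction of entries as missing (NaN)."""
    mask = rng.random(data.shape) < fraction
    masked = data.copy()
    masked[mask] = np.nan
    return masked, mask


def row_mean_impute(masked):
    """Baseline: replace each missing value with the mean of its gene (row)."""
    filled = masked.copy()
    row_means = np.nanmean(masked, axis=1)
    rows, cols = np.where(np.isnan(masked))
    filled[rows, cols] = row_means[rows]
    return filled


def ls_gene_impute(masked, k=5, min_shared=3):
    """Simplified gene-based least squares estimate (illustrative assumption).

    For a missing value of gene g in array j, fit a single-variable least
    squares regression of g on every other gene h that is observed in j,
    keep the k genes with the highest |correlation|, and average their
    predictions without weighting.
    """
    filled = masked.copy()
    observed = ~np.isnan(masked)
    for g, j in zip(*np.where(~observed)):
        candidates = []
        for h in range(masked.shape[0]):
            if h == g or not observed[h, j]:
                continue
            shared = observed[g] & observed[h]  # arrays where both are known
            if shared.sum() < min_shared:
                continue
            x, y = masked[h, shared], masked[g, shared]
            sx, sy = x.std(), y.std()
            if sx == 0 or sy == 0:
                continue
            r = np.mean((x - x.mean()) * (y - y.mean())) / (sx * sy)
            slope = r * sy / sx  # least squares slope of y on x
            pred = y.mean() + slope * (masked[h, j] - x.mean())
            candidates.append((abs(r), pred))
        if candidates:
            candidates.sort(key=lambda c: c[0], reverse=True)
            filled[g, j] = np.mean([p for _, p in candidates[:k]])
        else:
            filled[g, j] = np.nanmean(masked[g])  # fall back to the row mean
    return filled


def rmse(truth, estimate, mask):
    """Root mean squared error over the knocked-out entries only."""
    return float(np.sqrt(np.mean((truth[mask] - estimate[mask]) ** 2)))


# Synthetic "complete" matrix: 200 genes x 12 arrays built from 3 latent profiles.
rng = np.random.default_rng(0)
latent = rng.normal(size=(3, 12))
loadings = rng.normal(size=(200, 3))
data = loadings @ latent + 0.1 * rng.normal(size=(200, 12))

masked, mask = knock_out(data, 0.05, rng)
filled_ls = ls_gene_impute(masked)
err_mean = rmse(data, row_mean_impute(masked), mask)
err_ls = rmse(data, filled_ls, mask)
print(f"row-mean RMSE: {err_mean:.3f}  least-squares RMSE: {err_ls:.3f}")
```

Because the true values of the knocked-out entries are known, the two estimators can be compared directly on the same mask. The same harness could score any other imputation function that takes the masked matrix and returns a filled one.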






