Comparing Imputation Procedures for Affymetrix Gene Expression Datasets Using MAQC Datasets

Advances in Bioinformatics, Volume 2013 (2013), Article ID 790567, 10 pages

Research Article

Department of Biostatistics, Roswell Park Cancer Institute, Buffalo, NY 14263, USA

Center for Computational Research, University at Buffalo, NYS Center of Excellence in Bioinformatics and Life Sciences, Buffalo, NY 14203, USA

Department of Biostatistics, SUNY University at Buffalo, Buffalo, NY 14214, USA

Received 26 June 2013; Accepted 28 August 2013

Academic Editor: Shandar Ahmad

Copyright © 2013 Sreevidya Sadananda Sadasiva Rao et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Introduction. The microarray datasets from the MicroArray Quality Control (MAQC) project have enabled the assessment of the precision and comparability of microarrays and of various microarray analysis methods. However, to our knowledge, no study to date has reported the performance of missing value imputation schemes on the MAQC datasets. In this study, we use the MAQC Affymetrix datasets to evaluate several imputation procedures for Affymetrix microarrays.

Results. We evaluated several state-of-the-art imputation procedures and compared them using different error measures. We randomly deleted 5% and 10% of the data and imputed the missing values using the imputation procedures. We performed 1000 simulations and averaged the results. The results for 5% and 10% deletion are similar. Among the imputation methods, we observe that the local least squares method with is most accurate under the error measures considered. The k-nearest neighbor method with has the highest error rate across imputation methods and error measures.

Conclusions. We conclude that, for imputing missing values in Affymetrix microarray datasets preprocessed with the MAS 5.0 scheme, the local least squares method with has the best overall performance and the k-nearest neighbor method with has the worst overall performance. These results hold for both 5% and 10% missing values.
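The evaluation loop described in the abstract (delete a fixed fraction of values at random, impute, score against the known truth, average over repeated simulations) can be sketched as follows. This is a minimal illustration, not the authors' code: the synthetic expression matrix, the row-mean imputer (a simple stand-in for the local least squares and k-nearest neighbor methods), the normalized RMSE error measure, and all function names are assumptions made for the example.

```python
import math
import random
import statistics


def delete_at_random(data, fraction, rng):
    """Return a copy of `data` with `fraction` of its cells set to None,
    plus the set of (row, column) positions that were deleted."""
    n_rows, n_cols = len(data), len(data[0])
    n_missing = round(fraction * n_rows * n_cols)
    cells = rng.sample(range(n_rows * n_cols), n_missing)
    missing = {divmod(cell, n_cols) for cell in cells}
    corrupted = [
        [None if (i, j) in missing else value for j, value in enumerate(row)]
        for i, row in enumerate(data)
    ]
    return corrupted, missing


def impute_row_mean(corrupted):
    """Stand-in imputer (assumption): fill each missing value with the mean
    of the observed values in the same row (gene). Raises StatisticsError
    if a row has no observed values."""
    filled = []
    for row in corrupted:
        mean = statistics.fmean(v for v in row if v is not None)
        filled.append([mean if v is None else v for v in row])
    return filled


def nrmse(truth, imputed, missing):
    """Assumed error measure: RMSE over the deleted cells, divided by the
    standard deviation of the true values in those cells."""
    true_values = [truth[i][j] for i, j in missing]
    squared_errors = [(truth[i][j] - imputed[i][j]) ** 2 for i, j in missing]
    return math.sqrt(statistics.fmean(squared_errors)) / statistics.pstdev(true_values)


def simulate(data, fraction, n_sim, seed=0):
    """Average the error measure over `n_sim` random deletions."""
    rng = random.Random(seed)
    errors = []
    for _ in range(n_sim):
        corrupted, missing = delete_at_random(data, fraction, rng)
        errors.append(nrmse(data, impute_row_mean(corrupted), missing))
    return statistics.fmean(errors)


# Synthetic stand-in for a log-scale expression matrix: 50 genes x 20 arrays,
# each gene with its own mean level. Real MAQC data would be loaded instead.
data_rng = random.Random(1)
gene_means = [data_rng.uniform(4, 12) for _ in range(50)]
data = [[data_rng.gauss(mu, 1.0) for _ in range(20)] for mu in gene_means]

for fraction in (0.05, 0.10):
    # The study averaged 1000 simulations; 20 keeps the example quick.
    print(fraction, simulate(data, fraction, n_sim=20))
```

Swapping `impute_row_mean` for another imputer with the same input and output shape is enough to compare methods under identical deletion patterns, provided the same seed is reused.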

Authors: Sreevidya Sadananda Sadasiva Rao, Lori A. Shepherd, Andrew E. Bruno, Song Liu, and Jeffrey C. Miecznikowski


