Title: A divide-and-conquer strategy to solve the out-of-memory problem of processing thousands of Affymetrix microarrays

Authors: Chia-Ju Lee, Dong Fu, Pan Du, Hongmei Jiang, Simon M. Lin, Warren Kibbe

Addresses: Computational Biology and Bioinformatics Program, Northwestern University, 2145 Sheridan Rd., Evanston, IL 60208, USA; Center for Biomedical Informatics, and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 676 North St. Clair, Suite 1200, Chicago, IL 60611, USA; Center for Biomedical Informatics, and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 676 North St. Clair, Suite 1200, Chicago, IL 60611, USA; Department of Statistics, Northwestern University, 2006 Sheridan Rd., Evanston, IL 60208, USA; Center for Biomedical Informatics, and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 676 North St. Clair, Suite 1200, Chicago, IL 60611, USA; Center for Biomedical Informatics, and Robert H. Lurie Comprehensive Cancer Center, Feinberg School of Medicine, Northwestern University, 676 North St. Clair, Suite 1200, Chicago, IL 60611, USA

Abstract: The out-of-memory problem is frequently encountered when thousands of CEL files are processed using Bioconductor. We propose a divide-and-conquer strategy combined with randomised resampling to solve this problem. The CAMDA 2007 META-analysis data set, which contains 5896 CEL files, was used to test the approach on a typical commodity computer cluster by running established pre-processing algorithms for Affymetrix arrays implemented in Bioconductor. The results were validated against a gold standard obtained using a supercomputer. In addition to the performance improvement, the general divide-and-conquer strategy can be applied to any other normalisation algorithm without modifying its underlying implementation.
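The abstract does not spell out how the partitions are formed or how the resampled results are combined, so what follows is only a minimal Python sketch under stated assumptions, not the authors' implementation (which ran Bioconductor routines in R). It assumes that the arrays are randomly partitioned into batches small enough to fit in memory, that an unmodified normalisation routine is applied to each batch as a black box, and that per-array results are averaged over several independent random partitions. The names `divide_and_conquer`, `quantile_normalise`, `batch_size` and `n_rounds` are hypothetical, and the toy quantile normalisation merely stands in for a library routine.

```python
# Hypothetical sketch: not the authors' implementation; all names are illustrative.
import math
import random
from statistics import mean
from typing import Callable, List

Array = List[float]  # one microarray: probe intensities, same length for every array


def quantile_normalise(arrays: List[Array]) -> List[Array]:
    """Toy quantile normalisation standing in for an unmodified library routine.

    Every array is mapped onto the mean of the sorted intensity profiles, so all
    arrays in the batch end up with the same value distribution. Ties are broken
    by position rather than averaged.
    """
    n_probes = len(arrays[0])
    sorted_arrays = [sorted(a) for a in arrays]
    reference = [mean(s[i] for s in sorted_arrays) for i in range(n_probes)]
    result = []
    for a in arrays:
        order = sorted(range(n_probes), key=lambda i: a[i])
        out = [0.0] * n_probes
        for rank, idx in enumerate(order):
            out[idx] = reference[rank]  # value at rank r becomes the r-th reference value
        result.append(out)
    return result


def divide_and_conquer(
    arrays: List[Array],
    normalise: Callable[[List[Array]], List[Array]],
    batch_size: int,
    n_rounds: int = 5,
    seed: int = 0,
) -> List[Array]:
    """Normalise arrays in memory-sized random batches and average over rounds.

    `normalise` is treated as a black box: it only ever sees one batch at a time.
    """
    rng = random.Random(seed)
    n = len(arrays)
    n_batches = math.ceil(n / batch_size)
    totals = [[0.0] * len(arrays[0]) for _ in range(n)]
    for _ in range(n_rounds):
        order = list(range(n))
        rng.shuffle(order)  # a fresh random partition each round
        for b in range(n_batches):
            # Dealing round-robin keeps batch sizes balanced and never above batch_size.
            batch = order[b::n_batches]
            for i, values in zip(batch, normalise([arrays[i] for i in batch])):
                totals[i] = [t + v for t, v in zip(totals[i], values)]
    # Averaging over rounds damps the dependence on any single random partition.
    return [[t / n_rounds for t in row] for row in totals]


if __name__ == "__main__":
    rng = random.Random(42)
    arrays = [[rng.uniform(0, 10) for _ in range(50)] for _ in range(12)]
    full = quantile_normalise(arrays)  # reference: all arrays normalised at once
    approx = divide_and_conquer(arrays, quantile_normalise, batch_size=4)
    worst = max(abs(x - y) for a, b in zip(full, approx) for x, y in zip(a, b))
    print(f"max absolute deviation from full normalisation: {worst:.3f}")
```

Because `normalise` is passed in as a parameter, any other batch-level routine could be swapped in without changing it, which mirrors the abstract's point about reusing normalisation algorithms unmodified. In this toy setting, normalising all arrays at once plays the role of the supercomputer gold standard.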

Keywords: divide-and-conquer; resampling; out-of-memory; microarrays; Affymetrix arrays; R/Bioconductor; supercomputers; computer clusters; bioinformatics.

DOI: 10.1504/IJCBDD.2008.022209

International Journal of Computational Biology and Drug Design, 2008 Vol.1 No.4, pp. 396-405

Published online: 22 Dec 2008
