Title: Spark based classification of microarray data using scalable artificial neural network

Authors: Mukesh Kumar; Ransingh B. Ray; Santanu K. Rath

Addresses: Department of Computer Science and Engineering, NIT Rourkela, Odisha, India; Department of Computer Science and Engineering, NIT Rourkela, Odisha, India; Department of Computer Science and Engineering, NIT Rourkela, Odisha, India

Abstract: Microarray data suffers from the curse of dimensionality: the number of features is huge compared with the number of samples. The data retrieved from microarrays varies in nature and changes over time. The vast amount of raw gene expression data often leads to computational and analytical challenges, including classification of the dataset into the correct groups or classes. In this paper, various feature selection techniques based on statistical tests are proposed using the Spark framework. After the relevant features are selected using these statistical tests, an Artificial Neural Network (ANN) based on the Spark framework (sf-ANN) is proposed, which runs on a scalable cluster with multiple nodes. The performance of sf-ANN is tested on microarray datasets of various dimensions. A detailed comparative analysis of execution time is presented for the sf-ANN classifier on the Spark framework and on a conventional system (where data is stored on a standalone machine), in order to examine its performance.

Keywords: artificial neural network; big data; feature selection; machine learning; microarray; Spark.

DOI: 10.1504/IJDMB.2017.091363

International Journal of Data Mining and Bioinformatics, 2017 Vol.19 No.4, pp.312 - 339

Accepted: 12 Feb 2018
Published online: 27 Apr 2018
