A new scalable approach for missing value imputation in high-throughput microarray data on Apache Spark
by Madhuri Gupta; Bharat Gupta
International Journal of Data Mining and Bioinformatics (IJDMB), Vol. 23, No. 1, 2020

Abstract: High-dimensional data are acquired using High-Throughput Technology (HTT). Data extracted using HTT contain a large number of missing values. Gene expression data are vital in healthcare research, so reconstructing missing values is an important yet challenging task. This work proposes a scalable technique, PC-ImNN, which combines Pearson correlation with Monte Carlo and a modified Nearest Neighbour method to predict missing values. Monte Carlo is a technique that uses repeated random sampling to make numerical estimates of unknown parameters. Pearson correlation is combined with Monte Carlo to maintain the distribution of the estimated data points. The Nearest Neighbour technique is then applied to find the nearest estimated data points. The proposed model is compared with five existing imputation techniques. The results show that the proposed technique performs better in terms of mean square error and imputation accuracy. Apache Spark is used to speed up the computation.
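
The abstract names three ingredients (Pearson correlation, Monte Carlo sampling and a Nearest Neighbour step) but does not specify how PC-ImNN combines them. The following is therefore only a minimal illustrative sketch of how such ingredients might fit together, not the authors' algorithm. The function names `pearson` and `impute`, the parameters `k`, `draws` and `seed`, and the choice to sample neighbours with probability proportional to |r| are all assumptions made for illustration. It runs on a single machine and does not use Apache Spark.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation over positions where both values are observed."""
    pairs = [(a, b) for a, b in zip(x, y) if a is not None and b is not None]
    if len(pairs) < 2:
        return 0.0
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    cov = sum((a - mx) * (b - my) for a, b in pairs)
    sx = math.sqrt(sum((a - mx) ** 2 for a, _ in pairs))
    sy = math.sqrt(sum((b - my) ** 2 for _, b in pairs))
    return cov / (sx * sy) if sx > 0 and sy > 0 else 0.0


def impute(matrix, row, col, k=2, draws=1000, seed=0):
    """Estimate matrix[row][col] from the k most correlated rows.

    Hypothetical scheme (an assumption, not taken from the paper):
    rank the other rows by |Pearson r| with the target row, keep the
    top k as nearest neighbours, then repeatedly sample one neighbour
    with probability proportional to |r| and average the sampled values.
    """
    rng = random.Random(seed)
    target = matrix[row]
    candidates = [
        (abs(pearson(target, other)), other[col])
        for i, other in enumerate(matrix)
        if i != row and other[col] is not None
    ]
    neighbours = sorted(candidates, reverse=True)[:k]
    weights = [w for w, _ in neighbours]
    values = [v for _, v in neighbours]
    if not neighbours or sum(weights) == 0:
        return None  # no usable neighbour for this entry
    samples = rng.choices(values, weights=weights, k=draws)
    return sum(samples) / draws


# Toy expression matrix: rows are genes, columns are samples, None is missing.
genes = [
    [1.0, 2.0, 3.0, None],
    [2.0, 4.0, 6.0, 8.0],
    [1.1, 2.1, 2.9, 4.2],
    [5.0, 1.0, 4.0, 0.5],
]
estimate = impute(genes, row=0, col=3)
```

In this toy matrix the two rows most correlated with the first gene hold 8.0 and 4.2 in the missing column, so the estimate falls between those two values. A distributed version would need to partition the correlation and sampling work across executors, which this sketch does not attempt.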

Online publication date: Fri, 28-Feb-2020
