Title: Health data analytics using scalable logistic regression with stochastic gradient descent

Authors: Gunasekaran Manogaran; Daphne Lopez

Addresses: School of Information Technology and Engineering, VIT University, Vellore, India; School of Information Technology and Engineering, VIT University, Vellore, India

Abstract: Because wearable medical sensors continuously generate enormous volumes of data, that data is difficult to process and analyse. This paper develops a scalable sensor data processing architecture in cloud computing to store and process body sensor data for healthcare applications. The proposed architecture uses big data technologies such as Apache Flume, Apache Pig and Apache HBase to collect and store large volumes of sensor data in Amazon Web Services (AWS). The Apache Mahout implementation of the MapReduce-based online stochastic gradient descent (SGD) algorithm is used to fit a logistic regression model, yielding a scalable diagnosis model. The Cleveland heart disease database (CHDD) is used to train the logistic regression model. Wearable body sensors measure the patient's blood pressure, blood sugar level and heart rate, which are then used to predict the patient's heart disease status. The proposed prediction model classifies heart disease status with an accuracy of 81.99% on the training sample and 81.52% on the validation sample.
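To make the core technique concrete, the sketch below fits a logistic regression by online stochastic gradient descent: after each individual example, the weights move against the gradient of the log-loss, w ← w − η(σ(wᵀx) − y)x. This is a minimal single-machine illustration in plain Python, not Apache Mahout's MapReduce implementation or the authors' code. It assumes already-standardised features, a fixed learning rate, and no regularisation or learning-rate decay. The function names (`train_sgd`, `predict_proba`) and the synthetic three-feature data are hypothetical and are not drawn from the CHDD.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^-z)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights: list[float], x: list[float]) -> float:
    """Estimated probability of the positive class; weights[0] is the bias."""
    z = weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))
    return sigmoid(z)


def train_sgd(samples, labels, learning_rate=0.1, epochs=50, seed=0):
    """Fit logistic regression by online SGD on the log-loss.

    Each example updates the weights immediately (one pass over shuffled
    data per epoch). Regularisation and learning-rate decay are omitted
    (an assumption of this sketch).
    """
    weights = [0.0] * (len(samples[0]) + 1)  # bias + one weight per feature
    order = list(range(len(samples)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)  # visit examples in a new random order each epoch
        for i in order:
            x, y = samples[i], labels[i]
            error = predict_proba(weights, x) - y  # d(log-loss)/dz
            weights[0] -= learning_rate * error
            for j, xi in enumerate(x):
                weights[j + 1] -= learning_rate * error * xi
    return weights


# Synthetic, already-standardised readings standing in for three sensor
# features (hypothetically: blood pressure, blood sugar, heart rate).
# These are NOT CHDD records; labels come from an arbitrary linear rule.
data_rng = random.Random(42)
X = [[data_rng.gauss(0.0, 1.0) for _ in range(3)] for _ in range(200)]
y = [1 if x[0] + 0.5 * x[1] - x[2] > 0 else 0 for x in X]

w = train_sgd(X, y)
train_acc = sum((predict_proba(w, x) >= 0.5) == bool(t) for x, t in zip(X, y)) / len(X)
print(f"weights={w}, training accuracy={train_acc:.2f}")
```

Because each update touches only one example, the same rule applies unchanged to records that arrive as a stream from body sensors. The paper relies on Mahout to scale training beyond a single machine; this sketch makes no attempt to reproduce that distribution strategy.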

Keywords: stochastic gradient descent; SGD; MapReduce logistic regression; wearable body sensor; sensor data; big data; heart disease; cloud computing; Cleveland heart disease database; CHDD; Amazon web service; AWS; scalable diagnosis model.

DOI: 10.1504/IJAIP.2018.089494

International Journal of Advanced Intelligence Paradigms, 2018 Vol.10 No.1/2, pp.118-132

Received: 13 Apr 2016
Accepted: 12 Jun 2016

Published online: 29 Jan 2018
