Title: Health data analytics using scalable logistic regression with stochastic gradient descent
Authors: Gunasekaran Manogaran; Daphne Lopez
Addresses: School of Information Technology and Engineering, VIT University, Vellore, India; School of Information Technology and Engineering, VIT University, Vellore, India
Abstract: Wearable medical sensors continuously generate enormous volumes of data that are difficult to process and analyse. This paper develops a scalable sensor data processing architecture in cloud computing to store and process body sensor data for healthcare applications. The proposed architecture uses big data technologies such as Apache Flume, Apache Pig and Apache HBase to collect and store large volumes of sensor data in Amazon Web Services. The Apache Mahout implementation of a MapReduce-based online stochastic gradient descent algorithm is used to fit the logistic regression and so build a scalable diagnosis model. The Cleveland heart disease database (CHDD) is used to train the logistic regression model. Wearable body sensors measure the patient's blood pressure, blood sugar level and heart rate, which are used to predict heart disease status. The proposed prediction model efficiently classifies heart disease, with an accuracy of 81.99% on the training sample and 81.52% on the validation sample.
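The core technique named in the abstract, logistic regression fitted by online stochastic gradient descent, can be illustrated with a minimal single-machine sketch. This is not the Apache Mahout or MapReduce implementation used in the paper. The function names, learning rate, epoch count and the tiny two-feature toy dataset are all assumptions made for illustration, and the data are not taken from the CHDD.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_proba(weights, bias, x):
    # Estimated probability that sample x belongs to the positive class.
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def sgd_train(samples, labels, learning_rate=0.1, epochs=50, seed=0):
    # Online SGD: parameters are updated after every single sample,
    # using the gradient of the log loss for that sample alone.
    rng = random.Random(seed)
    n_features = len(samples[0])
    weights = [0.0] * n_features
    bias = 0.0
    order = list(range(len(samples)))
    for _ in range(epochs):
        rng.shuffle(order)  # visit samples in a fresh random order each epoch
        for i in order:
            error = predict_proba(weights, bias, samples[i]) - labels[i]
            for j in range(n_features):
                weights[j] -= learning_rate * error * samples[i][j]
            bias -= learning_rate * error
    return weights, bias


def accuracy(weights, bias, samples, labels):
    # Fraction of samples whose thresholded prediction matches the label.
    correct = sum(
        (predict_proba(weights, bias, x) >= 0.5) == bool(y)
        for x, y in zip(samples, labels)
    )
    return correct / len(samples)


# Hypothetical, already-standardised toy features (NOT CHDD data);
# label 1 stands for "disease present", 0 for "disease absent".
toy_samples = [[-2.0, -1.0], [-1.5, -0.5], [-1.0, -1.2],
               [1.0, 0.8], [1.5, 1.2], [2.0, 0.5]]
toy_labels = [0, 0, 0, 1, 1, 1]

weights, bias = sgd_train(toy_samples, toy_labels)
```

Because each update touches only one sample, the same rule can be applied to records as they arrive, which is the property that makes the online variant attractive for continuously generated sensor data. In practice the inputs (for example blood pressure, blood sugar level and heart rate) would likely need to be scaled before training, as assumed for the toy features here.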
Keywords: stochastic gradient descent; SGD; MapReduce logistic regression; wearable body sensor; sensor data; big data; heart disease; cloud computing; Cleveland heart disease database; CHDD; Amazon web service; AWS; scalable diagnosis model
International Journal of Advanced Intelligence Paradigms, 2018 Vol.10 No.1/2, pp.118 - 132
Received: 13 Apr 2016
Accepted: 12 Jun 2016
Published online: 29 Jan 2018