Title: Large-scale spectral clustering for managing big data in healthcare operations

Authors: Maoqing Liu; Nasser Fard; Keivan Sadeghzadeh

Addresses: Samsung NeuroLogica Corporation, Danvers, MA 01923, USA; Department of Mechanical and Industrial Engineering, Northeastern University, Boston, MA 02115, USA; Sloan School of Management, MIT, Cambridge, MA 02142, USA

Abstract: Healthcare industries have access to a large volume and variety of data about patients' behaviours, diseases and treatments. There is a significant need for a data-driven system that discovers patterns, so that the impact of human risk behaviours on numerous diseases can be better understood. To extract interesting knowledge and patterns from large amounts of data, a data mining process for discovering knowledge from unprocessed, raw healthcare data is studied. Methods for big data analysis, and the role and types of clustering methods, are presented. An in-depth analysis is given of the spectral clustering method, which is argued to be a superior clustering algorithm for big healthcare data. The spectral clustering algorithm is applied to a large healthcare dataset from the Behavioral Risk Factor Surveillance System (BRFSS), partitioning the unlabelled data into at least four clusters. The MATLAB® R2011b programming environment is used as the calculation tool in the experimental design and analysis. The implementation process, the data processing, and the experimental results and their analysis are discussed. Sensitivity analyses of both parameters of the spectral clustering algorithm are performed to determine their influence on the clustering results.
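The abstract does not spell out the algorithm or name its two parameters. The following is a minimal Python sketch of one standard variant, normalised spectral clustering in the style of Ng, Jordan and Weiss, and not the authors' MATLAB implementation. It assumes the two parameters are the Gaussian kernel width `sigma` and the number of clusters `k`, which is an assumption this page does not confirm. The data are synthetic blobs, not BRFSS records, and all function names are hypothetical.

```python
import numpy as np

def gaussian_affinity(X, sigma):
    """Affinity A_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)), with A_ii = 0."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    A = np.exp(-d2 / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    return A

def kmeans(Y, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means with farthest-point initialisation."""
    rng = np.random.default_rng(seed)
    centers = [Y[rng.integers(len(Y))]]
    for _ in range(1, k):
        # Pick the point farthest from all centres chosen so far.
        dist = np.min([np.sum((Y - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(n_iter):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

def spectral_clustering(X, k, sigma):
    """Hypothetical helper: cluster rows of X into k groups (assumed parameters k, sigma)."""
    A = gaussian_affinity(X, sigma)
    d = np.maximum(A.sum(axis=1), 1e-12)          # degrees, guarded against isolated points
    d_inv_sqrt = 1.0 / np.sqrt(d)
    M = d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]  # D^{-1/2} A D^{-1/2}
    _, eigvecs = np.linalg.eigh(M)                # eigenvalues in ascending order
    U = eigvecs[:, -k:]                           # eigenvectors of the k largest eigenvalues
    norms = np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)
    return kmeans(U / norms, k)                   # k-means on row-normalised embedding

if __name__ == "__main__":
    # Synthetic stand-in data: four Gaussian blobs of 50 points each.
    rng = np.random.default_rng(42)
    centres = np.array([[0, 0], [10, 0], [0, 10], [10, 10]], dtype=float)
    X = np.vstack([c + rng.normal(scale=0.5, size=(50, 2)) for c in centres])
    # Simple sensitivity sweep over both assumed parameters.
    for k in (3, 4, 5):
        for sigma in (0.5, 1.0, 2.0):
            labels = spectral_clustering(X, k=k, sigma=sigma)
            print(f"k={k} sigma={sigma} cluster sizes={np.bincount(labels, minlength=k).tolist()}")
```

Sweeping both parameters and comparing the resulting partitions is one simple way to run a sensitivity analysis. A very small `sigma` fragments the similarity graph, while a very large one blurs cluster boundaries.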

Keywords: big data; healthcare; spectral clustering; visualisation.

DOI: 10.1504/IJBDI.2017.10006120

International Journal of Big Data Intelligence, 2017 Vol.4 No.3, pp. 195-207

Received: 24 Nov 2015
Accepted: 17 Sep 2016

Published online: 30 Jul 2017
