Title: A comparison of cluster algorithms as applied to unsupervised surveys

Authors: Kathleen Campbell Garwood; Arpit Arun Dhobale

Addresses: Saint Joseph's University, 5600 City Ave, Philadelphia, PA 19131, USA; Indian Institute of Technology, Near Doul Gobinda Road, Amingaon, North Guwahati, Guwahati, Assam – 781039, India

Abstract: When important questions are to be answered with data, unsupervised data offers extensive opportunities for insight along with unique challenges. This study considers student survey data with the specific goal of clustering students into like groups, with the underlying aim of identifying different poverty levels. Fuzzy logic is applied during the data cleaning and organising phase to help create a logical dependent variable for comparing the analyses. The survey was reduced and cleaned using multiple data reduction techniques. Finally, multiple clustering techniques (k-means, k-modes and hierarchical clustering) are applied and compared. Though each method has strengths, the goal was to identify which was most viable when applied to survey data, specifically when trying to identify the most impoverished students.

Keywords: fuzzy logic; cluster analysis; unsupervised learning; survey analysis; decision support system; k-means; k-modes; hierarchical clustering.

DOI: 10.1504/IJBIDM.2021.114471

International Journal of Business Intelligence and Data Mining, 2021 Vol.18 No.3, pp.332 - 363

Received: 28 Mar 2018
Accepted: 28 Jun 2018

Published online: 26 Feb 2021
