Title: Sequential dimension reduction and clustering of mixed-type data

Authors: Angelos Markos; Odysseas Moschidis; Theodore Chadjipantelis

Addresses: Department of Primary Education, Democritus University of Thrace, Alexandroupolis, Greece; Department of Business Administration, University of Macedonia, Thessaloniki, Greece; School of Political Sciences, Aristotle University of Thessaloniki, Thessaloniki, Greece

Abstract: Clustering a set of objects described by a mixture of continuous and categorical variables can be a challenging task. In the context of data reduction, an effective class of methods combines dimension reduction with clustering in the reduced space. In this paper, we review three approaches for sequential dimension reduction and clustering of mixed-type data. The first step of each approach involves the application of principal component analysis to a suitably transformed matrix. In the second step, a partitioning or hierarchical clustering algorithm is applied to the object scores in the reduced space. The common theoretical underpinnings of the three approaches are highlighted. The results of a benchmarking study show that sequential dimension reduction and clustering is an effective strategy, especially when the categorical variables are more informative than the continuous ones with regard to the underlying cluster structure. Strengths and limitations are also demonstrated on a real mixed-type dataset.
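
The two-step idea described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the transformations used by the three reviewed approaches, so the coding below is an assumption: continuous columns are z-scored, and categorical columns are one-hot coded, scaled by the inverse square root of the category proportion, and centered (a coding in the spirit of factor analysis of mixed data). The function names `transform_mixed`, `pca_scores`, and `kmeans`, the toy data, and the choice of two components and two clusters are all hypothetical and for illustration only.

```python
import numpy as np

def transform_mixed(X_num, X_cat):
    """Assumed coding: z-score continuous columns; one-hot code categorical
    columns, divide each indicator by sqrt(category proportion), then center."""
    blocks = [(X_num - X_num.mean(axis=0)) / X_num.std(axis=0)]
    for j in range(X_cat.shape[1]):
        levels = np.unique(X_cat[:, j])
        G = (X_cat[:, [j]] == levels).astype(float)  # n x n_levels indicators
        G = G / np.sqrt(G.mean(axis=0))              # down-weight frequent levels
        blocks.append(G - G.mean(axis=0))
    return np.hstack(blocks)

def pca_scores(M, n_components):
    """Step 1: object scores on the leading principal components (via SVD)."""
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    return U[:, :n_components] * s[:n_components]

def kmeans(Y, k, n_iter=100, seed=0):
    """Step 2: plain Lloyd's k-means on the reduced-space scores."""
    rng = np.random.default_rng(seed)
    centers = Y[rng.choice(len(Y), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Y[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Toy mixed-type data (made up): two groups that differ on both variable types.
rng = np.random.default_rng(1)
groups = np.repeat([0, 1], 30)
X_num = rng.normal(loc=groups[:, None] * 3.0, scale=1.0, size=(60, 2))
X_cat = np.where(groups[:, None] == 0,
                 rng.choice(["a", "b"], size=(60, 1)),
                 rng.choice(["c", "d"], size=(60, 1)))

M = transform_mixed(X_num, X_cat)        # suitably transformed matrix
scores = pca_scores(M, n_components=2)   # dimension reduction
labels = kmeans(scores, k=2)             # clustering in the reduced space
```

Because the two steps are run one after the other, the retained components are chosen without regard to the cluster structure; a hierarchical algorithm could replace `kmeans` in the second step without changing the first.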

Keywords: cluster analysis; dimension reduction; correspondence analysis; principal component analysis; PCA; mixed-type data.

DOI: 10.1504/IJDATS.2020.108043

International Journal of Data Analysis Techniques and Strategies, 2020, Vol. 12, No. 3, pp. 228-246

Received: 01 Mar 2018
Accepted: 04 Sep 2018

Published online: 02 Jul 2020
