Title: Goodman-Kruskal measure associated clustering for categorical data

Authors: Wenxue Huang; Yuanyi Pan; Jianhong Wu

Addresses: Department of Mathematics, Shantou University, Shantou, Guangdong 515063, China; Department of Mathematics and Statistics, York University, Toronto, Ontario, M3J 1P3, Canada; Department of Mathematics and Statistics, York University, Toronto, Ontario, M3J 1P3, Canada

Abstract: Motivated by the business interest in return on investment (ROI) in marketing, we develop a conceptual clustering algorithm for categorical data with a response variable, based on a variation of the Goodman-Kruskal measure. The key to this algorithm is an implicitly cost-effective dissimilarity measure derived from a probabilistic association rule between the response and the explanatory scenarios. Applications to a real dataset, FAMEX96, illustrate how useful information can be mined from marketing data using this dissimilarity measure.

Keywords: categorical data; supervised clustering; dissimilarity measures; decisive rules; Goodman-Kruskal measure; return on investment; ROI; scenario association; target variable; clustering algorithms; marketing data; data mining.

DOI: 10.1504/IJDMMM.2012.049880

International Journal of Data Mining, Modelling and Management, 2012 Vol.4 No.4, pp.334-360

Published online: 23 Aug 2014
