Title: Automatic generation of classes-interpretation as a bridge between clustering and decision-making
Authors: Karina Gibert
Addresses: Department of Statistics and Operations Research, Knowledge Engineering and Machine Learning Group, Universitat Politècnica de Catalunya-BarcelonaTech, Campus Nord, Edif. C5, C/Jordi Girona 1-3, 08034 Barcelona, Spain
Abstract: Understanding the meaning of the classes resulting from a clustering method is critical to guaranteeing both the validity of the clustering results and their usefulness. The methodology of conceptual characterisation by embedded conditioning (CCEC) is a proposal for building conceptual interpretations of hierarchical clusterings that helps to narrow the gap between the clustering itself and subsequent decision-making processes. The methodology combines statistical tools (such as the boxplot multiple, introduced by Tukey) with machine learning methods to learn the structure of the data and to find the characterising variables of the classes (previously introduced by Gibert) when they exist, while providing alternatives when they do not. In this paper, the pillars of the methodology are presented, together with different criteria for knowledge integration. The usefulness of CCEC for building domain theories as models that support later decision-making is addressed. The proposal is applied to build the interpretation of a set of classes extracted from a wastewater treatment plant (WWTP), and the results obtained with the different criteria are compared and discussed.
Keywords: hierarchical clustering; post-processing; class interpretation; knowledge discovery; data mining; decision making; intelligent DSS; IDSS; decision support systems; wastewater treatment plants; WWTPs; boxplot multiple; machine learning; knowledge integration.
International Journal of Multicriteria Decision Making, 2014, Vol. 4, No. 2, pp. 154-182
Received: 01 Dec 2012
Accepted: 08 Apr 2013
Published online: 14 Apr 2014