Authors: Qiusha Zhu; Lin Lin; Mei-Ling Shyu
Addresses: Department of Electrical and Computer Engineering, University of Miami, Coral Gables, FL 33124, USA (all three authors)
Abstract: This paper proposes a novel supervised discretisation algorithm based on Correlation Maximisation (CM) using Multiple Correspondence Analysis (MCA). MCA is an effective technique for capturing the correlation between multiple variables. For each numeric feature, the proposed algorithm uses MCA to measure the correlations between feature intervals (items) and classes, and the set of cut-points yielding the maximum correlation is chosen as the discretisation scheme for that feature. The discretised feature therefore not only gives a concise summarisation of the original numeric feature but also carries the maximum correlation information for predicting class labels. The proposed algorithm is compared with seven state-of-the-art supervised discretisation algorithms using six well-known classifiers on 19 UCI data sets. Experimental results demonstrate that it can automatically generate a set of features (feature intervals) that produce the best classification results on average.
Keywords: discretisation; supervised classification; MCA; multiple correspondence analysis; correlation maximisation; feature intervals.
International Journal of Business Intelligence and Data Mining, 2012 Vol.7 No.1/2, pp.40-59
Available online: 23 Aug 2012
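
The cut-point selection idea summarised in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the abstract does not specify the MCA-based correlation measure, so the sketch substitutes the total inertia (chi-square divided by sample size) of the interval-by-class contingency table, which is the quantity that simple correspondence analysis decomposes. It also searches for a single cut-point only. The function names `inertia` and `best_cut_point` are hypothetical.

```python
from collections import Counter


def inertia(values, labels, cut):
    """Total inertia (chi-square / n) of the 2 x K interval-by-class table.

    Assumption: used here as a stand-in for the paper's MCA-based
    correlation measure, which the abstract does not define.
    """
    n = len(values)
    # Joint counts of (interval, class); interval is True when value > cut.
    cells = Counter((v > cut, y) for v, y in zip(values, labels))
    rows = Counter(v > cut for v in values)
    cols = Counter(labels)
    chi2 = 0.0
    for r, r_count in rows.items():
        for c, c_count in cols.items():
            expected = r_count * c_count / n
            chi2 += (cells[(r, c)] - expected) ** 2 / expected
    return chi2 / n


def best_cut_point(values, labels):
    """Return the midpoint between adjacent distinct values that maximises
    the stand-in association score, or None if the feature is constant."""
    distinct = sorted(set(values))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        return None
    return max(candidates, key=lambda cut: inertia(values, labels, cut))


# Usage: a toy feature whose two classes separate cleanly.
feature = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
classes = ["a", "a", "a", "b", "b", "b"]
cut = best_cut_point(feature, classes)
```

Because this score never decreases as intervals are refined, extending the sketch to several cut-points would need a stopping rule or a different criterion, which the abstract does not describe.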