Authors: Bikash Kanti Sarkar, Shib Sankar Sana, Kripasindhu Chaudhuri
Addresses: Department of Information Technology, Birla Institute of Technology, Deemed University, Mesra, Ranchi, India; Department of Mathematics, Bhangar Mahavidyalaya, University of Calcutta, Bhangar-743502, 24PGS(South), West Bengal, India; Department of Mathematics, Jadavpur University, Kolkata-32, India
Abstract: Data discretisation is an important step in machine learning, since it is easier for classifiers to deal with discrete attributes than with continuous ones. Over the years, several discretisation methods, such as Boolean reasoning, equal frequency binning, and entropy-based discretisation, have been proposed, explored, and implemented. In this article, a simple supervised discretisation approach called minimum information loss (MIL) is introduced. The prime goal of MIL is to maximise the classification accuracy of a classifier while minimising the loss of information during the discretisation of continuous attributes. The performance of the suggested approach is compared with that of two supervised discretisation algorithms, selective pseudo iterative deletion 4.7 (SPID4.7) and minimum description length principle (MDLP), using four state-of-the-art rule inductive algorithms: neural network, C4.5, Naive-Bayes, and CN2. The empirical results show that the presented approach performs better than the other two algorithms in several cases.
Keywords: data mining; data discretisation; classifiers; accuracy; information loss; machine learning.
International Journal of Data Mining, Modelling and Management, 2011 Vol.3 No.3, pp.303 - 318
Published online: 06 Aug 2011