Authors: Swathi Jamjala Narayanan; Ilango Paramasivam; Rajen B. Bhatt
Addresses: School of Computing Science and Engineering, VIT University, Vellore – 632014, India; School of Computing Science and Engineering, VIT University, Vellore – 632014, India; Robert Bosch Research and Technology Center, Pittsburgh, PA 15203, USA
Abstract: Fuzzy decision tree (FDT) induction is a powerful methodology for extracting human-interpretable fuzzy classification rules. To the best of our knowledge, there is no recent comparative study of fuzzy cluster validity indices aimed at estimating the optimal number of clusters for each continuous attribute during FDT induction. In this paper, we study the performance of FDTs built with an optimal number of partitions at each node of the tree. Obtaining the optimal number of fuzzy clusters captures the intrinsic structure of the attribute values when the fuzzy partitions are formed, which in turn improves the classification accuracy of the FDT. Extensive computational experiments are conducted on FDTs developed using Fuzzy ID3 and eight fuzzy cluster validity indices over 30 publicly available pattern classification datasets. Non-parametric statistical tests are conducted to test the null hypothesis.
Keywords: FDT; fuzzy decision tree; fuzzy ID3; fuzzy c-means; cluster analysis; cluster validity; non-parametric statistical test; optimal clusters; data science.
International Journal of Data Science, 2017, Vol. 2, No. 3, pp. 221-245
Received: 09 Jul 2014
Accepted: 16 Nov 2014
Published online: 27 Aug 2017
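
Illustrative sketch: the abstract describes choosing the number of fuzzy clusters for each continuous attribute by means of a cluster validity index. The Python sketch below shows one way that idea could look for a single attribute. It is not the authors' implementation. It assumes plain fuzzy c-means with fuzzifier m = 2 and uses Bezdek's partition coefficient as the validity index, chosen only because it is simple and well known; the abstract does not name the eight indices studied. All function names and the toy data are hypothetical.

```python
# Minimal sketch (assumptions: 1-D fuzzy c-means, fuzzifier m = 2,
# partition coefficient as the validity index; names are hypothetical).

def memberships_1d(values, centers, m=2.0):
    """Fuzzy c-means membership update for one continuous attribute."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in values:
        dists = [abs(x - v) for v in centers]
        zeros = [k for k, d in enumerate(dists) if d == 0.0]
        if zeros:
            # The point coincides with one or more centers: share full
            # membership among those centers to avoid dividing by zero.
            row = [1.0 / len(zeros) if k in zeros else 0.0
                   for k in range(len(centers))]
        else:
            row = [1.0 / sum((dk / dj) ** exponent for dj in dists)
                   for dk in dists]
        rows.append(row)
    return rows


def fuzzy_c_means_1d(values, c, m=2.0, max_iter=200, tol=1e-9):
    """Run fuzzy c-means with c clusters; return (centers, memberships)."""
    lo, hi = min(values), max(values)
    # Deterministic start: centers spread evenly over the attribute range.
    centers = [lo + (hi - lo) * (k + 0.5) / c for k in range(c)]
    for _ in range(max_iter):
        u = memberships_1d(values, centers, m)
        new_centers = []
        for k in range(c):
            weights = [row[k] ** m for row in u]
            total = sum(weights)
            if total > 0.0:
                new_centers.append(
                    sum(w * x for w, x in zip(weights, values)) / total)
            else:
                new_centers.append(centers[k])  # keep an unused center in place
        shift = max(abs(a - b) for a, b in zip(new_centers, centers))
        centers = new_centers
        if shift < tol:
            break
    return centers, memberships_1d(values, centers, m)


def partition_coefficient(u):
    """Mean of squared memberships; values near 1 mean a crisper partition."""
    return sum(sum(x * x for x in row) for row in u) / len(u)


def best_cluster_count(values, candidates):
    """Score each candidate cluster count and return (best, scores)."""
    scores = {}
    for c in candidates:
        _, u = fuzzy_c_means_1d(values, c)
        scores[c] = partition_coefficient(u)
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    # Toy attribute with three tight groups of values (made-up data).
    attribute = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2, 10.0, 10.1, 10.2]
    best, scores = best_cluster_count(attribute, range(2, 6))
    print("selected number of clusters:", best)
    for c, score in sorted(scores.items()):
        print(c, round(score, 4))
```

In an FDT setting, a routine like `best_cluster_count` would be called once per continuous attribute at a node, and the resulting memberships would define that attribute's fuzzy partitions. The partition coefficient tends to favor small cluster counts, which is one reason a comparison across several indices, as the abstract describes, is useful.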