Authors: Balaji K. Bodkhe; Sanjay Sood
Addresses: IKG Punjab Technical University, Jalandhar, Punjab, India; Modern Education Society's College of Engineering, Pune, India; CDAC, Mohali, India
Abstract: Learning approaches are used to group data into clusters, and semi-supervised learning is widely used to classify both labelled and unlabelled data. A dataset may have mixed features, comprising both numeric and categorical data, and the two types differ in their characteristics. Because of these differences, mixed data are better grouped with an ensemble clustering method that uses a split-and-merge approach. In this work, the original mixed dataset is split into a numeric dataset and a categorical dataset, and each is clustered using both traditional clustering algorithms and fuzzy clustering algorithms, with a random subspace approach called fuzzy random forest (FRF). The resulting clusters are combined using ensemble clustering methods and evaluated by both F-measure and entropy. Splitting the dataset is found to be beneficial, and fuzzy clustering algorithms give better results than traditional clustering algorithms. The system was tested in a Hadoop multi-node cluster environment as well as in a traditional environment. A hybrid genetic algorithm is used for optimisation.
Keywords: genetic algorithm; fuzzy random forest; FRF; fuzzy clustering algorithms.
International Journal of Intelligent Enterprise, 2021 Vol.8 No.4, pp.397 - 406
Received: 25 Aug 2018
Accepted: 30 Aug 2019
Published online: 06 Oct 2021
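
The split-and-merge workflow and the two evaluation measures named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, the merge step is a simple partition intersection standing in for whichever ensemble method the paper uses, and the entropy and F-measure follow the common class-based definitions for clustering, which the abstract does not spell out. The clustering algorithms themselves (traditional, fuzzy, FRF) and the genetic optimisation are not shown.

```python
from collections import Counter
from math import log2


def split_features(rows):
    """Split mixed records into a numeric view and a categorical view.

    Assumption: numeric values are int/float and categorical values are str.
    """
    numeric = [[v for v in r if isinstance(v, (int, float))] for r in rows]
    categorical = [[v for v in r if isinstance(v, str)] for r in rows]
    return numeric, categorical


def merge_partitions(labels_a, labels_b):
    """Merge two partitions by intersection (illustrative stand-in only).

    Two objects share a merged cluster only if both input partitions
    place them together.
    """
    ids = {}
    return [ids.setdefault(pair, len(ids)) for pair in zip(labels_a, labels_b)]


def clustering_entropy(clusters, classes):
    """Size-weighted entropy of true classes within each cluster (lower is better)."""
    n = len(clusters)
    total = 0.0
    for c in set(clusters):
        members = [t for k, t in zip(clusters, classes) if k == c]
        counts = Counter(members)
        h = -sum((m / len(members)) * log2(m / len(members)) for m in counts.values())
        total += len(members) / n * h
    return total


def clustering_f_measure(clusters, classes):
    """Class-weighted best F1 over clusters (higher is better)."""
    n = len(clusters)
    total = 0.0
    for cls in set(classes):
        n_cls = classes.count(cls)
        best = 0.0
        for c in set(clusters):
            n_c = clusters.count(c)
            overlap = sum(1 for k, t in zip(clusters, classes) if k == c and t == cls)
            if overlap:
                p, r = overlap / n_c, overlap / n_cls
                best = max(best, 2 * p * r / (p + r))
        total += n_cls / n * best
    return total
```

In use, each view returned by `split_features` would be clustered separately, the two label lists passed to `merge_partitions`, and the merged labels scored against known classes with `clustering_entropy` and `clustering_f_measure`.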