Title: Prediction of disease using fuzzy random forest

Authors: Balaji K. Bodkhe; Sanjay Sood

Addresses: IKG Punjab Technical University, Jalandhar, Punjab, India; Modern Education Society's College of Engineering, Pune, India; CDAC, Mohali, India

Abstract: Learning approaches are used to group data into clusters. Semi-supervised learning in particular is widely used to classify both labelled and unlabelled data. A dataset may contain mixed features, that is, both numeric and categorical data, and these two types differ in their characteristics. Because of these differences, mixed data are better grouped with an ensemble clustering method that uses a split-and-merge approach. In this work, the original mixed dataset is split into a numeric dataset and a categorical dataset, and each is clustered using both traditional clustering algorithms and fuzzy clustering algorithms with a random subspace approach, called fuzzy random forest (FRF). The resulting clusters are combined using ensemble clustering methods and evaluated by both F-measure and entropy. Splitting the data is found to be more beneficial, and fuzzy clustering algorithms provide better results than traditional clustering algorithms. The system was tested on a Hadoop multi-node cluster environment as well as a traditional environment. A hybrid genetic algorithm is used for optimisation.
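
The abstract names F-measure and entropy as the evaluation measures but does not give their formulas. The sketch below assumes the definitions commonly used for clustering evaluation: size-weighted entropy of the true class labels within each cluster, and a class-weighted best-match F-measure. The function names `cluster_entropy` and `cluster_f_measure` are illustrative and are not taken from the paper.

```python
from collections import Counter
from math import log2


def cluster_entropy(labels, clusters):
    """Size-weighted average entropy of true labels within each cluster.

    Assumed definition (not taken from the paper); lower is better,
    and 0 means every cluster contains a single class.
    """
    n = len(labels)
    members = {}
    for label, cluster in zip(labels, clusters):
        members.setdefault(cluster, []).append(label)
    total = 0.0
    for cluster_labels in members.values():
        size = len(cluster_labels)
        counts = Counter(cluster_labels).values()
        h = -sum((k / size) * log2(k / size) for k in counts)
        total += (size / n) * h
    return total


def cluster_f_measure(labels, clusters):
    """Class-weighted F-measure: best F1 over clusters for each class.

    Assumed definition (not taken from the paper); higher is better,
    and 1 means every class maps exactly onto one cluster.
    """
    n = len(labels)
    class_sizes = Counter(labels)
    cluster_sizes = Counter(clusters)
    joint = Counter(zip(labels, clusters))
    total = 0.0
    for cls, n_cls in class_sizes.items():
        best = 0.0
        for clu, n_clu in cluster_sizes.items():
            n_ij = joint[(cls, clu)]
            if n_ij == 0:
                continue  # no overlap, F1 is zero for this pair
            precision = n_ij / n_clu
            recall = n_ij / n_cls
            best = max(best, 2 * precision * recall / (precision + recall))
        total += (n_cls / n) * best
    return total
```

Under these assumed definitions, a clustering that matches the classes exactly has entropy 0 and F-measure 1, while putting every point in one cluster raises the entropy and lowers the F-measure.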

Keywords: genetic algorithm; fuzzy random forest; FRF; fuzzy clustering algorithms.

DOI: 10.1504/IJIE.2021.117986

International Journal of Intelligent Enterprise, 2021 Vol.8 No.4, pp.397 - 406

Received: 25 Aug 2018
Accepted: 30 Aug 2019

Published online: 28 Jul 2021
