Title: Rule grouping and multiple minimum support thresholds for semantic multi-label associative classifier using feature reoccurrences
Authors: Preeti A. Bailke; S.T. Patil
Addresses: Vishwakarma Institute of Technology, Pune, India; Vishwakarma Institute of Technology, Pune, India
Abstract: Multi-label classification is an important task in data mining. Supervised classification has been studied extensively and has applications in many domains. Associative classifiers are among the better-performing classifiers, but they still have issues that need to be addressed. This paper handles the class imbalance problem, semantically organises the large number of generated rules, and applies only the relevant rules during classification. An algorithm called semantic multi-label associative classifier using feature reoccurrences (SeMACR) is proposed. Considering the reoccurrence of features while generating rules proves beneficial, particularly for text documents. The class imbalance problem is handled through balanced training and the use of multiple minimum support thresholds based on the class distribution. A novel semantic approach is proposed for grouping association rules using a relatedness score between features rather than the traditional distance-based measure. Organising the rules in this way makes them manageable and interpretable. During classification, only the relevant rules, i.e., those in the semantically most related group, are applied. The SeMACR algorithm shows improved or comparable performance relative to state-of-the-art techniques.
Keywords: multi-label classification; semantic rule grouping; reoccurrence of features; association rules; multiple minimum support thresholds; balanced training.
International Journal of Data Mining, Modelling and Management, 2017 Vol.9 No.2, pp.163 - 183
Received: 12 Feb 2016
Accepted: 06 Oct 2016
Published online: 28 Jul 2017