Title: Distributed algorithms for improved associative multilabel document classification considering reoccurrence of features and handling minority classes

Authors: Preeti A. Bailke; S.T. Patil

Addresses: Vishwakarma Institute of Technology, 666, Upper Indiranagar, Bibvewadi, Pune, Maharashtra, India (both authors)

Abstract: Existing work on distributed data mining mainly focuses on achieving speedup and scaleup rather than on improving the performance measures of the classifier. Gains in speedup and scaleup are expected whenever a distributed computing platform is used, but that computing power should also be used to improve the classifier's performance measures. This paper does so by considering the reoccurrence of features and by handling minority classes. Because running such complex algorithms sequentially on large datasets is very time consuming, distributed versions of the algorithms are designed and tested on a Hadoop cluster. The base associative classifier is built on the multi-class, multi-label associative classification (MMAC) algorithm. Since no similar distributed algorithms exist, the proposed algorithms are compared with the base classifier, and they show improved classifier performance measures.
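The abstract does not spell out the algorithms. As a rough illustration only, the Python sketch below shows one MapReduce-shaped way to mine single-feature class association rules in the spirit of MMAC's first pass. It rests on two loudly labelled assumptions: (1) "reoccurrence of features" is modelled by weighting each feature by its within-document count when computing confidence, and (2) minority classes are handled by measuring support relative to each label's own document count rather than the whole corpus. The names `map_doc`, `reduce_sum`, `mine_rules`, the thresholds and the toy corpus are all hypothetical. This is not the authors' method, and it omits MMAC's multi-feature ruleitems, rule ranking and iterative multi-label steps.

```python
from collections import Counter, defaultdict
from itertools import chain

# Hypothetical toy corpus: (tokens, labels). Repeated tokens are kept on purpose.
DOCS = [
    (["goal", "match", "goal", "team"], {"sport"}),
    (["match", "team", "coach"], {"sport"}),
    (["vote", "party", "vote", "match", "match", "match"], {"politics"}),
    (["vaccine", "trial", "vaccine"], {"health"}),
]

def map_doc(tokens, labels, use_reoccurrence=True):
    """Map step: emit (key, value) records for one labelled document."""
    counts = Counter(tokens)
    for feature, n in counts.items():
        # Assumption: a reoccurring feature contributes its count, not just 1.
        weight = n if use_reoccurrence else 1
        yield ("feat", feature), weight
        for label in labels:
            yield ("pair", feature, label), weight
            yield ("pair_docs", feature, label), 1  # document-level count
    for label in labels:
        yield ("class", label), 1

def reduce_sum(records):
    """Reduce step: sum the values per key, as a Hadoop reducer would."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return totals

def mine_rules(docs, min_conf=0.55, min_class_supp=0.5, use_reoccurrence=True):
    """Return single-feature rules (feature, label, confidence), best first."""
    totals = reduce_sum(chain.from_iterable(
        map_doc(tokens, labels, use_reoccurrence) for tokens, labels in docs))
    rules = []
    for key, weight in totals.items():
        if key[0] != "pair":
            continue
        _, feature, label = key
        # Assumption for minority classes: support is the fraction of the
        # label's own documents containing the feature, not of all documents.
        class_supp = totals[("pair_docs", feature, label)] / totals[("class", label)]
        confidence = weight / totals[("feat", feature)]
        if class_supp >= min_class_supp and confidence >= min_conf:
            rules.append((feature, label, round(confidence, 3)))
    return sorted(rules, key=lambda r: (-r[2], r[0], r[1]))

for rule in mine_rules(DOCS):
    print(rule)
```

On this toy data, weighting by reoccurrence moves the rule for `match` from `sport` (confidence 0.667 without weighting) to `politics` (0.6 with weighting). The single-document `health` label still yields rules because its support is not diluted by the larger classes. On Hadoop, `map_doc` would play the role of the mapper and `reduce_sum` that of a combiner or reducer, or of a Pig Latin `GROUP` followed by `FOREACH ... SUM`.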

Keywords: multilabel associative classifier; Hadoop; Pig Latin; feature reoccurrence; minority class; distributed algorithm.

DOI: 10.1504/IJBIDM.2019.098843

International Journal of Business Intelligence and Data Mining, 2019 Vol.14 No.3, pp.299 - 321

Received: 20 Jan 2017
Accepted: 22 Mar 2017

Published online: 04 Apr 2019
