Authors: Shruthi Hiremath; Pallavi Chandra; Anne Mary Joy; B.K. Tripathy
Addresses: Hariganga Society, Flat No. 104, Building C-1, Phulenagar, Yerwada, Pune – 411006, India ' Ericsson India Global Services Private Limited, Tamarai Tech Park, 4th Floor, South Block, S.P Plot No. 16 to 20 & 20A, Guindy, Chennai – 600032, Tamil Nadu, India ' University of Southern California, 2618 Ellendale Place, Apt#7, Los Angeles, CA 90007, USA ' Department of Computer Science and Engineering, VIT University, Vellore – 632014, Tamil Nadu, India
Abstract: Data mining techniques are used to generate information from enormous amounts of raw data collected from different sources so that future events can be predicted. Rough set theory, which is used to perform data mining for knowledge acquisition, has limitations and hence is not efficient in handling heterogeneous real datasets. In this paper, we use a neighbourhood-based rough set model and propose a method to determine reduced neighbourhood subsets derived from samples of the universal set. We compare the accuracy and coverage of the computations obtained with parallel rough set-based methods that use the conventional MapReduce technique. The results provide strong evidence of reduced reasoning time in both cases. Although the subset formation method defines a range of values for which the rules give better results in the computational analysis, the covering method reduces the number of rules at some cost to the computed values.
Keywords: rough sets; neighbourhood subsets; MapReduce; knowledge acquisition; data mining; big data.
International Journal of Communication Networks and Distributed Systems, 2015 Vol.15 No.2/3, pp.212 - 234
Received: 07 Jun 2014
Accepted: 07 Feb 2015
Published online: 02 Aug 2015