Title: Non-persistent stratified sampling based IQRA_IG for scalable reduct generation

Authors: P.S.V.S. Sai Prasad; C. Raghavendra Rao

Addresses: School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Andhra Pradesh, 500046, India; School of Computer and Information Sciences, University of Hyderabad, Hyderabad, Andhra Pradesh, 500046, India

Abstract: Feature selection in large datasets using reducts based on rough set principles is computationally expensive. Existing scalable sampling-based reduct computation algorithms suffer from limitations such as redundancy and inadequacy. This paper develops two algorithms for finding a reduct, based on stratified sampling and non-persistent stratified sampling techniques, which address adequacy and, to a certain extent, redundancy. It compares the performance of these algorithms against the discernibility matrix-based sampling approximate reduct algorithm (SARA) and the sample guided improved quick reduct algorithm with information gain heuristic (SGIQRA_IG). The performance of these algorithms is demonstrated on the benchmark large dataset repository of Arizona State University.

Keywords: rough sets; quick reduct; scalability; stratified sampling; fidelity; feature selection.

DOI: 10.1504/IJGCRSIS.2014.060844

International Journal of Granular Computing, Rough Sets and Intelligent Systems, 2014 Vol.3 No.3, pp.234 - 255

Received: 24 Sep 2012
Accepted: 07 Jan 2014

Published online: 29 Jul 2014
