Authors: Reda Younsi; Anthony Bagnall
Addresses: School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK; School of Computing Sciences, University of East Anglia, Norwich, NR4 7TJ, UK
Abstract: This paper describes an efficient randomised sphere cover classifier (αRSC) that reduces the training data set size without loss of accuracy when compared to nearest neighbour classifiers. The motivation for developing this algorithm is the desire to have a non-deterministic, fast, instance-based classifier that performs well in isolation but is also ideal for use with ensembles. We use 24 benchmark datasets from the UCI repository and six gene expression datasets for evaluation. The first set of experiments demonstrates the basic benefits of sphere covering. The second set of experiments demonstrates that, when the α parameter is set through cross validation, the resulting αRSC algorithm outperforms several well-known classifiers when compared using the Friedman rank sum test. Thirdly, we test the usefulness of αRSC when used with three feature filters on six gene expression datasets. Finally, we highlight the benefits of pruning with a bias/variance decomposition.
Keywords: sphere covers; randomised classifiers; randomisation; bias decomposition; variance decomposition; gene expression datasets; training data; set sizes; accuracy; nearest neighbour classifiers; algorithms; non-deterministic classifiers; fast classifiers; instance-based classifiers; isolation; ensembles; UCI Machine Learning Repository; University of California; UC Irvine; universities; higher education; USA; United States; benchmark datasets; gene expression datasets; sphere covering; cross validation; rank sum tests; non-parametric tests; statistical tests; Milton Friedman; feature filtering; filters; pruning; data mining; data modelling; data management; intelligent data analysis.
International Journal of Data Mining, Modelling and Management, 2012 Vol.4 No.2, pp.156 - 171
Received: 08 May 2021
Accepted: 12 May 2021
Published online: 09 May 2012