Title: SCHISM: a new approach to interesting subspace mining

Authors: Karlton Sequeira, Mohammed Zaki

Addresses: Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, USA; Department of Computer Science, Rensselaer Polytechnic Institute, 110 8th St, Troy, NY 12180, USA

Abstract: High-dimensional data pose challenges to traditional clustering algorithms because of their inherent sparseness and because data tend to cluster in different, possibly overlapping, subspaces of the entire feature space. Finding such subspaces is called subspace mining. We present SCHISM, a new algorithm for mining interesting subspaces using the notions of support and Chernoff-Hoeffding bounds. We use a vertical representation of the dataset and a depth-first search with backtracking to find maximal interesting subspaces. We evaluate the algorithm's effectiveness on a number of high-dimensional synthetic and real datasets.
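
The ingredients named in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes values scaled to [0, 1), an equal-width split of each dimension into `xi` intervals, and a support threshold obtained by adding the standard Hoeffding deviation term sqrt(ln(1/tau) / (2n)) to the fraction (1/xi)^p expected in a p-dimensional cell under a uniform null. The names `hoeffding_threshold`, `vertical`, `mine_subspaces`, `xi`, and `tau` are hypothetical, and pruning a branch as soon as an extension fails the threshold is a simplification.

```python
import math

def hoeffding_threshold(n, xi, p, tau):
    # Expected cell fraction under uniformity plus the Hoeffding deviation
    # that is exceeded with probability at most tau (assumed form).
    return (1.0 / xi) ** p + math.sqrt(math.log(1.0 / tau) / (2.0 * n))

def vertical(data, xi):
    # Vertical representation: (dimension, interval) -> set of row ids.
    tidsets = {}
    for rid, row in enumerate(data):
        for d, v in enumerate(row):
            b = min(int(v * xi), xi - 1)
            tidsets.setdefault((d, b), set()).add(rid)
    return tidsets

def mine_subspaces(data, xi, tau):
    n = len(data)
    tid = vertical(data, xi)
    items = sorted(tid)
    found = []

    def dfs(prefix, rows, start):
        extended = False
        for i in range(start, len(items)):
            d, _ = items[i]
            if prefix and d <= prefix[-1][0]:
                continue  # at most one interval per dimension
            new_rows = rows & tid[items[i]] if prefix else tid[items[i]]
            p = len(prefix) + 1
            if len(new_rows) / n >= hoeffding_threshold(n, xi, p, tau):
                extended = True
                dfs(prefix + [items[i]], new_rows, i + 1)
        # Backtrack; record a prefix that could not be extended.
        if prefix and not extended:
            found.append((tuple(prefix), len(rows)))

    dfs([], set(), 0)
    # Keep only subspaces that are not proper subsets of another result.
    return [(s, c) for s, c in found
            if not any(set(s) < set(t) for t, _ in found)]

# Six points in the low half of both dimensions, two in the high half.
data = [[0.1, 0.2]] * 6 + [[0.9, 0.9]] * 2
result = mine_subspaces(data, xi=2, tau=0.5)
```

With this toy input, the only maximal subspace reported is the cell formed by interval 0 in both dimensions, which is supported by six of the eight points.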

Keywords: subspace mining; clustering; interestingness measures; Chernoff-Hoeffding bounds; maximal subspaces; data mining.

DOI: 10.1504/IJBIDM.2005.008360

International Journal of Business Intelligence and Data Mining, 2005 Vol.1 No.2, pp.137-160

Published online: 08 Dec 2005
