Title: Preprocessing enhancements to improve data mining algorithms

Authors: Paraskevas Orfanidis, David J. Russomanno

Addresses: Department of Electrical and Computer Engineering, Herff College of Engineering, The University of Memphis, Memphis, TN 38152, USA; Department of Electrical and Computer Engineering, Herff College of Engineering, The University of Memphis, Memphis, TN 38152, USA

Abstract: Preprocessing is often required before clustering or other data mining algorithms are used to analyse multivariate data sets. The approaches discussed in this paper are enhanced implementations of a preprocess that uses an algorithm to cluster the points in a data set on each attribute independently, yielding additional information about each data point with respect to each of its dimensions. By combining this additional information across all dimensions and querying the results, noise, data boundaries, and likely representatives of data subsets can be identified more easily, thus significantly improving the performance of subsequent clustering or data mining algorithms.
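
The abstract does not specify the per-attribute clustering algorithm or the queries used, so the following is only a minimal illustrative sketch of the general idea, not the authors' implementation. It assumes a simple gap-based one-dimensional clustering (a hypothetical stand-in), labels every point independently on each attribute, and then queries the combined labels to flag likely noise. The names `cluster_1d`, `per_attribute_labels`, `flag_noise`, and the parameters `gap` and `min_size` are all assumptions introduced for illustration.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def cluster_1d(values: Sequence[float], gap: float) -> List[int]:
    """Assumed 1D clustering: walk the values in sorted order and start a
    new cluster whenever two neighbours differ by more than `gap`."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    labels = [0] * len(values)
    current = 0
    for prev, nxt in zip(order, order[1:]):
        if values[nxt] - values[prev] > gap:
            current += 1
        labels[nxt] = current
    return labels


def per_attribute_labels(points: Sequence[Sequence[float]],
                         gap: float) -> List[Tuple[int, ...]]:
    """Cluster each attribute independently and return, for every point,
    a tuple holding its cluster label in each dimension."""
    if not points:
        return []
    dims = len(points[0])
    columns = [cluster_1d([p[d] for p in points], gap) for d in range(dims)]
    return [tuple(col[i] for col in columns) for i in range(len(points))]


def flag_noise(points: Sequence[Sequence[float]],
               gap: float, min_size: int) -> List[bool]:
    """Example query over the combined labels (an assumption): a point is
    flagged as likely noise if, in any dimension, its 1D cluster has
    fewer than `min_size` members."""
    labels = per_attribute_labels(points, gap)
    if not labels:
        return []
    dims = len(labels[0])
    counts = [Counter(lab[d] for lab in labels) for d in range(dims)]
    return [any(counts[d][lab[d]] < min_size for d in range(dims))
            for lab in labels]


points = [(1.0, 1.0), (1.2, 0.9), (1.1, 1.1),
          (5.0, 5.0), (5.1, 5.2), (4.9, 5.1),
          (9.0, 1.0)]
noise = flag_noise(points, gap=1.0, min_size=2)
```

In this toy data set the last point is isolated on the first attribute even though it blends in on the second, which is the kind of per-dimension information a purely multivariate view can obscure. Queries for boundary points or likely subset representatives could be layered on the same per-attribute labels, but their exact form would depend on details given in the full paper.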

Keywords: preprocessing; clustering; seed; boundary; noise; sampling; data mining algorithms.

DOI: 10.1504/IJBIDM.2008.020519

International Journal of Business Intelligence and Data Mining, 2008 Vol.3 No.2, pp.196 - 211

Published online: 28 Sep 2008
