Title: A partition based method for finding highly correlated pairs

Authors: Shuxin Li, Sheau-Dong Lang

Addresses: 1180 Celebration Blvd. Suite 101, Celebration, FL 34747, USA. ' 207 Harris Engineering Center (ENG III), University of Central Florida, Orlando, FL 33816, USA

Abstract: The problem of finding highly correlated pairs is to output all item pairs whose (Pearson) correlation coefficients are greater than a user-specified correlation threshold. Effective discovery of such item pairs is of primary importance in many real data mining applications. Recently, the Taper algorithm was developed to discover the set of highly correlated item pairs. In this paper, we present a generalised Taper algorithm that finds strongly correlated item pairs by partitioning the collection of transactions into different segments, so as to achieve a better pruning effect and a shorter running time. Consequently, it can be proved that both the naive algorithm and the Taper algorithm are special cases of our new algorithm with respect to the number of segments. Experimental results on real datasets demonstrate the feasibility and superiority of our algorithm.
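
The problem statement in the abstract can be illustrated with a minimal brute-force sketch. It is not the partition-based method of the paper, only a baseline that checks every pair. It assumes each transaction is a list of item labels and that the Pearson correlation of two binary items is computed from their supports (the phi coefficient); the function name `correlated_pairs` and the toy data are hypothetical.

```python
from itertools import combinations
from math import sqrt


def correlated_pairs(transactions, threshold):
    """Brute force: return (a, b, phi) for every item pair with phi > threshold."""
    n = len(transactions)

    # Map each item to the set of transaction indices that contain it.
    tids = {}
    for idx, items in enumerate(transactions):
        for item in set(items):
            tids.setdefault(item, set()).add(idx)

    result = []
    for a, b in combinations(sorted(tids), 2):
        supp_a = len(tids[a]) / n
        supp_b = len(tids[b]) / n
        supp_ab = len(tids[a] & tids[b]) / n
        denom = sqrt(supp_a * supp_b * (1 - supp_a) * (1 - supp_b))
        if denom == 0:
            # An item present in every transaction has zero variance,
            # so its correlation with any other item is undefined.
            continue
        phi = (supp_ab - supp_a * supp_b) / denom
        if phi > threshold:
            result.append((a, b, phi))
    return result


# Toy data (hypothetical): "a" and "b" always occur together, "c" never with them.
example = [["a", "b"], ["a", "b"], ["c"], ["c"]]
pairs = correlated_pairs(example, 0.5)
```

This baseline evaluates every candidate pair, which is the cost that pruning-based approaches such as Taper aim to reduce.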

Keywords: correlation; association rules; Pearson correlation coefficients; transactional databases; data mining; partition; highly correlated pairs.

DOI: 10.1504/IJDMMM.2010.035562

International Journal of Data Mining, Modelling and Management, 2010 Vol.2 No.4, pp.334 - 350

Published online: 30 Sep 2010
