Title: Mining strongly correlated item pairs in large transaction databases

Authors: Swarup Roy; Dhruba Kr Bhattacharyya

Addresses: Department of Information Technology, North-Eastern Hill University, Umshing, Shillong 793022, Meghalaya, India; Department of Computer Science and Engineering, Tezpur University, Napaam, Tezpur 784028, Assam, India

Abstract: Correlation mining draws statistical relationships between items from transaction data. Most existing techniques use Pearson's correlation coefficient as the measure of correlation, which may not always perform well when data are noisy and binary in nature. Moreover, they require multiple passes over the database. This paper presents an effective and faster correlation mining technique to extract the most strongly correlated item pairs from large transaction databases. As an alternative to Pearson's correlation coefficient, it presents a method of computing Spearman's rank order correlation coefficient from transaction data. The proposed technique is found to perform satisfactorily in terms of execution time over several real and synthetic datasets when compared with other similar techniques. To justify its usefulness, an application of the proposed technique to extracting a yeast genetic network from gene expression data is also reported.
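
As a rough illustration of the quantity the abstract refers to, the sketch below computes Spearman's rank order correlation for every item pair in a small transaction list and keeps the pairs at or above a threshold. It is a textbook formulation (average ranks for ties, then Pearson's formula on the ranks) and not the paper's algorithm, which the abstract does not describe. The function names, the sample transactions, the threshold value, and the convention of returning 0.0 for a constant column are all assumptions made for this example.

```python
from itertools import combinations
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson's correlation; returns 0.0 when either input is constant
    (an assumption of this sketch, since the value is undefined there)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman's rho: Pearson's correlation applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def correlated_pairs(transactions, threshold):
    """List (item_a, item_b, rho) for pairs with rho >= threshold,
    strongest first. Each transaction is a set of item names."""
    items = sorted(set().union(*transactions))
    # One 0/1 column per item: 1 if the item occurs in the transaction.
    columns = {
        item: [1 if item in t else 0 for t in transactions] for item in items
    }
    pairs = []
    for a, b in combinations(items, 2):
        rho = spearman(columns[a], columns[b])
        if rho >= threshold:
            pairs.append((a, b, rho))
    return sorted(pairs, key=lambda p: -p[2])


# Hypothetical sample data: "bread" and "butter" always occur together.
sample = [{"bread", "butter"}, {"bread", "butter"}, {"milk"}, {"milk"}]
strong = correlated_pairs(sample, threshold=0.5)
```

This brute-force version builds a column for every item and compares all pairs, so it is only suitable for small inputs; it makes no attempt at the execution-time improvements the abstract reports.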

Keywords: correlation mining; correlation coefficient; strongly correlated item pairs; support; Spearman; rank order correlation; large transaction databases; bioinformatics; yeast genetic networks; gene expression data.

DOI: 10.1504/IJDMMM.2013.051920

International Journal of Data Mining, Modelling and Management, 2013 Vol.5 No.1, pp.76 - 96

Published online: 29 Jul 2014
