
Title: Sequence clustering approach for clustering web user session

Authors: Pradeep Kumar

Addresses: Indian Institute of Management Lucknow, Lucknow, India

Abstract: Clustering web usage data helps uncover patterns in user traversals, behaviour and usage characteristics. It also supports trend discovery and the building of personalisation and recommendation engines. Because the web is dynamic, clusters of web user transactions take arbitrary shapes. Moreover, users access web pages in the order that interests them, so incorporating the sequential nature of their usage is crucial when clustering web transactions. In this paper, we present an approach that clusters web usage sequence data and removes noise using the DBSCAN algorithm. We also study how the clustering process is affected when both sequence and content information are incorporated into the similarity measure. We use the sequence and set similarity (S³M) measure to capture both the order of page visits and the page information itself, and compare the results with those obtained using Euclidean distance and Jaccard similarity. Inter-cluster and intra-cluster distances are computed using the average Levenshtein distance (ALD) to demonstrate the usefulness of the proposed approach in the context of web usage mining.
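
As an illustration only, the pipeline outlined in the abstract might be sketched as follows. The abstract does not state the S³M formula, so this sketch assumes a weighted sum of a longest-common-subsequence ratio (order) and Jaccard similarity (content); the weight `p`, the thresholds `eps` and `min_pts`, the function names and the toy sessions are all hypothetical and are not taken from the paper.

```python
from itertools import combinations

def lcs_length(a, b):
    """Length of the longest common subsequence of two page sequences."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, 1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]

def s3m(a, b, p=0.5):
    """Assumed S3M form: p * LCS ratio (order) + (1 - p) * Jaccard (content)."""
    if not a and not b:
        return 1.0
    seq_sim = lcs_length(a, b) / max(len(a), len(b))
    set_a, set_b = set(a), set(b)
    set_sim = len(set_a & set_b) / len(set_a | set_b)
    return p * seq_sim + (1 - p) * set_sim

def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = curr
    return prev[-1]

def average_levenshtein(group_a, group_b=None):
    """ALD within one group (all pairs) or between two groups (cross pairs)."""
    if group_b is None:
        pairs = list(combinations(group_a, 2))
    else:
        pairs = [(a, b) for a in group_a for b in group_b]
    if not pairs:
        return 0.0
    return sum(levenshtein(a, b) for a, b in pairs) / len(pairs)

def dbscan(items, dist, eps, min_pts):
    """Minimal DBSCAN over an arbitrary distance; label -1 marks noise."""
    n = len(items)
    neighbours = [[j for j in range(n) if dist(items[i], items[j]) <= eps]
                  for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = -1  # provisionally noise; may become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # noise reachable from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # expand only through core points
    return labels

# Toy sessions (hypothetical page identifiers, not data from the paper).
sessions = [
    ["home", "news", "sport"],
    ["home", "news", "sport", "weather"],
    ["home", "news", "weather"],
    ["cart", "checkout", "pay"],
    ["cart", "checkout", "pay", "receipt"],
    ["cart", "pay", "checkout"],
    ["faq"],
]
labels = dbscan(sessions, lambda a, b: 1.0 - s3m(a, b), eps=0.5, min_pts=2)
first = [s for s, l in zip(sessions, labels) if l == 0]
second = [s for s, l in zip(sessions, labels) if l == 1]
print(labels)
print(average_levenshtein(first), average_levenshtein(first, second))
```

Turning the similarity into a distance as `1 - s3m` lets DBSCAN run unchanged on sequence data. A lower intra-cluster ALD combined with a higher inter-cluster ALD would indicate tighter, better separated clusters.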

Keywords: sequence clustering; web usage data; similarity measures; average Levenshtein distance; ALD.

DOI: 10.1504/IJBIS.2018.091163

International Journal of Business Information Systems, 2018 Vol.28 No.1, pp.67 - 78

Received: 16 May 2016
Accepted: 29 Sep 2016
Published online: 13 Apr 2018
