Title: Semantic indexing of hybrid frequent pattern-based clustering of documents with missing semantic information

Authors: E. Anupriya; N.Ch.S.N. Iyengar

Addresses: Assistant Professor (Selection Grade), School of Computing Science & Engineering, VIT University, Vellore, TN, India; School of Computing Science & Engineering, VIT University, Vellore – 632 014, Tamil Nadu, India

Abstract: Documents recently added to the web are augmented with semantic information that identifies the class of each document: the topic or concept to which the document belongs can be identified explicitly using meta tags such as keyword tags or Resource Description Framework (RDF) tags. Documents added to the web five or ten years ago, however, do not contain such semantic information. In this paper, we present a hybrid clustering system using frequent pattern mining (HCSFPM), a technique that fuses two frequent pattern mining schemes, frequent term-based and frequent pattern-based, to cluster documents according to topics or concepts. We also index the documents based on their semantic information content. Results illustrate that the HCSFPM method performs better than the traditional term-based method.

Keywords: frequent pattern mining; document semantic indexing; semantic similarity; similarity histogram clustering; conceptual clustering; web document clustering.

DOI: 10.1504/IJCISTUDIES.2015.069833

International Journal of Computational Intelligence Studies, 2015 Vol.4 No.1, pp.72-86

Received: 19 Sep 2013
Accepted: 14 May 2014

Published online: 13 Jun 2015
