Authors: Soumen Swarnakar
Addresses: Netaji Subhas Engineering College, Technocity, Garia, Kolkata-152, India
Abstract: Document clustering has become an increasingly important task in analysing large document collections. The challenge in analysing such enormous collections is to organise them in a way that facilitates better search and knowledge extraction without introducing extra cost and complexity. In this paper, an ontology-based document clustering method is first proposed using a hierarchical clustering technique. This approach is based purely on the frequency counts of the terms present in the documents, so the context of the terms is ignored entirely. The method is therefore modified by incorporating belief to measure the degree of relatedness of the terms with respect to the concepts present in the documents. An efficient searching algorithm has been developed to create the ontology tree of the documents through consultation with different dictionaries. The Davies-Bouldin (DB) index, a well-known metric for measuring cluster quality, shows that the proposed approach can efficiently produce higher-quality document clusters than several well-known document-clustering algorithms, including our previous one.
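The abstract does not give the details of the frequency-based baseline or the evaluation, so the following is only a minimal sketch under stated assumptions, not the author's method. It assumes documents are plain bag-of-words term-frequency vectors (no ontology tree or belief weighting), agglomerative hierarchical clustering with centroid linkage and Euclidean distance, and the standard Davies-Bouldin index with the average distance to the centroid as each cluster's scatter (lower values indicate better-separated clusters). All function names are hypothetical.

```python
import math
from collections import Counter


def tf_vector(doc, vocab):
    """Raw term-frequency vector of a document over a fixed vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts[t] for t in vocab]


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    n = len(points)
    return [sum(col) / n for col in zip(*points)]


def agglomerative(vectors, k):
    """Bottom-up clustering: start with singletons and repeatedly merge the
    two clusters whose centroids are closest, until k clusters remain."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = euclidean(centroid([vectors[p] for p in clusters[i]]),
                              centroid([vectors[p] for p in clusters[j]]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i].extend(clusters[j])
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


def davies_bouldin(vectors, clusters):
    """DB = mean over clusters i of max_{j != i} (S_i + S_j) / M_ij, where S_i is
    the mean distance of members to centroid i and M_ij is the centroid distance.
    Assumes at least two clusters with distinct centroids."""
    cents = [centroid([vectors[p] for p in c]) for c in clusters]
    scatter = [sum(euclidean(vectors[p], cents[i]) for p in c) / len(c)
               for i, c in enumerate(clusters)]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        total += max((scatter[i] + scatter[j]) / euclidean(cents[i], cents[j])
                     for j in range(k) if j != i)
    return total / k


# Usage on a toy corpus (illustrative data only).
docs = ["ontology concept term", "ontology concept tree",
        "cluster index quality", "cluster quality metric"]
vocab = sorted({w for d in docs for w in d.lower().split()})
vectors = [tf_vector(d, vocab) for d in docs]
clusters = agglomerative(vectors, 2)
print(sorted(sorted(c) for c in clusters))
print(davies_bouldin(vectors, clusters))
```

Because the vectors here are raw counts, two documents sharing many words cluster together regardless of meaning, which is the context-blindness the abstract attributes to the purely frequency-based approach.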
Keywords: ontology; DB index; document clustering; knowledge extraction; information retrieval; hierarchical clustering; Davies-Bouldin; context; searching algorithms.
International Journal of Knowledge Engineering and Data Mining, 2012 Vol.2 No.1, pp.35 - 59
Received: 08 May 2021
Accepted: 12 May 2021
Published online: 03 Jan 2012