Title: Ontology-based semantic smoothing model for biomedical document clustering

Authors: S. Logeswari; K. Premalatha

Addresses: Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode – 638401, Tamil Nadu, India; Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode – 638401, Tamil Nadu, India

Abstract: One of the major issues in data mining is the clustering of unstructured text documents. Traditional clustering algorithms often fail to cluster such documents accurately because of characteristics of text such as high dimensionality, complex semantics and sparsity. Recent research focuses on clustering text documents with semantic smoothing techniques, which address both the noise introduced by general words and the sparsity of class-specific core words. This work proposes an ontology-based semantic smoothing model that uses a domain ontology for concept extraction. The model is a mixture of a simple language model and a topic signature translation model. The results obtained with the proposed method show a significant improvement in cluster quality over existing methods.
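
The mixture described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the common linear interpolation form p(w|d) = (1 - lam) * p_b(w|d) + lam * p_t(w|d), where p_b is a maximum-likelihood unigram model and p_t sums p(w|concept) over the ontology concepts found in the document. The function names, the mixing weight lam, and the toy translation table are all hypothetical.

```python
from collections import Counter


def simple_lm(doc_tokens):
    """Maximum-likelihood unigram model p_b(w|d); no background smoothing."""
    counts = Counter(doc_tokens)
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def translation_lm(doc_concepts, p_w_given_concept):
    """Translation model p_t(w|d) = sum_k p(w|t_k) * p_ml(t_k|d).

    Assumes every concept in the document has a row in the (hypothetical)
    translation table and that each row sums to 1.
    """
    counts = Counter(doc_concepts)
    total = sum(counts.values())
    probs = {}
    for concept, c in counts.items():
        for w, p in p_w_given_concept.get(concept, {}).items():
            probs[w] = probs.get(w, 0.0) + p * c / total
    return probs


def smoothed_lm(doc_tokens, doc_concepts, p_w_given_concept, lam=0.5):
    """Linear mixture of the simple and translation models (assumed form)."""
    p_b = simple_lm(doc_tokens)
    p_t = translation_lm(doc_concepts, p_w_given_concept)
    vocab = set(p_b) | set(p_t)
    return {w: (1 - lam) * p_b.get(w, 0.0) + lam * p_t.get(w, 0.0) for w in vocab}


# Toy data, invented purely for illustration.
tokens = ["aspirin", "reduces", "fever"]
concepts = ["aspirin"]  # concepts a domain ontology might extract
table = {"aspirin": {"analgesic": 0.5, "aspirin": 0.5}}
model = smoothed_lm(tokens, concepts, table, lam=0.4)
```

In this toy case the word "analgesic" never occurs in the document but still receives probability mass through the concept "aspirin", which is the effect smoothing is meant to have on sparse core words.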

Keywords: semantic smoothing; document clustering; ontology; Jaccard index; silhouette index; FM index; biomedical documents; data mining; cluster quality; domain ontology; concept extraction; text documents.

DOI: 10.1504/IJTMCP.2015.069475

International Journal of Telemedicine and Clinical Practices, 2015, Vol. 1, No. 1, pp. 94-110

Received: 31 Oct 2013
Accepted: 11 Jun 2014

Published online: 18 May 2015
