Title: Ontology-based semantic smoothing model for biomedical document clustering
Authors: S. Logeswari; K. Premalatha
Addresses: Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode – 638401, Tamil Nadu, India; Department of Computer Science and Engineering, Bannari Amman Institute of Technology, Sathyamangalam, Erode – 638401, Tamil Nadu, India
Abstract: One of the major issues in data mining is the clustering of unstructured text documents. Traditional clustering algorithms often fail to cluster such documents accurately because of characteristics of text such as high dimensionality, complex semantics, and sparsity. Recent research focuses on clustering text documents with semantic smoothing techniques, which address the interference caused by general words and the sparsity of class-specific core words. In this work, an ontology-based semantic smoothing model is proposed that uses a domain ontology for concept extraction. It is a mixture of a simple language model and a topic signature translation model. The results show that the proposed method significantly improves cluster quality over existing methods.
Keywords: semantic smoothing; document clustering; ontology; Jaccard index; silhouette index; FM index; biomedical documents; data mining; cluster quality; domain ontology; concept extraction; text documents.
DOI: 10.1504/IJTMCP.2015.069475
International Journal of Telemedicine and Clinical Practices, 2015 Vol.1 No.1, pp.94 - 110
Received: 31 Oct 2013
Accepted: 11 Jun 2014
Published online: 18 May 2015
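
Illustrative sketch: the abstract describes the model as a mixture of a simple language model and a topic signature translation model but gives no formulas. The Python below is a minimal sketch of how such a mixture is commonly written, p(w|d) = (1 - lam) * p_b(w|d) + lam * p_t(w|d), with p_t(w|d) summing p(w|t_k) * p(t_k|d) over the concepts t_k found in the document. All function names, the weights `lam` and `alpha`, and the toy data are assumptions made for illustration; they are not taken from the paper, and the concept-to-word table stands in for whatever the domain ontology would supply.

```python
from collections import Counter


def simple_lm(doc_terms, background, alpha=0.5):
    """Unigram model p_b(w|d): document frequencies interpolated with a
    background corpus model (alpha is a hypothetical interpolation weight)."""
    counts = Counter(doc_terms)
    total = sum(counts.values()) or 1  # avoid division by zero on empty docs
    vocab = set(background) | set(counts)
    return {
        w: (1 - alpha) * counts[w] / total + alpha * background.get(w, 0.0)
        for w in vocab
    }


def translation_lm(doc_concepts, translate):
    """Translation model p_t(w|d) = sum_k p(w|t_k) * p(t_k|d), where the t_k
    are concepts (topic signatures) extracted from the document."""
    counts = Counter(doc_concepts)
    total = sum(counts.values()) or 1
    probs = {}
    for concept, c in counts.items():
        for w, p in translate.get(concept, {}).items():
            probs[w] = probs.get(w, 0.0) + p * c / total
    return probs


def smoothed_lm(doc_terms, doc_concepts, background, translate,
                lam=0.4, alpha=0.5):
    """Mixture p(w|d) = (1 - lam) * p_b(w|d) + lam * p_t(w|d)."""
    p_b = simple_lm(doc_terms, background, alpha)
    p_t = translation_lm(doc_concepts, translate)
    vocab = set(p_b) | set(p_t)
    return {
        w: (1 - lam) * p_b.get(w, 0.0) + lam * p_t.get(w, 0.0)
        for w in vocab
    }


# Toy data (hypothetical): a background model and one ontology concept.
background = {"tumor": 0.25, "cell": 0.25, "growth": 0.25, "the": 0.25}
translate = {"neoplasm": {"tumor": 0.5, "growth": 0.5}}

p = smoothed_lm(["tumor", "cell"], ["neoplasm"], background, translate)
```

In this toy case the word "growth" never occurs in the document, yet it gains probability mass because the extracted concept translates to it. That is the effect semantic smoothing is meant to have on sparse class-specific words.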