Title: Comparative study of classification techniques on biomedical data from hypertext documents

Authors: Rashedur M. Rahman; Sazia Salahuddin

Addresses: Department of Electrical Engineering and Computer Science, North South University, Plot-15, Block-B, Bashundhara, Dhaka 1229, Bangladesh; Department of Electrical Engineering and Computer Science, North South University, Plot-15, Block-B, Bashundhara, Dhaka 1229, Bangladesh

Abstract: In this paper, our goal is to mine biomedical data from hypertext documents (e.g., web content) using data mining algorithms with the help of a 'biomedical ontology'. We collect a number of documents using Google, preprocess the hypertext documents, and extract their text. The next step is to identify the biomedical data. To decide whether a word is a biomedical entity, we use a biomedical database, the 'UMLS metathesaurus'; entities are mapped to the metathesaurus by keyword query. The more often biomedical entities occur in a page, the more relevant the page is, so we can re-rank the documents to find the most important ones. We then test and analyse the performance of seven of the most popular classification algorithms, training each one separately on the documents as ranked by Google and as ranked by our algorithm.
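The re-ranking idea in the abstract (strip the hypertext, find biomedical entities, rank pages by how often they occur) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a small in-memory term set (the hypothetical `LEXICON`) standing in for keyword queries against the UMLS metathesaurus, and it scores a page by its raw count of matched terms. The function names `html_to_text`, `count_entities` and `rerank` are illustrative only, and the paper's actual matching and scoring may differ.

```python
from html.parser import HTMLParser
import re


class _TextExtractor(HTMLParser):
    """Collects visible text from an HTML page, skipping <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def html_to_text(html):
    """Preprocessing step: reduce a hypertext document to plain text."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def count_entities(text, lexicon, max_len=3):
    """Count token n-grams (n <= max_len) found in the lexicon.

    The lexicon is a hypothetical stand-in for UMLS metathesaurus lookups;
    overlapping matches (e.g. a term and a longer term containing it) all count.
    """
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    count = 0
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            if " ".join(tokens[i:i + n]) in lexicon:
                count += 1
    return count


def rerank(pages, lexicon):
    """Sort (url, html) pairs by descending entity count.

    sorted() is stable, so pages with equal counts keep their search-engine order.
    """
    scored = [(count_entities(html_to_text(html), lexicon), url) for url, html in pages]
    return [url for score, url in sorted(scored, key=lambda s: -s[0])]


LEXICON = {"insulin", "diabetes", "blood glucose"}  # hypothetical sample terms
pages = [
    ("a.html", "<p>Weather today</p>"),
    ("b.html", "<p>Insulin lowers blood glucose in diabetes.</p><script>var diabetes=1;</script>"),
]
print(rerank(pages, LEXICON))  # ['b.html', 'a.html']
```

Here `b.html` scores 3 ("insulin", "diabetes", "blood glucose"). The word "diabetes" inside the script is ignored because script text is not visible content. Either ranked list could then serve as training input when comparing classifiers, as the abstract describes.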

Keywords: data mining; biomedical ontology; classification; performance analysis; document clustering; biomedical data; hypertext documents; web content; metathesaurus; keyword queries; keyword search.

DOI: 10.1504/IJKESDP.2013.052717

International Journal of Knowledge Engineering and Soft Data Paradigms, 2013 Vol.4 No.1, pp.21 - 41

Published online: 19 Jul 2014
