Title: A semi-automatic method for extracting a taxonomy for nuclear knowledge using hierarchical document clustering based on concept sets
Authors: Fabiane Braga; Nelson F.F. Ebecken
Addresses: Brazilian Nuclear Energy Commission - CNEN, Rua General Severiano 90, CGTI/CIN, Botafogo, 22290-901, Rio de Janeiro, Brazil; COPPE/Federal University of Rio de Janeiro, UFRJ, Centro de Tecnologia, Bloco B, Sala 101, Ilha do Fundão, 21945-970, Rio de Janeiro, Brazil
Abstract: In this paper, we present a text mining approach for the semi-automatic extraction of a taxonomy of concepts for nuclear knowledge and evaluate the results it can achieve. Taxonomies are a fundamental part of any knowledge management strategy or framework. We propose a method for hierarchical document clustering based on the notion of frequent concept sets. Most clustering algorithms treat documents as a bag of words and ignore important relationships between words, such as synonymy. Our method instead considers the semantic relationships between words and uses a domain thesaurus (ETDE/INIS) to identify concepts. To validate the method, we conducted a case study in which we implemented a prototype that generated a taxonomy for nuclear knowledge, with the goal of conceptually mapping the scientific production of the Brazilian Nuclear Energy Commission (CNEN).
Keywords: nuclear knowledge management; hierarchical document clustering; frequent item set clustering; taxonomy; concept hierarchy; text mining; concept sets; semantic relationships; Brazil.
DOI: 10.1504/IJNKM.2013.054496
International Journal of Nuclear Knowledge Management, 2013 Vol.6 No.2, pp.155 - 169
Received: 01 Mar 2013
Accepted: 01 Mar 2013
Published online: 30 Sep 2014