Authors: J. Balaji; P. Ranjani; T.V. Geetha
Addresses: Department of CSE, Anna University, Chennai, India; Department of IST, Anna University, Chennai, India; Department of CSE, Anna University, Chennai, India
Abstract: The use of semantic concepts and relations in NLP applications, including information retrieval and web search, is a major area of research. In this context, extracting semantic relations from open-domain web documents matters not only for English but also for other languages that lack both hand-crafted rules covering the many ways semantic relations are expressed and semantically tagged corpora. To meet this need, an unsupervised approach to learning semantic relations between concepts becomes important, specifically for morphologically rich languages with relatively free word order. Unlike previous approaches, which used word order and morpho-syntactic features, this paper uses morpho-semantic features together with a minimal number of co-occurrence features to extract semantic relations. Using appropriate probabilities, these features are used to learn whether a concept node is the source of a concept-relation-concept subgraph, which relation is associated with that source, and which node is the destination. Because the language has relatively free word order, determining the destination node requires a novel source-destination probability. The approach was evaluated on a 20,000-document corpus drawn from the tourism and news domains. It achieved an F-measure of 0.50 for a morphologically rich language without using any syntactic features.
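The three decisions named in the abstract (is a node a source, which relation, which destination) can be illustrated with a minimal relative-frequency sketch. This is not the authors' method. The paper's approach is unsupervised, whereas this sketch simply assumes that concept-relation-concept triples are already available from some automatic step. The input format, the feature tags (`nom`, `loc`, `verb`), the single relation label `agt`, and the `threshold` parameter are all hypothetical. The destination is scored only by its morpho-semantic feature, not by its position, which mirrors the motivation for a source-destination probability in a free word order language.

```python
from collections import Counter, defaultdict


def train(sentences, triples):
    """Relative-frequency counts (hypothetical sketch, not the paper's model).

    sentences: list of sentences, each a list of morpho-semantic feature tags,
               one tag per concept node (hypothetical tags).
    triples:   list of (sentence_id, source_idx, relation, dest_idx), assumed
               to come from some automatic extraction step.
    """
    node_total = Counter(f for sent in sentences for f in sent)
    # Count each source node once, so that P(source | feature) <= 1.
    source_nodes = {(i, s) for i, s, _, _ in triples}
    as_source = Counter(sentences[i][s] for i, s in source_nodes)
    rel = defaultdict(Counter)  # source feature -> relation counts
    dst = defaultdict(Counter)  # (source feature, relation) -> dest feature counts
    for i, s, r, d in triples:
        fs, fd = sentences[i][s], sentences[i][d]
        rel[fs][r] += 1
        dst[(fs, r)][fd] += 1
    return node_total, as_source, rel, dst


def extract(sent, model, threshold=0.5):
    """Return (source_idx, relation, dest_idx) triples for one sentence."""
    node_total, as_source, rel, dst = model
    out = []
    for s, fs in enumerate(sent):
        # Step 1: P(source | feature), estimated over all nodes with that feature.
        if not node_total[fs] or as_source[fs] / node_total[fs] < threshold:
            continue
        # Step 2: most probable relation for this source feature.
        r = rel[fs].most_common(1)[0][0]
        # Step 3: source-destination probability P(dest feature | source feature, relation).
        # Position is ignored, so the destination may precede or follow the source.
        dist = dst[(fs, r)]
        total = sum(dist.values())
        candidates = [(dist[fd] / total, d) for d, fd in enumerate(sent) if d != s]
        best_p, best_d = max(candidates, default=(0.0, None))
        if best_p > 0:
            out.append((s, r, best_d))
    return out


if __name__ == "__main__":
    sentences = [["nom", "loc", "verb"], ["loc", "nom", "verb"], ["verb", "nom"]]
    triples = [(0, 2, "agt", 0), (1, 2, "agt", 1), (2, 0, "agt", 1)]
    model = train(sentences, triples)
    print(extract(["loc", "verb", "nom"], model))  # [(1, 'agt', 2)]
    print(extract(["nom", "verb"], model))         # [(1, 'agt', 0)]
```

In the usage example, the `verb` node is chosen as the source in both sentences, and the `nom` node is chosen as the destination whether it follows or precedes the verb. A realistic system would consider several relations per source and would resolve conflicts between destinations. This sketch keeps one relation per source for brevity.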
Keywords: unsupervised learning; semantic relations; universal networking language; UNL; semantic graphs; source destination probability; natural language processing; NLP; morpho-semantic features; morphologically rich languages; tourism; news.
International Journal of Information and Communication Technology, 2016 Vol.8 No.4, pp. 344-356
Received: 29 Jul 2013
Accepted: 15 May 2014
Published online: 26 Mar 2016