Title: A recursive algorithm for open information extraction from Persian texts

Authors: Mahmoud Rahat; Alireza Talebpour; Seyedamin Monemian

Addresses: Faculty of Computer Science and Engineering, Shahid Beheshti University, Evin, Tehran, 1983963113, Iran; Faculty of Computer Science and Engineering, Shahid Beheshti University, Evin, Tehran, 1983963113, Iran; Faculty of Computer Science and Engineering, Shahid Beheshti University, Evin, Tehran, 1983963113, Iran

Abstract: With the proliferation of textual data accessible on the internet, researchers have focused on extending the Open Information Extraction (Open IE) paradigm to non-English languages. Adapting an Open IE system from English to Persian is challenging because the two languages differ fundamentally in syntax and in their dependency tree representations. To the best of our knowledge, this article is the first published paper on Open IE for Persian. Many traditional systems apply a large set of lexical patterns, which is inefficient on out-of-domain text. We replace this large pattern set with a few syntactic rules, defined over the dependency parse of a sentence and designed specifically for Persian. We also address some Persian-specific phenomena to improve the results. The recursive nature of the algorithm enables it to handle nested sentences. Our experiments show that the proposed system achieves decent performance compared to state-of-the-art systems for English.

Keywords: open information extraction; Persian text processing; dependency parsers; natural language processing.

DOI: 10.1504/IJCAT.2018.092978

International Journal of Computer Applications in Technology, 2018 Vol.57 No.3, pp.193 - 206

Received: 31 May 2016
Accepted: 07 Mar 2017

Published online: 04 Jul 2018
