Authors: Minh-Tien Nguyen; Tri-Thanh Nguyen
Addresses: Hung Yen University of Technology and Education (UTEHY), Administrative Building, Nhan Hoa Commune, My Hao District, Hung Yen Province, Vietnam; Vietnam National University (VNU), University of Engineering and Technology (UET), 144, Xuan Thuy Street, Cau Giay District, Hanoi, Vietnam
Abstract: In this paper, we propose a method that combines semantic rules and machine learning to extract infectious disease events from Vietnamese electronic news, for use in a real-time system that monitors disease spread. Our method consists of two main steps: detecting disease events in unstructured text and extracting information about each detected event. The detection phase uses semantic rules and machine learning to detect a disease event; in the latter step, named entity recognition (NER), rules, and dictionaries are utilised to capture the event's information. The two steps achieve F-scores of 77.33% (2.36% better than the baseline) and 91.89% (4.31% better than the baseline), respectively. These promising results show that our method is suitable for extracting disease events from Vietnamese text.
Keywords: data mining; information extraction; disease event extraction; disease monitoring systems; semantic rules; machine learning; infectious diseases; disease extraction; Vietnam; electronic news; e-news; disease spread; named entity recognition; disease events; disease status.
International Journal of Computational Vision and Robotics, 2015 Vol.5 No.3, pp.282 - 301
Received: 05 Nov 2014
Accepted: 03 Feb 2015
Published online: 20 Aug 2015