Authors: Maimaiti Ayifu; Silamu Wushouer; Muhetaer Palidan
Addresses: College of Information Science and Engineering, Xinjiang University, Urumqi, 830046, China; College of Information Science and Engineering, Xinjiang University, Urumqi, 830046, China; College of Information Science and Engineering, Xinjiang University, Urumqi, 830046, China
Abstract: Uyghur, Kazak, and Kyrgyz (the UKK languages) are agglutinative, low-resource languages with rich morphology. How to obtain a better general named entity recognition method without relying on hand-crafted features and resources remains an open problem. This paper proposes a hybrid neural network model based on a bidirectional GRU (BiGRU), a CNN, and a conditional random field (CRF), referred to as BiGRU-CNN-CRF. The model takes as input the concatenation of affix vectors, part-of-speech vectors, and word vectors, and builds a deep BiGRU-CRF network suited to recognising UKK named entities. The CRF layer then outputs the globally optimal labelling sequence. Experimental results show that the model recognises named entities automatically and is robust. The F1 scores for named entity recognition reach 93.11%, 90.29%, and 89.22% for Uyghur, Kazak, and Kyrgyz, respectively.
Keywords: recurrent neural network; convolutional neural network; CNN; conditional random field; CRF; named entity recognition; Uyghur; Kazak; Kyrgyz.
International Journal of Information and Communication Technology, 2019 Vol.15 No.3, pp.223 - 242
Received: 03 Jul 2018
Accepted: 18 Jul 2018
Published online: 11 Oct 2019
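
The abstract describes two steps that a short example may make more concrete: concatenating affix, part-of-speech, and word vectors into one input per token, and having the CRF layer pick the globally optimal label sequence. The sketch below is a minimal illustration in plain Python, not the authors' implementation. The function names `concat_features` and `viterbi_decode`, the three-label tag set (O, B, I), and all scores are hypothetical; in the model the per-token scores would come from the BiGRU layer and the transition scores would be learned.

```python
from typing import List, Sequence


def concat_features(word_vec: Sequence[float],
                    pos_vec: Sequence[float],
                    affix_vec: Sequence[float]) -> List[float]:
    """Join the per-token vectors into a single input vector (hypothetical helper)."""
    return list(word_vec) + list(pos_vec) + list(affix_vec)


def viterbi_decode(emissions: Sequence[Sequence[float]],
                   transitions: Sequence[Sequence[float]]) -> List[int]:
    """Return the highest-scoring label sequence for one sentence.

    emissions[t][j]   : score of label j at token t (assumed to come from the encoder)
    transitions[i][j] : score of moving from label i to label j
    """
    n_labels = len(transitions)
    score = list(emissions[0])          # best score of any path ending in each label
    backpointers = []                   # best previous label for each (token, label)

    for emit in emissions[1:]:
        new_score, pointers = [], []
        for j in range(n_labels):
            # Best previous label i for arriving at label j.
            best_i = max(range(n_labels),
                         key=lambda i: score[i] + transitions[i][j])
            new_score.append(score[best_i] + transitions[best_i][j] + emit[j])
            pointers.append(best_i)
        score = new_score
        backpointers.append(pointers)

    # Trace the best path backwards from the best final label.
    best = max(range(n_labels), key=lambda j: score[j])
    path = [best]
    for pointers in reversed(backpointers):
        best = pointers[best]
        path.append(best)
    path.reverse()
    return path


# Illustrative labels: 0 = O, 1 = B, 2 = I. All numbers are made up.
transitions = [
    [0.0, 0.0, -10.0],   # O -> I is strongly penalised
    [0.0, 0.0, 1.0],     # B -> I is encouraged
    [0.0, 0.0, 0.0],
]
emissions = [
    [0.1, 2.0, 0.0],
    [0.5, 0.0, 0.6],
    [2.0, 0.0, 0.1],
]
print(viterbi_decode(emissions, transitions))  # → [1, 2, 0], i.e. B, I, O
```

The point of decoding over the whole sequence rather than token by token is that transition scores can overrule a locally attractive label: with emissions `[[1.0, 0.9, 0.0], [0.0, 0.0, 1.0]]`, a per-token argmax would give O followed by I, whereas the decoder above returns B followed by I because the O-to-I transition is penalised.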