Title: Unsupervised offensive speech detection for multimedia based on multilingual BERT
Authors: Ge Liu; Xiaona Yang; Xiayang Shi; Yinlin Li
Addresses: Xuchang Vocational and Technical College, Henan 461000, China; Software College, Zhengzhou University of Light Industry, Henan 450000, China; Software College, Zhengzhou University of Light Industry, Henan 450000, China; Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China
Abstract: Multimedia content contains a significant amount of offensive speech, which seriously undermines social stability. As sensor-equipped devices contribute ever more social media data, detecting offensive speech within this vast volume of data has become a critical challenge. However, most existing methods focus on only a few high-resource languages. This paper proposes a cross-lingual transfer learning method for offensive speech, based on bidirectional encoder representations from transformers (BERT), that automatically detects offensive speech in low-resource languages. First, we use the multilingual BERT model to learn the characteristics of offensive speech from a high-resource language dataset and build an initial model. Then, based on the linguistic similarity between languages, we transfer this model to low-resource languages. Experimental results show that our method achieves higher detection accuracy across multiple languages, including English, Danish, Arabic, Turkish, and Greek, and performs particularly well in low-resource languages.
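The two-stage idea in the abstract (learn from a high-resource language, then reuse the model in a low-resource one) can be illustrated with a toy sketch. It assumes zero-shot transfer, the simplest variant, where the target language receives no labels at all; the paper's actual transfer step may differ. Multilingual BERT is replaced here by a hypothetical `SHARED_LEXICON` that maps English and Danish words with the same meaning to one shared feature, and the classifier is plain logistic regression. The names `encode`, `train` and `predict`, and all example sentences, are illustrative assumptions rather than the authors' code or data.

```python
import math

# Hypothetical stand-in for a shared multilingual encoder such as mBERT:
# words with the same meaning in English and Danish map to one shared
# feature, so a classifier trained on one language can read the other.
SHARED_LEXICON = {
    "you": "YOU", "du": "YOU", "dig": "YOU",
    "are": "BE", "er": "BE",
    "stupid": "STUPID", "dum": "STUPID",
    "i": "I", "jeg": "I",
    "hate": "HATE", "hader": "HATE",
    "thanks": "THANKS", "tak": "THANKS",
    "my": "MY", "min": "MY",
    "friend": "FRIEND", "ven": "FRIEND",
}
FEATURES = sorted(set(SHARED_LEXICON.values()))


def encode(text):
    """Map a sentence to a binary bag-of-shared-features vector."""
    present = {SHARED_LEXICON[w] for w in text.lower().split() if w in SHARED_LEXICON}
    return [1.0 if f in present else 0.0 for f in FEATURES]


def train(examples, epochs=500, lr=0.1):
    """Fit logistic regression with plain SGD on (text, label) pairs."""
    weights = [0.0] * len(FEATURES)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            x = encode(text)
            z = bias + sum(w * xi for w, xi in zip(weights, x))
            error = label - 1.0 / (1.0 + math.exp(-z))
            weights = [w + lr * error * xi for w, xi in zip(weights, x)]
            bias += lr * error
    return weights, bias


def predict(model, text):
    """Return 1 (offensive) or 0 (not offensive)."""
    weights, bias = model
    z = bias + sum(w * xi for w, xi in zip(weights, encode(text)))
    return 1 if z > 0 else 0


# Stage 1: learn from labelled data in the high-resource source language.
english = [
    ("you are stupid", 1),
    ("i hate you", 1),
    ("thanks my friend", 0),
    ("you are my friend", 0),
]
model = train(english)

# Stage 2: apply the same model, unchanged, to the low-resource target language.
danish = ["du er dum", "tak min ven", "jeg hader dig"]
print([predict(model, s) for s in danish])  # → [1, 0, 1]
```

The transfer works only because both languages land in the same feature space; in the setting the abstract describes, multilingual BERT's shared representations would play that role instead of a hand-written lexicon.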
Keywords: natural language processing; offensive speech detection; social media.
DOI: 10.1504/IJSNET.2024.142516
International Journal of Sensor Networks, 2024 Vol.46 No.3, pp.186 - 196
Received: 07 Apr 2024
Accepted: 18 Apr 2024
Published online: 05 Nov 2024