Title: Harnessing the power of Hugging Face's multilingual transformers: unravelling the code-mixed named entity recognition enigma
Authors: Rejuwan Shamim; Asadullah Shaikh
Addresses: Department of Computer Science and Engineering with Data Science, Maharishi University of Information Technology, Noida, India; Department of Information Systems, College of Computer Science and Information Systems, Najran University, Najran 61441, Saudi Arabia
Abstract: Named entity recognition (NER) in code-mixed documents, which combine more than one language within the same text, is a hard problem in natural language processing. In this paper, we use Hugging Face's multilingual transformers to develop an approach to code-mixed NER that addresses the difficulties of recognising named entities across languages in a single text. We ran thorough experiments by fine-tuning a multilingual transformer model on a code-mixed dataset. The model achieved an F1-score of 0.85, outperforming previous methods and showing that it can identify named entities accurately. We also examine how well the model works with other language pairs and code-mixed patterns, which shows how it handles different linguistic settings. Our study contributes to the understanding of multilingual data handling, improves code-mixed NER techniques, and shows how multilingual transformers can help break down language barriers. The research has implications for areas that need to understand more than one language, such as social media analysis, language-specific customer service systems, and cross-lingual information retrieval. In these fields, speakers of different languages can communicate more easily and effectively, which encourages inclusion.
Keywords: named entity recognition; NER; code-mixed texts; Hugging Face's multilingual transformers; fine-tuning; evaluation metrics; cross-lingual knowledge transfer.
DOI: 10.1504/IJIEI.2024.140171
International Journal of Intelligent Engineering Informatics, 2024 Vol.12 No.3, pp.353 - 376
Received: 30 Aug 2023
Accepted: 28 Nov 2023
Published online: 26 Jul 2024