Authors: Gaurav Yadav; Santosh Singh Rathore; Debanjan Sadhya
Addresses: Department of IT, ABV-IIITM, Gwalior, India; Department of IT, ABV-IIITM, Gwalior, India; Department of IT, ABV-IIITM, Gwalior, India
Abstract: Named entity recognition (NER) is the process of assigning each entity in a text to a pre-defined category, such as PE for the name of a person, LOC for the name of a location, or ORG for the name of an organisation. NER is an important step in text mining whenever textual information must be searched. Each information domain has a different set of entities, which requires the development of a domain-dependent NER system. This paper presents an NER approach for entity identification in weather-domain texts in the Hindi language. The presented approach has two stages. In the first stage, we collect weather data by crawling Hindi weather forecasting websites. In the second stage, we apply a machine learning algorithm with a vector representation of the Hindi language to the collected data to train a model. The model is then used to classify entities in unseen weather text data. Experimental results showed that the presented approach produced an improved result for NER on the weather dataset used.
Keywords: NLP; named entity recognition; NER; regional language; support vector machine; SVM; Hindi language; continuous bag of words; CBOW; word embeddings.
International Journal of Swarm Intelligence, 2021 Vol.6 No.2, pp.178 - 187
Received: 28 Jun 2020
Accepted: 27 Nov 2020
Published online: 29 Oct 2021