Mining multilingual and multiscript Twitter data: unleashing the language and script barrier
by Bidhan Sarkar; Nilanjan Sinhababu; Manob Roy; Pijush Kanti Dutta Pramanik; Prasenjit Choudhury
International Journal of Business Intelligence and Data Mining (IJBIDM), Vol. 16, No. 1, 2020

Abstract: Micro-blogging sites like Twitter have become opinion hubs where views on diverse topics are expressed. Interpreting, comprehending and analysing this emotion-rich information can unearth many valuable insights. The task is comparatively straightforward when the tweets are in English. Lately, however, the increasing use of native languages for communication has posed a major challenge for social media mining. Matters become more complicated when people use Roman script to write non-English languages. India, with its wide diversity of languages and scripts, faces this problem acutely. We have developed a system that automatically identifies and classifies native tweets, irrespective of the script used. By converting all tweets to English, we eliminate the 'script vs language' problem. The proposed approach consists of three stages: script identification, language analysis, and clustered mining. Considering English and the top two Indian languages, we found that the proposed framework achieves higher precision than existing approaches.
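The abstract names script identification as the first stage but does not say how it is performed. The following is a minimal sketch, assuming a simple Unicode-block heuristic that may differ from the authors' method. It labels each tweet with its dominant writing script. The block table, the labels, and the function names `char_script` and `identify_script` are hypothetical illustrations. Devanagari and Bengali appear only as example Indic scripts, since the abstract does not name the languages studied.

```python
from collections import Counter
from typing import Optional

# Hypothetical table of Unicode blocks to track (assumption: example Indic scripts only).
SCRIPT_BLOCKS = {
    "Devanagari": (0x0900, 0x097F),
    "Bengali": (0x0980, 0x09FF),
}


def char_script(ch: str) -> Optional[str]:
    """Return a script label for a character, or None if it carries no script signal."""
    code = ord(ch)
    for script, (lo, hi) in SCRIPT_BLOCKS.items():
        if lo <= code <= hi:
            return script
    if ch.isascii() and ch.isalpha():
        return "Latin"  # Roman script, whatever language it encodes
    if ch.isalpha():
        return "Other"  # letters from scripts not listed in SCRIPT_BLOCKS
    return None  # digits, punctuation, emoji, whitespace


def identify_script(tweet: str) -> str:
    """Label a tweet with its most frequent script, ignoring mentions, hashtags and URLs."""
    counts: Counter = Counter()
    for token in tweet.split():
        if token.startswith(("@", "#", "http")):
            continue  # metadata tokens say little about the writing script
        for ch in token:
            script = char_script(ch)
            if script is not None:
                counts[script] += 1
    if not counts:
        return "Unknown"
    return counts.most_common(1)[0][0]


if __name__ == "__main__":
    for text in ["আমি ভাত খাই", "मैं घर जा रहा हूँ", "main ghar ja raha hoon"]:
        print(text, "->", identify_script(text))
```

The third example, a native-language sentence written in Roman script, is labelled "Latin". This illustrates the 'script vs language' problem the abstract describes: script identification alone cannot recover the language, so a subsequent language-analysis stage is still needed.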

Online publication date: Mon, 02-Dec-2019
