Title: Demographical gender prediction of Twitter users using big data analytics: an application of decision marketing
Authors: Sudipta Roy; Bhavya Patel; Debnath Bhattacharyya; Kushal Dhayal; Tai-Hoon Kim; Mamta Mittal
Addresses: PRT2L, Washington University in St. Louis, Saint Louis, Missouri 63110, USA ' Department of Computer Science and Engineering, Ganpat University, Mehsana 384012, Gujarat, India; Evolutionary System Pvt. Ltd., Kataria Automobiles Rd., Ahmedabad, Gujarat 380051, India ' Department of Computer Science and Engineering, K L Deemed to be University, KLEF, Guntur - 522502, India ' Department of Computer Science and Engineering, Ganpat University, Mehsana 384012, Gujarat, India ' School of Economics and Management, Beijing Jiaotong University, Beijing 100044, China ' Department of Computer Science & Engineering, G.B. Pant Government Engineering College, New Delhi, India
Abstract: Twitter text is difficult to analyse because the data are non-standard and unstructured. Unlike other popular social media platforms, Twitter does not collect user gender information. Predicting demographic features and additional informative content is important for advertising, custom-made marketing and authorised investigation on social media. The proposed statistical method, which performs real-time analysis using big data technologies, is able to predict the gender of Twitter users. Gender prediction is performed using the naive Bayes classifier to address systemic issues, and Apache Hive is used to handle data cleaning, storage and processing. With pre-processing, the proposed method is fast, easy to implement and close to state-of-the-art document text categorisation methods that use big data technologies.
Keywords: Twitter; naive Bayes; gender classification; perceptron; logistic regression; Apache Hive.
International Journal of Reasoning-based Intelligent Systems, 2021 Vol.13 No.2, pp.41 - 49
Received: 21 May 2019
Accepted: 27 Aug 2019
Published online: 30 Mar 2021