Title: A hybrid ensemble machine learning model to predict success of Bollywood movies

Authors: Garima Verma; Hemraj Verma; Sushil Kumar Dixit

Addresses: School of Computing, DIT University, Dehradun, Uttarakhand, India; Faculty of Management Studies, DIT University, Dehradun, Uttarakhand, India; School of Management, Lal Bahadur Shastri Institute of Management, New Delhi, India

Abstract: Bollywood is a multi-billion industry. Hundreds of films are released every year, and each film is a multi-crore investment. Through awards and marketing, the industry has found a place in almost every country and culture. It also attracts skilled and passionate people to become entrepreneurs. The director, producer and all other stakeholders in a film therefore need to know its chances of success at the box office before its release. To address this concern, a hybrid ensemble machine learning model is proposed. The model uses data sets collected from various sources, such as Boxofficeindia, cinemalytics and YouTube. Pre-processing of the data set included replacing missing values with the mean, cleaning the data and removing text values. Feature engineering was applied to create a new feature, act_direct, to make the model more robust. The effectiveness of the model was tested in terms of accuracy and the AUC-ROC curve. The experimental results show that the proposed model achieves relatively better accuracy than some recent state-of-the-art models.
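The steps named in the abstract (mean imputation, removal of text values, the act_direct feature, and evaluation by accuracy and AUC-ROC) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the toy records, the column names, the definition of act_direct as the product of an actor score and a director score, and the plain score average that stands in for the hybrid ensemble, whose composition the abstract does not describe.

```python
from statistics import mean

# Hypothetical toy records; column names and values are illustrative only.
films = [
    {"title": "A", "budget": 40.0, "actor_score": 0.8, "director_score": 0.6, "hit": 1},
    {"title": "B", "budget": None, "actor_score": 0.4, "director_score": 0.5, "hit": 0},
    {"title": "C", "budget": 35.0, "actor_score": None, "director_score": 0.9, "hit": 1},
    {"title": "D", "budget": 15.0, "actor_score": 0.3, "director_score": 0.2, "hit": 0},
]


def impute_mean(rows, column):
    """Replace missing (None) values in `column` with the mean of the observed values."""
    fill = mean(r[column] for r in rows if r[column] is not None)
    return [{**r, column: fill if r[column] is None else r[column]} for r in rows]


def drop_text_columns(rows):
    """Keep only numeric columns, mirroring the removal of text values."""
    return [{k: v for k, v in r.items() if isinstance(v, (int, float))} for r in rows]


def add_act_direct(rows):
    """Assumed definition (not from the paper): actor score times director score."""
    return [{**r, "act_direct": r["actor_score"] * r["director_score"]} for r in rows]


def ensemble_scores(rows):
    """Stand-in ensemble: average two simple base scores per film."""
    top_budget = max(r["budget"] for r in rows)
    return [(r["act_direct"] + r["budget"] / top_budget) / 2 for r in rows]


def accuracy(y_true, scores, threshold=0.5):
    """Share of films whose thresholded score matches the label."""
    hits = sum(int(s >= threshold) == y for y, s in zip(y_true, scores))
    return hits / len(y_true)


def roc_auc(y_true, scores):
    """AUC as the chance that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rows = films
for column in ("budget", "actor_score"):
    rows = impute_mean(rows, column)
clean = add_act_direct(drop_text_columns(rows))

labels = [r["hit"] for r in clean]
scores = ensemble_scores(clean)
acc = accuracy(labels, scores)
auc = roc_auc(labels, scores)
print(f"accuracy={acc:.2f} auc={auc:.2f}")
```

The four toy films are trivially separable, so the printed metrics say nothing about the published results; the sketch only shows how the pipeline stages fit together.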

Keywords: Bollywood; entrepreneurs; ensemble; machine learning; feature engineering.

DOI: 10.1504/WREMSD.2021.114439

World Review of Entrepreneurship, Management and Sustainable Development, 2021 Vol.17 No.2/3, pp.343 - 357

Received: 25 May 2019
Accepted: 31 Mar 2020

Published online: 15 Apr 2021
