Authors: Ranjan Kumar Dash
Addresses: Department of Information Technology, College of Engineering and Technology, Bhubaneswar, Odisha, India
Abstract: The huge volume and variety of data stored in big data systems provide a more accurate predictive platform for users. However, decision-making becomes a tedious task because accessing the data requires considerable computational time and memory. One solution to this problem is data scoring, which selects only those variables or features that affect the decision-making process to a greater extent. To meet the need for an efficient data scoring model, this paper proposes a new data scoring model for big data. The proposed model uses adaptive LASSO as the statistical method. The steps involved in the design of the proposed model are outlined and explained. The model is trained and tested using k-fold cross-validation, and its performance is measured using the ROC curve. The model is simulated in R and applied to three distinct datasets. For comparison, LASSO is also applied to these datasets. The simulation results show that adaptive LASSO performs better than LASSO on large datasets.
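For orientation, the two penalties compared in the abstract can be written in their standard textbook form. This is a sketch under stated assumptions, not the paper's exact formulation: the loss \ell (for example squared error or a logistic negative log-likelihood), the initial estimate \tilde{\beta}, and the exponent \gamma are not specified in the abstract and are introduced here only for illustration.

```latex
% LASSO: one penalty level \lambda shared by all p coefficients
\[
\hat{\beta}^{\mathrm{lasso}}
  = \arg\min_{\beta} \; \ell(\beta) + \lambda \sum_{j=1}^{p} |\beta_j|
\]

% Adaptive LASSO: coefficient-specific weights w_j built from an
% initial estimate \tilde{\beta} (assumed here, e.g. least squares or ridge)
\[
\hat{\beta}^{\mathrm{alasso}}
  = \arg\min_{\beta} \; \ell(\beta) + \lambda \sum_{j=1}^{p} w_j \, |\beta_j|,
\qquad
w_j = \frac{1}{|\tilde{\beta}_j|^{\gamma}}, \quad \gamma > 0
\]
```

In this form, a feature with a small initial coefficient receives a large weight and is more readily shrunk to zero, which is what makes the method suitable for selecting features. The penalty level \lambda would typically be chosen by cross-validation.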
Keywords: big data; regression analysis; data scoring; receiver operating characteristic curves; discriminant analysis; decision tree; support vector machine; random forest; intelligent system.
International Journal of Intelligent Enterprise, 2020 Vol.7 No.1/2/3, pp.356 - 371
Received: 28 Nov 2018
Accepted: 08 Apr 2019
Published online: 27 Jan 2020