Title: A machine learning approach using random forest and LASSO to predict wine quality
Authors: Ioannis Athanasiadis; Dimitrios Ioannides
Addresses: University of Macedonia, Egnatia Str. 156, Thessaloniki 54636, Greece; University of Macedonia, Egnatia Str. 156, Thessaloniki 54636, Greece
Abstract: Quality assessment is a key factor for the wine industry, where the aim is to meet consumers' demands and promote sales. It is usually performed by experts, which makes it a time-consuming and expensive process. This paper proposes an alternative assessment that uses machine learning methods, namely the least absolute shrinkage and selection operator (LASSO) and random forest, to predict wine quality. Our data analysis is based on a real wine dataset provided by a well-known wine firm in Greece. We employ the LASSO method, which is particularly effective in selecting the variables required. Additionally, the random forest method is used and its findings are compared, under ten-fold cross-validation, with those of four other machine learning methods: linear discriminant analysis (LDA), classification and regression trees (CART), k-nearest neighbours (kNN) and support vector machines (SVM). The results of our analysis show that the proposed random forest technique improves the accuracy of wine quality prediction, up to almost 95%, compared to the rankings attributed by wine tasters.
Keywords: random forest; least absolute shrinkage and selection operator; LASSO; machine learning; physicochemical properties; wine quality; prediction; linear discriminant analysis; LDA; classification and regression trees; CART; k-nearest neighbours; kNN; support vector machines; SVM.
International Journal of Sustainable Agricultural Management and Informatics, 2021 Vol.7 No.3, pp.232 - 251
Received: 15 Feb 2021
Accepted: 24 May 2021
Published online: 12 Oct 2021
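
The ten-fold cross-validation named in the abstract can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: the data are synthetic Gaussian samples standing in for physicochemical measurements, a plain kNN classifier (one of the compared methods) stands in for any of the models, and every function name and parameter value below is hypothetical.

```python
import random
from collections import Counter


def make_synthetic_samples(n_per_class=60, n_features=4, seed=0):
    """Two Gaussian classes: a hypothetical stand-in, not the paper's wine data."""
    rng = random.Random(seed)
    samples = []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            features = [rng.gauss(centre, 1.0) for _ in range(n_features)]
            samples.append((features, label))
    rng.shuffle(samples)
    return samples


def knn_predict(train, query, k=5):
    """Majority vote among the k training points nearest to the query."""
    nearest = sorted(
        train,
        key=lambda s: sum((a - b) ** 2 for a, b in zip(s[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def ten_fold_accuracy(samples, k_neighbours=5, n_folds=10):
    """Hold out each fold once, train on the remaining folds, average accuracy."""
    folds = [samples[i::n_folds] for i in range(n_folds)]
    fold_accuracies = []
    for i, test_fold in enumerate(folds):
        train = [s for j, fold in enumerate(folds) if j != i for s in fold]
        correct = sum(
            knn_predict(train, x, k_neighbours) == y for x, y in test_fold
        )
        fold_accuracies.append(correct / len(test_fold))
    return sum(fold_accuracies) / n_folds, fold_accuracies


samples = make_synthetic_samples()
mean_accuracy, fold_accuracies = ten_fold_accuracy(samples)
print(f"mean ten-fold accuracy: {mean_accuracy:.3f}")
```

Each sample is used for testing exactly once and for training nine times, so the averaged accuracy is less sensitive to one lucky or unlucky split than a single hold-out estimate. Comparing several classifiers on the same folds, as the abstract describes, would amount to swapping `knn_predict` for another model inside the loop.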