Title: An empirical study of alternating least squares collaborative filtering recommendation for Movielens on Apache Hadoop and Spark
Authors: Jung-Bin Li; Szu-Yin Lin; Yu-Hsiang Hsu; Ying-Chu Huang
Addresses: Department of Statistics and Information Science, Fu Jen Catholic University, New Taipei City, Taiwan; Department of Computer Science and Information Engineering, National Ilan University, Yilan County, Taiwan; Atrust Fintech Corp., Taipei, Taiwan; Smart Channel & Logistics Department, Industrial Technology Research Institute, New Taipei City, Taiwan
Abstract: In recent years, both consumers and businesses have faced information overload, and recommendation systems offer a possible solution. This study implements a movie recommendation system intended to increase consumer spending while reducing the time consumers spend selecting films. The system is a prototype collaborative filtering recommender based on the Alternating Least Squares (ALS) algorithm. An advantage of collaborative filtering is that it avoids possible violations of the Personal Data Protection Act and reduces the likelihood of errors caused by poor-quality personal data. Our research addresses the limited scalability of ALS by using a platform that combines Spark with Hadoop YARN, using this combination to compute movie recommendations and store data separately. The results show that the proposed system architecture provides recommendations with satisfactory accuracy while maintaining acceptable computation time with limited resources.
Keywords: recommendation system; alternating least squares; collaborative filtering; Movielens; Hadoop; Spark; content-based filtering.
DOI: 10.1504/IJGUC.2020.110053
International Journal of Grid and Utility Computing, 2020, Vol. 11, No. 5, pp. 674-682
Received: 12 Apr 2019
Accepted: 12 Aug 2019
Published online: 02 Oct 2020