Title: Big data processing with Apache Spark in university institutions: spark streaming and machine learning algorithm
Authors: Emmanuel Boachie; Chunlin Li
Addresses: (Emmanuel Boachie) School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; Department of Computer Science, Kumasi Technical University, Box 854, Kumasi, Ghana. (Chunlin Li) School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China
Abstract: Data processing is an effective tool for the educational sector, where it can improve admission selection procedures and decisions. Most research papers focus on the computational and theoretical aspects of education, while little effort has been put into the technological aspect of applying data mining techniques to the student admission process. We therefore design a simple Spark Streaming framework together with a machine learning algorithm to guide admission processing. We implement the Spark Streaming model and the proposed machine learning algorithm in a selected university using its admissions data. The focus is on determining the number of students who can be admitted and those who should be rejected, in order to reduce time and cost. The case study we evaluated shows the practical usefulness of Spark Streaming and the machine learning algorithm for real-time data processing that reduces time and cost. The experimental results also confirm that Spark Streaming and the machine learning algorithm provide a meaningful graphical interpretation of the data used to select students for admission.
Keywords: Spark Streaming; big data processing; machine learning algorithm.
International Journal of Continuing Engineering Education and Life-Long Learning, 2019 Vol.29 No.1/2, pp.5 - 20
Received: 22 Jun 2018
Accepted: 18 Sep 2018
Published online: 16 Apr 2019