Title: Big data processing with Apache Spark in university institutions: spark streaming and machine learning algorithm

Authors: Emmanuel Boachie; Chunlin Li

Addresses: School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China; Department of Computer Science, Kumasi Technical University, Box 854, Kumasi, Ghana / School of Computer Science and Technology, Wuhan University of Technology, Wuhan 430063, China

Abstract: Data processing is an effective tool for the educational sector and can improve admission selection procedures and decisions. Most research papers focus on the computational and theoretical aspects of education, while little effort has been devoted to the technological aspect of applying data mining techniques to the student admission process. We therefore design a simple Spark Streaming framework, together with a machine learning algorithm, to guide admission processing. We implement the Spark Streaming model and the proposed machine learning algorithm at a selected university using its admissions data. The focus is on determining the number of students who can be admitted and those who should be rejected, in order to reduce time and cost. The case study we evaluated shows the practical usefulness of Spark Streaming and the machine learning algorithm for processing data in real time to reduce time and cost. The experimental results also confirm that Spark Streaming and the machine learning algorithm yield meaningful graphical interpretations of the data for selecting students for admission.

Keywords: spark streaming; big data processing; machine learning algorithm.

DOI: 10.1504/IJCEELL.2019.099217

International Journal of Continuing Engineering Education and Life-Long Learning, 2019, Vol. 29, No. 1/2, pp. 5–20

Available online: 16 Apr 2019
