Authors: Alhamza Munther; Ali Abdulrazzaq; Mosleh M. Abualhaj; Ghada Almukhaini
Addresses: IT Department, Sur College of Applied Science, Ministry of Higher Education, Sur, Sultanate of Oman; Department of Computer Science, College of Education for Pure Science, University of Mosul, Mosul, Iraq; Networks and Information Security Department, Faculty of Information Technology, Al-Ahliyya Amman University, Amman, Jordan; IT Department, Sur College of Applied Science, Ministry of Higher Education, Sur, Sultanate of Oman
Abstract: Application-level traffic classification is an essential requirement for stable network operation and resource management. However, classification tends to run short of resources when high volumes of traffic are classified in real time in high-speed networks. Memory consumption is considered a serious issue during classification. In this paper, a data reduction (DR) method is proposed to decrease redundant data entries during the preprocessing phase with regard to classification accuracy. The proposed active build-model random forest (ABRF) eliminates redundant data entries by utilising a feature selection algorithm during the preprocessing phase. The proposed system successfully reduces the memory space of the entire classification process. The system is evaluated by comparing it against four classifiers (RF, NB, SVM and C5.0) and four feature selection techniques (FCBF, SFE, Chi2 and GR). DR yielded excellent results with the NB, C5.0 and RF classifiers. The results improved because 314,216 out of 774,013 data entries were excluded. Moreover, C5.0 consumed less memory space due to the decreased depth of the C5.0 tree model. In conclusion, DR was most effective on the RF model owing to its nature as an ensemble classifier.
Keywords: internet traffic classification; machine learning; feature selection techniques; supervised learning; random forest.
International Journal of Networking and Virtual Organisations, 2021 Vol.24 No.2, pp.144 - 160
Received: 04 Jan 2020
Accepted: 31 Mar 2020
Published online: 21 Apr 2021