Title: Network traffic performance analysis from passive measurements using gradient boosting machine learning

Authors: Astha Syal; Alina Lazar; Jinoh Kim; Alex Sim; Kesheng Wu

Addresses: Department of Computer Science and Information Systems, Youngstown State University, One University Plaza, Youngstown, OH, USA; Department of Computer Science and Information Systems, Youngstown State University, One University Plaza, Youngstown, OH, USA; Department of Computer Science and Information Systems, Texas A&M University-Commerce, Commerce, TX, USA; Department of Data Science and Technology, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA, USA; Department of Data Science and Technology, Lawrence Berkeley National Laboratory, One Cyclotron Road, Berkeley, CA, USA

Abstract: Effective monitoring and analysis of network traffic are vital for scientific computing, since scientific applications often require moving massive amounts of data from one site to another. A body of statistical and machine learning techniques has been introduced for network traffic monitoring and analysis, but the task remains highly challenging for several reasons, such as the unavailability of label information, the complexity of real-time analysis, and the generalisation properties of machine learning models. In this paper, we present a novel method to identify continuous time windows of low throughput for the purpose of network performance analysis and anomaly detection, in order to facilitate data transfers for high-performance scientific computing. The presented method is based on supervised learning techniques with an adaptive labelling function that automatically determines whether a time window is 'slow' or 'normal'. The presented method is validated on large real-world datasets collected from several data transfer nodes (DTNs) located in Science DMZ. Our experimental results show that the proposed method is able to quickly predict time windows of low-performing network transfers that require attention from network engineers, and also to identify the most important features for the classification.

Keywords: network traffic; TCP performance; UMAP; classification; Tstat.

DOI: 10.1504/IJBDI.2021.118741

International Journal of Big Data Intelligence, 2021 Vol.8 No.1, pp.13 - 30

Received: 10 Jan 2020
Accepted: 06 Sep 2020

Published online: 25 Oct 2021
