Title: Composition analysis of the Bot-IoT dataset

Authors: Jared M. Peterson; Taghi M. Khoshgoftaar; Joffrey L. Leevy

Addresses: Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA; Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA; Department of Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA

Abstract: As machine learning continues to be a promising tool for cyber security, industry and researchers have continued to develop datasets for research. These datasets often contain multiple emulated exemplars of common attacks seen in real-world networks. The datasets provide researchers with the necessary samples to train and test the detection capabilities of their machine learning models. This paper contains an in-depth analysis of the composition of one of the newest datasets, Bot-IoT. The full dataset contains about 73 million instances (big data), three dependent features, 26 independent features, and four primary attack categories. The purpose of this paper is to provide researchers with an understanding of the environment used to create Bot-IoT and how that environment affected its composition. A detailed analysis of the dataset's composition can provide additional insight into the dataset's suitability for machine learning.

Keywords: Bot-IoT; machine learning; destination port; internet of things; IoT; intrusion detection; denial of service; DoS; distributed denial of service; DDoS; information theft; reconnaissance.

DOI: 10.1504/IJITCA.2022.124371

International Journal of Internet of Things and Cyber-Assurance, 2022 Vol.2 No.1, pp.31 - 44

Received: 22 Feb 2022
Accepted: 09 Mar 2022

Published online: 25 Jul 2022
