Title: Statistical machine learning for network intrusion detection: a data quality perspective

Authors: Eitel J.M. Lauria, Giri Kumar Tayi

Addresses: Marist College, Poughkeepsie, NY, USA; University at Albany, SUNY Albany, NY, USA

Abstract: In this paper, we present our research on applying statistical machine learning methods to network intrusion detection. With the advent of online distributed services, the issue of preventing network intrusion and other forms of information security failures is gaining prominence. In this work, we use two classification algorithms (decision trees and the naive Bayes classifier) to build predictive models capable of distinguishing between 'bad' TCP/IP connections, called intrusions or attacks, and 'good' normal TCP/IP connections. We investigate the effect of training the models using both clean and dirty data. The goal is to analyse the predictive power of network intrusion classification models trained with data of varying quality. The classifiers are also compared against a clustering-based approach.

Keywords: data mining; data quality; network intrusion detection; machine learning; classification; distributed service infrastructure; information security; security failures; predictive modelling; intrusions attacks; clustering.

DOI: 10.1504/IJSSCI.2008.019611

International Journal of Services Sciences, 2008 Vol.1 No.2, pp.179 - 195

Published online: 17 Jul 2008
