Abstract

Machine learning has been steadily gaining traction for use in Anomaly-based Network Intrusion Detection Systems (A-NIDS). Research in this domain is frequently performed using the KDD CUP 99 dataset as a benchmark. Several studies question its usability for constructing a contemporary NIDS, owing to its skewed response distribution, non-stationarity, and failure to incorporate modern attacks. In this paper, we compare the performance of KDD-99 alternatives when trained using classification models commonly found in the literature: Neural Network, Support Vector Machine, Decision Tree, Random Forest, Naive Bayes, and K-Means. Applying the SMOTE oversampling technique and random undersampling, we create a balanced version of NSL-KDD and show that the skewed target classes in KDD-99 and NSL-KDD hamper the efficacy of classifiers on minority classes (U2R and R2L), leading to possible security risks. We explore UNSW-NB15, a modern substitute for KDD-99 with a more uniform pattern distribution, and benchmark this dataset before and after SMOTE oversampling to observe the effect on minority-class performance. Our results indicate that classifiers trained on UNSW-NB15 match or exceed the Weighted F1-Score of those trained on NSL-KDD and KDD-99 in the binary case, thus advocating UNSW-NB15 as a modern substitute for these datasets.
Keywords—KDD-99, NSL-KDD, Network Intrusion Detection, Benchmarking, SMOTE, UNSW-NB15.

I. INTRODUCTION
Network security is an ever-evolving discipline in which new types of attacks manifest and must be mitigated on a daily basis. This has led to the development of software that aids the identification of security breaches from traffic packets in real time: Network Intrusion Detection Systems (NIDS). These can be further categorized into misuse-based (M-NIDS) and anomaly-based (A-NIDS) systems. An M-NIDS detects intrusions by exactly matching network traffic against known attack signatures. An A-NIDS protects resources on computer networks by differentiating ordinary from suspicious traffic patterns, including patterns it has not previously encountered. Typically, A-NIDS systems are statistical-based, knowledge-based, or machine-learning-based, with each category facing inherent drawbacks. Statistical techniques are susceptible to being gradually taught a deceptive version of normalcy and rely on the quasi-stationary process assumption. Knowledge-based expert systems employing predicates, state-machine analysis, and specifications suffer from scalability problems: as the number of attack vectors grows in tandem with the rise in volume and variety of network traffic, it becomes exceedingly difficult to construct an omniscient set of rules. The state of the art thus explores systems that automatically learn to distinguish benign usage patterns from malicious ones using a variety of machine learning techniques.

In a machine learning approach, a classifier (or model) is trained using a machine learning algorithm on a dataset of normal and abnormal traffic patterns. A trained model may subsequently be employed to flag suspicious traffic in real time. Such a dataset typically describes each pattern by several features and an associated target class, which denotes whether the pattern corresponds to normal or abnormal usage. Further training on fresh examples allows the model to adapt to the current network state. While systems adopting such an approach are not without fault, their preeminent advantages include the ability to gain knowledge without explicit programming and adaptability to dynamic traffic patterns.

The selection of a training dataset is integral to the security of a modern A-NIDS built on machine learning techniques. In the ideal case, such datasets would be specific to each network deployment; however, a lack of alternatives has led several works to focus on the KDD CUP 99 dataset as a popular benchmark for classifier accuracy. Unfortunately, KDD-99 suffers from several weaknesses that discourage its use in the modern context, including its age, highly skewed targets, non-stationarity between the training and test sets, pattern redundancy, and irrelevant features. Researchers have devised several countermeasures that mitigate these flaws. An important effort by Tavallaee et al. was to introduce NSL-KDD, a more balanced resampling of KDD-99 in which emphasis is placed on examples that are likely to be missed by classifiers trained on the basic KDD-99.

A significant trend that has been observed is the poor performance of classifiers on the minority classes of KDD-99, an obstacle which NSL-KDD is unable to eliminate. We argue that this is due to the extreme skewness of the dataset with respect to its target class distribution. In this work, we empirically show that NSL-SMOTE, a balanced dataset constructed using oversampling and undersampling, yields significant performance improvements over NSL-KDD on the same minority classes.

The age of the dataset is another major drawback. At the time of this writing, KDD-99 is nearly two decades old and proves insufficient and archaic in the modern security context. We believe that the inertia of extant research using KDD-99 as the primary dataset makes authors reluctant to "start from scratch" on another one when constructing complex algorithms. Hence, a major contribution of this work is to benchmark a modern and balanced dataset, UNSW-NB15, on standard machine learning classifiers widely used in academia. Consequently, we observe the class-wise and Weighted F1-Scores of Neural Network, Support Vector Machine, Decision Tree, Random Forest, and K-Means classifiers trained on UNSW-NB15. We then contrast its performance with that of NB15-SMOTE, an oversampling of UNSW-NB15's minority classes. Finally, binarization of targets eliminates class imbalance and allows a direct comparison of the datasets. Extant research conducted on KDD-99 thus benefits from our demonstration that basic machine learning techniques remain suitable on a more rounded dataset.

The structure of this paper is as follows: Section II briefly describes the characteristics of KDD-99, NSL-KDD, and UNSW-NB15. In Section III we present the machine learning pipeline that we apply to train classification models on each dataset; this section also discusses SMOTE oversampling, which is used to create the NSL-SMOTE and NB15-SMOTE datasets. Section IV showcases and analyzes the results of the different classifiers on these datasets. We conclude with our inferences in Section V.
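The balancing strategy described above, oversampling minority classes with SMOTE while randomly undersampling the majority class, can be sketched as follows. This is a minimal, self-contained illustration on synthetic two-feature records; the record counts, class labels, and feature values are purely illustrative and are not the actual pipeline or data used in this paper.

```python
import random

def smote(minority, n_new, k=5, seed=0):
    """Minimal SMOTE sketch: create n_new synthetic points, each an
    interpolation between a random minority sample and one of its
    k nearest minority-class neighbours (squared Euclidean distance)."""
    rng = random.Random(seed)

    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: dist2(base, p))[:k]
        other = rng.choice(neighbours)
        t = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + t * (o - b) for b, o in zip(base, other)))
    return synthetic

def random_undersample(majority, n_keep, seed=0):
    """Random undersampling: keep n_keep majority samples, drop the rest."""
    return random.Random(seed).sample(majority, n_keep)

# Toy imbalance: 100 "normal" records vs. 6 "R2L-like" records,
# each with two numeric features (illustrative stand-ins for the
# much wider KDD-99 / UNSW-NB15 feature vectors).
rng = random.Random(1)
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(100)]
r2l = [(rng.gauss(4, 0.5), rng.gauss(4, 0.5)) for _ in range(6)]

# Oversample the minority to 50 and undersample the majority to 50,
# producing a balanced 50/50 training set.
r2l_balanced = r2l + smote(r2l, n_new=50 - len(r2l))
normal_balanced = random_undersample(normal, 50)
```

In practice, a library implementation (e.g., imbalanced-learn's SMOTE and random undersampler) would be applied to the full feature vectors of each dataset; the core interpolation idea, however, is the one sketched here.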
