ABSTRACT

Network traffic classification aims to identify the application that generated each network flow by analyzing the data packets received. Network traffic is simply the data moving across a particular network. Classification is an essential task for Internet service providers (ISPs), who need to know which types of applications are flowing through their networks, and it is the first step in analyzing and identifying those applications. The widespread use of encryption in network applications has made traffic classification a major challenge for network management.

Traditional approaches to classifying Internet traffic include port-based, payload-based, and machine learning (ML) based techniques. ML-based techniques are the most commonly used today; many researchers have adopted them because they give accurate and effective results. In this project we implement a machine learning approach to classifying encrypted network traffic, building Decision Tree, Random Forest, Naive Bayes, and KNN models. On encrypted data we obtained 99.8% accuracy for Decision Tree, 92% for Random Forest, 99% for Naive Bayes, and 87% for KNN.

Traffic classification is an automated process that categorises computer network traffic into a number of traffic classes according to various parameters (for example, port number or protocol). Traffic classification based on flow measurement enables operators to perform essential network management. Flow accounting methods such as NetFlow are, however, considered inadequate for classification, because classification requires additional packet-level information, host behaviour analysis, and specialized hardware, all of which limit practical adoption.

INTRODUCTION

Data encryption has become the primary means of maintaining the privacy of Internet communications. According to data released in April 2020 by Statista, 63 percent of organizations use the Transport Layer Security (TLS) and Secure Sockets Layer (SSL) cryptographic protocols extensively, and another 23 percent use them partially. The introduction of network traffic encryption has significantly improved communication security and user privacy. When using technologies like TLS, most Internet users assume that third parties cannot gain access to their communications, and companies rest assured that their transactions are safe from interference and eavesdropping. Encryption plays an essential role in data security and privacy, but it also provides cybercriminals with an efficient mechanism for distributing malware. That’s why you need security tools that are capable of inspecting encrypted network traffic. To that end, research and methods are evaluated through the following essential use cases:

  • Application identification;
  • Network analytics;
  • User information identification;
  • Detection of encrypted malware;
  • File/Device/Website/Location fingerprinting;
  • DNS tunnelling detection.

KNN Algorithm:

k-NN is a type of classification where the function is only approximated locally and all computation is deferred until function evaluation. Since this algorithm relies on distance for classification, if the features represent different physical units or come in vastly different scales then normalizing the training data can improve its accuracy dramatically.
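
The effect of feature scales on distance can be illustrated with a minimal sketch. The flow features below (mean packet size in bytes, flow duration in seconds) and their values are hypothetical, chosen only to show how the nearest neighbor can change once each feature is rescaled to the range [0, 1]; min-max scaling is one possible normalization, not necessarily the one used in this project.

```python
import math

def min_max_scale(rows):
    """Rescale every column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]

def nearest_index(rows, query_index):
    """Index of the row closest (Euclidean) to rows[query_index]."""
    query = rows[query_index]
    others = [i for i in range(len(rows)) if i != query_index]
    return min(others, key=lambda i: math.dist(query, rows[i]))

# Hypothetical flows: [mean packet size (bytes), flow duration (s)]
flows = [
    [1500.0, 0.2],  # query flow
    [1400.0, 9.0],  # similar packet size, very different duration
    [1000.0, 0.3],  # smaller packets, almost identical duration
    [200.0, 5.0],
]

raw_nearest = nearest_index(flows, 0)
scaled_nearest = nearest_index(min_max_scale(flows), 0)
print(raw_nearest, scaled_nearest)
```

On the raw values the packet-size column dominates the distance because its numbers are much larger, so the duration hardly matters. After scaling, both features contribute on a comparable footing and a different flow becomes the nearest neighbor.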

Both for classification and regression, a useful technique can be to assign weights to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.
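
The 1/d weighting scheme can be sketched as follows. The neighbor distances and class labels are made up for illustration, and the small `eps` term is an assumption added here to avoid division by zero when a neighbor coincides with the query point.

```python
from collections import Counter, defaultdict

def majority_vote(neighbors):
    """Unweighted vote: every neighbor counts once."""
    counts = Counter(label for _, label in neighbors)
    return counts.most_common(1)[0][0]

def weighted_vote(neighbors, eps=1e-9):
    """Each neighbor contributes a weight of 1/d to its class.

    `eps` is an illustrative guard against d == 0.
    """
    scores = defaultdict(float)
    for distance, label in neighbors:
        scores[label] += 1.0 / (distance + eps)
    return max(scores, key=scores.get)

# Hypothetical (distance, class) pairs for the K = 3 nearest neighbors
neighbors = [(0.5, "video"), (2.0, "web"), (2.5, "web")]

plain_result = majority_vote(neighbors)
weighted_result = weighted_vote(neighbors)
print(plain_result, weighted_result)
```

Here the two distant "web" neighbors win a plain majority vote, but the single close "video" neighbor outweighs them once each vote is scaled by 1/d.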

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

1) K-Nearest Neighbour is one of the simplest Machine Learning algorithms and is based on the Supervised Learning technique.

2) The K-NN algorithm assumes that similar cases belong to the same category, and puts a new case into the category whose available cases it most resembles.

3) The K-NN algorithm stores all the available data and classifies a new data point based on similarity. This means that when new data appears, it can easily be classified into a well-suited category using the K-NN algorithm.

4) The K-NN algorithm can be used for Regression as well as for Classification, but it is mostly used for Classification problems.

5) K-NN is a non-parametric algorithm, which means it does not make any assumption about the underlying data.

6) It is also called a lazy learner algorithm because it does not learn from the training set immediately; instead it stores the dataset and performs an action on it at the time of classification.

7) At the training phase the KNN algorithm just stores the dataset, and when it gets new data, it classifies that data into the category that is most similar to the new data.

Working of KNN Algorithm:

The K-nearest neighbours (KNN) algorithm uses ‘feature similarity’ to predict the values of new data points. In other words, a new data point is assigned a value based on how closely it matches the points in the training set.

Step 1 − To implement any algorithm, we need a dataset. So during the first step of KNN, we must load the training data as well as the test data.

Step 2 − Next, we need to choose the value of K, i.e. the number of nearest data points to consider. K can be any positive integer.

Step 3 − For each point in the test data do the following −

3.1 − Calculate the distance between the test point and each row of the training data using one of the following methods: Euclidean, Manhattan or Hamming distance. The most commonly used method is Euclidean distance.

3.2 − Now, sort the training rows in ascending order of their distance value.

3.3 − Next, choose the top K rows from the sorted array.

3.4 − Now, assign a class to the test point based on the most frequent class among these rows.

Step 4 − End
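
The steps above can be sketched in plain Python. This is a minimal illustration rather than the project's actual implementation: the two-feature training rows, the class labels "web" and "video", and the function names are all hypothetical.

```python
import math
from collections import Counter

def euclidean(p, q):
    return math.dist(p, q)

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def hamming(p, q):
    # Number of positions at which the two rows differ
    return sum(a != b for a, b in zip(p, q))

def knn_predict(train_rows, train_labels, test_point, k=3, distance=euclidean):
    # Step 3.1: distance from the test point to every training row
    scored = [(distance(row, test_point), label)
              for row, label in zip(train_rows, train_labels)]
    # Step 3.2: sort in ascending order of distance
    scored.sort(key=lambda pair: pair[0])
    # Step 3.3: keep the top K rows
    top_k = scored[:k]
    # Step 3.4: assign the most frequent class among those rows
    votes = Counter(label for _, label in top_k)
    return votes.most_common(1)[0][0]

# Step 1: load training and test data (hypothetical, already scaled to [0, 1])
train_rows = [
    [0.10, 0.20], [0.20, 0.10], [0.15, 0.25],
    [0.90, 0.80], [0.80, 0.90], [0.85, 0.75],
]
train_labels = ["web", "web", "web", "video", "video", "video"]
test_rows = [[0.20, 0.20], [0.80, 0.80]]

# Step 2: choose K
K = 3

# Step 3: classify every test point
predictions = [knn_predict(train_rows, train_labels, point, k=K)
               for point in test_rows]
print(predictions)
```

Passing `distance=manhattan` or `distance=hamming` swaps the metric used in step 3.1 without changing the rest of the procedure. Note that there is no training step: all the work happens when `knn_predict` is called, which is why KNN is described as a lazy learner.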
