Intrusion Detection Method Using Protocol Classification and Rough Set Based Support Vector Machine

In order to improve the efficiency of support vector machine (SVM) intrusion detection, we first classify the intrusion data by protocol and then refine its characteristics by rough set reduction. On this basis, we propose an intrusion detection method using protocol classification and a rough set based support vector machine. The method is divided into a training process and a testing process. In training, the training data is first classified by protocol and then refined by rough sets. The reduction rules are stored as pre-defined procedures, the reduced data is used to train a support vector machine, and each trained model is stored according to its protocol. In testing, the data is classified by protocol, the characteristic reduction procedure for that protocol is started, and the decision is made by the support vector machine corresponding to that protocol. The experimental results on the KDDCUP'99 data show that the method is faster than an SVM that uses all characteristics without protocol classification, while the detection accuracy is comparable.


Introduction
The Support Vector Machine (SVM) (Vapnik, 1995; Burges, 1998, P.121-167) is based on structural risk minimization and statistical learning theory. It overcomes shortcomings of conventional methods such as neural networks, including the difficulty of handling small samples, high dimension, overfitting, and local minima. It is therefore a high-performance learning method, and it has been widely applied in intrusion detection, face recognition, voice processing and so on.
Intrusion detection is essentially a classification problem: a model built from training samples is used to classify test samples. However, constructing an intrusion detection model requires learning from thousands of samples, each of which has tens of characteristics. Moreover, the samples are heterogeneous in structure. If all the characteristics are used for intrusion detection, the SVM has to solve a complex quadratic programming problem, which makes the method inefficient.
In fact, dependencies exist among the high-dimensional characteristics. Finding these dependencies and compressing the data to reduce its dimension is therefore significant for shortening the SVM training time and detection time and for choosing the optimal parameters (Mukkamala, Janoski & Sung, 2001, P.1702-1707; Sung, P.405-411; Lin & Cunningham, P.190-198). In (Frohlich, Chapelle & Scholkopf, 2003, P.142-148), a genetic algorithm is adopted to optimize the model and the choice of characteristics. In (Roberto, Guofei & Wenke, ICDM'06), n-grams are used to select host characteristics and construct a combined SVM detector. In (Sung), the SVM weight vector W is used to rank and select the characteristics, and the characteristics with low influence are deleted so as to retain the most effective ones.
These methods have made considerable progress; however, they always extract the characteristics from all the data together. In practice, an intrusion usually exploits the vulnerabilities of a particular protocol, and the intrusion data of each protocol has different characteristics. If different characteristics are used for different protocols, the method becomes more powerful, which helps to improve the learning efficiency of the model.
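Rough sets (Pawlak, 1982, P.341-356) are a frequently used method for extracting characteristics and are efficient at decreasing the dimension of data. In this paper, we combine protocol classification with rough sets to produce an intrusion detection method based on protocol classification and a rough set SVM. By classifying the data by protocol and then reducing it, we obtain the training and detection models. We verify the method using the KDDCUP'99 intrusion data.

Support Vector Machine for Classification
(Vapnik, 1995; Burges, 1998, P.121-167) Suppose the training set is $\{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $x_i$ is a sample vector and $y_i \in \{+1, -1\}$ is its class label. We want to find the optimal separating hyperplane $y = w \cdot x + b$, which is equivalent to solving the convex quadratic programming problem (Burges, 1998):
$$\min_{w, b, \xi}\ \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i \quad \text{s.t.}\quad y_i (w \cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i = 1, \ldots, n \tag{1}$$
where $w$ is the normal vector of the hyperplane, $b$ is the bias, $C$ is the penalty parameter for the case in which the samples cannot be separated completely, and $\xi_i$ is the slack variable that relaxes the constraints. Introducing the Lagrange multipliers $a_i \ge 0$ and $\mu_i \ge 0$ gives
$$L_P = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} a_i \left[ y_i (w \cdot x_i + b) - 1 + \xi_i \right] - \sum_{i=1}^{n} \mu_i \xi_i \tag{2}$$
Setting the partial derivatives of $L_P$ to zero yields
$$\frac{\partial L_P}{\partial w} = 0 \Rightarrow w = \sum_{i=1}^{n} a_i y_i x_i, \qquad \frac{\partial L_P}{\partial b} = 0 \Rightarrow \sum_{i=1}^{n} a_i y_i = 0, \qquad \frac{\partial L_P}{\partial \xi_i} = 0 \Rightarrow C - a_i - \mu_i = 0 \tag{3}$$
In order to obtain $a_i$, we convert the original problem into its dual problem and introduce the kernel function $K(x_i, x_j)$:
$$\max_{a}\ Q_D = \sum_{i=1}^{n} a_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j y_i y_j K(x_i, x_j) \quad \text{s.t.}\quad 0 \le a_i \le C,\ \ \sum_{i=1}^{n} a_i y_i = 0 \tag{4}$$
By solving (4) we obtain $a_i^*$, and substituting it into (3) we have $w^* = \sum_{i=1}^{n} a_i^* y_i x_i$. As the quadratic programming problem satisfies the KKT conditions, we have
$$a_i^* \left[ y_i \left( \sum_{j=1}^{n} a_j^* y_j K(x_j, x_i) + b^* \right) - 1 + \xi_i \right] = 0$$
so that $y_i \left( \sum_{j} a_j^* y_j K(x_j, x_i) + b^* \right) = 1 - \xi_i$ whenever the coefficient $a_i^*$ is larger than 0. Only the samples with $a_i > 0$ affect the value of $Q_D$. Therefore, we call the sample $x_i$ corresponding to $a_i > 0$ a support vector. The decision function is then
$$f(x) = \operatorname{sgn}\left( \sum_{i=1}^{n} a_i^* y_i K(x_i, x) + b^* \right)$$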
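The decision function can be illustrated with a minimal Python sketch. It assumes a Gaussian RBF kernel and uses hand-picked toy support vectors, multipliers and bias (all hypothetical values, not taken from the experiments); in practice these values would come from solving the dual problem.

```python
import math

def rbf_kernel(u, v, sigma2):
    """Gaussian RBF kernel exp(-||u - v||^2 / (2 * sigma2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq_dist / (2.0 * sigma2))

def svm_decision(x, support_vectors, labels, alphas, bias, sigma2):
    """Evaluate sgn(sum_i a_i * y_i * K(x_i, x) + b) over the support vectors."""
    total = sum(a * y * rbf_kernel(sv, x, sigma2)
                for sv, y, a in zip(support_vectors, labels, alphas))
    # A value of exactly zero is mapped to +1 here (an arbitrary convention).
    return 1 if total + bias >= 0 else -1

# Toy model (hypothetical values): one support vector per class.
# The multipliers satisfy sum_i a_i * y_i = 0, as the dual problem requires.
SUPPORT_VECTORS = [(0.0, 0.0), (2.0, 2.0)]
LABELS = [+1, -1]
ALPHAS = [1.0, 1.0]
BIAS = 0.0
SIGMA2 = 1.0

near_first = svm_decision((0.1, 0.1), SUPPORT_VECTORS, LABELS, ALPHAS, BIAS, SIGMA2)
near_second = svm_decision((1.9, 2.1), SUPPORT_VECTORS, LABELS, ALPHAS, BIAS, SIGMA2)
```

Only the support vectors enter the sum, which is why reducing the number of characteristics and samples shortens both training and detection.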

Rough Sets
Rough sets were proposed by Z. Pawlak in 1982. They are a data mining method that can be used to study incomplete data and uncertain knowledge. The basic idea of data reduction using rough set theory can be outlined as follows: find the decision rules from the dependency between the sample attributes and the decision attribute, then judge the importance of each attribute by the degree of its influence on the decision. In this way the unimportant attributes can be removed, which reduces the data characteristics while preserving the classification ability of the data.
Definition 1. An information system is a 4-tuple I = <S, A, V, F>, where S is the nonempty sample set, A is the attribute set, V is the value domain of the attributes, and F is the map that assigns a value from V to every attribute in A for every sample in S.
The training samples carry classification marks; for example, the 42-dimensional intrusion samples of KDDCUP'99 are marked "normal", "abnormal" and so on. These attributes are called decision attributes. By adding the decision attribute to the information system, we obtain the decision table.
Definition 2. The decision table of an information system is a 4-tuple DT = <S, A ∪ {d}, V, F>, where the attributes a ∈ A are called condition attributes and d is the decision attribute.
Definition 5. Decision reduction means seeking, in DT, the smallest attribute set B ⊆ A that preserves the decision indiscernibility relation of A. Although decision reduction is an NP-hard problem, many fast reduction algorithms exist; this topic is beyond the scope of this paper. The decision reduction can be computed from the discernibility matrix of the decision table.
Definition 6. Suppose M is the discernibility matrix constructed from DT. The element $M_{ij}$ at position (i, j) is defined, in its standard form, as $M_{ij} = \{a \in A : a(x_i) \neq a(x_j)\}$ if $d(x_i) \neq d(x_j)$, and $M_{ij} = \emptyset$ otherwise.
By classifying the data by protocol, constructing a decision table for every group of data, and reducing each decision table with a reduction algorithm, we obtain a different reduced data set for each protocol.
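As an illustration of how a discernibility matrix leads to a reduced attribute set, the following Python sketch builds the matrix for a toy decision table and applies a simple greedy covering heuristic. The heuristic is an assumption chosen for brevity; it is not the algorithm used by the Rosetta tool, and it returns a small discerning attribute set that is not guaranteed to be minimal. The toy table and its attribute values are hypothetical.

```python
from itertools import combinations

def discernibility_matrix(samples, decisions):
    """Map each pair (i, j) with different decisions to the set of
    condition-attribute indices on which the two samples differ."""
    n_attrs = len(samples[0])
    entries = {}
    for i, j in combinations(range(len(samples)), 2):
        if decisions[i] != decisions[j]:
            entries[(i, j)] = {a for a in range(n_attrs)
                               if samples[i][a] != samples[j][a]}
    return entries

def greedy_reduct(samples, decisions):
    """Greedy heuristic: repeatedly pick the attribute that appears in
    the most uncovered matrix entries until every entry is covered."""
    remaining = [e for e in discernibility_matrix(samples, decisions).values() if e]
    chosen = []
    while remaining:
        counts = {}
        for entry in remaining:
            for a in entry:
                counts[a] = counts.get(a, 0) + 1
        best = max(sorted(counts), key=counts.get)  # ties go to the lowest index
        chosen.append(best)
        remaining = [e for e in remaining if best not in e]
    return sorted(chosen)

# Toy decision table (hypothetical values): three condition attributes per sample.
SAMPLES = [
    ("tcp", "SF", 0),
    ("tcp", "S0", 0),
    ("tcp", "SF", 1),
    ("tcp", "S0", 1),
]
DECISIONS = ["normal", "abnormal", "normal", "abnormal"]

MATRIX = discernibility_matrix(SAMPLES, DECISIONS)
REDUCT = greedy_reduct(SAMPLES, DECISIONS)
```

In this toy table the second attribute alone separates the normal samples from the abnormal ones, so the other two attributes can be dropped without losing classification ability.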

The SVM intrusion detection method using protocol classification and rough sets
In previous studies of rough set data reduction, the protocols are not distinguished and the reduction is applied to all the data. These approaches have two shortcomings. First, the data is strongly heterogeneous, so studying it with an SVM requires introducing a new method of computing distance. Second, an intrusion usually exploits the vulnerabilities of a particular kind of heterogeneous data. A method that does not distinguish protocols is only a broad detection method: it does not consider the different characteristics of different data, and hence it is not targeted. We propose an SVM intrusion detection method using protocol classification and rough sets, which removes these shortcomings of the original methods and improves both the detection time and the accuracy.
The data is classified by protocol and reduced by rough sets, and the reduced data is then used as the corresponding SVM input for training. The training models obtained are the SVM detectors corresponding to the different protocols. The SVM intrusion detection method using protocol classification and rough sets is illustrated in Figure 1.
In Figure 1, the solid lines illustrate the training process. The training data is classified by protocol into three kinds of intrusion data, denoted TCP, UDP, and ICMP. Rough set learning is then carried out on each of these three kinds of data; the learning procedures are denoted T, U, and I. The reduced characteristics obtained are used as the input for SVM learning, while the reduction rules are stored as pre-defined procedures, denoted reduction T, reduction U, and reduction I. After learning, the three SVM learners become three detector models, which are stored as detectors T, U, and I. The dashed lines in Figure 1 denote the detection process for the test data. The test data is first classified by protocol, the reduction procedure for that protocol is started, the reduced data is input into the corresponding detector, and the detector outputs the test result.
The SVM intrusion detection method using protocol classification and rough sets can be described as the following algorithm:
Step 1: Input the training data and start protocol classification. The data is divided into TCP, UDP, and ICMP according to its protocol and stored in the database.
Step 2: Start the rough set learner and reduce the three kinds of data separately to obtain their reduced characteristic sets T, U, and I. Then construct an SQL statement based on each characteristic set and store it as a pre-defined procedure. Finally, input the reduced training data into the corresponding SVM learner.
Step 3: Start the SVM learners T, U, and I, and obtain their decision functions by learning.
Step 4: For input data X to be detected, first classify it by protocol, then start the pre-defined rough set reduction procedure for that protocol.
Step 5: Input the reduced data into the corresponding SVM detector and output the detection result through the detector. Normal is denoted by +1 and abnormal by -1.
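The detection steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the record layout, the reduced characteristic indices in `REDUCTS`, and the `ThresholdDetector` class are all hypothetical, and the threshold rule merely stands in for a trained per-protocol SVM detector.

```python
# Hypothetical reduced characteristic indices for each protocol (illustrative only).
REDUCTS = {"tcp": [0, 2], "udp": [1], "icmp": [0, 1]}

class ThresholdDetector:
    """Stand-in for a trained per-protocol SVM detector (an assumption):
    outputs +1 (normal) when the sum of the inputs is within a threshold,
    otherwise -1 (abnormal)."""
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, x):
        return 1 if sum(x) <= self.threshold else -1

def classify_protocol(record):
    """Protocol classification: read the protocol field of the record."""
    return record["protocol"].lower()

def reduce_record(values, protocol, reducts):
    """Pre-defined reduction: keep only the characteristics in the protocol's reduct."""
    return [values[i] for i in reducts[protocol]]

def detect(record, detectors, reducts):
    """Route the record by protocol, reduce it, and query the matching detector."""
    protocol = classify_protocol(record)
    reduced = reduce_record(record["values"], protocol, reducts)
    return detectors[protocol].predict(reduced)

DETECTORS = {
    "tcp": ThresholdDetector(1.0),
    "udp": ThresholdDetector(1.0),
    "icmp": ThresholdDetector(1.0),
}

tcp_result = detect({"protocol": "TCP", "values": [0.2, 9.0, 0.3]}, DETECTORS, REDUCTS)
udp_result = detect({"protocol": "udp", "values": [0.0, 5.0, 0.0]}, DETECTORS, REDUCTS)
```

Keeping one reduct and one detector per protocol means that each detector only sees the characteristics relevant to its own protocol, which is the source of the time savings the method aims for.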

The tested data
KDDCUP'99 was obtained from a real network. It simulates 5 classes comprising 23 different kinds of data arising from attacks, and it can be used as experimental data in data mining. The 10% subset of the data has 494021 records, and each record has 41 characteristics, which include continuous, discrete, and text data. A label at the end of each record shows whether the data is normal. This data set is therefore a classical multi-protocol, multi-attack, heterogeneous data set. Classifying the normal and attack records by protocol gives the results illustrated in Figure 2. The statistics show 190065 TCP records, 283602 ICMP records, and 20354 UDP records. In the TCP data, all the different kinds of attack appear, and DoS attacks are the most frequent. In the UDP and ICMP data, R2L and U2R attacks almost never appear. For the UDP protocol, the abnormal data includes DoS and Probe. For the ICMP protocol, DoS attacks account for about 280 thousand records, so the abnormal data is mainly DoS data.
After protocol classification, we select the training data and test data as follows:
(1) TCP test data: 30000 records are chosen from the TCP data set, of which 12802 are normal and 17198 are abnormal (DoS 16560, Probe 422, R2L 188, U2R 8).
(2) UDP test data: 10173 records are chosen from the UDP data set, of which 9586 are normal and 587 are abnormal (DoS 489, Probe 98).
(3) ICMP test data: 28353 records are chosen from the ICMP data set, of which 128 are normal and 28225 are abnormal (DoS 28105, Probe 120).
From each selected data set, 70% of the records are drawn at random for training and the remaining 30% are kept for testing.
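The random 70%/30% split can be sketched in Python as below. The function name and the fixed seed are assumptions made for reproducibility of the illustration; the record counts are those of the selected TCP, UDP, and ICMP data sets, and integer arithmetic is used so that the split sizes are exact.

```python
import random

def split_70_30(records, seed=0):
    """Shuffle the records and return (training part, test part) in a 70/30 ratio."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * 7 // 10  # integer arithmetic avoids rounding surprises
    return shuffled[:n_train], shuffled[n_train:]

# Sizes of the selected data sets per protocol, as listed in the text.
SELECTED = {"tcp": 30000, "udp": 10173, "icmp": 28353}

SPLIT_SIZES = {}
for protocol, n_records in SELECTED.items():
    train_part, test_part = split_70_30(range(n_records))
    SPLIT_SIZES[protocol] = (len(train_part), len(test_part))
```

For TCP this gives 21000 training records and 9000 test records, which agrees with the figures quoted for the TCP example in the training and detection section.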

The reduction of the test data
The data is reduced by means of the Rosetta tool (Komorowski, 1997, P.403-407), which forms different reduction sets from the 41 characteristics. The characteristic sets reduced from TCP, UDP, and ICMP are outlined in Figure 3, Figure 4, and Figure 5.
Two groups of characteristic sets are chosen for each protocol: for example, the first and the eighth from TCP, the sixth and the 30th from UDP, and the sixth and the eighth from ICMP. By reducing the characteristics of the corresponding training data and test data, we obtain the training data and the test data after characteristic reduction. Compared with the characteristics given in (Sung, P.405-411), our approach uses fewer characteristics and is easier to handle, and the test results show that our method preserves high accuracy while being much faster.

Data training and detection
In the test, we choose the RBF function $f(x_i, x_j) = \exp\left(-\|x_i - x_j\|^2 / 2\sigma^2\right)$ as the SVM kernel and adopt 5-fold cross validation, as embedded in the LibSVM software by Chih-Jen Lin. The test has three steps. First, we use grid search (the grid.py command) to compute the optimal penalty parameter $C$ and $\sigma^2$. Then we obtain the training model by training on the training data with the optimal parameters. Finally, we test using the trained model. Take as an example 21000 TCP training records and 9000 test records; the parameter search is outlined in Figure 6. The optimal values are $C = 512$ and $\sigma^2 = 0.03125$. Using these two parameters to train on the 21000 TCP records, we obtain train.txt.model. We then use this model to test the 9000 test records. Finally, we obtain the training time, the detection time, and the accuracy.
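The parameter search can be illustrated by a small Python sketch of grid search with 5-fold cross validation. It is not the grid.py implementation: the power-of-two grids are an assumption, and `toy_accuracy` is a hypothetical stand-in for training an SVM on the training folds and measuring accuracy on the held-out fold. The stand-in is deliberately constructed to peak at the optimal values reported above, so that the search loop can be followed end to end.

```python
import math
import random

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and deal them into k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]

def grid_search(n_samples, fold_accuracy, log2_c_values, log2_sigma2_values, k=5):
    """Return (C, sigma2, mean accuracy) with the best k-fold mean accuracy."""
    folds = k_fold_indices(n_samples, k)
    best = (None, None, -1.0)
    for log2_c in log2_c_values:
        for log2_s in log2_sigma2_values:
            c, sigma2 = 2.0 ** log2_c, 2.0 ** log2_s
            scores = []
            for held_out in range(k):
                val_idx = folds[held_out]
                train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
                scores.append(fold_accuracy(c, sigma2, train_idx, val_idx))
            mean_score = sum(scores) / k
            if mean_score > best[2]:
                best = (c, sigma2, mean_score)
    return best

def toy_accuracy(c, sigma2, train_idx, val_idx):
    """Hypothetical scorer: ignores the data and peaks at C = 2^9, sigma2 = 2^-5."""
    return 1.0 / (1.0 + abs(math.log2(c) - 9) + abs(math.log2(sigma2) + 5))

# Assumed power-of-two grids for the two parameters.
BEST = grid_search(100, toy_accuracy, range(-5, 16, 2), range(3, -16, -2))
```

With a real scorer, each call to `fold_accuracy` would train on four folds and test on the fifth, so the cost of the search grows with both the grid size and the number of characteristics, which is one reason the reduction step pays off.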
For comparison, the intrusion data detection is tested in three situations. The first tests the protocol-classified data with the complete characteristics. The second tests the protocol-classified data with the reduced characteristics. The third tests the unclassified data. The final test results are outlined in Table 1, Table 2 and Table 3.
Comparing Table 2 and Table 3, we find that the training time and the detection time are shortened by using protocol classification, while the detection accuracy is not harmed.
Comparing Table 1 and Table 2, we see that the accuracy with and without characteristic reduction is similar, whereas characteristic reduction clearly saves detection time and training time. Our conclusion is therefore as follows: protocol classification together with characteristic reduction needs the least time, using the complete characteristics needs much more time, and still more time is needed if neither protocol classification nor characteristic reduction is carried out.

Figure 1. The SVM intrusion detection method using protocol classification and rough sets

Figure 6. Parameter search of TCP training data
Definition 3. The indiscernibility relation can be described as follows: in the decision table DT, for an attribute subset $B \subseteq A$, $IND(B) = \{(x, y) \in S \times S : a(x) = a(y) \text{ for all } a \in B\}$, i.e. the samples cannot be discerned by the attributes in B. The decision indiscernibility relation can be constructed based on this concept.
Definition 4. The decision indiscernibility relation refers to the following fact: in

Table 1. Experimental results with protocol classification but without reduction