Machine Learning Approach to Combat False Alarms in Wireless Intrusion Detection System

Wireless networks ease communication and the sharing of crucial information. Small and large companies, educational institutions, government organizations, the medical sector, the military and the banking sector now rely on them. Security threats are common to wired and wireless networks alike, but they matter more in wireless networks because the medium is inherently susceptible to attack. Studies of security concerns in WLANs have led many organizations to conclude that a Wireless Intrusion Detection System (WIDS) is an essential element of the network security infrastructure for monitoring wireless activity for signs of attack. The art of detecting attacks nevertheless remains in its infancy. A WIDS generally collects the activities within the protected network, analyzes them to detect intrusions and generates an intrusion alarm. Irrespective of the type of intrusion detection system, the major problems with WIDS are its inability to handle large volumes of alarms and its proneness to false alarms. Reducing false alarms can improve the overall efficiency of the WIDS. Many techniques have been proposed in the literature to reduce the false alarm rate, but most of them fail to deliver the desired result, and they incur high complexity to achieve a high detection rate with a low false alarm rate. A new technique that provides high detection accuracy with a low false alarm rate is therefore needed. This paper presents an extensive survey of the role of machine learning techniques in reducing the false alarm rate in IEEE 802.11 WLANs. The survey indicates that machine learning algorithms have achieved a substantial reduction in the false alarm rate. In addition, advancements specific to machine learning approaches are studied and a filtration technique is proposed.


Introduction
Wireless networks, one of the most promising developments of this decade, have changed the way we live and work. A world that is accessible anywhere and everywhere is the best description of the efficiency and reach of wireless networks. Wi-Fi has proved the significance of wireless networks in terms of mobility, cost effectiveness, resource sharing and availability where wired networks are impossible. Despite these benefits, the major pitfall of the technology is its lack of well-defined security control measures. Since the transmission medium of a wireless network is air, anybody can tune in to a specific network and gain access to it with simple wireless devices and a capable operating system. The original security mechanism, introduced along with the IEEE 802.11 standard in 1999, was WEP encryption, in which a shared key is used for both encryption and decryption. It failed drastically against simple brute-force attack tools such as AirSnort and WEPCrack. The next standard, WPA, was introduced in 2003 and provided encryption through key mixing, named the Temporal Key Integrity Protocol (TKIP). Cisco's LEAP was the next standard; it allowed authenticated data to pass between the Access Point (AP) and the RADIUS server [1]. [...] its authorized signal range. The signal range needed to listen to the network is much lower than the range needed to make a connection, which gives hackers an opportunity to eavesdrop on the network with little effort. Other attacks on the AP arise from poor configuration or from unauthorized rogue APs. Vulnerabilities beyond the AP also exist, and one of them concerns frames [2]. A WLAN produces three types of frames at the Medium Access Control (MAC) layer: management, data and control frames. The currently available WEP, WPA and WPA2 standards protect only data frames. This opens up many opportunities for hackers to perform spoofing attacks, notably Denial of Service (DoS) attacks, which can be launched at almost every layer of the WLAN. Typical DoS attacks flood the network with traffic, choking the transmission lines and preventing legitimate users from accessing services on the network. Weaknesses thus exist in the network, application and link layers. To combat these attacks and protect the network, we need a system that works well on unencrypted data, that has complete knowledge of the network in terms of its users, APs and rogue APs, and that knows which actions to take when an unauthorized access point is detected. With this information in hand, it becomes easier for administrators, or for the underlying system, to protect the network from attacks. Such a system is called a Wireless Intrusion Detection System (WIDS). An intrusion detection system is informally analogous to a security camera installed in an organization, which monitors and tracks all activities happening there. Like a security camera, an IDS monitors wired and wireless networks for intruders and raises an alarm. A WIDS dynamically monitors the events of a host or a network, and analyzes and reports possible unauthorized attempts. These reports are termed alarms, and they are generated whenever the detection system comes across an event that can directly or indirectly harm the network. The attacks that a WIDS can typically identify include the ASLEAP attack, association frame flooding, authentication frame flooding, broadcast de-authentication, EAPOL packet flooding, invalid MAC OUI, null SSID probe response, spoofed de-authentication, long duration attack and weak WEP detection. Identifying and isolating the real alarms from false alarms remains an unsolved problem for the reasons listed in [5]:
1. False alarms differ from real alarms only slightly, to the extent that only the context can reveal that a generated alarm is false.
2. The alarms generated vary with the environment, i.e., actions that are normal in one environment may be malicious in another, which can ultimately raise a false alarm.
3. The number of signatures needed to discriminate false alarms from real alarms cannot be measured or fully described.
In real-world scenarios, despite efforts to identify the true alarms intelligently with predefined classifications, it remains impractical to separate out the alarms that are harmful. To combat false alarms, the different types of false alarms and the primary reasons for their generation must be well understood, so that true and false alarms are easier to distinguish when it comes to reduction. Further studies revealed that the contributing factors for false alarms are the network protocol, the network architecture and inherent challenging issues. Depending on these factors, the more meaningful and specific categories of false alarms are as follows:
Reactionary traffic alarms: Traffic that is caused by another network event, often non-malicious.
Equipment-related alarms: Attack alerts that are triggered by odd, unrecognized packets generated by certain network equipment.
Protocol violations: Alerts that are caused by unrecognized network traffic, often generated by poorly or oddly written client software.
True false positives: Alarms that are generated by an IDS for no apparent reason. These are often caused by IDS software bugs.
Non-malicious alarms: Alarms generated through some real occurrence that is non-malicious in nature.
Along with this, an understanding of the major triggering mechanisms used in WIDS is essential, so that the intelligent techniques employed to isolate real alarms from false alarms can be more accurate. The most common triggering mechanism is anomaly-based detection, otherwise referred to as profile-based detection, in which a profile is created for every user of the WLAN and the user's normal activities are monitored. Whenever the WIDS comes across an activity that deviates from the user's normal activities, it raises an alarm. This method is efficient at detecting insider attacks as well as account theft. Its obvious disadvantage is that the system must be trained to create appropriate user profiles, and it is more prone to false alarms. Misuse-based detection, or signature-based detection, generates alarms based on specific attack signatures. This methodology produces very few false alarms, but it is very difficult to update the signatures frequently enough. The complete analysis showed that false alarms remain a contributing factor that reduces the overall efficiency of a Wireless Intrusion Detection System.
To mitigate the problem of false alarms and to improve the performance of the detection system, we aim to design a machine learning based intelligent filter that identifies and analyzes false alarms accurately, so that the false alarm rate is reduced drastically. The contributions of this survey are summarized as follows:
This paper surveys the existing methodologies for solving the false alarm problem in four categories: ontology-based approaches, mining approaches, heuristic approaches and machine learning based approaches.
This paper proposes a machine learning based false alarm filter that follows the traditional steps of feature reduction and filtration; our focus is on finding out the suitability of traditional machine learning algorithms for wireless intrusion detection systems.
This work also proposes using the AWID dataset, on which far less work has been done than on the DARPA, KDD and trace datasets, and it discusses the relevance of this dataset to wireless networks.
This paper is organized as follows. Section 2 discusses the approaches traditionally followed to decrease the false alarm rate and concludes with the machine learning approach and its importance in wireless intrusion detection systems. Section 3 discusses the features of the AWID dataset and its relevance in designing the filter. Section 4 presents the machine learning based false alarm filter. Section 5 suggests some suitable machine learning algorithms and ways to improve performance in wireless networks. Finally, Section 6 concludes the paper by highlighting the features of this work.

Traditional Techniques to Reduce the False Alarms
Computer networks fall into two major classes based on topology: wired (infrastructure) and wireless (infrastructure-less) networks. Accordingly, separate IDSs are used for these networks, namely wired intrusion detection systems and wireless intrusion detection systems. The only differences are the network topology and the requirement to scan the air rather than the wire [1]. This suggests that the methodologies tried for reducing false positives in wired networks can help us arrive at a more efficient methodology for minimizing the number of false positives in wireless networks. Traditionally, false alarms in network intrusion detection systems are reduced by following one of four approaches: ontology-based approaches, mining, heuristics and machine learning based approaches. Wireless intrusion detection systems are no exception.
This section briefly discusses the work in each of these categories. The survey considers only the anomaly-based detection mechanism, which is more prone to false alarms because its decisions depend entirely on the behaviour of a user. A user does not always follow the same pattern of behaviour when accessing the network. Some unexpected behaviour from a legitimate user always persists; this unusualness should not be identified as malicious, yet it ultimately triggers a false alarm. The above-mentioned methodologies have produced some noteworthy research on the false alarm problem. However, we cannot ignore the fact that these techniques have their roots in preliminary approaches such as the fine-tuning procedure, which worked by adapting a signature policy.
Although this technique opened the door to handling false alarms, it was a trade-off between alarm reduction and security level. The manual assistance needed to examine and update the environment, and the concept of acknowledging alarms depending on the operating system [18], made this procedure less effective. However, it gave a clear understanding of the average number of false alarms generated per day in an intrusion detection system, which motivated further research.
Another major line of research claims that false positives arise in an intrusion detection system because of the lack of correlation between input and output; the resulting system is termed APHRODITE [19]. It is defined with two components. The output anomaly detector, generally used to monitor the output of the system, refers to a predefined statistical model that holds all possible normal behaviour; any deviation from it is flagged as an attack. The correlation engine correlates the input with the output. Threats are identified by tracking and combining the input and output traffic. Bolzoni and Etalle [19] stress a very simple concept: generate the alarms as they are, compare each alarm with the statistical model while monitoring the input and output traffic, and report the alarm as false if a match is found in the statistical model. This survey regards that work as the starting point for dealing with the false positives problem using modern artificial intelligence techniques.
An alternative to the traditional alarming technique is a specification-based model [5]. This model uses a wireless sensor that monitors the spectrum and constructs a state transition model for each AP and STA in that spectrum. The state transition model denotes the series of actions that must be taken for the association of an STA with an AP. An anomalous transition is detected by checking the observed frames against the state transition table. Every frame is evaluated against the specification configured in the sensor, and if it violates a security constraint, an alarm is generated. This model was analysed using a Snort-based tool and monitored for changes in the transition model, and it generated a low rate of false positives. Since this methodology is based entirely on state behaviour, threshold tuning would help to provide a better solution. From these preliminary approaches, research on false alarm rate reduction narrowed down to the categories below.
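The specification-based idea can be illustrated with a minimal sketch. It assumes a simplified version of the 802.11 station states (unauthenticated, authenticated, associated); the frame-type names, the transition table and the function `check_frames` are hypothetical and are not taken from [5].

```python
from enum import Enum


class State(Enum):
    UNAUTHENTICATED = 1
    AUTHENTICATED = 2
    ASSOCIATED = 3


# Simplified specification (an assumption for illustration):
# (current state, frame type) -> next state.
TRANSITIONS = {
    (State.UNAUTHENTICATED, "auth"): State.AUTHENTICATED,
    (State.AUTHENTICATED, "assoc_req"): State.ASSOCIATED,
    (State.AUTHENTICATED, "deauth"): State.UNAUTHENTICATED,
    (State.ASSOCIATED, "data"): State.ASSOCIATED,
    (State.ASSOCIATED, "disassoc"): State.AUTHENTICATED,
    (State.ASSOCIATED, "deauth"): State.UNAUTHENTICATED,
}


def check_frames(frames):
    """Replay (station, frame_type) pairs and collect anomalous transitions.

    A frame that has no entry in TRANSITIONS for the station's current
    state is reported as an alarm and leaves the state unchanged.
    """
    states = {}
    alarms = []
    for index, (station, frame_type) in enumerate(frames):
        current = states.get(station, State.UNAUTHENTICATED)
        next_state = TRANSITIONS.get((current, frame_type))
        if next_state is None:
            alarms.append((index, station, current.name, frame_type))
        else:
            states[station] = next_state
    return alarms
```

For example, a station that sends a data frame before authenticating produces one alarm, while the sequence auth, assoc_req, data produces none.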

Data Mining Approaches
Much valuable research on effective intrusion detection systems uses data mining approaches, which extract knowledge from a large database and use that knowledge to build a concept, rule, law or model. Relationships found in the extracted knowledge then help in decision making. Interestingly, most of the research performed using data mining concepts concludes that accurate detection methodologies can reduce the number of false alarms in an intrusion detection system. Mining is most appropriately used in the alert processing technique [20]. The detection phase and the alert processing techniques have been handled differently by different researchers, but the common idea behind every study is statistical modelling and decision tree classification. In most mining approaches, alert correlation analysis uses clustering and merging functions to recognize alerts that correspond to the same occurrence of an attack and then creates a new alert by combining the data from the similar alerts. One very important work in data mining finds alarm clusters and generalised forms of false alarms in order to analyse their root causes [21]. This cluster-based study found that more than 90% of the false alarms are generated by a very small subset of root causes. Although the idea looked very promising, it could reduce only a fraction of the false alarms.
Rupinder Gill et al. [6] derive their conceptual understanding from long-established data mining concepts, which can describe behaviour from a given large data set. They demonstrate an implementation of WIDS based on statistics. Combining these concepts, they present an algorithm that statistically measures the similarity of management traffic clusters between long-term and short-term performance. Their methodology concentrates on management frames, which are created whenever a station tries to establish a connection, i.e., the station creates a management frame and sends it to the AP for association. The activities associated with the frames, such as scanning, joining and leaving, are grouped into clusters, which results in a different cluster pattern for every event. If these management frame clusters are analysed systematically, the WIDS can tell the type of event occurring. The work used a test bed with five clients, one attacker and one sniffer to observe the traffic. With window size and sample interval as parameters, the algorithm tries to identify the pattern of every cluster and report any unexpected events. The methodology holds that false alarms are better than missed events, so the emphasis is on not missing any event, which in practice puts a barrier on prediction in real-time analysis.
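The comparison of long-term and short-term management traffic can be sketched as follows. The similarity measure used in [6] is not given here, so this sketch assumes cosine similarity between the relative frequencies of management frame subtypes; the function names and the threshold of 0.8 are hypothetical.

```python
from collections import Counter
from math import sqrt


def subtype_profile(frames):
    """Relative frequency of each management frame subtype in a window."""
    counts = Counter(frames)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()} if total else {}


def cosine_similarity(p, q):
    """Cosine similarity of two sparse frequency profiles (0.0 if either is empty)."""
    dot = sum(p.get(key, 0.0) * q.get(key, 0.0) for key in set(p) | set(q))
    norm_p = sqrt(sum(value * value for value in p.values()))
    norm_q = sqrt(sum(value * value for value in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


def window_is_anomalous(long_term, short_term, threshold=0.8):
    """Flag the short-term window when it no longer resembles the long-term one."""
    similarity = cosine_similarity(subtype_profile(long_term), subtype_profile(short_term))
    return similarity < threshold
```

A short window dominated by de-authentication frames, compared with a long-term window dominated by probe and association requests, falls below the threshold and is flagged. The window size and sample interval mentioned above would determine how `long_term` and `short_term` are cut from the captured traffic.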
Another work [48] treats the correlation phase as duplicate removal followed by consequence correlation. As the name signifies, duplicate removal consults a specific configuration file and identifies instances of the same attack using rules. Consequence correlation includes five functions: alert base management, alert clustering, alert merging, alert correlation and intention recognition. The alert base management function receives the alerts generated by the WIDS in IDMEF format and stores them in a relational database for further analysis. The alert clustering and merging functions access the database and use a similarity function to cluster and merge the alerts. A purely statistical causality analysis does not require predefined knowledge about attack scenarios; instead it uses causality analysis to correlate alerts and construct attack scenarios. This work was helpful in identifying new attack scenarios.
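The clustering and merging step can be sketched as below. The `Alert` fields, the attribute-matching similarity function and the 60-second window are assumptions made for illustration; they do not reproduce the similarity function of [48] or the full IDMEF schema.

```python
from dataclasses import dataclass


@dataclass
class Alert:
    signature: str
    source: str
    target: str
    timestamp: float
    count: int = 1


def similarity(a, b, window=60.0):
    """Fraction of matching attributes; 0.0 if the alerts are too far apart in time."""
    if abs(a.timestamp - b.timestamp) > window:
        return 0.0
    matches = [a.signature == b.signature, a.source == b.source, a.target == b.target]
    return sum(matches) / len(matches)


def cluster_and_merge(alerts, threshold=1.0):
    """Merge each alert into the first sufficiently similar merged alert.

    The merged alert keeps a running count and the latest timestamp, so a
    long burst of the same attack collapses into a single alert.
    """
    merged = []
    for alert in sorted(alerts, key=lambda item: item.timestamp):
        for existing in merged:
            if similarity(existing, alert) >= threshold:
                existing.count += alert.count
                existing.timestamp = alert.timestamp
                break
        else:
            merged.append(Alert(alert.signature, alert.source, alert.target,
                                alert.timestamp, alert.count))
    return merged
```

With `threshold=1.0` this behaves like duplicate removal; lowering the threshold merges alerts that share only some attributes, which is closer to clustering.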
Despite the tremendous research performed in this category, some open problems and disadvantages remain in the studied techniques.
• Most of the proposed techniques act in an off-line mode.
• Some of these techniques depend on a human analyst for the training phase or for developing filtering rules.
• Another problem associated with some of the proposed techniques is the lack of accuracy.
This study of data mining approaches shows that most of them revolve around the alert processing technique, which can serve as a starting point for dealing with the false alarm problem without human dependence, especially in machine learning approaches.

Heuristic Approaches
Technology is growing massively, and a heuristic methodology is sought for almost every concept under study. False alarm rate reduction is no exception, and a heuristic approach to reducing false alarms is proposed by Wenche Chow et al. [7]. Whereas the specification model tried to provide a novel, understandable solution that lets the system learn by itself, the heuristic approach concentrates on identifying the intrusive behaviour of a node rather than a specific attack. The proposed technique simply sniffs 802.11 packets; every captured packet is checked for its owner and origin by converting the frame from hexadecimal to decimal format and comparing the MAC address with the 802.11 frame. Signatures are verified, and this process is repeated for every uploaded file so that an alarm is generated. This approach showed promising results; however, the testing was done for only 20 values, and the efficiency of the approach with more signatures remains to be studied.
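A rough sketch of the origin check is given below. It assumes an 802.11 management frame captured as a hexadecimal string, whose 24-byte header carries the transmitter address in bytes 10 to 15, and a hypothetical allow-list of known stations. The function names and the sample frame are illustrative and do not reproduce the procedure of [7].

```python
def mac_to_int(mac):
    """Convert a MAC address such as '00:11:22:33:44:55' to its decimal value."""
    return int(mac.replace(":", ""), 16)


def transmitter_as_int(frame_hex):
    """Return the transmitter address (Address 2) of a management frame as an integer."""
    raw = bytes.fromhex(frame_hex)
    if len(raw) < 24:
        raise ValueError("frame shorter than the 24-byte management header")
    # Header layout: frame control (2), duration (2), address 1 (6),
    # address 2 (6), address 3 (6), sequence control (2).
    return int.from_bytes(raw[10:16], "big")


def is_suspicious(frame_hex, known_stations):
    """Raise a flag when the transmitter is not among the known station addresses."""
    known = {mac_to_int(mac) for mac in known_stations}
    return transmitter_as_int(frame_hex) not in known


# Hypothetical de-authentication frame: broadcast receiver,
# transmitter and BSSID both set to 00:11:22:33:44:55.
SAMPLE_FRAME_HEX = "c0000000" + "ffffffffffff" + "001122334455" + "001122334455" + "0000"
```

Calling `is_suspicious(SAMPLE_FRAME_HEX, ["00:11:22:33:44:55"])` returns `False`, whereas an allow-list that lacks this address makes the same frame suspicious. A transmitter address alone can be spoofed, so such a check is only one input to a heuristic decision.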

Ontology Based WIDS
Semantic Web techniques and concepts such as "content" and "ontology" can be used in many fields of computer science. Ontologies yield classification tools for an unlimited number of events and can analyze user behavior, system activities and abnormal behavior. Building on this basic idea, ontology-based approaches try to extract semantic relations between computer attacks and intrusions. Ontologies are constructed from the computer attacks, and every method incorporates some agents and a master agent. Every time the agents come across a suspicious condition, they send a report to the master agent, which checks its ontology and takes the relevant action [32]. Research in this area is minimal, and most of the work revolves around four aspects: target-centric ontology, relationships between features, hybrid ontology and the master IDS agent model. The target-centric ontology is characterized by the system component, the means of attack, the consequences of the attack and the location of the attacker. Simple taxonomies are replaced by ontologies, and an initial ontology construction for intrusion detection systems is proposed [50]. Hybrid ontology combines syntactic and semantic features, on the grounds that a syntactic match alone is not sufficient because it is based on prefix, substring and suffix matching. Most of the work reused the ontology with an older set of attacks, and building an ontology is an iterative process. Because of these drawbacks, very little research has been performed in this area.
To summarize, this survey gave us an idea of the different approaches to the false alarm problem in intrusion detection systems. Some of the common problems observed in these approaches are:
1. The analytical module uses a limited portion of the source information, so the detection capability is limited.
2. Continuous scanning of the network traffic adversely affects performance.
3. Inability to handle encrypted data packets.
4. Upgrading to newer standards is difficult.
5. Some require alterations to the 802.11 protocol.
We need a human-independent solution that can process millions of data points each minute and automatically identify anomalous behaviour. The most notable difference between machine learning and statistical approaches is that the latter are in general based on understanding the process behind the generation of the observed data. Machine learning, in contrast, focuses on a system that can improve its detection rate by learning from previous results and can therefore adapt its strategy over time.

Machine Learning Based WIDS
Machine learning based intrusion detection can be viewed from two perspectives: approaches based on Artificial Intelligence and approaches based on Computational Intelligence. There is a strong bond between artificial intelligence and machine learning, which can be explained as follows: writing a very clever program that exhibits human-like behaviour is artificial intelligence; if the program's parameters are learned automatically from data, it is machine learning. This strong relationship led us to investigate further the role of artificial intelligence in intrusion detection systems. Artificial intelligence techniques revolve around well-known concepts such as statistical modelling, while computational intelligence concentrates on evolutionary computation, fuzzy logic, artificial neural networks and artificial immune systems. Computational intelligence differs from artificial intelligence in the underlying representation: artificial intelligence generally uses a symbolic representation, whereas computational intelligence uses a numeric representation.
Most of the approaches under artificial intelligence concentrate on developing cognitive models that perform clustering, correlation and prioritization under a single roof. Mansour et al. [23] tried to discover structural relationships using the inference technique in fuzzy cognitive modelling. Alert clustering refers to the simple grouping of common attack patterns, whereas correlation concentrates on finding the relationships between patterns; common feature values of two alerts are compared for a perfect match to develop a unified alert fusion model that reduces the number of false alarms. Long [24] suggested a clustering algorithm for discriminating true IDS alerts from false positives. That work recognized that one major problem in analyzing alarms is the diversity of formats used by different vendors, so a unified framework for IDS alerts was essential to handle alarms efficiently, which led to IDMEF. The proposed clustering algorithm works on this basis. In addition, Long et al. [25] proposed a combination of mining and fuzzy techniques that followed mining concepts to build the clustering algorithm and applied the same root cause analysis to provide a cognitive model.
One of the works that will serve as a starting point for our proposal is described in [50]. It presents a filter-based feature selection method, a preliminary approach that reduces the number of features in the dataset in order to identify the features best suited for prediction. The work already reports comparative results that can be used as a benchmark for our feature selection. The KDD Cup dataset consists of nearly five million training samples and two million testing samples, which not only adds computational complexity but can also reduce efficiency because of the large number of redundant samples. Data pre-processing follows the traditional steps of transformation and normalization. Feature selection uses mutual information and linear correlation coefficient algorithms, and classification is based on support vector machines. What remains unexplored in that work is that the feature selection methodology was not tested on network-specific data, and the suitability of these algorithms for wireless networks remains an open question. Thus, in our proposal we will test the adaptability of these feature selection algorithms to wireless networks by considering network-specific data as well.
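The two filter criteria named above can be sketched in a few lines. This is a minimal illustration for discrete (or already discretized) features and is not the algorithm of [50]; the function names and the toy data are assumptions.

```python
from collections import Counter
from math import log2, sqrt


def mutual_information(feature, labels):
    """Mutual information (in bits) between a discrete feature and the class label."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    p_x = Counter(feature)
    p_y = Counter(labels)
    total = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        total += p_xy * log2(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return total


def pearson(feature, labels):
    """Linear (Pearson) correlation coefficient; 0.0 if either input is constant."""
    n = len(labels)
    mean_x = sum(feature) / n
    mean_y = sum(labels) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(feature, labels))
    var_x = sum((x - mean_x) ** 2 for x in feature)
    var_y = sum((y - mean_y) ** 2 for y in labels)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Order feature names by decreasing mutual information with the label."""
    return sorted(columns, key=lambda name: mutual_information(columns[name], labels),
                  reverse=True)
```

A filter of this kind keeps the top-ranked features and passes them to the classifier; the correlation coefficient can additionally be used to drop features that are nearly linear copies of one already selected.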
Another interesting work created a machine learning based wireless intrusion detection system for detecting and recovering from DoS attacks. It held that the accuracy of any intrusion detection system depends on the classifier; classification was carried out as a two-step process, with a training phase to build the classifier, and the performance of several classification algorithms was analysed. Major algorithms such as Bayesian classifiers, AdaBoost, SVM and RIDOR were all tested for detection rate and false alarm rate. An approach named the Additional Localization Approach is mentioned in [32]. The reported results indicate that the AdaBoostM1 classifier achieves better accuracy and detection when the attacks are Denial of Service (DoS) attacks. Building on this work, we decided to consider other attacks as well and to use a classification algorithm that yields promising results irrespective of the type of attack. That work built a WIDS that handles DoS attacks, but it did not address MAC layer attacks, and it used 18 features to identify an attack; reducing the number of features is what we need to concentrate on. The ideas derived from this work can serve as a base for building our dataset and selecting features, although more emphasis should be given to recent machine learning classifiers.
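Detection rate and false alarm rate are not defined explicitly in the text. The sketch below assumes the usual definitions from the confusion matrix, with label 1 for an attack and 0 for normal traffic; the function name is hypothetical.

```python
def detection_metrics(y_true, y_pred):
    """Return (detection_rate, false_alarm_rate) for binary labels.

    detection_rate   = TP / (TP + FN)   (share of attacks that were detected)
    false_alarm_rate = FP / (FP + TN)   (share of normal events that raised an alarm)
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate
```

Comparing classifiers on both numbers together matters, because a classifier can trivially reach a high detection rate by raising an alarm on every event.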
Another interesting work studied MAC layer attacks specifically. Its feature selection followed the traditional methodology, but the selected features included the MAC address. The authors explored the impact of MAC address mapping schemes on the cross-platform robustness of machine learning based intrusion detection systems. This work shows that the same methodology of feature selection, formatting and classification can be used not only for DoS but also for other forms of Wi-Fi attack. Another well-known attack on wireless networks is the probe request attack, and an intelligent approach to it is presented in [3]. The authors built a prototype that detects probe request attacks using a neural network trained in MATLAB with the back-propagation algorithm to detect an external attacker. With this approach they successfully discriminated rogue frames from genuine frames, which supports the claim that even probe request attacks can be detected efficiently with machine learning. Another common type of attack in WLANs is the man-in-the-middle (MITM) attack, which can be detected by observing abnormal variation of network measurements, such as delay and signal strength, relative to empirical data. A novel method has been proposed that identifies MITM by obtaining and analyzing the mean and deviation of the round-trip time and the received signal strength. The presence of an attack is indicated by a longer delay and a larger standard deviation in round-trip time. To locate the man-in-the-middle attacker, the authors applied traditional machine learning algorithms such as Naïve Bayes and Support Vector Machines and concluded that Gaussian Naïve Bayes shows better results for MITM [35].
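The round-trip-time criterion can be sketched as follows. The baseline window, the two multipliers and the function names are assumptions chosen for illustration and are not the thresholds of the cited method.

```python
from statistics import mean, stdev


def build_baseline(rtts_ms):
    """Mean and sample standard deviation of round-trip times under normal operation."""
    return mean(rtts_ms), stdev(rtts_ms)


def looks_like_mitm(baseline, window_ms, mean_factor=1.5, std_factor=2.0):
    """Flag a window whose delay is both longer and more variable than the baseline.

    mean_factor and std_factor are hypothetical tuning parameters.
    """
    base_mean, base_std = baseline
    return (mean(window_ms) > mean_factor * base_mean
            and stdev(window_ms) > std_factor * base_std)
```

A relay inserted between station and AP adds processing and retransmission delay, which is why both the mean and the spread of the round-trip time are expected to grow. The same comparison could be repeated on received signal strength.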
One notable study that used a novel machine learning approach to identify true positives is presented in [35] as the Adaptive Learner for Alert Classification (ALAC). The authors constructed a classifier that receives instant feedback from the analyst, which lets the system update the classifier automatically. This continuous updating builds up the system's knowledge base so that it can minimize the number of false alarms. The method was efficient in operation, but it depended completely on the analyst's accuracy and faced many difficulties when tried in real-time analysis. ALAC was designed to operate in two modes: a recommender mode, in which all alerts are labelled and passed on to the analyst, and an agent mode, in which some alerts are processed automatically. In recommender mode, where the system adaptively learns the classification from the analyst, both false negatives and false positives were obtained. In agent mode, some alerts are processed autonomously (e.g., false positives classified with high confidence are discarded). The system used RIPPER, a fast and effective rule learner, which builds a set of rules discriminating between classes (i.e., false and true alerts). The number of false alerts was reduced by more than 30%. A disadvantage of this system is that the size of the training set grows without bound during the system's lifetime.
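The difference between the two modes can be sketched as a routing decision. The label names, the confidence threshold of 0.9 and the function `route_alert` are hypothetical; the sketch does not reproduce ALAC's classifier or its RIPPER rules.

```python
def route_alert(predicted_label, confidence, mode="recommender", discard_threshold=0.9):
    """Decide what happens to a classified alert.

    recommender mode: every alert is labelled and forwarded to the analyst.
    agent mode: alerts predicted as false positives with high confidence are
    discarded automatically; everything else is still forwarded.
    """
    if mode not in ("recommender", "agent"):
        raise ValueError(f"unknown mode: {mode}")
    if (mode == "agent"
            and predicted_label == "false_positive"
            and confidence >= discard_threshold):
        return "discard"
    return "forward_to_analyst"
```

In agent mode the analyst's workload drops, but any misclassified true alert above the threshold is lost, which is why the choice of threshold trades false alarm reduction against missed attacks.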
Later, the same author extended this work in [27] and presented two complementary approaches for false positive reduction: CLARAty, which is based on alert post-processing by data mining and root-cause analysis, and ALAC, which is based on machine learning. CLARAty is an alert-clustering approach using data mining with a modified version of attribute-oriented induction [27]. Using this system, the number of alerts to be handled was reduced by more than 50%. A complete account of this work was released in 2006 [28]. Another promising capability of artificial intelligence techniques is pattern recognition, and several studies have been undertaken to improve the alert correlation mechanism with such techniques. In [36], alert fusion is described as a process of interpretation, combination and analysis of alerts to determine and provide a quantitative view of the status of the system being monitored. This method uses cause-and-effect events to interpret the data, which could lead to the identification of attacks. However, the technique was either unable to discover the causal relationships among alerts or required a large number of pre-defined rules to correlate new alerts. Siraj and Vaughn [37] considered well-known algorithms such as decision trees, k-nearest neighbour, multi-layer perceptron and support vector machines in a comparative analysis of classification and clustering approaches. This analysis gave us the insight that these algorithms can reduce false alarms if clustering is employed. The study also suggested that machine learning is better at finding variations of known attacks than previously unknown malicious activity.
Computational intelligence-based approaches use genetic algorithms [38] that can create rules for an expert system; the training sets are generally created by the analyst for rule development and decision support. Support Vector Machines belong to a set of classifiers that simultaneously minimize the empirical classification error and maximize the geometric margin. The process involves creating a hyperplane in N-dimensional space that separates two data sets with the largest margin [39]. Support Vector Machines classify data by determining a set of support vectors, which are members of the set of training inputs that outline a hyperplane in the feature space. A kernel function provides a mechanism to fit the surface of the hyperplane to the data. Their method was efficient in operation. However, how the system works without an analyst's role is unexplored. The ideas derived from their work lead us to conclude that alert processing techniques can be used in our proposal.
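For reference, the trade-off between empirical classification error and geometric margin mentioned above is commonly written as the soft-margin optimization problem below. The notation is the standard textbook one and is an assumption here, not taken from [39]: training pairs $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1, +1\}$, weight vector $\mathbf{w}$, bias $b$, slack variables $\xi_i$ and penalty parameter $C$.

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \quad
\frac{1}{2}\lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{n} \xi_i
\qquad \text{subject to} \quad
y_i\left(\mathbf{w}^{\top}\mathbf{x}_i + b\right) \ge 1 - \xi_i,
\quad \xi_i \ge 0, \quad i = 1, \dots, n.
```

Minimizing $\lVert \mathbf{w} \rVert^{2}$ maximizes the margin $2 / \lVert \mathbf{w} \rVert$, while the slack term penalizes training errors. When a kernel function is used, the inner product is replaced by the kernel so that the separating surface can fit non-linear data.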
The work in [9] is interesting because it used computational intelligence techniques that can be deployed easily in network security [10,11]. Combining danger theory with an Artificial Immune System produced intelligent agents, the key components of multi-agent systems, and the work tried to identify active and passive attacks in 802.11. This work is an extension of the model proposed in [12], in which anomalies are detected using immune-based agents. In this scheme, a test bed was created using five workstations, one server and one AP. JADE was used as the framework, aircrack-ng was the attack tool, and the experiment was conducted per attack. The results showed a very low number of false alarms. The methodology can be further improved by performing the analysis for both active and passive attacks. Mayank Agarwal and Sukumar Nandi [33] concentrate on genetic programming and artificial neural networks, which can help the system to decide on untrained attacks using its trained ability. We tried to evaluate all the algorithms under the machine learning approach that can help us build a resilient intrusion detection system that eventually reduces false alarms.
Later, Law and Kwok [50] proposed a method using a KNN classifier based on Euclidean distances. They created a model of the sequence of incoming alarms under normal conditions, and any deviation from this model was identified as an anomaly. Initially, much machine learning work concentrated on supervised learning algorithms, which require a number of labelled instances for the training phase. For intrusion detection systems it is better to avoid human intervention, so the need for semi-supervised learning increased. Using semi-supervised learning can make the false alarm rate more realistic. Active learning is a form of supervised machine learning that can interactively query the user for information by using the classifier and a query function. By combining active learning and semi-supervised learning, unlabelled data can be used effectively in an intrusion detection system.
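The KNN idea above (model normal alarm behaviour, then flag deviations by Euclidean distance) can be illustrated with a short sketch. It is not the method of [50]: the two features, the value of `k`, the threshold and the data points are hypothetical choices made only for this example.

```python
import math

def knn_score(point, normal_points, k=3):
    """Mean Euclidean distance from `point` to its k nearest normal points."""
    distances = sorted(math.dist(point, p) for p in normal_points)
    return sum(distances[:k]) / k

def is_anomalous(point, normal_points, threshold, k=3):
    """Flag a point whose k-nearest-neighbour distance exceeds the threshold."""
    return knn_score(point, normal_points, k) > threshold

# Hypothetical features per time window: (alarms per minute, distinct sources).
normal_windows = [(10, 2), (12, 3), (11, 2), (9, 2), (13, 3)]
THRESHOLD = 3.0  # assumed value; in practice it would be tuned on held-out data

print(is_anomalous((11, 3), normal_windows, THRESHOLD))   # close to the normal model
print(is_anomalous((40, 15), normal_windows, THRESHOLD))  # far from the normal model
```

Because only normal windows are needed to build the model, a scheme of this kind reduces the amount of labelled attack data required, which is the motivation the text gives for moving away from purely supervised learning.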
From these studies, we can identify a few points that remain unaddressed and that, if addressed effectively, can help achieve better results. Most machine learning research obtained its results using the KDD Cup 1999 dataset; the problem of evaluating the robustness of machine learning techniques in real networks is still unexplored. In terms of attacks, most machine learning approaches concentrated on DoS, and very few works considered other wireless attacks. Alert verification and correlation techniques showed promising results, but they fell short on the methodology to collect and determine the appropriate contextual information. Many WIDS approaches using machine learning to combat false alarms are unable to detect recent unknown attacks, and others are not able to provide a real-time solution.
In the next section, we try to address the issues commonly found in machine learning approaches and come up with a solution that can effectively reduce the false alarm rate.

Data Pre-processing and Filtering
Filtering and pre-processing are essential steps for any operation related to an intrusion detection system, as even a very small network generates a large amount of data. A larger number of attributes in the dataset increases the computational complexity and the redundancy. In pre-processing, the raw intrusion alarms are converted to a standard format in which processing with a machine learning algorithm becomes easier. Once the standard format is acquired, it becomes easy to identify redundant alerts, which in turn can decrease the computational complexity. The standard procedure for pre-processing is formatting, cleaning and sampling. Formatting converts the format in which the data was acquired into one that can be used for further processing. Cleaning is generally carried out to remove incomplete data instances that may not be useful for processing. Sampling reduces the number of data instances by grouping similar alerts with available techniques such as clustering and aggregation, which makes the analysis easier. Filtration should be done carefully, as wrong filtration might remove alerts that are necessary. In most situations, depending on the machine learning tools we use, pre-processing can be iterative. Filtration is generally performed using approaches such as human expert analysis, WEKA, R machine learning packages, MOA, ELKI, RapidMiner, etc.
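The three pre-processing steps named above (formatting, cleaning and sampling by aggregation) can be sketched as follows. This is only an illustration under stated assumptions: the standard field names in `STANDARD_FIELDS` and the raw alarm records are hypothetical, and a real WIDS would use its own alarm schema.

```python
from collections import Counter

# Hypothetical standard alarm format; the field names are illustrative only.
STANDARD_FIELDS = ("signature", "src_mac", "dst_mac", "channel")

def format_alarm(raw):
    """Formatting: map a raw alarm record onto the standard field tuple."""
    values = []
    for field in STANDARD_FIELDS:
        value = raw.get(field)
        values.append("" if value is None else str(value).strip().lower())
    return tuple(values)

def clean(alarms):
    """Cleaning: drop alarms that have at least one empty field."""
    return [alarm for alarm in alarms if all(alarm)]

def aggregate(alarms):
    """Sampling by aggregation: collapse identical alarms into counts."""
    return Counter(alarms)

raw_alarms = [
    {"signature": "Deauth flood", "src_mac": "AA:BB:CC:00:00:01",
     "dst_mac": "ff:ff:ff:ff:ff:ff", "channel": 6},
    {"signature": "deauth flood ", "src_mac": "aa:bb:cc:00:00:01",
     "dst_mac": "FF:FF:FF:FF:FF:FF", "channel": "6"},
    {"signature": "Probe request flood", "src_mac": "aa:bb:cc:00:00:02",
     "dst_mac": None, "channel": 1},
    {"signature": "Probe request flood", "src_mac": "aa:bb:cc:00:00:03",
     "dst_mac": "aa:bb:cc:00:00:09", "channel": 11},
]

formatted = [format_alarm(raw) for raw in raw_alarms]
reduced = aggregate(clean(formatted))
for alarm, count in reduced.items():
    print(count, alarm)
```

Here the first two records differ only in letter case and whitespace, so they collapse into one alert after formatting, while the record with a missing destination is dropped during cleaning. Whether an incomplete alarm should be dropped or repaired is a design choice; as the text warns, over-aggressive filtration can remove alerts that are needed.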

Wrapper Based Feature Selection
Feature selection is unavoidable when analysing intrusion detection alarms, as some features may be redundant and some may not be useful for the analysis. Consistency should therefore be maintained in selecting the best features. There are two types of feature selection method: filter based and wrapper based. Both methods have their merits, but the wrapper method is generally better in terms of accuracy, so employing it is expected to yield the best feature set. A suitable algorithm will be used to perform the feature selection. Once the appropriate feature set is obtained, alerts with the same attributes are combined and the relationships among similar alerts are analysed, so that we obtain a complete yet minimal dataset for further processing.
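To make the wrapper idea concrete, the sketch below scores candidate feature subsets by the accuracy of a wrapped classifier and grows the subset greedily. The source does not fix the search strategy or the wrapped learner, so greedy forward selection and a leave-one-out 1-nearest-neighbour classifier are assumptions chosen for brevity, and the data set is made up.

```python
import math

def loo_accuracy(rows, labels, features):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier on `features`."""
    correct = 0
    for i, row in enumerate(rows):
        best_label, best_dist = None, math.inf
        for j, other in enumerate(rows):
            if i == j:
                continue
            dist = sum((row[f] - other[f]) ** 2 for f in features)
            if dist < best_dist:
                best_label, best_dist = labels[j], dist
        correct += best_label == labels[i]
    return correct / len(rows)

def forward_select(rows, labels, n_features):
    """Greedy wrapper search: add the feature that most improves accuracy."""
    selected, best_score = [], 0.0
    remaining = set(range(n_features))
    while remaining:
        scored = [(loo_accuracy(rows, labels, selected + [f]), f)
                  for f in sorted(remaining)]
        # Highest score wins; ties go to the lowest feature index.
        score, feature = max(scored, key=lambda item: (item[0], -item[1]))
        if score <= best_score:
            break  # no candidate improves the wrapped classifier
        selected.append(feature)
        remaining.remove(feature)
        best_score = score
    return selected, best_score

# Made-up data: feature 0 separates the classes, features 1 and 2 are noise.
rows = [
    (0.10, 5.0, 1.0), (0.20, 1.0, 9.0), (0.15, 9.0, 4.0),   # false alarms
    (0.90, 5.2, 1.1), (0.80, 1.1, 9.2), (0.95, 9.1, 4.1),   # true alarms
]
labels = [0, 0, 0, 1, 1, 1]

selected, score = forward_select(rows, labels, n_features=3)
print(selected, score)
```

The cost of the wrapper approach is visible in the nested loops: every candidate subset requires re-evaluating the classifier, which is why filter methods are cheaper even though wrappers tend to be more accurate.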

Machine Learning Algorithm based Feature Selection
In this section, we try to identify the best machine learning algorithm from a pool by evaluating some well-known algorithms: OneR, AdaBoost, J48, decision tree, Random Forest and Random Tree. The Waikato Environment for Knowledge Analysis (WEKA) toolkit can be used for this purpose. We will use a decision value that can predict the number of false alarms as accurately as possible. With our proposed method, we aim to achieve a double filtration process that can effectively reduce the number of false alarms.

Results and Discussion
Results derived from earlier works can serve as parameters that aid the development of the false alarm filter. Some earlier results on the performance of machine learning algorithms, taken from various sources, are presented here. Most of the algorithms provided 90% classification accuracy, with Random Tree in the top position. Another observation, from a network intrusion detection system using the AWID dataset, was that removing low-rank features did not improve the classification accuracy; improvement could only be achieved by following different feature reduction levels. Continuous evaluation of datasets is therefore extremely important to achieve the highest accuracy. This result can be taken as a benchmark for our false alarm filter.
After the feature reduction process, they derived one very important equation that uses the decision value to identify false alarms effectively. This looks promising, and the idea of adaptively selecting the machine learning algorithm is also discussed. This feature set can be used as a benchmark for our false alarm filter.

Suggestion Proposed
This survey leads us to the conclusion that machine learning techniques are able to produce better or more concise rules if background knowledge is used appropriately for the classification. We have also understood that anomalies fall under three categories:

1. Point Anomalies: an individual data instance lies outside the boundaries of the normal region of the data.

2. Contextual Anomalies: a data instance is anomalous in a specific context.

3. Collective Anomalies: a collection of data instances is anomalous with respect to the entire data set.
The mentioned anomalies concentrate on only one thing, i.e., gaining knowledge about the internals of the network infrastructure. Many WIDS approaches using machine learning to combat false alarms are unable to detect recent unknown attacks, and others are not able to provide a real-time solution. This section helps to understand the primary reasons for false alarm generation and, with the help of a false alarm filter, tries to reduce the false alarm rate considerably.
Current machine learning algorithms are not suitable for use in a real-time network, as they might require changes to the protocol itself. A standalone filter that does not require any alteration of the protocol would therefore be satisfactory, and if the filter performs a double filtration process it can reduce the false alarm rate considerably. The intelligent filter we propose does not perform anything that is absent from the traditional methodology. However, this survey guides researchers towards different algorithms that can perform the double filtration process. Constructing a false alarm filter offers merits in terms of flexibility, adaptation and scalability. The alarm filter does not affect the structure of the intrusion detection system, as it is deployed behind the intrusion detection system and can work both online and offline. One gap that is still unexplored in intrusion detection systems is the semantic need, so the alarm filter uses contextual information in the filtering process. We also aim to keep the accuracy of the filtration at a maximum and stable by selecting the most appropriate machine learning algorithm. The architecture of the false alarm filter is described in Figure 2.

As mentioned earlier, the alarm filter follows the standard procedure of formatting, feature selection and prediction phases. However, the idea of using wrapper-based feature selection and its suitability for IEEE 802.11 is unexplored. The identification of the best prediction algorithm has so far been done for specific attacks; here we are going to do the same considering all possible attacks. The prediction itself identifies the type of alarm, and in the next filtration stage we filter again to rule out false alarms. Thus, the objective of double filtration can be achieved by the proposed architecture.
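The double filtration described above can be pictured as two stages chained behind the intrusion detection system. The sketch below is a hypothetical illustration only: the `Alarm` fields, the confidence threshold and the contextual check are assumptions made for the example, since the source does not specify how each stage decides.

```python
from dataclasses import dataclass

@dataclass
class Alarm:
    signature: str
    score: float              # hypothetical classifier confidence of a true attack
    target_is_known_ap: bool  # hypothetical contextual attribute

def first_filter(alarms, threshold=0.5):
    """Stage 1: keep alarms that the prediction step labels as likely true."""
    return [alarm for alarm in alarms if alarm.score >= threshold]

def second_filter(alarms):
    """Stage 2: re-check the survivors against contextual information."""
    return [alarm for alarm in alarms if alarm.target_is_known_ap]

def double_filter(alarms, threshold=0.5):
    """Chain both stages; only alarms passing both reach the analyst."""
    return second_filter(first_filter(alarms, threshold))

alarms = [
    Alarm("deauth flood", 0.9, True),    # passes both stages
    Alarm("probe flood", 0.8, False),    # removed by the contextual check
    Alarm("auth flood", 0.2, True),      # removed by the prediction stage
]
for alarm in double_filter(alarms):
    print(alarm.signature)
```

Because both stages only consume alarms that the detector has already emitted, a filter of this shape needs no change to the 802.11 protocol or to the detector itself, which matches the standalone deployment the text argues for.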
Starting with pre-processing, we follow the standard process of formatting, cleaning and sampling to obtain the most relevant information for analysis. Massive Online Analysis (MOA) can be considered for this work due to its suitability for demanding problems and its wide support of algorithms covering different concepts of machine learning.
The next process, feature reduction using the wrapper method, generates different training data sets from the given training data set. Ensemble learning algorithms such as Bagging and AdaBoost can be used to generate the feature subset. The reduced subset then undergoes classification, and the performance of various machine learning algorithms is evaluated. The performance of an algorithm is measured with standardized metrics such as classification accuracy and false alarm detection accuracy. The algorithm with the best classification accuracy and false alarm detection accuracy can be selected from the pool of machine learning algorithms. Some of the algorithms that will be tested are Decision Tree, Random Forest, Random Tree, OneR and J48. The objective of double filtration can be achieved by the false alarm filtration component, which once again uses the best learning algorithm to boost the detection accuracy of false alarms.
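The selection step above can be sketched with two common metrics. The source does not define "false alarm detection accuracy" precisely, so this example assumes the usual false alarm rate, FP / (FP + TN), alongside classification accuracy; the classifier names and predictions are made up.

```python
def confusion_counts(y_true, y_pred, positive="attack"):
    """Return (tp, fp, tn, fn) with `positive` as the alarm-raising class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def accuracy(tp, fp, tn, fn):
    """Fraction of all instances that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)

def false_alarm_rate(tp, fp, tn, fn):
    """Fraction of normal instances that raised an alarm: FP / (FP + TN)."""
    return fp / (fp + tn) if (fp + tn) else 0.0

y_true = ["attack", "attack", "normal", "normal",
          "normal", "normal", "attack", "normal"]
# Hypothetical predictions from two candidate classifiers in the pool.
predictions = {
    "classifier_a": ["attack", "normal", "attack", "normal",
                     "normal", "normal", "attack", "normal"],
    "classifier_b": ["attack", "attack", "attack", "attack",
                     "normal", "normal", "attack", "normal"],
}

def score(name):
    """Rank by accuracy first, then by lower false alarm rate."""
    counts = confusion_counts(y_true, predictions[name])
    return accuracy(*counts), -false_alarm_rate(*counts)

best = max(predictions, key=score)
print(best, score(best))
```

In this toy pool both classifiers reach the same accuracy, so the false alarm rate breaks the tie. Ranking by accuracy alone would hide exactly the difference that a false alarm filter is meant to exploit.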
The Aegean Wi-Fi Intrusion Dataset (AWID) is the dataset we chose for this evaluation, as it contains 155 attributes. Even though this dataset consists of simulated attacks, the higher number of attributes allows us to identify a higher number of intrusion types, so that different types of attacks can be addressed. AWID is a publicly available labelled dataset that was developed from real traces of both normal and intrusion activities of an 802.11 Wi-Fi network, under the supervision of the University of the Aegean and George Mason University. The AWID dataset comprises a large set of packets (F) and a smaller one (R). These two versions are not related, i.e., the smaller one has not been produced from the larger. They were captured at different times, with different equipment and in different environments. Each version has a training set (denoted Trn) and a test set (denoted Tst). The test version has not been produced from the corresponding training set. Finally, there is a version in which the labels correspond to different attacks (ATK), as well as a version in which the attack labels are organized into 3 major classes (CLS). In that case, the datasets differ only in the label.

Using this information, we can frame fuzzy IF…THEN rules for identifying the attacks exactly. This paper also suggests preparing new fuzzy rules to enhance the detection accuracy and reduce false alarms. Moreover, it concludes that computational intelligence techniques are useful to reduce the computational complexity and increase the detection accuracy, and that the false alarm rate can be reduced by using conditional probability. In addition, intelligent agents can be introduced for effective communication, which also helps to improve decision-making accuracy. Soft computing techniques can be used with intelligent agents to improve the performance of the machine learning algorithm in terms of detection accuracy.
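As an illustration of what a fuzzy IF…THEN rule for an 802.11 attack might look like, the sketch below encodes one rule with linear membership functions. Everything specific in it is an assumption made for the example: the rule itself, the two input features and the breakpoints are hypothetical and are not derived from AWID.

```python
def ramp_up(x, low, high):
    """Membership rising linearly from 0 at `low` to 1 at `high`."""
    if x <= low:
        return 0.0
    if x >= high:
        return 1.0
    return (x - low) / (high - low)

def auth_rate_is_high(rate_per_s):
    """Degree to which the authentication request rate counts as HIGH."""
    return ramp_up(rate_per_s, 20, 100)     # hypothetical breakpoints

def source_count_is_low(n_sources):
    """Degree to which the number of distinct senders counts as LOW."""
    return 1.0 - ramp_up(n_sources, 1, 10)  # hypothetical breakpoints

def rule_auth_flood(rate_per_s, n_sources):
    """IF auth rate is HIGH AND source count is LOW THEN authentication flood.

    The fuzzy AND is taken as the minimum of the two memberships.
    """
    return min(auth_rate_is_high(rate_per_s), source_count_is_low(n_sources))

print(rule_auth_flood(60, 1))   # partial support for the rule
print(rule_auth_flood(5, 1))    # low request rate, so the rule does not fire
```

The rule returns a degree of support rather than a yes/no decision, so a later stage can discard weakly supported alarms instead of raising every one, which is one way fuzzy rules could help lower the false alarm rate.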

Conclusion
False alarms remain a major challenge in intrusion detection systems and serve as a limiting factor in their construction. Constructing a false alarm filter appears to be a promising method for reducing false alarms. In this survey, we explained in detail the use of data mining techniques, pre-processing techniques such as the filter approach and the wrapper approach, ontology-based approaches and heuristic search-based approaches for enhancing the detection accuracy and reducing the false alarm rate and computational complexity. A comparative analysis was made at the end of the discussion. Based on this comparative analysis, the survey recommends a suitable idea for enhancing the detection accuracy while reducing the false alarm rate and computational complexity. Finally, it recommends introducing an intelligent agent based false alarm filter that performs a double filtration process to improve performance in terms of false alarm rate reduction. Moreover, most of the best machine learning algorithms were studied to obtain the best feature set and the best prediction. In closing, it suggests designing new machine learning algorithms with the introduction of intelligent agents, soft computing techniques and fuzzy rules for better prediction accuracy on WLAN 802.11 attacks.

Figure 2. Architecture of False Alarm Filter

A framework for a distributed intrusion response engine, in which alarm confidence, attack frequency and assessed risks are estimated to produce a response matrix for detecting attacks, is proposed by Lim et al. [9].
Danziger et al. [13] conduct a thorough study of the different types of attacks existing in WLAN 802.11 and then implement the idea of blending 802.11 with WIDS and identifying false alarms. The most common form of attack identified in wireless networks is DoS; the SSID from which the attack originated is identified and the corresponding management frames are studied. The frame value is compared with an existing threshold value and alarms are generated accordingly. The same principle applies to the other types of attacks as well. This model failed to explain on what factors the threshold limit can be set [14]. This factor is an essential component that should be identified and properly set out in the problem statement. Borsc and Shinde [15] developed a Wi-Fi-EWS model in which the methodology is implemented as a two-level defense. The first level looks for anomalies, and a systematic learning mechanism is used to track the timings of wireless transmissions. At the second level, a state transition model is built and the historical data is then queried. The results of this methodology are quite similar to those reported by Tade and Timothy [5]. Elankayer Sithirasenan and Vallipuram Muthukkumarasamy [16] tried to match sequences of audit records to the expected audit trails, and the usage of various tools is clearly identified and studied.

Here they concentrated on deciding the best classifier algorithm for accuracy and detection. The comparison results are given in Table 1.

Table 1. Comparison of Various Classification Techniques

The approach can be applied to both open and encrypted networks; it used RSSI and AoA localization approaches for detecting the existence of flooding-based DoS attacks in a wireless network. This research tried to cover some of the drawbacks mentioned earlier, and the architecture consisted of a knowledge base, an intrusion detection system and a localization module.

Table 2.1. Performance evaluation of various machine learning algorithms

Another interesting result was obtained from a study based on Snort alarms that used 8 features: description, classification, priority, packet type, source IP address, source port number, destination IP address and destination port number. The conversion of Snort alarms to standard alarms is depicted in Table 2.

Table 2. Snort Alarms Feature Selection

Table 2 lists the intrusion types that are available in the standard AWID dataset. It has 17 intrusions with descriptions.

Table 2. Intrusion Types in AWID Dataset

Intrusion | Description
Amok | An increased number of 802.11 authentication requests is noticed in Amok
Arp | It may be used as a first step for any of the key cracking attacks
Authentication request | 802.11 DoS attack

From Table 2, it can be observed that DoS attacks are more prevalent than the other types of attacks listed in the table. In 802.11 WLAN, the attacks named authentication, power saving, probe request, probe response and RTS come under the DoS category.