Using J48 Tree Partitioning for scalable SVM in Spam Detection

DOI: 10.5539/cis.v8n2p37

Mohammad-Hossein Nadimi-Shahraki; Zahra S. Torabi; Akbar Nabiollahi

Abstract

Support Vector Machines (SVMs) are a state-of-the-art, powerful machine learning algorithm with strong regularization properties. Regularization refers to how well the model generalizes to new data. Therefore, SVMs can be very effective for spam detection. Although experimental results show that SVMs usually outperform other algorithms, their efficiency decreases as the number of spam features increases. In this paper, a scalable SVM for spam detection is proposed using the J48 tree. In the proposed method, the dataset is first partitioned using the J48 tree, and feature selection is then applied to each partition in parallel. The selected features are subsequently used in the training phase of the SVM. The proposed method is evaluated on several benchmark datasets, and the results are compared with those of other algorithms such as SVM and GA-SVM. The experimental results show that the proposed method scales as the number of features increases and achieves higher accuracy than SVM and GA-SVM.
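
The partition-then-select steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: a single split on the highest information-gain binary feature stands in for a full J48 tree, information-gain ranking stands in for the feature selection step (which the abstract does not specify), and SVM training on the selected features is left out. All function names and the toy data are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return -sum((c / n) * log2(c / n) for c in counts.values())


def info_gain(rows, labels, j):
    """Information gain of binary feature j with respect to the labels."""
    n = len(labels)
    gain = entropy(labels)
    for v in (0, 1):
        subset = [y for x, y in zip(rows, labels) if x[j] == v]
        gain -= len(subset) / n * entropy(subset)
    return gain


def partition(rows, labels):
    """Split the data on the highest-gain feature.

    Assumption: this depth-one split is only a stand-in for a J48 tree.
    """
    n_features = len(rows[0])
    split = max(range(n_features), key=lambda j: info_gain(rows, labels, j))
    parts = {0: ([], []), 1: ([], [])}
    for x, y in zip(rows, labels):
        parts[x[split]][0].append(x)
        parts[x[split]][1].append(y)
    # Drop empty partitions so later steps always see at least one row.
    return split, [p for p in parts.values() if p[0]]


def select_features(part, k):
    """Indices of the k highest-gain features within one partition."""
    rows, labels = part
    ranked = sorted(
        range(len(rows[0])),
        key=lambda j: info_gain(rows, labels, j),
        reverse=True,
    )
    return sorted(ranked[:k])


def partition_and_select(rows, labels, k=2):
    """Partition the data, then select features for each partition concurrently."""
    split, parts = partition(rows, labels)
    with ThreadPoolExecutor() as pool:
        selected = list(pool.map(lambda p: select_features(p, k), parts))
    return split, parts, selected


# Hypothetical toy data: 4 binary features per message, label 1 = spam.
rows = [
    [1, 1, 0, 0], [1, 1, 0, 1], [1, 0, 1, 0], [1, 0, 1, 1],
    [0, 1, 0, 0], [0, 0, 1, 1], [0, 1, 1, 0], [0, 0, 0, 1],
]
labels = [1, 1, 1, 0, 0, 0, 0, 1]

split, parts, selected = partition_and_select(rows, labels, k=2)
for (part_rows, _), features in zip(parts, selected):
    print(len(part_rows), "rows -> selected feature indices", features)
```

Each partition's selection depends only on that partition's rows, which is what allows the work to run in parallel. The thread pool here only illustrates that independence; for CPU-bound selection in Python, a process pool would be the more likely choice. In a full pipeline, one SVM would then be trained per partition on its selected columns.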