An Empirical Analysis of Imbalanced Data Classification


  •  Shu Zhang    
  •  Samira Sadaoui    
  •  Malek Mouhoub    

Abstract

SVM has been given top consideration for addressing the challenging problem of data imbalance learning. Here, we conduct an empirical classification analysis of new UCI datasets that have different imbalance ratios, sizes and complexities. The experimentation consists of comparing the classification results of SVM with those of two other popular classifiers, Naive Bayes and the decision tree C4.5, to explore their pros and cons. To make the comparative experiments more comprehensive and to gain a better idea of the learning performance of each classifier, we employ in total four performance metrics: Sensitivity, Specificity, G-means and time-based efficiency. For each benchmark dataset, we perform an empirical search for the learning model through numerous trainings of the three classifiers under different parameter settings and performance measurements. This paper exposes the most significant results, i.e., the highest performance achieved by each classifier for each dataset. In summary, SVM outperforms the other two classifiers in terms of Sensitivity (or Specificity) for all the datasets, and is more accurate in terms of G-means when classifying large datasets.
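The three quality metrics named in the abstract are standard confusion-matrix quantities: Sensitivity is the true-positive rate on the minority class, Specificity the true-negative rate on the majority class, and G-means their geometric mean. A minimal sketch of how they are computed (the counts in the example are hypothetical, not taken from the paper):

```python
import math

def imbalance_metrics(tp, fn, tn, fp):
    """Sensitivity, Specificity and G-mean from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # recall on the (minority) positive class
    specificity = tn / (tn + fp)   # recall on the (majority) negative class
    g_mean = math.sqrt(sensitivity * specificity)
    return sensitivity, specificity, g_mean

# Hypothetical result on an imbalanced test set: 50 positives, 950 negatives.
sens, spec, g = imbalance_metrics(tp=40, fn=10, tn=900, fp=50)
```

Because G-means collapses toward zero when either class rate is poor, it penalizes classifiers that sacrifice the minority class for overall accuracy, which is why it is favored for imbalanced data.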


This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: quarterly

