An Empirical Analysis of Imbalanced Data Classification

  •  Shu Zhang    
  •  Samira Sadaoui    
  •  Malek Mouhoub    


SVM has been given top consideration for addressing the challenging problem of data imbalance learning. Here,we conduct an empirical classification analysis of new UCI datasets that have dierent imbalance ratios, sizes andcomplexities. The experimentation consists of comparing the classification results of SVM with two other popularclassifiers, Naive Bayes and decision tree C4.5, to explore their pros and cons. To make the comparative exper-iments more comprehensive and have a better idea about the learning performance of each classifier, we employin total four performance metrics: Sensitive, Specificity, G-means and time-based eciency. For each benchmarkdataset, we perform an empirical search of the learning model through numerous training of the three classifiersunder dierent parameter settings and performance measurements. This paper exposes the most significant resultsi.e. the highest performance achieved by each classifier for each dataset. In summary, SVM outperforms the othertwo classifiers in terms of Sensitive (or Specificity) for all the datasets, and is more accurate in terms of G-meanswhen classifying large datasets.

This work is licensed under a Creative Commons Attribution 4.0 License.
  • ISSN(Print): 1913-8989
  • ISSN(Online): 1913-8997
  • Started: 2008
  • Frequency: quarterly

Journal Metrics

WJCI (2021): 0.557

Impact Factor 2021 (by WJCI):  0.304

h-index (December 2022): 40

i10-index (December 2022): 179

h5-index (December 2022): N/A

h5-median(December 2022): N/A

( The data was calculated based on Google Scholar Citations. Click Here to Learn More. )