An Empirical Study of Bagging Predictors for Imbalanced Data with Different Levels of Class Distribution
Research into learning from imbalanced data has increasingly captured the attention of both academia and industry, especially when the class distribution is highly skewed. This paper compares the Area Under the Receiver Operating Characteristic Curve (AUC) performance of bagging when learning from class distributions at different levels of imbalance. Despite the popularity of bagging in many real-world applications, some questions have not been clearly answered in the existing research, e.g., which bagging predictors achieve the best performance, and whether bagging remains superior to single learners as the level of class imbalance changes. We perform a comprehensive evaluation of the AUC performance of bagging predictors with 12 base learners at different levels of class imbalance, using a sampling technique on 14 imbalanced datasets. Our experimental results indicate that Decision Table (DTable) and RepTree are the learning algorithms with the best bagging AUC performance. For most base learners, the AUC of the bagging predictor is statistically superior to that of the single learner; the exceptions are Support Vector Machines (SVM) and Decision Stump (DStump).
Keywords: imbalanced class distribution, AUC performance, bagging
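The kind of comparison described in the abstract (bagging versus a single learner, scored by AUC at several levels of class imbalance) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' setup: the data are synthetic two-dimensional Gaussians rather than the 14 datasets, the only base learner is a simple Gini-based decision stump, the imbalance level is set by undersampling the minority class, and all function names (`auc`, `resample_to_ratio`, `fit_stump`, `fit_bagging`, `compare`) are hypothetical. It uses only the Python standard library.

```python
import random

def auc(scores, labels):
    """Rank-based AUC: chance that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def resample_to_ratio(X, y, minority_fraction, rng):
    """Assumed sampling scheme: undersample class 1 until it is about minority_fraction of the data."""
    pos = [i for i, t in enumerate(y) if t == 1]
    neg = [i for i, t in enumerate(y) if t == 0]
    n_pos = max(1, round(minority_fraction * len(neg) / (1 - minority_fraction)))
    keep = neg + rng.sample(pos, min(n_pos, len(pos)))
    return [X[i] for i in keep], [y[i] for i in keep]

def fit_stump(X, y):
    """One-split decision stump chosen by Gini impurity; each leaf scores with its positive fraction."""
    best = None
    for f in range(len(X[0])):
        for thr in sorted(set(row[f] for row in X)):
            left = [t for row, t in zip(X, y) if row[f] <= thr]
            right = [t for row, t in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            pl, pr = sum(left) / len(left), sum(right) / len(right)
            impurity = len(left) * pl * (1 - pl) + len(right) * pr * (1 - pr)
            if best is None or impurity < best[0]:
                best = (impurity, f, thr, pl, pr)
    if best is None:  # degenerate sample: no valid split exists
        p = sum(y) / len(y)
        return lambda row: p
    _, f, thr, pl, pr = best
    return lambda row: pl if row[f] <= thr else pr

def fit_bagging(X, y, n_estimators, rng):
    """Bagging: fit one stump per bootstrap replicate and average the stump scores."""
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sample n rows with replacement
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return lambda row: sum(m(row) for m in models) / len(models)

def make_data(n_neg, n_pos, rng):
    """Synthetic stand-in data: two overlapping 2-D Gaussian classes."""
    X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(n_neg)]
    X += [[rng.gauss(1.5, 1), rng.gauss(1.5, 1)] for _ in range(n_pos)]
    return X, [0] * n_neg + [1] * n_pos

def compare(minority_fractions=(0.3, 0.1, 0.05), seed=0):
    """Return {minority_fraction: (single_auc, bagged_auc)} measured on a fixed test set."""
    rng = random.Random(seed)
    X_train, y_train = make_data(200, 100, rng)
    X_test, y_test = make_data(200, 100, rng)
    results = {}
    for frac in minority_fractions:
        Xs, ys = resample_to_ratio(X_train, y_train, frac, rng)
        single = fit_stump(Xs, ys)
        bagged = fit_bagging(Xs, ys, 10, rng)
        results[frac] = (auc([single(r) for r in X_test], y_test),
                         auc([bagged(r) for r in X_test], y_test))
    return results

if __name__ == "__main__":
    for frac, (single_auc, bagged_auc) in compare().items():
        print(f"minority={frac:.2f}  single AUC={single_auc:.3f}  bagged AUC={bagged_auc:.3f}")
```

A single stump emits only two distinct scores, so its ROC curve has a single interior point; averaging over bootstrap replicates yields a finer ranking. Whether that translates into a higher AUC depends on the data and the base learner, which is the question the paper studies empirically; a toy run like this one does not settle it.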