Class Confidence Weighted kNN Algorithms for Imbalanced Data Sets
In this paper, we propose a novel k-nearest neighbors (kNN) weighting strategy for handling the problem of class imbalance. When dealing with highly imbalanced data, a salient drawback of existing kNN algorithms is that the class with more frequent samples tends to dominate the neighborhood of a test instance regardless of the distance measure used, which leads to suboptimal classification performance on the minority class. To solve this problem, we propose CCW (class confidence weights), which uses the probability of attribute values given class labels to weight prototypes in kNN. The main advantage of CCW is that it corrects the inherent bias toward the majority class in existing kNN algorithms under any distance measure. Theoretical analysis and comprehensive experiments confirm our claims.
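The weighting idea above can be sketched in a few lines: instead of a flat one-vote-per-neighbor count, each of the k nearest prototypes votes with an estimate of P(attribute values | its class label). The sketch below is illustrative only; it estimates that likelihood with a naive per-feature Gaussian model, which is an assumption of this example rather than the estimator used in the paper.

```python
import numpy as np

def ccw_knn_predict(X_train, y_train, x_test, k=5):
    """Illustrative class-confidence-weighted kNN.

    Each of the k nearest prototypes votes with weight P(x_i | y_i),
    estimated here by a naive per-feature Gaussian likelihood
    (an assumption of this sketch, not the paper's estimator).
    """
    classes = np.unique(y_train)
    # Per-class feature means and variances for the naive likelihood;
    # the small constant avoids division by zero for constant features.
    stats = {c: (X_train[y_train == c].mean(axis=0),
                 X_train[y_train == c].var(axis=0) + 1e-9)
             for c in classes}

    def log_likelihood(x, c):
        mu, var = stats[c]
        return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

    # k nearest neighbors by Euclidean distance (any metric would do).
    dists = np.linalg.norm(X_train - x_test, axis=1)
    nn = np.argsort(dists)[:k]

    # Confidence-weighted vote: a neighbor from an abundant class no
    # longer contributes a full vote automatically, only its likelihood.
    votes = {c: 0.0 for c in classes}
    for i in nn:
        votes[y_train[i]] += np.exp(log_likelihood(X_train[i], y_train[i]))
    return max(votes, key=votes.get)
```

On a toy imbalanced set (e.g. 50 majority samples around the origin, 5 minority samples around (3, 3)), the weighting dampens majority votes that are far from their own class center, which is the corrective effect the paper targets.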