Abstract
In the real world, the frequency of occurrence of objects is naturally skewed, forming long-tail class distributions, which results in poor performance on the statistically rare classes. A promising solution is to mine tail-class examples to balance the training dataset. However, mining tail-class examples is very challenging. For instance, most of the otherwise successful uncertainty-based mining approaches struggle because skewness in the data distorts the class probabilities. In this work, we propose a simple yet effective approach to overcome these challenges. Our framework enhances the subdued tail-class activations and then uses a one-class data-centric approach to identify tail-class examples. We evaluate our framework extensively on three datasets spanning two computer vision tasks. Substantial improvements in minority-class mining and in the fine-tuned model’s task performance corroborate the value of our method.
G. Singh and L. Chu contributed equally to this work.
Copyright information
© 2022 Springer Nature Switzerland AG
About this paper
Cite this paper
Singh, G., Chu, L., Wang, L., Pei, J., Tian, Q., Zhang, Y. (2022). Mining Minority-Class Examples with Uncertainty Estimates. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_21
DOI: https://doi.org/10.1007/978-3-030-98358-1_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98357-4
Online ISBN: 978-3-030-98358-1
eBook Packages: Computer Science (R0)