Mining Minority-Class Examples with Uncertainty Estimates

Singh, Gursimran; Chu, Lingyang; Wang, Lanjun; Pei, Jian; Tian, Qi; Zhang, Yong

doi:10.1007/978-3-030-98358-1_21

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13141)

Included in the following conference series:

International Conference on Multimedia Modeling

Abstract

In the real world, the frequency of occurrence of objects is naturally skewed, forming long-tail class distributions, which results in poor performance on the statistically rare classes. A promising solution is to mine tail-class examples to balance the training dataset. However, mining tail-class examples is very challenging. For instance, most otherwise successful uncertainty-based mining approaches struggle because skewness in the data distorts the class probabilities. In this work, we propose a simple yet effective approach to overcome these challenges. Our framework enhances the subdued tail-class activations and then uses a one-class data-centric approach to identify tail-class examples. We carry out an exhaustive evaluation of our framework on three datasets spanning two computer vision tasks. Substantial improvements in minority-class mining and in the fine-tuned model’s task performance strongly corroborate the value of our method.
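
For context, the following is a minimal sketch of the kind of uncertainty-based mining baseline the abstract refers to. It is not the authors' framework. It assumes predictive entropy as the uncertainty score, and the function names and toy probabilities are hypothetical, chosen only to illustrate how subdued tail-class activations can affect the ranking.

```python
import math


def predictive_entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def mine_most_uncertain(prob_rows, k):
    """Return the indices of the k examples with the highest entropy."""
    scores = [predictive_entropy(row) for row in prob_rows]
    ranked = sorted(range(len(prob_rows)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


# Hypothetical softmax outputs over (head class A, head class B, tail class C).
pool = [
    [0.98, 0.01, 0.01],  # confident head-class prediction
    [0.50, 0.49, 0.01],  # confusion between the two head classes
    [0.80, 0.05, 0.15],  # assumed tail example with a subdued tail activation
]

selected = mine_most_uncertain(pool, k=1)
print(selected)
```

In this toy pool, the example that confuses the two head classes (index 1) receives a higher entropy than the assumed tail example (index 2), so a plain entropy ranking would pick it first. This is one way the distortion described above could show up in practice.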

G. Singh and L. Chu contributed equally to this work.

Author information

Authors and Affiliations

Huawei Technologies Canada, Burnaby, Canada
Gursimran Singh & Yong Zhang
McMaster University, Hamilton, Canada
Lingyang Chu
Tianjin University, Tianjin, China
Lanjun Wang
Simon Fraser University, Burnaby, Canada
Jian Pei
Huawei Technologies China, Shenzhen, China
Qi Tian

Corresponding author

Correspondence to Gursimran Singh.

Editor information

Editors and Affiliations

IT University of Copenhagen, Copenhagen, Denmark
Björn Þór Jónsson
Dublin City University, Dublin, Ireland
Cathal Gurrin
University of Science, VNU-HCM, Ho Chi Minh City, Vietnam
Minh-Triet Tran
University of Bergen, Bergen, Norway
Duc-Tien Dang-Nguyen
National Tsing Hua University, Hsinchu, Taiwan
Anita Min-Chun Hu
Hanoi University of Science and Technology, Hanoi, Vietnam
Binh Huynh Thi Thanh
Median Technologies, Valbonne, France
Benoit Huet

Cite this paper

Singh, G., Chu, L., Wang, L., Pei, J., Tian, Q., Zhang, Y. (2022). Mining Minority-Class Examples with Uncertainty Estimates. In: Þór Jónsson, B., et al. MultiMedia Modeling. MMM 2022. Lecture Notes in Computer Science, vol 13141. Springer, Cham. https://doi.org/10.1007/978-3-030-98358-1_21

DOI: https://doi.org/10.1007/978-3-030-98358-1_21
Published: 15 March 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-98357-4
Online ISBN: 978-3-030-98358-1
eBook Packages: Computer Science, Computer Science (R0)
