Comparing and Improving Active Learning Uncertainty Measures for Transformer Models

  • Conference paper
  • In: Advances in Databases and Information Systems (ADBIS 2023)

Abstract

Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-encoder based language models still requires a significant amount of labeled data to achieve satisfying results. A well-known technique for reducing the human effort of acquiring a labeled dataset is Active Learning (AL): an iterative process in which only the minimal number of samples is labeled. AL strategies require access to a quantified confidence measure of the model's predictions. A common choice is the softmax activation function of the final Neural Network layer. In this paper we compare eight alternatives on seven datasets and show that the softmax function provides misleading probabilities. Our finding is that most of the methods primarily identify hard-to-learn-from samples (outliers), resulting in worse-than-random performance, instead of samples that reduce the uncertainty of the learned language model. As a solution, this paper proposes a heuristic that systematically excludes such samples, which improves various methods compared to the plain softmax function.
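
To make the idea concrete, below is a minimal sketch of softmax-based uncertainty sampling with a clipping heuristic of the kind the abstract describes. The paper's exact rule is not reproduced on this page, so the least-confidence measure, the clip_fraction parameter, and all function names are illustrative assumptions rather than the authors' implementation.

# Hedged sketch: uncertainty sampling plus an assumed "uncertainty clipping" step.
# Everything here (least confidence, clip_fraction, function names) is illustrative,
# not the implementation from the paper or its repository.
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over class logits, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

def least_confidence(probs: np.ndarray) -> np.ndarray:
    """Uncertainty = 1 - max class probability (higher means more uncertain)."""
    return 1.0 - probs.max(axis=1)

def select_with_clipping(logits: np.ndarray, batch_size: int,
                         clip_fraction: float = 0.05) -> np.ndarray:
    """Pick `batch_size` unlabeled samples by uncertainty, but first discard the
    most uncertain `clip_fraction` of the pool, since the abstract suggests those
    are often hard-to-learn-from outliers rather than informative samples."""
    uncertainty = least_confidence(softmax(logits))
    order = np.argsort(-uncertainty)          # indices, most uncertain first
    n_clip = int(len(order) * clip_fraction)  # assumed amount to exclude
    kept = order[n_clip:]                     # skip the presumed outliers
    return kept[:batch_size]                  # query the next most uncertain

# Toy usage with random logits standing in for Transformer-encoder outputs.
rng = np.random.default_rng(0)
pool_logits = rng.normal(size=(1000, 4))      # 1000 unlabeled samples, 4 classes
query_indices = select_with_clipping(pool_logits, batch_size=25)
print(query_indices[:10])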

Notes

  1. According to scale.ai as of December 2021 (as of 2023, the pricing is no longer publicly visible): https://web.archive.org/web/20210112234705/https://scale.com/pricing.

  2. https://github.com/jgonsior/active-learning-softmax-uncertainty-clipping.

  3. The average training time of the Transformer-encoder models is around 600 s combined for a single AL experiment.

Acknowledgments

The authors are grateful to the Center for Information Services and High Performance Computing (ZIH) at TU Dresden for providing its facilities for high-throughput calculations. This research was funded by the German Federal Ministry of Education and Research (BMBF) through grant 01IS17044 Software Campus 2.0 (TU Dresden).

Author information

Correspondence to Julius Gonsior.

Copyright information

© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Gonsior, J. et al. (2023). Comparing and Improving Active Learning Uncertainty Measures for Transformer Models. In: Abelló, A., Vassiliadis, P., Romero, O., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2023. Lecture Notes in Computer Science, vol 13985. Springer, Cham. https://doi.org/10.1007/978-3-031-42914-9_9

  • DOI: https://doi.org/10.1007/978-3-031-42914-9_9

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-031-42913-2

  • Online ISBN: 978-3-031-42914-9

  • eBook Packages: Computer Science, Computer Science (R0)
