Abstract
Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-encoder based language models still requires a significant amount of labeled data to achieve satisfactory results. A well-known technique to reduce the human effort of acquiring a labeled dataset is Active Learning (AL): an iterative process in which only the minimal number of samples is labeled. AL strategies require access to a quantified confidence measure of the model predictions. A common choice is the softmax activation function for the final Neural Network layer. In this paper we compare eight alternatives on seven datasets and show that the softmax function provides misleading probabilities. Our finding is that most of the methods primarily identify hard-to-learn-from samples (outliers), resulting in worse-than-random performance, instead of samples that reduce the uncertainty of the learned language model. As a solution, this paper proposes a heuristic to systematically exclude such samples, which yields improvements for various methods compared to the softmax function.
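To illustrate the softmax-based confidence measure the abstract refers to, the following is a minimal sketch of least-confidence sampling, one of the standard uncertainty-based AL query strategies. The function names and the NumPy implementation are illustrative, not the paper's actual code:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def least_confidence(probs):
    # Uncertainty = 1 - max class probability; higher means less confident.
    return 1.0 - probs.max(axis=1)

def select_batch(logits, k):
    # Pick the k unlabeled samples the model is least confident about,
    # i.e. the candidates an AL strategy would send to the annotator.
    scores = least_confidence(softmax(logits))
    return np.argsort(scores)[::-1][:k]
```

As the paper argues, scores derived this way can be misleading: samples with near-uniform softmax outputs are often outliers that are hard to learn from rather than samples that reduce model uncertainty, which motivates the proposed exclusion heuristic.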
Notes
- 1.
According to scale.ai as of December 2021 (as of 2023 the cost is not publicly visible anymore): https://web.archive.org/web/20210112234705/https://scale.com/pricing.
- 2.
- 3.
The average training time of the Transformer-encoder models is around 600 s combined for a single AL experiment.
Acknowledgments
The authors are grateful to the Center for Information Services and High Performance Computing (ZIH) at TU Dresden for providing its facilities for high-throughput calculations. This research was funded by the German Federal Ministry of Education and Research (BMBF) through grant 01IS17044 Software Campus 2.0 (TU Dresden).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gonsior, J. et al. (2023). Comparing and Improving Active Learning Uncertainty Measures for Transformer Models. In: Abelló, A., Vassiliadis, P., Romero, O., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2023. Lecture Notes in Computer Science, vol 13985. Springer, Cham. https://doi.org/10.1007/978-3-031-42914-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42913-2
Online ISBN: 978-3-031-42914-9
eBook Packages: Computer Science (R0)