Abstract
Despite achieving state-of-the-art results in nearly all Natural Language Processing applications, fine-tuning Transformer-encoder based language models still requires a significant amount of labeled data to achieve satisfactory results. A well-known technique to reduce the human effort of acquiring a labeled dataset is Active Learning (AL): an iterative process in which only the minimal number of samples is labeled. AL strategies require access to a quantified confidence measure of the model predictions. A common choice is the softmax activation function for the final Neural Network layer. In this paper we compare eight alternatives on seven datasets and show that the softmax function provides misleading probabilities. Our finding is that most of the methods primarily identify hard-to-learn-from samples (outliers), resulting in worse-than-random performance, instead of samples that reduce the uncertainty of the learned language model. As a solution, this paper proposes a heuristic to systematically exclude such samples, which yields improvements for various methods compared to the softmax function.
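To illustrate the softmax-based confidence measure the abstract refers to, the following is a minimal sketch of least-confidence sampling, one of the standard uncertainty-based AL query strategies. The function names and the NumPy implementation are illustrative, not the paper's actual code:

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax over the class dimension.
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def least_confidence(probs):
    # Uncertainty = 1 - max class probability; higher means less confident.
    return 1.0 - probs.max(axis=1)

def select_batch(logits, k):
    # Pick the k unlabeled samples the model is least confident about,
    # i.e. the candidates an AL strategy would send to the annotator.
    scores = least_confidence(softmax(logits))
    return np.argsort(scores)[::-1][:k]
```

As the paper argues, scores derived this way can be misleading: samples with near-uniform softmax outputs are often outliers that are hard to learn from rather than samples that reduce model uncertainty, which motivates the proposed exclusion heuristic.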
Notes
- 1.
According to scale.ai as of December 2021 (as of 2023 the cost is not publicly visible anymore): https://web.archive.org/web/20210112234705/https://scale.com/pricing.
- 2.
- 3.
The average training time of the Transformer-encoder models is around 600 s combined for a single AL experiment.
Acknowledgments
The authors are grateful to the Center for Information Services and High Performance Computing (ZIH) at TU Dresden for providing its facilities for high-throughput calculations. This research was funded by the German Federal Ministry of Education and Research (BMBF) through grant 01IS17044 Software Campus 2.0 (TU Dresden).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
Cite this paper
Gonsior, J. et al. (2023). Comparing and Improving Active Learning Uncertainty Measures for Transformer Models. In: Abelló, A., Vassiliadis, P., Romero, O., Wrembel, R. (eds) Advances in Databases and Information Systems. ADBIS 2023. Lecture Notes in Computer Science, vol 13985. Springer, Cham. https://doi.org/10.1007/978-3-031-42914-9_9
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-42913-2
Online ISBN: 978-3-031-42914-9
eBook Packages: Computer Science (R0)