
Attention and self-attention in random forests

  • Regular Paper
  • Published in Progress in Artificial Intelligence

Abstract

New random forest models that jointly use attention and self-attention mechanisms are proposed for solving the regression problem. The models can be regarded as extensions of the attention-based random forest, whose idea stems from applying a combination of Nadaraya–Watson kernel regression and Huber's contamination model to random forests. The self-attention aims to capture dependencies among the tree predictions and to suppress noisy or anomalous predictions in the random forest. The self-attention module is trained jointly with the attention module that computes the weights. It is shown that training the attention weights reduces to solving a single quadratic or linear optimization problem. Three modifications of the self-attention are proposed and compared. A specific multi-head self-attention for the random forest is also considered, whose heads are obtained by varying its tuning parameters, including the kernel parameters and the contamination parameter of the models. The proposed combinations of attention and self-attention are verified and compared with other random forest models on several datasets. The code implementing the corresponding algorithms is publicly available.
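For readers who want the gist of the attention part in code, the sketch below illustrates Nadaraya–Watson attention over the trees of a random forest. It is only a minimal illustration under simplifying assumptions, not the authors' implementation (which additionally introduces trainable weights through Huber's contamination model and fits them by quadratic or linear programming; see the repository linked below). scikit-learn is assumed, and the names attention_forest_predict and temperature are placeholders.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

# Toy regression data (10 features by default).
X_train, y_train = make_friedman1(n_samples=300, random_state=0)
X_test, _ = make_friedman1(n_samples=5, random_state=1)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
leaves_train = forest.apply(X_train)  # leaf index of every training instance in every tree

def attention_forest_predict(x, temperature=1.0):
    """Nadaraya-Watson weighting of per-tree leaf predictions for a single input x."""
    leaf_of_x = forest.apply(x.reshape(1, -1))[0]            # one leaf index per tree
    keys, values = [], []
    for t in range(forest.n_estimators):
        in_leaf = leaves_train[:, t] == leaf_of_x[t]
        keys.append(X_train[in_leaf].mean(axis=0))           # mean training vector of the leaf ("key")
        values.append(y_train[in_leaf].mean())                # leaf prediction of tree t ("value")
    keys, values = np.asarray(keys), np.asarray(values)
    scores = -np.sum((keys - x) ** 2, axis=1) / temperature   # Gaussian-kernel scores against the query x
    weights = np.exp(scores - scores.max())                   # numerically stable softmax
    weights /= weights.sum()
    return float(weights @ values)                            # attention-weighted forest prediction

print([round(attention_forest_predict(x), 3) for x in X_test])
```

In the multi-head variant described in the paper, several such weightings with different kernel (here, temperature) and contamination parameters would be computed and combined.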


Data availability

Data are available from open sources.

Notes

  1. https://github.com/andruekonst/forest-self-attention.

  2. https://www.stat.berkeley.edu/~breiman/bagging.pdf.


Acknowledgements

The authors would like to express their appreciation to the anonymous referees whose very valuable comments have improved the paper.

Funding

This work is supported by the Russian Science Foundation under grant 21-11-00116.

Author information


Corresponding author

Correspondence to Lev V. Utkin.

Ethics declarations

Conflict of interest

I certify that no party having a direct interest in the results of the research supporting this article has conferred or will confer a benefit on me or on any organization with which I am associated, and that all financial and material support for this research and work is clearly identified in the title page of the manuscript.

Code availability

The corresponding code implementing the method is publicly available at https://github.com/andruekonst/forest-self-attention.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Utkin, L.V., Konstantinov, A.V. & Kirpichenko, S.R. Attention and self-attention in random forests. Prog Artif Intell 12, 257–273 (2023). https://doi.org/10.1007/s13748-023-00301-0
