
Attention and self-attention in random forests

  • Regular Paper
  • Published in Progress in Artificial Intelligence

Abstract

New random forest models that jointly use attention and self-attention mechanisms are proposed for solving the regression problem. The models can be regarded as extensions of the attention-based random forest, whose idea stems from applying a combination of Nadaraya–Watson kernel regression and Huber's contamination model to random forests. The self-attention aims to capture dependencies among the tree predictions and to suppress noisy or anomalous predictions in the random forest. The self-attention module is trained jointly with the attention module that computes the weights. It is shown that training the attention weights reduces to solving a single quadratic or linear optimization problem. Three modifications of the self-attention are proposed and compared. A specific multi-head self-attention for the random forest is also considered, whose heads are obtained by varying its tuning parameters, including the kernel parameters and the contamination parameter of the models. The proposed combinations of attention and self-attention are verified and compared with other random forest models on several datasets. The code implementing the corresponding algorithms is publicly available.
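For readers who want the gist of the attention part in code, the sketch below illustrates Nadaraya–Watson attention over the trees of a random forest. It is only a minimal illustration under simplifying assumptions, not the authors' implementation (which additionally introduces trainable weights through Huber's contamination model and fits them by quadratic or linear programming; see the repository linked below). scikit-learn is assumed, and the names attention_forest_predict and temperature are placeholders.

```python
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor

# Toy regression data (10 features by default).
X_train, y_train = make_friedman1(n_samples=300, random_state=0)
X_test, _ = make_friedman1(n_samples=5, random_state=1)

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(X_train, y_train)
leaves_train = forest.apply(X_train)  # leaf index of every training instance in every tree

def attention_forest_predict(x, temperature=1.0):
    """Nadaraya-Watson weighting of per-tree leaf predictions for a single input x."""
    leaf_of_x = forest.apply(x.reshape(1, -1))[0]            # one leaf index per tree
    keys, values = [], []
    for t in range(forest.n_estimators):
        in_leaf = leaves_train[:, t] == leaf_of_x[t]
        keys.append(X_train[in_leaf].mean(axis=0))           # mean training vector of the leaf ("key")
        values.append(y_train[in_leaf].mean())                # leaf prediction of tree t ("value")
    keys, values = np.asarray(keys), np.asarray(values)
    scores = -np.sum((keys - x) ** 2, axis=1) / temperature   # Gaussian-kernel scores against the query x
    weights = np.exp(scores - scores.max())                   # numerically stable softmax
    weights /= weights.sum()
    return float(weights @ values)                            # attention-weighted forest prediction

print([round(attention_forest_predict(x), 3) for x in X_test])
```

In the multi-head variant described in the paper, several such weightings with different kernel (here, temperature) and contamination parameters would be computed and combined.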


Data availability

Data are available from open sources.

Notes

  1. https://github.com/andruekonst/forest-self-attention.

  2. https://www.stat.berkeley.edu/~breiman/bagging.pdf.


Acknowledgements

The authors would like to express their appreciation to the anonymous referees whose very valuable comments have improved the paper.

Funding

This work is supported by the Russian Science Foundation under grant 21-11-00116.

Author information


Corresponding author

Correspondence to Lev V. Utkin.

Ethics declarations

Conflict of interest

I certify that no party having a direct interest in the results of the research supporting this article has conferred or will confer a benefit on me or on any organization with which I am associated, and that all financial and material support for this research and work is clearly identified in the title page of the manuscript.

Code availability

The corresponding code implementing the method is publicly available at https://github.com/andruekonst/forest-self-attention.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Utkin, L.V., Konstantinov, A.V. & Kirpichenko, S.R. Attention and self-attention in random forests. Prog Artif Intell 12, 257–273 (2023). https://doi.org/10.1007/s13748-023-00301-0
