Pain-attentive network: a deep spatio-temporal attention model for pain estimation


In video surveillance at medical institutions, pain intensity is a significant clue to a patient's state. Recently, several approaches have leveraged spatio-temporal methods to capture the dynamic pain information in videos and estimate pain automatically. However, spatio-temporal saliency remains a challenge: pain is reflected mainly in certain important regions of the informative frames in a video sequence. To this end, we propose a deep spatio-temporal attention model called the Pain-Attentive Network (PAN), which focuses on this saliency when extracting dynamic features. PAN consists of two subnetworks: a spatial subnetwork and a temporal subnetwork. In the spatial subnetwork, a proposed spatial attention module is embedded to make spatial feature extraction more targeted. Likewise, a devised temporal attention module is inserted into the temporal subnetwork so that the temporal features focus on informative frames. Extensive experimental results on the UNBC-McMaster Shoulder Pain database show that PAN achieves compelling performance. In addition, to evaluate generalization, we report competitive results for PAN on the Remote Collaborative and Affective (RECOLA) database.
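The abstract does not give the equations of the two attention modules. As a rough illustration of the general idea only, the sketch below uses generic softmax-based soft attention twice. First, it pools the H x W local features of each frame into one frame vector (spatial attention). Then it pools the per-frame vectors into one clip vector (temporal attention), which a linear layer maps to a pain score. The following elements are all hypothetical assumptions and are not taken from the paper: the dot-product scoring against learned query vectors, the helper names (`attend`, `spatial_attention`, `temporal_attention`, `pain_score`), and the linear regressor. PAN's actual modules may compute attention weights with convolutional or recurrent layers instead.

```python
import math
from typing import List, Tuple

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def attend(features: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Generic soft attention (assumed form): score each feature vector against
    a query, normalize the scores with softmax, and return
    (attention weights, attention-weighted sum of the features)."""
    weights = softmax([dot(f, query) for f in features])
    dim = len(features[0])
    pooled = [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]
    return weights, pooled


def spatial_attention(feature_map: List[List[Vector]], query: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical spatial attention: flatten an H x W grid of C-dim local
    features into H*W regions and pool them, emphasizing salient regions."""
    regions = [vec for row in feature_map for vec in row]
    return attend(regions, query)


def temporal_attention(frame_features: List[Vector], query: Vector) -> Tuple[Vector, Vector]:
    """Hypothetical temporal attention: pool per-frame vectors so that
    informative frames receive larger weights."""
    return attend(frame_features, query)


def pain_score(video: List[List[List[Vector]]], spatial_query: Vector,
               temporal_query: Vector, regressor: Vector) -> float:
    """Spatial pooling per frame, temporal pooling over frames, then a
    hypothetical linear regressor producing a scalar pain intensity."""
    frame_feats = [spatial_attention(frame, spatial_query)[1] for frame in video]
    _, clip_feat = temporal_attention(frame_feats, temporal_query)
    return dot(clip_feat, regressor)


if __name__ == "__main__":
    # Toy input: 2 frames, each a 2 x 2 grid of 2-dim local features.
    frame = [[[1.0, 0.0], [0.0, 1.0]], [[0.0, 0.0], [2.0, 2.0]]]
    video = [frame, frame]
    print(pain_score(video, [0.0, 0.0], [0.0, 0.0], [1.0, 1.0]))  # prints 1.5
```

With all-zero queries every region and frame receives equal weight, so the example reduces to plain averaging. In a trained model, the learned queries would shift weight toward the regions and frames where pain is expressed, which is the saliency the abstract describes.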






This work is partly supported by the National Natural Science Foundation of China (No. 61702419), and the Natural Science Basic Research Plan in Shaanxi Province of China (No. 2018JQ6090).

Author information



Corresponding author

Correspondence to Zhaoqiang Xia.

Ethics declarations

Conflict of interests

There is no conflict of interest.



About this article


Cite this article

Huang, D., Xia, Z., Mwesigye, J. et al. Pain-attentive network: a deep spatio-temporal attention model for pain estimation. Multimed Tools Appl (2020).



  • Deep learning
  • Spatio-temporal model
  • Attention mechanism
  • Pain estimation