
WHITE STAG model: wise human interaction tracking and estimation (WHITE) using spatio-temporal and angular-geometric (STAG) descriptors

Published in: Multimedia Tools and Applications

Abstract

To recognize human-to-human interactions accurately, human interaction recognition (HIR) systems require robust vision-based feature extraction and selection methods. In this paper, we propose the WHITE STAG model, which tracks human interactions using spatio-temporal methods over full-body silhouettes and shape-based angular-geometric sequential approaches over skeleton joints. After feature extraction, the feature space is reduced by codebook generation and linear discriminant analysis (LDA). Finally, a kernel sliding perceptron recognizes multiple classes of human interactions. The proposed WHITE STAG model is validated on two publicly available RGB datasets and one self-annotated intensity interactive dataset, introduced here as a novel contribution. For evaluation, four experiments are performed using leave-one-out and cross-validation testing schemes. The WHITE STAG model with the kernel sliding perceptron outperforms well-known state-of-the-art methods, achieving weighted average recognition rates of 87.48% on UT-Interaction, 87.5% on BIT-Interaction, and 85.7% on the proposed IM-IntensityInteractive7 dataset. The proposed system is applicable to various multimedia and security applications such as surveillance systems, video-based learning, medical futurists, service cobots, and interactive gaming.
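The pipeline summarized above (angular-geometric features from skeleton joints, LDA-based feature-space reduction, then perceptron-style classification) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the skeleton data is synthetic, `joint_angles` is a hypothetical helper computing angles at illustrative joint triplets, and scikit-learn's `LinearDiscriminantAnalysis` and plain `Perceptron` stand in for the paper's codebook/LDA stage and kernel sliding perceptron.

```python
# Sketch of a WHITE-STAG-style pipeline: angular-geometric features from
# skeleton joints -> LDA reduction -> perceptron classification.
# Synthetic data; library classes stand in for the paper's components.
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import Perceptron
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

def joint_angles(frame):
    """Angle at joint b for each (a, b, c) triplet of 2-D joint positions."""
    angles = []
    for a, b, c in [(0, 1, 2), (1, 2, 3), (2, 3, 4)]:  # illustrative triplets
        v1, v2 = frame[a] - frame[b], frame[c] - frame[b]
        cos = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
        angles.append(np.arccos(np.clip(cos, -1.0, 1.0)))
    return np.array(angles)

# Synthetic dataset: each "interaction clip" is summarized by its mean
# joint-angle vector over 10 noisy frames of a class-specific base pose.
n_clips, n_frames, n_joints, n_classes = 200, 10, 5, 7
y = rng.integers(0, n_classes, size=n_clips)
base_poses = rng.normal(size=(n_classes, n_joints, 2))
X = np.stack([
    np.mean([joint_angles(base_poses[y[i]] + 0.1 * rng.normal(size=(n_joints, 2)))
             for _ in range(n_frames)], axis=0)
    for i in range(n_clips)
])

# LDA projects onto at most min(n_classes - 1, n_features) dimensions.
X_lda = LinearDiscriminantAnalysis().fit_transform(X, y)

# A plain perceptron stands in for the kernel sliding perceptron.
scores = cross_val_score(Perceptron(max_iter=1000, random_state=0), X_lda, y, cv=5)
print(f"5-fold CV accuracy: {scores.mean():.2f}")
```

The key design point mirrored here is that angular features are invariant to translation and scale of the skeleton, so LDA operates on pose shape rather than absolute joint positions.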





Acknowledgements

This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (No. 2018R1D1A1A02085645).

Author information

Correspondence to Ahmad Jalal.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Mahmood, M., Jalal, A. & Kim, K. WHITE STAG model: wise human interaction tracking and estimation (WHITE) using spatio-temporal and angular-geometric (STAG) descriptors. Multimed Tools Appl 79, 6919–6950 (2020). https://doi.org/10.1007/s11042-019-08527-8
