Abstract
As image processing techniques and devices advance, real-time applications of computer vision such as human action and interaction recognition and video content analysis become more attractive. However, because image and video analysis is complex, the methods proposed in state-of-the-art studies are still far from being real-time, all-inclusive classifiers. This work presents a new approach to human interaction recognition based on key-poses of frame silhouettes. We use an inner-distance-based shape descriptor, which describes a shape well because it collects data from the whole shape. The core idea is a two-step classifier built on sequential pattern mining. We extract the bilateral silhouette shape of the persons and describe it with the inner-distance feature, so that each frame can be compared with a predefined dictionary of key-poses. Classification is then performed in two layers, the frame layer and the sequence layer. Accurate and efficient, the sequential pattern mining approach offers an appealing solution to the problem of sequence classification, giving results comparable to or better than those of standard classifiers. We evaluated the recognition performance of the system on video sequences from two well-known interaction datasets, the SBU human interaction dataset and the UT-interaction dataset. The recognition rates (95.25% on SBU and 90.5% on UT-interaction) outperform most state-of-the-art results. These rates were obtained after testing different parameters that can affect the results. Both datasets include multiple interaction classes performed by different actors, which helps in developing an all-inclusive method. The proposed method can be optimized for real-world applications such as abnormal activity recognition in crowded places, auxiliary surveillance systems, and human-computer interaction.
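To make the two-layer idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes each frame has already been reduced to a fixed-length descriptor, uses plain Euclidean distance as a stand-in for the inner-distance comparison, and uses hand-written patterns in place of patterns found by a sequential pattern mining algorithm. All names (`nearest_key_pose`, `is_subsequence`, `classify_sequence`) and the toy data are hypothetical.

```python
import math

def nearest_key_pose(descriptor, dictionary):
    """Frame layer: return the label of the closest key-pose in the dictionary.

    Assumption: Euclidean distance replaces the inner-distance shape comparison.
    """
    return min(dictionary, key=lambda label: math.dist(descriptor, dictionary[label]))

def is_subsequence(pattern, sequence):
    """Return True if `pattern` occurs in `sequence` in order, gaps allowed."""
    remaining = iter(sequence)
    return all(symbol in remaining for symbol in pattern)

def classify_sequence(frames, dictionary, class_patterns):
    """Sequence layer: label every frame, then pick the class whose patterns
    match the label sequence most often (an assumed scoring rule)."""
    labels = [nearest_key_pose(frame, dictionary) for frame in frames]
    scores = {
        cls: sum(is_subsequence(pattern, labels) for pattern in patterns)
        for cls, patterns in class_patterns.items()
    }
    return max(scores, key=scores.get), labels

# Toy data: three key-poses in a 2-D descriptor space (hypothetical values).
key_poses = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (0.0, 1.0)}
patterns = {
    "handshake": [("A", "B")],
    "push": [("A", "C"), ("C", "A")],
}
video = [(0.1, 0.0), (0.9, 0.1), (1.0, 0.0)]

predicted, frame_labels = classify_sequence(video, key_poses, patterns)
print(frame_labels, predicted)  # ['A', 'B', 'B'] handshake
```

In this sketch the frame layer turns a video into a string of key-pose labels, and the sequence layer only needs ordered pattern matching on that string, which is what keeps the second step cheap.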
Data Availability
The datasets analysed during the current study are available in the SDHA 2010 repository, https://cvrc.ece.utexas.edu/SDHA2010/Human_Interaction.html, and in the COVE repository, https://cove.thecvf.com/datasets/57.
Ethics declarations
Conflicts of interest
The authors declare that they have no conflict of interest concerning the publication of this manuscript.
About this article
Cite this article
Nikzad, S., Ebrahimi, A. Two-person interaction recognition using a two-step sequential pattern classification. Multimed Tools Appl (2024). https://doi.org/10.1007/s11042-024-19240-6
DOI: https://doi.org/10.1007/s11042-024-19240-6