Abstract
In this paper, we propose a multi-view gesture recognition approach for assistive device applications. As smart assistants become a reality, the ability to interact with them naturally, as we do with other humans, becomes crucial. Gestures are a fundamental, natural, and intuitive modality of human interaction. Our approach relies only on the motion of upper-body joints, so that individuals with motor dysfunctions, such as wheelchair users or people who cannot stand, can also interact with their smart assistive devices. To achieve this, we propose a robust multi-view skeleton fusion based on Kalman filtering, followed by a handcrafted upper-body feature extraction step. Gestures are then classified with a support vector machine (SVM). Experiments on our own captured dataset showed that the method generalizes well and that the multi-view fusion outperforms each of the individual views.
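The abstract does not specify the filter's state model, noise parameters, or how views are combined, so the following is only a minimal sketch of one common way to fuse per-view skeletons with a Kalman filter, not the authors' implementation. It assumes that each view's skeleton has already been transformed into a shared world frame (extrinsic calibration), that each joint follows a random-walk (constant-position) model with axes filtered independently, and that every view reports a per-joint measurement variance. The names `JointKalman`, `fuse_frame`, `process_var`, and the joint label `"r_wrist"` are hypothetical. Each available view is applied as a sequential measurement update; for independent measurement noise this yields the same estimate as stacking all views into a single update, and a joint occluded in one view is simply skipped for that view.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class JointKalman:
    """Random-walk Kalman filter for one 3D joint (hypothetical helper).

    Each axis is filtered independently, which is a simplifying assumption.
    """
    process_var: float = 1e-3  # q: assumed per-frame motion variance (m^2)
    x: Optional[List[float]] = None  # fused position estimate
    p: List[float] = field(default_factory=lambda: [1.0, 1.0, 1.0])  # per-axis variance

    def predict(self) -> None:
        # Random-walk model: position is unchanged, uncertainty grows by q.
        self.p = [pi + self.process_var for pi in self.p]

    def update(self, z: Vec3, meas_var: float) -> None:
        # First measurement initializes the state directly.
        if self.x is None:
            self.x, self.p = list(z), [meas_var] * 3
            return
        # Scalar Kalman update applied to each axis.
        for i in range(3):
            k = self.p[i] / (self.p[i] + meas_var)  # Kalman gain
            self.x[i] += k * (z[i] - self.x[i])
            self.p[i] *= 1.0 - k


def fuse_frame(filters: Dict[str, JointKalman],
               views: Sequence[Dict[str, Tuple[Vec3, float]]]) -> Dict[str, Vec3]:
    """Fuse one frame of skeletons observed by several views.

    views[v][joint] = (position in the shared world frame, measurement variance).
    A joint missing from a view (e.g. occluded) is absent from that dict.
    """
    fused: Dict[str, Vec3] = {}
    for name, kf in filters.items():
        kf.predict()
        for view in views:  # sequential measurement updates, one per view
            if name in view:
                z, r = view[name]
                kf.update(z, r)
        if kf.x is not None:
            fused[name] = (kf.x[0], kf.x[1], kf.x[2])
    return fused


if __name__ == "__main__":
    filters = {"r_wrist": JointKalman()}
    frame = [
        {"r_wrist": ((0.30, 1.10, 2.00), 0.01)},  # view A
        {"r_wrist": ((0.34, 1.12, 2.04), 0.04)},  # view B, noisier
        {},                                        # view C: wrist occluded
    ]
    print(fuse_frame(filters, frame))
```

In this example the fused wrist position lands closer to view A than to view B, because view A reports a lower measurement variance. Per-view variances could in practice come from the skeleton tracker's joint confidence, but that mapping is also an assumption here.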
Acknowledgments
This work was supported by the project POCI-01-0247-FEDER-017644 HTPDIR, “Human Tracking and Perception in Dynamic Immersive Rooms”, financed by the Portugal2020 program and the European Union’s structural funds.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Paulo, J., Girão, P., Peixoto, P. (2020). Multi-view Robust Gesture Recognition for Assistive Interfaces. In: Henriques, J., Neves, N., de Carvalho, P. (eds) XV Mediterranean Conference on Medical and Biological Engineering and Computing – MEDICON 2019. MEDICON 2019. IFMBE Proceedings, vol 76. Springer, Cham. https://doi.org/10.1007/978-3-030-31635-8_205
DOI: https://doi.org/10.1007/978-3-030-31635-8_205
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31634-1
Online ISBN: 978-3-030-31635-8
eBook Packages: Engineering, Engineering (R0)