
Action Recognition Using Local Visual Descriptors and Inertial Data

  • Conference paper
Ambient Intelligence (AmI 2019)

Abstract

Different body sensors and modalities can be used for human action recognition, either separately or simultaneously. In this work, we combine inertial measurement units (IMUs) positioned on the left and right hands with first-person vision for human action recognition. We propose a novel statistical feature extraction method based on the curvature of the graph of a function and on tracking the left and right hand positions in space. Local visual descriptors serve as features for the egocentric vision, and an intermediate fusion of the IMU and visual features is performed. Despite using only two IMU sensors alongside egocentric vision, our method achieves a classification accuracy of 99.61% in recognizing nine different actions. The feature extraction step can play a vital role in human action recognition with a limited number of sensors; hence, our method might indeed be promising.
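To make the curvature-based feature idea concrete, the sketch below shows one plausible realisation, assuming a single IMU channel sampled at a fixed rate: the curvature of the signal's graph, κ = |f″| / (1 + f′²)^(3/2), is summarised by a few window statistics and fused with a visual descriptor by concatenation (a common form of intermediate fusion). The function names, the choice of statistics, and the fusion-by-concatenation step are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np

def graph_curvature(signal, dt=1.0):
    """Curvature of the graph of a 1-D signal f(t):
    kappa = |f''| / (1 + f'^2)^(3/2)."""
    d1 = np.gradient(signal, dt)             # first derivative f'
    d2 = np.gradient(d1, dt)                 # second derivative f''
    return np.abs(d2) / (1.0 + d1**2) ** 1.5

def curvature_stats(window, dt=1.0):
    """Statistical summary of curvature over one IMU window
    (illustrative choice of statistics)."""
    k = graph_curvature(window, dt)
    return np.array([k.mean(), k.std(), k.max(), np.median(k)])

def intermediate_fusion(imu_features, visual_features):
    """Feature-level fusion by concatenation (hypothetical scheme)."""
    return np.concatenate([imu_features, visual_features])

# Example: a 2-second window of one accelerometer axis at 100 Hz,
# fused with a placeholder 32-D local visual descriptor.
rng = np.random.default_rng(0)
acc_x = rng.standard_normal(200)             # stand-in IMU channel
visual = rng.standard_normal(32)             # stand-in visual descriptor
fused = intermediate_fusion(curvature_stats(acc_x, dt=0.01), visual)
print(fused.shape)                           # -> (36,)
```

A fused vector of this kind would then be passed to a conventional classifier trained on the action classes.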


Notes

  1. https://github.com/alhersh/ActionExtractor.


Author information


Corresponding author

Correspondence to Taha Alhersh.



Copyright information

© 2019 Springer Nature Switzerland AG

About this paper


Cite this paper

Alhersh, T., Brahim Belhaouari, S., Stuckenschmidt, H. (2019). Action Recognition Using Local Visual Descriptors and Inertial Data. In: Chatzigiannakis, I., De Ruyter, B., Mavrommati, I. (eds.) Ambient Intelligence. AmI 2019. Lecture Notes in Computer Science, vol. 11912. Springer, Cham. https://doi.org/10.1007/978-3-030-34255-5_9


  • DOI: https://doi.org/10.1007/978-3-030-34255-5_9


  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-34254-8

  • Online ISBN: 978-3-030-34255-5

  • eBook Packages: Computer Science, Computer Science (R0)
