Exploiting Egocentric Cues for Action Recognition for Ambient Assisted Living Applications


Part of the Advances in Science, Technology & Innovation book series (ASTI)

Abstract

With the older population in constant growth, governments face rising elder-care expenses year after year. Helping older adults extend their independent lifestyle is therefore pivotal to containing those costs, and this is the goal of the Ambient Assisted Living research field. Through the use of Information and Communication Technologies, it is possible to provide solutions that help older adults live independently for as long as possible, or to predict mental health issues that could seriously compromise their independence. The key enablers for these solutions are egocentric cameras and the egocentric action recognition techniques used to analyse egocentric videos. This chapter presents several such techniques, focused on the exploitation of intrinsic egocentric cues.
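Several of the deep-learning approaches surveyed in this chapter build on two-stream architectures [51], which classify an action by fusing an appearance (RGB) stream with a motion (optical-flow) stream. As a minimal illustrative sketch (not the authors' method; the action labels and logit values below are hypothetical), late fusion of per-stream class scores can be expressed as:

```python
import math

def softmax(logits):
    """Convert raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical action classes for a kitchen scenario.
ACTIONS = ["take", "pour", "stir"]

def late_fusion(rgb_logits, flow_logits):
    """Average the class probabilities of the two streams and pick the argmax."""
    rgb_p = softmax(rgb_logits)
    flow_p = softmax(flow_logits)
    fused = [0.5 * a + 0.5 * b for a, b in zip(rgb_p, flow_p)]
    return ACTIONS[fused.index(max(fused))], fused

# Appearance stream is confident about "take"; motion stream favours "pour".
label, fused = late_fusion([2.0, 0.5, 0.1], [1.5, 2.5, 0.2])
print(label)  # -> "take": the appearance evidence dominates in this example
```

Real two-stream models learn these scores with convolutional networks over RGB frames and stacked optical-flow fields; the averaging step above illustrates only the fusion stage.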

Keywords

  • Ambient assisted living
  • Egocentric action recognition
  • Computer vision
  • Deep learning

  • DOI: 10.1007/978-3-030-14647-4_10
  • Chapter length: 28 pages
  • ISBN: 978-3-030-14647-4
Fig. 4: From the work of [19]

Fig. 5: From the work of [18]

Fig. 11: From the work of [60]

Notes

  1. https://www.nih.gov/news-events/news-releases/worlds-older-population-grows-dramatically.

References

  1. Nachwa Aboubakr, James L. Crowley, and Rémi Ronfard. Recognizing manipulation actions from state-transformations. arXiv preprint arXiv:1906.05147, 2019.

  2. Ahmad Akl, Jasper Snoek, and Alex Mihailidis. Unobtrusive detection of mild cognitive impairment in older adults through home monitoring. IEEE Journal of Biomedical and Health Informatics, 21(2):339–348, 2015.

  3. Maryam Asadi-Aghbolaghi, Albert Clapes, Marco Bellantonio, Hugo Jair Escalante, Víctor Ponce-López, Xavier Baró, Isabelle Guyon, Shohreh Kasaei, and Sergio Escalera. A survey on deep learning based approaches for action and gesture recognition in image sequences. In 2017 12th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2017), pages 476–483. IEEE, 2017.

  4. Sven Bambach, Stefan Lee, David J. Crandall, and Chen Yu. Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions. In Proceedings of the IEEE International Conference on Computer Vision, pages 1949–1957, 2015.

  5. Ardhendu Behera, Matthew Chapman, Anthony G. Cohn, and David C. Hogg. Egocentric activity recognition using histograms of oriented pairwise relations. In 2014 International Conference on Computer Vision Theory and Applications (VISAPP), volume 2, pages 22–30. IEEE, 2014.

  6. Ardhendu Behera, David C. Hogg, and Anthony G. Cohn. Egocentric activity monitoring and recovery. In Asian Conference on Computer Vision, pages 519–532. Springer, 2012.

  7. Allah Bux, Plamen Angelov, and Zulfiqar Habib. Vision based human activity recognition: A review. In Advances in Computational Intelligence Systems, pages 341–371. Springer, 2017.

  8. Joao Carreira and Andrew Zisserman. Quo vadis, action recognition? A new model and the Kinetics dataset. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6299–6308, 2017.

  9. Alejandro Cartas, Petia Radeva, and Mariella Dimiccoli. Contextually driven first-person action recognition from videos.

  10. Ken Chatfield, Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531, 2014.

  11. Liming Chen, Jesse Hoey, Chris D. Nugent, Diane J. Cook, and Zhiwen Yu. Sensor-based activity recognition. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews), 42(6):790–808, 2012.

  12. Dima Damen, Teesid Leelasawassuk, Osian Haines, Andrew Calway, and Walterio W. Mayol-Cuevas. You-Do, I-Learn: Discovering task relevant objects and their modes of interaction from multi-user egocentric video. In BMVC, volume 2, page 3, 2014.

  13. Alireza Fathi and James M. Rehg. Modeling actions through state changes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 2579–2586, 2013.

  14. Alireza Fathi, Ali Farhadi, and James M. Rehg. Understanding egocentric activities. In 2011 International Conference on Computer Vision, pages 407–414. IEEE, 2011.

  15. Alireza Fathi, Xiaofeng Ren, and James M. Rehg. Learning to recognize objects in egocentric activities. In CVPR 2011, pages 3281–3288. IEEE, 2011.

  16. Alireza Fathi, Yin Li, and James M. Rehg. Learning to recognize daily actions using gaze. In European Conference on Computer Vision, pages 314–327. Springer, 2012.

  17. Amy Fire and Song-Chun Zhu. Learning perceptual causality from video. ACM Transactions on Intelligent Systems and Technology (TIST), 7(2):1–22, 2015.

  18. Ross Girshick. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1440–1448, 2015.

  19. Ross Girshick, Jeff Donahue, Trevor Darrell, and Jitendra Malik. Region-based convolutional networks for accurate object detection and segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(1):142–158, 2015.

  20. Georgia Gkioxari, Ross Girshick, and Jitendra Malik. Contextual action recognition with R*CNN. In Proceedings of the IEEE International Conference on Computer Vision, pages 1080–1088, 2015.

  21. Nadee Goonawardene, Hwee-Pink Tan, and Lee Buay Tan. Unobtrusive detection of frailty in older adults. In International Conference on Human Aspects of IT for the Aged Population, pages 290–302. Springer, 2018.

  22. Mary Hayhoe. Vision using routines: A functional account of vision. Visual Cognition, 7(1–3):43–64, 2000.

  23. Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.

  24. Hongwen Kang, Martial Hebert, and Takeo Kanade. Discovering object instances from scenes of daily living. In 2011 International Conference on Computer Vision, pages 762–769. IEEE, 2011.

  25. Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas Noldus, and Remco Veltkamp. Multitask learning to improve egocentric action recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.

  26. Georgios Kapidis, Ronald Poppe, Elsbeth van Dam, Lucas P. J. J. Noldus, and Remco C. Veltkamp. Object detection-based location and activity classification from egocentric videos: A systematic analysis. In Smart Assisted Living, pages 119–145. Springer, 2020.

  27. Yoon Kim. Convolutional neural networks for sentence classification. arXiv preprint arXiv:1408.5882, 2014.

  28. Michael Land, Neil Mennie, and Jennifer Rusted. The roles of vision and eye movements in the control of activities of daily living. Perception, 28(11):1311–1328, 1999.

  29. Yin Li, Zhefan Ye, and James M. Rehg. Delving into egocentric actions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 287–295, 2015.

  30. Yin Li, Miao Liu, and James M. Rehg. In the eye of beholder: Joint learning of gaze and actions in first person video. In Proceedings of the European Conference on Computer Vision (ECCV), pages 619–635, 2018.

  31. Jun Li, Xianglong Liu, Wenxuan Zhang, Mingyuan Zhang, Jingkuan Song, and Nicu Sebe. Spatio-temporal attention networks for action recognition and detection. IEEE Transactions on Multimedia, 2020.

  32. Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft COCO: Common objects in context. In European Conference on Computer Vision, pages 740–755. Springer, 2014.

  33. Yang Liu, Ping Wei, and Song-Chun Zhu. Jointly recognizing object fluents and tasks in egocentric videos. In Proceedings of the IEEE International Conference on Computer Vision, pages 2924–2932, 2017.

  34. Minlong Lu, Ze-Nian Li, Yueming Wang, and Gang Pan. Deep attention network for egocentric action recognition. IEEE Transactions on Image Processing, 28(8):3703–3713, 2019.

  35. Minlong Lu, Danping Liao, and Ze-Nian Li. Learning spatiotemporal attention for egocentric action recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops, 2019.

  36. Minghuang Ma, Haoqi Fan, and Kris M. Kitani. Going deeper into first-person activity recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1894–1903, 2016.

  37. Chih-Yao Ma, Asim Kadav, Iain Melvin, Zsolt Kira, Ghassan AlRegib, and Hans Peter Graf. Attend and interact: Higher-order object interactions for video understanding. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6790–6800, 2018.

  38. Steve Mann. 'WearCam' (the wearable camera): Personal imaging systems for long-term use in wearable tetherless computer-mediated reality and personal photo/videographic memory prosthesis. In Digest of Papers. Second International Symposium on Wearable Computers (Cat. No. 98EX215), pages 124–131. IEEE, 1998.

  39. Kenji Matsuo, Kentaro Yamada, Satoshi Ueno, and Sei Naito. An attention-based activity recognition for egocentric video. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 551–556, 2014.

  40. Tomas McCandless and Kristen Grauman. Object-centric spatio-temporal pyramids for egocentric activity recognition. In BMVC, volume 2, page 3, 2013.

  41. Ajay K. Mishra, Yiannis Aloimonos, Loong Fah Cheong, and Ashraf Kassim. Active visual segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4):639–653, 2011.

  42. Erik T. Mueller. Commonsense Reasoning: An Event Calculus Based Approach. Morgan Kaufmann, 2014.

  43. Tomoya Nakatani, Ryohei Kuga, and Takuya Maekawa. Preliminary investigation of object-based activity recognition using egocentric video based on web knowledge. In Proceedings of the 17th International Conference on Mobile and Ubiquitous Multimedia, pages 375–381, 2018.

  44. Thi-Hoa-Cuc Nguyen, Jean-Christophe Nebel, Francisco Florez-Revuelta, et al. Recognition of activities of daily living with egocentric vision: A review. Sensors, 16(1):72, 2016.

  45. Adrián Núñez-Marcos, Gorka Azkune, and Ignacio Arganda-Carreras. Object bounding box annotations for the GTEA Gaze+ dataset, July 2020.

  46. Hamed Pirsiavash and Deva Ramanan. Detecting activities of daily living in first-person camera views. In 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 2847–2854. IEEE, 2012.

  47. Iris Rawtaer, Rathi Mahendran, Ee Heok Kua, Hwee Pink Tan, Hwee Xian Tan, Tih-Shih Lee, and Tze Pin Ng. Early detection of mild cognitive impairment with in-home sensors to monitor behavior patterns in community-dwelling senior citizens in Singapore: Cross-sectional feasibility study. Journal of Medical Internet Research, 22(5):e16854, 2020.

  48. Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.

  49. Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pages 91–99, 2015.

  50. Liyue Shen, Serena Yeung, Judy Hoffman, Greg Mori, and Li Fei-Fei. Scaling human-object interaction recognition through zero-shot learning. In 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), pages 1568–1576. IEEE, 2018.

  51. Karen Simonyan and Andrew Zisserman. Two-stream convolutional networks for action recognition in videos. In Advances in Neural Information Processing Systems, pages 568–576, 2014.

  52. Swathikiran Sudhakaran and Oswald Lanz. Attention is all we need: Nailing down object-centric attention for egocentric activity recognition. arXiv preprint arXiv:1807.11794, 2018.

  53. Swathikiran Sudhakaran, Sergio Escalera, and Oswald Lanz. LSTA: Long short-term attention for egocentric action recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 9954–9963, 2019.

  54. Li Sun, Ulrich Klank, and Michael Beetz. EYEWATCHME—3D hand and object tracking for inside out activity analysis. In 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pages 9–16. IEEE, 2009.

  55. Dipak Surie, Thomas Pederson, Fabien Lagriffoul, Lars-Erik Janlert, and Daniel Sjölie. Activity recognition using an egocentric perspective of everyday objects. In International Conference on Ubiquitous Intelligence and Computing, pages 246–257. Springer, 2007.

  56. Bugra Tekin, Federica Bogo, and Marc Pollefeys. H+O: Unified egocentric recognition of 3D hand-object poses and interactions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 4511–4520, 2019.

  57. An Tran and Loong-Fah Cheong. Two-stream flow-guided convolutional attention networks for action recognition. In Proceedings of the IEEE International Conference on Computer Vision Workshops, pages 3110–3119, 2017.

  58. Jasper R. R. Uijlings, Koen E. A. Van De Sande, Theo Gevers, and Arnold W. M. Smeulders. Selective search for object recognition. International Journal of Computer Vision, 104(2):154–171, 2013.

  59. Sagar Verma, Pravin Nagar, Divam Gupta, and Chetan Arora. Making third person techniques recognize first-person actions in egocentric videos. In 2018 25th IEEE International Conference on Image Processing (ICIP), pages 2301–2305. IEEE, 2018.

  60. Heng Wang, Alexander Kläser, Cordelia Schmid, and Cheng-Lin Liu. Dense trajectories and motion boundary descriptors for action recognition. International Journal of Computer Vision, 103(1):60–79, 2013.

  61. Heng Wang and Cordelia Schmid. Action recognition with improved trajectories. In Proceedings of the IEEE International Conference on Computer Vision, pages 3551–3558, 2013.

  62. Limin Wang, Yuanjun Xiong, Zhe Wang, Yu Qiao, Dahua Lin, Xiaoou Tang, and Luc Van Gool. Temporal segment networks: Towards good practices for deep action recognition. In European Conference on Computer Vision, pages 20–36. Springer, 2016.

  63. Jindong Wang, Yiqiang Chen, Shuji Hao, Xiaohui Peng, and Lisha Hu. Deep learning for sensor-based activity recognition: A survey. Pattern Recognition Letters, 119:3–11, 2019.

  64. Xiaohan Wang, Yu Wu, Linchao Zhu, and Yi Yang. Baidu-UTS submission to the EPIC-Kitchens action recognition challenge 2019. arXiv preprint arXiv:1906.09383, 2019.

  65. Xiaohan Wang, Yu Wu, Linchao Zhu, and Yi Yang. Symbiotic attention with privileged information for egocentric action recognition. arXiv preprint arXiv:2002.03137, 2020.

  66. Michael Wray, Davide Moltisanti, and Dima Damen. Towards an unequivocal representation of actions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, pages 1127–1131, 2018.

  67. Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems, pages 802–810, 2015.

  68. Hong-Bo Zhang, Yi-Xiang Zhang, Bineng Zhong, Qing Lei, Lijie Yang, Ji-Xiang Du, and Duan-Sheng Chen. A comprehensive survey of vision-based human action recognition methods. Sensors, 19(5):1005, 2019.

  69. Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, et al. ResNeSt: Split-attention networks. arXiv preprint arXiv:2004.08955, 2020.

  70. Yang Zhou, Bingbing Ni, Richang Hong, Xiaokang Yang, and Qi Tian. Cascaded interactional targeting network for egocentric video analysis. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1904–1913, 2016.

  71. Zheming Zuo, Longzhi Yang, Yonghong Peng, Fei Chao, and Yanpeng Qu. Gaze-informed egocentric action recognition for memory aid systems. IEEE Access, 6:12894–12904, 2018.

Acknowledgements

We gratefully acknowledge the support of the Basque Government’s Department of Education for the predoctoral funding of the first author. This work has been supported by the Spanish Government under the FuturAAL-Ego project (RTI2018-101045-A-C22) and the FuturAAL-Context project (RTI2018-101045-B-C21) and by the Basque Government under the Deustek project (IT-1078-16-D).

Author information

Corresponding author: Adrián Núñez-Marcos.

Copyright information

© 2021 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Núñez-Marcos, A., Azkune, G., Arganda-Carreras, I. (2021). Exploiting Egocentric Cues for Action Recognition for Ambient Assisted Living Applications. In: Alja’am, J., Al-Maadeed, S., Halabi, O. (eds) Emerging Technologies in Biomedical Engineering and Sustainable TeleMedicine. Advances in Science, Technology & Innovation. Springer, Cham. https://doi.org/10.1007/978-3-030-14647-4_10
