Adaptive Retraining of Visual Recognition-Model in Human Activity Recognition by Collaborative Humanoid Robots

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1251)

Abstract

We present a vision-based activity recognition system for centrally connected humanoid robots. The robots interact with several human participants whose behavioral styles and inter-activity variability differ. A cloud server provides and updates the recognition model on all robots: it continuously fetches the new activity videos recorded by the robots, along with the corresponding recognition results and the ground truths provided by the interacting humans. An evolving performance-based logic decides when to retrain the recognition model. In this article, we present this adaptive recognition system with special emphasis on the partitioning logic that divides the new videos into the training, cross-validation, and test groups of the next retraining instance. This partitioning logic is driven by the class-wise recognition inaccuracies of the existing model. We compare it with a probabilistic partitioning approach in which the videos are partitioned without performance considerations.
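The class-wise partitioning idea described above can be sketched as follows. This is a minimal illustration, not the paper's actual algorithm: the function name, the base split ratios, and the mapping from class error to training share are all assumptions made for the example.

```python
import random

def performance_based_split(videos_by_class, class_error,
                            base_ratios=(0.6, 0.2, 0.2)):
    """Partition each class's new videos into train/cv/test groups,
    biasing poorly recognized classes toward the training group.

    class_error maps a class label to the current model's recognition
    error for that class (0.0 = perfectly recognized, 1.0 = always wrong).
    The error-to-ratio mapping below is illustrative only.
    """
    groups = {"train": [], "cv": [], "test": []}
    for cls, videos in videos_by_class.items():
        err = class_error.get(cls, 0.0)
        # Boost the training share in proportion to the class error
        # (a poorly recognized class contributes more retraining data).
        tr = min(1.0, base_ratios[0] + 0.5 * err)
        rest = 1.0 - tr
        cv = rest * base_ratios[1] / (base_ratios[1] + base_ratios[2])
        shuffled = videos[:]
        random.shuffle(shuffled)
        n = len(shuffled)
        n_tr = round(n * tr)
        n_cv = round(n * cv)
        groups["train"] += shuffled[:n_tr]
        groups["cv"] += shuffled[n_tr:n_tr + n_cv]
        groups["test"] += shuffled[n_tr + n_cv:]
    return groups
```

A purely probabilistic split, by contrast, would apply the same `base_ratios` to every class regardless of `class_error`, which is the baseline the article compares against.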

Keywords

Learning and adaptive systems · Human activity recognition · Online learning · Distributed robot systems · Computer vision · Intersection-kernel SVM model · Dense interest point trajectories

Acronyms

ADLs:

Activities of daily living

BOW:

Bag of words

CV/cv Group:

Cross-validation group

EADLs:

Enhanced ADLs

GMM:

Gaussian mixture model

HAR:

Human activity recognition

HOF:

Histograms of optical flow

HOG:

Histograms of oriented gradients

HSV:

Hue, saturation and value

IADLs:

Instrumental ADLs

IKSVM:

Intersection kernel based SVM

IP:

Interest point

LSTM:

Long short-term memory

MBH:

Motion boundary histogram

MBHx:

MBH in x orientation

MBHy:

MBH in y orientation

NLP:

Natural language processing

P-CS/CS:

Probabilistic contribution split

P-RS/RS:

Probabilistic ratio split

RNN:

Recurrent neural network

STIPs:

Space-time interest points

ST-LSTM:

Spatio-temporal LSTM

SVM:

Support vector machine

TR/tr Group:

Training group

TS/ts Group:

Test group

Notes

Acknowledgment

The authors would like to thank CARNOT MINES-TSN for funding this work through the ‘Robot apprenant’ project.

We are thankful to the Service Robotics Research Center at Technische Hochschule Ulm (SeRoNet project) for supporting the consolidation period of this article.

Copyright information

© Springer Nature Switzerland AG 2021

Authors and Affiliations

  1. Institut Mines-Telecom, Telecom SudParis, Institut Polytechnique de Paris, Paris, France
  2. Service Robotics Research Center, Technische Hochschule Ulm, Ulm, Germany
