Adaptive Retraining of Visual Recognition-Model in Human Activity Recognition by Collaborative Humanoid Robots

Conference paper
Part of the Advances in Intelligent Systems and Computing book series (AISC, volume 1251)


We present a vision-based activity recognition system for centrally connected humanoid robots. The robots interact with several human participants whose behavioral styles and activity executions vary. A cloud server provides and updates the recognition model on all robots. The server continuously fetches the new activity videos recorded by the robots, along with the corresponding recognition results and the ground-truth labels provided by the humans interacting with the robots. An evolving performance-based logic decides when to retrain the recognition model. In the current article, we present this adaptive recognition system, with special emphasis on the partitioning logic employed to divide the new videos into the training, cross-validation, and test groups of the next retraining instance. This logic is driven by the class-wise recognition inaccuracies of the existing model. We compare this approach with a probabilistic partitioning approach in which the videos are split without any performance considerations.
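The performance-driven partitioning described above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the function name, the default group shares, and the exact biasing rule (classes the current model recognizes poorly send a larger fraction of their new videos to the training group, with the remainder shared between cross-validation and test) are all hypothetical.

```python
import random
from collections import defaultdict

def performance_based_split(videos, class_error, tr=0.6, cv=0.2, ts=0.2):
    """Partition new activity videos into training (TR), cross-validation
    (CV), and test (TS) groups for the next retraining instance.

    videos      -- list of (video_id, class_label) pairs
    class_error -- dict mapping a class label to the existing model's
                   recognition error rate for that class, in [0, 1]
    """
    by_class = defaultdict(list)
    for vid, label in videos:
        by_class[label].append(vid)

    groups = {"TR": [], "CV": [], "TS": []}
    for label, vids in by_class.items():
        random.shuffle(vids)
        err = class_error.get(label, 0.5)
        # Bias the training share upward for poorly recognized classes,
        # then split the remaining videos between CV and TS.
        tr_share = min(0.9, tr + (1 - tr) * err)
        n_tr = round(tr_share * len(vids))
        rest = vids[n_tr:]
        n_cv = round(len(rest) * cv / (cv + ts))
        groups["TR"] += vids[:n_tr]
        groups["CV"] += rest[:n_cv]
        groups["TS"] += rest[n_cv:]
    return groups
```

The probabilistic baseline mentioned in the abstract would correspond to ignoring `class_error` altogether and splitting every class by the same fixed ratios.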


Keywords: Learning and adaptive systems · Human activity recognition · Online learning · Distributed robot systems · Computer vision · Intersection-kernel SVM model · Dense interest point trajectories



Abbreviations

ADL: Activities of daily living
BoW: Bag of words
CV/cv group: Cross-validation group
EADL: Enhanced ADLs
GMM: Generalized method of moments
HAR: Human activity recognition
HOF: Histogram of optical flows
HOG: Histogram of gradients
HSV: Hue, saturation and value
IADL: Instrumental ADLs
IKSVM: Intersection kernel based SVM
IP: Interest point
LSTM: Long short-term memory
MBH: Motion boundary histogram
MBHx: MBH in x orientation
MBHy: MBH in y orientation
NLP: Natural language processing
Probabilistic contribution split
Probabilistic ratio split
RNN: Recurrent neural network
STIP: Space-time interest points
ST-LSTM: Spatio-temporal LSTM
SVM: Support vector machine
TR/tr group: Training group
TS/ts group: Test group



The authors would like to thank CARNOT MINES-TSN for funding this work through the ‘Robot apprenant’ project.

We are thankful to the Service Robotics Research Center at Technische Hochschule Ulm (SeRoNet project) for supporting the consolidation period of this article.



Copyright information

© Springer Nature Switzerland AG 2021

Authors and Affiliations

  1. Institut Mines-Télécom, Télécom SudParis, Institut Polytechnique de Paris, Paris, France
  2. Service Robotics Research Center, Technische Hochschule Ulm, Ulm, Germany
