
Action recognition for the robotics and manufacturing automation using 3-D binary micro-block difference

  • ORIGINAL ARTICLE
  • Published in: The International Journal of Advanced Manufacturing Technology

Abstract

Vision-based control plays an important role in modern robotic systems. A central task in implementing such a system is developing an effective algorithm that recognizes human actions and the working environment and supports the design of intuitive gesture commands. This paper proposes an action recognition algorithm for robotics and manufacturing automation. The key contributions are (1) fusion of multimodal information obtained from depth sensors and visible-range cameras, (2) a modified Gabor-based, 3-D binary descriptor built on micro-block differences, (3) an efficient skeleton-based descriptor, and (4) a recognition algorithm that uses the combined descriptor. The proposed binary micro-block difference representation of 3-D patches, computed at several scales and orientations from video with a complex background, yields an informative description of the action in the scene. Experimental results demonstrate the effectiveness of the proposed algorithm on benchmark datasets.
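To make the central idea concrete, the sketch below illustrates a binary micro-block difference descriptor computed on a 3-D spatio-temporal patch. This is a minimal illustration, not the authors' implementation: the micro-block sizes, the number of sampled pairs, and the random pair placement are all hypothetical choices, and the Gabor filtering, orientation handling, and skeleton descriptor described in the paper are omitted.

```python
# Minimal sketch of a 3-D binary micro-block difference descriptor.
# All parameters (block sizes, pair count, random placement) are
# hypothetical; the paper's actual design may differ substantially.
import numpy as np

rng = np.random.default_rng(0)

def micro_block_mean(patch, corner, size):
    """Mean intensity of a small 3-D micro-block inside the patch."""
    z, y, x = corner
    d, h, w = size
    return patch[z:z + d, y:y + h, x:x + w].mean()

def binary_micro_block_difference(patch, n_pairs=64, block=(2, 2, 2)):
    """Binarize mean-intensity differences of random micro-block pairs.

    patch : 3-D array (frames x height x width) cut from a video volume.
    Returns a binary vector of length n_pairs.
    """
    d, h, w = block
    zs = patch.shape[0] - d
    ys = patch.shape[1] - h
    xs = patch.shape[2] - w
    bits = np.empty(n_pairs, dtype=np.uint8)
    for i in range(n_pairs):
        # Sample two micro-block corners uniformly inside the patch.
        a = (rng.integers(zs), rng.integers(ys), rng.integers(xs))
        b = (rng.integers(zs), rng.integers(ys), rng.integers(xs))
        # Bit is 1 when the first micro-block is brighter than the second.
        bits[i] = micro_block_mean(patch, a, block) > micro_block_mean(patch, b, block)
    return bits

# Example: describe one 8x16x16 spatio-temporal patch at two block scales.
patch = rng.random((8, 16, 16))
descriptor = np.concatenate([
    binary_micro_block_difference(patch, block=(2, 2, 2)),
    binary_micro_block_difference(patch, block=(2, 4, 4)),
])
```

Each scale contributes one binary vector, and concatenating the vectors across scales (and, in the full algorithm, orientations) produces the patch descriptor that feeds the recognition stage.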



Availability of data and materials

Not applicable.

Code availability

Not applicable.


Funding

The reported study was funded by the Educational Organizations in 2020–2022 Project under Grant No. FSFS-2020-0031 and in part by RFBR and NSFC under research project No. 20-57-53012.

Author information


Contributions

All authors contributed to the critical literature review, and all authors contributed to writing and revising the manuscript.

Corresponding author

Correspondence to Viacheslav Voronin.

Ethics declarations

Ethics approval

This manuscript, in part or in full, has not been submitted or published elsewhere, and it will not be submitted elsewhere until the editorial process is complete.

Consent to participate

Not applicable.

Consent for publication

The authors transfer non-exclusive publication rights to Springer.

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Voronin, V., Zhdanova, M., Semenishchev, E. et al. Action recognition for the robotics and manufacturing automation using 3-D binary micro-block difference. Int J Adv Manuf Technol 117, 2319–2330 (2021). https://doi.org/10.1007/s00170-021-07613-2
