Recognizing Hand Gesture in Still Infrared Images by CapsNet

  • Conference paper
Web Information Systems Engineering – WISE 2021 (WISE 2021)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13080)

Abstract

In recent years, gesture recognition has become a hot topic in the computer vision community because of its great potential in many real-world applications, which creates a need for robust hand gesture recognition algorithms. Infrared images have the advantage of not being affected by illumination variation. As a promising alternative to Convolutional Neural Networks (CNNs), Capsule Networks (CapsNets) can represent the orientations of features and capture the spatial relationships between the features of an entity, which gives them a higher generalization ability. In this paper, we propose IRHGR-CapsNet for hand gesture recognition in still infrared images. To comprehensively evaluate the testing accuracy, convergence ability, and generalization ability of IRHGR-CapsNet, we split the original dataset into three subsets using different split proportions, which yields four dataset split modes for our experiments. IRHGR-CapsNet achieves almost 100.00% testing accuracy on all four dataset split modes. The experiments also demonstrate that the proposed IRHGR-CapsNet has strong convergence and generalization abilities.
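The split procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the abstract does not state the actual proportions, how samples are shuffled, or what the three subsets are called, so the helper name `split_three_ways`, the mode names, the proportions, and the training/validation/testing reading of the three subsets are all assumptions made for illustration.

```python
import random


def split_three_ways(items, proportions, seed=0):
    """Shuffle items and cut them into three subsets by the given proportions.

    Hypothetical helper: the abstract only says the dataset is split into
    three subsets "according to different split proportions".
    """
    if len(proportions) != 3 or abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must be three values that sum to 1")
    shuffled = list(items)
    # A seeded generator keeps the split reproducible across runs.
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    first_end = min(n, round(n * proportions[0]))
    second_end = min(n, first_end + round(n * proportions[1]))
    # The third subset takes whatever remains, so no sample is lost.
    return (
        shuffled[:first_end],
        shuffled[first_end:second_end],
        shuffled[second_end:],
    )


# Assumed proportions (training, validation, testing); the four modes in the
# paper may differ from these placeholder values.
SPLIT_MODES = {
    "mode_1": (0.8, 0.1, 0.1),
    "mode_2": (0.6, 0.2, 0.2),
    "mode_3": (0.4, 0.3, 0.3),
    "mode_4": (0.2, 0.4, 0.4),
}

if __name__ == "__main__":
    sample_ids = range(1000)  # stand-in for infrared image identifiers
    for name, proportions in SPLIT_MODES.items():
        train, val, test = split_three_ways(sample_ids, proportions)
        print(name, len(train), len(val), len(test))
```

Shrinking the first subset from one mode to the next, as in the placeholder values above, is one way such a design could probe generalization: the model sees fewer examples while being evaluated on more.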

Notes

  1. https://www.csc.kth.se/cvap/actions/

  2. https://www.crcv.ucf.edu/research/data-sets/

  3. https://www.kaggle.com/gti-upm/leapgestrecog

Author information

Corresponding author

Correspondence to Jinjun Chen.

Copyright information

© 2021 Springer Nature Switzerland AG

About this paper

Cite this paper

Xiao, H. et al. (2021). Recognizing Hand Gesture in Still Infrared Images by CapsNet. In: Zhang, W., Zou, L., Maamar, Z., Chen, L. (eds) Web Information Systems Engineering – WISE 2021. WISE 2021. Lecture Notes in Computer Science, vol 13080. Springer, Cham. https://doi.org/10.1007/978-3-030-90888-1_13

  • DOI: https://doi.org/10.1007/978-3-030-90888-1_13

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-90887-4

  • Online ISBN: 978-3-030-90888-1

  • eBook Packages: Computer Science, Computer Science (R0)
