Multi-input 1-dimensional deep belief network: action and activity recognition as case study

  • Ali Mohammad Nickfarjam
  • Hossein Ebrahimpour-Komleh

Abstract

This paper develops a new variation of deep belief networks, evaluated on the supervised classification of human actions and activities. The proposed multi-input 1-dimensional deep belief network (M1DBN) operates on three inputs that carry different information structures. These multiple input features help M1DBN search the solution space more accurately and extract high-level representations more efficiently. The three inputs provide spatial, short-term and long-term information for action and activity recognition. Spatial information distinguishes human movements that have high inter-class variation; for movements with similar inter-class appearance, a temporal description is used instead. The short-term and long-term inputs learn actions or activities over short and long video intervals, respectively. Experimental results show the superiority of this approach over state-of-the-art methods on the KTH (97.04%), HMDB51 (67.19%), UCI-HAD (97.16%) and Skoda (93.28%) datasets. A detailed explanation of the learning, training and test procedures used in M1DBN is also provided.
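
The abstract describes the architecture only at a high level. As a rough illustration of the multi-input idea, the following is a minimal sketch, assuming Bernoulli restricted Boltzmann machines (RBMs) trained with one-step contrastive divergence (CD-1) and purely illustrative layer sizes, feature dimensions and hyper-parameters; it is not the authors' exact M1DBN:

```python
# Sketch of a three-branch DBN-style feature learner: each 1-D input
# (spatial, short-term, long-term) is pretrained as its own RBM stack,
# and the resulting high-level codes are concatenated for a classifier.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """Bernoulli-Bernoulli restricted Boltzmann machine trained with CD-1."""
    def __init__(self, n_visible, n_hidden, lr=0.05):
        self.W = 0.01 * rng.standard_normal((n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)  # visible bias
        self.b_h = np.zeros(n_hidden)   # hidden bias
        self.lr = lr

    def hidden_probs(self, v):
        return sigmoid(v @ self.W + self.b_h)

    def cd1_step(self, v0):
        # Positive phase: hidden activations driven by the data.
        ph0 = self.hidden_probs(v0)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to a reconstruction.
        pv1 = sigmoid(h0 @ self.W.T + self.b_v)
        ph1 = self.hidden_probs(pv1)
        # CD-1 update: <v h>_data minus <v h>_reconstruction.
        n = v0.shape[0]
        self.W += self.lr * (v0.T @ ph0 - pv1.T @ ph1) / n
        self.b_v += self.lr * (v0 - pv1).mean(axis=0)
        self.b_h += self.lr * (ph0 - ph1).mean(axis=0)

def pretrain_stack(x, layer_sizes, epochs=20):
    """Greedy layer-wise pretraining; returns the top-level code for x."""
    for n_hidden in layer_sizes:
        rbm = RBM(x.shape[1], n_hidden)
        for _ in range(epochs):
            rbm.cd1_step(x)
        x = rbm.hidden_probs(x)  # codes of this layer feed the next RBM
    return x

# Three hypothetical 1-D inputs for a batch of 32 samples (random
# stand-ins for real spatial / short-term / long-term descriptors).
spatial    = rng.random((32, 128))
short_term = rng.random((32, 64))
long_term  = rng.random((32, 64))

codes = [pretrain_stack(x, layer_sizes=[96, 48])
         for x in (spatial, short_term, long_term)]
joint = np.concatenate(codes, axis=1)  # fused high-level representation
print(joint.shape)  # (32, 144) -> input to a final supervised classifier
```

The point the sketch mirrors is that each input is pretrained as its own stack, so spatial and temporal structure are learned separately and only the high-level codes are fused for supervised classification.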

Keywords

Multi-input 1-dimensional deep belief network · Action recognition · Activity recognition · Spatial description · Temporal description

Notes

Compliance with ethical standards

Conflict of interest

The authors declare that they have no conflict of interest.

Ethical approval

This paper uses four standard datasets of human actions or activities. All procedures performed in gathering the four datasets involving human participants were in accordance with the ethical standards of the institutional and/or national research committee and with the 1964 Helsinki Declaration and its later amendments or comparable ethical standards. This article does not contain any studies with animals performed by any of the authors.

Informed consent

Informed consent was obtained from all individual participants included in the study by the researchers who gathered the two standard datasets of human actions.


Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2019

Authors and Affiliations

  • Ali Mohammad Nickfarjam (1)
  • Hossein Ebrahimpour-Komleh (1), email author

  1. Faculty of Electrical and Computer Engineering, Computer Engineering Department, University of Kashan, Kashan, Iran
