Neural Computing and Applications, Volume 28, Issue 12, pp 3827–3835

Moving object recognition using multi-view three-dimensional convolutional neural networks

  • Tao He
  • Hua Mao (corresponding author)
  • Zhang Yi
Original Article


Moving object recognition (MOR) is an important but challenging problem in computer vision. Its aim is to recognize moving objects in video data. Convolutional neural networks (CNNs) have been used extensively for image recognition and video analysis. Recently, 3D-CNNs, which contain 3D convolution layers, have been applied to MOR because they can extract spatiotemporal features. This paper proposes a multi-view (MV) 3D-CNN, referred to as MV3D-CNN, for MOR. The model combines 3D-CNNs with multi-view learning. Because multi-view learning can obtain view-related features from videos captured by different cameras, the proposed model can extract more representative features. The model also contains a view-pooling layer that fuses the feature information from the preceding layers. The proposed MV3D-CNN is applied to real-world moving vehicle recognition and sign language recognition tasks, and the experimental results show that it performs well on both.
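The idea of applying a 3D convolution to each camera view and then fusing the per-view features in a view-pooling layer can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the element-wise maximum over views, the sharing of one kernel across views, the single-channel input, and the names `conv3d_valid` and `view_pool` are all assumptions made for illustration.

```python
import numpy as np


def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation.

    clip:   single-channel video of shape (T, H, W)
    kernel: filter of shape (kt, kh, kw)
    The kernel slides over time and space, so each output value
    summarizes a small spatiotemporal neighbourhood.
    """
    T, H, W = clip.shape
    kt, kh, kw = kernel.shape
    out = np.empty((T - kt + 1, H - kh + 1, W - kw + 1))
    for t in range(out.shape[0]):
        for i in range(out.shape[1]):
            for j in range(out.shape[2]):
                patch = clip[t:t + kt, i:i + kh, j:j + kw]
                out[t, i, j] = np.sum(patch * kernel)
    return out


def view_pool(features):
    """Fuse per-view feature maps of shape (V, ...) into one map.

    Assumption: fusion is an element-wise maximum over the view axis.
    """
    return np.max(features, axis=0)


rng = np.random.default_rng(0)
views = rng.standard_normal((3, 8, 16, 16))  # 3 cameras, 8 frames, 16x16 pixels
kernel = rng.standard_normal((3, 3, 3))      # one kernel shared by all views (assumption)

# 3D convolution followed by ReLU, applied to each view independently.
per_view = np.stack([np.maximum(conv3d_valid(v, kernel), 0.0) for v in views])

# View pooling collapses the view axis into a single fused feature map.
fused = view_pool(per_view)
print(per_view.shape, fused.shape)
```

In this sketch the fused map has the same shape as a single view's feature map, so any later layers are independent of the number of cameras. Other fusion choices, such as averaging over views, would fit the same interface.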


Keywords: Moving object recognition; Multi-view learning; 3D convolutional neural networks; Feature extraction; Deep learning



This work was supported by the National Science Foundation of China (Grant Nos. 61432012 and 61402306).



Copyright information

© The Natural Computing Applications Forum 2016

Authors and Affiliations

  1. Machine Intelligence Laboratory, College of Computer Science, Sichuan University, Chengdu, People’s Republic of China
