Image classification based on convolutional neural networks with cross-level strategy

Multimedia Tools and Applications

Abstract

In the past few years, convolutional neural networks (CNNs) have exhibited great potential in the field of image classification. In this paper, we present a novel strategy named cross-level to improve existing network architectures, in which different levels of feature representation are merely connected in series. The basic idea of cross-level is to establish a convolutional layer between two nonadjacent levels, aiming to extract richer multi-scale features at each feature representation level. The proposed cross-level strategy can be naturally integrated into an existing network without any change to its original architecture, which makes it very practical and convenient. Four popular convolutional networks for image classification are employed to illustrate its implementation in detail. Experimental results on the dataset adopted by the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) verify the effectiveness of the cross-level strategy on image classification. Furthermore, a new convolutional network with cross-level architecture is presented to demonstrate the potential of the proposed strategy in future network design.
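The idea of a convolutional layer between two nonadjacent levels can be illustrated with a minimal sketch. The abstract does not specify kernel sizes, strides, or how the cross-level output is merged, so everything below is an assumption made for illustration: single-channel feature maps, fixed averaging kernels standing in for learned weights, a 5x5 cross-level kernel chosen only so that the spatial sizes match, and merging by concatenation along the channel axis. The names `conv2d`, `box_kernel`, and `forward` are hypothetical.

```python
def conv2d(fmap, kernel, stride=1):
    """Valid (no padding) 2-D cross-correlation of one square map with one square kernel, followed by ReLU."""
    k = len(kernel)
    out_size = (len(fmap) - k) // stride + 1
    out = []
    for i in range(out_size):
        row = []
        for j in range(out_size):
            acc = 0.0
            for di in range(k):
                for dj in range(k):
                    acc += fmap[i * stride + di][j * stride + dj] * kernel[di][dj]
            row.append(max(acc, 0.0))  # ReLU non-linearity
        out.append(row)
    return out


def box_kernel(k):
    """Averaging kernel; a stand-in for learned weights (assumption)."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


def forward(image, cross_level=True):
    """Three levels connected in series, optionally with a cross-level branch.

    Each level is a list of feature maps (channels).
    """
    level1 = [conv2d(image, box_kernel(3))]      # 8x8 -> 6x6
    level2 = [conv2d(level1[0], box_kernel(3))]  # 6x6 -> 4x4
    level3 = [conv2d(level2[0], box_kernel(3))]  # 4x4 -> 2x2
    if cross_level:
        # Cross-level branch: a convolution from level 1 directly to level 3,
        # skipping level 2. The 5x5 kernel maps 6x6 -> 2x2 so the result can
        # be concatenated with the serial level-3 features (assumed merge).
        level3.append(conv2d(level1[0], box_kernel(5)))
    return level3


# Synthetic 8x8 input, used only to exercise the code.
image = [[float((r + c) % 4) for c in range(8)] for r in range(8)]
plain = forward(image, cross_level=False)
crossed = forward(image, cross_level=True)
print(len(plain), len(crossed))  # number of channels at level 3, without and with the branch
```

In this sketch the serial path is left untouched; the branch only adds extra channels at the target level, which mirrors the claim that the strategy can be integrated without changing the original architecture.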

Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their constructive comments and valuable suggestions. This work was supported by the National Natural Science Foundation of China (No. 61472393 and No. 61303150), the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2012GB102007), and the Anhui Province Initiative Funds on Intelligent Speech Technology and Industrialization (No. 13Z02008). The authors gratefully acknowledge the support of IFLYTEK CO., LTD.

Author information

Corresponding author

Correspondence to Zengfu Wang.

Ethics declarations

Conflict of interests

The authors declare that they have no conflict of interest.

About this article

Cite this article

Liu, Y., Yin, B., Yu, J. et al. Image classification based on convolutional neural networks with cross-level strategy. Multimed Tools Appl 76, 11065–11079 (2017). https://doi.org/10.1007/s11042-016-3540-x

  • DOI: https://doi.org/10.1007/s11042-016-3540-x
