Abstract
In recent years, convolutional neural networks (CNNs) have shown great potential for image classification. In this paper, we present a novel strategy, named cross-level, to improve existing network architectures, in which the different levels of feature representation are merely connected in series. The basic idea of cross-level is to establish a convolutional layer between two nonadjacent levels, so that each feature representation level can draw on richer features at multiple scales. The proposed strategy can be integrated naturally into an existing network without changing its original architecture, which makes it practical and convenient. Four popular convolutional networks for image classification are used to illustrate its implementation in detail. Experimental results on the dataset adopted by the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC) verify the effectiveness of the cross-level strategy for image classification. Furthermore, a new convolutional network with a cross-level architecture is presented to demonstrate the potential of the proposed strategy for future network design.
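The cross-level idea described above can be illustrated with a toy sketch. This is not the authors' implementation. The layer sizes, kernel sizes, strides, random weights, and the choice to merge the cross-level output with the serial output by channel concatenation are all assumptions made for illustration. The sketch builds three levels connected in series, then adds one extra convolutional layer that goes directly from level 1 to level 3, skipping level 2.

```python
import random

def conv2d(x, filters, stride=1, pad=0):
    """Naive convolution + ReLU on nested lists.
    x: C_in x H x W, filters: C_out x C_in x K x K, returns C_out x H_out x W_out."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    k = len(filters[0][0])
    # Zero-pad every input channel.
    padded = [[[0.0] * (w + 2 * pad) for _ in range(h + 2 * pad)] for _ in range(c_in)]
    for c in range(c_in):
        for i in range(h):
            for j in range(w):
                padded[c][i + pad][j + pad] = x[c][i][j]
    h_out = (h + 2 * pad - k) // stride + 1
    w_out = (w + 2 * pad - k) // stride + 1
    out = []
    for f in filters:
        fmap = [[0.0] * w_out for _ in range(h_out)]
        for i in range(h_out):
            for j in range(w_out):
                s = 0.0
                for c in range(c_in):
                    for di in range(k):
                        for dj in range(k):
                            s += f[c][di][dj] * padded[c][i * stride + di][j * stride + dj]
                fmap[i][j] = max(s, 0.0)  # ReLU activation
        out.append(fmap)
    return out

def random_filters(c_out, c_in, k, rng):
    """Hypothetical helper: random weights standing in for learned filters."""
    return [[[[rng.uniform(-1.0, 1.0) for _ in range(k)] for _ in range(k)]
             for _ in range(c_in)] for _ in range(c_out)]

def shape(t):
    return (len(t), len(t[0]), len(t[0][0]))

rng = random.Random(0)
# Level 1: a toy 3-channel 8x8 feature map (assumed sizes).
level1 = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(3)]

# Serial path: level 1 -> level 2 -> level 3, as in a conventional network.
level2 = conv2d(level1, random_filters(4, 3, 3, rng), stride=2, pad=1)
level3 = conv2d(level2, random_filters(6, 4, 3, rng), stride=2, pad=1)

# Cross-level path: one convolutional layer from level 1 straight to level 3.
# Kernel 5, stride 4, pad 2 are chosen so the spatial size matches level 3.
cross = conv2d(level1, random_filters(2, 3, 5, rng), stride=4, pad=2)

# Assumed merge: concatenate along the channel axis.
level3_aug = level3 + cross

print(shape(level2), shape(level3), shape(cross), shape(level3_aug))
```

Because the cross-level layer sees level 1 with a larger kernel and stride than the serial path, the augmented level 3 combines features computed at two different scales, while the serial layers themselves are left unchanged.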
Acknowledgments
The authors would like to thank the editors and anonymous reviewers for their constructive comments and valuable suggestions. This work was supported by the National Natural Science Foundation of China (No. 61472393 and No. 61303150), the National Science and Technology Major Project of the Ministry of Science and Technology of China (No. 2012GB102007), and the Anhui Province Initiative Funds on Intelligent Speech Technology and Industrialization (No. 13Z02008). The authors gratefully acknowledge the support of IFLYTEK CO., LTD.
Ethics declarations
Conflict of interests
The authors declare that they have no conflict of interest.
About this article
Cite this article
Liu, Y., Yin, B., Yu, J. et al. Image classification based on convolutional neural networks with cross-level strategy. Multimed Tools Appl 76, 11065–11079 (2017). https://doi.org/10.1007/s11042-016-3540-x
DOI: https://doi.org/10.1007/s11042-016-3540-x