World Wide Web, Volume 22, Issue 2, pp 423–436

On fusing the latent deep CNN feature for image classification

  • Xueliang Liu
  • Rongjie Zhang
  • Zhijun Meng
  • Richang Hong
  • Guangcan Liu
Part of the following topical collections:
  1. Special Issue on Deep vs. Shallow: Learning for Emerging Web-scale Data Computing and Applications


Image classification, which aims to assign a semantic category to an image, has been studied extensively in recent years. More recently, convolutional neural networks (CNNs) have emerged and achieved very promising results. Compared with traditional feature extraction techniques (e.g., SIFT, HOG, GIST), a CNN extracts features from images automatically and does not require hand-designed features. However, further improving classification accuracy remains a challenge. Recent research on CNNs shows that the features extracted from the middle layers are representative, which suggests a possible way to improve classification accuracy. Based on this observation, we propose a method that fuses the latent features extracted from the middle layers of a CNN to train a more robust classifier. First, we use pretrained CNN models to extract visual features from the middle layers. Then, we use a supervised learning method to train a separate classifier for each feature. Finally, we use a late fusion strategy to combine the predictions of these classifiers. We evaluate the proposal with different classification methods on several image benchmarks, and the results demonstrate that the proposed method improves classification performance.
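The final late fusion step can be illustrated with a minimal sketch. The abstract does not state the fusion rule, so the weighted average of per-class scores used here is an assumption, and the layer names (`conv5`, `fc6`, `fc7`), the score values, and the helper functions `late_fusion` and `predict` are hypothetical. The sketch assumes each per-layer classifier has already produced one score per class for a single image.

```python
from typing import Dict, List, Optional, Sequence


def late_fusion(
    layer_scores: Dict[str, Sequence[float]],
    weights: Optional[Dict[str, float]] = None,
) -> List[float]:
    """Fuse per-layer class scores by a weighted average (an assumed rule)."""
    if not layer_scores:
        raise ValueError("need at least one classifier output")
    names = list(layer_scores)
    n_classes = len(layer_scores[names[0]])
    if any(len(layer_scores[n]) != n_classes for n in names):
        raise ValueError("all classifiers must score the same classes")
    if weights is None:
        weights = {n: 1.0 for n in names}  # uniform weights by default
    total = sum(weights[n] for n in names)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    fused = [0.0] * n_classes
    for n in names:
        w = weights[n] / total  # normalise so the weights sum to 1
        for c, s in enumerate(layer_scores[n]):
            fused[c] += w * s
    return fused


def predict(fused: Sequence[float]) -> int:
    """Return the index of the class with the highest fused score."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical class probabilities from classifiers trained on three layers.
scores = {
    "conv5": [0.50, 0.30, 0.20],
    "fc6": [0.20, 0.60, 0.20],
    "fc7": [0.25, 0.55, 0.20],
}
fused = late_fusion(scores)
label = predict(fused)
```

With uniform weights, the two classifiers that favour the second class outvote the one that favours the first. Passing a `weights` dictionary lets a more reliable layer count for more, for example with weights chosen on a validation set.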


Image classification · Convolutional neural network · Late fusion



This work was supported by the National Natural Science Foundation of China (NSFC) under grants 61632007 and 61502139.



Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018
corrected publication 2018

Authors and Affiliations

  1. Hefei University of Technology, Hefei, China
  2. Beihang University, Beijing, China
  3. Nanjing University of Information Science and Technology, Nanjing, China
