Convolutional Neural Networks Features: Principal Pyramidal Convolution

  • Yanming Guo (email author)
  • Songyang Lao
  • Yu Liu
  • Liang Bai
  • Shi Liu
  • Michael S. Lew
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 9314)


Features extracted from convolutional neural networks (CNNs) capture the discriminative parts of an image and have shown superior performance in visual recognition. Furthermore, it has been verified that activations from CNNs trained on large and diverse datasets can act as generic features and be transferred to other visual recognition tasks. In this paper, we aim to learn more from an image and present an effective method called Principal Pyramidal Convolution (PPC). The scheme first partitions the image into two levels, extracts CNN activations for each sub-region as well as for the whole image, and then concatenates them. The concatenated feature is then reduced to the standard dimension using Principal Component Analysis (PCA), generating the refined CNN feature. When applied to image classification and retrieval tasks, the PPC feature consistently outperforms the conventional CNN feature, regardless of the network from which the activations are derived. Specifically, PPC achieves a state-of-the-art result on the MIT Indoor67 dataset, using the activations from Places-CNN.
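The pipeline described in the abstract (partition, extract, concatenate, reduce) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation, and several details are assumptions: the two levels are read here as the whole image plus a 2x2 grid of sub-regions, `extract_features` is a hypothetical stand-in for a real CNN forward pass (a fixed random projection, so the example runs without a trained network), and `FEATURE_DIM = 64` is an arbitrary placeholder for the "standard dimension", which the abstract does not state.

```python
import numpy as np

FEATURE_DIM = 64  # assumed placeholder; the abstract does not give the CNN feature length


def extract_features(region, dim=FEATURE_DIM):
    """Hypothetical stand-in for a CNN forward pass (NOT a real network).

    Samples a coarse 8x8 grayscale thumbnail of the region and applies a
    fixed random projection followed by a ReLU.
    """
    gray = region.mean(axis=2) if region.ndim == 3 else region
    h, w = gray.shape
    rows = np.linspace(0, h - 1, 8).astype(int)
    cols = np.linspace(0, w - 1, 8).astype(int)
    thumb = gray[np.ix_(rows, cols)].ravel()
    rng = np.random.default_rng(0)  # fixed seed: same projection for every region
    projection = rng.standard_normal((thumb.size, dim))
    return np.maximum(thumb @ projection, 0.0)


def pyramid_regions(image, grid=2):
    """Return the whole image followed by a grid x grid set of sub-regions.

    Reading "two levels" as whole image + 2x2 grid is an assumption.
    """
    h, w = image.shape[:2]
    regions = [image]
    for i in range(grid):
        for j in range(grid):
            regions.append(
                image[i * h // grid:(i + 1) * h // grid,
                      j * w // grid:(j + 1) * w // grid]
            )
    return regions


def concatenated_feature(image):
    """Concatenate the features of the whole image and of each sub-region."""
    return np.concatenate([extract_features(r) for r in pyramid_regions(image)])


def fit_pca(X, n_components):
    """Fit PCA via SVD of the centered data; return (mean, principal axes)."""
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def ppc_features(images, n_components=FEATURE_DIM):
    """Reduce concatenated pyramid features back to n_components dimensions."""
    X = np.stack([concatenated_feature(im) for im in images])
    mean, components = fit_pca(X, n_components)
    return (X - mean) @ components.T
```

Calling `ppc_features` on a list of image arrays returns one row per image with `FEATURE_DIM` columns, provided there are at least `FEATURE_DIM` images to fit the PCA on. In practice the PCA would be fitted on a training set and the same mean and axes reused for test images; the sketch fits and projects in a single call only to stay short.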


Convolutional neural networks · Concatenate · Principal component analysis · Image classification · Image retrieval



Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Yanming Guo (1, 2), email author
  • Songyang Lao (2)
  • Yu Liu (1)
  • Liang Bai (2)
  • Shi Liu (3)
  • Michael S. Lew (1)

  1. LIACS Media Lab, Leiden University, Leiden, The Netherlands
  2. Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha, China
  3. School of Arts and Media, Beijing Normal University, Beijing, China
