Hyperfeatures – Multilevel Local Coding for Visual Recognition

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 3951)


Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant and have good resistance to local occlusions and to geometric and photometric variations, but they cannot exploit spatial co-occurrence statistics at scales larger than their local input patches. We present a new multilevel visual representation, ‘hyperfeatures’, that is designed to remedy this. The starting point is the familiar notion that to detect object parts, in practice it often suffices to detect co-occurrences of more local object fragments, a process that can be formalized as comparison (e.g. vector quantization) of image patches against a codebook of known fragments, followed by local aggregation of the resulting codebook membership vectors to detect co-occurrences. This process converts local collections of image descriptor vectors into somewhat less local histogram vectors: higher-level but spatially coarser descriptors. We observe that as the output is again a local descriptor vector, the process can be iterated, and that doing so captures and codes ever larger assemblies of object parts and increasingly abstract or ‘semantic’ image properties. We formulate the hyperfeatures model and study its performance under several different image coding methods, including clustering-based vector quantization, Gaussian mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features provide improved performance in several object and texture image classification tasks.
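The code-then-aggregate loop described above can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: hard vector quantization against fixed, hand-picked codebooks (the paper also studies Gaussian mixtures and LDA, which are not implemented here), a dense 2-D grid of input descriptors, aggregation over non-overlapping `block` x `block` neighborhoods, and an image-level feature formed by concatenating each level's global code histogram. The function names `nearest_center`, `code_and_pool`, `global_histogram` and `hyperfeatures` are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]
Grid = List[List[Vector]]  # rows x cols of local descriptor vectors


def nearest_center(x: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Hard vector quantization: index of the closest center (squared L2)."""
    best_k, best_d = 0, float("inf")
    for k, center in enumerate(codebook):
        d = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
        if d < best_d:  # strict '<' keeps the first center on ties
            best_k, best_d = k, d
    return best_k


def code_and_pool(grid: Grid, codebook: Sequence[Sequence[float]], block: int) -> Grid:
    """One level: quantize every cell, then sum the one-hot codes over
    non-overlapping block x block neighborhoods and normalize each sum to a
    histogram. The output is a coarser grid of K-dimensional descriptors."""
    K = len(codebook)
    rows, cols = len(grid), len(grid[0])
    pooled: Grid = []
    for r0 in range(0, rows - block + 1, block):
        out_row: List[Vector] = []
        for c0 in range(0, cols - block + 1, block):
            hist = [0.0] * K
            for r in range(r0, r0 + block):
                for c in range(c0, c0 + block):
                    hist[nearest_center(grid[r][c], codebook)] += 1.0
            out_row.append([h / (block * block) for h in hist])
        pooled.append(out_row)
    return pooled


def global_histogram(grid: Grid, codebook: Sequence[Sequence[float]]) -> Vector:
    """Normalized histogram of code assignments over the whole grid."""
    hist = [0.0] * len(codebook)
    cells = [x for row in grid for x in row]
    for x in cells:
        hist[nearest_center(x, codebook)] += 1.0
    return [h / len(cells) for h in hist]


def hyperfeatures(grid: Grid, codebooks: Sequence[Sequence[Sequence[float]]],
                  block: int = 2) -> Vector:
    """Iterate code-and-pool; codebooks[l] must live in level l's descriptor
    space (level 0: raw descriptors, level l > 0: K_{l-1}-dim histograms)."""
    feature: Vector = []
    for level, codebook in enumerate(codebooks):
        if not grid or not grid[0]:
            raise ValueError(f"grid too small to reach level {level}")
        feature.extend(global_histogram(grid, codebook))
        if level + 1 < len(codebooks):
            grid = code_and_pool(grid, codebook, block)
    return feature


if __name__ == "__main__":
    # 4x4 grid of 1-D "descriptors" (values 0 or 1).
    values = [[0, 0, 1, 1],
              [0, 0, 1, 1],
              [0, 1, 0, 1],
              [1, 0, 1, 0]]
    grid = [[[float(v)] for v in row] for row in values]
    level0 = [[0.0], [1.0]]                        # codebook for raw descriptors
    level1 = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # codebook for 2-bin histograms
    print(hyperfeatures(grid, [level0, level1]))   # [0.5, 0.5, 0.25, 0.25, 0.5]
```

In the example, the level-1 descriptors are the 2-bin histograms of 2x2 blocks, so the third level-1 center picks out "mixed" blocks that contain both level-0 codes, a simple case of coding co-occurrences. In practice each codebook would be learned (e.g. by k-means) from the previous level's outputs, and soft assignments such as Gaussian-mixture posteriors could replace the one-hot codes without changing the loop structure.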





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

GRAVIR-INRIA-CNRS, Montbonnot, France
