Abstract
Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant and have good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics at scales larger than their local input patches. We present a new multilevel visual representation, ‘hyperfeatures’, that is designed to remedy this. The starting point is the familiar notion that to detect object parts, in practice it often suffices to detect co-occurrences of more local object fragments – a process that can be formalized as comparison (e.g. vector quantization) of image patches against a codebook of known fragments, followed by local aggregation of the resulting codebook membership vectors to detect co-occurrences. This process converts local collections of image descriptor vectors into somewhat less local histogram vectors – higher-level but spatially coarser descriptors. We observe that as the output is again a local descriptor vector, the process can be iterated, and that doing so captures and codes ever larger assemblies of object parts and increasingly abstract or ‘semantic’ image properties. We formulate the hyperfeatures model and study its performance under several different image coding methods including clustering-based Vector Quantization, Gaussian Mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features provide improved performance in several object image and texture image classification tasks.
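The coding loop described in the abstract (quantize descriptors against a codebook, aggregate the membership vectors over local neighbourhoods, then treat the resulting histograms as new descriptors and repeat) can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the regular descriptor grid, the plain k-means codebook, the 3x3 window with stride 2, the codebook sizes, and all function names (`kmeans`, `assign`, `code_level`, `hyperfeatures`) are hypothetical choices made for the example.

```python
import numpy as np

def assign(X, centers):
    """Hard vector quantization: index of the nearest codebook centre."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)

def kmeans(X, k, iters=10, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the codebook learner)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):  # an empty cluster keeps its previous centre
                centers[j] = members.mean(axis=0)
    return centers

def code_level(grid, k, window=3, stride=2):
    """One level: quantize every descriptor, then histogram the codebook
    memberships over local windows. Returns the coarser grid of local
    histograms and the global histogram for this level."""
    H, W, D = grid.shape
    flat = grid.reshape(-1, D)
    centers = kmeans(flat, k)
    labels = assign(flat, centers).reshape(H, W)
    onehot = np.eye(k)[labels]  # (H, W, k) membership vectors
    rows = range(0, H - window + 1, stride)
    cols = range(0, W - window + 1, stride)
    local = np.array([[onehot[r:r + window, c:c + window].sum(axis=(0, 1))
                       for c in cols] for r in rows])
    local /= window * window  # each local histogram sums to 1
    return local, onehot.mean(axis=(0, 1))

def hyperfeatures(grid, ks=(8, 6, 4)):
    """Iterate the coding step; concatenate the per-level global histograms."""
    global_hists = []
    for k in ks:
        grid, g = code_level(grid, k)  # the output grid feeds the next level
        global_hists.append(g)
    return np.concatenate(global_hists)

if __name__ == "__main__":
    # Random stand-in for a 15x15 grid of 16-dimensional patch descriptors.
    descriptors = np.random.default_rng(1).normal(size=(15, 15, 16))
    print(hyperfeatures(descriptors).shape)
```

Each level shrinks the grid (15x15, then 7x7, then 3x3 in this toy setting) while its descriptors summarise a progressively larger image region. Swapping the hard assignment in `assign` for soft posteriors from a Gaussian mixture would be one way to approximate the other coding methods the abstract mentions.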
Keywords
- Vector Quantization
- Latent Dirichlet Allocation
- Image Patch
- Convolutional Neural Network
- Visual Recognition
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.
Copyright information
© 2006 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Agarwal, A., Triggs, B. (2006). Hyperfeatures – Multilevel Local Coding for Visual Recognition. In: Leonardis, A., Bischof, H., Pinz, A. (eds) Computer Vision – ECCV 2006. ECCV 2006. Lecture Notes in Computer Science, vol 3951. Springer, Berlin, Heidelberg. https://doi.org/10.1007/11744023_3
DOI: https://doi.org/10.1007/11744023_3
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-33832-1
Online ISBN: 978-3-540-33833-8
eBook Packages: Computer Science, Computer Science (R0)