International Journal of Computer Vision, Volume 78, Issue 1, pp 15–27

Multilevel Image Coding with Hyperfeatures

Article

Abstract

Histograms of local appearance descriptors are a popular representation for visual recognition. They are highly discriminant with good resistance to local occlusions and to geometric and photometric variations, but they are not able to exploit spatial co-occurrence statistics over scales larger than the local input patches. We present a multilevel visual representation that remedies this. The starting point is the notion that to detect object parts in images, in practice it often suffices to detect co-occurrences of more local object fragments. This can be formalized by coding image patches against a codebook of known fragments or a more general statistical model and locally histogramming the resulting labels to capture their co-occurrence statistics. Local patch descriptors are converted into somewhat less local histograms over label occurrences. The histograms are themselves local descriptor vectors so the process can be iterated to code ever larger assemblies of object parts and increasingly abstract or ‘semantic’ image properties. We call these higher-level descriptors “hyperfeatures”. We formulate the hyperfeature model and study its performance under several different image coding methods including k-means based Vector Quantization, Gaussian Mixtures, and combinations of these with Latent Dirichlet Allocation. We find that the resulting high-level features provide improved performance in several object image and texture image classification tasks.
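The coding-and-histogramming loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes a dense grid of descriptors, plain k-means Vector Quantization at every level, square neighbourhoods at stride 1, and a codebook learned from the same image it codes (a real system would presumably learn codebooks from a training set). All function names and parameters (`kmeans`, `assign`, `local_histograms`, `hyperfeatures`, `k`, `window`, `levels`) are hypothetical.

```python
import numpy as np


def assign(X, centers):
    """Return the index of the nearest centre for each row of X."""
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's k-means; empty clusters keep their previous centre."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(iters):
        labels = assign(X, centers)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers


def local_histograms(label_grid, k, window):
    """Normalised label histogram over each window x window neighbourhood."""
    H, W = label_grid.shape
    out_h, out_w = H - window + 1, W - window + 1
    hists = np.zeros((out_h, out_w, k))
    for i in range(out_h):
        for j in range(out_w):
            block = label_grid[i:i + window, j:j + window].ravel()
            hists[i, j] = np.bincount(block, minlength=k) / block.size
    return hists


def hyperfeatures(desc_grid, k=8, window=3, levels=2):
    """Iterate 'code, then locally histogram' over a (H, W, D) descriptor grid.

    Returns one global label histogram per level, plus the grid of local
    histograms produced by the last level.
    """
    global_hists = []
    grid = desc_grid
    for _ in range(levels):
        H, W, D = grid.shape
        flat = grid.reshape(-1, D)
        # Assumption: the codebook is fit on this grid only, for brevity.
        centers = kmeans(flat, k)
        labels = assign(flat, centers).reshape(H, W)
        global_hists.append(np.bincount(labels.ravel(), minlength=k) / labels.size)
        # The local histograms become the descriptor vectors of the next level.
        grid = local_histograms(labels, k, window)
    return global_hists, grid
```

Concatenating the per-level global histograms would give one image-level feature vector for a classifier. Swapping `kmeans`/`assign` for a Gaussian mixture or a topic model would correspond to the other coding methods the abstract mentions, though the details of how the paper does this are not shown here.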

Keywords

Image coding, Hierarchical representations, Visual recognition, Image classification

Copyright information

© Springer Science+Business Media, LLC 2007

Authors and Affiliations

  1. Microsoft Research Ltd., Cambridge, UK
  2. LJK–INRIA site, LJK–CNRS, Montbonnot, France
