International Journal of Computer Vision, Volume 73, Issue 2, pp 213–238

Local Features and Kernels for Classification of Texture and Object Categories: A Comprehensive Study

  • J. Zhang
  • M. Marszałek
  • S. Lazebnik
  • C. Schmid

Abstract

Recently, methods based on local image features have shown promise for texture and object recognition tasks. This paper presents a large-scale evaluation of an approach that represents images as distributions (signatures or histograms) of features extracted from a sparse set of keypoint locations and learns a Support Vector Machine classifier with kernels based on two effective measures for comparing distributions, the Earth Mover’s Distance and the χ² distance. We first evaluate the performance of our approach with different keypoint detectors and descriptors, as well as different kernels and classifiers. We then conduct a comparative evaluation with several state-of-the-art recognition methods on four texture and five object databases. On most of these databases, our implementation exceeds the best reported results and achieves comparable performance on the rest. Finally, we investigate the influence of background correlations on recognition performance via extensive tests on the PASCAL database, for which ground-truth object localization information is available. Our experiments demonstrate that image representations based on distributions of local features are surprisingly effective for classification of texture and object images under challenging real-world conditions, including significant intra-class variations and substantial background clutter.
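As an illustration of the kernel construction mentioned in the abstract, the following is a minimal sketch of a χ² kernel over image histograms. The abstract does not spell out the formulas, so this assumes the commonly used χ² distance, D(h1, h2) = ½ Σ (h1ᵢ − h2ᵢ)² / (h1ᵢ + h2ᵢ), turned into a kernel as exp(−D / A). The function names, the `eps` guard, and the default choice of the scale A (the mean pairwise distance) are assumptions made for this example, not details confirmed by the text above.

```python
import numpy as np


def chi2_distance(h1, h2, eps=1e-12):
    """Chi-square distance between two non-negative histograms.

    Assumed form: 0.5 * sum((h1 - h2)^2 / (h1 + h2)).
    """
    h1 = np.asarray(h1, dtype=float)
    h2 = np.asarray(h2, dtype=float)
    num = (h1 - h2) ** 2
    den = h1 + h2
    # Bins that are empty in both histograms contribute nothing.
    mask = den > eps
    return 0.5 * np.sum(num[mask] / den[mask])


def chi2_kernel_matrix(histograms, scale=None):
    """Kernel matrix K[i, j] = exp(-D(h_i, h_j) / scale).

    If scale is None, it defaults to the mean off-diagonal distance
    (an assumption made for this sketch).
    """
    H = np.asarray(histograms, dtype=float)
    n = H.shape[0]
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = chi2_distance(H[i], H[j])
    if scale is None:
        off = D[~np.eye(n, dtype=bool)]
        scale = off.mean() if off.size and off.mean() > 0 else 1.0
    return np.exp(-D / scale)


if __name__ == "__main__":
    # Three toy 2-bin histograms standing in for per-image feature histograms.
    K = chi2_kernel_matrix([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
    print(K)
```

A matrix built this way could be passed to any SVM implementation that accepts precomputed kernels. An analogous kernel based on the Earth Mover’s Distance would replace `chi2_distance` with an EMD solver operating on signatures.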

Keywords

image classification, texture recognition, object recognition, scale- and affine-invariant keypoints, support vector machines, kernel methods

Copyright information

© Springer Science + Business Media, LLC 2006

Authors and Affiliations

  • J. Zhang (1)
  • M. Marszałek (1)
  • S. Lazebnik (2)
  • C. Schmid (1)

  1. INRIA, GRAVIR-CNRS, Montbonnot, France
  2. Beckman Institute, University of Illinois, Urbana, USA
