Unsupervised Visual Object Categorisation with BoF and Spatial Matching

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7944)


The ultimate challenge of image categorisation is unsupervised object discovery, where both the categories and the assignment of the given images to them are determined automatically. The unsupervised setting rules out the best discriminative methods; in the comparison by Tuytelaars et al. [30], the standard Bag-of-Features (BoF) approach performed best. The downside of BoF is that it discards the spatial information of local features. In this work, we propose a novel unsupervised image categorisation method that uses BoF to find initial matches for each image (a pre-filter) and then refines and ranks them by spatial matching of local features. Unsupervised visual object discovery is performed by the normalised cuts algorithm, which produces the clusterings from a similarity matrix of the spatial match scores. In our experiments, the proposed approach outperforms the best method in Tuytelaars et al. [30] on the Caltech-101, randomised Caltech-101, and Caltech-256 data sets. The improvements are clear and statistically significant, especially for a large number of classes.
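The two generic building blocks named in the abstract, a BoF pre-filter and a normalised cut over a similarity matrix, can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `bof_shortlist` and `ncut_bipartition` are hypothetical, cosine similarity is an assumed choice for comparing BoF histograms, the spatial match scores are assumed to be already available as a symmetric matrix `W`, and only a two-way cut is shown, computed with a dependency-free power iteration rather than a proper eigensolver.

```python
import math


def bof_shortlist(histograms, query_idx, k):
    """Pre-filter: return the k images whose BoF histograms are closest to the query.

    Cosine similarity is an assumption made for this sketch; the abstract
    does not state which histogram comparison is used.
    """
    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb) if na and nb else 0.0

    query = histograms[query_idx]
    scored = [(cosine(query, h), i)
              for i, h in enumerate(histograms) if i != query_idx]
    scored.sort(key=lambda t: (-t[0], t[1]))  # best score first, ties by index
    return [i for _, i in scored[:k]]


def ncut_bipartition(W, iters=200):
    """Two-way normalised cut (spectral relaxation) of a similarity matrix W.

    Assumes W is symmetric, non-negative, and every row sum is positive.
    Returns a 0/1 label per node; node 0 always receives label 0.
    """
    n = len(W)
    s = [math.sqrt(sum(row)) for row in W]  # square roots of node degrees
    # M = D^{-1/2} W D^{-1/2} + I; the shift by I makes all eigenvalues >= 0.
    M = [[W[i][j] / (s[i] * s[j]) + (1.0 if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    norm_s = math.sqrt(sum(x * x for x in s))
    top = [x / norm_s for x in s]  # known leading eigenvector of M
    v = [float(i + 1) for i in range(n)]  # arbitrary deterministic start
    for _ in range(iters):
        # Deflate the leading eigenvector, then apply one power-iteration step.
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(a * a for a in v))
        if norm == 0.0:
            break  # degenerate start vector; no split can be recovered
        v = [a / norm for a in v]
    y = [v[i] / s[i] for i in range(n)]  # map back to the indicator space
    return [0 if (yi > 0) == (y[0] > 0) else 1 for yi in y]
```

In such a sketch, `bof_shortlist` would decide which image pairs are worth scoring with the more expensive spatial matching, and the resulting scores would populate `W` before clustering. A many-class clustering would need a k-way extension, for example recursive bipartitioning, which is omitted here.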


Keywords (machine-generated, not supplied by the authors): Query Image; Latent Dirichlet Allocation; Object Categorisation; Conditional Entropy; Unsupervised Categorisation


  1. Biederman, I.: Recognition-by-components: A theory of human image understanding. Psychological Review 94(2), 115–147 (1987)
  2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet Allocation. Journal of Machine Learning Research 3(4-5), 993–1022 (2003)
  3. Chum, O., Matas, J.: Unsupervised discovery of co-occurrence in sparse high dimensional data. In: CVPR (2010)
  4. Csurka, G., Dance, C., Willamowski, J., Fan, L., Bray, C.: Visual categorization with bags of keypoints. In: ECCV Workshop on Statistical Learning in Computer Vision (2004)
  5. Deng, J., Berg, A.C., Li, K., Fei-Fei, L.: What does classifying more than 10,000 image categories tell us? In: Daniilidis, K., Maragos, P., Paragios, N. (eds.) ECCV 2010, Part V. LNCS, vol. 6315, pp. 71–84. Springer, Heidelberg (2010)
  6. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A.: The PASCAL Visual Object Classes (VOC) challenge. IJCV 88(2) (2010)
  7. Fei-Fei, L., Fergus, R., Perona, P.: Learning generative visual models from few training examples: an incremental Bayesian approach tested on 101 object categories. In: CVPR Workshop on Generative-Model Based Vision (2004)
  8. Griffin, G., Holub, A., Perona, P.: Caltech-256 object category dataset. Tech. Rep. 7694, California Institute of Technology (2007)
  9. Hartley, R., Zisserman, A.: Multiple View Geometry in Computer Vision. Cambridge University Press (2003)
  10. Jégou, H., Douze, M., Schmid, C.: Improving bag-of-features for large scale image search. IJCV 87(3), 316–336 (2010)
  11. Kim, G., Faloutsos, C., Hebert, M.: Unsupervised modeling of object categories using link analysis techniques. In: CVPR (2008)
  12. Kinnunen, T., Kamarainen, J.K., Lensu, L., Kälviäinen, H.: Unsupervised object discovery via self-organisation. Pattern Recognition Letters 33(16), 2102–2112 (2012)
  13. Kinnunen, T., Kamarainen, J.K., Lensu, L., Lankinen, J., Kälviäinen, H.: Making visual object categorization more challenging: Randomized Caltech 101 data set. In: ICPR (2010)
  14. Kohonen, T.: The self-organizing map. Proc. of the IEEE 78(9), 1464–1480 (1990)
  15. Krapac, J., Verbeek, J., Jurie, F.: Modeling spatial layout with Fisher vectors for image categorization. In: Proc. of International Conference on Computer Vision, pp. 1487–1494 (2011)
  16. Lankinen, J., Kamarainen, J.K.: Local feature based unsupervised alignment of object class images. In: Proc. of British Machine Vision Conference (2011)
  17. Lou, Z., Ye, Y., Liu, D.: Unsupervised object category discovery via information bottleneck method. In: Proc. of the Int. Conf. on Multimedia (2010)
  18. Lowe, D.: Distinctive image features from scale-invariant keypoints. IJCV 60(2), 91–110 (2004)
  19. Mikolajczyk, K., Tuytelaars, T., Schmid, C., Zisserman, A., Matas, J., Schaffalitzky, F., Kadir, T., Gool, L.V.: A comparison of affine region detectors. IJCV 65(1/2) (2005)
  20. Mikolajczyk, K., Schmid, C.: Indexing based on scale invariant interest points. In: ICCV (2001)
  21. Mikolajczyk, K., Schmid, C.: An affine invariant interest point detector. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002, Part I. LNCS, vol. 2350, pp. 128–142. Springer, Heidelberg (2002)
  22. Nowak, E., Jurie, F., Triggs, B.: Sampling strategies for bag-of-features image classification. In: Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3954, pp. 490–503. Springer, Heidelberg (2006)
  23. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabularies and fast spatial matching. In: CVPR (2007)
  24. Ponce, J., Berg, T.L., Everingham, M., Forsyth, D., Hebert, M., Lazebnik, S., Marszalek, M., Schmid, C., Russell, B.C., Torralba, A., Williams, C.K.I., Zhang, J., Zisserman, A.: Dataset issues in object recognition. In: Ponce, J., Hebert, M., Schmid, C., Zisserman, A. (eds.) Toward Category-Level Object Recognition. LNCS, vol. 4170, pp. 29–48. Springer, Heidelberg (2006)
  25. Schroff, F., Criminisi, A., Zisserman, A.: Harvesting image databases from the web. T-PAMI 33(4) (2011)
  26. Shi, J., Malik, J.: Normalized cuts and image segmentation. T-PAMI 22(8) (2000)
  27. Sivic, J., Russell, B.C., Efros, A.A., Zisserman, A., Freeman, W.T.: Discovering object categories in image collections. In: ICCV (2005)
  28. Sivic, J., Russell, B.C., Zisserman, A., Freeman, W.T., Efros, A.A.: Unsupervised discovery of visual object class hierarchies. In: CVPR, pp. 1–8 (2008)
  29. Sivic, J., Zisserman, A.: Video Google: a text retrieval approach to object matching in videos. In: ICCV (2003)
  30. Tuytelaars, T., Lampert, C., Blaschko, M., Buntine, W.: Unsupervised object discovery: A comparison. IJCV 88(2) (2010)
  31. Weber, M., Welling, M., Perona, P.: Unsupervised learning of models for recognition. In: Vernon, D. (ed.) ECCV 2000. LNCS, vol. 1842, pp. 18–32. Springer, Heidelberg (2000)

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Department of Media Technology, Aalto University, Finland
  2. Machine Vision and Pattern Recognition Laboratory, Lappeenranta University of Technology, Finland
  3. Department of Signal Processing, Tampere University of Technology, Finland
