Multi-class Object Layout with Unsupervised Image Classification and Object Localization

  • Ser-Nam Lim
  • Gianfranco Doretto
  • Jens Rittscher
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6938)


Recognizing the presence of object classes in an image, or image classification, has become an increasingly important topic. Equally important, however, is the capability to locate these object classes in the image. In this paper we consider an approach to these two related problems whose primary goal is to minimize the training requirements, so that new object classes can be added easily, in contrast to approaches that favor training a suite of object-specific classifiers. To this end, we analyze an exemplar-based approach that uses unsupervised clustering for classification and sliding-window matching for localization. While such an exemplar-based approach is by itself brittle to intraclass and viewpoint variations, we achieve robustness by introducing a novel Conditional Random Field model that facilitates a straightforward accept/reject decision on the localized object classes. The performance of our approach on the PASCAL Visual Object Classes Challenge 2007 dataset demonstrates its efficacy.
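As a rough illustration of the sliding-window matching idea mentioned in the abstract, the following minimal sketch scores every fixed-size window of a toy grid against a single exemplar. It is not the authors' method: the grid of visual-word ids, the histogram-intersection score, and the names `histogram`, `intersection`, and `best_window` are all assumptions made for this example, and the clustering and Conditional Random Field stages are omitted entirely.

```python
from collections import Counter


def histogram(grid, top, left, height, width):
    """Count the visual-word ids inside one window of the grid."""
    return Counter(
        grid[r][c]
        for r in range(top, top + height)
        for c in range(left, left + width)
    )


def intersection(h1, h2):
    """Histogram intersection: the sum of per-bin minima."""
    return sum(min(count, h2[word]) for word, count in h1.items())


def best_window(grid, exemplar_hist, height, width):
    """Slide a fixed-size window over the grid and return (score, top, left)
    for the window whose histogram best matches the exemplar."""
    rows, cols = len(grid), len(grid[0])
    best = None
    for top in range(rows - height + 1):
        for left in range(cols - width + 1):
            window_hist = histogram(grid, top, left, height, width)
            score = intersection(window_hist, exemplar_hist)
            if best is None or score > best[0]:
                best = (score, top, left)
    return best


# Toy 4x5 grid of hypothetical visual-word ids; 0 stands for background.
grid = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 2, 0],
    [0, 0, 3, 1, 0],
    [0, 0, 0, 0, 0],
]
# Hypothetical exemplar histogram for a 2x2 object.
exemplar = Counter({1: 2, 2: 1, 3: 1})
result = best_window(grid, exemplar, height=2, width=2)
```

In this toy setting the highest-scoring window is the one that covers the non-background cells. A real system would also search over scales and several exemplars per class, and would need a separate step to accept or reject each candidate window.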


Point Cloud, Test Image, Object Class, Conditional Random Field, Unsupervised Cluster
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2011

Authors and Affiliations

  • Ser-Nam Lim (1)
  • Gianfranco Doretto (2)
  • Jens Rittscher (1)
  1. Computer Vision Lab, GE Global Research, Niskayuna, USA
  2. Dept. of CS & EE, West Virginia University, Morgantown, USA
