Context as Supervisory Signal: Discovering Objects with Predictable Context

  • Carl Doersch
  • Abhinav Gupta
  • Alexei A. Efros
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8691)


This paper addresses the well-established problem of unsupervised object discovery with a novel method inspired by weakly-supervised approaches. In particular, the ability of an object patch to predict the rest of the object (its context) is used as a supervisory signal to help discover visually consistent object clusters. The main contributions of this work are: 1) framing unsupervised clustering as a leave-one-out context prediction task; 2) evaluating the quality of context prediction by statistical hypothesis testing between thing and stuff appearance models; and 3) an iterative region prediction and context alignment approach that gradually discovers a visual object cluster together with a segmentation mask and fine-grained correspondences. The proposed method outperforms previous unsupervised as well as weakly-supervised object discovery approaches, and is shown to provide correspondences detailed enough to transfer keypoint annotations.
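The hypothesis test in contribution 2 can be illustrated with a toy log-likelihood ratio. The sketch below is not the authors' implementation: it assumes isotropic Gaussian appearance models, and the names `gaussian_log_likelihood`, `context_score`, `thing_prediction`, and `stuff_prediction` are hypothetical. It only shows the general idea of scoring held-out context by comparing how well a "thing" model and a "stuff" model explain it.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, var):
    """Log-density of x under an isotropic Gaussian (an assumed, simplified model)."""
    x = np.asarray(x, dtype=float)
    mean = np.asarray(mean, dtype=float)
    d = x.size
    return -0.5 * (d * np.log(2 * np.pi * var) + np.sum((x - mean) ** 2) / var)

def context_score(observed, thing_prediction, stuff_prediction, var=1.0):
    """Log-likelihood ratio between the two models for the held-out context.

    Positive values mean the thing model explains the context better than
    the stuff model; both predictions are hypothetical inputs here.
    """
    return (gaussian_log_likelihood(observed, thing_prediction, var)
            - gaussian_log_likelihood(observed, stuff_prediction, var))

# Toy features for a held-out context region (made-up numbers).
observed = np.array([1.0, 2.0, 3.0])
thing_prediction = np.array([1.1, 1.9, 3.2])   # prediction from matched patches
stuff_prediction = np.array([0.0, 0.0, 0.0])   # generic background prediction

score = context_score(observed, thing_prediction, stuff_prediction)
print(score > 0)  # the thing model fits this toy context better
```

With equal variances the normalization terms cancel, so the score reduces to half the difference in squared prediction errors. A real system would use richer appearance models than the Gaussians assumed here.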


Keywords: context prediction, unsupervised object discovery, mining



Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Carl Doersch (1)
  • Abhinav Gupta (1)
  • Alexei A. Efros (2)
  1. Carnegie Mellon University, USA
  2. UC Berkeley, USA
