The Visual Computer, Volume 32, Issue 6–8, pp 1045–1055

Toward semantic image similarity from crowdsourced clustering

  • Yanir Kleiman
  • George Goldberg
  • Yael Amsterdamer
  • Daniel Cohen-Or
Original Article


Determining the similarity between images is a fundamental step in many applications, such as image categorization, image labeling and image retrieval. Automatic methods for similarity estimation often fall short when semantic context is required for the task, raising the need for human judgment. Such judgments can be collected via crowdsourcing techniques, based on tasks posed to web users. However, to estimate image similarities in reasonable time and at reasonable cost, the tasks posed to the crowd must be generated carefully. We observe that distances within local neighborhoods provide valuable information that allows a quick and accurate construction of the global similarity metric. This key observation leads to a solution based on clustering tasks that compare relatively similar images. In each query, crowd members cluster a small set of images into bins. The results yield many relative similarities between images, which are used to construct a global image similarity metric. This metric is progressively refined, and serves to generate finer, more local queries in subsequent iterations. We demonstrate the effectiveness of our method on datasets where ground truth is available, and on a collection of images where semantic similarities cannot be quantified. In particular, we show that our method outperforms alternative baseline approaches, and we demonstrate the usefulness of both clustering queries and our progressive refinement process.
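To make the idea of a clustering query concrete, the following is a minimal sketch of how binned answers could be turned into pairwise similarity scores. It is not the method of the paper: the abstract does not say how relative similarities are aggregated, so the sketch assumes a simple co-clustering frequency (the fraction of answers in which two images that were shown together ended up in the same bin). The function name `co_clustering_similarity`, the input layout, and the image ids are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def co_clustering_similarity(queries):
    """Estimate pairwise similarity from crowd clustering answers.

    Assumed input layout (hypothetical): `queries` is a list of answers,
    each answer is a list of bins, and each bin is a list of unique image
    ids that one crowd member grouped together.

    Returns a dict mapping frozenset({a, b}) to the fraction of answers
    showing both images in which the two images shared a bin.
    """
    together = defaultdict(int)  # times a pair was placed in the same bin
    seen = defaultdict(int)      # times a pair appeared in the same query
    for bins in queries:
        images = [img for bin_ in bins for img in bin_]
        for a, b in combinations(images, 2):
            seen[frozenset((a, b))] += 1
        for bin_ in bins:
            for a, b in combinations(bin_, 2):
                together[frozenset((a, b))] += 1
    return {pair: together[pair] / count for pair, count in seen.items()}


# Three hypothetical answers over the same three images.
answers = [
    [["cat1", "cat2"], ["dog1"]],
    [["cat1", "cat2", "dog1"]],
    [["cat1"], ["cat2", "dog1"]],
]
sim = co_clustering_similarity(answers)
for pair, score in sim.items():
    print(sorted(pair), round(score, 3))
```

A score of this kind only covers pairs that were shown together in at least one query. Extending it to a global metric, and choosing which images to show in later, more local queries, is the part the abstract attributes to the progressive refinement process, and it is not modeled here.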


Crowdsourcing, Image similarity, Image distance metric



This research was supported by a Google Focused Research Award, the Israeli Science Foundation (ISF, Grant No. 1636/13), by ICRC-The Blavatnik Interdisciplinary Cyber Research Center, and by the European Research Council under the FP7, ERC Grant MoDaS, Agreement 291071.



Copyright information

© Springer-Verlag Berlin Heidelberg 2016

Authors and Affiliations

  • Yanir Kleiman (1)
  • George Goldberg (1)
  • Yael Amsterdamer (2)
  • Daniel Cohen-Or (1)
  1. Tel Aviv University, Tel Aviv, Israel
  2. Bar Ilan University, Ramat Gan, Israel
