Image Retrieval with Structured Object Queries Using Latent Ranking SVM

  • Tian Lan
  • Weilong Yang
  • Yang Wang
  • Greg Mori
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7577)

Abstract

We consider image retrieval with structured object queries: queries that specify the objects that should be present in the scene and their spatial relations. An example of such a query is “car on the road”. Existing image retrieval systems typically consider queries consisting of object classes (i.e., keywords). They train a separate classifier for each object class and combine the outputs heuristically. In contrast, we develop a learning framework that jointly considers object classes and their relations. Our method considers not only the objects in the query (“car” and “road” in the above example), but also related object categories that can be useful for retrieval. Since we do not have ground-truth labeling of object bounding boxes on the test images, we represent them as latent variables in our model. Our learning method is an extension of the ranking SVM with latent variables, which we call latent ranking SVM. We demonstrate image retrieval and ranking results on a dataset with more than a hundred object classes.
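
The abstract does not state the model's objective, so the following is only a minimal sketch of the general idea behind a ranking SVM with latent variables, not the authors' formulation. It assumes that each image comes with a small set of candidate feature vectors, one per hypothetical placement of object bounding boxes, that an image's score is the best score over those candidates, and that training penalizes relevant images that fail to outscore irrelevant ones by a margin of 1. The names `latent_score`, `pairwise_ranking_loss`, `rank_images`, and the parameter `reg` are illustrative assumptions.

```python
from typing import List, Sequence

Vector = Sequence[float]


def dot(w: Vector, phi: Vector) -> float:
    """Plain inner product of two equal-length vectors."""
    return sum(wi * pi for wi, pi in zip(w, phi))


def latent_score(w: Vector, candidates: List[Vector]) -> float:
    """Score one image as the maximum over its latent configurations.

    Each entry of `candidates` is an assumed feature vector for one
    hypothetical placement of object boxes in the image.
    """
    return max(dot(w, phi) for phi in candidates)


def pairwise_ranking_loss(
    w: Vector,
    relevant: List[List[Vector]],
    irrelevant: List[List[Vector]],
    reg: float = 0.0,
) -> float:
    """L2 regularizer plus hinge loss over (relevant, irrelevant) pairs."""
    loss = 0.5 * reg * dot(w, w)
    for pos in relevant:
        for neg in irrelevant:
            margin = latent_score(w, pos) - latent_score(w, neg)
            # Penalize pairs where the relevant image does not win by 1.
            loss += max(0.0, 1.0 - margin)
    return loss


def rank_images(w: Vector, images: List[List[Vector]]) -> List[int]:
    """Return image indices sorted from highest to lowest latent score."""
    scores = [latent_score(w, cands) for cands in images]
    return sorted(range(len(images)), key=lambda i: scores[i], reverse=True)
```

Taking the maximum over candidates is what makes the box locations latent: no ground-truth boxes are required at test time, because the scoring function picks the best-scoring configuration itself.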

Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • Tian Lan (1)
  • Weilong Yang (1)
  • Yang Wang (2)
  • Greg Mori (1)
  1. Simon Fraser University, Canada
  2. University of Manitoba, Canada
