Improving People Search Using Query Expansions

How Friends Help to Find People
  • Thomas Mensink
  • Jakob Verbeek
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5303)

Abstract

In this paper we are interested in finding images of people on the web, and more specifically within large databases of captioned news images. It has recently been shown that visual analysis of the faces in images returned by a text-based query over captions can significantly improve search results. The underlying idea is that, although the initial text-based result is imperfect, the queried person will appear in it relatively frequently compared to other people, so we can search for a large group of highly similar faces. The performance of such methods depends strongly on this assumption: for people whose face appears in less than about 40% of the initial text-based result, the performance may be very poor. The contribution of this paper is to improve search results by exploiting faces of other people that co-occur frequently with the queried person. We refer to this process as ‘query expansion’. In the face analysis we use the query expansion to provide a query-specific, relevant set of ‘negative’ examples, which should be separated from the potentially positive examples in the text-based result set. We apply this idea to a recently proposed method that filters the initial result set using a Gaussian mixture model, and apply the same idea using a logistic discriminant model. We experimentally evaluate the methods using a set of 23 queries on a database of 15,000 captioned news stories from Yahoo! News. The results show that (i) query expansion improves both methods, (ii) our discriminative models outperform the generative ones, and (iii) our best results surpass the state-of-the-art results by 10% precision on average.
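
The query-expansion idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus `NEWS`, the 2-D face descriptors, the helper names (`expansion_names`, `build_training_set`, `train_logistic`, `score`), and the plain gradient-descent training are all assumptions made for illustration. Faces from captions that name the queried person are treated as noisy positives, and faces from captions that name only a frequently co-occurring person are treated as negatives for a logistic discriminant.

```python
from collections import Counter
import math

# Hypothetical toy corpus: (names detected in the caption, face descriptors in the image).
# Faces of person "A" lie near (1, 1); faces of person "B" lie near (-1, -1).
NEWS = [
    ({"A", "B"}, [(1.0, 1.1), (-1.0, -0.9)]),
    ({"A", "B"}, [(0.9, 1.0), (-1.1, -1.0)]),
    ({"A"}, [(1.1, 0.9)]),
    ({"B"}, [(-0.9, -1.1)]),
    ({"B"}, [(-1.0, -1.0), (-1.2, -0.8)]),
    ({"C"}, [(1.0, -1.0)]),
]

def expansion_names(news, query, k):
    """Return the k names that co-occur most often with `query` in captions."""
    counts = Counter()
    for names, _ in news:
        if query in names:
            counts.update(names - {query})
    return [name for name, _ in counts.most_common(k)]

def build_training_set(news, query, k=1):
    """Split faces into noisy positives (text-based result) and expansion negatives."""
    friends = set(expansion_names(news, query, k))
    candidates, negatives = [], []
    for names, faces in news:
        if query in names:
            candidates.extend(faces)   # text-based result set for the query
        elif names & friends:
            negatives.extend(faces)    # co-occurring people, query not named
    return candidates, negatives

def score(w, b, x):
    """Linear score; larger means more likely to be the queried person."""
    return w[0] * x[0] + w[1] * x[1] + b

def train_logistic(candidates, negatives, steps=500, lr=0.5):
    """Fit a 2-D logistic discriminant by batch gradient descent."""
    data = [(x, 1.0) for x in candidates] + [(x, 0.0) for x in negatives]
    w, b = [0.0, 0.0], 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0, 0.0], 0.0
        for x, y in data:
            err = 1.0 / (1.0 + math.exp(-score(w, b, x))) - y
            grad_w[0] += err * x[0]
            grad_w[1] += err * x[1]
            grad_b += err
        w = [w[0] - lr * grad_w[0] / len(data), w[1] - lr * grad_w[1] / len(data)]
        b -= lr * grad_b / len(data)
    return w, b

candidates, negatives = build_training_set(NEWS, "A", k=1)
w, b = train_logistic(candidates, negatives)
# Re-rank the text-based result set by the learned score.
ranked = sorted(candidates, key=lambda x: score(w, b, x), reverse=True)
```

In this toy setting the candidate set for "A" also contains faces of "B", because the two appear in the same captions. The expansion negatives let the discriminant push those faces down the ranking even though they carry a positive label in the noisy training set.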

Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Thomas Mensink (1)
  • Jakob Verbeek (1)
  1. LEAR, INRIA Rhône-Alpes, Grenoble, France
