Improving People Search Using Query Expansions
Abstract
In this paper we are interested in finding images of people on the web, and more specifically within large databases of captioned news images. It has recently been shown that visual analysis of the faces in images returned by a text-based query over captions can significantly improve search results. The underlying idea is that, although the initial text-based result is imperfect, the queried person will appear in it relatively frequently compared to other people, so we can search it for a large group of highly similar faces. The performance of such methods depends strongly on this assumption: for people whose face appears in less than about 40% of the initial text-based result, the performance may be very poor. The contribution of this paper is to improve search results by exploiting faces of other people that co-occur frequently with the queried person. We refer to this process as ‘query expansion’. In the face analysis we use the query expansion to provide a query-specific, relevant set of ‘negative’ examples, which should be separated from the potentially positive examples in the text-based result set. We apply this idea to a recently proposed method which filters the initial result set using a Gaussian mixture model, and also to a logistic discriminant model. We experimentally evaluate the methods using a set of 23 queries on a database of 15,000 captioned news stories from Yahoo! News. The results show that (i) query expansion improves both methods, (ii) our discriminative models outperform the generative ones, and (iii) our best results surpass the state-of-the-art results by 10% precision on average.
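The query-expansion idea can be illustrated with a minimal sketch. It is not the authors' implementation: the toy 2-D face descriptors, the function names (`expansion_names`, `split_faces`, `train_logistic`, `rank_candidates`), the choice of `k`, and the plain gradient-descent training are all assumptions made for illustration. Faces from captions naming the queried person are treated as noisy positives, faces from captions naming only frequently co-occurring people are treated as negatives, and a logistic discriminant is used to re-rank the candidates.

```python
import math
from collections import Counter

# Hypothetical toy data: (names found in the caption, face descriptors in the image).
# Faces of "A" lie near (1, 0); faces of "B" lie near (0, 1).
docs = [
    ({"A", "B"}, [(1.0, 0.1), (0.1, 1.0)]),
    ({"A"}, [(0.9, 0.0)]),
    ({"A", "B"}, [(1.1, 0.2), (0.0, 0.9)]),
    ({"B"}, [(0.1, 1.1)]),
    ({"B"}, [(0.0, 1.0)]),
    ({"B"}, [(0.2, 0.9)]),
]

def expansion_names(docs, query, k=1):
    """Return the k names that co-occur most often with `query` in captions."""
    counts = Counter()
    for names, _faces in docs:
        if query in names:
            counts.update(n for n in names if n != query)
    return [name for name, _ in counts.most_common(k)]

def split_faces(docs, query, k=1):
    """Candidates: faces from captions naming the query (noisy positives).
    Negatives: faces from captions naming an expansion name but not the query
    (an assumption of this sketch)."""
    friends = set(expansion_names(docs, query, k))
    candidates, negatives = [], []
    for names, faces in docs:
        if query in names:
            candidates.extend(faces)
        elif friends & names:
            negatives.extend(faces)
    return candidates, negatives

def train_logistic(pos, neg, steps=500, lr=0.5):
    """Fit a logistic discriminant with batch gradient descent."""
    dim = len(pos[0])
    w, b = [0.0] * dim, 0.0
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    for _ in range(steps):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in data:
            z = b + sum(wi * xi for wi, xi in zip(w, x))
            err = 1.0 / (1.0 + math.exp(-z)) - y  # predicted probability minus label
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / len(data)
    return w, b

def rank_candidates(docs, query, k=1):
    """Re-rank the text-based candidates by their logistic score, best first."""
    candidates, negatives = split_faces(docs, query, k)
    w, b = train_logistic(candidates, negatives)
    score = lambda x: b + sum(wi * xi for wi, xi in zip(w, x))
    return sorted(candidates, key=score, reverse=True)

ranked = rank_candidates(docs, "A", k=1)
```

In this toy setting the candidate set for "A" also contains two faces of "B"; because the expansion supplies further "B" faces as negatives, the classifier scores those two candidates lower than the faces near (1, 0).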
Keywords
Gaussian Mixture Model, Face Detector, Query Expansion, Discriminative Model, Discriminative Method