Joint People, Event, and Location Recognition in Personal Photo Collections Using Cross-Domain Context

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 6311)


We present a framework for vision-assisted tagging of personal photo collections using context. Whereas previous efforts mainly focus on tagging people, we develop a unified approach to jointly tag across multiple domains (specifically people, events, and locations). The heart of our approach is a generic probabilistic model of context that couples the domains through a set of cross-domain relations. Each relation models how likely the instances in two domains are to co-occur. Based on this model, we derive an algorithm that simultaneously estimates the cross-domain relations and infers the unknown tags in a semi-supervised manner. We conducted experiments on two well-known datasets and obtained significant performance improvements in both people and location recognition. We also demonstrated the ability to infer event labels for photos with missing timestamps (i.e., with no event features).
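To make the idea of a cross-domain relation concrete, here is a minimal sketch of one such relation between people and events. It is an illustration under simplifying assumptions, not the model or the inference algorithm from the paper: the relation is a smoothed co-occurrence table estimated from fully labeled pairs, and it is used only to reweight appearance-based face scores when the event is already known. The function names `estimate_relation` and `rerank_people`, the `smoothing` parameter, and the toy data are all hypothetical.

```python
def estimate_relation(pairs, n_people, n_events, smoothing=1.0):
    """Estimate a table relation[p][e] approximating P(person p | event e).

    `pairs` is a list of (person_index, event_index) tuples taken from
    photos whose person and event tags are both known. Additive smoothing
    keeps unseen combinations from getting zero probability.
    """
    counts = [[smoothing] * n_events for _ in range(n_people)]
    for person, event in pairs:
        counts[person][event] += 1.0

    relation = [[0.0] * n_events for _ in range(n_people)]
    for e in range(n_events):
        total = sum(counts[p][e] for p in range(n_people))
        for p in range(n_people):
            relation[p][e] = counts[p][e] / total
    return relation


def rerank_people(face_scores, relation, event):
    """Combine appearance scores with the person-event relation.

    Assumes at least one face score is positive, so the normaliser is
    non-zero when smoothing > 0.
    """
    weighted = [s * relation[p][event] for p, s in enumerate(face_scores)]
    total = sum(weighted)
    return [w / total for w in weighted]


# Toy data (hypothetical): person 0 and person 2 appear at event 0,
# person 1 appears at event 1.
labeled_pairs = [(0, 0), (0, 0), (1, 1), (1, 1), (2, 0)]
relation = estimate_relation(labeled_pairs, n_people=3, n_events=2)

# The face matcher cannot separate person 0 from person 1, but the photo
# is known to belong to event 1, so context breaks the tie.
face_scores = [0.5, 0.5, 0.0]
posterior = rerank_people(face_scores, relation, event=1)
best = max(range(len(posterior)), key=lambda p: posterior[p])
```

In this toy case the co-occurrence table favors person 1 for event 1, so context resolves a tie that appearance alone cannot. The framework described in the abstract goes further: it estimates the relations and infers the unknown tags jointly, across people, events, and locations.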


Keywords: Face Recognition, Personal Photo, Photo Collection, Event Label, Location Recognition



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  1. Computer Science and Artificial Intelligence Laboratory, MIT
  2. Microsoft Research
  3. Nokia Research Center Hollywood
