Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues

  • Regular Paper
Journal of Computer Science and Technology

Abstract

Associating faces appearing in Web videos with names presented in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly under large-scale realistic scenarios, mainly due to the scarcity of datasets constructed under such conditions. In this paper, we introduce a Web video dataset of celebrities, named WebV-Cele, for name-face association. The dataset consists of 75 073 Internet videos totaling over 4 000 hours, covering 2 427 celebrities and 649 001 faces. This is, to our knowledge, the most comprehensive dataset for this problem. We describe the details of dataset construction, discuss several interesting findings from analyzing this dataset, such as celebrity community discovery, and provide experimental results of name-face association using five existing techniques. We also outline important and challenging research problems that could be investigated in the future.

Author information

Correspondence to Chong-Wah Ngo.

Additional information

This work was supported by a research grant from City University of Hong Kong under Grant No. 7008178, and the National Natural Science Foundation of China under Grant Nos. 61228205, 61303175 and 61172153.

Cite this article

Chen, ZN., Ngo, CW., Zhang, W. et al. Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues. J. Comput. Sci. Technol. 29, 785–798 (2014). https://doi.org/10.1007/s11390-014-1468-z

  • DOI: https://doi.org/10.1007/s11390-014-1468-z