Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues

Chen, Zhi-Neng; Ngo, Chong-Wah; Zhang, Wei; Cao, Juan; Jiang, Yu-Gang

doi:10.1007/s11390-014-1468-z

Regular Paper
Published: 12 September 2014

Journal of Computer Science and Technology, Volume 29, pages 785–798 (2014)

Zhi-Neng Chen¹,², Chong-Wah Ngo², Wei Zhang², Juan Cao³ & Yu-Gang Jiang⁴

Abstract

Associating faces that appear in Web videos with names mentioned in the surrounding context is an important task in many applications. However, the problem has not been well investigated, particularly in large-scale realistic scenarios, mainly because datasets constructed under such conditions are scarce. In this paper, we introduce WebV-Cele, a Web video dataset of celebrities for name-face association. The dataset consists of 75 073 Internet videos totaling over 4 000 hours and covering 2 427 celebrities and 649 001 faces. To our knowledge, this is the most comprehensive dataset for this problem. We describe the details of dataset construction, discuss several interesting findings from analyzing the dataset, such as celebrity community discovery, and provide experimental results of name-face association using five existing techniques. We also outline important and challenging research problems that could be investigated in the future.

Author information

Authors and Affiliations

Institute of Automation, Chinese Academy of Sciences, Beijing, 100190, China
Zhi-Neng Chen
Department of Computer Science, City University of Hong Kong, Hong Kong, China
Zhi-Neng Chen, Chong-Wah Ngo & Wei Zhang
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, 100190, China
Juan Cao
School of Computer Science, Fudan University, Shanghai, 200433, China
Yu-Gang Jiang

Corresponding author

Correspondence to Chong-Wah Ngo.

Additional information

This work was supported by a research grant from City University of Hong Kong under Grant No. 7008178, and the National Natural Science Foundation of China under Grant Nos. 61228205, 61303175 and 61172153.

Electronic supplementary material

ESM 1 (PDF, 131 KB)

About this article

Cite this article

Chen, ZN., Ngo, CW., Zhang, W. et al. Name-Face Association in Web Videos: A Large-Scale Dataset, Baselines, and Open Issues. J. Comput. Sci. Technol. 29, 785–798 (2014). https://doi.org/10.1007/s11390-014-1468-z

Received: 24 February 2014
Revised: 03 July 2014
Published: 12 September 2014
Issue Date: September 2014
DOI: https://doi.org/10.1007/s11390-014-1468-z
