Abstract
Semantic queries that involve image understanding require the exploitation of multiple cues, namely the interrelations between objects and events across multiple images, the situational context, and the application context. A prominent example of such queries is the identification of individuals in video sequences. Straightforward face recognition approaches require a model of the persons in question and tend to fail in ill-conditioned environments. An alternative approach is therefore to use the contextual conditions of the observations to determine the role a person plays in the current context. Because roles, persons and their identities are strongly related, knowing one of them often allows the others to be inferred. This paper presents a system that implements this approach. First, robust face detection localizes the actors in the video. Similar face instances are then clustered to determine the relative frequency with which each person appears within a sequence. In combination with a coarse textual annotation created manually by the broadcast station's archivist, the roles, and consequently the identities, can be assigned and labeled in the video. By starting with unambiguous assignments and cascading from them, most of the persons can be identified and labeled successfully. The feasibility and performance of role-based person identification are demonstrated on several episodes of a popular German TV show, which consists of various elements such as interview scenes, games and musical show acts.
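The cascading assignment described above can be illustrated with a minimal sketch. It is not the system's actual implementation: the function names, the scene and cluster identifiers, and the rule that the most frequent face cluster plays the host role are all assumptions made for illustration. The sketch computes relative appearance frequencies per face cluster, seeds one label from them, and then repeatedly resolves every scene in which exactly one cluster and one annotated name remain open.

```python
from collections import Counter


def relative_frequencies(cluster_ids):
    """Fraction of all detected face instances that fall into each cluster."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    return {cluster: n / total for cluster, n in counts.items()}


def cascade_labels(scene_clusters, scene_names, seed=None):
    """Assign names to face clusters, starting from unambiguous scenes.

    scene_clusters: scene id -> set of face-cluster ids seen in that scene
    scene_names:    scene id -> set of names from the coarse annotation
    seed:           optional initial cluster -> name assignments
    """
    labels = dict(seed or {})
    changed = True
    while changed:
        changed = False
        for scene, clusters in scene_clusters.items():
            open_clusters = set(clusters) - set(labels)
            open_names = set(scene_names[scene]) - set(labels.values())
            # A scene is unambiguous when one cluster and one name remain.
            if len(open_clusters) == 1 and len(open_names) == 1:
                labels[open_clusters.pop()] = open_names.pop()
                changed = True
    return labels


# Hypothetical data: one cluster id per detected face instance.
instances = ["c0"] * 6 + ["c1"] * 2 + ["c2"] + ["c3"]
freqs = relative_frequencies(instances)

# Assumption for illustration: the most frequent cluster is the host.
host_cluster = max(freqs, key=freqs.get)

scene_clusters = {
    "interview": {"c0", "c1"},
    "game": {"c0", "c1", "c2"},
    "music": {"c3"},
}
scene_names = {
    "interview": {"Host", "Guest A"},
    "game": {"Host", "Guest A", "Guest B"},
    "music": {"Band"},
}

labels = cascade_labels(scene_clusters, scene_names, seed={host_cluster: "Host"})
```

In this toy example the host seed makes the interview scene unambiguous, which in turn resolves the game scene. Scenes that still contain several open clusters after the cascade simply stay unlabeled.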
Acknowledgments
This work has been supported by the THESEUS Program, which is funded by the German Federal Ministry of Economics and Technology. In particular, we thank our THESEUS project partner Institut für Rundfunktechnik for providing the TV program data and permission to use them for scientific purposes.
Additional information
Part of the content of this paper was presented at the 3rd International Workshop on Automated Information Extraction in Media Production (AIEMPro'10), Florence, 25–29 October 2010.
About this article
Cite this article
Schwarze, T., Riegel, T., Han, S. et al. Role-based identity recognition for TV broadcasts. Multimed Tools Appl 63, 501–520 (2013). https://doi.org/10.1007/s11042-011-0834-x
DOI: https://doi.org/10.1007/s11042-011-0834-x