Multimedia Tools and Applications

Volume 63, Issue 2, pp 501–520

Role-based identity recognition for TV broadcasts

  • Tobias Schwarze
  • Thomas Riegel
  • Seunghan Han
  • Andreas Hutter
  • Stefanie Nowak
  • Sascha Ebel
  • Christian Petersohn
  • Patrick Ndjiki-Nya


Semantic queries that involve image understanding require the exploitation of multiple cues, namely the (inter-)relations between objects and events across multiple images, the situational context, and the application context. A prominent example of such queries is the identification of individuals in video sequences. Straightforward face recognition approaches require a model of the persons in question and tend to fail in ill-conditioned environments. An alternative approach is therefore to take the contextual conditions of the observations into account in order to determine the role a person plays in the current context. Because roles, persons, and their identities are strongly related, knowing one often allows the others to be inferred. This paper presents a system that implements this approach. First, robust face detection localizes the actors in the video. Clustering similar face instances then yields the relative frequency with which each person appears within a sequence. In combination with a coarse textual annotation created manually by the broadcast station’s archivist, the roles, and consequently the identities, can be assigned and labeled in the video. Starting with unambiguous assignments and cascading from them, most of the persons can be identified and labeled successfully. The feasibility and performance of role-based person identification are demonstrated on several episodes of a popular German TV show, which consists of various elements such as interview scenes, games, and musical show acts.
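The cascading idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names, the per-scene data layout, and the rule "a scene is unambiguous when exactly one unlabeled face cluster and one unassigned name remain" are assumptions made purely for illustration.

```python
from collections import Counter


def relative_frequencies(cluster_ids):
    """Share of face detections that fall into each cluster of one sequence."""
    counts = Counter(cluster_ids)
    total = sum(counts.values())
    return {cid: n / total for cid, n in counts.items()}


def cascade_assign(scene_clusters, scene_names):
    """Map face clusters to names by resolving unambiguous scenes first.

    scene_clusters: dict scene -> set of face-cluster ids seen in that scene
    scene_names:    dict scene -> set of names taken from the annotation
    (both layouts are hypothetical)
    """
    assigned = {}
    progress = True
    while progress:
        progress = False
        for scene, clusters in scene_clusters.items():
            # Clusters and names that earlier assignments have not used up.
            open_clusters = clusters - set(assigned)
            open_names = scene_names[scene] - set(assigned.values())
            if len(open_clusters) == 1 and len(open_names) == 1:
                assigned[next(iter(open_clusters))] = next(iter(open_names))
                progress = True  # a new label may disambiguate other scenes
    return assigned


# Hypothetical annotation: cluster ids come from face clustering,
# names from the archivist's coarse description of each scene.
scene_clusters = {
    "interview": {0, 1},
    "monologue": {0},
    "game": {0, 1, 2},
}
scene_names = {
    "interview": {"Host", "Guest A"},
    "monologue": {"Host"},
    "game": {"Host", "Guest A", "Guest B"},
}
labels = cascade_assign(scene_clusters, scene_names)
```

In this toy input only the monologue scene is unambiguous at first. Labeling its single cluster leaves one open cluster in the interview scene, and resolving that in turn settles the game scene. Clusters that never become unambiguous simply stay unlabeled.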


Identity recognition, Metadata, Searching, Clustering, Television programs, Face localization, Labeling



This work has been supported by the THESEUS Program, which is funded by the German Federal Ministry of Economics and Technology. In particular, we thank our THESEUS project partner Institut für Rundfunktechnik for providing the TV program data and permission to use them for scientific purposes.



Copyright information

© Springer Science+Business Media, LLC 2011

Authors and Affiliations

  • Tobias Schwarze (1)
  • Thomas Riegel (1)
  • Seunghan Han (1)
  • Andreas Hutter (1)
  • Stefanie Nowak (2)
  • Sascha Ebel (3)
  • Christian Petersohn (3)
  • Patrick Ndjiki-Nya (3)

  1. Siemens AG, Corporate Technology, Munich, Germany
  2. Fraunhofer Institute for Digital Media Technology, Ilmenau, Germany
  3. Fraunhofer Institute for Telecommunications, Berlin, Germany
