Multimedia Tools and Applications, Volume 75, Issue 15, pp 8999–9023

Naming multi-modal clusters to identify persons in TV broadcast

  • Johann Poignant
  • Guillaume Fortier
  • Laurent Besacier
  • Georges Quénot

Abstract

Person identification in TV broadcasts is one of the main tools for indexing this type of video. The classical approach uses biometric face and speaker models, but covering a reasonable number of persons requires costly annotations. In recent years, several works have proposed using other sources of names to identify people, such as pronounced names and written names. The main idea is to form face/speaker clusters based on their similarities and to propagate these names onto the clusters. In this paper, we propose a method that takes advantage of written names during the diarization process, in order both to name clusters and to prevent the fusion of two clusters named differently. First, we extract written names with the LOOV tool (Poignant et al. 2012); these names are associated with their co-occurring speaker turns / face tracks. Simultaneously, we build a multi-modal matrix of distances between speaker turns and face tracks. Agglomerative clustering is then performed on this matrix, with the constraint that clusters associated with different names are never merged. We also integrate the predictions of a few biometric models (anchors, some journalists) to directly identify speaker turns / face tracks before the clustering process. Our approach was evaluated on the REPERE corpus and reached an F-measure of 68.2 % for speaker identification and 60.2 % for face identification. Adding a few biometric models improves these results to 82.4 % and 65.6 % for speaker and face identification, respectively. By comparison, a mono-modal, supervised person identification system with 706 speaker models trained on matching development data and additional TV and radio data yields an F-measure of 67.8 %, while 908 face models yield only 30.5 %.
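
The name-constrained clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the average linkage, the distance threshold used as a stopping rule, the dictionary-based cluster representation, and the function name `constrained_clustering` are all assumptions made for illustration, since the abstract does not specify them. Items stand for speaker turns or face tracks; an item carries a name when a written name co-occurs with it (or, hypothetically, when a biometric model identified it beforehand), and `None` otherwise.

```python
from itertools import combinations


def constrained_clustering(dist, names, threshold):
    """Agglomerative clustering that never merges clusters with different names.

    dist: symmetric matrix (list of lists) of distances between items
    names: one entry per item, a name string or None if the item is unnamed
    threshold: stop when no compatible pair is closer than this (assumed rule)
    """
    clusters = [{"items": [i], "name": names[i]} for i in range(len(names))]

    def linkage(a, b):
        # Average linkage: an assumption, the abstract does not name a linkage.
        total = sum(dist[i][j] for i in a["items"] for j in b["items"])
        return total / (len(a["items"]) * len(b["items"]))

    def compatible(a, b):
        # Two clusters may merge unless both are named and the names differ.
        return a["name"] is None or b["name"] is None or a["name"] == b["name"]

    while True:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            if not compatible(clusters[a], clusters[b]):
                continue
            d = linkage(clusters[a], clusters[b])
            if d <= threshold and (best is None or d < best[0]):
                best = (d, a, b)
        if best is None:
            return clusters
        _, a, b = best
        merged = {
            "items": sorted(clusters[a]["items"] + clusters[b]["items"]),
            # The merged cluster inherits whichever name is available.
            "name": clusters[a]["name"] or clusters[b]["name"],
        }
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]


# Toy data (invented): items 0 and 2 are speaker turns with written names,
# items 1 and 3 are unnamed face tracks.
names = ["Alice", None, "Bob", None]
dist = [
    [0.00, 0.10, 0.05, 0.90],
    [0.10, 0.00, 0.80, 0.90],
    [0.05, 0.80, 0.00, 0.15],
    [0.90, 0.90, 0.15, 0.00],
]
result = constrained_clustering(dist, names, threshold=0.5)
print(result)
```

In this toy example, items 0 and 2 are the closest pair, but they carry different names and so are never merged. Each unnamed face track instead joins the nearest named speaker turn and inherits its name, which is how clustering and naming happen in the same pass.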

Keywords

Multimodal fusion, VideoOCR, Face and speaker identification, TV broadcast

Copyright information

© Springer Science+Business Media New York 2015

Authors and Affiliations

  • Johann Poignant (1, 2)
  • Guillaume Fortier (1, 2)
  • Laurent Besacier (1, 2)
  • Georges Quénot (1, 2)

  1. Université Grenoble Alpes, LIG, Grenoble, France
  2. CNRS, LIG, Grenoble, France
