Friendly Faces: Weakly Supervised Character Identification

  • Matthew Marter
  • Simon Hadfield
  • Richard Bowden
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8912)


This paper demonstrates a novel method for automatically discovering and recognising characters in video without any labelled examples or user intervention. Instead, weak supervision is obtained via a rough script-to-subtitle alignment. The technique uses pose-invariant features, which are extracted from detected faces and clustered to form groups of co-occurring characters. Results show that with 9 characters, 29% of the closest exemplars are correctly identified, increasing to 50% as additional exemplars are considered.
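The weak-supervision idea can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes faces have already been detected and grouped into clusters, and that the script-to-subtitle alignment yields a set of character names per scene. The function name `name_clusters`, the scene representation, and the use of a Jaccard co-occurrence score are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def name_clusters(scenes):
    """Assign a character name to each face cluster by co-occurrence.

    scenes: list of (names, clusters) pairs, where `names` is the set of
    character names the aligned script places in a scene and `clusters`
    is the set of face-cluster ids observed in that scene (hypothetical
    input format).
    """
    name_counts = Counter()             # scenes in which each name occurs
    cluster_counts = Counter()          # scenes in which each cluster occurs
    pair_counts = defaultdict(Counter)  # cluster -> name -> shared scenes

    for names, clusters in scenes:
        names, clusters = set(names), set(clusters)
        name_counts.update(names)
        cluster_counts.update(clusters)
        for cluster in clusters:
            pair_counts[cluster].update(names)

    def jaccard(cluster, name):
        # Shared scenes divided by scenes containing either the name or the cluster.
        both = pair_counts[cluster][name]
        return both / (name_counts[name] + cluster_counts[cluster] - both)

    assignment = {}
    for cluster, counts in pair_counts.items():
        # Iterating over sorted names makes ties resolve alphabetically.
        assignment[cluster] = max(sorted(counts), key=lambda n: jaccard(cluster, n))
    return assignment


# Toy data: three characters, three face clusters, four scenes.
example_scenes = [
    ({"Alice", "Bob"}, {0, 1}),
    ({"Alice", "Carol"}, {0, 2}),
    ({"Bob", "Carol"}, {1, 2}),
    ({"Alice"}, {0}),
]
example_assignment = name_clusters(example_scenes)
```

No single scene in the toy data identifies a face on its own, since most scenes list two names. The names only become separable once co-occurrence is accumulated across scenes, which is the sense in which the supervision is weak.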


Linear Predictor; Active Appearance Model; Shot Boundary; Facial Landmark; Shot Boundary Detection
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Matthew Marter (1)
  • Simon Hadfield (1)
  • Richard Bowden (1)
  1. CVSSP, University of Surrey, Surrey, UK
