Friendly Faces: Weakly Supervised Character Identification

Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 8912)


This paper demonstrates a novel method for automatically discovering and recognising characters in video without any labelled examples or user intervention. Instead, weak supervision is obtained via a rough script-to-subtitle alignment. The technique uses pose-invariant features, which are extracted from detected faces and clustered to form groups of co-occurring characters. Results show that with 9 characters, 29% of the closest exemplars are correctly identified, increasing to 50% as additional exemplars are considered.
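The abstract does not spell out how face clusters are linked to character names, so the following is only a minimal sketch of one way such weak supervision could work, not the authors' method. It assumes each scene yields a set of character names (from the aligned script) and a set of face-cluster ids (from clustering face features), and it labels each cluster with the name whose scenes overlap most with the cluster's scenes, measured by Jaccard overlap. The function name `name_clusters`, the scene format, and the example names are all hypothetical.

```python
from collections import Counter, defaultdict


def name_clusters(scenes):
    """Assign a character name to each face cluster by scene co-occurrence.

    scenes: list of (names, clusters) pairs, one per scene, where `names`
    are characters the script places in the scene and `clusters` are the
    ids of face clusters detected in it. (Hypothetical input format.)
    """
    name_freq = Counter()          # number of scenes mentioning each name
    cluster_freq = Counter()       # number of scenes containing each cluster
    cooc = defaultdict(Counter)    # cluster id -> names seen in the same scenes

    for names, clusters in scenes:
        names, clusters = set(names), set(clusters)
        name_freq.update(names)
        cluster_freq.update(clusters)
        for c in clusters:
            cooc[c].update(names)

    labels = {}
    for c, counts in cooc.items():
        def jaccard(name):
            both = counts[name]
            return both / (name_freq[name] + cluster_freq[c] - both)
        # Sorting the names first makes tie-breaking deterministic.
        labels[c] = max(sorted(counts), key=jaccard)
    return labels


# Toy example: three characters, three face clusters, four scenes.
example_scenes = [
    ({"ALICE", "BOB"}, {0, 1}),
    ({"ALICE"}, {0}),
    ({"BOB", "CAROL"}, {1, 2}),
    ({"CAROL"}, {2}),
]
labels = name_clusters(example_scenes)
```

No scene here is labelled by hand: each cluster is named purely from which script names tend to appear alongside it, which is the sense in which the supervision is weak.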





Copyright information

© Springer International Publishing Switzerland 2015

Authors and Affiliations

  • Matthew Marter (1)
  • Simon Hadfield (1)
  • Richard Bowden (1)

  1. CVSSP, University of Surrey, Surrey, UK
