Multimedia Tools and Applications, Volume 51, Issue 2, pp 505–523

Looking at the viewer: analysing facial activity to detect personal highlights of multimedia contents

  • Hideo Joho
  • Jacopo Staiano
  • Nicu Sebe
  • Joemon M. Jose


This paper presents an approach to detecting personal highlights in videos based on an analysis of the viewer's facial activity. The facial activity analysis was based on motion vectors tracked at twelve key points on the human face. In our approach, the magnitude of the motion vectors represented the degree of a viewer's affective reaction to video content. We examined 80 facial activity videos recorded from ten participants, each watching eight video clips of various genres. The experimental results suggest that the motion vectors useful for detecting personal highlights varied significantly across viewers. However, activity in the upper part of the face tended to be more indicative of personal highlights than activity in the lower part.
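The core idea above can be sketched in a few lines: given the positions of tracked facial key points over time, compute per-frame motion-vector magnitudes and compare activity in the upper and lower face. This is a minimal illustration, not the authors' implementation; the 12-point tracking, the upper/lower point split, and the per-viewer thresholding rule are all assumptions made for the sketch.

```python
import numpy as np

# Hypothetical input: 12 tracked facial key points over 100 frames.
# positions has shape (frames, points, 2) with (x, y) coordinates.
rng = np.random.default_rng(0)
positions = rng.normal(size=(100, 12, 2)).cumsum(axis=0)

# Motion vectors between consecutive frames, and their magnitudes.
motion = np.diff(positions, axis=0)         # shape (99, 12, 2)
magnitude = np.linalg.norm(motion, axis=2)  # shape (99, 12)

# Assumed split: points 0-5 in the upper face (eyes/brows),
# points 6-11 in the lower face (mouth/chin).
upper = magnitude[:, :6].mean(axis=1)
lower = magnitude[:, 6:].mean(axis=1)

# Flag frames whose upper-face activity exceeds a per-viewer
# threshold (here mean + one standard deviation) as candidate
# personal highlights.
threshold = upper.mean() + upper.std()
highlights = np.flatnonzero(upper > threshold)
```

Because the paper found that useful motion vectors vary significantly across viewers, any real detector would need the threshold (and possibly the point weighting) calibrated per viewer rather than fixed globally.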


Keywords: Facial activity, Facial expression, Affective summarization



Funding was provided by the MIAUCE Project (EU IST-033715). Any opinions, findings, and conclusions described here are the authors' and do not necessarily reflect those of the sponsor. The work of Jacopo Staiano and Nicu Sebe was supported by the FP7 IP GLOCAL European project and by the FIRB S-PATTERNS project.



Copyright information

© Springer Science+Business Media, LLC 2010

Authors and Affiliations

  • Hideo Joho (1)
  • Jacopo Staiano (2)
  • Nicu Sebe (2)
  • Joemon M. Jose (3)

  1. Department of Library, Information and Media Studies, University of Tsukuba, Tsukuba, Japan
  2. Department of Information Engineering and Computer Science, University of Trento, Povo, Italy
  3. School of Computing Science, University of Glasgow, Glasgow, UK