Multimedia Tools and Applications, Volume 76, Issue 2, pp 2861–2889

Gaze movement-driven random forests for query clustering in automatic video annotation

  • Stefanos Vrochidis
  • Ioannis Patras
  • Ioannis Kompatsiaris


In recent years, the rapid increase in the volume of multimedia content has led to the development of several automatic annotation approaches. In parallel, the high availability of large amounts of user interaction data has revealed the need for automatic annotation techniques that exploit implicit user feedback during interactive multimedia retrieval tasks. In this context, this paper proposes a method for automatic video annotation that exploits implicit user feedback during interactive video retrieval, expressed through gaze movements, mouse clicks and queries submitted to a content-based video search engine. We exploit this interaction data to represent video shots with feature vectors based on aggregated gaze movements. This information is used to train a classifier that can identify shots of interest for new users. Subsequently, we propose a framework that, during testing, a) identifies the topics (expressed by query clusters) for which new users are searching, based on a novel clustering algorithm, and b) associates multimedia data (i.e., video shots) with the identified topics using supervised classification. The novel clustering algorithm is based on random forests and is driven by two factors: first, the distance measures between different sets of queries and, second, the homogeneity of the shots viewed during each query cluster defined by the clustering procedure; this homogeneity is inferred from the performance of the gaze-based classifier on these shots. The evaluation shows that aggregated gaze data can be exploited for video annotation purposes.
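As a rough illustration of the first factor, a distance between sets of queries could be computed along the following lines. This is only a sketch under stated assumptions: the abstract does not name the distance measure, so the Jaccard distance is used here as one plausible choice, and the function names and the example query terms are hypothetical.

```python
def jaccard_distance(a, b):
    """Jaccard distance between two collections of query terms.

    Assumption: each set of queries is reduced to a set of terms.
    Returns 0.0 for identical sets and 1.0 for disjoint ones.
    """
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty sets are treated as identical by convention.
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def pairwise_distances(query_sets):
    """Symmetric matrix of Jaccard distances between all query sets."""
    n = len(query_sets)
    return [
        [jaccard_distance(query_sets[i], query_sets[j]) for j in range(n)]
        for i in range(n)
    ]


# Hypothetical query sets submitted by three users.
queries = [
    {"car", "race"},
    {"car", "crash"},
    {"beach", "sunset"},
]
D = pairwise_distances(queries)
```

A matrix such as `D` could then serve as one input to a clustering step; how the proposed method combines it with the gaze-based homogeneity criterion is not specified in the abstract and is not modelled here.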


Implicit feedback, Eye-tracking, Interactive video retrieval, Clustering, Random forests



This work was partially supported by the projects MULTISENSOR (FP7-610411), HOMER (FP7-312388) and PetaMedia (FP7-216444).



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  • Stefanos Vrochidis (1)
  • Ioannis Patras (2)
  • Ioannis Kompatsiaris (1)
  1. Centre for Research and Technology Hellas - Information Technologies Institute, Thessaloniki, Greece
  2. Queen Mary University of London, London, UK
