
Gaze movement-driven random forests for query clustering in automatic video annotation

Multimedia Tools and Applications

Abstract

In recent years, the rapid growth in the volume of multimedia content has led to the development of several automatic annotation approaches. In parallel, the wide availability of large amounts of user interaction data has revealed the need for automatic annotation techniques that exploit implicit user feedback during interactive multimedia retrieval tasks. In this context, this paper proposes a method for automatic video annotation that exploits implicit user feedback during interactive video retrieval, as expressed through gaze movements, mouse clicks and queries submitted to a content-based video search engine. We exploit this interaction data to represent video shots with feature vectors based on aggregated gaze movements, and use this information to train a classifier that can identify shots of interest for new users. Subsequently, we propose a framework that, during testing, a) identifies the topics (expressed by query clusters) for which new users are searching, based on a novel clustering algorithm, and b) associates multimedia data (i.e., video shots) with the identified topics using supervised classification. The novel clustering algorithm is based on random forests and is driven by two factors: first, the distance measures between different sets of queries, and second, the homogeneity of the shots viewed during the queries of each cluster defined by the clustering procedure; this homogeneity is inferred from the performance of the gaze-based classifier on these shots. The evaluation shows that aggregated gaze data can be exploited for video annotation purposes.
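To make the two-stage pipeline above more concrete, the following Python sketch illustrates the general idea under stated assumptions: the gaze feature names, the toy data, the use of scikit-learn's RandomForestClassifier and the equal weighting of the two clustering factors are illustrative choices, not the authors' exact implementation.

# Hypothetical sketch, not the authors' implementation: (a) train a random
# forest on aggregated gaze features per shot, and (b) score a candidate
# query cluster by query compactness plus the homogeneity of the gaze-based
# predictions on the shots viewed for its queries.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Stage 1: aggregated gaze features per shot from past sessions
# (illustrative features: total fixation time [s], fixation count,
# mean pupil dilation, clicked flag).
X_train = np.array([
    [2.3, 12, 0.41, 1],
    [0.4,  2, 0.30, 0],
    [1.8,  9, 0.39, 1],
    [0.2,  1, 0.28, 0],
])
y_train = np.array([1, 0, 1, 0])          # 1 = shot of interest, 0 = not

gaze_clf = RandomForestClassifier(n_estimators=100, random_state=0)
gaze_clf.fit(X_train, y_train)

# Stage 2: score a candidate query cluster.  query_dists holds pairwise
# distances between the queries in the cluster (e.g. text-based similarity),
# shot_feats holds the gaze features of the shots viewed for those queries.
def cluster_score(query_dists, shot_feats, clf, w=0.5):
    compactness = 1.0 - query_dists.mean()               # queries close together
    preds = clf.predict(shot_feats)
    homogeneity = np.bincount(preds).max() / len(preds)  # agreement of predictions
    return w * compactness + (1.0 - w) * homogeneity     # equal weighting is an assumption

query_dists = np.array([[0.0, 0.2], [0.2, 0.0]])
shot_feats = np.array([[2.0, 10, 0.40, 1], [1.9, 11, 0.42, 1]])
print(cluster_score(query_dists, shot_feats, gaze_clf))

In the actual framework, the query distances and the candidate clusters would be produced by the proposed random forest-driven clustering procedure; the function above only shows one plausible way the two driving factors could be combined into a single cluster score.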





Acknowledgments

This work was partially supported by the projects MULTISENSOR (FP7-610411), HOMER (FP7-312388) and PetaMedia (FP7-216444).

Author information

Correspondence to Stefanos Vrochidis.


About this article


Cite this article

Vrochidis, S., Patras, I. & Kompatsiaris, I. Gaze movement-driven random forests for query clustering in automatic video annotation. Multimed Tools Appl 76, 2861–2889 (2017). https://doi.org/10.1007/s11042-015-3221-1

