Abstract
In recent years, the rapid increase in the volume of multimedia content has led to the development of several automatic annotation approaches. In parallel, the wide availability of large amounts of user interaction data has revealed the need for automatic annotation techniques that exploit implicit user feedback during interactive multimedia retrieval tasks. In this context, this paper proposes a method for automatic video annotation that exploits implicit user feedback during interactive video retrieval, as expressed through gaze movements, mouse clicks and queries submitted to a content-based video search engine. We use this interaction data to represent video shots with feature vectors based on aggregated gaze movements, and we train a classifier on these vectors to identify shots of interest for new users. We then propose a framework that, during testing, a) identifies the topics (expressed as query clusters) that new users are searching for, using a novel clustering algorithm, and b) associates multimedia data (i.e., video shots) with the identified topics using supervised classification. The novel clustering algorithm is based on random forests and is driven by two factors: first, the distance measures between different sets of queries, and second, the homogeneity of the shots viewed within each query cluster defined by the clustering procedure; this homogeneity is inferred from the performance of the gaze-based classifier on these shots. The evaluation shows that aggregated gaze data can be exploited for video annotation purposes.
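The clustering step is driven by two factors: the distance between sets of queries and the homogeneity of the shots viewed within each query cluster. The Python sketch below scores a candidate query cluster on both factors. It is not the paper's random-forest algorithm. The Jaccard distance over query terms, the majority-label definition of homogeneity, the weight `alpha` and all function names are illustrative assumptions.

```python
from itertools import combinations


def jaccard_distance(a: frozenset[str], b: frozenset[str]) -> float:
    """Jaccard distance 1 - |A ∩ B| / |A ∪ B| between two sets of query terms."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_query_distance(queries: list[frozenset[str]]) -> float:
    """Average pairwise Jaccard distance among the queries of one cluster."""
    pairs = list(combinations(queries, 2))
    if not pairs:
        return 0.0
    return sum(jaccard_distance(x, y) for x, y in pairs) / len(pairs)


def shot_homogeneity(predictions: list[int]) -> float:
    """Share of viewed shots that receive the majority label from a
    (hypothetical) gaze-based classifier; 1.0 means all shots agree."""
    if not predictions:
        return 0.0
    positives = sum(predictions)
    return max(positives, len(predictions) - positives) / len(predictions)


def cluster_score(queries: list[frozenset[str]], predictions: list[int],
                  alpha: float = 0.5) -> float:
    """Hypothetical score: higher when the queries are close to each other and
    the viewed shots are homogeneous. alpha (assumed, 0.5 by default) weights
    query closeness against shot homogeneity."""
    closeness = 1.0 - mean_query_distance(queries)
    return alpha * closeness + (1.0 - alpha) * shot_homogeneity(predictions)


# Usage: two related queries; the classifier flags 3 of 4 viewed shots as of interest.
cluster = [frozenset({"car", "street"}), frozenset({"car", "road"})]
labels = [1, 1, 1, 0]  # 1 = shot of interest, 0 = not of interest
print(round(cluster_score(cluster, labels), 3))  # prints 0.542
```

A clustering procedure could, for example, prefer merges that raise this score. Combining the two terms linearly is only one simple choice; the paper instead embeds both factors in a random-forest-based algorithm.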
Acknowledgments
This work was partially supported by the projects MULTISENSOR (FP7-610411), HOMER (FP7-312388) and PetaMedia (FP7-216444).
Cite this article
Vrochidis, S., Patras, I. & Kompatsiaris, I. Gaze movement-driven random forests for query clustering in automatic video annotation. Multimed Tools Appl 76, 2861–2889 (2017). https://doi.org/10.1007/s11042-015-3221-1