Abstract
This paper proposes a unified, adaptive framework for web video thumbnail recommendation. It recommends thumbnails for both video owners and browsers on the basis of image quality assessment, image accessibility analysis, video content representativeness analysis and query-sensitive matching. First, video shot detection is performed, and within each shot the frame with the highest image quality, as measured by our proposed image quality assessment method, is extracted as the key frame. These key frames serve as the thumbnail candidates for the subsequent steps. For image quality assessment, the normalized variance autofocusing function is employed to evaluate image blur, which ensures that the selected thumbnail candidates are sharp and of high image quality. For accessibility analysis, color moment, visual saliency and texture features are fed to a support vector regression model that predicts each candidate's accessibility score, which ensures that the regions of interest (ROIs) of the recommended thumbnail are large enough and that the thumbnail is easily accessible to users. For content representativeness analysis, the mutual reinforcement algorithm is applied over the entire video to obtain each candidate's representativeness score, which ensures that the final thumbnail is representative enough for users to grasp the main video content at a glance. To account for browsers' query intent, a relevance model is designed to recommend more personalized thumbnails for specific browsers. Finally, the above analysis results are flexibly fused to produce the final adaptive recommendation. Experimental results and subjective evaluations demonstrate the effectiveness of the proposed approach. Compared with existing web video thumbnail generation methods, the thumbnails recommended for video owners not only reflect the video content better but are also more comfortable for users to view. The thumbnails recommended for video browsers directly reflect their preferences, which greatly enhances the user experience.
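As an illustration of the key-frame step described above, the following minimal sketch scores each frame with the normalized variance measure as it is commonly defined in the autofocus literature (intensity variance divided by mean intensity) and keeps the highest-scoring frame of each shot. The function names, the representation of a frame as nested lists of grayscale intensities, and the handling of all-black frames are assumptions made for this example; the paper's own implementation and normalization may differ.

```python
from typing import List, Sequence

# Assumption: a frame is a grayscale image given as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def normalized_variance(frame: Frame) -> float:
    """Sharpness score: intensity variance divided by mean intensity.

    Higher values indicate stronger contrast, which is used here as a
    proxy for a sharper (less blurred) frame.
    """
    pixels = [float(p) for row in frame for p in row]
    if not pixels:
        raise ValueError("frame is empty")
    mean = sum(pixels) / len(pixels)
    if mean == 0.0:
        # Assumption: an all-black frame carries no detail, so score it 0.
        return 0.0
    variance = sum((p - mean) ** 2 for p in pixels) / len(pixels)
    return variance / mean


def select_key_frames(shots: Sequence[Sequence[Frame]]) -> List[int]:
    """Return, for each shot, the index of its highest-scoring frame."""
    key_indices = []
    for frames in shots:
        if not frames:
            raise ValueError("shot contains no frames")
        scores = [normalized_variance(f) for f in frames]
        key_indices.append(max(range(len(frames)), key=scores.__getitem__))
    return key_indices


# Toy 2x2 frames with the same mean intensity but different contrast.
sharp = [[0, 255], [255, 0]]
blurred = [[120, 135], [135, 120]]
flat = [[128, 128], [128, 128]]

key_frames = select_key_frames([[blurred, sharp], [flat, blurred]])
```

In this toy input the high-contrast frame wins the first shot and the slightly textured frame wins the second, so both selected indices are 1. Dividing by the mean makes the score less sensitive to overall brightness than raw variance would be.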
Acknowledgments
This work was supported in part by the National Basic Research Program of China (973 Program) under grant 2012CB316400, and in part by the National Natural Science Foundation of China under grants 61025011, 61202322 and 61070108.
Cite this article
Zhang, W., Liu, C., Wang, Z. et al. Web video thumbnail recommendation with content-aware analysis and query-sensitive matching. Multimed Tools Appl 73, 547–571 (2014). https://doi.org/10.1007/s11042-013-1607-5
DOI: https://doi.org/10.1007/s11042-013-1607-5