Abstract
With the increasing proliferation of digital video content, learning-based video scene analysis has proven to be an effective methodology for improving access to and retrieval from large video collections. This chapter presents a survey and tutorial of research on this topic. We identify two major categories of state-of-the-art tasks based on their application setup and learning targets: generic methods and genre-specific analysis techniques. For generic video scene analysis problems, we discuss two kinds of learning models that aim to narrow the semantic gap and the intention gap, the two main research challenges in video content analysis and retrieval. For genre-specific analysis problems, we take sports video analysis and surveillance event detection as illustrative examples.
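To make the "semantic gap" concrete, the generic concept-detection setting can be caricatured as learning a mapping from low-level frame features to semantic labels. The following is a minimal sketch only: the nearest-centroid classifier, the feature names, and the toy data are all invented for illustration and are not the chapter's actual methods.

```python
import math

def centroid(vectors):
    """Mean vector of a list of equal-length feature vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def train(labeled_frames):
    """labeled_frames: dict mapping concept label -> list of feature vectors."""
    return {label: centroid(vecs) for label, vecs in labeled_frames.items()}

def predict(model, frame):
    """Assign the concept whose centroid is nearest in Euclidean distance."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return min(model, key=lambda label: dist(model[label], frame))

# Toy per-frame features: (mean green intensity, mean motion magnitude).
training = {
    "soccer-field": [[0.8, 0.6], [0.9, 0.7]],
    "studio-anchor": [[0.2, 0.1], [0.3, 0.2]],
}
model = train(training)
print(predict(model, [0.85, 0.65]))  # a green, high-motion frame
```

Real systems replace the toy features with rich visual descriptors and the centroid rule with trained statistical models (e.g., SVMs or graphical models), but the gap between the two sides of this mapping is exactly what the learning models surveyed here address.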
Acknowledgements
The work is supported by grants from the Chinese National Natural Science Foundation under contract No. 60973055 and No. 61035001, and National Basic Research Program of China under contract No. 2009CB320906.
© 2011 Springer Science+Business Media, LLC
Gao, W., Tian, Y., Duan, L., Li, J., Li, Y. (2011). Video Scene Analysis: A Machine Learning Perspective. In: Ngan, K., Li, H. (eds) Video Segmentation and Its Applications. Springer, New York, NY. https://doi.org/10.1007/978-1-4419-9482-0_4
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4419-9481-3
Online ISBN: 978-1-4419-9482-0