Abstract
Video analytics of real-life scenarios deals with multimedia data statistics that may be characterized by the multimodal features of the video components. The large variety of low-scale multimodal features of the objects creates many challenges for discrimination and analysis. In addition, occlusion, varied illumination, and complex environmental conditions make video parsing a challenging research problem. For experimental purposes, the vital components of a video include scenes, shots, keyframes, objects, and background. In this work, we focus on keyframes and shot boundaries for scene segmentation of sample videos taken from YouTube. The structural similarity index (SSIM) of the shots is computed from the histograms of local binary pattern (LBP) and HSV color similarities. Motion similarity and inverse time proximity are added to generate the Shot Similarity Graph. Sliding window methods are used for grouping similar shots. The proposed scene segmentation method is validated on six videos of varied semantics featuring human beings and animals. The duration of the videos ranges from 0.5 to 15 min, and the total number of scenes per video ranges from 6 to 33.
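The pipeline outlined in the abstract (per-shot appearance similarity, motion similarity, a time-proximity weight, then sliding-window grouping) can be illustrated with a minimal sketch. The abstract gives no formulas, so everything specific here is an assumption: histogram intersection stands in for the SSIM-based comparison, the weights, decay, window size, and threshold are arbitrary, and all function names are hypothetical.

```python
from typing import List, Sequence


def hist_intersection(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Similarity in [0, 1] between two L1-normalized histograms.

    Assumption: used here as a simple stand-in for the paper's SSIM measure.
    """
    return sum(min(a, b) for a, b in zip(h1, h2))


def time_proximity(i: int, j: int, decay: float = 0.5) -> float:
    """Hypothetical temporal weight: 1 for the same shot, shrinking with distance."""
    return 1.0 / (1.0 + decay * abs(i - j))


def shot_similarity_graph(
    color_hists: Sequence[Sequence[float]],
    texture_hists: Sequence[Sequence[float]],
    motion: Sequence[float],
    w_color: float = 0.4,
    w_texture: float = 0.4,
    w_motion: float = 0.2,
) -> List[List[float]]:
    """Dense shot-to-shot similarity matrix (weights are illustrative only).

    color_hists / texture_hists: one normalized histogram per shot
    (e.g. HSV and LBP histograms of the shot's keyframe).
    motion: one motion magnitude per shot, scaled to [0, 1].
    """
    n = len(color_hists)
    graph = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            visual = (
                w_color * hist_intersection(color_hists[i], color_hists[j])
                + w_texture * hist_intersection(texture_hists[i], texture_hists[j])
            )
            motion_sim = 1.0 - abs(motion[i] - motion[j])
            graph[i][j] = (visual + w_motion * motion_sim) * time_proximity(i, j)
    return graph


def group_shots(
    graph: Sequence[Sequence[float]], window: int = 2, threshold: float = 0.5
) -> List[int]:
    """Assign a scene label to each shot with a backward sliding window.

    A shot stays in the current scene if it is similar enough to any of the
    last `window` shots of that scene; otherwise it opens a new scene.
    """
    if not graph:
        return []
    labels = [0]
    scene_start = 0
    for i in range(1, len(graph)):
        start = max(scene_start, i - window)
        best = max(graph[i][j] for j in range(start, i))
        if best >= threshold:
            labels.append(labels[-1])
        else:
            labels.append(labels[-1] + 1)
            scene_start = i
    return labels
```

In practice the histograms would come from the keyframe of each detected shot, and the threshold and window size would need tuning per video; the sketch only shows how the four cues mentioned in the abstract could be combined into one graph and then cut into scenes.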
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Kumar, N., Sukavanam, N. (2019). Keyframes and Shot Boundaries: The Attributes of Scene Segmentation and Classification. In: Yadav, N., Yadav, A., Bansal, J., Deep, K., Kim, J. (eds) Harmony Search and Nature Inspired Optimization Algorithms. Advances in Intelligent Systems and Computing, vol 741. Springer, Singapore. https://doi.org/10.1007/978-981-13-0761-4_74
DOI: https://doi.org/10.1007/978-981-13-0761-4_74
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-0760-7
Online ISBN: 978-981-13-0761-4
eBook Packages: Intelligent Technologies and Robotics (R0)