Abstract
Slide transition detection locates the frames where slide content changes; these frames form a summary of a lecture video and save viewers the time of watching it in full. 3D Convolutional Networks (3D ConvNets) are regarded as an effective approach to learning spatio-temporal features in videos. However, a 3D ConvNet gives the same weight to all features in an image and cannot focus on the key feature information. We address this problem with an attention mechanism, which highlights informative features while suppressing irrelevant ones. Furthermore, 3D ConvNets usually require long training times and large amounts of memory. The Dual Path Network (DPN) combines the structures of ResNeXt and DenseNet and inherits the advantages of both: ResNeXt adds the input directly to the convolved output, reusing features extracted by earlier layers, while DenseNet concatenates the outputs of preceding layers to the input of each layer, encouraging the extraction of new features. Building on both, DPN not only saves training time and memory but also extracts more effective features and improves training results. We therefore present a novel ConvNet architecture based on the Convolutional Block Attention Module and DPN for slide transition detection in lecture videos. Experimental results show that the proposed architecture achieves better results than other slide-detection approaches.
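The interplay of the two paths described above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: `toy_transform` stands in for a convolutional block, the channel splits are arbitrary, and `channel_attention` is a heavily simplified stand-in for CBAM-style channel gating.

```python
import numpy as np

def toy_transform(x, out_channels, seed=0):
    # Stand-in for a conv block: a fixed random linear map over channels.
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], out_channels))
    return x @ w

def dual_path_block(x, residual_channels, dense_growth):
    """Illustrative DPN block.

    The first `residual_channels` of the transform output are added back
    to the input (ResNeXt-style residual path, feature reuse); the
    remaining `dense_growth` channels are concatenated (DenseNet-style
    path, new features), so the feature width grows by `dense_growth`.
    """
    y = toy_transform(x, residual_channels + dense_growth)
    res = x[:, :residual_channels] + y[:, :residual_channels]   # reuse old features
    new_dense = y[:, residual_channels:]                        # accumulate new ones
    return np.concatenate([res, x[:, residual_channels:], new_dense], axis=1)

def channel_attention(x):
    # Simplified channel gate: sigmoid of the per-channel mean reweights
    # channels, suppressing weak ones (real CBAM uses pooled features
    # passed through a shared MLP).
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=0, keepdims=True)))
    return x * gate
```

Stacking `dual_path_block` keeps the residual width fixed while the dense accumulation grows layer by layer, which is the memory-saving property the abstract attributes to DPN.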
Acknowledgment
This work was supported by the National Natural Science Foundation of China (No. 61601278) and by the "Chen Guang" project of the Shanghai Municipal Education Commission and the Shanghai Education Development Foundation (No. 17CG41).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Guan, M., Li, K., Ma, R., An, P. (2020). Convolutional-Block-Attention Dual Path Networks for Slide Transition Detection in Lecture Videos. In: Zhai, G., Zhou, J., Yang, H., An, P., Yang, X. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2019. Communications in Computer and Information Science, vol 1181. Springer, Singapore. https://doi.org/10.1007/978-981-15-3341-9_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3340-2
Online ISBN: 978-981-15-3341-9