Convolutional-Block-Attention Dual Path Networks for Slide Transition Detection in Lecture Videos
Slide transition detection locates the frames at which slide content changes; these frames form a summary of a lecture video and save viewers the time of watching the entire recording. 3D Convolutional Networks (3D ConvNets) are widely regarded as an efficient approach to learning spatio-temporal features in videos. However, a 3D ConvNet assigns the same weight to all features in an image and cannot focus on key feature information. We address this problem with an attention mechanism, which highlights informative features while suppressing uninformative ones. Furthermore, 3D ConvNets usually require long training times and large amounts of memory. The Dual Path Network (DPN) combines the ResNeXt and DenseNet architectures and inherits the advantages of both: ResNeXt adds the input directly to the convolved output, which enables re-use of features extracted in earlier layers, while DenseNet concatenates the outputs of preceding layers into the input of each layer, which encourages the extraction of new features. Building on these two structures, DPN not only saves training time and memory but also extracts more effective features and improves training results. Consequently, we present a novel ConvNet architecture based on Convolutional Block Attention and DPN for slide transition detection in lecture videos. Experimental results show that the proposed architecture achieves better results than other slide detection approaches.
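To make the two ideas in the abstract concrete, the sketch below illustrates (a) CBAM-style channel attention, which pools the spatial dimensions by average and max, passes both through a shared two-layer MLP, and gates the channels with a sigmoid, and (b) the dual-path combination, where a residual path adds new features to the input (ResNeXt-style feature re-use) while a dense path concatenates further features (DenseNet-style feature growth). This is a minimal NumPy toy, not the paper's implementation: the MLP weights are random placeholders, and `res_features`/`dense_features` stand in for the two outputs of a shared convolution.

```python
import numpy as np

def channel_attention(x, reduction=2):
    """CBAM-style channel attention (sketch). x has shape (C, H, W).
    The MLP weights are random placeholders, not trained parameters."""
    c = x.shape[0]
    rng = np.random.default_rng(0)
    w1 = rng.standard_normal((c // reduction, c)) * 0.1  # squeeze
    w2 = rng.standard_normal((c, c // reduction)) * 0.1  # excite
    avg = x.mean(axis=(1, 2))            # (C,) global average pooling
    mx = x.max(axis=(1, 2))              # (C,) global max pooling
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0)           # shared MLP, ReLU
    gate = 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))   # sigmoid gate
    return x * gate[:, None, None]       # reweight channels in place

def dual_path_block(x, res_features, dense_features):
    """Dual-path combination (sketch): the residual path adds features to
    the input (ResNeXt-style re-use); the dense path concatenates extra
    channels (DenseNet-style growth)."""
    residual = x + res_features                                # re-use
    return np.concatenate([residual, dense_features], axis=0)  # growth

x = np.ones((8, 4, 4))                   # toy feature map: 8 channels, 4x4
attended = channel_attention(x)          # same shape, channels reweighted
out = dual_path_block(attended,
                      res_features=np.ones((8, 4, 4)),
                      dense_features=np.ones((2, 4, 4)))
print(attended.shape, out.shape)         # (8, 4, 4) (10, 4, 4)
```

Note how the output channel count grows by the width of the dense path (8 → 10 here) while the residual path keeps its width fixed; this is the mechanism by which DPN reuses old features and discovers new ones within a single block.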
Keywords: Lecture video · Slide transition · 3D ConvNet · Convolutional Block Attention · DPN
This work was supported by the National Natural Science Foundation of China (No. 61601278) and by the “Chen Guang” project of the Shanghai Municipal Education Commission and the Shanghai Education Development Foundation (No. 17CG41).
- 1. Ma, D., Agam, G.: Lecture video segmentation and indexing. Proc. SPIE 8297(1), 48 (2012)
- 3. Jaiswal, S., Misra, M.: Automatic indexing of lecture videos using syntactic similarity measures. In: 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 164–169 (2018)
- 5. Wang, F., et al.: Residual attention network for image classification. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6450–6458 (2017)
- 7. Fu, J., et al.: Dual attention network for scene segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
- 8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
- 9. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–5995 (2017)
- 10. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: BMVC (2016)
- 11. Huang, G., Liu, Z., Weinberger, K.Q., van der Maaten, L.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
- 12. Chen, Y., Li, J., Xiao, H., et al.: Dual path networks. arXiv preprint arXiv:1707.01629 (2017)
- 13. Woo, S., Park, J., Lee, J.Y., et al.: CBAM: convolutional block attention module. arXiv preprint arXiv:1807.06521 (2018)
- 14. Gong, Y., Liu, X.: Video summarization using singular value decomposition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 174–180 (2000)