Convolutional-Block-Attention Dual Path Networks for Slide Transition Detection in Lecture Videos

  • Minhuang Guan
  • Kai Li
  • Ran Ma
  • Ping An
Conference paper
Part of the Communications in Computer and Information Science book series (CCIS, volume 1181)


Slide transition detection locates the frames where the slide content changes; these frames form a summary of the lecture video and save viewers the time of watching the full recording. 3D Convolutional Networks (3D ConvNets) are regarded as an efficient approach to learning spatio-temporal features in videos. However, a 3D ConvNet gives the same weight to all features in an image and cannot focus on key feature information. We address this problem with an attention mechanism, which highlights effective feature information while suppressing ineffective information. Furthermore, 3D ConvNets usually require long training times and large amounts of memory. The Dual Path Network (DPN) combines the two network structures of ResNeXt and DenseNet and inherits the advantages of both. ResNeXt adds the input directly to the convolved output, which reuses features extracted by the previous layers. DenseNet concatenates the output of each layer to the input of the following layers, which encourages the extraction of new features. Building on these two networks, DPN not only saves training time and memory but also extracts more effective features and improves training results. Consequently, we present a novel ConvNet architecture based on Convolutional Block Attention and DPN for slide transition detection in lecture videos. Experimental results show that the proposed architecture achieves better results than other slide detection approaches.
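The dual-path and attention ideas described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the shared MLP of CBAM's channel attention, the 7×7 convolution of its spatial attention, and DPN's grouped convolutions are all omitted, and `conv` stands in for an arbitrary learned transform.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    # x: (C, H, W). Gate each channel by its pooled statistics,
    # following the CBAM idea (the shared MLP is omitted here).
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    w = sigmoid(avg + mx)                 # one weight per channel
    return x * w[:, None, None]

def spatial_attention(x):
    # Pool across channels, then gate each spatial location,
    # so informative regions are emphasized over background.
    avg = x.mean(axis=0)
    mx = x.max(axis=0)
    w = sigmoid(avg + mx)                 # one weight per pixel
    return x * w[None, :, :]

def dual_path_block(x, conv):
    # conv: any transform producing (C + k, H, W) from (C, H, W).
    # The first C output channels feed the residual (ResNeXt-style)
    # path, which reuses existing features; the last k channels feed
    # the dense (DenseNet-style) path, which adds new features.
    y = conv(x)
    c = x.shape[0]
    residual = x + y[:c]
    dense = y[c:]
    return np.concatenate([residual, dense], axis=0)
```

A block that applies attention before the dual-path split, as the proposed architecture does conceptually, is then just `dual_path_block(channel_attention(spatial_attention(x)), conv)`, with the channel count growing by `k` after every block.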


Keywords: Lecture video · Slide transition · 3D ConvNet · Convolutional Block Attention · DPN



This work was supported by the Project of the National Natural Science Foundation of China (No. 61601278) and the "Chen Guang" project supported by the Shanghai Municipal Education Commission and Shanghai Education Development Foundation (No. 17CG41).


References

  1. Ma, D., Agam, G.: Lecture video segmentation and indexing. Proc. SPIE 8297(1), 48 (2012)
  2. Li, K., Wang, J., Wang, H., Dai, Q.: Structuring lecture videos by automatic projection screen localization and analysis. IEEE Trans. Pattern Anal. Mach. Intell. 37(6), 1233–1246 (2015)
  3. Jaiswal, S., Misra, M.: Automatic indexing of lecture videos using syntactic similarity measures. In: 2018 5th International Conference on Signal Processing and Integrated Networks (SPIN), pp. 164–169 (2018)
  4. Yang, H., Meinel, C.: Content based lecture video retrieval using speech and video text information. IEEE Trans. Learn. Technol. 7(2), 142–154 (2014)
  5. Wang, F., et al.: Residual attention network for image classification. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 6450–6458 (2017)
  6. Zhu, Y., Zhao, C., Guo, H., Wang, J., Zhao, X., Lu, H.: Attention CoupleNet: fully convolutional attention coupling network for object detection. IEEE Trans. Image Process. 28(1), 113–126 (2019)
  7. Fu, J., et al.: Dual attention network for scene segmentation. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2019)
  8. He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
  9. Xie, S., Girshick, R., Dollár, P., Tu, Z., He, K.: Aggregated residual transformations for deep neural networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 5987–5995 (2017)
  10. Zagoruyko, S., Komodakis, N.: Wide residual networks. In: BMVC (2016)
  11. Huang, G., Liu, Z., Weinberger, K.Q., Maaten, L.: Densely connected convolutional networks. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2017)
  12. Chen, Y., Li, J., Xiao, H., et al.: Dual path networks. arXiv preprint arXiv:1707.01629 (2017)
  13. Woo, S., Park, J., Lee, J.Y., et al.: CBAM: convolutional block attention module. arXiv preprint arXiv:1807.06521 (2018)
  14. Gong, Y., Liu, X.: Video summarization using singular value decomposition. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 174–180 (2000)
  15. Mohanta, P.P., Saha, S.K., Chanda, B.: A model-based shot boundary detection technique using frame transition parameter. IEEE Trans. Multimed. 14(1), 223–233 (2012)

Copyright information

© Springer Nature Singapore Pte Ltd. 2020

Authors and Affiliations

  1. Shanghai Institute for Advanced Communication and Data Science, Shanghai, China
  2. School of Communication and Information Engineering, Shanghai University, Shanghai, China