Abstract
Slide transition detection locates the frames where slide content changes; these frames form a summary of a lecture video and save viewers the time of watching it in full. 3D Convolutional Networks (3D ConvNets) are regarded as an effective approach to learning spatio-temporal features in videos. However, a 3D ConvNet gives the same weight to all features in an image and cannot focus on the key feature information. We address this problem with an attention mechanism, which highlights informative features while suppressing irrelevant ones. Furthermore, 3D ConvNets usually require long training times and large amounts of memory. The Dual Path Network (DPN) combines the structures of ResNeXt and DenseNet and inherits the advantages of both: ResNeXt adds the input directly to the convolved output, reusing features extracted by earlier layers, while DenseNet concatenates the outputs of preceding layers to the input of each layer, encouraging the extraction of new features. Building on both, DPN not only saves training time and memory but also extracts more effective features and improves training results. We therefore present a novel ConvNet architecture based on the Convolutional Block Attention Module and DPN for slide transition detection in lecture videos. Experimental results show that the proposed architecture achieves better results than other slide-detection approaches.
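The interplay of the two paths described above can be sketched in plain NumPy. This is a minimal illustration, not the authors' implementation: `toy_transform` stands in for a convolutional block, the channel splits are arbitrary, and `channel_attention` is a heavily simplified stand-in for CBAM-style channel gating.

```python
import numpy as np

def toy_transform(x, out_channels, seed=0):
    # Stand-in for a conv block: a fixed random linear map over channels.
    rng = np.random.default_rng(seed)
    w = rng.standard_normal((x.shape[-1], out_channels))
    return x @ w

def dual_path_block(x, residual_channels, dense_growth):
    """Illustrative DPN block.

    The first `residual_channels` of the transform output are added back
    to the input (ResNeXt-style residual path, feature reuse); the
    remaining `dense_growth` channels are concatenated (DenseNet-style
    path, new features), so the feature width grows by `dense_growth`.
    """
    y = toy_transform(x, residual_channels + dense_growth)
    res = x[:, :residual_channels] + y[:, :residual_channels]   # reuse old features
    new_dense = y[:, residual_channels:]                        # accumulate new ones
    return np.concatenate([res, x[:, residual_channels:], new_dense], axis=1)

def channel_attention(x):
    # Simplified channel gate: sigmoid of the per-channel mean reweights
    # channels, suppressing weak ones (real CBAM uses pooled features
    # passed through a shared MLP).
    gate = 1.0 / (1.0 + np.exp(-x.mean(axis=0, keepdims=True)))
    return x * gate
```

Stacking `dual_path_block` keeps the residual width fixed while the dense accumulation grows layer by layer, which is the memory-saving property the abstract attributes to DPN.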
Acknowledgment
This work was supported by the National Natural Science Foundation of China (No. 61601278) and by the "Chen Guang" project of the Shanghai Municipal Education Commission and the Shanghai Education Development Foundation (No. 17CG41).
Copyright information
© 2020 Springer Nature Singapore Pte Ltd.
Cite this paper
Guan, M., Li, K., Ma, R., An, P. (2020). Convolutional-Block-Attention Dual Path Networks for Slide Transition Detection in Lecture Videos. In: Zhai, G., Zhou, J., Yang, H., An, P., Yang, X. (eds) Digital TV and Wireless Multimedia Communication. IFTC 2019. Communications in Computer and Information Science, vol 1181. Springer, Singapore. https://doi.org/10.1007/978-981-15-3341-9_9
Publisher Name: Springer, Singapore
Print ISBN: 978-981-15-3340-2
Online ISBN: 978-981-15-3341-9