Abstract
Research on the transcription and indexing of audio-visual broadcast programs has been under way for more than 20 years. Great progress has been made, mainly in transcribing broadcasts from the audio signal. Over the last 10 years this research has intensified, mainly due to the use of deep and convolutional neural networks and to the involvement of large IT companies (Google, Microsoft, IBM, Amazon) that can rely on large multimedia databases of their own. A very important part of a system for audio-visual broadcast transcription is the signal segmentation subsystem. Segmentation is usually solved separately for the audio signal and the visual signal. This paper presents and describes a methodology for audio-visual broadcast signal segmentation in which the result of audio signal segmentation is used to improve visual signal segmentation.
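The abstract does not say how the audio result feeds into the visual segmentation, so the following is only a minimal sketch of one possible fusion rule, not the method of the paper. It assumes that the visual segmenter yields candidate shot boundaries with a confidence score, that the audio segmenter yields change-point times, and that a visual candidate lying close to an audio change point may be accepted with a lower threshold. The function name `fuse_boundaries`, the thresholds, and the tolerance are all hypothetical.

```python
from typing import List, Tuple


def fuse_boundaries(
    visual_candidates: List[Tuple[float, float]],
    audio_changes: List[float],
    base_threshold: float = 0.6,
    relaxed_threshold: float = 0.4,
    tolerance: float = 0.5,
) -> List[float]:
    """Accept visual shot-boundary candidates, relaxing the score
    threshold for candidates that lie near an audio change point.

    visual_candidates: (time in seconds, score in [0, 1]) pairs (assumed format).
    audio_changes: audio change-point times in seconds (assumed format).
    All thresholds are illustrative values, not taken from the paper.
    """
    accepted = []
    for time_s, score in visual_candidates:
        # A candidate is "supported" if any audio change is within the tolerance.
        near_audio = any(abs(time_s - a) <= tolerance for a in audio_changes)
        threshold = relaxed_threshold if near_audio else base_threshold
        if score >= threshold:
            accepted.append(time_s)
    return accepted


# Hypothetical example data.
candidates = [(1.0, 0.7), (5.2, 0.5), (9.0, 0.5), (12.0, 0.3)]
audio = [5.0, 12.1]
print(fuse_boundaries(candidates, audio))  # [1.0, 5.2]
```

In this toy input the candidate at 5.2 s is accepted only because an audio change at 5.0 s supports it, while the equally scored candidate at 9.0 s has no audio support and is rejected.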
Acknowledgements
The research was supported by the Technology Agency of the Czech Republic in project no. TH03010018.
Copyright information
© 2020 Springer Nature Switzerland AG
About this paper
Cite this paper
Chaloupka, J. (2020). Audio-Visual TV Broadcast Signal Segmentation. In: Gruca, A., Czachórski, T., Deorowicz, S., Harężlak, K., Piotrowska, A. (eds) Man-Machine Interactions 6. ICMMI 2019. Advances in Intelligent Systems and Computing, vol 1061. Springer, Cham. https://doi.org/10.1007/978-3-030-31964-9_21
DOI: https://doi.org/10.1007/978-3-030-31964-9_21
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31963-2
Online ISBN: 978-3-030-31964-9
eBook Packages: Intelligent Technologies and Robotics (R0)