Audio-Visual TV Broadcast Signal Segmentation

Chaloupka, Josef

doi:10.1007/978-3-030-31964-9_21

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1061)

Included in the following conference series:

International Conference on Man–Machine Interactions

Abstract

Research on the transcription and indexing of audio-visual broadcast programs has been under way for more than 20 years. Most of the progress has been made in broadcast transcription from the audio signal. Over the last 10 years this research has intensified, mainly owing to the use of deep or convolutional neural networks and to large IT companies (Google, Microsoft, IBM, Amazon) that can rely on a large number of custom embedded multimedia databases. A very important part of a system for audio-visual broadcast signal transcription is the subsystem for signal segmentation. Segmentation is usually solved separately for the audio signal and the visual signal. This paper presents and describes a methodology for audio-visual broadcast signal segmentation in which the result of audio signal segmentation is used to improve visual signal segmentation.
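
The abstract does not say how the audio result feeds into the visual segmentation, so the following is only an illustrative sketch of one conceivable coupling, not the method of the paper. It assumes a per-frame visual dissimilarity score in the range 0 to 1 and a list of audio change times in seconds; the function name, the thresholds, and the window length are all hypothetical. The idea shown is that a weaker visual change is accepted as a shot boundary when an audio change was reported close to it in time.

```python
def fuse_boundaries(visual_scores, audio_changes_s, fps=25.0,
                    base_threshold=0.5, relaxed_threshold=0.3, window_s=0.5):
    """Return indices of frames flagged as shot boundaries.

    All names and default values are assumptions made for this sketch.
    visual_scores[i]  : dissimilarity between frame i-1 and frame i (0..1)
    audio_changes_s   : times in seconds where an audio segmenter reported a change
    """
    boundaries = []
    for i, score in enumerate(visual_scores):
        t = i / fps  # time stamp of frame i in seconds
        # Is there an audio change within +/- window_s of this frame?
        near_audio = any(abs(t - a) <= window_s for a in audio_changes_s)
        # Use the lower threshold only where the audio supports a change.
        threshold = relaxed_threshold if near_audio else base_threshold
        if score >= threshold:
            boundaries.append(i)
    return boundaries


# Toy input at 1 frame per second so that frame index equals time in seconds.
scores = [0.0, 0.1, 0.4, 0.1, 0.4, 0.6, 0.1]
print(fuse_boundaries(scores, audio_changes_s=[2.0], fps=1.0))
print(fuse_boundaries(scores, audio_changes_s=[], fps=1.0))
```

In the toy input, frames 2 and 4 carry the same moderate score, but only frame 2 coincides with an audio change, so only it benefits from the relaxed threshold; frame 5 passes the base threshold without any audio support.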

Acknowledgements

The research was supported by the Technology Agency of the Czech Republic in project no. TH03010018.

Author information

Authors and Affiliations

The Institute of Information Technology and Electronics, Technical University of Liberec, Liberec, Czech Republic
Josef Chaloupka

Corresponding author

Correspondence to Josef Chaloupka.

Editor information

Editors and Affiliations

Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Aleksandra Gruca
Institute of Theoretical and Applied Informatics, Polish Academy of Sciences, Gliwice, Poland
Tadeusz Czachórski
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Sebastian Deorowicz
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Katarzyna Harężlak
Institute of Informatics, Silesian University of Technology, Gliwice, Poland
Agnieszka Piotrowska

About this paper

Cite this paper

Chaloupka, J. (2020). Audio-Visual TV Broadcast Signal Segmentation. In: Gruca, A., Czachórski, T., Deorowicz, S., Harężlak, K., Piotrowska, A. (eds) Man-Machine Interactions 6. ICMMI 2019. Advances in Intelligent Systems and Computing, vol 1061. Springer, Cham. https://doi.org/10.1007/978-3-030-31964-9_21

DOI: https://doi.org/10.1007/978-3-030-31964-9_21
Published: 22 September 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-31963-2
Online ISBN: 978-3-030-31964-9
eBook Packages: Intelligent Technologies and Robotics (R0)
