
Audio-Visual TV Broadcast Signal Segmentation

  • Conference paper
  • Man-Machine Interactions 6 (ICMMI 2019)

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1061)


Abstract

Research on the transcription and indexing of audio-visual broadcast programs has been under way for more than 20 years. Great progress has been made, mainly in transcribing broadcasts from the audio signal. Over the last 10 years this research has intensified, mainly due to the use of deep and convolutional neural networks and to large IT companies (Google, Microsoft, IBM, Amazon) that can rely on large in-house multimedia databases. A very important part of any system for audio-visual broadcast signal transcription is the signal segmentation subsystem. Signal segmentation is usually performed separately for the audio and the visual signal. In this paper, a methodology for audio-visual broadcast signal segmentation is presented and described, in which the results of audio signal segmentation are used to improve visual signal segmentation.
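The abstract does not specify how the audio segmentation result is fed into visual segmentation. As a minimal sketch, assuming a simple late-fusion rule that is not necessarily the author's method, a histogram-difference shot-cut detector can lower its decision threshold near audio segment boundaries (for example speaker changes or speech/music transitions), so that weaker visual changes coinciding with an audio change are still reported as cuts. The function names, the thresholds `base_threshold` and `audio_threshold`, and the tolerance window `window_s` are hypothetical.

```python
from typing import List, Sequence


def frame_differences(histograms: Sequence[Sequence[float]]) -> List[float]:
    """L1 distance between consecutive normalized frame histograms."""
    return [
        sum(abs(a - b) for a, b in zip(prev, cur))
        for prev, cur in zip(histograms, histograms[1:])
    ]


def detect_shot_cuts(
    histograms: Sequence[Sequence[float]],
    audio_boundaries: Sequence[float],  # audio segment boundaries in seconds (assumed input)
    fps: float,
    base_threshold: float = 0.5,   # hypothetical threshold far from audio boundaries
    audio_threshold: float = 0.3,  # hypothetical, lower threshold near audio boundaries
    window_s: float = 0.5,         # hypothetical tolerance around each audio boundary
) -> List[int]:
    """Return frame indices i such that a cut is detected between frames i-1 and i.

    Sketch of audio-guided visual segmentation: the visual decision threshold
    is relaxed when the candidate cut lies close to an audio boundary.
    """
    cuts = []
    for i, diff in enumerate(frame_differences(histograms), start=1):
        t = i / fps  # timestamp of frame i
        near_audio = any(abs(t - b) <= window_s for b in audio_boundaries)
        threshold = audio_threshold if near_audio else base_threshold
        if diff > threshold:
            cuts.append(i)
    return cuts


if __name__ == "__main__":
    # Toy 3-bin histograms: a weak change at frame 2, a strong change at frame 4.
    hists = [
        [1.0, 0.0, 0.0],
        [1.0, 0.0, 0.0],
        [0.8, 0.2, 0.0],
        [0.8, 0.2, 0.0],
        [0.0, 0.0, 1.0],
    ]
    print(detect_shot_cuts(hists, audio_boundaries=[2.0], fps=1.0))  # [2, 4]
    print(detect_shot_cuts(hists, audio_boundaries=[], fps=1.0))     # [4]
```

Relaxing the threshold only near audio boundaries leaves the visual detector's behaviour unchanged elsewhere, while low-contrast or gradual transitions that coincide with an audio change become easier to detect.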



Acknowledgements

The research was supported by the Technology Agency of the Czech Republic in project no. TH03010018.

Author information

Correspondence to Josef Chaloupka.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper


Cite this paper

Chaloupka, J. (2020). Audio-Visual TV Broadcast Signal Segmentation. In: Gruca, A., Czachórski, T., Deorowicz, S., Harężlak, K., Piotrowska, A. (eds) Man-Machine Interactions 6. ICMMI 2019. Advances in Intelligent Systems and Computing, vol 1061. Springer, Cham. https://doi.org/10.1007/978-3-030-31964-9_21
