Detection of Dialogue in Movie Soundtrack for Speech Intelligibility Enhancement

  • Conference paper
Multimedia Communications, Services and Security (MCSS 2014)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 429)

Abstract

A method for detecting dialogue in a 5.1 movie soundtrack, based on interchannel spectral disparity, is presented. The front channel signals (left, right, and center) are analyzed in the frequency domain. Selected partials of the center channel signal that show high disparity with the left and right channels are classified as dialogue. The dialogue frequency components are then boosted to increase dialogue intelligibility. Techniques for reducing unpleasant artifacts in the processed signal are also introduced: smoothing is applied in both the time domain and the frequency domain. Results of objective tests are provided, which show that the proposed algorithm increases dialogue intelligibility. The algorithm is particularly suited to mobile devices used in mobile listening conditions.
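The processing chain outlined in the abstract can be illustrated with a minimal sketch for a single analysis frame. This is not the author's implementation: the abstract does not define the disparity measure, the thresholds, or the smoothing filters, so everything below is an assumption made for illustration. In particular, the disparity is taken here as the level ratio (in dB) between a center-channel bin and the louder of the corresponding left and right bins, the frequency smoothing is a moving average, and the time smoothing is a first-order recursive average. The function name `dialogue_gain_mask` and all of its parameters are hypothetical.

```python
import numpy as np


def dialogue_gain_mask(left, right, center, threshold_db=6.0, boost_db=6.0,
                       rel_floor=0.05, freq_smooth_bins=3, time_alpha=0.5,
                       prev_gain=None):
    """Return a per-bin linear gain for one frame of the center channel.

    All parameters are illustrative assumptions, not values from the paper.
    """
    if freq_smooth_bins % 2 == 0:
        raise ValueError("freq_smooth_bins must be odd")

    # Magnitude spectra of the three front channels.
    window = np.hanning(len(center))
    mag_l = np.abs(np.fft.rfft(left * window))
    mag_r = np.abs(np.fft.rfft(right * window))
    mag_c = np.abs(np.fft.rfft(center * window))

    # Assumed disparity measure: center level over the louder side bin, in dB.
    eps = 1e-12
    disparity_db = 20.0 * np.log10((mag_c + eps) / (np.maximum(mag_l, mag_r) + eps))

    # Only consider center partials that carry meaningful energy (assumed floor).
    significant = mag_c > rel_floor * mag_c.max()
    is_dialogue = significant & (disparity_db > threshold_db)

    # Boost the detected dialogue bins, leave the rest at unity gain.
    gain = np.where(is_dialogue, 10.0 ** (boost_db / 20.0), 1.0)

    # Frequency-domain smoothing: moving average with edge padding.
    half = freq_smooth_bins // 2
    padded = np.pad(gain, half, mode="edge")
    kernel = np.ones(freq_smooth_bins) / freq_smooth_bins
    gain = np.convolve(padded, kernel, mode="valid")

    # Time-domain smoothing: recursive average with the previous frame's gain.
    if prev_gain is not None:
        gain = time_alpha * gain + (1.0 - time_alpha) * prev_gain
    return gain
```

In a full system the returned gain would be applied to the center-channel spectrum of each frame before resynthesis, with `prev_gain` carried over from the preceding frame. Smoothing the gain across neighboring bins and frames is one common way to limit abrupt spectral changes, which is the kind of artifact the abstract refers to.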

Copyright information

© 2014 Springer International Publishing Switzerland

About this paper

Cite this paper

Łopatka, K. (2014). Detection of Dialogue in Movie Soundtrack for Speech Intelligibility Enhancement. In: Dziech, A., Czyżewski, A. (eds) Multimedia Communications, Services and Security. MCSS 2014. Communications in Computer and Information Science, vol 429. Springer, Cham. https://doi.org/10.1007/978-3-319-07569-3_12

  • DOI: https://doi.org/10.1007/978-3-319-07569-3_12

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-07568-6

  • Online ISBN: 978-3-319-07569-3

  • eBook Packages: Computer Science, Computer Science (R0)
