Detection of Dialogue in Movie Soundtrack for Speech Intelligibility Enhancement

  • Kuba Łopatka
Part of the Communications in Computer and Information Science book series (CCIS, volume 429)


A method for detecting dialogue in 5.1 movie soundtracks, based on interchannel spectral disparity, is presented. The front-channel signals (left, right and center) are analyzed in the frequency domain. Partials of the center-channel signal that exhibit high disparity with respect to the left and right channels are classified as dialogue. These dialogue frequency components are then boosted to increase dialogue intelligibility. Techniques for reducing artifacts in the processed signal are also introduced: smoothing is applied in both the time domain and the frequency domain to suppress unpleasant artifacts. Results of objective tests are provided, showing that the proposed algorithm increases dialogue intelligibility. The algorithm is particularly suited to mobile devices used for listening in mobile conditions.
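The detection-and-boost chain described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation. The disparity measure (center level relative to the louder of left/right, in dB), the threshold, boost and smoothing constants, and all function names (`disparity_db`, `dialogue_gains`, `smooth_frequency`, `enhance_center`) are hypothetical. The sketch assumes the STFT magnitude spectra of the three front channels have already been computed; the forward and inverse transforms and phase handling are omitted.

```python
import math
from typing import List, Optional

# Hypothetical sketch: the disparity measure and all parameter values below are
# assumptions for illustration, not taken from the paper.
Spectrum = List[float]  # magnitudes of one STFT frame, one value per frequency bin


def disparity_db(center: Spectrum, left: Spectrum, right: Spectrum,
                 eps: float = 1e-12) -> List[float]:
    # Assumed disparity: level of each center bin relative to the louder of the
    # corresponding left/right bins, in dB.
    return [20.0 * math.log10((c + eps) / (max(lv, rv) + eps))
            for c, lv, rv in zip(center, left, right)]


def dialogue_gains(center: Spectrum, left: Spectrum, right: Spectrum,
                   threshold_db: float = 6.0, boost_db: float = 6.0) -> List[float]:
    # Bins whose disparity exceeds the threshold are treated as dialogue and
    # boosted; all other bins keep unity gain.
    boost = 10.0 ** (boost_db / 20.0)
    return [boost if d > threshold_db else 1.0
            for d in disparity_db(center, left, right)]


def smooth_frequency(gains: List[float], half_width: int = 1) -> List[float]:
    # Frequency smoothing: moving average over neighbouring bins, with the
    # window truncated at the spectrum edges.
    n = len(gains)
    out = []
    for k in range(n):
        lo, hi = max(0, k - half_width), min(n, k + half_width + 1)
        out.append(sum(gains[lo:hi]) / (hi - lo))
    return out


def enhance_center(frames_c: List[Spectrum], frames_l: List[Spectrum],
                   frames_r: List[Spectrum], alpha: float = 0.8,
                   threshold_db: float = 6.0, boost_db: float = 6.0,
                   half_width: int = 1) -> List[Spectrum]:
    # Returns the boosted center-channel magnitude spectra, frame by frame.
    prev: Optional[List[float]] = None
    enhanced = []
    for c, lf, rf in zip(frames_c, frames_l, frames_r):
        g = smooth_frequency(dialogue_gains(c, lf, rf, threshold_db, boost_db),
                             half_width)
        if prev is not None:
            # Time smoothing: first-order exponential smoothing of each bin's gain.
            g = [alpha * p + (1.0 - alpha) * gk for p, gk in zip(prev, g)]
        prev = g
        enhanced.append([gk * ck for gk, ck in zip(g, c)])
    return enhanced
```

For example, `enhance_center([[1.0], [1.0]], [[1.0], [0.1]], [[1.0], [0.1]])` leaves the first frame unchanged (no disparity) and raises the second frame's gain only partway toward the 6 dB boost, because exponential smoothing with `alpha=0.8` ramps the gain up over several frames. A larger `alpha` makes gain changes slower and less audible, at the cost of a slower response to dialogue onsets.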


Keywords: speech intelligibility, center channel extraction, speech processing, 5.1 downmix

Copyright information

© Springer International Publishing Switzerland 2014

Authors and Affiliations

  • Kuba Łopatka
    Faculty of Electronics, Telecommunication and Informatics, Multimedia Systems Department, Gdansk University of Technology, Gdansk, Poland
