Acoustic Cues for the Perceptual Assessment of Surround Sound

  • Ingo Siegert
  • Oliver Jokisch
  • Alicia Flores Lotz
  • Franziska Trojahn
  • Martin Meszaros
  • Michael Maruschke
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10458)


Speech and audio codecs are implemented in a variety of multimedia applications, and multichannel sound is now offered by the first streaming and cloud-based services. Besides perceptual quality, coding-related research focuses on low bitrate and minimal latency. The IETF-standardized Opus codec provides high perceptual quality, low latency, and the capability of coding multiple channels in various audio bandwidths up to fullband (20 kHz). In a previous perceptual study on Opus-processed 5.1 surround sound, uncompressed and degraded stimuli were rated on a five-point degradation category scale (DMOS) for six channels at total bitrates between 96 and 192 kbit/s. That study revealed that the perceived quality depends on the music characteristics. In the current study, we analyze spectral and music-feature differences between the five music stimuli of that study, at three coding bitrates and as uncompressed sound, to identify objective causes for the perceptual differences. The results show that samples with annoying audible degradations exhibit larger spectral differences within the LFE channel as well as highly uncorrelated line spectral pairs (LSPs).
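The kind of comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis pipeline: the log-spectral distance and the Pearson correlation used here are generic stand-ins, the function names are hypothetical, and the inputs are assumed to be equal-length, time-aligned frames of one channel (for example the LFE channel) taken from the uncompressed and the coded signal.

```python
import cmath
import math


def magnitude_spectrum(frame):
    """Return the DFT magnitudes of a real-valued frame for bins 0..N/2."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                  for t, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def log_spectral_distance(reference, degraded, floor=1e-9):
    """RMS difference in dB between the log power spectra of two frames.

    `floor` is an assumed small constant that keeps empty bins finite.
    """
    if len(reference) != len(degraded):
        raise ValueError("frames must have equal length")
    ref = magnitude_spectrum(reference)
    deg = magnitude_spectrum(degraded)
    squared = [
        (10.0 * math.log10((r * r + floor) / (d * d + floor))) ** 2
        for r, d in zip(ref, deg)
    ]
    return math.sqrt(sum(squared) / len(squared))


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


if __name__ == "__main__":
    # Synthetic stand-in for one channel: a tone and an attenuated copy.
    tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
    quiet = [0.5 * x for x in tone]
    print(log_spectral_distance(tone, quiet))
    print(pearson(tone, quiet))
```

Under these assumptions, a larger distance for one channel would indicate stronger spectral deviation from the uncompressed reference, and a correlation near zero between corresponding feature trajectories would indicate that the coded signal no longer follows the reference. The naive DFT keeps the sketch dependency-free; it is only practical for short frames.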


Opus · Music coding · Surround sound · Spectral features · Perception



This work was partly carried out within the Transregional Collaborative Research Centre SFB/TRR 62 “Companion Technology for Cognitive Technical Systems” funded by the German Research Foundation (DFG) (



Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  • Ingo Siegert (1)
  • Oliver Jokisch (2)
  • Alicia Flores Lotz (1)
  • Franziska Trojahn (2)
  • Martin Meszaros (2)
  • Michael Maruschke (2)

  1. Cognitive Systems Group, Institute of Information and Communication Engineering, Otto von Guericke University, Magdeburg, Germany
  2. Institute of Communications Engineering, Leipzig University of Telecommunications, Leipzig, Germany
