3D Spatial Sound Individualization with Perceptual Feedback

Part of the Human–Computer Interaction Series book series (HCIS)

Abstract

Designing an interactive system that is appropriately tailored to each user's physical and cognitive characteristics is important for providing an optimal user experience. In this chapter, we discuss how such problems can be addressed by leveraging modern interactive machine learning techniques. As a case study, we introduce a method to individualize 3D spatial sound rendering with perceptual feedback. 3D spatial sound rendering has traditionally required time-consuming measurement of each individual user with an expensive device. By taking a data-driven approach, one can replace such expensive measurement with a simple calibration. We first describe how to train a generic deep learning model on an existing measured data set. We then describe how to adapt the model to a specific user through a simple calibration process consisting of pairwise comparisons. Through this case study, readers will gain insight into how to adapt an interactive system to a specific user's characteristics, taking advantage of the high expressiveness of modern machine learning techniques.
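To make the idea of calibration by pairwise comparisons concrete, the following is a minimal sketch and not the chapter's actual algorithm. It assumes that the user-specific part of a trained model can be summarized by a small vector, and it adapts that vector using only "which of these two is better?" answers. The function names `calibrate` and `make_simulated_user`, the simple accept-if-preferred search, and the simulated listener (who prefers whichever candidate is closer to a hidden target) are all illustrative assumptions.

```python
import math
import random


def calibrate(prefers_first, dim, rounds=200, step=0.5, seed=0):
    """Adapt a user vector using only pairwise "A or B?" answers.

    prefers_first(a, b) is a hypothetical query that returns True when
    the user judges candidate a to be better than candidate b.
    """
    rng = random.Random(seed)
    best = [0.0] * dim  # start from a generic (non-individualized) setting
    for _ in range(rounds):
        # Propose a random perturbation of the current best vector.
        candidate = [x + rng.gauss(0.0, step) for x in best]
        if prefers_first(candidate, best):
            best = candidate  # the user preferred the new candidate
        else:
            step *= 0.98  # narrow the search after a rejected proposal
    return best


def make_simulated_user(target):
    """Stand-in for a real listener: prefers the vector nearer to target."""
    def prefers_first(a, b):
        return math.dist(a, target) < math.dist(b, target)
    return prefers_first


# Hypothetical hidden "ideal" parameters for one user.
target = [1.0, -2.0, 0.5]
user = make_simulated_user(target)

start_error = math.dist([0.0] * 3, target)
adapted = calibrate(user, dim=3)
final_error = math.dist(adapted, target)
print(f"error before: {start_error:.3f}, after: {final_error:.3f}")
```

In a real system the simulated user would be replaced by an actual listener comparing two rendered sounds, and the search strategy would likely be more sample-efficient than this simple accept-if-preferred loop, since each comparison costs the user time.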

  • DOI: 10.1007/978-3-030-82681-9_17
  • Chapter length: 25 pages

Author information

Corresponding author

Correspondence to Kazuhiko Yamamoto.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Yamamoto, K., Igarashi, T. (2021). 3D Spatial Sound Individualization with Perceptual Feedback. In: Li, Y., Hilliges, O. (eds) Artificial Intelligence for Human Computer Interaction: A Modern Approach. Human–Computer Interaction Series. Springer, Cham. https://doi.org/10.1007/978-3-030-82681-9_17

  • DOI: https://doi.org/10.1007/978-3-030-82681-9_17

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-82680-2

  • Online ISBN: 978-3-030-82681-9

  • eBook Packages: Computer Science, Computer Science (R0)