Optimization of sound fields reproduction based Higher-Order Ambisonics (HOA) using the Generative Adversarial Network (GAN)

Multimedia Tools and Applications

Abstract

Sound field reproduction using Higher-Order Ambisonics (HOA) has been widely studied in recent years. However, HOA reproduces a sound field with the least-squares solution for the spherical harmonic (SH) coefficients, not for the global sound field. In this paper, we try to reduce the reproduction error with a data-driven method. As is well known, a Generative Adversarial Network (GAN) can generate data similar to a given data set. In the proposed approach, a GAN converts the target sound fields into sound fields that can be reproduced accurately. The data set of target sound fields is updated with the generated fields that have lower reproduction error, and thus the reproduction error is reduced. We simulated the performance with four loudspeakers: sound fields with 4 orders of SH coefficients were reproduced at 1000 Hz with GAN and HOA, giving average reproduction errors of 0.3 and 0.6, respectively. The simulations show that our method narrows the gap between the least-squares solution and the optimized solution, and thereby improves the performance of HOA.
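
To make the least-squares step concrete, the following is a minimal sketch of mode matching in the SH domain. It is not the simulation from the paper: it assumes real SH truncated at order 1 (ACN ordering, N3D normalization), a hypothetical tetrahedral layout of four loudspeakers, and a plane-wave target, and it ignores the frequency-dependent radial terms. The function names are illustrative only.

```python
import numpy as np

def real_sh_order1(dirs):
    """Real SH up to order 1 (ACN order, N3D normalization) for unit vectors, shape (n, 3)."""
    dirs = np.atleast_2d(np.asarray(dirs, dtype=float))
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    s3 = np.sqrt(3.0)
    # Columns: Y_0^0, Y_1^-1, Y_1^0, Y_1^1
    return np.stack([np.ones_like(x), s3 * y, s3 * z, s3 * x], axis=1)

def mode_matching_gains(speaker_dirs, target_coeffs):
    """Least-squares loudspeaker gains that match the target SH coefficients."""
    Y = real_sh_order1(speaker_dirs).T  # shape (n_coeffs, n_speakers)
    gains, *_ = np.linalg.lstsq(Y, target_coeffs, rcond=None)
    return gains

def normalized_error(speaker_dirs, gains, target_coeffs):
    """Squared coefficient-domain error, normalized by the target energy."""
    reproduced = real_sh_order1(speaker_dirs).T @ gains
    return np.linalg.norm(reproduced - target_coeffs) ** 2 / np.linalg.norm(target_coeffs) ** 2

# Assumed layout: four loudspeakers at the vertices of a regular tetrahedron.
speakers = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3.0)

# Assumed target: SH coefficients of a plane wave arriving from the +x direction.
target = real_sh_order1([1.0, 0.0, 0.0])[0]

gains = mode_matching_gains(speakers, target)
err = normalized_error(speakers, gains, target)
print("gains:", gains)
print("coefficient-domain error:", err)
```

In this sketch the error is measured on the truncated SH coefficients only. A small coefficient-domain error therefore does not guarantee a small error over the whole sound field, which is the gap the abstract refers to.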

Acknowledgements

This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. U1736206, No. 61761044), and the Hubei Province Technological Innovation Major Project (No. 2017AAA123).

Author information

Corresponding author

Correspondence to Xiaochen Wang.

About this article

Cite this article

Zhang, L., Wang, X., Hu, R. et al. Optimization of sound fields reproduction based Higher-Order Ambisonics (HOA) using the Generative Adversarial Network (GAN). Multimed Tools Appl 80, 2205–2220 (2021). https://doi.org/10.1007/s11042-020-09735-3

  • DOI: https://doi.org/10.1007/s11042-020-09735-3
