Optimization of sound fields reproduction based Higher-Order Ambisonics (HOA) using the Generative Adversarial Network (GAN)

Zhang, Lingkun; Wang, Xiaochen; Hu, Ruimin; Li, Dengshi; Tu, Weipin

doi:10.1007/s11042-020-09735-3

Published: 12 September 2020

Volume 80, pages 2205–2220, (2021)

Multimedia Tools and Applications

Lingkun Zhang^1,2,
Xiaochen Wang ORCID: orcid.org/0000-0002-1904-2097^1,2,
Ruimin Hu^1,2,
Dengshi Li^1,3 &
Weipin Tu^1,3

Abstract

Sound field reproduction using Higher-Order Ambisonics (HOA) has been studied extensively in recent years. However, HOA reproduces a sound field from the least-squares solution for the spherical harmonic (SH) coefficients rather than by matching the global sound field. In this paper, we try to reduce the reproduction error with a data-driven method. Generative Adversarial Networks (GANs) are known to generate data similar to a given data set. In the proposed approach, a GAN converts the target sound fields into sound fields that can be reproduced accurately. The data set of target sound fields is updated with the generated fields that have lower reproduction error, and the reproduction error is thereby reduced. We simulated the performance with four loudspeakers: sound fields described by fourth-order SH coefficients were reproduced at 1000 Hz with the GAN and with HOA, giving average reproduction errors of 0.3 and 0.6, respectively. The simulations show that our method narrows the gap between the least-squares solution and the optimized solution, and thus improves the performance of HOA.
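The distinction the abstract draws, between a least-squares fit of the harmonic coefficients and a fit of the sound field itself, can be illustrated with a small numerical sketch. This is not the authors' GAN method or their simulation setup: it assumes a simplified 2D analogue in which circular harmonics stand in for spherical harmonics, the four loudspeakers are idealised as plane-wave sources, and the listening region is a disc of 0.1 m radius. All function names and parameter values are hypothetical, chosen only for illustration.

```python
import numpy as np

def mode_matrix(angles, order):
    """Circular-harmonic terms exp(-1j*m*angle), rows m = -order..order, one column per direction."""
    m = np.arange(-order, order + 1)[:, None]
    return np.exp(-1j * m * np.asarray(angles, dtype=float)[None, :])

def plane_wave(points, angle, k):
    """Pressure of a unit plane wave arriving from `angle`, sampled at 2D points of shape (P, 2)."""
    direction = np.array([np.cos(angle), np.sin(angle)])
    return np.exp(1j * k * (points @ direction))

def disc_points(radius, n=21):
    """Grid points inside a disc of the given radius (assumed listening region)."""
    axis = np.linspace(-radius, radius, n)
    xx, yy = np.meshgrid(axis, axis)
    pts = np.column_stack([xx.ravel(), yy.ravel()])
    return pts[np.hypot(pts[:, 0], pts[:, 1]) <= radius]

def relative_error(p_rep, p_target):
    """Normalised squared error between reproduced and target fields."""
    return np.sum(np.abs(p_rep - p_target) ** 2) / np.sum(np.abs(p_target) ** 2)

def compare(target_angle, speaker_angles, order, k, points):
    """Return (error of coefficient-domain fit, error of field-domain fit)."""
    # Field of each loudspeaker at every sample point, shape (P, L).
    fields = np.stack([plane_wave(points, a, k) for a in speaker_angles], axis=1)
    p_target = plane_wave(points, target_angle, k)

    # Coefficient-domain least squares (mode matching): for plane waves the
    # radial terms are common to both sides, so only exp(-1j*m*angle) remains.
    Y = mode_matrix(speaker_angles, order)
    b = mode_matrix([target_angle], order)[:, 0]
    w_coef = np.linalg.lstsq(Y, b, rcond=None)[0]

    # Field-domain least squares: minimise the pressure error at the points directly.
    w_field = np.linalg.lstsq(fields, p_target, rcond=None)[0]

    return (relative_error(fields @ w_coef, p_target),
            relative_error(fields @ w_field, p_target))

speakers = np.deg2rad([0.0, 90.0, 180.0, 270.0])  # four loudspeakers (assumed layout)
k = 2 * np.pi * 1000.0 / 343.0                     # wavenumber at 1000 Hz, c = 343 m/s
points = disc_points(0.1)
err_coef, err_field = compare(np.deg2rad(45.0), speakers, 4, k, points)
print(f"coefficient-domain error: {err_coef:.3f}")
print(f"field-domain error:       {err_field:.3f}")
```

In this sketch the field-domain fit can never have a larger error than the coefficient-domain fit on the sampled points, since it minimises that error directly. The numbers it prints are not comparable to the 0.3 and 0.6 reported in the abstract, which come from the authors' own simulation.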

Acknowledgements

This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), the National Natural Science Foundation of China (No. U1736206, No. 61761044), and the Hubei Province Technological Innovation Major Project (No. 2017AAA123).

Author information

Authors and Affiliations

National Engineering Research Center for Multimedia Software, School of Computer Science, Wuhan University, Wuhan, 430072, China
Lingkun Zhang, Xiaochen Wang, Ruimin Hu, Dengshi Li & Weipin Tu
Hubei Key Laboratory of Multimedia and Network Communication Engineering, Wuhan University, Wuhan, 430072, China
Lingkun Zhang, Xiaochen Wang & Ruimin Hu
Collaborative Innovation Center of Geospatial Technology, Wuhan, 430079, China
Dengshi Li & Weipin Tu

Corresponding author

Correspondence to Xiaochen Wang.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zhang, L., Wang, X., Hu, R. et al. Optimization of sound fields reproduction based Higher-Order Ambisonics (HOA) using the Generative Adversarial Network (GAN). Multimed Tools Appl 80, 2205–2220 (2021). https://doi.org/10.1007/s11042-020-09735-3

Received: 17 January 2020
Revised: 03 August 2020
Accepted: 26 August 2020
Published: 12 September 2020
Issue Date: January 2021
DOI: https://doi.org/10.1007/s11042-020-09735-3
