Abstract
Sound field reproduction using Higher-order Ambisonics (HOA) has many studies in recent years. However, in the HOA, sound fields are reproduced with the least square solution of spherical harmonics (SH) coefficients and not the global sound fields. In this paper, we try to reduce the reproduction error with a data-driven method. As we all known, the Generative Adversarial Networks (GAN) can be used to generate data similar to a data set. With the GAN, the target sound fields are converted to sound fields that can be reproduced accurately in the proposed approach. The data set of target sound fields is updated with the generated fields which have less reproduction error, and thus reproduction errors are reduced. We simulated the performance with four loudspeakers, sound fields of 4 orders SH coefficients are reproduced with GAN and HOA at 1000 Hz, with average reproduction errors of 0.3 and 0.6, respectively. Simulations show that the space between the least-square solution and the optimization solution is reduced with our method. Furthermore, the performances of HOA are optimized.
Similar content being viewed by others
References
Abhayapala TD, Ward DB (2002) .. In: IEEE International Conference on Acoustics, Speech, and Signal Processing, pp II–1949–II–1952
Ahrens J, Spors S (2008) .. In: 2008 IEEE international conference on acoustics, speech and signal processing, IEEE, pp 373–376
Ahrens J, Spors S (2011) Wave field synthesis of moving virtual sound sources with complex radiation properties. The Journal of the Acoustical Society of America 130(5):2807
Ando A (2010) Conversion of multichannel sound signal maintaining physical properties of sound in reproduced sound field. IEEE Transactions on Audio, Speech, and Language Processing 19(6):1467
Ari hrtf database homepage. http://www.kfs.oeaw.ac.at/hrtf. Last accessed 17 January 2020
Berkhout AJ, de Vries D, Vogel P (1993) Acoustic control by wave field synthesis. The Journal of the Acoustical Society of America 93(5):2764
Bi H, Li N, Guan H, Lu D, Yang L (2019) .. In: 2019 IEEE International Conference on Image Processing (ICIP), IEEE, pp 3876–3880
Bishop CM (2006) Pattern recognition and machine learning. Springer
Cai W, Wei Z (2020) Piigan: Generative adversarial networks for pluralistic image inpainting. IEEE Access 8:48451
Chollet F et al (2015) Keras. https://github.com/fchollet/keras
Creswell A, White T, Dumoulin V, Arulkumaran K, Sengupta B, Bharath AA (2018) Generative adversarial networks: an overview. IEEE Signal Proc Mag 35(1):53
Esmaeilpour M, Cardinal P, Koerich AL (2020) Unsupervised feature learning for environmental sound classification using weighted cycle-consistent generative adversarial network. Appl Soft Comput 86:105912
Fan DP, Wang W, Cheng MM, Shen J (2019) .. In: Proceedings of the IEEE conference on computer vision and pattern recognition, pp 8554–8564
Fernando T, Sridharan S, McLaren M, Priyasad D, Denman S, Fookes C (2020) Temporarily-aware context modeling using generative adversarial networks for speech activity detection. IEEE/ACM Transactions on Audio, Speech, and Language Processing 28:1159
Firtha G, Fiala P (2017) Wave field synthesis of moving sources with arbitrary trajectory and velocity profile. The Journal of the Acoustical Society of America 142(2):551
Fliege J Integration nodes for the sphere. http://www.personal.soton.ac.uk/jf1w07/nodes/nodes.html
Frank M, Sontacchi A (2017) Case study on ambisonics for multi-venue and multi-target concerts and broadcasts. J Audio Eng Soc 65(9):749
Fu K, Fan DP, Ji GP, Zhao Q (2020) .. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp 3052–3062
Fu K, Zhao Q, Gu IYH, Yang J (2019) Deepside: a general deep framework for salient object detection. Neurocomputing 356:69
Gerzon MA (1985) Ambisonics in multichannel broadcasting and video. J Audio Eng Soc 33(11):859
Goodfellow I, Pouget-Abadie J, Mirza M, Xu B, Warde-Farley D, Ozair S, Courville A, Bengio Y (2014) .. In: Advances in neural information processing systems, pp 2672–2680
Han Z, Wu M, Zhu Q, Yang J (2019) Three-dimensional wave-domain acoustic contrast control using a circular loudspeaker array. The Journal of the Acoustical Society of America 145(6):EL488
Huygens C (1920) Traité de la lumière:... (chez Pierre vander Aa marchand libraire
Kennedy RA, Sadeghi Abhayapala TD, Jones HM (2007) Intrinsic limits of dimensionality and richness in random multipath fields. IEEE Trans Signal Process 55(6):2542
Kentgens M, Jax P (2019) .. In: ICASSP 2019-2019 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 131–135
Kingma D, Ba J (2014) Adam: A method for stochastic optimization. arXiv:1412.6980
Kirkeby O, Nelson PA (1993) Reproduction of plane wave sound fields. The Journal of the Acoustical Society of America 94(5):2992
Lecomte P, Gauthier PA, Langrenne C, Berry A, Garcia A (2018) Cancellation of room reflections over an extended area using ambisonics. The Journal of the Acoustical Society of America 143(2):811
Li C, Guo C, Ren W, Cong R, Hou J, Kwong S, Tao D (2019) An underwater image enhancement benchmark dataset and beyond. IEEE Trans Image Process 29:4376
Li C, Wand M (2016) .. In: European conference on computer vision. Springer, pp 702–716
Ma J, Yu W, Liang P, Li C, Jiang J (2019) Fusiongan: a generative adversarial network for infrared and visible image fusion. Information Fusion 48:11
Nelson PA (1994) Active control of acoustic fields and the reproduction of sound. J Sound Vib 177(4):447
Okamoto T (2016) .. In: 2016 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp 326–330
Pan Z, Yu W, Yi X, Khan A, Yuan F, Zheng Y (2019) Recent progress on generative adversarial networks (gans): a survey. IEEE Access 7:36322
Rafaely B (2005) Analysis and design of spherical microphone arrays. IEEE Trans Speech and Audio Process 13(1):135
Ueno N, Koyama S, Saruwatari H (2019) Three-dimensional sound field reproduction based on weighted mode-matching method. IEEE/ACM Transactions on Audio, Speech, and Language Processing 27(12):1852
Wang S, Hu R, Chen S, Wang X, Yang Y, Tu W (2015) .. In: 2015 IEEE international conference on acoustics, speech and signal processing (ICASSP), IEEE, pp. 634–638
Wang W, Shen J, Yang R, Porikli F (2017) Saliency-aware video object segmentation. IEEE Trans Pattern Anal Mach Intel 40(1):20
Ward DB, Abhayapala TD (2001) Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Trans Speech and Audio process 9 (6):697
Williams EG (1999) Fourier acoustics: sound radiation and nearfield acoustical holography. (Academic Press
Wu YJ, Abhayapala TD (2009) Theory and design of soundfield reproduction using continuous loudspeaker concept. IEEE Transactions on Audio, Speech, and Language Processing 17(1):107
Xiang Y, Bao C (2020) A parallel-data-free speech enhancement method using multi- objective learning cycle-consistent generative adversarial network, IEEE/ACM Trans- actions on Audio, Speech, and Language Processing
Yu G, Wu R, Liu Y, Xie B (2018) Near-field head-related transfer-function measurement and database of human subjects. The Journal of the Acoustical Society of America 143(3):EL194
Zhang W, Abhayapala TD (2014) Three dimensional sound field reproduction using multiple circular loudspeaker arrays: Functional analysis guided approach. IEEE/ACM Transactions on Audio, Speech, and Language Processing 22(7):1184
Zhang J, Fan DP, Dai Y, Anwar S, Saleh FS, Zhang T, Barnes N (2020) .. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp 8582–8591
Zhang J, Zhang W, Abhayapala TD, Zhang L (2020) 2.5 d multizone reproduction using weighted mode matching: Performance analysis and experimental validation. The Journal of the Acoustical Society of America 147(3):1404
Zhao JX, Liu JJ, Fan DP, Cao Y, Yang J, Cheng MM (2019) .. In: Proceedings of the IEEE International Conference on Computer Vision, pp 8779–8788
Zhu JY, Park T, Isola P, Efros AA (2017) .. In: Proceedings of the IEEE international conference on computer vision, pp 2223–2232
Zhu Q, Qiu X, Coleman P, Burnett I (2020) A comparison between two modal domain methods for personal audio reproduction. The Journal of the Acoustical Society of America 147(1):161
Acknowledgements
This research is partially supported by the National Key R&D Program of China (No. 2017YFB1002803), National Nature Science Foundation of China (No. U1736206, No. 61761044), Hubei Province Technological Innovation Major Project (No. 2017AAA123).
Author information
Authors and Affiliations
Corresponding author
Additional information
Publisher’s note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
About this article
Cite this article
Zhang, L., Wang, X., Hu, R. et al. Optimization of sound fields reproduction based Higher-Order Ambisonics (HOA) using the Generative Adversarial Network (GAN). Multimed Tools Appl 80, 2205–2220 (2021). https://doi.org/10.1007/s11042-020-09735-3
Received:
Revised:
Accepted:
Published:
Issue Date:
DOI: https://doi.org/10.1007/s11042-020-09735-3