Sound source localization – state of the art and new inverse scheme

Acoustic source localization techniques in combination with microphone array measurements have become an important tool for noise reduction tasks. A common technique for this purpose is acoustic beamforming, which can be used to determine the source locations and source distribution. Advantages are that common algorithms such as conventional beamforming, functional beamforming or deconvolution techniques (e.g., Clean-SC) are robust and fast. In most cases, however, a simple source model is applied and the Green’s function for free radiation is used as transfer function between source and microphone. Additionally, without any further signal processing, only stationary sound sources are covered. To overcome the limitation to stationary sound sources, two beamforming approaches for rotating sound sources, e.g., in an axial fan, are presented. Regarding the restrictions concerning source model and boundary conditions, an inverse method is proposed in which the wave equation in the frequency domain (Helmholtz equation) is solved with the corresponding boundary conditions using the finite element method. The inverse scheme is based on minimizing a Tikhonov functional matching measured microphone signals with simulated ones. This method identifies the amplitude and phase information of the acoustic sources so that the prevailing sound field can be reconstructed with a high degree of accuracy.


Introduction
Typically, sound emissions from technical applications and production machines are perceived as disturbing noise. When trying to solve noise problems or refine the acoustic design of a product, knowledge of the position and distribution of sound sources is necessary. In this context, one of the biggest challenges in noise and vibration problems is to identify the areas of a device, machine or structure that produce the significant acoustic emission. For this task various sound localization methods can be used, in order to localize and visualize sound sources. The information will be given in so-called source maps, which provide information about location, distribution and strength of sound sources.
The standard methods are intensity measurements, acoustic near-field holography and acoustic beamforming. However, these methods are not universally applicable. In contrast to intensity measurements, where an intensity probe is used, near-field holography and beamforming use spatially distributed microphones (= microphone array). Depending on the measurement object, frequency range and measurement environment, the different methods have specific strengths and weaknesses.
In recent years, considerable improvements have been achieved in the localization of sound sources using microphone arrays. However, there are still some limitations. In most cases, a simple source model is applied and the Green's function for free radiation is used as transfer function between source and microphone. In real-life applications, one may also be faced with moving sound sources, e.g. a passing vehicle or a rotating fan. Here, the mentioned sound source localization algorithms and methods do not readily apply, but there exist advanced signal processing methods that overcome the limitation to stationary sound sources.

Beamforming based algorithm
The acoustic field, described by the complex acoustic pressure $p_a$ of a monopole source with source strength $\sigma$, is calculated in the frequency domain with Green's function $g(r)$ of free radiation by
$$p_a(\mathbf{x}, \mathbf{y}, \omega) = \sigma(\mathbf{y})\, g(r) = \sigma(\mathbf{y})\, \frac{e^{-j \frac{\omega}{c_0} r}}{4\pi r} \quad \text{with} \quad r = |\mathbf{x} - \mathbf{y}|, \tag{1}$$
where $\mathbf{x}$ denotes the observer position (e.g. microphone positions), $\mathbf{y}$ the source positions, $c_0$ the speed of sound and $\omega = 2\pi f$ the angular sound frequency. All measured microphone signals $p_a(t)$ are Fourier transformed and the resulting complex pressure values $p_a$ at a certain frequency $\omega$ are stored in a vector
$$\mathbf{p}_a(\omega) = \left[\, p_{a,1}(\omega), \ldots, p_{a,M}(\omega) \,\right]^T,$$
where $M$ is the number of microphones.
The cross-spectral matrix (CSM) is calculated by
$$\mathbf{C}(\omega) = \mathbf{p}_a(\omega)\, \mathbf{p}_a^H(\omega) \tag{2}$$
with $H$ the Hermitian operation (transposition and complex conjugation).
In Conventional Beamforming (ConvBF), the most basic and robust frequency domain processing method [1], the measured sound field is compared to a calculated sound field. For this comparison, a certain model for the acoustic source is assumed. Most beamforming algorithms model the acoustic source by monopoles (1) to calculate the acoustic pressure. By using this acoustic source model, the following functional is defined
$$J(\sigma) = \left\| \mathbf{C} - \sigma\, \mathbf{g}\mathbf{g}^H \right\|_F^2, \tag{3}$$
which refers to a single source with the strength $\sigma$ to be determined (without loss of generality). Here, $\mathbf{g}$ denotes the vector of the individual Green's functions (also called steering vector) and $\|\cdot\|_F$ is the Frobenius norm. Minimizing the functional (3), i.e. setting its derivative to zero, yields
$$\sigma(\mathbf{y}) = \frac{\mathbf{g}^H \mathbf{C}\, \mathbf{g}}{(\mathbf{g}^H \mathbf{g})^2} = \mathbf{w}^H \mathbf{C}\, \mathbf{w} \quad \text{with} \quad \mathbf{w} = \frac{\mathbf{g}}{\mathbf{g}^H \mathbf{g}}, \tag{4}$$
which is the expression for the determination of the source strength. Here, $\mathbf{w}$ denotes the weighted steering vector. The steering vector represents the transfer functions from the focus point to the microphone positions and accounts for the phase shift and amplitude correction (sound propagation model) as well as the microphone weighting [2]. The transfer functions can be obtained either by measurements [3] or by theoretical models. In [4], different steering vector formulations are discussed. With these, a reasonable enhancement in the correct estimation of the source location could be obtained, although there is a trade-off between the correct reconstruction of the location and of the source strength. Another approach to determining the transfer function is to combine measurement and simulation, leading to numerically calculated transfer functions (NCTFs) [5,6].
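The evaluation of a conventional beamforming map can be illustrated with a minimal sketch. It assumes the common form of the source strength, $\mathbf{w}^H \mathbf{C}\, \mathbf{w}$ with $\mathbf{w} = \mathbf{g}/(\mathbf{g}^H \mathbf{g})$, a rank-one CSM built from one synthetic unit-strength monopole, and a hypothetical 8-microphone line array. The function names, the geometry and the value of `C0` are illustrative assumptions, not taken from the measurements described in this paper.

```python
import cmath
import math

C0 = 343.0  # assumed speed of sound in m/s


def green(x, y, omega):
    """Free-field Green's function between source point y and observer x."""
    r = math.dist(x, y)
    return cmath.exp(-1j * omega / C0 * r) / (4.0 * math.pi * r)


def conv_bf(csm, mics, focus, omega):
    """Source strength w^H C w at one focus point, with w = g / (g^H g)."""
    g = [green(m, focus, omega) for m in mics]
    gg = sum(abs(v) ** 2 for v in g)
    w = [v / gg for v in g]
    n = len(mics)
    val = sum(w[i].conjugate() * csm[i][j] * w[j]
              for i in range(n) for j in range(n))
    return val.real


# Hypothetical setup: line array at z = 0, monopole 1 m in front of it.
omega = 2.0 * math.pi * 2000.0
mics = [(-0.35 + 0.1 * i, 0.0, 0.0) for i in range(8)]
source = (0.1, 0.0, 1.0)

p = [green(m, source, omega) for m in mics]            # simulated microphone data
csm = [[pi * pj.conjugate() for pj in p] for pi in p]  # C = p p^H

# Scan a line of focus points in the source plane and locate the maximum.
grid = [(-0.5 + 0.05 * k, 0.0, 1.0) for k in range(21)]
source_map = [conv_bf(csm, mics, y, omega) for y in grid]
peak = grid[max(range(len(grid)), key=source_map.__getitem__)]
```

In this synthetic case the maximum of `source_map` lies at the focus point that coincides with the source, and its value equals the squared source strength. With measured data the CSM would normally be averaged over many time blocks before the map is evaluated.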
The main diagonal elements of the CSM (2) represent the auto power of the microphones and therefore provide no information about the phase differences between the microphones, but may introduce microphone self noise. Hence, for experimental measurements, the main diagonal is usually omitted.
For broadband sound sources it may be of interest to plot source maps in frequency bands (e.g., one-third octave or octave bands), rather than for single frequencies. For this purpose the individual source maps are energy summed [7] for each frequency band according to
$$\sigma_{\mathrm{band}}(\mathbf{y}) = \sum_{i=1}^{N_f} \sigma(\mathbf{y}, \omega_i), \tag{5}$$
whereby $N_f$ denotes the number of frequencies in the considered band. Fig. 1a shows the source map (calculated by (4)) for the geometric setup given in Fig. 1b. In the source map, the actual source position is clearly visible through the maximum, which is also called main lobe. The source strength is given as the source level $L_\sigma$ (6) with the reference value $\sigma_{\mathrm{ref}} = 20\,\mu\mathrm{Pa}\,\mathrm{m}^{d-2}$ ($d$ is the space dimension), according to the reference value of the sound pressure level. Another important quantity is the normalized source level, where $\sigma_{\max}$ represents the maximum source strength in the considered focus grid.
[Figure: Overview of the parameters for acoustic beamforming when using a two-dimensional array, including the obtained source map after the beamforming process]

Beamforming parameters
The source map, Eq. (4), can be interpreted as a convolution of the array response function (= point spread function, PSF) with the real source distribution [2]. Thus, the main lobe is surrounded by so-called side lobes (artefacts) (see Fig. 2), whose formation is purely beamforming-related and independent of the measuring equipment. Hence, the limitation of resolution and dynamic range is caused by the PSF of the microphone array. The PSF has a strong and wide main lobe as well as strong side lobes, especially for low frequencies, such that weaker sources may be hidden. The side lobes may lead to problems in the localization and interpretation of the source map if several sources are present. If weak sources have to be localized in the presence of strong sources, the distance between main and side lobes $s_L$ should be as large as possible (see Fig. 2a).
The PSF depends, among others, on
- the source characteristics (frequency, position, strength),
- the spatial arrangement of the microphones,
- the number of microphones and
- the focus grid.
Hence, the layout of the microphone array (spatial arrangement, microphone number) is crucial for the quality of the source map and for the ability to locate and quantify acoustic sources. To assess the capabilities, the response function of the array (= PSF) to a defined sound source (usually a point source) can be used. With the PSF, statements can be made about the properties of the current setup (focus grid, microphone arrangement, etc.) at individual frequencies. This makes it possible to design microphone arrays for specific measurement tasks or to test the performance of an existing arrangement. To assess the PSF, the beam width $b_W$ and the side lobe attenuation $s_L$ are used. The width of the main lobe $b_W$ (usually specified 3 dB below the maximum, see Fig. 2a) limits the spatial resolution. The theoretical resolution is given by the Rayleigh limit [8] as well as by the Sparrow limit [9].
Next, all important parameters that must be considered for acoustic beamforming are summarized (see Fig. 3):
- Array layout (microphone number $M$, microphone spacing, aperture $W$),
- Focus (scan) grid (discretization, dimensions $X$ and $Y$),
- Distance $Z$ between microphone array and focus grid,
- Temperature $\theta$ and relative air humidity for the estimation of $c_0$,
- Beamforming algorithm $B$,
- Steering vector $\mathbf{w}$ formulation.

Functional beamforming
One drawback of ConvBF is that the obtained source map not only contains the main lobe, i.e. the peak in the source map where the actual source is located, but also shows artefacts (side lobes), which occur at positions without actual sources. This is due to the above-mentioned fact that the computed source distribution is a convolution of the real source distribution with the particular response function (PSF) of the array. While these side lobes have a smaller amplitude than the main lobe, it is still possible that they obscure weaker sources, which are therefore not detected by ConvBF. One possibility to reduce the high side lobe level is to use an advanced beamforming algorithm called Functional beamforming (FuncBF) [10,11]. Here, a parameter $\nu$ is introduced and (4) is changed to
$$\sigma(\mathbf{y}) = \left( \mathbf{w}^H \mathbf{C}^{1/\nu}\, \mathbf{w} \right)^{\nu}.$$
The CSM is diagonalised according to
$$\mathbf{C} = \mathbf{U} \boldsymbol{\Lambda} \mathbf{U}^H,$$
with the unitary matrix $\mathbf{U}$ of eigenvectors and the diagonal matrix $\boldsymbol{\Lambda}$ of eigenvalues, so that $\mathbf{C}^{1/\nu} = \mathbf{U} \boldsymbol{\Lambda}^{1/\nu} \mathbf{U}^H$.
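The effect of the exponent $\nu$ on the side lobes can be sketched for the simplest possible case. The example assumes a rank-one CSM $\mathbf{C} = \mathbf{p}\mathbf{p}^H$ (one source, one snapshot), for which the matrix power is available in closed form without an eigenvalue solver, and it assumes steering vectors normalised to unit length so that the peak value does not depend on $\nu$. The function names, the geometry and `C0` are hypothetical choices for illustration only.

```python
import cmath
import math

C0 = 343.0  # assumed speed of sound in m/s


def green_vector(mics, point, omega):
    """Free-field Green's functions from one point to all microphones."""
    out = []
    for m in mics:
        r = math.dist(m, point)
        out.append(cmath.exp(-1j * omega / C0 * r) / (4.0 * math.pi * r))
    return out


def unit_steering(mics, focus, omega):
    """Steering vector normalised to unit length (assumption of this sketch)."""
    g = green_vector(mics, focus, omega)
    norm = math.sqrt(sum(abs(v) ** 2 for v in g))
    return [v / norm for v in g]


def func_bf_rank_one(p, u, nu):
    """(u^H C^(1/nu) u)^nu for the rank-one CSM C = p p^H.

    C has one non-zero eigenvalue lam = p^H p with eigenvector p/|p|,
    hence u^H C^(1/nu) u = lam^(1/nu) * |u^H p|^2 / lam.
    """
    lam = sum(abs(v) ** 2 for v in p)
    proj = abs(sum(ui.conjugate() * pi for ui, pi in zip(u, p))) ** 2 / lam
    return (lam ** (1.0 / nu) * proj) ** nu


omega = 2.0 * math.pi * 2000.0
mics = [(-0.35 + 0.1 * i, 0.0, 0.0) for i in range(8)]
source = (0.1, 0.0, 1.0)
off_source = (-0.4, 0.0, 1.0)  # focus point without a source

p = green_vector(mics, source, omega)
lam = sum(abs(v) ** 2 for v in p)

levels = {}
for nu in (1, 8):
    peak = func_bf_rank_one(p, unit_steering(mics, source, omega), nu)
    side = func_bf_rank_one(p, unit_steering(mics, off_source, omega), nu)
    levels[nu] = (peak, side / peak)  # peak value and relative off-source level
```

The peak value stays the same for both exponents, whereas the relative level at the off-source focus point is raised to the power $\nu$ and therefore drops. For a measured full-rank CSM, the matrix power would instead be formed from the eigenvalue decomposition.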

Deconvolution algorithms
Another way to overcome the drawback of side lobes in the source map is to use advanced signal processing algorithms based on deconvolution, e.g., DAMAS [12], Clean-SC [13], SC-DAMAS [14] etc., that convert the raw source map (4) into a deconvolved source map, resulting in higher resolution and dynamic range. It is assumed that the source map computed by (4) is built up from individually scaled PSFs of the array. By using deconvolution algorithms, these response functions of the array are determined and replaced by single peaks or narrow-width beams. As a consequence, the side lobes (artefacts) are removed in the deconvolved map. In [15] and [16], one can find a detailed comparison between different deconvolution techniques and the application to 2D and 3D sound source localization. There, the different techniques are compared with respect to position detection, source level estimation and computational time. The main findings for source localization in a free radiation environment can be summarized as follows: (1) SC-DAMAS provides the best source map at the highest computational cost; (2) Clean-SC has the best trade-off between fast computation and correct source detection. Usually, the deconvolution algorithms can only be applied in post-processing because the calculation of deconvolved source maps takes too much time for real-time analyses.

Rotating beamforming
If the sound source is moving, i.e. the distance between sound source and microphone (observer) is time dependent, $r = r(t)$, one has to take into account that sound which is received at time $t$ (= reception time) at an observation point $\mathbf{x}$ was emitted at an earlier time $\tau$ (= retarded time or emission time) from the source point $\mathbf{y}$. Further, a stationary observer perceives different sound frequencies than those that are emitted by the source, which is commonly known as the Doppler effect.
The retarded time is defined implicitly by
$$\tau = t - \frac{r(\tau)}{c_0} = t - \frac{|\mathbf{x} - \mathbf{y}(\tau)|}{c_0}. \tag{10}$$
For moving sound sources the presented methods for sound source localization can no longer be applied, but there exist signal processing algorithms that treat special cases, e.g. translationally moving sound sources at constant speed $U$ or sound sources rotating at an angular velocity $\Omega$.
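Because the retarded time appears on both sides of its defining relation, $\tau = t - |\mathbf{x} - \mathbf{x}_s(\tau)|/c_0$, it generally has to be found numerically. The following sketch uses a plain fixed-point iteration, which converges for subsonic source motion. The circular source path, its radius and rotational speed, and the value of `C0` are hypothetical values chosen only for illustration.

```python
import math

C0 = 343.0  # assumed speed of sound in m/s


def source_position(t, radius=0.25, omega_rot=2.0 * math.pi * 10.0):
    """Hypothetical source path: circular motion in the plane z = 0."""
    return (radius * math.cos(omega_rot * t),
            radius * math.sin(omega_rot * t),
            0.0)


def retarded_time(t, observer, path, tol=1e-12, max_iter=100):
    """Solve tau = t - |x - x_s(tau)| / c0 by fixed-point iteration."""
    tau = t
    for _ in range(max_iter):
        tau_new = t - math.dist(observer, path(tau)) / C0
        if abs(tau_new - tau) < tol:
            return tau_new
        tau = tau_new
    raise RuntimeError("fixed-point iteration did not converge")


# Observer on the rotation axis: the distance is constant, so tau is known exactly.
tau_axis = retarded_time(0.1, (0.0, 0.0, 0.63), source_position)

# Observer off the axis: the distance, and hence the delay, varies with time.
obs = (0.5, 0.0, 0.63)
tau_off = retarded_time(0.1, obs, source_position)
residual = tau_off + math.dist(obs, source_position(tau_off)) / C0 - 0.1
```

The contraction factor of the iteration is bounded by the Mach number of the source, so only a few iterations are needed at typical fan tip speeds.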
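If we assume a time-dependent monopole that moves along a path $\mathbf{x} = \mathbf{x}_s(t)$, the inhomogeneous wave equation in the time domain reads as
$$\frac{1}{c_0^2} \frac{\partial^2 p_a}{\partial t^2} - \Delta p_a = Q(t)\, \delta(\mathbf{x} - \mathbf{x}_s(t)). \tag{11}$$
Its solution, i.e. the acoustic pressure of a moving monopole, can be calculated as [17]
$$p_a(\mathbf{x}, t) = \frac{Q_e}{4\pi r_e \left(1 - M_e \cos\vartheta_e\right)} \tag{12}$$
with $r_e = |\mathbf{x} - \mathbf{x}_s(\tau)|$, where the subscript $e$ denotes evaluation at the retarded time $\tau$, $\mathbf{M}$ denotes the vectorial Mach number and $\vartheta$ the angle between the vector of the source velocity $c_0 \mathbf{M}$ and the vector between source and observer $\mathbf{r}(\tau)$.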
In the stationary case the sound pressure emitted by a monopole source at location $\mathbf{y}$ is given as
$$p_a(\mathbf{x}, t) = \frac{Q(t - r/c_0)}{4\pi r} \quad \text{with} \quad r = |\mathbf{x} - \mathbf{y}|.$$
The factor $1/(1 - M_e \cos\vartheta_e)$ in the non-stationary case is also called 'Doppler factor', as the momentarily perceived sound frequency $f(t)$ at the observer point of a sound source that emits sound at constant frequency $f_0$ calculates as
$$f(t) = \frac{f_0}{1 - M_e \cos\vartheta_e}.$$
Since rotating sound sources often occur in practice, e.g. rotating fans, we shall take a closer look at the signal processing algorithms for this kind of sound source. The sound field that is generated by a rotating monopole with sound frequency $f_0 = 1500$ Hz, evaluated at a single observation point, is depicted in Fig. 4. In the frequency domain (Fig. 4b) one can see that the sound field at the observation point not only consists of the excitation frequency $f_0$ but additionally of components at frequencies shifted by $m\Omega/(2\pi)$ with integer $m$. We will focus on two methods that make sound source localization with rotating sources possible: the interpolation of the pressure signals in the time domain [18,19] and the spinning mode decomposition in the frequency domain, e.g. [18]. Both methods require equally spaced microphones on a ring array and a common axis of the ring array and the source's rotational axis.
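In the former method the sound pressure signals recorded by the stationary microphones are interpolated according to the momentary position of the rotating sound source. This leads to virtually rotating acoustic pressure signals $p_{\mathrm{vr},m}$ at microphone $m$, which are calculated from the neighbouring microphones $m_l$ and $m_h$ as
$$p_{\mathrm{vr},m}(t) = s_l\, p_{m_l}(t) + s_h\, p_{m_h}(t)$$
with
$$m_l = \left\lfloor m + \frac{\varphi(t)}{\Delta\varphi} \right\rfloor, \quad m_h = m_l + 1, \quad s_h = m + \frac{\varphi(t)}{\Delta\varphi} - m_l, \quad s_l = 1 - s_h,$$
where the microphone indices are taken modulo the number of microphones $M$. Here, $\varphi(t)$ denotes the angle of the source, $\Delta\varphi$ is the angle between two neighbouring microphones and $\lfloor \cdot \rfloor$ denotes the floor function.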
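The interpolation onto a virtually rotating array can be sketched for a single time sample. The code assumes linear interpolation between the two stationary ring microphones that enclose the momentary angular position of the virtual microphone; the function name and the weighting are illustrative assumptions rather than the exact implementation used in [18,19].

```python
import math


def virtual_rotating_sample(samples, m, phi):
    """Pressure at virtual microphone m of a ring array rotated by angle phi.

    samples: pressures of the M stationary, equally spaced ring microphones
             at one time step (index 0 .. M-1, in order of increasing angle).
    phi:     momentary rotation angle of the source in rad.
    """
    n_mics = len(samples)
    delta_phi = 2.0 * math.pi / n_mics   # angle between neighbouring microphones
    pos = m + phi / delta_phi            # fractional index of the virtual microphone
    m_low = math.floor(pos)              # lower neighbour (before wrap-around)
    s_high = pos - m_low                 # linear weight of the upper neighbour
    s_low = 1.0 - s_high
    return (s_low * samples[m_low % n_mics]
            + s_high * samples[(m_low + 1) % n_mics])


# Toy data: four microphones whose sample values equal their index.
samples = [0.0, 1.0, 2.0, 3.0]
halfway = virtual_rotating_sample(samples, 1, math.pi / 4.0)   # between mic 1 and 2
wrapped = virtual_rotating_sample(samples, 3, math.pi / 4.0)   # between mic 3 and 0
unrotated = virtual_rotating_sample(samples, 2, 0.0)
```

Applying this function to every time step, with `phi` taken from the synchronously measured rotor angle, yields the virtually rotating signals from which the modified CSM is formed.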
The latter method for compensating the rotation of a sound source uses modal decomposition of the acoustic pressure signals in the frequency domain and a modified Green's function for the rotating monopole as steering vectors [18]. This method is, contrary to the interpolation method, analytically exact. It requires a constant rotational frequency of the source, $\Omega = \mathrm{const}$. A measured microphone signal $p_m(\omega)$ at microphone $m$ is expressed via a discrete Fourier series with spinning mode coefficients, from which the pressure signals in the rotating reference frame $p_\Omega$ are then calculated. With the modified pressure signals $p_{\mathrm{vr}}$ and $p_\Omega$, respectively, one can then calculate a modified CSM analogously to (2), and beamforming maps can be calculated in the same way as in the stationary case.
When using the modified CSM but stationary steering vectors, one only gets approximate source locations. For rotating sources the source maps are shifted in tangential direction. This can be avoided by using the modified Green's function $g_\Omega$ or corrected distances $r^*$ between each scan point and each microphone that take into account that the sound source has moved from its initial position when the sound is received at a specific microphone (see (10)). Further details on the modified steering vectors can be found in [18,20,21].

Limitations and challenges
The fundamental processing method, ConvBF, also called frequency domain beamforming (FDBF) [1], is robust and fast. In this simple method, limitations regarding resolution and dynamic range are caused by the PSF. A modification of this classic approach is delivered by FuncBF (Sect. 2.2), which leads to an improvement of resolution and dynamic range, whereby the computational cost remains almost the same as in the standard approach. Furthermore, deconvolution techniques (Sect. 2.3) can be applied, which attempt to eliminate the influence of the PSF on the raw source map. Resolution and dynamic range are thereby greatly improved at the cost of computational time.
Despite these advances in beamforming techniques, it has to be mentioned that major limitations are caused by the source model. Most beamforming algorithms model the acoustic sources as monopoles and/or dipoles. Moreover, the steering vector $\mathbf{g}$, describing the transfer function (TF) between source and microphone, is modeled by Green's function for free radiation. A different choice of steering vectors can improve the results, as demonstrated in [4]. A further challenge of these methods is source localization at low frequencies and in environments with partially or fully reflecting surfaces, for which beamforming techniques do not provide physically reasonable source maps. Furthermore, obstacles cannot be considered. In such cases, the steering vectors have to be adapted to take the reverberant environment into account. Two approaches are considered in [3]: (1) modeling reflections by a set of monopoles located at the image source positions; (2) experimentally based identification of Green's function. The best results were obtained with the experimentally obtained Green's functions.
Beamforming is mainly used for acoustic source localization rather than for obtaining quantitative source information. The qualitative statements given by the source map provide information about the distribution and position of the sources, and a relative comparison of the source strength on the considered focus grid. In many cases, this information may already be sufficient to determine the origin of the sound emission. However, sometimes quantitative source information is also needed. The estimation of quantitative source spectra is not straightforward [22], but can be obtained through integration methods. For this, the source map is integrated over a certain region to obtain a pressure spectrum for this specific area. Thus, the source map is required before integration. A distinction must be made between integrating the raw and the deconvolved map. The deconvolved maps can be seen as ideal images of the source contribution and therefore, the integration may be done without further processing [23]. If the raw source map is integrated, it needs to be taken into account that the integrated spectra are still convolved with the array PSF. In [24], an overview of different integration methods is given.

Inverse scheme
Source localization on the basis of beamforming can be carried out very efficiently in its simplest implementation. In the literature, many different comparisons of beamforming methods can be found. For example, in [25], [2] and [26], simulated data was used as input, and in [15,27], data coming from experiments. A comprehensive overview of different acoustic imaging methods can be found in [25,28,29]. There exist beamforming-independent inverse methods (e.g. L1-Generalized Inverse Beamforming [30], cross-spectral matrix fitting [14], etc.), which aim to solve an inverse problem considering the presence of all acoustic sources at once in the localization process. Resolution and dynamic range are greatly improved by these advanced methods at the cost of higher computational time and power.
In the proposed inverse scheme a cost functional is minimized such that the physical model with source terms is fulfilled. It is based on the solution of the wave equation in the frequency domain (Helmholtz equation), which makes it possible to fully consider realistic geometry and boundary condition scenarios. Another advantage is its easy generalizability to situations with convection and/or attenuation.

Physical and mathematical model
In (24), $\rho$ and $K$ denote the space-dependent density and compression modulus in the acoustic domain $\Omega_{\mathrm{acou}}$, with speed of sound $c_0$ and mean density $\rho_0$. Herewith, poroelastic materials (e.g., porous absorbers) can be considered as a layer of an equivalent complex fluid having a frequency-dependent effective density $\tilde{\rho}_{\mathrm{eff}}$ and bulk modulus $\tilde{K}_{\mathrm{eff}}$. With this formulation the absorption properties of surfaces can be adjusted. A large number of models have been established for the characterization of poroelastic materials. Depending on the theoretical assumptions, the models are based on a different number of (material) parameters. An overview of different modeling approaches can be found in [31]. Furthermore, the sound sources on the surface are modeled by a boundary source term $\sigma^{\mathrm{bd}}$. Since the identification is done separately for each frequency $\omega$, the dependence on $\omega$ is omitted in the notation. Now, the considered inverse problem is to reconstruct $\sigma^{\mathrm{in}}$ and/or $\sigma^{\mathrm{bd}}$ from pressure measurements at the microphone positions $x_1, \ldots, x_M$. For the acoustic sources the following ansatz is made
$$\sigma = \sum_{n=1}^{N} a_n\, e^{j\varphi_n}\, \delta_{x_n} \tag{27}$$
with the sought amplitudes $a_1, a_2, \ldots, a_N \in \mathbb{R}$ and phases $\varphi_1, \varphi_2, \ldots, \varphi_N \in [-\pi/2, \pi/2]$. Here, $N$ denotes the number of possible sources and $\delta_{x_n}$ the delta function at position $x_n$.

Optimization based source identification
The source identification by means of Tikhonov regularization amounts to solving a constrained optimization problem. Therein, the box constraints on the phases $\varphi_n$ are realized by a barrier term with some penalty parameter $\rho > 0$. This also helps to avoid phase wrapping artefacts. The penalty parameter $\rho$ and the regularization parameters $\alpha$ and $\beta$ are chosen according to the sequential discrepancy principle [32], with $x$ the smallest exponent such that the discrepancy inequality is fulfilled and $\varepsilon$ being the measurement error. According to [33], it can be expected that this leads to a convergent regularization method.
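The idea behind a sequential discrepancy principle can be illustrated on a scalar toy problem. The sketch assumes a geometric sequence $\alpha = \alpha_0\, 2^{-x}$ and stops at the smallest exponent $x$ for which the data residual drops below $\tau \varepsilon$. The toy model, the constants `alpha0` and `tau`, and all function names are assumptions made for illustration; the sketch does not reproduce the functional with barrier term used in the inverse scheme.

```python
def tikhonov_scalar(a, p_ms, alpha):
    """Minimiser of (a*s - p_ms)^2 + alpha*s^2 for a scalar toy model."""
    return a * p_ms / (a * a + alpha)


def sequential_discrepancy(a, p_ms, eps, alpha0=1.0, tau=1.5, max_exp=60):
    """Smallest exponent x with |a*s - p_ms| <= tau*eps for alpha = alpha0 * 2**-x."""
    for x in range(max_exp + 1):
        alpha = alpha0 * 2.0 ** (-x)
        s = tikhonov_scalar(a, p_ms, alpha)
        if abs(a * s - p_ms) <= tau * eps:
            return x, alpha, s
    raise RuntimeError("discrepancy level not reached")


# Toy data: transfer factor 1, measured value 1, assumed measurement error 0.01.
x_sel, alpha_sel, s_sel = sequential_discrepancy(1.0, 1.0, 0.01)
```

Starting from a strong regularization and weakening it step by step keeps the chosen parameter as large as the measurement error allows, which guards against fitting the noise.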

Fig. 5. Ground plan of the room with source location (dimensions in m)
Sparsity of the reconstruction is desired to pick the few true source locations from a large number $N$ of trial sources. By choosing $q \in (1, 2]$ close to one, an enhanced sparsity can be obtained. Since the optimization scheme uses different stopping criteria, a scaling factor is introduced in (29), where the maximum absolute value of the measured pressure $|p^{\mathrm{ms}}_m|$ is scaled to an amplitude of $\psi$. The implemented optimization-based identification algorithm is based on a gradient method with Armijo line search, exploiting the adjoint method to efficiently obtain the gradient of the objective function. Hence, the computational time depends neither on the number of microphones $M$ nor on the assumed number of possible sources $N$. Further details about the inverse scheme can be found in [34]. In the current implementation, the finite element (FE) method is applied for solving the Helmholtz equation (24). Hence, with respect to computational time, the applicability of the inverse scheme is mainly restricted to the low frequency range, since the discretization effort and therefore the number of degrees of freedom in 3D is of the order $\mathcal{O}(e_{\mathrm{size}}^{-3})$, where $e_{\mathrm{size}}$ is the mesh size determined by
$$e_{\mathrm{size}} = \frac{c_0}{N_e\, f_{\max}}. \tag{32}$$
In (32), $f_{\max}$ denotes the highest frequency of the acoustic sources and $N_e$, the number of elements per wavelength, should be between 10 and 20 (rule of thumb [35]). Please note that any other numerical method, e.g., the boundary element method (BEM), can be applied, which may be even more efficient with respect to computation time, depending on the particular scenario.
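The cubic growth of the discretization effort with frequency can be made concrete with a small calculation. The sketch assumes that the mesh size follows from a fixed number of elements per shortest wavelength, $e_{\mathrm{size}} = c_0/(N_e f_{\max})$, and that the number of degrees of freedom scales with the domain volume divided by $e_{\mathrm{size}}^3$. The room volume, the default `c0` and the function names are hypothetical.

```python
def mesh_size(f_max, n_e=10, c0=343.0):
    """Element size for n_e elements per shortest wavelength c0 / f_max."""
    return c0 / (n_e * f_max)


def dof_estimate(volume, f_max, n_e=10, c0=343.0):
    """Rough number of degrees of freedom in 3D: volume / e_size**3."""
    return volume / mesh_size(f_max, n_e, c0) ** 3


# Hypothetical room volume of 60 m^3.
e_250 = mesh_size(250.0)                                        # mesh size at 250 Hz in m
growth = dof_estimate(60.0, 500.0) / dof_estimate(60.0, 250.0)  # effect of doubling f_max
```

Doubling the highest frequency halves the mesh size and multiplies the estimated number of unknowns by eight, which is why the FE-based scheme is attractive mainly at low frequencies.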

Low-frequency sound source in a room
To demonstrate the applicability of the inverse scheme in real-world scenarios, microphone array measurements were performed in a room where a generic sound source was located. The room is partially lined with porous absorbers (Baso Plan 100) on the walls and ceiling. A ground plan of the room, including the location of the generic sound source, is shown in Fig. 5. This sound source is a box (dimensions: 50 cm × 100 cm × 55 cm) made of Doka formwork sheet, see Fig. 6a. The sound can be generated by three separately excitable loudspeakers (VISATON WS 20 E, ∅ = 8"). In order to characterize the generic source, the normal velocity was measured with a laser scanning vibrometer (LSV) Polytec PSV-500. In the measurement, first speaker L1 and afterwards L2 was active. The excitation frequency of the speakers was 250 Hz. The measured normal velocity level $L_v$ (ref. 50 nm/s) is given in Fig. 6b and Fig. 6c (just the side with the active speaker is shown).
The application of numerical methods like the introduced inverse scheme requires physical and geometrical modeling of the real-world situation. Therefore, first, an appropriate FE model of the measurement environment was created, which is depicted in Fig. 7. In order to obtain accurate data for an acoustic field by simulation, the boundary conditions necessary for the solution of the acoustic wave equation have to be determined in a suitable way. For the characterization of the materials present in the room, the absorption coefficient $\alpha$ was determined with impedance tube measurements applying the 2p-method (ISO 10534-2 [36]). The measured absorption coefficients showed that most of the surfaces could be assumed to be fully reflective. Hence, for these surfaces the homogeneous Neumann boundary condition
$$\mathbf{n} \cdot \nabla p_a = 0$$
is used. For the modeling of the porous absorber an equivalent fluid model (assuming isotropic and volume averaged features) is used, which provides the effective parameters $\tilde{\rho}_{\mathrm{eff}}$ and $\tilde{K}_{\mathrm{eff}}$ for the generalized Helmholtz equation (24).
For this, the Delany-Bazley-Miki (DBM) [37,38] model was chosen, which is purely empirical and derived from measurements on many highly porous materials. In Fig. 8, the fitted absorption curve of the DBM model is compared with the measurements. For the positioning of the microphones within the room, so-called microphone trees have been used (see Fig. 9). These trees can have different branches with different lengths at various heights to which the microphones are attached. For the application of the introduced inverse scheme the positions of these microphones have to be determined with respect to a reference position. For this purpose, an acoustic positioning system was developed [6], which is based on the principle of multilateration, where the distances between an unknown position (microphone) and several known points (loudspeakers at the walls, see Fig. 9) are used to determine the microphone location. This setting is similar to the well-known global positioning system (GPS) [39].
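The principle of multilateration can be sketched with a linearised solution. Subtracting the squared-distance equation of the first anchor from those of the other anchors removes the quadratic term and leaves a linear system for the unknown position. The sketch assumes exactly four non-coplanar anchor points and noise-free distances; the anchor coordinates and function names are hypothetical, and the positioning system of [6] may well use a different solver.

```python
import math


def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x


def multilaterate(anchors, dists):
    """Position from the distances to four known, non-coplanar anchor points."""
    a0, d0 = anchors[0], dists[0]
    A, b = [], []
    for ai, di in zip(anchors[1:4], dists[1:4]):
        # 2 (a_i - a_0) . x = |a_i|^2 - |a_0|^2 - d_i^2 + d_0^2
        A.append([2.0 * (ai[k] - a0[k]) for k in range(3)])
        b.append(d0 ** 2 - di ** 2
                 + sum(v * v for v in ai) - sum(v * v for v in a0))
    return solve3(A, b)


# Hypothetical loudspeaker (anchor) positions and a microphone to be located.
anchors = [(0.0, 0.0, 0.0), (5.0, 0.0, 0.0), (0.0, 4.0, 0.0), (0.0, 0.0, 3.0)]
mic_true = (1.2, 2.5, 1.1)
dists = [math.dist(a, mic_true) for a in anchors]
mic_est = multilaterate(anchors, dists)
```

In an acoustic positioning system the distances would follow from measured times of flight multiplied by the speed of sound, and more than four anchors with a least-squares fit would reduce the influence of measurement noise.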
For the identification of the acoustic source, a microphone array with $M = 50$ microphones is considered, where the microphones are spatially distributed throughout the room without any special requirements for their positions. However, care was taken not to place them too close to the ceiling, the floor and the walls. The microphone positions used in the room, called MicMeas, are depicted in Fig. 10. Furthermore, an optimized arrangement named MicBest, which was obtained by simulations, was used for the inverse scheme (represented by black triangles). It has been shown that 2D microphone arrays do not provide satisfactory results for both beamforming and inverse schemes in the considered environment, and therefore 2D arrays are not considered in this context (see [6]). The temperature in the room was ϑ = 25 °C ± 2 °C and the relative humidity was 25 % ± 10 % during the measurements. The considered source frequency is 250 Hz (λ ≈ 1.36 m). First, the localization results of speaker L1 on the left side of the generic source are shown in Fig. 11.
In addition to the inverse scheme, three common beamforming algorithms are used for the source identification to have a comparison with the results of the inverse scheme. Hereby, ConvBF and FuncBF were applied, which can also handle coherent acoustic sources. Furthermore, the Clean-SC deconvolution algorithm was applied. This algorithm removes the side lobes from the raw source map based on spatial source coherence. However, it will not work satisfactorily with several tonal (i.e. coherent) sound sources. The source maps are also shown in Fig. 11. Comparing the source maps of the inverse scheme with those of the beamforming algorithms used, we can see that the source map of the inverse scheme provides a more accurate localization.
So far, only the localization result has been considered, which indicates whether the source was identified at the correct position. In order to make quantitative statements about the identified source distribution, the identified sources are used to perform a sound field computation. This allows comparisons with the original acoustic field measured at the microphone positions. For the inverse scheme this is straightforward and no further steps need to be taken, since a detailed source distribution both in amplitude and phase is identified. Hence, a numerical simulation was performed to obtain the acoustic field. To quantify the result of the sound field computation, the relative L2 error between the measured pressure values $p^{\mathrm{ms}}_m$ and the simulated pressure values $p^{\mathrm{inv}}_m$ at the $M$ microphone positions is used. The results are given in Table 1. For the beamforming source maps, the acoustic field computation is not as simple as for the source distributions obtained by the inverse scheme. The main problem is the point spread function (PSF) of the array, since the computed source map (raw map) is a convolution of the PSF with the real source distribution. Deconvolution algorithms (like Clean-SC) try to eliminate the influence of the PSF from the raw source map, resulting in a deconvolved map. Hence, the source maps obtained with Clean-SC can be integrated without further processing. For the ConvBF and FuncBF results, the source power integration technique [40,41,42] was applied to limit the effect of the PSF. In this technique, the raw source map is normalized by the integrated PSF for a point source in the center of the integration area. The integration needs a specified area. The maximum of the source map was taken as the center point of the integration area. From this center point, a sphere with radius 0.1 m was assumed and all surface points in this sphere were used for the integration.
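A relative L2 error of this kind can be computed as sketched below. The sketch assumes the usual definition, i.e. the Euclidean norm of the difference between measured and simulated complex pressures divided by the norm of the measured pressures; the function name and the sample values are hypothetical.

```python
import math


def relative_l2_error(p_ms, p_sim):
    """Relative L2 error between measured and simulated complex pressures."""
    num = sum(abs(a - b) ** 2 for a, b in zip(p_ms, p_sim))
    den = sum(abs(a) ** 2 for a in p_ms)
    return math.sqrt(num / den)


# Toy data for two microphones: the second simulated value misses completely.
p_ms = [1.0 + 0.0j, 0.0 + 1.0j]
p_sim = [1.0 + 0.0j, 0.0 + 0.0j]
err = relative_l2_error(p_ms, p_sim)
```

Because the pressures are complex, the error penalises deviations in both amplitude and phase, which is why a method that only recovers source levels cannot reach a small value.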
The results in Table 1 demonstrate the main advantage of the inverse scheme, namely an accurate identification of the source distribution with amplitude and phase for the computation of the acoustic field. Hence, considering the source field reconstruction, the inverse scheme clearly outperforms the beamforming methods. Next, the normal velocity level $L_v$ identified by the inverse scheme is compared with the LSV measurement data (see Fig. 6). For this purpose, first the same situation as before, in which speaker L1 on the left side of the source is active, is considered (Fig. 12a). The direct comparison with the LSV data shows a deviation of about 17 dB. In the next step, speaker L2 at the bottom of the box becomes the active sound source. This source radiates sound towards the floor. As before, the inverse scheme provides a good localization, but the amplitude again deviates, here by about 18 dB (Fig. 12b).
To demonstrate the capability of the inverse scheme when using highly sensitive microphone positions (named MicBest, see Fig. 11), we proceeded as follows. A forward simulation with prescribed normal velocity at the loudspeaker position was performed (see Fig. 13a). The positions of the virtual microphones were determined using the guidelines in [6], such that locations were chosen where the acoustic pressure has a maximum. Since the acoustic field in the room is known through the forward simulation, the positions can be found easily. First, the maximum in the sound pressure field was searched and taken as microphone position number one, subject to the constraint that the microphone should not be too close to a reflective surface. After this step, the position for the next microphone is determined. For this purpose, the region around the first microphone position is removed from the search space and the next maximum is searched. This procedure is repeated until 50 positions are determined, in order to have the same number of microphones as before. In Fig. 13b,c the identification results using the two arrangements are depicted. MicMeas (microphone arrangement as used in the measurements) shows a good localization result with a deviation of the velocity level of about 10 dB compared to the original one. However, using the microphone arrangement MicBest, an almost perfect localization result as well as a good agreement in velocity level could be achieved (see Fig. 13c).
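The selection procedure described above is a greedy search and can be sketched as follows. The sketch assumes that the simulated pressure is available on a list of candidate points, that the removed region is a ball of fixed radius around each chosen position, and that the wall-distance constraint is supplied as a predicate. All names, the exclusion radius and the toy data are hypothetical.

```python
import math


def greedy_positions(candidates, pressure, n_mics, exclusion_radius,
                     is_allowed=lambda pt: True):
    """Pick n_mics points by repeatedly taking the remaining pressure maximum.

    candidates: list of coordinate tuples
    pressure:   simulated (complex or real) pressure at each candidate
    is_allowed: predicate rejecting points too close to reflecting surfaces
    """
    remaining = [(abs(p), pt) for pt, p in zip(candidates, pressure)
                 if is_allowed(pt)]
    chosen = []
    while remaining and len(chosen) < n_mics:
        _, best = max(remaining, key=lambda item: item[0])
        chosen.append(best)
        # Remove the neighbourhood of the chosen point from the search space.
        remaining = [(amp, pt) for amp, pt in remaining
                     if math.dist(pt, best) > exclusion_radius]
    return chosen


# One-dimensional toy field sampled at x = 0, 1, ..., 9.
candidates = [(float(x),) for x in range(10)]
pressure = [1.0, 5.0, 4.0, 0.0, 0.0, 3.0, 0.0, 0.0, 2.0, 0.0]
picked = greedy_positions(candidates, pressure, 3, 1.5)
```

In the toy field the second largest value at x = 2 is skipped because it lies inside the excluded neighbourhood of the first pick, which mirrors how the procedure spreads the microphones over separate pressure maxima.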
The results achieved so far demonstrate the applicability of the proposed method for identifying low-frequency sound sources in real-world situations. An additional challenge is the localization (separation) of several simultaneously active sound sources, especially in the low-frequency range. To test the proposed method in the presence of more than one acoustic source, two setups have been considered: speakers L1 and L3 active (case A) and speakers L1 and L2 active (case B). Both sources should have approximately the same source strength, since the excitation signal was the same for both. However, due to speaker tolerances, the same source level may not be achieved. With the inverse scheme, a good identification was achieved, especially for case A (see Fig. 14a). For case B (see Fig. 15a), the result is not as good, but this setup is more challenging, since the two sources are closer to each other, which makes the separation harder. The localization was also done with FuncBF (ConvBF is omitted because FuncBF has the better ability for source separation). Since Clean-SC cannot localize both sources (they are coherent sound sources), only the results of FuncBF are considered. In case A (Fig. 14b), FuncBF can also separate the two sources, but the identified position of speaker L3 is more accurate with the inverse scheme. Moreover, with the inverse scheme the source strengths of speakers L1 and L3 (which should be approximately equal) do not differ as much. For case B, FuncBF cannot separate the two sources (see Fig. 15b).

Rotating sources
In the second application, measurement results for real-world scenarios with rotating sound sources are presented [21]. All measurements shown were performed at FAU Erlangen with a ring array consisting of 64 microphones with a radius of 0.5 m [20]. For validation purposes, a fan with unskewed blades and mounted piezo buzzers was used. Next, measurements of a forward-skewed fan were processed. The radius of both fans is 0.25 m. Figure 16 shows results of ConvBF and Clean-SC for the unskewed fan with buzzers. The source maps depict the source level L_σ defined in (6). The normal distance between array and fan plane is approximately 0.63 m and the rotational frequency of the fan is 590 min⁻¹. The scanning frequency band was chosen as 2 kHz ≤ f_scan ≤ 6 kHz, which is the frequency range of the buzzers. The averaging of the source strengths computed at single frequencies within the defined band was performed according to (5). The resulting source maps provide an acoustic image of the position the fan had when the measurement was started. In order to correctly interpret the locations of the sound sources with respect to the fan geometry, it is necessary to measure the angle ϕ(t) of the fan synchronously with the emitted sound pressures.
As can be seen, there is no significant difference between the interpolation method and the spinning mode decomposition presented in Sect. 2.4 regarding the source maps, i.e., the positions and levels of the identified sources. If no signal processing were performed prior to the beamforming algorithm, i.e., if stationary sources were assumed, the source maps would be smeared and the piezo buzzers would be interpreted as ring-shaped sources. Figures 16a and 16b show results of ConvBF, which exhibit distinctive peaks (main lobes) where the sources are located. Three main sources can be seen near three of the blade tips where the buzzers are mounted. Due to the frequencies of the sound sources and the geometry of the array and setup, the spatial resolution is not high enough to determine whether only one buzzer or several closely spaced buzzers are mounted on a blade. Further, side lobes of all three main sources are visible and interfere with each other. Figures 16c and 16d show results of Clean-SC, a deconvolution algorithm that uses the source maps calculated with ConvBF, also called "dirty maps", as its basis. As this method removes the side lobes of the ConvBF map and replaces incoherent sources by single peaks, the individual buzzers at each blade appear as separate sound sources in the map. The sources on the fan blade located in the first quadrant can now be clearly identified as independent sources, whereas with ConvBF the amplitude of their main lobe is in the range of the side-lobe amplitudes, so that the source position could be mistaken for an artefact. Again, there is very good agreement between the interpolation method and the spinning mode decomposition. Figure 17 shows source maps of the forward-skewed fan with the same radii of array and fan. The distance between array and fan in this setup is approximately 0.71 m, the rotational frequency is 1486 min⁻¹ and the scanning frequency band is chosen as 3 kHz ≤ f_scan ≤ 4 kHz.
Here, no additional sources are mounted, in contrast to the validation setup with piezo buzzers. Therefore, the overall source level is lower. The fan has nine blades and the Clean-SC algorithm identifies several sources along each blade. Again, there is good agreement between the two methods concerning locations and amplitudes of the sources.
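To illustrate how Clean-SC turns a ConvBF "dirty map" into single peaks, the following sketch applies a basic variant of the algorithm to synthetic, stationary data. It is an illustration under assumptions, not the processing chain used for Figs. 16 and 17: no de-rotation of the microphone signals is included, free-field monopole steering vectors and a sound speed of 343 m/s are assumed, diagonal removal is omitted, and the two incoherent sources with strengths 1.0 and 0.5 are made up. Only the array geometry (64 microphones on a ring of radius 0.5 m, scan plane at 0.63 m) is borrowed from the setup described above. All function and variable names are hypothetical.

```python
import numpy as np

def steering_matrix(grid, mics, wavenumber):
    """Free-field monopole transfer functions, shape (n_mics, n_grid)."""
    r = np.linalg.norm(mics[:, None, :] - grid[None, :, :], axis=2)
    return np.exp(-1j * wavenumber * r) / (4.0 * np.pi * r)

def dirty_map(csm, g):
    """Conventional beamforming: w_j^H C w_j with w_j = g_j / (g_j^H g_j)."""
    w = g / np.sum(np.abs(g) ** 2, axis=0)
    power = np.real(np.einsum("mj,mn,nj->j", w.conj(), csm, w))
    return power, w

def clean_sc(csm, g, loop_gain=0.9, max_iter=50):
    """Move the part of the cross-spectral matrix (CSM) that is coherent
    with the current map peak into a clean map of single peaks."""
    degraded = csm.copy()
    clean = np.zeros(g.shape[1])
    for _ in range(max_iter):
        power, w = dirty_map(degraded, g)
        j = int(np.argmax(power))
        if power[j] <= 0.0:
            break
        # Source component coherent with the peak at grid point j.
        h = degraded @ w[:, j] / power[j]
        updated = degraded - loop_gain * power[j] * np.outer(h, h.conj())
        # Stop once the degraded CSM no longer shrinks.
        if np.linalg.norm(updated) >= np.linalg.norm(degraded):
            break
        degraded = updated
        clean[j] += loop_gain * power[j]
    return clean

# Synthetic example: ring array in the plane z = 0, scan grid at z = 0.63 m.
angles = 2.0 * np.pi * np.arange(64) / 64
mics = np.column_stack([0.5 * np.cos(angles), 0.5 * np.sin(angles),
                        np.zeros(64)])
axis = np.linspace(-0.3, 0.3, 13)
gx, gy = np.meshgrid(axis, axis)
grid = np.column_stack([gx.ravel(), gy.ravel(), np.full(gx.size, 0.63)])
g = steering_matrix(grid, mics, 2.0 * np.pi * 4000.0 / 343.0)

# Two incoherent monopoles placed on grid points (assumed strengths).
idx_a = int(np.argmin(np.linalg.norm(grid[:, :2] - [-0.15, 0.0], axis=1)))
idx_b = int(np.argmin(np.linalg.norm(grid[:, :2] - [0.15, 0.10], axis=1)))
csm = (1.0 * np.outer(g[:, idx_a], g[:, idx_a].conj())
       + 0.5 * np.outer(g[:, idx_b], g[:, idx_b].conj()))
clean_map = clean_sc(csm, g)
```

In this synthetic case the energy in `clean_map` should concentrate at `idx_a` and `idx_b`, whereas the dirty map of the same data also contains the side lobes of the ring array. Because the subtraction step relies on coherence with the peak, the sketch is only meaningful for incoherent sources.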

Results for different steering vectors
As mentioned in Sect. 2.4, finding the correct source positions of rotating sources requires not only signal processing of the recorded pressure signals but also the use of the correct Green's function g_Ω for rotating monopoles as steering vector. Using g_Ω together with the spinning mode decomposition is the only analytically exact method. However, the Green's function g_0 for stationary monopoles can be used as an approximation to g_Ω. Additionally, corrected distances between each assumed source position on the source grid and each microphone can be calculated and inserted into the stationary Green's function. The resulting function g_0(r*_m) with corrected distances r*_m can be used as an improved approximation to g_Ω.
Fig. 18 depicts all three options. For these figures, measurement data of the unskewed fan with buzzers are used, but with a scanning frequency f_scan between 4 and 6 kHz, so that only the buzzers on one blade are visible. The actual source position is marked with a black circle. If the rough approximation g_Ω ≈ g_0(r) is used (Fig. 18a), the identified source position is shifted in the ϕ-direction (tangential direction) and the source map is somewhat smeared. If g_0(r*_m) (Fig. 18b) or g_Ω (Fig. 18c) is used, the correct source position is identified. The magnitude of the deviation from the real source position depends on the rotational frequency and the distances between source grid and observer positions.
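The dependence of the deviation on rotational frequency and distance can be made tangible with a rough order-of-magnitude estimate. The sketch below assumes that the tangential shift is comparable to the angle by which the fan turns while sound travels from the fan plane to the array. This is a simplification introduced here for illustration, not the correction derived in Sect. 2.4: it uses only the normal distance (the actual source-to-microphone distances are larger) and an assumed sound speed of 343 m/s. The function name is hypothetical.

```python
import math

def rotation_during_propagation(rpm, distance_m, radius_m,
                                speed_of_sound=343.0):
    """Angle (rad) the fan turns during the sound travel time, and the
    corresponding arc length (m) at the given radius."""
    omega = 2.0 * math.pi * rpm / 60.0          # angular velocity in rad/s
    travel_time = distance_m / speed_of_sound   # propagation time in s
    angle = omega * travel_time
    return angle, angle * radius_m

# Operating points of the two measurement setups described above.
angle_unskewed, arc_unskewed = rotation_during_propagation(590.0, 0.63, 0.25)
angle_skewed, arc_skewed = rotation_during_propagation(1486.0, 0.71, 0.25)
```

Under these assumptions the estimate gives roughly 6.5° (about 2.8 cm at the blade tip) for 590 min⁻¹ and roughly 18.5° (about 8 cm) for 1486 min⁻¹, which suggests why the approximation g_0(r) becomes less adequate at higher rotational speeds.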
The calculation of the correct Green's function g_Ω requires, among other things, the evaluation of spherical harmonics and therefore comes at a high computational cost. The approximation with g_0(r*_m) is thus a good trade-off between accuracy and computational time: the corrected distances are calculated once per measurement setup for each combination of scanning point and microphone position, and they are not frequency-dependent.
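The computational benefit of frequency-independent distances can be sketched as follows: the distance matrix is built once and then reused for every scanning frequency. The sketch uses plain geometric distances as a stand-in, because the formula for the corrected distances r*_m belongs to Sect. 2.4 and is not reproduced here; in practice the array `distances` would hold r*_m instead. The function names, the sign convention exp(-jkr), the sound speed of 343 m/s and the toy coordinates are assumptions.

```python
import numpy as np

def distance_matrix(grid, mics):
    """Distances between all scanning points and microphones,
    shape (n_grid, n_mics). Stand-in for the corrected distances."""
    return np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2)

def steering_vectors(distances, frequency_hz, speed_of_sound=343.0):
    """Stationary free-field Green's function evaluated on given distances."""
    k = 2.0 * np.pi * frequency_hz / speed_of_sound
    return np.exp(-1j * k * distances) / (4.0 * np.pi * distances)

# Toy geometry (made-up coordinates): 3 scanning points, 4 microphones.
grid = np.array([[0.00, 0.00, 0.63],
                 [0.10, 0.00, 0.63],
                 [0.00, 0.25, 0.63]])
mics = np.array([[0.5, 0.0, 0.0],
                 [0.0, 0.5, 0.0],
                 [-0.5, 0.0, 0.0],
                 [0.0, -0.5, 0.0]])

# Computed once per measurement setup ...
distances = distance_matrix(grid, mics)
# ... and reused for every frequency in the scanning band.
frequencies = (4000.0, 5000.0, 6000.0)
vectors = {f: steering_vectors(distances, f) for f in frequencies}
```

Only the cheap exponential has to be re-evaluated per frequency, whereas an exact evaluation of g_Ω would repeat the expensive part for each frequency.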

Conclusion
In this contribution, an overview of beamforming-based algorithms for sound source localization was presented, and the advantages as well as limitations of the different algorithms were discussed. Further, two methods for the localization of rotating sound sources were presented. For each method, real-world microphone measurements were performed and evaluated.
A main restriction of current beamforming methods is the assumption of free radiation in the calculation of the transfer function between the microphone and the assumed source point (steering vector). This transfer function is usually given by Green's function for free radiation. Hence, obstacles as well as absorbing surfaces cannot be taken into account. In the presented inverse scheme, the Helmholtz equation is solved with the correct boundary conditions. To recover the source locations, an inverse scheme based on a sparsity-promoting Tikhonov functional is used to match the measured pressure (microphone signals) and the simulated pressure. A clear advantage of such an inverse method is its ability to fully consider realistic geometries and boundary conditions, as well as its straightforward generalizability to situations with convection and/or damping. Furthermore, a detailed source distribution is identified in both amplitude and phase, and with this information a numerical simulation can finally be performed to obtain the acoustic field. Standard beamforming methods, in contrast, require an integration (which raises the question of the integration area), or more complex deconvolution algorithms must be applied.
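The idea of matching measured and simulated pressure with a sparsity-promoting penalty can be illustrated with a small toy problem. The sketch below is not the inverse scheme of this contribution: it swaps in an L1-regularized least-squares functional solved by iterative soft-thresholding (ISTA), and a random complex matrix stands in for the FE-based transfer between candidate source positions and microphones. The problem size, the source indices and amplitudes, and the regularization parameter `alpha` are all assumptions.

```python
import numpy as np

def soft_threshold(x, threshold):
    """Complex soft-thresholding: shrink magnitudes, keep phases."""
    magnitude = np.abs(x)
    scale = np.maximum(1.0 - threshold / np.maximum(magnitude, 1e-30), 0.0)
    return scale * x

def ista(transfer, measured, alpha, n_iter=2000):
    """Minimize 0.5 * ||transfer @ s - measured||^2 + alpha * ||s||_1."""
    step = 1.0 / np.linalg.norm(transfer, 2) ** 2   # 1 / Lipschitz constant
    sources = np.zeros(transfer.shape[1], dtype=complex)
    for _ in range(n_iter):
        residual = transfer @ sources - measured
        gradient = transfer.conj().T @ residual
        sources = soft_threshold(sources - step * gradient, step * alpha)
    return sources

# Toy problem: 50 microphones, 200 candidate source positions.
rng = np.random.default_rng(0)
n_mics, n_candidates = 50, 200
transfer = (rng.standard_normal((n_mics, n_candidates))
            + 1j * rng.standard_normal((n_mics, n_candidates)))
transfer /= np.sqrt(2 * n_mics)

# Two active sources with assumed amplitude and phase.
true_sources = np.zeros(n_candidates, dtype=complex)
true_sources[30] = 1.0
true_sources[140] = 0.8 * np.exp(1j * np.pi / 3)
measured = transfer @ true_sources          # noise-free "microphone signals"

alpha = 0.01 * np.max(np.abs(transfer.conj().T @ measured))
estimate = ista(transfer, measured, alpha)
strongest = sorted(int(i) for i in np.argsort(np.abs(estimate))[-2:])
```

The estimate is complex-valued, so it carries both amplitude and phase of the identified sources, and multiplying it with the transfer matrix reproduces the pressure at the microphones. In the actual scheme, the role of `transfer` is played by the FE solution of the Helmholtz equation with the room's boundary conditions.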
The localization results achieved with the inverse scheme demonstrate its applicability at low frequencies in real-world scenarios. Furthermore, simulations showed that an almost perfect reconstruction of the acoustic sources can be achieved with microphone positions at pressure maxima, which demonstrates the potential of the inverse scheme. Despite the superiority of the inverse scheme over advanced deconvolution-based signal processing schemes, one has to consider the high effort arising from two challenges: (1) the modeling of geometry and boundary conditions for an accurate FE computation, and (2) the precise determination of the microphone positions.
Considering beamforming for rotating sound sources, it could be shown that both presented methods, the interpolation in the time domain and the spinning mode decomposition in the frequency domain, provide good results.

Funding Note Open access funding provided by TU Wien (TUW).
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.