Gary Elko and Jens Meyer are the well-known inventors of the first commercially available compact spherical microphone array that is able to record higher-order Ambisonics [2], the Eigenmike. There are several inspiring scientific works with valuable contributions that can be recommended for further reading [3,4,5,6,7,8,9,10,11,12], above all Boaz Rafaely’s excellent introductory book [13].

This mathematical theory might appear extensive, but it cannot be avoided when aiming at an in-depth understanding of higher-order Ambisonic microphones. The theory enables processing of the microphone signals received such that the surrounding sound field excitation is retrieved in terms of an Ambisonic signal. Some readers may want to skip the physical introduction and resume in Sect. 6.5 on spherical scattering or Sect. 6.6 on the processing of the array signals.

6.1 Equation of Compression

Wave propagation involves reversible short-term temperature fluctuations that take effect when air is compressed by sound; they determine the specific stiffness of air in sound propagation. Appendix A.6.1 shows how to derive this adiabatic compression relation based on the first law of thermodynamics and the ideal gas law. It relates the relative volume change \(\frac{V}{V_0}\) to the pressure change \(p=-K\,\frac{V}{V_0}\) by the bulk modulus \(K\) of air. After expressing the bulk modulus by more common constants (see Footnote 1), \(K=\rho \,c^2\), and differentially formulating the volume change over time using the change of the sound particle velocity in space, e.g. in one dimension \(\dot{p} = -\rho \,c^2\;\frac{\partial v_x}{\partial x}\), cf. Appendix A.6.1, we get the three-dimensional compression equation:

$$\begin{aligned} \frac{\partial p}{\partial t}&=-\rho \,c^2\;\varvec{\nabla }^\mathrm {T}\varvec{v}. \end{aligned}$$
(6.1)

Here the inner product of the Del symbol \(\varvec{\nabla }^\mathrm {T}=(\frac{\partial }{\partial x},\frac{\partial }{\partial y},\frac{\partial }{\partial z})\) with \(\varvec{v}\) yields what is called divergence \(\mathrm {div}(\varvec{v})=\varvec{\nabla }^\mathrm {T}\varvec{v}=\frac{\partial v_\mathrm {x}}{\partial x}+\frac{\partial v_\mathrm {y}}{\partial y}+\frac{\partial v_\mathrm {z}}{\partial z}\). The equation means: regardless of whether the outer boundaries of a small package of air travel at a common velocity, if there are directions in which their velocity is spatially increasing, the resulting gradual volume expansion over time causes a proportional decrease of interior pressure over time.

6.2 Equation of Motion

The equation of motion is readily understood from the Newtonian equation of motion. For the x direction, \(F_\mathrm {x}=m\,\frac{\partial v_\mathrm {x}}{\partial t}\) equates the external force to mass m times acceleration, i.e. the increase in velocity \(\frac{\partial v_\mathrm {x}}{\partial t}\) over time. For a small package of air with constant volume \(V_0=\Delta x\Delta y\Delta z\), the mass is obtained from the air density, \(m=\rho \,V_0\), and the force equals the decrease in pressure along each of the three space directions, times the corresponding partial surface, e.g. for the x direction \(F_\mathrm {x}=-[p(x+\Delta x)-p(x)]\Delta y\Delta z\). For the x direction, this yields after expanding by \(\frac{\Delta x}{\Delta x}\)

$$ -\frac{\Delta p}{\Delta x}\,V_0=\rho \,V_0\,\frac{\partial v_\mathrm {x}}{\partial t}. $$

Dividing by \(-V_0\) and letting \(V_0\rightarrow 0\), we obtain the typical shape of the equation of motion for all three space directions

$$\begin{aligned} \varvec{\nabla }\,p&=-\rho \,\frac{\partial \varvec{v}}{\partial t}. \end{aligned}$$
(6.2)

The equation of motion means: Independently of the common exterior pressure load on all the outer boundaries of a small air package, an outer pressure decrease into any direction implies a corresponding pushing force on the package causing a proportional acceleration into this direction.
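As a plausibility check, a one-dimensional plane wave can be inserted into both first-order equations. The following minimal sketch assumes a wave \(p=\cos (\omega t-kx)\) travelling in the positive x direction with \(v_\mathrm {x}=\frac{p}{\rho \,c}\), and it uses illustrative values for \(\rho \), c, and the frequency that are not taken from the text; derivatives are approximated by central differences.

```python
import math

RHO = 1.2      # air density in kg/m^3 (assumed illustrative value)
C = 343.0      # speed of sound in m/s (assumed illustrative value)
F = 1000.0     # test frequency in Hz (arbitrary choice)
OMEGA = 2 * math.pi * F
K = OMEGA / C  # wave number k = omega / c

def p(x, t):
    """Sound pressure of a plane wave travelling towards +x."""
    return math.cos(OMEGA * t - K * x)

def v_x(x, t):
    """Particle velocity of the same wave: pressure divided by rho*c."""
    return p(x, t) / (RHO * C)

def ddt(f, x, t, h=1e-7):
    """Central-difference approximation of the time derivative."""
    return (f(x, t + h) - f(x, t - h)) / (2 * h)

def ddx(f, x, t, h=1e-5):
    """Central-difference approximation of the space derivative."""
    return (f(x + h, t) - f(x - h, t)) / (2 * h)

x0, t0 = 0.37, 1.3e-4  # arbitrary evaluation point in metres and seconds

# Eq. (6.1) in one dimension: dp/dt = -rho c^2 dv_x/dx
compression_lhs = ddt(p, x0, t0)
compression_rhs = -RHO * C**2 * ddx(v_x, x0, t0)

# Eq. (6.2) in one dimension: dp/dx = -rho dv_x/dt
motion_lhs = ddx(p, x0, t0)
motion_rhs = -RHO * ddt(v_x, x0, t0)

print("compression:", compression_lhs, compression_rhs)
print("motion:     ", motion_lhs, motion_rhs)
```

Both pairs of printed values should agree up to the finite-difference error, which indicates that pressure and velocity of a plane wave are in phase and related by \(\rho \,c\).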

6.3 Wave Equation

We can combine the compression equation \(\frac{\partial p}{\partial t}=-\rho \,c^2\;\varvec{\nabla }^\mathrm {T}\varvec{v}\) with the equation of motion \(\varvec{\nabla }\,p=-\rho \,\frac{\partial \varvec{v}}{\partial t}\) by differentiating the first one with respect to time, \(\frac{\partial ^2 p}{\partial t^2}=-\rho \,c^2\,\varvec{\nabla }^\mathrm {T}\frac{\partial \varvec{v}}{\partial t}\), and taking the divergence \(\varvec{\nabla }^\mathrm {T}\) of the second one, which yields the Laplacian \(\varvec{\nabla }^\mathrm {T}\varvec{\nabla }=\bigtriangleup \), hence \(\bigtriangleup p=-\rho \varvec{\nabla }^\mathrm {T}\frac{\partial \varvec{v}}{\partial t}\). Dividing the first result by \(c^2\) and equating both expressions yields the lossless wave equation \(\bigtriangleup p = \frac{1}{c^2}\frac{\partial ^2}{\partial t^2}p\) that is typically written as

$$\begin{aligned} \Bigl (\bigtriangleup -\frac{1}{c^2}\frac{\partial ^2}{\partial t^2}\Bigr )p&=0. \end{aligned}$$
(6.3)

Obviously, the wave equation relates the curvature in space (expressed by the Laplacian) to curvature in time (expressed by the second-order derivative).

If p is a pure sinusoidal oscillation \(\sin (\omega \,t+\phi _0)\), the second derivative in time corresponds to a factor \(-\omega ^2\), and by substitution with the wave-number \(k=\frac{\omega }{c}\), we can write the frequency-domain wave equation as

$$\begin{aligned} (\bigtriangleup +k^2)\,p&=0,&\text {Helmholtz equation.} \end{aligned}$$
(6.4)

6.3.1 Elementary Inhomogeneous Solution: Green’s Function (Free Field)

The Green’s function is an elementary prototype for solutions to inhomogeneous problems \((\bigtriangleup +k^2)p=-q\), which is defined as

$$\begin{aligned} \bigl (\bigtriangleup +k^2\bigr )G=-\delta . \end{aligned}$$

A general excitation q of the equation can be represented by its convolution with the Dirac delta distribution \(\int q(\varvec{s})\,\delta (\varvec{r}-\varvec{s})\, \mathrm {d}V(\varvec{s})=q(\varvec{r})\). As the wave equation is linear, the general solution must therefore equal the convolution of the Green’s function with the excitation function \(p(\varvec{r})=\int q(\varvec{s})\,G(\varvec{r}-\varvec{s})\,\mathrm {d}V(\varvec{s})\) over space and, if formulated in the time domain, also over time. The integral superimposes the acoustical responses to every point in time and space of the source phenomenon, weighted by the corresponding source strength in space and time.

The Green’s function in three dimensions is derived in Appendix A.6.3, Eq. (A.91),

$$\begin{aligned} G&=\frac{e^{-\mathrm {i}k\,r}}{4\pi r}, \end{aligned}$$
(6.5)

with the wave number \(k=\frac{\omega }{c}\) and distance between source and receiver \(r=\sqrt{\Vert \varvec{r}-\varvec{r}_\mathrm {s}\Vert ^2}\).

Acoustic source phenomena are characterized by the behavior of the Green’s function: far away, the amplitude decays with \(\frac{1}{r}\) and the phase \(-kr=-\omega \frac{r}{c}\) corresponds to the radially increasing delay \(\frac{r}{c}\). Both are expressed in Sommerfeld’s radiation condition \(\lim _{r\rightarrow \infty }r\bigl (\frac{\partial }{\partial r}p+\mathrm {i}k\,p\bigr )=0\).
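The properties of the free-field Green’s function can be checked numerically. The sketch below is only an illustration: it evaluates the radial Helmholtz operator \(\frac{\partial ^2}{\partial r^2}+\frac{2}{r}\frac{\partial }{\partial r}+k^2\) on G away from the source using central differences, and it evaluates the term of Sommerfeld’s radiation condition at two distances. The wavelength and the step size are arbitrary assumptions.

```python
import cmath
import math

WAVELENGTH = 0.3                 # metres (arbitrary example value)
K = 2 * math.pi / WAVELENGTH     # wave number

def green(r, k=K):
    """Free-field Green's function of Eq. (6.5)."""
    return cmath.exp(-1j * k * r) / (4 * math.pi * r)

def helmholtz_residual(r, k=K, h=1e-4):
    """Radial Helmholtz operator applied to G at r > 0 (should vanish)."""
    g_minus, g_mid, g_plus = green(r - h, k), green(r, k), green(r + h, k)
    second = (g_plus - 2 * g_mid + g_minus) / h**2   # d^2 G / dr^2
    first = (g_plus - g_minus) / (2 * h)             # dG / dr
    return second + 2 / r * first + k**2 * g_mid

def sommerfeld_term(r, k=K, h=1e-4):
    """r * (dG/dr + i k G), which should tend to zero for large r."""
    first = (green(r + h, k) - green(r - h, k)) / (2 * h)
    return r * (first + 1j * k * green(r, k))

# Residual relative to the size of the individual terms k^2 |G|
relative_residual = abs(helmholtz_residual(1.0)) / (K**2 * abs(green(1.0)))
near = abs(sommerfeld_term(10.0))
far = abs(sommerfeld_term(1000.0))

print("relative Helmholtz residual at r = 1 m:", relative_residual)
print("Sommerfeld term at 10 m and 1000 m:", near, far)
```

The residual stays at the level of the finite-difference error, and the Sommerfeld term shrinks in proportion to \(\frac{1}{r}\).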

Plane waves. The radius coordinate of the Green’s function is the distance between two Cartesian position vectors \(\varvec{r}_\mathrm {s}\) and \(\varvec{r}\), the source and receiver location. To let one of them become large, we re-express it in terms of radius and direction vector, \(\varvec{r}_\mathrm {s}=r_\mathrm {s}\varvec{\theta }_\mathrm {s}\). This permits the far-field approximation

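$$\begin{aligned} \Vert \varvec{r}-\varvec{r}_\mathrm {s}\Vert&=\sqrt{r_\mathrm {s}^2-2\,r_\mathrm {s}\,\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{r}+\Vert \varvec{r}\Vert ^2}\approx r_\mathrm {s}-\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{r},&\text {for } r_\mathrm {s}\gg \Vert \varvec{r}\Vert . \end{aligned}$$
(6.6)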

The phase approximation must be accurate: at a wavelength of 30 cm, for instance, even a relatively small distance difference, e.g. between 15 m and 15 m \(+\) 15 cm, shifts the wave by half a period and thus changes its sign. To approximate the phase of the Green’s function, we must therefore at least use \(r_\mathrm {s}-\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{r}\) as approximation. By contrast, this level of precision is irrelevant for the magnitude approximation: the error would be negligible if we used \(\frac{1}{15\,\mathrm {m}}\) instead of the magnitude \(\frac{1}{15\,\mathrm {m}+15\,\mathrm {cm}}\).

At a large distance \(r_\mathrm {s}\) assumed to be constant, the Green’s function is proportional to a plane wave from the source direction \(\varvec{\theta }_\mathrm {s}\)

$$\begin{aligned} \lim _{r_\mathrm {s}\rightarrow \infty } G = {\textstyle \frac{e^{-\mathrm {i}k\,r_\mathrm {s}}}{4\pi \,r_\mathrm {s}}}\; e^{\mathrm {i}k\,\varvec{\theta }_\mathrm {s}^\mathrm {T}\,\varvec{r}}. \end{aligned}$$
(6.7)
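The different precision requirements for phase and magnitude can be illustrated with the numbers of the example above (wavelength 30 cm, source distance 15 m). The source direction and receiver position in this sketch are arbitrary assumptions chosen for illustration.

```python
import cmath
import math

WAVELENGTH = 0.30                # metres, as in the example of the text
K = 2 * math.pi / WAVELENGTH     # wave number
R_S = 15.0                       # source distance in metres, as in the text
THETA_S = (1.0, 0.0, 0.0)        # unit vector towards the source (assumed)
R = (0.10, 0.10, 0.05)           # receiver position in metres (assumed)

def dot(a, b):
    """Inner product of two 3-vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

source = tuple(R_S * c for c in THETA_S)
exact_distance = math.dist(R, source)        # requires Python 3.8 or newer
approx_distance = R_S - dot(THETA_S, R)      # first-order far-field distance

# Phase error of the far-field distance versus the crude choice r = r_s
distance_error = exact_distance - approx_distance
phase_error_deg = math.degrees(K * distance_error)
crude_phase_error_deg = math.degrees(K * (exact_distance - R_S))

# Magnitude error when 1/r_s replaces 1/r
magnitude_error_db = 20 * math.log10(exact_distance / R_S)

# A path difference of 15 cm is half a wavelength and flips the sign
half_wave_factor = cmath.exp(-1j * K * 0.15)

print("phase error, far-field distance (deg):", phase_error_deg)
print("phase error, r_s only (deg):", crude_phase_error_deg)
print("magnitude error, r_s only (dB):", magnitude_error_db)
print("phase factor for 15 cm:", half_wave_factor)
```

The projection term \(\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{r}\) keeps the phase error small, whereas omitting it changes the phase substantially; the magnitude error of using \(\frac{1}{r_\mathrm {s}}\) remains a small fraction of a decibel.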

The plane-wave part is of unit magnitude \(|p|=1\)

$$\begin{aligned} p=e^{\mathrm {i}k\,\varvec{\theta }_\mathrm {s}^\mathrm {T}\,\varvec{r}} \end{aligned}$$
(6.8)

and its phase evaluates the projection of the position vector onto the plane-wave arrival direction \(\varvec{\theta }_\mathrm {s}\). Towards the direction \(\varvec{\theta }_\mathrm {s}\), the phase grows positive, i.e. the signal arrives earlier. Towards the plane-wave propagation direction \(-\varvec{\theta }_\mathrm {s}\) the phase grows negatively, implying an increasing time delay, which is constant on any plane perpendicular to \(\varvec{\theta }_\mathrm {s}\).

Plane waves are an invaluable tool to locally approximate sound fields from sources that are sufficiently far away, within a small region (see Footnote 2).

6.4 Basis Solutions in Spherical Coordinates

Figure 4.11 shows spherical coordinates [14, 15] using radius r, azimuth \(\varphi \), and zenith \(\vartheta \). For simplification, zenith is replaced by \(\zeta =\cos \vartheta =\frac{z}{r}\), here. We may solve the Helmholtz equation \((\bigtriangleup +k^2)p=0\) in spherical coordinates by the radial and directional parts of the Laplacian \(\bigtriangleup =\bigtriangleup _\mathrm {r}+\bigtriangleup _{\upvarphi ,\upzeta }\), as identified in Appendix A.3

$$\begin{aligned} \bigtriangleup _\mathrm {r}&=\frac{\partial ^2}{\partial r^2}+\frac{2}{r}\frac{\partial }{\partial r},&\bigtriangleup _{\upvarphi ,\upzeta }&=\frac{1-\zeta ^2}{r^2}\frac{\partial ^2}{\partial \zeta ^2} - \frac{2}{r^2}\zeta \frac{\partial }{\partial \zeta }+\frac{1}{r^2(1-\zeta ^2)}\frac{\partial ^2}{\partial \varphi ^2}. \end{aligned}$$
(6.9)

We already know the spherical harmonics as directional eigensolutions from Sect. 4.7

$$\begin{aligned} \bigtriangleup _{\upvarphi ,\upzeta }Y_n^m=-\frac{n(n+1)}{r^2}\,Y_n^m \end{aligned}$$
(6.10)

and assume them to be a factor of the solution \(p_n^m=R\,Y_n^m\) determining the value of \(\bigtriangleup _{\upvarphi ,\upzeta }\) in \( (\bigtriangleup _\mathrm {r}+k^2+\bigtriangleup _{\upvarphi ,\upzeta })p_n^m=0\). We find a separated radial differential equation after insertion, multiplication by \(\frac{r^2}{Y_n^m}\), and re-expressing the differentials \(\frac{\partial }{\partial r}=k\frac{\partial }{\partial kr}\) and \(\frac{\partial ^2}{\partial r^2}=k^2\frac{\partial ^2}{\partial (kr)^2}\)

$$\begin{aligned} \left[ (kr)^2\frac{\partial ^2}{\partial (kr)^2}+2(kr)\frac{\partial }{\partial (kr)}+(kr)^2-n(n+1)\right] R&=0. \end{aligned}$$
(6.11)

Appendix A.6.4 shows how to get physical solutions for R of this, so-called, spherical Bessel differential equation: spherical Hankel functions of the second kind \(h_n^{(2)}(kr)\) able to represent radiation (radially outgoing into every direction), consistently with Green’s function G, diverging with an \((n+1)\)-fold pole at \(kr=0\), a physical behavior that would also be observed after spatially differentiating G, see Fig. 6.1; spherical Bessel functions \(j_n(kr)=\mathfrak {R}\{h_n^{(2)}(kr)\}\) are real-valued, converge everywhere, exhibit an n-fold zero at \(kr=0\), and can’t represent radiation. Implementations typically rely on the accurate standard libraries implementing cylindrical Bessel and Hankel functions:

$$\begin{aligned} j_n(kr)&=\sqrt{\frac{\pi }{2}\frac{1}{kr}}\,J_{n+\frac{1}{2}}(kr),&h_n^{(2)}(kr)&=\sqrt{\frac{\pi }{2}\frac{1}{kr}}\,H^{(2)}_{n+\frac{1}{2}}(kr). \end{aligned}$$
(6.12)
Fig. 6.1

Spherical Bessel functions \(j_n(kr)=\mathfrak {R}\{h_n^{(2)}(kr)\}\) (top left), imaginary part of spherical Hankel functions \(\mathfrak {I}\{h_n^{(2)}(kr)\}\) (top right), and magnitude/dB of \(|h_n^{(2)}(kr)|\) (bottom), over kr

Wave spectra and spherical basis solutions. Any sound field evaluated at a radius r where the air is source-free and homogeneous in any direction can be represented by spherical basis functions for enclosed \(j_n(kr)Y_n^m(\varvec{\theta })\) and radiating fields \(h_n(kr)Y_n^m(\varvec{\theta })\)

$$\begin{aligned} p&=\sum _{n=0}^\infty \sum _{m=-n}^n\bigl [b_{nm}j_n\left( kr\right) +c_{nm}h_n\left( kr\right) \bigr ]Y_n^m\left( \varvec{\theta }\right) . \end{aligned}$$
(6.13)

Here, \(b_{nm}\) are the coefficients for incoming waves that pass through and emanate from radii larger than r and \(c_{nm}\) are the coefficients of outgoing waves radiating from sources at radii smaller than r; the coefficients are called wave spectra of the incoming and outgoing waves, cf. [16].

Ambisonic plane-wave spectrum, plane wave. Plane waves only use the coefficients \(b_{nm}\), while \(c_{nm}=0\) in Eq. (6.13). The sum of incoming plane waves from all directions, whose amplitudes are given by the spherical harmonics coefficients \(\chi _{nm}\) as a set of Ambisonic signals, is described by the incoming wave spectrum, see Appendix A.6.5, Eq. (A.119)

$$\begin{aligned} b_{nm}&=4\pi \,\mathrm {i}^n\;\chi _{nm}. \end{aligned}$$
(6.14)

Figure 6.2 shows a single plane wave incoming from the direction \(\varvec{\theta }_\mathrm {s}\) represented by

$$\begin{aligned} b_{nm}=4\pi \,\mathrm {i}^n\;Y_n^m(\varvec{\theta }_\mathrm {s}) \end{aligned}$$
(6.15)

at four different time steps corresponding to \(0^\circ \), \(60^\circ \), \(120^\circ \) and \(180^\circ \) time shifts for the two wave lengths shown.
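The expansion of a single plane wave can be verified numerically. The sketch below relies on the addition theorem \(\sum _{m}Y_n^m(\varvec{\theta }_\mathrm {s})Y_n^m(\varvec{\theta })=\frac{2n+1}{4\pi }P_n(\cos \gamma )\) for orthonormal spherical harmonics, with \(\gamma \) as the angle between both directions, so that Eq. (6.13) with Eq. (6.15) and \(c_{nm}=0\) reduces to \(\sum _n(2n+1)\,\mathrm {i}^n j_n(kr)P_n(\cos \gamma )\). The helper functions, the power-series evaluation of \(j_n\), and the test values are assumptions for illustration, not the implementation used for the figures.

```python
import cmath

def sph_jn(n, x, terms=60):
    """Spherical Bessel function j_n(x) from its power series (moderate x)."""
    term = x**n
    for m in range(1, n + 1):
        term /= 2 * m + 1                  # divide by (2n+1)!!
    total = term
    for k in range(1, terms):
        term *= -(x * x / 2) / (k * (2 * n + 2 * k + 1))
        total += term
    return total

def legendre(n, u):
    """Legendre polynomial P_n(u) by the three-term recurrence."""
    p_prev, p_cur = 1.0, u
    if n == 0:
        return p_prev
    for m in range(1, n):
        p_prev, p_cur = p_cur, ((2 * m + 1) * u * p_cur - m * p_prev) / (m + 1)
    return p_cur

def plane_wave_series(kr, cos_gamma, order):
    """Order-truncated spherical expansion of exp(i kr cos(gamma))."""
    return sum((2 * n + 1) * 1j**n * sph_jn(n, kr) * legendre(n, cos_gamma)
               for n in range(order + 1))

kr, cos_gamma = 3.0, 0.3                   # arbitrary evaluation point
exact = cmath.exp(1j * kr * cos_gamma)     # plane wave of Eq. (6.8)
error_order_2 = abs(plane_wave_series(kr, cos_gamma, 2) - exact)
error_order_25 = abs(plane_wave_series(kr, cos_gamma, 25) - exact)

print("truncation error, order 2: ", error_order_2)
print("truncation error, order 25:", error_order_25)
```

A low truncation order leaves a clearly visible error at this value of kr, whereas the order 25 used for the simulation reproduces the plane wave to numerical precision.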

Fig. 6.2

Plane wave from y axis \(\varphi =\vartheta =\frac{\pi }{2}\) in horizontal cross section; time steps correspond to \(0^\circ \), \(60^\circ \), \(120^\circ \), and \(180^\circ \) phase shifts \(\phi \) in the plot \(\mathfrak {R}\{p\,e^{\mathrm {i}\phi }\}\) showing p from Eq. (6.13) with \(c_{nm}=0\) and \(b_{nm}\) of Eq. (6.15) with \(b_{nm}=4\pi \mathrm {i}^nY_n^m(\frac{\pi }{2},\frac{\pi }{2})\); long wave (top), short wave (bottom); simulation uses \(\mathrm {N}=25\) and area shows \(|kx|,|ky|<2\pi \) and \(8\pi \)

Fig. 6.3

32-channel higher-order Ambisonic mic. Eigenmike EM32

6.5 Scattering by Rigid Higher-Order Microphone Surface

Higher-order Ambisonic microphone arrays are typically mounted on a rigid sphere of some radius \(r=\mathrm {a}\), such as the Eigenmike EM32, see Fig. 6.3. The physical boundary of the rigid spherical surface is expressed as a vanishing radial component of the sound particle velocity. The radial sound particle velocity is obtained via the equation of motion Eq. (6.2) by differentiating Eq. (6.13) with respect to the radius. This requires evaluating the differentiated spherical radial solutions \(j_n'(x)\) as well as \(h_n'^{(2)}(x)\), which is implemented by \(f'_n(x)=\frac{n}{x}f_n(x)-f_{n+1}(x)\) for either of the functions, cf. e.g. [16]. A sound-hard boundary condition at the radius \(\mathrm {a}\) requires

$$\begin{aligned} v_\mathrm {r}\bigr |_{r=\mathrm {a}}&=\frac{\mathrm {i}}{\rho \,c}\sum _{n=0}^\infty \sum _{m=-n}^n\bigl [b_{nm}\,j_n'(kr) +c_{nm}\,h_n'^{(2)}(kr) \bigr ]_{r=\mathrm {a}}Y_n^m(\varvec{\theta })=0, \end{aligned}$$

which is fulfilled by a vanishing term in square brackets. The rigid boundary responds to incoming surround-sound by velocity-canceling outgoing waves \(h_n'^{(2)}(k\mathrm {a})\,c_{nm}=-{j_n'(k\mathrm {a})}\,b_{nm}.\) The coefficients \(\psi _{nm}\) yield the sound pressure in Fig. 6.4,

$$\begin{aligned} p&=\sum _{n=0}^\infty \sum _{m=-n}^n\psi _{nm}\,Y_n^m(\varvec{\theta }),&\text {with } \psi _{nm}&=\Bigl [j_n(kr)-h_n^{(2)}(kr){\frac{j_n'(k\mathrm {a})}{h_n'^{(2)}(k\mathrm {a})}}\Bigr ]_{r=\mathrm {a}}\,b_{nm}. \end{aligned}$$
(6.16)
Fig. 6.4

Plane waves scattered by rigid sphere \(k\mathrm {a}=\pi \) (top) or \(k\mathrm {a}=4\pi \) (bottom); time steps correspond to \(0^\circ \), \(60^\circ \), \(120^\circ \), and \(180^\circ \) phase shifts \(\phi \) in the plot \(\mathfrak {R}\{p\,e^{\mathrm {i}\phi }\}\) showing p from Eq. (6.13) with \(b_{nm}\) and \(c_{nm}\) from Eq. (6.15) with \(b_{nm}=4\pi \mathrm {i}^nY_n^m(\frac{\pi }{2},\frac{\pi }{2})\) and Eq. (6.16); simulation uses \(\mathrm {N}=25\)

The two terms of the bracket are typically further simplified by a common denominator and recognizing the Wronskian Eq. (A.97) in the numerator \(\frac{j_n(x)h_n'(x)-j_n'(x)h_n(x)}{h_n'(x)}=\frac{\mathrm {i}}{x^2h_n'(x)}\)

$$\begin{aligned} \psi _{nm}|_{r=\mathrm {a}}&=\frac{\mathrm {i}}{(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})}\,b_{nm}. \end{aligned}$$
(6.17)

Relation of recorded sound pressure to Ambisonic signal. The scattering equation relates the recorded sound pressure expanded in spherical harmonics to the Ambisonic signal of the surround sound scene, see frequency responses in Fig. 6.5,

$$\begin{aligned} \psi _{nm}|_{r=\mathrm {a}}&=\frac{4\pi \,\mathrm {i}^{n+1}}{(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})}\,\chi _{nm}. \end{aligned}$$
(6.18)
Fig. 6.5

Attenuation/dB of Ambisonic signals of different orders for varying values of \(k\mathrm {a}\)

It is formally convenient that as soon as the sound pressure is given in terms of its spherical harmonic coefficient signals \(\psi _{nm}\), the Ambisonic signals \(\chi _{nm}\) of a concentric playback system are obviously just an inversely filtered version thereof, with no need for further unmixing/matrixing.

Recognizable from Fig. 6.6 and following our intuition, waves of lengths larger than the diameter \(2\mathrm {a}\) of the sphere will only weakly map to complicated high-order patterns. It is therefore easily understood that the transfer function \(\mathrm {i}^{n+1}[(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})]^{-1}\) attenuates the reception of high-order Ambisonic signals at low frequencies, see Fig. 6.5.
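The low-frequency attenuation of the higher orders can be reproduced with a few lines of code. The sketch below evaluates the bracket of Eq. (6.16) at \(r=\mathrm {a}\), i.e. the ratio \(\psi _{nm}/b_{nm}\), using the derivative recurrence quoted above. The function names, the power-series evaluation of \(j_n\), and the definition \(h_n^{(2)}=j_n-\mathrm {i}\,y_n\) are assumptions of this sketch; with this definition, the Wronskian-simplified form carries the factor \(-\mathrm {i}\), and whether that sign matches the convention of Appendix A cannot be checked from this section. The magnitude responses are unaffected by the sign.

```python
import math

def sph_jn(n, x):
    """Spherical Bessel function j_n(x) from its power series (moderate x)."""
    term = x**n
    for m in range(1, n + 1):
        term /= 2 * m + 1                  # divide by (2n+1)!!
    total = term
    for k in range(1, 80):
        term *= -(x * x / 2) / (k * (2 * n + 2 * k + 1))
        total += term
    return total

def sph_yn(n, x):
    """Spherical Neumann function y_n(x) by upward recurrence."""
    y_prev = -math.cos(x) / x
    y_cur = -math.cos(x) / x**2 - math.sin(x) / x
    if n == 0:
        return y_prev
    for m in range(1, n):
        y_prev, y_cur = y_cur, (2 * m + 1) / x * y_cur - y_prev
    return y_cur

def sph_h2(n, x):
    """Spherical Hankel function of the second kind, assumed as j_n - i y_n."""
    return complex(sph_jn(n, x), -sph_yn(n, x))

def ddx(f, n, x):
    """Derivative recurrence f_n'(x) = n/x f_n(x) - f_{n+1}(x)."""
    return n / x * f(n, x) - f(n + 1, x)

def rigid_sphere_response(n, ka):
    """Bracket of Eq. (6.16) evaluated at r = a, i.e. psi_nm / b_nm."""
    return (sph_jn(n, ka)
            - sph_h2(n, ka) * ddx(sph_jn, n, ka) / ddx(sph_h2, n, ka))

def wronskian_form(n, ka):
    """Simplified response; sign follows the definition of sph_h2 above."""
    return -1j / (ka**2 * ddx(sph_h2, n, ka))

def attenuation_db(n, ka):
    """Magnitude of the rigid-sphere response in dB."""
    return 20 * math.log10(abs(rigid_sphere_response(n, ka)))

for ka in (0.1, 1.0, 4.0):
    print("ka =", ka, [round(attenuation_db(n, ka), 1) for n in range(5)])
```

For small \(k\mathrm {a}\), each additional order is attenuated considerably more than the previous one, which is the behavior described above.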

Fig. 6.6

Plane-wave sound pressure image \(\mathfrak {R}\{p\,e^{-\mathrm {i}k\mathrm {a}}\}\) on rigid sphere with varying \(k\mathrm {a}\) using \(\psi _{nm}\) from Eq. (6.17) expanded over the spherical harmonics \(p=\sum \psi _{nm}Y_n^m\) and \(\chi _{nm}=Y_n^m(0,0)\) for a plane wave from z. With the wave length \(\lambda =\frac{c}{f}\), the value \(k\mathrm {a}\) is related to a diameter \(2\mathrm {a}\) of \(\frac{k\mathrm {a}}{\pi }=\frac{2\pi \,f\,\mathrm {a}}{\pi \,c}=\frac{2\mathrm {a}}{\lambda }\) in wave lengths to express frequency dependency; simulation uses \(\mathrm {N}=50\); for \(a=4.2\) cm, ka values correspond to \(f=125,250,500,1000,2000,4000,8000,16000\) Hz

6.6 Higher-Order Microphone Array Encoding

The block diagram of Ambisonic encoding of higher-order microphone array signals is shown in Fig. 6.7. The first processing step decomposes the pressure samples \(\varvec{p}(t) \) from the microphone array into their spherical harmonics coefficients \(\varvec{\psi }_\mathrm {N}(t)\): to what extent do the samples contain omnidirectional, figure-of-eight, and other spherical harmonic patterns, up to the highest order that the microphone arrangement permits to decompose? The frequency-independent matrix \((\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger \) does the conversion. It is the left-inverse to the spherical harmonics sampled at the microphone positions, as shown in the upcoming section.

The second step then sharpens the sound pressure image to an Ambisonic signal by filtering the spherical harmonic coefficient signals. The basic relation between sound pressure coefficients and Ambisonic signals is given in Eq. (6.18) and describes a filter for every coefficient signal, differing only in filter characteristics for different spherical harmonic orders. Robustness to noise and to errors in microphone matching and positioning is key here, and it is only achieved by a careful design of these filters, as shown in the sections below. The design considers a gradually increasing sharpening over frequency, for which it moreover employs a filter bank with separate, max-\(\varvec{r}_\mathrm {E}\)-weighted and E-normalized bands, in order to provide (i) limitation of noise and errors, (ii) a frequency response perceived as flat, and (iii) optimal suppression of the sidelobes.

Fig. 6.7

Higher-order Ambisonic microphone encoding: sound pressure samples \(\varvec{p}(t)\) are spherical-harmonics decomposed by the matrix \((\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger \), and the resulting coefficient signals \(\varvec{\psi }_\mathrm {N}(t)\) are converted to Ambisonic signals \(\varvec{\chi }_\mathrm {N}(t)\) by the sharpening filters \(\rho _n(\omega )\)

6.7 Discrete Sound Pressure Samples in Spherical Harmonics

To determine the Ambisonic signals \(\chi _{nm}\), we need to find \(\psi _{nm}\) based on all sound pressure samples \(p(\varvec{\uptheta }_i)\) recorded by the microphones distributed on the rigid-sphere array. To accomplish this, we set up a system of model equations equating the pressure samples to the unknown coefficients \(\psi _{nm}\) expanded over the spherical harmonics \(Y_n^m(\varvec{\uptheta }_i)\) sampled at every microphone position. A vector and matrix notation \(\varvec{p}=[p(\varvec{\uptheta }_i)]_i\) and \(\varvec{Y}_\mathrm {N}^\mathrm {T}= [\varvec{y}(\varvec{\uptheta }_i)^\mathrm {T}]_{i,nm}\) is helpful

$$\begin{aligned} \begin{bmatrix} p(\varvec{\uptheta }_1)\\\vdots \\p(\varvec{\uptheta }_\mathrm {M})\end{bmatrix}&=\begin{bmatrix} Y_0^0(\varvec{\theta }_1)&\dots&Y_\mathrm {N}^\mathrm {N}(\varvec{\theta }_1)\\ \vdots&\vdots&\vdots \\ Y_0^0(\varvec{\theta }_\mathrm {M})&\dots&Y_\mathrm {N}^\mathrm {N}(\varvec{\theta }_\mathrm {M}) \end{bmatrix} \begin{bmatrix} \psi _{00}\\ \vdots \\ \psi _{\mathrm {NN}} \end{bmatrix}\nonumber \\ \varvec{p}_\mathrm {N}&=\varvec{Y}_\mathrm {N}^\mathrm {T}\,\varvec{\psi }_\mathrm {N}. \end{aligned}$$
(6.19)

Left inverse (MMSE). The equation can be (pseudo-)inverted if the matrix \(\varvec{Y}_\mathrm {N}\) is well conditioned. Typically more microphones are used than coefficients searched \(\mathrm {M}\ge (\mathrm {N}+1)^2\). Inversion is a matter of mean-square error minimization: As the \(\mathrm {M}\) dimensions may contain more degrees of freedom than \((\mathrm {N}+1)^2\), the coefficient vector \(\varvec{\psi }_\mathrm {N}\) giving the closest model \(\varvec{p}_\mathrm {N}\) to the measurement \(\varvec{p}\) is searched,

$$\begin{aligned} \min _{\varvec{\psi }_\mathrm {N}}&\Vert \varvec{e}\Vert ^2,&\text {with } \varvec{e}&=\varvec{p}_\mathrm {N}-\varvec{p}=\varvec{Y}_\mathrm {N}^\mathrm {T}\,\varvec{\psi }_\mathrm {N}-\varvec{p}. \end{aligned}$$
(6.20)

The minimum-mean-square-error (MMSE) solution is, see Appendix A.4, Eq. (A.65),

$$\begin{aligned} \varvec{\psi }_\mathrm {N}&=(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\varvec{Y}_\mathrm {N}\;\varvec{p}=(\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger \;\varvec{p}. \end{aligned}$$
(6.21)

The resulting left inverse \((\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\varvec{Y}_\mathrm {N}\) inverts the thin matrix \(\varvec{Y}_\mathrm {N}^\mathrm {T}\) from the left. \((\varvec{Y}_\mathrm {N}^\mathrm {T})^\dagger \) symbolizes the pseudo inverse; it is left-inverse for thin matrices.

If the microphones are arranged in a t-design and the order \(\mathrm {N}\) is chosen suitably, then the transposed matrix times \(\frac{4\pi }{\mathrm {M}}\), i.e. \(\frac{4\pi }{\mathrm {M}}\varvec{Y}_\mathrm {N}\) for \(\mathrm {M}\) microphones, is equivalent to the left inverse. A more thorough discussion on spherical point sets can be found in [17,18,19].
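A small example may illustrate the t-design case. The sketch below assumes six microphones at the vertices of an octahedron and the order \(\mathrm {N}=1\); the real-valued, orthonormal first-order spherical harmonics and their ordering are assumptions chosen for illustration and may differ from the convention of Sect. 4.7. With \(\mathrm {M}\) denoting the number of microphones, the scaled matrix \(\frac{4\pi }{\mathrm {M}}\varvec{Y}_\mathrm {N}\) then acts as left inverse of \(\varvec{Y}_\mathrm {N}^\mathrm {T}\).

```python
import math

# Microphone directions: octahedron vertices as unit vectors (assumed layout)
MICS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
M = len(MICS)

def real_sh_first_order(x, y, z):
    """Real orthonormal spherical harmonics up to N = 1 of a unit vector.

    Ordering (n, m) = (0, 0), (1, -1), (1, 0), (1, 1) is an assumption.
    """
    c0 = 1 / math.sqrt(4 * math.pi)
    c1 = math.sqrt(3 / (4 * math.pi))
    return [c0, c1 * y, c1 * z, c1 * x]

# Y_N^T: one row per microphone, one column per spherical harmonic
Y_T = [real_sh_first_order(*direction) for direction in MICS]
NUM_SH = len(Y_T[0])

# (4 pi / M) Y_N Y_N^T, expected to be the identity for a suitable t-design
gram = [[4 * math.pi / M * sum(Y_T[i][q] * Y_T[i][s] for i in range(M))
         for s in range(NUM_SH)] for q in range(NUM_SH)]

def encode(pressure):
    """Spherical harmonic decomposition psi = (4 pi / M) Y_N p."""
    return [4 * math.pi / M * sum(Y_T[i][q] * pressure[i] for i in range(M))
            for q in range(NUM_SH)]

# Synthetic first-order pressure distribution p = Y_N^T psi (arbitrary values)
psi_true = [0.5, -1.0, 2.0, 0.25]
p = [sum(Y_T[i][q] * psi_true[q] for q in range(NUM_SH)) for i in range(M)]
psi_est = encode(p)

print("recovered coefficients:", [round(v, 6) for v in psi_est])
```

For arrangements that are not t-designs, the general left inverse of Eq. (6.21) would have to be computed instead, for instance with a linear-algebra library.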

The maximum determinant points [20] are a particular kind of critical directional sampling scheme that allows using exactly as many microphones \(\mathrm {M}=(\mathrm {N}+1)^2\) as spherical harmonic coefficients are obtained, yielding a well-conditioned square matrix \(\varvec{Y}_\mathrm {N}\), so that it can be inverted directly without left/pseudo-inversion. The 25 maximum-determinant points for \(\mathrm {N}=4\) are used in the simulation example below (see Footnote 3).

Finite-order assumption and spatial aliasing. An important implication of estimating \(\psi _{nm}\) is that we need to assume that the distribution of the sound pressure is of limited spherical harmonic order on the measurement surface. This could be ensured by restricting the frequency range, as high-order harmonics are sufficiently attenuated below suitable frequency limits, cf. Fig. 6.5. However, low-pass filtered signals are unacceptable in practice. Instead, one has to accept spatial aliasing at high frequencies, i.e. directional mapping errors and direction-specific comb filters. Figure 6.8 shows spatial aliasing of \(\varvec{\psi }_\mathrm {N}=(\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}\,\varvec{p}\) in the angular domain \(p=\sum \psi _{nm}Y_n^m\).

Fig. 6.8

Interpolated plane-wave sound pressure image \(\mathfrak {R}\{p\,e^{-\mathrm {i}k\mathrm {a}}\}\) on rigid-sphere array with 25 microphones allowing decomposition up to the order \(\mathrm {N}=4\); simulation uses orders up to 25, and the aliasing-free operation can only be expected within \(kr<\mathrm {N}\)

6.8 Regularizing Filter Bank for Radial Filters

The filters \(\mathrm {i}^{n}\bigl [(ka)^2\,h_n'^{(2)}(ka)\bigr ]^{-1}\) of Fig. 6.5 exhibit an \(n\mathrm {th}\)-order zero at 0 Hz, \(k\mathrm {a}=0\). To retrieve the Ambisonic signals \(\chi _{nm}\) from the sound pressure signals \(\psi _{nm}\), their inverse would have an n-fold (unstable) pole at 0 Hz. Microphone self noise and array imperfections cause erroneous signals that are louder than the acoustically expected signals, which vanish with the \(n\mathrm {th}\) order around 0 Hz, so that these filter shapes would moreover cause an excessive boost of erroneous signals unless implemented with precaution. Filters of the different orders n must therefore be stabilized by high-pass slopes of at least the order n, see also [6, 9, 21,22,23,24,25]. With \((n+1)\mathrm {th}\)-order high-pass slopes, see Fig. 6.9, such errors are cut off by the remaining first-order high-pass slopes; the exemplary cut-on frequencies of 90, 680, 1650, 2600 Hz for the Ambisonic orders 1, 2, 3, 4 yield a noise boost of at most 20 dB for a \(4\mathrm {th}\)-order microphone with \(\mathrm {a}=4.2\) cm. However, just applying a cut-on frequency to each order is not enough: below every cut-on frequency, the discarded signal contributions cause a noticeable loudness drop. It is better to design a filter bank with crossovers instead, which allows compensation for the loudness loss in every band. With the frequency \(\omega \) normalized to the cut-on frequency, a zero-phase, \(n\mathrm {th}\)-order Butterworth high-pass response is defined by \(H_\mathrm {hi}=\frac{\omega ^n}{1+\omega ^n}\), and it is amplitude-complementary to the low pass \(H_\mathrm {lo}=\frac{1}{1+\omega ^n}\), so that \(H_\mathrm {hi}+H_\mathrm {lo}=1\).

Fig. 6.9

Filters \((k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})\)/dB over frequency/Hz, regularized with \((n+1)\mathrm {th}\)-order high-pass filters

Fig. 6.10

Stabilizing filter bank/dB over frequency/Hz: signal orders \(n>b\) are excluded from the band b

Using this filter type, the filter bank in Fig. 6.10 can be constructed as follows: The band-pass filters \(H_b(\omega )\) are composed of a \((b+1)\mathrm {th}\)-order high- and \((b+2)\mathrm {th}\)-order low-pass skirt at \(\omega _b\), and \(\omega _{b+1}\), respectively, except for the band \(b=0\) (low-pass) and \(b=\mathrm {N}\) (high-pass)

$$\begin{aligned} \hat{H}_0(\omega )&=\frac{1}{1+\bigl (\frac{\omega }{\omega _{1}}\bigr )^{2}} ,&\hat{H}_b(\omega )&=\frac{\bigl (\frac{\omega }{\omega _{b}}\bigr )^{b+1}}{1+\bigl (\frac{\omega }{\omega _{b}}\bigr )^{b+1}}\frac{1}{1+\bigl (\frac{\omega }{\omega _{b+1}}\bigr )^{b+2}} ,&\hat{H}_\mathrm {N}(\omega )&=\frac{\bigl (\frac{\omega }{\omega _{\mathrm {N}}}\bigr )^{\mathrm {N}+1}}{1+\bigl (\frac{\omega }{\omega _{\mathrm {N}}}\bigr )^{\mathrm {N}+1}}. \end{aligned}$$
(6.22)

To make the bands perfectly reconstructing, filters are normalized by the sum response

$$\begin{aligned} H_b=\frac{\hat{H}_b}{\sum _{b=0}^\mathrm {N} \hat{H}_b(\omega )}. \end{aligned}$$
(6.23)
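The construction of Eqs. (6.22) and (6.23) can be sketched as follows, using the exemplary cut-on frequencies of 90, 680, 1650, 2600 Hz mentioned above for \(\mathrm {N}=4\). Function and variable names are illustrative assumptions; since only frequency ratios enter the responses, frequencies in Hz are used in place of \(\omega \).

```python
CUT_ON_HZ = [90.0, 680.0, 1650.0, 2600.0]   # cut-on of orders 1..N (from text)
N = len(CUT_ON_HZ)                          # maximum order, here 4

def highpass(f, fc, order):
    """Zero-phase high-pass magnitude (f/fc)^order / (1 + (f/fc)^order)."""
    x = (f / fc) ** order
    return x / (1 + x)

def lowpass(f, fc, order):
    """Amplitude-complementary low-pass 1 / (1 + (f/fc)^order)."""
    return 1 / (1 + (f / fc) ** order)

def raw_bands(f):
    """Unnormalized band responses of Eq. (6.22) for b = 0..N."""
    bands = [lowpass(f, CUT_ON_HZ[0], 2)]                 # band 0: low-pass
    for b in range(1, N):
        bands.append(highpass(f, CUT_ON_HZ[b - 1], b + 1)  # skirt at omega_b
                     * lowpass(f, CUT_ON_HZ[b], b + 2))    # skirt at omega_{b+1}
    bands.append(highpass(f, CUT_ON_HZ[N - 1], N + 1))    # band N: high-pass
    return bands

def filter_bank(f):
    """Band responses normalized by their sum, Eq. (6.23)."""
    raw = raw_bands(f)
    total = sum(raw)
    return [h / total for h in raw]

for f in (0.0, 50.0, 500.0, 2000.0, 20000.0):
    print(f, [round(h, 3) for h in filter_bank(f)])
```

By construction, the normalized band responses add up to one at every frequency, with the lowest band dominating towards 0 Hz and the highest band dominating at high frequencies.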

By adjusting the cut-on frequencies \(\omega _b\) of the different orders \(b=1,\dots ,\mathrm {N}\), the noise and mapping behavior of the microphone array is adjusted; only the zeroth order is present in every band down to 0 Hz.

This filter bank design moreover allows adjusting loudness and sidelobe suppression separately in every frequency band.

6.9 Loudness-Normalized Sub-band Side-Lobe Suppression

The filter bank design shown above only yields Ambisonic signals whose order increases with the frequency band. This variation of the order calls for an individual max-\(\varvec{r}_\mathrm {E}\) sidelobe suppression in every band. Moreover, Ambisonic signals of different orders differ in loudness, so diffuse-field equalization of the E measure is also desirable in every band.

To fulfill the above constraints, we propose to use the following set of FIR filter responses as given in [26, 27], which are modified by a filter bank employing diffuse-field normalized max-\(\varvec{r}_\mathrm {E}\) weights in separate frequency bands \(b=0,\dots ,\mathrm {N}\), cf. Fig. 6.11, with the \(n\mathrm {th}\) order discarded in the bands \(b<n\):

$$\begin{aligned} \rho _n(\omega )&=\left[ \sum _{b=n}^{\mathrm {N}}a_{n,b}\,H_b(\omega )\right] \,\mathrm {i}^{-n-1}\,(k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})\,e^{\mathrm {i}k\mathrm {a}}. \end{aligned}$$
(6.24)

Here, \(e^{\mathrm {i}k\mathrm {a}}\) removes the linear phase of \(h_n'^{(2)}\), and \(a_{n,b}\) is the set of diffuse-field (\(\sqrt{E}\)) equalized max-\(\varvec{r}_\mathrm {E}\) weights for the band b in which the Ambisonic orders retrieved are \(0\le n\le b\)

$$\begin{aligned} a_{n,b}&={\left\{ \begin{array}{ll} P_n\bigl (\cos \frac{137.9^\circ }{b+1.51}\bigr )\;\sqrt{\frac{\sum _{n=0}^\mathrm {N}(2n+1)\bigl [P_n\bigl (\cos \frac{137.9^\circ }{\mathrm {N}+1.51}\bigr )\bigr ]^2}{\sum _{n=0}^b(2n+1)\bigl [P_n\bigl (\cos \frac{137.9^\circ }{b+1.51}\bigr )\bigr ]^2}} , &{} \text {for }n\le b\\ 0, &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(6.25)
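The weights of Eq. (6.25) are straightforward to tabulate. The following sketch computes them for \(\mathrm {N}=4\); the function names are illustrative assumptions, and the Legendre polynomials are evaluated by their standard three-term recurrence.

```python
import math

def legendre(n, u):
    """Legendre polynomial P_n(u) by the three-term recurrence."""
    p_prev, p_cur = 1.0, u
    if n == 0:
        return p_prev
    for m in range(1, n):
        p_prev, p_cur = p_cur, ((2 * m + 1) * u * p_cur - m * p_prev) / (m + 1)
    return p_cur

def max_re_weights(order):
    """Max-r_E weights P_n(cos(137.9 deg / (order + 1.51))), n = 0..order."""
    u = math.cos(math.radians(137.9 / (order + 1.51)))
    return [legendre(n, u) for n in range(order + 1)]

def energy(weights):
    """Diffuse-field energy measure: sum over n of (2n + 1) a_n^2."""
    return sum((2 * n + 1) * w * w for n, w in enumerate(weights))

def band_weights(b, max_order):
    """Weights a_{n,b} of Eq. (6.25) for band b, zero for orders n > b."""
    weights = max_re_weights(b)
    gain = math.sqrt(energy(max_re_weights(max_order)) / energy(weights))
    return [gain * w for w in weights] + [0.0] * (max_order - b)

N = 4
table = [band_weights(b, N) for b in range(N + 1)]
for b, row in enumerate(table):
    print("band", b, [round(a, 3) for a in row])
```

Every row of the table carries the same energy measure as the highest band, which is what the square-root factor in Eq. (6.25) is meant to achieve.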

Figure 6.12 shows the polar patterns of the corresponding direction-spread functions.

Fig. 6.11

Filter-bank-regularized/dB over frequency/Hz, diffuse-field equalized max-\(\varvec{r}_\mathrm {E}\) weighted spherical microphone array responses using \(\mathrm {i}^n\rho _n(\omega )=\sum _{b=n}^\mathrm {N} a_{n,b}\,H_b(\omega )\, (k\mathrm {a})^2\,h_n'^{(2)}(k\mathrm {a})\)

For the implementation of \(\rho _n(\omega )\) by fast block filtering, \(\omega =2\pi \,f\) and \(k=\omega /c\) are uniformly sampled with frequency, and the inverse discrete Fourier transform yields the associated impulse responses (attention: the value at 0 Hz must be replaced for stable results, and cyclic time-domain shifts and windows are necessary).

Fig. 6.12

Diffuse-field equalized (to \(E=1\)) max-\(\varvec{r}_\mathrm {E}\) direction-spread functions; even orders are plotted on upper, odd orders on lower semi-circle

The direction-spread function of a plane-wave sound pressure mapped to a directional Ambisonic signal becomes frequency-dependent as shown in Fig. 6.13, and it has minimal side lobes.

Fig. 6.13

Direction spread/dB over frequency/Hz in zenithal cross section/degrees through Ambisonic signal of simulated microphone processing response to plane wave from zenith and the parameters \(\mathrm {a}=4.2\) cm, \(\mathrm {M}=25\) mics., max-\(\varvec{r}_\mathrm {E}\)-weighted in bands 90, 680, 1650, 2600 Hz for the cut on of the orders 1, 2, 3, 4. Simulation is done with the order \(\mathrm {N_{sim}}=30\) and spatial aliasing will occur above 5.2 kHz. Gain matching was assumed to be up to \(<\pm 0.5\) dB accurate; the map shows the direction spread normalized to its value at \(0^\circ \) for every frequency to make its shape easier to read

6.10 Influence of Gain Matching, Noise, Side-Lobe Suppression

In practice, the gain matching between the microphones is not always more accurate than 0.5 dB. As a result, the physically dominant omnidirectional signal leaks into the higher-order signals through directionally random gain variations. Since higher-order components are acoustically weak and require amplification, this leakage is amplified along with them. The effect on the mapping is comparable to that of microphone self-noise; however, gain mismatch yields a signal that is correlated with the sound exciting the microphones, whereas self-noise yields low-frequency noise.

Fig. 6.14

Influence of carelessly selected cut-on frequencies for regularization (top), and of non-individual side-lobe suppression per band (middle), in contrast to ideal results (bottom); the maps show direction spreads normalized to their values at \(0^\circ \) for every frequency to make side lobes easier to read

If the cut-on frequencies of the regularization filters were set to 50, 160, 500, 1600 Hz and side-lobe suppression were turned off for testing, one would get the poor image in Fig. 6.14a, in which high-order signals are strongly boosted at low frequencies.

If a noise-free case is assumed, and only the max-\(\varvec{r}_\mathrm {E}\) side-lobe suppression of the highest band is used for all bands, one gets the image in Fig. 6.14b, which improves with individual max-\(\varvec{r}_\mathrm {E}\) weights in Fig. 6.14c.

Self-noise behavior. Assuming that the self-noise of the microphones is uncorrelated, it remains uncorrelated and of equal strength after decomposing the \(\mathrm {M}\) microphone signals \(p_i=\mathcal {N}\) into the \((\mathrm {N}+1)^2\) spherical harmonic coefficient signals \(\psi _{nm}=\frac{(\mathrm {N}+1)^2}{\mathrm {M}}\mathcal {N}\), if \(\mathrm {M}\approx (\mathrm {N}+1)^2\) and the microphone arrangement permits a well-conditioned pseudo-inversion \(\varvec{Y}_\mathrm {N}^\dagger \). The spectral change of the microphone self-noise due to the radial filters \(\rho _n(\omega )\) can be described by summing the noise of the \((2n+1)\) signals of each order \(n\), amplified by \(|\rho _n(\omega )|^2\), and comparing the sum to the zeroth-order signal:

$$\begin{aligned} |G(\omega )|^2&=\frac{ \sum _{n=0}^\mathrm {N}(2n+1)|\rho _n(\omega )|^2}{ |(k\mathrm {a})^2\,h_0'^{(2)}(k\mathrm {a})|^2}. \end{aligned}$$
(6.26)

Figure 6.15 analyzes the noise amplification for the simulation example (max-\(\varvec{r}_\mathrm {E}\) weighting in each subband, \(\mathrm {a}=4.2\) cm) and shows its dependency on exemplary cut-on frequencies configured to tune the filter bank to 0, 5, 10, 15, and 20 dB noise boost. The trade-off is: the more noise boost one can allow, the more directional resolution one gets, see Fig. 6.16.
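Equation (6.26) is straightforward to evaluate numerically once the radial filter values \(\rho _n(\omega )\) are known at a given frequency. The following sketch assumes a speed of sound of 343 m/s and, purely for illustration, uses the hypothetical unregularized case \(\rho _n=\mathrm {i}^{-n}(k\mathrm {a})^2h_n'^{(2)}(k\mathrm {a})\) for all orders up to 4, without the filter bank or the weights \(a_{n,b}\). The helper names `hankel2`, `dhankel2`, and `noise_gain_db` are not from any library; the spherical Hankel functions are obtained by upward recurrence from their closed forms for \(n=0,1\).

```python
import cmath
import math

def hankel2(n, x):
    """Spherical Hankel function of the second kind h_n^(2)(x), upward recurrence."""
    h_prev = 1j * cmath.exp(-1j * x) / x                  # h_0
    if n == 0:
        return h_prev
    h_curr = -cmath.exp(-1j * x) * (x - 1j) / x ** 2      # h_1
    for m in range(1, n):
        h_prev, h_curr = h_curr, (2 * m + 1) / x * h_curr - h_prev
    return h_curr

def dhankel2(n, x):
    """Derivative h_n'^(2)(x) = h_{n-1}(x) - (n+1)/x * h_n(x), with h_0' = -h_1."""
    if n == 0:
        return -hankel2(1, x)
    return hankel2(n - 1, x) - (n + 1) / x * hankel2(n, x)

def noise_gain_db(rho, ka):
    """Self-noise modification |G|^2 of Eq. (6.26) in dB; rho[n] is the filter of order n."""
    num = sum((2 * n + 1) * abs(r) ** 2 for n, r in enumerate(rho))
    den = abs(ka ** 2 * dhankel2(0, ka)) ** 2
    return 10.0 * math.log10(num / den)

c, a = 343.0, 0.042        # speed of sound in m/s (assumed), array radius from the text
for f in (200.0, 1000.0, 4000.0):
    ka = 2.0 * math.pi * f * a / c
    # hypothetical unregularized radial filters of orders 0..4
    rho = [(-1j) ** n * ka ** 2 * dhankel2(n, ka) for n in range(5)]
    print(f"{f:6.0f} Hz: {noise_gain_db(rho, ka):6.1f} dB")
```

In this unregularized case the noise gain grows rapidly towards low frequencies, which is the behavior that the cut-on frequencies of the filter bank are meant to limit.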

Fig. 6.15

Self-noise modification \(|G(\omega )|^2\)/dB over frequency/Hz for the filter bank configurations using the cut-on frequencies (in Hz) 2k, 3k, 4k, 5k (no noise amplification), 600, 2k, 3.5k, 4.2k (5 dB noise amplification), 280, 1.3k, 2.6k, 3.6k (10 dB noise amplification), 150, 950, 2k, 3.15k (15 dB noise amplification), and 90, 680, 1.65k, 2.6k (20 dB noise amplification)

Fig. 6.16

Direction spread/dB over frequency/Hz and zenith/degrees of the filter bank with different settings to achieve 0, 5, 10, 15, 20 dB noise boost; the maps show direction spreads normalized to their values at \(0^\circ \) at every frequency as above

Open measurement data (SOFA format) characterizing the directivity patterns of the 32 Eigenmike em32 transducers are provided at http://phaidra.kug.ac.at/o:69292. They are measured on a \(12^\circ \times 11.25^\circ \) azimuth\(\times \) zenith grid, i.e. at \(30\times 16=480\) directions, yielding \(480\times 256\) pt impulse responses for each of the 32 transducers.

6.11 Practical Free-Software Examples

6.11.1 Eigenmike em32 Encoding Using mcfx and IEM Plug-In Suites

We give a practical signal processing example for the Eigenmike em32 that is applicable, e.g., in digital audio workstations. First, the 32 signals are encoded by matrix multiplication (IEM MatrixMultiplier), cf. Fig. 6.17a, yielding 25 fourth-order signals. The preset (json file) is provided online at http://phaidra.kug.ac.at/o:79231. The radial filtering that sharpens the surround sound image uses mcfx-convolver, see Fig. 6.17b, with 25 SISO filters, one for each Ambisonic signal, using the 5 different filter curves for the orders \(n=0,\dots ,4\) as defined above. The convolver presets (wav files and config files for mcfx-convolver) are provided online at http://phaidra.kug.ac.at/o:79231 and are available for the different noise boosts 0, 5, 10, 15, 20 dB.
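The structure of this processing chain can be sketched in a few lines. The sketch assumes ACN channel ordering, so that the order of channel index \(q\) is \(n=\lfloor \sqrt{q}\rfloor \); the encoder matrix below is a placeholder for illustration only and does not reproduce the coefficients of the preset, and the names `encode` and `order_of_channel` are hypothetical.

```python
import math

N_MICS, ORDER = 32, 4
N_SH = (ORDER + 1) ** 2                      # 25 Ambisonic channels

def order_of_channel(acn):
    """Order n of an Ambisonic channel, assuming ACN indexing (acn = n^2 + n + m)."""
    return math.isqrt(acn)

def encode(encoder, mic_frame):
    """Multiply one frame of 32 microphone samples by a 25 x 32 encoder matrix."""
    return [sum(e * x for e, x in zip(row, mic_frame)) for row in encoder]

# Placeholder matrix for illustration only; real coefficients come from the preset file.
encoder = [[1.0 if j == i else 0.0 for j in range(N_MICS)] for i in range(N_SH)]
frame = [float(i) for i in range(N_MICS)]
sh_frame = encode(encoder, frame)

# Each of the 25 channels is then convolved with the radial filter of its order,
# so only 5 distinct filter curves are needed for the 25 SISO filters.
filter_index = [order_of_channel(ch) for ch in range(N_SH)]
print(filter_index)
```

Every order \(n\) contributes \(2n+1\) channels, which is why the 5 filter curves cover all 25 signals.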

Fig. 6.17

(a) IEM MatrixMultiplier encoding the Eigenmike em32 signals and (b) mcfx-convolver applying radial filters to the encoded em32 recording

Fig. 6.18

Practical equalization of the em32 transducer characteristics using two parametric shelving filters of mcfx_filter, cf. [28]

Fig. 6.19

SPARTA Array2SH encoding for, e.g., em32

As found in [28], the em32 transducers exhibit a frequency response that favors low frequencies and attenuates high frequencies. This behavior is sufficiently well equalized in practice using two parametric shelving filters, a low shelf at 500 Hz with a gain of \(-5\) dB, and a high shelf at 5 kHz using a gain of \(+5\) dB, see Fig. 6.18.
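To get a feeling for the resulting equalization curve, the two shelving filters can be approximated by second-order sections. The sketch below uses the widely known Audio EQ Cookbook shelving formulas with shelf slope 1 and a sampling rate of 48 kHz; both are assumptions made here, and the actual filter design inside mcfx_filter may differ. The function names `shelf` and `gain_db_at` are hypothetical.

```python
import cmath
import math

def shelf(kind, f0, gain_db, fs):
    """Shelving biquad after the Audio EQ Cookbook (slope S = 1); returns (b, a)."""
    A = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * f0 / fs
    cw = math.cos(w0)
    beta = 2.0 * math.sqrt(A) * (math.sin(w0) / 2.0) * math.sqrt(2.0)  # 2*sqrt(A)*alpha
    s = 1.0 if kind == "low" else -1.0       # low and high shelf differ only in signs
    b = [A * ((A + 1) - s * (A - 1) * cw + beta),
         s * 2.0 * A * ((A - 1) - s * (A + 1) * cw),
         A * ((A + 1) - s * (A - 1) * cw - beta)]
    a = [(A + 1) + s * (A - 1) * cw + beta,
         -s * 2.0 * ((A - 1) + s * (A + 1) * cw),
         (A + 1) + s * (A - 1) * cw - beta]
    return b, a

def gain_db_at(f, fs, sections):
    """Magnitude response in dB of cascaded biquads at frequency f."""
    z1 = cmath.exp(-2j * math.pi * f / fs)   # z^-1 on the unit circle
    h = 1.0 + 0.0j
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))

fs = 48000.0                                 # sampling rate (assumed)
eq = [shelf("low", 500.0, -5.0, fs), shelf("high", 5000.0, +5.0, fs)]
for f in (50.0, 500.0, 1500.0, 5000.0, 15000.0):
    print(f"{f:7.0f} Hz: {gain_db_at(f, fs, eq):+5.2f} dB")
```

The cascade attenuates the range below the low shelf and boosts the range above the high shelf, with a gradual transition between the two corner frequencies.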

6.11.2 SPARTA Array2SH

The SPARTA suite by Aalto University includes the Array2SH plug-in shown in Fig. 6.19 to convert the transducer signals of a microphone array into Ambisonics. It provides both the encoding of the signals and the calculation and application of radial-focusing filters based on the geometry of the array. It supports rigid and open arrays and comes with presets for several arrays, such as the Eigenmike em32. The plug-in allows adjusting the radial filters in terms of regularization type and maximum gain. The Reg. Type called Z-Style corresponds to the linear-phase design of Sect. 6.9.