Higher-Order Ambisonic Microphones and the Wave Equation (Linear, Lossless)

Unlikepressure-gradienttransducers,single-transducermicrophoneswith higher-order directivity apparently turned out to be difﬁcult to manufacture at rea-sonable audio quality. Therefore nowadays, higher-order Ambisonic recording with compact devices is based on compact spherical arrays of pressure transducers. To prepare for higher-order Ambisonic recording based on arrays, we ﬁrst need a model of the sound pressure that the individual transducers of such an array would receive in an arbitrary surrounding sound ﬁeld. The lossless, linear wave equation is the most suitable model to describe how sound propagates when the sound ﬁeld is com-posedofsurroundingsoundsources.Fundamentally,thewaveequationmodelssoundpropagationbyhowsmallpackagesofairreact(i)whenbeingexpandedorcom-pressedbyachangeoftheinternalpressure,andto(ii)directionaldifferencesintheoutsidepressurebystartingtomove.Basedthereupon,theinhomogeneoussolu-tionsofthewaveequationdescribehowanentirefreesoundﬁeldbuildsupifbeingexcitedbyanomnidirectionalsoundsource,asasimpliﬁedmodelofanarbitraryphysicalsource,suchasaloudspeaker,humantalker,ormusicalinstrument.Afteradressingthesebasics,thechaptershowsawaytogetAmbisonicsignalsofhighspatialandtimbralqualityfromthearraysignals,consideringthenecessarydiffuse-ﬁeldequalization,side-lobesuppression,andtradeoffbetweenspatialresolutionandlow-frequenynoiseboost.Thechapterconcludeswithapplicationexamples.

This mathematical theory might appear extensive, but it cannot be avoided when aiming at an in-depth understanding of higher-order Ambisonic microphones. The theory enables processing of the microphone signals received such that the surrounding sound field excitation is retrieved in terms of an Ambisonic signal. Some readers may want to skip the physical introduction and resume in Sect. 6.5 on spherical scattering or Sect. 6.6 on the processing of the array signals.

Equation of Compression
Wave propagation involves reversible short-term temperature fluctuations becoming effective when air is being compressed by sound, causing the specific stiffness of air in sound propagation. The Appendix A.6.1 shows how to derive this adiabatic compression relation based on the first law of thermodynamics and the ideal gas law. It relates the relative volume change V V 0 to the pressure change p = −K V V 0 by the bulk modulus of air. After expressing the bulk modulus by more common constants 1 K = ρ c 2 and differentially formulating the volume change over time using the change of the sound particle velocity in space, e.g. in one dimensionṗ = −ρ c 2 ∂v x ∂ x , cf. Appendix A.6.1, we get the three-dimensional compression equation: ∂ p ∂t = −ρ c 2 ∇ T v. (6.1) Here the inner product of the Del symbol ∇ T = ( ∂ ∂ x , ∂ ∂ y , ∂ ∂z ) with v yields what is called divergence div(v) = ∇ T v = ∂v x ∂ x + ∂v y ∂ y + ∂v z ∂z . The equation means: Independently of whether the outer boundaries of a small package of air are traveling at a common velocity: If there are directions into which their velocity is spatially increasing, the resulting gradual volume expansion over time causes a proportional decrease of interior pressure over time.

Equation of Motion
The equation of motion is relatively simple to understand from the Newtonian equation of motion, e.g. for the x direction, F x = m ∂v x ∂t equates the external force to mass m times acceleration, i.e. increase in velocity ∂v ∂t . For a small package of air with constant volume V 0 = x y z, the mass is obtained by the air density m = ρ V 0 , and the force equals the decrease of in pressure over the three space directions, times the corresponding partial surface, e.g. for the x direction F x = −[p(x + x) − p(x)] y z. For the x direction, this yields after expanding by x Dividing by −V 0 and letting V 0 → 0, we obtain the typical shape of the equation of motion for all three space directions ∇ p = −ρ ∂v ∂t . (6.2) The equation of motion means: Independently of the common exterior pressure load on all the outer boundaries of a small air package, an outer pressure decrease into any direction implies a corresponding pushing force on the package causing a proportional acceleration into this direction.

Wave Equation
We can combine the compression equation ∂ p ∂t = −ρ c 2 ∇ T v with the equation of motion ∇ p = −ρ ∂v ∂t by deriving the first one with regard to time ∂ 2 p ∂t 2 = −ρ c 2 ∇ T ∂v ∂t and the second one with the gradient ∇ T yielding the Laplacian ∇ T ∇ = , hence p = −ρ∇ T ∂v ∂t . Division of the first result by c 2 and equating both terms yields the lossless wave equation p = 1 c 2 ∂ 2 ∂t 2 p that is typically written as Obviously, the wave equation relates the curvature in space (expressed by the Laplacian) to curvature in time (expressed by the second-order derivative). If p is a pure sinusoidal oscillation sin(ω t + φ 0 ), the second derivative in time corresponds to a factor −ω 2 , and by substitution with the wave-number k = ω c , we can write the frequency-domain wave equation as Helmholtz equation. (6.4)

Elementary Inhomogeneous Solution: Green's Function (Free Field)
The Green's function is an elementary prototype for solutions to inhomogeneous problems ( + k 2 ) p = −q, which is defined as A general excitation q of the equation can be represented by its convolution with the Dirac delta distribution q(s) δ(r − s) dV (s) = q(r). Consequently, as the wave equation is linear, the general solution must therefore also equal the convolution of the Green's function with the excitation function p(r) = q(s) G(r − s) dV (s) over space; if formulated in the time domain: also over time. The integral superimposes acoustical responses of any point in time and space of the source phenomenon, weighted by the corresponding source strength in space and time. The Green's function in three dimensions is derived in Appendix A.6.3, Eq. (A.91), with the wave number k = ω c and distance between source and receiver r = r − r s 2 . Acoustic source phenomena are characterized by the behavior of the Green's function: far away, the amplitude decays with 1 r and the phase −kr = −ω r c corresponds to the radially increasing delay r c . Both is expressed in Sommerfeld's radiation condition lim r →∞ r ∂ ∂r p + ik p = 0. Plane waves. The radius coordinate of the Green's function is the distance between two Cartesian position vectors r s and r, the source and receiver location. Letting one of them become large is denoted by re-expressing it in terms of radius and direction vector r s = r s θ s . This permits far-field approximation r s = r s − r = (r s θ s − r) T (r s θ s − r) = r 2 s − 2r s θ T s r + r 2 (6.6) lim r s →∞ For the phase approximation, for instance at a wave-length of 30 cm, we notice even for a relatively small distance difference, e.g. between 15 m and 15 m + 15 cm, we could change the sign of the wave. To approximate the phase of the Green's function, we must therefore at least use r s − θ T s r as approximation. By contrast, this level of precision is irrelevant for the magnitude approximation, e.g., it would be negligible if we used 1 15 m instead of the magnitude 1 15 m+15 cm . At a large distance r s assumed to be constant, the Green's function is proportional to a plane wave from the source direction θ s lim r s →∞ G = e −ik rs 4π r s e ik θ T s r . (6.7) The plane-wave part is of unit magnitude | p| = 1 p = e ik θ T s r (6.8) 6.3 Wave Equation and its phase evaluates the projection of the position vector onto the plane-wave arrival direction θ s . Towards the direction θ s , the phase grows positive, i.e. the signal arrives earlier. Towards the plane-wave propagation direction −θ s the phase grows negatively, implying an increasing time delay, which is constant on any plane perpendicular to θ s . Plane waves are an invaluable tool to locally approximate sound fields from sources that are sufficiently far away, within a small region. 2  [14,15] using radius r , azimuth ϕ, and zenith ϑ. For simplification, zenith is replaced by ζ = cos ϑ = z r , here. We may solve the Helmholtz equation ( + k 2 ) p = 0 in spherical coordinates by the radial and directional parts of the Laplacian = r + ϕ,ζ , as identified in Appendix A.3

Basis Solutions in Spherical Coordinates
We already know the spherical harmonics as directional eigensolutions from Sect. 4.7 (6.10) and assume them to be a factor of the solution p m n = R Y m n determining the value of ϕ,ζ in ( r + k 2 + ϕ,ζ ) p m n = 0. We find a separated radial differential equation after insertion, multiplication by r 2 Y m n , and re-expressing the differentials ∂ ∂r = k ∂ ∂kr and ∂ 2 Appendix A.6.4 shows how to get physical solutions for R of this, so-called, spherical Bessel differential equation: spherical Hankel functions of the second kind h (2) n (kr) able to represent radiation (radially outgoing into every direction), consistently with Green's function G, diverging with an (n + 1)-fold pole at kr = 0, a physical behavior that would also be observed after spatially differentiating G, see Wave spectra and spherical basis solutions. Any sound field evaluated at a radius r where the air is source-free and homogeneous in any direction can be represented by spherical basis functions for enclosed j n (kr)Y m n (θ ) and radiating fields h n (kr)Y m n (θ) Here, b nm are the coefficients for incoming waves that pass through and emanate from radii larger than r and c nm are the coefficients of outgoing waves radiating from sources at radii smaller than r ; the coefficients are called wave spectra of the incoming and outgoing waves, cf. [16].
Ambisonic plane-wave spectrum, plane wave. Plane waves only use the coefficients b nm , while c nm = 0 in Eq. (6.13). The sum of incoming plane waves from all directions, whose amplitudes are given by the spherical harmonics coefficients χ nm as a set of Ambisonic signals are described by the incoming wave spectrum, see Appendix A.6.

Scattering by Rigid Higher-Order Microphone Surface
Higher-order Ambisonic microphone arrays are typically mounted on a rigid sphere of some radius r = a, such as the Eigenmike EM32, see Fig. 6.3. The physical boundary of the rigid spherical surface is expressed as a vanishing radial component of the sound particle velocity. The radial sound particle velocity is obtained via the 2) by deriving Eq. (6.13). This requires to evaluate differentiated spherical radial solutions j n (x) as well as h (2) n for either of the functions, cf. e.g. [16]. A sound-hard boundary condition at the radius a requires which is fulfilled by a vanishing term in square brackets. The rigid boundary responds to incoming surround-sound by velocity-canceling outgoing waves h (2) n (ka) c nm = − j n (ka) b nm . The coefficients ψ nm yield the sound pressure in Fig. 6 The two terms of the bracket are typically further simplified by a common denominator and recognizing the Wronskian Eq. (A.97) in the numerator It is formally convenient that as soon as the sound pressure is given in terms of its spherical harmonic coefficient signals ψ nm , the Ambisonic signals χ nm of a concentric playback system are obviously just an inversely filtered version thereof, with no need for further unmixing/matrixing.
Recognizable from Fig. 6.6 and following our intuition, waves of lengths larger than the diameter 2a of the sphere will only weakly map to complicated high-order patterns. It is therefore easily understood that the transfer function i n+1 [(ka) 2 h (2) n (ka)] −1 attenuates the reception of high-order Ambisonic signals at low frequencies, see Fig. 6.5.

Higher-Order Microphone Array Encoding
The block diagram of Ambisonic encoding of higher-order microphone array signals is shown in Fig.6.7. The first processing step is about decomposing the pressure samples p(t) from the microphone array into its spherical harmonics coefficients ψ N (t): To which amount do the samples contain omnidirectional, figure-of-eight, and other spherical harmonic patterns, up to which the microphone arrangement allows decomposition. The frequency-independent matrix (Y T N ) † does the conversion. It is the left-inverse to the spherical harmonics sampled at the microphone positions, as shown in the upcoming section.
The second step then sharpens the sound pressure image to an Ambisonic signal by filtering the spherical harmonic coefficient signals. The basic relation between sound pressure coefficients and Ambisonic signals is given in Eq. (6.18) and describes a filter for every coefficient signal, differing only in filter characteristics for different spherical harmonic orders. Robustness to noise, microphone matching and position-   ing is the key here, and only achieved by the careful design of these filters, as shown in a further sections below. The design considers a gradually increasing sharpening over frequency, for which it moreover employs a filter bank with separate, max-r E weighted and E normalized bands, in order to provide (i) limitation of noise and errors, (ii) a frequency response perceived as flat, and (iii) optimal suppression of the sidelobes.

Discrete Sound Pressure Samples in Spherical Harmonics
To determine the Ambisonics signals χ nm , we obviously need to find ψ nm based on all sound pressure samples p(θ i ) recorded by the microphones distributed on the rigid-sphere array. To accomplish this, we set up a system of model equations equating the pressure samples to the unknown coefficients ψ nm expanded over the spherical harmonics Y m n (θ i ) sampled at every microphone position. A vector and matrix notation p = [p(θ

Left inverse (MMSE). The equation can be (pseudo-)inverted if the matrix
The resulting left inverse (Y N Y T N ) −1 Y N inverts the thin matrix Y T N from the left. (Y T N ) † symbolizes the pseudo inverse; it is left-inverse for thin matrices.
If the microphones are arranged in a t-design and the order N is chosen suitably, then the transpose matrix times 4π L is equivalent to the left inverse. A more thorough discussion on spherical point sets can be found in [17][18][19].
The maximum determinant points [20] are a particular kind of critical directional sampling scheme that allows to use exactly as few microphones M = (N + 1) 2 as spherical harmonic coefficients obtained, yielding a well-conditioned square matrix Y N , so that it can be inverted directly without left/pseudo-inversion. The 25 maximum-determinant points for N = 4 are used in the simulation example below. 3

Finite-order assumption and spatial aliasing. An important implication of estimat-
ing ψ nm is that we need to assume that the distribution of the sound pressure is of limited spherical harmonic order on the measurement surface. This could be done by restricting the frequency range, as high-order harmonics are attenuated wellenough according above suitable frequency limits, cf. Fig. 6.5. However, low-pass filtered signals are unacceptable in practice. Instead, one has to accept spatial aliasing at high frequencies, i.e. directional mapping errors and direction-specific comb filters. Figure 6.8 shows spatial aliasing of ψ N = (Y T N ) −1 p in the angular domain p = ψ nm Y m n .
To retrieve the Ambisonic signals χ nm from the sound pressure signals ψ nm , their inverse would have a n-fold (unstable) pole at 0 Hz. Considering that microphone self noise and array imperfection cause erroneous signals louder than the acoustically expected nth-order vanishing signals around 0 Hz, filter shapes will moreover cause an excessive boost of erroneous signals unless implemented with precaution. Filters of the different orders n must be stabilized by high-pass slopes of at least the order n, see also [6,9,[21][22][23][24][25], and with (n + 1)th-order high-pass slopes, see Fig. 6.9, such errors are being cut off by first-order high-pass slopes at exemplary cut-on frequencies at 90, 680, 1650, 2600 Hz for the Ambisonic orders 1, 2, 3, 4, yielding a  noise boost of 20 dB for a 4th-order microphone with a = 4.2 cm, at most. However, just cutting on the frequencies of each order is not enough: every cut-on frequency causes a noticeable loudness drop below due to the discarded signal contributions. It is better to design a filter bank with crossovers instead, which allows compensation for the loudness loss in every band. A zero-phase, nth-order Butterworth high-pass response is defined by H hi = ω n 1+ω n and amplitude-complementary to the low pass H lo = 1 1+ω n , so that H hi + H lo = 1. Using this filter type, the filter bank in Fig. 6.10 can be constructed as follows: The band-pass filters H b (ω) are composed of a (b + 1)th-order high-and (b + 2)th-order low-pass skirt at ω b , and ω b+1 , respectively, except for the band b = 0 (low-pass) and b = N (high-pass) To make the bands perfectly reconstructing, filters are normalized by the sum response (6.23) By adjusting the cut-on frequencies ω b of the different orders b = 1, . . . , N, the noise and mapping behavior of the microphone array is adjusted; only the zeroth order is present in every band down to 0 Hz. This filter bank design moreover allows to adjust loudness and sidelobe suppression in every frequency band, separately.

Loudness-Normalized Sub-band Side-Lobe Suppression
The filter bank design shown above would only yield Ambisonic signals whose order increases with the frequency band. Ideally, this variation of the order comes with the necessity of individual max-r E sidelobe suppression in every band. Moreover, Ambisonic signals of different orders are differently loud, so also diffuse-field equalization of the E measure is desirable in every band. To fulfill the above constraints, we propose to use the following set of FIR filter responses as given in [26,27], that are modified by a filter bank employing diffuse-field normalized max-r E -weights in separate frequency bands b = 0, . . . , N, cf. Fig. 6.11, with the nth order discarded for bands below b < n: Here, e ika removes the linear phase of h (2) n , and a n,b is the set of diffuse-field ( The direction-spread function of a plane-wave sound pressure mapped to a directional Ambisonic signal becomes frequency-dependent as shown in Fig. 6 Fig. 6.11 Filter-bank-regularized/dB over frequency/Hz, diffuse-field equalized max-r E weighted spherical microphone array responses using i n ρ n (ω) = N b=n a n,b H b (ω) (ka) 2 h (2) n (ka) 13 Direction spread/dB over frequency/Hz in zenithal cross section/degrees through Ambisonic signal of simulated microphone processing response to plane wave from zenith and the parameters a = 4.2 cm, M = 25 mics., max-r E -weighted in bands 90, 680, 1650, 2600 Hz for the cut on of the orders 1, 2, 3, 4. Simulation is done with the order N sim = 30 and spatial aliasing will occur above 5.2 kHz. Gain matching was assumed to be up to < ±0.5 dB accurate; the map shows the direction spread normalized to its value at 0 • for every frequency to make its shape easier to read

Influence of Gain Matching, Noise, Side-Lobe Suppression
Typical gain mismatch between the microphones is not always more accurate than 0.5 dB. The result is that the physically dominant omnidirectional signal will leak into the higher-order signals by directionally random gain variations. However, acoustically, higher-order components are expected to be weak and to require amplification. The effect on mapping is equivalent to one of microphone self noise, however gain mismatch yields a correlated signal exciting the microphones, whereas self-noise yields low-frequency noise. If regularization filters were set to 50, 160, 500, 1600 and sidelobe suppression turned off for testing, one would get the poor image as in Fig. 6.14a, where high-order signals at low frequencies are highly boosted.
If a noise-free case is assumed, and only the max-r E side-lobe suppression of the highest band is used for all bands, one gets the image in Fig. 6.14b, which improves with individual max-r E weights in Fig. 6.14c.

Self-noise behavior.
Assuming that self-noise of the microphones is uncorrelated, it will also remain uncorrelated and of equal strength after decomposing the M microphone signals p i = N into the (N + 1) 2 spherical harmonic coefficient signals ψ nm = (N+1) 2 M N , if M ≈ (N + 1) 2 and the microphone arrangement permits a well-conditioned pseudo inversion Y † N . The spectral change of the microphone self noise due to the radial filters ρ n (ω) can be described by the noise of the (2n + 1) signals of the same order, amplified by |ρ n (ω)| 2 , in comparison to the zeroth-order signal: (6.26) Figure 6.15 analyzes the noise amplification for the simulation example (max-r E weighting in each sub band, a = 4.2 cm) and shows the dependency on exemplary cut on frequencies configured to tune the filterbank to 0, 5, 10, 15, and 20 dB noise boosts. The trade here is: the more noise boost one can allow, the more directional resolution one gets, see Fig. 6.16.
Open measurement data (SOFA format) characterizing the directivity patterns of the 32 Eigenmike em32 transducers are provided under the link http://phaidra.kug.ac.at/o:69292. They are measured on a 12 • × 11.25 • azimuth× zenith grid, yielding 480 × 256 pt impulse responses for each of the 32 transducers.

Eigenmike Em32 Encoding Using Mcfx and IEM Plug-In Suites
We give a practical signal processing example for the Eigenmike em32 which is applicable e.g. in digital audio workstations. First the 32 signals are encoded by matrix multiplication (IEM MatrixMultiplier), cf. Fig. 6.17a, yielding 25 fourth-order signals. The preset (json file) is provided online http://phaidra.kug.ac.at/o:79231. The radial filtering that sharpens the surround sound image uses mcfx-convolver, see Fig. 6.17b, with 25 SISO filters, one for each Ambisonic signal, using the 5 different filter curves for the orders n = 0, . . . , 4 as defined above. The convolver presets (wav   Practical equalization of the em32 transducer characteristics by two parametric shelving filters of the mcfx_filter, cf. [28] files and config files for mcfx-convolver) are provided online http://phaidra.kug. ac.at/o:79231 and are available for the different noise boosts 0, 5, 10, 15, 20 dB. As found in [28], the em32 transducers exhibit a frequency response that favors low frequencies and attenuates high frequencies. This behavior is sufficiently well equalized in practice using two parametric shelving filters, a low shelf at 500 Hz with a gain of −5 dB, and a high shelf at 5 kHz using a gain of +5 dB, see Fig. 6.18.

SPARTA Array2SH
The SPARTA suite by Aalto University includes the Array2SH plug-in shown in Fig. 6.19 to convert the transducer signals of a microphone array into Ambisonics. It provides both encoding of the signals, as well as calculation and application of radialfocusing filters based on the geometry of the array. It supports rigid and open arrays and comes with presets for several arrays, such as the Eigenmike em32. The plug-in allows to adjust the radial filters in terms of regularization type and maximum gain. The Reg. Type called Z-Style corresponds to the linear-phase design of Sect. 6.9.