Ambisonic Amplitude Panning and Decoding in Higher Orders

Already in the 1970s, the idea of using continuous harmonic functions of scalable resolution was described by Cooper and then Gerzon, who introduced the name Ambisonics. This chapter starts by reviewing properties of first-order horizontal Ambisonics, using an interpretation in terms of panning functions. The required mathematical formulations for 3D higher-order Ambisonics are developed here, with the idea to improve the directional resolution. Based on this formalism, ideal loudspeaker layouts can be defined for constant loudness, localization, and width, according to the previous models. The chapter discusses how Ambisonics can be decoded to less ideal, typical loudspeaker setups for studios, concerts, sound-reinforcement systems, and to headphones. The behavior is analyzed by a rich variety of listening experiments and for various decoding applications. The chapter concludes with example applications using free software tools.

a combination of vector-base panning techniques with Ambisonics, yielding the most robust and flexible higher-order decoding method known today.
For headphones, after the work of Jot [9] that outlined the basic problems of binaural decoding in the 1990s, Sun, Bernschütz, Ben-Hur, and Brinkmann [17][18][19] made important contributions to binaural decoding, and we consider the TAC and MagLS decoders by Zaunschirm and Schörkhuber [20,21] as the essential binaural decoders. Both remove HRTF delays or optimize HRTF phases at high frequencies to avoid spectral artifacts. By interaural covariance correction, MagLS/TAC manage to play back diffuse fields consistently, using the formalism of Vilkamo et al. [22].

Direction Spread in First-Order 2D Ambisonics
In 2D first-order Ambisonics as discussed in Chap. 1, the directional mapping of a single sound source from the angle $\varphi_s$ to the direction $\varphi$ of each loudspeaker is described by the shape of the panning function (or direction-spread function) in Eq. (1.17). The directional spreading is not infinitely narrow, but determined by what can be represented by first-order directivity patterns. Consequently, sound from the angle $\varphi_s$ will be mapped by a dipole pattern aligned with the source and an additional omnidirectional pattern. We can involve a spread parameter $a$ to make the directional spread to the loudspeaker system adjustable and either cardioid-shaped ($a = 1$), 2D-supercardioid-shaped ($a = \sqrt{2}$), or 2D-hypercardioid-shaped ($a = 2$), using

$$g(\varphi) = 1 + a \cos(\varphi - \varphi_s). \tag{4.1}$$

This function represents how first-order Ambisonic panning would distribute a mono signal to loudspeakers. With the loudspeaker positions described by the set of angles $\{\varphi_l\}$, a vector of amplitude-panning gains with an entry for each loudspeaker could be determined by sampling the direction-spread function:

$$\boldsymbol{g} = [g(\varphi_1), \dots, g(\varphi_L)]^{\mathrm{T}}.$$

With these gain values, we evaluate models of perceived loudness, direction, and width, as introduced in Chap. 2, in order to enter a discussion of perceptual goals. If the loudspeaker directions $\{\boldsymbol{\theta}_l\}$ are chosen suitably, it is possible to obtain panning-independent loudness, direction, and width measures $E = \sum_l g_l^2$, $\boldsymbol{r}_E = \frac{1}{E} \sum_l g_l^2\, \boldsymbol{\theta}_l$, and $\frac{5}{8}\,\frac{180^\circ}{\pi}\, 2 \arccos \|\boldsymbol{r}_E\|$. How is it done? For first-order 2D Ambisonics, it is theoretically optimal to use a ring of at least 4 loudspeakers with uniform angular spacing and $a = \sqrt{2}$, which is easily checked with the aid of a computer, cf. Fig. 4.1, and explained below and in Sect. 4.4.
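The computer check mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: the function names `ring_measures` and `variation_over_panning` are hypothetical, the ring is assumed uniform with a loudspeaker at 0°, and the gains follow Eq. (4.1) with $a = \sqrt{2}$.

```python
import numpy as np

def ring_measures(phi_s, n_ls, a=np.sqrt(2.0)):
    """Sample g(phi) = 1 + a*cos(phi - phi_s) on a uniform ring; return E and r_E."""
    phi_l = 2.0 * np.pi * np.arange(n_ls) / n_ls           # loudspeaker azimuths
    g2 = (1.0 + a * np.cos(phi_l - phi_s)) ** 2            # squared gains g_l^2
    E = g2.sum()                                           # loudness measure
    r_E = np.array([g2 @ np.cos(phi_l), g2 @ np.sin(phi_l)]) / E
    return E, r_E

def variation_over_panning(n_ls, n_dirs=360):
    """Peak-to-peak change of E and |r_E| over all panning angles, plus mean |r_E|."""
    E, r_len = [], []
    for phi_s in np.linspace(0.0, 2.0 * np.pi, n_dirs, endpoint=False):
        E_i, r_i = ring_measures(phi_s, n_ls)
        E.append(E_i)
        r_len.append(np.linalg.norm(r_i))
    E, r_len = np.array(E), np.array(r_len)
    return E.max() - E.min(), r_len.max() - r_len.min(), r_len.mean()

for n_ls in (3, 4):
    dE, dr, r_mean = variation_over_panning(n_ls)
    print(f"L={n_ls}: E varies by {dE:.1e}, |r_E| varies by {dr:.1e}, mean |r_E|={r_mean:.4f}")
```

With three loudspeakers the loudness measure stays constant but the length of $\boldsymbol{r}_E$ fluctuates with the panning angle; with four loudspeakers both stay constant.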
Direction spread in FOA. The panning-function interpretation with its directional spread has some similarity to MDAP, with its attempt to directionally spread an amplitude-panned signal. Similar to the discrete virtual spread by $\pm\alpha = \arccos \|\boldsymbol{r}_E\|$ around the panning direction in MDAP, the virtual direction spread of first-order Ambisonics is described by its continuous panning function $g(\varphi)$ in Eq. (4.1). To inspect the continuous function by the $\boldsymbol{r}_E$ measure defined in Eq. (2.7), we may evaluate integrals over the panning function instead of sums. Because of the symmetry around $\varphi_s$, we may set $\varphi_s = 0$ for convenience, which is known to yield $r_{E,y} = 0$, and evaluate

$$r_{E,x} = \frac{\int_0^{2\pi} g^2(\varphi) \cos\varphi \,\mathrm{d}\varphi}{\int_0^{2\pi} g^2(\varphi) \,\mathrm{d}\varphi} = \frac{2a}{2 + a^2}. \tag{4.3}$$

The maximum of $r_{E,x}$ is found by $\frac{\mathrm{d}}{\mathrm{d}a} r_{E,x} = \frac{4 + 2a^2 - 4a^2}{(2 + a^2)^2} = 0$, hence at $a = \sqrt{2}$. Consequently, the 2D max-$r_E$ weight yields $r_{E,x} = \frac{\sqrt{2}}{2} = \frac{1}{\sqrt{2}}$ and the angle $\arccos r_{E,x} = 45^\circ$. This would resemble a 2D-MDAP-equivalent source spread to $\pm 45^\circ$. Note that first-order Ambisonics cannot map to a smaller spread than this. Only higher orders permit to further reduce this spread to a desired angle below $90^\circ$.
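The maximization of $r_{E,x}$ over the spread parameter can also be checked numerically. The following sketch assumes the panning function $g(\varphi) = 1 + a\cos\varphi$ from Eq. (4.1) with $\varphi_s = 0$; the helper name `r_E_x` and the search grid are illustrative choices.

```python
import numpy as np

def r_E_x(a, n=64):
    """r_E,x of g(phi) = 1 + a*cos(phi), via a uniform sum (exact for low-order trig. polynomials)."""
    phi = 2.0 * np.pi * np.arange(n) / n
    g2 = (1.0 + a * np.cos(phi)) ** 2
    return np.sum(g2 * np.cos(phi)) / np.sum(g2)

a_grid = np.linspace(0.5, 3.0, 2501)                  # candidate spread parameters, step 0.001
r_grid = np.array([r_E_x(a) for a in a_grid])
a_best = a_grid[np.argmax(r_grid)]                    # expected close to sqrt(2)
spread_deg = np.degrees(np.arccos(r_grid.max()))      # expected close to 45 degrees
print(f"a = {a_best:.3f}, r_E,x = {r_grid.max():.4f}, spread = +/-{spread_deg:.2f} deg")
```

The numerically evaluated ratio coincides with the closed form $2a/(2+a^2)$ on the whole grid.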

Ideal loudspeaker layouts.
Not only is the directional aiming of the virtual, continuous first-order Ambisonic panning function ideal and its width panning-invariant, also its loudness measure is panning-invariant. However, decoding to a physical loudspeaker setup can degrade the ideal behavior. For which loudspeaker layout are these properties preserved by sampling decoding?
The 2D first-order Ambisonic components $(W, X, Y)$ correspond to $\{1, \cos\varphi, \sin\varphi\}$ patterns, a first-order Fourier series in the angle. Sampling the playback directions by $L = 3$ uniformly spaced loudspeakers on the horizon, the sampling theorem for this series is already fulfilled. Accordingly, Parseval's theorem ensures panning-invariant loudness $E$ for any panning direction.
For an ideal $\boldsymbol{r}_E$ measure, however, one more loudspeaker is required, $L \geq 4$, for a uniformly spaced horizontal ring. To explain this increase exhaustively, the concept of circular/spherical polynomials and t-designs will be introduced in this chapter. For a brief explanation, $g^2(\varphi)$ is a second-order expression, and therefore representing the ideal constant loudness $E = \int g^2(\varphi)\,\mathrm{d}\varphi$ of the continuous panning function consistently after discretization, $E = \frac{2\pi}{L} \sum_l g_l^2$, requires $L = 3$ uniformly spaced loudspeakers, as argued before. By contrast, the expressions $g^2(\varphi) \cos\varphi$ and $g^2(\varphi) \sin\varphi$ are third-order and appear in $\boldsymbol{r}_E\, E = \int g^2(\varphi)\, [\cos\varphi, \sin\varphi]^{\mathrm{T}} \,\mathrm{d}\varphi$. Consequently, ideal mapping of $\boldsymbol{r}_E$ (direction and width) requires at least one more loudspeaker, $L = 4$, for a uniformly spaced arrangement to make the continuous and the discretized form $\boldsymbol{r}_E\, E = \frac{2\pi}{L} \sum_l g_l^2\, [\cos\varphi_l, \sin\varphi_l]^{\mathrm{T}}$ perfectly equal.

Towards a higher-order panning function. An Nth-order cardioid pattern is obtained from the cardioid pattern $\frac{1}{2}(1 + \cos\varphi)$ by taking its Nth power, which makes it narrower. With $N = 2$, this becomes, using $\cos^2\varphi = \frac{1}{2}(1 + \cos 2\varphi)$,

$$\left[\tfrac{1}{2}(1 + \cos\varphi)\right]^2 = \tfrac{1}{4}(1 + 2\cos\varphi + \cos^2\varphi) = \tfrac{1}{8}(3 + 4\cos\varphi + \cos 2\varphi).$$
More generally, Chebyshev polynomials $T_m(\cos\varphi) = \cos m\varphi$, cf. [23, Eq. 3.11.6], can be used to argue that there is always a fully equivalent cosine series describing the higher-order 2D panning function in the azimuth angle, $g(\varphi) = \sum_{m=0}^{N} b_m \cos(m\varphi)$.
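The equivalence between powers of $\cos\varphi$ and a cosine series can be sketched with NumPy's polynomial utilities. The sketch below assumes the normalized cardioid $\frac{1}{2}(1+\cos\varphi)$ raised to an integer order; the function name `cardioid_cosine_series` is hypothetical, and the returned numbers are the cosine-series coefficients $b_m$ obtained through $T_m(\cos\varphi) = \cos m\varphi$.

```python
import numpy as np
from numpy.polynomial import chebyshev, polynomial

def cardioid_cosine_series(order):
    """Coefficients b_m with (0.5*(1 + cos(phi)))**order = sum_m b_m * cos(m*phi)."""
    power_coeffs = polynomial.polypow([0.5, 0.5], order)   # power series in x = cos(phi)
    return chebyshev.poly2cheb(power_coeffs)               # same polynomial in the basis T_m(x)

b2 = cardioid_cosine_series(2)
print(b2)   # second-order cardioid: coefficients of 1, cos(phi), cos(2*phi)

# Numerical cross-check for a higher order on a grid of angles
phi = np.linspace(-np.pi, np.pi, 181)
b5 = cardioid_cosine_series(5)
direct = (0.5 * (1.0 + np.cos(phi))) ** 5
series = sum(b * np.cos(m * phi) for m, b in enumerate(b5))
```

For order 2, the coefficients reproduce the expansion $\frac{1}{8}(3 + 4\cos\varphi + \cos 2\varphi)$.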

Rotated panning function.
In first-order Ambisonics, panning functions consist of an omnidirectional part, $\cos(0\varphi) = 1$, and a figure-of-eight to x, $\cos\varphi$, but that was not all: Recording and playback also required a figure-of-eight pattern to y, $\sin\varphi$. The additional component allows expressing rotated first-order directivities by a basis set of fixed directivities. For higher orders, a panning function rotated to a non-zero aiming $\varphi_s \neq 0$ can be re-expressed by the addition theorem $\cos(\alpha + \beta) = \cos\alpha \cos\beta - \sin\alpha \sin\beta$ into a series involving the sinusoids (odd-symmetric part of a Fourier series). With cosine-series coefficients $b_m$, this reads

$$g(\varphi - \varphi_s) = \sum_{m=0}^{N} b_m \left[\cos(m\varphi_s) \cos(m\varphi) + \sin(m\varphi_s) \sin(m\varphi)\right].$$

We conclude: Higher-order Ambisonics in 2D (and the associated set of theoretical microphone directivities) is based on the Fourier series in the azimuth angle $\varphi$.

Higher-Order Polynomials and Harmonics
The previous section required that direction and length of the $\boldsymbol{r}_E$ vector resulting from amplitude panning on loudspeakers matched the desired auditory event direction and width. Harmonic functions with strict symmetry around a panning direction $\boldsymbol{\theta}_s$ will help us in achieving this goal and in defining good sampling. Regardless of the dimensions, be it in 2D or 3D, we desire to define continuous and resolution-limited axisymmetric functions around the panning direction $\boldsymbol{\theta}_s$ to fulfill our perceptual goals of a panning-invariant loudness $E$, width $\|\boldsymbol{r}_E\|$, and perfect alignment between panning direction $\boldsymbol{\theta}_s$ and localized direction $\boldsymbol{r}_E$. Then we hope to find suitable directional discretization schemes for ideal loudspeaker layouts, so that the measures $E$ and $\boldsymbol{r}_E$ are perfectly reconstructed in playback.
The projection of a variable direction vector $\boldsymbol{\theta}$ onto the panning direction $\boldsymbol{\theta}_s$ always yields the cosine of the enclosed angle, $\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta} = \cos\phi$, no matter whether it is in two or three dimensions. Constructing the panning function based on this projection therefore readily meets the desired goals. The mth power thereof, $(\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta})^m = \cos^m\phi$, helps to build an Nth-order power series $g = \sum_{m=0}^{N} a_m (\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta})^m$ to describe a virtual Ambisonic panning function.
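For 2D, such a circular polynomial expands, with $\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta} = \cos\varphi_s \cos\varphi + \sin\varphi_s \sin\varphi$, into products of powers of $\cos\varphi$ and $\sin\varphi$. However, we could already recognize that it only takes $2N + 1$ functions to express $g = \sum_{m=0}^{N} a_m (\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta})^m = \sum_{m=0}^{N} a_m \cos^m\phi$: First, an initial polynomial with relative azimuth $\phi = \varphi - \varphi_s$ relates to a harmonic series of $N + 1$ cosines or Chebyshev polynomials, $g = \sum_{m=0}^{N} b_m \cos m\phi = \sum_{m=0}^{N} b_m T_m(\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta})$. Then, in terms of absolute azimuth $\varphi$, the trigonometric addition theorem re-expresses the series into one of $N + 1$ cosines and $N$ sines, with $T_m(\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta}) = \cos[m(\varphi - \varphi_s)] = \cos m\varphi_s \cos m\varphi + \sin m\varphi_s \sin m\varphi$. As shown in the upcoming section, we can alternatively obtain such orthonormal harmonic functions by solving a second-order differential equation that is generally used to define harmonics, which bears the later benefit that we can use the approach to define spherical harmonics in three space dimensions.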
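Spherical polynomials are similar, $g = \sum_{n=0}^{N} a_n (\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta})^n$, involving powers of the expression $\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta} = \theta_{\mathrm{x},s} \theta_{\mathrm{x}} + \theta_{\mathrm{y},s} \theta_{\mathrm{y}} + \theta_{\mathrm{z},s} \theta_{\mathrm{z}}$ and hence products of powers of the three Cartesian direction components. Again, all these $(N + 1)(N + 2)(N + 3)/6$ combinations would be too many to form an orthogonal set of basis functions. Moreover, while the different cosine harmonics are orthogonal axisymmetric functions in 2D, they are not in 3D. On the sphere, the $N + 1$ orthogonal Legendre polynomials $P_n(\cos\phi)$ replace the cosine series as a basis for $g = \sum_{n=0}^{N} c_n P_n(\cos\phi)$, as shown below. All mathematical derivations for the sphere rely on the definition of harmonics. They result in $(N + 1)^2$ spherical harmonics and their addition theorem as a basis in terms of absolute directions, $\frac{2n+1}{4\pi} P_n(\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta}) = \sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_s)\, Y_n^m(\boldsymbol{\theta})$. Dickins' thesis is interesting for further reading [12].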
In both regimes, 2D and 3D, the concept of circular or spherical polynomials will be used to determine optimal layouts, so-called t-designs. Such t-designs are directional sampling grids that are able to keep the information about the constant part of any circular (2D) or spherical (3D) polynomial up to the order $N \leq t$. This is a key mathematical property exploited to determine requirements for preserving the $E$ and $\boldsymbol{r}_E$ measures during Ambisonic playback with optimal loudspeaker setups. Moreover, t-designs simplify numerical integration of circular or spherical harmonics to define state-of-the-art Ambisonic decoders or mapping effects.

Angular/Directional Harmonics in 2D and 3D
The Laplacian is defined in the D-dimensional Cartesian space as $\Delta = \sum_{j=1}^{D} \frac{\partial^2}{\partial x_j^2}$, and for any function $f$, the Laplacian $\Delta f$ describes the curvature. Any harmonic function is proportional to its curvature by an eigenvalue $\lambda$, and therefore is an oscillatory function. Generally, eigensolutions $\Delta f = -\lambda f$ to the Laplacian are called harmonics. For suitable eigenvalues $\lambda$, harmonics span an orthogonal set of basis functions that are typically used for Fourier expansion on a finite interval. It seems desirable to find such harmonics for functions only exhibiting directional dependencies, i.e. in the azimuth angle $\varphi$ in 2D, and azimuth and zenith angle $\varphi, \vartheta$ in 3D.

Panning with Circular Harmonics in 2D
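In two dimensions, Appendix A.3.2 uses the generalized chain rule to convert the Laplacian of a 2D Cartesian coordinate system, $\Delta = \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2}$, to a polar coordinate system with the radius $r$ and the angle $\varphi$ to the x axis,

$$\Delta = \frac{1}{r} \frac{\partial}{\partial r}\left(r \frac{\partial}{\partial r}\right) + \frac{1}{r^2} \frac{\partial^2}{\partial \varphi^2}.$$

For functions $\Phi = \Phi(\varphi)$ purely in the angle $\varphi$, the radial derivatives of $\Phi$ all vanish and it remains ($\partial \to \mathrm{d}$)

$$\frac{1}{r^2} \frac{\mathrm{d}^2 \Phi}{\mathrm{d}\varphi^2} = -\lambda\, \Phi.$$

This only yields useful solutions with $\lambda r^2 = m^2$, $m \in \mathbb{Z}$, cf. Appendix A.3.4, Fig. 4, namely the circular harmonics

$$\Phi_m(\varphi) = \frac{1}{\sqrt{2\pi}} \begin{cases} \sqrt{2} \sin(|m| \varphi), & m < 0, \\ 1, & m = 0, \\ \sqrt{2} \cos(m \varphi), & m > 0, \end{cases} \tag{4.10}$$

which define how to decompose panning functions of limited order $|m| \leq N$. The harmonics are periodic in azimuth, orthogonal, and normalized (orthonormal) on the period $-\pi \leq \varphi \leq \pi$. Due to their completeness, any square-integrable function $g(\varphi)$ can be expanded into a series of the harmonics using coefficients $\gamma_m$,

$$g(\varphi) = \sum_{m=-\infty}^{\infty} \gamma_m\, \Phi_m(\varphi).$$

For a known function $g(\varphi)$, the coefficients $\gamma_m$ are obtained by the transformation integral $\gamma_m = \int_{-\pi}^{\pi} g(\varphi)\, \Phi_m(\varphi) \,\mathrm{d}\varphi$, as shown in Appendix Eq. (A.14).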

2D panning function.
An infinitely narrow angular range around a desired direction, $|\varphi - \varphi_s| < \varepsilon \to 0$, is represented by the transformation integral over a Dirac delta distribution $\delta(\varphi - \varphi_s)$, cf. Appendix Eq. (A.16), so that the coefficients of such a panning function are $\gamma_m = \int_{-\pi}^{\pi} \delta(\varphi - \varphi_s)\, \Phi_m(\varphi) \,\mathrm{d}\varphi = \Phi_m(\varphi_s)$. As the infinite circular harmonic series is complete, the panning function is

$$g(\varphi) = \sum_{m=-\infty}^{\infty} \Phi_m(\varphi_s)\, \Phi_m(\varphi), \tag{4.14}$$

and in practice we resolution-limit it to the Nth Ambisonic order, $|m| \leq N$, and use an additional weight $a_m$ that allows us to design its side lobes,

$$g_N(\varphi) = \sum_{m=-N}^{N} a_m\, \Phi_m(\varphi_s)\, \Phi_m(\varphi).$$

The max-$r_E$ panning function [24] uses the weights $a_m = \cos\!\left(\frac{\pi m}{2(N+1)}\right)$, as derived in Appendix Eq. (A.20). The spread is now adjustable by the order to $\pm \frac{90^\circ}{N+1}$. The result is shown in Fig. 4.3, compared with no side-lobe suppression when $a_m = 1$ (basic).

Fig. 4.3 2D unweighted ($a_m = 1$, basic) and weighted max-$r_E$ Ambisonic panning functions for the orders N = 1, 2, 5
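The effect of the max-$r_E$ weights on the continuous 2D panning function can be illustrated numerically. This is a sketch under the assumption that the circular harmonics are orthonormal on the circle, so that the order-limited panning function collapses to $g_N(\varphi) = \frac{1}{2\pi}\left[a_0 + 2\sum_{m=1}^{N} a_m \cos(m(\varphi - \varphi_s))\right]$; all function names are hypothetical.

```python
import numpy as np

def maxre_weights_2d(N):
    """Weights a_m = cos(pi*m / (2*(N+1))) for m = 0..N."""
    return np.cos(np.pi * np.arange(N + 1) / (2.0 * (N + 1)))

def panning_function_2d(phi, phi_s, a):
    """Order-limited panning function, written as a cosine series in (phi - phi_s)."""
    m = np.arange(1, len(a))
    cos_terms = np.cos(np.outer(phi - phi_s, m)) @ a[1:]
    return (a[0] + 2.0 * cos_terms) / (2.0 * np.pi)

def r_E_length_2d(a, n=720):
    """Length of r_E of the continuous panning function (uniform sum, exact for these orders)."""
    phi = 2.0 * np.pi * np.arange(n) / n
    g2 = panning_function_2d(phi, 0.0, a) ** 2
    return np.sum(g2 * np.cos(phi)) / np.sum(g2)

for N in (1, 2, 5):
    r_basic = r_E_length_2d(np.ones(N + 1))
    r_maxre = r_E_length_2d(maxre_weights_2d(N))
    spread = np.degrees(np.arccos(r_maxre))
    print(f"N={N}: basic |r_E|={r_basic:.4f}, max-r_E |r_E|={r_maxre:.4f}, spread=+/-{spread:.1f} deg")
```

The spread obtained from $\arccos\|\boldsymbol{r}_E\|$ with the max-$r_E$ weights follows $90^\circ/(N+1)$, and the unweighted (basic) function gives a shorter $\boldsymbol{r}_E$ in each case.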
It is easy to recognize: $\Phi_m(\varphi_s)$ represents the recorded or encoded directions, and $\Phi_m(\varphi)$ represents the decoded playback directions.

Optimal sampling of the 2D panning function. In the theory of circular/spherical polynomials in the variable $\zeta = \cos(\varphi - \varphi_s)$, so-called t-designs in 2D are optimal point sets of given angles $\{\varphi_l\}$ with $l = 1, \dots, L$ and size $L$. A t-design allows to perfectly compute the integral (constant part) over the polynomials $P_m(\zeta)$ of limited degree $m \leq t$ by discrete summation, regardless of any angular shift $\varphi_s$. In 2D, Chebyshev polynomials $T_m(\cos\phi) = \cos(m\phi)$ are orthogonal polynomials, therefore an Nth-order panning function composed of $\cos(m\phi)$ is always a polynomial of Nth degree. Knowing this, it is clear that the integral over $g_N^2$ required to evaluate the loudness measure $E$ is over a polynomial of the order $2N$. The integral to calculate $\boldsymbol{r}_E$ is over $g_N^2 \cos(\phi)$ and thus of the order $2N + 1$. In playback, to get a perfectly panning-invariant loudness measure $E$ of the continuous panning function and also the perfectly oriented $\boldsymbol{r}_E$ vector of constant spread $\arccos \|\boldsymbol{r}_E\|$, the parameter $t$ must be $t \geq 2N + 1$. In 2D, all regular polygons are t-designs with $L = t + 1$ points. We can use the smallest set of $2N + 2$ angles $\varphi_l = \frac{180^\circ}{N+1}(l - 1)$ as optimal 2D layout.
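Whether a given set of loudspeaker angles is a circular t-design can be tested directly. The sketch below relies on one assumption spelled out here: the mean of $\cos(m(\varphi_l - \varphi_s))$ over the points matches the continuous mean (zero for $m \geq 1$) for every shift $\varphi_s$ exactly when the mean of $e^{\mathrm{i} m \varphi_l}$ vanishes. The names `circular_design_strength` and `regular_ring` are illustrative.

```python
import numpy as np

def circular_design_strength(phi_l, t_max=40, tol=1e-10):
    """Largest t such that the angles phi_l integrate all circular polynomials up to degree t."""
    phi_l = np.asarray(phi_l, dtype=float)
    t = 0
    for m in range(1, t_max + 1):
        # The m-th harmonic must average to zero, independent of any rotation of the ring
        if abs(np.mean(np.exp(1j * m * phi_l))) > tol:
            break
        t = m
    return t

def regular_ring(n_ls, offset=0.0):
    """Azimuths of a regular polygon with n_ls corners, rotated by offset (radians)."""
    return offset + 2.0 * np.pi * np.arange(n_ls) / n_ls

N = 2
t_ring = circular_design_strength(regular_ring(2 * N + 2, offset=0.3))
t_irregular = circular_design_strength(np.radians([0.0, 90.0, 180.0, 300.0]))
print(t_ring, t_irregular)
```

A regular ring of $2N + 2$ points reaches $t = 2N + 1$, whereas a ring with one displaced loudspeaker loses the design property altogether.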

Ambisonics Encoding and Optimal Decoding in 2D
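To encode a signal $s$ into Ambisonic signals $\chi_m$, we multiply the signal by the encoder weights $\Phi_m(\varphi_s)$ that represent the direction of the signal at the angle $\varphi_s$,

$$\chi_m = \Phi_m(\varphi_s)\, s, \tag{4.18}$$

or in vector notation $\boldsymbol{\chi}_N = \boldsymbol{y}_N(\varphi_s)\, s$ with the column vector $\boldsymbol{y}_N(\varphi_s) = [\Phi_{-N}(\varphi_s), \dots, \Phi_N(\varphi_s)]^{\mathrm{T}}$. In total, the system for encoding and decoding, with a decoder matrix $\boldsymbol{D}$ and the side-lobe-suppressing weights $\boldsymbol{a}_N$, can also be written to yield a set of loudspeaker gains for one virtual source,

$$\boldsymbol{g} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\varphi_s), \tag{4.22}$$

or in particular for the 2D sampling decoder, with $\boldsymbol{Y}_N = [\boldsymbol{y}_N(\varphi_1), \dots, \boldsymbol{y}_N(\varphi_L)]$ sampled at the loudspeaker angles, $\boldsymbol{g} = \frac{2\pi}{L} \boldsymbol{Y}_N^{\mathrm{T}}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\varphi_s)$.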
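The chain of encoding, weighting, and sampling decoding in 2D can be sketched as follows. The circular harmonics are assumed to be orthonormal real-valued functions ordered $m = -N, \dots, N$, with sines for negative $m$; the function names are hypothetical, and the layout is the ring of $2N + 2$ loudspeakers discussed above.

```python
import numpy as np

def circular_harmonics(phi, N):
    """Real circular harmonics for m = -N..N, shape (2N+1, number of angles)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = []
    for m in range(-N, N + 1):
        if m < 0:
            rows.append(np.sqrt(2.0) * np.sin(-m * phi))
        elif m == 0:
            rows.append(np.ones_like(phi))
        else:
            rows.append(np.sqrt(2.0) * np.cos(m * phi))
    return np.array(rows) / np.sqrt(2.0 * np.pi)

def sad_gains_2d(phi_s, phi_l, N):
    """Gains g = (2*pi/L) * Y^T diag(a) y(phi_s) with max-r_E weights."""
    a = np.cos(np.pi * np.abs(np.arange(-N, N + 1)) / (2.0 * (N + 1)))
    Y = circular_harmonics(phi_l, N)                 # sampled at loudspeaker angles
    y_s = circular_harmonics(phi_s, N)[:, 0]         # encoder for the panning angle
    return 2.0 * np.pi / len(phi_l) * Y.T @ (a * y_s)

N = 2
phi_l = 2.0 * np.pi * np.arange(2 * N + 2) / (2 * N + 2)   # ring of 2N+2 loudspeakers
for phi_s in np.radians([0.0, 17.0, 95.0]):
    g = sad_gains_2d(phi_s, phi_l, N)
    E = np.sum(g**2)
    r_E = np.array([g**2 @ np.cos(phi_l), g**2 @ np.sin(phi_l)]) / E
    print(f"E={E:.4f}, |r_E|={np.hypot(*r_E):.4f}, "
          f"angle={np.degrees(np.arctan2(r_E[1], r_E[0])):.1f} deg")
```

On this layout the loudness measure and the length of $\boldsymbol{r}_E$ stay the same for every panning angle, and the $\boldsymbol{r}_E$ direction coincides with the panning direction.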

Listening Experiments on 2D Ambisonics
There are several listening experiments discussing the features of Ambisonics, most of which are summarized in [25]; they are discussed below, complemented with those from [26]. The perceptually adjusted panning angle of 2nd-order max-$r_E$ Ambisonic panning on 6 horizontal loudspeakers matches the acoustic reference direction quite well, as shown in Fig. 4.4, similar to MDAP in Fig. 3.8, but with a slightly more accurate median by 0.5° on average, and in particular at side and back panning directions.

Fig. 4.5 Experiments from [27] on center and off-center listening seats for 3 virtual sources (A, B, C) using 1st-order (left) and 5th-order (right) Ambisonics on 12 horizontal loudspeakers (IEM CUBE) indicate a more stable localization with high orders. Moreover, for 5th order, max-$r_E$ weighting and omission of delay compensation were preferred.

Fig. 4.6 Omission of max-$r_E$ weights ("basic") or alternative "in-phase" weights that entirely suppress any side lobe yield less precise localization at off-center listening positions (axis: perceived angle in °). Stitt's experiments (c) imply that localization with higher orders is more stable and that the localization deficiency at off-center listening seats seems to be proportional to the ratio between distance to the center divided by radius of the loudspeaker ring, and not the specific time-delays that are larger for large loudspeaker rings, cf. [28].
Another aspect to investigate is how stable the results are for center and off-center listening seats, as shown in Fig. 4.5. It illustrates that max-$r_E$ with the highest order achieves the best stability with regard to localization at off-center listening seats. Astonishingly, the delay compensation for non-uniform delay times to the center deteriorated the results, most probably because of the nearly linear frontal arrangement of loudspeakers that is more robust to lateral shifts of the listening positions than a circular arrangement. Figure 4.6a, b shows the direction histogram for two different weightings $a_m$, and it illustrates that proper side-lobe suppression of the panning function by using max-$r_E$ weights is decisive at shifted listening positions to avoid splitting of the auditory image, as it appears in Fig. 4.6b without the weights (basic).
Peter Stitt's work shows that the localization offsets at off-center listening seats do not increase with the radius of the loudspeaker arrangement as long as the off-center seat stays in proportion to the radius, Fig. 4.6c. The results are predicted by the sweet area model from Sect. 2.2.9 for the first order (top row) and third order (bottom row) in Fig. 4.7, for both sizes, small setup (left) and large setup (right).

Fig. 4.8 The perceptual sweet spot size as investigated by Frank [29] is nearly covering the entire area enclosed by the IEM CUBE as a playback setup (black = 5th, gray = 3rd, light gray = 1st order Ambisonics). It is smallest for 1st-order Ambisonics.

The experiments in [29] used scales on the floor from which listeners read off where the sweet area ends in every radial direction, cf. Fig. 4.8a. For Fig. 4.8b, the criterion for listeners to indicate leaving the sweet area was when the frontally panned sound was mapped outside the loudspeaker pairs L, C, and R. It showed that a sweet area providing perceptually plausible playback measures at least 2/3 of the radius of the loudspeaker setup if the order is high enough.
The perceived width of auditory events is investigated in the experimental results of Fig. 4.9 [25], in which pink noise was frontally panned in different orientations of the loudspeaker ring (with one loudspeaker in front, and with the front direction lying quarter- and half-spaced with respect to the loudspeaker spacing). Listeners compared the width of multiple stimuli, and the results were expected to indicate constant width for the differently rotated loudspeaker ring, as the optimal arrangement with $L = 2N + 2$ provides constant $\|\boldsymbol{r}_E\|$ length. The panning-invariant length is not perfectly reflected in the perceived widths with 3rd order on 8 loudspeakers, for which the on-loudspeaker position is perceived as being significantly wider. By contrast, the high-order experiment with 7th order on 16 loudspeakers would perfectly validate the model.

Fig. 4.10 Frank's 2013 experiments on the variation of the sound coloration of virtual sources rotating at a speed of 100°/s imply that max-$r_E$ weighting outperforms the "basic" weighting with $a_m = 1$, and that adjusting the number of loudspeakers to the Ambisonic order seems to be reasonable.

Figure 4.10 shows experiments investigating the time-variant change in sound coloration for a pink-noise virtual source rotating at a speed of 100°/s, and for different Ambisonic panning setups. There is an obvious advantage of a reduced fluctuation in coloration at both listening positions, centered and off-center, when using the side-lobe-suppressing "max-$r_E$" weighting instead of the "basic" rectangular truncation of the Fourier series. At the off-center listening position, max-$r_E$ weights achieve good results with regard to constant coloration for both the 3rd- and 7th-order arrangements with 8 and 16 loudspeakers that were investigated.
How well are diffuse signals preserved in playback? All the above experiments deal with how non-diffuse signals are presented. To complement what is shown in Fig. 1.21 of Chap. 1 with an explanation, the relation between Ambisonic order and its ability to preserve diffuse fields is estimated here by the covariance between uncorrelated directions. Assume that a max-$r_E$-weighted Nth-order Ambisonic panning function $g(\boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta})$, normalized to $g(1) = 1$, encodes two sounds $s_{1,2}$ from two directions $\boldsymbol{\theta}_1$ and $\boldsymbol{\theta}_2$, with the sounds being uncorrelated and of unit variance, $\mathrm{E}\{s_i s_j\} = \delta_{ij}$. We find that the Ambisonic representation mixes the sounds at their respective mapped directions and increases their correlation, $x_1 = s_1 + g_{12}\, s_2$ and $x_2 = s_2 + g_{12}\, s_1$, using $g_{12} = g(\cos\phi)$ with the angle $\phi$ enclosed by the two directions,

$$\frac{\mathrm{E}\{x_1 x_2\}}{\sqrt{\mathrm{E}\{x_1^2\}\, \mathrm{E}\{x_2^2\}}} = \frac{2\, g_{12}}{1 + g_{12}^2}.$$

This result was presented in Fig. 1.21 and was used to argue that the directional separation of first-order Ambisonics might be too weak because of its high crosstalk term $g_{12}$. Higher-order Ambisonics decreases this directional crosstalk and therefore improves the representation of diffuse sound fields.
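The crosstalk argument can be made concrete with a small simulation. The sketch assumes the mixing model stated above, $x_1 = s_1 + g_{12} s_2$ and $x_2 = s_2 + g_{12} s_1$, with uncorrelated unit-variance signals, for which the normalized correlation evaluates to $2 g_{12} / (1 + g_{12}^2)$; the crosstalk value 0.5, the sample count, and the function name are arbitrary illustration choices.

```python
import numpy as np

def crosstalk_correlation(g12):
    """Correlation of x1 = s1 + g12*s2 and x2 = s2 + g12*s1 for uncorrelated unit-variance s1, s2."""
    return 2.0 * g12 / (1.0 + g12**2)

rng = np.random.default_rng(0)                      # fixed seed for a repeatable check
s1, s2 = rng.standard_normal((2, 200_000))          # two uncorrelated noise signals
g12 = 0.5                                           # hypothetical crosstalk of the panning function
x1 = s1 + g12 * s2
x2 = s2 + g12 * s1
rho_simulated = np.corrcoef(x1, x2)[0, 1]
rho_predicted = crosstalk_correlation(g12)
print(f"simulated {rho_simulated:.3f}, predicted {rho_predicted:.3f}")
```

Even a moderate crosstalk term leads to strongly correlated signals, which is why reducing $g_{12}$ with higher orders helps to preserve diffuseness.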

Panning with Spherical Harmonics in 3D
In three space dimensions, the spherical coordinate system has a radius $r$ and two angles: the azimuth $\varphi$, indicating the polar angle of the orthogonal projection to the xy plane, and the zenith angle $\vartheta$, indicating the angle to the z axis, according to the right-handed spherical coordinate systems in ISO 31-11 and ISO 80000-2 [30,31], Fig. 4.11.
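By the generalized chain rule, Appendix A.3 re-writes the Laplacian to spherical coordinates in 3D, with $r$ signifying the radius, $\varphi$ the azimuth angle, and the zenith angle $\vartheta$ re-expressed as $\zeta = \frac{z}{r} = \cos\vartheta$, yielding the operator

$$\Delta = \frac{1}{r^2} \frac{\partial}{\partial r}\left(r^2 \frac{\partial}{\partial r}\right) + \frac{1}{r^2} \left[\frac{1}{1 - \zeta^2} \frac{\partial^2}{\partial \varphi^2} + \frac{\partial}{\partial \zeta}\left((1 - \zeta^2) \frac{\partial}{\partial \zeta}\right)\right].$$

Any radius-dependent part is removed to define an eigenproblem yielding the basis for panning functions, taking only the angular part $r^2 \Delta_{\varphi,\zeta,\mathrm{3D}}$ (the bracketed term), whose solution with $\lambda = n(n + 1)$ defines the spherical harmonics

$$Y_n^m(\varphi, \vartheta) = \Theta_n^m(\vartheta)\, \Phi_m(\varphi).$$

The prerequisites are (i) periodicity in $\varphi$ and (ii) that the function $Y_n^m$ is finite on the sphere. In addition to the circular harmonics $\Phi_m$ expressing the dependency on azimuth $\varphi$ according to Eq. (4.10), the spherical harmonics contain the associated Legendre functions $P_n^m$ and their normalization term,

$$\Theta_n^m(\vartheta) = N_n^{|m|}\, P_n^{|m|}(\cos\vartheta), \tag{4.26}$$

to express the dependency on the zenith angle $\vartheta$. The index $n \geq 0$ expresses the order, and the directional resolution can be limited by requiring $0 \leq n \leq N$. The index $m$ is the degree, and for each $n$ it is limited by $-n \leq m \leq n$. The spherical harmonics, Fig. 4.12, are orthonormal on the sphere $-\pi \leq \varphi \leq \pi$ and $0 \leq \vartheta \leq \pi$, and for unbounded order $N \to \infty$ they are complete; see also Appendix A.3.7.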
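The spherical harmonics permit a series representation of square-integrable 3D directional functions by the coefficients $\gamma_{nm}$,

$$g(\boldsymbol{\theta}) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \gamma_{nm}\, Y_n^m(\boldsymbol{\theta}).$$

From a known function $g(\boldsymbol{\theta})$, the coefficients are obtained by the transformation integral over the unit sphere $\mathbb{S}^2$, cf. Appendix Eq. (A.38),

$$\gamma_{nm} = \int_{\mathbb{S}^2} g(\boldsymbol{\theta})\, Y_n^m(\boldsymbol{\theta}) \,\mathrm{d}\boldsymbol{\theta}.$$

Note that the above N3D normalization $\int_{\boldsymbol{\theta} \in \mathbb{S}^2} |Y_n^m(\boldsymbol{\theta})|^2 \,\mathrm{d}\boldsymbol{\theta} = 1$ defines each spherical harmonic except for an arbitrary phase factor it might be multiplied with. Legendre functions for the zenith dependency might be defined differently in the literature, and for azimuth, some implementations use $\sin(m\varphi)$ instead of $\sin(|m|\varphi)$.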

In Ambisonics, real-valued functions and the SN3D normalization
As infinitely many spherical harmonics are complete, the panning function is

$$g(\boldsymbol{\theta}) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_s)\, Y_n^m(\boldsymbol{\theta}), \tag{4.30}$$

and in practice, the finite-resolution Nth-order panning function with $n \leq N$ employs a weight $a_n$ to reduce side lobes and optimize the spread,

$$g_N(\boldsymbol{\theta}) = \sum_{n=0}^{N} \sum_{m=-n}^{n} a_n\, Y_n^m(\boldsymbol{\theta}_s)\, Y_n^m(\boldsymbol{\theta}).$$

The max-$r_E$ panning function uses the weights $a_n = P_n\!\left(\cos\frac{137.9^\circ}{N + 1.51}\right)$, as derived in Appendix Eq. (A.46). The spread is now adjustable by the order to $\pm\frac{137.9^\circ}{N + 1.51}$. Figure 4.13 shows a comparison to the basic weighting $a_n = 1$. An alternative expression that uses Legendre polynomials $P_n$ and only depends on the angle $\phi$ to the panning direction $\boldsymbol{\theta}_s$ is obtained by replacing the sum over $m$ by the spherical harmonics addition theorem $\sum_{m=-n}^{n} Y_n^m(\boldsymbol{\theta}_s)\, Y_n^m(\boldsymbol{\theta}) = \frac{2n+1}{4\pi} P_n(\cos\phi)$,

$$g_N(\phi) = \sum_{n=0}^{N} \frac{2n+1}{4\pi}\, a_n\, P_n(\cos\phi). \tag{4.32}$$

Comparison to first-order Ambisonics shows: now $Y_n^m(\boldsymbol{\theta}_s)$ represents the recorded or encoded directions, and $Y_n^m(\boldsymbol{\theta})$ represents the decoded playback directions.

Fig. 4.13 3D unweighted ($a_n = 1$, basic) and weighted max-$r_E$ Ambisonic panning functions for the orders N = 1, 2, 5

Optimal sampling of the 3D panning function. In the theory of spherical polynomials in the variable $\zeta = \boldsymbol{\theta}_s^{\mathrm{T}} \boldsymbol{\theta}$, so-called t-designs describe point sets of given directions $\{\boldsymbol{\theta}_l\}$ with $l = 1, \dots, L$ and size $L$ that allow to perfectly compute the integral (constant part) over the polynomials $P_n(\zeta)$ of limited order $n \leq t$ by discrete summation, relative to any axis $\boldsymbol{\theta}_s$ the point set is projected onto. In 3D, the Legendre polynomials $P_n(\zeta)$ are orthogonal polynomials, therefore an Nth-order panning function composed thereof is a polynomial of Nth order. The loudness measure $E$ is calculated by the integral over $g_N^2$, therefore over a polynomial of the order $2N$. The integral to calculate $\boldsymbol{r}_E$ runs over $g_N^2\, \zeta$, therefore over a polynomial of the order $2N + 1$.
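The Legendre form of the 3D panning function lends itself to a quick numerical check of the spread. The sketch below assumes the axisymmetric form $g_N(\zeta) = \sum_n \frac{2n+1}{4\pi} a_n P_n(\zeta)$ with $\zeta = \cos\phi$ and evaluates the length of $\boldsymbol{r}_E$ by Gauss-Legendre quadrature over $\zeta \in [-1, 1]$; function names are hypothetical.

```python
import numpy as np
from numpy.polynomial import legendre

def maxre_weights_3d(N):
    """Weights a_n = P_n(cos(137.9 deg / (N + 1.51))) for n = 0..N."""
    c = np.cos(np.radians(137.9 / (N + 1.51)))
    return np.array([legendre.legval(c, [0.0] * n + [1.0]) for n in range(N + 1)])

def panning_function_3d(zeta, a):
    """g_N as a Legendre series in zeta = cos(angle to the panning direction)."""
    n = np.arange(len(a))
    return legendre.legval(zeta, (2 * n + 1) / (4.0 * np.pi) * a)

def r_E_length_3d(a):
    """|r_E| of the continuous panning function; the quadrature is exact for degree <= 2N+3."""
    zeta, w = legendre.leggauss(len(a) + 1)
    g2 = panning_function_3d(zeta, a) ** 2
    return np.sum(w * g2 * zeta) / np.sum(w * g2)

for N in (1, 2, 5):
    r_basic = r_E_length_3d(np.ones(N + 1))
    r_maxre = r_E_length_3d(maxre_weights_3d(N))
    print(f"N={N}: basic |r_E|={r_basic:.4f}, max-r_E |r_E|={r_maxre:.4f}, "
          f"spread=+/-{np.degrees(np.arccos(r_maxre)):.1f} deg")
```

The resulting spread agrees closely with $137.9^\circ/(N + 1.51)$, and the basic weighting again yields a shorter $\boldsymbol{r}_E$.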
In playback, to get a perfectly panning-invariant loudness measure $E$ of the continuous panning function and also the perfectly oriented $\boldsymbol{r}_E$ vector of constant spread $\arccos \|\boldsymbol{r}_E\|$, the parameter $t$ must be $t \geq 2N + 1$. In 3D there are only 5 geometrically regular layouts:
• the tetrahedron, L = 4 corners, is a 2-design,
• the octahedron, L = 6 corners, is a 3-design,
• the hexahedron (cube), L = 8 corners, is a 3-design,
• the icosahedron, L = 12 corners, is a 5-design,
• the dodecahedron, L = 20 corners, is a 5-design.
For instance, for N = 1, the octahedron is a suitable spherical design, and for N = 2, the icosahedral or dodecahedral layouts are suitable.
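The design orders in the list above can be verified with a short script. The approach is an assumption of this sketch rather than the chapter's method: a point set is treated as a spherical t-design if the average of every monomial $x^i y^j z^k$ with $i + j + k \leq t$ over the points equals its average over the unit sphere, which is $\frac{(i-1)!!\,(j-1)!!\,(k-1)!!}{(i+j+k+1)!!}$ for even exponents and zero otherwise. All names are illustrative.

```python
import itertools
import math

def double_factorial(k):
    """k!! with the convention (-1)!! = 0!! = 1."""
    return 1 if k <= 0 else k * double_factorial(k - 2)

def sphere_mean_monomial(i, j, k):
    """Average of x^i * y^j * z^k over the unit sphere."""
    if i % 2 or j % 2 or k % 2:
        return 0.0
    return (double_factorial(i - 1) * double_factorial(j - 1) * double_factorial(k - 1)
            / double_factorial(i + j + k + 1))

def design_strength(points, t_max=8, tol=1e-12):
    """Largest t for which the point average matches the sphere average up to degree t."""
    t = 0
    for deg in range(1, t_max + 1):
        for i, j in itertools.product(range(deg + 1), repeat=2):
            k = deg - i - j
            if k < 0:
                continue
            mean = sum(x**i * y**j * z**k for x, y, z in points) / len(points)
            if abs(mean - sphere_mean_monomial(i, j, k)) > tol:
                return t
        t = deg
    return t

s3 = 1.0 / math.sqrt(3.0)
tetrahedron = [(s3, s3, s3), (s3, -s3, -s3), (-s3, s3, -s3), (-s3, -s3, s3)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
cube = [(sx * s3, sy * s3, sz * s3) for sx, sy, sz in itertools.product((1, -1), repeat=3)]
gr = (1.0 + math.sqrt(5.0)) / 2.0                    # golden ratio
nrm = math.sqrt(1.0 + gr**2)
icosahedron = []
for s1, s2 in itertools.product((1, -1), repeat=2):
    u, v = s1 / nrm, s2 * gr / nrm
    icosahedron += [(0.0, u, v), (u, v, 0.0), (v, 0.0, u)]   # cyclic permutations

for name, pts in [("tetrahedron", tetrahedron), ("octahedron", octahedron),
                  ("cube", cube), ("icosahedron", icosahedron)]:
    print(name, design_strength(pts))
```

The test is a brute-force check and is only meant for small point sets; it confirms the design orders listed for the tetrahedron, octahedron, cube, and icosahedron.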
Exceeding the geometrically regular layouts, there are designs found by optimization to be regular under the mathematical rule to approximate $\int_{\mathbb{S}^2} Y_n^m(\boldsymbol{\theta}) \,\mathrm{d}\boldsymbol{\theta} = \sqrt{4\pi}\, \delta_n$ accurately by $\frac{4\pi}{L} \sum_l Y_n^m(\boldsymbol{\theta}_l)$ for all $n \leq t$ and $|m| \leq n$. A large collection can be found by Hardin and Sloane [32], Gräf

Ambisonic Encoding and Optimal Decoding in 3D
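To encode a signal $s$ into Ambisonic signals $\chi_{nm}$, we multiply the signal by the encoder weights $Y_n^m(\boldsymbol{\theta}_s)$ representing the direction $\boldsymbol{\theta}_s$ of the signal,

$$\chi_{nm} = Y_n^m(\boldsymbol{\theta}_s)\, s, \tag{4.34}$$

or in vector notation using the column vector $\boldsymbol{y}_N(\boldsymbol{\theta}_s) = [Y_0^0(\boldsymbol{\theta}_s), Y_1^{-1}(\boldsymbol{\theta}_s), \dots, Y_N^N(\boldsymbol{\theta}_s)]^{\mathrm{T}}$,

$$\boldsymbol{\chi}_N = \boldsymbol{y}_N(\boldsymbol{\theta}_s)\, s. \tag{4.35}$$

The Ambisonic signals in $\boldsymbol{\chi}_N$ are weighted by side-lobe-suppressing weights $\boldsymbol{a}_N = [a_0, a_1, a_1, a_1, a_2, \dots, a_N]^{\mathrm{T}}$, expressed by the multiplication with a diagonal matrix $\mathrm{diag}\{\boldsymbol{a}_N\}$, and then decoded to the $L$ loudspeaker signals $\boldsymbol{x}$ by a sampling decoder $\boldsymbol{D} = \frac{4\pi}{L} \boldsymbol{Y}_N^{\mathrm{T}}$, with $\boldsymbol{Y}_N = [\boldsymbol{y}_N(\boldsymbol{\theta}_1), \dots, \boldsymbol{y}_N(\boldsymbol{\theta}_L)]$ sampled at the loudspeaker directions,

$$\boldsymbol{x} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{\chi}_N. \tag{4.36}$$

In total, the system for encoding Eq. (4.35) and decoding Eq. (4.36) can also be written to yield loudspeaker gains for one signal,

$$\boldsymbol{g} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_s), \tag{4.38}$$

or in particular for the 3D sampling decoding $\boldsymbol{g} = \frac{4\pi}{L} \boldsymbol{Y}_N^{\mathrm{T}}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_s)$.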
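A first-order instance of the 3D encoding and sampling decoding can be written down without any special-function library. The sketch below makes several assumptions that are not taken from the text: only orders 0 and 1 are implemented, the real spherical harmonics are expressed through Cartesian unit-vector components with N3D-orthonormal scaling in the order $(n, m) = (0,0), (1,-1), (1,0), (1,1)$, and the loudspeakers sit on the corners of an octahedron. Function names are hypothetical.

```python
import numpy as np

def sh_first_order(u):
    """Orthonormal real spherical harmonics up to order 1 for unit vectors u, shape (4, K)."""
    x, y, z = np.atleast_2d(np.asarray(u, dtype=float)).T
    return np.vstack([np.ones_like(x), np.sqrt(3.0) * y,
                      np.sqrt(3.0) * z, np.sqrt(3.0) * x]) / np.sqrt(4.0 * np.pi)

octahedron = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                       [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
c = np.cos(np.radians(137.9 / (1 + 1.51)))        # max-r_E weight a_1 = P_1(c) = c for N = 1
a = np.array([1.0, c, c, c])                      # a_N = [a_0, a_1, a_1, a_1]
Y = sh_first_order(octahedron)                    # harmonics sampled at the loudspeakers

def sad_gains_3d(theta_s):
    """Gains g = (4*pi/L) * Y^T diag(a) y(theta_s) of the sampling decoder."""
    y_s = sh_first_order(theta_s)[:, 0]
    return 4.0 * np.pi / Y.shape[1] * Y.T @ (a * y_s)

rng = np.random.default_rng(1)
for _ in range(3):
    theta_s = rng.standard_normal(3)
    theta_s /= np.linalg.norm(theta_s)            # random panning direction
    g = sad_gains_3d(theta_s)
    E = np.sum(g**2)
    r_E = (g**2 @ octahedron) / E
    print(f"E={E:.4f}, |r_E|={np.linalg.norm(r_E):.4f}, alignment={r_E @ theta_s / np.linalg.norm(r_E):.4f}")
```

Since the octahedron is a 3-design, the loudness measure and the length of $\boldsymbol{r}_E$ do not depend on the panning direction in this first-order case, and $\boldsymbol{r}_E$ points exactly to $\boldsymbol{\theta}_s$.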

Ambisonic Decoding to Loudspeakers
Ambisonic decoding to loudspeakers has been dealt with by numerous researchers: in the past, particularly because results are not very stable for first-order Ambisonics, and later because they strongly depend on how uniform the loudspeaker layout is for higher-order Ambisonics. Moreover, Solvang found that even the use of too many loudspeakers has a degrading effect [35]. For first-order decoding, the Vienna decoders by Michael Gerzon [36] are often cited, and for higher-order Ambisonic decoding, one can find, e.g., works by Daniel with max-$r_E$ [37] and pseudo-inverse decoding [10], and also by Poletti [14,38,39].
What turned out to be the most practical solution is the All-Round Ambisonic Decoding approach (AllRAD), due to its feature of allowing imaginary loudspeaker insertion and downmix as described in the sections above, cf. [40]. Moreover, it does not have restrictions on the Ambisonics order, which for other decoders often yields poor controllability of panning-dependent fluctuations in loudness and directional mapping errors.
The playable set of directions $\boldsymbol{\theta}_l$ or $\varphi_l$ is usually finite and discrete, and it is represented by the surrounding loudspeakers' directions. The directional distribution of the surrounding loudspeakers is typically not a t-design with $t \geq 2N + 1$ in general, and in 2D it is sometimes not even a regular polygon with $L \geq 2N + 2$ loudspeakers. In such cases, it is extremely helpful to be aware of the properties of the various decoder design methods.

Sampling Ambisonic Decoder (SAD)
The sampling decoder as introduced above is the simplest decoding method. For two ($\mathrm{D} = 2$) and three ($\mathrm{D} = 3$) dimensions, it uses the matrix $\boldsymbol{Y}_N = [\boldsymbol{y}_N(\boldsymbol{\theta}_1), \dots, \boldsymbol{y}_N(\boldsymbol{\theta}_L)]$ containing the respective circular or spherical harmonics $\boldsymbol{y}_N(\boldsymbol{\theta})$ sampled at the loudspeaker directions $\{\boldsymbol{\theta}_l\}$,

$$\boldsymbol{D} = \frac{S_{\mathrm{D}-1}}{L}\, \boldsymbol{Y}_N^{\mathrm{T}},$$

with the circumference of the unit circle denoted as $S_1 = 2\pi$ or the surface of the unit sphere written as $S_2 = 4\pi$. The factor $\frac{S_{\mathrm{D}-1}}{L}$ expresses that each loudspeaker synthesizes a fraction of the $E$ measure on the circle or sphere of the surrounding directions. However, the sampling decoder would yield neither perfectly constant loudness and width measures, $E$ and $\|\boldsymbol{r}_E\|$, nor a correct aiming of the localization measure $\boldsymbol{r}_E$ if the loudspeaker layout was not optimal. For instance, concerning loudness, for panning towards directional regions of poor loudspeaker coverage, sampling misses out the main lobe of the panning function, yielding a noticeably reduced loudness.

Mode Matching Decoder (MAD)
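The mode-matching method is used in [10,39] and yields a fundamentally different decoder design. Its concept is to re-encode the gain vector $\boldsymbol{g}$ of the loudspeakers for any panning direction $\boldsymbol{\theta}_s$ by the encoding matrix $\boldsymbol{Y}_N = [\boldsymbol{y}_N(\boldsymbol{\theta}_1), \dots, \boldsymbol{y}_N(\boldsymbol{\theta}_L)]$ for all loudspeaker directions $\{\boldsymbol{\theta}_l\}$. Ideally, the re-encoded result should match the encoding of the panning direction with side lobes suppressed,

$$\boldsymbol{Y}_N\, \boldsymbol{g} = \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_s).$$

Using the definition $\boldsymbol{g} = \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_s)$ of the panning gains, we obtain

$$\boldsymbol{Y}_N\, \boldsymbol{D} = \boldsymbol{I} \quad\Rightarrow\quad \boldsymbol{D} = \boldsymbol{Y}_N^{\mathrm{T}} (\boldsymbol{Y}_N \boldsymbol{Y}_N^{\mathrm{T}})^{-1}.$$

For the inverse of $\boldsymbol{Y}_N \boldsymbol{Y}_N^{\mathrm{T}}$ to exist, it is necessary to have at least as many loudspeakers as harmonics, i.e. $L \geq (N + 1)^2$ with $\mathrm{D} = 3$ or $L \geq 2N + 1$ for $\mathrm{D} = 2$. However, this is not a sufficient criterion yet: In directions poorly covered with loudspeakers, the inversion will boost the loudness, so that the result is often numerically ill-conditioned for $(\boldsymbol{Y}_N \boldsymbol{Y}_N^{\mathrm{T}})^{-1}$ unless the loudspeaker layout is at least uniformly designed. Mode-matching decoding is ill-conditioned on hemispherical or semicircular loudspeaker layouts. The solution is equivalently described by the more general pseudo-inverse $\boldsymbol{Y}_N^{\dagger}$, which is a right-inverse for fat matrices.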

Energy Preservation on Optimal Layouts
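For instance, for an order of N = 2, 2D Ambisonics should work optimally with a ring of 45°-spaced loudspeakers on the horizon, a circular $(2N + 1)$-design, or for 3D, a spherical $(2N + 1)$-design. On a t-design selected by $t \geq 2N$, the loudness measure $E$ of the sampling decoder is panning-invariant, in general,

$$E = \|\boldsymbol{g}\|^2 = \boldsymbol{y}_N^{\mathrm{T}}(\boldsymbol{\theta}_s)\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{D}^{\mathrm{T}} \boldsymbol{D}\, \mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_s) = \frac{S_{\mathrm{D}-1}}{L}\, \|\mathrm{diag}\{\boldsymbol{a}_N\}\, \boldsymbol{y}_N(\boldsymbol{\theta}_s)\|^2 = \text{const}.$$

This is because a $t \geq 2N$-design discretization preserves orthonormality, $\frac{S_{\mathrm{D}-1}}{L}\, \boldsymbol{Y}_N \boldsymbol{Y}_N^{\mathrm{T}} = \int \boldsymbol{y}_N(\boldsymbol{\theta})\, \boldsymbol{y}_N^{\mathrm{T}}(\boldsymbol{\theta}) \,\mathrm{d}\boldsymbol{\theta} = \boldsymbol{I}$.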

Loudness Deficiencies on Sub-optimal Layouts
For 2D layouts, Fig. 4.15 shows what happens if a decoder is calculated for a t ≥ 2N + 1-design with one loudspeaker removed: While, for panning across the gap, the sampling Ambisonic decoder (SAD) yields a quieter signal, moderate localization errors and width fluctuation, the mode-matching decoder (MAD) yields a strong loudness increase and severe jumps in the localization/width. MAD is therefore not very practical with sub-optimal layouts, SAD only slightly more so.
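The opposite loudness behavior of the two decoders across a gap can be reproduced in a small 2D experiment. The setup is an assumption of this sketch and not the configuration of Fig. 4.15: second order, a ring of six loudspeakers with the one at 0° removed, max-$r_E$ weights, the sampling decoder scaled by $2\pi/L$, and the mode-matching decoder computed as the pseudo-inverse of the harmonics matrix.

```python
import numpy as np

def circular_harmonics(phi, N):
    """Real orthonormal circular harmonics for m = -N..N, shape (2N+1, number of angles)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = []
    for m in range(-N, N + 1):
        if m < 0:
            rows.append(np.sqrt(2.0) * np.sin(-m * phi))
        elif m == 0:
            rows.append(np.ones_like(phi))
        else:
            rows.append(np.sqrt(2.0) * np.cos(m * phi))
    return np.array(rows) / np.sqrt(2.0 * np.pi)

N = 2
a = np.cos(np.pi * np.abs(np.arange(-N, N + 1)) / (2.0 * (N + 1)))   # max-r_E weights
phi_l = (2.0 * np.pi * np.arange(6) / 6)[1:]      # 6-ring with the loudspeaker at 0 deg removed
Y = circular_harmonics(phi_l, N)                  # 5 harmonics x 5 loudspeakers
D_sad = 2.0 * np.pi / len(phi_l) * Y.T            # sampling decoder
D_mad = np.linalg.pinv(Y)                         # mode-matching decoder

def loudness(D, phi_s):
    """Loudness measure E = sum of squared gains for each panning angle in phi_s."""
    gains = D @ (a[:, None] * circular_harmonics(phi_s, N))
    return np.sum(gains**2, axis=0)

phi_s = np.radians([0.0, 180.0])                  # into the gap, and opposite to it
E_sad = loudness(D_sad, phi_s)
E_mad = loudness(D_mad, phi_s)
print(f"SAD: gap vs. opposite {10 * np.log10(E_sad[0] / E_sad[1]):+.1f} dB")
print(f"MAD: gap vs. opposite {10 * np.log10(E_mad[0] / E_mad[1]):+.1f} dB")
```

In this toy layout the sampling decoder gets quieter by several decibels when panning into the gap, while the mode-matching decoder gets louder by a similar amount.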

Energy-Preserving Ambisonic Decoder (EPAD)
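To establish panning-invariant loudness for decoding to non-uniform surround loudspeaker layouts, one can ensure a constant loudness measure $E$ by enforcing $\mathbf{D}^T\mathbf{D} = \mathbf{I}$, which is otherwise only achieved on $t \ge 2N$-designs. We may search for a decoding matrix $\mathbf{D}$ whose entries are closest to the sampling decoder under the constraint to be column-orthogonal:

$$\|\mathbf{D} - \mathbf{Y}_N^T\|_{\mathrm{Fro}}^2 \to \min \quad \text{subject to } \mathbf{D}^T\mathbf{D} = \mathbf{I}. \qquad (4.42)$$

The singular value decomposition $\mathbf{Y}_N^T = \mathbf{U}\,\begin{bmatrix}\mathrm{diag}\{\mathbf{s}\}\\ \mathbf{0}\end{bmatrix}\mathbf{V}^T$ can be used to create

$$\mathbf{D} = \mathbf{U}\,\begin{bmatrix}\mathbf{I}\\ \mathbf{0}\end{bmatrix}\mathbf{V}^T.$$

Note that if the loudspeaker directions already form a $t \ge 2N$-design, the sampling, mode-matching, and energy-preserving decoders are equivalent.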
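The SVD construction can be sketched in a few lines. This is a hedged 2D illustration with assumed helper names and an assumed $\sqrt{2}$ normalization of the circular harmonics; it compares the loudness measure of a sampling decoder and of the energy-preserving decoder on a 45° ring with one loudspeaker removed, without sidelobe weighting.

```python
import numpy as np

def circular_harmonics(order, phi):
    """Real 2D (circular) harmonics, shape (2*order+1, len(phi)); normalization is an assumption."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.ones_like(phi)]
    for m in range(1, order + 1):
        rows += [np.sqrt(2) * np.cos(m * phi), np.sqrt(2) * np.sin(m * phi)]
    return np.vstack(rows)

def epad(Y):
    """Column-orthogonal matrix closest to Y^T in the Frobenius sense (via thin SVD)."""
    U, _, Vt = np.linalg.svd(Y.T, full_matrices=False)   # Y^T = U diag(s) V^T
    return U @ Vt                                        # singular values replaced by ones

order = 3
phis = np.deg2rad(np.delete(np.arange(0, 360, 45), 6))   # 7 loudspeakers, gap at 270 deg
Y = circular_harmonics(order, phis)
D_sad = Y.T / np.sqrt(Y.shape[1])                         # sampling decoder, scaled for comparison
D_epad = epad(Y)

pans = np.deg2rad(np.arange(0, 360, 5))
Y_pan = circular_harmonics(order, pans)                   # encoded panning directions
E_sad = np.sum((D_sad @ Y_pan) ** 2, axis=0)
E_epad = np.sum((D_epad @ Y_pan) ** 2, axis=0)
print("SAD  loudness range in dB:", 10 * np.log10(E_sad.max() / E_sad.min()))
print("EPAD loudness range in dB:", 10 * np.log10(E_epad.max() / E_epad.min()))
```

The thin SVD with `full_matrices=False` yields the same product as $\mathbf{U}\,[\mathbf{I};\mathbf{0}]\,\mathbf{V}^T$ with the full matrix $\mathbf{U}$, because the zero block discards the additional columns.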

All-Round Ambisonic Decoding (AllRAD)
In Chap. 3 on vector-base amplitude panning methods, a well-balanced panning result in terms of loudness, width, and localization was achieved by MDAP, which distributes a signal to an arrangement of several superimposed VBAP virtual sources. Hereby $E = \mathrm{const.}$, $\mathbf{r}_E \approx \|\mathbf{r}_E\|\,\boldsymbol{\theta}_s$, and $\|\mathbf{r}_E\| \approx \mathrm{const.}$ This works for nearly any loudspeaker layout.
While, to calculate loudspeaker gains, MDAP superimposes an arrangement of discrete virtual sources within a range of ±α around the panning direction θ s , one could also think of superimposing a quasi-continuous distribution of virtual sources that are weighted by a continuous panning function g(θ).
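The ideal continuous panning function $g(\boldsymbol{\theta})$ of axisymmetric directional spread around the panning direction $\boldsymbol{\theta}_s$ is described by $g(\boldsymbol{\theta}) = \mathbf{y}_N^T(\boldsymbol{\theta})\,\mathrm{diag}\{\mathbf{a}_N\}\,\mathbf{y}_N(\boldsymbol{\theta}_s)$, the Ambisonic panning function. This rotation-invariant continuous function is optimal in terms of loudness, width, and localization measures, which are all evaluated by continuous integrals: $E = \int g^2(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \mathrm{const.}$ expresses panning-invariant loudness, and $\mathbf{r}_E = \frac{1}{E}\int g^2(\boldsymbol{\theta})\,\boldsymbol{\theta}\,\mathrm{d}\boldsymbol{\theta} = \|\mathbf{r}_E\|\,\boldsymbol{\theta}_s$ indicates a perfect alignment of $\mathbf{r}_E$ with the panning direction $\boldsymbol{\theta}_s$ and a panning-invariant width $\|\mathbf{r}_E\| = \mathrm{const.}$ However, the optimal values of these integrals are only preserved by discretization with optimal $t \ge 2N+1$-design loudspeaker layouts.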

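All-round Ambisonic decoding (AllRAD) is preceded by the work of Batke and Keiler [16]. They describe Ambisonic panning $\mathbf{g}_{\mathrm{AllRAD}}(\boldsymbol{\theta}) = \mathbf{D}\,\mathbf{y}_N(\boldsymbol{\theta})$ by a decoder $\mathbf{D}$ whose result matches best with VBAP $\mathbf{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})$. Without max-$\mathbf{r}_E$ weights yet, we use this here to define AllRAD by a minimum-mean-square-error problem formulated as an integral over all panning directions $\boldsymbol{\theta}$,

$$\min_{\mathbf{D}} \int_{\mathbb{S}^2} \|\mathbf{g}_{\mathrm{VBAP}}(\boldsymbol{\theta}) - \mathbf{D}\,\mathbf{y}_N(\boldsymbol{\theta})\|^2\,\mathrm{d}\boldsymbol{\theta} \;\Rightarrow\; \mathbf{D} = \int_{\mathbb{S}^2} \mathbf{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})\,\mathbf{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta},$$

where the solution follows from the orthonormality of the spherical harmonics.

Equivalently, as described by Zotter and Frank [40], who coined the name, we may define AllRAD as VBAP synthesis on the physical loudspeakers that uses as multiple-virtual-source inputs the Ambisonic panning function $g_{\mathrm{AMBI}}(\boldsymbol{\theta}) = \mathbf{y}_N^T(\boldsymbol{\theta})\,\mathrm{diag}\{\mathbf{a}_N\}\,\mathbf{y}_N(\boldsymbol{\theta}_s)$ sampled at an optimal layout of virtual loudspeakers. Here, we write the synthesis as the integral over infinitely many virtual loudspeakers $\boldsymbol{\theta}$,

$$\mathbf{g} = \int_{\mathbb{S}^2} \mathbf{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})\,\mathbf{y}_N^T(\boldsymbol{\theta})\,\mathrm{diag}\{\mathbf{a}_N\}\,\mathbf{y}_N(\boldsymbol{\theta}_s)\,\mathrm{d}\boldsymbol{\theta} = \underbrace{\int_{\mathbb{S}^2} \mathbf{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})\,\mathbf{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta}}_{\mathbf{D}}\;\mathrm{diag}\{\mathbf{a}_N\}\,\mathbf{y}_N(\boldsymbol{\theta}_s).$$

We can obviously pull the term $\mathrm{diag}\{\mathbf{a}_N\}\,\mathbf{y}_N(\boldsymbol{\theta}_s)$ out of the integral. The remaining integral defines the AllRAD matrix $\mathbf{D}$. We may interpret it as a transformation of the VBAP loudspeaker gain functions $\mathbf{g}_{\mathrm{VBAP}}(\boldsymbol{\theta})$ into spherical harmonic coefficients. In the original paper [40], AllRAD is evaluated by an optimal layout of discrete virtual loudspeakers using the $\hat{L}$ directions $\{\hat{\boldsymbol{\theta}}_l\}$ of a $t$-design,

$$\mathbf{D} = \frac{4\pi}{\hat{L}} \sum_{l=1}^{\hat{L}} \mathbf{g}_{\mathrm{VBAP}}(\hat{\boldsymbol{\theta}}_l)\,\mathbf{y}_N^T(\hat{\boldsymbol{\theta}}_l).$$

As VBAP's gain functions aren't smooth (their derivatives are discontinuous), they are order-unlimited, and a $t$-design of sufficiently high $t$ should be used. In 3D practice, the 5200-point Chebyshev-type design from [33] is dense enough. Note that the VBAP part permits improvements by insertion and downmix of imaginary loudspeakers to adapt to asymmetric or hemispherical layouts, as suggested in the original paper [40], cf. Sect. 3.3.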
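To make the construction of the AllRAD matrix concrete, here is a minimal 2D sketch under stated assumptions: pairwise panning on a ring stands in for triangle-based VBAP, a dense uniform ring of 360 virtual loudspeakers stands in for the $t$-design, the $\sqrt{2}$-normalized circular harmonics and the max-$\mathbf{r}_E$ weights $a_m = \cos\frac{m\pi}{2N+2}$ are the usual 2D choices, and all function names are hypothetical.

```python
import numpy as np

def circular_harmonics(order, phi):
    """Real 2D (circular) harmonics, shape (2*order+1, len(phi)); normalization is an assumption."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.ones_like(phi)]
    for m in range(1, order + 1):
        rows += [np.sqrt(2) * np.cos(m * phi), np.sqrt(2) * np.sin(m * phi)]
    return np.vstack(rows)

def vbap_2d(spk_az, src_az):
    """Pairwise 2D VBAP gains (unit energy) for one source azimuth in radians.
    Assumes every gap between adjacent loudspeakers is below 180 degrees."""
    idx = np.argsort(spk_az)
    src = np.array([np.cos(src_az), np.sin(src_az)])
    gains = np.zeros(len(spk_az))
    for a in range(len(idx)):
        i, j = idx[a], idx[(a + 1) % len(idx)]
        base = np.array([[np.cos(spk_az[i]), np.cos(spk_az[j])],
                         [np.sin(spk_az[i]), np.sin(spk_az[j])]])
        w = np.linalg.solve(base, src)
        if np.all(w >= -1e-9):                 # source lies between this pair
            w = np.clip(w, 0.0, None)
            gains[[i, j]] = w / np.linalg.norm(w)
            break
    return gains

def allrad_2d(spk_az, order, num_virtual=360):
    """D = (1/L_v) * sum_l g_VBAP(phi_l) y_N^T(phi_l) over a dense virtual ring."""
    virt = 2 * np.pi * np.arange(num_virtual) / num_virtual
    G = np.column_stack([vbap_2d(spk_az, p) for p in virt])   # (L, L_v)
    Y = circular_harmonics(order, virt)                       # (2N+1, L_v)
    return G @ Y.T / num_virtual

order = 3
spk_az = np.deg2rad([0.0, 30.0, 110.0, 250.0, 330.0])         # irregular surround ring
D = allrad_2d(spk_az, order)

a = np.cos(np.pi * np.arange(order + 1) / (2 * order + 2))    # 2D max-rE weights per order
a_full = np.concatenate([[a[0]], np.repeat(a[1:], 2)])        # one weight per harmonic
g_rear = D @ (a_full * circular_harmonics(order, np.pi)[:, 0])
print("gains for panning to 180 deg:", np.round(g_rear, 3))
```

Panning to the rear mainly feeds the two loudspeakers at ±110°, as one would expect from VBAP, while the finite order adds a smooth spread. The overall scaling of $\mathbf{D}$ is a free choice in this sketch.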
Note that the decoder needs to be scaled properly. For instance, the norm of the omnidirectional component (first column) could be equalized to one, as it would typically be with a sampling decoder; there are alternative strategies to circumvent the scaling problem [41]. Figure 4.16 shows the improvement achieved with EPAD and AllRAD on an equiangular arrangement that is suboptimal because of the missing loudspeaker at −90°. The decoders either stabilize the loudness perfectly (EPAD) or keep the directional and spread mapping errors small (AllRAD). We notice that for EPAD, with the constraint L ≥ 2N + 1 just fulfilled for N = 3 and L = 7 in the simulation, it would not be possible to remove any further loudspeakers without degradation.

Decoding to Hemispherical 3D Loudspeaker Layouts
In typical loudspeaker playback situations for large audience, a solid floor and no loudspeakers below ear level are considered practical for several reasons. However, this does not permit decoding by sampling with optimal t-design layouts covering all directions. As shown above, EPAD and AllRAD do not require such arrays. And yet, they still require some care when used with hemispherical loudspeaker layouts, see [15,40] for further reading.

EPAD with hemispherical loudspeaker layouts.
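Even for a hemispherical layout, the energy-preserving decoding method requires $L \ge (N+1)^2$ loudspeakers to achieve a perfectly panning-invariant loudness. However, this is counter-intuitive: Why should one need at least as many loudspeakers on a hemisphere as are required for same-order playback on a full sphere? Shouldn't the number be half as many? We can show that while the spherical harmonics are orthonormal on the sphere $\mathbb{S}^2$, i.e. $\int_{\mathbb{S}^2} \mathbf{y}_N(\boldsymbol{\theta})\,\mathbf{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \mathbf{I}$, they aren't orthogonal on the hemisphere $\mathbb{S} \subset \mathbb{S}^2$:

$$\int_{\mathbb{S}} \mathbf{y}_N(\boldsymbol{\theta})\,\mathbf{y}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \mathbf{G} \ne \mathbf{I}.$$

Here, $\mathbf{G}$ is called Gram matrix, and it is evaluated by $\frac{4\pi}{L}\sum_{l:\,\theta_{z,l} \ge 0} \mathbf{y}_N(\boldsymbol{\theta}_l)\,\mathbf{y}_N^T(\boldsymbol{\theta}_l)$ using a high-enough $t$-design. By singular-value decomposition of the positive semi-definite matrix $\mathbf{G} = \mathbf{Q}\,\mathrm{diag}\{\mathbf{s}\}\,\mathbf{Q}^T$, with $\mathbf{Q}^T\mathbf{Q} = \mathbf{Q}\mathbf{Q}^T = \mathbf{I}$, we diagonalize $\mathbf{G}$ and find new basis functions $\tilde{\mathbf{y}}_N(\boldsymbol{\theta})$, the so-called Slepian functions [42], that are orthogonal on $\mathbb{S}$,

$$\tilde{\mathbf{y}}_N(\boldsymbol{\theta}) = \mathbf{Q}^T\,\mathbf{y}_N(\boldsymbol{\theta}), \qquad \int_{\mathbb{S}} \tilde{\mathbf{y}}_N(\boldsymbol{\theta})\,\tilde{\mathbf{y}}_N^T(\boldsymbol{\theta})\,\mathrm{d}\boldsymbol{\theta} = \mathbf{Q}^T\,\mathbf{G}\,\mathbf{Q} = \mathrm{diag}\{\mathbf{s}\}.$$

Typically, the singular values in $\mathbf{s}$ are sorted in descending order, $s_1 \ge s_2 \ge \cdots \ge s_{(N+1)^2}$, so that it is possible to cut out the basis functions of significantly large contribution to the upper hemisphere $\mathbb{S}$ by

$$\tilde{\mathbf{y}}_N(\boldsymbol{\theta}) = [\mathbf{I},\, \mathbf{0}]\;\mathbf{Q}^T\,\mathbf{y}_N(\boldsymbol{\theta}).$$

Typically, the numerical integral is extended to slightly below the horizon, see Table 4.1, so that truncation to the $(N+1)(N+2)/2$ most significant basis functions, see Fig. 4.17, produces a minimum fluctuation in the loudness measure $\tilde{E} = \|\tilde{\mathbf{y}}_N(\boldsymbol{\theta})\|^2$ for panning on the hemisphere. With $\tilde{\mathbf{y}}_N(\boldsymbol{\theta})$, EPAD is calculated in the same way as for the ordinary harmonics.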
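The Gram-matrix argument can be reproduced for first order with a few lines of code. The sketch below is illustrative only: it uses a midpoint-rule grid over the exact upper hemisphere as a stand-in for a high-order $t$-design (and without the slight extension below the horizon), a hand-written first-order harmonic basis, and hypothetical function names.

```python
import numpy as np

def sh_first_order(zen, azi):
    """Orthonormal real spherical harmonics up to N = 1, shape (4, number of points)."""
    x = np.sin(zen) * np.cos(azi)
    y = np.sin(zen) * np.sin(azi)
    z = np.cos(zen)
    basis = np.vstack([np.ones_like(z), np.sqrt(3) * y, np.sqrt(3) * z, np.sqrt(3) * x])
    return basis / np.sqrt(4 * np.pi)

# Midpoint-rule quadrature over the upper hemisphere (assumed stand-in for a t-design).
n_zen, n_azi = 400, 16
zen = (np.arange(n_zen) + 0.5) * (np.pi / 2) / n_zen
azi = 2 * np.pi * np.arange(n_azi) / n_azi
Z, A = np.meshgrid(zen, azi, indexing="ij")
weights = (np.sin(Z) * (np.pi / 2 / n_zen) * (2 * np.pi / n_azi)).ravel()

Y = sh_first_order(Z.ravel(), A.ravel())     # (4, points)
G = (Y * weights) @ Y.T                      # Gram matrix on the hemisphere

s, Q = np.linalg.eigh(G)                     # G symmetric: G = Q diag(s) Q^T
descending = np.argsort(s)[::-1]
s, Q = s[descending], Q[:, descending]

N = 1
n_keep = (N + 1) * (N + 2) // 2              # number of Slepian functions to keep

def slepian(zen_pts, azi_pts):
    """Truncated Slepian basis [I, 0] Q^T y_N at the given directions."""
    return Q[:, :n_keep].T @ sh_first_order(zen_pts, azi_pts)

print("eigenvalues of G:", np.round(s, 4))
```

One eigenvalue is close to one, two equal one half, and one is close to zero; the last one belongs to the function that mostly lives on the lower hemisphere and is discarded by the truncation.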

AllRAD with hemispherical loudspeaker layouts.
Because of the vector-base amplitude panning involved, all-round Ambisonic decoding (AllRAD) is comparatively robust to irregular loudspeaker setups. Still, a hemispherical layout does not contain any loudspeaker direction vector pointing to the lower half space, therefore one could just omit information of the lower half space. However, the Ambisonic panning function implies a directional spread, so that panning to exactly the horizon also produces content below, whose omission causes: (i) a loss in loudness, (ii) a slight elevation of the perceived direction, cf. Fig. 4.18.
As discussed in the section on triangulation Sect. 3.3, the insertion of imaginary loudspeakers fixes this behavior. In the case of hemispherical loudspeaker layouts, it is not necessary to downmix the signal of the imaginary loudspeaker at nadir to stabilize both loudness and localization for panning to the horizon.
Signal contributions below but close to the horizon largely contribute to the horizontal loudspeakers, and it is therefore safe to dispose of the signal that would feed the imaginary loudspeaker at nadir without loss of loudness. Moreover, this contribution from below also reinforces signals on the horizontal loudspeakers so that localization is pulled back down. Both can be observed in Fig. 4.18, which shows the loudness measure $E$ as well as mislocalization and width by the measure $\mathbf{r}_E$ using max-$\mathbf{r}_E$-weighted AllRAD with 5th-order Ambisonics along a vertical panning circle on the IEM mobile Ambisonics Array (mAmbA). It consists of 25 loudspeakers set up in rings of 8, 8, 4, 4, and 1 loudspeakers at 0, 20, 40, 60, and 90 degrees elevation. Rings two and four start at 0 degrees azimuth, the others are half-way rotated.

Fig. 4.18 Perceptual measures for 5th-order max-$\mathbf{r}_E$-weighted AllRAD on the IEM mAmbA layout [43] with (black) and without insertion of the bottom imaginary loudspeaker (black dotted) whose signal is disposed, and max-$\mathbf{r}_E$-weighted EPAD (gray), for panning on a vertical circle: $E$ in dB (top), orientation error of $\mathbf{r}_E$ in degrees (middle), and width expressed by $\arccos\|\mathbf{r}_E\|$ in degrees (bottom). The thin dashed line shows AllRAD without imaginary loudspeakers

While (top in Fig. 4.18) AllRAD produces a loudness fluctuation roughly spanning 1 dB for panning on the hemisphere, EPAD only exhibits 0.3 dB, as specified in Table 4.1. While in monophonic playback of noise, loudness differences of less than 0.5 dB can be heard, it is safe to assume that a weak directional loudness fluctuation of less than 1 dB is normally inaudible. In this regard, loudness fluctuation should be no problem with both EPAD and AllRAD.

Performance comparison on hemispherical layouts.
Concerning the directional mapping, EPAD produces a more strongly pronounced ripple, with $\mathbf{r}_E$ indicating that sounds on the horizon ϑ_s = ±90° are pulled upwards towards 0° more with EPAD (7°) than with AllRAD (3°). In terms of width, both EPAD and AllRAD exhibit the ≈20° average associated with max-$\mathbf{r}_E$ weighting. However, EPAD also produces a greater fluctuation, and it widens up to about 30° for panning to the horizon ϑ_s = ±90°.
With the 9 loudspeakers of the ITU [44] 4 + 5 + 0 layout (horizontal ring: ϕ = 0°, ±30°, ±120°, upper ring at 40° elevation with ϕ = ±30°, ±120°), it is not possible anymore to use EPAD with 5th order, which would be the optimal resolution for the front loudspeaker triplet. EPAD only supports orders up to N = 2, and to lose level towards below-horizon directions, we can use the reduced set of 6 Slepian functions; alternatively, all 9 spherical harmonics of N = 2 would also be conceivable. For AllRAD, imaginary loudspeakers are inserted at the sides at azimuth/elevation ±75°/27°, up at 0°/78°, at the back at 180°/35°, and below at 0°/−90°. It is reasonable to downmix the imaginary loudspeakers with a factor of one for up, sides, and back, and to re-normalize the VBAP gain matrix, while disposing of the signal of the imaginary loudspeaker below. AllRAD permits using the order N = 5, which resolves the frontal loudspeaker triplet much better for horizontal panning. Figure 4.19 shows the result of max-$\mathbf{r}_E$-weighted 2nd-order EPAD and 5th-order AllRAD for the 4 + 5 + 0 layout using a vertical panning curve. While the perfectly constant loudness measure of EPAD might be favored over the almost +3 dB loudness increase at front and back for AllRAD, AllRAD's lower directional error, narrower width mapping, greater flexibility, and simplicity have often proven to be clearly superior in practice.

Practical Studio/Sound Reinforcement Application Examples
This section analyzes the application of 3D Ambisonic amplitude panning, consisting of encoding and AllRAD, to studio applications (with typical setups of 2 m radius) and sound reinforcement applications (for an audience of, e.g., 250 people). Application scenarios are sketched in [43], and various other examples are given below. Requirements of a constant loudness and width are analyzed below, and as sound reinforcement requires a particularly large sweet area, the $\mathbf{r}_E$ vector model for off-center listening positions from Sect. 2.2.9 is used to depict the sweet area size. The analysis of decoders above described the perceptual measures for panning on a circle. To observe them for panning across all directions, the world-map-like gray-scale mappings of the loudness and width measures in Figs. 4.20 and 4.22 are more suitable. For several loudspeaker layouts, they show azimuth on the horizontal axis and zenith on the vertical axis, with the loudness measure $E$ in dB (left column) and the width measure $\arccos\|\mathbf{r}_E\|$ in degrees (right column). As 5th-order max-$\mathbf{r}_E$-weighted AllRAD typically produces only minor directional mapping errors, they aren't explicitly shown in Figs. [...] In contrast, for instance, the mAmbA layout in Fig. 4.22 only has 8 loudspeakers on the horizon, and signals panned to the widely spaced below-horizon triangles tend to get louder. Moreover, it is easier for loudspeaker systems of many channels, such as IEM CUBE, mAmbA, Lobby, and Ligeti Hall in Fig. 4.22, to yield smooth loudness and width mappings. Still, even with only a few loudspeakers, a slight adjustment of directions in the layout can fix some of the behavior, as with the IEM Production Studio, whose placement of the elevated-layer loudspeakers at ±45° is superior to a ±30° spacing.

Ambisonic Decoding to Headphones
Typically, Ambisonic decoding to headphones can be done similarly as with loudspeakers, except that the loudspeaker signals are rendered to headphones by convolution with the head-related impulse responses (HRIRs) of the corresponding playback directions. Various databases of such HRIRs can be found, e.g., on the website SOFAconventions. 2 This headphone decoding approach is classically using a small set of so-called virtual loudspeakers, as it is found in many places in technical literature, e.g. in the pioneering works of Jean-Marc Jot et al. [9] or Jérôme Daniel [10]. It is relevant in many important other works [18,45,46], the SADIE project, 3 and it is employed in Sect. 1.4.2 on first-order Ambisonics.
Coarse. However, as outlined in some research papers [9,18,46], these approaches have in common that low-order Ambisonic synthesis is problematic. When inserting a dense grid of virtual-loudspeaker HRIRs, the Ambisonic smoothing attenuates high frequencies at frontal and dorsal directions. Alternatively, and this had been the solution for a long time, a coarse grid of virtual-loudspeaker HRIRs does not attenuate high frequencies, but the spatial quality then strongly depends on the particular grid layout or orientation [46]. An early paper by Jot [9] proposed to remove the time delays of the HRIRs before Ambisonic decomposition, and then to re-insert the otherwise missing interaural time delay afterwards for any sound panned in Ambisonics, which unfortunately yields an object-based panning system rather than a scene-based Ambisonic system.
Dense. Some dense-grid approaches propose to keep the HRIR time delays, or if formulated in the frequency domain, the phases of the HRTF (head-related transfer function), and hereby stay in a scene-based Ambisonic format, while correcting spectral deficiencies by diffuse-field or interaural-covariance equalization [18,47]. Finally, the most recent solutions proposed by Jin, Sun, and Epain [17,48] or Zaunschirm, Schörkhuber, and Höldrich [20,21] modify the HRIR time delays/HRTF phases, but only above, e.g., 3 kHz, without any object-based re-insertion afterwards. The omission of high-frequency interaural time-delay/phase information is a reasonable trade-off made in favor of a more important accuracy in spectral magnitude.

What does directional HRIR smoothing do to high frequencies?
The geometrical theory of diffraction [49] suggests that HRIRs must always contain at least the delay to the ear of either the shortest direct path or the shortest indirect path via the surface of the head. For a spherical head model with the radius R = 0.0875 m and the speed of sound c = 343 m/s, the Woodworth-Schlosberg formula [50] is composed of these two path delays, as plotted in Fig. 4.25a, and recognizable from dummy-head measurements⁴ in Fig. 4.25b. If the HRIR is smoothed across an angular range, the time-delay curve gets spread across time as well, see Fig. 4.26. In this way, depending on whether the smoothing uses a continuous or discrete set of directions, one either obtains something like a comb filter or a sinc-shaped frequency response. This smoothing is least disturbing around the direct-ear side, as shown left in Fig. 4.26, and, as the indirect ear also encounters high-frequency shadowing effects, it is most disturbing for frontal and rear sounds at 0° or 180°, as shown right in Fig. 4.26. The corresponding frequency responses are roughly exemplified by what third-order-Ambisonics-equivalent smoothing would do to either 45°-spaced HRIRs in Fig. 4.27a or 15°-spaced ones in Fig. 4.27b. To get an upper frequency limit, it is insightful to work in the frequency domain, where the HRIR is denoted head-related transfer function (HRTF).
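For orientation, the two path delays of the spherical head model can be sketched as follows. The code uses the common textbook form of the Woodworth model, in which the near ear receives the direct path and the far ear the path around the head; whether this matches the exact parametrization plotted in Fig. 4.25a is an assumption, and the function names are hypothetical.

```python
import math

R = 0.0875   # head radius in m
C = 343.0    # speed of sound in m/s

def ear_delay(azimuth, ear="left"):
    """Delay in seconds to one ear, relative to the head center.
    azimuth in radians within [-pi/2, pi/2], positive towards the left."""
    toward = azimuth if ear == "left" else -azimuth
    if toward >= 0.0:
        return -(R / C) * math.sin(toward)   # ear faces the source: direct path
    return (R / C) * (-toward)               # shadowed ear: path along the head surface

def itd(azimuth):
    """Interaural time difference, right-ear delay minus left-ear delay."""
    return ear_delay(azimuth, "right") - ear_delay(azimuth, "left")

for deg in (0, 30, 60, 90):
    print(f"azimuth {deg:2d} deg: ITD = {1e6 * itd(math.radians(deg)):6.1f} microseconds")
```

For lateral sources, the result reproduces the familiar $\frac{R}{c}(\varphi + \sin\varphi)$ interaural time difference, and around the frontal direction both ear delays change approximately linearly with the angle.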
A simplified linearized-phase version around $\varphi = 0$ uses $\tau \approx \frac{R}{c}\,\varphi$, and the resulting Fourier transform with $\omega = 2\pi f$ is

$$H(\omega, \varphi) \approx e^{-\mathrm{i}\,\omega\frac{R}{c}\,\varphi}.$$

To represent it by a circular or spherical harmonics transformation limited to the order $N$, the maximum phase change represented by the harmonic $e^{\mathrm{i}N\varphi}$ implies that we can only resolve the phase up to $\omega\frac{R}{c} \le N$, hence the range of accurate operation is limited in frequency,

$$f \le \frac{N\,c}{2\pi R}.$$

As the high-frequency HRTF phase evolves more rapidly over the angle than what the finite order can represent, this typically yields an attenuation of the high frequencies when obtaining circular/spherical harmonics coefficients by the transformation integral. Directional smoothing of the discrete directional HRTFs causes relevant spectral problems, regardless of whether the smoothing is done by Ambisonics, VBAP, or MDAP. Mainly the geometric delay in the HRIRs is responsible for the emerging comb-filter or low-pass behavior. One could pull out the linear phase trend above the frequency limit and re-insert it, but is re-insertion necessary?
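Evaluating this limit for the head radius and speed of sound quoted above gives a feeling for the numbers. The short sketch below only evaluates $f \le N c / (2\pi R)$ as implied by the condition $\omega R / c \le N$; the function name is hypothetical.

```python
import math

def phase_accuracy_limit(order, radius=0.0875, speed_of_sound=343.0):
    """Upper frequency in Hz up to which order-N harmonics can follow the linear HRTF phase."""
    return order * speed_of_sound / (2.0 * math.pi * radius)

for order in range(1, 6):
    print(f"N = {order}: f <= {phase_accuracy_limit(order):7.1f} Hz")
```

Even fifth order only reaches a few kilohertz, which is why the decoders discussed next treat the phase differently above a cutoff frequency.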

High-Frequency Time-Aligned Binaural Decoding (TAC)
As a prerequisite for their binaural Ambisonic decoders, Schörkhuber et al. [21] tested above which frequency the removal of the HRTF linear phase trend remains inaudible in direct HRTF-based rendering without panning or smoothing. In fact, most of their listeners could not distinguish the absence of the linear phase trend when it was removed above 3 kHz for various sound examples (drums, speech, pink noise, rendered at the directions 10°, −45°, 80°, −130°). They had their subjects compare the result to a reference with unaltered HRTFs, and the result is analyzed in Fig. 4.28. By this finding, it is possible to split up each of the 2 × 1 HRIRs $\mathbf{h}(t, \boldsymbol{\theta})$ into an unaltered low-pass band and a time-aligned high-pass band from which the delay $\tau$ is removed, which yields a modified HRIR $\hat{\mathbf{h}}(t, \boldsymbol{\theta})$ with a unified high-frequency delay. The time-delay model $\tau(\varphi)$ uses the angle to the left/right ear on the positive/negative y axis, so $\arccos(\pm\theta_y)$, but shifted by 90°, hence $\varphi = \pm\arcsin\theta_y$. This removal allows using all available HRIRs of dense measurement sets for binaural synthesis of high accuracy, using a suitable linear Ambisonic decoder such as AllRAD. Assuming the resulting modified left and right HRIRs for all directions are denoted as the 2 × L matrix $\hat{\mathbf{H}}(t) = [\hat{\mathbf{h}}(t, \boldsymbol{\theta}_1), \ldots, \hat{\mathbf{h}}(t, \boldsymbol{\theta}_L)]$, the 2 × (N + 1)² filter set for decoding each of the Ambisonic channels to the ears becomes

$$\hat{\mathbf{H}}_{\mathrm{SH}}(t) = \hat{\mathbf{H}}(t)\,\mathbf{D}.$$

Results achieved by a pseudo-inverse decoding to the hereby time-aligned HRIRs, using R = 8.5 cm and N = 3 with the 2702-direction Cologne HRIRs⁵, are shown in Fig. 4.29. The resulting polar patterns (ta) clearly outperform the linear decomposition (lin) at frequencies above 2 kHz in representing the original HRTFs (max).
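The band-wise delay removal can be illustrated in the frequency domain. The following is a simplified sketch and not the filter design of [21]: it uses a brick-wall band split at the cutoff, an idealized single-impulse HRIR, and a hypothetical function name.

```python
import numpy as np

def time_align_high_band(hrir, fs, tau, f_cut=3000.0):
    """Remove the delay tau (in seconds) from all frequency bins above f_cut."""
    spectrum = np.fft.rfft(hrir)
    freqs = np.fft.rfftfreq(len(hrir), d=1.0 / fs)
    high = freqs > f_cut
    spectrum[high] *= np.exp(2j * np.pi * freqs[high] * tau)   # advance high band by tau
    return np.fft.irfft(spectrum, n=len(hrir))

fs = 48000
n = 256
delay_samples = 20

hrir = np.zeros(n)
hrir[delay_samples] = 1.0                                      # idealized HRIR: pure delay
aligned = time_align_high_band(hrir, fs, delay_samples / fs)

freqs = np.fft.rfftfreq(n, d=1.0 / fs)
high = freqs > 3000.0
phase_high = np.angle(np.fft.rfft(aligned))[high]
print("largest remaining high-band phase in rad:", np.max(np.abs(phase_high)))
```

Below the cutoff the spectrum is untouched, and above it the linear phase of the delay has vanished. A practical implementation would use smooth crossover filters instead of the hard switch between bins.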

Magnitude Least Squares (MagLS)
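As an alternative to high-frequency time-delay disposal, Schörkhuber et al. present an optimum-phase approach [21] that disregards the phase match in favor of an improved magnitude match above cutoff. Formulated exemplarily for the left ear, across every HRTF direction $\boldsymbol{\theta}_l$, and for every discrete frequency $\omega_k$, with $h_{l,k} = h(\boldsymbol{\theta}_l, \omega_k)$, this becomes

$$\min_{\hat{\mathbf{h}}_{\mathrm{SH},k}} \sum_{l=1}^{L} \Big[\,\big|\mathbf{y}_N^T(\boldsymbol{\theta}_l)\,\hat{\mathbf{h}}_{\mathrm{SH},k}\big| - |h_{l,k}|\,\Big]^2. \qquad (4.56)$$

Typically, one would need to solve magnitude least squares or magnitude squares least squares tasks with semidefinite relaxation, see Kassakian [51]. In practice, however, results turn out to be perfect already with an iterative combination of the reconstructed phase $\hat{\phi}_{l,k-1}$ from the previous frequency $\omega_{k-1}$ with the HRTF magnitude $|h_{l,k}|$ of the current frequency $\omega_k$, before a linear decomposition thereof into spherical harmonic coefficients $\hat{\mathbf{h}}_{\mathrm{SH},k}$.
Every frequency below cutoff, $\omega_k < 2\pi f_N$, just uses the linear least-squares spherical harmonics decomposition with the left-inverse of the spherical harmonics $\mathbf{Y}_N^T$ sampled at the HRTF measurement nodes,

$$\hat{\mathbf{h}}_{\mathrm{SH},k} = (\mathbf{Y}_N \mathbf{Y}_N^T)^{-1}\,\mathbf{Y}_N\,\mathbf{h}_k, \qquad \mathbf{h}_k = [h_{l,k}]_{l=1,\ldots,L}.$$

Continuing with the first frequency above or equal to cutoff, $\omega_k \ge 2\pi f_N$, the algorithm proceeds as

$$\hat{\phi}_{l,k-1} = \angle\,\mathbf{y}_N^T(\boldsymbol{\theta}_l)\,\hat{\mathbf{h}}_{\mathrm{SH},k-1}, \qquad \hat{\mathbf{h}}_{\mathrm{SH},k} = (\mathbf{Y}_N \mathbf{Y}_N^T)^{-1}\,\mathbf{Y}_N\,\big[\,|h_{l,k}|\,e^{\mathrm{i}\hat{\phi}_{l,k-1}}\big]_{l=1,\ldots,L},$$

and then moves to the next frequency $k \leftarrow k + 1$. The results are typically transformed back to the time domain to get a real-valued impulse response for every spherical harmonic to the regarded ear.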
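The iteration is compact enough to sketch in code. The example below is a toy illustration under stated assumptions, not a reproduction of the results in Fig. 4.29: it works in 2D with circular harmonics of assumed $\sqrt{2}$ normalization, replaces measured HRTFs by unit-magnitude pure delays of a ring model, and uses hypothetical function names.

```python
import numpy as np

def circular_harmonics(order, phi):
    """Real 2D (circular) harmonics, shape (2*order+1, len(phi)); normalization is an assumption."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.ones_like(phi)]
    for m in range(1, order + 1):
        rows += [np.sqrt(2) * np.cos(m * phi), np.sqrt(2) * np.sin(m * phi)]
    return np.vstack(rows)

def magls(Y, H, k_cut):
    """Iterative magnitude least squares for one ear.
    Y: (n_harm, L) harmonics at the HRTF directions, H: (K, L) complex HRTFs,
    k_cut: index of the first frequency bin at or above cutoff."""
    left_inverse = np.linalg.pinv(Y.T)                     # (n_harm, L)
    H_sh = np.zeros((H.shape[0], Y.shape[0]), dtype=complex)
    for k in range(H.shape[0]):
        if k < k_cut or k == 0:
            target = H[k]                                  # plain least squares below cutoff
        else:
            phase = np.angle(Y.T @ H_sh[k - 1])            # phase reconstructed at previous bin
            target = np.abs(H[k]) * np.exp(1j * phase)     # current magnitude, previous phase
        H_sh[k] = left_inverse @ target
    return H_sh

# Toy data (assumption): pure-delay "HRTFs" on a ring with delay (R/c) sin(phi).
R, c, order = 0.0875, 343.0, 3
phi = 2 * np.pi * np.arange(72) / 72
freqs = np.arange(0.0, 8001.0, 250.0)
H = np.exp(-2j * np.pi * freqs[:, None] * (R / c) * np.sin(phi)[None, :])
Y = circular_harmonics(order, phi)
k_cut = int(np.searchsorted(freqs, 1500.0))

H_ls = H @ np.linalg.pinv(Y.T).T                           # least squares at every bin
H_magls = magls(Y, H, k_cut)

def magnitude_error(H_sh):
    """Norm of the magnitude mismatch per frequency bin."""
    return np.linalg.norm(np.abs(H_sh @ Y) - np.abs(H), axis=1)

err_ls, err_magls = magnitude_error(H_ls), magnitude_error(H_magls)
print("magnitude error at 8 kHz: LS %.2f, MagLS %.2f" % (err_ls[-1], err_magls[-1]))
```

In this toy case the least-squares decomposition loses most of the magnitude at high frequencies, while the iterated phase keeps the magnitude mismatch small. Each step is an alternating projection between the set of prescribed magnitudes and the order-limited subspace.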
The results of the MagLS approach (mls) outperform the time-alignment approach (ta) in the exemplary results shown for N = 3 in Fig. 4.29, in particular at the highest frequencies, where sphere-model-based delay simplification is not sufficiently helpful, anymore.

Diffuse-Field Covariance Constraint
Also for both the above approaches that modify the high-frequency phase, Zaunschirm et al. [20] note that low order rendering degrades envelopment in diffuse fields, so that they introduce an additional covariance constraint as defined by Vilkamo [22]. It can be implemented as a 2 × 2 filter matrix equalizing the resulting frequencydomain diffuse-field covariance matrix to the one of the original HRTF datasets. On the main diagonal, this covariance matrix shows the diffuse-field ear sensitivities (left and right), and off-diagonal it contains the diffuse-field inter-aural cross correlation.
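A 2 × 2 equalizer of this kind can be sketched with Cholesky factors. The construction below is one simple way to match the covariance exactly and is not claimed to be the optimal solution in the sense of [22]; the coefficient matrices are random stand-ins arranged as 2 × (number of coefficients), the covariance is taken as their product with their own conjugate transpose, and the function name is hypothetical.

```python
import numpy as np

def covariance_equalizer(A_ref, A_low):
    """2x2 matrix M so that M (A_low A_low^H) M^H equals A_ref A_ref^H.
    A_ref, A_low: coefficient matrices of shape (2, number of coefficients)."""
    R_ref = A_ref @ A_ref.conj().T
    R_low = A_low @ A_low.conj().T
    L_ref = np.linalg.cholesky(R_ref)        # R_ref = L_ref L_ref^H
    L_low = np.linalg.cholesky(R_low)        # R_low = L_low L_low^H
    return L_ref @ np.linalg.inv(L_low)

# Random stand-ins for one frequency bin (assumption): order 5 reference, order 3 rendering.
rng = np.random.default_rng(0)
A_ref = rng.standard_normal((2, 36)) + 1j * rng.standard_normal((2, 36))
A_low = 0.8 * A_ref[:, :16]

M = covariance_equalizer(A_ref, A_low)
A_eq = M @ A_low                             # equalized low-order coefficients

R_target = A_ref @ A_ref.conj().T
R_eq = A_eq @ A_eq.conj().T
print("largest covariance deviation after equalization:", np.max(np.abs(R_eq - R_target)))
```

The diagonal of the matched covariance restores the diffuse-field sensitivities of both ears, and the off-diagonal entry restores the interaural coherence. In a filter implementation, such a matrix would be computed per frequency bin.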
At every frequency, the 2 × 2 diffuse-field covariance matrix of the original, very-high-order spherical harmonics HRTF dataset $\mathbf{H}_{\mathrm{SH}}^H$ of the dimensions 2 × (M + 1)² with $M \gg N$ is given by

$$\mathbf{R} = \mathbf{H}_{\mathrm{SH}}^H\,\mathbf{H}_{\mathrm{SH}}.$$

The derivation why this inner product of spherical harmonic coefficients represents the diffuse-field covariance is given in Appendix A.5. The low-order, high-frequency-modified HRTF coefficient set $\hat{\mathbf{H}}_{\mathrm{SH}}^H$ of the dimensions 2 × (N + 1)² also has a 2 × 2 covariance matrix $\hat{\mathbf{R}}$ that will differ from the more accurate $\mathbf{R}$, [...]

[...] loudspeaker positions need to be employed. To work in 3 dimensions, programming in Pd would also be similar to the corresponding first-order example of Fig. 1.15, using the matrix object [mtx_spherical_harmonics]. Typically, pre-calculated decoders including AllRAD and max-$\mathbf{r}_E$ weighting are used and loaded by, e.g., [mtx D.mtx] into Pd to keep programming simple.

Ambix Encoder, IEM MultiEncoder, and IEM AllRADecoder
For encoding single- or multi-channel signals into Ambisonics, there are the ambix_encode_o<N> or ambix_encode_i<L>_o<N> VST plugins available from Kronlachner's ambix plugin suite, or the IEM MultiEncoder from the IEM plugin suite. As exemplarily shown in Fig. 4.32, the multi encoder allows encoding channel-based multi-channel audio material, where channel-based [52] typically refers to each channel of the multi-channel material being meant for playback on a separate loudspeaker of clearly defined direction, cf. [44]. Elsewhere, the embedding of virtual playback directions can also be found referred to as beds or virtual panning spots. The IEM AllRADecoder permits manually entering or importing the loudspeaker coordinates and channel indices, with the coordinates specified by the azimuth and elevation angle in degrees, as exemplified for the IEM production studio in Fig. 4.33. The figure also shows that just entering the pure 5 + 7 + 0 layout would produce the error message "Point of origin not within convex hull. Try adding imaginary loudspeakers."
By adding an imaginary loudspeaker below, whose signal is typically omitted, see Fig. 4.34, it becomes geometrically valid to calculate and employ the resulting decoder. However, it is better to also insert an imaginary loudspeaker at the rear whose signal is preserved by specifying the gain value 1, as shown in Fig. 4.35.

Reaper, IEM RoomEncoder, and IEM BinauralDecoder
Particularly relevant for headphone-based listening, the rendering of anechoic sounds will typically not externalize well, as it does not match the mental expectation of ordinary listening environments [53][54][55][56]. To avoid that this causes in-head localization rather than the desired external sound image, one can, e.g., use the IEM RoomEncoder plugin, see Fig. 4.36. It is based on an image-source room model. Combining IEM RoomEncoder and IEM BinauralDecoder with an Ambisonics-encoded single-channel sound (e.g. using ambix_encoder), one can simply try to place the source and receiver together in the symmetry plane of the room, and then slightly shift one of them sideways to hear how externalization improves by a slight asymmetry in the ear signals.
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.