Ambisonics pp 53-98

# Ambisonic Amplitude Panning and Decoding in Higher Orders

## Abstract

Already in the 1970s, the idea of using continuous harmonic functions of scalable resolution was described by Cooper and then Gerzon, who introduced the name Ambisonics. This chapter starts by reviewing properties of first-order horizontal Ambisonics, using an interpretation in terms of panning functions. It then develops the mathematical formulation for 3D higher-order Ambisonics, which aims at improving the directional resolution. Based on this formalism, ideal loudspeaker layouts can be defined for constant loudness, localization, and width, according to the previously introduced models. The chapter discusses how Ambisonics can be decoded to less ideal, typical loudspeaker setups for studios, concerts, and sound-reinforcement systems, as well as to headphones. The behavior is analyzed in a rich variety of listening experiments and for various decoding applications. The chapter concludes with example applications using free software tools.

Cooper [2] used higher-order angular harmonics to formulate circular panning of auditory events. Due to the work of Fellgett [3], Gerzon [4], and Craven [5], the term Ambisonics became common for technology using spherical harmonic functions. Around the early 2000s, higher-order Ambisonic panning and decoding was pioneered most notably by Bamford [6], Malham [7], Poletti [8], Jot [9], and Daniel [10], as well as by Ward and Abhayapala [11], Dickins [12], and, at the authors' lab, Sontacchi [13].

Another leap happened around 2010, when Ambisonic decoding to loudspeakers was substantially improved by regularization methods [14], singular-value decomposition [15], and all-round Ambisonic decoding (AllRAD) [15, 16], a combination of vector-base panning techniques with Ambisonics that yields the most robust and flexible higher-order decoding method known today.

For headphones, after the work of Jot [9], which outlined the basic problems of binaural decoding in the 1990s, Sun, Bernschütz, Ben-Hur, and Brinkmann [17, 18, 19] made important contributions to binaural decoding, and we consider the TAC and MagLS decoders by Zaunschirm and Schörkhuber [20, 21] the essential binaural decoders. To avoid spectral artifacts, they either remove HRTF delays (TAC) or optimize HRTF phases (MagLS) at high frequencies. By interaural covariance correction, MagLS/TAC manage to play back diffuse fields consistently, using the formalism of Vilkamo et al. [22].

## 4.1 Direction Spread in First-Order 2D Ambisonics

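We introduce a parameter \(a\) to make the directional spread on the loudspeaker system adjustable, so that the first-order panning function is either cardioid-shaped (\(a=1\)), 2D-supercardioid-shaped (\(a=\sqrt{2}\)), or 2D-hypercardioid-shaped (\(a=2\)), using, up to an overall gain,

\[g(\varphi )=1+a\cos (\varphi -\varphi _\mathrm {s}).\]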

For first-order 2D Ambisonics, it is theoretically optimal to use at least a ring of 4 loudspeakers with uniform angular spacing and \(a=\sqrt{2}\), which is easily checked with the aid of a computer, cf. Fig. 4.1, and explained below and in Sect. 4.4.
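A minimal Python/NumPy sketch of such a computer check follows. It assumes the first-order panning function \(g(\varphi)=1+a\cos(\varphi-\varphi_\mathrm{s})\) up to an overall gain, with \(\varphi_\mathrm{s}=0\); the helper `r_E_continuous` is a hypothetical name of ours. The continuous \(\varvec{r}_\mathrm{E}\) integrals are replaced by a dense uniform sum, and \(a\) is scanned for the longest \(\varvec{r}_\mathrm{E}\) vector.

```python
import numpy as np


def r_E_continuous(a, n_grid=3600):
    """Length of r_E for g(phi) = 1 + a*cos(phi) with panning direction phi_s = 0.

    A uniform sum over the full circle integrates these low-order
    trigonometric polynomials exactly, so it stands in for the integrals.
    """
    phi = 2 * np.pi * np.arange(n_grid) / n_grid
    g2 = (1 + a * np.cos(phi)) ** 2
    E = g2.sum()                                # common factor 2*pi/n_grid cancels
    r_x = (g2 * np.cos(phi)).sum() / E
    r_y = (g2 * np.sin(phi)).sum() / E          # vanishes by symmetry
    return np.hypot(r_x, r_y)


a_values = np.linspace(0.5, 3.0, 2501)          # step 0.001
r_values = np.array([r_E_continuous(a) for a in a_values])
a_opt = a_values[np.argmax(r_values)]

for name, a in [("cardioid", 1.0), ("supercardioid", np.sqrt(2)), ("hypercardioid", 2.0)]:
    r = r_E_continuous(a)
    print(f"{name:13s} a={a:.3f} |r_E|={r:.4f} spread={np.degrees(np.arccos(r)):.1f} deg")
print(f"a maximizing |r_E|: {a_opt:.3f}")
```

The numbers agree with the closed form \(\Vert\varvec{r}_\mathrm{E}\Vert=2a/(2+a^2)\) that follows from these integrals for this \(g\): the maximum \(1/\sqrt{2}\) (a spread of 45°) is reached at \(a=\sqrt{2}\), while the cardioid and the hypercardioid both yield \(2/3\), i.e. a spread of about 48.2°.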

**Direction spread in FOA**. The panning-function interpretation with its directional spread has some similarity to MDAP, which attempts to spread an amplitude-panned signal directionally. Similar to the discrete virtual spread of MDAP by \({\pm }\alpha =\arccos \Vert \varvec{r}_\mathrm {E}\Vert \) around the panning direction, the virtual direction spread of first-order Ambisonics is described by its continuous panning function \(g(\varphi )\) in Eq. (4.1). To inspect the continuous function by the \(\varvec{r}_\mathrm {E}\) measure defined in Eq. (2.7), we may evaluate an integral over the panning function instead of the sum. Because of the symmetry around \(\varphi _\mathrm {s}\), we may set \(\varphi _\mathrm {s}=0\) for convenience, which knowingly causes \(r_\mathrm {E,y}=0\), and evaluate

\[r_\mathrm {E,x}=\frac{\int _{-\pi }^{\pi }g^2(\varphi )\cos \varphi \,\mathrm {d}\varphi }{\int _{-\pi }^{\pi }g^2(\varphi )\,\mathrm {d}\varphi }.\]

**Ideal loudspeaker layouts**. The virtual, continuous first-order Ambisonic panning function has an ideal directional aiming and a panning-invariant width, and its loudness measure is panning-invariant, too. However, decoding to a physical loudspeaker setup can degrade this ideal behavior. For which loudspeaker layouts are these properties preserved by sampling decoding?

The 2D first-order Ambisonic components (*W*, *X*, *Y*) correspond to the patterns \(\{1,\,\cos \varphi ,\,\sin \varphi \}\), a first-order Fourier series in the angle. If the playback directions are sampled by \(\mathrm {L}=3\) uniformly spaced loudspeakers on the horizon, the sampling theorem for this series is already fulfilled. Accordingly, Parseval's theorem ensures a panning-invariant loudness *E* for any panning direction.

For an ideal \(\varvec{r}_\mathrm {E}\) measure, however, a uniformly spaced horizontal ring requires one more loudspeaker, \(\mathrm {L}\ge 4\). The concept of circular/spherical polynomials and *t*-designs introduced later in this chapter explains this increase exhaustively. Briefly, \(g^2(\varphi )\) is a second-order expression; to represent the ideal constant loudness \(E=\int g^2(\varphi )\,\mathrm {d}\varphi \) of the continuous panning function consistently after discretization, \(E=\frac{2\pi }{\mathrm {L}}\sum _l g_l^2\), it requires \(\mathrm {L}=3\) uniformly spaced loudspeakers, as argued before. By contrast, the expressions \(g^2(\varphi )\cos \varphi \) and \(g^2(\varphi )\sin \varphi \) that appear in \(\varvec{r}_\mathrm {E}\cdot E=\int g^2(\varphi )\,[\cos \varphi ,\,\sin \varphi ]^\mathrm {T}\mathrm {d}\varphi \) are of third order. Consequently, ideal mapping of \(\varvec{r}_\mathrm {E}\) (direction and width) requires at least one more loudspeaker, \(\mathrm {L}=4\), in a uniformly spaced arrangement to make the continuous and the discretized form \(\varvec{r}_\mathrm {E}\,E=\frac{2\pi }{\mathrm {L}}\sum _l g_l^2\,[\cos \upvarphi _l,\,\sin \upvarphi _l]^\mathrm {T}\) perfectly equal.
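The difference between \(\mathrm{L}=3\) and \(\mathrm{L}=4\) can be made visible numerically. The sketch below is our own illustration, again assuming \(g=1+a\cos(\varphi-\varphi_\mathrm{s})\) with \(a=\sqrt{2}\) (`ring_measures` is a hypothetical helper): it samples the panning function on a uniform ring and evaluates the discrete \(E\) and \(\varvec{r}_\mathrm{E}\) for panning directions in 5° steps.

```python
import numpy as np


def ring_measures(L, phi_s, a=np.sqrt(2)):
    """Discrete E and r_E when g = 1 + a*cos(phi - phi_s) is sampled on a uniform ring."""
    phi_l = 2 * np.pi * np.arange(L) / L
    g2 = (1 + a * np.cos(phi_l - phi_s)) ** 2
    E = 2 * np.pi / L * g2.sum()
    r_E = g2 @ np.column_stack([np.cos(phi_l), np.sin(phi_l)]) / g2.sum()
    return E, r_E


pan = np.radians(np.arange(0.0, 360.0, 5.0))
results = {}
for L in (3, 4):
    E, r_E = map(np.array, zip(*[ring_measures(L, p) for p in pan]))
    length = np.linalg.norm(r_E, axis=1)
    # signed angle between r_E and the panning direction, wrapped to (-180, 180]
    aim_err = np.degrees(np.angle(np.exp(1j * (np.arctan2(r_E[:, 1], r_E[:, 0]) - pan))))
    results[L] = (E, length, aim_err)
    print(f"L={L}: E in [{E.min():.4f}, {E.max():.4f}], "
          f"|r_E| in [{length.min():.4f}, {length.max():.4f}], "
          f"max aiming error {np.abs(aim_err).max():.2f} deg")
```

With \(\mathrm{L}=3\), \(E\) should stay at its continuous value \(2\pi(1+a^2/2)\) while the length of \(\varvec{r}_\mathrm{E}\) varies with the panning direction; with \(\mathrm{L}=4\), the length should stay at \(1/\sqrt{2}\) and the aiming error should vanish.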

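**Towards a higher-order panning function**. An \(\mathrm {N{th}}\)-order cardioid pattern is obtained from the cardioid pattern by taking its \(\mathrm {N{th}}\) power, \(g_\mathrm {N}(\varphi )\propto [1+\cos (\varphi -\varphi _\mathrm {s})]^\mathrm {N}\), which narrows the main lobe as \(\mathrm {N}\) increases.

**Rotated panning function**. In first-order Ambisonics, panning functions consist of an omnidirectional part, \(\cos (0\varphi )=1\), and a figure-of-eight aligned with *x*, \(\cos \varphi \), but that was not all: recording and playback also required a figure-of-eight pattern aligned with *y*, \(\sin \varphi \). Only with both dipoles can the panning function be rotated to any direction \(\varphi _\mathrm {s}\), as \(\cos (\varphi -\varphi _\mathrm {s})=\cos \varphi _\mathrm {s}\cos \varphi +\sin \varphi _\mathrm {s}\sin \varphi \).

*Higher-order Ambisonics in 2D (and the associated set of theoretical microphone directivities) is based on the Fourier series in the azimuth angle \(\varphi \).*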

## 4.2 Higher-Order Polynomials and Harmonics

The previous section required that direction and length of the \(\varvec{r}_\mathrm {E}\) vector resulting from amplitude panning on loudspeakers matched the desired auditory event direction and width. Harmonic functions with strict symmetry around a panning direction \(\varvec{\theta }_\mathrm {s}\) will help us in achieving this goal and in defining good sampling.

Regardless of the dimensions, be it in 2D or 3D, we desire to define continuous and resolution-limited axisymmetric functions around the panning direction \(\varvec{\uptheta }_\mathrm {s}\) to fulfill our perceptual goals of a panning-invariant loudness *E*, width \(\Vert \varvec{r}_\mathrm {E}\Vert \), and perfect alignment between panning direction \(\varvec{\uptheta }_\mathrm {s}\) and localized direction \(\varvec{r}_\mathrm {E}\). Then we hope to find suitable directional discretization schemes for ideal loudspeaker layouts, so that the measures *E* and \(\varvec{r}_\mathrm {E}\) are perfectly reconstructed in playback.

The projection of a variable direction vector \(\varvec{\theta }\) onto the panning direction \(\varvec{\uptheta }_\mathrm {s}\) always yields the cosine of the enclosed angle, \(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta }=\cos \phi \), no matter whether in two or three dimensions. Constructing the panning function from this projection therefore readily meets the desired goals. Its \(m\mathrm {th}\) power, \((\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m=\cos ^m\phi \), helps to build an \(\mathrm {N{th}}\)-order power series \(g=\sum _{m=0}^\mathrm {N}a_{m}(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m\) that describes a virtual Ambisonic panning function.

For 2D, such a *circular polynomial* \(g=\sum _{m=0}^\mathrm {N}a_{m}(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m\) contains all \((\mathrm {N}+1)(\mathrm {N}+2)/2\) mixed powers of the direction vectors' entries \(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}=[\uptheta _\mathrm {xs},\,\uptheta _\mathrm {ys}]\) and \(\varvec{\theta }=[\theta _\mathrm {x},\,\theta _\mathrm {y}]^\mathrm {T}\), by \((\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m=(\uptheta _\mathrm {xs}\theta _\mathrm {x}+ \uptheta _\mathrm {ys}\theta _\mathrm {y})^m=\sum _{k=0}^m{m \atopwithdelims ()k}\,(\uptheta _\mathrm {xs}\theta _\mathrm {x})^k\,(\uptheta _\mathrm {ys}\theta _\mathrm {y})^{m-k}\). However, only \(2\mathrm {N}+1\) functions are needed to express \(g=\sum _{m=0}^\mathrm {N}a_{m}(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m=\sum _{m=0}^\mathrm {N}a_{m}\cos ^m\phi \). First, in terms of the relative azimuth \(\phi =\varphi -\varphi _\mathrm {s}\), the polynomial corresponds to a harmonic series of \(\mathrm {N}+1\) cosines, or Chebyshev polynomials, \(g=\sum _{m=0}^\mathrm {N}b_m\,\cos m\phi =\sum _{m=0}^\mathrm {N}b_m\,T_m(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })\). Then, in terms of the absolute azimuth \(\varphi \), the trigonometric addition theorem re-expresses the series as one of \(\mathrm {N}+1\) cosines and \(\mathrm {N}\) sines, with \(T_m(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })=\cos [m(\varphi -\varphi _\mathrm {s})]=\cos m\varphi _\mathrm {s}\cos m\varphi +\sin m\varphi _\mathrm {s}\sin m\varphi \). As shown in the upcoming section, such orthonormal harmonic functions can alternatively be obtained by solving a second-order differential equation that is generally used to define harmonics; this bears the later benefit that the same approach defines spherical harmonics in three space dimensions.
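The change of basis can be tried out numerically. The sketch below is an illustration under our own assumptions: it takes the \(\mathrm{N}\)th-order cardioid \((1+\zeta)^\mathrm{N}\), \(\zeta=\cos(\varphi-\varphi_\mathrm{s})\), as the power series, converts its coefficients \(a_m\) into Chebyshev coefficients \(b_m\) with NumPy's `numpy.polynomial.chebyshev.poly2cheb`, and then applies the addition theorem to obtain \(\mathrm{N}+1\) cosine and \(\mathrm{N}\) sine coefficients in absolute azimuth.

```python
import numpy as np
from math import comb
from numpy.polynomial import chebyshev as C

N = 3
phi_s = np.radians(40.0)                                         # panning direction
a = np.array([comb(N, m) for m in range(N + 1)], dtype=float)    # (1 + zeta)^N = sum a_m zeta^m

# Power series in zeta = cos(phi - phi_s) -> Chebyshev series sum b_m T_m(zeta),
# where T_m(cos(phi - phi_s)) = cos(m (phi - phi_s)).
b = C.poly2cheb(a)

# Addition theorem: cos(m(phi - phi_s)) = cos(m phi_s) cos(m phi) + sin(m phi_s) sin(m phi)
m = np.arange(N + 1)
cos_coef = b * np.cos(m * phi_s)           # N + 1 cosine coefficients, m = 0..N
sin_coef = (b * np.sin(m * phi_s))[1:]     # N sine coefficients, m = 1..N

phi = np.linspace(-np.pi, np.pi, 721)
g_power = np.polyval(a[::-1], np.cos(phi - phi_s))               # polyval wants highest degree first
g_harm = (cos_coef[:, None] * np.cos(np.outer(m, phi))).sum(axis=0) \
    + (sin_coef[:, None] * np.sin(np.outer(m[1:], phi))).sum(axis=0)

print("Chebyshev coefficients b_m:", b)
print("number of functions:", cos_coef.size + sin_coef.size)    # 2N + 1
print("max deviation between both forms:", np.abs(g_power - g_harm).max())
```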

*Spherical polynomials* are similar, \(g=\sum _{n=0}^\mathrm {N}a_n\,(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^n\), involving the expressions \((\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^n=(\uptheta _\mathrm {xs}\theta _\mathrm {x}+ \uptheta _\mathrm {ys}\theta _\mathrm {y}+\uptheta _\mathrm {zs}\theta _\mathrm {z})^n=\sum _{k=0}^n\sum _{l=0}^{n-k}{n \atopwithdelims ()k}{n-k \atopwithdelims ()l}\,(\uptheta _\mathrm {zs}\theta _\mathrm {z})^k(\uptheta _\mathrm {xs}\theta _\mathrm {x})^{l}(\uptheta _\mathrm {ys}\theta _\mathrm {y})^{n-k-l}\). Again, all these \((\mathrm {N}+1)(\mathrm {N}+2)(\mathrm {N}+3)/6\) combinations would be too many to form an orthogonal set of basis functions. Moreover, while the different cosine harmonics are orthogonal axisymmetric functions in 2D, they are not orthogonal in 3D. On the sphere, the \(\mathrm {N}+1\) orthogonal Legendre polynomials \(P_n(\cos \phi )\) replace the cosine series as a basis for \(g=\sum _{n=0}^\mathrm {N}c_n\,P_n(\cos \phi )\), as shown below. All mathematical derivations for the sphere rely on the definition of harmonics. They result in \((\mathrm {N}+1)^2\) spherical harmonics and their addition theorem as a basis in terms of absolute directions, \(\frac{2n+1}{4\pi }P_n(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })=\sum _{m=-n}^nY_n^m(\varvec{\uptheta }_s)Y_n^m(\varvec{\theta })\). Dickins' thesis is recommended for further reading [12].
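That cosine harmonics lose their orthogonality on the sphere while Legendre polynomials keep it can be checked with a short sketch of our own. On the sphere, an axisymmetric function of \(\zeta=\cos\phi\) integrates with a uniform weight in \(\zeta\), \(\int_{\mathbb{S}^2}f\,\mathrm{d}\varvec{\theta}=2\pi\int_{-1}^{1}f(\zeta)\,\mathrm{d}\zeta\), so the Gram matrices below use a Gauss-Legendre rule from `numpy.polynomial.legendre.leggauss`.

```python
import numpy as np
from numpy.polynomial import chebyshev as cheb
from numpy.polynomial import legendre as leg

N = 4
# Axisymmetric functions f(zeta), zeta = cos(phi), integrate over the sphere as
# 2*pi * int_{-1}^{1} f(zeta) dzeta, i.e. with a uniform weight in zeta.
zeta, w = leg.leggauss(N + 1)        # Gauss-Legendre rule, exact up to degree 2N + 1


def sphere_gram(evaluate):
    """Matrix of 2*pi * int f_i(zeta) f_j(zeta) dzeta for basis functions i, j = 0..N."""
    F = np.array([evaluate(zeta, np.eye(N + 1)[n]) for n in range(N + 1)])
    return 2 * np.pi * (F * w) @ F.T


G_cos = sphere_gram(cheb.chebval)   # T_n(cos phi) = cos(n phi): orthogonal on the circle only
G_leg = sphere_gram(leg.legval)     # Legendre polynomials P_n(cos phi)

np.set_printoptions(precision=3, suppress=True)
print("cosine harmonics on the sphere:\n", G_cos)
print("Legendre polynomials on the sphere:\n", G_leg)
```

The Legendre Gram matrix should come out diagonal with entries \(4\pi/(2n+1)\), whereas the cosine harmonics show non-zero off-diagonal entries, e.g. between \(T_0\) and \(T_2\).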

In both regimes, 2D and 3D, the concept of circular or spherical polynomials will be used to determine optimal layouts, so-called *t*-designs. Such *t*-designs are directional sampling grids that preserve the constant part of any circular (2D) or spherical (3D) polynomial of order up to \(t\). This mathematical key property will be exploited to determine the requirements for preserving the *E* and \(\varvec{r}_\mathrm {E}\) measures during Ambisonic playback on optimal loudspeaker setups. Beyond that, *t*-designs also simplify the numerical integration of circular or spherical harmonics, which is used to define state-of-the-art Ambisonic decoders or mapping effects.

## 4.3 Angular/Directional Harmonics in 2D and 3D

For a function \(f\), the Laplacian \(\bigtriangleup f\) describes its curvature. A harmonic function is proportional to its curvature by an eigenvalue \(\lambda \), \(\bigtriangleup f=-\lambda f\), and such eigenfunctions are called *harmonics*. For suitable eigenvalues \(\lambda \), harmonics span an orthogonal set of basis functions that are typically used for Fourier expansion on a finite interval. It seems desirable to find such harmonics for functions that only exhibit directional dependencies, i.e. on the azimuth angle \(\varphi \) in 2D, and on the azimuth and zenith angles \(\varphi ,\vartheta \) in 3D.

## 4.4 Panning with Circular Harmonics in 2D

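In 2D polar coordinates, with the radius \(r\) and the angle \(\varphi \) to the \(x\) axis, the Laplacian is \( \bigtriangleup =\frac{1}{r}\frac{\partial }{\partial r} + \frac{\partial ^2}{\partial r^2}+\frac{1}{r^2}\frac{\partial ^2}{\partial \varphi ^2}. \) For functions \(\Phi =\Phi (\varphi )\) purely in the angle \(\varphi \), the radial derivatives in \(\bigtriangleup \Phi \) all vanish, and it remains (\(\partial \rightarrow \mathrm {d}\)) \(\bigtriangleup \Phi =\frac{1}{r^2}\frac{\mathrm {d}^2\Phi }{\mathrm {d}\varphi ^2}\). After removing the factor \(\frac{1}{r^2}\), the \(2\pi \)-periodic eigensolutions are \(\cos m\varphi \) and \(\sin m\varphi \) with the eigenvalues \(\lambda =m^2\).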

**2D panning function**. An infinitely narrow angular range around a desired direction, \(|\varphi -\varphi _\mathrm {s}|<\varepsilon \rightarrow 0\), is represented by the transformation integral over a Dirac delta distribution \(\delta (\varphi -\varphi _{s})\), cf. Appendix Eq. (A.16), so that the coefficients of such a panning function are \(\int _{-\pi }^{\pi }\delta (\varphi -\varphi _\mathrm {s})\,\Phi _m(\varphi )\,\mathrm {d}\varphi =\Phi _m(\varphi _\mathrm {s})\).

**Optimal sampling of the 2D panning function**. In the theory of circular/spherical polynomials in the variable \(\zeta =\cos (\varphi -\varphi _\mathrm {s})\), so-called *t*-designs in 2D are optimal point sets of given angles \(\{\upvarphi _l\}\) with \(l=1,\dots ,\mathrm {L}\) and size \(\mathrm {L}\). A *t*-design allows the integral (constant part) over the polynomials \(\mathcal {P}_m(\zeta )\) of limited degree \(m\le t\) to be computed perfectly by discrete summation,

\[\int _{-\pi }^{\pi }\mathcal {P}_m(\zeta )\,\mathrm {d}\varphi =\frac{2\pi }{\mathrm {L}}\sum _{l=1}^{\mathrm {L}}\mathcal {P}_m(\zeta _l),\qquad \zeta _l=\cos (\upvarphi _l-\varphi _\mathrm {s}).\]

The integral to calculate the loudness measure *E* runs over \(g_\mathrm {N}^2\), a polynomial of the order \(2\mathrm {N}\). The integral to calculate \(\varvec{r}_\mathrm {E}\) runs over \(g_\mathrm {N}^2\,\cos (\phi )\) and is thus of the order \(2\mathrm {N}+1\). In playback, to get a perfectly panning-invariant loudness measure *E* of the continuous panning function and also the perfectly oriented \(\varvec{r}_\mathrm {E}\) vector of constant spread \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert \), the parameter *t* must be \(t\ge 2\mathrm {N}+1\). In 2D, all regular polygons are *t*-designs with \(\mathrm {L}=t+1\) points, so any regular polygon of at least \(\mathrm {L}=2\mathrm {N}+2\) loudspeakers is an optimal layout for the order \(\mathrm {N}\).
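A quick way to test the design property of a 2D layout, used here only for illustration, is the criterion \(\sum_l e^{\mathrm{i}m\upvarphi_l}=0\) for all \(1\le m\le t\); it is equivalent to exact summation of every circular polynomial up to degree \(t\). The hypothetical helper `circular_design_strength` returns the largest such \(t\).

```python
import numpy as np


def circular_design_strength(phi_l, m_max=50, tol=1e-9):
    """Largest t such that sum_l exp(i m phi_l) = 0 for all 1 <= m <= t.

    For such a point set, (2*pi/L) * sum_l f(phi_l) equals the integral of
    any trigonometric (circular) polynomial f of degree <= t.
    """
    phi_l = np.asarray(phi_l, dtype=float)
    for m in range(1, m_max + 1):
        if abs(np.exp(1j * m * phi_l).sum()) > tol * phi_l.size:
            return m - 1
    return m_max


for L in range(3, 9):
    t = circular_design_strength(2 * np.pi * np.arange(L) / L)
    print(f"regular {L}-gon: t = {t}, supports order N <= {(t - 1) // 2}")

# An example non-uniform ring at 0, +-30, +-110 degrees is not even a 1-design:
print(circular_design_strength(np.radians([0, 30, -30, 110, -110])))
```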

## 4.5 Ambisonics Encoding and Optimal Decoding in 2D

To encode a signal \(s\) into Ambisonic signals \(\chi _m\), we multiply the signal by the encoder weights \(\Phi _m(\varphi _\mathrm {s})\), which represent the direction of the signal at the angle \(\varphi _\mathrm {s}\): \(\chi _m=\Phi _m(\varphi _\mathrm {s})\,s\).
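The sketch below encodes a source at all azimuths, decodes it with a sampling decoder \(\sqrt{2\pi/\mathrm{L}}\,\varvec{Y}_\mathrm{N}^\mathrm{T}\) to a regular polygon of \(\mathrm{L}=2\mathrm{N}+2\) loudspeakers, and evaluates \(E\) and \(\varvec{r}_\mathrm{E}\). It is our own illustration: the orthonormal circular-harmonic convention in `circ_harmonics` and the 2D max-\(\varvec{r}_\mathrm{E}\) weights \(a_m=\cos\frac{m\pi}{2\mathrm{N}+2}\) are assumptions of this sketch, not taken from this section.

```python
import numpy as np


def circ_harmonics(N, phi):
    """Circular harmonics Phi_m(phi), columns m = -N..N (assumed orthonormal convention):
    sin(|m| phi)/sqrt(pi) for m < 0, 1/sqrt(2 pi) for m = 0, cos(m phi)/sqrt(pi) for m > 0."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))[:, None]
    m = np.arange(-N, N + 1)[None, :]
    Y = np.where(m < 0, np.sin(-m * phi), np.cos(m * phi)) / np.sqrt(np.pi)
    Y[:, N] = 1 / np.sqrt(2 * np.pi)
    return Y


N = 2
L = 2 * N + 2                                        # regular polygon, a (2N+1)-design
phi_l = 2 * np.pi * np.arange(L) / L
a = np.cos(np.abs(np.arange(-N, N + 1)) * np.pi / (2 * N + 2))   # assumed 2D max-r_E weights
D = np.sqrt(2 * np.pi / L) * circ_harmonics(N, phi_l)            # sampling decoder, L x (2N+1)

phi_s = np.radians(np.arange(360.0))
chi = circ_harmonics(N, phi_s)             # encoding: chi_m = Phi_m(phi_s) * s, here s = 1
G = (chi * a) @ D.T                        # loudspeaker gains, one row per panning angle
E = (G ** 2).sum(axis=1)
r_E = (G ** 2) @ np.column_stack([np.cos(phi_l), np.sin(phi_l)]) / E[:, None]
aim = np.degrees(np.angle(np.exp(1j * (np.arctan2(r_E[:, 1], r_E[:, 0]) - phi_s))))

print("loudness fluctuation [dB]:", 10 * np.log10(E.max() / E.min()))
print("|r_E| min/max:", np.linalg.norm(r_E, axis=1).min(), np.linalg.norm(r_E, axis=1).max())
print("cos(pi/(2N+2)):", np.cos(np.pi / (2 * N + 2)))
print("max aiming error [deg]:", np.abs(aim).max())
```

On this optimal layout, \(E\) should be panning-invariant and \(\varvec{r}_\mathrm{E}\) perfectly aimed, with the constant length \(\cos\frac{\pi}{2\mathrm{N}+2}\) that these weights produce.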

## 4.6 Listening Experiments on 2D Ambisonics

The perceptually adjusted panning angle of \(2\mathrm {nd}\)-order max-\(\varvec{r}_\mathrm {E}\) Ambisonic panning on 6 horizontal loudspeakers matches the acoustic reference direction quite well, as shown in Fig. 4.4. The results are similar to those of MDAP in Fig. 3.8, but the median is slightly more accurate, by \(0.5^\circ \) on average, in particular at side and back panning directions.

Another aspect to investigate is how stable the results are at central and off-center listening seats, as shown in Fig. 4.5. It illustrates that \(\mathrm {max}\)-\(\varvec{r}_\mathrm {E}\) weighting with the highest order achieves the best localization stability at off-center listening seats. Surprisingly, compensating for the non-uniform delay times to the center deteriorated the results, most probably because the nearly linear frontal arrangement of loudspeakers is more robust to lateral shifts of the listening position than a circular arrangement.

Figure 4.6a, b show the direction histograms for two different weightings \(a_m\). They illustrate that proper sidelobe suppression of the panning function by \(\mathrm {max}\)-\(\varvec{r}_\mathrm {E}\) weights is decisive at shifted listening positions to avoid splitting of the auditory image, which appears in Fig. 4.6b without these weights (basic).

Peter Stitt's work shows that the localization offsets at off-center listening seats do not increase with the radius of the loudspeaker arrangement as long as the off-center seat position stays in proportion to the radius, Fig. 4.6c. The results are predicted by the sweet area model from Sect. 2.2.9, as shown in Fig. 4.7 for the first order (top row) and the third order (bottom row), each for a small setup (left) and a large setup (right).

Frank's 2016 experiments [29] used scales on the floor from which listeners read off where the sweet area ends in every radial direction, cf. Fig. 4.8a. For Fig. 4.8b, the criterion for listeners to indicate leaving the sweet area was when the frontally panned sound was mapped outside the loudspeaker pairs L, C, and R. The results showed that, if the order is high enough, the sweet area providing perceptually plausible playback extends to at least \(\frac{2}{3}\) of the radius of the loudspeaker setup.

The perceived width of auditory events is investigated in the experimental results of Fig. 4.9 [25], in which pink noise was frontally panned for different orientations of the loudspeaker ring (with one loudspeaker in front, or with the front direction offset by a quarter or a half of the loudspeaker spacing). Listeners compared the width of multiple stimuli, and the results were expected to indicate constant width for the differently rotated loudspeaker ring, as the optimal arrangement with \(\mathrm {L}=2\mathrm {N}+2\) provides a constant \(\varvec{r}_\mathrm {E}\) length. This panning-invariant length is not perfectly reflected in the perceived widths for \(3\mathrm {rd}\) order on 8 loudspeakers, for which the on-loudspeaker position is perceived as significantly wider. By contrast, the experiment with \(7\mathrm {th}\) order on 16 loudspeakers validates the model.

Figure 4.10 shows experiments investigating the time-variant change in sound coloration for a pink-noise virtual source rotating at a speed of \(100^\circ \)/s, for different Ambisonic panning setups. At both listening positions, centered and off-center, the side-lobe-suppressing “\(\mathrm {max}\)-\(\varvec{r}_\mathrm {E}\)” weighting has an obvious advantage over the “basic” rectangular truncation of the Fourier series: it reduces the fluctuation in coloration. At the off-center listening position, \(\mathrm {max}\)-\(\varvec{ r}_\mathrm {E}\) weights achieve good results with regard to constant coloration for both investigated arrangements, \(3\mathrm {rd}\) order on 8 loudspeakers and \(7\mathrm {th}\) order on 16 loudspeakers.

**How well would diffuse signals be preserved in playback?** All the above experiments deal with how non-diffuse signals are presented. To explain what is shown in Fig. 1.21 of Chap. 1, the relation between Ambisonic order and its ability to preserve diffuse fields is estimated here by the correlation that Ambisonic panning introduces between uncorrelated directions. Assume that a max-\(\varvec{r}_\mathrm {E}\)-weighted \(\mathrm {N{th}}\)-order Ambisonic panning function \(g(\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{\theta })\), normalized to \(g(1)=1\), encodes two sounds \(s_{1,2}\) from the directions \(\varvec{\theta }_1\) and \(\varvec{\theta }_2\), with the sounds being uncorrelated and of unit variance, \(E\{s_is_j\}=\delta _{ij}\). The Ambisonic representation mixes the sounds at their respective mapped directions, \(x_1=s_{1}+g_{12}\,s_2\) and \(x_2=s_{2}+g_{12}\,s_1\), using \(g_{12}=g(\cos \phi )\) with \(\cos \phi =\varvec{\theta }_1^\mathrm {T}\varvec{\theta }_2\), which increases their correlation to

\[\rho _{12}=\frac{E\{x_1x_2\}}{\sqrt{E\{x_1^2\}\,E\{x_2^2\}}}=\frac{2g_{12}}{1+g_{12}^2}.\]

## 4.7 Panning with Spherical Harmonics in 3D

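In 3D, spherical coordinates use the radius \(r\) and two angles: the azimuth \(\varphi \), indicating the polar angle of the orthogonal projection onto the \(xy\) plane, and the zenith angle \(\vartheta \), indicating the angle to the \(z\) axis, according to the right-handed spherical coordinate system of ISO 31-11 and ISO 80000-2 [30, 31], Fig. 4.11. A unit direction vector is thus \(\varvec{\theta }=[\cos \varphi \sin \vartheta ,\,\sin \varphi \sin \vartheta ,\,\cos \vartheta ]^\mathrm {T}\).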

With \(r\) signifying the radius, \(\varphi \) the azimuth angle, and the zenith angle \(\vartheta \) re-expressed as \(\zeta =\frac{z}{r}=\cos \vartheta \), the Laplacian becomes the operator \( \bigtriangleup = \frac{2}{r}\frac{\partial }{\partial r} + \frac{\partial ^2}{\partial r^2}+\frac{1}{r^2(1-\zeta ^2)}\frac{\partial ^2}{\partial \varphi ^2} -\frac{2}{r^2}\zeta \frac{\partial }{\partial \zeta }+ \frac{1-\zeta ^2}{r^2} \frac{\partial ^2}{\partial \zeta ^2}\). Any radius-dependent part is removed to define an eigenproblem yielding the basis for panning functions, taking only \(r^2\bigtriangleup _{\upvarphi ,\upzeta ,3\mathrm {D}}\), i.e. \(r^2\bigtriangleup _{\upvarphi ,\upzeta ,3\mathrm {D}}\,Y=-\lambda \,Y\). Its solutions are the spherical harmonics \(Y_n^m\) with the eigenvalues \(\lambda =n(n+1)\), where \(n\) is the order and \(m\) is the degree, which for each \(n\) is limited by \(-n\le m\le n\).

The spherical harmonics, Fig. 4.12, are orthonormal on the sphere \(-\pi \le \varphi \le \pi \) and \(0\le \vartheta \le \pi \), and for unbounded order \(\mathrm {N}\rightarrow \infty \) they are complete; see also Appendix A.3.7.

*Note that the above N3D normalization, \(\int _{\varvec{\theta }\in \mathbb {S}^2}|Y_n^m(\varvec{\theta })|^2\,\mathrm {d}\varvec{\theta }=1\), defines each spherical harmonic except for an arbitrary phase it might be multiplied with. Legendre functions for the zenith dependency might be defined differently in the literature, and for the azimuth, some implementations use \(\sin (m\varphi )\) instead of \(\sin (|m|\varphi )\).*

*In Ambisonics, real-valued functions and the SN3D normalization \(\sqrt{\frac{1}{2}\frac{(n-|m|)!}{(n+|m|)!}}\) are preferred, as are positive signs of the first-order dipole components in the directions of the respective coordinate axes \(x\), \(y\), \(z\). This might require involving the Condon-Shortley phase \((-1)^m\) to correct the signs of the Legendre functions, or a factor \(-1\) for \(m<0\) to correct the sign of the azimuthal sinusoids, depending on the implementation of the respective functions. It is often helpful to employ converters and directional checks to ensure compatibility!*
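As an example of such a directional check, the sketch below implements real-valued N3D spherical harmonics without the Condon-Shortley phase, using \(\sin(|m|\varphi)\) for \(m<0\) and the common ACN channel index \(n^2+n+m\); the ordering and all function names are assumptions of ours, not prescribed here. It verifies orthonormality with a product quadrature, the positive signs of the first-order dipoles on the \(x\), \(y\), \(z\) axes, and the spherical harmonics addition theorem.

```python
import numpy as np
from math import factorial, pi, sqrt
from numpy.polynomial import legendre as leg


def assoc_legendre(n, m, x):
    """Associated Legendre function P_n^m(x), 0 <= m <= n, WITHOUT Condon-Shortley phase."""
    x = np.asarray(x, dtype=float)
    s = np.sqrt(np.clip(1.0 - x * x, 0.0, None))
    p_mm = np.ones_like(x)
    for k in range(m):                       # P_m^m = (2m-1)!! (1 - x^2)^(m/2)
        p_mm = p_mm * (2 * k + 1) * s
    if n == m:
        return p_mm
    p_m1 = x * (2 * m + 1) * p_mm            # P_{m+1}^m
    for ll in range(m + 2, n + 1):           # upward recurrence in the order
        p_mm, p_m1 = p_m1, ((2 * ll - 1) * x * p_m1 - (ll + m - 1) * p_mm) / (ll - m)
    return p_m1


def real_sh(N, azi, zen):
    """Real N3D spherical harmonics up to order N; column index n*n + n + m (ACN)."""
    azi = np.atleast_1d(np.asarray(azi, dtype=float))
    zeta = np.cos(np.atleast_1d(np.asarray(zen, dtype=float)))
    Y = np.empty((azi.size, (N + 1) ** 2))
    for n in range(N + 1):
        for m in range(-n, n + 1):
            k = abs(m)
            norm = sqrt((2 * n + 1) / (4 * pi) * (2 - (m == 0)) * factorial(n - k) / factorial(n + k))
            trig = np.cos(m * azi) if m > 0 else (np.sin(k * azi) if m < 0 else np.ones_like(azi))
            Y[:, n * n + n + m] = norm * assoc_legendre(n, k, zeta) * trig
    return Y


def unit_vector(azi, zen):
    return np.stack([np.cos(azi) * np.sin(zen), np.sin(azi) * np.sin(zen), np.cos(zen)], axis=-1)


N = 4
# Orthonormality: Gauss-Legendre in zeta times a uniform azimuth grid, exact up to order 2N.
zeta_q, w_zeta = leg.leggauss(N + 1)
K = 2 * N + 2
azi_q, zeta_g = np.meshgrid(2 * pi * np.arange(K) / K, zeta_q)
w = np.outer(w_zeta, np.full(K, 2 * pi / K)).ravel()       # d(theta) = d(zeta) d(azimuth)
Yq = real_sh(N, azi_q.ravel(), np.arccos(zeta_g.ravel()))
gram = Yq.T @ (w[:, None] * Yq)
print("orthonormal:", np.allclose(gram, np.eye((N + 1) ** 2)))

# Dipoles (ACN 3, 1, 2) should be positive on the x, y, z axes, respectively.
on_axes = real_sh(1, [0.0, pi / 2, 0.0], [pi / 2, pi / 2, 0.0])
print("dipole values on x, y, z:", on_axes[0, 3], on_axes[1, 1], on_axes[2, 2])

# Addition theorem for two random directions.
rng = np.random.default_rng(0)
azi2, zen2 = rng.uniform(-pi, pi, 2), np.arccos(rng.uniform(-1.0, 1.0, 2))
Y2 = real_sh(N, azi2, zen2)
u = unit_vector(azi2, zen2)
cos_phi = u[0] @ u[1]
for n in range(N + 1):
    lhs = Y2[0, n * n:(n + 1) ** 2] @ Y2[1, n * n:(n + 1) ** 2]
    rhs = (2 * n + 1) / (4 * pi) * leg.legval(cos_phi, np.eye(N + 1)[n])
    print(f"n={n}: addition theorem holds: {np.isclose(lhs, rhs)}")
```

Swapping in a library implementation with the Condon-Shortley phase would flip the signs of the odd-\(m\) columns, which a check like the dipole test above reveals immediately.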

**3D panning function**. An infinitely narrow direction range around a desired direction, \(\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{\theta }>\cos \varepsilon \rightarrow 1\), is represented by the transformation integral over the Dirac delta \(\delta (1-\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{\theta })\), cf. Eq. (A.41), so that the coefficients of the panning function are \(Y_n^m(\varvec{\theta }_\mathrm {s})\). With order weights \(a_n\), the panning function \(g(\varvec{\theta })=\sum _{n=0}^\mathrm {N}a_n\sum _{m=-n}^nY_n^m(\varvec{\theta }_\mathrm {s})\,Y_n^m(\varvec{\theta })\) simplifies after summation over \(m\) by the *spherical harmonics addition theorem* \(\sum _{m=-n}^n Y_n^m(\varvec{\theta }_\mathrm {s})\,Y_n^m(\varvec{\theta }) ={\textstyle \frac{2n+1}{4\pi }}\,P_n(\cos \phi )\) to the axisymmetric form \(g=\sum _{n=0}^\mathrm {N}a_n\,\frac{2n+1}{4\pi }\,P_n(\cos \phi )\), with \(\cos \phi =\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{\theta }\).

**Optimal sampling of the 3D panning function**. In the theory of spherical polynomials in the variable \(\zeta =\varvec{\theta }_s^\mathrm {T}\varvec{\theta }\), so-called *t*-designs describe point sets of given directions \(\{\varvec{\uptheta }_l\}\) with \(l=1,\dots ,\mathrm {L}\) and size \(\mathrm {L}\) that allow the integral (constant part) over the polynomials \(\mathcal {P}_n(\zeta )\) of limited order \(n\le t\) to be computed perfectly by discrete summation,

\[\int _{\mathbb {S}^2}\mathcal {P}_n(\zeta )\,\mathrm {d}\varvec{\theta }=\frac{4\pi }{\mathrm {L}}\sum _{l=1}^{\mathrm {L}}\mathcal {P}_n(\zeta _l),\qquad \zeta _l=\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{\uptheta }_l.\]

The loudness measure *E* is calculated by the integral over \(g_\mathrm {N}^2\), i.e. over a polynomial of the order \(2\mathrm {N}\). The integral to calculate \(\varvec{r}_\mathrm {E}\) runs over \(g_\mathrm {N}^2\,\zeta \), i.e. over a polynomial of the order \(2\mathrm {N}+1\). In playback, to get a perfectly panning-invariant loudness measure *E* of the continuous panning function and also the perfectly oriented \(\varvec{r}_\mathrm {E}\) vector of constant spread \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert \), the parameter *t* must be \(t\ge 2\mathrm {N}+1\). In 3D, there are only 5 geometrically regular layouts:

- the tetrahedron, \(\mathrm {L}=4\) corners, is a 2-design,
- the octahedron, \(\mathrm {L}=6\) corners, is a 3-design,
- the hexahedron (cube), \(\mathrm {L}=8\) corners, is a 3-design,
- the icosahedron, \(\mathrm {L}=12\) corners, is a 5-design,
- the dodecahedron, \(\mathrm {L}=20\) corners, is a 5-design.

For instance, for \(\mathrm {N}=1\), the octahedron (or the cube) is a suitable spherical design; for \(\mathrm {N}=2\), the icosahedral or dodecahedral layouts are suitable.
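The design strengths listed above can be reproduced numerically. By the addition theorem, \(\sum_{l,l'}P_n(\varvec{\uptheta}_l^\mathrm{T}\varvec{\uptheta}_{l'})=\frac{4\pi}{2n+1}\sum_m\big(\sum_lY_n^m(\varvec{\uptheta}_l)\big)^2\), so for \(n\ge1\) a layout integrates all harmonics of order \(n\) exactly if and only if this double sum vanishes. The sketch below is our own; the vertex coordinates are the standard ones built from the golden ratio, and `design_strength` is a hypothetical helper.

```python
import itertools
import numpy as np
from numpy.polynomial import legendre as leg

gr = (1 + np.sqrt(5)) / 2                                        # golden ratio


def cyclic(v):
    """Stack the cyclic permutations (x,y,z), (z,x,y), (y,z,x) of the row vectors in v."""
    v = np.asarray(v, dtype=float)
    return np.vstack([v, np.roll(v, 1, axis=1), np.roll(v, 2, axis=1)])


signs = np.array(list(itertools.product([-1, 1], repeat=2)))     # 4 sign pairs
cube = np.array(list(itertools.product([-1, 1], repeat=3)), dtype=float)
layouts = {
    "tetrahedron": np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]], dtype=float),
    "octahedron": np.vstack([np.eye(3), -np.eye(3)]),
    "cube": cube,
    "icosahedron": cyclic(np.column_stack([np.zeros(4), signs[:, 0], signs[:, 1] * gr])),
    "dodecahedron": np.vstack([cube, cyclic(np.column_stack(
        [np.zeros(4), signs[:, 0] / gr, signs[:, 1] * gr]))]),
}


def design_strength(points, n_max=20, tol=1e-9):
    """Largest t with sum_{l,l'} P_n(theta_l . theta_l') = 0 for all 1 <= n <= t."""
    u = points / np.linalg.norm(points, axis=1, keepdims=True)
    cos_mat = np.clip(u @ u.T, -1.0, 1.0)
    for n in range(1, n_max + 1):
        if abs(leg.legval(cos_mat, np.eye(n + 1)[n]).sum()) > tol * len(u) ** 2:
            return n - 1
    return n_max


for name, pts in layouts.items():
    t = design_strength(pts)
    print(f"{name:12s} L={len(pts):2d}  t={t}  max order N={(t - 1) // 2}")
```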

Beyond the geometrically regular layouts, there are designs found by numerical optimization that fulfill the rule \(\int _{\mathbb {S}^2}Y_n^m(\varvec{\theta })\,\mathrm {d}\varvec{\theta }=\sqrt{4\pi }\delta _n\) accurately by the discrete sum \(\frac{4\pi }{\mathrm {L}}\sum _lY_n^m(\varvec{\uptheta }_l)\) for all \(n\le t\) and \(|m|\le n\). Large collections by Hardin and Sloane [32], Gräf and Potts [33], and Womersley [34] are available on the following websites:

- http://neilsloane.com/sphdesigns/dim3/
- http://homepage.univie.ac.at/manuel.graef/quadrature.php (Chebyshev-type Quadratures on \(\mathbb {S}^2\))
- https://web.maths.unsw.edu.au/~rsw/Sphere/EffSphDes/ss.html

## 4.8 Ambisonic Encoding and Optimal Decoding in 3D

To encode a signal \(s\) into Ambisonic signals \(\chi _{nm}\), we multiply the signal by the encoder weights \(Y_n^m(\varvec{\theta }_\mathrm {s})\), which represent its direction \(\varvec{\theta }_\mathrm {s}\): \(\chi _{nm}=Y_n^m(\varvec{\theta }_\mathrm {s})\,s\).
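A first-order example in 3D: the sketch below encodes random panning directions, decodes with the sampling decoder \(\sqrt{4\pi/\mathrm{L}}\,\varvec{Y}_\mathrm{N}^\mathrm{T}\) to an octahedron (a 3-design) and to a tetrahedron (only a 2-design), and evaluates \(E\) and \(\varvec{r}_\mathrm{E}\). The first-order max-\(\varvec{r}_\mathrm{E}\) weight \(a_1=1/\sqrt{3}\), the largest root of \(P_2\), and the helper name `sh_first_order` are assumptions of this sketch.

```python
import numpy as np


def sh_first_order(u):
    """Real N3D spherical harmonics up to order 1 in ACN order (W, Y, Z, X) for unit vectors u."""
    u = np.atleast_2d(np.asarray(u, dtype=float))
    c0, c1 = 1 / np.sqrt(4 * np.pi), np.sqrt(3 / (4 * np.pi))
    return np.column_stack([np.full(len(u), c0), c1 * u[:, 1], c1 * u[:, 2], c1 * u[:, 0]])


a = np.array([1.0] + 3 * [1 / np.sqrt(3)])   # assumed max-r_E weights for N = 1

layouts = {
    "octahedron": np.vstack([np.eye(3), -np.eye(3)]),                                          # 3-design
    "tetrahedron": np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3),  # 2-design
}

rng = np.random.default_rng(0)
pan = rng.normal(size=(500, 3))
pan /= np.linalg.norm(pan, axis=1, keepdims=True)   # random panning directions theta_s
chi = sh_first_order(pan)                            # encoding chi_nm = Y_n^m(theta_s) * s, s = 1

results = {}
for name, U in layouts.items():
    D = np.sqrt(4 * np.pi / len(U)) * sh_first_order(U)   # sampling decoder sqrt(4*pi/L) * Y^T
    G = (chi * a) @ D.T                                  # loudspeaker gains, one row per direction
    E = (G ** 2).sum(axis=1)
    r_E = (G ** 2) @ U / E[:, None]
    length = np.linalg.norm(r_E, axis=1)
    aim = np.degrees(np.arccos(np.clip(np.sum(r_E * pan, axis=1) / length, -1.0, 1.0)))
    results[name] = (E, length, aim)
    print(f"{name:11s}: E {E.min():.4f}..{E.max():.4f}, |r_E| {length.min():.3f}..{length.max():.3f}, "
          f"max aiming error {aim.max():.2f} deg")
```

On the octahedron, both \(E\) and \(\varvec{r}_\mathrm{E}\) should be panning-invariant, with \(\Vert\varvec{r}_\mathrm{E}\Vert=1/\sqrt{3}\). On the tetrahedron, \(E\) is still constant because \(t=2\ge2\mathrm{N}\), but the length of \(\varvec{r}_\mathrm{E}\) varies strongly because \(t<2\mathrm{N}+1\).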

## 4.9 Ambisonic Decoding to Loudspeakers

Ambisonic decoding to loudspeakers has been addressed by numerous researchers, in the past particularly because results are not very stable for first-order Ambisonics, and later because, for higher-order Ambisonics, they strongly depend on how uniform the loudspeaker layout is. Moreover, Solvang found that even the use of too many loudspeakers has a degrading effect [35].

For first-order decoding, the Vienna decoders by Michael Gerzon [36] are often cited, and for higher-order Ambisonic decoding, one can, e.g. find works by Daniel with max-\(\varvec{r}_\mathrm {E}\) [37] and pseudo-inverse decoding [10], also by Poletti [14, 38, 39].

The most practical solution turned out to be the All-Round Ambisonic Decoding approach (AllRAD), due to its feature of allowing imaginary loudspeaker insertion and downmix as described in the sections above, cf. [40]. Moreover, it does not restrict the Ambisonic order, whereas other decoders often need such restrictions because higher orders yield poorly controllable panning-dependent fluctuations in loudness and directional mapping errors.

The playable set of directions \(\varvec{\uptheta }_l\) or \(\upvarphi _l\) is usually finite and discrete, and it is represented by the surrounding loudspeakers' directions. Typically, this directional distribution is neither a *t*-design with \(t\ge 2\mathrm {N}+1\) in general nor, for 2D in particular, even a regular polygon with \(\mathrm {L}\ge 2\mathrm {N}+2\) loudspeakers. In such cases, it is extremely helpful to be aware of the properties of the various decoder design methods.

### 4.9.1 Sampling Ambisonic Decoder (SAD)

*E* measure on the circle or sphere of the surrounding directions. However, if the loudspeaker layout is not optimal, the sampling decoder yields neither perfectly constant loudness and width measures, *E* and \(\Vert \varvec{r}_\mathrm {E}\Vert \), nor a correct aiming of the localization measure \(\varvec{r}_\mathrm {E}\). For instance, concerning loudness, when panning towards directional regions of poor loudspeaker coverage, sampling misses the main lobe of the panning function, which yields a noticeably reduced loudness.
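The loudness dip can be illustrated with a 2D sketch of our own: a sampling decoder of order \(\mathrm{N}=2\) for a 5-loudspeaker ring at 0°, ±30°, ±110°, an assumed example layout with a wide rear gap. The circular-harmonic convention and the max-\(\varvec{r}_\mathrm{E}\) weights \(a_m=\cos\frac{m\pi}{2\mathrm{N}+2}\) are assumptions of this sketch.

```python
import numpy as np


def circ_harmonics(N, phi):
    """Circular harmonics, columns m = -N..N (assumed orthonormal convention)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))[:, None]
    m = np.arange(-N, N + 1)[None, :]
    Y = np.where(m < 0, np.sin(-m * phi), np.cos(m * phi)) / np.sqrt(np.pi)
    Y[:, N] = 1 / np.sqrt(2 * np.pi)
    return Y


N = 2
phi_l = np.radians([0.0, 30.0, -30.0, 110.0, -110.0])   # example irregular 5-loudspeaker ring
a = np.cos(np.abs(np.arange(-N, N + 1)) * np.pi / (2 * N + 2))
D_sad = np.sqrt(2 * np.pi / len(phi_l)) * circ_harmonics(N, phi_l)

G = (circ_harmonics(N, np.radians(np.arange(360.0))) * a) @ D_sad.T
E_dB = 10 * np.log10((G ** 2).sum(axis=1))
E_dB -= E_dB.max()                                       # 0 dB at the loudest panning direction
print(f"loudness fluctuation {-E_dB.min():.1f} dB, weakest at {np.argmin(E_dB)} deg azimuth")
```

The weakest panning direction should lie in the rear gap, where the narrow main lobe falls between the loudspeakers at ±110°.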

### 4.9.2 Mode Matching Decoder (MAD)

### 4.9.3 Energy Preservation on Optimal Layouts

On a *t*-design selected by \(t\ge 2\mathrm {N}\), the loudness measure *E* of the sampling decoder is panning-invariant, in general. The panning invariance of the *E* measure also holds for the mode-matching decoder using a \(t\ge 2\mathrm {N}\)-design, as it then becomes equivalent to a sampling decoder, \(\varvec{D}={\sqrt{\frac{\mathrm {L}}{S_{\mathrm {D}-1}}}}\varvec{Y}_\mathrm {N}^\mathrm {T}(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}= {\sqrt{\frac{\mathrm {L}}{S_{\mathrm {D}-1}}}}\varvec{Y}_\mathrm {N}^\mathrm {T}{\frac{S_{\mathrm {D}-1}}{\mathrm {L}}}={\sqrt{\frac{S_{\mathrm {D}-1}}{\mathrm {L}}}}\varvec{Y}_\mathrm {N}^\mathrm {T}\). Under these ideal conditions, both decoders are energy-preserving.
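In 2D, where \(S_1=2\pi\), this equivalence can be checked numerically. The sketch below (hypothetical helper names, assumed orthonormal circular harmonics) compares both decoders on a regular pentagon, which is a 4-design and thus a \(t\ge2\mathrm{N}\) design for \(\mathrm{N}=2\), and on an irregular example ring of 5 loudspeakers.

```python
import numpy as np


def circ_harmonics(N, phi):
    """Circular harmonics, columns m = -N..N (assumed orthonormal convention); equals Y^T."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))[:, None]
    m = np.arange(-N, N + 1)[None, :]
    Y = np.where(m < 0, np.sin(-m * phi), np.cos(m * phi)) / np.sqrt(np.pi)
    Y[:, N] = 1 / np.sqrt(2 * np.pi)
    return Y


def sampling_decoder(N, phi_l):
    """sqrt(S/L) * Y^T with S = 2*pi, the circumference of the unit circle."""
    return np.sqrt(2 * np.pi / len(phi_l)) * circ_harmonics(N, phi_l)


def mode_matching_decoder(N, phi_l):
    """sqrt(L/S) * Y^T (Y Y^T)^{-1}, with Y^T of shape L x (2N+1)."""
    Yt = circ_harmonics(N, phi_l)
    return np.sqrt(len(phi_l) / (2 * np.pi)) * Yt @ np.linalg.inv(Yt.T @ Yt)


N = 2
regular = 2 * np.pi * np.arange(2 * N + 1) / (2 * N + 1)   # regular pentagon: a 2N-design
irregular = np.radians([0.0, 30.0, -30.0, 110.0, -110.0])
for name, phi_l in [("regular", regular), ("irregular", irregular)]:
    same = np.allclose(sampling_decoder(N, phi_l), mode_matching_decoder(N, phi_l))
    print(f"{name} layout: mode matching equals sampling decoder: {same}")
```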

### 4.9.4 Loudness Deficiencies on Sub-optimal Layouts

### 4.9.5 Energy-Preserving Ambisonic Decoder (EPAD)

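The energy-preserving Ambisonic decoder makes the loudness measure *E* panning-invariant by enforcing \(\varvec{D}^\mathrm {T}\varvec{D}=\varvec{I}\), which is otherwise only achieved on \(t\ge 2\mathrm {N}\)-designs. We may search for a decoding matrix \(\varvec{D}\) whose entries are closest to those of the sampling decoder \(\varvec{D}_\mathrm {SAD}\) under the constraint of orthonormal columns:

\[\min _{\varvec{D}}\,\Vert \varvec{D}-\varvec{D}_\mathrm {SAD}\Vert _\mathrm {F}^2\quad \text {subject to}\quad \varvec{D}^\mathrm {T}\varvec{D}=\varvec{I}.\]

With the singular-value decomposition \(\varvec{D}_\mathrm {SAD}=\varvec{U}\,\mathrm {diag}\{\varvec{s}\}\,\varvec{V}^\mathrm {T}\), the solution is \(\varvec{D}=\varvec{U}\varvec{V}^\mathrm {T}\). The energy-preserving decoder in this basic version requires \(\mathrm {L}\ge 2\mathrm {N}+1\) loudspeakers in 2D or \(\mathrm {L}\ge (\mathrm {N}+1)^2\) in 3D to work.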
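Finding the matrix with orthonormal columns that is closest to the sampling decoder is an orthogonal Procrustes problem: its solution keeps the singular vectors of the sampling decoder and replaces all singular values by one. The sketch below is our own 2D illustration of this on an example 5-loudspeaker ring (\(\mathrm{L}=2\mathrm{N}+1\) for \(\mathrm{N}=2\)), with assumed circular harmonics and max-\(\varvec{r}_\mathrm{E}\) weights \(a_m=\cos\frac{m\pi}{2\mathrm{N}+2}\).

```python
import numpy as np


def circ_harmonics(N, phi):
    """Circular harmonics, columns m = -N..N (assumed orthonormal convention)."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))[:, None]
    m = np.arange(-N, N + 1)[None, :]
    Y = np.where(m < 0, np.sin(-m * phi), np.cos(m * phi)) / np.sqrt(np.pi)
    Y[:, N] = 1 / np.sqrt(2 * np.pi)
    return Y


def sampling_decoder(N, phi_l):
    return np.sqrt(2 * np.pi / len(phi_l)) * circ_harmonics(N, phi_l)


def energy_preserving_decoder(N, phi_l):
    """Closest matrix to the sampling decoder with orthonormal columns (needs L >= 2N+1)."""
    U, _, Vt = np.linalg.svd(sampling_decoder(N, phi_l), full_matrices=False)
    return U @ Vt


N = 2
phi_l = np.radians([0.0, 30.0, -30.0, 110.0, -110.0])            # example irregular ring
a = np.cos(np.abs(np.arange(-N, N + 1)) * np.pi / (2 * N + 2))    # assumed 2D max-r_E weights
Y_pan = circ_harmonics(N, np.radians(np.arange(360.0))) * a

D_ep = energy_preserving_decoder(N, phi_l)
fluct_dB = {}
for name, D in [("SAD", sampling_decoder(N, phi_l)), ("EPAD", D_ep)]:
    E = ((Y_pan @ D.T) ** 2).sum(axis=1)
    fluct_dB[name] = 10 * np.log10(E.max() / E.min())
    print(f"{name}: loudness fluctuation {fluct_dB[name]:.2f} dB")
print("D^T D = I:", np.allclose(D_ep.T @ D_ep, np.eye(2 * N + 1)))
```

Because \(\varvec{D}^\mathrm{T}\varvec{D}=\varvec{I}\), the energy of the loudspeaker gains equals that of the weighted Ambisonic signals, which does not depend on the panning direction.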

Note that if the loudspeaker setup directions are already a \(t\ge 2\mathrm {N}\) design, the sampling, mode-matching, and energy-preserving decoders are equivalent.

### 4.9.6 All-Round Ambisonic Decoding (AllRAD)

In Chap. 3 on vector-base amplitude panning methods, a well-balanced panning result in terms of loudness, width, and localization was achieved by MDAP, which distributes a signal to an arrangement of several superimposed VBAP virtual sources. It yields \(E=\text {const.}\), \(\varvec{r}_\mathrm {E}\approx r_\mathrm {E}\,\varvec{\theta }_\mathrm {s}\), and \(r_\mathrm {E}\approx \text {const}\), and this works for nearly any loudspeaker layout.

To calculate loudspeaker gains, MDAP superimposes an arrangement of discrete virtual sources within a range of \({\pm }\alpha \) around the panning direction \(\varvec{\theta }_\mathrm {s}\); one could instead think of superimposing a quasi-continuous distribution of virtual sources that are weighted by a continuous panning function \(g(\varvec{\theta })\).

The ideal continuous panning function \(g(\varvec{\theta })\) of axisymmetric directional spread around the panning direction \(\varvec{\theta }_\mathrm {s}\) is described by \(g(\varvec{\theta })=\varvec{y}_\mathrm {N}^\mathrm {T}(\varvec{\theta })\,\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\,\varvec{y}_\mathrm {N}(\varvec{\theta }_\mathrm {s})\), the Ambisonic panning function. This rotation-invariant continuous function is optimal in terms of loudness, width, and localization measures, which are all evaluated by continuous integrals: \(E=\int g^2(\varvec{\theta })\,\mathrm {d}\varvec{\theta }=\text {const.}\) expresses panning-invariant loudness, \(\varvec{r}_\mathrm {E}=\frac{1}{E}\int g^2(\varvec{\theta })\,\varvec{\theta }\,\mathrm {d}\varvec{\theta }=r_\mathrm {E}\,\varvec{\theta }_\mathrm {s}\) indicates a perfect alignment \(\varvec{r}_\mathrm {E}\parallel \varvec{\theta }_\mathrm {s}\) with the panning direction and a panning-invariant width \(r_\mathrm {E}=\text {const}\). However, the optimal values of these integrals are only preserved by discretization with optimal \(t\ge 2\mathrm {N}+1\)-design loudspeaker layouts.

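**All-round Ambisonic decoding (AllRAD)** is preceded by the work of Batke and Keiler [16]. They describe Ambisonic panning \(\varvec{g}_\mathrm {AllRAD}(\varvec{\theta })=\varvec{D}\,\varvec{y}_\mathrm {N}(\varvec{\theta })\) by a decoder \(\varvec{D}\) whose result best matches VBAP, \(\varvec{g}_\mathrm {VBAP}(\varvec{\theta })\). Without max-\(\varvec{r}_\mathrm {E}\) weights yet, we use this here to define AllRAD by a minimum-mean-square-error problem that integrates over all panning directions \(\varvec{\theta }\),

\[\min _{\varvec{D}}\int _{\mathbb {S}^{\mathrm {D}-1}}\Vert \varvec{D}\,\varvec{y}_\mathrm {N}(\varvec{\theta })-\varvec{g}_\mathrm {VBAP}(\varvec{\theta })\Vert ^2\,\mathrm {d}\varvec{\theta }.\]

As the harmonics are orthonormal, \(\int \varvec{y}_\mathrm {N}\varvec{y}_\mathrm {N}^\mathrm {T}\,\mathrm {d}\varvec{\theta }=\varvec{I}\), the solution is \(\varvec{D}=\int \varvec{g}_\mathrm {VBAP}(\varvec{\theta })\,\varvec{y}_\mathrm {N}^\mathrm {T}(\varvec{\theta })\,\mathrm {d}\varvec{\theta }\), which is evaluated numerically by summation over the points of a *t*-design. As VBAP's gain functions aren't smooth (their derivatives are discontinuous), they are not order-limited, and a *t*-design of sufficiently high *t* should be used. In 3D practice, the 5200-point Chebyshev-type design from [33] is dense enough. Note that the VBAP part permits improvements by insertion and downmix of imaginary loudspeakers to adapt to asymmetric or hemispherical layouts, as suggested in the original paper [40], cf. Sect. 3.3.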

*Note that the decoder needs to be scaled properly. For instance, the norm of the omnidirectional component (first column) could be equalized to one, as it would typically be with a sampling decoder; there are alternative strategies to circumvent the scaling problem* [41].
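The following hedged Python sketch makes the definition concrete by reducing AllRAD to the horizontal plane, a simplification chosen to keep it short and dependency-free. It makes three substitutions:

- Orthonormal circular harmonics replace spherical harmonics.
- Pairwise 2D VBAP replaces triangle-based VBAP.
- A dense uniform azimuth grid plays the role of the *t*-design.

With \(\int _0^{2\pi }\varvec{y}_\mathrm {N}\varvec{y}_\mathrm {N}^\mathrm {T}\,\mathrm {d}\phi =\varvec{I}\), the least-squares fit of \(\varvec{D}\,\varvec{y}_\mathrm {N}(\phi )\) to \(\varvec{g}_\mathrm {VBAP}(\phi )\) is \(\varvec{D}=\int _0^{2\pi }\varvec{g}_\mathrm {VBAP}(\phi )\,\varvec{y}_\mathrm {N}^\mathrm {T}(\phi )\,\mathrm {d}\phi \). The omnidirectional column is then scaled to unit norm, one of the options named in the note above. The channel ordering and all function names are hypothetical.

```python
import math


def circular_harmonics(order, phi):
    """Orthonormal circular harmonics up to `order` (hypothetical channel order:
    1/sqrt(2 pi), then cos(m phi)/sqrt(pi), sin(m phi)/sqrt(pi) for m = 1..order)."""
    y = [1.0 / math.sqrt(2 * math.pi)]
    for m in range(1, order + 1):
        y += [math.cos(m * phi) / math.sqrt(math.pi), math.sin(m * phi) / math.sqrt(math.pi)]
    return y


def vbap_2d(phi, speaker_az):
    """Pairwise 2D VBAP gains with unit energy for a source at azimuth phi (rad)."""
    ring = sorted(range(len(speaker_az)), key=lambda i: speaker_az[i] % (2 * math.pi))
    px, py = math.cos(phi), math.sin(phi)
    for k in range(len(ring)):
        i, j = ring[k], ring[(k + 1) % len(ring)]
        xi, yi = math.cos(speaker_az[i]), math.sin(speaker_az[i])
        xj, yj = math.cos(speaker_az[j]), math.sin(speaker_az[j])
        det = xi * yj - xj * yi          # > 0 for an adjacent pair less than 180 deg apart
        if det <= 1e-12:
            continue
        gi = (px * yj - xj * py) / det   # solve p = gi * l_i + gj * l_j (Cramer's rule)
        gj = (xi * py - px * yi) / det
        if gi >= -1e-12 and gj >= -1e-12:
            gi, gj = max(gi, 0.0), max(gj, 0.0)
            norm = math.hypot(gi, gj)
            gains = [0.0] * len(speaker_az)
            gains[i], gains[j] = gi / norm, gj / norm
            return gains
    raise ValueError("no active loudspeaker pair; is there a gap of 180 degrees or more?")


def allrad_2d(speaker_az, order, grid_points=3600):
    """D = int g_VBAP(phi) y_N(phi)^T dphi, approximated on a uniform azimuth grid,
    then scaled so that the omnidirectional column has unit norm."""
    n_ch = 2 * order + 1
    dec = [[0.0] * n_ch for _ in speaker_az]
    dphi = 2 * math.pi / grid_points
    for n in range(grid_points):
        phi = n * dphi
        g, y = vbap_2d(phi, speaker_az), circular_harmonics(order, phi)
        for spk, gain in enumerate(g):
            if gain > 0.0:
                for m in range(n_ch):
                    dec[spk][m] += gain * y[m] * dphi
    scale = 1.0 / math.sqrt(sum(row[0] ** 2 for row in dec))
    return [[scale * v for v in row] for row in dec]


def pan(dec, order, phi):
    """Loudspeaker gains D y_N(phi) for a source encoded at azimuth phi."""
    y = circular_harmonics(order, phi)
    return [sum(d * ym for d, ym in zip(row, y)) for row in dec]


speakers = [math.radians(45 * i) for i in range(8)]
D = allrad_2d(speakers, order=3)
print([round(g, 3) for g in pan(D, 3, 0.0)])
```

For a 3D implementation, the pairwise search would become a search over the triangles of a convex-hull triangulation, and the azimuth grid would become a *t*-design.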

### 4.9.7 EPAD and AllRAD on Sub-optimal Layouts

### 4.9.8 Decoding to Hemispherical 3D Loudspeaker Layouts

In typical loudspeaker playback situations for a large audience, a solid floor and no loudspeakers below ear level are considered practical for several reasons. However, such a layout does not permit decoding by sampling with optimal *t*-design layouts covering all directions. As shown above, EPAD and AllRAD do not require such arrays. Nevertheless, they require some care when used with hemispherical loudspeaker layouts; see [15, 40] for further reading.

**EPAD with hemispherical loudspeaker layouts.** Even for a hemispherical layout, the energy-preserving decoding method requires \(\mathrm {L}\ge (\mathrm {N}+1)^2\) loudspeakers to achieve a perfectly panning-invariant loudness. However, this is counter-intuitive:

*Why should one need at least as many loudspeakers on a hemisphere as are required for same-order playback on a full sphere? Shouldn’t the number be half as many?*

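The reason is that spherical harmonics are not orthogonal on a hemisphere or spherical cap \(\mathcal {S}\). Their Gram matrix on \(\mathcal {S}\), \(\varvec{G}=\int _{\mathcal {S}}\varvec{y}_\mathrm {N}(\varvec{\theta })\,\varvec{y}_\mathrm {N}^\mathrm {T}(\varvec{\theta })\,\mathrm {d}\varvec{\theta }\), is not an identity matrix. Numerically, it can be approximated by summing over those directions of a dense *t*-design that lie within \(\mathcal {S}\). We diagonalize \(\varvec{G}\) by singular-value decomposition of the positive semi-definite matrix, \(\varvec{G}=\varvec{Q}\,\mathrm {diag}\{\varvec{s}\}\,\varvec{Q}^\mathrm {T}\), with \(\varvec{Q}^\mathrm {T}\varvec{Q}=\varvec{Q}\varvec{Q}^\mathrm {T}=\varvec{I}\). This yields new basis functions \(\varvec{\tilde{y}}_\mathrm {N}(\varvec{\theta })=\varvec{Q}^\mathrm {T}\varvec{y}_\mathrm {N}(\varvec{\theta })\), the so-called *Slepian* functions [42], which are orthogonal on \(\mathcal {S}\): \(\int _{\mathcal {S}}\varvec{\tilde{y}}_\mathrm {N}(\varvec{\theta })\,\varvec{\tilde{y}}_\mathrm {N}^\mathrm {T}(\varvec{\theta })\,\mathrm {d}\varvec{\theta }=\mathrm {diag}\{\varvec{s}\}\). Each Slepian function still has unit energy on the whole sphere, so its singular value is the fraction of its energy concentrated on \(\mathcal {S}\). Table 4.1 lists the integration ranges \(0\le \vartheta \le \vartheta _\mathrm {max}\), slightly larger than the hemisphere, that yield \((\mathrm {N}+1)(\mathrm {N}+2)/2\) Slepian functions with minimum loudness fluctuation for panning on the hemisphere. That is roughly half as many as the \((\mathrm {N}+1)^2\) spherical harmonics.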

**Table 4.1** Integration ranges \(0\le \vartheta \le \vartheta _\mathrm {max}\) to obtain \((\mathrm {N}+1)(\mathrm {N}+2)/2\) Slepian functions with minimum loudness fluctuation \(\frac{\mathrm {max}\,E}{\mathrm {min}\,E}\) for panning on the hemisphere

\(\mathrm {N}\) | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
---|---|---|---|---|---|---|---|---|---|
\(\vartheta _\mathrm {max}\) | \(115^\circ \) | \(114^\circ \) | \(113^\circ \) | \(111^\circ \) | \(108^\circ \) | \(106^\circ \) | \(104^\circ \) | \(102^\circ \) | \(101^\circ \) |
\(\frac{\mathrm {max}\,E}{\mathrm {min}\,E}\) | 0.5 dB | 0.4 dB | 0.4 dB | 0.3 dB | 0.3 dB | 0.3 dB | 0.2 dB | 0.3 dB | 0.2 dB |
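The following small numerical experiment shows what the singular values express. For brevity it is restricted to first order (\(\mathrm {N}=1\)). It rests on three assumptions:

- The diagonalized matrix is the Gram matrix \(\varvec{G}=\int _{\mathcal {S}}\varvec{y}_\mathrm {N}\varvec{y}_\mathrm {N}^\mathrm {T}\,\mathrm {d}\varvec{\theta }\) of real orthonormal spherical harmonics on a cap.
- The matrix is computed by plain quadrature rather than with a *t*-design.
- A Jacobi eigenvalue iteration supplies the singular values, which coincide with the eigenvalues for a symmetric positive semi-definite matrix.

The function names are hypothetical.

```python
import math


def first_order_sh(mu, phi):
    """Real orthonormal first-order spherical harmonics in ACN order (W, Y, Z, X);
    mu = cos(zenith angle), phi = azimuth."""
    s = math.sqrt(max(0.0, 1.0 - mu * mu))
    c0, c1 = 1.0 / math.sqrt(4 * math.pi), math.sqrt(3.0 / (4 * math.pi))
    return [c0, c1 * s * math.sin(phi), c1 * mu, c1 * s * math.cos(phi)]


def gram_matrix_cap(theta_max_deg, n_mu=400, n_phi=64):
    """G = int_S y(theta) y(theta)^T dtheta over the cap 0 <= zenith <= theta_max,
    using the midpoint rule in mu = cos(zenith) and a uniform azimuth grid."""
    mu0 = math.cos(math.radians(theta_max_deg))
    d_mu, d_phi = (1.0 - mu0) / n_mu, 2 * math.pi / n_phi
    g = [[0.0] * 4 for _ in range(4)]
    for i in range(n_mu):
        mu = mu0 + (i + 0.5) * d_mu
        for k in range(n_phi):
            y = first_order_sh(mu, k * d_phi)
            for a in range(4):
                for b in range(4):
                    g[a][b] += y[a] * y[b] * d_mu * d_phi
    return g


def jacobi_eigenvalues(a, tol=1e-20, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations,
    sorted in descending order."""
    a = [row[:] for row in a]
    n = len(a)
    for _ in range(max_sweeps):
        if sum(a[p][q] ** 2 for p in range(n) for q in range(n) if p != q) < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P: rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A: rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)


s = jacobi_eigenvalues(gram_matrix_cap(90.0))
print([round(v, 3) for v in s])
```

On the exact hemisphere (\(\vartheta _\mathrm {max}=90^\circ \)), the values are \(\frac{1}{2}+\frac{\sqrt{3}}{4}\), \(\frac{1}{2}\), \(\frac{1}{2}\) and \(\frac{1}{2}-\frac{\sqrt{3}}{4}\). Three functions thus keep most of their energy on the cap and one does not, consistent with \((\mathrm {N}+1)(\mathrm {N}+2)/2=3\). Passing \(115^\circ \) from Table 4.1 instead shows how the slightly larger cap shifts the values.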

**AllRAD with hemispherical loudspeaker layouts.** Because of the vector-base amplitude panning involved, all-round Ambisonic decoding (AllRAD) is comparatively robust to irregular loudspeaker setups. Still, a hemispherical layout has no loudspeaker direction vector pointing into the lower half space, so one could simply omit the information of the lower half space. However, the Ambisonic panning function implies a directional spread: panning exactly to the horizon also produces content below it. Omitting that content causes (i) a loss in loudness and (ii) a slight elevation of the perceived direction, cf. Fig. 4.18. As discussed in the section on triangulation, Sect. 3.3, inserting imaginary loudspeakers fixes this behavior. For hemispherical loudspeaker layouts, it is not necessary to downmix the signal of the imaginary loudspeaker at nadir to stabilize both loudness and localization for panning to the horizon.

Signal contributions below but close to the horizon mainly feed the horizontal loudspeakers. It is therefore safe to discard the signal that would feed the imaginary loudspeaker at nadir, without loss of loudness. Moreover, this contribution from below reinforces the signals of the horizontal loudspeakers, so that localization is pulled back down. Both effects can be observed in Fig. 4.18. It shows the loudness measure *E*, as well as mislocalization and width by the measure \(\varvec{r}_\mathrm {E}\), for max-\(\varvec{r}_\mathrm {E}\)-weighted AllRAD with \(5\mathrm {th}\)-order Ambisonics along a vertical panning circle on the IEM mobile Ambisonics Array (mAmbA). The mAmbA consists of 25 loudspeakers arranged in rings of 8, 8, 4, 4, and 1 loudspeakers at 0, 20, 40, 60, and 90 degrees elevation. Rings two and four start at 0 degrees azimuth. The others are rotated half-way, i.e., by half their azimuthal spacing.
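The mAmbA geometry can be written down programmatically, e.g., to feed a decoder design. The following Python sketch makes two choices:

- It reads "half-way rotated" as an offset of half the ring's azimuthal spacing.
- It appends an imaginary loudspeaker at nadir for the triangulation, whose gain is simply discarded after panning (no downmix).

The convex-hull check is only a quick necessary condition, not a full hull test. All names are hypothetical.

```python
import math

# (elevation in degrees, number of loudspeakers, ring starts at 0 deg azimuth?)
MAMBA_RINGS = [(0, 8, False), (20, 8, True), (40, 4, False), (60, 4, True), (90, 1, True)]


def ring_layout(rings):
    """(azimuth, elevation) pairs in degrees; a ring that does not start at 0 deg
    is rotated by half its azimuthal spacing (our reading of 'half-way rotated')."""
    speakers = []
    for elevation, count, starts_at_zero in rings:
        spacing = 360.0 / count
        offset = 0.0 if starts_at_zero else spacing / 2
        for i in range(count):
            azimuth = (offset + i * spacing + 180.0) % 360.0 - 180.0  # wrap to [-180, 180)
            speakers.append((azimuth, float(elevation)))
    return speakers


def unit_vector(azimuth_deg, elevation_deg):
    """Cartesian direction (x front, y left, z up)."""
    az, el = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az), math.cos(el) * math.sin(az), math.sin(el))


def origin_may_be_enclosed(speakers):
    """Necessary condition only: if every loudspeaker lies at z >= 0, the origin
    cannot be strictly inside their convex hull."""
    return any(unit_vector(az, el)[2] < -1e-9 for az, el in speakers)


layout = ring_layout(MAMBA_RINGS)
# Imaginary loudspeaker at nadir, used only for the VBAP triangulation.
extended = layout + [(0.0, -90.0)]


def drop_imaginary(gains):
    """Keep only the gains of the real loudspeakers (the nadir signal is discarded)."""
    return gains[: len(layout)]
```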

**Performance comparison on hemispherical layouts.** Figure 4.18 compares AllRAD and EPAD decoding to the 25-channel mAmbA hemispherical loudspeaker layout. For panning on the hemisphere, AllRAD (top of Fig. 4.18) produces a loudness fluctuation spanning roughly 1 dB, while EPAD only exhibits 0.3 dB, as specified in Table 4.1. Loudness differences of less than 0.5 dB can be heard in monophonic playback of noise. Even so, it is safe to assume that a weak directional loudness fluctuation of less than 1 dB is normally inaudible. In this regard, loudness fluctuation should be no problem with either EPAD or AllRAD.

Concerning the directional mapping, EPAD produces a more strongly pronounced ripple. According to \(\varvec{r}_\mathrm {E}\), sounds panned to the horizon, \(\vartheta _\mathrm {s}=\pm 90^\circ \), are pulled upwards towards \(0^\circ \) more with EPAD (\(7^\circ \)) than with AllRAD (\(3^\circ \)). In terms of width, both EPAD and AllRAD exhibit the \({\approx }20^\circ \) average associated with max-\(\varvec{r}_\mathrm {E}\) weighting. However, EPAD also produces a greater fluctuation, and its width grows to about \(30^\circ \) for panning to the horizon, \(\vartheta _\mathrm {s}=\pm 90^\circ \).
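The quantities compared above can be computed directly from the loudspeaker gains of a decoder, sampled along a panning curve:

- the loudness \(E=\sum _l g_l^2\) in dB and its fluctuation \(\frac{\max E}{\min E}\),
- the directional error between \(\varvec{r}_\mathrm {E}\) and the panning direction,
- the width \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert \).

A minimal Python sketch, assuming real gains and unit loudspeaker direction vectors; the function names are illustrative.

```python
import math


def discrete_energy_measures(gains, directions):
    """Loudness E = sum g_l^2 (in dB), energy vector r_E = sum g_l^2 theta_l / E,
    and width arccos||r_E|| in degrees, for real gains and unit direction vectors."""
    energy = sum(g * g for g in gains)
    r_e = [sum(g * g * d[k] for g, d in zip(gains, directions)) / energy for k in range(3)]
    norm = math.sqrt(sum(v * v for v in r_e))
    return 10 * math.log10(energy), r_e, math.degrees(math.acos(min(1.0, norm)))


def direction_error_deg(r_e, target):
    """Angle between r_E and the unit panning direction `target`."""
    norm = math.sqrt(sum(v * v for v in r_e))
    cos_err = sum(a * b for a, b in zip(r_e, target)) / norm
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_err))))


def loudness_fluctuation_db(loudness_db):
    """Spread max E / min E in dB over a set of panning directions."""
    return max(loudness_db) - min(loudness_db)


# Two loudspeakers at +/-45 deg azimuth with equal gains, i.e. a source panned to the front.
dirs = [(math.cos(math.radians(a)), math.sin(math.radians(a)), 0.0) for a in (45, -45)]
e_db, r_e, width = discrete_energy_measures([1 / math.sqrt(2)] * 2, dirs)
print(round(e_db, 6), round(width, 3), round(direction_error_deg(r_e, (1.0, 0.0, 0.0)), 3))
```

In this toy case, the loudness is 0 dB, the width measure is \(45^\circ \), and the direction error vanishes. Evaluating the functions for many panning directions of a real decoder gives curves of the kind discussed for Fig. 4.18.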

Figure 4.19 shows the result of max-\(\varvec{r}_\mathrm {E}\)-weighted \(2\mathrm {nd}\)-order EPAD and \(5\mathrm {th}\)-order AllRAD for the \(4+5+0\) layout along a vertical panning curve. EPAD's perfectly constant loudness measure might be favored over AllRAD's loudness increase of almost \(+3\) dB at front and back. Still, AllRAD's lower directional error, narrower width mapping, greater flexibility, and simplicity have often proven clearly superior in practice.

## 4.10 Practical Studio/Sound Reinforcement Application Examples

This section analyzes how 3D Ambisonic amplitude panning, consisting of encoding and AllRAD, performs in studio applications (with typical setups of 2 m radius) and in sound reinforcement applications (for an audience of, e.g., 250 people). Application scenarios are sketched in [43], and further examples are given below, where the requirements of constant loudness and width are analyzed. Sound reinforcement requires a particularly large sweet area, so the \(\varvec{r}_\mathrm {E}\) vector model for off-center listening positions from Sect. 2.2.9 is used to depict the sweet area size.

Figures 4.20 and 4.22 show the loudness measure *E* in dB (left column) and the width measure \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert \) in degrees (right column). \(5\mathrm {th}\)-order max-\(\varvec{r}_\mathrm {E}\)-weighted AllRAD typically produces only minor directional mapping errors, so these aren't explicitly shown. However, Figs. 4.21 and 4.23 map the size of the sweet area of plausible localization. These maps illustrate how useful the systems are for listening areas that host the number of listeners targeted by either the studio or the sound reinforcement application.

Figure 4.20 illustrates AllRAD's tendency to attenuate signals in too closely spaced loudspeaker groups, such as the front section of the ITU [44] \(4+5+0\) layout. By contrast, the mAmbA layout in Fig. 4.22 only has 8 loudspeakers on the horizon, and signals panned to the large below-horizon triangles tend to get louder. Moreover, loudspeaker systems with many channels, such as the IEM CUBE, mAmbA, Lobby, and Ligeti Hall in Fig. 4.22, more easily yield smooth loudness and width mappings. Still, even with only a few loudspeakers, slight direction adjustments in the layout can fix some of this behavior. An example is the IEM Production Studio, whose elevated layer with loudspeakers at \({\pm }45^\circ \) is superior to a \({\pm } 30^\circ \) spacing.

Idealization is sometimes a useful hint for designing good decoders: it is often better to disregard the true loudspeaker locations and to feed the decoder design with idealized positions instead. This trades slight directional distortions for a more uniform loudness distribution. For instance, at the IEM CUBE, the loudspeaker locations of the horizontal ring could be idealized to a \(30^\circ \) spacing to get a smoother loudness mapping than the one shown in Fig. 4.22.
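Idealization can be as simple as snapping measured azimuths to a regular grid before the decoder is designed. The Python sketch below assumes a \(30^\circ \) grid and refuses idealizations in which two loudspeakers would coincide. The measured values are hypothetical and are not IEM CUBE data.

```python
def idealize_azimuths(measured_deg, spacing_deg=30.0):
    """Snap measured azimuths to the nearest multiple of spacing_deg, wrapped to
    [-180, 180); refuse results in which two loudspeakers would coincide."""
    ideal = [(round(az / spacing_deg) * spacing_deg + 180.0) % 360.0 - 180.0
             for az in measured_deg]
    if len(set(ideal)) != len(ideal):
        raise ValueError("two loudspeakers snap to the same direction")
    return ideal


# Hypothetical measured azimuths of part of a horizontal ring.
print(idealize_azimuths([2.0, 28.5, 61.0, -88.0, 179.0]))  # [0.0, 30.0, 60.0, -90.0, -180.0]
```

Only the decoder design uses the idealized positions. The loudspeakers themselves stay where they are, which is where the slight directional distortion comes from.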

## 4.11 Ambisonic Decoding to Headphones

Ambisonic decoding to headphones can typically be done in the same way as decoding to loudspeakers, except that the loudspeaker signals are rendered to headphones by convolution with the head-related impulse responses (HRIRs) of the corresponding playback directions. Various databases of such HRIRs can be found, e.g., on the SOFA-conventions website.^{2} This headphone decoding approach classically uses a small set of so-called *virtual loudspeakers*. It appears in many places in the technical literature, e.g., in the pioneering works of Jean-Marc Jot et al. [9] and Jérôme Daniel [10]. It is also relevant in many other important works [18, 45, 46] and in the SADIE project,^{3} and it is employed in Sect. 1.4.2 on first-order Ambisonics.
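The virtual-loudspeaker approach amounts to a matrix multiplication followed by convolution:

1. The decoder \(\varvec{D}\) turns the Ambisonic signals into loudspeaker signals.
2. Each loudspeaker signal is convolved with the left and right HRIR of its direction.
3. The convolved signals are summed per ear.

The following Python sketch uses plain lists and direct convolution for clarity. Loading HRIRs from SOFA files is left out, and the function names are hypothetical.

```python
def convolve(x, h):
    """Direct-form linear convolution of two sequences (lists of floats)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def render_binaural(ambi, decoder, hrirs_left, hrirs_right):
    """Virtual-loudspeaker rendering: loudspeaker signals s = D x, then each ear
    signal is the sum over loudspeakers of HRIR_l convolved with s_l.

    ambi: list of Ambisonic channel signals of equal length,
    decoder: one row of channel weights per virtual loudspeaker,
    hrirs_left/right: one impulse response per virtual loudspeaker."""
    n = len(ambi[0])
    speaker_signals = [[sum(w * ch[t] for w, ch in zip(row, ambi)) for t in range(n)]
                       for row in decoder]
    ears = []
    for hrirs in (hrirs_left, hrirs_right):
        ear = [0.0] * (n + max(len(h) for h in hrirs) - 1)
        for sig, h in zip(speaker_signals, hrirs):
            for t, v in enumerate(convolve(sig, h)):
                ear[t] += v
        ears.append(ear)
    return ears[0], ears[1]


# Toy example: one omnidirectional channel, two virtual loudspeakers, 1-2 tap "HRIRs".
left, right = render_binaural([[1.0, 0.0, 0.0]], [[1.0], [1.0]],
                              [[1.0], [0.0, 0.5]], [[0.0, 1.0], [0.0]])
print(left, right)
```

In practice, HRIRs have hundreds of taps, and FFT-based convolution would replace the direct form used here.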

However, as outlined in some research papers [9, 18, 46], these approaches share a problem: low-order Ambisonic synthesis is problematic. With a *dense grid* of virtual-loudspeaker HRIRs, the Ambisonic smoothing attenuates high frequencies at frontal and dorsal directions. With a *coarse grid* of virtual-loudspeaker HRIRs, which had been the solution for a long time, high frequencies are not attenuated, but spatial quality strongly depends on the particular grid layout or orientation [46]. An early paper by Jot [9] proposed to remove the time delays of the HRIRs before Ambisonic decomposition and to re-insert the otherwise missing interaural time delay afterwards for any sound panned in Ambisonics. Unfortunately, this yields an *object-based* panning system rather than a *scene-based* Ambisonic system.

Some dense-grid approaches propose to keep the HRIR time delays, or, formulated in the frequency domain, the phases of the head-related transfer functions (HRTFs). They thereby stay in a scene-based Ambisonic format and correct spectral deficiencies by diffuse-field or interaural-covariance equalization [18, 47]. Finally, the most recent solutions, proposed by Jin, Sun, and Epain [17, 48] and by Zaunschirm, Schörkhuber, and Höldrich [20, 21], modify the HRIR time delays/HRTF phases, but only above, e.g., 3 kHz, and without any object-based re-insertion afterwards. Omitting the high-frequency interaural time-delay/phase information is a reasonable trade-off in favor of accuracy in spectral magnitude, which is more important.

**What does directional HRIR smoothing do to high frequencies?** The geometrical theory of diffraction [49] suggests that HRIRs must always contain at least the delay to the ear of either the shortest direct path or the shortest indirect path via the surface of the head. The Woodworth-Schlosberg formula [50] is built on this consideration for a spherical head model with radius \(\mathrm {R}=0.0875\) m and speed of sound \(c=343\) \(\frac{\mathrm {m}}{\mathrm {s}}\), see Fig. 4.24. The left ear receives a distant horizontal sound from the azimuth interval \(0\le \phi \le \frac{\pi }{2}\) as direct sound, arriving early by \(\frac{\mathrm {R}}{c}\sin \phi \). For \(-\frac{\pi }{2}<\phi \le 0\), it receives the sound indirectly, delayed by \(-\frac{\mathrm {R}}{c}\,\phi \):

\[\tau (\phi )=\begin{cases}-\frac{\mathrm {R}}{c}\,\sin \phi , & 0\le \phi \le \frac{\pi }{2},\\ -\frac{\mathrm {R}}{c}\,\phi , & -\frac{\pi }{2}<\phi \le 0,\end{cases}\]

cf. the measured HRIRs^{4} in Fig. 4.25b.
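The sphere-model delays are easy to evaluate. The Python sketch below implements the two branches of the Woodworth-Schlosberg formula for the left ear with the text's \(\mathrm {R}\) and \(c\). It applies them to 3D directions through the lateral angle \(\phi =\pm \arcsin \theta _\mathrm {y}\), the mapping the text states for the left and right ear in Sect. 4.11.1. That mapping makes rear directions behave like their frontal mirror images. The closed interval at \(\phi =-\frac{\pi }{2}\) is a choice of this sketch, and the function names are hypothetical.

```python
import math

R = 0.0875  # head radius in m, as in the text
C = 343.0   # speed of sound in m/s, as in the text


def woodworth_delay(phi):
    """Left-ear delay (s) relative to the head center for a distant source at
    lateral angle phi in [-pi/2, pi/2] (positive towards the left ear)."""
    if 0.0 <= phi <= math.pi / 2:
        return -R / C * math.sin(phi)   # direct path: arrives earlier
    if -math.pi / 2 <= phi < 0.0:
        return -R / C * phi             # path around the head: arrives later
    raise ValueError("phi must lie in [-pi/2, pi/2]")


def ear_delays(direction):
    """(left, right) delays for a unit vector (x front, y left, z up), using the
    lateral angle phi = +/- arcsin(theta_y) for the left/right ear."""
    phi = math.asin(max(-1.0, min(1.0, direction[1])))
    return woodworth_delay(phi), woodworth_delay(-phi)


left, right = ear_delays((0.0, 1.0, 0.0))       # source on the left ear axis
print(f"ITD = {1e6 * (right - left):.0f} us")    # R/c (1 + pi/2), about 656 us
```

The resulting interaural time difference for a fully lateral source, \(\frac{\mathrm {R}}{c}(1+\frac{\pi }{2})\), is the classic maximum of this model.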

Directional smoothing of the discrete directional HRTFs causes relevant spectral problems, regardless of whether the smoothing is done by Ambisonics, VBAP, or MDAP. The geometric delay in the HRIRs is mainly responsible for the emerging comb-filter or low-pass behavior. One could pull out the linear phase trend above a frequency limit and re-insert it afterwards. But is re-insertion necessary?

### 4.11.1 High-Frequency Time-Aligned Binaural Decoding (TAC)

As a prerequisite for their binaural Ambisonic decoders, Schörkhuber et al. [21] tested above which frequency removing the HRTF linear-phase trend remains inaudible in direct HRTF-based rendering, without panning or smoothing. Most of their listeners could not detect the missing linear-phase trend when it was removed above 3 kHz. The sound examples were drums, speech, and pink noise, rendered at directions \(10^\circ \), \(-45^\circ \), \(80^\circ \), and \(-130^\circ \). The subjects compared the result to a reference with unaltered HRTFs, and the results are analyzed in Fig. 4.28.
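Removing a linear-phase trend above a cutoff means multiplying the HRTF bins above that frequency by \(e^{+\mathrm {j}2\pi f\tau }\), where \(\tau \) is the delay to be removed. Here \(e^{-\mathrm {j}2\pi f\tau }\) is taken to be the phase of a delay. The Python sketch below makes that operation explicit under three assumptions:

- The switch at the cutoff is hard.
- The default cutoff is the 3 kHz from the listening test.
- The toy HRTF is a pure delay.

The delay could come, e.g., from the sphere model above. Which delay estimate and which transition the cited decoders use is not claimed here.

```python
import cmath
import math


def remove_delay_above(hrtf, freqs, tau, f_cut=3000.0):
    """Multiply HRTF bins at and above f_cut by exp(+j 2 pi f tau), removing the
    linear-phase trend exp(-j 2 pi f tau) of a delay tau (s); lower bins unchanged.
    The hard switch at f_cut is a simplification of this sketch."""
    return [h * cmath.exp(2j * math.pi * f * tau) if f >= f_cut else h
            for h, f in zip(hrtf, freqs)]


# Toy HRTF: a pure delay of 400 us, evaluated at a few frequencies.
freqs = [500.0, 1000.0, 3000.0, 8000.0]
tau = 400e-6
hrtf = [cmath.exp(-2j * math.pi * f * tau) for f in freqs]
aligned = remove_delay_above(hrtf, freqs, tau)
print([round(cmath.phase(h), 3) for h in aligned])
```

The bins below the cutoff keep their original phase, so low-frequency interaural time differences are preserved. Above the cutoff, the delay-induced phase is removed.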

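For arbitrary directions \(\varvec{\theta }\) in 3D, the sphere model only depends on the angle to the interaural axis, i.e., the *y* axis. This angle is \(\arccos (\pm \theta _\mathrm {y})\) for the left and right ear, respectively. Shifted by \(90^\circ \), it yields the lateral angle \(\phi =\pm \arcsin \theta _\mathrm {y}\).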

An exemplary result for measured HRTFs^{5} is shown in Fig. 4.29. Above 2 kHz, the resulting polar patterns of the time-aligned decomposition (ta) clearly outperform the linear decomposition (lin) in representing the original HRTFs (max).

### 4.11.2 Magnitude Least Squares (MagLS)

In practice, however, results turn out to be perfect already with an iterative combination of the reconstructed phase \(\hat{\phi }_{l,k-1}\) from the previous frequency \(\omega _{k-1}\) with the HRTF magnitude \(|h_{k,l}|\) of the current frequency \(\omega _k\), before a linear decomposition thereof into spherical harmonic coefficients \(\varvec{\hat{h}}_{\mathrm {SH},k}\).

In the exemplary results for \(\mathrm {N}=3\) in Fig. 4.29, the MagLS approach (mls) outperforms the time-alignment approach (ta). The advantage is largest at the highest frequencies, where the sphere-model-based delay simplification is no longer sufficiently helpful.
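The iterative MagLS idea combines the current HRTF magnitude with the phase reconstructed at the previous frequency, then fits that target linearly. The following 2D Python sketch shows the idea under several simplifying assumptions:

- Complex circular harmonics \(e^{\mathrm {j}m\phi }\), \(|m|\le \mathrm {N}\), on a uniform 72-point azimuth grid replace spherical harmonics. On such a grid, the least-squares fit reduces to a discrete projection.
- The "HRTFs" are unit-magnitude pure delays from the sphere model, not measured data.
- The order \(\mathrm {N}=3\) and the 1 kHz switching frequency are toy choices.

All names are hypothetical.

```python
import cmath
import math

R, C = 0.0875, 343.0  # sphere-model head radius (m) and speed of sound (m/s)


def fit(values, azimuths, order):
    """Least-squares coefficients for e^{j m phi}, |m| <= order. On a uniform grid
    with more than 2*order points this reduces to a discrete projection."""
    n = len(values)
    return {m: sum(v * cmath.exp(-1j * m * a) for v, a in zip(values, azimuths)) / n
            for m in range(-order, order + 1)}


def evaluate(coeffs, azimuths):
    """Reconstruct the fitted function at the given azimuths."""
    return [sum(c * cmath.exp(1j * m * a) for m, c in coeffs.items()) for a in azimuths]


def magls(hrtfs, freqs, azimuths, order, f_cut):
    """Linear fit below f_cut; from f_cut on, fit |h_{k,l}| e^{j phi_hat_{l,k-1}},
    using the phase of the previous frequency's reconstruction."""
    result, prev_phase = [], None
    for h_k, f in zip(hrtfs, freqs):
        if f < f_cut or prev_phase is None:
            target = h_k
        else:
            target = [abs(h) * cmath.exp(1j * p) for h, p in zip(h_k, prev_phase)]
        coeffs = fit(target, azimuths, order)
        prev_phase = [cmath.phase(v) for v in evaluate(coeffs, azimuths)]
        result.append(coeffs)
    return result


def magnitude_error(coeffs, hrtf, azimuths):
    """RMS difference of reconstructed and original magnitudes over all directions."""
    rec = evaluate(coeffs, azimuths)
    return math.sqrt(sum((abs(r) - abs(h)) ** 2 for r, h in zip(rec, hrtf)) / len(hrtf))


def left_ear_delay(az):
    """Sphere-model delay with lateral angle asin(sin(az)) (front-back symmetric)."""
    phi = math.asin(max(-1.0, min(1.0, math.sin(az))))
    return -R / C * math.sin(phi) if phi >= 0 else -R / C * phi


azimuths = [2 * math.pi * k / 72 for k in range(72)]
freqs = [100.0 * k for k in range(1, 81)]  # 100 Hz ... 8 kHz
hrtfs = [[cmath.exp(-2j * math.pi * f * left_ear_delay(a)) for a in azimuths] for f in freqs]
order = 3
lin = [fit(h, azimuths, order) for h in hrtfs]
mls = magls(hrtfs, freqs, azimuths, order, f_cut=1000.0)
print(magnitude_error(lin[-1], hrtfs[-1], azimuths), magnitude_error(mls[-1], hrtfs[-1], azimuths))
```

Below the switching frequency, both fits are identical. Above it, the magnitude-only target lets the low-order fit keep the magnitude, while the linear fit loses energy to the truncated orders. Because the toy HRTFs all have unit magnitude, this example mainly illustrates the mechanics. Fig. 4.29 is the relevant comparison for measured data.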

### 4.11.3 Diffuse-Field Covariance Constraint

Zaunschirm et al. [20] note that, for both of the above approaches that modify the high-frequency phase, low-order rendering degrades envelopment in diffuse fields. They therefore introduce an additional covariance constraint as defined by Vilkamo et al. [22]. It can be implemented as a \(2\times 2\) filter matrix that equalizes the resulting frequency-domain diffuse-field covariance matrix to that of the original HRTF dataset. On its main diagonal, this covariance matrix contains the diffuse-field ear sensitivities (left and right). Off-diagonal, it contains the diffuse-field interaural cross-correlation.
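At a single frequency, the constraint can be sketched as follows:

1. Compute the \(2\times 2\) diffuse-field covariance \(\frac{1}{J}\sum _j\varvec{h}_j\varvec{h}_j^\mathrm {H}\) of the original and of the rendered left/right HRTFs. This assumes uniformly distributed, equally weighted directions.
2. Find a matrix \(\varvec{M}\) with \(\varvec{M}\,\varvec{C}_\mathrm {rendered}\,\varvec{M}^\mathrm {H}=\varvec{C}_\mathrm {target}\).

The Python sketch below uses the simple Cholesky-based solution \(\varvec{M}=\varvec{K}_\mathrm {t}\varvec{K}_\mathrm {r}^{-1}\). The optimized solution of Vilkamo et al. additionally chooses a unitary factor that minimizes signal alteration, which is omitted here. The function names and the toy data are hypothetical.

```python
import math


def diffuse_covariance(h_left, h_right):
    """2x2 covariance (1/J) sum_j h_j h_j^H of [left, right] HRTFs at one frequency,
    assuming J uniformly distributed, equally weighted directions."""
    j = len(h_left)
    c11 = sum(abs(a) ** 2 for a in h_left) / j
    c22 = sum(abs(b) ** 2 for b in h_right) / j
    c12 = sum(a * b.conjugate() for a, b in zip(h_left, h_right)) / j
    return [[complex(c11), c12], [c12.conjugate(), complex(c22)]]


def cholesky_2x2(c):
    """Lower-triangular K with K K^H = C (C Hermitian positive definite)."""
    k11 = math.sqrt(c[0][0].real)
    k21 = c[1][0] / k11
    k22 = math.sqrt(c[1][1].real - abs(k21) ** 2)
    return [[complex(k11), 0j], [k21, complex(k22)]]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def hermitian(m):
    return [[m[j][i].conjugate() for j in range(2)] for i in range(2)]


def covariance_matching_filter(c_target, c_rendered):
    """M = K_t K_r^{-1}, so that M C_rendered M^H = C_target (no unitary optimization)."""
    kt, kr = cholesky_2x2(c_target), cholesky_2x2(c_rendered)
    a, b, d = kr[0][0], kr[1][0], kr[1][1]
    kr_inv = [[1 / a, 0j], [-b / (a * d), 1 / d]]  # inverse of a lower-triangular 2x2
    return matmul(kt, kr_inv)
```

In a decoder, one such matrix would be computed per frequency bin and applied to the two ear signals.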

## 4.12 Practical Free-Software Examples

### 4.12.1 Pd and Circular/Spherical Harmonics

For decoding to headphones, programming in Pd looks much like the first-order example in Fig. 1.14, except that more HRIRs, matching the respective loudspeaker positions, need to be employed. Working in three dimensions in Pd is likewise similar to the corresponding first-order example of Fig. 1.15, using the matrix object [mtx_spherical_harmonics]. To keep programming simple, pre-calculated decoders including AllRAD and max-\(\varvec{r}_\mathrm {E}\) weighting are typically loaded into Pd, e.g., by [mtx D.mtx].

### 4.12.2 Ambix Encoder, IEM MultiEncoder, and IEM AllRADecoder

The term *channel-based* [52] typically refers to multi-channel material in which each channel is meant to be played back on a separate loudspeaker of clearly defined direction, cf. [44]. Elsewhere, the embedding of virtual playback directions is also referred to as *beds* or *virtual panning spots*.

The IEM AllRADecoder lets users enter or import the loudspeaker coordinates and channel indices, with the coordinates specified by azimuth and elevation angles in degrees. Fig. 4.33 shows this for the IEM production studio. The figure also shows that entering only the pure \(5+7+0\) layout would produce the error message *Point of origin not within convex hull. Try adding imaginary loudspeakers.*

### 4.12.3 Reaper, IEM RoomEncoder, and IEM BinauralDecoder

Particularly for headphone-based listening, rendering of anechoic sounds will typically not *externalize* well, as it does not match the mental expectation of ordinary listening environments [53, 54, 55, 56]. One way to avoid the resulting in-head localization, and to obtain the desired external sound image, is the IEM RoomEncoder plugin, see Fig. 4.36. It is based on an image-source room model and encodes first-order wall reflections, with their reflection factors and propagation delays, together with the desired direct sound.
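The quantities an image-source model supplies for each first-order reflection are its reflection factor, propagation delay, and direction. The Python sketch below computes them for a shoebox room by mirroring the source at each of the six walls. It is not the RoomEncoder's code, and its details are assumptions:

- The room dimensions and positions are hypothetical.
- The frequency-independent reflection factor of 0.8 is hypothetical.
- The distance law is \(1/r\).

Encoding each path into Ambisonics at its direction, with its gain and delay, is left out.

```python
import math


def first_order_image_sources(room, source, listener, reflection=0.8, c=343.0):
    """Direct sound plus the six first-order image sources of a shoebox room
    [0, Lx] x [0, Ly] x [0, Lz] (x front, y left, z up; listener-centred angles).
    Returns (gain, delay_s, azimuth_deg, elevation_deg) per path; the gain is the
    reflection factor (1 for the direct path) times 1/distance."""
    paths = [(list(source), 1.0)]
    for axis in range(3):
        for wall in (0.0, room[axis]):
            image = list(source)
            image[axis] = 2.0 * wall - source[axis]  # mirror the source at the wall plane
            paths.append((image, reflection))
    result = []
    for position, factor in paths:
        d = [p - q for p, q in zip(position, listener)]
        dist = math.sqrt(sum(v * v for v in d))
        result.append((factor / dist, dist / c,
                       math.degrees(math.atan2(d[1], d[0])),
                       math.degrees(math.asin(d[2] / dist))))
    return result


paths = first_order_image_sources((6.0, 4.0, 3.0), (4.0, 2.0, 1.5), (2.0, 2.0, 1.5))
for gain, delay, az, el in paths:
    print(f"gain {gain:.3f}  delay {1e3 * delay:.2f} ms  az {az:6.1f}  el {el:5.1f}")
```

The floor reflection arrives from below and the ceiling reflection from above. This directional and temporal detail is what the purely anechoic direct sound lacks.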

The IEM BinauralDecoder implements the MagLS approach for Ambisonic decoding, see Fig. 4.37. It uses the KU100 measurements from the Cologne University of Applied Sciences and, optionally, their headphone equalization curves.

## Footnotes

- 1. In detail, this follows from
- 2.
- 3.
- 4. Data HRIR_CIRC360.sofa from http://sofacoustics.org/data/database/thk.
- 5. Data HRIR_L2702.sofa from http://sofacoustics.org/data/database/thk.

## References

- 1. J.S. Bamford, Ambisonic sound for the masses (1994)
- 2. D.H. Cooper, T. Shiga, Discrete-matrix multichannel stereo. J. Audio Eng. Soc. **20**(5), 346–360 (1972)
- 3. P. Felgett, Ambisonic reproduction of directionality in surround-sound systems. Nature **252**, 534–538 (1974)
- 4. M.A. Gerzon, The design of precisely coincident microphone arrays for stereo and surround sound, in *prepr. L-20 of 50th Audio Engineering Society Convention* (1975)
- 5. P. Craven, M.A. Gerzon, Coincident microphone simulation covering three dimensional space and yielding various directional outputs, U.S. Patent, no. 4,042,779 (1977)
- 6. J.S. Bamford, An analysis of ambisonic sound systems of first and second order, Master's thesis, University of Waterloo, Ontario (1995)
- 7. D.G. Malham, A. Myatt, 3D sound spatialization using ambisonic techniques. Comput. Music J. **19**(4), 58–70 (1995)
- 8. M.A. Poletti, The design of encoding functions for stereophonic and polyphonic sound systems. J. Audio Eng. Soc. **44**(11), 948–963 (1996)
- 9. J.-M. Jot, V. Larcher, J.-M. Pernaux, A comparative study of 3-d audio encoding and rendering techniques, in *16th AES Conference* (Rovaniemi, 1999)
- 10. J. Daniel, Représentation des champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia, Ph.D. dissertation, Université Paris 6 (2001)
- 11. D.B. Ward, T.D. Abhayapala, Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Trans. Speech Audio Process. **9**(6), 697–707 (2001)
- 12. G. Dickins, Sound field representation, reconstruction and perception, Master's thesis, Australian National University, Canberra (2003)
- 13. A. Sontacchi, Neue Ansätze der Schallfeldreproduktion, Ph.D. dissertation, TU Graz (2003)
- 14. M.A. Poletti, Robust two-dimensional surround sound reproduction for nonuniform loudspeaker layouts. J. Audio Eng. Soc. **55**(7/8), 598–610 (2007)
- 15. F. Zotter, H. Pomberger, M. Noisternig, Energy-preserving ambisonic decoding. Acta Acust. United Acust. **98**(11), 37–47 (2012)
- 16. J.-M. Batke, F. Keiler, Using vbap-derived panning functions for 3d ambisonics decoding, in *2nd International Symposium on Ambisonics and Spherical Acoustics* (Paris, 2010)
- 17. D. Sun, *Generation and perception of three-dimensional sound fields using higher order ambisonics* (University of Sydney, School of Electrical and Information Engineering, 2013). Ph.D. thesis
- 18. Z. Ben-Hur, F. Brinkmann, J. Sheaffer, S. Weinzierl, B. Rafaely, Spectral equalization in binaural signals represented by order-truncated spherical harmonics. J. Acoust. Soc. Am. **141**(6) (2017)
- 19. F. Brinkmann, S. Weinzierl, Comparison of head-related transfer functions pre-processing techniques for spherical harmonics decomposition, in *AES Conference AVAR* (Redmond, 2018)
- 20. M. Zaunschirm, C. Schörkhuber, R. Höldrich, Binaural rendering of ambisonic signals by head-related impulse response time alignment and a diffuseness constraint. J. Acoust. Soc. Am. **143**(6), 3616–3627 (2018)
- 21. C. Schörkhuber, M. Zaunschirm, R. Höldrich, Binaural rendering of ambisonic signals via magnitude least squares, in *Fortschritte der Akustik - DAGA* (Munich, 2018)
- 22. J. Vilkamo, T. Bäckström, A. Kuntz, Optimized covariance domain framework for time-frequency processing of spatial audio. J. Audio Eng. Soc. **61**(6), 403–411 (2013)
- 23. F.W.J. Olver, R.F. Boisvert, C.W. Clark (eds.), *NIST Handbook of Mathematical Functions* (Cambridge University Press, Cambridge, 2010), http://dlmf.nist.gov. Accessed June 2012
- 24. J. Daniel, J.-B. Rault, J.-D. Polack, Acoustic properties and perceptive implications of stereophonic phenomena, in *AES 6th International Conference: Spatial Sound Reproduction* (1999)
- 25. M. Frank, How to make ambisonics sound good, in *Forum Acusticum, Krakow* (2014)
- 26. M. Frank, F. Zotter, Extension of the generalized tangent law for multiple loudspeakers, in *Fortschritte der Akustik - DAGA* (Kiel, 2017)
- 27. M. Frank, A. Sontacchi, F. Zotter, Localization experiments using different 2d ambisonic decoders, in *25th Tonmeistertagung* (2008)
- 28. P. Stitt, S. Bertet, M.v. Walstijn, Off-centre localisation performance of ambisonics and hoa for large and small loudspeaker array radii. Acta Acust. United Acust. **100**(5), 937–944 (2014)
- 29. M. Frank, F. Zotter, Exploring the perceptual sweet area in ambisonics, in *AES 142nd Convention* (2017)
- 30. ISO 31-11:1978, Mathematical signs and symbols for use in physical sciences and technology (1978)
- 31. ISO 80000-2, Quantities and units – Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)
- 32. R.H. Hardin, N.J.A. Sloane, McLaren's improved snub cube and other new spherical designs in three dimensions. Discret. Comput. Geom. **15**, 429–441 (1996), http://neilsloane.com/sphdesigns/dim3/
- 33. M. Gräf, D. Potts, On the computation of spherical designs by a new optimization approach based on fast spherical fourier transforms. Numer. Math. **119** (2011), http://homepage.univie.ac.at/manuel.graef/quadrature.php
- 34. R.S. Womersley, Efficient spherical designs with good geometric properties, in *Contemporary Computational Mathematics – A Celebration of the 80th Birthday of Ian Sloan* (Springer, Berlin, 2018), pp. 1243–1285
- 35. A. Solvang, Spectral impairment of 2d higher-order ambisonics. J. Audio Eng. Soc. **56**(4) (2008)
- 36. M.A. Gerzon, G.J. Barton, Ambisonic decoders for HDTV, in *prepr. 3345, 92nd AES Convention* (Vienna, 1992)
- 37. J. Daniel, J.-B. Rault, J.-D. Polack, Ambisonics encoding of other audio formats for multiple listening conditions, in *prepr. 4795, 105th AES Convention* (San Francisco, 1998)
- 38. M.A. Poletti, A unified theory of horizontal holographic sound systems. J. Audio Eng. Soc. **48**(12), 1155–1182 (2000)
- 39. M.A. Poletti, *Three-dimensional surround sound systems based on spherical harmonics* (J. Audio Eng. Soc., 2005)
- 40. F. Zotter, M. Frank, *All-round ambisonic panning and decoding* (J. Audio Eng. Soc., 2012)
- 41. F. Zotter, M. Frank, Ambisonic decoding with panning-invariant loudness on small layouts (allrad2), in *144th AES Convention, prepr. 9943* (Milano, 2018)
- 42. R. Pail, G. Plank, W.-D. Schuh, Spatially restricted data distributions on the sphere: the method of orthonormalized functions and applications. J. Geodesy **75**, 44–56 (2001)
- 43. M. Frank, A. Sontacchi, Case study on ambisonics for multi-venue and multi-target concerts and broadcasts. J. Audio Eng. Soc. **65**(9) (2017)
- 44. ITU, *Recommendation BS.2051: Advanced sound system for programme production* (ITU, 2018)
- 45. M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich, A 3d ambisonic based binaural sound reproduction system, in *24th AES Conference* (Banff, 2003)
- 46. B. Bernschütz, A.V. Giner, C. Pörschmann, J. Arend, Binaural reproduction of plane waves with reduced modal order. Acta Acust. United Acust. **100**(5) (2014)
- 47. S. Delikaris-Manias, J. Vilkamo, Adaptive mixing of excessively directive and robust beamformers for reproduction of spatial sound, in *Parametric Time-Frequency Domain Spatial Audio*, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis (Wiley, New Jersey, 2017)
- 48. C.T. Jin, N. Epain, D. Sun, Perceptually motivated binaural rendering of higher order ambisonic sound scenes, in *Fortschritte der Akustik AIA-DAGA* (Merano, March 2013)
- 49. J.B. Keller, Geometrical theory of diffraction. J. Opt. Soc. Am. **52**(2), 116–130 (1962)
- 50. R.S. Woodworth, H. Schlosberg, *Experimental Psychology* (Holt, Rinehart and Winston, 1954)
- 51. P.W. Kassakian, Convex approximation and optimization with applications in magnitude filter design and radiation pattern synthesis, Ph.D. dissertation, EECS Department, University of California, Berkeley (2006)
- 52. ITU, *Recommendation BS.2076: Audio Definition Model* (ITU, 2017)
- 53. S. Werner, G. Götz, F. Klein, Influence of head tracking on the externalization of auditory events at divergence between synthesized and listening room using a binaural headphone system, in *prepr. 9690, 142nd AES Convention* (Berlin, 2017)
- 54. F. Klein, S. Werner, T. Mayenfels, Influences of tracking on externalization of binaural synthesis in situations of room divergence. J. Audio Eng. Soc. **65**(3), 178–187 (2017)
- 55. J. Cubick, Investigating distance perception, externalization and speech intelligibility in complex acoustic environments, Ph.D. dissertation, DTU Copenhagen (2017)
- 56. G. Plenge, Über das Problem der Im-Kopf-Lokalisation. Acustica **26**(5), 241–252 (1972)

## Copyright information

**Open Access** This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.