Abstract
Already in the 1970s, the idea of using continuous harmonic functions of scalable resolution was described by Cooper and then Gerzon, who introduced the name Ambisonics. This chapter starts by reviewing properties of first-order horizontal Ambisonics, using an interpretation in terms of panning functions. It then develops the mathematical formulations required for 3D higher-order Ambisonics, with the aim of improving the directional resolution. Based on this formalism, ideal loudspeaker layouts can be defined for constant loudness, localization, and width, according to the previously introduced models. The chapter discusses how Ambisonics can be decoded to less ideal, typical loudspeaker setups for studios, concerts, and sound-reinforcement systems, as well as to headphones. The behavior is analyzed by a rich variety of listening experiments and for various decoding applications. The chapter concludes with example applications using free software tools.
...the second-order Ambisonic system offers improved imaging over a wider area than the first-order system and is suitable for larger rooms.
Jeffrey S. Bamford [1], Canadian Acoustics, 1994.
Cooper [2] used higher-order angular harmonics to formulate circular panning of auditory events. Due to the work of Fellgett [3], Gerzon [4], and Craven [5], the term Ambisonics became common for technology using spherical harmonic functions. Around the early 2000s, most notably Bamford [6], Malham [7], Poletti [8], Jot [9], and Daniel [10] pioneered the development of higher-order Ambisonic panning and decoding, together with Ward and Abhayapala [11], Dickins [12], and, at the authors' lab, Sontacchi [13].
Another leap happened around 2010, when Ambisonic decoding to loudspeakers was greatly improved by regularization methods [14], singular-value decomposition [15], and all-round Ambisonic decoding (AllRAD) [15, 16]. The latter combines vector-base panning techniques with Ambisonics and yields the most robust and flexible higher-order decoding method known today.
For headphones, after Jot [9] outlined the basic problems of binaural decoding in the 1990s, Sun, Bernschütz, Ben-Hur, and Brinkmann [17,18,19] made important contributions to binaural decoding, and we consider the TAC and MagLS decoders by Zaunschirm and Schörkhuber [20, 21] as the essential binaural decoders. To avoid spectral artifacts at high frequencies, they remove HRTF delays (TAC) or optimize HRTF phases (MagLS). By interaural covariance correction using the formalism of Vilkamo et al. [22], MagLS/TAC manage to play back diffuse fields consistently.
4.1 Direction Spread in First-Order 2D Ambisonics
In 2D first-order Ambisonics as discussed in Chap. 1, the directional mapping of a single sound source from the angle \(\varphi _\mathrm {s}\) to the direction of each loudspeaker \(\varphi \) is described by the shape of the panning function (or direction-spread function) in Eq. (1.17). The directional spread is not infinitely narrow but limited to what first-order directivity patterns can represent. Consequently, sound from the angle \(\varphi _\mathrm {s}\) is mapped by a dipole pattern aligned with the source plus an omnidirectional pattern. A spread parameter a makes the directional spread onto the loudspeaker system adjustable, yielding a cardioid (\(a=1\)), a 2D-supercardioid (\(a=\sqrt{2}\)), or a 2D-hypercardioid (\(a=2\)) shape:
\[g(\varphi)=1+a\cos(\varphi-\varphi_\mathrm{s}).\]
This function represents how first-order Ambisonic panning distributes a mono signal to loudspeakers. With the loudspeaker positions described by the set of angles \(\{\upvarphi _l\}\), a vector of amplitude-panning gains with one entry per loudspeaker is determined by sampling the direction-spread function:
\[\varvec{g}=\bigl[g(\upvarphi_1),\,\dots,\,g(\upvarphi_\mathrm{L})\bigr]^\mathrm{T}.\]
With these gain values, we evaluate models of perceived loudness, direction, and width, as introduced in Chap. 2, in order to enter a discussion of perceptual goals.
If the loudspeaker directions \(\{\varvec{\uptheta }_l\}\) are chosen suitably, it is possible to obtain panning-independent loudness, direction, and width measures \(E=\sum _l g_l^2\), \(\varvec{r}_\mathrm {E}=\frac{1}{E}\sum _l g_l^2\varvec{\uptheta }_l\), and \(\frac{5}{8}\frac{180^\circ }{\pi }\,2\arccos \Vert \varvec{r}_\mathrm {E}\Vert \). How is it done?
For first-order 2D Ambisonics, it is theoretically optimal to use at least a ring of 4 loudspeakers with uniform angular spacing and \(a=\sqrt{2}\), which is easily checked with the aid of a computer, cf. Fig. 4.1, and explained below and in Sect. 4.4.
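The claim can indeed be checked with a few lines of code. The sketch below samples the first-order panning function \(g(\varphi)=1+a\cos(\varphi-\varphi_\mathrm{s})\) with \(a=\sqrt{2}\) on uniform rings of \(\mathrm{L}=3\) and \(\mathrm{L}=4\) loudspeakers and sweeps the panning direction. The function names are illustrative, and the unnormalized form of g is an assumption that does not matter here, since E only scales and \(\varvec{r}_\mathrm{E}\) is independent of a constant gain.

```python
import math

A_MAX_RE = math.sqrt(2)  # first-order 2D max-r_E spread parameter


def ring(L):
    """Angles (radians) of L uniformly spaced loudspeakers on the horizon."""
    return [2 * math.pi * l / L for l in range(L)]


def first_order_gains(phi_s, angles, a=A_MAX_RE):
    """Sample g(phi) = 1 + a cos(phi - phi_s) at the loudspeaker angles."""
    return [1.0 + a * math.cos(p - phi_s) for p in angles]


def measures(gains, angles):
    """Loudness E = sum g^2, length and direction (radians) of r_E."""
    E = sum(g * g for g in gains)
    rx = sum(g * g * math.cos(p) for g, p in zip(gains, angles)) / E
    ry = sum(g * g * math.sin(p) for g, p in zip(gains, angles)) / E
    return E, math.hypot(rx, ry), math.atan2(ry, rx)


def sweep(L, steps=72):
    """(E, |r_E|, wrapped direction error) for panning directions around the circle."""
    angles = ring(L)
    out = []
    for k in range(steps):
        phi_s = 2 * math.pi * k / steps
        E, length, direction = measures(first_order_gains(phi_s, angles), angles)
        err = math.atan2(math.sin(direction - phi_s), math.cos(direction - phi_s))
        out.append((E, length, err))
    return out


for L in (3, 4):
    res = sweep(L)
    E_vals = [r[0] for r in res]
    lengths = [r[1] for r in res]
    max_err = max(abs(r[2]) for r in res)
    print(f"L={L}: E in [{min(E_vals):.4f}, {max(E_vals):.4f}], "
          f"|r_E| in [{min(lengths):.4f}, {max(lengths):.4f}], "
          f"max direction error {math.degrees(max_err):.2f} deg")
```

With four loudspeakers, E, \(\Vert\varvec{r}_\mathrm{E}\Vert=1/\sqrt{2}\), and the aiming should all come out panning-invariant up to rounding. With three loudspeakers, only E stays constant.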
Direction spread in FOA. The panning-function interpretation with its directional spread has some similarity to MDAP, which attempts to spread an amplitude-panned signal directionally, there by a discrete virtual spread of \({\pm }\alpha =\arccos \Vert \varvec{r}_\mathrm {E}\Vert \) around the panning direction. The virtual direction spread of first-order Ambisonics is instead described by its continuous panning function \(g(\varphi )\) in Eq. (4.1). To inspect the continuous function by the \(\varvec{r}_\mathrm {E}\) measure defined in Eq. (2.7), we may evaluate an integral over the panning function instead of the sum. Because of the symmetry around \(\varphi _\mathrm {s}\), we may set \(\varphi _\mathrm {s}=0\) for convenience, which by symmetry yields \(r_\mathrm {E,y}=0\), and evaluate
\[r_\mathrm{E,x}=\frac{\int_{-\pi}^{\pi}g^2(\varphi)\cos\varphi\,\mathrm{d}\varphi}{\int_{-\pi}^{\pi}g^2(\varphi)\,\mathrm{d}\varphi}=\frac{2a}{2+a^2}.\]
The maximum of \(r_\mathrm {E,x}=\frac{2a}{2+a^2}\) is found by \(\frac{\mathrm {d}}{\mathrm {d}a }r_\mathrm {E,x}=\frac{4+2a^2-4a^2}{(2+a^2)^2}=0\), hence at \(a=\sqrt{2}\). Consequently, the 2D max-\(\varvec{r}_\mathrm {E}\) weight \(a=\sqrt{2}\) yields \(r_\mathrm {E,x}=\frac{\sqrt{2}}{2}=\frac{1}{\sqrt{2}}\) and the angle \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert =45^\circ \). This resembles a 2D-MDAP-equivalent source spread of \({\pm }45^\circ \). Note that first-order Ambisonics cannot map to a smaller spread than this; only higher orders allow the spread to be reduced further, below \({\pm }45^\circ \).
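The closed form and its maximum can be cross-checked numerically. This sketch integrates the continuous panning function \(1+a\cos\varphi\) (aimed at \(\varphi_\mathrm{s}=0\)) with an equally spaced sum, which is exact for trigonometric polynomials of low degree, and then runs a coarse grid search over a. The helper name is illustrative.

```python
import math


def r_e_continuous(a, n=720):
    """r_E,x of the continuous panning function g(phi) = 1 + a cos(phi), aimed at 0.

    An equally spaced sum over one period integrates trigonometric polynomials
    of degree < n exactly, so this reproduces the integrals up to rounding.
    """
    num = den = 0.0
    for k in range(n):
        phi = -math.pi + 2 * math.pi * k / n
        g2 = (1.0 + a * math.cos(phi)) ** 2
        num += g2 * math.cos(phi)
        den += g2
    return num / den


# numeric integral versus the closed form 2a / (2 + a^2)
for a in (1.0, math.sqrt(2), 2.0):  # cardioid, 2D supercardioid, 2D hypercardioid
    print(f"a = {a:.4f}: numeric {r_e_continuous(a):.6f}, "
          f"closed form {2 * a / (2 + a * a):.6f}")

# coarse grid search for the maximizing spread parameter
best_a = max((0.01 * k for k in range(1, 400)), key=r_e_continuous)
spread = math.degrees(math.acos(r_e_continuous(best_a)))
print(f"maximum near a = {best_a:.2f}, spread +/-{spread:.1f} deg")
```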
Ideal loudspeaker layouts. The virtual, continuous first-order Ambisonic panning function not only has an ideal directional aiming and a panning-invariant width, but also a panning-invariant loudness measure. However, decoding to a physical loudspeaker setup can degrade this ideal behavior. For which loudspeaker layouts are these properties preserved by sampling decoding?
The 2D first-order Ambisonic components (W, X, Y) correspond to the \(\{1,\,\cos \varphi ,\,\sin \varphi \}\) patterns, a first-order Fourier series in the angle. If the playback directions are sampled by \(\mathrm {L}=3\) uniformly spaced loudspeakers on the horizon, the sampling theorem for this series is already fulfilled. Accordingly, Parseval’s theorem ensures a panning-invariant loudness E for any panning direction.
For an ideal \(\varvec{r}_\mathrm {E}\) measure, however, a uniformly spaced horizontal ring requires one more loudspeaker, \(\mathrm {L}\ge 4\). The concepts of circular/spherical polynomials and t-designs introduced in this chapter explain this increase in full. Briefly, \(g^2(\varphi )\) is a second-order expression; representing the ideal constant loudness \(E=\int g^2(\varphi )\,\mathrm {d}\varphi \) of the continuous panning function consistently after discretization, \(E=\frac{2\pi }{\mathrm {L}}\sum _l g_l^2\), therefore requires \(\mathrm {L}=3\) uniformly spaced loudspeakers, as argued before. By contrast, the expressions \(g^2(\varphi )\cos \varphi \) and \(g^2(\varphi )\sin \varphi \) appearing in \(\varvec{r}_\mathrm {E}\cdot E=\int g^2(\varphi )\,[\cos \varphi ,\,\sin \varphi ]^\mathrm {T}\mathrm {d}\varphi \) are third-order. Consequently, an ideal mapping of \(\varvec{r}_\mathrm {E}\) (direction and width) requires at least one more loudspeaker, \(\mathrm {L}=4\), in a uniformly spaced arrangement to make the continuous and the discretized form \(\varvec{r}_\mathrm {E}\,E=\frac{2\pi }{\mathrm {L}}\sum _l g_l^2\,[\cos \upvarphi _l,\,\sin \upvarphi _l]^\mathrm {T}\) perfectly equal.
Towards a higher-order panning function. An \(\mathrm {N{th}}\)-order cardioid pattern is obtained from the cardioid pattern by taking its \(\mathrm {N{th}}\) power
which makes it narrower. With \(\mathrm {N}=2\), this becomes, using \(\cos ^2\varphi =\frac{1}{2}(1+\cos 2\varphi )\),
More generally, Chebyshev polynomials \(T_m(\cos \varphi )=\cos m\varphi \), cf. [23, Eq. 3.11.6], can be used to argue that there is always a fully equivalent cosine series describing the higher-order 2D panning function in the azimuth angle
\[g_\mathrm{N}(\varphi)=\sum_{m=0}^{\mathrm{N}}b_m\cos m\varphi.\]
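As a worked instance of this equivalence, assume (as a normalization chosen for this example only) the first-order cardioid \(\frac{1}{2}+\frac{1}{2}\cos\varphi\). Squaring it with \(\cos^2\varphi=\frac{1}{2}(1+\cos 2\varphi)\) gives a cosine series of \(\mathrm{N}+1=3\) terms, and the Chebyshev polynomial \(T_3\) shows the same mechanism for a cubic power:

```latex
\left(\tfrac{1}{2}+\tfrac{1}{2}\cos\varphi\right)^2
=\tfrac{1}{4}+\tfrac{1}{2}\cos\varphi+\tfrac{1}{4}\cos^2\varphi
=\tfrac{3}{8}+\tfrac{1}{2}\cos\varphi+\tfrac{1}{8}\cos 2\varphi ,
\qquad
\cos 3\varphi=T_3(\cos\varphi)=4\cos^3\varphi-3\cos\varphi
\;\Rightarrow\;
\cos^3\varphi=\tfrac{3}{4}\cos\varphi+\tfrac{1}{4}\cos 3\varphi .
```

Every power \(\cos^m\varphi\) with \(m\le\mathrm{N}\) thus maps onto cosines of order at most \(\mathrm{N}\).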
Rotated panning function. In first-order Ambisonics, panning functions consist of an omnidirectional part, \(\cos (0\varphi )=1\), and a figure-of-eight to x, \(\cos \varphi \), but that was not all: Recording and playback also required a figure-of-eight pattern to y, \(\sin \varphi \).
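The additional component allows rotated first-order directivities to be expressed by a basis set of fixed directivities. For higher orders, a panning function rotated to a non-zero aiming \(\varphi _\mathrm {s}\ne 0\),
\[g_\mathrm{N}(\varphi)=\sum_{m=0}^{\mathrm{N}}b_m\cos[m(\varphi-\varphi_\mathrm{s})],\]
can be re-expressed by the addition theorem \(\cos (\alpha +\beta )=\cos \alpha \,\cos \beta -\sin \alpha \sin \beta \) into a series that also involves the sinusoids (the odd-symmetric part of a Fourier series),
\[g_\mathrm{N}(\varphi)=\sum_{m=0}^{\mathrm{N}}b_m\cos m\varphi_\mathrm{s}\,\cos m\varphi+\sum_{m=1}^{\mathrm{N}}b_m\sin m\varphi_\mathrm{s}\,\sin m\varphi.\]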
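A small sketch of this re-expression: given the \(\mathrm{N}+1\) cosine coefficients \(b_m\) of a panning function aimed at \(\varphi_\mathrm{s}\), it returns \(\mathrm{N}+1\) cosine and \(\mathrm{N}\) sine coefficients in absolute azimuth, i.e. weights for \(2\mathrm{N}+1\) fixed basis functions. The example coefficients and function names are arbitrary illustrations.

```python
import math


def to_absolute_azimuth(b, phi_s):
    """Rewrite g(phi) = sum_m b[m] cos(m (phi - phi_s)) as
    sum_m c[m] cos(m phi) + sum_m s[m] sin(m phi), using
    cos(m phi - m phi_s) = cos(m phi_s) cos(m phi) + sin(m phi_s) sin(m phi)."""
    c = [bm * math.cos(m * phi_s) for m, bm in enumerate(b)]
    s = [bm * math.sin(m * phi_s) for m, bm in enumerate(b)]  # s[0] = 0: no sin(0 phi)
    return c, s


def eval_relative(b, phi, phi_s):
    """Evaluate the rotated cosine series directly."""
    return sum(bm * math.cos(m * (phi - phi_s)) for m, bm in enumerate(b))


def eval_absolute(c, s, phi):
    """Evaluate the fixed cosine/sine basis expansion."""
    return sum(cm * math.cos(m * phi) + sm * math.sin(m * phi)
               for m, (cm, sm) in enumerate(zip(c, s)))


# an example second-order cosine series, aimed at 30 degrees
b = [3 / 8, 1 / 2, 1 / 8]
phi_s = math.radians(30)
c, s = to_absolute_azimuth(b, phi_s)
print("cosine coefficients:", [round(x, 4) for x in c])
print("sine coefficients:  ", [round(x, 4) for x in s[1:]])
```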
We conclude: Higher-order Ambisonics in 2D (and the associated set of theoretical microphone directivities) is based on the Fourier series in the azimuth angle \(\varphi \).
4.2 Higher-Order Polynomials and Harmonics
The previous section required that direction and length of the \(\varvec{r}_\mathrm {E}\) vector resulting from amplitude panning on loudspeakers matched the desired auditory event direction and width. Harmonic functions with strict symmetry around a panning direction \(\varvec{\theta }_\mathrm {s}\) will help us in achieving this goal and in defining good sampling.
Regardless of the dimensions, be it in 2D or 3D, we desire to define continuous and resolution-limited axisymmetric functions around the panning direction \(\varvec{\uptheta }_\mathrm {s}\) to fulfill our perceptual goals of a panning-invariant loudness E, width \(\Vert \varvec{r}_\mathrm {E}\Vert \), and perfect alignment between panning direction \(\varvec{\uptheta }_\mathrm {s}\) and localized direction \(\varvec{r}_\mathrm {E}\). Then we hope to find suitable directional discretization schemes for ideal loudspeaker layouts, so that the measures E and \(\varvec{r}_\mathrm {E}\) are perfectly reconstructed in playback.
The projection of a variable direction vector \(\varvec{\theta }\) onto the panning direction \(\varvec{\uptheta }_\mathrm {s}\) always yields the cosine of the enclosed angle, \(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta }=\cos \phi \), no matter whether in two or three dimensions. Constructing the panning function from this projection therefore readily meets the desired goals. The \(m\mathrm {th}\) power thereof, \((\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m=\cos ^m\phi \), helps to build an \(\mathrm {N{th}}\)-order power series \(g=\sum _{m=0}^\mathrm {N}a_{m}(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m\) to describe a virtual Ambisonic panning function.
For 2D, such a circular polynomial \(g=\sum _{m=0}^\mathrm {N}a_{m}(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m\) contains all \((\mathrm {N}+1)(\mathrm {N}+2)/2\) mixed powers, via \((\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m=(\uptheta _\mathrm {xs}\theta _\mathrm {x}+ \uptheta _\mathrm {ys}\theta _\mathrm {y})^m=\sum _{k=0}^m{m \atopwithdelims ()k}\,(\uptheta _\mathrm {xs}\theta _\mathrm {x})^k\,(\uptheta _\mathrm {ys}\theta _\mathrm {y})^{m-k}\), of the direction vectors’ entries \(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}=[\uptheta _\mathrm {xs},\,\uptheta _\mathrm {ys}]\) and \(\varvec{\theta }=[\theta _\mathrm {x},\,\theta _\mathrm {y}]^\mathrm {T}\). However, as seen above, only \(2\mathrm {N}+1\) functions are needed to express \(g=\sum _{m=0}^\mathrm {N}a_{m}(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^m=\sum _{m=0}^\mathrm {N}a_{m}\cos ^m\phi \): first, in terms of the relative azimuth \(\phi =\varphi -\varphi _\mathrm {s}\), the polynomial corresponds to a harmonic series of \(\mathrm {N}+1\) cosines or Chebyshev polynomials \(g=\sum _{m=0}^\mathrm {N}b_m\,\cos m\phi =\sum _{m=0}^\mathrm {N}b_m\,T_m(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })\). Then, in terms of the absolute azimuth \(\varphi \), the trigonometric addition theorem re-expresses the series as one of \(\mathrm {N}+1\) cosines and \(\mathrm {N}\) sines, with \(T_m(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })=\cos [m(\varphi -\varphi _\mathrm {s})]=\cos m\varphi _\mathrm {s}\cos m\varphi +\sin m\varphi _\mathrm {s}\sin m\varphi \). As shown in the upcoming section, such orthonormal harmonic functions can alternatively be obtained by solving a second-order differential equation that is generally used to define harmonics; this approach bears the later benefit of also defining spherical harmonics in three space dimensions.
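The Chebyshev route can be sketched directly: the panning function is evaluated from nothing but the projection \(\varvec{\uptheta}_\mathrm{s}^\mathrm{T}\varvec{\theta}\), using the standard three-term recurrence \(T_{m+1}(x)=2x\,T_m(x)-T_{m-1}(x)\). The helper names are illustrative, and the final loop compares the count of mixed monomials with the \(2\mathrm{N}+1\) harmonics actually needed.

```python
import math


def chebyshev(m_max, x):
    """[T_0(x), ..., T_{m_max}(x)] via T_{m+1}(x) = 2x T_m(x) - T_{m-1}(x)."""
    T = [1.0, x]
    for m in range(1, m_max):
        T.append(2 * x * T[m] - T[m - 1])
    return T[: m_max + 1]


def unit(phi):
    """2D unit direction vector for azimuth phi (radians)."""
    return (math.cos(phi), math.sin(phi))


def panning_from_projection(b, phi, phi_s):
    """g = sum_m b_m T_m(theta_s^T theta), computed only from the projection."""
    ts, t = unit(phi_s), unit(phi)
    zeta = ts[0] * t[0] + ts[1] * t[1]  # equals cos(phi - phi_s)
    return sum(bm * Tm for bm, Tm in zip(b, chebyshev(len(b) - 1, zeta)))


# mixed monomials up to degree N versus the 2N + 1 circular harmonics needed
for N in range(1, 5):
    print(f"N={N}: monomials {(N + 1) * (N + 2) // 2}, harmonics {2 * N + 1}")
```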
Spherical polynomials are similar, \(g=\sum _{n=0}^\mathrm {N}a_n\,(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^n\), involving the expressions \((\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })^n=(\uptheta _\mathrm {xs}\theta _\mathrm {x}+ \uptheta _\mathrm {ys}\theta _\mathrm {y}+\uptheta _\mathrm {zs}\theta _\mathrm {z})^n=\sum _{k=0}^n\sum _{l=0}^{n-k}{n \atopwithdelims ()k}{n-k \atopwithdelims ()l}\,(\uptheta _\mathrm {zs}\theta _\mathrm {z})^k(\uptheta _\mathrm {xs}\theta _\mathrm {x})^{l}(\uptheta _\mathrm {ys}\theta _\mathrm {y})^{n-k-l}\). Again, all these \((\mathrm {N}+1)(\mathrm {N}+2)(\mathrm {N}+3)/6\) combinations would be too many to form an orthogonal set of basis functions. Moreover, while the different cosine harmonics are orthogonal axisymmetric functions in 2D, they are not orthogonal in 3D. On the sphere, the \(\mathrm {N}+1\) orthogonal Legendre polynomials \(P_n(\cos \phi )\) replace the cosine series as a basis for \(g=\sum _{n=0}^\mathrm {N}c_n\,P_n(\cos \phi )\), as shown below. All mathematical derivations for the sphere rely on the definition of harmonics. They result in \((\mathrm {N}+1)^2\) spherical harmonics and their addition theorem, which provides a basis in terms of absolute directions, \(\frac{2n+1}{4\pi }P_n(\varvec{\uptheta }_\mathrm {s}^\mathrm {T}\varvec{\theta })=\sum _{m=-n}^nY_n^m(\varvec{\uptheta }_s)Y_n^m(\varvec{\theta })\). Dickins’ thesis is interesting for further reading [12].
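The trinomial expansion and the counting argument are easy to verify. In the sketch below, w, u, and v stand for the products \(\uptheta_\mathrm{zs}\theta_\mathrm{z}\), \(\uptheta_\mathrm{xs}\theta_\mathrm{x}\), and \(\uptheta_\mathrm{ys}\theta_\mathrm{y}\). Integer test values keep the comparison exact, and the function names are illustrative.

```python
import math
from itertools import product


def trinomial(n, u, v, w):
    """sum_{k=0}^{n} sum_{l=0}^{n-k} C(n,k) C(n-k,l) w^k u^l v^(n-k-l); equals (u+v+w)^n."""
    return sum(math.comb(n, k) * math.comb(n - k, l) * w**k * u**l * v**(n - k - l)
               for k in range(n + 1) for l in range(n - k + 1))


def count_monomials_3d(N):
    """Brute-force count of monomials x^i y^j z^k with total degree i + j + k <= N."""
    return sum(1 for i, j, k in product(range(N + 1), repeat=3) if i + j + k <= N)


for N in range(1, 5):
    print(f"N={N}: monomials {count_monomials_3d(N)}, "
          f"spherical harmonics {(N + 1) ** 2}, Legendre terms {N + 1}")
```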
In both regimes, 2D and 3D, the concept of circular or spherical polynomials will be used to determine optimal layouts, so-called t-designs. Such t-designs are directional sampling grids that preserve the constant part of any circular (2D) or spherical (3D) polynomial of degree up to t. This key mathematical property will be exploited to determine the requirements for preserving the E and \(\varvec{r}_\mathrm {E}\) measures during Ambisonic playback on optimal loudspeaker setups. Moreover, t-designs simplify the numerical integration of circular or spherical harmonics used to define state-of-the-art Ambisonic decoders or mapping effects.
4.3 Angular/Directional Harmonics in 2D and 3D
The Laplacian is defined in the \(\mathrm {D}\)-dimensional Cartesian space as
\[\bigtriangleup=\sum_{i=1}^{\mathrm{D}}\frac{\partial^2}{\partial x_i^2},\]
and for any function f, the Laplacian \(\bigtriangleup f\) describes its curvature. Any harmonic function is proportional to its curvature, with an eigenvalue \(\lambda \),
\[\bigtriangleup f=-\lambda\,f,\]
and therefore is an oscillatory function. Generally, eigensolutions \(\bigtriangleup f=-\lambda \,f\) to the Laplacian are called harmonics. For suitable eigenvalues \(\lambda \), harmonics span an orthogonal set of basis functions that are typically used for Fourier expansion on a finite interval. It seems desirable to find such harmonics for functions only exhibiting directional dependencies, i.e. in the azimuth angle \(\varphi \) in 2D, and azimuth and zenith angle \(\varphi ,\vartheta \) in 3D.
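A quick numerical illustration, anticipating the 2D result derived next: a function that depends only on the angle, \(\cos(m\varphi)\), satisfies the eigen-equation in the plane with \(\lambda=m^2/r^2\). The sketch uses a plain five-point finite-difference Laplacian; the evaluation point and step size are arbitrary choices.

```python
import math


def angular(m):
    """f(x, y) = cos(m phi) with phi = atan2(y, x): depends on the direction only."""
    return lambda x, y: math.cos(m * math.atan2(y, x))


def laplacian_2d(f, x, y, h=1e-4):
    """Five-point central-difference estimate of d2f/dx2 + d2f/dy2."""
    return (f(x + h, y) + f(x - h, y) + f(x, y + h) + f(x, y - h) - 4.0 * f(x, y)) / (h * h)


x, y = 1.1, 0.7
r2 = x * x + y * y
for m in range(5):
    f = angular(m)
    # expected: Laplacian f = -lambda f with lambda = m^2 / r^2
    print(m, round(laplacian_2d(f, x, y), 5), round(-m * m / r2 * f(x, y), 5))
```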
4.4 Panning with Circular Harmonics in 2D
In two dimensions, Appendix A.3.2 uses the generalized chain rule to convert the Laplacian of a 2D coordinate system \(\bigtriangleup =\frac{\partial ^2}{\partial x^2}+\frac{\partial ^2}{\partial y^2}\) to a polar coordinate system with the radius r and the angle \(\varphi \) to the x axis, \( \bigtriangleup =\frac{1}{r}\frac{\partial }{\partial r} + \frac{\partial ^2}{\partial r^2}+\frac{1}{r^2}\frac{\partial ^2}{\partial \varphi ^2}. \) For functions \(\Phi =\Phi (\varphi )\) purely in the angle \(\varphi \), the radial derivatives of \(\bigtriangleup \Phi \) all vanish, and what remains is (\(\partial \rightarrow \mathrm {d}\))
\[\frac{1}{r^2}\frac{\mathrm{d}^2\Phi}{\mathrm{d}\varphi^2}=-\lambda\,\Phi.\]
This only yields useful solutions for \(\lambda \,r^2=m^2\), \(m\in \mathbb {Z}\), cf. Appendix A.3.4, Fig. 4.2,
\[\Phi_m(\varphi)=\frac{1}{\sqrt{2\pi}}\begin{cases}\sqrt{2}\,\sin(|m|\varphi), & m<0,\\ 1, & m=0,\\ \sqrt{2}\,\cos(m\varphi), & m>0,\end{cases}\]
which defines how to decompose panning functions of limited order \(|m|\le \mathrm {N}\). The harmonics are periodic in azimuth, orthogonal, and normalized (orthonormal) on the period \(-\pi \le \varphi \le \pi \). Due to their completeness, any square-integrable function \(g(\varphi )\) can be expanded into a series of the harmonics using coefficients \(\gamma _m\)
\[g(\varphi)=\sum_{m=-\infty}^{\infty}\gamma_m\,\Phi_m(\varphi).\]
For a known function \(g(\varphi )\), the coefficients \(\gamma _m\) are obtained by the transformation integral
\[\gamma_m=\int_{-\pi}^{\pi}g(\varphi)\,\Phi_m(\varphi)\,\mathrm{d}\varphi,\]
as shown in Appendix Eq. (A.14).
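The expansion and the transformation integral can be sketched numerically. The sketch assumes the real orthonormal form \(\Phi_0=1/\sqrt{2\pi}\), \(\Phi_m=\cos(m\varphi)/\sqrt{\pi}\) for \(m>0\), and \(\Phi_m=\sin(|m|\varphi)/\sqrt{\pi}\) for \(m<0\). It evaluates the integral with an equally spaced sum, which is exact for band-limited functions. The test function and all names are illustrative.

```python
import math


def circ_harm(m, phi):
    """Real circular harmonic Phi_m(phi), orthonormal on [-pi, pi] (assumed convention)."""
    if m == 0:
        return 1.0 / math.sqrt(2 * math.pi)
    if m > 0:
        return math.cos(m * phi) / math.sqrt(math.pi)
    return math.sin(-m * phi) / math.sqrt(math.pi)


def transform(g, N, n_points=64):
    """gamma_m = integral over [-pi, pi) of g(phi) Phi_m(phi) d phi, for |m| <= N.

    Equally spaced quadrature: exact up to rounding if g is a trigonometric
    polynomial whose degree plus N stays below n_points.
    """
    w = 2 * math.pi / n_points
    phis = [-math.pi + k * w for k in range(n_points)]
    return {m: w * sum(g(p) * circ_harm(m, p) for p in phis) for m in range(-N, N + 1)}


def synthesize(gamma, phi):
    """g(phi) = sum_m gamma_m Phi_m(phi)."""
    return sum(gm * circ_harm(m, phi) for m, gm in gamma.items())


def g(p):
    """Second-order test function aimed at 0.3 rad."""
    return (0.5 + 0.5 * math.cos(p - 0.3)) ** 2


gamma = transform(g, 2)
print({m: round(v, 6) for m, v in gamma.items()})
```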
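2D panning function. An infinitely narrow angular range around a desired direction \(|\varphi -\varphi _\mathrm {s}|<\varepsilon \rightarrow 0\) is represented by the transformation integral over a Dirac delta distribution \(\delta (\varphi -\varphi _{s})\), cf. Appendix Eq. (A.16), so that the coefficients of such a panning function are
\[\gamma_m=\int_{-\pi}^{\pi}\delta(\varphi-\varphi_\mathrm{s})\,\Phi_m(\varphi)\,\mathrm{d}\varphi=\Phi_m(\varphi_\mathrm{s}).\]
As the infinite circular harmonic series is complete, the panning function is
\[g(\varphi)=\sum_{m=-\infty}^{\infty}\Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi),\]
and in practice we limit its resolution to the \(\mathrm {N{th}}\) Ambisonic order, \(|m|\le \mathrm {N}\), and use an additional weight \(a_{m}\) that allows us to design its side lobes
\[g_\mathrm{N}(\varphi)=\sum_{m=-\mathrm{N}}^{\mathrm{N}}a_{|m|}\,\Phi_m(\varphi_\mathrm{s})\,\Phi_m(\varphi).\]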
The max-\(\varvec{r}_\mathrm {E}\) panning function [24] uses the weights \(a_m=\cos (\frac{\pi \,m}{2(\mathrm {N}+1)})\), as derived in Appendix Eq. (A.20). The spread is now adjustable by the order to \({\pm }\frac{90^\circ }{\mathrm {N}+1}\). The result is shown in Fig. 4.3, compared with no side-lobe suppression, \(a_m=1\) (basic).
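The stated spread can be reproduced numerically. With orthonormal circular harmonics, the weighted sum collapses to \(g_\mathrm{N}=\frac{1}{2\pi}\bigl(a_0+2\sum_{m\ge1}a_m\cos m\phi\bigr)\) in the relative angle \(\phi\). The sketch integrates \(g_\mathrm{N}^2\) and \(g_\mathrm{N}^2\cos\phi\) exactly with an equally spaced sum and compares max-\(\varvec{r}_\mathrm{E}\) and basic weights. The function names are illustrative.

```python
import math


def max_re_weights_2d(N):
    """a_m = cos(pi m / (2 (N + 1))) for m = 0..N."""
    return [math.cos(math.pi * m / (2 * (N + 1))) for m in range(N + 1)]


def panning_2d(a, x):
    """g_N at relative azimuth x for orthonormal circular harmonics:
    sum_{|m|<=N} a_|m| Phi_m(phi_s) Phi_m(phi) = (a_0 + 2 sum_{m>=1} a_m cos(m x)) / (2 pi)."""
    return (a[0] + 2 * sum(a[m] * math.cos(m * x) for m in range(1, len(a)))) / (2 * math.pi)


def spread_deg(a, n=512):
    """arccos |r_E| of the continuous panning function (r_E,y = 0 by symmetry).
    The equally spaced sum is exact because g^2 cos(x) has degree 2N + 1 < n."""
    num = den = 0.0
    for k in range(n):
        x = 2 * math.pi * k / n
        g2 = panning_2d(a, x) ** 2
        num += g2 * math.cos(x)
        den += g2
    return math.degrees(math.acos(num / den))


for N in range(1, 6):
    print(f"N={N}: max-rE {spread_deg(max_re_weights_2d(N)):.3f} deg "
          f"(90/(N+1) = {90 / (N + 1):.3f}), basic {spread_deg([1.0] * (N + 1)):.3f} deg")
```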
It is easy to recognize: \(\Phi _m(\varphi _{s})\) represents the recorded or encoded directions, and \(\Phi _m(\varphi )\) represents the decoded playback directions.
Optimal sampling of the 2D panning function. In the theory of circular polynomials in the variable \(\zeta =\cos (\varphi -\varphi _\mathrm {s})\), so-called t-designs in 2D are optimal point sets of given angles \(\{\upvarphi _l\}\) with \(l=1,\dots ,\mathrm {L}\) and size \(\mathrm {L}\). A t-design allows the integral (constant part) over the polynomials \(\mathcal {P}_m(\zeta )\) of limited degree \(m\le t\) to be computed exactly by discrete summation
\[\frac{1}{2\pi}\int_{-\pi}^{\pi}\mathcal{P}_m\bigl(\cos(\varphi-\varphi_\mathrm{s})\bigr)\,\mathrm{d}\varphi=\frac{1}{\mathrm{L}}\sum_{l=1}^{\mathrm{L}}\mathcal{P}_m\bigl(\cos(\upvarphi_l-\varphi_\mathrm{s})\bigr)\]
regardless of any angular shift \(\varphi _\mathrm {s}\). In 2D, the Chebyshev polynomials \(T_m(\cos \phi )=\cos (m\phi )\) are orthogonal polynomials; therefore, an \(\mathrm {N{th}}\)-order panning function composed of \(\cos (m\phi )\) is always a polynomial of \(\mathrm {N{th}}\) degree. Knowing this, it is clear that the integrand \(g_\mathrm {N}^2\) required to evaluate the loudness measure E is a polynomial of degree \(2\mathrm {N}\). The integral to calculate \(\varvec{r}_\mathrm {E}\) is over \(g_\mathrm {N}^2\,\cos (\phi )\) and thus over a polynomial of degree \(2\mathrm {N}+1\). In playback, to get a perfectly panning-invariant loudness measure E of the continuous panning function and also the perfectly oriented \(\varvec{r}_\mathrm {E}\) vector of constant spread \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert \), the parameter t must satisfy \(t\ge 2\mathrm {N}+1\). In 2D, all regular polygons are t-designs with \(\mathrm {L}=t+1\) points.
We can therefore use the smallest set of \(2\mathrm {N}+2\) angles \(\upvarphi _l=\frac{180^\circ }{\mathrm {N}+1}\,(l-1)\) as the optimal 2D layout.
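The polygon property can be tested by brute force. The sketch compares the mean of \(\cos^k(\upvarphi_l-\varphi_\mathrm{s})\) over a regular L-gon with the exact circular mean \(\binom{k}{k/2}/2^k\) (zero for odd k), for several arbitrary shifts. The shift values and function names are illustrative choices.

```python
import math


def circle_mean_power(k):
    """(1/2pi) * integral of cos^k over one period: C(k, k/2) / 2^k for even k, else 0."""
    return math.comb(k, k // 2) / 2**k if k % 2 == 0 else 0.0


def polygon_mean_power(L, k, phi_s):
    """(1/L) * sum of cos^k(phi_l - phi_s) over a regular L-gon, phi_l = 2 pi l / L."""
    return sum(math.cos(2 * math.pi * l / L - phi_s) ** k for l in range(L)) / L


def is_t_design_2d(L, t, shifts=(0.0, 0.37, 1.9, -2.6), tol=1e-12):
    """True if all powers cos^k with k <= t are averaged exactly for every tested shift."""
    return all(abs(polygon_mean_power(L, k, s) - circle_mean_power(k)) < tol
               for k in range(t + 1) for s in shifts)


for N in range(1, 4):
    L = 2 * N + 2  # optimal layout for order N, expected to be a (2N+1)-design
    layout = [round(180 / (N + 1) * (l - 1), 1) for l in range(1, L + 1)]
    print(f"N={N}, L={L}, angles {layout}, (2N+1)-design: {is_t_design_2d(L, 2 * N + 1)}")
```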
4.5 Ambisonics Encoding and Optimal Decoding in 2D
To encode a signal s into Ambisonic signals \(\chi _m\), we multiply the signal by the encoder weights \(\Phi _m(\varphi _\mathrm {s})\), which represent the direction of the signal at the angle \(\varphi _\mathrm {s}\),
\[\chi_m=\Phi_m(\varphi_\mathrm{s})\,s,\]
or in vector notation
\[\varvec{\chi}_\mathrm{N}=\varvec{y}_\mathrm{N}(\varphi_\mathrm{s})\,s,\]
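using the column vector \(\varvec{y}_\mathrm {N}=[\Phi _{-\mathrm {N}}(\varphi _\mathrm {s}),\,\dots ,\,\Phi _{\mathrm {N}}(\varphi _\mathrm {s})]^\mathrm {T}\) of \(2\mathrm {N}+1\) components. The Ambisonic signals in \(\varvec{\chi }_\mathrm {N}\) are weighted by side-lobe suppressing weights \(\varvec{a}_\mathrm {N}=[a_{|-\mathrm {N}|},\,\dots ,\,a_\mathrm {N}]^\mathrm {T}\), expressed by multiplication with a diagonal matrix \(\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\), and then decoded to the \(\mathrm {L}\) loudspeaker signals \(\varvec{x}\) by a sampling decoder
\[\varvec{x}=\varvec{D}\,\mathrm{diag}\{\varvec{a}_\mathrm{N}\}\,\varvec{\chi}_\mathrm{N},\]
using
\[\varvec{D}=\sqrt{\frac{2\pi}{\mathrm{L}}}\,\varvec{Y}_\mathrm{N}^\mathrm{T},\qquad \varvec{Y}_\mathrm{N}=\bigl[\varvec{y}_\mathrm{N}(\upvarphi_1),\,\dots,\,\varvec{y}_\mathrm{N}(\upvarphi_\mathrm{L})\bigr].\]
In total, the system for encoding and decoding can also be written to yield a set of loudspeaker gains for one virtual source
\[\varvec{x}=\varvec{g}\,s,\qquad \varvec{g}=\varvec{D}\,\mathrm{diag}\{\varvec{a}_\mathrm{N}\}\,\varvec{y}_\mathrm{N}(\varphi_\mathrm{s}),\]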
or in particular for the 2D sampling decoder \(\varvec{g}=\sqrt{\frac{2\pi }{\mathrm {L}}}\,\varvec{Y}_\mathrm {N}^\mathrm {T}\,\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\,\varvec{ y}_\mathrm {N}(\varphi _\mathrm {s})\).
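The encoder, the max-\(\varvec{r}_\mathrm{E}\) weighting, and the sampling decoder fit in a short sketch. It assumes the same real orthonormal circular harmonics as above and sweeps a virtual source with s = 1 around a uniform ring. On the optimal \(\mathrm{L}=2\mathrm{N}+2\) layout, E and \(\Vert\varvec{r}_\mathrm{E}\Vert\) should be panning-invariant, while one loudspeaker fewer keeps only E constant. The function names are illustrative, not an existing library API.

```python
import math


def y_vec(N, phi):
    """y_N(phi) = [Phi_{-N}(phi), ..., Phi_N(phi)], real orthonormal circular harmonics."""
    out = []
    for m in range(-N, N + 1):
        if m < 0:
            out.append(math.sin(-m * phi) / math.sqrt(math.pi))
        elif m == 0:
            out.append(1.0 / math.sqrt(2 * math.pi))
        else:
            out.append(math.cos(m * phi) / math.sqrt(math.pi))
    return out


def max_re(N):
    """a_N = [a_|-N|, ..., a_N] with a_m = cos(pi |m| / (2 (N + 1)))."""
    return [math.cos(math.pi * abs(m) / (2 * (N + 1))) for m in range(-N, N + 1)]


def encode(s, phi_s, N):
    """chi_N = y_N(phi_s) s."""
    return [yi * s for yi in y_vec(N, phi_s)]


def sampling_decode(chi, angles, N):
    """x = sqrt(2 pi / L) Y_N^T diag{a_N} chi_N with Y_N = [y_N(phi_1), ..., y_N(phi_L)]."""
    weighted = [ai * ci for ai, ci in zip(max_re(N), chi)]
    scale = math.sqrt(2 * math.pi / len(angles))
    return [scale * sum(yi * wi for yi, wi in zip(y_vec(N, p), weighted)) for p in angles]


def sweep(N, L, steps=90):
    """(E, |r_E|) of the loudspeaker gains (s = 1) over a sweep of panning angles."""
    angles = [2 * math.pi * l / L for l in range(L)]
    results = []
    for k in range(steps):
        g = sampling_decode(encode(1.0, 2 * math.pi * k / steps, N), angles, N)
        E = sum(gi * gi for gi in g)
        rx = sum(gi * gi * math.cos(p) for gi, p in zip(g, angles)) / E
        ry = sum(gi * gi * math.sin(p) for gi, p in zip(g, angles)) / E
        results.append((E, math.hypot(rx, ry)))
    return results


for N, L in ((3, 8), (3, 7)):
    res = sweep(N, L)
    print(f"N={N}, L={L}: E variation {max(r[0] for r in res) - min(r[0] for r in res):.2e}, "
          f"|r_E| in [{min(r[1] for r in res):.4f}, {max(r[1] for r in res):.4f}]")
```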
4.6 Listening Experiments on 2D Ambisonics
There are several listening experiments discussing the features of Ambisonics, most of which are summarized in [25]; they are discussed below, complemented by those from [26].
The perceptually adjusted panning angle of \(2\mathrm {nd}\)-order max-\(\varvec{r}_\mathrm {E}\) Ambisonic panning on 6 horizontal loudspeakers matches the acoustic reference direction quite well, as shown in Fig. 4.4. This is similar to MDAP in Fig. 3.8, but with a median that is slightly more accurate, by \(0.5^\circ \) on average, in particular at lateral and rear panning directions.
Another aspect to investigate is how stable the results are for center and off-center listening seats as shown in Fig. 4.5. It illustrates that \(\mathrm {max}\)-\(\varvec{r}_\mathrm {E}\) with the highest order achieves the best stability with regard to localization at off-center listening seats. Astonishingly, the delay compensation for non-uniform delay times to the center deteriorated the results, most probably because of the nearly linear frontal arrangement of loudspeakers that is more robust to lateral shifts of the listening positions than a circular arrangement.
Figure 4.6a, b shows direction histograms for two different weightings \(a_m\). It illustrates that proper side-lobe suppression of the panning function by \(\mathrm {max}\)-\(\varvec{r}_\mathrm {E}\) weights is decisive at shifted listening positions to avoid splitting of the auditory image, which appears in Fig. 4.6b without the weights (basic).
Peter Stitt’s work shows that the localization offsets at off-center listening seats do not increase with the radius of the loudspeaker arrangement as long as the off-center seat stays in proportion to the radius, Fig. 4.6c. The results are predicted by the sweet area model from Sect. 2.2.9, shown in Fig. 4.7 for the first order (top row) and the third order (bottom row), for both the small (left) and the large setup (right).
Frank’s 2016 experiments [29] used scales on the floor from which listeners read off where the sweet area ends in every radial direction, cf. Fig. 4.8a. For Fig. 4.8b, the criterion for listeners to indicate leaving the sweet area was that the frontally panned sound was mapped outside the loudspeaker pairs L, C, and R. The results show that, if the order is high enough, the sweet area providing perceptually plausible playback extends over at least \(\frac{2}{3}\) of the radius of the loudspeaker setup.
The perceived width of auditory events is investigated in the experimental results of Fig. 4.9 [25], in which pink noise was frontally panned for different orientations of the loudspeaker ring (one loudspeaker in front, or the front direction offset by a quarter or a half of the loudspeaker spacing). Listeners compared the width of multiple stimuli, and the results were expected to indicate constant width for the differently rotated loudspeaker ring, as the optimal arrangement with \(\mathrm {L}=2\mathrm {N}+2\) provides a constant \(\varvec{r}_\mathrm {E}\) length. The panning-invariant length is not perfectly reflected in the perceived widths with \(3\mathrm {rd}\) order on 8 loudspeakers, for which the on-loudspeaker position is perceived as significantly wider. By contrast, the higher-order experiment with \(7\mathrm {th}\) order on 16 loudspeakers validates the model.
Figure 4.10 shows experiments investigating the time-variant change in sound coloration for a pink-noise virtual source rotating at a speed of \(100^\circ \)/s, for different Ambisonic panning setups. At both listening positions, centered and off-center, the side-lobe-suppressing “\(\mathrm {max}\)-\(\varvec{r}_\mathrm {E}\)” weighting clearly reduces the fluctuation in coloration compared with the “basic” rectangular truncation of the Fourier series. At the off-center listening position, \(\mathrm {max}\)-\(\varvec{ r}_\mathrm {E}\) weights achieve good results with regard to constant coloration for both investigated arrangements, \(3\mathrm {rd}\) order on 8 and \(7\mathrm {th}\) order on 16 loudspeakers.
How well are diffuse signals preserved in playback? All the above experiments deal with the presentation of non-diffuse signals. To complement Fig. 1.21 of Chap. 1 with an explanation, the relation between the Ambisonic order and its ability to preserve diffuse fields is estimated here by the covariance between uncorrelated directions. Assume that a max-\(\varvec{r}_\mathrm {E}\)-weighted \(\mathrm {N{th}}\)-order Ambisonic panning function \(g(\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{\theta })\), normalized to \(g(1)=1\), encodes two sounds \(s_{1,2}\) from two directions \(\varvec{\theta }_1\) and \(\varvec{\theta }_2\), with the sounds being uncorrelated and of unit variance, \(E\{s_1s_2\}=\delta _{1,2}\). The Ambisonic representation mixes the sounds at their respective mapped directions, \(x_1=s_{1}+g_{12}\,s_2\) and \(x_2=s_{2}+g_{12}\,s_1\), using \(g_{12}=g(\cos \phi )\) with \(\phi \) the angle between \(\varvec{\theta }_1\) and \(\varvec{\theta }_2\), which increases their correlation to
\[\rho_{12}=\frac{E\{x_1x_2\}}{\sqrt{E\{x_1^2\}\,E\{x_2^2\}}}=\frac{2\,g_{12}}{1+g_{12}^2}.\]
This result was presented in Fig. 1.21 and was used to argue that the directional separation of first-order Ambisonics might be too weak because of its high crosstalk term \(g_{12}\). Higher-order Ambisonics decreases this directional crosstalk and therefore improves the representation of diffuse sound fields.
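The crosstalk argument can be sketched numerically. As a simplifying assumption, the sketch uses the normalized 2D max-\(\varvec{r}_\mathrm{E}\) panning function as a stand-in for g (the figure referenced in Chap. 1 may be based on a different variant). It evaluates \(\rho_{12}=2g_{12}/(1+g_{12}^2)\) for two directions 90° apart and cross-checks the formula with a seeded Monte-Carlo simulation of two uncorrelated unit-variance signals. All names are illustrative.

```python
import math
import random


def max_re_panning_2d(N, x):
    """2D max-r_E panning function at relative angle x (radians), normalized to g(0) = 1."""
    a = [math.cos(math.pi * m / (2 * (N + 1))) for m in range(N + 1)]

    def raw(y):
        return a[0] + 2 * sum(a[m] * math.cos(m * y) for m in range(1, N + 1))

    return raw(x) / raw(0.0)


def correlation(g12):
    """rho = E{x1 x2} / sqrt(E{x1^2} E{x2^2}) for x1 = s1 + g12 s2, x2 = s2 + g12 s1."""
    return 2 * g12 / (1 + g12 ** 2)


def simulated_correlation(g12, n=100_000, seed=1):
    """Monte-Carlo estimate with uncorrelated, unit-variance Gaussian s1, s2."""
    rng = random.Random(seed)
    sxy = sxx = syy = 0.0
    for _ in range(n):
        s1, s2 = rng.gauss(0, 1), rng.gauss(0, 1)
        x1, x2 = s1 + g12 * s2, s2 + g12 * s1
        sxy += x1 * x2
        sxx += x1 * x1
        syy += x2 * x2
    return sxy / math.sqrt(sxx * syy)


for N in (1, 2, 3, 5):
    g12 = max_re_panning_2d(N, math.radians(90))
    print(f"N={N}: g12 = {g12:+.3f}, rho = {correlation(g12):+.3f}")
```

In this 2D stand-in, first order yields \(g_{12}=\sqrt{2}-1\) and a correlation of \(1/\sqrt{2}\) between sources 90° apart. Higher orders shrink the crosstalk considerably.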
4.7 Panning with Spherical Harmonics in 3D
In three space dimensions, the spherical coordinate system has a radius r and two angles: the azimuth \(\varphi \), indicating the polar angle of the orthogonal projection onto the xy plane, and the zenith angle \(\vartheta \), indicating the angle to the z axis, according to the right-handed spherical coordinate system of ISO 31-11 and ISO 80000-2 [30, 31], Fig. 4.11.
By the generalized chain rule, Appendix A.3 rewrites the Laplacian in spherical coordinates in 3D, with r signifying the radius, \(\varphi \) the azimuth angle, and the zenith angle \(\vartheta \) re-expressed as \(\zeta =\frac{z}{r}=\cos \vartheta \), yielding the operator \( \bigtriangleup = \frac{2}{r}\frac{\partial }{\partial r} + \frac{\partial ^2}{\partial r^2}+\frac{1}{r^2(1-\zeta ^2)}\frac{\partial ^2}{\partial \varphi ^2} -\frac{2}{r^2}\zeta \frac{\partial }{\partial \zeta }+ \frac{1-\zeta ^2}{r^2} \frac{\partial ^2}{\partial \zeta ^2}\). The radius-dependent parts are removed to define an eigenproblem that yields the basis for panning functions, keeping only the angular part \(r^2\bigtriangleup _{\upvarphi ,\upzeta ,3\mathrm {D}}\),
\[r^2\bigtriangleup_{\upvarphi,\upzeta,3\mathrm{D}}\,Y=\frac{1}{1-\zeta^2}\frac{\partial^2Y}{\partial\varphi^2}-2\zeta\frac{\partial Y}{\partial\zeta}+(1-\zeta^2)\frac{\partial^2Y}{\partial\zeta^2}=-\lambda\,Y,\]
whose solution with \(\lambda =n(n+1)\) defines the spherical harmonics
\[Y_n^m(\varphi,\vartheta)=N_n^{|m|}\,P_n^{|m|}(\cos\vartheta)\,\Phi_m(\varphi).\]
The prerequisites are (i) periodicity in \(\varphi \) and (ii) that the function \(Y_n^m\) is finite on the sphere. In addition to the circular harmonics \(\Phi _m\) expressing the dependency on azimuth \(\varphi \) according to Eq. (4.10), the spherical harmonics contain the associated Legendre functions \(P_n^m\) and their normalization term
\[N_n^{|m|}=\sqrt{\frac{2n+1}{2}\,\frac{(n-|m|)!}{(n+|m|)!}}\]
to express the dependency on the zenith angle \(\vartheta \). The index \(n\ge 0\) expresses the order, and the directional resolution can be limited by requiring \(0\le n\le \mathrm {N}\). The index m is the degree, and for each n it is limited by \(-n\le m\le n\).
The spherical harmonics, Fig. 4.12, are orthonormal on the sphere \(-\pi \le \varphi \le \pi \) and \(0\le \vartheta \le \pi \), and for unbounded order \(\mathrm {N}\rightarrow \infty \) they are complete; see also Appendix A.3.7.
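The spherical harmonics permit a series representation of square-integrable 3D directional functions by the coefficients \(\gamma _{nm}\),
\[g(\varvec{\theta})=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}\gamma_{nm}\,Y_n^m(\varvec{\theta}).\]
From a known function \(g(\varvec{\theta })\), the coefficients are obtained by the transformation integral over the unit sphere \(\mathbb {S}^2\), cf. Appendix Eq. (A.38),
\[\gamma_{nm}=\int_{\mathbb{S}^2}g(\varvec{\theta})\,Y_n^m(\varvec{\theta})\,\mathrm{d}\varvec{\theta}.\]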
Note that the above N3D normalization \(\int _{\varvec{\theta }\in \mathbb {S}^2}|Y_n^m(\varvec{\theta })|^2\,\mathrm {d}\varvec{\theta }=1\) defines each spherical harmonic only up to an arbitrary phase factor. Legendre functions for the zenith dependency may be defined differently in the literature, and for azimuth, some implementations use \(\sin (m\varphi )\) instead of \(\sin (|m|\varphi )\). In Ambisonics, real-valued functions and the SN3D normalization \(\sqrt{\frac{1}{2}\frac{(n-|m|)!}{(n+|m|)!}}\) are preferred, as are positive signs of the first-order dipole components in the directions of the respective coordinate axes x, y, z. Depending on the implementation of the respective functions, this may require including the Condon-Shortley phase \((-1)^m\) to correct the signs of the Legendre functions, or a factor \(-1\) for \(m<0\) to correct the sign of the azimuthal sinusoids. It is often helpful to employ converters and directional checks to ensure compatibility!
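A directional check of this kind is sketched below. It assumes one specific convention: associated Legendre functions without the Condon-Shortley phase, \(\sin(|m|\varphi)\) for \(m<0\), and unit norm on the sphere. It is not a converter for any particular Ambisonics library. The first-order dipoles should come out positive on their own axes, and the addition theorem serves as a consistency test of the normalization.

```python
import math


def assoc_legendre(n, m, x):
    """P_n^m(x) for 0 <= m <= n, WITHOUT the Condon-Shortley phase (assumed convention)."""
    pmm = 1.0
    s = math.sqrt(max(0.0, 1.0 - x * x))
    for k in range(1, m + 1):
        pmm *= (2 * k - 1) * s  # P_m^m = (2m-1)!! (1 - x^2)^(m/2)
    if n == m:
        return pmm
    pm1 = x * (2 * m + 1) * pmm  # P_{m+1}^m
    for l in range(m + 2, n + 1):
        # (l - m) P_l^m = (2l - 1) x P_{l-1}^m - (l + m - 1) P_{l-2}^m
        pmm, pm1 = pm1, ((2 * l - 1) * x * pm1 - (l + m - 1) * pmm) / (l - m)
    return pm1


def real_sh(n, m, azi, zen):
    """Real spherical harmonic with unit norm on the sphere (assumed convention):
    N_n^|m| P_n^|m|(cos zen) Phi_m(azi), N_n^|m| = sqrt((2n+1)/2 (n-|m|)!/(n+|m|)!),
    Phi_m = cos(m azi)/sqrt(pi) (m > 0), 1/sqrt(2 pi) (m = 0), sin(|m| azi)/sqrt(pi) (m < 0)."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / 2 * math.factorial(n - am) / math.factorial(n + am))
    if m > 0:
        az = math.cos(m * azi) / math.sqrt(math.pi)
    elif m == 0:
        az = 1.0 / math.sqrt(2 * math.pi)
    else:
        az = math.sin(am * azi) / math.sqrt(math.pi)
    return norm * assoc_legendre(n, am, math.cos(zen)) * az


def unit_vector(azi, zen):
    """Cartesian unit vector for azimuth and zenith angles (radians)."""
    return (math.sin(zen) * math.cos(azi), math.sin(zen) * math.sin(azi), math.cos(zen))


# directional check: first-order dipoles should be positive on their own axes
axes = {"x": (0.0, math.pi / 2), "y": (math.pi / 2, math.pi / 2), "z": (0.0, 0.0)}
for label, m in (("x", 1), ("y", -1), ("z", 0)):
    print(label, real_sh(1, m, *axes[label]) > 0)
```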
3D panning function. An infinitely narrow direction range around a desired direction \(\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{\theta }>\cos \varepsilon \rightarrow 1\) is represented by the transformation integral over the Dirac delta \(\delta (1-\varvec{\theta }_\mathrm {s}^\mathrm {T}\varvec{\theta })\), cf. Eq. (A.41), so that the coefficients of the panning function are
\[\gamma_{nm}=Y_n^m(\varvec{\theta}_\mathrm{s}).\]
As infinitely many spherical harmonics are complete, the panning function is
\[g(\varvec{\theta})=\sum_{n=0}^{\infty}\sum_{m=-n}^{n}Y_n^m(\varvec{\theta}_\mathrm{s})\,Y_n^m(\varvec{\theta}),\]
and in practice, the finite-resolution \(\mathrm {N{th}}\)-order panning function with \(n\le \mathrm {N}\) employs a weight \(a_n\) to reduce side lobes and optimize the spread
\[g_\mathrm{N}(\varvec{\theta})=\sum_{n=0}^{\mathrm{N}}\sum_{m=-n}^{n}a_n\,Y_n^m(\varvec{\theta}_\mathrm{s})\,Y_n^m(\varvec{\theta}).\]
The max-\(\varvec{r}_\mathrm {E}\) panning function uses the weights \(a_n=P_n\bigl [\cos (\frac{137.9^\circ }{\mathrm {N}+1.51})\bigr ]\), as derived in Appendix Eq. (A.46). The spread is now adjustable by the order to \({\pm }\frac{137.9^\circ }{\mathrm {N}+1.51}\). Figure 4.13 shows a comparison to the basic weighting \(a_n=1\). An alternative expression that uses Legendre polynomials \(P_n\) and only depends on the angle \(\phi \) to the panning direction \(\varvec{\theta }_\mathrm {s}\) is obtained by replacing the sum over m by the spherical harmonics addition theorem \(\sum _{m=-n}^n Y_n^m(\varvec{\theta }_\mathrm {s})\,Y_n^m(\varvec{\theta }) ={\textstyle \frac{2n+1}{4\pi }}\,P_n(\cos \phi )\),
\[g_\mathrm{N}(\varvec{\theta})=\sum_{n=0}^{\mathrm{N}}\frac{2n+1}{4\pi}\,a_n\,P_n(\cos\phi).\]
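The Legendre form makes the 3D spread easy to evaluate, since every integral over the sphere reduces to \(2\pi\int_{-1}^{1}\dots\,\mathrm{d}\zeta\) with \(\zeta=\cos\phi\). The sketch below uses Bonnet's recurrence for \(P_n\) and Simpson's rule. It checks that the approximate max-\(\varvec{r}_\mathrm{E}\) weights indeed give a spread close to \(\pm137.9^\circ/(\mathrm{N}+1.51)\) and a narrower one than the basic weighting. The function names and the quadrature choice are illustrative.

```python
import math


def legendre(n_max, x):
    """[P_0(x), ..., P_{n_max}(x)] via Bonnet's recurrence."""
    P = [1.0, x]
    for n in range(1, n_max):
        P.append(((2 * n + 1) * x * P[n] - n * P[n - 1]) / (n + 1))
    return P[: n_max + 1]


def max_re_weights_3d(N):
    """Approximate 3D max-r_E weights a_n = P_n(cos(137.9 deg / (N + 1.51)))."""
    return legendre(N, math.cos(math.radians(137.9 / (N + 1.51))))


def panning_3d(weights, zeta):
    """g_N = sum_n (2n + 1) / (4 pi) a_n P_n(zeta), zeta = cos of angle to the panning direction."""
    P = legendre(len(weights) - 1, zeta)
    return sum((2 * n + 1) / (4 * math.pi) * a * p for n, (a, p) in enumerate(zip(weights, P)))


def spread_3d_deg(weights, n=2000):
    """arccos |r_E| via Simpson's rule in zeta on [-1, 1] (n even); 2 pi factors cancel."""
    h = 2.0 / n
    num = den = 0.0
    for k in range(n + 1):
        z = -1.0 + k * h
        w = 1 if k in (0, n) else (4 if k % 2 else 2)
        g2 = panning_3d(weights, z) ** 2
        num += w * g2 * z
        den += w * g2
    return math.degrees(math.acos(num / den))


for N in range(1, 5):
    print(f"N={N}: max-rE {spread_3d_deg(max_re_weights_3d(N)):.2f} deg, "
          f"137.9/(N+1.51) = {137.9 / (N + 1.51):.2f} deg, "
          f"basic {spread_3d_deg([1.0] * (N + 1)):.2f} deg")
```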
As in the 2D case, \(Y_n^m(\varvec{\theta }_{s})\) now represents the recorded or encoded directions, and \(Y_n^m(\varvec{\theta })\) represents the decoded playback directions.
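Optimal sampling of the 3D panning function. In the theory of spherical polynomials in the variable \(\zeta =\varvec{\theta }_s^\mathrm {T}\varvec{\theta }\), so-called t-designs describe point sets of given directions \(\{\varvec{\uptheta }_l\}\) with \(l=1,\dots ,\mathrm {L}\) and size \(\mathrm {L}\) that allow the integral (constant part) over the polynomials \(\mathcal {P}_n(\zeta )\) of limited order \(n\le t\) to be computed exactly by discrete summation
\[\frac{1}{4\pi}\int_{\mathbb{S}^2}\mathcal{P}_n(\varvec{\theta}_\mathrm{s}^\mathrm{T}\varvec{\theta})\,\mathrm{d}\varvec{\theta}=\frac{1}{\mathrm{L}}\sum_{l=1}^{\mathrm{L}}\mathcal{P}_n(\varvec{\theta}_\mathrm{s}^\mathrm{T}\varvec{\uptheta}_l)\]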
relative to any axis \(\varvec{\theta }_\mathrm {s}\) the point set is projected onto. In 3D, the Legendre polynomials \(P_n(\zeta )\) are orthogonal polynomials, therefore an \(\mathrm {N{th}}\)-order panning function composed thereof is a polynomial of \(\mathrm {N{th}}\) order. The loudness measure E is calculated by the integral over \(g_\mathrm {N}^2\), therefore over a polynomial of the order \(2\mathrm {N}\). The integral to calculate \(\varvec{r}_\mathrm {E}\) runs over \(g_\mathrm {N}^2\,\zeta \), therefore over a polynomial of the order \(2\mathrm {N}+1\). In playback, to get a perfectly panning-invariant loudness measure E of the continuous panning function and also the perfectly oriented \(\varvec{r}_\mathrm {E}\) vector of constant spread \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert \), the parameter t must be \(t\ge 2\mathrm {N}+1\). In 3D there are only 5 geometrically regular layouts
- the tetrahedron, \(\mathrm {L}=4\) corners, is a 2-design,
- the octahedron, \(\mathrm {L}=6\) corners, is a 3-design,
- the hexahedron (cube), \(\mathrm {L}=8\) corners, is a 3-design,
- the icosahedron, \(\mathrm {L}=12\) corners, is a 5-design,
- the dodecahedron, \(\mathrm {L}=20\) corners, is a 5-design.
For instance, for \(\mathrm {N}=1\), the octahedron is a suitable spherical design; for \(\mathrm {N}=2\), the icosahedral or dodecahedral layouts are suitable.
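The t-design property of the regular layouts can be checked directly. For a t-design, the mean of \(P_n(\zeta )\) over the points must equal the spherical mean, which is zero for \(1\le n\le t\), for any projection axis. The plain-Python sketch below evaluates this mean for one fixed, generic test axis. This is an illustrative spot check and not a proof over all axes. The helper names are hypothetical.

```python
import itertools
import math

def legendre(n, x):
    """Legendre polynomial P_n(x) via Bonnet's recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def unit(v):
    r = math.sqrt(sum(c * c for c in v))
    return tuple(c / r for c in v)

def design_error(points, t, axis=(1.0, 2.0, 3.0)):
    """Largest |mean_l P_n(axis . theta_l)| over 1 <= n <= t (zero for a t-design)."""
    s = unit(axis)
    pts = [unit(p) for p in points]
    worst = 0.0
    for n in range(1, t + 1):
        mean = sum(legendre(n, sum(a * b for a, b in zip(s, p))) for p in pts) / len(pts)
        worst = max(worst, abs(mean))
    return worst

golden = (1 + math.sqrt(5)) / 2
signs = list(itertools.product((-1, 1), repeat=2))
tetrahedron = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
cube = list(itertools.product((-1, 1), repeat=3))
icosahedron = [q for a, b in signs
               for q in ((0, a, b * golden), (a, b * golden, 0), (b * golden, 0, a))]
dodecahedron = cube + [q for a, b in signs
                       for q in ((0, a / golden, b * golden), (a / golden, b * golden, 0),
                                 (b * golden, 0, a / golden))]

for name, pts, t in [("tetrahedron", tetrahedron, 2), ("octahedron", octahedron, 3),
                     ("cube", cube, 3), ("icosahedron", icosahedron, 5),
                     ("dodecahedron", dodecahedron, 5)]:
    print(f"{name}: error up to t={t}: {design_error(pts, t):.1e}, "
          f"with t={t + 1}: {design_error(pts, t + 1):.1e}")
```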
Beyond the geometrically regular layouts, there are designs found by optimization. They are regular in the sense that they accurately approximate \(\int _{\mathbb {S}^2}Y_n^m(\varvec{\theta })\,\mathrm {d}\varvec{\theta }=\sqrt{4\pi }\delta _n\) by \(\frac{4\pi }{\mathrm {L}}\sum _lY_n^m(\varvec{\uptheta }_l)\) for all \(n\le t\) and \(|m|\le n\). Large collections by Hardin and Sloane [32], Gräf and Potts [33], and Womersley [34] are available on the following websites:
http://neilsloane.com/sphdesigns/dim3/
http://homepage.univie.ac.at/manuel.graef/quadrature.php
(Chebyshev-type Quadratures on \(\mathbb {S}^2\)), and
https://web.maths.unsw.edu.au/~rsw/Sphere/EffSphDes/ss.html.
Figure 4.14 gives some graphical examples.
4.8 Ambisonic Encoding and Optimal Decoding in 3D
To encode a signal s into Ambisonic signals \(\chi _{nm}\), we multiply the signal by the encoding weights \(Y_n^m(\varvec{\theta }_\mathrm {s})\), which represent the direction \(\varvec{\theta }_\mathrm {s}\) of the signal
or in vector notation
using the column vector \(\varvec{y}_\mathrm {N}=[Y_0^0(\varvec{\theta }_\mathrm {s}),\,Y_1^{-1}(\varvec{\theta }_\mathrm {s}),\,\dots ,\,Y_{\mathrm {N}}^\mathrm {N}(\varvec{\theta }_\mathrm {s})]^\mathrm {T}\) of \((\mathrm {N}+1)^2\) components. The Ambisonic signals in \(\varvec{\chi }_\mathrm {N}\) are weighted by side-lobe suppressing weights \(\varvec{a}_\mathrm {N}=[a_0,\,a_1,a_1,a_1,\, a_2,\dots ,a_\mathrm {N}]^\mathrm {T}\), expressed by the multiplication with a diagonal matrix \(\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\), and then decoded to the \(\mathrm {L}\) loudspeaker signals \(\varvec{x}\) by a sampling decoder
using
In total, the system for encoding Eq. (4.35) and decoding Eq. (4.36) can also be written to yield loudspeaker gains for one signal
or in particular for the 3D sampling decoding \(\varvec{g}=\sqrt{\frac{4\pi }{\mathrm {L}}}\, \varvec{Y}_\mathrm {N}^\mathrm {T}\,\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\,\varvec{y}_\mathrm {N}(\varvec{\theta }_\mathrm {s})\).
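To illustrate encoding and sampling decoding in 3D, here is a minimal first-order (\(\mathrm {N}=1\)) sketch on the octahedron. The octahedron is a 3-design and therefore a \((2\mathrm {N}+1)\)-design. The sketch rests on two assumptions:

- orthonormal real spherical harmonics in ACN order,
- the first-order max-\(\varvec{r}_\mathrm {E}\) weight \(a_1=1/\sqrt{3}\), which the approximate formula above closely matches.

It checks that E equals the Parseval value \(\Vert \varvec{\gamma }_\mathrm {N}\Vert ^2=(a_0^2+3a_1^2)/(4\pi )\) for every panning direction and that \(\varvec{r}_\mathrm {E}\) points to the panning direction. The function names are hypothetical.

```python
import math

def unit(v):
    r = math.sqrt(sum(c * c for c in v))
    return tuple(c / r for c in v)

def sh1(v):
    """Orthonormal real spherical harmonics up to N = 1 in ACN order
    [Y_0^0, Y_1^-1, Y_1^0, Y_1^1] for a unit vector v = (x, y, z)."""
    x, y, z = v
    c1 = math.sqrt(3 / (4 * math.pi))
    return [1 / math.sqrt(4 * math.pi), c1 * y, c1 * z, c1 * x]

def encode(s, pan_dir):
    """chi_N = y_N(theta_s) * s for one signal sample s."""
    return [Y * s for Y in sh1(unit(pan_dir))]

def sampling_decode(chi, speakers, a):
    """x = sqrt(4 pi / L) * Y_N^T diag{a_N} chi_N."""
    w = [a[0], a[1], a[1], a[1]]
    scale = math.sqrt(4 * math.pi / len(speakers))
    return [scale * sum(Yl * wk * ck for Yl, wk, ck in zip(sh1(unit(spk)), w, chi))
            for spk in speakers]

def energy_and_re(gains, speakers):
    """Discrete loudness measure E and r_E vector of the loudspeaker gains."""
    E = sum(g * g for g in gains)
    dirs = [unit(s) for s in speakers]
    rE = tuple(sum(g * g * d[i] for g, d in zip(gains, dirs)) / E for i in range(3))
    return E, rE

octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
a = [1.0, 1 / math.sqrt(3)]

for pan in [(1, 2, 3), (0, 0, 1), (1, -1, 0.2)]:
    g = sampling_decode(encode(1.0, pan), octahedron, a)
    E, rE = energy_and_re(g, octahedron)
    print(pan, round(E, 6), [round(c, 4) for c in rE])
```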
4.9 Ambisonic Decoding to Loudspeakers
Ambisonic decoding to loudspeakers has been studied by numerous researchers. Early work was motivated by the instability of results for first-order Ambisonics. Later, for higher-order Ambisonics, the concern was that results strongly depend on how uniform the loudspeaker layout is. Moreover, Solvang found that even using too many loudspeakers has a degrading effect [35].
For first-order decoding, the Vienna decoders by Michael Gerzon [36] are often cited. For higher-order Ambisonic decoding, one can find, e.g., works by Daniel with max-\(\varvec{r}_\mathrm {E}\) [37] and pseudo-inverse decoding [10], and by Poletti [14, 38, 39].
What turned out to be the most practical solution is the All-Round Ambisonic Decoding approach (AllRAD), because it allows the insertion and downmix of imaginary loudspeakers as described in the sections above, cf. [40]. Moreover, AllRAD places no restriction on the Ambisonic order. With other decoders, the order often leads to poorly controllable panning-dependent fluctuations in loudness and to directional mapping errors.
The playable set of directions \(\varvec{\uptheta }_l\) or \(\upvarphi _l\) is usually finite and discrete, and it is represented by the directions of the surrounding loudspeakers. The directional distribution of the surrounding loudspeakers is typically not a t-design with \(t\ge 2\mathrm {N}+1\). In 2D, it is sometimes not even a regular polygon with \(\mathrm {L}\ge 2\mathrm {N}+2\) loudspeakers. In such cases, it is extremely helpful to be aware of the properties of the various decoder design methods.
4.9.1 Sampling Ambisonic Decoder (SAD)
The sampling decoder as introduced above is the simplest decoding method. For two (\(\mathrm {D}=2\)) and three (\(\mathrm {D}=3\)) dimensions, it uses the matrix \(\varvec{Y}_\mathrm {N}=[\varvec{y}_\mathrm {N}(\varvec{\uptheta }_1),\,\dots ,\,\varvec{y}_\mathrm {N}(\varvec{\uptheta }_\mathrm {L})]\) containing the respective circular or spherical harmonics \(\varvec{y}_\mathrm {N}(\varvec{\theta })\) sampled at the loudspeaker directions \(\{\varvec{\uptheta }_l\}\),
with the circumference of the unit circle denoted as \(S_1=2\pi \) or the surface of the unit sphere written as \(S_2=4\pi \). The factor \(\sqrt{\frac{S_{\mathrm {D}-1}}{\mathrm {L}}}\) expresses that each loudspeaker synthesizes a fraction of the E measure on the circle or sphere of the surrounding directions. On non-optimal loudspeaker layouts, however, the sampling decoder fails in two ways. It yields neither perfectly constant loudness and width measures E and \(\Vert \varvec{r}_\mathrm {E}\Vert \), nor a correctly aimed localization measure \(\varvec{r}_\mathrm {E}\). Concerning loudness, for instance, panning towards directional regions of poor loudspeaker coverage makes sampling miss the main lobe of the panning function, which yields a noticeably reduced loudness.
4.9.2 Mode Matching Decoder (MAD)
The mode-matching method is used in [10, 39] and yields a fundamentally different decoder design. Its concept is to re-encode the gain vector \(\varvec{g}\) of the loudspeakers for any panning direction \(\varvec{\theta }_\mathrm {s}\) by the encoding matrix \(\varvec{Y}_\mathrm {N}=[\varvec{y}_\mathrm {N}(\varvec{\uptheta }_1),\,\dots ,\,\varvec{y}_\mathrm {N}(\varvec{\uptheta }_\mathrm {L})]\) for all loudspeaker directions \(\{\varvec{\uptheta }_l\}\). Ideally, the re-encoded result should match the encoding of the panning direction with sidelobes suppressed
Using the definition \(\varvec{g}=\varvec{D}\,\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\,\varvec{y}_\mathrm {N}(\varvec{\theta }_s)\) of the panning gains, we obtain
so that the decoder \(\varvec{D}\) is required to be right-inverse to the matrix \(\varvec{Y}_\mathrm {N}\), i.e. \(\varvec{Y}_\mathrm {N}\,\varvec{D}=\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T}(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}=\varvec{I}\), see Eq. (A.63) in Appendix A.4. For the inverse of \(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T}\) to exist, there must be at least as many loudspeakers as harmonics, i.e. \(\mathrm {L}\ge (\mathrm {N}+1)^2\) for \(\mathrm {D}=3\) or \(\mathrm {L}\ge 2\mathrm {N}+1\) for \(\mathrm {D}=2\). However, this is not yet a sufficient criterion: in directions poorly covered by loudspeakers, the inversion boosts the loudness. The inversion of \(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T}\) is therefore often numerically ill-conditioned, unless the loudspeaker layout is at least uniform. Mode-matching decoding is ill-conditioned on hemispherical or semicircular loudspeaker layouts. The solution is equivalently described by the more general pseudo-inverse \(\varvec{Y}_\mathrm {N}^\dagger \), which is right-inverse for fat matrices.
4.9.3 Energy Preservation on Optimal Layouts
For instance, for an order of \(\mathrm {N}=3\), 2D Ambisonics should work optimally with a ring of \(45^\circ \) spaced loudspeakers on the horizon, a circular \((2\mathrm {N}+1)\)-design. In 3D, the optimal layout is a spherical \((2\mathrm {N}+1)\)-design. On a t-design with \(t\ge 2\mathrm {N}\), the loudness measure E is panning-invariant in general,
This is because a \(t\ge 2\mathrm {N}\)-design discretization preserves orthonormality
which implies for the sampling decoder \(\varvec{D}^\mathrm {T}\varvec{D}=\frac{S_{\mathrm {D}-1}}{\mathrm {L}}\,\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T}=\varvec{I}\). Via the Parseval theorem \(\int g^2(\varvec{\theta })\,\mathrm {d}\varvec{\theta }=\Vert \varvec{\gamma }_\mathrm {N}\Vert ^2\), we recognize the panning-invariant norm of \(g(\varvec{\theta })\) in its coefficients \(\varvec{\gamma }_\mathrm {N}=\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\varvec{y}_\mathrm {N}(\varvec{\theta }_\mathrm {s})\). The panning-invariant E measure also holds for the mode-matching decoder using a \(t\ge 2\mathrm {N}\)-design, as it becomes equivalent to a sampling decoder \(\varvec{D}={\sqrt{\frac{\mathrm {L}}{S_{\mathrm {D}-1}}}}\varvec{Y}_\mathrm {N}^\mathrm {T}(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T})^{-1}= {\sqrt{\frac{\mathrm {L}}{S_{\mathrm {D}-1}}}}\varvec{Y}_\mathrm {N}^\mathrm {T}{\frac{S_{\mathrm {D}-1}}{\mathrm {L}}}={\sqrt{\frac{S_{\mathrm {D}-1}}{\mathrm {L}}}}\varvec{Y}_\mathrm {N}^\mathrm {T}\). Under these ideal conditions, both decoders are energy-preserving.
4.9.4 Loudness Deficiencies on Sub-optimal Layouts
For 2D layouts, Fig. 4.15 shows what happens if a decoder is calculated for a \(t\ge 2\mathrm {N}+1\)-design with one loudspeaker removed: While, for panning across the gap, the sampling Ambisonic decoder (SAD) yields a quieter signal, moderate localization errors and width fluctuation, the mode-matching decoder (MAD) yields a strong loudness increase and severe jumps in the localization/width. MAD is therefore not very practical with sub-optimal layouts, SAD only slightly more so.
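The behavior described for Fig. 4.15 can be reproduced qualitatively with a short NumPy sketch. The sketch makes the following assumptions:

- orthonormal circular harmonics,
- \(\mathrm {N}=3\),
- the basic weighting \(a_n=1\),
- a ring of eight loudspeakers at \(45^\circ \) spacing, from which the loudspeaker at \(-90^\circ \) is removed.

The mode-matching decoder uses the pseudo-inverse with the scaling \(\sqrt{\mathrm {L}/S_{\mathrm {D}-1}}\), under which it coincides with the sampling decoder on a t-design. Names are hypothetical.

```python
import numpy as np

N = 3  # Ambisonic order (assumed)

def circ_harmonics(phi, N=N):
    """Orthonormal circular harmonics, shape (2N+1, len(phi)):
    1/sqrt(2 pi), cos(m phi)/sqrt(pi), sin(m phi)/sqrt(pi) for m = 1..N."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full(phi.shape, 1.0 / np.sqrt(2 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.array(rows)

def sampling_decoder(spk):
    """SAD: D = sqrt(2 pi / L) * Y_N^T, shape (L, 2N+1)."""
    return np.sqrt(2 * np.pi / spk.size) * circ_harmonics(spk).T

def mode_matching_decoder(spk):
    """MAD: right-inverse (pseudo-inverse) of Y_N, scaled by sqrt(L / (2 pi))."""
    return np.sqrt(spk.size / (2 * np.pi)) * np.linalg.pinv(circ_harmonics(spk))

def loudness_db(D, pan):
    """Loudness measure E = ||g||^2 in dB for each panning direction (a_n = 1)."""
    g = D @ circ_harmonics(pan)               # (L, P) loudspeaker gains
    return 10 * np.log10(np.sum(g ** 2, axis=0))

deg = np.arange(0, 360, 45)
full = np.radians(deg)                        # 8 loudspeakers: a circular 7-design
gap = np.radians(deg[deg != 270])             # loudspeaker at -90 deg removed
pan = np.radians(np.arange(0.0, 360.0, 1.0))
E_ref = 10 * np.log10((2 * N + 1) / (2 * np.pi))   # Parseval value ||gamma||^2

for name, D in [("SAD", sampling_decoder(gap)), ("MAD", mode_matching_decoder(gap))]:
    dev = loudness_db(D, pan) - E_ref
    print(f"{name}: loudness deviation {dev.min():+.1f} dB to {dev.max():+.1f} dB")
```

With the gap, the sampling decoder loses loudness when panning into it. The mode-matching decoder instead boosts it, because \(\varvec{Y}_\mathrm {N}\varvec{Y}_\mathrm {N}^\mathrm {T}\) acquires a small eigenvalue in the direction of the missing loudspeaker.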
4.9.5 Energy-Preserving Ambisonic Decoder (EPAD)
To establish panning-invariant loudness for decoding to non-uniform surround loudspeaker layouts, one can ensure a constant loudness measure E by enforcing \(\varvec{D}^\mathrm {T}\varvec{D}=\varvec{I}\), which is otherwise only achieved on \(t\ge 2\mathrm {N}\)-designs. We may search for a decoding matrix \(\varvec{D}\) whose entries are closest to the sampling decoder under the constraint that it be column-orthogonal:
The singular value decomposition of
can be used to create
by replacing the singular values \(\varvec{s}\) with ones. Such a decoder is column-orthogonal, as the singular-value decomposition delivers \(\varvec{U}^\mathrm {T}\varvec{U}=\varvec{I}\) and \(\varvec{V}\varvec{V}^\mathrm {T}=\varvec{I}\), and as a consequenceFootnote 1 \(\varvec{D}^\mathrm {T}\varvec{D}=\varvec{I}\). The energy-preserving decoder in this basic version requires \(\mathrm {L}\ge 2\mathrm {N}+1\) loudspeakers in 2D or \(\mathrm {L}\ge (\mathrm {N}+1)^2\) in 3D to work.
Note that if the loudspeaker setup directions are already a \(t\ge 2\mathrm {N}\) design, the sampling, mode-matching, and energy-preserving decoders are equivalent.
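The SVD construction of EPAD can be sketched in NumPy for the same assumed 2D example: \(\mathrm {N}=3\), orthonormal circular harmonics, \(a_n=1\), and an eight-loudspeaker ring with the \(-90^\circ \) loudspeaker removed. The sketch replaces the singular values of \(\varvec{Y}_\mathrm {N}^\mathrm {T}\) by ones. It then checks column-orthogonality, the constant loudness measure on the sub-optimal layout, and the equivalence to the sampling decoder on the full ring. Names are hypothetical.

```python
import numpy as np

N = 3  # Ambisonic order (assumed)

def circ_harmonics(phi, N=N):
    """Orthonormal circular harmonics, shape (2N+1, len(phi))."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full(phi.shape, 1.0 / np.sqrt(2 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.array(rows)

def sampling_decoder(spk):
    """SAD: D = sqrt(2 pi / L) * Y_N^T."""
    return np.sqrt(2 * np.pi / spk.size) * circ_harmonics(spk).T

def epad(spk):
    """EPAD: Y_N^T = U diag(s) V^T (thin SVD), then D = U V^T, so D^T D = I."""
    U, s, Vt = np.linalg.svd(circ_harmonics(spk).T, full_matrices=False)
    return U @ Vt

deg = np.arange(0, 360, 45)
full = np.radians(deg)
gap = np.radians(deg[deg != 270])
pan = np.radians(np.arange(0.0, 360.0, 1.0))

D = epad(gap)
E = np.sum((D @ circ_harmonics(pan)) ** 2, axis=0)
print("EPAD loudness range (dB):", np.ptp(10 * np.log10(E)))
```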
4.9.6 All-Round Ambisonic Decoding (AllRAD)
In Chap. 3 on vector-base amplitude panning methods, a well-balanced panning result in terms of loudness, width, and localization was achieved by MDAP, which distributes a signal to an arrangement of several superimposed VBAP virtual sources. Thereby, \(E=\text {const.}\), \(\varvec{r}_\mathrm {E}\approx r_\mathrm {E}\,\varvec{\theta }_\mathrm {s}\), and \(r_\mathrm {E}\approx \text {const}\). This works for nearly any loudspeaker layout.
To calculate loudspeaker gains, MDAP superimposes an arrangement of discrete virtual sources within a range of \({\pm }\alpha \) around the panning direction \(\varvec{\theta }_\mathrm {s}\). One could also think of superimposing a quasi-continuous distribution of virtual sources that are weighted by a continuous panning function \(g(\varvec{\theta })\).
The ideal continuous panning function \(g(\varvec{\theta })\) of axisymmetric directional spread around the panning direction \(\varvec{\theta }_\mathrm {s}\) is described by \(g(\varvec{\theta })=\varvec{y}_\mathrm {N}^\mathrm {T}(\varvec{\theta })\,\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\,\varvec{y}_\mathrm {N}(\varvec{\theta }_\mathrm {s})\), the Ambisonic panning function. This rotation-invariant continuous function is optimal in terms of loudness, width, and localization measures, which are all evaluated by continuous integrals: \(E=\int g^2(\varvec{\theta })\,\mathrm {d}\varvec{\theta }=\text {const.}\) expresses panning-invariant loudness, \(\varvec{r}_\mathrm {E}=\frac{1}{E}\int g^2(\varvec{\theta })\,\varvec{\theta }\,\mathrm {d}\varvec{\theta }=r_\mathrm {E}\,\varvec{\theta }_\mathrm {s}\) indicates a perfect alignment \(\varvec{r}_\mathrm {E}\parallel \varvec{\theta }_\mathrm {s}\) with the panning direction and a panning-invariant width \(r_\mathrm {E}=\text {const}\). However, the optimal values of these integrals are only preserved by discretization with optimal \(t\ge 2\mathrm {N}+1\)-design loudspeaker layouts.
All-round Ambisonic decoding (AllRAD) was preceded by the work of Batke and Keiler [16]. They describe Ambisonic panning \(\varvec{g}_\mathrm {AllRAD}(\varvec{\theta })=\varvec{D}\,\varvec{y}_\mathrm {N}(\varvec{\theta })\) by a decoder \(\varvec{D}\) whose result best matches VBAP \(\varvec{g}_\mathrm {VBAP}(\varvec{\theta })\). Still without max-\(\varvec{r}_\mathrm {E}\) weights, we use this here to define AllRAD as a minimum-mean-square-error problem, integrated over all panning directions \(\varvec{\theta }\)
Equivalently, as described by Zotter and Frank [40], who coined the name, we may define AllRAD as VBAP synthesis on the physical loudspeakers. The inputs to its multiple virtual sources are given by the Ambisonic panning function \(g_\mathrm {AMBI}(\varvec{\theta })=\varvec{y}_\mathrm {N}^\mathrm {T}(\varvec{\theta })\,\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\,\varvec{y}_\mathrm {N}(\varvec{\theta }_\mathrm {s})\), sampled at an optimal layout of virtual loudspeakers. Here, we write the synthesis as the integral over infinitely many virtual loudspeakers \(\varvec{\theta }\),
We can obviously pull the term \(\mathrm {diag}\{\varvec{a}_\mathrm {N}\}\varvec{y}_\mathrm {N}(\varvec{\theta }_\mathrm {s})\) out of the integral. The remaining integral defines the AllRAD matrix \(\varvec{D}\). We may interpret it as a transformation of the VBAP loudspeaker gain functions \(\varvec{g}_\mathrm {VBAP}(\varvec{\theta })\) into spherical harmonic coefficients. In the original paper [40], AllRAD is evaluated by an optimal layout of discrete virtual loudspeakers
using the directions \(\{\varvec{\hat{\uptheta }}_l\}\) of a t-design. As VBAP's gain functions are not smooth (their derivatives are discontinuous), they are not order-limited, and a t-design of sufficiently high t should be used. In 3D practice, the 5200-point Chebyshev-type design from [33] is dense enough. Note that the VBAP part can be improved by inserting and downmixing imaginary loudspeakers to adapt to asymmetric or hemispherical layouts, as suggested in the original paper [40], cf. Sect. 3.3.
Note that the decoder needs to be scaled properly. For instance, the norm of the omnidirectional component (first column) could be equalized to one, as it would typically be with a sampling decoder; there are alternative strategies to circumvent the scaling problem [41].
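A 2D analogue of the discrete AllRAD evaluation can be sketched in NumPy. The sketch makes the following assumptions:

- pairwise 2D VBAP with energy-normalized gains,
- a dense equiangular ring of 360 virtual loudspeakers standing in for the t-design,
- \(\mathrm {N}=3\) with \(a_n=1\),
- the seven-loudspeaker ring with the gap at \(-90^\circ \).

The decoder is scaled so that its omnidirectional (first) column has unit norm, which is one of the scaling options mentioned above. All names are hypothetical, and no imaginary loudspeakers are used.

```python
import numpy as np

N = 3  # Ambisonic order (assumed)

def circ_harmonics(phi, N=N):
    """Orthonormal circular harmonics, shape (2N+1, len(phi))."""
    phi = np.atleast_1d(np.asarray(phi, dtype=float))
    rows = [np.full(phi.shape, 1.0 / np.sqrt(2 * np.pi))]
    for m in range(1, N + 1):
        rows.append(np.cos(m * phi) / np.sqrt(np.pi))
        rows.append(np.sin(m * phi) / np.sqrt(np.pi))
    return np.array(rows)

def vbap_2d(phi, spk):
    """Pairwise 2D VBAP gains (energy-normalized) for a direction phi.
    spk must be sorted by azimuth, with adjacent pairs spanning < 180 deg."""
    L = spk.size
    p = np.array([np.cos(phi), np.sin(phi)])
    for l in range(L):
        k = (l + 1) % L
        base = np.array([[np.cos(spk[l]), np.cos(spk[k])],
                         [np.sin(spk[l]), np.sin(spk[k])]])
        w = np.linalg.solve(base, p)
        if np.all(w >= -1e-9):                # phi lies inside this pair
            g = np.zeros(L)
            g[[l, k]] = np.clip(w, 0.0, None) / np.linalg.norm(w)
            return g
    raise ValueError("direction not covered by any loudspeaker pair")

def allrad(spk, n_virtual=360):
    """D = (2 pi / L_hat) * sum_l g_VBAP(phi_l) y_N(phi_l)^T over virtual
    loudspeakers, scaled to a unit-norm omnidirectional column."""
    virt = 2 * np.pi * np.arange(n_virtual) / n_virtual
    G = np.array([vbap_2d(v, spk) for v in virt])                 # (L_hat, L)
    D = (2 * np.pi / n_virtual) * G.T @ circ_harmonics(virt).T    # (L, 2N+1)
    return D / np.linalg.norm(D[:, 0])

spk = np.radians([0, 45, 90, 135, 180, 225, 315])   # ring with gap at -90 deg
D = allrad(spk)
pan = np.radians(np.arange(0.0, 360.0, 1.0))
E_db = 10 * np.log10(np.sum((D @ circ_harmonics(pan)) ** 2, axis=0))
print("AllRAD loudness range (dB):", round(float(np.ptp(E_db)), 2))
```

Because the VBAP gains are not order-limited, the virtual ring is chosen much denser than the Ambisonic order would require.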
4.9.7 EPAD and AllRAD on Sub-optimal Layouts
Figure 4.16 shows the improvement achieved with EPAD and AllRAD on an equi-angular arrangement that is suboptimal because of the missing loudspeaker at \(-90^\circ \). EPAD handles the loudness stabilization perfectly well, and AllRAD keeps the directional and spread mapping errors small. For EPAD, the constraint \(\mathrm {L}\ge (2\mathrm {N}+1)\) is just fulfilled with \(\mathrm {N}=3\) and \(\mathrm {L}=7\) in the simulation. Removing any further loudspeakers would therefore not be possible without degradation.
4.9.8 Decoding to Hemispherical 3D Loudspeaker Layouts
In typical loudspeaker playback situations for large audiences, a solid floor and no loudspeakers below ear level are considered practical for several reasons. However, this does not permit decoding by sampling with optimal t-design layouts covering all directions. As shown above, EPAD and AllRAD do not require such arrays. Still, they require some care when used with hemispherical loudspeaker layouts; see [15, 40] for further reading.
EPAD with hemispherical loudspeaker layouts. Even for a hemispherical layout, the energy-preserving decoding method requires \(\mathrm {L}\ge (\mathrm {N}+1)^2\) loudspeakers to achieve a perfectly panning-invariant loudness. However, this is counter-intuitive: Why should one need at least as many loudspeakers on a hemisphere as are required for same-order playback on a full sphere? Shouldn’t the number be half as many?
We can show that while the spherical harmonics are orthonormal on the sphere \(\mathbb {S}^2\), i.e. \(\int _{\mathbb {S}^2}\varvec{y}_\mathrm {N}(\varvec{\theta })\varvec{y}^\mathrm {T}_\mathrm {N}(\varvec{\theta })\,\mathrm {d}\varvec{\theta }=\varvec{I}\), they aren’t orthogonal on the hemisphere \(S=\mathbb {S}^2:\vartheta \le \vartheta _\mathrm {max}\)
Here, \(\varvec{G}\) is called the Gram matrix, and it is evaluated by \( \frac{4\pi }{\hat{\mathrm {L}}}\sum _{l:\uptheta _{\mathrm {z},l}\ge 0}\varvec{y}_\mathrm {N}(\varvec{\uptheta }_l)\varvec{y}^\mathrm {T}_\mathrm {N}(\varvec{\uptheta }_l)\) using a t-design with sufficiently high t. By singular-value decomposition of the positive semi-definite matrix \(\varvec{G}=\varvec{Q}\,\mathrm {diag}\{\varvec{s}\}\,\varvec{Q}^\mathrm {T}\), with \(\varvec{Q}^\mathrm {T}\varvec{Q}=\varvec{Q}\varvec{Q}^\mathrm {T}=\varvec{I}\), we diagonalize \(\varvec{G}\) and find new basis functions \(\varvec{\tilde{y}}_\mathrm {N}(\varvec{\theta })\), the so-called Slepian functions [42], that are orthogonal on S
Typically, the singular values in \(\varvec{s}\) are sorted in descending order \(s_1\ge s_2\ge \dots \ge s_{(\mathrm {N}+1)^2}\). The basis functions that contribute significantly to the upper hemisphere S can then be selected by
Typically, the numerical integral is extended to slightly below the horizon, see Table 4.1, so that truncation to the \((\mathrm {N}+1)(\mathrm {N}+2)/2\) most significant basis functions, see Fig. 4.17, produces a minimum fluctuation in the loudness measure \(\tilde{E}=\Vert \varvec{\tilde{y}}_\mathrm {N}(\varvec{\theta })\Vert ^2\) for panning on the hemisphere.
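The Gram matrix and the Slepian truncation can be sketched in NumPy. This sketch departs from the text in a few ways, all of which are assumptions:

- Instead of the t-design sum above, it integrates with a Gauss-Legendre product grid in \(\cos \vartheta \) and azimuth. This swapped-in quadrature is exact for the polynomial integrands involved.
- It uses orthonormal real spherical harmonics in ACN order, computed by the standard associated-Legendre recurrence.
- The lower integration limit `z_min` (the cosine of the largest zenith angle) is a free parameter. The values of Table 4.1 are not reproduced.

Names are hypothetical.

```python
import math
import numpy as np

def real_sh(N, zen, azi):
    """Orthonormal real spherical harmonics up to order N, ACN order,
    shape ((N+1)^2, len(zen)); zen = zenith angle, azi = azimuth (rad)."""
    x = np.cos(zen)
    sx = np.sqrt(np.clip(1.0 - x ** 2, 0.0, None))
    Y = np.zeros(((N + 1) ** 2, x.size))
    for m in range(N + 1):
        dfact = np.prod(np.arange(2 * m - 1, 0, -2), dtype=float)  # (2m-1)!!
        P = {m: (-1) ** m * dfact * sx ** m}                       # P_m^m
        if m + 1 <= N:
            P[m + 1] = x * (2 * m + 1) * P[m]
        for n in range(m + 2, N + 1):
            P[n] = ((2 * n - 1) * x * P[n - 1] - (n + m - 1) * P[n - 2]) / (n - m)
        for n in range(m, N + 1):
            c = np.sqrt((2 * n + 1) / (4 * np.pi)
                        * math.factorial(n - m) / math.factorial(n + m))
            if m == 0:
                Y[n * n + n] = c * P[n]
            else:
                Y[n * n + n + m] = np.sqrt(2) * c * P[n] * np.cos(m * azi)
                Y[n * n + n - m] = np.sqrt(2) * c * P[n] * np.sin(m * azi)
    return Y

def hemisphere_gram(N, z_min=0.0, n_z=64, n_azi=128):
    """G = int_S y_N y_N^T dtheta over cos(zenith) >= z_min."""
    t, w = np.polynomial.legendre.leggauss(n_z)
    z = 0.5 * (1 - z_min) * t + 0.5 * (1 + z_min)   # map [-1, 1] -> [z_min, 1]
    wz = 0.5 * (1 - z_min) * w
    azi = 2 * np.pi * np.arange(n_azi) / n_azi
    Z, A = np.meshgrid(z, azi, indexing="ij")
    W = np.repeat(wz, n_azi) * (2 * np.pi / n_azi)  # matches Z.ravel() order
    Y = real_sh(N, np.arccos(Z.ravel()), A.ravel())
    return (Y * W) @ Y.T

def slepian_basis(N, z_min=0.0):
    """Eigen-decompose G and keep the (N+1)(N+2)/2 most significant functions;
    returns sorted eigenvalues s, Q, and the truncation [I, 0] Q^T."""
    s, Q = np.linalg.eigh(hemisphere_gram(N, z_min))
    order = np.argsort(s)[::-1]
    s, Q = s[order], Q[:, order]
    K = (N + 1) * (N + 2) // 2
    return s, Q, Q[:, :K].T

s, Q, T = slepian_basis(3)
print(np.round(s, 3))   # y_tilde(theta) = T @ real_sh(3, zen, azi)
```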
With \(\varvec{\tilde{y}}_\mathrm {N}(\varvec{\theta })\), EPAD is calculated in the same way as for the ordinary harmonics
with the main difference that the lower limit for the number of loudspeakers decreases to \(\mathrm {L}\ge (\mathrm {N}+1)(\mathrm {N}+2)/2\). Interfaced to the spherical harmonics by \([\varvec{I},\,\varvec{0}]\,\varvec{Q}^\mathrm {T}\), the hemispherical energy-preserving decoder becomes
AllRAD with hemispherical loudspeaker layouts. Because of the vector-base amplitude panning involved, all-round Ambisonic decoding (AllRAD) is comparatively robust to irregular loudspeaker setups. Still, a hemispherical layout contains no loudspeaker direction pointing into the lower half space, so one could simply omit the information of the lower half space. However, the Ambisonic panning function implies a directional spread, so panning exactly to the horizon also produces content below it. Omitting this content causes (i) a loss in loudness and (ii) a slight elevation of the perceived direction, cf. Fig. 4.18.
As discussed in Sect. 3.3 on triangulation, the insertion of imaginary loudspeakers fixes this behavior. In the case of hemispherical loudspeaker layouts, it is not necessary to downmix the signal of the imaginary loudspeaker at nadir to stabilize both loudness and localization for panning to the horizon.
Signal contributions below but close to the horizon largely contribute to the horizontal loudspeakers. It is therefore safe to dispose of the signal that would feed the imaginary loudspeaker at nadir, without loss of loudness. Moreover, this contribution from below also reinforces the signals of the horizontal loudspeakers, so that localization is pulled back down. Both effects can be observed in Fig. 4.18. It shows the loudness measure E, as well as mislocalization and width by the measure \(\varvec{r}_\mathrm {E}\), using max-\(\varvec{r}_\mathrm {E}\)-weighted AllRAD with \(5\mathrm {th}\)-order Ambisonics along a vertical panning circle on the IEM mobile Ambisonics Array (mAmbA). This array consists of 25 loudspeakers set up in rings of 8, 8, 4, 4, and 1 loudspeakers at \(0^\circ \), \(20^\circ \), \(40^\circ \), \(60^\circ \), and \(90^\circ \) elevation. Rings two and four start at \(0^\circ \) azimuth; the others are rotated half-way, i.e. by half their azimuthal spacing.
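The mAmbA geometry described above can be written down as a loudspeaker list for decoder design. This plain-Python sketch interprets "half-way rotated" as an azimuth offset of half the ring's angular spacing. That interpretation is an assumption, so the positions should be checked against the actual installation. Names are hypothetical.

```python
import math

def mamba_layout():
    """Hypothetical reconstruction of the 25-channel mAmbA layout:
    (count, elevation in deg, rotated by half the azimuthal spacing?)."""
    rings = [(8, 0.0, True), (8, 20.0, False), (4, 40.0, True),
             (4, 60.0, False), (1, 90.0, False)]
    layout = []
    for count, elevation, half_rotated in rings:
        step = 360.0 / count
        offset = step / 2 if half_rotated else 0.0
        for k in range(count):
            layout.append(((offset + k * step) % 360.0, elevation))
    return layout

def to_cartesian(azimuth_deg, elevation_deg):
    """Unit vector (x, y, z) with x to the front, y to the left, z up."""
    a, e = math.radians(azimuth_deg), math.radians(elevation_deg)
    return (math.cos(e) * math.cos(a), math.cos(e) * math.sin(a), math.sin(e))

layout = mamba_layout()
for azimuth, elevation in layout:
    print(f"az {azimuth:6.1f} deg, el {elevation:4.1f} deg")
```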
Performance comparison on hemispherical layouts. Figure 4.18 shows a comparison of AllRAD and EPAD decoding to the 25-channel mAmbA hemispherical loudspeaker layout.
As shown at the top of Fig. 4.18, AllRAD produces a loudness fluctuation spanning roughly 1 dB for panning on the hemisphere, whereas EPAD only exhibits 0.3 dB, as specified in Table 4.1. Although loudness differences of less than 0.5 dB can be heard in monophonic playback of noise, it is safe to assume that a weak directional loudness fluctuation of less than 1 dB is normally inaudible. In this regard, loudness fluctuation should not be a problem with either EPAD or AllRAD.
Concerning the directional mapping, EPAD produces a more strongly pronounced ripple. Its \(\varvec{r}_\mathrm {E}\) indicates that sounds on the horizon \(\vartheta _\mathrm {s}=\pm 90^\circ \) are pulled upwards towards \(0^\circ \) more with EPAD (\(7^\circ \)) than with AllRAD (\(3^\circ \)). In terms of width, both EPAD and AllRAD exhibit the \({\approx }20^\circ \) average associated with max-\(\varvec{r}_\mathrm {E}\) weighting. However, EPAD also produces a greater fluctuation, and its width grows up to about \(30^\circ \) for panning to the horizon \(\vartheta _\mathrm {s}=\pm 90^\circ \).
With the 9 loudspeakers of the ITU [44] \(4+5+0\) layout (horizontal ring: \(\varphi =0,\,\pm 30^\circ ,\,\pm 120^\circ \), upper ring at \(40^\circ \) elevation with \(\varphi =\pm 30^\circ ,\,\pm 120^\circ \)), EPAD can no longer be used at \(5\mathrm {th}\) order, which would be the optimal resolution for the front loudspeaker triplet. EPAD only supports orders up to \(\mathrm {N}=2\). To let the level decrease towards below-horizon directions, we can use the reduced set of 6 Slepian functions; alternatively, all 9 spherical harmonics of \(\mathrm {N}=2\) would also be conceivable. For AllRAD, imaginary loudspeakers are inserted at azimuth/elevation \({\pm }75^\circ /27^\circ \) (sides), \(0^\circ /78^\circ \) (up), \(180^\circ /35^\circ \) (back), and \(0^\circ /-90^\circ \) (below). It is reasonable to downmix the imaginary loudspeakers up, at the sides, and in the back with a factor of one and to re-normalize the VBAP gain matrix, while disposing of the signal of the imaginary loudspeaker below. AllRAD permits the order \(\mathrm {N}=5\), which resolves the frontal loudspeaker triplet much better for horizontal panning.
Figure 4.19 shows the result of max-\(\varvec{r}_\mathrm {E}\)-weighted \(2\mathrm {nd}\)-order EPAD and \(5\mathrm {th}\)-order AllRAD for the \(4+5+0\) layout using a vertical panning curve. The perfectly constant loudness measure of EPAD might be favored over AllRAD's loudness increase of almost \(+3\) dB at the front and back. Nevertheless, AllRAD's lower directional error, narrower width mapping, greater flexibility, and simplicity have often proven to be clearly superior in practice.
4.10 Practical Studio/Sound Reinforcement Application Examples
This section analyzes the application of 3D Ambisonic amplitude panning, consisting of encoding and AllRAD, to studios (with typical setups of 2 m radius) and to sound reinforcement (for an audience of, e.g., 250 people). Application scenarios are sketched in [43], and various other examples are given below. The requirements of constant loudness and width are analyzed. As sound reinforcement requires a particularly large sweet area, the sweet area size is depicted with the \(\varvec{r}_\mathrm {E}\) vector model for off-center listening positions from Sect. 2.2.9.
The decoder analysis above described loudness measures for panning along a circle. To display these measures for panning across all directions, Figs. 4.20 and 4.22 use world-map-like gray-scale representations of the loudness and width measures for several loudspeaker layouts. In these maps, azimuth runs horizontally and zenith vertically. The gray scale displays the loudness measure E in dB (left column) and the width measure \(\arccos \Vert \varvec{r}_\mathrm {E}\Vert \) in degrees (right column). As \(5\mathrm {th}\)-order max-\(\varvec{r}_\mathrm {E}\)-weighted AllRAD typically produces only minor directional mapping errors, these are not shown explicitly in Figs. 4.20 and 4.22. However, Figs. 4.21 and 4.23 map the sweet area size of plausible localization. They illustrate how useful the systems are for listening areas hosting the number of listeners targeted in either the studio or the sound reinforcement application.
Figure 4.20 illustrates AllRAD's tendency to attenuate signals in too closely spaced loudspeaker ensembles, as in the front section of the ITU [44] \(4+5+0\) layout. By contrast, the mAmbA layout in Fig. 4.22, for instance, only has 8 loudspeakers on the horizon, and signals panned to the widely spaced below-horizon triangles tend to get louder. Moreover, loudspeaker systems with many channels, such as IEM CUBE, mAmbA, Lobby, and Ligeti Hall in Fig. 4.22, more easily yield smooth loudness and width mappings. Still, even with only a few loudspeakers, slight direction adjustments in the layout can fix some of this behavior. An example is the IEM Production Studio, whose elevated-layer loudspeakers at \({\pm }45^\circ \) perform better than a \({\pm } 30^\circ \) spacing.
Idealization is sometimes a useful strategy for designing good decoders: it is often better to disregard the true loudspeaker locations and feed the decoder design with idealized positions instead. This trades slight directional distortions for a more uniform loudness distribution. For instance, at the IEM CUBE, the loudspeaker locations of the horizontal ring could be idealized to a \(30^\circ \) spacing to get a smoother loudness mapping than the one shown in Fig. 4.22.
4.11 Ambisonic Decoding to Headphones
Typically, Ambisonic decoding to headphones can be done similarly to decoding to loudspeakers, except that the loudspeaker signals are rendered to headphones by convolution with the head-related impulse responses (HRIRs) of the corresponding playback directions. Various databases of such HRIRs can be found, e.g., on the website SOFA-conventions.Footnote 2 This headphone decoding approach classically uses a small set of so-called virtual loudspeakers, as found in many places in the technical literature, e.g. in the pioneering works of Jean-Marc Jot et al. [9] or Jérôme Daniel [10]. It is also relevant in many other important works [18, 45, 46] and in the SADIE project,Footnote 3 and it is employed in Sect. 1.4.2 on first-order Ambisonics.
Coarse. However, as outlined in several research papers [9, 18, 46], these approaches have in common that low-order Ambisonic synthesis is problematic. If a dense grid of virtual-loudspeaker HRIRs is inserted, the Ambisonic smoothing attenuates high frequencies at frontal and dorsal directions. Alternatively, as had been the solution for a long time, a coarse grid of virtual-loudspeaker HRIRs avoids the high-frequency attenuation. The spatial quality then, however, strongly depends on the particular grid layout or orientation [46]. An early paper by Jot [9] proposed to remove the time delays of the HRIRs before the Ambisonic decomposition and to re-insert the otherwise missing interaural time delay afterwards, for any sound panned in Ambisonics. Unfortunately, this yields an object-based panning system rather than a scene-based Ambisonic system.
Dense. Some dense-grid approaches propose to keep the HRIR time delays, or, formulated in the frequency domain, the phases of the head-related transfer functions (HRTFs). They thereby stay in a scene-based Ambisonic format, while correcting spectral deficiencies by diffuse-field or interaural-covariance equalization [18, 47]. Finally, the most recent solutions, proposed by Jin, Sun, and Epain [17, 48] or by Zaunschirm, Schörkhuber, and Höldrich [20, 21], modify the HRIR time delays/HRTF phases, but only above, e.g., 3 kHz, and without any object-based re-insertion afterwards. Omitting high-frequency interaural time-delay/phase information is a reasonable trade-off in favor of the more important accuracy in spectral magnitude.
What does directional HRIR smoothing do to high frequencies? The geometrical theory of diffraction [49] suggests that HRIRs must always contain at least the delay to the ear of either the shortest direct path or the shortest indirect path via the surface of the head. For a spherical head model with the radius \(\mathrm {R}=0.0875\) m and speed of sound \(c=343\) \(\frac{\mathrm {m}}{\mathrm {s}}\), the Woodworth-Schlosberg formula [50] is based on this consideration, see Fig. 4.24. The left ear receives a distant horizontal sound from the azimuth interval \(0\le \phi \le \frac{\pi }{2}\) as direct sound, advanced by \(\tau =-\frac{\mathrm {R}}{c}\sin \phi \), or from \(-\frac{\pi }{2}<\phi \le 0\) as an indirect sound delayed by \(\tau =-\frac{\mathrm {R}}{c}\,\phi \),
as plotted in Fig. 4.25a, and recognizable from dummy-head measurementsFootnote 4 in Fig. 4.25b.
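The piecewise delay model can be evaluated with a few lines of plain Python. The sketch uses the head radius and speed of sound from the text. The right ear is obtained by left/right symmetry, which is an assumption of the spherical head model. The interaural time difference (ITD) printed at \(\pm 90^\circ \) follows directly from the model as \(\frac{\mathrm {R}}{c}(\frac{\pi }{2}+1)\). Names are hypothetical.

```python
import math

R = 0.0875   # head radius in m (from the text)
C = 343.0    # speed of sound in m/s (from the text)

def tau_left(phi):
    """Delay of the left ear relative to the head center for a distant
    horizontal source at azimuth phi (rad, positive to the left);
    intended for -pi/2 <= phi <= pi/2."""
    if phi >= 0:
        return -R / C * math.sin(phi)   # direct path: arrives earlier
    return -R / C * phi                 # indirect path along the head surface

def tau_right(phi):
    """Right ear by left/right symmetry of the spherical head."""
    return tau_left(-phi)

for deg in (-90, -45, 0, 45, 90):
    phi = math.radians(deg)
    itd = tau_right(phi) - tau_left(phi)    # > 0 when the left ear leads
    print(f"{deg:4d} deg: left {tau_left(phi) * 1e6:7.1f} us, ITD {itd * 1e6:6.1f} us")
```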
If the HRIR is smoothed across an angular range, the time-delay curve is also spread across time, see Fig. 4.26. Depending on whether the smoothing uses a discrete or a continuous set of directions, one obtains something like a comb filter or a sinc-shaped frequency response, respectively. This smoothing is least disturbing around the direct-ear side, as shown on the left of Fig. 4.26. The indirect ear additionally encounters high-frequency shadowing effects, so the smoothing is mainly disturbing for frontal and rear sounds at \(0^\circ \) or \(180^\circ \), as shown on the right of Fig. 4.26. Figure 4.27 roughly exemplifies the corresponding frequency responses of third-order-equivalent Ambisonic smoothing applied to either \(45^\circ \)-spaced HRIRs (Fig. 4.27a) or \(15^\circ \)-spaced ones (Fig. 4.27b).
To get an upper frequency limit, it is insightful to work in the frequency domain, where the HRIR is called the head-related transfer function (HRTF). A simplified linearized-phase version around \(\phi =0\) uses \(\tau \approx -\frac{\mathrm {R}}{c}\;\phi \), and its Fourier transform with \(\omega =2\pi \,f\) is
To represent it by circular or spherical harmonics transformation limited to the order \(\mathrm {N}\), a maximum phase change represented by the harmonic \(e^{\mathrm {i}\mathrm {N}\phi }\) implies that we can only resolve the phase up to \(\frac{\omega }{c}\mathrm {R}\le \mathrm {N}\), hence the range of accurate operation is limited in frequency
As high-frequency HRTF phase evolves more rapidly over the angle as what the finite order can represent, this typically yields attenuation of the high frequencies when obtaining circular/spherical harmonics coefficients by transformation integral.
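The order-dependent frequency limit and the resulting high-frequency loss can be illustrated numerically. As a simplified stand-in for the HRTF phase, the sketch uses a point-like ear at radius R in free field (head shadowing is ignored), whose phase over the azimuth is \(e^{\mathrm {i}\kappa \sin \phi }\) with \(\kappa =\frac{\omega }{c}\mathrm {R}\). It computes which fraction of its energy the circular harmonics up to order N retain. It also evaluates the limit \(f\le \frac{\mathrm {N}\,c}{2\pi \mathrm {R}}\) implied by \(\frac{\omega }{c}\mathrm {R}\le \mathrm {N}\). Names are hypothetical.

```python
import math
import numpy as np

R, C = 0.0875, 343.0   # head radius (m) and speed of sound (m/s), from the text

def max_frequency(N, R=R, c=C):
    """Upper limit from (omega / c) R <= N, i.e. f <= N c / (2 pi R)."""
    return N * c / (2 * math.pi * R)

def captured_energy(N, kappa, n=1024):
    """Fraction of the energy of exp(i kappa sin(phi)) that is kept by the
    circular harmonics |m| <= N (kappa = omega R / c), computed by FFT."""
    phi = 2 * np.pi * np.arange(n) / n
    coeffs = np.fft.fft(np.exp(1j * kappa * np.sin(phi))) / n
    m = np.fft.fftfreq(n, d=1.0 / n)          # integer harmonic index per bin
    return float(np.sum(np.abs(coeffs[np.abs(m) <= N]) ** 2))

for N in (1, 3, 5):
    f_max = max_frequency(N)
    for f in (0.5 * f_max, f_max, 2 * f_max):
        kappa = 2 * np.pi * f * R / C
        print(f"N={N}, f={f:7.0f} Hz: kept energy {captured_energy(N, kappa):.3f}")
```

Above the limit, a growing share of the phase-dominated energy falls outside the representable orders. The transformation integral discards this share, which is consistent with the attenuation of high frequencies described in the text.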
Directional smoothing of the discrete directional HRTFs therefore causes relevant spectral problems, regardless of whether it is done by Ambisonics, VBAP, or MDAP. Mainly the geometric delay in the HRIRs is responsible for the emerging comb-filter or low-pass behavior. One could pull out the linear phase trend above the frequency limit and re-insert it after smoothing, but is re-insertion necessary?
4.11.1 High-Frequency Time-Aligned Binaural Decoding (TAC)
As a prerequisite for their binaural Ambisonic decoders, Schörkhuber et al. [21] tested above which frequency the removal of the HRTF linear phase trend remains inaudible in direct HRTF-based rendering, without panning or smoothing. In fact, most of their listeners could not detect the absence of the linear phase trend when it was removed above 3 kHz, for various sound examples (drums, speech, pink noise, rendered at the directions \(10^\circ \), \(-45^\circ \), \(80^\circ \), \(-130^\circ \)). The subjects compared the result to a reference with unaltered HRTFs; the results are analyzed in Fig. 4.28.
Based on this finding, each of the \(2\times 1\) HRIRs \(\varvec{h}(t,\varvec{\theta })\) can be split into an unaltered low-pass band and a high-pass band in which the model delay \(\tau (\phi )\) is compensated, so that the high-frequency HRIR delay is unified across directions.
The delay model \(\tau (\phi )\) requires the angle to the left or right ear, located on the positive or negative y axis, which is \(\arccos (\pm \theta _\mathrm {y})\); shifted by \(90^\circ \), this yields \(\phi =\pm \arcsin \theta _\mathrm {y}\).
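The split into an unaltered low band and a time-aligned high band can be sketched in the frequency domain. This is a minimal sketch under stated assumptions: the crossover is an ideal brick-wall split at a hypothetical `f_cut` of 3 kHz (motivated by the listening test above) instead of a proper low-pass/high-pass filter pair, and the helper names `ear_delays` and `time_align` are hypothetical. Removing a delay \(\tau \) means multiplying the HRTF by \(e^{+\mathrm{i}\omega \tau }\).

```python
import numpy as np

R, C = 0.0875, 343.0  # sphere radius (m) and speed of sound (m/s)

def ear_delays(direction):
    """Woodworth-Schlosberg delays (s) for the left (+y) and right (-y) ear.

    direction: unit vector (x, y, z); the model angle is phi = +/- arcsin(theta_y).
    """
    theta_y = np.clip(direction[1], -1.0, 1.0)
    delays = []
    for sign in (+1.0, -1.0):  # left ear on +y, right ear on -y
        phi = sign * np.arcsin(theta_y)
        delays.append(-R / C * np.sin(phi) if phi >= 0 else -R / C * phi)
    return np.array(delays)

def time_align(H, freqs, tau, f_cut=3000.0):
    """Remove the model delay tau (s) from one HRTF H above f_cut (Hz).

    Below f_cut the HRTF is left unaltered; above it, the linear phase
    exp(-2j*pi*f*tau) is compensated (ideal brick-wall crossover, an assumption).
    """
    H_hat = np.array(H, dtype=complex)
    hp = freqs >= f_cut
    H_hat[hp] *= np.exp(2j * np.pi * freqs[hp] * tau)
    return H_hat
```

For a pure delay \(H=e^{-\mathrm{i}2\pi f\tau }\), the aligned response equals 1 above the cutoff, i.e., the direction-dependent delay is gone in the high band.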
This removal allows using all available HRIRs of dense measurement sets for highly accurate binaural synthesis with a suitable linear Ambisonic decoder such as AllRAD. If the resulting modified left and right HRIRs for all directions are collected in the \(\mathrm {L}\times 2\) matrix \(\varvec{\hat{H}}(t)=[\varvec{\hat{h}}(t,\varvec{\uptheta }_1),\,\dots ,\,\varvec{\hat{h}}(t,\varvec{\uptheta }_\mathrm {L})]^\mathrm {T}\), the \(2\times (\mathrm {N}+1)^2\) filter set for decoding each of the Ambisonic channels to the ears becomes
\(\varvec{\hat{H}}^\mathrm {T}(t)\,\varvec{D},\)
with the \(\mathrm {L}\times (\mathrm {N}+1)^2\) Ambisonic decoder matrix \(\varvec{D}\).
Fig. 4.29 shows the results of a pseudo-inverse decoding to HRIRs time-aligned in this way, using \(\mathrm {R}=0.085\) m and \(\mathrm {N}=3\), from the 2702-direction Cologne HRIRs (Note 5). The resulting polar patterns (ta) clearly outperform the linear decomposition (lin) in representing the original HRTFs (max) at frequencies above 2 kHz.
4.11.2 Magnitude Least Squares (MagLS)
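As an alternative to discarding the high-frequency time delay, Schörkhuber et al. [21] present an optimum-phase approach that disregards the phase match in favor of an improved magnitude match above cutoff. Formulated exemplarily for the left ear, across every HRTF direction \(\varvec{\uptheta }_l\) and for every discrete frequency \(\omega _k\), with \(h_{l,k}=h(\varvec{\uptheta }_l,\omega _k)\) and the spherical harmonics up to order \(\mathrm {N}\) evaluated at \(\varvec{\uptheta }_l\) denoted by \(\varvec{y}_\mathrm {N}(\varvec{\uptheta }_l)\), this becomes
\(\min _{\varvec{\hat{h}}_{\mathrm {SH},k}}\sum _{l}\Big (\big |\varvec{y}_\mathrm {N}^\mathrm {T}(\varvec{\uptheta }_l)\,\varvec{\hat{h}}_{\mathrm {SH},k}\big |-|h_{l,k}|\Big )^2.\)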
In general, such magnitude least-squares or magnitude-squared least-squares problems are solved by semidefinite relaxation, see Kassakian [51].
In practice, however, the results already turn out perfect with an iterative scheme that combines the reconstructed phase \(\hat{\phi }_{l,k-1}\) of the previous frequency \(\omega _{k-1}\) with the HRTF magnitude \(|h_{l,k}|\) of the current frequency \(\omega _k\), before decomposing this combination linearly into spherical harmonic coefficients \(\varvec{\hat{h}}_{\mathrm {SH},k}\).
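Every frequency below cutoff, \(\omega _{k}<2\pi f_\mathrm {N}\), just uses the linear least-squares spherical harmonics decomposition with the left-inverse of the spherical harmonics \(\varvec{Y}_\mathrm {N}\) sampled at the HRTF measurement nodes,
\(\varvec{\hat{h}}_{\mathrm {SH},k}=(\varvec{Y}_\mathrm {N}^\mathrm {T}\varvec{Y}_\mathrm {N})^{-1}\varvec{Y}_\mathrm {N}^\mathrm {T}\,\varvec{h}_k,\quad \varvec{h}_k=[h_{1,k},\,\dots ,\,h_{\mathrm {L},k}]^\mathrm {T}.\)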
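Continuing with the first frequency at or above cutoff, \(\omega _k\ge 2\pi f_\mathrm {N}\), and with \(\varvec{y}_\mathrm {N}^\mathrm {T}(\varvec{\uptheta }_l)\) denoting the \(l\)th row of \(\varvec{Y}_\mathrm {N}\), the algorithm proceeds as
\(\hat{\phi }_{l,k-1}=\angle \big (\varvec{y}_\mathrm {N}^\mathrm {T}(\varvec{\uptheta }_l)\,\varvec{\hat{h}}_{\mathrm {SH},k-1}\big ),\)
\(\varvec{\hat{h}}_{\mathrm {SH},k}=(\varvec{Y}_\mathrm {N}^\mathrm {T}\varvec{Y}_\mathrm {N})^{-1}\varvec{Y}_\mathrm {N}^\mathrm {T}\,\big [|h_{1,k}|\,e^{\mathrm {i}\hat{\phi }_{1,k-1}},\,\dots ,\,|h_{\mathrm {L},k}|\,e^{\mathrm {i}\hat{\phi }_{\mathrm {L},k-1}}\big ]^\mathrm {T},\)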
and then moves on to the next frequency, \(k\leftarrow k+1\). The results are typically transformed back to the time domain to obtain a real-valued impulse response from every spherical harmonic to the regarded ear.
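The iterative MagLS scheme for one ear can be sketched in a few lines of NumPy. To keep the sketch self-contained, it assumes a real 2D circular-harmonics basis as a hypothetical stand-in for the spherical harmonics \(\varvec{Y}_\mathrm {N}\); any real basis sampled at the measurement directions can be passed instead. The function names are hypothetical, and the transformation back to the time domain is omitted.

```python
import numpy as np

def circular_harmonics(phi, N):
    """Real circular harmonics [1, cos(m phi), sin(m phi)], m = 1..N; shape (L, 2N+1).
    Hypothetical stand-in for the spherical harmonics Y_N."""
    phi = np.asarray(phi, dtype=float)[:, None]
    m = np.arange(1, N + 1)[None, :]
    return np.hstack([np.ones_like(phi), np.cos(m * phi), np.sin(m * phi)])

def magls(H, Y, freqs, f_cut):
    """Magnitude-least-squares decomposition of one ear's HRTF set.

    H: (L, K) complex HRTFs at L directions and K frequency bins.
    Y: (L, Q) real harmonics sampled at the L measurement directions.
    Returns the (Q, K) complex harmonic coefficients.
    """
    Y_pinv = np.linalg.pinv(Y)  # left inverse (Y^T Y)^-1 Y^T
    H_sh = np.zeros((Y.shape[1], H.shape[1]), dtype=complex)
    for k, f in enumerate(freqs):
        if f < f_cut or k == 0:
            # below cutoff: plain linear least-squares decomposition
            H_sh[:, k] = Y_pinv @ H[:, k]
        else:
            # above cutoff: phase of the previous bin's fit, magnitude of the current HRTF
            phase = np.angle(Y @ H_sh[:, k - 1])
            H_sh[:, k] = Y_pinv @ (np.abs(H[:, k]) * np.exp(1j * phase))
    return H_sh
```

Because each step only needs one matrix product with the precomputed left inverse, the scheme is as cheap as the linear decomposition; the phase is simply carried over from bin to bin.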
The MagLS approach (mls) outperforms the time-alignment approach (ta) in the exemplary results shown for \(\mathrm {N}=3\) in Fig. 4.29, in particular at the highest frequencies, where the sphere-model-based delay simplification is no longer sufficient.
4.11.3 Diffuse-Field Covariance Constraint
For both of the above approaches, which modify the high-frequency phase, Zaunschirm et al. [20] note that low-order rendering degrades envelopment in diffuse fields; they therefore introduce an additional covariance constraint as defined by Vilkamo [22]. It can be implemented as a \(2\times 2\) filter matrix that equalizes the resulting frequency-domain diffuse-field covariance matrix to that of the original HRTF dataset. On its main diagonal, this covariance matrix holds the diffuse-field ear sensitivities (left and right); off the diagonal, it contains the diffuse-field interaural cross-correlation.
At every frequency, the \(2\times 2\) diffuse-field covariance matrix of the original, very-high-order spherical harmonics HRTF dataset \(\varvec{H}_\mathrm {SH}^\mathrm {H}\) of the dimensions \(2\times (\mathrm {M}+1)^2\), with \(\mathrm {M}\gg \mathrm {N}\), is given by
\(\varvec{R}=\varvec{H}_\mathrm {SH}^\mathrm {H}\varvec{H}_\mathrm {SH}.\)
The derivation of why this inner product of spherical harmonic coefficients represents the diffuse-field covariance is given in Appendix A.5. The low-order, high-frequency-modified HRTF coefficient set \(\varvec{\tilde{H}}_\mathrm {SH}^\mathrm {H}\) of the dimensions \(2\times (\mathrm {N}+1)^2\) also has a \(2\times 2\) covariance matrix \(\varvec{\hat{R}}=\varvec{\tilde{H}}_\mathrm {SH}^\mathrm {H}\varvec{\tilde{H}}_\mathrm {SH}\), which differs from the more accurate \(\varvec{R}\).
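Its diffuse-field reproduction improves after equalizing \(\varvec{\hat{R}}\) to \(\varvec{R}\) by a \(2\times 2\) filter matrix \(\varvec{M}\) that fulfills \(\varvec{M}^\mathrm {H}\varvec{\hat{R}}\,\varvec{M}=\varvec{R}\).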
Appendix A.5 shows the derivation of \(\varvec{M}\) based on [20, 22]. In summary, it is composed of factors obtained by Cholesky and singular-value decompositions.
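Appendix A.5 is not part of this excerpt, so the following NumPy sketch shows one construction of \(\varvec{M}\) from Cholesky and SVD factors that satisfies the constraint, not necessarily the appendix's exact formula. It assumes coefficient sets arranged as (harmonics × 2 ears), so that \(\varvec{R}=\varvec{H}_\mathrm {SH}^\mathrm {H}\varvec{H}_\mathrm {SH}\). With \(\varvec{R}=\varvec{K}\varvec{K}^\mathrm {H}\) and \(\varvec{\hat{R}}=\varvec{\hat{K}}\varvec{\hat{K}}^\mathrm {H}\), any \(\varvec{M}=\varvec{\hat{K}}^{-\mathrm {H}}\varvec{P}\varvec{K}^\mathrm {H}\) with unitary \(\varvec{P}\) matches the covariance; choosing \(\varvec{P}=\varvec{V}\varvec{U}^\mathrm {H}\) from the SVD \(\varvec{K}^\mathrm {H}\varvec{\hat{K}}=\varvec{U}\varvec{S}\varvec{V}^\mathrm {H}\) keeps \(\varvec{\tilde{H}}_\mathrm {SH}\varvec{M}\) closest to \(\varvec{\tilde{H}}_\mathrm {SH}\) in the least-squares sense. The function name is hypothetical.

```python
import numpy as np

def covariance_match(R_target, R_hat):
    """2x2 matrix M with M^H R_hat M = R_target, for one frequency bin.

    Cholesky: R_target = K K^H, R_hat = K_hat K_hat^H.
    SVD of K^H K_hat = U S V^H gives the unitary factor P = V U^H that
    minimizes ||H_tilde M - H_tilde|| among all covariance-matching M.
    """
    K = np.linalg.cholesky(R_target)
    K_hat = np.linalg.cholesky(R_hat)
    U, _, Vh = np.linalg.svd(K.conj().T @ K_hat)
    P = Vh.conj().T @ U.conj().T
    # M = K_hat^{-H} P K^H, computed by solving K_hat^H M = P K^H
    return np.linalg.solve(K_hat.conj().T, P @ K.conj().T)
```

If the low-order set already has the target covariance, the construction returns the identity, so no correction is applied.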
While MagLS binaural decoding with orders higher than 2 or 3 does not require covariance correction, the correction enhances the decorrelation of the ear signals for \(1\mathrm {st}\) to \(2\mathrm {nd}\) order reproduction, as shown in Fig. 4.30.
4.12 Practical Free-Software Examples
4.12.1 Pd and Circular/Spherical Harmonics
Similar to the example section on first-order encoding and decoding in Pure Data (Pd), Fig. 4.31 shows \(3\mathrm {rd}\)-order 2D Ambisonic encoding and decoding for an octagon loudspeaker layout. The circular harmonics are implemented by [mtx_circular_harmonics] from the iemmatrix library, and the numbers for \(\frac{180}{\pi }=57.29\) and \(a_m=\cos \frac{\pi m}{2\cdot (\mathrm {N}+1)}\) were pre-calculated. Note the similarity to the first-order 2D example of Fig. 1.13; the main change is the use of the circular harmonics matrix object.
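For readers without Pd, the same signal flow can be sketched in NumPy: encode with max-\(\varvec{r}_\mathrm {E}\)-weighted circular harmonics \(a_m=\cos \frac{\pi m}{2(\mathrm {N}+1)}\), then decode to the octagon with a sampling decoder. This is a Python stand-in, not the Pd patch; the channel order, normalization, and \(1/\mathrm {L}\) decoder scaling are assumptions that may differ from [mtx_circular_harmonics], and `pan` is a hypothetical helper.

```python
import numpy as np

def circular_harmonics(phi, N):
    """Real circular harmonics [1, cos(m phi)..., sin(m phi)...], m = 1..N, for a scalar phi."""
    m = np.arange(1, N + 1)
    return np.concatenate([[1.0], np.cos(m * phi), np.sin(m * phi)])

N = 3
a = np.cos(np.pi * np.arange(N + 1) / (2 * (N + 1)))  # max-rE weights a_0..a_N
weights = np.concatenate([[a[0]], a[1:], a[1:]])       # same order as the harmonics

spk_az = np.deg2rad(np.arange(8) * 45.0)               # octagon with 45-degree spacing
D = np.array([circular_harmonics(p, N) for p in spk_az]) / len(spk_az)  # sampling decoder

def pan(source_az_deg):
    """Loudspeaker gains for a source at the given azimuth (degrees)."""
    chi = circular_harmonics(np.deg2rad(source_az_deg), N) * weights  # weighted encoding
    return D @ chi                                                   # decoding
```

For a source at \(0^\circ \), `pan(0.0)` yields the largest gain on the frontal loudspeaker and mirror-symmetric gains on the left and right.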
For decoding to headphones, programming in Pd looks rather similar to the first-order example in Fig. 1.14; only more HRIRs, matching the respective loudspeaker positions, need to be employed. To work in three dimensions, programming in Pd would also be similar to the corresponding first-order example of Fig. 1.15, using the matrix object [mtx_spherical_harmonics]. Typically, pre-calculated decoders, including AllRAD and max-\(\varvec{r}_\mathrm {E}\), are loaded into Pd by, e.g., [mtx D.mtx] to keep programming simple.
4.12.2 Ambix Encoder, IEM MultiEncoder, and IEM AllRADecoder
For encoding single- or multi-channel signals into Ambisonics, VST plugins are available from Kronlachner's ambix plugin suite (e.g., ambix_encoder) and, with the IEM MultiEncoder, from the IEM plugin suite. As exemplarily shown in Fig. 4.32, the IEM MultiEncoder allows encoding channel-based multi-channel audio material, where channel-based [52] typically means that each channel of the multi-channel material is meant to be played back on a separate loudspeaker of clearly defined direction, cf. [44]. Elsewhere, such embedded virtual playback directions are also referred to as beds or virtual panning spots.
The IEM AllRADecoder permits manually entering or importing the loudspeaker coordinates and channel indices, with the coordinates specified by azimuth and elevation angles in degrees, as exemplified for the IEM production studio in Fig. 4.33. The figure also shows that entering just the pure \(5+7+0\) layout produces the error message "Point of origin not within convex hull. Try adding imaginary loudspeakers."
Adding an imaginary loudspeaker below, whose signal is typically omitted (see Fig. 4.34), makes it geometrically valid to calculate and employ the resulting decoder. However, it is better to also insert an imaginary loudspeaker at the rear, whose signal is preserved by specifying the gain value 1, as shown in Fig. 4.35.
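The geometric condition behind the error message can be checked with SciPy's Qhull wrapper. This sketch assumes SciPy is installed and uses a hypothetical layout of 5 ear-level and 7 elevated loudspeakers, not the IEM studio coordinates; it is not the plugin's actual code. `ConvexHull.equations` stores each facet as [outward normal, offset], so the origin is strictly inside only if every offset is negative.

```python
import numpy as np
from scipy.spatial import ConvexHull

def to_cartesian(az_deg, el_deg):
    """Unit vectors for azimuth/elevation angles in degrees."""
    az, el = np.deg2rad(az_deg), np.deg2rad(el_deg)
    return np.column_stack([np.cos(el) * np.cos(az), np.cos(el) * np.sin(az), np.sin(el)])

def origin_inside_hull(points, tol=1e-9):
    """True if the origin lies strictly inside the convex hull of the points."""
    hull = ConvexHull(points)
    # facet plane: normal @ x + offset <= 0 inside; for x = 0 this is offset < 0
    return bool(np.all(hull.equations[:, -1] < -tol))

# hypothetical 5+7+0 layout: 5 ear-level and 7 elevated loudspeakers
az = [0, 30, -30, 110, -110, 45, -45, 90, -90, 135, -135, 180]
el = [0, 0, 0, 0, 0, 45, 45, 45, 45, 45, 45, 45]
spk = to_cartesian(az, el)
print(origin_inside_hull(spk))  # False: the origin lies on the ear-level floor facet
with_below = np.vstack([spk, to_cartesian([0], [-90])])
print(origin_inside_hull(with_below))  # True after adding an imaginary loudspeaker below
```

A layout without loudspeakers below ear level places the origin on the hull boundary, which is why an imaginary loudspeaker below resolves the error.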
4.12.3 Reaper, IEM RoomEncoder, and IEM BinauralDecoder
Particularly relevant for headphone-based listening, rendering of anechoic sounds will typically not externalize well, as it does not match the mental expectation of ordinary listening environments [53,54,55,56]. To avoid in-head localization instead of the desired external sound image, one can, e.g., use the IEM RoomEncoder plugin, see Fig. 4.36. It is based on an image-source room model and encodes first-order wall reflections, including reflection factors and propagation delays, together with the desired direct sound.
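The image-source principle behind such a plugin can be sketched for a shoebox room: each first-order reflection is modeled by mirroring the source at one of the six walls, and the image's direction, delay, and attenuation are then encoded like a direct sound. This is a minimal sketch assuming a single frequency-independent reflection factor `beta` for all walls and \(1/r\) spreading; it is not the IEM RoomEncoder's implementation, and the function name is hypothetical.

```python
import numpy as np

def first_order_images(room, src, rcv, beta=0.8, c=343.0):
    """Direct sound plus the six first-order image sources of a shoebox room.

    room: (Lx, Ly, Lz) in m, walls at 0 and L on each axis; src, rcv: positions in m.
    beta: reflection factor shared by all walls (assumption).
    Returns a list of (unit direction seen from rcv, delay in s, gain).
    """
    src, rcv = np.asarray(src, dtype=float), np.asarray(rcv, dtype=float)
    sources = [(src, 1.0)]
    for axis in range(3):
        for wall in (0.0, room[axis]):
            img = src.copy()
            img[axis] = 2 * wall - src[axis]  # mirror the source at the wall
            sources.append((img, beta))
    out = []
    for pos, refl in sources:
        vec = pos - rcv
        dist = np.linalg.norm(vec)
        out.append((vec / dist, dist / c, refl / dist))  # 1/r spreading loss
    return out
```

Each returned direction would be fed to an Ambisonic encoder with the corresponding delay and gain, which yields the reflection pattern that supports externalization.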
The IEM BinauralDecoder, see Fig. 4.37, implements the MagLS approach for Ambisonic decoding, using the KU100 measurements from the Cologne University of Applied Sciences (TH Köln) and, optionally, their headphone equalization curves.
Combining the IEM RoomEncoder and the IEM BinauralDecoder with an Ambisonics-encoded single-channel sound (e.g., using ambix_encoder), one can simply place the source and receiver together in the symmetry plane of the room and then shift one of them slightly sideways, to hear how externalization improves with a slight asymmetry in the ear signals.
Notes
- 1.
In detail, this follows from .
- 2.
- 3.
- 4.
Data HRIR_CIRC360.sofa from http://sofacoustics.org/data/database/thk.
- 5.
Data HRIR_L2702.sofa from http://sofacoustics.org/data/database/thk.
References
J.S. Bamford, Ambisonic sound for the masses (1994)
D.H. Cooper, T. Shiga, Discrete-matrix multichannel stereo. J. Audio Eng. Soc. 20(5), 346–360 (1972)
P. Felgett, Ambisonic reproduction of directionality in surround-sound systems. Nature 252, 534–538 (1974)
M.A. Gerzon, The design of precisely coincident microphone arrays for stereo and surround sound, in prepr. L-20 of 50th Audio Engineering Society Convention (1975)
P. Craven, M.A. Gerzon, Coincident microphone simulation covering three dimensional space and yielding various directional outputs, U.S. Patent, no. 4,042,779 (1977)
J.S. Bamford, An analysis of ambisonic sound systems of first and second order, Master’s thesis, University of Waterloo, Ontario (1995)
D.G. Malham, A. Myatt, 3D sound spatialization using ambisonic techniques. Comput. Music J. 19(4), 58–70 (1995)
M.A. Poletti, The design of encoding functions for stereophonic and polyphonic sound systems. J. Audio Eng. Soc. 44(11), 948–963 (1996)
J.-M. Jot, V. Larcher, J.-M. Pernaux, A comparative study of 3-d audio encoding and rendering techniques, in 16th AES Conference (Rovaniemi, 1999)
J. Daniel, Représentation des champs acoustiques, application à la transmission et à la reproduction de scènes sonores complexes dans un contexte multimédia, Ph.D. dissertation, Université Paris 6 (2001)
D.B. Ward, T.D. Abhayapala, Reproduction of a plane-wave sound field using an array of loudspeakers. IEEE Trans. Speech Audio Process. 9(6), 697–707 (2001)
G. Dickins, Sound field representation, reconstruction and perception, Master’s thesis, Australian National University, Canberra (2003)
A. Sontacchi, Neue Ansätze der Schallfeldreproduktion, Ph.D. dissertation, TU Graz (2003)
M.A. Poletti, Robust two-dimensional surround sound reproduction for nonuniform loudspeaker layouts. J. Audio Eng. Soc. 55(7/8), 598–610 (2007)
F. Zotter, H. Pomberger, M. Noisternig, Energy-preserving ambisonic decoding. Acta Acust. United Acust. 98(11), 37–47 (2012)
J.-M. Batke, F. Keiler, Using vbap-derived panning functions for 3d ambisonics decoding, in 2nd International Symposium on Ambisonics and Spherical Acoustics (Paris, 2010)
D. Sun, Generation and perception of three-dimensional sound fields using higher order ambisonics, Ph.D. thesis, University of Sydney, School of Electrical and Information Engineering (2013)
Z. Ben-Hur, F. Brinkmann, J. Sheaffer, S. Weinzierl, B. Rafaely, Spectral equalization in binaural signals represented by order-truncated spherical harmonics. J. Acoust. Soc. Am. 141(6) (2017)
F. Brinkmann, S. Weinzierl, Comparison of head-related transfer functions pre-processing techniques for spherical harmonics decomposition, in AES Conference AVAR (Redmond, 2018)
M. Zaunschirm, C. Schörkhuber, R. Höldrich, Binaural rendering of ambisonic signals by head-related impulse response time alignment and a diffuseness constraint. J. Acoust. Soc. Am. 143(6), 3616–3627 (2018)
C. Schörkhuber, M. Zaunschirm, R. Höldrich, Binaural rendering of ambisonic signals via magnitude least squares, in Fortschritte der Akustik - DAGA (Munich, 2018)
J. Vilkamo, T. Bäckström, A. Kuntz, Optimized covariance domain framework for time-frequency processing of spatial audio. J. Audio Eng. Soc. 61(6), 403–411 (2013)
F.W.J. Olver, R.F. Boisvert, C.W. Clark (eds.), NIST Handbook of Mathematical Functions (Cambridge University Press, Cambridge, 2010), http://dlmf.nist.gov. Accessed June 2012
J. Daniel, J.-B. Rault, J.-D. Polack, Acoustic properties and perceptive implications of stereophonic phenomena, in AES 6th International Conference: Spatial Sound Reproduction (1999)
M. Frank, How to make ambisonics sound good, in Forum Acusticum, Krakow (2014)
M. Frank, F. Zotter, Extension of the generalized tangent law for multiple loudspeakers, in Fortschritte der Akustik - DAGA (Kiel, 2017)
M. Frank, A. Sontacchi, F. Zotter, Localization experiments using different 2d ambisonic decoders, in 25th Tonmeistertagung (2008)
P. Stitt, S. Bertet, M.v. Walstijn, Off-centre localisation performance of ambisonics and hoa for large and small loudspeaker array radii. Acta Acust. United Acust. 100(5), 937–944 (2014)
M. Frank, F. Zotter, Exploring the perceptual sweet area in ambisonics, in AES 142nd Convention (2017)
ISO 31-11:1978, Mathematical signs and symbols for use in physical sciences and technology (1978)
ISO 80000-2, Quantities and units – Part 2: Mathematical signs and symbols to be used in the natural sciences and technology (2009)
R.H. Hardin, N.J.A. Sloane, Mclaren’s improved snub cube and other new spherical designs in three dimensions. Discret. Comput. Geom. 15, 429–441 (1996), http://neilsloane.com/sphdesigns/dim3/
M. Gräf, D. Potts, On the computation of spherical designs by a new optimization approach based on fast spherical fourier transforms. Numer. Math. 119 (2011), http://homepage.univie.ac.at/manuel.graef/quadrature.php
R.S. Womersley, Efficient spherical designs with good geometric properties, in Contemporary Computational Mathematics - A Celebration of the 80th Birthday of Ian Sloan (Springer, Berlin, 2018), pp. 1243–1285
A. Solvang, Spectral impairment of 2d higher-order ambisonics. J. Audio Eng. Soc. 56(4) (2008)
M.A. Gerzon, G.J. Barton, Ambisonic decoders for HDTV, in prepr. 3345, 92nd AES Convention (Vienna, 1992)
J. Daniel, J.-B. Rault, J.-D. Polack, Ambisonics encoding of other audio formats for multiple listening conditions, in prepr. 4795, 105th AES Convention (San Francisco, 1998)
M.A. Poletti, A unified theory of horizontal holographic sound systems. J. Audio Eng. Soc. 48(12), 1155–1182 (2000)
M.A. Poletti, Three-dimensional surround sound systems based on spherical harmonics. J. Audio Eng. Soc. (2005)
F. Zotter, M. Frank, All-round ambisonic panning and decoding. J. Audio Eng. Soc. (2012)
F. Zotter, M. Frank, Ambisonic decoding with panning-invariant loudness on small layouts (allrad2), in 144th AES Convention, prepr. 9943 (Milano, 2018)
R. Pail, G. Plank, W.-D. Schuh, Spatially restricted data distributions on the sphere: the method of orthonormalized functions and applications. J. Geodesy 75, 44–56 (2001)
M. Frank, A. Sontacchi, Case study on ambisonics for multi-venue and multi-target concerts and broadcasts. J. Audio Eng. Soc. 65(9) (2017)
ITU, Recommendation BS.2051: Advanced sound system for programme production. ITU (2018)
M. Noisternig, A. Sontacchi, T. Musil, R. Höldrich, A 3d ambisonic based binaural sound reproduction system, in 24th AES Conference (Banff, 2003)
B. Bernschütz, A. V. Giner, C. Pörschmann, J. Arend, Binaural reproduction of plane waves with reduced modal order. Acta Acust. United Acust. 100(5) (2014)
S. Delikaris-Manias, J. Vilkamo, Adaptive mixing of excessively directive and robust beamformers for reproduction of spatial sound, in Parametric Time-Frequency Domain Spatial Audio, ed. by V. Pulkki, S. Delikaris-Manias, A. Politis (Wiley, New Jersey, 2017)
C.T. Jin, N. Epain, D. Sun, Perceptually motivated binaural rendering of higher order ambisonic sound scenes, in Fortschritte der Akustik AIA-DAGA (Merano, March 2013)
J.B. Keller, Geometrical theory of diffraction. J. Opt. Soc. Am. 52(2), 116–130 (1962)
R.S. Woodworth, H. Schlosberg, Experimental Psychology (Holt, Rinehart and Winston, 1954)
P.W. Kassakian, Convex approximation and optimization with applications in magnitude filter design and radiation pattern synthesis, Ph.D. dissertation, EECS Department, University of California, Berkeley (2006)
ITU, Recommendation BS.2076: Audio Definition Model (ITU, 2017)
S. Werner, G. Götz, F. Klein, Influence of head tracking on the externalization of auditory events at divergence between synthesized and listening room using a binaural headphone system, in prepr. 9690, 142nd AES Convention (Berlin, 2017)
F. Klein, S. Werner, T. Mayenfels, Influences of tracking on externalization of binaural synthesis in situations of room divergence. J. Audio Eng. Soc. 65(3), 178–187 (2017)
J. Cubick, Investigating distance perception, externalization and speech intelligibility in complex acoustic environments, Ph.D. dissertation, DTU Copenhagen (2017)
G. Plenge, Über das Problem der Im-Kopf-Lokalisation. Acustica 26(5), 241–252 (1972)
Rights and permissions
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.
The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
Copyright information
© 2019 The Author(s)
About this chapter
Cite this chapter
Zotter, F., Frank, M. (2019). Ambisonic Amplitude Panning and Decoding in Higher Orders. In: Ambisonics. Springer Topics in Signal Processing, vol 19. Springer, Cham. https://doi.org/10.1007/978-3-030-17207-7_4
DOI: https://doi.org/10.1007/978-3-030-17207-7_4
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-17206-0
Online ISBN: 978-3-030-17207-7
eBook Packages: EngineeringEngineering (R0)