Signal Flow and Effects in Ambisonic Productions

This chapter presents the internal working principles of various Ambisonic 3D audio effects. No matter which digital audio workstation or processing software is used in a production, the general Ambisonic signal infrastructure is outlined as an important overview of the signal processing chain. The effects presented are frequency-independent effects such as directional re-mapping (mirror, rotation, warping) and re-weighting (directional level modification), and frequency-dependent effects such as widening/distance/diffuseness effects and diffuse reverberation.


Fig. 5.1 Block diagram as in [2]

Ambisonic surround-sound signal. The signal on the Ambisonic bus superimposes the encoded source signals,

χ_N(t) = Σ_{c=1}^{C} y_N(θ_c) s_c(t). (5.1)

Without decoding to a specific loudspeaker layout, the signal χ_N of the Ambisonic bus might appear somewhat virtual. Nevertheless, it can be drawn as a surround-sound signal x(θ, t), whose amplitude can be evaluated and metered at any direction θ and any time t, using the expansion into spherical harmonics

x(θ, t) = y_N^T(θ) χ_N(t). (5.2)

Upmixing. As first-order recordings are not highly resolved, there are several works on algorithms with resolution-enhancement strategies that re-assign time-frequency bins more sharply to directions. A good summary of such input-specific insert effects is given in [5, 6]. Available solutions are DirAC, HOA-DirAC, COMPASS, and Harpex.
Higher order. Higher-order microphones require more of the acoustic holophonic and holographic basics than presented above, yielding pre-processing filters as input-specific insert effect. Higher-order recording is dealt with in the subsequent Chap. 6, after the derivation of the wave equation and its solutions in the spherical coordinate system.

Insert effects: Generic re-mapping and leveling.
One can imagine that it should be possible to manipulate the surround-sound signal x(θ , t) in various ways. For instance, effects based on directional re-mapping can take signals out of their original directional range and place them back into the Ambisonic signal at manipulated directions. Also, directions can be altered in amplitude levels so that, for instance, signals at directions with unwanted content undergo attenuation. Many more useful effects are presented below.
Decoding to loudspeakers/headphones. To map the modified Ambisonic signal χ̃_Ñ to loudspeakers or headphones, an Ambisonic decoder is needed, as discussed in the previous chapter. For decoding to headphones, one should consider decoding to only as few HRIR directions as possible [7, 8] before the signals get convolved and mixed, to avoid coloration at frontal directions, where delays in the HRIRs change too strongly over direction to be resolved properly [9, 10]. Alternatively, the approach in [11] proposed removal of the HRIR delay at high frequencies and diffuse-field covariance equalization by a 2 × 2 filter system, cf. Sect. 4.11.

Embedding of Channel-Based, Spot-Microphone, and First-Order Recordings
Microphone arrays for near-coincident higher-order Ambisonic recording based on holography will be discussed in the subsequent chapter. Nevertheless, it is possible to use (i) spot and close microphones and encode their direction into the directional panorama, (ii) first-order microphone arrays to fill the Ambisonic channels only up to the first order, or (iii) more classical non-coincident or equivalence-stereophonic microphone arrays whose typical playback directions are encoded in Ambisonics.
The study by Kurz et al. [12] investigated how recordings by first-order encoding of the Soundfield ST450 and the Oktava MK4012 tetrahedral microphone arrays compare to the equivalence-stereophonic ORTF, see Fig. 5.2. In addition, ORTF-like mapping of the Oktava MK4012's frontal signals to the ±30° directions in 5th order was tested instead of its first-order encoding. Figure 5.3 shows the results of the study in terms of the perceptual attributes localization and spatial depth. It seems that a mixture of ORTF-like 5th-order encoding and first-order encoding of the MK4012 microphone achieves preferred results, while the first-order-encoded output of the ST450 Soundfield microphone is rated fair in both attributes; the ORTF microphone only ranked well in terms of localization. The results of the ST450 were independent of its orientation, whereas the localization of the first-order-encoded MK4012 was found to depend on the orientation. This dependency of the MK4012 arises because its microphones are not sufficiently coincident.
Fig. 5.3 Results [12] for different microphones, orientations, and playback processing

As a bottom line of the detailed analysis, one should be encouraged to keep using classical microphone techniques where they are known to be appropriate, and to encode their output in higher-order beds or virtual playback directions. However, this should be done with the awareness that stereophonic recording won't necessarily work for a large audience area, for which the robustness in directional mapping of equivalence-based techniques seems attractive. An interesting layout is, e.g., specified in Hendrickx et al.'s work [13], in which they use an equivalence-stereophonic six-channel microphone array. Another interesting idea was used at the ICSA Ambisonics Summer School 2017: a height layer of suitably inclined super-cardioid microphones was added at small vertical distance to the horizontal microphone layer, similarly to the upwards-pointing directional microphones suggested in Lee's and Wallis' work [14, 15], to provide sufficiently attenuated horizontal sounds to the height layer.

Binaural rendering study using surround-with-height material. In another study by Lee, Frank, and Zotter [16], static headphone-based rendering of channel-based recordings was compared using direct HRIR-based rendering or Ambisonics-based binaural rendering, cf. Sect. 4.11. The aim was to find out whether differently recorded material could be rendered at high quality via binaural Ambisonic renderers, or under which settings this would imply quality degradation when compared to channel-based binaural rendering. The results from the half of the listening experiment done in Graz are analyzed in Fig. 5.4. The renderers compared were the channel-based reference "ref", a low-passed mono anchor designed to have poor quality "0", a first-order binaural Ambisonic renderer "1c" based on a cube layout with loudspeakers at ±45°, ±135° azimuth and ±35.3° elevation, and MagLS binaural Ambisonic renderers at the orders "1", "2", "3", "4", and "5".
Obviously, for orders 2 and above, there is not much quality degradation compared to the reference channel-based binaural rendering. The spatial quality cannot be distinguished from the reference for MagLS with Ambisonic orders 3 and above, and the timbral qualities cannot be distinguished for Ambisonic orders 2 and above.
While this result remarkably simplifies the practical requirements for headphone playback, it can be supposed that, due to the limited sweet-spot size, loudspeaker playback would typically still require higher orders.

Frequency-Independent Ambisonic Effects
Many frequency- and time-independent Ambisonic effects are based on the aforementioned re-mapping of directions and manipulation of directional amplitudes, see e.g. Kronlachner's thesis [2, 17]; advanced effects can be found in [18]. In general, the surround-sound signal can be manipulated by any thinkable transformation that modifies the directional mapping and amplitude of its contents. The formulation

x̃(θ̃, t) = g(θ) x(θ, t), with θ̃ = τ{θ},

expresses an operation that picks out every direction θ of the input signal, weights its signal by a directional gain g(θ), and re-maps it to a new direction θ̃ = τ{θ} within a transformed signal x̃. To find out how this affects Ambisonic signals, we write both x and x̃ as Ambisonic signals and use the orthonormality ∫_{S^D} y_Ñ(θ̃) y_Ñ^T(θ̃) dθ̃ = I: integrating over y_Ñ(θ̃) dθ̃ on S^D isolates χ̃_Ñ(t) on the left and shows that the transformed signals are just re-mixed Ambisonic input signals,

χ̃_Ñ(t) = T χ_N(t), with T = ∫_{S^D} y_Ñ(τ{θ}) g(θ) y_N^T(θ) dθ

(note that lossless operation might require an increased Ambisonic order Ñ). Numerical evaluation of the matrix T is best done by using a high-enough t-design {θ_l} to discretize the integration variable. For the discretized directions, an inverse mapping θ = τ^{-1}{θ̃} of the output direction must exist (directional re-mapping must be bijective), so that we can write

T = (4π/L) Σ_{l=1}^{L} y_Ñ(θ_l) g(τ^{-1}{θ_l}) y_N^T(τ^{-1}{θ_l}).

This formalism is generic and covers simplistic and more complex tasks. It helps understanding that every frequency-independent directional weighting and/or re-mapping is just a re-mixing of the Ambisonic signals by a matrix, as in Fig. 5.6a.
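As a numerical illustration of the re-mixing-matrix view, here is a minimal sketch of my own (not from the text), simplified to 2D: circular harmonics replace the spherical harmonics, an equiangular grid on the circle acts as the t-design, and the integral is discretized over the input directions. The transform re-maps every direction by +90° with neutral gain g = 1, and the re-mixed signals coincide with directly encoding the source at the shifted direction.

```python
import math

N = 3   # circular-harmonic order of the sketch
L = 16  # equiangular sample directions (the 2D analogue of a t-design)

def y(phi, order=N):
    """Real 2D circular harmonics, normalized so (1/2pi) * integral(y y^T) = I."""
    v = [1.0]
    for m in range(1, order + 1):
        v += [math.sqrt(2) * math.cos(m * phi), math.sqrt(2) * math.sin(m * phi)]
    return v

def transform_matrix(tau, g, order=N, num=L):
    """T = (1/L) * sum_l y(tau(phi_l)) g(phi_l) y(phi_l)^T, the discretized integral."""
    K = 2 * order + 1
    T = [[0.0] * K for _ in range(K)]
    for l in range(num):
        phi = 2 * math.pi * l / num
        yo, yi = y(tau(phi)), y(phi)
        w = g(phi) / num
        for i in range(K):
            for j in range(K):
                T[i][j] += w * yo[i] * yi[j]
    return T

# Re-map every direction by +90 degrees with neutral gain g = 1.
rot = math.pi / 2
T = transform_matrix(lambda phi: phi + rot, lambda phi: 1.0)

src = math.radians(30)                  # source encoded at 30 degrees
chi = y(src)
chi_t = [sum(T[i][j] * chi[j] for j in range(len(chi))) for i in range(len(chi))]
expected = y(src + rot)                 # direct encoding at 120 degrees
err = max(abs(a - b) for a, b in zip(chi_t, expected))
print(err)                              # numerically zero (up to rounding)
```

The grid of L = 16 directions is sufficient here because the integrand is a trigonometric polynomial of degree at most 2N = 6, so the discretized sum equals the integral exactly.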
The ambix VST plugin suite implements several effects, e.g. in the VST plugins ambix_mirror, ambix_rotate, ambix_directional_loudness, ambix_warp. The sections below explain how these and other effects work inside.

Mirror
Mirroring does not actually require the generic re-mapping and re-weighting formalism from above. The spherical harmonics associated with the Ambisonic channels are shown in Fig. 4.12, and upon closer inspection one recognizes their symmetries, see Fig. 5.5. To mirror the Ambisonic sound scene with regard to planes of symmetry, it is sufficient to sign-invert the channels associated with odd-symmetric spherical harmonics, as in Fig. 5.6b. Formally, the transform matrix consists of a diagonal matrix T = diag{c} only, with the corresponding sign-change sequence c. Up-down: For instance, spherical harmonics with |m| = n are even-symmetric with regard to z = 0 (up-down), and from this index on, every second harmonic in m is. To flip up and down, it is therefore sufficient to invert the signs of the spherical harmonics that are odd-symmetric with regard to z = 0; they are characterized by n + m being an odd number, or c_nm = (−1)^{n+m}.

Left-right:
The sin mϕ-related spherical harmonics with m < 0 are odd-symmetric with regard to y = 0 (left-right); therefore, sign-inverting the signals with index m < 0 exchanges left and right in the Ambisonic surround signal, i.e. c_nm = (−1)^{(m<0)}. Front-back: Every odd-numbered m > 0 is odd-symmetric with regard to x = 0 (front-back), and so is every even-numbered harmonic with m < 0. Inverting the sign of these harmonics, c_nm = (−1)^{m+(m<0)}, flips front and back in the Ambisonic surround signal.
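The three sign-change sequences are easy to tabulate per ACN-ordered channel; the following sketch (my own, with names of my choosing) does exactly that and shows, for first order, that each mirror simply inverts one of the Y, Z, X channels.

```python
def acn_channels(N):
    """(n, m) index pairs in Ambisonic Channel Number (ACN) order."""
    return [(n, m) for n in range(N + 1) for m in range(-n, n + 1)]

def mirror_signs(N, plane):
    """Sign-change sequence c_nm; applying diag(c) to the Ambisonic
    channels mirrors the scene about the chosen symmetry plane."""
    signs = []
    for n, m in acn_channels(N):
        if plane == "up-down":        # z -> -z: c_nm = (-1)^(n+m)
            c = (-1) ** (n + m)
        elif plane == "left-right":   # y -> -y: invert all m < 0 channels
            c = -1 if m < 0 else 1
        else:                         # "front-back", x -> -x: c_nm = (-1)^(m+(m<0))
            c = (-1) ** (abs(m) + (1 if m < 0 else 0))
        signs.append(c)
    return signs

# First-order ACN channels are (W, Y, Z, X):
print(mirror_signs(1, "up-down"))     # [1, 1, -1, 1]  -> Z inverted
print(mirror_signs(1, "left-right"))  # [1, -1, 1, 1]  -> Y inverted
print(mirror_signs(1, "front-back"))  # [1, 1, 1, -1]  -> X inverted
```

Applying any of these sequences twice restores the original scene, as expected of a mirror.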

3D Rotation
Rotation can be expressed by a general rotation matrix R consisting of a rotation around z by χ, around y by ϑ, and again around z by ϕ, see Fig. 5.7. This rotation matrix maps every direction θ to a rotated direction θ̃ = R θ. Using this as the transform rule τ{θ} = R θ with neutral gain g(θ) = 1, we find the transform matrix via the inverse mapping θ = R^T θ̃ as

T = (4π/L) Σ_{l=1}^{L} y_N(θ_l) y_N^T(R^T θ_l).

Using the L directions of a t ≥ 2N-design is sufficient to sample the harmonics accurately. With the resulting T, rotation is implemented as in Fig. 5.6a. There is plenty of potential for simplification: as only the spherical harmonics of a given order n are required to re-express a rotated spherical harmonic of the same order n, T is actually block-diagonal, T = blkdiag_n{T_n}, and within each spherical harmonic order, the integral could be evaluated more efficiently using a smaller t ≥ 2n-design. Moreover, there are various fast and recursive ways to calculate the entries of T, as in [19-25] and implemented in most plugins. And yet, in practice, a naïve implementation can be fast enough and pragmatic.
Rotation around z. One special case of rotation is important and particularly simple to implement. A directional encoding in azimuth always either equals the azimuth harmonics Φ_m(ϕ_s) in 2D or contains them in 3D. For m > 0, the azimuth encoding Φ_m(ϕ_s) depends on cos mϕ_s, and its negative-sign version Φ_{−m}(ϕ_s) depends on sin(|m|ϕ_s). The encoding angle can be offset by the trigonometric addition theorems, which can be written as a matrix

R(mϕ) = [[cos mϕ, sin mϕ], [−sin mϕ, cos mϕ]]. (5.8)

By this, any Ambisonic signal, be it 2D or 3D, can be rotated around z by applying the matrices R(mϕ) to the signal pairs with ±m. Figure 5.8a shows the processing scheme implementing only the non-zero entries of the associated matrix operation T. Combined with a fixed set of 90° rotations around y (read from files), it can be used to access all rotational degrees of freedom in 3D [20]. The rotation effect is one of the most important features of head-tracked interactive VR playback on headphones: rotation counteracting the head movement supports the impression of a static image of the virtual outside world.
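The pairwise z-rotation can be sketched compactly in 2D (my own illustration, not from the text; the sign convention of the 2 × 2 blocks is chosen here so that a positive angle moves sources counter-clockwise): a source encoded at 30° ends up exactly at 75° after a 45° rotation, without ever building the full matrix T.

```python
import math

def encode2d(phi, N):
    """2D circular-harmonic encoding [1, sqrt(2) cos(phi), sqrt(2) sin(phi), ...]."""
    v = [1.0]
    for m in range(1, N + 1):
        v += [math.sqrt(2) * math.cos(m * phi), math.sqrt(2) * math.sin(m * phi)]
    return v

def rotate_z(chi, rot):
    """Rotate an Ambisonic signal vector by applying a 2x2 rotation of angle
    m*rot to each (cos, sin) channel pair of degree m."""
    out = [chi[0]]  # the omnidirectional channel is unaffected
    for m in range(1, (len(chi) - 1) // 2 + 1):
        c, s = chi[2 * m - 1], chi[2 * m]
        cr, sr = math.cos(m * rot), math.sin(m * rot)
        out += [cr * c - sr * s, sr * c + cr * s]
    return out

src, rot = math.radians(30), math.radians(45)
err = max(abs(a - b) for a, b in
          zip(rotate_z(encode2d(src, 3), rot), encode2d(src + rot, 3)))
print(err)   # numerically zero: rotated encoding equals direct encoding at 75 degrees
```

The per-pair cost is constant, which is why this special case is so much cheaper than a general 3D rotation.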

Directional Level Modification/Windowing
What might be most important when mixing is the option to treat the gains of different directions differently: it might be necessary to attenuate directions of uninteresting or disturbing content while boosting directions of a soft target signal.

Fig. 5.8 Rotation around z and Ambisonic widening/diffuseness apply simple 2 × 2 rotation matrices/filter matrices to each Ambisonic signal pair χ_{n,m}, χ_{n,−m} of the same order n. Note that the order of the input/output channels plotted is not the typical ACN sequence, to avoid crossing connections and hereby simplify the diagram

For such a manipulation, there is a neutral directional re-mapping θ̃ = θ, and the transform to define the matrix T that is implemented as in Fig. 5.6a remains

T = ∫_{S^D} y_Ñ(θ) g(θ) y_N^T(θ) dθ.

In the simplest version, as implemented in ambix_directional_loudness, the gain function just consists of two mutually exclusive regions, e.g. within a region of diameter α around the direction θ_g, and a complementary region outside, with separately controlled gains g_in and g_out:

g(θ) = g_in u(θ_g^T θ − cos(α/2)) + g_out u(cos(α/2) − θ_g^T θ),

where u(x) represents the unit-step function that is 1 for x ≥ 0 and 0 otherwise. Note that the Ambisonic order of this effect would need to be enlarged to be lossless. However, with reasonably chosen sizes α and gain ratios g_in/g_out, the effect will nevertheless produce reasonable results. Figure 5.9 shows a window at azimuth and elevation of 22.5° with an aperture of 50°, using g_in = 1 and g_out = 0 and the order N = 10, with a grid of encoded directions to illustrate the influence of the transformation.
For reference: the entries of the tensor used to analytically re-expand the product of two spherical functions x(θ) g(θ), given by their spherical harmonic coefficients χ_nm, γ_nm, are called Gaunt coefficients or Clebsch-Gordan coefficients [6, 26].
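A numerical 2D sketch of the two-region window (my own, with names and parameters of my choosing): the matrix is built with an enlarged output order, and a source inside the window passes with far more energy than one in the complementary region.

```python
import math

N_in, N_out = 3, 6        # windowing products call for a higher output order
L = 64                    # equiangular grid, the 2D analogue of a t-design

def y(phi, order):
    """2D circular harmonics, normalized so (1/2pi) * integral(y y^T) = I."""
    v = [1.0]
    for m in range(1, order + 1):
        v += [math.sqrt(2) * math.cos(m * phi), math.sqrt(2) * math.sin(m * phi)]
    return v

def window_matrix(phi_g, alpha, g_in, g_out):
    """T = (1/L) sum_l y_out(phi_l) g(phi_l) y_in(phi_l)^T with a two-region gain."""
    rows, cols = 2 * N_out + 1, 2 * N_in + 1
    T = [[0.0] * cols for _ in range(rows)]
    for l in range(L):
        phi = 2 * math.pi * l / L
        g = g_in if math.cos(phi - phi_g) >= math.cos(alpha / 2) else g_out
        yo, yi = y(phi, N_out), y(phi, N_in)
        for i in range(rows):
            for j in range(cols):
                T[i][j] += g / L * yo[i] * yi[j]
    return T

T = window_matrix(phi_g=0.0, alpha=math.radians(90), g_in=1.0, g_out=0.0)

def windowed_energy(phi_src):
    """Channel energy of a point source at phi_src after windowing."""
    chi = y(phi_src, N_in)
    out = [sum(T[i][j] * chi[j] for j in range(2 * N_in + 1))
           for i in range(2 * N_out + 1)]
    return sum(v * v for v in out)

print(windowed_energy(0.0), windowed_energy(math.pi))  # inside >> outside
```

With g_in = g_out, the matrix collapses to a scaled identity on the input channels, confirming that the window itself, not the machinery, causes the directional shaping.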

Warping
Gerzon [27, Eq. 4a] described the effect dominance, which warps the Ambisonic surround scene to modify how prominently the essential parts in front of the scene are presented.

Warping with regard to a direction. For mathematical simplicity, we describe this bilinear warping with regard to the z direction; to warp with regard to the frontal direction, one first rotates the front upwards, applies the warping operation there, and then rotates back. The bilinear warping modifies the normalized z coordinate ζ = cos ϑ = θ_z so that signals from the horizon ζ = 0 are pulled to ζ̃ = α, while the poles keep what was originally there, ζ̃ = ±1 for ζ = ±1:

ζ̃ = (α + ζ) / (1 + αζ).

Hereby, the surround signal gets squeezed towards or stretched away from the zenith, or, when rotating before and after, towards/away from any direction. The integral can be discretized and solved by a suitable t-design as before, only that for lossless operation the output order Ñ must be higher than the input order N. We get a matrix T, implemented as in Fig. 5.6a, that is computed by the inverse mapping

ζ = (ζ̃ − α) / (1 − αζ̃),

which modifies the coordinates of the t-design inserted for θ_l = [θ_{x,l}, θ_{y,l}, θ_{z,l}]^T with ζ̃_l = θ_{z,l} accordingly. The gain g(ζ̃) of the generic transformation is useful to preserve the loudness of what becomes wider, and therefore louder in terms of the E measure, after re-mapping. To preserve loudness, the resulting surround signal is divided by the square root of the stretch applied, which is related to the slope of the mapping by 1/g² = dζ̃/dζ. Expressed as a de-emphasis gain, we get

g(ζ̃) = √(1 − α²) / (1 − αζ̃).

Figure 5.10 shows warping of the horizontal plane by 20° downwards, using the same test-image parameters as with windowing; de-emphasis attenuates widened areas.
In the same fashion, Kronlachner [17] describes another warping curve that warps with regard to a fixed horizontal plane and the poles, either squeezing or stretching the content towards or away from the horizon, symmetrically for the upper and lower hemispheres (second option of the ambix_warp plugin).
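The bilinear mapping and its loudness-preserving gain are easy to check numerically; the following sketch (my own, assuming the standard bilinear form ζ̃ = (α + ζ)/(1 + αζ)) verifies that the poles stay fixed, the horizon moves to α, and the de-emphasis gain matches the square root of the mapping's slope.

```python
import math

def warp(z, a):
    """Bilinear warp of the normalized z coordinate: horizon 0 -> a, poles fixed."""
    return (a + z) / (1 + a * z)

def unwarp(zt, a):
    """Inverse mapping, used on the t-design directions when building T."""
    return (zt - a) / (1 - a * zt)

def deemphasis(zt, a):
    """Loudness-preserving gain, from 1/g^2 = d(zeta~)/d(zeta)."""
    return math.sqrt(1 - a * a) / (1 - a * zt)

a = math.sin(math.radians(20))   # pull the horizon by 20 degrees
print(warp(0.0, a), warp(1.0, a), warp(-1.0, a))   # a, 1.0, -1.0
```

Since the stretch varies over ζ̃, the de-emphasis has to be evaluated per t-design direction when the matrix T is assembled.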

Parametric Equalization
There are two ways of employing parametric equalizers on Ambisonic channels: either the single-/multi-channel input of a mono or multiple-input encoder is filtered by parametric equalizers, or each of the Ambisonic signal's channels is filtered by the same parametric equalizer, see Fig. 5.11a.

Fig. 5.11 Block diagram of processing that commonly and equally affects all Ambisonic signals, such as parametric equalization and dynamic processing (compression), without recombining the signals

Bass management is often important to avoid overdriving smaller loudspeaker systems of, e.g., a 5th-order hemispherical playback system with subwoofer signals: all 36 channels from the Ambisonic bus can be sent to a decoder section, in which frequencies below 70-100 Hz are cut by a 4th-order high-pass filter before running through the Ambisonic decoder, while the first channel of the Ambisonic bus alone, the omnidirectional channel, is sent to a subwoofer section, in which a 4th-order low-pass filter removes the frequencies above 70-100 Hz before the signal is sent to the subwoofers. If the playback system is time-aligned between the subwoofer and the higher frequencies, the 4th-order crossovers should be Linkwitz-Riley filters (squared Butterworth high-pass or low-pass filters) to preserve phase equality [28].
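The Linkwitz-Riley property is quickly verified on the analog prototype (a sketch of my own, using a normalized cutoff of 1 rad/s): both squared-Butterworth branches sit at −6 dB at the crossover, and their sum has unit magnitude at every frequency, which is what preserves the summed response.

```python
import math

def butter2(s):
    """2nd-order Butterworth low-pass prototype, H(s) = 1 / (s^2 + sqrt(2) s + 1)."""
    return 1 / (s * s + math.sqrt(2) * s + 1)

def lr4_low(w):
    """4th-order Linkwitz-Riley low-pass = squared Butterworth (cutoff w = 1)."""
    return butter2(1j * w) ** 2

def lr4_high(w):
    """Complementary 4th-order Linkwitz-Riley high-pass."""
    s = 1j * w
    return s ** 4 * butter2(s) ** 2

# At the crossover, both branches are at -6 dB and their sum has unit magnitude:
print(abs(lr4_low(1.0)), abs(lr4_high(1.0)), abs(lr4_low(1.0) + lr4_high(1.0)))
```

For the 70-100 Hz crossover of the text, the prototype would be frequency-scaled and discretized, e.g. by a bilinear transform, before running as biquads in the decoder and subwoofer sections.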
For more information on parametric equalizers, the reader is referred to Udo Zölzer's book on Digital audio effects [29].

Dynamic Processing/Compression
Individual compression of different Ambisonic channels would destroy the directional consistency of the Ambisonic signal. Consequently, dynamic processing should rather affect the levels of all Ambisonic channels in the same way. As it typically contains all the audio signals, it is useful to have the first, omnidirectional Ambisonic channel control the dynamic processor as its side-chain input, see Fig. 5.11b. For more information on dynamic processing, the reader is referred to Udo Zölzer's book on Digital audio effects [29].
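A deliberately minimal sketch (my own, static characteristic only, no attack/release smoothing) of an omni-side-chained compressor: one gain is computed from channel 0 and applied identically to all channels, so the inter-channel ratios, and hence the directional image, remain intact.

```python
import math

def compress_ambisonic(frames, threshold_db=-20.0, ratio=4.0):
    """Apply one gain, derived from the omnidirectional channel W (channel 0),
    equally to all Ambisonic channels of each frame."""
    out = []
    for frame in frames:                      # frame = one sample of all channels
        level_db = 20 * math.log10(abs(frame[0]) + 1e-12)   # side-chain level
        if level_db > threshold_db:
            gain_db = (threshold_db - level_db) * (1.0 - 1.0 / ratio)
            g = 10 ** (gain_db / 20)
        else:
            g = 1.0
        out.append([g * x for x in frame])    # same gain on every channel
    return out

quiet = [0.001, 0.0005, 0.0002, 0.0001]
loud = [1.0, 0.5, 0.2, 0.1]
res = compress_ambisonic([quiet, loud])
print(res[1][0])   # about 0.178: the loud frame is attenuated by 15 dB
```

A practical implementation would smooth the side-chain level over time; the static characteristic above is only the gain computer.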
Moreover, it is sometimes useful to compress, e.g., the vocals of a singer separately. To this end, directional compression first extracts a part of the Ambisonic signals by a directional window, creating one set of Ambisonic signals without the directional region of the window, and another one exclusively containing it. The compression is applied to the resulting window signal before re-combining it with the residual signals.

Widening (Distance/Diffuseness/Early Lateral Reflections)
Basic widening and diffuseness effects can be regarded as inspired by Gerzon [30] and Laitinen [31], who proposed to apply frequency-dependent panning filters, mapping different frequencies to directions dispersed around the panning direction. The resulting effect is fundamentally different from and superior to frequency-independent MDAP with enlarged spread or Ambisonics with reduced order, which could yield audible comb filtering.
To apply this technique to Ambisonics, Zotter et al. [32] proposed to employ a dispersive, i.e. frequency-dependent, rotation of the Ambisonic scene around the z axis as in Eq. (5.8), see Fig. 5.8b, using 2 × 2 matrices of filters to implement the frequency-dependent rotation argument mφ̂ cos ωτ:

R(mφ̂ cos ωτ) = [[cos(mφ̂ cos ωτ), sin(mφ̂ cos ωτ)], [−sin(mφ̂ cos ωτ), cos(mφ̂ cos ωτ)]], (5.16)

whose parameters φ̂ and τ allow controlling the magnitude and the change rate of the rotation with increasing frequency. How this filter matrix is implemented efficiently was described in [33], where a sinusoidally frequency-varying pair of functions was found to correspond to sparse impulse responses in the time domain, allowing for truncation to just a few terms in q, typically 11 taps within −5 ≤ q ≤ 5 or fewer, and hereby an efficient implementation; for each degree m, the value α = mφ̂ is inserted into this impulse-response pair. (It might be helpful to be reminded of a phase-modulated cosine and sine from radio communication, whose spectra are the same functions as this impulse-response pair.) As the algorithm places successive frequencies at slightly displaced directions, the auditory source width increases. Moreover, the frequency-dependent part causes a smearing of the temporal fine structure of the signal. In [34], it was found that implementations discarding the negative values of q, i.e. keeping q ≥ 0, sound more natural and still exhibit a sufficiently strong effect. Time constants τ around 1.5 ms yield a widening effect, and a diffuseness and distance impression is obtained with τ around 15 ms. The parameter φ̂ is adjustable between 0 (no effect) and larger values.
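In a frequency-domain view, the effect reduces to one 2 × 2 rotation per bin; the sketch below (my own, with illustrative parameter values and real-valued stand-ins for the bin spectra) computes the dispersive angles of Eq. (5.16) and confirms that each bin keeps its energy, since every per-bin matrix is orthogonal.

```python
import math

def dispersive_angles(m, phi_hat, tau, freqs):
    """Rotation angle m * phi_hat * cos(2 pi f tau) for each frequency bin."""
    return [m * phi_hat * math.cos(2 * math.pi * f * tau) for f in freqs]

def rotate_pair(c, s, angle):
    """Apply the 2x2 rotation of Eq. (5.16) to one (chi_{n,m}, chi_{n,-m}) bin pair."""
    return (math.cos(angle) * c + math.sin(angle) * s,
            -math.sin(angle) * c + math.cos(angle) * s)

# Widening setting tau = 1.5 ms: successive frequencies get rotated differently,
# but each bin keeps its energy because the matrix is orthogonal.
freqs = [100.0 * k for k in range(1, 9)]
angles = dispersive_angles(m=1, phi_hat=math.radians(60), tau=0.0015, freqs=freqs)
pairs = [rotate_pair(0.7, -0.3, a) for a in angles]
print([round(c * c + s * s, 12) for c, s in pairs])  # all 0.7^2 + 0.3^2 = 0.58
```

The time-domain implementation of [33] realizes exactly these per-bin rotations with short FIR filters instead of an explicit transform.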
Beyond φ̂ = 80°, the audio quality starts to degrade. The use as a diffusing effect has turned out to be useful as a simple simulation of early lateral reflections, because most parts of the spectrum are played back near the reversal points ±φ̂ of the dispersion contour. For naturally sounding early reflections, additional shelving filters introducing attenuation of high frequencies prove useful.

Figures 5.12 and 5.13 show experimental ratings of the perceived effect strength (width or distance) of the above algorithm in [34], which was implemented as frequency-dependent (dispersive) panning on just a few loudspeakers, L = 3, 4, 5, 7, evenly arranged from −90° to 90° on the horizon at 2.5 m distance from the central listening position. The loudspeakers were controlled by a sampling decoder of the orders N = 1, 2, 3, 5 with the center of the max-r_E-weighted panning direction at 0° in front. The signal was speech, and as a reference the experiment used the frontal loudspeaker with the unprocessed signal "REF". The experiment tested the algorithm with both the symmetric impulse responses suggested by Eq. (5.18) and such truncated to their causal q ≥ 0 side, for a listening position at the center of the arrangement (bullet marker) and at 1.25 m shifted to the right, off-center (square marker).

Fig. 5.12 Perceived width (left) and audio quality (right) of frequency-dependent dispersive Ambisonic rotation as widening effect using the setting τ = 1.5 ms, the Ambisonic orders N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions at the center (bullet marker) and half-way right off-center (square marker)

Fig. 5.13 Perceived width (left) and audio quality (right) of frequency-dependent dispersive Ambisonic rotation as distance/diffuseness effect using the setting τ = 15 ms, the Ambisonic orders N = 1, 2, 3, 5, and L = 3, 4, 5, 7 loudspeakers on the frontal semi-circle, with listening positions at the center (bullet marker) and half-way right off-center (square marker)

Figure 5.12 indicates for the widening algorithm with τ = 1.5 ms that the perceived width saturates above N > 2 at both listening positions. Although the causal-sided implementation is weaker in effect strength, it highly outperforms the symmetric FIR implementation in terms of audio quality (right diagram), while still producing a clearly noticeable effect when compared to the unprocessed reference (left diagram).

A more pronounced preference for the causal-sided implementation in terms of audio quality is found in Fig. 5.13 for the setting τ = 15 ms, where the algorithm increases the diffuseness or perceived distance for orders N > 2 at both listening positions.

Feedback Delay Networks for Diffuse Reverberation
Feedback delay networks (FDN, cf. [35,36]) can directly be employed to create diffuse Ambisonic reverberation. A dense response and an individual reverberation for every encoded source can be expected when feeding the Ambisonic signals directly into the inputs of the FDN.
As in Fig. 5.14, an FDN consists of a matrix A that is orthogonal, A^T A = I, and should mix the signals of the feedback loop well enough to distribute them across all channels, coupling the resonators associated with the different delays τ_i. These delays should not have common divisors, to avoid pronounced resonance frequencies, and are therefore typically chosen to be related to prime numbers. Small delays are typically selected to be more closely spaced, {2, 3, 5, . . .} ms, to simulate a diffuse part with a densely spaced response at the beginning, and long delays spaced further apart often make the reverberation more interesting. Using unity channel gains g_lo^{τ_i} = g_mi^{τ_i} = g_hi^{τ_i} = 1 and any orthogonal matrix A, the reverberation time becomes infinite. For smaller channel gains, the FDN produces decaying output.
Reverberation is characterized by the exponentially decaying envelope 10^{−3t/T60}. For a single delay of length τ_i, the corresponding gain is g^{τ_i} with g = 10^{−3/T60}. This factor with the corresponding exponent provides an equal reverberation decay rate in every channel, and hereby exact control of the reverberation time. To make the effect sound natural, it is typical to adjust the gains within a high-mid-low filter set to decrease the reverberation towards higher frequency bands by the gains g_hi^{τ_i} ≤ g_mi^{τ_i} ≤ g_lo^{τ_i}. The vector gathering the current sample of every feedback path is multiplied by the matrix A. For calculation in real time, Rocchesso proposed in [37] to use a scaled Hadamard matrix A = (1/√M) H of the dimensions M = 2^k. It consists of ±1 entries only and hereby perfectly mixes the signal across the different feedback paths to create a diffuse set of resonances. What is more, this not only replaces the multiplications of the M × M matrix multiplication by sums and differences, it also permits a fast implementation in the style of the fast Hadamard transform.
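The whole structure fits in a few lines; the following single-band sketch (my own, with illustrative prime delays, sample rate, and T60) uses the Sylvester construction for the Hadamard matrix and the per-path gains g^{τ_i}, and its impulse response decays as expected.

```python
def hadamard(k):
    """Sylvester construction of a 2^k x 2^k Hadamard matrix with +-1 entries."""
    H = [[1]]
    for _ in range(k):
        H = [r + r for r in H] + [r + [-x for x in r] for r in H]
    return H

def fdn_impulse_response(n_samples, delays, t60, fs):
    """Minimal single-band FDN: orthogonal feedback A = H / sqrt(M) and per-path
    gains g_i = 10^(-3 * delay_i / (fs * t60)), so every path decays at the same
    rate and the reverberation time is controlled exactly."""
    M = len(delays)
    H = hadamard(M.bit_length() - 1)
    norm = M ** -0.5
    g = [10 ** (-3 * d / (fs * t60)) for d in delays]
    lines = [[0.0] * d for d in delays]   # circular delay-line buffers
    idx = [0] * M
    out = []
    for n in range(n_samples):
        x = 1.0 if n == 0 else 0.0        # feed an impulse into every path
        reads = [g[i] * lines[i][idx[i]] for i in range(M)]
        out.append(sum(reads))
        mixed = [norm * sum(H[i][j] * reads[j] for j in range(M)) for i in range(M)]
        for i in range(M):
            lines[i][idx[i]] = x + mixed[i]
            idx[i] = (idx[i] + 1) % delays[i]
    return out

ir = fdn_impulse_response(8000, delays=[149, 211, 263, 293], t60=0.5, fs=16000)
early = sum(v * v for v in ir[:2000])
late = sum(v * v for v in ir[-2000:])
print(early, late)   # the energy decays over time
```

A multi-band version would replace the scalar gains by the low/mid/high gain filters described above; the feedback topology stays the same.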

Reverberation by Measured Room Impulse Responses and Spatial Decomposition Method in Ambisonics
The first-order spatial impulse response of a room at the listener can be improved by the resolution enhancement of the spatial decomposition method (SDM) by Tervo [38], which is a broadband version of spatial impulse response rendering (SIRR) by Merimaa and Pulkki [39, 40]. For reliable measurements, typically loudspeakers are employed, and the typical measurement signals aren't impulses but swept-sine signals, which are reverted to impulses by deconvolution. A room impulse response is typically sparse at its beginning, whenever direct sound and early reflections arrive at the measurement location. Generally, it is likely that those arrival times in the early part do not coincide and are well separated from each other, so that one can assume their temporal disjointness at the receiver. From a room impulse response h(t) that complies with this assumption, for which there consequently is a direction of arrival (DOA) θ_DOA(t) for every time instant, one could construct an Ambisonic receiver-directional room impulse response as in [41], depending on the direction θ_R at the receiver. Transformed into the spherical harmonic domain by integration over y_N(θ_R) dθ_R on S², it yields the set of Nth-order Ambisonic room impulse responses

h̃_N(t) = y_N[θ_DOA(t)] h(t).

A signal s(t) convolved with this vector of impulse responses theoretically generates a 3D Ambisonic image of the mono sound in the room of the measurement. This can be done, e.g., by the plug-in mcfx_convolver. Now there are two problems to be solved: (i) how to estimate θ_DOA(t), and (ii) how to deal with the diffuse part of h(t), when there is more than one sound arrival at a time.
Estimation of the DOA. One could just detect the temporal peaks of the room impulse response and assign a guessed evolution of the direction of arrival, as suggested in [42], and hereby span the envelopment of the room impulse response. Alternatively, if the room impulse response was recorded by a microphone array as in [38], array processing can be used to estimate the direction of arrival θ_DOA(t). For first-order Ambisonic microphone arrays, when suitably band-limited to the frequency range in which the directional mapping is correct, e.g. between 200 Hz and 4 kHz, the vector r_DOA of Eq. (A.83) in Appendix A.6.2 yields a suitable estimate. In the corresponding directional analysis, the direct sound from the front is clearly visible, as well as strong early reflections from front and back, and equally distributed weak directions from the diffuse reverb.
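A pseudo-intensity DOA estimate is simple to sketch; the following broadband stand-in for the vector of Eq. (A.83) is my own illustration (ignoring B-format scaling conventions) and recovers the direction of a synthetic plane wave from the first-order channels.

```python
import math, random

def doa_estimate(w, x, y, z):
    """Pseudo-intensity DOA from first-order signals: average the products
    w[t] * [x[t], y[t], z[t]] over a short window, then normalize."""
    r = [sum(a * b for a, b in zip(w, comp)) for comp in (x, y, z)]
    n = math.sqrt(sum(v * v for v in r)) or 1.0
    return [v / n for v in r]

# Synthetic plane wave from direction u: first-order channels s, u_x s, u_y s, u_z s.
random.seed(1)
s = [random.uniform(-1, 1) for _ in range(512)]
u = (0.6, 0.8, 0.0)
w = s
x = [u[0] * v for v in s]
y = [u[1] * v for v in s]
z = [u[2] * v for v in s]
est = doa_estimate(w, x, y, z)
print(est)   # close to (0.6, 0.8, 0.0)
```

In SDM practice, such an estimate is evaluated per sample or per short sliding window of the band-limited room impulse response.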
Spectral decay recovery for higher-order RIRs. The second task mentioned above arises because the multiplication of h(t) by y_N[θ_DOA(t)] to obtain h̃(t) degrades the spectral decay at higher orders. Without further processing, the resulting response typically exhibits a noticeably increased spectral brightness [38, 41, 43]. This unnatural brightness mainly affects the diffuse reverberation tail, where temporal disjointness is a poor assumption. There, the corresponding rapid changes of θ_DOA(t) cause a strong amplitude modulation in the pre-processing of the late room impulse response at high Ambisonic orders. Typically, long decays of low frequencies leak into high frequencies and hereby result in an erroneous spectral brightening of the diffuse tail. Figure 5.17 analyzes this behavior in terms of an erroneous increase of the reverberation time at high frequencies, especially when using high orders.
In order to equalize the spectral decay, and hereby the reverberation time, of the SDM-enhanced impulse response, there is a helpful pseudo-allpass property of the spherical harmonics for direct and diffuse fields, formulated on the sub-band signals h_n^m(t, b). We can equalize the spectral sub-band decay for every band b and order n by targeting fulfillment of the pseudo-allpass property

(1/(2n+1)) Σ_{m=−n}^{n} E{|h_n^m(t, b)|²} = E{|h_0^0(t, b)|²}.

The formulation relies on the correct spectral decay of the omnidirectional signal h_0^0(t, b) = h̃_0^0(t, b), which is unaffected by the modulation. Correction is achieved by

h_n^m(t, b) = h̃_n^m(t, b) √( E{|h̃_0^0(t, b)|²} / [(1/(2n+1)) Σ_{m'=−n}^{n} E{|h̃_n^{m'}(t, b)|²}] );

here, the expression E{|·|²} refers to estimation of the squared signal envelope.
Perceptual evaluation. Frank's 2016 experiments [44] measuring the area of the sweet spot also investigated the plausibility of reverberation created by Ambisonically SDM-processed measurements at the different order settings N = 1, 3, 5. For Fig. 5.18b, listeners indicated at which distance from the room's center they heard the envelopment begin to collapse towards the nearest loudspeakers. One can observe that rendering diffuse reverberation for a large audience benefits from a high Ambisonic order. Moreover, experiments in [43] revealed an improvement of the perceived spatial-depth mapping, i.e. a clearer separation between foreground and background sound, for the SDM-processed higher-order reverberation, cf. Fig. 1.21b.

Fig. 5.18 The perceptual sweet-spot size as investigated by Frank [44] for SDM-processed RIRs covers an area in the IEM CUBE that increases with the SDM order N chosen (black = 5th, gray = 3rd, light gray = 1st order Ambisonics). In comparison to panned direct sound, one should keep some distance from the loudspeakers to avoid a breakdown of envelopment

Resolution Enhancement: DirAC, HARPEX, COMPASS
The concept of parametric audio processing [5] describes ways to obtain resolution-enhanced first-order Ambisonic recordings by parametric decomposition and rendering. One main idea is to decompose the short-term stationary signals of a sound scene into a directional and a less directional, diffuse stream. For synthesis of the directional part based on mono signals, it is clear how to obtain the narrowest presentation by amplitude panning or higher-order Ambisonic panning with consistent r_E-vector predictions as in Chap. 2. The synthesis of diffuse and enveloping parts based on a mono signal can require extra processing, such as the widening/diffuseness effects or reverberation of Sects. 5.5 and 5.6, which both also provide a directionally wide distribution of sound. More practically, the recording itself could deliver sufficiently many uncorrelated instances of the diffuse sound to be played back by surrounding virtual sources. Envelopment and diffuseness rely on providing a consistently low interaural covariance or cross-correlation by sufficiently strong decorrelation.

DirAC.
A main goal of DirAC (Directional Audio Coding [5]) is finding signals and parameters for sound rendering by analyzing first-order Ambisonic recordings. One variant uses the intensity-vector-based analysis in the short-term Fourier transform (STFT) domain, see also Appendix A.6.2, yielding a direction-of-arrival vector r_DOA that can be treated similarly to the r_E vector regarding direction, with the diffuseness estimated as ψ = 1 − ‖r_DOA‖².
Single-channel DirAC is Ville Pulkki's original way to decompose the W (t, ω) signal in the STFT domain into a directional signal √ 1 − ψ W (t, ω) that is synthesized by amplitude panning and a diffuse signal √ ψ W (t, ω) to be synthesized diffusely [45].
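This analysis and stream split can be sketched roughly as follows. The normalization of the DOA vector depends on the channel convention; the sketch assumes ambiX/SN3D-scaled first-order bins, so that a single plane wave yields ‖r_DOA‖ = 1. Function and variable names are illustrative, not taken from any DirAC implementation:

```python
import numpy as np

def dirac_analyze_split(W, X, Y, Z, eps=1e-12):
    """Per STFT bin: estimate the DOA vector and diffuseness psi from
    first-order bins, then split W into a directional and a diffuse stream.
    Assumes SN3D-like scaling, so a single plane wave gives |r_doa| = 1."""
    I = np.real(np.conj(W) * np.stack([X, Y, Z]))       # active-intensity direction
    E = 0.5 * (np.abs(W)**2 + np.abs(X)**2 + np.abs(Y)**2 + np.abs(Z)**2)
    r_doa = I / np.maximum(E, eps)                      # normalized DOA vector
    psi = 1.0 - np.minimum(np.linalg.norm(r_doa, axis=0), 1.0)**2  # diffuseness
    W_dir = np.sqrt(1.0 - psi) * W    # to be amplitude-panned towards r_doa
    W_diff = np.sqrt(psi) * W         # to be rendered diffusely (decorrelated)
    return r_doa, psi, W_dir, W_diff
```

For a single frontal plane wave (W = X, Y = Z = 0) this yields ψ ≈ 0, i.e. a purely directional stream, while an ideally diffuse field with a vanishing intensity vector yields ψ ≈ 1.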
Virtual-microphone DirAC uses a first-order Ambisonic decoder for the given loudspeaker layout and time-frequency-adaptive sharpening masks that increase the focus of direct sounds, see Vilkamo [46] and [5, Ch. 6], or Sect. 5.2.3. Playback of diffuse sounds benefits from an optional diffuseness effect.
HARPEX (high angular-resolution plane-wave expansion [47]) is Svein Berge's patented solution to optimally decode sub-band signals. It is based on the observation he made with Natasha Barrett that decoding to a tetrahedral loudspeaker layout performs perceptually best if the tetrahedron nodes are rotationally aligned with the sources of the recording. HARPEX accomplishes convincing diffuse and direct sound reproduction by decoding to a variably adapted virtual loudspeaker layout in every sub band; the layout is adaptively rotation-aligned with the sources detected in the band. HARPEX is typically described in terms of an estimator for pairs of plane-wave directions.

COMPASS (COding and Multidirectional Parameterization of Ambisonic Sound Scenes [48]) by Archontis Politis can be seen as an extension of DirAC. In contrast to DirAC, it tries to detect and separate multiple direct sound sources from the ambient or background sound. This is done by applying two different kinds of beamformers: one that extracts only the direct sound of each sound source (source signals) and one that captures everything but the direct sound (ambient signal). As before, the source signals are reproduced using amplitude panning, and the ambient signal is sent to the decorrelator. In contrast to DirAC, COMPASS is not limited to first-order input but can also enhance the spatial resolution of higher-order inputs.

IEM, ambix, and mcfx Plug-In Suites
The ambix_converter is an important tool when adapting between the different Ambisonic scaling conventions, e.g. between the standard semi-normalized SN3D convention and the fully normalized convention called N3D, see Fig. 5.19. The choice of SN3D in the ambix format [49] is practical because it avoids higher-order channels becoming louder than the zeroth-order channel. The converter also permits adapting between channel sequences such as ACN's i = n² + n + m or SID's i = n² + 2(n − |m|) + (m < 0). It is advisable to use test recordings with the main directions, e.g. front, left, top, and to check that the channel separation for decoded material roughly exceeds 20 dB for 5th-order material. Moreover, the converter can invert the Condon-Shortley phase, which typically causes a 180° rotation around the z axis, and it offers the left-right, front-back, and top-bottom flips discussed with the mirroring operations above. The ambix_warping plug-in, see Fig. 5.20, implements the above-mentioned warping operations shifting horizontal sounds towards one of the poles, or into both polar directions. Warping can be applied to any direction other than zenith and nadir by placing it between two mutually inverting ambix_rotation or IEM SceneRotator objects that intermediately rotate zenith to another direction.
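The channel-index formulas and the per-order SN3D-to-N3D relation (N3D carries an extra factor √(2n+1) for order n) can be written out directly; a minimal sketch with illustrative function names:

```python
import math

def acn(n, m):
    """ACN channel index for order n and degree m (|m| <= n)."""
    return n*n + n + m

def sid(n, m):
    """SID channel index: i = n^2 + 2(n - |m|) + (m < 0)."""
    return n*n + 2*(n - abs(m)) + (1 if m < 0 else 0)

def sn3d_to_n3d(n):
    """Per-channel gain converting an SN3D- to an N3D-normalized signal."""
    return math.sqrt(2*n + 1)

# first order: ACN orders the channels as Y, Z, X; SID as X, Y, Z
assert [acn(1, m) for m in (-1, 0, 1)] == [1, 2, 3]
assert [sid(1, m) for m in (1, -1, 0)] == [1, 2, 3]
```

The assertions illustrate why converting between ACN and SID is a pure channel permutation, while SN3D/N3D conversion is a pure per-order gain.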
The IEM SceneRotator, like the ambix_rotation plug-in, can be controlled by head tracking and is essential for an immersive headphone-based experience, see Fig. 5.21. Its processing is done as described above.
The ambix_directional_loudness plug-in in Fig. 5.22 implements the above-mentioned directional amplitude window in either circular or equi-rectangular spherical shape. Several of these windows can be created, soloed, and remote-controlled, each of which allows setting a gain for the inside and outside regions. This is often useful in practice, e.g., when reinforcing or attenuating desired or undesired signal parts within an Ambisonic scene.

If, for instance, the Ambisonic scene requires dynamic compression, as outlined in the section above, the IEM OmniCompressor is a helpful tool. It uses the omnidirectional Ambisonic channel to derive the compression gains (as a side-chain for all other Ambisonic channels). Similarly to the directional_loudness plug-in, the IEM DirectionalCompressor allows selecting a window, but this time for setting different dynamic compression within and outside the selected window, see Fig. 5.24.
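The omnidirectional side-chain idea can be sketched as follows, using a simplified static characteristic without attack/release ballistics; names and parameters are illustrative and not the plug-in's actual implementation:

```python
import numpy as np

def omni_compress(ambi, threshold_db=-20.0, ratio=4.0, eps=1e-12):
    """Derive per-sample gains from the omnidirectional channel ambi[0] of a
    (channels x samples) Ambisonic signal and apply them identically to all
    channels, so the directional image is preserved while levels are compressed."""
    level_db = 20.0 * np.log10(np.maximum(np.abs(ambi[0]), eps))
    over_db = np.maximum(level_db - threshold_db, 0.0)   # amount above threshold
    gain_db = -over_db * (1.0 - 1.0 / ratio)             # static characteristic only
    return ambi * 10.0**(gain_db / 20.0)
```

Because every channel is scaled by the same gain, the relative spherical-harmonic weights, and hence all directions in the scene, remain untouched.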
The multichannel mcfx_filter plug-in in Fig. 5.25 not only implements a set of parametric equalizers and low- and high-cut filters whose skirts can be toggled between 2nd and 4th order, but it also features a real-time spectrum analyzer to observe the changes made to the signal. It is not exclusive to Ambisonic purposes: it is simply a set of parametric filters applied equally to all channels and controlled from one interface.
The mcfx_convolver plug-in in Fig. 5.26 is useful for many purposes, also scientific ones, e.g., when testing binaural filters or driving multi-channel arrays with filters. Its configuration files use the jconvolver format, which specifies which filter (typically stored in multi-channel wav files) connects which of its multiple inlets to which of its multiple outlets. It is also used to implement the SDM-based reverberation described in the sections above.
For a computationally cheaper reverberation, the IEM FDNReverb, the feedback-delay network described above, can be used, see Fig. 5.27. It is not specifically an Ambisonic tool, but can be used in any multi-channel environment. The particularity of the implementation in the IEM suite is that a slow onset can be adjusted.
The ambix_widening plug-in in Fig. 5.28 implements the widening by frequency-dependent, dispersive rotation of the Ambisonic scene around the z axis as described above. With time-constant settings exceeding 5 ms, it can also be used to cheaply stylize lateral reflections instead of the IEM RoomEncoder (Fig. 4.36).

Another quite helpful tool is the mcfx_gain_delay plug-in in Fig. 5.29. It permits soloing or muting individual channels, as well as delaying and attenuating them individually. What is more, and often even more useful: it is invaluable for testing the signal chain, as one can step through the channels with different signals.
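The dispersive z-axis rotation behind such widening can be illustrated for first-order ambiX signals (ACN channel order W, Y, Z, X), where only the X/Y pair rotates while W and Z are invariant; a hypothetical sketch, not the plug-in's code:

```python
import numpy as np

def rotate_z_per_bin(W, Y, Z, X, phi):
    """Rotate a first-order Ambisonic spectrum around the z axis by the
    (possibly frequency-dependent) angle phi per bin. A dispersive phi(f),
    e.g. an angle growing with frequency, smears a source's energy
    laterally across directions, which is the widening effect."""
    c, s = np.cos(phi), np.sin(phi)
    Xr = c*X - s*Y   # first-order X/Y transform like vector components
    Yr = s*X + c*Y
    return W, Yr, Z, Xr
```

With phi = 0 for all bins the scene passes through unchanged; feeding a frequency-dependent phi array rotates each bin individually.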

Aalto SPARTA
The SPARTA plug-in suite by Aalto University provides Ambisonic tools for encoding, decoding to loudspeakers and headphones, as well as visualization. A special feature is the COMPASS decoder plug-in, Fig. 5.30, that can increase the spatial resolution of first-, second-, and third-order recordings. Playback can be done either on loudspeakers or headphones. The signal-dependent parametric processing allows adjusting the balance between direct and diffuse sound in each frequency band. In order to suppress artifacts due to the processing, the parametric playback (Par) can be mixed with the static decoding (Lin) of the original recording. While it is generally advisable to keep the parametric contribution below 2/3 for noticeable directional improvements at low artifacts, in recordings with cymbals or hi-hats it is advisable to fade towards Lin starting at around 4 kHz.
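The Par/Lin balance with a high-frequency fade towards Lin can be sketched per frequency bin; this is a hypothetical illustration of the mixing rule, not SPARTA code:

```python
import numpy as np

def par_lin_mix(par, lin, freqs, par_amount=0.6, fade_start_hz=4000.0):
    """Blend parametric (par) and static linear (lin) renderings, given as
    (bins x channels) spectra, with bin center frequencies freqs in Hz.
    Above fade_start_hz the parametric share is dropped, as suggested
    for material with cymbals or hi-hats; a hard fade for simplicity."""
    a = np.where(freqs >= fade_start_hz, 0.0, par_amount)
    a = a[:, None]                      # broadcast weight over channels
    return a*par + (1.0 - a)*lin
```

A smooth crossfade over an octave around fade_start_hz would avoid spectral discontinuities, but the hard switch keeps the idea visible.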

Røde
The Soundfield plug-in by Røde in Fig. 5.31 was originally designed to process the signals from the four cardioid microphone capsules of their Soundfield microphone. However, it also supports first-order Ambisonics as input format. It can decode to various loudspeaker arrangements by placing virtual microphones into the directions of the loudspeakers. The directivity of each virtual microphone can be adjusted between first-order cardioid and hyper-cardioid. Moreover, higher-order directivity patterns are possible using a parametric signal-dependent processing, resulting in an increase of the spatial resolution.