Compact Spherical Loudspeaker Arrays

This chapter introduces auditory objects that can be created by adjustable-directivity sources in rooms. After showing basic positioning properties in distance and direction, we describe physical first- and higher-order spherical loudspeaker arrays and their control, such as the loudspeaker cubes or the icosahedral loudspeaker (IKO). Not only static auditory objects are considered here, but also objects traversing space by time-varying beamforming. Signal dependency and different practical setups are discussed and briefly analyzed. This young Ambisonic technology brings new means of expression to sound reinforcement and to electroacoustic or computer music.

While surrounding Ambisonic loudspeaker arrays play sound from outside the listening area into the audience, compact spherical loudspeaker arrays play sound into the room from a single position. Directivity that is adjustable in orientation and shape can be used to steer sound beams in order to excite wall reflections in the given acoustic environment. The directional shapes and orientations of such beams are all controlled by, guess what, Ambisonic signals. Despite the huge practical difference, the two applications share more than the spherical harmonics that lend their shapes to Ambisonic signals: the control of radiating sound beams employs nearly the same model- or measurement-based radial steering filters as compact higher-order Ambisonic microphones. The electroacoustic background technology described here concerns compact spherical loudspeaker arrays built with electrodynamic transducers.


Perceived Distance
Laitinen showed in [20] that increasing the directivity of a listener-facing loudspeaker array from omnidirectional to second order created auditory events that were perceived closer than the physical distance to the loudspeaker array. The experimental results can be explained by the increase of the direct-to-reverberant energy ratio, as the sound beam of the directional source excites room reflections less strongly. Wendt extended Laitinen's work with experiments employing a simulation of a third-order directional source in a virtual room (third-order image source model) played back by a loudspeaker ring in an anechoic room [16]. He showed that the perceived distance between the listener and the higher-order directional source can be controlled not only by the order of the directivity pattern but also by the orientation of the source (towards the listener, away from the listener). Beams projecting sounds away from the listener were perceived behind the source, cf. Fig. 7.1. Again, the perceptual results could be modeled by simple measures known from room acoustics.

Perceived Direction
Using a similar room simulation, the study in [21] asked participants to indicate the perceived direction of an auditory event created by a third-order directional source. The results showed that for different source orientations, listeners perceived auditory objects at directions that often did not coincide with the sound source, but with the delayed reflection paths, cf. Fig. 7.2. Perceived directions focused on the direct sound and the first three reflections after 6, 8, and 9 ms. For some orientations, even the second-order reflections at 12 and 14 ms dominated localization. However, the influence of later reflections is reduced by the precedence effect. The perceived directions can be modeled by the extended energy vector originally developed for off-center listening positions in surrounding loudspeaker arrangements, as also shown in [17]. Experiments in [22] showed that panning between a reflection and the direct sound creates auditory objects in between. When the appropriate delay and gain are applied to the direct sound to compensate for the longer path of the reflection, the localization curves are similar to those of standard stereo using a pair of loudspeakers.
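How an energy vector turns a set of propagation paths into one direction estimate can be sketched as follows. This is a minimal illustration of the plain energy vector and not the extended model of [17]: the precedence weighting of the extended model is replaced by a hypothetical exponential decay constant `tau_ms`, and the path directions, gains, and delays are made-up values.

```python
import numpy as np

def energy_vector(azimuths_deg, gains, delays_ms, tau_ms=5.0):
    """Energy-vector direction estimate from direct sound and reflections.

    tau_ms is a hypothetical decay constant that de-emphasizes later paths;
    it only stands in for the precedence weighting of the extended model.
    """
    az = np.radians(np.asarray(azimuths_deg, dtype=float))
    delays = np.asarray(delays_ms, dtype=float)
    # Energy weight per path: squared gain, reduced for later arrival
    weights = np.asarray(gains, dtype=float) ** 2 * np.exp(-delays / tau_ms)
    directions = np.stack([np.cos(az), np.sin(az)], axis=1)
    r_e = weights @ directions / weights.sum()
    return np.degrees(np.arctan2(r_e[1], r_e[0])), np.linalg.norm(r_e)

# Made-up example: weak direct sound at 0 deg, strong reflection at 60 deg after 6 ms
direction, length = energy_vector([0.0, 60.0], [0.3, 1.0], [0.0, 6.0])
```

With a beam steered away from the listener, the direct sound is weak and the estimate moves towards the reflection; the vector length indicates how focused the energy distribution is.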

First-Order Compact Loudspeaker Arrays and Cubes
The simplest practical way of creating a loudspeaker array with adjustable directivity is a cube with loudspeakers on its plane surfaces, as suggested by Misdariis [23]. Restricting the directivity control to two dimensions reduces the number of loudspeaker drivers to four and makes it easy to equip the array with a carrying handle on top and a flange adapter at the bottom, cf. [24] and Fig. 7.3.

Directivity control.
First-order Ambisonics utilizes monopole and dipole modes, which directly translate to the corresponding far-field radiation patterns. Due to the cubic shape, these modes can easily be created by driving either all four drivers in phase or opposing drivers out of phase, cf. Fig. 7.4. Nevertheless, the frequency responses of such monopole and dipole modes need to be equalized to enable their phase- and magnitude-aligned superposition in the far field. Filters and measurement data of cube loudspeakers built at IEM [24] are freely available at http://phaidra.kug.ac.at/o:67631. To compensate for the effort of compressing the interior volume at low frequencies, the filter $H_\mathrm{bctl}$ in Fig. 7.4 first equalizes the smaller velocity of the loudspeaker cones when driven omnidirectionally to the velocity when driven as dipoles; as a second step, it attenuates the monopole pattern slightly to account for its more efficient radiation at low frequencies. The filter $H_\mathrm{EQ}$ is a general equalizer required to obtain a flat frequency response, $0 \le \alpha \le 1$ is a first-order omni-to-dipole beam shape parameter, and $\varphi_0$ is the beam direction. The filter $H_\mathrm{bctl}$ can be specified as a 5th-order IIR filter purely based on geometric and electroacoustic parameters [19].
Direct and indirect sound with two cubes. The study in [19] examined the width of the listening area for the creation of a central auditory object between a pair of loudspeaker cubes, cf. Fig. 7.5. Steering the two beams directly at the listener yielded a narrow listening area that widened with the distance to the loudspeakers, as known from typical stereo applications, cf. Fig. 2.9. A much wider listening area is achieved by steering the beams to the front wall to excite reflections. To this end, max-$r_E$ (super-cardioid) beams were chosen and oriented so as to ideally suppress the direct sound from the loudspeaker cubes at the listening position. The proposed setup of two loudspeaker cubes can be used to play back stable L, C, R channels of a surround production without the need for an actual center loudspeaker.
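The superposition of monopole and dipole modes on the four drivers can be sketched as below. The sketch assumes the pattern parametrization $(1-\alpha) + \alpha\cos(\varphi - \varphi_0)$ and a driver layout facing 0°, 90°, 180°, and 270°; both are illustrative assumptions, and the frequency-dependent filters $H_\mathrm{bctl}$ and $H_\mathrm{EQ}$ are left out entirely.

```python
import numpy as np

# Assumed driver layout: four drivers facing 0, 90, 180, 270 degrees azimuth
MONOPOLE = np.array([1.0, 1.0, 1.0, 1.0])    # all drivers in phase
DIPOLE_X = np.array([1.0, 0.0, -1.0, 0.0])   # front/back drivers out of phase
DIPOLE_Y = np.array([0.0, 1.0, 0.0, -1.0])   # left/right drivers out of phase

def target_pattern(phi_deg, alpha, phi0_deg):
    """Assumed first-order pattern (1 - alpha) + alpha * cos(phi - phi0)."""
    return (1.0 - alpha) + alpha * np.cos(np.radians(phi_deg - phi0_deg))

def mode_weights(alpha, phi0_deg):
    """Weights of the monopole mode and the two orthogonal dipole modes."""
    phi0 = np.radians(phi0_deg)
    return 1.0 - alpha, alpha * np.cos(phi0), alpha * np.sin(phi0)

def driver_weights(alpha, phi0_deg):
    """Frequency-independent driver weights (equalization filters omitted)."""
    w0, wx, wy = mode_weights(alpha, phi0_deg)
    return w0 * MONOPOLE + wx * DIPOLE_X + wy * DIPOLE_Y
```

In this parametrization, `alpha = 0` gives an omnidirectional pattern, `alpha = 0.5` a cardioid with its null opposite to the beam direction, and `alpha = 1` a pure dipole.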

Surround with depth:
Together with the distance control described by Laitinen [20], the stable in-between auditory image has been used in [19] to establish a surround-with-depth system consisting of a quadraphonic setup of four loudspeaker cubes. As a first layer, it uses the direct sounds of the 4 loudspeakers at ±45° and ±135° together with the 4 in-between images at 0°, ±90°, and 180° to obtain 8 directions for third-order Ambisonic surround panning. As a second layer for depth, it uses 4 cardioid beams pointing into the 4 room corners to provide the impression of distant sounds. Blending between these two layers controls the distance impression of surround sounds.
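Panning over the 8 directions of the first layer can be illustrated with a sampling decoder for third-order 2D Ambisonics. The following is a sketch under assumptions that are not taken from [19]: the decoder type, the 2D max-$r_E$ weights $a_n = \cos\big(n\pi/(2N+2)\big)$, and the normalization of the gains to unit sum.

```python
import numpy as np

# Four cube directions (+-45, +-135 deg) and four in-between images (0, +-90, 180 deg)
DIRECTIONS_DEG = np.array([0, 45, 90, 135, 180, -135, -90, -45], dtype=float)

def panning_gains(source_deg, order=3, max_re=True):
    """Gains of a 2D sampling decoder evaluated at the 8 playback directions."""
    n = np.arange(1, order + 1)
    # Assumed 2D max-rE order weights; all ones gives the basic panning function
    a = np.cos(n * np.pi / (2 * order + 2)) if max_re else np.ones(order)
    diff = np.radians(DIRECTIONS_DEG - source_deg)
    harmonics = a[:, None] * np.cos(n[:, None] * diff[None, :])
    return (1.0 + 2.0 * harmonics.sum(axis=0)) / len(DIRECTIONS_DEG)

gains = panning_gains(45.0)
```

Because the 8 directions are equally spaced and the order is 3, the gains sum to one for any source direction, and the largest gain falls on the direction closest to the source.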

Higher-Order Compact Spherical Loudspeaker Arrays and IKO
With transducers mounted on spheres or polyhedra, higher-order radiators can be built. Typically, these are Platonic solids such as dodecahedra or icosahedra, as they can easily be manufactured from equal-sided polygons, cf. Fig. 7.6. Often, the loudspeakers are also mounted onto a common interior volume. In this way, the higher-order modes can be controlled at reduced impedance of the inner stiffness; however, this also causes acoustic coupling of the transducer motions. Typically, multiple-input-multiple-output (MIMO) crosstalk cancellers are employed to suppress the coupling and to control the velocity of the transducer cones. If this is accomplished, the acoustic radiation can be modeled and equalized by the spherical cap model, cf. [6,15,25,26].
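Spherical cap model. Each of the $L$ transducers is represented by a cap of aperture angle $\alpha$ that is centered at the direction $\theta_l$ on a sphere of radius $a$ and vibrates with the velocity $v_l$, so that the surface velocity is
$$v(\theta) = \sum_{l=1}^{L} v_l\, u\big(\theta_l^{\mathrm T}\theta - \cos\tfrac{\alpha}{2}\big). \tag{7.1}$$
Here, $u(\zeta)$ denotes the unit step function that is unity for $\zeta \ge 0$ and zero otherwise. The surface velocity distribution can be decomposed into spherical harmonics as
$$v(\theta) = \sum_{n=0}^{\infty}\sum_{m=-n}^{n} Y_n^m(\theta) \sum_{l=1}^{L} w_{nm}^{(l)}\, v_l. \tag{7.2}$$
The coefficients $w_{nm}^{(l)}$ of the $l$th cap are defined by spherical convolution, Eq. (A.56), of a Dirac delta $\delta(\theta_l^{\mathrm T}\theta - 1)$ pointing to the cap center with a zenithal cap $u(\cos\vartheta - \cos\frac{\alpha}{2})$:
$$w_{nm}^{(l)} = w_n\, Y_n^m(\theta_l), \tag{7.3}$$
where $Y_n^m(\theta_l)$ are the coefficients expressing the Dirac delta, extended to a cap by weighting with $w_n$. The term
$$w_n = 2\pi \int_{\cos\frac{\alpha}{2}}^{1} P_n(\zeta)\,\mathrm{d}\zeta \tag{7.4}$$
is the $n$th Legendre coefficient of the cap.

Decoder. Without radiation control yet, any low-order target spherical harmonic $n \le N$ can be synthesized as velocity pattern $\phi_{nm}$ by superimposing the spherical cap coefficients $w_{nm}^{(l)}$ with suitable transducer velocities $v_l$, i.e. $\phi_{nm} = \sum_{l} w_n Y_n^m(\theta_l)\, v_l$. In matrix/vector notation, with the matrix $Y = [y(\theta_1), \dots, y(\theta_L)]$ containing the spherical harmonics $y(\theta) = [Y_n^m(\theta)]_{nm}$ sampled at the transducer positions $\{\theta_l\}$ to represent Dirac deltas pointing there, and $w = [w_n]_{nm}$ to represent the cap shape, this reads
$$\phi = \mathrm{diag}\{w\}\, Y\, v. \tag{7.5}$$
As long as the order $N$ up to which coefficients are controlled is low enough, $L \ge (N+1)^2$, and the transducers are well distributed, perfect control is feasible. The corresponding velocities are found by solving a least-squares problem, see Appendix A.4, Eq. (A.63), yielding the right inverse of the $N$th-order cap model $\mathrm{diag}\{w_N\}\, Y_N$,
$$v = Y_N^{\mathrm T}\,(Y_N Y_N^{\mathrm T})^{-1}\, \mathrm{diag}\{w_N\}^{-1}\, \phi_N = D\, \mathrm{diag}\{w_N\}^{-1}\, \phi_N. \tag{7.6}$$

Exterior problem. The radiated sound pressure is described by the exterior problem, denoted by the coefficients $c_{nm}$ in Eq. (6.13) and the spherical Hankel functions $h_n^{(2)}(kr)$. To relate it to the time-derived surface velocity at the array radius $r = a$, we differentiate the exterior solution with regard to the radius,
$$\rho\,\frac{\partial v}{\partial t} = -\frac{\partial p}{\partial r}\bigg|_{r=a} \quad\Rightarrow\quad \mathrm{i}\omega\rho\, v(\theta) = -k \sum_{n=0}^{\infty}\sum_{m=-n}^{n} c_{nm}\, h_n'^{(2)}(ka)\, Y_n^m(\theta). \tag{7.7}$$
Comparing Eq. (7.2) to Eq. (7.7) yields $c_{nm} = \rho c\,[\mathrm{i}\, h_n'^{(2)}(ka)]^{-1} \sum_{l=1}^{L} w_n Y_n^m(\theta_l)\, v_l$, the coefficients to calculate the radiated pressure. Far away, the spherical Hankel function approaches $h_n^{(2)}(kr) \to \mathrm{i}^{n+1}(kr)^{-1} e^{-\mathrm{i}kr}$. We replace it by the term $\mathrm{i}^{n+1} k^{-1}$ in Eq. (6.13), dropping the direction-independent factor $r^{-1}e^{-\mathrm{i}kr}$, so that the radiated far-field sound pressure $p \propto \sum_{n,m} \mathrm{i}^{n+1} k^{-1}\, Y_n^m\, c_{nm}$ becomes
$$p(\theta) \propto \rho c \sum_{n=0}^{\infty}\sum_{m=-n}^{n} Y_n^m(\theta)\, \frac{\mathrm{i}^{n}}{k\, h_n'^{(2)}(ka)}\, w_n \sum_{l=1}^{L} Y_n^m(\theta_l)\, v_l. \tag{7.8}$$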
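The cap shape coefficients $w_n$ can be evaluated numerically as the Legendre coefficients of a zenithal cap, $w_n = 2\pi\int_{\cos(\alpha/2)}^{1} P_n(\zeta)\,\mathrm{d}\zeta$. The sketch below integrates the Legendre polynomials with NumPy and cross-checks the result against a closed form that follows from the Legendre recurrence; the aperture angle of 30° in the example is a hypothetical value and not taken from a particular array.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def cap_coefficients(order, aperture_deg):
    """w_n = 2*pi * integral of P_n from cos(aperture/2) to 1, for n = 0..order."""
    x = np.cos(np.radians(aperture_deg) / 2.0)
    w = np.empty(order + 1)
    for n in range(order + 1):
        antiderivative = Legendre.basis(n).integ()
        w[n] = 2.0 * np.pi * (antiderivative(1.0) - antiderivative(x))
    return w

def cap_coefficients_closed_form(order, aperture_deg):
    """Same coefficients via (P_{n-1}(x) - x P_n(x)) / (n + 1) for n > 0."""
    x = np.cos(np.radians(aperture_deg) / 2.0)
    w = np.empty(order + 1)
    w[0] = 2.0 * np.pi * (1.0 - x)
    for n in range(1, order + 1):
        p_n = Legendre.basis(n)(x)
        p_nm1 = Legendre.basis(n - 1)(x)
        w[n] = 2.0 * np.pi * (p_nm1 - x * p_n) / (n + 1)
    return w

w = cap_coefficients(3, 30.0)  # hypothetical 30 degree aperture
```

For small apertures, the coefficients are nearly equal across the low orders, which reflects that a small cap approximates a Dirac delta; `w[0]` is the solid angle of the cap.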

Directivity Control
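The spherical harmonics coefficients of the far-field sound pressure pattern in Eq. (7.8),
$$\psi_{nm} = \rho c\, \frac{\mathrm{i}^{n}}{k\, h_n'^{(2)}(ka)}\, w_n \sum_{l=1}^{L} Y_n^m(\theta_l)\, v_l, \tag{7.9}$$
are controlled by the cap velocities $v_l$, and we desire to form the directional sound beam they represent according to a max-$r_E$ pattern $a_n Y_n^m(\theta_0)$, yielding radiation focused towards $\theta_0$,
$$\psi_{nm} = a_n\, Y_n^m(\theta_0). \tag{7.10}$$
To find suitable cap velocities $v_l$, we equate the model Eqs. (7.9) and (7.10). In matrix/vector notation, the equation is
$$\mathrm{diag}\Big\{\Big[\rho c\, \frac{\mathrm{i}^{n}\, w_n}{k\, h_n'^{(2)}(ka)}\Big]_{nm}\Big\}\, Y\, v = \mathrm{diag}\{a\}\, y(\theta_0). \tag{7.11}$$
The diagonal matrix on the left is easy to invert, and for patterns up to the order $n \le N$, the mode-matching decoder $D$ of Eq. (7.6) already gives us a way to define velocities inverting the matrix $Y_N$ from the right. The preliminary solution becomes
$$v = D\, \mathrm{diag}\Big\{\Big[\frac{\mathrm{i}^{-n}\, k\, h_n'^{(2)}(ka)}{\rho c\, w_n}\, a_n\Big]_{nm}\Big\}\, y_N(\theta_0). \tag{7.12}$$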
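The role of the mode-matching decoder $D$ as a right inverse of the sampled spherical harmonics can be illustrated with a deliberately small case. The sketch assumes a hypothetical layout of six transducers on the faces of a cube, first-order real spherical harmonics in ACN order, and it sets the frequency-dependent radiation term to one, so it only shows the frequency-independent part of the velocity solution.

```python
import numpy as np

def sh_first_order(directions):
    """Real orthonormal spherical harmonics up to order 1 (ACN order), shape (4, L)."""
    d = np.asarray(directions, dtype=float)
    x, y, z = d[:, 0], d[:, 1], d[:, 2]
    c0, c1 = np.sqrt(1.0 / (4.0 * np.pi)), np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([c0 * np.ones(len(d)), c1 * y, c1 * z, c1 * x])

# Hypothetical layout: one transducer per cube face, given as unit vectors
positions = np.array([[1, 0, 0], [-1, 0, 0], [0, 1, 0],
                      [0, -1, 0], [0, 0, 1], [0, 0, -1]], dtype=float)
Y = sh_first_order(positions)              # 4 x 6 matrix of sampled harmonics
D = Y.T @ np.linalg.inv(Y @ Y.T)           # right inverse: Y @ D = identity

def beam_velocities(theta0, a=(1.0, 1.0), w=(1.0, 1.0)):
    """v = D diag{a_n / w_n} y(theta0); the radiation filter term is set to 1."""
    y0 = sh_first_order(np.atleast_2d(theta0))[:, 0]
    per_order = np.array([a[0] / w[0], a[1] / w[1], a[1] / w[1], a[1] / w[1]])
    return D @ (per_order * y0)

v = beam_velocities([1.0, 0.0, 0.0])       # beam towards the first transducer
```

With unit weights, the velocities reproduce the target coefficients exactly (`Y @ v` equals `y(theta0)`), and the transducer facing the beam direction receives the largest velocity.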

On-axis equalized, sidelobe-suppressing directivity control limiting the excursion.
The inverse cap shape coefficient $w_n^{-1}$ and the max-$r_E$ weight $a_n$ can be regarded as a part of the radiation control filters $\mathrm{i}^{-n}\, k\, h_n'^{(2)}(ka)$. The expression $\mathrm{i}^{-n-1}\,(ka)^2\, h_n'^{(2)}(ka)$ of compact spherical microphone arrays (Sect. 6.6) qualitatively differs by a factor $k$. Practical implementation of radiation control filters and their regularization is therefore quite similar to radial filters of spherical microphone arrays. There are three main differences, as explained in [15]:
• With loudspeaker arrays, it is rather the excursion that is limited. This primarily entails a different strategy of adjusting the filterbank cut-on frequencies, which, due to the array size, lie at lower frequencies where group-delay distortions are less disturbing and where linear-phase implementations would cause avoidably long delays.
• Moreover, instead of cut-on filter slopes of $(n+1)$th order required for noise removal in signals obtained from spherical microphone arrays, limited excursion requires cut-on slopes of at least $(n+3)$th order, i.e. 4th order to cut on the 1st-order Ambisonic signals. Of these, one additional order is caused by the qualitative difference of $k^{-1}$ in the radial filters, and another order by the conversion of velocity to excursion by a factor $(\mathrm{i}\omega)^{-1}$.
• Finally, instead of diffuse-field equalization that is useful for surround sound playback of spherical microphone array signals, it is more useful to equalize spherical sound beams on-axis (free field).
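What on-axis equalization means for the order weights can be sketched numerically. The code below is an illustration under assumptions that are not confirmed by this section: it uses the common approximation $a_n = P_n\big(\cos(137.9°/(b+1.51))\big)$ for max-$r_E$ weights of a band with maximum order $b$, and it scales the weights so that the on-axis amplitude $\sum_n \frac{2n+1}{4\pi} a_n$ of the beam is one in every band.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def max_re_weights(order):
    """Assumed approximation a_n = P_n(cos(137.9 deg / (order + 1.51)))."""
    x = np.cos(np.radians(137.9 / (order + 1.51)))
    return np.array([Legendre.basis(n)(x) for n in range(order + 1)])

def on_axis_gain(weights):
    """On-axis amplitude sum_n (2n + 1) / (4 pi) * a_n of an axisymmetric beam."""
    n = np.arange(len(weights))
    return np.sum((2 * n + 1) * weights) / (4.0 * np.pi)

def on_axis_equalized_weights(order):
    """Max-rE weights rescaled to unit on-axis amplitude (assumed normalization)."""
    a = max_re_weights(order)
    return a / on_axis_gain(a)

band_weights = [on_axis_equalized_weights(b) for b in range(4)]
```

With this scaling, bands of different maximum order add up coherently in the beam direction, whereas a diffuse-field normalization would instead equalize the radiated energy.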
On-axis equalization yields a different scaling of the sub-band max-$r_E$ weights $a_{n,b}$, Eq. (7.13). Typically, cut-on frequencies for compact spherical loudspeaker arrays are low, and linear-phase filterbanks would require long pre-delays. It is useful to employ Linkwitz-Riley filters for the crossovers to get a low-latency implementation. To emphasize the similarity to Eq. (6.22), we write Linkwitz-Riley filters [27] as a combination of an all-pass $A_m$, which has twice the phase response of an $m$th-order Butterworth low-pass, with either the magnitude-squared low-pass response or the magnitude-squared high-pass response,
$$H_{\mathrm{lo}}(\omega) = \frac{A_m(\omega)}{1 + (\omega/\omega_\mathrm{c})^{2m}}, \qquad H_{\mathrm{hi}}(\omega) = \frac{A_m(\omega)\,(\omega/\omega_\mathrm{c})^{2m}}{1 + (\omega/\omega_\mathrm{c})^{2m}},$$
where $\omega_\mathrm{c}$ denotes the crossover frequency. Such a minimum-phase crossover is of even order, so that the minimum-order cut-on slope must be rounded up to the next even order, $2\lceil (b+3)/2 \rceil$. Plain high/low crossovers would be in phase unless combined with further crossovers to form narrower bands. However, an in-phase filterbank is obtained after inserting the product of all all-passes in every band, cf. [28]. Although non-minimum-phase, this is still low-latency. For the band $b$ containing the Ambisonic orders $0 \le n \le b$, the modified filterbank $H_b(\omega)$ is given by Eq. (7.14). The sum $\sum_b H_b(\omega)$ is considered to be sufficiently flat, so that the radial filters for compact spherical loudspeaker arrays follow from Eqs. (7.8), (7.13), and (7.14) as Eq. (7.15).
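The all-pass property of a Linkwitz-Riley crossover can be checked numerically. The sketch below builds the analog frequency responses of a crossover of order $2m$ as squared $m$th-order Butterworth responses and verifies that the low and high outputs add up to unit magnitude. The crossover frequency and the polarity flip for odd $m$ are illustrative choices, and `cut_on_order` is a hypothetical helper name for the rounding rule $2\lceil (b+3)/2\rceil$.

```python
import numpy as np

def butterworth_denominator(m, s):
    """Denominator of the normalized m-th order Butterworth low-pass at s."""
    k = np.arange(m)
    poles = np.exp(1j * np.pi * (2 * k + m + 1) / (2 * m))  # left half-plane
    return np.prod(s[..., None] - poles, axis=-1)

def linkwitz_riley(m, omega, omega_c):
    """Low and high outputs of a Linkwitz-Riley crossover of order 2m."""
    s = 1j * np.asarray(omega, dtype=float) / omega_c
    den = butterworth_denominator(m, s) ** 2
    low = 1.0 / den
    high = (-1) ** m * s ** (2 * m) / den   # polarity flipped for odd m
    return low, high

def cut_on_order(b):
    """Smallest even slope order that is at least b + 3 (hypothetical helper)."""
    return 2 * int(np.ceil((b + 3) / 2))

omega = 2.0 * np.pi * np.logspace(1, 4, 50)
low, high = linkwitz_riley(2, omega, 2.0 * np.pi * 200.0)
total = low + high                           # all-pass: unit magnitude everywhere
```

The two outputs share the same phase at every frequency, which is what allows them to be recombined without notches around the crossover frequency.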

Control System and Verification Based on Measurements
Velocity equalization/crosstalk cancellation. In the frequency domain, laser vibrometer measurements, cf. Fig. 7.8a, characterize the physical multiple-input-multiple-output (MIMO) system from the transducer input voltages $u_l(\omega)$ to the transducer velocities $v_l(\omega)$,
$$v(\omega) = T(\omega)\, u(\omega), \tag{7.16}$$
including the effect of acoustic coupling through the common enclosure. Corresponding open measurement data sets¹ can be found online, as described in [18]. Theoretically, the frequency-domain inverse of the matrix $T(\omega)$ can be used to equalize and control the transducer velocities with acoustic crosstalk cancelled, as indicated in Fig. 7.7,
$$u(\omega) = T^{-1}(\omega)\, v(\omega). \tag{7.17}$$
In practice, this is only useful up to the frequency at which the loudspeaker cone vibration breaks up into modes, so typically below 1 kHz.
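Inverting the MIMO system per frequency bin can be sketched as follows. The transfer matrices here are synthetic random stand-ins for the laser vibrometer measurements (a strong diagonal plus weak coupling), and no regularization or band limiting is included, although a practical design would need both.

```python
import numpy as np

rng = np.random.default_rng(0)
num_bins, num_transducers = 8, 20

# Synthetic stand-in for measured T(omega): identity plus weak random coupling
coupling = (rng.standard_normal((num_bins, num_transducers, num_transducers))
            + 1j * rng.standard_normal((num_bins, num_transducers, num_transducers)))
T = np.eye(num_transducers) + 0.05 * coupling

def voltages_for_velocities(T, v):
    """Solve T(omega) u(omega) = v(omega) independently for every frequency bin."""
    return np.linalg.solve(T, v[..., None])[..., 0]

v_target = rng.standard_normal((num_bins, num_transducers)) + 0j
u = voltages_for_velocities(T, v_target)

# Applying the system to the computed voltages should return the target velocities
v_check = np.einsum("fij,fj->fi", T, u)
```

Solving the linear system per bin avoids forming the explicit inverse; for filter design, the resulting frequency responses would still have to be converted into realizable filters.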
Control system. The entire control system with Ambisonic signals $\chi_N(\omega)$ as inputs cascades Eqs. (7.6), (7.15), and (7.17): the radiation control filters are applied in the spherical harmonic domain, the decoder $D$ maps the result to transducer velocities, and $T^{-1}(\omega)$ converts these into the transducer voltages $u(\omega)$, Eq. (7.18).

Directivity measurement. It is useful to characterize the obtained directivity by measurements to verify the results; high-resolution 648 × 20 measurements $G(\omega)$ of the IKO are found online¹. With the known directional sampling, which supports up to 17th order on a 10° × 10° grid in azimuth and zenith, the sound pressure can be decomposed by left-inversion of a spherical harmonics matrix $Y_{17}^{\mathrm T}$, see Appendix A.4, Eq. (A.65):
$$\psi_{17}(\omega) = (Y_{17} Y_{17}^{\mathrm T})^{-1}\, Y_{17}\, G(\omega)\, u(\omega). \tag{7.19}$$
With the highly resolved spherical harmonics coefficients, polar diagrams or balloon diagrams can be evaluated at any direction,
$$p(\theta, \omega) = y_{17}(\theta)^{\mathrm T}\, \psi_{17}(\omega), \tag{7.20}$$
given any control system delivering suitable voltages $u$ for beamforming, as e.g. obtained by Eq. (7.18).

To inspect the frequency-dependent directivity, a horizontal cross section is shown in Fig. 7.9. The beamforming becomes effective above 100 Hz, and a beam width of ±30° is held until 2 kHz. The filterbank starts the 0th order above 38 Hz, and at 75, 125, and 210 Hz, the 1st, 2nd, and 3rd orders are successively added, including on-axis equalized max-$r_E$ weightings. Above 2 kHz, both spatial aliasing and modal breakup of the transducer cones affect the directivity. However, these beamforming-direction-dependent distortions are often negligible in typical rooms.
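The decomposition of sampled pressure into spherical harmonic coefficients by left inversion can be sketched on a 10° × 10° grid. To stay short, the example is reduced to first order instead of 17th order, it uses a synthetic pressure pattern instead of measured data, and the zenith offsets of the grid (5° to 175°) are an assumption.

```python
import numpy as np

def sh_first_order_grid(azimuth, zenith):
    """Real orthonormal spherical harmonics up to order 1 (ACN order), shape (4, P)."""
    x = np.sin(zenith) * np.cos(azimuth)
    y = np.sin(zenith) * np.sin(azimuth)
    z = np.cos(zenith)
    c0, c1 = np.sqrt(1.0 / (4.0 * np.pi)), np.sqrt(3.0 / (4.0 * np.pi))
    return np.stack([c0 * np.ones_like(x), c1 * y, c1 * z, c1 * x])

# 36 azimuth x 18 zenith angles = 648 directions (zenith offsets are assumed)
azimuth = np.radians(np.arange(0, 360, 10))
zenith = np.radians(np.arange(5, 180, 10))
az, ze = np.meshgrid(azimuth, zenith, indexing="ij")
Y = sh_first_order_grid(az.ravel(), ze.ravel())        # 4 x 648

# Synthetic "measurement": pressure samples of a known first-order pattern
psi_true = np.array([1.0, 0.2, -0.5, 0.8])
p_samples = Y.T @ psi_true

# Left inversion of Y^T in the least-squares sense recovers the coefficients
psi, *_ = np.linalg.lstsq(Y.T, p_samples, rcond=None)

def pressure_at(azimuth_rad, zenith_rad, coefficients):
    """Evaluate p(theta) = y(theta)^T psi at an arbitrary direction."""
    y = sh_first_order_grid(np.array([azimuth_rad]), np.array([zenith_rad]))[:, 0]
    return y @ coefficients
```

Once the coefficients are known, the pattern can be evaluated between the measurement directions, which is what makes polar or balloon diagrams at arbitrary resolution possible.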

Static Auditory Objects
The study in [16] showed that distance control by changing the directivity and its orientation can also be achieved with the IKO in a real room, cf. Fig. 7.10. The experiments used stationary pink noise and could create auditory objects nearly 2 m behind the IKO, which corresponds to the distance between the IKO and the front wall of the playback room.
The maximum distance of auditory objects created by the IKO is strongly signal-dependent. Experiments in [14] showed that the auditory distance of pink noise bursts decreased for shorter fade-in times, while the fade-out time had no influence, cf. Fig. 7.11. A transient click sound was perceived even closer to the IKO. This can be explained by the precedence effect, which favors the earlier direct sound over the reflected sound from the walls. While this effect is strong for transient sounds, it is inhibited for stationary sounds with long fade-in times.
However, the precedence effect can even be reduced for transient click sounds by simultaneous playback of a masker sound that reduces the influence of the direct sound [29]. In comparison to no masker, playing a pink noise masker doubles the auditory distance, cf. Fig. 7.12. Using the room noise as a masker by playing the

Moving Auditory Objects
The studies in [14,15] extended the previous listening experiments towards simple time-varying beam directions, such as from the left to the right, front/back or circles.
To report the perceived locations of the moving auditory objects, listeners used a touch screen that showed a floor plan of the room, including the listening position and the position of the IKO. They had to indicate the location of the auditory object's trajectory every 500 ms. The perceived trajectories depend on the listening position, but they can always be recognized, cf. Fig. 7.13. The empirical knowledge was [...] [14] about body-space relations, composing sounds that are spatialized with different static directions and simple movements. For concerts, the artistic practice evolved to set the IKO up together with reflector baffles, cf. Fig. 7.14. A recent study in [30] investigated their effect on the perception of moving transient and stationary sounds. The baffles obviously reduce the signal dependency by contributing additional reflection paths that contrast with the direct sound.

IEM Room Encoder and Directivity Shaper
The IEM Room Encoder VST plug-in, cf. Fig. 4.36, can not only be used to simulate the room reflections of an omnidirectional sound source based on the image-source method, but it also supports directional sound sources. As format, it employs Ambisonics with ACN ordering and adjustable normalization up to seventh order. This makes it possible to use data from directivity measurements or even directional recordings done with a surrounding spherical microphone array, e.g. to put real instrument recordings into the virtual room.
As an alternative, the IEM Directivity Shaper, cf. Fig. 7.15, provides simple means to generate a frequency-dependent directivity pattern from scratch and to apply it to a mono input signal. This is useful to generate the typical rotary speaker effect of a Leslie cabinet.

IEM Cubes 5.1 Player and Surround with Depth
As shown in Fig. 7.5, a pair of loudspeaker cubes can create a stable auditory event in between them to replace an actual center loudspeaker. In order to play back an entire 5.1 production, the IEM cubes 5.1 Player plug-in extends this approach by two additional beams to the side walls for the surround channels, cf. Fig. 7.16. The plug-in provides control of the shape, direction, and level of all beams, as well as a delay compensation for the reflection paths.
Surround sound with depth can be realized with a quadraphonic setup of four loudspeaker cubes and a combination of the cubes Surround Decoder and multiple Distance Encoder plug-ins, cf. Fig. 7.16. For each source, the Distance Encoder controls position and distance, i.e. the blending between the two layers. The output of the plug-in is a 10-channel audio stream including 7 channels for third-order (inner layer) and 3 channels for first-order (outer depth layer) 2D Ambisonics. The cubes Surround Decoder plug-in decodes the 10-channel audio stream and distributes the signals to the 16 drivers of the four loudspeaker cubes. For each loudspeaker cube, the directions to excite direct and reflected sound of the inner layer and the diffuse sound of the depth layer can be adjusted in order to adapt to the playback environment. Additionally, the directivity patterns for direct, reflected, and diffuse sound beams can be controlled, as well as a delay to compensate for the longer propagation paths of the reflected sound. The plug-ins are available at https://git.iem.at/audioplugins/CubeSpeakerPlugins.

IKO
Spatialization using the IKO can use a similar infrastructure of plug-ins as surrounding loudspeaker arrays. Ambisonic encoder plug-ins, such as the ambix_encoder or the IEM StereoEncoder or MultiEncoder, create the third-order Ambisonic signals that are subsequently fed to a decoder. Decoding to the IKO requires the processing steps as shown in Fig. 7.7: radiation control filters in the spherical harmonic domain, decoding from spherical harmonics to transducer signals, as well as crosstalk cancellation and equalization of the transducers. This processing can be summarized in a 16 (spherical harmonics up to third order) × 20 (transducers) filter matrix. Convolution can be done efficiently using the mcfx_convolver plug-in. Filter presets for the IKO can be found under http://phaidra.kug.ac.at/o:79235.
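Applying such a filter matrix amounts to a multichannel convolution in which every output is the sum of all filtered inputs. The sketch below shows this with a single FFT-based block convolution in NumPy; the random signals and filters are placeholders for Ambisonic signals and IKO filter presets, and it does not reproduce the partitioned, real-time processing of a convolver plug-in.

```python
import numpy as np

def apply_filter_matrix(signals, filters):
    """Multichannel FIR filtering.

    signals: (C, T) input channels, filters: (C, L, K) impulse responses from
    input c to output l. Returns (L, T + K - 1) output channels.
    """
    num_samples = signals.shape[1]
    num_taps = filters.shape[2]
    n_fft = num_samples + num_taps - 1
    X = np.fft.rfft(signals, n_fft, axis=-1)        # (C, F)
    H = np.fft.rfft(filters, n_fft, axis=-1)        # (C, L, F)
    Y = np.einsum("cf,clf->lf", X, H)               # sum over input channels
    return np.fft.irfft(Y, n_fft, axis=-1)

rng = np.random.default_rng(1)
ambisonic_signals = rng.standard_normal((16, 64))   # placeholder: 16 channels
filter_matrix = rng.standard_normal((16, 20, 32))   # placeholder: 16 x 20 filters
transducer_signals = apply_filter_matrix(ambisonic_signals, filter_matrix)
```

The shapes mirror the 16 × 20 structure described above: 16 spherical harmonic inputs are mapped to 20 transducer feeds.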