Efficient numerical simulation of the human voice

The process of voice production is a complex process and depends on the correct interaction of the vocal folds and the glottal airstream inducing the primary voice source, which is subsequently modulated by the vocal tract. Due to the restricted access to the glottis, not all aspects of the three-dimensional process can be captured by measurements without influencing the measurement object. Hence, the application of a numerical tool capturing the physical process of phonation can provide an extended database for voice treatment and, therefore, can contribute to an increased effectiveness of voice treatment. However, such numerical models involve complex and demanding procedures to model the material behavior and the mechanical contact of the vocal folds and to realize moving boundaries of the involved physical domains. The present paper proposes a numerical model called simVoice, which circumvents these computational expenses by prescribing the experimentally obtained vocal fold motion within the simulation. Additionally, a hybrid approach for sound computation further enhances the computational efficiency and yields good agreement with acoustic measurements. An analysis of the computational workloads suggests that the key factor for a further increase in efficiency is an optimized flow simulation and source term computation.


Introduction
The ability of humans to communicate is an essential characteristic that distinguishes us from other living beings and has played a crucial role in the evolution of the dominant species. People suffering from voice disorders have a reduced quality of life and today also poorer chances of professional success [4,26,42]. However, voice problems do not only affect the individual person, but are also of high social and economic relevance. A study in 2000 revealed that the annual losses within the Gross National Product of the USA are up to 186 billions dollars since approximately 10 percent of the entire US population are affected by communication disturbances [29].
The principal mechanism of voice generation is the self-sustained oscillation of the vocal folds (VFs) caused by the airstream coming from the lungs (see Fig. 1). Due to the vibrating vocal folds, the airstream is interrupted periodically generating acoustic waves dominated by the oscillation (fundamental) frequency f 0 . The arising primary voice source is subsequently modulated by the shape of the downstream oral and nasal airways referred as vocal tract.
In general, the purpose of numerical models of the human voice is either to gain a deeper insight into voice generation or a clini- cal application. Thereby, simVoice become a valuable clinical tool in phoniatrics by • contributing to a better understanding of pathological and physiological voice production, • helping to identify new treatment approaches, • predicting the outcome of conservative and surgical voice treatment, and • supporting training of medical stuff in phoniatrics.
For a patient specific application in a clinical environment, special focus in the development of the model simVoice must be put on computational efficiency, which is achieved by (1) applying the hybrid aeroacoustic approach based on the perturbed convective wave equation (PCWE) and (2) by circumventing fluid-structure interaction (FSI) modeling by prescribing the vocal fold motion (forward coupling). Figure 2 illustrates the application of simVoice in a clinical environment as a tool supporting decision making, as well as for finding new treatment approaches. In the first step, the vocal fold motion is measured by high-speed imaging, and the voice of the patient is recorded. Furthermore, the geometry of the larynx and the vocal tract are obtained by magnetic resonance imaging (MRT) and the geometry and VF motion of the phonation model simVoice is adapted to the patient specific physiology. Subsequently, the geometry of the numerical model is modified according to the chosen treatment approach (e.g. surgical treatment), which is thought to yield an improvement of voice quality. In the next step, the voice of the modified geometry is computed by the numerical model. If the resulting voice quality is satisfying, the selected treatment can be performed. Otherwise, the geometry is modified again until a satisfying result is obtained. Thus, an efficient numerical model capturing the crucial physical aspects of phonation will contribute to a more effective treatment of voice disorders.
The rest of the paper is organized as follows. In Sect. 2, the state of art and a classification of numerical phonation models is provided and in Sect. 3, the methodology of simVoice is presented according to the simulation workflow. The model of the larynx and the vocal fold motion is illustrated in Sect. 3.1 and the hybrid workflow is explained in Sect. 3.2 including the derivation of the governing equation for flow-induced sound generation and propagation. Subsequently, the steps of the hybrid workflow, namely the computational fluid dynamics simulation, aeroacoustic source term computation and interpolation, and computation of the acoustic field are presented within Sect. 3.2, before analyzing the computational efficiency of the phonation model (Sect. 4.1) and showing validation

Background
The interaction of fluid dynamics (airflow), structural mechanics of the VFs (tissue), and acoustics (air), which is the principal mechanism of voice production, can be described in different levels of complexity. On a fundamental level, models of human phonation can be distinguished between lumped mass models and models based on continuum mechanics with the respective governing partial differential equation (PDE) according to Fig. 3. For the latter approach, the governing PDEs are typically solved by a finite volume method (FVM), finite element method (FEM), or finite difference method (FDM). PDE-based models can be further classified by the motion of the vocal folds, which can be static, externally driven (prescribed motion) or self-excited due to fluid-structure-acoustic interaction (FSAI). Another crucial model property is dimensionality which has a substantial impact on the degrees of freedom (DOFs) and hence computation time. Thereby, it has to be noted that only three dimensional (3D) models are capable of reflecting the real physics since flow turbulence, which is a dominating phenomenon of phonation, requires three dimensions for a correct physical representation(vorticity distribution) [25].
Lumped mass models While lumped mass models were used in the past for principal investigations of the self-excited oscillation of the vocal folds [8,12], they are still applied nowadays but in a threedimensional (3D) arrangement (multi-mass models) to investigate material properties of the vocal folds [44,45]. Due to the strong simplifications of this modeling approach, not all physical aspects can be captured. Therefore, PDE-based tools, which aim to resolve all physical details, are more promising for principal research as well Self excited (FSI) PDE-based models The most general approach is to model the interaction of the airflow and the structural mechanics of the vocal folds, which yields the self-excited oscillation of the vocal folds for certain flow and mechanic parameters. A first model considering fluid-structure interaction was introduced by Alipour et al. [1]. Thereby, the domain alteration was realized by mesh deformation, which induces distorted flow cells within the glottis. Thus, the immersed boundary method (IBM) was later applied [5,13,[46][47][48], which does not have this drawback. Besides ensuring the boundary conditions of the moving domain boundaries, a material model of the anisotropic tissue of the vocal folds has to be established. These two tasks require complex numeric procedures and high computational effort, which makes them not yet applicable in a clinical environment.
PDE-based models with static vocal folds This modeling approach assumes a quasi-steady condition at specific instances in the oscillation cycle and aims to investigate the glottal aerodynamics but does not allow a thorough acoustic analysis [11,24,32].
Externally driven PDE-based models Kaltenbacher et al. established a 2D FSAI-model which takes fluid-acoustic coupling into account by deploying acoustic perturbation equations within a hybrid aeroacoustic approach [17]. This model was simplified by Zörner et al., who replaced the fully-coupled interface condition by a one-way coupling [49]. Therewith, the costly coupled simulation was reduced to a pure computational fluid dynamic (CFD) simulation with a prescribed movement of the vocal folds and appropriate boundary conditions. In general, the presented results showed that a pure CFDsimulation with a prescribed structural movement can substitute the fully-coupled approach [49]. In simVoice, this modeling approach is applied in combination with a hybrid aeroacoustic approach based on the perturbed convective wave equation. The detailed description of the model with focus on the numerical efficiency is provided in [37], whereas a thorough description of the CFD incompressible flow simulation can be found in [31] and [30]. Furthermore, an extensive source term analysis for a validated synthetic setup is provided in [36] and the application to typical vocal cord dysfunctions is presented in [7]. Figure 4 illustrates the workflow of simVoice which is based on the hybrid aeroacoustic approach (Sect. 3.2) consisting of (1) a flow simulation, (2) an acoustic source term computation and interpolation and (3) an acoustic propagation simulation. Thereby, the acoustic source term excites the acoustic field and is defined by the right hand side of a the acoustic wave equation governing the propagation simulation and hence features the excitation of the acoustic field. In the acoustic propagation simulation, the VFs are fixed, whereas in the flow simulation they are externally driven. Therefore, the VF motion is obtained by fitting a vocal fold model (Sect. 3.1) to real physiological VF oscillations captured by glottal endoscopy.

Geometry and vocal fold model
For modeling the VF-motion, the commonly used M5 model with two degrees of freedom (DOFs) and a sinusoidal glottal orifice is chosen (see Fig. 5) [33]. Thereby, each vocal fold has a degree of freedom (DOF) of translation y and rotation-DOF φ. Due to the rotation-DOF, the model is able to capture the characteristic convergent-todivergent shape change of the glottal duct during an oscillation cycle [21,22]. Furthermore, typical abnormalities and pathologies of vocal fold vibration, such as glottal insufficiency (incomplete VF closure), aperiodic or asymmetric oscillation can be reproduced. Figure 7 illustrates the VF model and provides the main dimensions, of the larynx. The false vocal folds (fVFs), also known as ventricular folds are modeled rigid since they are not excited by the glottal airstream. The temporal evolutions of the DOFs y and φ are fitted to reproduce the measured glottal area waveform (GAW, see Fig. 6), which describes the temporal evolution of the area of the glottal orifice and can be obtained by high-speed videoendoscopy measurements during phonation in excised larynges (ex-vivo), as well as in living individuals (in-vivo). The vocal fold shape in the closed and the opened condition is shown in Fig. 7. Due to numerical issues of the flow solver, a gap of h min = 0.2 mm is remaining in the closed condition. A more detailed description of the applied vocal fold model and larynx geometry including the shape of the false vocal folds (ventricular folds) can be found in [30,31].
In the current version of simVoice, a simplified straight vocal tract with circular cross-sections according to [28] is used (see Fig. 8). Compared to a realistic physiological shape, the model complexity is reduced significantly while preserving the acoustic properties. A more detailed description of the simplification procedure can be found in [2,28].

Hybrid aeroacoustic approach
simVoice is based on the hybrid aeroacoustic approach which decomposes the task of sound computation into a flow computation and an acoustic propagation simulation [35]. For the latter part, the aeroacoustic source terms depending on the partial differential equation (PDE) governing the acoustic field (such as Lighthill acoustic analogy, acoustic perturbation equations, etc.) have to be computed from the flow field obtained in the first step [35]. The hybrid approach is commonly applied in computational aeroacoustics (CAA) [3,16,18,27,[39][40][41] due to several advantages: • Adjustment of simulation features to the specific requirements of the respective physical field. Since the characteristics and dominating phenomena of the flow and the acoustic field differ substantially, the numerical computation of these fields has their specific requirements: -Spatial discretization. The element size can be chosen individually for the flow and acoustic field (i.e. handle the disparity of length scales). Whereas in fluid dynamics boundary layers and In many applications, the reduction of computational effort due to valid simplification of the flow simulation outweighs the additional effort due to the acoustic simulation and the acoustic source term computation and treatment. Furthermore, when the source term computation is performed on the fly within the CFD-export, the additional effort is decreased. For the present model, the perturbed convective wave equation (PCWE) is chosen to model the sound generation and propagation inside the acoustic domain. An advantage of this wave equation is its numerical efficiency, since it is a PDE with only one scalar unknown being the acoustic velocity potential ψ a . The derivation of so-called acoustic perturbation equations (APEs) [6] is based on the perturbation ansatz, which is applied to the quantities of the flow field according to [9]. Therewith, the pressure p, the flow velocity v, and the density ρ are decomposed into a mean part (p,v,ρ) and a fluctuating part consisting of acoustic (p a , v a , ρ a ) and incompressible flow components (p ic , v ic ). Additionally, a density correction ρ 1 is introduced. By applying the splitting approach to the compressible flow equations, the APE-2 system can be derived [6], which was reformulated by Hüppe and Kaltenbacher [10,15] by introducing the acoustic velocity potential ψ a defined as v a = −∇ψ a .
Thereby, the material parametersρ (incompressible fluid density) and c (speed of sound) depend on the condition of the airstream. For in-vivo phonation, the material parameters are set to ρ = 1.145 kg m −3 and c = 355 m s −1 corresponding to the condition of expiration air (T ≈ 35 • C, 95% relative humidity) whereas ex-vivo investigations ambient conditions have to be considered. The PDE describes aeroacoustic sources generated by incompressible flow structures and wave propagation through moving media. Since the solution quantity is a scalar unknown, this form has an enhanced numerical efficiency compared to the APE-2 system. The differential operator (material derivative) in Eq. (4) is defined by consisting of the partial time derivative and a convective term, where v is the mean velocity. The right-hand side (RHS) of Eq. (4) defines the aeroacoustic source term wherein the convective term can be neglected for low flow velocities v due to the dominating partial time derivative especially at high frequencies. Since low flow velocity applies to human phonation, the simplification is exploited. When the mean flow v is also neglected in the wave operator on the left-hand side of Eq. (4), the weak formulation of the PCWE (4) reads as 1 c 2 ϕ ∂ 2 ψ a ∂t 2 dx + ∇ϕ · ∇ψ a dx = ϕ ∇ψ a · n ds − 1 ρc 2 ϕ f dx (8) with ϕ denoting the test function and the integrals being defined in the computational domain (volume term) and its boundary (boundary term). This weak form of the PCWE is solved by the FE-based multi-physics solver openCFS for the solution quantity ψ a [43]. The relation p a =ρ Dψ a Dt allows the calculation of the acoustic pressure p a .

1) Computational fluid dynamics (CFD) simulation
To provide the flow field for the subsequent acoustic source term computation, the unsteady incompressible Navier-Stokes equations are solved by the CFD software Star-CCM+ (Siemens PLM Software, Plano, TX/USA). Compressible effects can be neglected due to the low flow velocity (mean convective Mach number inside the glottis Ma < 0.1). The computational domain comprises the larynx and the downstream vocal tract (Fig. 5). The overset mesh algorithm of Star-CCM+ is used to realize the vocal fold motion, which is based on the above introduced M5-model. This procedure does not allow full closure of the vocal folds. Thus, a gap of 0.2 mm is remaining in the closed condition, which causes a negligible leakage flow. A large eddy simulation (LES) in combination with the wall-adapting local eddy-viscosity (WALE) subgrid-scale model is used to model the turbulent flow. The computational grid consists of about 1.3 million hexahedral cells with two levels of refinement within the glottal region. Further details about the flow simulation can be found in [30,31]. In Fig. 8 the resulting flow field at t = 0.5 T is visualized by means of streamlines.

2) Aeroacoustic source term computation and interpolation
Having determined the evolution of the flow field, the simplified acoustic source term of Eq. (7) can be computed on the CFD grid. The resulting source term at the time instance t = 0.5 T is displayed in Fig. 9. In a next step, the source term has to be interpolated from the CFD grid to the generally coarser acoustic grid (see Fig. 10). This is realized by a conservative interpolation scheme, which considers the intersection volumes of flow and acoustic cells and hence conserves the energy globally, as well as locally [34,37,38]. Figure 11 shows the acoustic simulation setup. The static computational domain covers the CFD domain at the maximum opening of the vocal folds in order to capture all acoustic sources. The vocal fold motion can be neglected for the acoustic simulation because the glottal gap is acoustically compact since

3) Computational acoustics simulation
where h max = 4 mm denotes the maximum glottal gap and f max = 5 kHz the maximum resolved frequency. Thus, the changing acoustic domain has no impact on the wave propagation within the considered frequency range. In addition to the larynx and vocal tract, a propagation region is added, which represents the ambient air, where the voice is emitted. The adjacent perfectly matched layer (PML) region ensures free field radiation at the boundary [14], whereas at the larynx inlet wave reflections are avoided by an absorbing boundary condition (ABC) [14] at inlet . Compared to a PML, the ABC does not introduce additional DOFs but provides full absorption just for plane waves, which applies to the present model because with H being the larger dimension of the duct cross-section (see Fig. 5) The remaining bounding surface wall is modeled soundhard by setting the acoustic particle velocity normal to the wall v n = ∇ψ a · n to zero (natural boundary condition). In order to preserve element quality and meshing flexibility a non-conforming interface of type Nitsche is applied between the vocal tract and the propagation region, which is also implemented in openCFS [43]. The shape of the vocal tract in Fig. 11 is simplified from a realistic MRI-based human vocal tract (corresponding to the vowel 'a') to a straight rotational symmetric shape while preserving the acoustic filter characteristics according to the procedure presented in [2]. The simulation setup is the result of a convergence study presented in [37]. Therein, the element size, the number of elements within the PML, and further aspects were successively modified to obtain an efficient acoustic simulation setup. An overview of cell

Results
In order to validate the numerical model simVoice, a measurement on a test rig with artificial vocal folds (M5-shaped) out of silicone and a simple cuboid-shaped vocal tract out of aluminum was performed [19,20,22,23]. Thereby, the shape of the larynx and the vocal tract of the measurement and the simulation agreed. Figure 12 shows the resulting acoustic spectra of the measurement of 20 oscillation cycles of the VFs and the simulation, where the parts of the acoustic source term in Eq. (7) were considered separately as well as combined. The plot proofs that neglecting the convective source term is valid since its consideration only changes the spectrum insignificantly. Above 300 Hz, the spectrum of the measurement is well reproduced by the simulation. Reflections of the acoustic waves at the walls of the anechoic room in the measurement are thought to be the reason for the underestimation of the amplitude by the simulation underneath 300 Hz since this is the cut-off frequency of the anechoic room. A detailed description of the experimental setup is published in [19,20,22,23].

Computational efficiency
Since the goal of the present research work is a clinical application, a special focus is put on computational efficiency which is a crucial factor for the applicability in a clinical environment for patient specific treatment approaches. Table 2 summarizes the simulation details of the principal computational tasks of the hybrid workflow. It clearly shows that the bottleneck of the workflow is the CFD simulation demanding 94.5% of the total computation time. However, this emphasizes the potential of the hybrid approach in terms of computational efficiency, since the effort due to tasks introduced by the hybrid approach, is only about 5.5%. Therefore, the reduction of computational effort due to simplifications on the flow simulation (incompressible flow, reduction of computational domain) clearly outweighs the additional effort owing to the hybrid approach (source term computation and interpolation and acoustic propagation simulation). Moreover, circumventing FSI of the airflow and the vocal folds yields a massive reduction of computation time since the tissue has not to be discretized (no additional DOFs), no material model of the tissue is required and complex algorithms treating the moving boundary (VF surface) are avoided.

Conclusion
A numerical model called simVoice, which is capable of predicting the voice based on a measured vocal fold motion and the geometry of the larynx and vocal tract was presented. The hybrid aeroacoustic  approach based on the PCWE and the prescribed vocal fold motion, which circumvents modeling fluid-structure interaction, contributes to the high numerical efficiency of the model. However, the CFD simulation is still too computationally challenging to be performed on a conventional computer. Thus, special effort will be put into an efficiency-optimized flow simulation and source term computation. Moreover, the application of stochastic methods and artificial intelligence for the multiplication of flow data of a reduced number of oscillation cycles will be investigated. Nevertheless, numerical models capturing the crucial physical aspects of phonation will develop in the future and have the potential to support clinical work and to contribute to a deeper understanding of human voice production.
Publisher's Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. His research involves theory, modeling, simulation and experimental investigation of complex systems in engineering, material and medical science. A main focus is on the development of advanced Finite Element (FE) methods for multi-field problems (electromagnetics-mechanics, mechanics-acoustics, piezoelectricity, flow dynamicsmechanics, aeroacoustics), and their application to design mechatronic sensors and actuators.