1 Introduction

Recently introduced legislation will end the sale of internal combustion engine (ICE) powertrains in the coming decades. The United Kingdom’s Road to Zero strategy [1], for instance, aims to phase out the sale of new ICE cars by the end of 2040, whilst keeping the UK at the forefront of the design and manufacturing of low-emission vehicles. A common approach to reducing vehicle emissions is powertrain electrification. Nevertheless, new challenges regarding noise, vibration, and harshness (NVH) arise with the shift from ICE-driven cars to electrified powertrains. The radiated sound spectrum of an electric vehicle (EV) has different characteristics from that of a classic ICE vehicle. Whilst ICE noise is usually broadband and below 1 kHz, the noise spectrum of an EV is dominated by tonal behaviour in regions up to 10 kHz [2]. Even though EV powertrains are usually quieter than ICE ones [2, 3], the lack of masking noise from the engine means that the tonal behaviour can degrade the perceived quality of the radiated sound [4], especially if these tones are considered to be prominent [5].

The main NVH excitation sources for EVs are the electromagnetic forces and the transmission error in the gearbox [6]. The electromagnetic forces acting on the stator and rotor of the electric motor cause tonal noise and vibration at specific frequencies that depend on the speed of the rotor, the number of poles, and the number of stator teeth [7, 8]. This is the so-called motor whistling. Radial force and torque ripple are usually the main contributors to the radiated noise from the motor [9]. Active regulation of the current and pole shape optimisation have been used to mitigate motor whistling [7]. The tonal behaviour of the powertrain transmission is linked to the dynamic gear meshing force and occurs at the gear meshing frequencies and their harmonics [10]. This is known as gear whining. It has been shown that a reduction in the pressure angle and an increase in the helix angle can reduce the gearbox noise in EVs [11]. Whilst motor whistling is usually more aggressive at lower speeds and, therefore, lower tonal frequencies, gear whine tends to be more dominant at higher speeds, which also leads to higher tonal frequencies [12].

Subjective studies show that people exposed to noises with tonal components rate these as more annoying than broadband noise, especially when the spectrum contains several tones [13]. Therefore, it is of interest to assess how the EV NVH will be perceived. It has also been shown that tonal nuisance increases with the tone frequency [14]. One way of evaluating the presence of tones is via the Prominence Ratio (PR) metric [15]. If the PR value of a band is above a certain threshold, the band is said to be prominent. More details are provided in Sect. 3.

Studies have been published on the application of artificial neural networks in the field of NVH evaluation. These studies have aimed to assess the interior noise of electric vehicles through vehicle road tests and subjective analysis with jurors [16]. Sound quality has also been predicted using acoustic signals and a machine learning approach via a relevance vector machine [17]. Ride comfort prediction has also been attempted by training a network on experimental data [18]. Machine learning has been applied in other areas as well, such as analysing gear tooth surfaces [19] and identifying motor whistle in real time [20]. Whilst successful in estimating sound quality features, these studies rely on experimental acoustic data to predict how the sound might be perceived. The obvious limitation is that acoustic data must first be acquired.

The vibration and noise prediction of electric motors is commonly divided into three methods [21]: numerical, analytical, and semi-analytical. This paper uses a numerical method for the prediction and combines it with a data-driven approach to reduce the number of required numerical simulations. The current study develops a model to calculate the forces acting on a housing via the bearings of a Permanent Magnet Synchronous Machine (PMSM) electric powertrain. It is known that bearings are responsible for the transmission of the driveline excitations to the housing [22, 23]. It is therefore reasonable to expect that the acoustic features are also embedded in this information. The acceleration at one of the most flexible housing locations is also calculated in this study. The radiated noise spectrum is used as a target to train a network that classifies different frequencies as prominent or not based on the input forces and acceleration. The forces and acceleration acting on the housing are calculated using a finite element model of the housing, coupled with a flexible multi-body dynamics (MBD) model of the PMSM electric powertrain. The flexible MBD model includes the shaft dynamics and, more importantly, allows for the vibration of the housing as it is excited by the motor, gears and bearings. The vibrational velocity of the housing can then be calculated and is used to predict the radiated noise spectrum using acoustic simulations. The range of operating conditions considered varies from 1000 to 8100 RPM and from 0.1428 \({T}_{max}\) Nm to \({T}_{max}\) Nm of transmitted torque, where \({T}_{max}\) is the maximum torque the motor outputs in the normal operating condition range.

In this study, simulation results of both MBD and acoustic models are used to train a network for the classification of sound quality in terms of the perception of tones. The network can be trained with the initial design data. Once trained, it allows for quicker identification and resolution of NVH issues by reducing the need for computationally demanding acoustic simulations, which may not adequately address the relevant tones, and for experimental data, which are more expensive and difficult to obtain. To the best of the authors’ knowledge, such an approach of using an artificial neural network to optimise NVH performance without the necessity of extensive acoustic simulations and measurements is new. The results show that it is indeed possible to identify acoustic tonal content from the forces that excite the housing via the bearings. The proposed approach lays the foundation for a framework in which an artificial neural network surrogate model allows acoustic features to be investigated using MBD results. Whilst this paper addresses the tonal behaviour of the noise spectrum by estimating the PR classification for a given frequency, the same technique could be applied to different psychoacoustic metrics. It can be a valuable tool in assessing how targeted design changes to the powertrain affect the perception of the radiated sound.

2 System modelling and simulations

The system comprises a PMSM connected to a two-stage, two-speed transmission with helical gears supported by eight bearings (six tapered roller bearings and two deep groove ball bearings). In this case, the PMSM is a motor with an internal rotor with the permanent magnets surrounded by a stator with the windings. The motor and the transmission are enclosed by a housing along with the inverter. The system can be seen schematically in Fig. 1.

Fig. 1 Schematic of the powertrain

The software ABAQUS is used in this work for the FE model of the housing, which has 1.3 million elements in total. The model [24] of the housing considers homogeneous properties to approximate an aluminium alloy. Orthotropic properties are used to model the steel stator. The stator and the material directions are shown in Fig. 2. The lower modulus of elasticity in the Y direction of the stator arises because, in this direction, the stator comprises a tightly packed stack of steel sheets and resin. The mechanical properties are summarised in Table 1.

Fig. 2 Stator representation and directions

Table 1 Mechanical properties of the condensed housing and stator

For the multibody dynamic (MBD) simulations, only a smaller set of the nodes of these elements is used, namely the nodes where forces are applied or where displacements, velocities, or accelerations are to be calculated. The retained nodes can be seen in Fig. 3. The four circular sections represent the stator teeth, while the other nodes are related to bearing locations, mounts, and points on the housing where vibration data may be of interest. A total of 305 nodes with 6 degrees of freedom each are retained through a dynamic condensation process that can represent the static behaviour along with the modal behaviour of the system [25, 26]. A total of 126 modes of the housing, up to 5 kHz, are considered in this condensation. For this study, results are considered to be valid up to 2.5 kHz, which covers up to the 40th natural frequency of the housing.

Fig. 3 Nodal visualisation: (a) the nodes kept after the condensation of the housing and (b) the same nodes along with all the other nodes in the MBD model

This work does not differentiate the sources of the tones in terms of motor whistling or gear whining but if the topology of the motor and transmission are known it is possible to separate and identify those. For instance, for a synchronous machine, the stator current frequency is given by [6]:

$${f}_{el}=\frac{p\omega }{60}$$
(1)

where \(p\) is the number of pole pairs and \(\omega \) is the speed of the rotor in rpm. The frequency of the electromagnetic forces acting on the stator, given by \({f}_{EMf}=2{f}_{el}\) [27], and its harmonics are the motor whistling frequencies.

In the case of helical gears, the gear meshing frequency is given by [28]:

$${{\text{f}}}_{{\text{meshing}}}=\frac{\mathrm{Z\omega }}{60}$$
(2)

where \(Z\) is the number of gear teeth and \(\omega \) is the shaft speed in rpm. This frequency and its harmonics define the frequency signature for gear whining. Gear whining in EVs has been reported to be perceived as particularly annoying [29].
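As an illustration of Eqs. (1) and (2), the short sketch below lists the expected whistling and whining tone frequencies for a given speed. The function names, the number of harmonics, and the example values (4 pole pairs, 20 teeth) are hypothetical and are not parameters of the powertrain studied here.

```python
def electrical_frequency(pole_pairs, rotor_rpm):
    """Stator current frequency in Hz, Eq. (1): f_el = p * omega / 60."""
    return pole_pairs * rotor_rpm / 60.0


def whistling_frequencies(pole_pairs, rotor_rpm, n_harmonics=3):
    """Electromagnetic force frequency f_EMf = 2 * f_el and its harmonics, in Hz."""
    f_emf = 2.0 * electrical_frequency(pole_pairs, rotor_rpm)
    return [k * f_emf for k in range(1, n_harmonics + 1)]


def meshing_frequencies(teeth, shaft_rpm, n_harmonics=3):
    """Gear meshing frequency, Eq. (2): f_mesh = Z * omega / 60, and its harmonics."""
    f_mesh = teeth * shaft_rpm / 60.0
    return [k * f_mesh for k in range(1, n_harmonics + 1)]


# Hypothetical example values, chosen only for illustration.
whistle = whistling_frequencies(pole_pairs=4, rotor_rpm=3000, n_harmonics=2)
whine = meshing_frequencies(teeth=20, shaft_rpm=3000, n_harmonics=2)
```

For a multi-stage transmission, the shaft speed passed to `meshing_frequencies` would be that of the shaft carrying the gear in question.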

Along with the gear meshing frequencies and the electromagnetic forces frequencies, the spectrum of the dynamic response will also contain components attributed to the natural frequencies of the system and it could also include components from the inverter that controls the motor. For the inverter, the switching frequency ranges between 4 and 16 kHz for Si IGBTs (Silicon insulated-gate bipolar transistor) and 16–40 kHz for the newer SiC (Silicon carbide) inverters. The case study presented in this work considers modes up to 5 kHz for the powertrain housing. Acoustic results are valid to frequencies up to 2.5 kHz. The effects of the switching frequency are not present in this frequency range but should be mentioned as a potential source of tonal contribution. A study on the influence of the inverter modulation [30] has shown that switching frequency randomization can have a positive impact in the sound character from the inverter.

The flexible multi-body dynamics calculations are carried out using AVL EXCITE™ M, where the application of the electromagnetic forces along with the gear meshing and bearing behaviour can be simulated. Rayleigh damping [31] is considered for all flexible bodies in the simulation. For the housing, the damping ratio is 0.01 at 200 Hz and 600 Hz. For the flexible shafts, the damping ratio is also 0.01 but at 100 Hz and 1000 Hz. With these inputs, the software defines the \(\alpha \) and \(\beta \) coefficients for the Rayleigh damping and, therefore, the damping at other frequencies. The coefficients can be calculated using the following equation and solving the system for the two frequencies [25]:

$$\upxi = \frac{\alpha +\beta {\left(2\pi f\right)}^{2}}{4\pi f}$$
(3)
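As a minimal sketch of how the two frequency and damping ratio pairs determine the coefficients, the snippet below assumes the standard mass- and stiffness-proportional Rayleigh form, \(\xi = \alpha /(4\pi f)+\pi \beta f\), and solves the resulting 2×2 linear system. The function names are illustrative and do not correspond to the interface of the simulation software.

```python
import math


def rayleigh_coefficients(f1, f2, xi1, xi2):
    """Solve xi_i = alpha / (4 pi f_i) + pi * beta * f_i for (alpha, beta).

    f1, f2 are in Hz; xi1, xi2 are the damping ratios at those frequencies.
    """
    if f1 <= 0 or f2 <= 0 or f1 == f2:
        raise ValueError("two distinct positive frequencies are required")
    a11, a12 = 1.0 / (4.0 * math.pi * f1), math.pi * f1
    a21, a22 = 1.0 / (4.0 * math.pi * f2), math.pi * f2
    det = a11 * a22 - a12 * a21
    alpha = (xi1 * a22 - xi2 * a12) / det  # Cramer's rule
    beta = (a11 * xi2 - a21 * xi1) / det
    return alpha, beta


def damping_ratio(alpha, beta, f):
    """Damping ratio implied by the Rayleigh coefficients at frequency f in Hz."""
    return alpha / (4.0 * math.pi * f) + math.pi * beta * f


# Housing inputs quoted in the text: damping ratio 0.01 at 200 Hz and 600 Hz.
alpha_h, beta_h = rayleigh_coefficients(200.0, 600.0, 0.01, 0.01)
```

Under this form, the damping ratio is below the specified value between the two anchor frequencies and above it outside them.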

For the helical gears, the dynamic model is the same as in [32], where the gears are represented by thin slices of spur gears in the lead direction. The compliance of each slice is independent of that of its neighbouring slices. This approach can take into account tooth bending, including tooth tilt. Whilst these have a linear load-deformation relation, the local contact stiffness behaves nonlinearly. A Hertz-Petersen method is used by the software to consider these nonlinearities, implicitly solving the equations of motion by time integration, iterating at each step [25, 32]. The time-varying meshing stiffness is obtained by summing the stiffness of the individual slices that represent the gear [32]. The meshing stiffness and its time-varying characteristics are known to be an internal source of excitation in the gearbox [33, 34]. Transmission error is linked to the periodic variation of the meshing stiffness: it is the difference between the motion transmitted by the actual gear pair and that transmitted by an ideally rigid gear pair. It is considered to be the main contributor to gear whine [35, 36].

There are two ball bearings and six cylindrical roller bearings supporting the shafts. For the ball bearings, Hertzian contact is assumed, whilst a Kunert empirical approach is used for the cylindrical roller bearings. Equation (4) shows the force/penetration relation [25]. The masses of the individual components of the bearings are not considered. Also, deformations of the housing and shaft do not deform the geometry of the bearing raceways, which are always considered perfectly circular. The model takes into account the correct direction of the bearing forces, along with the variation of the bearing stiffness caused by roll-over of the rolling elements [25].

$$\updelta =\frac{8.1\times {10}^{-5}\,{{\text{F}}}^{0.925}}{{{\text{l}}}_{{\text{eff}}}^{0.85}}$$
(4)

where \(\delta \) is the penetration depth in mm, \({l}_{eff}\) is the effective width of the rolling element in mm and \(F\) is the elastic contact force in N.
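Since a dynamic simulation typically needs the force for a given penetration, Eq. (4) can be inverted. The sketch below implements both directions; the function names are illustrative, and returning zero force for non-positive penetration is an assumption of this sketch (no contact, no load).

```python
def kunert_penetration(force_n, l_eff_mm):
    """Penetration depth in mm from Eq. (4), for a contact force in N."""
    if force_n < 0 or l_eff_mm <= 0:
        raise ValueError("force must be non-negative and l_eff positive")
    return 8.1e-5 * force_n ** 0.925 / l_eff_mm ** 0.85


def kunert_force(penetration_mm, l_eff_mm):
    """Contact force in N obtained by inverting Eq. (4)."""
    if l_eff_mm <= 0:
        raise ValueError("l_eff must be positive")
    if penetration_mm <= 0:
        return 0.0  # assumption: no force without penetration
    return (penetration_mm * l_eff_mm ** 0.85 / 8.1e-5) ** (1.0 / 0.925)


# Round trip with hypothetical values: 1000 N on a 10 mm effective width.
delta = kunert_penetration(1000.0, 10.0)
force_back = kunert_force(delta, 10.0)
```

Because the exponent on the force is below one, the contact stiffens slightly as the load increases.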

The total bearing force is found by summing the elastic and damping forces of all rolling elements, which are given, respectively by [25]:

$$\left|{{\text{F}}}_{{\text{elast}},{\text{j}}}\right|=\left\{\begin{array}{ll}0,& {\updelta }_{{\text{j}}}\le 0\\ {{\text{C}}}_{{\text{total}},{\text{j}}}{\updelta }_{{\text{j}}},& {\updelta }_{{\text{j}}}>0\end{array}\right.$$
(5)

where \(\left|{F}_{elast,j}\right|\) is the magnitude of the elastic force in N for the rolling element \(j\) given the penetration \({\delta }_{j}\) [mm] and the total stiffness \({C}_{total,j}\) [N/mm].

$$\left|{{\text{F}}}_{{\text{damp}},{\text{j}}} \right|=-{{\text{f}}}_{{\text{damp}}}{{\text{C}}}_{{\text{total}},{\text{j}}}\dot{{\updelta }_{{\text{j}}}}$$
(6)

where \(\left|{F}_{damp,j}\right|\) is the magnitude of the damping force [N] for the rolling element \(j\) due to the damping factor \({f}_{damp}\) in s, total stiffness \({C}_{total,j}\) [N/mm] and penetration velocity \(\dot{{\delta }_{j}}\) [mm/s]. \({f}_{damp}\) is normally in the range of \((0.25\text{–}2.5)\times {10}^{-5}\) s [25]. In this study, a value of \({10}^{-5}\) s is used.

The electromagnetic calculations were carried out using Motor-CAD. The stator is divided axially into four equal sections. A central point at the tip of each tooth in each of these sections is used for the computation of the tangential and radial components of the electromagnetic forces for each operating condition. The force densities in the airgap are estimated via the Maxwell stress tensor [37, 38]:

$${{\text{f}}}_{{\text{t}}}=\frac{1}{{\upmu }_{0}}{{\text{B}}}_{{\text{r}}}{{\text{B}}}_{{\text{t}}}$$
(7)
$${{\text{f}}}_{{\text{r}}}=\frac{1}{{2\upmu }_{0}}\left({{\text{B}}}_{{\text{r}}}^{2}-{{\text{B}}}_{{\text{t}}}^{2}\right)$$
(8)

where \({f}_{t}\) and \({f}_{r}\) are the tangential and radial components of the force density in N/m², \({B}_{t}\) and \({B}_{r}\) are the tangential and radial components of the magnetic flux density in T and \({\mu }_{0}\) is the permeability of free space in N/A².

From the force density, the forces at the central nodes are calculated by [38]:

$${{\text{F}}}_{{\text{t}}}={{\text{f}}}_{{\text{t}}}{{\text{L}}}_{{\text{stk}}}{{\text{L}}}_{{\text{arc}}}$$
(9)
$${{\text{F}}}_{{\text{r}}}={{\text{f}}}_{{\text{r}}}{{\text{L}}}_{{\text{stk}}}{{\text{L}}}_{{\text{arc}}}$$
(10)

where \({L}_{stk}\) is the lamination length of the stator section and \({L}_{arc}\) is the length of the arc over which the force is acting, both in m.
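Equations (7) to (10) can be chained into a single step from airgap flux density to nodal tooth force, as sketched below. The function names and the example flux densities and dimensions are hypothetical; \({\mu }_{0}\) is taken as \(4\pi \times {10}^{-7}\) N/A².

```python
import math

MU_0 = 4.0e-7 * math.pi  # permeability of free space in N/A^2


def force_densities(b_r, b_t):
    """Tangential and radial force densities in N/m^2, Eqs. (7) and (8)."""
    f_t = b_r * b_t / MU_0
    f_r = (b_r ** 2 - b_t ** 2) / (2.0 * MU_0)
    return f_t, f_r


def tooth_forces(b_r, b_t, l_stk, l_arc):
    """Tangential and radial forces in N at a tooth node, Eqs. (9) and (10)."""
    f_t, f_r = force_densities(b_r, b_t)
    area = l_stk * l_arc  # area over which the force density acts, in m^2
    return f_t * area, f_r * area


# Hypothetical values: 0.8 T radial, 0.1 T tangential, 30 mm by 10 mm patch.
F_t, F_r = tooth_forces(0.8, 0.1, 0.03, 0.01)
```

With a purely radial field the tangential force density vanishes, which is consistent with torque production requiring both flux density components.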

A total of 15 different operating conditions (speed-torque combinations) were considered. For each of these operating points, a set of electromagnetic forces is calculated using a 2D FEA model. The forces on the stator are resolved into their radial and tangential components and applied at each stator tooth. The forces on the rotor are mainly tangential (torque ripple) and act on the rotor surface. Each simulation point takes around 12 h to complete. Results are obtained once steady-state motion is achieved. With the MBD results, a data recovery process is performed, which can be seen as the inverse operation of the condensation: it relates the results of the condensed nodes to the complete finite element model of the housing in order to obtain its vibrational velocity.

The vibrational velocity of the housing is then used in AVL EXCITE™ Acoustics to calculate sound pressure at a chosen location using a Wave Based Technique [39] for exterior Helmholtz problems.

Figure 4 summarises the simulation workflow.

Fig. 4 Simulation workflow

Experimental data were obtained in a hemi-anechoic chamber, with the system mounted on brackets resting on the floor. The model was validated against this experiment using data from one accelerometer on the housing and the sound pressure levels of five microphones around the housing. The setup, along with a schematic of the experiment, can be seen in Fig. 5.

Fig. 5 Schematic of the microphone positions for the experiment. The curved arrow indicates the direction of rotation of the motor

A comparison of the results for the acceleration is shown in Fig. 6 for 5000 rpm and 0.428 \({T}_{max}\), to illustrate the validation. Good agreement is observed between model and experiment.

Fig. 6 Acceleration data validation

The comparison of the average SPL of the five microphones around the housing is shown in Fig. 7. Again, good agreement is observed between model and experiment in terms of the frequency content of the tones, although the model overestimates the electromagnetic contribution. At lower frequencies, the experimental data are higher than the model prediction. This could indicate additional background noise during the experiment or the noise floor of the microphones; neither was measured. All results are shown as A-weighted SPL.

Fig. 7 Average SPL comparison

3 Tone perception

Prominent peaks in a radiated noise spectrum might not be perceived as prominent tones when one is exposed to that noise. For instance, these peaks can be masked by the noise within the same frequency band, or may not be loud enough to be perceived. Moreover, the human ability to distinguish tones varies with their frequency [40]: tones are easier to distinguish at lower frequencies than at higher ones. To quantify these variations, the concept of critical bands was developed [41]. These are the frequency bands within which a second tone can interfere with the perception of another tone in the same band.

Common metrics used for the analysis of tone perception are the Tone-To-Noise Ratio (TTNR) and the Prominence Ratio (PR) [15]. These were first developed for the analysis of telecommunication devices. Whilst the TTNR compares the tone to the background noise within the same band, the PR metric compares the band containing the tone to the band below it and to the band above it. The background noise in a band is the noise within that band, neglecting the small bandwidth of the peak (tone). The PR has been shown to correctly predict the tonal perception of noise [42] and to correlate with the perceived nuisance of the noise of electric vehicles [43]. Moreover, the PR metric can cope with multiple tones present in the same band [44]. The electromagnetic excitation and the gear teeth meshing mechanism may lead to multiple tones in close frequency proximity. Therefore, the PR metric is used in this study.

The ECMA-74 standard considers a band prominent if it is at least 9 dB higher than the adjacent bands for frequencies greater than 1 kHz. For lower frequencies, this threshold varies between 20 and 9 dB. However, subjective studies on electric vehicles [43] suggest that PR values greater than 5 dB are classified as highly audible even at lower frequencies. They also show that the perceived annoyance increases with PR.
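The frequency-dependent threshold can be written as a small function. The sketch below assumes the commonly quoted form of the ECMA-74 prominence ratio criterion, 9 dB at and above 1 kHz and \(9+10{\text{log}}_{10}(1000/f)\) dB below it, with a lower frequency limit of 89.1 Hz; the exact expression and limits should be checked against the edition of the standard in use. The function names are illustrative.

```python
import math


def ecma74_pr_threshold(f_hz):
    """Assumed ECMA-74 prominence threshold in dB for a tone at f_hz."""
    if f_hz < 89.1:
        raise ValueError("criterion assumed to apply from 89.1 Hz upwards")
    if f_hz >= 1000.0:
        return 9.0
    return 9.0 + 10.0 * math.log10(1000.0 / f_hz)


def is_prominent(pr_db, f_hz, fixed_threshold_db=None):
    """Classify a band; a fixed threshold (e.g. 9 dB) overrides the standard one."""
    if fixed_threshold_db is not None:
        threshold = fixed_threshold_db
    else:
        threshold = ecma74_pr_threshold(f_hz)
    return pr_db > threshold
```

Passing `fixed_threshold_db` allows a single threshold to be applied across the whole frequency range, or a stricter value to be tested in line with the subjective findings of [43].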

The Prominence Ratio is calculated as [44]:

$${\text{PR}}=10{\text{log}}\left(\frac{2{\text{B}}}{{\text{A}}+{\text{C}}}\right)$$
(11)

where B is the level of the critical band containing the tone (or tones), and A and C are the levels of the adjacent bands. Figure 8 illustrates the bands A, B and C. For the levels A, B and C, the sound pressure values are squared and then summed over the limits of each band [43].

The bands are defined using the Equivalent Rectangular Bandwidth (ERB) [45]. The ERB are rectangular band-pass filters that model the filtering of the auditory system. The ERB is a revision of the classical critical bands with a smooth analytical equation [46], given by [45]:

$${\text{ERB}}=24.7\left(\frac{4.37{\text{f}}}{1000}+1\right)$$
(12)

where \(f\) is the frequency of the tone of interest in Hz.

One can define the bandwidth of B by defining the tone to be analysed. In other words, once the frequency of the tone of interest \({f}_{B}\) is chosen, the bandwidth \(ER{B}_{B}\) of B is also defined. Those also define the highest frequency of the lower band and the lowest frequency of the upper band. The bands must be contiguous. Therefore, the upper frequency of band A \({f}_{Au}\) is given by:

$${{\text{f}}}_{{\text{Au}}}={{\text{f}}}_{{\text{B}}}-\frac{{{\text{ERB}}}_{{\text{B}}}}{2}$$
(13)

Whilst the lower frequency of band C \({f}_{Cl}\) is:

$${{\text{f}}}_{{\text{Cl}}}={{\text{f}}}_{{\text{B}}}+\frac{{{\text{ERB}}}_{{\text{B}}}}{2}$$
(14)

Equations (12)–(14) can be combined to write a linear system of equations that allows for the calculation of the bandwidths of band A \(ER{B}_{A}\) and band C \(ER{B}_{C}\), along with the central frequencies \({f}_{A}\) and \({f}_{C}\):

$$\left[\begin{array}{cccc}\frac{4.37}{1000}& -\frac{1}{24.7}& 0& 0\\ 1& 0.5& 0& 0\\ 0& 0& \frac{4.37}{1000}& -\frac{1}{24.7}\\ 0& 0& 1& -0.5\end{array}\right]\left[\begin{array}{c}{{\text{f}}}_{{\text{A}}}\\ {{\text{ERB}}}_{{\text{A}}}\\ {{\text{f}}}_{{\text{C}}}\\ {{\text{ERB}}}_{{\text{C}}}\end{array}\right]=\left[\begin{array}{c}-1\\ {{\text{f}}}_{{\text{Au}}}\\ -1\\ {{\text{f}}}_{{\text{Cl}}}\end{array}\right]$$
(15)
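The system in Eq. (15) decouples into two independent 2×2 blocks, one for band A and one for band C, so it can be solved in closed form. The sketch below does so by substituting Eq. (12) into the edge conditions; the function names and the returned layout are illustrative choices.

```python
K = 4.37 / 1000.0  # slope of the ERB relation, Eq. (12)
HALF = 24.7 / 2.0  # half of the ERB scale factor


def erb(f_hz):
    """Equivalent Rectangular Bandwidth in Hz, Eq. (12)."""
    return 24.7 * (K * f_hz + 1.0)


def pr_bands(f_b):
    """Return (lower edge, centre, upper edge) in Hz for bands A, B and C."""
    erb_b = erb(f_b)
    f_au = f_b - erb_b / 2.0  # Eq. (13)
    f_cl = f_b + erb_b / 2.0  # Eq. (14)
    # Band A: f_A + ERB_A / 2 = f_Au with ERB_A = 24.7 (K f_A + 1)
    f_a = (f_au - HALF) / (1.0 + HALF * K)
    # Band C: f_C - ERB_C / 2 = f_Cl with ERB_C = 24.7 (K f_C + 1)
    f_c = (f_cl + HALF) / (1.0 - HALF * K)
    return {
        "A": (f_a - erb(f_a) / 2.0, f_a, f_au),
        "B": (f_au, f_b, f_cl),
        "C": (f_cl, f_c, f_c + erb(f_c) / 2.0),
    }


bands_1100 = pr_bands(1100.0)
```

The three bands are contiguous by construction, and band C is always wider than band A because the ERB grows with frequency.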

Figure 8 illustrates the bands A, B and C (in a sound pressure spectrum) that are used for the PR calculations, along with their Equivalent Rectangular Bandwidths.

Fig. 8 Example of bands in the PR calculation

The Prominence Ratio was developed for the analysis of radiated noise. In this study, the calculation of the PR is extended to analyse the forces acting on the housing of the powertrain via the bearings, along with the acceleration response of the housing. The bearing forces and housing acceleration can represent components due to the electromagnetic excitation, the gear meshing mechanism, and modal response. The key hypothesis in this work is that there is a relation between the perception of a tone in the noise produced and the prominence of tones in the excitation forces and acceleration.
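Because Eq. (11) only needs a sampled amplitude spectrum, the same routine can be applied to sound pressure, bearing force, or acceleration. The sketch below is one possible implementation under stated assumptions: amplitudes are given on a discrete frequency grid, band levels are plain sums of squared amplitudes, and a bin on a shared edge is assigned to the upper band. The names are illustrative.

```python
import math

K = 4.37 / 1000.0  # slope of the ERB relation, Eq. (12)
HALF = 24.7 / 2.0


def erb(f_hz):
    return 24.7 * (K * f_hz + 1.0)


def pr_band_edges(f_b):
    """Return (A_low, A_up, C_low, C_up); band B spans A_up to C_low."""
    a_up = f_b - erb(f_b) / 2.0
    c_low = f_b + erb(f_b) / 2.0
    f_a = (a_up - HALF) / (1.0 + HALF * K)
    f_c = (c_low + HALF) / (1.0 - HALF * K)
    return f_a - erb(f_a) / 2.0, a_up, c_low, f_c + erb(f_c) / 2.0


def band_level(freqs, amplitudes, low, high):
    """Sum of squared amplitudes for low <= f < high."""
    return sum(a * a for f, a in zip(freqs, amplitudes) if low <= f < high)


def prominence_ratio(freqs, amplitudes, f_b):
    """Prominence ratio in dB for the band centred at f_b, Eq. (11)."""
    a_low, a_up, c_low, c_up = pr_band_edges(f_b)
    level_a = band_level(freqs, amplitudes, a_low, a_up)
    level_b = band_level(freqs, amplitudes, a_up, c_low)
    level_c = band_level(freqs, amplitudes, c_low, c_up)
    if level_b <= 0.0 or level_a + level_c <= 0.0:
        raise ValueError("bands must contain non-zero spectral content")
    return 10.0 * math.log10(2.0 * level_b / (level_a + level_c))


# Synthetic check: a flat spectrum, then the same spectrum with a strong tone.
freqs = list(range(0, 2500))
flat = [1.0] * len(freqs)
with_tone = list(flat)
with_tone[1100] = 100.0
pr_flat = prominence_ratio(freqs, flat, 1100.0)
pr_tone = prominence_ratio(freqs, with_tone, 1100.0)
```

For the flat spectrum the ratio stays close to 0 dB, while the added tone lifts band B well above its neighbours.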

4 MBD simulation global results

The flexible MBD model is used to assess the feasibility of using a neural network for the prediction of the tonal content of the powertrain radiated noise. Two operating conditions were considered: a torque output of \({T}_{max}\) with the rotor speed at 1000 rpm and at 3000 rpm, respectively. The sound pressure level spectra for these two cases are shown in Fig. 9, with an example of the bands A, B and C when the frequency of interest is 1100 Hz. The SPL was calculated at a virtual microphone located 0.5 m from the top surface of the housing (as shown in Fig. 1). Several peaks can be seen in both spectra. In terms of NVH, these peaks might or might not be perceived as prominent. If they are prominent, the nuisance increases. For the 1000 rpm case, the peak at 600 Hz can be traced to the torque ripple (electromagnetic source, motor whistling), whilst the peak at 1100 Hz in the 3000 rpm case, for instance, is due to gear meshing (whining). A peak at around 560 Hz can also be seen in both cases. This is linked to a structural natural frequency of the system and is independent of the speed. Classifying a peak as prominent or not, combined with knowledge of its root cause, allows the correct course of action to be taken in terms of NVH control.

Fig. 9 Sound pressure levels for the MBD model: (a) 1000 rpm and (b) 3000 rpm

The numerical simulations were expanded to consider more operating conditions (combinations of torque output for different rotor speeds). In addition, the calculation of the acceleration at one point on the housing was included. Figure 10 summarises the sound pressure levels for the different rotor speeds (from 1000 to 8100 rpm) and torque output scenarios (0.142 \({T}_{max}\) to \({T}_{max}\)). At 1000 rpm, the largest contributor to the SPL comes from the electromagnetic excitation, whilst for the 3000–7300 rpm cases, the gear meshing frequencies play a more significant role in the frequency range considered. The natural frequency of the system at around 560 Hz is particularly evident in the 3000 rpm cases. For the 8100 rpm cases, once again the largest contribution to the noise spectrum has electromagnetic origin, but that happens because the gear meshing frequencies are outside the frequency range considered in this study.

Fig. 10 Sound pressure levels for the different simulation scenarios

An example of the forces generated at the different bearing locations acting on the housing is shown in Fig. 11. The bearing force results at 5000 rpm show larger contributions linked to the electromagnetic excitation, while the SPL spectra show similar levels of radiated noise at the frequencies due to gear meshing and electromagnetic forces. The different colours in the plot simply indicate the different bearings. For confidentiality reasons, the values on the y scale are normalised. Figure 12 shows an example of the out-of-plane acceleration at a point on the housing. This point was chosen based on the largest displacement of the mode at around 560 Hz. This mode has a large contribution at 3000 rpm, but it is also a significant contributor to the SPL in all cases analysed. Similarly to the results for the forces, the acceleration spectrum shows larger contributions at the electromagnetic excitation frequencies (around 20 dB higher than the contributions at gear meshing frequencies). The combination of these two factors complicates the establishment of a clear connection between the SPL (and PR) and the force and acceleration results. A pattern recognition network will therefore be used to establish this link.

Fig. 11 Forces from the bearings for the 5000 rpm, 0.428 \({{\text{T}}}_{{\text{max}}}\) case. Different colours for the eight different bearings

Fig. 12 Acceleration in dB scale for the 5000 rpm, 0.428 \({{\text{T}}}_{{\text{max}}}\) case

The force spectra shown correspond to the resultant force on each bearing, so as to represent the contribution from all directions. The resultant is given by:

$${{\text{F}}}_{{\text{T}}}=\sqrt{{{\text{F}}}_{{\text{x}}}^{2}+{{\text{F}}}_{{\text{y}}}^{2}+{{\text{F}}}_{{\text{z}}}^{2}}$$
(16)

where \({F}_{x}\), \({F}_{y}\) and \({F}_{z}\) are the components of the force in the x, y and z direction, respectively.
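Applied line by line to the component spectra, Eq. (16) gives one resultant spectrum per bearing. A minimal sketch, assuming the three components are available as amplitude lists on a common frequency grid (the function name is illustrative):

```python
import math


def resultant_spectrum(fx, fy, fz):
    """Resultant force amplitude per frequency line, Eq. (16)."""
    if not (len(fx) == len(fy) == len(fz)):
        raise ValueError("component spectra must share the same frequency grid")
    return [math.sqrt(x * x + y * y + z * z) for x, y, z in zip(fx, fy, fz)]


# Hypothetical amplitudes at two frequency lines.
f_total = resultant_spectrum([3.0, 1.0], [4.0, 2.0], [0.0, 2.0])
```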

Once the spectra for the noise, acceleration and forces are obtained, the PR values can be calculated using equations (11) to (15). Figure 13 shows an example of the Prominence Ratios for two different frequencies that are seen as peaks in the SPL spectrum for the 3000 rpm, 0.142 \({T}_{max}\) case (Fig. 14). A band is considered to be prominent if the noise PR is larger than 9 dB. The red x highlights a prominent band, whilst the green circle highlights a non-prominent band. If a band is deemed prominent, the perceived nuisance from this radiated noise will be increased, influencing the subjective analysis of the sound quality of the vehicle.

Fig. 13 Prominence ratios at 3000 rpm, 0.142 \({{\text{T}}}_{{\text{max}}}\) for two different tone frequencies

Fig. 14 Sound pressure level (3000 rpm, 0.142 \({{\text{T}}}_{{\text{max}}}\))

5 Prominence identification and neural network

The concept of prominence ratio was applied to the bearing force spectra, along with the radiated noise spectrum for chosen frequencies at two different operating conditions. The data is summarised in Table 2.

Table 2 Prominence Ratio for the selected central frequencies of band B

Generally, Table 2 shows that if the prominence ratio for all bearing forces is high, the noise PR will also be high. However, for the 400 Hz tone, there is one distinctively low PR value for a bearing force that reduces the noise PR to 0.1. This suggests that the 400 Hz band is not as prominent, despite the PR values for other bearing forces being similar to those for the 200 Hz band, for example. Additionally, while one bearing force has a low PR for the 600 Hz band, it is still a prominent band in terms of noise.

No obvious relation can be drawn between the PR for the bearing forces and the PR for the radiated noise. One can, however, use these data to train a pattern recognition neural network to better correlate the values.

Artificial neural networks can be used to map the inputs of complex systems to their outputs. Little prior knowledge of the system and of the input–output relationship is required [47]. The network is divided into an input layer, a hidden layer (or layers) and an output layer. Each layer has a set of neurons, and each neuron has its own weights and bias. From layer to layer, each neuron passes its value to the next one, adjusted by the corresponding weight and bias. Training is the process of adjusting these weights and biases [47]. An example of the network used in this study is shown in Fig. 15. The input layer has 8 or 9 neurons, depending on whether the acceleration data are used. These neurons connect to the 5 neurons in the hidden layer, which are linked to the two neurons in the output layer, where one of the neurons is triggered to indicate whether the noise is classified as prominent or not for that band.

Fig. 15 Schematic of the network, highlighting inputs and outputs. The neurons are represented by circles whilst the arrows connecting circles represent the link between the neurons in each layer. The hidden layer has five neurons

A single hidden layer network was trained using the Scaled Conjugate Gradient method [48]. Eight inputs are considered (corresponding to the eight bearing forces) for each sample. The output layer has two outputs: prominent or non-prominent. For the number of features of this network (8), the literature suggests that 40–80 samples can be used to train a classifier [49, 50]. As a proof-of-concept exercise, the number of samples in this first approximation is much smaller. The MATLAB function “patternnet” was used for the network. It is a feedforward network that can be trained and used to classify inputs to a target [51]. It is known that a single hidden layer neural network with a finite number of neurons acts as a universal approximator [52].
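To make the architecture concrete, the sketch below evaluates a forward pass through an 8–5–2 network in plain Python. It is not the MATLAB implementation: the tanh hidden activation and softmax output are assumed here as a typical pattern recognition configuration, the weights are random placeholders rather than trained values, and training itself is not shown. The second output is taken to represent the prominent class, which is an assumed ordering.

```python
import math
import random


def random_network(n_in=8, n_hidden=5, n_out=2, seed=0):
    """Placeholder weights and biases drawn uniformly from [-1, 1]."""
    rng = random.Random(seed)
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    w2 = [[rng.uniform(-1, 1) for _ in range(n_hidden)] for _ in range(n_out)]
    b2 = [rng.uniform(-1, 1) for _ in range(n_out)]
    return w1, b1, w2, b2


def forward(x, w1, b1, w2, b2):
    """Class probabilities for one sample of force (and acceleration) PR values."""
    if len(x) != len(w1[0]):
        raise ValueError("input length does not match the input layer size")
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(w1, b1)]
    logits = [sum(w * h for w, h in zip(row, hidden)) + b
              for row, b in zip(w2, b2)]
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify(probabilities):
    """Assumed ordering: index 0 is non-prominent, index 1 is prominent."""
    return "prominent" if probabilities[1] > probabilities[0] else "non-prominent"


net = random_network()
probs = forward([10.0, 12.0, 8.0, 9.5, 11.0, 7.0, 13.0, 10.5], *net)
```

Calling `random_network(n_in=9)` gives the variant in which the acceleration PR is appended as a ninth input.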

For a motor velocity of 1000 rpm, six frequencies were selected for analysis. The chosen PR values correspond to the bands centred at 100, 200, 300, 400, 560, and 800 Hz at 1000 rpm. The noise PR values are used as targets and are converted into an array with ones and zeros, as shown in Table 3. A threshold of 9 dB for the noise PR is assumed. If the PR value is greater than 9 dB, the band is considered prominent and a value of 0 is assigned to the first column of the array while the second column is assigned a 1. If the PR value is less than 9 dB, the first column receives a 1, and the second, a 0.

Table 3 Target logical array for classifying the tone prominence
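The conversion of noise PR values into the two-column target array can be sketched as follows. The PR values in the example are hypothetical, and assigning a value of exactly 9 dB to the non-prominent class is an assumption, since the text only defines the cases above and below the threshold.

```python
THRESHOLD_DB = 9.0  # prominence threshold used in the text


def encode_target(noise_pr_db, threshold_db=THRESHOLD_DB):
    """Return [non-prominent, prominent] as a one-hot row.

    Assumption: a PR exactly equal to the threshold counts as non-prominent.
    """
    if noise_pr_db > threshold_db:
        return [0, 1]
    return [1, 0]


# Hypothetical noise PR values (dB) for six bands.
noise_pr = [12.5, 7.3, 9.8, 3.1, 15.0, 8.9]
targets = [encode_target(pr) for pr in noise_pr]
for pr, row in zip(noise_pr, targets):
    print(f"{pr:5.1f} dB -> {row}")
```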

The network is therefore trained with the 48 force PR values (8 for each one of the 6 selected bands) with the defined targets. The quality of the network is assessed by feeding the trained network with the inputs again to confirm its output. A perfect match is observed.

The trained network is also tested against the unseen remaining data. Once again, a perfect match is observed, as summarised in Table 4.

Table 4 Proof-of-concept network. Unseen data results

The network was also trained considering the results of the expanded simulations presented in Fig. 5. The data for the training were processed as follows:

  • The \(f\) in equation (12) is varied from 170 to 2300 Hz in steps of 5 Hz (a total of 427 frequencies). This gives a more complete view of the spectrum than using selected peaks only.

  • For each of these \(f\), the noise PR is calculated along with the acceleration and force PRs.

  • Again, if the noise PR is greater than 9 dB, the band is considered prominent (991 samples).

  • From the total available data points (6405 = 427 frequencies times 15 cases, summarised in Fig. 16), the same number of non-prominent bands as prominent bands are randomly selected and used for the training.
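The sampling and class-balancing steps listed above can be sketched as follows. The frequency grid and the number of cases come from the text. The labels are randomly generated placeholders, so the number of prominent samples will not match the 991 reported, and `balance_classes` is a hypothetical helper name.

```python
import random

frequencies = list(range(170, 2301, 5))  # 170 to 2300 Hz in 5 Hz steps
n_cases = 15
n_samples = len(frequencies) * n_cases
print(len(frequencies), n_samples)


def balance_classes(labels, seed=0):
    """Keep every prominent sample and an equal number of randomly
    chosen non-prominent samples. Returns the selected indices."""
    rng = random.Random(seed)
    prominent = [i for i, is_p in enumerate(labels) if is_p]
    non_prominent = [i for i, is_p in enumerate(labels) if not is_p]
    if len(non_prominent) < len(prominent):
        raise ValueError("not enough non-prominent samples to balance")
    kept = rng.sample(non_prominent, len(prominent))
    return sorted(prominent + kept)


# Placeholder labels: True marks a band whose noise PR exceeds 9 dB.
label_rng = random.Random(1)
labels = [label_rng.random() < 0.15 for _ in range(n_samples)]

selected = balance_classes(labels)
n_prominent = sum(labels[i] for i in selected)
print(len(selected), n_prominent)
```

With the 991 prominent samples reported in the text, the same procedure yields 2 × 991 = 1982 data points.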

Fig. 16 Operating points considered

Therefore, 1982 data points are available for the training, validation, and test of the network. The network will classify the noise for a given frequency of interest as prominent or not using the acceleration and bearing force data. Once again, a single hidden layer network was trained. The size of the hidden layer is five. Firstly, the Scaled Conjugate Gradient (SCG) was used as the training algorithm and then the Resilient backpropagation (Rprop). Both methods are recommended for training classifier networks [53].
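As a brief illustration of how Rprop differs from gradient-magnitude methods, the sketch below implements a basic variant without weight backtracking and applies it to a simple quadratic. It is not the toolbox implementation: the increase and decrease factors (1.2 and 0.5) are the values commonly quoted for Rprop, whereas the initial step and step limits are assumptions, and the settings used in the study are not stated.

```python
def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def rprop_minimise(grad_fn, weights, n_iter=200, step0=0.07,
                   inc=1.2, dec=0.5, step_min=1e-6, step_max=50.0):
    """Basic Rprop: only the sign of each partial derivative is used."""
    weights = list(weights)
    steps = [step0] * len(weights)
    prev_grad = [0.0] * len(weights)
    for _ in range(n_iter):
        grad = grad_fn(weights)
        for i, g in enumerate(grad):
            if g * prev_grad[i] > 0:
                # Same direction as before: grow the step.
                steps[i] = min(steps[i] * inc, step_max)
            elif g * prev_grad[i] < 0:
                # Sign change means the minimum was overshot: shrink the step.
                steps[i] = max(steps[i] * dec, step_min)
            weights[i] -= sign(g) * steps[i]
            prev_grad[i] = g
    return weights


# Toy problem (hypothetical): minimise sum((w_i - t_i)^2).
target = [3.0, -1.5]


def quadratic_grad(w):
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


solution = rprop_minimise(quadratic_grad, [0.0, 0.0])
print(solution)
```

Because the update depends only on the sign of the gradient, each weight adapts its own step size independently of the gradient magnitude.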

The available data were divided into 70% for training, 15% for validation, and 15% for testing with unseen data. The same seed for the random number generator (rng) function was used in both cases, so that the initial random weights and biases were identical at the start of training. The cross-entropy performance is shown in Fig. 17. The confusion matrices are shown in Fig. 18 for the network trained using the SCG approach and in Fig. 19 for the network trained with the Rprop algorithm. Judging by the results with the test data and the cross-entropy performance, the Rprop algorithm performs slightly better, with a smaller cross-entropy value for the validation set. To avoid overfitting, MATLAB automatically tracks the cross-entropy of all three sets. If the cross-entropy of the validation set increases for a few iterations while the cross-entropy of the training set keeps decreasing, the training is halted. The weights and biases that lead to the minimum cross-entropy of the validation set are then selected and define the network. The cross-entropy for the training set shows a small decrease in both cases after the point of best performance (marked by the green circle). In contrast, the cross-entropy for the test and validation sets shows a small increase, which could indicate some overfitting had these later parameters been used.
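The data division and the validation-based stopping rule described above can be sketched as follows. This is a simplified illustration and not the toolbox implementation: the rounding of the split sizes, the patience of three iterations, and the validation curve are all assumptions chosen for the example.

```python
import random


def split_indices(n_samples, seed, fractions=(0.70, 0.15, 0.15)):
    """Shuffle indices with a fixed seed and split into train/val/test.

    Assumption: sizes are rounded to the nearest integer, so counts may
    differ by a sample from those obtained with another tool.
    """
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_train = round(fractions[0] * n_samples)
    n_val = round(fractions[1] * n_samples)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])


def early_stopping(val_losses, patience):
    """Return (best_epoch, stop_epoch): stop once the validation loss has
    failed to improve on its best value for `patience` epochs in a row."""
    best_epoch, fails = 0, 0
    for epoch in range(1, len(val_losses)):
        if val_losses[epoch] < val_losses[best_epoch]:
            best_epoch, fails = epoch, 0
        else:
            fails += 1
        if fails >= patience:
            return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


train, val, test = split_indices(1982, seed=42)

# Hypothetical validation cross-entropy per epoch.
val_curve = [0.70, 0.55, 0.45, 0.40, 0.38, 0.39, 0.40, 0.41]
best_epoch, stop_epoch = early_stopping(val_curve, patience=3)
print(best_epoch, stop_epoch)
```

The parameters retained are those of `best_epoch`, not those of the last epoch, which is why the training curve can continue to fall after the selected point.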

Fig. 17 Performance of the networks trained with different approaches: (a) using SCG and (b) using Rprop

Fig. 18 Confusion matrices with the SCG training. P stands for prominent and Np for not prominent

Fig. 19 Confusion matrices with the Rprop training. P stands for prominent and Np for not prominent

A confusion matrix compares the real class of a variable with the class predicted by the neural network. The Target Class axis (the first two columns of the matrices in the previous figures) lists the actual number of prominent and non-prominent tones, whilst the Output Class axis (first two rows) shows how the network classified the noise. Focusing, for instance, on the All Confusion Matrix in Fig. 18, 927 instances were correctly identified as prominent, whilst 64 prominent instances were classified as non-prominent by the network. A further 126 non-prominent bands were wrongly identified as prominent and 865 non-prominent bands were correctly identified. The percentage shown below each of these elements is the share of that element in the total number of samples. For instance, the 927 correctly identified prominent bands represent 46.8% of the total samples (927 + 126 + 64 + 865 = 1982). Element (1,3) of the matrix represents the relative accuracy of the network's prominent output: in green, the number of correctly identified prominent bands divided by the total number of bands output as prominent (927/1053); in red, the number of incorrectly classified bands divided by the same total (126/1053). Element (2,3) is the equivalent analysis for the non-prominent output. Element (3,1), on the other hand, shows the share of the actual prominent bands that were correctly identified (green percentage). From the total of 927 + 64 = 991 prominent bands, the network correctly classified 93.5% (927/991). The red percentage shows the share of prominent bands that were classified as non-prominent (64/991). Element (3,2) is the equivalent for the non-prominent classification. Finally, element (3,3) shows the overall accuracy in relation to the total number of occurrences. For example, the 1792 correctly classified samples (927 + 865) represent 90.4% of the total of 1982 samples, and the 190 misclassified samples (126 + 64) represent the remaining 9.6%.
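The percentages quoted above can be reproduced directly from the four counts of the All Confusion Matrix in Fig. 18. The short sketch below does this; the variable names are illustrative and do not come from the original toolbox output.

```python
# Counts from the All Confusion Matrix discussed in the text (Fig. 18).
p_as_p = 927     # prominent bands classified as prominent
p_as_np = 64     # prominent bands classified as non-prominent
np_as_p = 126    # non-prominent bands classified as prominent
np_as_np = 865   # non-prominent bands classified as non-prominent

total = p_as_p + p_as_np + np_as_p + np_as_np


def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


share_p_correct = percent(p_as_p, total)                  # cell percentage
output_p_accuracy = percent(p_as_p, p_as_p + np_as_p)     # element (1,3)
output_np_accuracy = percent(np_as_np, np_as_np + p_as_np)  # element (2,3)
target_p_accuracy = percent(p_as_p, p_as_p + p_as_np)     # element (3,1)
target_np_accuracy = percent(np_as_np, np_as_np + np_as_p)  # element (3,2)
overall_accuracy = percent(p_as_p + np_as_np, total)      # element (3,3)

print(total, share_p_correct, target_p_accuracy, overall_accuracy)
```

Elements (1,3) and (2,3) correspond to what is commonly called precision, and elements (3,1) and (3,2) to recall, for each class.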

6 Discussion and limitations

The proof-of-concept case study has shown that it is possible to predict features of the radiated noise by using the predicted dynamic behaviour of the powertrain. When more data were available, the trained network correctly classified 97% of the prominent bands in the unseen data, whilst around 89% of the non-prominent bands were correctly classified. As a whole, around 92% of the data were correctly classified, as seen in element (3,3) of the All Confusion Matrix in Fig. 19. Including the acceleration of a point on the housing provides a better description of the behaviour of the housing and increases the accuracy of the trained network. For comparison, another network was trained using only the bearing forces as inputs, without the housing acceleration. The same seed was used for the rng once again. The confusion matrices for this network are shown in Fig. 20, whilst the cross-entropy performance is shown in Fig. 21. The accuracy of the network drops when the acceleration input is removed. The radiated sound is directly linked to the vibrating surface of the housing, so it is expected that including this information leads to more accurate NVH predictions.

Fig. 20 Confusion matrices. Network without housing acceleration input. P stands for prominent and Np for not prominent

Fig. 21 Performance of the network without using the housing acceleration input

The noise PR is plotted against the acceleration PR and each of the bearing force PRs in Fig. 22. The horizontal line marks the 9 dB threshold for a band to be considered prominent. Whilst there seems to be a trend in which a higher PR from the MBD simulations corresponds to a higher noise PR, it is also possible to see that, for instance, an acceleration PR of 0 can lead to both prominent and non-prominent bands. The training of the network took around 1 s to complete, which is deemed reasonable when compared with simpler linear regression models fitted using the “fitlm” function in MATLAB. It is important to note that these models fit the noise PR itself, which then has to be compared with the 9 dB target for the classification. Polynomials of degree 1 to 4 were tested. These polynomials also include products of the inputs up to the desired order. For 9 inputs, there are 10 coefficients to be adjusted when order 1 is considered and 715 when order 4 is considered. The number of coefficients has to be equal to or smaller than the number of samples available. The same 70% split of the data used for training the network was used for the fitting: from the available samples, 1388 were selected at random. Figure 23 shows the equivalent training and test confusion matrices from the regression fits of order 1 and 4 to illustrate the results. The computational time for these fits is negligible.
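The coefficient counts quoted above follow from the number of monomials of total degree up to \(d\) in \(n\) variables, which is the binomial coefficient \(\binom{n+d}{d}\) (including the constant term). A short check, with `n_coefficients` as an illustrative helper name:

```python
from math import comb


def n_coefficients(n_inputs, order):
    """Number of terms in a full polynomial of the given total order,
    including the intercept and all cross-products of the inputs."""
    return comb(n_inputs + order, order)


n_fit_samples = 1388  # samples used for the fitting, from the text

for order in range(1, 5):
    terms = n_coefficients(9, order)
    print(order, terms, terms <= n_fit_samples)
```

The count grows quickly with the order, which is why the number of available samples limits the highest polynomial degree that can be fitted.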

Fig. 22 Noise PR versus MBD PR

Fig. 23 Confusion matrices from linear fit models. In (a) and (c), the results from the fitting of the 1388 data points using polynomials of order 1 and 4, respectively. In (b) and (d), the fitted models are used to predict the response of the remaining data

The results of the fitted models are highly dependent on the polynomial degree, which might require some tuning. Moreover, even the highest order achievable with the same amount of data does not generalise as well as the trained network. The trained network correctly identified the prominent bands in the unseen data 97% of the time, versus 88.6% for the fitted model, a relative improvement of 9.5%.

It is important to highlight that, due to the nature of the metric used (the Prominence Ratio), the applicability of this network is limited to changes that act on the amplitude of the peaks without drastically changing the background noise in the bands considered. In other words, the general shape of the spectrum should remain more or less constant other than changes in the peaks. For instance, Fig. 24 shows three different SPL spectra. For a chosen frequency, all three lead to the same PR, as it is a relative metric. However, the network does allow for an investigation of targeted changes to the peaks, especially considering that these peaks usually have a known root cause, such as electromagnetic excitation, gear meshing, or modes of the structure. Changes in the damping of the structure, along with bearing preloads, bearing damping, and gear microgeometry, can be perceived by the network. Figure 25 shows an example of the forces transmitted from one of the bearings to the housing when different damping values and axial preloads are considered (in this case, doubling and halving the damping ratio and bearing preloads). The red curve shows the nominal case on which the paper is based. Figure 26 shows the SPL of the nominal case compared with the SPL when 20 μm lead crowning was applied to the gear pairs. A reduction can be noted in the peaks related to some of the gear meshing frequencies. It is outside the scope of this work to optimise these results, so the changes were not targeted and are merely illustrative, showing the effects on the forces and SPL along with the network behaviour.
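The relative nature of the metric can be illustrated with a simplified band-level calculation. The sketch below assumes the PR is the level of the band containing the tone minus the level of the mean power of the two adjacent bands. This is a simplification that stands in for equations (12)–(15), which are not reproduced here. The band levels are hypothetical, and a uniform offset is only one simple way in which different spectra can share the same PR.

```python
import math


def prominence_ratio(lower_db, middle_db, upper_db):
    """Simplified PR: middle band level relative to the mean power
    of the lower and upper adjacent bands (all levels in dB)."""
    mean_adjacent_power = (10 ** (lower_db / 10) + 10 ** (upper_db / 10)) / 2
    return middle_db - 10 * math.log10(mean_adjacent_power)


# Hypothetical band levels (dB): lower band, band with the tone, upper band.
base_levels = (42.0, 55.0, 40.0)

# Three spectra that differ only by a uniform level offset.
offsets_db = [0.0, 10.0, -5.0]
ratios = [
    prominence_ratio(*(level + offset for level in base_levels))
    for offset in offsets_db
]
print(ratios)
```

All three spectra give the same PR even though their absolute levels differ, which is why the PR alone cannot distinguish a loud tone from a quiet one.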

Fig. 24 Three SPL spectra with the same Prominence Ratio

Fig. 25 Force from one bearing (3000 rpm, 0.428 \(T_{\text{max}}\), different scenarios)

Fig. 26 SPL when one microgeometry change is made. Zoomed in to show some differences in the SPL

The bearing force and acceleration data from the examined microgeometry change are then used as inputs for the trained network and for the regression model for comparison. The confusion matrices for both cases are shown in Fig. 27. As expected, the network has generalised the classification problem better than the model fitted with the same amount of data, and both require negligible computation time. Whilst the fitted model correctly classified 75.6% of the prominent samples, the trained network correctly identified 95.1%, a relative improvement of approximately 25%.

Fig. 27 Prediction on unseen data due to microgeometry change: (a) the regression model and (b) the Rprop-trained network

Also, for a tone to be perceived as prominent, it must actually be audible. For this purpose, the Loudness metric can be used, which relates the sound pressure levels to the perceived intensity of a sound [54]. The Loudness, measured in phon, has been used to correlate and assess the subjective perception of sound quality in electric powertrains [10]. In this case, the trained network would be helpful in the sense that it points out which peaks are perceived as more challenging from an NVH perspective. For instance, in the 3000 rpm–0.142 \(T_{\text{max}}\) case (Fig. 14), the PR indicates that it is more important to address the peak near 560 Hz than the one near 1200 Hz, even though they have similar SPL and Loudness levels (Fig. 28). To mitigate the tonal contribution at 560 Hz, the design engineer can therefore target their changes at that peak and evaluate the acoustic results using the trained network, instead of having to run both the MBD model and the acoustic model for each design change.

Fig. 28 Equal-loudness-level contour

Moreover, this study used 9 dB as the target for a tone to be perceived as prominent. However, these results are a direct simulation of the powertrain. The noise perceived by the driver and passengers will be attenuated by, for instance, the enclosure of the powertrain. The noise inside a car also contains contributions from sources other than the powertrain, such as road/tyre noise, wind noise, and noise from ancillary systems such as the air-conditioning, all of which contribute to the general SPL spectrum.

It is possible, though, to train a network with a target other than 9 dB to take these other factors into account. Additional data can indicate what the new targets should be. To illustrate this scenario, a broadband noise is added to the SPL of the powertrain to represent the contribution from additional sources. It is possible to choose a frequency of interest and reverse the calculations of equations (12)–(15) to find the SPL required at that frequency to return a given PR in the presence of the added background noise. Figure 29 shows the original SPL, along with the added broadband noise. It also shows the summation of the original SPL with the background noise and the new, increased SPL required to achieve the chosen PR. The inset magnifies the increased peak. In this case, the peak at 600 Hz was used as an example. Without the added noise, that peak has a PR of 12.5 dB. Once the additional noise is considered, the PR drops to 7.3 dB. To reach a PR of 9 dB with this level of additional background noise, the sound pressure level at 600 Hz would have to be larger. With the level at 600 Hz artificially increased in this way, the PR recalculated from the SPL of the powertrain only is 16 dB. Therefore, depending on the expected level of additional noise from the different sources in a vehicle, the target for training the network can be adapted.
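The reversal of the PR calculation in the presence of added noise can be sketched with a simplified three-band model. The sketch assumes incoherent (power) summation of the powertrain and background levels and uses the same simplified band-level PR as a stand-in for equations (12)–(15). The band levels and the noise level are hypothetical, so the numbers only mirror the starting PR of 12.5 dB and do not reproduce the 7.3 dB and 16 dB values reported above.

```python
import math


def to_power(level_db):
    return 10 ** (level_db / 10)


def to_db(power):
    return 10 * math.log10(power)


def add_levels(level_db, noise_db):
    """Incoherent sum of two sound pressure levels."""
    return to_db(to_power(level_db) + to_power(noise_db))


def prominence_ratio(lower_db, middle_db, upper_db):
    """Simplified PR: middle band level minus mean adjacent-band level."""
    return middle_db - to_db((to_power(lower_db) + to_power(upper_db)) / 2)


# Hypothetical powertrain-only band levels and broadband noise level (dB).
lower, middle, upper = 40.0, 52.5, 40.0
noise = 45.0
target_pr = 9.0

pr_clean = prominence_ratio(lower, middle, upper)
lower_noisy = add_levels(lower, noise)
upper_noisy = add_levels(upper, noise)
pr_noisy = prominence_ratio(lower_noisy, add_levels(middle, noise), upper_noisy)

# Reverse the calculation: total middle-band level needed for the target PR,
# then remove the noise contribution to get the powertrain-only level.
adjacent_noisy = to_db((to_power(lower_noisy) + to_power(upper_noisy)) / 2)
required_total = target_pr + adjacent_noisy
required_middle = to_db(to_power(required_total) - to_power(noise))

# Equivalent powertrain-only PR, usable as the new training target.
pr_equivalent = prominence_ratio(lower, required_middle, upper)
print(pr_clean, pr_noisy, required_middle, pr_equivalent)
```

In this simplified picture, `pr_equivalent` plays the role of the adapted training target: a tone must exceed it in the powertrain-only simulation to reach the chosen PR once the assumed background noise is present.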

Fig. 29 Added noise. 1000 rpm, \(T_{\text{max}}\) operating condition

To illustrate this capability, a new network was trained with an arbitrary target PR value of 12 dB. The same single hidden layer and Rprop configuration was chosen. The network performed similarly to the 9 dB case, as shown by the confusion matrices (Fig. 30) and cross-entropy performance (Fig. 31), correctly classifying nearly 92% of the data.

Fig. 30 Confusion matrices (12 dB target). P stands for prominent and Np for not prominent

Fig. 31 Performance of the 12 dB target network

7 Final remarks

In this study, a flexible MBD model was used to calculate the forces transmitted to the housing of an electric powertrain used in a commercial vehicle. The model also included the calculation of the vibration at a chosen location on the housing. From the dynamic results, the noise radiated by the system was calculated. Both sets of results were used to train artificial neural networks to assess the sound quality of the radiated noise. Electric powertrains are generally quieter than ICEs, but their tonal behaviour might cause discomfort for drivers and passengers. This work contributes to the body of knowledge by showing how psychoacoustic features can be estimated using multibody dynamics simulations and a surrogate model, without the need for additional expensive acoustic simulations. The proposed surrogate model is an artificial neural network. The data for training this network can be sourced from the initial design phase.

A common practice to evaluate the sound quality in the presence of tones is to use the Prominence Ratio metric. Some efforts have been made in the past to assess the NVH of EVs using machine learning and experimental acoustic data. The foundation for a novel framework was presented in this work, and it can be a valuable optimisation tool for NVH performance. It removes the need for new acoustic data when estimating acoustic features. The perception of tonal behaviour in the radiated noise comes with an increased nuisance to passengers and drivers. However, once a network is trained for the classification, it is possible to identify and mitigate aggressive NVH behaviour by looking at the results of the MBD alone. This is especially useful in the design phase, where a variety of changes to the initial design can be investigated without computationally expensive acoustic calculations for each change. Each MBD simulation took around 12 h to complete, whilst each acoustic simulation took around 14 h. However, the acoustic simulation times increase drastically with the frequency range. Therefore, as the need for simulations that cover higher frequency bands increases, this methodology can save considerable simulation time by training the network with the nominal design first and then optimising the radiated noise through targeted changes to the powertrain. The trained network accurately classified whether the radiated noise is prominent or not in around 92% of the cases (for both the 9 dB and 12 dB targets), based only on the results of the MBD model. The training of the neural network was not time demanding, and the increase in the use of neural networks also comes with an increase in the availability of toolboxes that require minimal pre-processing of the data. A comparison with a similarly fast linear regression method was shown. Even though the regression model can establish some link between bearing forces, housing acceleration, and radiated noise, the network performs better on the generalisation problem, showing greater accuracy in the prominence determination.

This work does not aim to further the development of artificial neural networks but introduces a flexible new application that can help in assessing targeted changes in the design of powertrains. Different training algorithms were assessed by checking the confusion matrices and cross-entropy. The limitations of the method were also addressed and are linked to the metric chosen. In terms of future work, this methodology is flexible enough to allow for training using different metrics, in the event that one is more suitable for the required application. Whilst this work relies on the results of detailed MBD simulations, the computation times could be greatly decreased if the method were coupled with a reliable analytical or lumped-parameter model for the calculation of the forces transmitted from the bearings to the housing. An investigation relating PR targets for the bearing forces to a target noise PR can also be envisaged. It would require the analysis of different systems and possibly the inclusion of more information, such as radiation efficiency and surface velocity, for a more complete generalisation.