Introduction

Respiratory diseases are among the leading causes of death worldwide and are usually diagnosed through auscultation. Pathological and healthy breath signals can be distinguished on the basis of the frequency, intensity, duration, and quality of the sound [1,2,3]. Normal lung sounds are generated by the movement of air through the tracheobronchial tree, whereas the vibrations of solid tissues are responsible for the adventitious (abnormal) lung sounds. Among the normal lung sounds, the vesicular sounds (VS) are heard over the chest wall, distant from the larger airways. Adventitious sounds are generally classified as continuous or discontinuous according to their duration. Continuous adventitious breath sounds (the musical wheezes, stridor, and rhonchi) last longer than 250 ms, whereas discontinuous adventitious sounds last less than 25 ms. Thus, many significant characteristic features and conditions of the lung can be understood from the auscultation of lung sounds [1, 3,4,5].

The present study attempts to bring out the hidden complexities of a continuous adventitious respiratory sound, the expiratory wheeze (EW). Wheezes are musical sounds that can be identified by their intensity, pitch, location, and duration across expiration and inspiration. They can be high or low pitched depending on the degree of narrowing of the airway. Bronchial obstruction due to tumours, accumulation of mucus or other secretions, bronchostenosis caused by inflammation, or the presence of foreign bodies generates the wheezing sound. The nature of the obstruction determines the nature of the wheeze: a flexible obstruction in the air passage causes inspiratory or expiratory wheezing, whereas a rigid, permanent obstruction produces wheezing throughout respiration. The most common form is expiratory wheezing (EW), in which the wheeze is heard during exhalation. Wheezing sounds are generated in the branches between the second and seventh generations of the airway tree, coupled with the oscillation of air passing through the narrowed airways [1]. Therefore, the lung sounds due to EW and normal vesicular (VS) breathing have significant features that can aid in the diagnosis of diseases. Vesicular sounds are low-pitched, non-musical sounds usually heard over most of the lung surface.

In standard auscultation with a stethoscope, the possibility of diagnostic error is high, as the outcome depends on the sensitivity of the listener's ears, the presence of noise, and practice in recognising sounds. Computerised methods of analysing the complex and nonlinear lung sounds overcome this limitation. Computerised analysis of the signals in the time, frequency, and time–frequency domains can reveal important information regarding pathological conditions [6]. As normal and abnormal lung sounds occur in specific frequency ranges, details of the number of frequency components and their persistence can be obtained. This is usually done with spectral analysis tools such as the time spectrum, fast Fourier transform (FFT), power spectral density (PSD), and wavelet analyses. According to Goldberger [7], the human body is a wonderful laboratory for the study of fractals, chaos, and other types of nonlinear dynamics. Nonlinear time series analysis is an excellent mathematical tool for studying complex biological signals. The multi-dimensional phase space representation throws light on the respiratory dynamics underlying the time series data. While the sample entropy (SE) measures the unpredictability of a time series, the maximal Lyapunov exponent (MLE) characterises the dynamics of the trajectories in the phase portrait reconstructed from the time series data. Fractal analysis can probe the self-affine and self-similar fractal nature of respiratory signals. The walking-divider, epsilon-blanket, power spectrum, and box-counting methods are among those used for exploring the fractality of a signal. The degree of self-similarity of the lung sound signal is quantified by two parameters, the fractal dimension (Db) and the Hurst exponent (Hb), estimated using the simple box-counting method [8,9,10].

The present study proposes a simple, cost-effective diagnostic method based on mathematical tools for investigating VS and adventitious EW breath signals. The lung sound signals are analysed in the time, frequency, and combined time–frequency domains by PSD and wavelet analyses. The nonlinear respiratory characteristics are extracted using nonlinear time series analysis, and the hidden complexity is extracted using fractal analysis. As the PSD is the most appropriate quantity for feature extraction from the dataset, principal component analysis (PCA) can be used to classify the signals on the basis of these features. The study is significant in the context of the COVID-19 pandemic, as it proposes methods based on time series and fractal analyses for analysing breath sound signals.

Methodology

Technological advancement has enabled the use of computerised methods for analysing biomedical signals, which can overcome many of the limitations of simple auscultation with a stethoscope. In the present study, 35 digital audio signals of normal vesicular (VS) and expiratory wheezing (EW) breath sounds, collected from various respiratory sound databases [11,12,13,14], are analysed. As lung sounds are nonlinear and non-stationary signals, a great deal of information can be extracted from spectral (PSD and wavelet), nonlinear time series (phase portrait, sample entropy, and Lyapunov exponent), and fractal (fractal dimension and Hurst exponent) analyses.

Fast Fourier transform and power spectral density analyses help in the objective and quantitative analysis of pulmonary or respiratory sounds, which are variations of sound intensity over time. The literature shows that many adventitious sounds occur in specific frequency intervals. Therefore, the frequency distribution of lung sounds can reveal hidden information on lung diseases. The FFT algorithm transforms the lung sound signal from the time domain into the frequency domain. If x(t) is a time-domain signal of length L, where t denotes time and f denotes frequency, the Fourier-transformed signal X(f) is given by Eq. (1) [15]

$$X\left( f \right) = \int {x\left( t \right)e^{{ - j2\pi ft}} dt}$$
(1)

From the complex-valued FFT signal, the real-valued power spectrum, i.e., the signal's power content at each frequency, can be obtained from Eq. (2)

$$P = \frac{{\left| {X\left( f \right)} \right|^{2} }}{L}$$
(2)

Thus, PSD estimation from the FFT is one of the most commonly used methods of feature extraction for lung sound classification.
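
As an illustration of Eqs. (1) and (2), the following Python sketch computes the power spectrum of a sampled signal with NumPy's real FFT. It is not the code used in the present study; the function name `power_spectrum`, the sampling rate, and the synthetic 260 Hz test tone are assumptions made only for this example.

```python
import numpy as np

def power_spectrum(x, fs):
    """One-sided power spectrum P = |X(f)|^2 / L of a real signal sampled at fs Hz."""
    x = np.asarray(x, dtype=float)
    L = x.size
    X = np.fft.rfft(x)                       # discrete counterpart of Eq. (1)
    freqs = np.fft.rfftfreq(L, d=1.0 / fs)   # frequency axis in Hz
    P = np.abs(X) ** 2 / L                   # Eq. (2)
    return freqs, P

# Synthetic stand-in for a breath recording (assumption): a pure 260 Hz tone, 1 s long
fs = 4000
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * 260 * t)
freqs, P = power_spectrum(x, fs)
peak_hz = freqs[np.argmax(P)]                # frequency of the strongest component
```

For a real recording, `x` would be the digitised lung sound and `fs` its sampling rate; a narrow, dominant peak in `P` is the kind of feature the spectral analysis looks for.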

The time–frequency representation of the wavelet transform overcomes the limitations of purely time-domain and frequency-domain signal analyses. The wavelet scalogram gives a visual representation of the temporal evolution of the frequency spectrum. Wavelet analysis decomposes a signal x(t) into shifted (translation parameter \(\Gamma\)) and scaled (scale parameter s) forms of the mother wavelet (base function \(\varphi\)), as given by the expression [16]

$$W_{c} f\left( {s,\Gamma } \right) = \int\limits_{{ - \infty }}^{\infty } {x\left( t \right) \cdot s^{{ - 1/2}} \varphi \left( {\frac{{t - \Gamma }}{s}} \right)dt}$$
(3)

The Morse wavelet, introduced by Daubechies and Paul, belongs to the category of continuous analytic wavelets. It is used to analyse the normal and abnormal lung sounds because the Morse family brings many other analytic wavelets together in a single framework.

The complex turbulent airflow dynamics arising from the structural interaction with the airway walls make the lung sound signals nonlinear. This nonlinearity can be unveiled through nonlinear time series analysis of the breath signal using the R software. A nonlinear dynamical system can be described entirely by its multi-dimensional phase space representation. For the construction of the phase portrait, the time delay (τ) and embedding dimension (m) are calculated using the nonlinearTseries package in R. The mutual information function gives the mutual dependence between two variables. In R, we obtain the average (or auto) mutual information, i.e., the mutual information computed between a time series and its time-shifted copy. The embedding dimension gives the minimum number of dimensions essential for constructing the phase portrait. The method put forward by Cao [17] is one of the methods used in practice for choosing the proper embedding dimension, as it can distinguish deterministic from stochastic signals, works well for high-dimensional attractors, and does not depend strongly on the length of the data. Thus, by employing Takens' method of delays [9] (the “buildTakens” command in R) with τ and m, a phase portrait is reconstructed as represented in Eq. (4), which helps in visualising the hidden complexity of the time series. A reconstructed vector in the phase space can be formulated in terms of τ and m as

$$\mathbf{x}_{n} = \left( x_{n - (m - 1)\tau},\; x_{n - (m - 2)\tau},\; \ldots,\; x_{n} \right)$$
(4)
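
The delay reconstruction of Eq. (4) can be sketched in a few lines of Python. This is only an illustrative stand-in for the R routine named above, not its implementation; the function name `delay_embed` and the toy input are assumptions.

```python
import numpy as np

def delay_embed(x, m, tau):
    """Return the matrix whose rows are x_n = (x[n-(m-1)tau], ..., x[n-tau], x[n])."""
    x = np.asarray(x, dtype=float)
    n_vectors = x.size - (m - 1) * tau
    if n_vectors <= 0:
        raise ValueError("time series too short for this m and tau")
    # Column k holds the sample lagging x_n by (m-1-k)*tau steps
    return np.column_stack([x[k * tau : k * tau + n_vectors] for k in range(m)])

# Toy series 0, 1, ..., 9 embedded with m = 3 and tau = 2 (assumed values)
emb = delay_embed(np.arange(10), m=3, tau=2)
first_vector = emb[0]    # (x_0, x_2, x_4)
last_vector = emb[-1]    # (x_5, x_7, x_9)
```

Plotting any two or three columns of `emb` against each other gives a projection of the reconstructed phase portrait.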

Two widely used indicators of the chaoticity and complexity of a time series are the maximal Lyapunov exponent (MLE) and the sample entropy (SE) [9, 10, 18]. The MLE measures how trajectories evolve in the phase portrait. A positive MLE gives the rate of divergence of trajectories and indicates a chaotic regime, whereas a negative value gives the rate of convergence [2]. The divergence at time t of two phase-space trajectories with initial separation vector δZ0 is given by Eq. (5)

$$\left| \delta Z\left( t \right) \right| \approx e^{\lambda t} \left| \delta Z_{0} \right|$$
(5)

The value of the MLE (λ) is obtained from the slope of the plot of \(\ln \left( \left| \delta Z(t) \right| / \left| \delta Z_{0} \right| \right)\) against time t. The command ‘maxLyapunov’ in R is used for this. The SE is a scale-independent computational means of quantifying complexity in terms of the regularity or predictability of temporal signals. The SE is estimated using the command ‘sampleEntropy’ in R, employing the values of the time delay, embedding dimension, and tolerance. The SE is computed for a neighbourhood radius r using Eq. (6) [19],

$$SE\left( {m,r} \right) = {\ln}\left( {\frac{{C^{m} \left( r \right)}}{{C^{m + 1} \left( r \right)}}} \right)$$
(6)

where \(C^{m}(r)\) and \(C^{m+1}(r)\) denote the correlation sums for dimensions m and m + 1, respectively. The value of SE at a particular m is the average of the SE values over different values of r. Higher values of SE represent complex signals, and lower values represent regular signals.
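
A minimal Python sketch of Eq. (6) is given below for illustration. It is not the R routine used in the study: it assumes a unit time delay, a single tolerance r (defaulting to 0.2 times the standard deviation, a common rule of thumb) rather than an average over several r, and the Chebyshev distance between templates. The name `sample_entropy` and the two synthetic test signals are likewise assumptions.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """SE(m, r) = ln(C^m(r) / C^(m+1)(r)), with matches counted between distinct templates."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)          # assumed default tolerance
    n = x.size

    def count_matches(dim):
        # Use the same number of templates (n - m) for both dimensions
        templates = np.array([x[i:i + dim] for i in range(n - m)])
        count = 0
        for i in range(len(templates) - 1):
            # Chebyshev distance from template i to all later templates (no self-matches)
            d = np.max(np.abs(templates[i + 1:] - templates[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    b = count_matches(m)             # proportional to C^m(r)
    a = count_matches(m + 1)         # proportional to C^(m+1)(r)
    if a == 0 or b == 0:
        return float("inf")          # too few matches to estimate the ratio
    return float(np.log(b / a))

# Synthetic comparison (assumption): a regular sine versus white noise
rng = np.random.default_rng(0)
t = np.arange(400) / 400.0
se_sine = sample_entropy(np.sin(2 * np.pi * 5 * t))
se_noise = sample_entropy(rng.standard_normal(400))
```

The regular sine yields a lower value than the noise, in line with the interpretation that lower SE indicates a more predictable signal.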

Complex signals with chaotic behaviour are expected to exhibit a fractal nature too. The fractal dimension quantifies the degree of complexity of self-affine and self-similar time series data [20,21,22]. Of the various methods of finding the fractal dimension, the box-counting method is employed in the present study. The box-counting fractal dimension (Db) of the respiratory signals is estimated using the ‘fd.estim.boxcount’ function in the ‘fractaldim’ package of R. In the box-counting method, Db is calculated by overlaying boxes of different sizes (ε) on the signal and counting the number of boxes N(ε) required to cover the signal completely. Following the fractal power law, the relation between Db, N(ε), and ε is given by [23]

$${\text{N}}\left( \varepsilon \right) \propto \varepsilon^{{ - D_{b} }}$$
(7)

where the slope of the \(\ln N\left( \varepsilon \right)\) versus \(\ln \left( 1/\varepsilon \right)\) plot gives Db, from which the value of the Hurst exponent Hb can be calculated using Eq. (8).

$${\text{H}}_{{\text{b}}} = {2} - {\text{D}}_{{\text{b}}}$$
(8)

Depending on the value of Hb, time-varying signals fall into three categories: persistent, antipersistent, and Brownian. In an antipersistent time series (Hb < 0.5), also known as a mean-reverting series, succeeding values tend to revert to the long-term mean. For Hb > 0.5, the time series is said to be persistent, with a positive correlation between the data points. When there is no correlation between the preceding and following data points in a signal (Hb = 0.5), it is termed a Brownian time series or random walk [21]. Thus, Db and Hb are considered potential parameters for denoting the complexity of biological spatiotemporal signals.
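
The box-counting estimate of Eqs. (7) and (8) can be sketched as follows. This is a simplified illustration, not the algorithm of the R package used in the study: it assumes the graph of the signal is rescaled to the unit square and covered with grids of k × k boxes, and the function name `box_counting_dimension`, the grid sizes, and the test signals are assumptions.

```python
import numpy as np

def box_counting_dimension(x, ks=(4, 8, 16, 32, 64)):
    """Estimate D_b as the slope of ln N(eps) versus ln(1/eps), with eps = 1/k."""
    x = np.asarray(x, dtype=float)
    n = x.size
    span = x.max() - x.min()
    y = (x - x.min()) / span if span > 0 else np.zeros(n)   # amplitude scaled to [0, 1]
    t = np.arange(n) / n                                    # time scaled to [0, 1)
    counts = []
    for k in ks:
        col = np.floor(t * k).astype(int)                         # grid column of each sample
        row = np.minimum(np.floor(y * k).astype(int), k - 1)      # grid row of each sample
        total = 0
        for c in range(k):
            rows = row[col == c]
            if rows.size:
                # Boxes needed to cover the vertical extent of the signal in this column
                total += int(rows.max() - rows.min() + 1)
        counts.append(total)
    slope, _ = np.polyfit(np.log(ks), np.log(counts), 1)    # Eq. (7)
    return float(slope)

# Sanity checks on synthetic signals (assumption): a smooth ramp and white noise
rng = np.random.default_rng(0)
n = 4096
d_line = box_counting_dimension(np.arange(n) / n)
d_noise = box_counting_dimension(rng.standard_normal(n))
h_noise = 2.0 - d_noise                                     # Eq. (8)
```

A smooth curve gives a dimension close to 1, while an irregular signal fills more of the plane and gives a value closer to 2, and hence a smaller Hb.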

Classifying complex pathological signals against normal healthy ones using signal-processing tools is a challenging task. Principal component analysis (PCA) is one method employed to analyse large time-series datasets: it reduces the dimensionality by creating new uncorrelated variables that successively maximise variance, with minimal loss of information [24]. Although various parameters such as PSD, Db, SE, MLE, and Hb can be used for PCA, lung sounds have a characteristic frequency distribution, so the PSD data from the FFT analysis contain more information regarding the pathological conditions. Thus, in the present work, the power spectrum of each signal in the range 100–1000 Hz is computed and divided into 26 segments, and the average PSD of each segment is calculated. The mean PSD values obtained form the elements of the feature matrix required for the PCA.
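
The feature construction described above (26 band-averaged PSD values between 100 and 1000 Hz, followed by PCA) can be sketched in Python as below. This is an illustration under stated assumptions, not the R workflow of the study: the segments are taken to be of equal width, PCA is computed by a singular value decomposition of the centred feature matrix, and the function names and the synthetic "tonal" and "broadband" signals are placeholders for real recordings.

```python
import numpy as np

def psd_band_features(x, fs, f_lo=100.0, f_hi=1000.0, n_segments=26):
    """Mean power in n_segments equal-width bands between f_lo and f_hi (Hz)."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)
    power = np.abs(np.fft.rfft(x)) ** 2 / x.size
    edges = np.linspace(f_lo, f_hi, n_segments + 1)
    feats = np.empty(n_segments)
    for i in range(n_segments):
        # Assumes the signal is long enough for every band to contain FFT bins
        in_band = (freqs >= edges[i]) & (freqs < edges[i + 1])
        feats[i] = power[in_band].mean()
    return feats

def pca_scores(features, n_components=2):
    """Project the centred feature matrix onto its leading principal components."""
    centered = features - features.mean(axis=0)
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = centered @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)      # fraction of variance per component
    return scores, explained[:n_components]

# Synthetic placeholders (assumption): five tonal and five broadband signals, 1 s each
rng = np.random.default_rng(0)
fs = 4000
t = np.arange(fs) / fs
tonal = [np.sin(2 * np.pi * 260 * t) + 0.1 * rng.standard_normal(fs) for _ in range(5)]
broad = [rng.standard_normal(fs) for _ in range(5)]
X = np.array([psd_band_features(s, fs) for s in tonal + broad])   # 10 x 26 feature matrix
scores, explained = pca_scores(X)
```

Each row of `scores` gives the coordinates of one signal in the PC1–PC2 plane, which is what a PCA biplot displays.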

Results and discussion

The integration of auscultation with computerised signal processing has emerged as a reliable and quantitative diagnostic tool in pulmonology. Monitoring the audio signals produced in the lung provides valuable information about its functioning and helps in diagnosing diseased conditions. One of the simplest ways of representing an audio signal is its time spectrum. The basic features of the VS and EW breath signals can be observed in the representative time spectra, selected from the thirty-five signals, shown in Fig. 1. While the VS signal shows a near-pause between its inspiratory and expiratory phases, the wheezing signal is continuous. The difference in duration between the inspiratory and expiratory phases of the VS and EW signals is also evident from the time spectrum. From the literature [3], the intensity and duration of the inspiratory phase of the VS signal are greater than those of its expiratory phase, as seen in Fig. 1a, which is due to the lower turbulence of the airflow. In contrast, the time spectrum in Fig. 1b shows that the wheeze occurs during exhalation, resulting in the expiratory wheezing condition, where expiration lasts longer than inspiration. The magnified portion of the EW signal (2.25 s to 2.34 s), showing a sinusoidal pattern, is given in the inset of Fig. 1b.

Fig. 1

Time spectrum of the breath signal: (a) VS and (b) EW, with a portion (2.25 s to 2.34 s) magnified in the inset

Lung sounds are highly non-stationary signals, and the degree of non-stationarity increases with abnormality [25, 26]. The passage of air through regions of different cross-section produces a wide spectrum of frequencies. The number of frequency components, their spread, intensity, and mode of appearance reflect the mechanical features of the airways and the nature of the airflow through them, and thus provide important information on the state of the lungs that aids diagnosis. Hence, the frequency-domain and time–frequency-domain representations of the respiratory signals are analysed using the FFT and wavelet techniques. Figure 2 shows the power spectral density plot (PSDP: frequency (Hz) vs. power as mean square amplitude (MSA)) of representative VS and EW sound signals during a single breathing cycle. The PSDP of the vesicular signal shows a large number of frequency components spread over a wide range between 120 and 700 Hz, which may be due to the flow of air through the lobar and segmental airways of varying diameters (0.56 cm and 0.83 cm). The PSDP of the VS signal also shows an overtone band around 580 Hz, corresponding to the fundamental frequency at 260 Hz, indicating the flow of air through varying cross-sections. The EW signal, in contrast, shows a well-defined, highly intense peak around 260 Hz in the power spectrum, which indicates narrowing of the air passage. The reduction in the calibre of the airways accelerates the airflow, producing musical EW sounds and oscillations of the walls. Grotberg and Davis [27] have shown that the fluttering of the walls of the airways with fluid in EW is responsible for the frequency components generated. The critical flutter frequency indicates the musical pitch of the signal, which increases with narrowing of the walls, increased bending resistance, elastance, and longitudinal tension. From the literature [26], it is well understood that expiratory sounds are produced mostly by obstruction in the distal airways, those with an internal diameter of < 2 mm (mainly the fifth to seventh generations of the airways, with diameters of 0.08 cm to 0.2 cm), producing single, highly intense peaks in the PSDP. The power spectral plots of the inspiratory and expiratory phases of VS both show a large number of frequency components, whereas the EW signal shows them only during the inspiratory phase. The sharp peak and narrow frequency spread in the FFT of the expiratory phase of EW suggest expiratory wheezing. Thus, analysis of the PSDP obtained from the FFT data helps in distinguishing whether the disease manifests in the expiratory or the inspiratory phase.

Fig. 2

Power spectral density plot of breath signal: VS (a) inspiration, (b) expiration, (c) respiration; EW (d) inspiration, (e) expiration, (f) respiration

A more evident differentiation between the VS and EW sound signals can be obtained from the time–frequency representation, the wavelet scalogram, as shown in the representative scalograms in Fig. 3. The literature reports that the expiration to inspiration ratio in a VS signal is 1:3, with a pause between the two phases, which can be seen in the wavelet scalogram displayed in Fig. 3c. In addition, the short-time persistence of the higher-intensity frequency component in the VS signal during expiration is evident. In the wavelet scalogram of the EW signal given in Fig. 3f, continuous undulating sinusoidal deflections with a very intense peak at 261 Hz can be seen, due to the constrained flow of air through the narrowed region. The inspiratory phase is barely visible as its intensity and duration are very low, as observed in Fig. 2c. However, a very strong, highly persistent, and intense frequency component is visible throughout the expiratory phase of the EW signal, again confirming that the wheezing occurs during exhalation. The dilation of the airways due to elasticity during inspiration lets the air flow around the obstruction. As the airways contract during exhalation, the airflow increases, causing the highly intense wheezing sounds. The presence of a large number of lower-intensity frequency components during the inspiratory phase of the EW signal is clear from the wavelet scalogram given in Fig. 3d. When the inspiratory and expiratory phases of EW are compared, a significant difference in the amplitude of the signals can be seen in Fig. 3d and e, as also observed in the time spectrum and PSDP. This shows that expiration is a strenuous process for a patient suffering from wheezing, as more energy has to be expended to expel air from the lungs. Forcing air out of the lungs through airways containing mucus and secretions results in turbulence. A comparison of the wavelet scalograms of the expiration of VS and EW, shown in Fig. 3b and e, makes it evident that the highly intense frequency component persists for a longer time in EW. Thus, the wavelet analysis provides a clear distinction between the pathological and normal lung sound signals.

Fig. 3

Wavelet scalogram of breath signal: VS (a) inspiration, (b) expiration, (c) respiration; EW (d) inspiration, (e) expiration, (f) respiration

Normal as well as adventitious breath sounds are produced only by turbulent and vorticose airflow, which makes them complex and nonlinear. These non-stationary and nonlinear lung sounds are analysed using nonlinear time series analysis to unveil the hidden dynamics involved in respiration. The advantage of phase portrait analysis through the parameters m, τ, MLE, SE, Hb, and Db is that it offers good fidelity, as it details the multidimensional aspect of the lung sound signal and gives the correlation among the data points evolving in time. The multi-dimensional phase space representation, obtained from the estimated values of τ and m, contains all the information regarding the complexity of the VS and EW signals. Representative phase portraits of inspiration, expiration, and respiration for both VS and EW are shown in Fig. 4. The values of τ and m computed for plotting Fig. 4 are, for VS, (a) inspiration (τ = 37, m = 6), (b) expiration (τ = 39, m = 6), and (c) respiration (τ = 38, m = 7), and, for EW, (d) inspiration (τ = 22, m = 7), (e) expiration (τ = 40, m = 8), and (f) respiration (τ = 43, m = 7). On comparing the phase portraits of EW with those of VS, it can be seen that the randomness is lower in EW during inspiration, expiration, and respiration, indicating a chaotic nature. The randomness in the phase portrait of the VS signal can be attributed to the higher degree of freedom of air molecules passing through airways of larger diameter. Figures 4e and f substantiate the musical nature of the expiratory phase, a characteristic feature of the EW signal. A mathematical explanation for the phase portraits of respiration of VS and EW can be given through the Lyapunov exponent, which quantifies the rate of divergence or convergence of state-space trajectories. The MLE of the normal and abnormal signals is calculated and shown in the box-and-whisker plot in Fig. 5a. The higher mean MLE of VS (0.034) compared with that of EW (0.024) accounts for the faster divergence of the phase trajectories in VS, which is in agreement with Figs. 4c and f.

Fig. 4

Phase portrait of breath signal: VS (a) inspiration (τ = 37, m = 6), (b) expiration (τ = 39, m = 6), (c) respiration (τ = 38, m = 7); EW (d) inspiration (τ = 22, m = 7), (e) expiration (τ = 40, m = 8), (f) respiration (τ = 43, m = 7)

Fig. 5

Box-and-whisker plots for EW and VS signals of respiration: (a) Lyapunov exponent, (b) sample entropy, (c) fractal dimension, and (d) Hurst exponent

Sample entropy is one of the chaotic indices used to characterise the periodicity or irregularity of a time-series signal. The sample entropy values of the pseudo-periodic EW signals and transient VS signals are given in the box-and-whisker plot (Fig. 5b). The sample entropy (SE) of EW is 1.202, which is higher than that of the VS signal (0.792). The increase in SE for the EW signal is attributed to the increased complexity due to the fluid-flutter airflow dynamics. During expiration, when the air flows through narrowed airway tubes with obstructions in the form of mucus or tumours, the intra-airway pressure decreases, resulting in the collapse of the airway. When air is forced out through such a constricted region, musical wheezing sounds are produced. The higher velocity of the air molecules leaving the constricted region forms vortices and turbulence, which is responsible for the higher SE of the EW signals and thereby gives a picture of the nature of the constriction in the airways. The finite fractal structure of the lungs suggests that lung sounds exhibit a fractal nature. The fractal dimension (Db) of the VS and EW signals is calculated by the box-counting method, and the Db values are represented in the box-and-whisker plot in Fig. 5c. The higher mean Db of EW (1.850) indicates higher self-similarity and complexity of the time series data compared with VS (1.783). The higher complexity of the EW signal, like its higher SE, is due to the highly intense and persistent frequency component in the signal. The Hurst exponent obtained from the Db values gives information about the antipersistent nature of the signals, as both signals show values below 0.5. The box-and-whisker plot of the Hb values shown in Fig. 5d suggests more randomness in the VS signal, as its Hb value (0.219) is closer to 0.5 than that of EW (0.151).

Along with analysis, the classification of pathological and normal signals is also significant in medical diagnosis. PCA is one such feature extraction tool, which makes use of important features of a system, such as FFT, PSD, SE, MLE, Db, or Hb, to reduce the dimensionality of the system with minimal loss of information. In the present analysis, the feature selected for extraction is the mean of the PSD data from the FFT. The 26 features of each signal are subjected to PCA using the R software. Figure 6 displays the PCA biplot of the EW and VS signals. The horizontal axis of Fig. 6 gives the projections onto the first principal component (PC1), the direction of greatest variance in the data, while the vertical axis gives the projections onto the second principal component (PC2), the direction orthogonal to PC1. As the principal components are orthogonal, the projections onto them are uncorrelated. Together, PC1 and PC2 cover about 99.9% of the total variance of the original data set. As the spectral features of EW and VS are distinct, as elicited by the PCA, this opens the possibility of employing such spectral, fractal, and time series techniques in auscultation for diagnosing respiratory diseases.

Fig. 6
Principal component analysis for EW and VS signals of respiration

Conclusion

The present work discusses a novel approach of using powerful mathematical tools such as FFT, PSD, fractal, time series, and principal component analyses in auscultation for diagnosing respiratory diseases. Wheezes, the acoustic manifestations of obstructions and mucus secretions in the respiratory airways that indicate a pathological lung condition, are analysed. Thirty-five signals of EW and VS, when subjected to spectral analysis, revealed a clear difference in their duration, intensity, and number of frequency components. The expiratory nature of the wheezing is evident from the time spectrum and wavelet scalogram. The lower randomness in EW during inspiration, expiration, and respiration indicates its musical and chaotic nature, as evidenced by the respective phase portraits. The higher MLE and greater randomness in the phase portrait of the VS signal are attributed to the higher degree of freedom of air molecules passing through airways of larger diameter. The passage of air molecules through the constricted regions of the airways in EW accounts for the formation of vortices and turbulence, resulting in the higher value of SE. The higher self-similarity, complexity, and antipersistence of the EW time series compared with that of VS are revealed through its higher Db and lower Hb values. The PCA biplot obtained from the PSD features is a potential tool for classification. The paper presents a functional, cost-effective, and non-invasive tool for the safer and rapid detection of lung diseases. The study becomes more relevant in the context of the COVID-19 pandemic, as it suggests a novel approach to breath sound signal analysis through nonlinear time series parameters that reflect the correlation between data points evolving in time. Thus, the work reported in the paper attempts to kindle the minds of researchers striving to develop novel digital auscultation techniques integrating the principles of mathematics and statistics.