Glottal Activity Detection from the Speech Signal Using Multifractal Analysis


Abstract

This work proposes a novel method for detecting glottal activity regions in the speech signal. Glottal activity detection refers to the problem of discriminating voiced from unvoiced segments of the speech signal, a fundamental step in the workflow of many speech processing applications. Most existing approaches to voiced/unvoiced detection are based on linear measures, even though speech is produced by an underlying nonlinear process. The present work addresses the problem from a nonlinear perspective, using the framework of multifractal analysis. Glottal activity is characterized through the fractal properties of the speech signal during the production of voiced and unvoiced sounds. Specifically, the Hurst exponent is computed by evaluating the scaling property of fluctuations present in the speech signal. Experimental analysis shows that the Hurst exponent varies consistently with the dynamics of glottal activity. The performance of the proposed method has been evaluated on the CMU-arctic, Keele and KED-Timit databases, which include simultaneous electroglottogram signals. Experimental results show that the average detection accuracy and error rate of the proposed method are comparable to those of the best performing algorithm on clean speech signals. In addition, an evaluation of robustness to noise degradation shows results comparable to other methods for signal-to-noise ratios above 10 dB for white noise and above 20 dB for babble noise.


References

  1. O.A. Adeyemi, Multifractal Analysis of Unvoiced Speech Signals, Ph.D. dissertation, University of Rhode Island, USA, 1997

  2. N. Adiga, S.R.M. Prasanna, Detection of glottal activity using different attributes of source information. IEEE Signal Process. Lett. 22(11), 2107–2111 (2015)

  3. N. Adiga, B.K. Khonglah, S.R.M. Prasanna, Improved voicing decision using glottal activity features for statistical parametric speech synthesis. Digit. Signal Process. 71, 131–143 (2017)

  4. G. Aneeja, B. Yegnanarayana, Single frequency filtering approach for discriminating speech and nonspeech. IEEE Trans. Audio Speech Lang. Process. 23(4), 705–717 (2015)

  5. D. Arifianto, Dual parameters for voiced-unvoiced speech signal determination, in Proceedings of ICASSP, vol. 4 (2007), pp. 749–752

  6. B.S. Atal, L.R. Rabiner, A pattern recognition approach to voiced–unvoiced–silence classification with applications to speech recognition. IEEE Trans. Acoust. Speech Signal Process. 24(3), 201–212 (1976)

  7. A. Benyassine, E. Shlomot, H. Su, D. Massaloux, C. Lamblin, J. Petit, ITU-T Recommendation G.729 Annex B: a silence compression scheme for use with G.729 optimized for V.70 digital simultaneous voice and data applications. IEEE Commun. Mag. 35(9), 64–73 (1997)

  8. S. Bhaduri, D. Ghosh, Speech, music and multifractality. Curr. Sci. 110(9), 1817–1822 (2016)

  9. S. Bhaduri, A. Chakraborty, D. Ghosh, Speech emotion quantification with chaos-based modified visibility graph-possible precursor of suicidal tendency. J. Neurol. Neurosci. 7(3), 1–7 (2016)

  10. A.W. Black, KED-Timit database (2002). http://festvox.org/dbs/dbs_kdt.html. Accessed 14 Oct 2018

  11. A.W. Black, P. Taylor, R. Caley, The Festival speech synthesis system (2014). http://www.cstr.ed.ac.uk/projects/festival/. Accessed 06 June 2019

  12. N. Dhananjaya, B. Yegnanarayana, Voiced/nonvoiced detection based on robustness of voiced epochs. IEEE Signal Process. Lett. 17(3), 273–276 (2010)

  13. T. Drugman, A. Alwan, Joint robust voicing detection and pitch estimation based on residual harmonics, in Proceedings of Interspeech (2011), pp. 1973–1976

  14. T. Drugman, P. Alku, A. Alwan, B. Yegnanarayana, Glottal source processing: from analysis to applications. Comput. Speech Lang. 28(5), 1117–1138 (2014)

  15. D. Enqing, L. Guizhong, Z. Yatong, Z. Xiaodi, Applying support vector machines to voice activity detection, in Proceedings of International Conference Signal Processing (2002), pp. 1124–1127

  16. D.C. Gonzalez, L.L. Ling, F. Violaro, Analysis of the multifractal nature of speech signals, in Proceedings of CIARP, LNCS, vol. 7441, ed. by L. Alvarez et al. (Springer, Berlin, 2012)

  17. D. Govind, S.R.M. Prasanna, B. Yegnanarayana, Significance of glottal activity detection for duration modification, in Proceedings of Speech Prosody (2012), pp. 470–473

  18. N. Henrich, C. d’Alessandro, B. Doval, M. Castellengo, On the use of the derivative of electroglottographic signals for characterization of nonpathological phonation. J. Acoust. Soc. Am. 115(3), 1321–1332 (2004)

  19. R.S. Holambe, M.S. Deshpande, chap. 2, in Nonlinearity Framework in Speech Processing (Springer, Boston, 2012), pp. 11–25

  20. H.E. Hurst, Long-term storage capacity of reservoirs. Trans. Am. Soc. Civ. Eng. 116, 770–799 (1951)

  21. E.A.F. Ihlen, Introduction to multi-fractal detrended fluctuation analysis in Matlab. Front. Physiol. 3(141), 1–18 (2012)

  22. K. Itoh, M. Mizushima, Environmental noise reduction based on speech/non-speech identification for hearing aids, in Proceedings of ICASSP, vol. 1 (1997), pp. 419–422

  23. L. Janer, J.J. Bonet, E. Lleida-Solano, Pitch detection and voiced/unvoiced decision algorithm based on wavelet transforms, in Proceedings of IEEE International Conference on Spoken Language Processing (1996), pp. 1209–1212

  24. J.W. Kantelhardt, S.A. Zschiegner, E. Koscielny-Bunde, S. Havlin, A. Bunde, H.E. Stanley, Multifractal detrended fluctuation analysis of nonstationary time series. Phys. A 316, 87–114 (2002)

  25. J. Kominek, A. Black, CMU-arctic speech databases, in Proceedings of ISCA Speech Synthesis Workshop (2004), pp. 223–224

  26. A.I. Koutrouvelis, G.P. Kafentzis, N.D. Gaubitch, R. Heusdens, A fast method for high-resolution voiced/unvoiced detection and glottal closure/opening instant estimation of speech. IEEE Trans. Audio Speech Lang. Process. 24(2), 316–328 (2016)

  27. G.J. Lal, E.A. Gopalakrishnan, D. Govind, Accurate estimation of glottal closure instants and glottal opening instants from electroglottographic signal using variational mode decomposition. Circuits Syst. Signal Process. 37(2), 810–830 (2018)

  28. G.J. Lal, E.A. Gopalakrishnan, D. Govind, Epoch estimation from emotional speech signals using variational mode decomposition. Circuits Syst. Signal Process. 37(8), 3245–3274 (2018)

  29. H. Liu, W. Zhang, Mandarin emotion recognition based on multifractal theory towards human–robot interaction, in Proceedings of International Conference on Robotics and Biomimetics (2013), pp. 593–598

  30. B.B. Mandelbrot, A multifractal walk down Wall Street. Sci. Am. 280(2), 70–73 (1999)

  31. K.S.R. Murty, B. Yegnanarayana, M. Anand Joseph, Characterization of glottal activity from speech signals. IEEE Signal Process. Lett. 16(6), 469–472 (2009)

  32. V. Nair, Role of intermittency in the onset of combustion instability, Ph.D. thesis, Indian Institute of Technology Madras, India, 2014

  33. V. Nair, R.I. Sujith, Multifractality in combustion noise: predicting an impending combustion instability. J. Fluid Mech. 747, 635–655 (2014)

  34. T. Ng, B. Zhang, L. Nguyen, S. Matsoukas, X. Zhou, N. Mesgarani, K. Vesely, P. Matejka, Developing a speech activity detection system for the DARPA RATS program, in Proceedings of Interspeech (2012), pp. 1–4

  35. A. Pandey, R.K. Das, N. Adiga, N. Gupta, S.R.M. Prasanna, Significance of glottal activity detection for speaker verification in degraded and limited data condition, in Proceedings of TENCON (2015), pp. 1–6

  36. C.K. Peng, S. Havlin, H.E. Stanley, A.L. Goldberger, Quantification of scaling exponents and crossover phenomena in nonstationary heartbeat time series. Chaos Interdiscip. J. Nonlinear Sci. 5, 82–87 (1995)

  37. F. Plante, G.F. Meyer, W.A. Ainsworth, A pitch extraction reference database, in Proceedings of Eurospeech (1995), pp. 837–840

  38. A.P. Prathosh, T.V. Ananthapadmanabha, A.G. Ramakrishnan, Epoch extraction based on integrated linear prediction residual using plosion index. IEEE Trans. Audio Speech Lang. Process. 21(12), 2471–2480 (2013)

  39. F. Qi, C. Bao, Y. Liu, A novel two-step SVM classifier for voiced/unvoiced/silence classification of speech, in Proceedings of International Symposium on Chinese Spoken Language Processing (2004), pp. 77–80

  40. T.F. Quatieri, Discrete-Time Speech Signal Processing: Principles and Practice (Prentice-Hall, Upper Saddle River, 2002)

  41. T. Schreiber, A. Schmitz, Surrogate time series. Phys. D 142, 346–382 (2000)

  42. J.K. Shah, A.N. Iyer, B.Y. Smolenski, R.E. Yantorno, Robust voiced/unvoiced classification using novel features and Gaussian mixture model, in Proceedings of ICASSP (2004), pp. 1–4

  43. C. Shahnaz, W. Zhu, M.O. Ahmad, A multifeature voiced/unvoiced decision algorithm for noisy speech, in Proceedings of IEEE International Symposium on Circuits and Systems (2006), pp. 2525–2528

  44. J. Sohn, N.S. Kim, W. Sung, A statistical model-based voice activity detection. IEEE Signal Process. Lett. 6(1), 1–3 (1999)

  45. S.H. Strogatz, Nonlinear Dynamics and Chaos: With Applications to Physics, Biology, Chemistry and Engineering (Westview Press, Boulder, 2000)

  46. F. Takens, Detecting strange attractors in turbulence, in Lecture Notes in Mathematics, vol. 898 (1981), pp. 366–381

  47. D. Talkin, A robust algorithm for pitch tracking (RAPT), in Speech Coding and Synthesis (Elsevier, Amsterdam, 1995), pp. 495–518

  48. J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, J. Farmer, Testing for nonlinearity in time series: the method of surrogate data. Phys. D 58, 77–94 (1992)

  49. M.R.P. Thomas, P.A. Naylor, The sigma algorithm: a glottal activity detector for electroglottographic signals. IEEE Trans. Audio Speech Lang. Process. 17, 1557–1566 (2009)

  50. D. Vlaj, B. Kotnik, B. Horvat, Z. Kacic, A computationally efficient mel-filter bank VAD algorithm for distributed speech recognition systems. EURASIP J. Adv. Signal Process. 4, 487–497 (2005)

  51. A. Varga, H.J. Steeneken, Assessment for automatic speech recognition. NOISEX-92: a database and an experiment to study the effect of additive noise on speech recognition systems. Speech Commun. 12(3), 247–251 (1993)

  52. Z. Zhang, Mechanics of human voice production and control. J. Acoust. Soc. Am. 140(4), 2614–2635 (2016)

  53. X.L. Zhang, J. Wu, Deep belief networks based voice activity detection. IEEE Trans. Audio Speech Lang. Process. 21(4), 697–710 (2013)

  54. H. Zhao, S. He, Analysis of speech signals characteristics based on MF-DFA with moving overlapping windows. Phys. A 442, 343–349 (2016)


Acknowledgements

The authors gratefully acknowledge Amrita Vishwa Vidyapeetham for the generous funding provided to the first author in pursuing his Ph.D. We also thank Dr. Vineeth Nair (IIT Bombay), whose Ph.D. thesis gave us a better understanding of MFDFA.

Author information

Correspondence to E. A. Gopalakrishnan.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

A Surrogate Test and Relevance of Fractal Analysis

The surrogate test examines the validity of a null hypothesis \(({H_0})\) formulated for the original time series; \({H_0}\) is formulated based on the presumed origin of the time series under investigation. Surrogates for the original time series are created by a surrogate algorithm that preserves the amplitude distribution, autocorrelation, local mean and variance of the original data. A discriminating statistic is then measured for the original data \(({T_\mathrm{o}})\) and for each of its surrogates \(({T_\mathrm{s}})\), from which the distribution of \({T_\mathrm{s}}\) can be estimated. If \({T_\mathrm{o}}\) differs significantly from the distribution of \({T_\mathrm{s}}\), the null hypothesis \({H_0}\) can be rejected. The rejection is carried out probabilistically at a specified significance level, determined by the number of surrogate data sets as \(p = \frac{2}{n_\mathrm{s} + 1}\), where \(n_\mathrm{s}\) is the total number of surrogates. The measure of significance for rejection is given by

$$\begin{aligned} t = \frac{{\left| {{T_\mathrm{o}} - \mathrm{mean}({T_\mathrm{s}})} \right| }}{{{\sigma _\mathrm{s}}}} \end{aligned}$$
(10)

where \({{\sigma _\mathrm{s}}}\) is the standard deviation of the distribution for surrogates.
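
As a concrete illustration (ours, not the authors' code), the rejection criterion built on Eq. (10) can be computed as in the following Python sketch; the helper name `significance` is hypothetical.

```python
import numpy as np

def significance(t_original, t_surrogates):
    """Measure of significance, Eq. (10): distance of the original
    statistic from the mean of the surrogate distribution, in units
    of the surrogate standard deviation."""
    t_surrogates = np.asarray(t_surrogates, dtype=float)
    return abs(t_original - t_surrogates.mean()) / t_surrogates.std(ddof=1)

# With n_s = 99 surrogates, the significance level is
# p = 2 / (n_s + 1) = 0.02; H0 is rejected in this work when t > 2.
```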

The techniques generally employed for generating surrogate data are random permutation (RP), Fourier transform (FT) and amplitude-adjusted Fourier transform (AAFT) [41, 48]. RP shuffles the data, destroying any linear correlations present in the original data while preserving its amplitude distribution. The FT surrogate is based on the null hypothesis that the data originate from a linear Gaussian process; it maintains the amplitude spectrum while randomizing the phases. The AAFT algorithm is based on the null hypothesis that the original data are a monotonic nonlinear transformation of a linear Gaussian process: the original data are first rescaled to a Gaussian distribution, FT surrogates are generated, and the result is rescaled back to the amplitude distribution of the original data. The choice of discriminating statistic is also crucial in surrogate analysis: it must be sensitive to the property destroyed by the surrogate algorithm, not one preserved under its null hypothesis. For example, the mean is preserved by random permutation, so selecting it as the discriminating statistic would never reject the corresponding null hypothesis. In this work, we use the HE estimated using MFDFA as the discriminating statistic.
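
A minimal sketch of the three surrogate generators, under our own simplifications (the function names are ours, and fine points such as end-matching are omitted):

```python
import numpy as np

rng = np.random.default_rng(0)

def rp_surrogate(x):
    """Random permutation: destroys temporal correlations while
    preserving the amplitude distribution."""
    return rng.permutation(x)

def ft_surrogate(x):
    """Fourier transform: keeps the amplitude spectrum, randomizes
    the phases (null hypothesis: linear Gaussian process)."""
    spectrum = np.fft.rfft(x)
    phases = rng.uniform(0.0, 2.0 * np.pi, spectrum.shape)
    phases[0] = 0.0                       # keep the DC term real
    if len(x) % 2 == 0:
        phases[-1] = 0.0                  # keep the Nyquist term real
    return np.fft.irfft(np.abs(spectrum) * np.exp(1j * phases), n=len(x))

def aaft_surrogate(x):
    """Amplitude-adjusted FT: rescale to a Gaussian, phase-randomize,
    then rescale back to the original amplitude distribution."""
    ranks = np.argsort(np.argsort(x))
    gaussianized = np.sort(rng.standard_normal(len(x)))[ranks]
    randomized = ft_surrogate(gaussianized)
    return np.sort(x)[np.argsort(np.argsort(randomized))]
```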

We use an ensemble of 99 surrogates of the same length as the original speech signal (‘kdt001’ from KED-Timit) for each surrogate method. Figure 11 shows the distribution of the discriminating statistic for the surrogates \(({T_\mathrm{s}})\) and the value for the original speech signal \(({T_\mathrm{o}})\), together with the corresponding measure of significance t for each surrogate test. Here, \({H_0}\) for a surrogate method is rejected if the measure of significance (t) is greater than 2. Figure 11a shows that the null hypothesis governing RP surrogates is rejected, revealing temporal correlation in the speech signal. Such a correlated time series is then tested using FT surrogates and, if FT is also rejected, the analysis continues with AAFT surrogates. We observe that \({H_0}\) governing FT and AAFT are also rejected (Fig. 11b, c). The rejection of \({H_0}\) for RP, FT and AAFT indicates a nonlinear process underlying the production of the speech signal. We further verified the results of the surrogate test in two ways. First, a voiced segment from the same utterance was simulated using LP coefficients and an impulse train as excitation. The LP coefficients were obtained by LP analysis (of order 12) of the original speech segment, using a frame size of 20 ms and a frame shift of 10 ms. The simulated speech output was tested using RP and FT surrogates; we find that the \({H_0}\) governing RP is rejected while FT is accepted (Fig. 12). Second, a synthetic version of the aforesaid Timit utterance was generated using the ‘HTS-2005’ online tool [11], which uses a source–filter model. Again, we observe that \({H_0}\) governing FT is accepted (Fig. 13b). This indicates a linear process underlying the production of both the simulated (LP-based) and HTS synthetic speech signals. Thus, we conjecture that the source–filter model gives only a linear approximation of the speech signal, whereas the actual speech production process involves nonlinear interaction of the subsystems from lungs to lips. In this context, a nonlinear technique such as fractal analysis is an appropriate tool for the analysis of speech signals.
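
For reference, the LP-based resynthesis used in the first verification step can be sketched as follows. This is our single-frame illustration, not the authors' code: `librosa` is assumed for the LP analysis, and the sampling rate, pitch and RMS gain matching are our own choices.

```python
import numpy as np
import librosa
from scipy.signal import lfilter

FS = 16000   # assumed sampling rate (Hz)
F0 = 120     # assumed pitch for the impulse-train excitation (Hz)

def simulate_voiced_frame(frame, order=12):
    """Resynthesize one 20-ms voiced frame from its LP coefficients,
    driven by an impulse train. (For unvoiced frames, white noise
    would be used as the excitation instead.)"""
    a = librosa.lpc(frame, order=order)      # [1, a1, ..., a_order]
    excitation = np.zeros(len(frame))
    excitation[::FS // F0] = 1.0             # impulses at the pitch period
    out = lfilter([1.0], a, excitation)      # all-pole synthesis filter
    return out * np.sqrt(np.mean(frame**2) / np.mean(out**2))  # match RMS
```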

Fig. 11

Surrogate test for the speech signal ‘kdt001’ from KED-Timit database. Distribution of HE for the surrogates generated using a RP, b FT and c AAFT. The statistic for the original utterance is shown as a vertical dashed line (red). The measure of significance t is indicated on top right for each method (Color figure online)

Fig. 12

Surrogate test for the simulated speech signal using LP analysis and impulse excitation. Distribution of HE for the surrogates generated using a RP and b FT. The statistic for the original utterance is shown as a vertical dashed line (red). The measure of significance t is indicated on top right for each method (Color figure online)

Fig. 13

Surrogate test for the synthetic version of ‘kdt001’ speech signal generated using HTS-2005, which uses source–filter model. Distribution of HE for the surrogates generated using a RP and b FT. The statistic for the original utterance is shown as a vertical dashed line (red). The measure of significance t is indicated on top right for each method (Color figure online)

To investigate the relevance of fractal analysis in characterizing GA/non-GA regions of the speech signal, we conducted the following experiment. First, we took the aforesaid simulated voiced speech (/aa/) and its original version and performed MFDFA on both segments. The multifractal spectrum obtained for each segment is shown in Fig. 14c; the spectra of both segments are clustered near zero. Second, we simulated an unvoiced speech segment (/s/) from the same Timit utterance using LP analysis and random noise excitation, and repeated the fractal analysis on the simulated unvoiced speech and its original version. Figure 15c depicts the resulting multifractal spectra. From the plot, we can infer that the original unvoiced speech corresponds to a large inverted arc (indicating a strong multifractal nature), whereas the spectrum of the simulated unvoiced speech corresponds to a small arc (indicating an approximately monofractal nature). In short, the multifractal nature of the unvoiced speech is lost in the LP analysis. This is due to the inability of LP analysis to capture source–filter interaction: LP analysis relies on the assumption that source and filter are decoupled and that the excitation source has no influence on the vocal tract system parameters. In the actual speech production process, however, the excitation source can influence the vocal tract system parameters, contributing to the spatial and temporal variations in the speech signal that give rise to its multifractal nature.
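
For orientation, a bare-bones MFDFA estimate of the generalized Hurst exponent \(h(q)\), following Kantelhardt et al. [24], is sketched below. This is our simplified illustration: it omits the special case \(q = 0\) and the customary second pass over the reversed series. \(h(2)\) is the HE used in this work; sweeping q over a range of values and applying the Legendre transform to h(q) yields singularity spectra of the kind plotted in Figs. 14c and 15c.

```python
import numpy as np

def generalized_hurst(x, scales, q=2.0, m=1):
    """Estimate h(q) by MFDFA: build the profile, detrend it piecewise
    with an order-m polynomial at each scale, form the q-th order
    fluctuation function, and fit a log-log slope."""
    profile = np.cumsum(x - np.mean(x))
    fq = []
    for s in scales:
        n_seg = len(profile) // s
        t = np.arange(s)
        f2 = np.empty(n_seg)
        for v in range(n_seg):
            seg = profile[v * s:(v + 1) * s]
            trend = np.polyval(np.polyfit(t, seg, m), t)
            f2[v] = np.mean((seg - trend) ** 2)
        fq.append(np.mean(f2 ** (q / 2.0)) ** (1.0 / q))
    # h(q) is the slope of log F_q(s) versus log s
    return np.polyfit(np.log(scales), np.log(fq), 1)[0]
```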

Fig. 14

Multifractal analysis on original and simulated voiced speech signal. a Original voiced speech signal, corresponding b simulated version, c multifractal spectrum

Fig. 15

Multifractal analysis on original and simulated unvoiced speech signal. a Original unvoiced speech signal, corresponding b simulated version, c multifractal spectrum

Thus, we can conclude that the simulated voiced speech preserves the fractal nature of its original version, whereas the simulated unvoiced speech does not capture the spatial and temporal variations (multifractal nature) of the original. Therefore, characterization/detection of GA/non-GA (voiced/unvoiced) regions based on multifractal analysis may not be effective for speech simulated from LP coefficients with impulse/random noise excitation.

B A Comparative Study of Different Ground-Truth GA Estimation Methods

The ground-truth GA regions of the speech signals from the CMU-arctic and KED-Timit databases can be estimated from the corresponding EGG signals. Here, we investigated the performance of three state-of-the-art methods in estimating ground-truth GA regions from the EGG signal: the ZFF method, the SIGMA algorithm and the VMD method. The ZFF method applies a simple threshold to the SoE at the epochs estimated from the EGG to identify the reference GA regions: an epoch with SoE greater than one percent of the maximum SoE is considered part of a ground-truth GA region [12]. In the SIGMA algorithm, the boundaries of GA regions are estimated from the initial and final instants of glottal closure [26]; if the distance between consecutive GA regions is greater than twice the maximum pitch period, the corresponding region is marked as a non-GA region. The VMD method relies on estimating epochs from the mode that oscillates close to the fundamental frequency of the speech signal [27, 28]; within GA regions, the method estimates epochs as the positive zero crossings of the selected mode.
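
The ZFF-based rule of [12] is simple enough to state in code. The sketch below is our illustration and assumes the epoch locations and SoE values have already been extracted from the EGG signal by zero-frequency filtering:

```python
import numpy as np

def ga_epochs_from_soe(epoch_indices, soe, ratio=0.01):
    """Keep only epochs whose strength of excitation (SoE) exceeds
    1% of the maximum SoE in the utterance; the retained epochs
    delimit the ground-truth GA regions [12]."""
    soe = np.asarray(soe, dtype=float)
    mask = soe > ratio * soe.max()
    return np.asarray(epoch_indices)[mask]
```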

Fig. 16

Comparison of estimated ground-truth GA regions using different methods. a EGG signal with actual GA regions indicated using a dashed blue line, b DEGG signal with estimated epochs from SIGMA marked using ‘\(*\),’ c SoE obtained from the ZFF signal, d VMD output signal with estimated epochs marked using ‘\(*\).’ The estimated GA regions are shown as a thick red line in b–d (Color figure online)

Figure 16 demonstrates GAD from the EGG signal (taken from the Keele database, which provides proper ground truth) using the aforementioned methods. The GA/non-GA regions detected by these methods are marked with a thick red line on the corresponding output signals (Fig. 16b–d), and the reference GA/non-GA regions are shown in Fig. 16a using a dashed blue line. Visual inspection of Fig. 16c, d shows that the VMD and ZFF methods outperform SIGMA in estimating GA regions, especially at voicing offsets (the region indicated by the dashed circle). One can therefore choose either the ZFF method or the VMD method for estimating ground-truth GA regions. To confirm this, we evaluate the proposed method (and the other state-of-the-art methods) against ground-truth GA regions obtained from both the ZFF and VMD methods. The evaluation is restricted to one test database (CMU-arctic), since a comprehensive study of ground-truth GAD algorithms is beyond the scope of the present work. Specifically, we compute the \( VUV _\mathrm{E}\) for the detection of GA regions from the speech signal with respect to each of the two references. Figure 17a–c shows the \( VUV _\mathrm{E}\) obtained for each method (with respect to the two references) on the BDL, SLT and JMK databases, respectively. The results show that the \( VUV _\mathrm{E}\) is nearly equal for the two references; hence, the choice of method for estimating the ground-truth GA regions is not critical.

Fig. 17

A comparative evaluation of the performance of the proposed method and state-of-the-art methods in GAD from speech signal using the ZFF- and VMD-based ground-truth references. Evaluation on a BDL, b SLT and c JMK databases. I—ZFF method, II—SRH method, III—GEFBA, IV—proposed method

C Phase Space Reconstruction

In practice, not all variables defining the state of a complex system are measurable. In such a scenario, the dynamics of the system can be visualized by reconstructing the phase space from a single measured variable or time series [33]. The reconstructed phase space is topologically equivalent to the original one, and variations in the dynamics are reflected in its structure. We therefore employ phase space reconstruction to visualize the dynamics of the speech production system during the production of voiced and unvoiced speech sounds. We use the time delay embedding technique [46], in which the speech data are converted into an ensemble of delay vectors, each corresponding to a state in the reconstructed phase space. For a faithful reconstruction, the embedding parameters, the optimum time delay \(\tau \) and the least embedding dimension \(d_0\), must be determined properly: \(\tau \) is obtained as the first minimum of the average mutual information (AMI), and \(d_0\) is determined using the false nearest neighbors (FNN) technique.
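
The delay embedding itself reduces to a few lines. The sketch below is our illustration; the fixed values of dim and tau are placeholders standing in for the FNN and AMI estimates.

```python
import numpy as np

def delay_embed(x, dim, tau):
    """Time-delay embedding [46]: row n of the output is the state
    [x(n), x(n + tau), ..., x(n + (dim - 1) * tau)]."""
    n_states = len(x) - (dim - 1) * tau
    return np.column_stack([x[i * tau:i * tau + n_states] for i in range(dim)])

# Placeholder parameters; in practice tau comes from the first minimum
# of the AMI and dim from the FNN criterion, per segment.
states = delay_embed(np.random.randn(2000), dim=3, tau=10)
```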

To demonstrate the difference in complexity of the phase space for GA and non-GA regions, we take the voiced and unvoiced segments shown in Fig. 4 and compute the optimum time delay \({\tau }\) and minimum embedding dimension \({d_0}\) for each segment separately. Figure 18b, d shows the reconstructed phase space for the voiced (Fig. 18a) and unvoiced (Fig. 18c) segments, respectively. Visual inspection of Fig. 18b shows that the phase space of the voiced segment exhibits circular patterns (closed trajectories) representing periodic/quasiperiodic oscillations, which arise from the periodic/quasiperiodic vibration of the vocal folds during the production of voiced speech sounds. In contrast, the attractor for the unvoiced sound (Fig. 18d) is expanded in all directions, indicating irregular fluctuations in the airflow. The reconstructed phase space thus reveals that voiced and unvoiced sounds possess different multifractal structures and can be characterized by multifractal measures.

Fig. 18

Demonstration of the reconstructed phase space of the voiced and unvoiced segment of a speech signal. a, b A voiced speech segment and corresponding reconstructed phase space, c, d an unvoiced speech segment and corresponding reconstructed phase space

D Phoneme Category of Sounds

Table 7 gives the phoneme category of the sounds mentioned in Figs. 3, 4, 8 and 9.

Table 7 Phoneme category of sounds


Cite this article

Jyothish Lal, G., Gopalakrishnan, E.A. & Govind, D. Glottal Activity Detection from the Speech Signal Using Multifractal Analysis. Circuits Syst Signal Process 39, 2118–2150 (2020). https://doi.org/10.1007/s00034-019-01253-4
