
Relative occurrences and difference of extrema for detection of transitions between broad phonetic classes

Sādhanā

Abstract

Detection of transitions between broad phonetic classes in a speech signal has applications such as landmark detection and segmentation. The proposed hierarchical method detects silence-to-non-silence and sonorant-to-non-sonorant transitions, as well as the reverse transitions. From each frame of the bandpass-filtered speech signal, the method selects the extrema (minimum- or maximum-amplitude samples) occurring between each pair of successive zero-crossings, retaining only those above a threshold. If the frame belongs to a non-transition segment, the first and last of these extrema lie far from the mid-point of the frame (the reference), one on either side; otherwise, one of them lies within a few samples of the reference, indicating a transition frame. Transitions are detected over the entire TIMIT database for clean speech, and 93.6% of them lie within a tolerance of 20 ms of the phone boundaries. With the labelled TIMIT database as reference, sonorant, unvoiced non-sonorant and silence classes, and their respective onsets, are detected with an accuracy of about 83.5% for the same tolerance. The results are as good as, and in some respects better than, those of state-of-the-art methods for similar tasks. The method is also tested for robustness on the test set of the TIMIT database with white, babble and Schroeder noise; at a signal-to-noise ratio of 5 dB, about 90% of the transitions are detected within a tolerance of 20 ms. On the NTIMIT database, 62.7% of the transitions and 63.5% of the sonorant onsets are detected within a tolerance of 20 ms.
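The frame-level test described in the abstract can be sketched as follows. This is a minimal Python sketch under stated assumptions, not the authors' implementation: it assumes each frame has already been bandpass-filtered (the passband is not given here), and the function names, the amplitude `threshold`, the mid-point tolerance `delta` (in samples) and the `hit_rate` scoring rule are all hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence


def zero_crossings(x: Sequence[float]) -> List[int]:
    """Indices i at which the sign of x changes between x[i-1] and x[i].

    Zero is treated as non-negative (an assumption of this sketch)."""
    return [i for i in range(1, len(x)) if (x[i - 1] >= 0) != (x[i] >= 0)]


def extrema_between_crossings(x: Sequence[float], threshold: float) -> List[int]:
    """Locations of the extremum between each pair of successive zero-crossings,
    kept only when its magnitude exceeds `threshold` (a hypothetical parameter)."""
    zc = zero_crossings(x)
    locations = []
    for start, stop in zip(zc, zc[1:]):
        # Samples start..stop-1 share one sign, so the largest |x| is the
        # maximum of a positive lobe or the minimum of a negative lobe.
        peak = max(range(start, stop), key=lambda i: abs(x[i]))
        if abs(x[peak]) > threshold:
            locations.append(peak)
    return locations


def is_transition_frame(frame: Sequence[float], threshold: float, delta: int) -> bool:
    """Flag a frame as a transition if the first or last retained extremum lies
    within `delta` samples (hypothetical tolerance) of the frame mid-point."""
    locations = extrema_between_crossings(frame, threshold)
    if not locations:
        return False  # nothing above threshold: treated here as a silence frame
    reference = len(frame) // 2
    first, last = locations[0], locations[-1]
    return abs(first - reference) <= delta or abs(last - reference) <= delta


def hit_rate(detected_s: Sequence[float], reference_s: Sequence[float],
             tolerance_s: float = 0.020) -> float:
    """Assumed scoring rule: fraction of reference boundaries (in seconds)
    that have at least one detected transition within the tolerance."""
    if not reference_s:
        return 0.0
    hits = sum(1 for r in reference_s
               if any(abs(d - r) <= tolerance_s for d in detected_s))
    return hits / len(reference_s)


if __name__ == "__main__":
    # Synthetic frames: a tone offset by half a sample so no sample is exactly zero.
    tone = [math.sin(2 * math.pi * (k + 0.5) / 20) for k in range(200)]
    onset = [0.0] * 100 + tone[:100]  # silence, then the tone starts mid-frame
    print(is_transition_frame(tone, threshold=0.5, delta=20))   # → False
    print(is_transition_frame(onset, threshold=0.5, delta=20))  # → True
```

For the steady tone, the first and last retained extrema fall near samples 15 and 185 of a 200-sample frame, far from the mid-point at 100, so the frame is not flagged. In the onset frame, the first extremum above threshold lies about 15 samples after the mid-point, so the frame is flagged as a transition. Note that this sketch skips any lobe not bounded by two zero-crossings, such as the first half-cycle after exact digital silence; real recordings contain low-level noise whose lobes fall below the threshold instead.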




Correspondence to K V Vijay Girish.


Cite this article

Ananthapadmanabha, T.V., Vijay Girish, K.V. & Ramakrishnan, A.G. Relative occurrences and difference of extrema for detection of transitions between broad phonetic classes. Sādhanā 43, 153 (2018). https://doi.org/10.1007/s12046-018-0923-x


  • DOI: https://doi.org/10.1007/s12046-018-0923-x
