Relative occurrences and difference of extrema for detection of transitions between broad phonetic classes

Ananthapadmanabha, T V; Vijay Girish, K V; Ramakrishnan, A G

doi:10.1007/s12046-018-0923-x

Relative occurrences and difference of extrema for detection of transitions between broad phonetic classes

Published: 04 August 2018

Volume 43, article number 153, (2018)
Cite this article

Sādhanā Aims and scope Submit manuscript

T V Ananthapadmanabha¹,
K V Vijay Girish ORCID: orcid.org/0000-0002-3332-1500² &
A G Ramakrishnan²

90 Accesses
Explore all metrics

Abstract

Detection of transitions between broad phonetic classes in a speech signal has applications such as landmark detection and segmentation. The proposed hierarchical method detects silence to non-silence transitions, sonorant to non-sonorant transitions and vice-versa. The subset of the extrema (minimum or maximum amplitude samples) above a threshold, occurring between every pair of successive zero-crossings, is selected from each frame of the bandpass-filtered speech signal. Locations of the first and the last extrema lie on either side far away from the mid-point (reference) of a frame, if the speech signal belongs to a non-transition segment; else, one of these locations lies within a few samples from the reference, indicating a transition frame. The transitions are detected from the entire TIMIT database for clean speech and 93.6% of them are within a tolerance of 20 ms from the phone boundaries. Sonorant, unvoiced non-sonorant and silence classes and their respective onsets are detected with an accuracy of about 83.5% for the same tolerance with respect to the labelled TIMIT database as reference. The results are as good as, and in some aspects better than, the state-of-the-art methods for similar tasks. The proposed method is also tested on the test set of the TIMIT database for robustness with respect to white, babble and Schroeder noise, and about 90% of the transitions are detected within a tolerance of 20 ms at the signal to noise ratio of 5 dB. On NTIMIT database, 62.7% of the transitions are detected, and 63.5% of the sonorant onsets, within 20 ms tolerance.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Phoneme sequence recognition via DTW-based classification

Article 19 October 2015

Segmentation of the period of the fundamental tone of a voice source

Article 01 March 2016

Improvements in the Detection of Vowel Onset and Offset Points in a Speech Sequence

Article 08 September 2016

References

Fant G 2003 Speech sounds and features. Cambridge, MA: The MIT Press (Chapter 2)
Hasegawa J M, Baker J, Borys S, Chen K, Coogan E, Greenberg S, Juneja A, Kirchhoff K, Livescu K, Mohan S, Muller J, Sonmez K and Wang T 2005 Landmark-based Speech Recognition: Report of the 2004 Johns Hopkins Summer Workshop
SaiJayram A K V, Ramasubramanian V and Sreenivas T V 2002 Robust parameters for automatic segmentation of speech. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. I-513–I-516
van Hemert J P 1991 Automatic segmentation of speech. IEEE Trans. Signal Process. 39: 1008–1012
Article Google Scholar
Muralishankar R, Srikanth R and Ramakrishnan A G 2003 Subspace and hypothesis based effective segmentation of co-articulated basic-units for concatenative speech synthesis. In: Proceedings of IEEE TENCON, October 15–17, Bangalore, vol. 1, pp. 388–392
Obrecht R A 1986 Automatic segmentation of continuous speech signals. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 2275–2278
Svendsen T and Soong F K 1987 On the automatic segmentation of speech signals. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 77–80
Sarkar A and Sreenivas T V 2005 Automatic speech segmentation using average level crossing rate information. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 397–400
Ananthakrishnan G, Ranjani H G and Ramakrishnan A G 2006 Language independent automated segmentation of speech using Bach scale filter-banks. In: Proceedings of the IV International Conference on Intelligent Sensing and Information Processing, pp. 115–120
Jakobson R, Fant G and Halle M 1952 Preliminaries to speech analysis: the distinctive features and their correlates. Cambridge, MA: The MIT Press
Chomsky N and Halle M 1968 The sound pattern of English. Cambridge, MA: The MIT Pres
King S and Taylor P 2000 Detection of phonological features in continuous speech using neural networks. Comput. Speech Lang. 14: 333–353
Article Google Scholar
Frankel J, Wester M and King S 2007 Articulatory feature recognition using dynamic Bayesian networks. Comput. Speech Lang. 21: 620–640
Article Google Scholar
Juneja A and Espy-Wilson C Y 2002 Segmentation of continuous speech using acoustic-phonetic parameters and statistical learning. In: Proceedings of the IEEE International Conference on Neural Information Processing, pp. 726–730
Juneja A and Espy-Wilson C Y 2008 A probabilistic framework for landmark detection based on phonetic features for automatic speech recognition. J. Acoust. Soc. Am. 123: 1154–1168
Article Google Scholar
Stevens K N 2002 Toward a model for lexical access based on acoustic landmarks and distinctive features. J. Acoust. Soc. Am. 111: 1872–1891
Article Google Scholar
Salomon A, Espy-Wilson C Y and Deshmukh O 2004 Detection of speech landmarks: use of temporal information. J. Acoust. Soc. Am. 115: 1296–1305
Article Google Scholar
Liu S A 1996 Landmark detection for distinctive feature-based speech recognition. J. Acoust. Soc. Am. 100: 3417–3430
Article Google Scholar
Lippmann R P 1997 Speech recognition by machines and humans. Speech Commun. 22: 1–15
Article Google Scholar
Mesgarani N, Cheung C, Johnson K and Chang E F 2014 Phonetic feature encoding in human superior temporal gyrus. Science 343: 1006–1010
Article Google Scholar
Reddy D R 1966 Phoneme grouping for speech recognition. J. Acoust. Soc. Am. 41: 1295–1300
Article Google Scholar
Ananthapadmanabha T V, Prathosh A P and Ramakrishnan A G 2014 Detection of the closure–burst transitions of stops and affricates in continuous speech using the plosion index. J. Acoust. Soc. Am. 135: 460–471
Article Google Scholar
Garofolo J S, Lamel L F, Fisher W M, Fiscus J G, Pallett D S and Dahlgrena N L 1993 DARPA TIMIT acoustic-phonetic continuous speech corpus. NISTIR Publication No. 4930. Washington, DC: U.S. Department of Commerce
Niyogi P and Sondhi M M 2002 Detecting stop consonants in continuous speech. J. Acoust. Soc. Am. 111: 1063–1076
Article Google Scholar
Noisex-92 [Online] Available: http://www.speech.cs.cmu.edu/comp.speech/Section1/Data/noisex.html
Rosen S 1992 Temporal information in speech: acoustic, auditory, and linguistic aspects. Philos. Trans. R. Soc. London B: Biol. Sci. 336: 367–373
Article Google Scholar
Niyogi P and Ramesh P 2003 The voicing feature for stop consonants: recognition experiments with continuously spoken alphabets. Speech Commun. 41: 349–367
Article Google Scholar
Prasanna S R M, Reddy B V S and Krishnamoorthy P 2009 Vowel onset point detection using source, spectral peaks, and modulation spectrum energies. IEEE Trans. Audio Speech Lang. Process. 17: 556–565
Article Google Scholar
Jankowski C, Kalyanswamy A, Basson S and Spitz J 1990 NTIMIT: a phonetically balanced, continuous speech, telephone bandwidth speech database. In: Proceedings of the 1990 International Conference on Acoustics, Speech, and Signal Processing, ICASSP-90, pp. 109–112

Download references

Author information

Authors and Affiliations

Voice and Speech Systems, Malleswaram, Bangalore, India
T V Ananthapadmanabha
Indian Institute of Science, Bangalore, India
K V Vijay Girish & A G Ramakrishnan

Authors

T V Ananthapadmanabha
View author publications
You can also search for this author in PubMed Google Scholar
K V Vijay Girish
View author publications
You can also search for this author in PubMed Google Scholar
A G Ramakrishnan
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to K V Vijay Girish.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Ananthapadmanabha, T.V., Vijay Girish, K.V. & Ramakrishnan, A.G. Relative occurrences and difference of extrema for detection of transitions between broad phonetic classes. Sādhanā 43, 153 (2018). https://doi.org/10.1007/s12046-018-0923-x

Download citation

Received: 31 March 2017
Revised: 31 December 2017
Accepted: 20 February 2018
Published: 04 August 2018
DOI: https://doi.org/10.1007/s12046-018-0923-x

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Relative occurrences and difference of extrema for detection of transitions between broad phonetic classes

Abstract

Access this article

Similar content being viewed by others

Phoneme sequence recognition via DTW-based classification

Segmentation of the period of the fundamental tone of a voice source

Improvements in the Detection of Vowel Onset and Offset Points in a Speech Sequence

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Relative occurrences and difference of extrema for detection of transitions between broad phonetic classes

Abstract

Access this article

Similar content being viewed by others

Phoneme sequence recognition via DTW-based classification

Segmentation of the period of the fundamental tone of a voice source

Improvements in the Detection of Vowel Onset and Offset Points in a Speech Sequence

References

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation