Abstract
This paper proposes an effective method for automatic phone segmentation of speech signals that uses no prior information about the transcript of the utterance. Spectral change is the criterion for hypothesizing a phone boundary. A Gaussian function can be used to measure the similarity of two vectors; a dissimilarity function is derived from it to measure the variation of the speech spectrum between the mean feature vectors before and after the location under consideration. Peaks in the dissimilarity curve indicate the locations of phone boundaries. Experiments on the TIMIT corpus show that the proposed method is more accurate than previous methods.
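The pipeline described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so everything specific here is an assumption rather than the authors' method: the similarity is taken as exp(-||x - y||^2 / (2 sigma^2)), the dissimilarity as one minus that similarity, and the names `gaussian_similarity`, `dissimilarity_curve`, `pick_peaks`, `half_window`, `sigma`, and `threshold` are hypothetical.

```python
import numpy as np


def gaussian_similarity(x, y, sigma=1.0):
    """Gaussian similarity of two vectors: 1.0 when identical, near 0 when far apart.

    Assumed form: exp(-||x - y||^2 / (2 * sigma^2)); sigma is a free parameter.
    """
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2)))


def dissimilarity_curve(features, half_window=3, sigma=1.0):
    """Dissimilarity between the mean feature vectors before and after each frame.

    features: array-like of shape (num_frames, feature_dim).
    Assumed form: 1 - gaussian_similarity(mean_before, mean_after).
    Frames too close to either end keep a dissimilarity of 0.
    """
    features = np.asarray(features, dtype=float)
    num_frames = len(features)
    curve = np.zeros(num_frames)
    for t in range(half_window, num_frames - half_window):
        mean_before = features[t - half_window:t].mean(axis=0)
        mean_after = features[t:t + half_window].mean(axis=0)
        curve[t] = 1.0 - gaussian_similarity(mean_before, mean_after, sigma)
    return curve


def pick_peaks(curve, threshold=0.1):
    """Return frame indices that are local maxima above a threshold (assumed rule)."""
    return [
        t
        for t in range(1, len(curve) - 1)
        if curve[t] > threshold
        and curve[t] > curve[t - 1]
        and curve[t] >= curve[t + 1]
    ]
```

Calling `pick_peaks(dissimilarity_curve(features))` on a matrix of per-frame spectral features returns candidate boundary frames. In practice the window length, `sigma`, and the peak threshold would need tuning on held-out data, and a real front end would supply the feature vectors.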
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Hoang, DT., Wang, HC. (2014). Text-Independent Phone Segmentation Method Using Gaussian Function. In: Huynh, V., Denoeux, T., Tran, D., Le, A., Pham, S. (eds) Knowledge and Systems Engineering. Advances in Intelligent Systems and Computing, vol 244. Springer, Cham. https://doi.org/10.1007/978-3-319-02741-8_11
DOI: https://doi.org/10.1007/978-3-319-02741-8_11
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-02740-1
Online ISBN: 978-3-319-02741-8
eBook Packages: Engineering, Engineering (R0)