Abstract
In speech recognition, obtaining accurate patterns that describe an input signal is a crucial task. Frequency-domain processing provides rich information for such signal descriptions. However, a first interpretation of the time-domain characteristics of speech utterances may be enough to obtain important information contained in the signal more quickly. This paper shows that segmentation and labelling of speech can be performed accurately using only time-domain information. The method obtains syllable- and phoneme-level segmentation in two stages. The first identifies sonority-decrease intervals to estimate transitions between syllables. The second refines the placement of boundaries using a set of fuzzy rules that compare current time marks with previously computed syllable-transition values. The system was tested using an Italian-language digit database. The reported results show that the accuracy of inter-syllabic boundary placement improves when the fuzzy-correction method is used.
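The first stage can be illustrated with a minimal sketch. The abstract does not define how sonority is measured, so short-time energy is used here purely as an assumed stand-in, and the function names (`short_time_energy`, `decrease_intervals`, `candidate_boundaries`) and the `min_len` parameter are hypothetical, not taken from the paper. The fuzzy-rule refinement of the second stage is not sketched, since its rules are not given here.

```python
def short_time_energy(signal, frame_len, hop):
    """Mean squared amplitude per frame, computed only from time-domain samples.

    Assumption: energy serves as a rough proxy for sonority.
    """
    energies = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def decrease_intervals(contour, min_len=2):
    """Return (start, end) frame-index pairs over which the contour strictly falls.

    Runs spanning fewer than `min_len` frame steps are ignored (hypothetical threshold).
    """
    intervals = []
    start = None
    for i in range(1, len(contour)):
        if contour[i] < contour[i - 1]:
            if start is None:
                start = i - 1          # the run begins at the preceding peak
        else:
            if start is not None and (i - 1) - start >= min_len:
                intervals.append((start, i - 1))
            start = None
    # Close a run that is still falling at the end of the contour.
    if start is not None and (len(contour) - 1) - start >= min_len:
        intervals.append((start, len(contour) - 1))
    return intervals


def candidate_boundaries(contour, min_len=2):
    """One candidate boundary at the end of each falling run that is followed by a rise."""
    last = len(contour) - 1
    return [end for _, end in decrease_intervals(contour, min_len) if end < last]


# Toy signal: blocks of constant amplitude forming two "humps".
signal = [a for a in (1, 2, 3, 2, 1, 2, 3) for _ in range(4)]
energies = short_time_energy(signal, frame_len=4, hop=4)
print(candidate_boundaries(energies))  # [4]
```

In this toy case the single candidate boundary falls at frame 4, the energy minimum between the two humps. A real utterance would need smoothing and a tuned `min_len` before the contour is usable.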
Copyright information
© 2002 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Mayora-Ibarra, O., Curatelli, F. (2002). Time-Domain Segmentation and Labelling of Speech with Fuzzy-Logic Post-Correction Rules. In: Coello Coello, C.A., de Albornoz, A., Sucar, L.E., Battistutti, O.C. (eds) MICAI 2002: Advances in Artificial Intelligence. MICAI 2002. Lecture Notes in Computer Science, vol 2313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46016-0_15
DOI: https://doi.org/10.1007/3-540-46016-0_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43475-7
Online ISBN: 978-3-540-46016-9
eBook Packages: Springer Book Archive