Time-Domain Segmentation and Labelling of Speech with Fuzzy-Logic Post-Correction Rules

Mayora-Ibarra, O.; Curatelli, F.

doi:10.1007/3-540-46016-0_15

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2313)

Included in the following conference series:

Mexican International Conference on Artificial Intelligence

Abstract

In speech recognition, obtaining accurate patterns that describe an input signal is a crucial task. Frequency-domain processing provides rich information for such signal descriptions. However, a first interpretation of the time-domain characteristics of speech utterances may be enough to extract important information from the signal more quickly. This paper shows that speech can be segmented and labelled accurately using only time-domain information. The method obtains syllable- and phoneme-level segmentation in two stages. The first identifies sonority-decrease intervals to estimate transitions between syllables. The second refines the placement of boundaries using a set of fuzzy rules that compare current time marks with previously computed syllable-transition values. The system was tested on an Italian-language digit database. The reported results show that the fuzzy-correction method improves the accuracy of inter-syllabic boundary placement.
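The abstract does not specify the sonority measure, the detection thresholds, or the fuzzy rule base. The following is only a minimal Python sketch of the two-stage idea under stated assumptions: short-time energy stands in for sonority, stage one is simplified to taking the centre of each interior low-energy run as a candidate syllable boundary, and stage two applies a hypothetical two-rule Sugeno-style correction that pulls each candidate toward the nearest reference syllable-transition time. All function names, parameter values, and the synthetic test signal are illustrative and not taken from the paper.

```python
import numpy as np


def short_time_energy(x, frame_len=256, hop=128):
    """Frame-wise mean-square energy, used here (assumption) as a sonority proxy."""
    n_frames = 1 + max(0, (len(x) - frame_len) // hop)
    return np.array([np.mean(x[i * hop:i * hop + frame_len] ** 2)
                     for i in range(n_frames)])


def sonority_dips(energy, rel_threshold=0.1):
    """Stage 1 (sketch): frame indices at the centre of interior low-energy runs.

    Frames below rel_threshold * max(energy) form low-sonority runs. Runs that
    touch either end of the utterance are leading/trailing silence and are skipped.
    """
    low = energy < rel_threshold * energy.max()
    dips, i, n = [], 0, len(low)
    while i < n:
        if not low[i]:
            i += 1
            continue
        j = i
        while j < n and low[j]:
            j += 1
        if i > 0 and j < n:  # run is bounded by high-energy frames on both sides
            dips.append((i + j - 1) // 2)
        i = j
    return dips


def frames_to_seconds(frames, fs, frame_len=256, hop=128):
    """Map frame indices to the time (s) of each frame's centre."""
    return [(f * hop + frame_len / 2) / fs for f in frames]


def fuzzy_correct(candidates, references, width):
    """Stage 2 (sketch): pull each candidate toward its nearest reference time.

    Two zero-order Sugeno rules on the deviation d = ref - t (hypothetical rule base):
      IF d is SMALL THEN shift = 0
      IF d is LARGE THEN shift = d
    SMALL is triangular with half-width `width` and LARGE = 1 - SMALL, so the
    weighted-average output is shift = mu_large * d.
    """
    refs = np.asarray(references, dtype=float)
    corrected = []
    for t in candidates:
        ref = refs[np.argmin(np.abs(refs - t))]
        d = ref - t
        mu_small = max(0.0, 1.0 - abs(d) / width)
        corrected.append(float(t + (1.0 - mu_small) * d))
    return corrected


if __name__ == "__main__":
    fs = 8000
    burst = np.sin(2 * np.pi * 300 * np.arange(1600) / fs)  # 0.2 s synthetic "syllable"
    gap = np.zeros(800)                                     # 0.1 s pause between syllables
    x = np.concatenate([np.zeros(400), burst, gap, burst, np.zeros(400)])
    cand = frames_to_seconds(sonority_dips(short_time_energy(x)), fs)
    print(cand, fuzzy_correct(cand, references=[0.30], width=0.05))
```

Running the script builds a synthetic two-burst signal with a 0.1 s pause; the detector places one candidate inside the pause, and the correction moves it slightly toward the assumed 0.30 s reference. In the paper the references are previously computed syllable-transition values; how they are obtained is not described in the abstract, so the fixed reference here is purely illustrative.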

Author information

Authors and Affiliations

ITESM, Campus Cuernavaca, Av. Paseo de la Reforma 182-A, Mor. C.P. 62589, Lomas de Cuernavaca Temixco, México
O. Mayora-Ibarra
DIBE, Università di Genova, Via Opera Pia 11A, 16145, Genova, Italy
F. Curatelli

Editor information

Editors and Affiliations

Computer Science Section, Electrical Engineering Department, CINVESTAV-IPN, Av. IPN 2508, Col. San Pedro Zacatenco, D.F. 07300, Mexico, Mexico
Carlos A. Coello Coello
Computer Science Department, ITESM-Mexico City, Calle del Puente 222, Tlalpan, D.F. 14380, Mexico, Mexico
Alvaro de Albornoz
Computer Science Department, ITESM-Cuernavaca, Reforma 182-A, Lomas de Cuernavaca, Temixco, 62589, Morelos, Mexico
Luis Enrique Sucar
Department of Computer Science, ITAM, Rio Hondo 1, Progreso Tizapan, D.F. 01000, Mexico, Mexico
Osvaldo Cairó Battistutti

About this paper

Cite this paper

Mayora-Ibarra, O., Curatelli, F. (2002). Time-Domain Segmentation and Labelling of Speech with Fuzzy-Logic Post-Correction Rules. In: Coello Coello, C.A., de Albornoz, A., Sucar, L.E., Battistutti, O.C. (eds) MICAI 2002: Advances in Artificial Intelligence. MICAI 2002. Lecture Notes in Computer Science, vol 2313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46016-0_15

DOI: https://doi.org/10.1007/3-540-46016-0_15
Published: 07 May 2002
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-43475-7
Online ISBN: 978-3-540-46016-9
eBook Packages: Springer Book Archive
