Time-Domain Segmentation and Labelling of Speech with Fuzzy-Logic Post-Correction Rules

  • Conference paper
MICAI 2002: Advances in Artificial Intelligence (MICAI 2002)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 2313)

Abstract

In speech recognition, obtaining accurate patterns that describe an input signal is a crucial task. Frequency-domain processing provides rich information for such signal descriptions. However, a first interpretation of the time-domain characteristics of speech utterances may be enough to obtain important information contained in the signal more quickly. This paper shows that segmentation and labelling of speech may be performed precisely and accurately using only time-domain information. The method obtains syllable- and phoneme-level segmentation in two stages. The first identifies sonority decrease intervals to estimate transitions between syllables. The second refines the placement of boundaries using a set of fuzzy rules that compare current time-marks with previously computed syllable-transition values. The system was tested using an Italian-language digit database. The reported results show that the accuracy of inter-syllabic boundary placement improves when the fuzzy-correction method is used.
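
The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how sonority is measured or which fuzzy rules are used, so the sketch assumes short-time energy as a stand-in for sonority and a single triangular "near" membership function as the only rule. All function names and parameters (`frame_len`, `hop`, `min_drop`, `width`) are hypothetical.

```python
def short_time_energy(signal, frame_len, hop):
    """Mean squared amplitude per frame (assumed proxy for sonority)."""
    return [
        sum(x * x for x in signal[i:i + frame_len]) / frame_len
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


def sonority_decrease_intervals(energy, min_drop):
    """Stage 1 (sketch): find runs of strictly falling energy.

    Returns (start, end) frame indices of each run whose relative drop,
    measured against the run's starting value, is at least min_drop.
    """
    intervals = []
    i, n = 0, len(energy)
    while i < n - 1:
        j = i
        while j < n - 1 and energy[j + 1] < energy[j]:
            j += 1
        if j > i and energy[i] > 0 and (energy[i] - energy[j]) / energy[i] >= min_drop:
            intervals.append((i, j))
        i = max(j, i + 1)  # continue from the end of the run just examined
    return intervals


def near(distance, width):
    """Triangular membership: 1 at distance 0, falling linearly to 0 at width."""
    return max(0.0, 1.0 - abs(distance) / width)


def fuzzy_correct(mark, transitions, width):
    """Stage 2 (sketch): pull a time-mark towards the closest transition.

    The shift is weighted by how strongly the mark is 'near' that
    transition; marks farther away than width are left unchanged.
    """
    closest = min(transitions, key=lambda t: abs(t - mark))
    mu = near(closest - mark, width)
    return mark + mu * (closest - mark)


if __name__ == "__main__":
    energy = [1.0, 4.0, 3.0, 1.0, 2.0, 5.0, 4.5]
    print(sonority_decrease_intervals(energy, min_drop=0.5))
    print(fuzzy_correct(10.0, [4.0, 12.0], width=4.0))
```

In this sketch the end of each decrease interval would serve as a candidate syllable transition, and `fuzzy_correct` moves a boundary only part of the way when it is moderately close to one. A real rule base would likely combine several membership functions.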

Copyright information

© 2002 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Mayora-Ibarra, O., Curatelli, F. (2002). Time-Domain Segmentation and Labelling of Speech with Fuzzy-Logic Post-Correction Rules. In: Coello Coello, C.A., de Albornoz, A., Sucar, L.E., Battistutti, O.C. (eds) MICAI 2002: Advances in Artificial Intelligence. MICAI 2002. Lecture Notes in Computer Science, vol 2313. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-46016-0_15

  • DOI: https://doi.org/10.1007/3-540-46016-0_15

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-43475-7

  • Online ISBN: 978-3-540-46016-9

  • eBook Packages: Springer Book Archive
