Abstract
Spoken term discovery is challenging when large amounts of spoken content are generated without annotation. Pattern-matching techniques address this challenge by directly capturing the resemblance between spoken terms at the acoustic feature level. Although feasible, the pattern-matching approach generates more false alarms during discovery because of the variability that arises in natural speech, which degrades performance. The proposed approach addresses this variability in two stages. In the first stage, the RASTA-PLP spectrogram is used as an acoustic feature representation that reduces the variability among similar spoken content. In the second stage, the novel Diagonal Pattern Search method computes, without constraints, the pattern resemblance between identical spoken terms at the segmental level. The proposed approach was evaluated on the IITKGP-SDUC speech corpus, and the results show a 10.11% improvement in accuracy over other state-of-the-art systems for the spoken term discovery task.
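The abstract does not specify how the Diagonal Pattern Search operates, so the following is only a minimal, generic sketch of the underlying idea that recurring terms appear as diagonal runs of high values in a frame-level similarity matrix. It is not the authors' method. The function names, the cosine similarity measure, the `threshold` and `min_length` parameters, and the random 13-dimensional features standing in for RASTA-PLP frames are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Frame-by-frame cosine similarity between two (frames x dims) arrays."""
    a_unit = a / (np.linalg.norm(a, axis=1, keepdims=True) + 1e-12)
    b_unit = b / (np.linalg.norm(b, axis=1, keepdims=True) + 1e-12)
    return a_unit @ b_unit.T


def find_diagonal_runs(sim, threshold=0.9, min_length=5):
    """Return (row_start, col_start, length) for every run of at least
    min_length consecutive cells >= threshold along any diagonal of sim."""
    n_rows, n_cols = sim.shape
    runs = []
    for offset in range(-(n_rows - 1), n_cols):
        hits = np.diagonal(sim, offset=offset) >= threshold
        row0, col0 = max(0, -offset), max(0, offset)  # first cell of this diagonal
        start = None
        # A trailing False closes any run that reaches the end of the diagonal.
        for k, hit in enumerate(np.append(hits, False)):
            if hit and start is None:
                start = k
            elif not hit and start is not None:
                if k - start >= min_length:
                    runs.append((row0 + start, col0 + start, k - start))
                start = None
    return runs


# Hypothetical data: two "utterances" sharing one 8-frame segment.
rng = np.random.default_rng(0)
shared = rng.normal(size=(8, 13))
utt_a = np.vstack([rng.normal(size=(10, 13)), shared, rng.normal(size=(6, 13))])
utt_b = np.vstack([rng.normal(size=(4, 13)), shared, rng.normal(size=(12, 13))])

sim = cosine_similarity_matrix(utt_a, utt_b)
runs = find_diagonal_runs(sim, threshold=0.99, min_length=5)
print(runs)
```

In this toy setting the shared segment starts at frame 10 of the first sequence and frame 4 of the second, so it should surface as a diagonal run beginning at that cell. Real speech would not match exactly, which is why a robust feature representation and a more tolerant search are needed in practice.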
Notes
- 1.
- 2. transliterated from Hindi to English for readability.
- 3. available at http://cse.iitkgp.ac.in/~ksrao/res.html.
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Sudhakar, P., Sreenivasa Rao, K., Mitra, P. (2023). Unsupervised Discovery of Recurring Spoken Terms Using Diagonal Patterns. In: Maji, P., Huang, T., Pal, N.R., Chaudhury, S., De, R.K. (eds) Pattern Recognition and Machine Intelligence. PReMI 2023. Lecture Notes in Computer Science, vol 14301. Springer, Cham. https://doi.org/10.1007/978-3-031-45170-6_7
DOI: https://doi.org/10.1007/978-3-031-45170-6_7
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-45169-0
Online ISBN: 978-3-031-45170-6
eBook Packages: Computer Science, Computer Science (R0)