Word Discovering in Low-Resources Languages Through Cross-Lingual Phonemes

García-Granada, Fernando; Sanchis, Emilio; Castro-Bleda, Maria Jose; González, José Ángel; Hurtado, Lluís-F.

doi:10.1007/978-3-030-26061-3_14

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 11658)

Included in the following conference series:

International Conference on Speech and Computer

Abstract

This paper presents an approach for discovering word units in an unknown language under zero-resource conditions. The method relies only on acoustic similarity: it combines cross-lingual phoneme recognition with the identification of consistent phoneme strings. To this end, a two-phase algorithm is proposed. The first phase is an acoustic-phonetic decoding process that uses a universal set of phonemes unrelated to the target language. Its goal is to narrow the search for similar speech segments and so avoid the quadratic cost of comparing every speech file against every other. To admit hypotheses beyond exact matches between phoneme strings, an edit distance is computed, and hypotheses whose distance falls below a given threshold are also included. In the second phase, the segments found are refined by means of several approaches based on Dynamic Time Warping. Three frame representations are studied: raw acoustic features, autoencoders and phoneme posteriorgrams. The approach has been evaluated on the corpus used in the Zero Resource Speech Challenge 2017.
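The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, only one plausible reading of the abstract under stated assumptions: the cross-lingual phoneme decoder has already run, each segment is a dictionary with hypothetical keys `phones` (decoded phoneme labels) and `frames` (frame vectors, which could be raw features, autoencoder outputs or posteriorgrams), and the function names and the thresholds `max_edits` and `max_dtw` are invented for illustration. The toy loop still enumerates every pair of segments, but it compares short symbol strings rather than frame sequences; a real system would presumably index the phoneme strings instead.

```python
import math


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dtw_distance(x, y):
    """DTW cost between two frame sequences, divided by len(x) + len(y)."""
    n, m = len(x), len(y)
    inf = float("inf")
    d = [[inf] * (m + 1) for _ in range(n + 1)]
    d[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(x[i - 1], y[j - 1])  # Euclidean frame distance
            d[i][j] = local + min(d[i - 1][j], d[i][j - 1], d[i - 1][j - 1])
    return d[n][m] / (n + m)


def candidate_pairs(segments, max_edits=1):
    """Phase 1 (assumed form): keep pairs whose phoneme strings are close."""
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            dist = edit_distance(segments[i]["phones"], segments[j]["phones"])
            if dist <= max_edits:
                pairs.append((i, j))
    return pairs


def refine(segments, pairs, max_dtw=0.5):
    """Phase 2 (assumed form): keep candidates that also align well under DTW."""
    return [(i, j) for i, j in pairs
            if dtw_distance(segments[i]["frames"], segments[j]["frames"]) <= max_dtw]


# Hypothetical toy data: phoneme labels and 2-dimensional frame vectors.
segments = [
    {"phones": ["k", "a", "t"],
     "frames": [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]},
    {"phones": ["k", "a", "d"],
     "frames": [(0.0, 0.0), (1.0, 1.0), (1.0, 1.0), (2.0, 2.0)]},
    {"phones": ["m", "i", "s", "o"],
     "frames": [(5.0, 5.0), (6.0, 6.0)]},
]
matches = refine(segments, candidate_pairs(segments))
```

With this toy data, only the first two segments survive both phases: their phoneme strings differ by a single substitution, and their frame sequences align with zero DTW cost. Setting `max_edits=0` would reduce the first phase to exact phoneme-string matching.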

Acknowledgments

This work was funded by the Spanish MINECO and FEDER funds under contract TIN2017-85854-C4-2-R. The work of José-Ángel González is also financed by Universitat Politècnica de València under grant PAID-01-17.

Author information

Authors and Affiliations

VRAIN Valencian Research Institute for Artificial Intelligence, Universitat Politècnica de València, Camino de Vera s/n., 46022, Valencia, Spain
Fernando García-Granada, Emilio Sanchis, Maria Jose Castro-Bleda, José Ángel González & Lluís-F. Hurtado

Corresponding authors

Correspondence to Fernando García-Granada, Emilio Sanchis, Maria Jose Castro-Bleda, José Ángel González or Lluís-F. Hurtado.

Editor information

Editors and Affiliations

Utrecht University, Utrecht, The Netherlands
Albert Ali Salah
St. Petersburg Institute for Informatics and Automation of the Russian Academy of Sciences, St. Petersburg, Russia
Alexey Karpov
Moscow State Linguistic University, Moscow, Russia
Rodmonga Potapova

About this paper

Cite this paper

García-Granada, F., Sanchis, E., Castro-Bleda, M.J., González, J.Á., Hurtado, L.-F. (2019). Word Discovering in Low-Resources Languages Through Cross-Lingual Phonemes. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_14

DOI: https://doi.org/10.1007/978-3-030-26061-3_14
Published: 24 July 2019
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-26060-6
Online ISBN: 978-3-030-26061-3
eBook Packages: Computer Science, Computer Science (R0)
