Word Discovering in Low-Resources Languages Through Cross-Lingual Phonemes

  • Conference paper
Speech and Computer (SPECOM 2019)

Abstract

This paper presents an approach for discovering word units in an unknown language under zero-resource conditions. The method relies only on acoustic similarity: cross-lingual phoneme recognition is followed by the identification of consistent phoneme strings. To this end, a two-phase algorithm is proposed. The first phase is an acoustic-phonetic decoding process that uses a universal set of phonemes unrelated to the target language. Its goal is to reduce the search space of similar speech segments, avoiding the quadratic search space that arises when all speech files are compared with one another. In the second phase, the segments found are further refined by means of different approaches based on Dynamic Time Warping. To include more hypotheses than those with a perfect phoneme match, an edit distance is computed so that hypotheses under a given threshold are also incorporated. Three frame representations are studied: raw acoustic features, autoencoders, and phoneme posteriorgrams. The approach has been evaluated on the corpus used in the Zero Resource Speech Challenge 2017.
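As a rough illustration of the two phases described in the abstract, the following Python sketch first filters pairs of decoded segments by the edit distance between their phoneme strings and then refines the surviving pairs with a Dynamic Time Warping cost over their frame vectors. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the `phones`/`frames` dictionary layout, the Euclidean frame distance, the length normalisation, and both thresholds are hypothetical choices made for this example. For brevity it still loops over all segment pairs at the phoneme-label level; the paper's actual search strategy is not specified in the abstract.

```python
from math import sqrt


def edit_distance(a, b):
    """Levenshtein distance between two phoneme sequences (lists of labels)."""
    prev = list(range(len(b) + 1))
    for i, pa in enumerate(a, start=1):
        curr = [i]
        for j, pb in enumerate(b, start=1):
            cost = 0 if pa == pb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def dtw_distance(x, y):
    """DTW cost between two sequences of frame vectors.

    Assumption: Euclidean local distance, cost normalised by len(x) + len(y).
    """
    inf = float("inf")
    n, m = len(x), len(y)
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = sqrt(sum((p - q) ** 2 for p, q in zip(x[i - 1], y[j - 1])))
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m] / (n + m)


def candidate_pairs(segments, max_edits):
    """Phase 1 (sketch): keep pairs whose phoneme strings differ by at most max_edits."""
    pairs = []
    for i in range(len(segments)):
        for j in range(i + 1, len(segments)):
            if edit_distance(segments[i]["phones"], segments[j]["phones"]) <= max_edits:
                pairs.append((i, j))
    return pairs


def refine(segments, pairs, max_dtw):
    """Phase 2 (sketch): keep candidate pairs whose frame-level DTW cost is at most max_dtw."""
    return [(i, j) for i, j in pairs
            if dtw_distance(segments[i]["frames"], segments[j]["frames"]) <= max_dtw]


# Toy data: phoneme labels and one-dimensional "frames" are invented for illustration.
segments = [
    {"phones": ["k", "a", "s", "a"], "frames": [[0.0], [1.0], [2.0]]},
    {"phones": ["k", "a", "z", "a"], "frames": [[0.0], [1.0], [1.0], [2.0]]},
    {"phones": ["m", "e", "s", "o", "n"], "frames": [[5.0], [6.0]]},
]
candidates = candidate_pairs(segments, max_edits=1)
matches = refine(segments, candidates, max_dtw=0.5)
print(candidates, matches)
```

Setting `max_edits` to 0 corresponds to accepting only perfect phoneme matches; raising it admits the additional hypotheses that the abstract mentions. The `frames` field could hold any of the three representations studied (raw acoustic features, autoencoder outputs, or phoneme posteriorgrams), although a different local distance may suit posteriorgrams better than the Euclidean one assumed here.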

Acknowledgments

This work was funded by the Spanish MINECO and FEDER funds under contract TIN2017-85854-C4-2-R. The work of José-Ángel González is also financed by Universitat Politècnica de València under grant PAID-01-17.

Author information

Corresponding authors

Correspondence to Fernando García-Granada, Emilio Sanchis, Maria Jose Castro-Bleda, José Ángel González or Lluís-F. Hurtado.

Copyright information

© 2019 Springer Nature Switzerland AG

About this paper

Cite this paper

García-Granada, F., Sanchis, E., Castro-Bleda, M.J., González, J.Á., Hurtado, L.-F. (2019). Word Discovering in Low-Resources Languages Through Cross-Lingual Phonemes. In: Salah, A., Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2019. Lecture Notes in Computer Science, vol 11658. Springer, Cham. https://doi.org/10.1007/978-3-030-26061-3_14

  • DOI: https://doi.org/10.1007/978-3-030-26061-3_14

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-26060-6

  • Online ISBN: 978-3-030-26061-3

  • eBook Packages: Computer Science, Computer Science (R0)
