Does A Priori Phonological Knowledge Improve Cross-Lingual Robustness of Phonemic Contrasts?

  • Conference paper
  • First Online:
Speech and Computer (SPECOM 2020)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 12335)


Abstract

For speech models that depend on sharing phonological representations across languages, an often overlooked issue is that phonological contrasts that are succinctly described language-internally by phonemes and their respective featurizations are not necessarily robust across languages. This paper extends a recently proposed method for assessing the cross-linguistic consistency of phonological features in phoneme inventories. The original method employs binary neural classifiers for individual phonological contrasts trained solely on audio; it cannot resolve some important phonological contrasts, such as retroflex consonants, cross-linguistically. We extend this approach by leveraging prior phonological knowledge during classifier training. Since phonemic descriptions are articulatory rather than acoustic, the model input space needs to be grounded in phonology to better capture phonemic correlations between training samples. The cross-linguistic consistency of the proposed method is evaluated in a multilingual setting on held-out low-resource languages, and classification quality is reported. We observe modest gains over the baseline for difficult cases, such as cross-lingual detection of aspiration, and discuss several confounding factors that explain why this task is difficult.
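
To make the setup concrete, the indented listing below is a minimal sketch of one plausible way to inject a priori phonological knowledge at the input level: a binary classifier for a single contrast (e.g. aspirated vs. unaspirated) that consumes acoustic frames alongside a ternary (+/-/0) articulatory feature vector for the phoneme label. The layer sizes, feature dimensions, and fusion scheme are illustrative assumptions, not the architecture used in the paper.

    # Minimal sketch (illustrative, not the authors' implementation): a binary
    # classifier for one phonological contrast whose input combines acoustic
    # features with an articulatory prior vector.
    import numpy as np
    import tensorflow as tf

    NUM_FRAMES = 40       # assumed fixed-length window of acoustic frames per token
    NUM_MFCC = 13         # assumed MFCC coefficients per frame
    NUM_PHON_FEATS = 24   # assumed size of the ternary articulatory feature vector

    # Acoustic branch: frame-level MFCCs summarised by a small recurrent encoder.
    acoustic_in = tf.keras.Input(shape=(NUM_FRAMES, NUM_MFCC), name="mfcc")
    acoustic_enc = tf.keras.layers.Bidirectional(
        tf.keras.layers.LSTM(64))(acoustic_in)

    # Phonological branch: the a priori articulatory description of the phoneme,
    # grounding the input space in phonology rather than acoustics alone.
    phon_in = tf.keras.Input(shape=(NUM_PHON_FEATS,), name="phon_features")
    phon_enc = tf.keras.layers.Dense(32, activation="relu")(phon_in)

    # Fuse both views and predict presence/absence of the contrast.
    merged = tf.keras.layers.Concatenate()([acoustic_enc, phon_enc])
    hidden = tf.keras.layers.Dense(64, activation="relu")(merged)
    hidden = tf.keras.layers.Dropout(0.3)(hidden)
    output = tf.keras.layers.Dense(1, activation="sigmoid", name="contrast")(hidden)

    model = tf.keras.Model(inputs=[acoustic_in, phon_in], outputs=output)
    model.compile(optimizer="adam", loss="binary_crossentropy",
                  metrics=["accuracy"])

    # Toy usage with random data standing in for real multilingual speech corpora.
    x_audio = np.random.randn(8, NUM_FRAMES, NUM_MFCC).astype("float32")
    x_phon = np.random.choice([-1.0, 0.0, 1.0],
                              size=(8, NUM_PHON_FEATS)).astype("float32")
    y = np.random.randint(0, 2, size=(8, 1)).astype("float32")
    model.fit([x_audio, x_phon], y, epochs=1, verbose=0)

In this fused setup the phonological branch acts only as a prior over the label description; the paper's actual experiments and network configuration may differ.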



Acknowledgments

The authors would like to thank Cibu Johny for his help with the experiments, and Işın Demirşahin and Rob Clark for fruitful discussions.

Author information


Corresponding author

Correspondence to Alexander Gutkin.


Copyright information

© 2020 Springer Nature Switzerland AG

About this paper

Cite this paper

Skidmore, L., Gutkin, A. (2020). Does A Priori Phonological Knowledge Improve Cross-Lingual Robustness of Phonemic Contrasts? In: Karpov, A., Potapova, R. (eds.) Speech and Computer. SPECOM 2020. Lecture Notes in Computer Science (LNAI), vol. 12335. Springer, Cham. https://doi.org/10.1007/978-3-030-60276-5_51

  • DOI: https://doi.org/10.1007/978-3-030-60276-5_51

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-60275-8

  • Online ISBN: 978-3-030-60276-5

  • eBook Packages: Computer Science, Computer Science (R0)
