Abstract
This paper presents a dialect recognition system for the Kurdish language based on speaker embeddings. The research has two goals. First, we investigate whether dialect information is available in speaker embeddings and use this information for spoken dialect recognition in Kurdish. Second, we introduce Zar, a public dataset for Kurdish spoken dialect recognition. The Zar dataset comprises 16,385 utterances totaling 49 h 36 min across five dialects of the Kurdish language (Northern Kurdish, Central Kurdish, Southern Kurdish, Hawrami, and Zazaki). Dialect recognition is performed with an x-vector speaker embedding trained for speaker recognition on the VoxCeleb1 and VoxCeleb2 datasets. The extracted x-vectors are then used to train support vector machine (SVM) and decision tree classifiers for dialect recognition. The results are compared with an i-vector system trained specifically for Kurdish spoken dialect recognition. In both systems (i-vector and x-vector), the SVM classifier, with 87% precision, gives the better performance. Our results show that the information preserved in the speaker embedding can be used for automatic dialect recognition.
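To make the reported metric concrete, the following minimal sketch computes precision for a five-way dialect classifier. The abstract does not say how precision is aggregated across the five dialects; macro-averaging (the unweighted mean of per-dialect precision) is assumed here. The function names and the toy labels are hypothetical and are not taken from the Zar dataset or from the authors' evaluation code.

```python
from collections import Counter

# The five dialects listed in the abstract.
DIALECTS = [
    "Northern Kurdish",
    "Central Kurdish",
    "Southern Kurdish",
    "Hawrami",
    "Zazaki",
]


def per_class_precision(y_true, y_pred, labels):
    """Precision per label: correct predictions of a label / all predictions of it."""
    predicted = Counter(y_pred)
    correct = Counter(p for t, p in zip(y_true, y_pred) if t == p)
    # Assumption: a label that is never predicted is scored 0.0.
    return {
        lab: (correct[lab] / predicted[lab] if predicted[lab] else 0.0)
        for lab in labels
    }


def macro_precision(y_true, y_pred, labels):
    """Unweighted mean of the per-label precision values (assumed aggregation)."""
    scores = per_class_precision(y_true, y_pred, labels)
    return sum(scores.values()) / len(labels)


# Hypothetical toy labels for illustration only, not results from the paper.
y_true = ["Northern Kurdish", "Central Kurdish", "Southern Kurdish",
          "Hawrami", "Zazaki", "Central Kurdish"]
y_pred = ["Northern Kurdish", "Central Kurdish", "Central Kurdish",
          "Hawrami", "Zazaki", "Central Kurdish"]

if __name__ == "__main__":
    for dialect, score in per_class_precision(y_true, y_pred, DIALECTS).items():
        print(f"{dialect}: {score:.3f}")
    print(f"macro precision: {macro_precision(y_true, y_pred, DIALECTS):.3f}")
```

In this toy case one Southern Kurdish utterance is labeled Central Kurdish, which lowers the precision of Central Kurdish and leaves Southern Kurdish with no predictions at all. A weighted or micro average would give a different figure, so the aggregation matters when comparing systems.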
© 2021 Springer Nature Switzerland AG
About this paper
Cite this paper
Amani, A., Mohammadamini, M., Veisi, H. (2021). Kurdish Spoken Dialect Recognition Using X-Vector Speaker Embedding. In: Karpov, A., Potapova, R. (eds) Speech and Computer. SPECOM 2021. Lecture Notes in Computer Science, vol 12997. Springer, Cham. https://doi.org/10.1007/978-3-030-87802-3_5
DOI: https://doi.org/10.1007/978-3-030-87802-3_5
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-87801-6
Online ISBN: 978-3-030-87802-3
eBook Packages: Computer Science (R0)