Abstract
The Arabic language has a rich morphology and structure, with a diverse vocabulary and many rarely used words. Consequently, most Arabic Natural Language Processing (NLP) tasks could benefit from embedding models that do not assign a distinct vector to each unique word in the vocabulary but instead exploit the internal structure of words. The meaning of a word is related to its constituent characters, which carry rich internal information. In this paper, we propose a new embedding model that uses two levels of granularity: words and characters. We also describe in detail how Arabic word embeddings are generated using the Word2Vec and FastText models. A Deep Learning (DL) architecture is then applied on top of the word-character embeddings. Experimental results show that the proposed scheme outperforms state-of-the-art methods proposed for Arabic chatbots.
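To make the idea of two granularity levels concrete, the following is a minimal sketch, not the model proposed in the paper. It assumes that a word-level vector and a character-level vector are simply concatenated, and it uses deterministic pseudo-random vectors in place of trained Word2Vec or FastText embeddings. The function names, the dimension `dim`, and the n-gram range are hypothetical choices made for illustration; averaging the n-gram vectors is also a simplification.

```python
import hashlib
import random


def char_ngrams(word, n_min=3, n_max=5):
    """Return the character n-grams of `word`, wrapped in '<' and '>'
    boundary markers in the style used by FastText."""
    wrapped = f"<{word}>"
    return [wrapped[i:i + n]
            for n in range(n_min, n_max + 1)
            for i in range(len(wrapped) - n + 1)]


def _seeded_vector(key, dim):
    """Deterministic pseudo-random vector standing in for a trained
    embedding (an assumption of this sketch, not a real model)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def char_level_vector(word, dim=4):
    """Average the vectors of the word's character n-grams."""
    grams = char_ngrams(word)
    if not grams:
        return [0.0] * dim
    vectors = [_seeded_vector("c:" + g, dim) for g in grams]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def word_level_vector(word, vocab, dim=4):
    """Look up a word-level vector; out-of-vocabulary words get zeros."""
    if word in vocab:
        return _seeded_vector("w:" + word, dim)
    return [0.0] * dim


def combined_vector(word, vocab, dim=4):
    """Concatenate the word-level and character-level vectors."""
    return word_level_vector(word, vocab, dim) + char_level_vector(word, dim)


if __name__ == "__main__":
    vocab = {"كتاب"}  # toy vocabulary containing one known word
    for token in ["كتاب", "كتابة"]:  # the second token is out of vocabulary
        vec = combined_vector(token, vocab)
        print(token, len(vec), [round(x, 3) for x in vec])
```

In this sketch, a word missing from the word-level vocabulary still receives a non-trivial representation through its character n-grams, which is the motivation the abstract gives for looking inside words in a morphologically rich language.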
Copyright information
© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Bensalah, N., Ayad, H., Adib, A., Ibn el farouk, A. (2022). Combining Word and Character Embeddings for Arabic Chatbots. In: Kacprzyk, J., Balas, V.E., Ezziyyani, M. (eds) Advanced Intelligent Systems for Sustainable Development (AI2SD’2020). AI2SD 2020. Advances in Intelligent Systems and Computing, vol 1417. Springer, Cham. https://doi.org/10.1007/978-3-030-90633-7_48
DOI: https://doi.org/10.1007/978-3-030-90633-7_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-90632-0
Online ISBN: 978-3-030-90633-7
eBook Packages: Intelligent Technologies and Robotics (R0)