Combining Word and Character Embeddings for Arabic Chatbots

  • Conference paper
Advanced Intelligent Systems for Sustainable Development (AI2SD’2020) (AI2SD 2020)

Abstract

Arabic has a rich morphology and a large, diverse vocabulary that includes many rarely used words. Consequently, most Arabic Natural Language Processing (NLP) tasks could benefit from embedding models that, rather than assigning a distinct vector to each unique word in the vocabulary, exploit the internal structure of words. The meaning of a word is related to the characters that compose it, which carry rich internal information. In this paper, we propose a new embedding model that uses two levels of granularity: words and characters. We also describe in detail how Arabic word embeddings are generated with the Word2Vec and FastText models. A Deep Learning (DL) architecture is then applied on top of the word-character embeddings. Experimental results show that the proposed scheme outperforms the state-of-the-art methods proposed for Arabic chatbots.
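
The idea of representing a word at two levels of granularity can be illustrated with a minimal sketch. It is not the authors' model: the abstract does not specify how the two levels are combined, so concatenation is assumed here, and the character-level part is a simple average of character vectors. The names `toy_vector`, `char_level_vector`, and `word_char_embedding`, the vector sizes, and the deterministic pseudo-random vectors standing in for trained embeddings are all hypothetical.

```python
import random
import zlib

WORD_DIM = 4  # illustrative sizes, not taken from the paper
CHAR_DIM = 3


def toy_vector(key, dim):
    """Deterministic pseudo-random vector standing in for a trained embedding."""
    rng = random.Random(zlib.crc32(key.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def char_level_vector(word, dim=CHAR_DIM):
    """Average the vectors of the word's characters (assumed pooling choice)."""
    vectors = [toy_vector("char:" + ch, dim) for ch in word]
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def word_char_embedding(word, known_words):
    """Concatenate a word-level vector with a character-level vector.

    Words outside the vocabulary get an all-zero word-level part, so only
    the character-level part carries information for them.
    """
    if word in known_words:
        word_part = toy_vector("word:" + word, WORD_DIM)
    else:
        word_part = [0.0] * WORD_DIM
    return word_part + char_level_vector(word)


vocab = {"كتاب", "مكتبة"}  # toy vocabulary
in_vocab = word_char_embedding("كتاب", vocab)
out_of_vocab = word_char_embedding("كتب", vocab)  # unseen word form
```

In this sketch, an unseen word form still receives a non-trivial representation through its characters, which is the motivation the abstract gives for looking inside words. Averaging ignores character order, so a practical system would more likely learn the character-level part with a sequence or convolutional model.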

Notes

  1. https://radimrehurek.com/gensim/about.html

  2. https://radimrehurek.com/gensim/models/fasttext.html

  3. https://github.com/Jihad92/arabic-chatbot

Author information

Corresponding author

Correspondence to Nouhaila Bensalah.

Copyright information

© 2022 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this paper

Cite this paper

Bensalah, N., Ayad, H., Adib, A., Ibn el farouk, A. (2022). Combining Word and Character Embeddings for Arabic Chatbots. In: Kacprzyk, J., Balas, V.E., Ezziyyani, M. (eds) Advanced Intelligent Systems for Sustainable Development (AI2SD’2020). AI2SD 2020. Advances in Intelligent Systems and Computing, vol 1417. Springer, Cham. https://doi.org/10.1007/978-3-030-90633-7_48
