Arabic Machine Translation Based on the Combination of Word Embedding Techniques

Intelligent Systems in Big Data, Semantic Web and Machine Learning

Part of the book series: Advances in Intelligent Systems and Computing (AISC, volume 1344)

Abstract

Automatic Machine Translation is a computer application that translates a source-language sentence into the corresponding target-language sentence. As the volume of user-generated content on the web grows, textual information becomes freely available in very large quantities. Hence, it is increasingly common to adopt automated analysis tools from Machine Learning (ML) to represent this kind of information. In this paper, we propose a new method called Enhanced Word Vectors (EWVs), which are generated using the Word2vec and FastText models. These EWVs are then used to train and test a new Deep Learning (DL) architecture based on Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). In addition, the Arabic sentences undergo dedicated preprocessing. The proposed scheme is validated on the UN dataset and compared with Word2vec and FastText used alone. The experimental results show that, in most cases, the proposed approach achieves better results than either the Word2vec or the FastText model alone.
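
The abstract does not state how the Word2vec and FastText vectors are combined into an EWV. The following is a minimal sketch that assumes simple concatenation, with a zero vector as fallback for a word missing from either table. The lookup tables, their values, the dimensions, and the names `enhanced_word_vector` and `embed_sentence` are all hypothetical and stand in for trained models; this is not the authors' implementation.

```python
from typing import Dict, List

Vector = List[float]

# Toy lookup tables standing in for trained Word2vec and FastText models.
# The words, values, and dimensions are hypothetical, for illustration only.
W2V_DIM, FT_DIM = 2, 3
W2V_TABLE: Dict[str, Vector] = {
    "سلام": [0.1, 0.2],
    "عالم": [0.3, 0.4],
}
FT_TABLE: Dict[str, Vector] = {
    "سلام": [0.5, 0.6, 0.7],
    "عالم": [0.8, 0.9, 1.0],
    "عالمي": [0.7, 0.9, 1.1],  # present here but absent from W2V_TABLE
}


def enhanced_word_vector(word: str) -> Vector:
    """Concatenate the two embeddings of `word` (assumed combination rule)."""
    # A word missing from a table falls back to a zero vector of that size.
    w2v_part = W2V_TABLE.get(word, [0.0] * W2V_DIM)
    ft_part = FT_TABLE.get(word, [0.0] * FT_DIM)
    return list(w2v_part) + list(ft_part)


def embed_sentence(tokens: List[str]) -> List[Vector]:
    """Map a tokenized sentence to a sequence of enhanced vectors."""
    return [enhanced_word_vector(token) for token in tokens]
```

Under this assumption, every token maps to a vector of length `W2V_DIM + FT_DIM`, so a sentence becomes a fixed-width sequence that a CNN or RNN layer could consume. Other combination rules, such as averaging vectors of equal dimension, would fit the same interface.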

Notes

  1. https://radimrehurek.com/gensim/about.html

  2. https://radimrehurek.com/gensim/models/fasttext.html

Author information

Corresponding author

Correspondence to Nouhaila Bensalah.

Copyright information

© 2021 The Author(s), under exclusive license to Springer Nature Switzerland AG

About this chapter

Cite this chapter

Bensalah, N., Ayad, H., Adib, A., El Farouk, A.I. (2021). Arabic Machine Translation Based on the Combination of Word Embedding Techniques. In: Gherabi, N., Kacprzyk, J. (eds) Intelligent Systems in Big Data, Semantic Web and Machine Learning. Advances in Intelligent Systems and Computing, vol 1344. Springer, Cham. https://doi.org/10.1007/978-3-030-72588-4_17
