Single Arabic Document Summarization Using Natural Language Processing Technique

  • Asmaa A. Bialy
  • Marwa A. Gaheen
  • R. M. ElEraky
  • A. F. ElGamal
  • Ahmed A. Ewees
Part of the Studies in Computational Intelligence book series (SCI, volume 874)


This chapter presents a natural language processing (NLP) method for single Arabic document summarization. The method is extractive: it selects the most valuable information in the document. Because working with Arabic text is a challenging task, the chapter applies several NLP techniques to produce accurate results. The proposed method consists of three phases. The first is a pre-processing phase that unifies synonymous terms, applies stemming, and removes punctuation marks and text decoration. The method then builds feature vectors, scores the features, selects the clauses with the highest scores, and marks them as important. The results are compared with those of traditional methods. Two human experts manually summarized all the datasets to provide a sound basis for comparison and evaluation. The evaluation uses accuracy, precision, recall, F-measure, and ROUGE. The experimental results indicate that the suggested method performs competitively with the human experts in both summarization ratio and the accuracy of the produced document.
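The pipeline outlined in the abstract (pre-process, score, select the top clauses, then compare against a human summary) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes a plain term-frequency score in place of the chapter's feature vectors, reads "text decoration" as Arabic diacritics and tatweel, and leaves out stemming and synonym unification. The names `normalize`, `split_clauses`, `summarize`, and `precision_recall_f1` are hypothetical.

```python
import re
from collections import Counter

# Assumption: "text decoration" means Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
_DECORATION = re.compile(r"[\u064B-\u0652\u0640]")
# Anything that is neither a word character nor whitespace is treated as punctuation.
_PUNCT = re.compile(r"[^\w\s]")


def normalize(text):
    """Strip decoration and punctuation, then split into tokens."""
    text = _DECORATION.sub("", text)
    text = _PUNCT.sub(" ", text)
    return text.split()


def split_clauses(text):
    """Split on Latin and Arabic sentence-ending marks and on newlines."""
    parts = re.split(r"[.!?\u061F\u061B\n]+", text)
    return [p.strip() for p in parts if p.strip()]


def summarize(text, ratio=0.5):
    """Keep the highest-scoring clauses, returned in document order."""
    clauses = split_clauses(text)
    if not clauses:
        return []
    tokens = [normalize(c) for c in clauses]
    freq = Counter(t for toks in tokens for t in toks)
    # Score = mean document frequency of the clause's tokens (a stand-in feature).
    scores = [sum(freq[t] for t in toks) / len(toks) if toks else 0.0
              for toks in tokens]
    k = max(1, int(len(clauses) * ratio))
    top = sorted(range(len(clauses)), key=lambda i: scores[i], reverse=True)[:k]
    return [clauses[i] for i in sorted(top)]


def precision_recall_f1(selected, reference):
    """Compare selected clauses with a human-selected reference set."""
    selected, reference = set(selected), set(reference)
    tp = len(selected & reference)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Stripping diacritics before removing punctuation matters in this sketch: Python's `\w` does not match combining marks, so a vowelled word would otherwise be split at each diacritic.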


Natural language processing; Arabic text summarization; Single document summarization; Extractive method


Copyright information

© Springer Nature Switzerland AG 2020

Authors and Affiliations

  • Asmaa A. Bialy (1)
  • Marwa A. Gaheen (1)
  • R. M. ElEraky (2, 3)
  • A. F. ElGamal (4)
  • Ahmed A. Ewees (1, 2), corresponding author
  1. Computer Department, Damietta University, Damietta, Egypt
  2. Bisha University, Bisha, Kingdom of Saudi Arabia
  3. Faculty of Specific Education, Damietta University, Damietta, Egypt
  4. Computer Department, Faculty of Specific Education, Mansoura University, Mansoura, Egypt
