An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization

  • Research Article - Computer Engineering and Computer Science
  • Arabian Journal for Science and Engineering

Abstract

The fast-growing amount of information on the Internet makes research in automatic document summarization urgent, since summarization is an effective remedy for information overload. Many approaches have been proposed based on different strategies, such as latent semantic analysis (LSA). However, when applied to document summarization, LSA has limitations that diminish its performance. In this work, we try to overcome these limitations by combining statistical and linear-algebraic approaches with syntactic and semantic processing of text. First, a part-of-speech tagger is used to reduce the dimensionality of LSA. Then, the weight of a term in four adjacent sentences is added to the weighting schemes when calculating the input matrix, to take word order and syntactic relations into account. In addition, a new LSA-based sentence selection algorithm is proposed, in which the term description is combined with the sentence description for each topic, which in turn makes the generated summary more informative and diverse. To assess the effectiveness of the proposed LSA-based sentence selection algorithm, extensive experiments on Arabic and English are conducted. Four datasets are used to evaluate the new model: the Linguistic Data Consortium (LDC) Arabic Newswire-a corpus, the Essex Arabic Summaries Corpus (EASC), DUC2002, and the Multilingual MSS 2015 dataset. Experimental results on the four datasets show the effectiveness of the proposed model on both Arabic and English, and it performs better than the state-of-the-art methods it is compared with.
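For readers unfamiliar with LSA-based extraction, the following is a minimal sketch of the classic one-sentence-per-topic selection scheme that work of this kind builds on. It is not the enhanced algorithm proposed in the article: it uses raw term counts and whitespace tokenization, and it omits the part-of-speech filtering, the adjacent-sentence weighting, and the combined term and sentence description. The function names `build_term_sentence_matrix` and `lsa_select` are hypothetical, chosen only for illustration.

```python
import numpy as np


def build_term_sentence_matrix(sentences):
    """Return (vocab, A) where A[i, j] counts term i in sentence j.

    Assumption: whitespace tokenization and raw counts stand in for the
    richer preprocessing and weighting schemes described in the abstract.
    """
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    A = np.zeros((len(vocab), len(sentences)))
    for j, toks in enumerate(tokenized):
        for tok in toks:
            A[index[tok], j] += 1.0
    return vocab, A


def lsa_select(sentences, k):
    """Pick up to k sentence indices, one per leading latent topic."""
    _, A = build_term_sentence_matrix(sentences)
    # A = U @ diag(s) @ Vt; row t of Vt scores every sentence for topic t,
    # and rows are ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    chosen = []
    for topic in Vt:
        # The sign of a singular vector is arbitrary, so rank by magnitude
        # and take the best sentence that has not been selected yet.
        for j in np.argsort(-np.abs(topic)):
            if j not in chosen:
                chosen.append(int(j))
                break
        if len(chosen) == k:
            break
    return sorted(chosen)
```

Calling `lsa_select(sentences, k=2)` on a list of sentence strings returns the indices of at most two sentences in document order. Selecting by topic rather than by overall score is what gives this family of methods its diversity; the article's contribution lies in how the input matrix is weighted and how sentences are chosen per topic, which this sketch does not attempt to reproduce.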

Corresponding author

Correspondence to Zuping Zhang.

Cite this article

Al-Sabahi, K., Zhang, Z., Long, J. et al. An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization. Arab J Sci Eng 43, 8079–8094 (2018). https://doi.org/10.1007/s13369-018-3286-z

  • DOI: https://doi.org/10.1007/s13369-018-3286-z
