An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization

Al-Sabahi, Kamal; Zhang, Zuping; Long, Jun; Alwesabi, Khaled

doi:10.1007/s13369-018-3286-z

Research Article - Computer Engineering and Computer Science
Published: 05 May 2018

Arabian Journal for Science and Engineering, Volume 43, pages 8079–8094 (2018)

Abstract

The fast-growing amount of information on the Internet makes research in automatic document summarization urgent, since summarization is an effective remedy for information overload. Many approaches have been proposed based on different strategies, such as latent semantic analysis (LSA). However, when applied to document summarization, LSA has some limitations that diminish its performance. In this work, we try to overcome these limitations by applying statistical and linear-algebraic approaches combined with syntactic and semantic processing of text. First, a part-of-speech tagger is used to reduce the dimensionality of LSA. Then, the weight of the term in four adjacent sentences is added to the weighting schemes when calculating the input matrix, to take into account word order and syntactic relations. In addition, a new LSA-based sentence selection algorithm is proposed, in which the term description is combined with the sentence description for each topic, which in turn makes the generated summary more informative and diverse. To verify the effectiveness of the proposed LSA-based sentence selection algorithm, extensive experiments on Arabic and English are conducted. Four datasets are used to evaluate the new model: the Linguistic Data Consortium (LDC) Arabic Newswire-a corpus, the Essex Arabic Summaries Corpus (EASC), DUC2002, and the Multilingual MSS 2015 dataset. Experimental results on the four datasets show the effectiveness of the proposed model on both Arabic and English. It performs comprehensively better than the state-of-the-art methods.
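
For readers unfamiliar with LSA-based extractive summarization, the sketch below shows the generic idea the abstract builds on: form a term-by-sentence matrix, extract its dominant latent topics, and pick one representative sentence per topic. It is not the authors' algorithm. The part-of-speech filtering, the adjacent-sentence weighting, and the combined term and sentence description are not reproduced, because the abstract gives no formulas for them. All function names are hypothetical, whitespace tokenization is a placeholder (Arabic text would need proper tokenization and stemming), and the singular vectors are approximated with power iteration and deflation rather than a full SVD so that the example runs on the standard library alone.

```python
import math
from collections import Counter


def term_sentence_matrix(sentences):
    """Raw term-frequency matrix: rows are terms, columns are sentences."""
    # Assumption: lowercase whitespace tokens stand in for real preprocessing.
    tokenized = [s.lower().split() for s in sentences]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    index = {term: i for i, term in enumerate(vocab)}
    matrix = [[0.0] * len(sentences) for _ in vocab]
    for j, toks in enumerate(tokenized):
        for term, count in Counter(toks).items():
            matrix[index[term]][j] = float(count)
    return matrix


def top_right_singular_vector(matrix, iterations=200):
    """Power iteration on A^T A; returns (singular value, unit right vector)."""
    n = len(matrix[0])
    # Deterministic, non-uniform start vector, normalized to unit length.
    v = [1.0 / (j + 1) for j in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    sigma = 0.0
    for _ in range(iterations):
        u = [sum(row[j] * v[j] for j in range(n)) for row in matrix]  # A v
        w = [sum(matrix[i][j] * u[i] for i in range(len(matrix)))
             for j in range(n)]                                       # A^T (A v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm < 1e-12:
            return 0.0, v
        v = [x / norm for x in w]
        sigma = math.sqrt(norm)  # ||A^T A v|| approaches sigma^2
    return sigma, v


def select_sentences(sentences, k=2):
    """Pick one sentence per latent topic, for the k strongest topics."""
    matrix = term_sentence_matrix(sentences)
    n = len(sentences)
    chosen = []
    for _ in range(min(k, n)):
        sigma, v = top_right_singular_vector(matrix)
        if sigma < 1e-9:
            break  # no remaining topic carries any weight
        # Sentence with the largest weight in this topic, skipping repeats.
        order = sorted(range(n), key=lambda j: abs(v[j]), reverse=True)
        chosen.append(next(j for j in order if j not in chosen))
        # Deflate: subtract the rank-one component sigma * u * v^T.
        u = [sum(row[j] * v[j] for j in range(n)) / sigma for row in matrix]
        matrix = [[matrix[i][j] - sigma * u[i] * v[j] for j in range(n)]
                  for i in range(len(matrix))]
    # Return the selected sentences in their original document order.
    return [sentences[j] for j in sorted(chosen)]
```

Calling `select_sentences(list_of_sentences, k=3)` returns up to three sentences in document order. Choosing one sentence per topic is the simplest selection rule in this family; the abstract's proposal differs in that it also uses the term description of each topic when selecting.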

Author information

Authors and Affiliations

School of Information Science and Engineering, Central South University, Changsha, China
Kamal Al-Sabahi, Zuping Zhang, Jun Long & Khaled Alwesabi

Corresponding author

Correspondence to Zuping Zhang.

About this article

Cite this article

Al-Sabahi, K., Zhang, Z., Long, J. et al. An Enhanced Latent Semantic Analysis Approach for Arabic Document Summarization. Arab J Sci Eng 43, 8079–8094 (2018). https://doi.org/10.1007/s13369-018-3286-z

Received: 10 November 2017
Accepted: 16 April 2018
Published: 05 May 2018
Issue Date: December 2018
DOI: https://doi.org/10.1007/s13369-018-3286-z
