
Cognitive Computation, Volume 10, Issue 4, pp 651–669

A Hybrid Approach for Arabic Text Summarization Using Domain Knowledge and Genetic Algorithms

  • Qasem A. Al-Radaideh
  • Dareen Q. Bataineh
Article

Abstract

Text summarization is the process of producing a shorter version of a given text. Automatic summarization techniques have been applied to various domains, such as medical, political, news, and legal texts, showing that adapting domain-relevant features can improve summarization performance. Although plenty of research exists on domain-based summarization in English and other languages, such work is lacking for Arabic because of the shortage of existing knowledge bases. This paper presents a hybrid, single-document text summarization approach, abbreviated ASDKGA. The approach combines domain knowledge, statistical features, and genetic algorithms to extract the important points of Arabic political documents. ASDKGA is tested on two corpora: the KALIMAT corpus and the Essex Arabic Summaries Corpus (EASC). The Recall-Oriented Understudy for Gisting Evaluation (ROUGE) framework was used to compare the summaries generated automatically by ASDKGA with summaries written by humans. The approach is also compared against three other Arabic text summarization approaches. ASDKGA demonstrated promising results when summarizing Arabic political documents, with an average F-measure of 0.605 at a compression ratio of 40%.
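The pipeline summarized in the abstract (statistical and domain-knowledge sentence features, a genetic algorithm that chooses which sentences to extract, and a ROUGE comparison with human summaries) can be illustrated with a small Python sketch. It is not the ASDKGA implementation: the three features (sentence length, sentence position, and overlap with a domain term list), their weights, the fixed-size chromosome encoding, and tournament selection with elitism are all assumptions made for illustration. `rouge_1_f` is a simplified unigram ROUGE-1 F-measure without the stemming or stop-word options of the official ROUGE package. The `ratio=0.4` parameter mirrors the 40% compression ratio reported above, interpreted here (as an assumption) as keeping 40% of the sentences.

```python
import random
import re
from collections import Counter


def tokenize(text):
    # \w is Unicode-aware in Python 3, so Arabic letters count as word characters.
    return re.findall(r"\w+", text.lower())


def sentence_scores(sentences, domain_terms):
    """Hypothetical feature mix; every score lies in [0, 1]."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    max_len = max(len(t) for t in tokens) or 1
    scores = []
    for i, words in enumerate(tokens):
        length = len(words) / max_len                  # longer sentences score higher
        position = 1.0 - i / n                         # earlier sentences score higher
        domain = sum(w in domain_terms for w in words) / max(len(words), 1)
        scores.append((length + position + 2 * domain) / 4)  # illustrative weights
    return scores


def ga_select(scores, ratio=0.4, pop_size=30, generations=50, seed=0):
    """Pick k = ratio * n sentence indices with a simple genetic algorithm."""
    rng = random.Random(seed)
    n = len(scores)
    k = max(1, min(n, round(ratio * n)))  # summary size fixed by the compression ratio

    def fitness(ch):                      # chromosome = frozenset of k sentence indices
        return sum(scores[i] for i in ch)

    def crossover(a, b):                  # child draws k indices from both parents
        return frozenset(rng.sample(list(a | b), k))

    def mutate(ch, rate=0.2):             # swap one selected sentence for an unselected one
        if rng.random() < rate:
            out = rng.choice(sorted(ch))
            candidates = [i for i in range(n) if i not in ch]
            if candidates:
                ch = (ch - {out}) | {rng.choice(candidates)}
        return ch

    def tournament(pop, size=3):          # best of a random subset of the population
        return max(rng.sample(pop, size), key=fitness)

    population = [frozenset(rng.sample(range(n), k)) for _ in range(pop_size)]
    for _ in range(generations):
        elite = max(population, key=fitness)  # elitism: keep the best chromosome
        population = [elite] + [
            mutate(crossover(tournament(population), tournament(population)))
            for _ in range(pop_size - 1)
        ]
    return sorted(max(population, key=fitness))


def rouge_1_f(candidate, reference):
    """Unigram-overlap F-measure between a system summary and a reference."""
    cand, ref = Counter(tokenize(candidate)), Counter(tokenize(reference))
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # English placeholder sentences stand in for an Arabic political document.
    doc = [
        "The parliament approved the new election law on Monday.",
        "Weather was mild across the capital.",
        "Opposition parties said the election law limits their participation.",
        "A football match drew a large crowd.",
        "The government will present the law to the president next week.",
    ]
    domain = {"parliament", "election", "law", "government", "opposition", "president"}
    picked = ga_select(sentence_scores(doc, domain), ratio=0.4)
    summary = " ".join(doc[i] for i in picked)
    reference = doc[0] + " " + doc[2]  # stand-in for a human-written summary
    print(picked, round(rouge_1_f(summary, reference), 3))
```

Fixing the number of selected sentences in every chromosome keeps each candidate summary at the target compression ratio without a penalty term in the fitness function; a variable-length bit-string encoding with a length penalty is a common alternative.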

Keywords

Domain-based summarization · Hybrid approaches · Genetic algorithms · Arabic text summarization · Sentence extraction

Notes

Compliance with Ethical Standards

Conflict of Interest

The authors declare that they have no conflict of interest.

Informed Consent

All procedures followed were in accordance with the ethical standards of the responsible committee on human experimentation (institutional and national) and with the Helsinki declaration of 1975, as revised in 2008 [15].

Human and Animal Rights

This article does not contain any studies with human or animal subjects performed by any of the authors.

References

  1. Lloret E, Palomar M. Text summarization in progress: a literature review. Artif Intell Rev. 2010;37(1):1–41.
  2. Radev D, Hovy E, McKeown K. Introduction to the special issue on summarization. Comput Linguist. 2002;28(4):399–408.
  3. Ježek K, Steinberger J. Automatic text summarization (the state of the art 2007 and new challenges). In: the Conference Znalosti, Bratislava, Slovakia; 2008. p. 1–12.
  4. Saggion H. Automatic summarization: an overview. Rev Fr Linguist Appl. 2008;13(1):63–81.
  5. Luhn H. The automatic creation of literature abstracts. IBM J Res Dev. 1958;2(2):159–65.
  6. Reeve L, Han H, Brooks A. The use of domain-specific concepts in biomedical text summarization. Inf Process Manag. 2007;43(6):1765–76.
  7. Chen Y, Foong O, Yong S, Kurniawan I. Text summarization for oil and gas drilling topic. Int J Comput Electr Autom Control Inf Eng World Acad Sci Technol. 2008;2(6):1799–802.
  8. Yeh J, Ke H, Yang W, Meng I. Text summarization using a trainable summarizer and latent semantic analysis. Inf Process Manag. 2005;41(1):75–95.
  9. Moens M, Uyttendaele C, Dumortier J. Abstracting of legal cases: the SALOMON experience. In: the 6th International Conference on Artificial Intelligence and Law (ICAIL'97), Melbourne, Australia; 1997. p. 114–122.
  10. De Hollander G, Marx M. Summarization of meetings using word clouds. In: the CSI International Symposium on Computer Science and Software Engineering (CSSE), Tehran, Iran; 2011. p. 54–61.
  11. Summers E, Stephens K. Politwitics: summarization of political tweets. 2012. Retrieved Mar. 10, 2015 from http://bid.berkeley.edu/cs294-1-spring13/images/3/34/Politwitics_report.pdf.
  12. Chong L, Chen Y. Text summarization for oil and gas news article. Int J Comput Electr Autom Control Inf Eng World Acad Sci Technol. 2009;3(5):1282–5.
  13. Sarkar K. Using domain knowledge for text summarization in medical domain. Int J Recent Trends Eng. 2009;1(1):200–5.
  14. Imam I, Hamouda A, Khalek H. An ontology-based summarization system for Arabic documents (OSSAD). Int J Comput Appl. 2013;74(17):38–43.
  15. Silla CN Jr, Pappa GL, Freitas AA, Kaestner CAA. Automatic text summarization with genetic algorithm-based attribute selection. In: Advances in Artificial Intelligence – IBERAMIA 2004. Springer; 2004. p. 305–14.
  16. Qazvinian V, Hassanabadi L, Halavati R. Summarising text with a genetic algorithm-based sentence extraction. Int J Knowl Manag Stud. 2008;2(4):426–44.
  17. Fattah M, Ren F. Automatic text summarization. Int J Comput Electr Autom Control Inf Eng. 2008;2(1):90–3.
  18. Litvak M, Last M, Friedman M. A new approach to improving multilingual summarization using genetic algorithms. In: the 48th Annual Meeting of the Association for Computational Linguistics, Uppsala, Sweden; 2010. p. 927–36.
  19. Nandhini K, Balasundaram S. Use of genetic algorithms for cohesive summary extraction to assist reading difficulties. Appl Comput Intell Soft Comput. 2013;2013:1–11.
  20. Hammo B, Abu-Salem H, Evens M. A hybrid Arabic text summarization technique based on text structure and topic identification. Int J Comput Process Lang. 2011;23(1):39–65.
  21. Al-Omour M. Extractive-based Arabic text summarization approach. M.Sc. thesis, Department of Computer Science, Yarmouk University, Irbid, Jordan; 2012.
  22. Ibrahim A, Elghazaly T, Gheith M. A novel Arabic text summarization model based on rhetorical structure theory and vector space model. Int J Comput Linguist Nat Lang Process. 2013;2(8):480–4.
  23. Douzidia F, Lapalme G. Lakhas, an Arabic summarization system. In: the Document Understanding Conference (DUC), Boston, USA; 2004. p. 128–135.
  24. Bawakid A, Oussalah M. A semantic summarization system: the University of Birmingham at TAC 2008. In: the First Text Analysis Conference (TAC), Maryland, USA; 2008. p. 1–6.
  25. Al-Radaideh Q, Afif M. Arabic text summarization using aggregate similarity. In: the International Arab Conference on Information Technology (ACIT'2009), Yemen; 2009. p. 1–8.
  26. Sobh I. An optimized dual classification system for Arabic extractive generic text summarization. M.Sc. thesis, Department of Computer Engineering, Cairo University, Giza, Egypt; 2009.
  27. Hamodeh A, Mousa M. Automatic system for summarizing Arabic comments on social media networks. Al-Majala Al-Dawlia Lelitesalat, Al-Jameia Al-Arabia Lelhasibat. Special issue; 2013. p. 44–56. (In Arabic).
  28. Al-Taani A, Al-Rousan S. Arabic multi-document text summarization. In: the 17th International Conference on Intelligent Text Processing and Computational Linguistics (CICLing 2016), Turkey; 2016.
  29. Oufaida H, Nouali O, Blache P. Minimum redundancy and maximum relevance for single and multi-document Arabic text summarization. J King Saud Univ Comput Inf Sci. 2014;26(4):450–61.
  30. Al-Khawaldeh F, Samawi V. Lexical cohesion and entailment-based segmentation for Arabic text summarization (LCEAS). World Comput Sci Inf Technol J (WCSIT). 2015;5(3):51–60.
  31. Tran HN, Cambria E, Hussain A. Towards GPU-based common-sense reasoning: using fast subgraph matching. Cogn Comput. 2016;8(6):1074–86.
  32. Xia Y, Cambria E, Hussain A, Zhao H. Word polarity disambiguation using Bayesian model and opinion-level features. Cogn Comput. 2015;7(3):369–80.
  33. Li Y, Pan Q, Yang T, Wang S, Tang J, Cambria E. Learning word representations for sentiment analysis. Cogn Comput. 2017;9(6):843–51.
  34. Al-Radaideh Q, Al-Qudah G. Application of rough set-based feature selection for Arabic sentiment analysis. Cogn Comput. 2017;9(4):436–445.
  35. Recupero D, Presutti V, Consoli S, Gangemi A, Nuzzolese A. Sentilo: frame-based sentiment analysis. Cogn Comput. 2015;7(2):211–25.
  36. Dashtipour K, Poria S, Hussain A, Cambria E, Hawalah A, Gelbukh A, et al. Multilingual sentiment analysis: state-of-the-art and independent comparison of techniques. Cogn Comput. 2016;8:757–71.
  37. Mukhtar N, Khan MA, Chiragh N. Effective use of evaluation measures for the validation of best classifier in Urdu sentiment analysis. Cogn Comput. 2017;9(4):446–56.
  38. Lo SL, Cambria E, Chiong R, Cornforth D. Multilingual sentiment analysis: from formal to informal and scarce resource languages. Artif Intell Rev. 2017;48(4):499–527.
  39. Duwairi R, El-Orfali M. A study of the effects of preprocessing strategies on sentiment analysis for Arabic text. J Inf Sci. 2014;40(4):501–13.
  40. El-Khair I. Effects of stop words elimination for Arabic information retrieval: a comparative study. Int J Comput Inf Sci. 2006;4(3):119–33.
  41. Green S, Manning C. Better Arabic parsing: baselines, evaluations, and analysis. In: the 23rd International Conference on Computational Linguistics (COLING), Beijing, China; 2010. p. 394–402.
  42. Mustafa S. Word stemming for Arabic information retrieval: the case for simple light stemming. Abhath Al-Yarmouk: Sci Eng Ser. 2012;21(1):123–44.
  43. Singh J, Gupta V. An efficient corpus-based stemmer. Cogn Comput. 2017;9(5):671–88.
  44. Edmundson H. New methods in automatic extracting. J Assoc Comput Mach. 1969;16(2):264–85.
  45. Perumal K, Chaudhuri B. Language independent sentence extraction based text summarization. In: the 9th International Conference on Natural Language Processing (ICON), Chennai, India; 2011. p. 213–7.
  46. Kumar Y, Salim N. Automatic multi document summarization approaches. J Comput Sci. 2011;8(1):133–40.
  47. Gupta V, Lehal G. A survey of text summarization extractive techniques. J Emerg Technol Web Intell. 2010;2(3):258–68.
  48. Miller B, Goldberg D. Genetic algorithms, tournament selection, and the effects of noise. Complex Syst. 1995;9(3):193–212.
  49. El-Haj M, Koulali R. KALIMAT: a multipurpose Arabic corpus. In: the Second Workshop on Arabic Corpus Linguistics, Lancaster University, UK; 2011b. p. 22–25. http://sourceforge.net/projects/kalimat/.
  50. El-Haj M, Kruschwitz U, Fox C. Using Mechanical Turk to create a corpus of Arabic summaries. In: the 7th International Language Resources and Evaluation Conference (LREC), Valletta, Malta; 2010. p. 36–39.
  51. Lin C. ROUGE: a package for automatic evaluation of summaries. In: the ACL Workshop on Text Summarization Branches Out, Barcelona, Spain; 2004. p. 74–81.
  52. El-Haj M, Kruschwitz U, Fox C. Experimenting with automatic text summarisation for Arabic. In: Human Language Technology. Challenges for Computer Science and Linguistics. Springer; 2011a. p. 490–9.

Copyright information

© Springer Science+Business Media, LLC, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Information Technology and Computer Sciences, Yarmouk University, Irbid, Jordan
