Persian Automatic Text Summarization Based on Named Entity Recognition

Abstract

In this paper, we propose an unsupervised method for summarizing Farsi texts based on our neural named entity recognition (NER) system. The method is extractive and operates on single documents. It consists of three phases: training a supervised NER model, recognizing the named entities in the text, and generating a summary. Although the method is language independent, we focus on Farsi text summarization in this work. First, we build word embeddings from the Hamshahri2 corpus. Second, we train a neural network on the Arman NER corpus. The proposed algorithm then ranks the sentences of the text based on the named entities in each sentence and produces the summary. Finally, we evaluate the method on the Pasokh single-document data set using the ROUGE evaluation measure. Without using any handcrafted features, the proposed method achieves state-of-the-art results: compared with the best supervised Farsi methods, our unsupervised method improves the ROUGE-2 recall score by 10.2% overall.
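The ranking step described in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the scoring rule (a plain count of entity tokens per sentence), the whitespace tokenization, and the names `rank_sentences` and `summarize` are all assumptions made for illustration, and the set of entity strings stands in for the output of a trained NER model.

```python
from typing import List, Set


def rank_sentences(sentences: List[str], entities: Set[str]) -> List[int]:
    """Return sentence indices ordered by descending named-entity count.

    `entities` is a hypothetical stand-in for NER output: the set of
    tokens that a trained model tagged as named entities.
    """
    def score(sentence: str) -> int:
        # Assumed scoring rule: number of tokens tagged as entities.
        return sum(1 for token in sentence.split() if token in entities)

    # sorted() is stable, so tied sentences keep their document order.
    return sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))


def summarize(sentences: List[str], entities: Set[str], k: int) -> List[str]:
    """Extract the k top-ranked sentences, restored to document order."""
    top = rank_sentences(sentences, entities)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = [
        "Tehran is the capital of Iran",
        "It is large",
        "Hafez lived in Shiraz",
    ]
    tagged = {"Tehran", "Iran", "Hafez", "Shiraz"}
    for line in summarize(doc, tagged, k=2):
        print(line)
```

Restoring the selected sentences to document order is a common choice in extractive summarization because it preserves the reading flow of the source; whether the paper does the same is not stated in the abstract.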

Author information

Corresponding author

Correspondence to Mohammad Fakhredanesh.

About this article

Cite this article

Khademi, M.E., Fakhredanesh, M. Persian Automatic Text Summarization Based on Named Entity Recognition. Iran J Sci Technol Trans Electr Eng (2020). https://doi.org/10.1007/s40998-020-00352-2

Keywords

  • Extractive summarization
  • Named entity recognition
  • Continuous vector space
  • Word embedding