Abstract

In this paper, we propose an unsupervised, extractive, single-document method for summarizing Farsi texts, built on our neural named entity recognition (NER) system. The method has three phases: training a supervised NER model, recognizing the named entities in the text, and generating a summary. Although the method is language independent, we focus on Farsi text summarization in this work. First, we train word embeddings on the Hamshahri2 corpus. Second, we train a neural network on the Arman NER corpus. The proposed algorithm then ranks the sentences of the text according to the named entities each sentence contains and produces the summary. Finally, we evaluate the method on the Pasokh single-document data set using the ROUGE evaluation measure. Without any handcrafted features, the proposed method achieves state-of-the-art results: compared with the best supervised Farsi methods, our unsupervised method improves the ROUGE-2 recall score by 10.2% overall.
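The ranking step described above can be illustrated with a minimal Python sketch. The abstract does not give the actual scoring function, so the score used here (the number of tokens tagged as part of a named entity) is an assumption made only for illustration. The names `entity_score` and `summarize` and the IOB-style tag strings are likewise hypothetical. The NER tags are assumed to come from an already trained tagger, which is not shown.

```python
from typing import List, Sequence


def entity_score(tags: Sequence[str]) -> int:
    """Hypothetical score: count tokens that belong to any named entity.

    Assumes IOB-style tags where "O" marks a token outside every entity.
    The paper's real scoring function is not stated in the abstract.
    """
    return sum(1 for tag in tags if tag != "O")


def summarize(
    sentences: Sequence[str],
    tags_per_sentence: Sequence[Sequence[str]],
    k: int,
) -> List[str]:
    """Pick the k highest-scoring sentences and return them in document order."""
    if len(sentences) != len(tags_per_sentence):
        raise ValueError("each sentence needs exactly one tag sequence")
    scores = [entity_score(tags) for tags in tags_per_sentence]
    # Rank by descending score; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Restore the original order so the extract reads naturally.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    # Toy input: placeholder sentences with illustrative tag sequences.
    demo_sentences = ["first sentence", "second sentence", "third sentence"]
    demo_tags = [["O", "O"], ["B-PER", "I-PER", "O"], ["B-LOC", "O"]]
    print(summarize(demo_sentences, demo_tags, k=2))
```

Returning the selected sentences in document order rather than score order is a common choice for extractive summaries. The abstract does not say whether the proposed method does the same.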

Author information

Corresponding author

Correspondence to Mohammad Fakhredanesh.

About this article

Cite this article

Khademi, M.E., Fakhredanesh, M. Persian Automatic Text Summarization Based on Named Entity Recognition. Iran J Sci Technol Trans Electr Eng (2020). https://doi.org/10.1007/s40998-020-00352-2

  • DOI: https://doi.org/10.1007/s40998-020-00352-2
