Abstract
In this paper, we propose an unsupervised method for summarizing Farsi texts based on our neural named entity recognition (NER) system. The method is extractive and operates on single documents. It consists of three phases: training a supervised NER model, recognizing the named entities in the text, and generating a summary. Although the method is language independent, we focus on Farsi text summarization in this work. First, we produce a word embedding from the Hamshahri2 corpus. Second, we train a neural network on the Arman NER corpus. Then, the proposed algorithm ranks the sentences of the text based on the named entities in each sentence and produces the summary. Finally, the method is evaluated on the Pasokh single-document data set using the ROUGE evaluation measure. Without using any handcrafted features, our proposed method achieves state-of-the-art results. Compared with the best supervised Farsi methods, our unsupervised method achieves an overall improvement of 10.2% in ROUGE-2 recall.
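The sentence-ranking step described above can be illustrated with a minimal sketch. The abstract does not give the scoring formula, so the score used here (the number of tokens in a sentence that are named entities) is an assumption, as are the function names `score_sentence` and `summarize`. The `entities` set is a hypothetical stand-in for the output of the trained NER model.

```python
from typing import List, Set


def score_sentence(tokens: List[str], entities: Set[str]) -> int:
    """Assumed score: how many tokens of the sentence are named entities."""
    return sum(1 for tok in tokens if tok in entities)


def summarize(sentences: List[List[str]], entities: Set[str], k: int) -> List[int]:
    """Return the indices of the k highest-scoring sentences, in document order."""
    scores = [score_sentence(s, entities) for s in sentences]
    # Rank by score, highest first; ties go to the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Restore document order so the extracted summary reads naturally.
    return sorted(ranked[:k])


# Toy input: tokenized sentences plus a hypothetical set of recognized entities.
sentences = [
    ["Tehran", "is", "large"],
    ["it", "rains"],
    ["Hafez", "lived", "in", "Shiraz"],
]
entities = {"Tehran", "Hafez", "Shiraz"}
selected = summarize(sentences, entities, k=2)
```

With this toy input the scores are 1, 0 and 2, so the first and third sentences are selected. A real system would take the entity tags from the NER model instead of a fixed set, and might weight entity types or normalize by sentence length; the abstract does not say whether the method does either.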
Cite this article
Khademi, M.E., Fakhredanesh, M. Persian Automatic Text Summarization Based on Named Entity Recognition. Iran J Sci Technol Trans Electr Eng (2020). https://doi.org/10.1007/s40998-020-00352-2
DOI: https://doi.org/10.1007/s40998-020-00352-2