Persian Automatic Text Summarization Based on Named Entity Recognition

Khademi, Mohammad Ebrahim; Fakhredanesh, Mohammad

doi:10.1007/s40998-020-00352-2

Abstract

In this paper, we propose an unsupervised, extractive, single-document method for summarizing Farsi texts that builds on our neural named entity recognition (NER) system. The method has three phases: training a supervised NER model, recognizing the named entities in the text, and generating a summary. Although the method is language independent, we focus on Farsi in this work. First, we train word embeddings on the Hamshahri2 corpus. Second, we train a neural network on the Arman NER corpus. The proposed algorithm then ranks the sentences of the text by the named entities each one contains and produces the summary. Finally, we evaluate the method on the Pasokh single-document data set using the ROUGE evaluation measure. Without any handcrafted features, the method achieves state-of-the-art results: compared with the best supervised Farsi methods, it improves the overall ROUGE-2 recall score by 10.2%.
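
The ranking and evaluation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the exact scoring rule, so the sketch assumes a sentence's score is simply the number of its tokens tagged as named entities. The trained neural NER model is replaced by a toy lookup set (`TOY_ENTITIES`), and the function names `rank_sentences_by_entities` and `rouge2_recall` are hypothetical. ROUGE-2 recall is computed in its usual form: clipped bigram overlap divided by the number of bigrams in the reference summary.

```python
from collections import Counter
from typing import Callable, List, Set


def rank_sentences_by_entities(
    sentences: List[List[str]],
    is_entity: Callable[[str], bool],
    max_sentences: int,
) -> List[int]:
    """Return the indices of the selected sentences, in document order."""
    # Assumed scoring rule: count the tokens tagged as named entities.
    scores = [sum(1 for tok in sent if is_entity(tok)) for sent in sentences]
    # Highest score first; ties are broken in favour of the earlier sentence.
    ranked = sorted(range(len(sentences)), key=lambda i: (-scores[i], i))
    # Restore document order so the extract reads naturally.
    return sorted(ranked[:max_sentences])


def rouge2_recall(candidate: List[str], reference: List[str]) -> float:
    """Clipped bigram overlap divided by the number of reference bigrams."""
    cand = Counter(zip(candidate, candidate[1:]))
    ref = Counter(zip(reference, reference[1:]))
    if not ref:
        return 0.0
    overlap = sum(min(count, cand[bigram]) for bigram, count in ref.items())
    return overlap / sum(ref.values())


# Toy stand-in for the trained NER model: a fixed set of entity tokens.
TOY_ENTITIES: Set[str] = {"Tehran", "Hamshahri", "Iran"}

doc = [
    "The weather was mild".split(),
    "Hamshahri is published in Tehran".split(),
    "Readers across Iran follow it".split(),
]
selected = rank_sentences_by_entities(doc, TOY_ENTITIES.__contains__, 2)
summary = [tok for i in selected for tok in doc[i]]
```

In a real pipeline, `is_entity` would wrap the predictions of the NER model trained on the Arman corpus, and `rouge2_recall` would be applied to the generated summary against each human-written Pasokh reference.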

Author information

Authors and Affiliations

Malek Ashtar University of Technology, Tehran, Tehran, Iran
Mohammad Ebrahim Khademi & Mohammad Fakhredanesh

Corresponding author

Correspondence to Mohammad Fakhredanesh.

About this article

Cite this article

Khademi, M.E., Fakhredanesh, M. Persian Automatic Text Summarization Based on Named Entity Recognition. Iran J Sci Technol Trans Electr Eng (2020). https://doi.org/10.1007/s40998-020-00352-2

Received: 20 August 2018
Accepted: 23 May 2020
Published: 03 July 2020
DOI: https://doi.org/10.1007/s40998-020-00352-2
