Open information extraction as an intermediate semantic structure for Persian text summarization

Rahat, Mahmoud; Talebpour, Alireza

doi:10.1007/s00799-018-0244-z

Published: 28 June 2018

Volume 19, pages 339–352, (2018)

International Journal on Digital Libraries

Abstract

Semantic applications typically exploit structures such as dependency parse trees, phrase chunking, semantic role labeling or open information extraction (Open IE). In this paper, we introduce a novel application of Open IE as an intermediate layer for text summarization. Text summarization is an important method for providing relevant information in large digital libraries. Open IE refers to the process of extracting machine-understandable structural propositions from text. We use these propositions as building blocks to shorten sentences and generate a summary of the text. The proposed system offers a new form of summarization that is able to break the structure of a sentence and extract its most significant sub-sentential elements. Other advantages include the ability to identify and eliminate less important parts of a sentence (such as adverbs, adjectives, appositions or dependent clauses) as well as duplicated pieces of sentences. This in turn frees space for more sentences in the summary, which improves its coverage and coherence. The proposed system is localized for the Persian language; however, it can be adapted to other languages. Experiments performed on the standard data set “Pasokh” with a standard comparison tool showed promising results for the proposed approach. We used summaries produced by the system in a real-world application in the virtual library of Shahid Beheshti University and received positive feedback from users.
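The idea of building a summary from propositions rather than whole sentences can be illustrated with a minimal sketch. It is not the system described in the article: the `Proposition` class, the `score` field, the word budget and the greedy selection strategy are all assumptions made for illustration, and the example uses English text for readability.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """A hypothetical Open IE triple with an assumed importance score."""
    arg1: str
    relation: str
    arg2: str
    score: float  # assumption: some external ranking supplies this value

    def text(self) -> str:
        return f"{self.arg1} {self.relation} {self.arg2}"


def select_propositions(props, max_words):
    """Greedily pick high-scoring, non-duplicate propositions within a word budget."""
    seen = set()
    chosen = []
    used = 0
    # Visit propositions from highest to lowest score.
    for p in sorted(props, key=lambda p: p.score, reverse=True):
        key = (p.arg1.lower(), p.relation.lower(), p.arg2.lower())
        if key in seen:
            continue  # drop exact duplicates (case-insensitive)
        length = len(p.text().split())
        if used + length > max_words:
            continue  # does not fit; a shorter proposition may still fit
        seen.add(key)
        chosen.append(p)
        used += length
    return chosen


props = [
    Proposition("the library", "offers", "summaries", 0.9),
    Proposition("The library", "offers", "summaries", 0.5),
    Proposition("users", "gave", "good feedback", 0.7),
    Proposition("the system", "was tested on", "the Pasokh data set", 0.6),
]
for p in select_propositions(props, max_words=8):
    print(p.text())
```

Because each selected unit is a short proposition instead of a full sentence, the same word budget can cover more distinct facts, which is the effect the abstract attributes to removing less important and duplicated sentence parts.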

Notes

http://swesum.nada.kth.se/index-farsi.html.
http://swesum.nada.kth.se/index-eng.html.
http://textmining.noornet.net/FA/Summarization.
http://www.matnak.com.
http://ijaz.um.ac.ir/.
https://github.com/sobhe/baaz.
http://www.mehrnews.com.
An n-gram is n consecutive words from a given text.
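
The n-gram definition in the last note can be made concrete with a short sketch. The function name `word_ngrams` and the whitespace tokenization are assumptions for illustration; Persian text would normally need a dedicated tokenizer.

```python
def word_ngrams(text, n):
    """Return every run of n consecutive words in text as a tuple."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    words = text.split()  # assumption: words are separated by whitespace
    # A text with fewer than n words yields an empty list.
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


print(word_ngrams("open information extraction for summarization", 2))
```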

Acknowledgements

We would like to thank the anonymous reviewers for their constructive comments, Asef Pourmasoumi for providing us with the data and benchmarking tool of the Pasokh corpus, Azadeh Zamanifar for sharing the code of their summarizer and Seyedamin Monemian for his help in running the experiments.

Author information

Authors and Affiliations

Faculty of Computer Science and Engineering, Shahid Beheshti University, Daneshjo Blv, Velenjak, Tehran, Iran
Mahmoud Rahat & Alireza Talebpour

Corresponding author

Correspondence to Mahmoud Rahat.

About this article

Cite this article

Rahat, M., Talebpour, A. Open information extraction as an intermediate semantic structure for Persian text summarization. Int J Digit Libr 19, 339–352 (2018). https://doi.org/10.1007/s00799-018-0244-z

Received: 12 September 2017
Revised: 05 May 2018
Accepted: 08 May 2018
Published: 28 June 2018
Issue Date: November 2018
DOI: https://doi.org/10.1007/s00799-018-0244-z
