International Journal on Digital Libraries

Volume 19, Issue 4, pp. 339–352

Open information extraction as an intermediate semantic structure for Persian text summarization

  • Mahmoud Rahat
  • Alireza Talebpour


Semantic applications typically exploit structures such as dependency parse trees, phrase chunking, semantic role labeling or open information extraction (Open IE). In this paper, we introduce a novel application of Open IE as an intermediate layer for text summarization. Text summarization is an important method for providing relevant information in large digital libraries. Open IE refers to the process of extracting machine-understandable structural propositions from text. We use these propositions as building blocks to shorten sentences and generate a summary of the text. The proposed system offers a new form of summarization that is able to break down the structure of a sentence and extract its most significant sub-sentential elements. Other advantages include the ability to identify and eliminate less important parts of a sentence (such as adverbs, adjectives, appositions or dependent clauses) as well as duplicated sentence fragments, which in turn frees space for more sentences in the summary and enhances its coverage and coherence. The proposed system is localized for the Persian language; however, it can be adapted to other languages. Experiments performed on the standard data set “Pasokh” with a standard comparison tool showed promising results for the proposed approach. We used summaries produced by the system in a real-world application in the virtual library of Shahid Beheshti University and received positive feedback from users.
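To make the idea of proposition-level summarization concrete, the following is a minimal sketch, not the authors' system. It assumes the Open IE step has already produced (arg1, relation, arg2) triples tagged with their source sentence; the `Proposition` class, the `summarize` function, the frequency-based scoring, the Jaccard duplicate threshold and the word budget are all hypothetical choices made for illustration. It shows the three effects the abstract describes: sentences are replaced by shorter propositions, near-duplicate fragments are dropped, and the saved space lets more content fit into the summary.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Proposition:
    """Hypothetical Open IE triple tagged with the index of its source sentence."""
    sentence_id: int
    arg1: str
    relation: str
    arg2: str

    def tokens(self) -> list[str]:
        # Whitespace tokenization; a real Persian pipeline would need a proper tokenizer.
        return f"{self.arg1} {self.relation} {self.arg2}".lower().split()


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-set overlap, used here as a crude duplicate detector (an assumption)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(props: list[Proposition], word_budget: int,
              dup_threshold: float = 0.6) -> list[Proposition]:
    # Assumed scoring: average document frequency of a proposition's tokens.
    freq = Counter(tok for p in props for tok in p.tokens())

    def score(p: Proposition) -> float:
        toks = p.tokens()
        return sum(freq[t] for t in toks) / len(toks) if toks else 0.0

    chosen: list[Proposition] = []
    used = 0
    # Greedily take the highest-scoring propositions that still fit the word budget.
    for p in sorted(props, key=score, reverse=True):
        toks = p.tokens()
        if used + len(toks) > word_budget:
            continue
        # Skip propositions that largely repeat one already selected.
        if any(jaccard(set(toks), set(c.tokens())) >= dup_threshold for c in chosen):
            continue
        chosen.append(p)
        used += len(toks)
    # Restore document order so the summary reads in sequence.
    return sorted(chosen, key=lambda p: p.sentence_id)


# Toy usage with hand-written English triples (not data from the paper).
props = [
    Proposition(0, "The library", "serves", "students and researchers"),
    Proposition(0, "The library", "serves", "students"),
    Proposition(1, "Summaries", "help", "readers"),
]
summary = summarize(props, word_budget=10)
for p in summary:
    print(p.arg1, p.relation, p.arg2)
# prints:
# The library serves students
# Summaries help readers
```

In this toy run the longer first triple is discarded as a near-duplicate of the shorter one, leaving room for the proposition from the second sentence. Any real scoring function (topic models, word embeddings, position features) could replace the frequency heuristic without changing the greedy selection loop.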


Text summarization · Extractive summary · Open information extraction · Persian (Farsi) text processing



We would like to thank the anonymous reviewers for their constructive comments, Asef Pourmasoumi for providing us with the data and benchmarking tool of the Pasokh corpus, Azadeh Zamanifar for sharing the code of their summarizer and Seyedamin Monemian for his help with running the experiments.


Copyright information

© Springer-Verlag GmbH Germany, part of Springer Nature 2018

Authors and Affiliations

  1. Faculty of Computer Science and Engineering, Shahid Beheshti University, Daneshjo Blv, Velenjak, Iran
