Intra-document and Inter-document Redundancy in Multi-document Summarization

  • Pabel Carrillo-Mendoza
  • Hiram Calvo
  • Alexander Gelbukh
Conference paper
Part of the Lecture Notes in Computer Science book series (LNCS, volume 10061)

Abstract

Multi-document summarization differs from single-document summarization in the excessive redundancy with which some events or ideas are mentioned. We show how the amount of redundancy in a document collection can be used to assign importance to sentences in multi-document extractive summarization. For instance, an idea could be important because it is redundant across documents, reflecting its popularity; conversely, an idea could be important because it is not redundant across documents, reflecting its novelty. We propose an unsupervised graph-based technique that, using appropriate similarity measures, allows us to experiment with intra-document and inter-document redundancy. Our experiments on DUC corpora show promising results.
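The abstract does not spell out the algorithm, so the following is only a minimal sketch, under stated assumptions, of how an unsupervised graph-based ranker could treat intra-document and inter-document redundancy differently. Sentences are graph nodes, edges carry pairwise similarity, edges between sentences of the same document are scaled by a hypothetical `intra_weight`, and edges across documents are scaled by a hypothetical `inter_weight`; a TextRank-style weighted PageRank then scores the sentences. Bag-of-words cosine similarity is swapped in for the Doc2vec embeddings listed among the keywords so that the example runs on the Python standard library alone. The names `rank_sentences`, `intra_weight`, and `inter_weight` are illustrative and are not the authors' implementation.

```python
import math
from collections import Counter


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[tok] * b[tok] for tok in a if tok in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank_sentences(docs, intra_weight=1.0, inter_weight=1.0, damping=0.85, iters=50):
    """Score sentences with weighted PageRank over a sentence-similarity graph.

    docs: list of documents, each a list of sentence strings.
    intra_weight / inter_weight: hypothetical factors scaling edges between
    sentences of the same document vs. sentences of different documents.
    Returns a list of (sentence, score) pairs in input order.
    """
    # Flatten sentences, remembering the index of the source document.
    sents = [(d, s) for d, doc in enumerate(docs) for s in doc]
    n = len(sents)
    if n == 0:
        return []
    # Bag-of-words vectors (stand-in for Doc2vec sentence embeddings).
    vecs = [Counter(s.lower().split()) for _, s in sents]

    # Weighted adjacency: similarity scaled by intra- or inter-document factor.
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                same_doc = sents[i][0] == sents[j][0]
                factor = intra_weight if same_doc else inter_weight
                w[i][j] = cosine(vecs[i], vecs[j]) * factor
    out = [sum(row) for row in w]

    # Power iteration; sentences with no outgoing weight pass no score on,
    # and every sentence receives the (1 - damping) / n teleport term.
    scores = [1.0 / n] * n
    for _ in range(iters):
        scores = [
            (1 - damping) / n
            + damping * sum(scores[i] * w[i][j] / out[i] for i in range(n) if out[i] > 0)
            for j in range(n)
        ]
    return [(s, score) for (_, s), score in zip(sents, scores)]


docs = [
    ["the storm hit the coast", "power was lost in the city"],
    ["the storm hit the coast overnight", "schools closed"],
]
for sentence, score in sorted(rank_sentences(docs, intra_weight=0.5), key=lambda p: -p[1]):
    print(f"{score:.4f}  {sentence}")
```

Raising `inter_weight` relative to `intra_weight` favors sentences whose content recurs across documents (the "popularity" reading in the abstract); lowering it reduces the reward for cross-document repetition. A sentence with no similar neighbors, such as "schools closed" above, keeps only the teleport score.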

Keywords

Multi-document summarization, Graph-based methods, Unsupervised summarization, Doc2vec, Intra-document redundancy, Per-document redundancy, Inter-document redundancy, Cross-document redundancy

Copyright information

© Springer International Publishing AG 2017

Authors and Affiliations

  1. CIC, Instituto Politécnico Nacional, Mexico City, Mexico
