Semi-extractive Multi-document Summarization via Submodular Functions

  • Conference paper
Statistical Language and Speech Processing (SLSP 2015)

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Abstract

In this paper, we propose an extractive Maximum Coverage KnaPsack (MCKP) based model for query-based multi-document summarization. The model integrates three monotone, submodular measures of sentence importance: Coverage, Relevance, and Compression. We apply an efficient, scalable greedy algorithm to generate the summary; the algorithm yields a near-optimal solution when its scoring functions are monotone nondecreasing and submodular. We evaluate the proposed method on the DUC 2007 dataset, and the results show an improvement over two closely related works.
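As an illustration of the kind of greedy selection the abstract refers to, the following is a minimal sketch, not the authors' implementation. It assumes a single weighted-coverage objective (a standard monotone submodular function) in place of the paper's combined Coverage, Relevance, and Compression measures, and it uses a cost-scaled greedy rule with a best-single-sentence fallback under a length budget. The names `coverage`, `greedy_summary`, the scaling exponent `r`, and the toy data are all hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def coverage(selected: List[int],
             concepts: List[FrozenSet[str]],
             weights: Dict[str, float]) -> float:
    """Weighted coverage: total weight of the distinct concepts covered."""
    covered: Set[str] = set()
    for i in selected:
        covered |= concepts[i]
    return sum(weights[c] for c in covered)


def greedy_summary(concepts: List[FrozenSet[str]],
                   weights: Dict[str, float],
                   costs: List[int],
                   budget: int,
                   r: float = 1.0) -> List[int]:
    """Pick sentence indices greedily by marginal gain per scaled cost.

    Assumes every cost is positive. `r` is a hypothetical tuning knob
    that controls how strongly long sentences are penalised.
    """
    selected: List[int] = []
    remaining = set(range(len(concepts)))
    spent = 0
    while remaining:
        base = coverage(selected, concepts, weights)
        best, best_ratio = None, 0.0
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue  # sentence does not fit in the remaining budget
            gain = coverage(selected + [i], concepts, weights) - base
            ratio = gain / (costs[i] ** r)
            if ratio > best_ratio:
                best, best_ratio = i, ratio
        if best is None:
            break  # nothing fits, or nothing adds value
        selected.append(best)
        spent += costs[best]
        remaining.discard(best)

    # Fallback: a single high-value sentence may beat the greedy set.
    singles = [i for i in range(len(concepts)) if costs[i] <= budget]
    if singles:
        top = max(singles, key=lambda i: coverage([i], concepts, weights))
        if coverage([top], concepts, weights) > coverage(selected, concepts, weights):
            return [top]
    return selected


# Toy data (hypothetical): each sentence is a set of concepts with a word cost.
concepts = [frozenset("ab"), frozenset("bc"), frozenset("abcd"), frozenset("e")]
weights = {"a": 3.0, "b": 2.0, "c": 2.0, "d": 0.5, "e": 1.5}
costs = [2, 2, 5, 1]

summary = greedy_summary(concepts, weights, costs, budget=5)
print(summary, coverage(summary, concepts, weights))
```

In this toy run, the long sentence covering four concepts is passed over because three short sentences cover more weight within the same budget. The approximation guarantees usually cited for greedy selection of this kind depend on the objective being monotone nondecreasing and submodular, which is why the abstract stresses those two properties.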

Notes

  1. The way we find the semantic similarity between two word vectors, one representing the sentence and the other the query, is inspired by the maximum weighted matching problem in a bipartite graph.

  2. This sentence is from DUC 2007, topic D0701A.

  3. http://duc.nist.gov/.

  4. The ROUGE package is available at http://www.berouge.com.

References

  1. Baxendale, P.B.: Machine-made index for technical literature: an experiment. IBM J. Res. Dev. 2(4), 354–361 (1958)

  2. Berg-Kirkpatrick, T., Gillick, D., Klein, D.: Jointly learning to extract and compress. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 481–490. Association for Computational Linguistics (2011)

  3. Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 335–336. ACM (1998)

  4. Chali, Y.L., Hasan, S.A.: On the effectiveness of using sentence compression models for query-focused multi-document summarization. In: Proceedings of COLING 2012, pp. 457–474. Citeseer (2012)

  5. Dang, H.T.: Overview of DUC 2005. In: Proceedings of the Document Understanding Conference (2005)

  6. Dasgupta, A., Kumar, R., Ravi, S.: Summarization through submodularity and dispersion. In: ACL, vol. 1, pp. 1014–1022 (2013)

  7. Edmundson, H.P.: New methods in automatic extracting. J. ACM (JACM) 16(2), 264–285 (1969)

  8. Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 457–479 (2004)

  9. Filatova, E., Hatzivassiloglou, V.: A formal model for information selection in multi-sentence text extraction. In: Proceedings of the 20th international conference on Computational Linguistics, p. 397. Association for Computational Linguistics (2004)

  10. Gillick, D., Favre, B.: A scalable global model for summarization. In: Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, pp. 10–18. Association for Computational Linguistics (2009)

  11. Gillick, D., Favre, B., Hakkani-Tur, D.: The ICSI summarization system at TAC 2008. In: Proceedings of the Text Understanding Conference (2008)

  12. Gillick, D., Favre, B., Hakkani-Tur, D., Bohnet, B., Liu, Y., Xie, S.: The ICSI/UTD summarization system at TAC 2009. In: Proceedings of the Text Analysis Conference Workshop, Gaithersburg, MD, USA (2009)

  13. Goldstein, J., Mittal, V., Carbonell, J., Kantrowitz, M.: Multi-document summarization by sentence extraction. In: Proceedings of the 2000 NAACL-ANLP Workshop on Automatic summarization, vol. 4, pp. 40–48. Association for Computational Linguistics (2000)

  14. Jing, H.: Sentence reduction for automatic text summarization. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 310–315. Association for Computational Linguistics (2000)

  15. Lin, C.Y.: ROUGE: A package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pp. 74–81 (2004)

  16. Lin, H., Bilmes, J.: Multi-document summarization via budgeted maximization of submodular functions. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 912–920. Association for Computational Linguistics (2010)

  17. Lin, H., Bilmes, J.: A class of submodular functions for document summarization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 510–520. Association for Computational Linguistics (2011)

  18. Lin, H., Bilmes, J., Xie, S.: Graph-based submodular selection for extractive summarization. In: IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2009, pp. 381–386. IEEE (2009)

  19. Lovász, L.: Submodular functions and convexity. Mathematical Programming The State of the Art. Springer, Heidelberg (1983)

  20. Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)

  21. Mani, I., Maybury, M.T.: Advances in automatic text summarization, vol. 293. MIT Press, Cambridge (1999)

  22. McDonald, R.: A study of global inference algorithms in multi-document summarization. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECIR 2007. LNCS, vol. 4425, pp. 557–564. Springer, Heidelberg (2007)

  23. Morita, H., Sasano, R., Takamura, H., Okumura, M.: Subtree extractive summarization via submodular maximization. In: ACL, vol. 1, pp. 1023–1032 (2013)

  24. Petrov, S., Klein, D.: Learning and inference for hierarchically split PCFGs. In: Proceedings of the National Conference on Artificial Intelligence, vol. 22, p. 1663. AAAI Press, MIT Press; Menlo Park, Cambridge, London (1999, 2007)

  25. Pirró, G., Euzenat, J.: A feature and information theoretic framework for semantic similarity and relatedness. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 615–630. Springer, Heidelberg (2010)

  26. Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)

  27. Radev, D.R., Jing, H., Styś, M., Tam, D.: Centroid-based summarization of multiple documents. Inf. Process. Manage. 40(6), 919–938 (2004)

  28. Schiffman, B., Nenkova, A., McKeown, K.: Experiments in multidocument summarization. In: Proceedings of the Second International Conference on Human Language Technology Research, pp. 52–58. Morgan Kaufmann Publishers Inc. (2002)

  29. Sekine, S., Nobata, C.: Sentence extraction with information extraction technique. In: Proceedings of the Document Understanding Conference (2001)

  30. Takamura, H., Okumura, M.: Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 781–789. Association for Computational Linguistics (2009)

  31. Yih, W.-T., Goodman, J., Vanderwende, L., Suzuki, H.: Multi-document summarization by maximizing informative content-words. In: IJCAI (2007)

Acknowledgments

The authors would like to thank Taylor Berg-Kirkpatrick [2] for kindly providing valuable information and details of their work on compression features, and Giuseppe Pirró [25] for kindly providing their API for accessing WordNet.

Author information

Corresponding author

Correspondence to Fatemeh Ghiyafeh Davoodi.

Copyright information

© 2015 Springer International Publishing Switzerland

About this paper

Cite this paper

Ghiyafeh Davoodi, F., Chali, Y. (2015). Semi-extractive Multi-document Summarization via Submodular Functions. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_10

  • DOI: https://doi.org/10.1007/978-3-319-25789-1_10

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-25788-4

  • Online ISBN: 978-3-319-25789-1

  • eBook Packages: Computer Science, Computer Science (R0)
