Semi-extractive Multi-document Summarization via Submodular Functions

Ghiyafeh Davoodi, Fatemeh; Chali, Yllias

doi:10.1007/978-3-319-25789-1_10

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 9449)

Included in the following conference series:

International Conference on Statistical Language and Speech Processing

Abstract

In this paper, we propose an extractive Maximum Coverage KnaPsack (MCKP) based model for query-based multi-document summarization that integrates three monotone submodular measures of sentence importance: Coverage, Relevance, and Compression. We apply an efficient, scalable greedy algorithm that produces a near-optimal summary when the scoring functions are monotone nondecreasing and submodular. We evaluate the proposed method on the DUC 2007 dataset, and the results show an improvement over two closely related works.
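
The greedy step described above can be illustrated with a minimal sketch of a cost-scaled greedy for budgeted monotone submodular maximization, modeled on the budgeted greedy of Lin and Bilmes [16]. The authors' exact variant, scoring functions, and parameters are not given on this page, so everything below is an assumption: each sentence is reduced to a set of weighted concepts and scored with weighted coverage (a standard monotone submodular function), and the concept weights, costs, budget, and the scaling exponent `r` are hypothetical.

```python
from typing import Callable, Dict, List, Set

# Hypothetical representation: a sentence is the set of concepts it contains.
Sentence = Set[str]


def coverage(selected: List[Sentence], weights: Dict[str, float]) -> float:
    """Weighted coverage: total weight of the distinct concepts covered.

    Monotone nondecreasing and submodular (toy stand-in for the paper's measures).
    """
    covered: Set[str] = set().union(*selected)
    return sum(weights.get(c, 0.0) for c in covered)


def budgeted_greedy(
    sentences: List[Sentence],
    costs: List[float],
    budget: float,
    f: Callable[[List[Sentence]], float],
    r: float = 1.0,
) -> List[int]:
    """Cost-scaled greedy for max f(S) subject to total cost <= budget.

    Costs must be positive. Returns indices of the selected sentences.
    """
    chosen: List[int] = []
    spent = 0.0
    remaining = set(range(len(sentences)))
    while remaining:
        base = [sentences[j] for j in chosen]
        current = f(base)
        best, best_gain, best_ratio = -1, 0.0, float("-inf")
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = f(base + [sentences[i]]) - current
            ratio = gain / costs[i] ** r  # marginal gain per (scaled) unit cost
            if ratio > best_ratio:
                best, best_gain, best_ratio = i, gain, ratio
        remaining.discard(best)
        # Keep the candidate only if it still fits in the budget.
        if spent + costs[best] <= budget and best_gain >= 0:
            chosen.append(best)
            spent += costs[best]
    # Compare with the best single sentence that fits on its own.
    feasible = [i for i in range(len(sentences)) if costs[i] <= budget]
    if feasible:
        single = max(feasible, key=lambda i: f([sentences[i]]))
        if f([sentences[single]]) > f([sentences[j] for j in chosen]):
            return [single]
    return chosen


weights = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 3.0}
sentences = [{"a", "b"}, {"b", "c"}, {"d"}, {"a", "b", "c", "d"}]
costs = [2.0, 2.0, 1.0, 5.0]
summary = budgeted_greedy(sentences, costs, 4.0, lambda S: coverage(S, weights))
print(summary)  # [2, 0]
```

The final comparison against the best affordable single sentence is the step that gives the greedy of [16] its constant-factor guarantee. Because a nonnegative weighted sum of monotone submodular functions is itself monotone submodular, a combined Coverage/Relevance/Compression score could be passed as `f` in place of the toy `coverage`.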

Notes

1.
Our method for computing the semantic similarity between two word vectors, one representing the sentence and one representing the query, is inspired by the maximum weighted matching problem in a bipartite graph.
2.
This sentence is from DUC 2007, topic D0701A.
3.
http://duc.nist.gov/.
4.
The ROUGE package is available at http://www.berouge.com.
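
Note 1 can be made concrete with a minimal sketch of matching-based similarity, under our own assumptions: sentence words form one side of a bipartite graph, query words the other, each edge is weighted by a word-to-word similarity, and the score is the total weight of the best one-to-one matching. The paper obtains word similarities from WordNet through the API of Pirró and Euzenat [25]; here `toy_sim` and its table are hypothetical stand-ins, and since any normalization the authors apply is not stated, none is applied.

```python
from functools import lru_cache
from typing import Callable, List

# Hypothetical word similarities standing in for a WordNet-based measure.
TOY_SIM = {
    ("car", "automobile"): 0.9,
    ("car", "vehicle"): 0.7,
    ("crash", "accident"): 0.8,
}


def toy_sim(a: str, b: str) -> float:
    """Symmetric toy similarity: 1.0 for identical words, else table lookup."""
    if a == b:
        return 1.0
    return TOY_SIM.get((a, b), TOY_SIM.get((b, a), 0.0))


def matching_similarity(
    sent_words: List[str],
    query_words: List[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Maximum weighted bipartite matching between sentence and query words.

    Exact DP over subsets of query words: exponential in len(query_words),
    so only suitable for short queries.
    """
    n, m = len(sent_words), len(query_words)

    @lru_cache(maxsize=None)
    def best(i: int, used: int) -> float:
        # i: next sentence word to place; used: bitmask of matched query words.
        if i == n:
            return 0.0
        result = best(i + 1, used)  # leave sentence word i unmatched
        for j in range(m):
            if not used & (1 << j):  # query word j is still free
                w = word_sim(sent_words[i], query_words[j])
                result = max(result, w + best(i + 1, used | (1 << j)))
        return result

    return best(0, 0)


score = matching_similarity(
    ["car", "crash", "injured"], ["automobile", "accident", "vehicle"], toy_sim
)
print(round(score, 6))  # 1.7 (car-automobile 0.9 + crash-accident 0.8)
```

The subset DP keeps the sketch dependency-free; for longer queries, a Hungarian-algorithm solver such as SciPy's `linear_sum_assignment` with `maximize=True` would scale polynomially instead.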

References

[1] Baxendale, P.B.: Machine-made index for technical literature: an experiment. IBM J. Res. Dev. 2(4), 354–361 (1958)
[2] Berg-Kirkpatrick, T., Gillick, D., Klein, D.: Jointly learning to extract and compress. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 481–490. Association for Computational Linguistics (2011)
[3] Carbonell, J., Goldstein, J.: The use of MMR, diversity-based reranking for reordering documents and producing summaries. In: Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 335–336. ACM (1998)
[4] Chali, Y.L., Hasan, S.A.: On the effectiveness of using sentence compression models for query-focused multi-document summarization. In: Proceedings of COLING 2012, pp. 457–474. Citeseer (2012)
[5] Dang, H.T.: Overview of DUC 2005. In: Proceedings of the Document Understanding Conference (2005)
[6] Dasgupta, A., Kumar, R., Ravi, S.: Summarization through submodularity and dispersion. In: ACL, vol. 1, pp. 1014–1022 (2013)
[7] Edmundson, H.P.: New methods in automatic extracting. J. ACM (JACM) 16(2), 264–285 (1969)
[8] Erkan, G., Radev, D.R.: LexRank: graph-based lexical centrality as salience in text summarization. J. Artif. Intell. Res. 457–479 (2004)
[9] Filatova, E., Hatzivassiloglou, V.: A formal model for information selection in multi-sentence text extraction. In: Proceedings of the 20th International Conference on Computational Linguistics, p. 397. Association for Computational Linguistics (2004)
[10] Gillick, D., Favre, B.: A scalable global model for summarization. In: Proceedings of the Workshop on Integer Linear Programming for Natural Language Processing, pp. 10–18. Association for Computational Linguistics (2009)
[11] Gillick, D., Favre, B., Hakkani-Tur, D.: The ICSI summarization system at TAC 2008. In: Proceedings of the Text Analysis Conference (2008)
[12] Gillick, D., Favre, B., Hakkani-Tur, D., Bohnet, B., Liu, Y., Xie, S.: The ICSI/UTD summarization system at TAC 2009. In: Proceedings of the Text Analysis Conference Workshop, Gaithersburg, MD, USA (2009)
[13] Goldstein, J., Mittal, V., Carbonell, J., Kantrowitz, M.: Multi-document summarization by sentence extraction. In: Proceedings of the 2000 NAACL-ANLP Workshop on Automatic Summarization, vol. 4, pp. 40–48. Association for Computational Linguistics (2000)
[14] Jing, H.: Sentence reduction for automatic text summarization. In: Proceedings of the Sixth Conference on Applied Natural Language Processing, pp. 310–315. Association for Computational Linguistics (2000)
[15] Lin, C.Y.: ROUGE: a package for automatic evaluation of summaries. In: Text Summarization Branches Out: Proceedings of the ACL-04 Workshop, pp. 74–81 (2004)
[16] Lin, H., Bilmes, J.: Multi-document summarization via budgeted maximization of submodular functions. In: Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pp. 912–920. Association for Computational Linguistics (2010)
[17] Lin, H., Bilmes, J.: A class of submodular functions for document summarization. In: Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, vol. 1, pp. 510–520. Association for Computational Linguistics (2011)
[18] Lin, H., Bilmes, J., Xie, S.: Graph-based submodular selection for extractive summarization. In: IEEE Workshop on Automatic Speech Recognition & Understanding, ASRU 2009, pp. 381–386. IEEE (2009)
[19] Lovász, L.: Submodular functions and convexity. In: Mathematical Programming: The State of the Art. Springer, Heidelberg (1983)
[20] Luhn, H.P.: The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165 (1958)
[21] Mani, I., Maybury, M.T.: Advances in automatic text summarization, vol. 293. MIT Press, Cambridge (1999)
[22] McDonald, R.: A study of global inference algorithms in multi-document summarization. In: Amati, G., Carpineto, C., Romano, G. (eds.) ECIR 2007. LNCS, vol. 4425, pp. 557–564. Springer, Heidelberg (2007)
[23] Morita, H., Sasano, R., Takamura, H., Okumura, M.: Subtree extractive summarization via submodular maximization. In: ACL, vol. 1, pp. 1023–1032 (2013)
[24] Petrov, S., Klein, D.: Learning and inference for hierarchically split PCFGs. In: Proceedings of the National Conference on Artificial Intelligence, vol. 22, p. 1663. AAAI Press, MIT Press; Menlo Park, Cambridge, London (2007)
[25] Pirró, G., Euzenat, J.: A feature and information theoretic framework for semantic similarity and relatedness. In: Patel-Schneider, P.F., Pan, Y., Hitzler, P., Mika, P., Zhang, L., Pan, J.Z., Horrocks, I., Glimm, B. (eds.) ISWC 2010, Part I. LNCS, vol. 6496, pp. 615–630. Springer, Heidelberg (2010)
[26] Porter, M.F.: An algorithm for suffix stripping. Program 14(3), 130–137 (1980)
[27] Radev, D.R., Jing, H., Styś, M., Tam, D.: Centroid-based summarization of multiple documents. Inf. Process. Manage. 40(6), 919–938 (2004)
[28] Schiffman, B., Nenkova, A., McKeown, K.: Experiments in multidocument summarization. In: Proceedings of the Second International Conference on Human Language Technology Research, pp. 52–58. Morgan Kaufmann Publishers Inc. (2002)
[29] Sekine, S., Nobata, C.: Sentence extraction with information extraction technique. In: Proceedings of the Document Understanding Conference (2001)
[30] Takamura, H., Okumura, M.: Text summarization model based on maximum coverage problem and its variant. In: Proceedings of the 12th Conference of the European Chapter of the Association for Computational Linguistics, pp. 781–789. Association for Computational Linguistics (2009)
[31] Yih, W.-T., Goodman, J., Vanderwende, L., Suzuki, H.: Multi-document summarization by maximizing informative content-words. In: IJCAI (2007)

Acknowledgments

The authors would like to thank Taylor Berg-Kirkpatrick [2] for kindly providing valuable information and details of their work on compression features, and Giuseppe Pirró [25] for kindly providing their API for accessing WordNet.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Lethbridge, 4401 University Dr W, Lethbridge, Canada
Fatemeh Ghiyafeh Davoodi & Yllias Chali

Corresponding author

Correspondence to Fatemeh Ghiyafeh Davoodi.

Editor information

Editors and Affiliations

Research Group on Mathematical Linguistic, Rovira i Virgili University, Tarragona, Spain
Adrian-Horia Dediu
Research Group on Mathematical Linguistic, Rovira i Virgili University, Tarragona, Spain
Carlos Martín-Vide
Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary
Klára Vicsi

About this paper

Cite this paper

Ghiyafeh Davoodi, F., Chali, Y. (2015). Semi-extractive Multi-document Summarization via Submodular Functions. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_10

DOI: https://doi.org/10.1007/978-3-319-25789-1_10
Published: 17 November 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science, Computer Science (R0)