Abstract
In this paper, we propose an extractive Maximum Coverage KnaPsack (MCKP) based model for query-based multi-document summarization. The model integrates three monotone and submodular measures of sentence importance: Coverage, Relevance, and Compression. We generate summaries with an efficient, scalable greedy algorithm whose solution is near-optimal when its scoring functions are monotone nondecreasing and submodular. We evaluate the proposed method on the DUC 2007 dataset, and the results show an improvement over two closely related works.
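To make the greedy step concrete, the following is a minimal sketch of cost-scaled greedy selection under a length budget. It is a generic illustration, not the authors' exact procedure: the function names, the scaling exponent `r`, and the toy coverage score (a count of distinct concepts, which is monotone and submodular) are assumptions introduced here. The paper's actual score also combines Relevance and Compression, which this sketch omits.

```python
from typing import FrozenSet, List, Sequence, Set


def coverage(selected: Set[int], concepts: Sequence[FrozenSet[str]]) -> float:
    """Toy score (assumption): number of distinct concepts covered."""
    covered: Set[str] = set()
    for i in selected:
        covered |= concepts[i]
    return float(len(covered))


def greedy_budgeted(
    concepts: Sequence[FrozenSet[str]],
    costs: Sequence[float],
    budget: float,
    r: float = 1.0,
) -> List[int]:
    """Pick sentence indices by best marginal gain per scaled cost.

    Costs (e.g. sentence lengths) are assumed to be strictly positive.
    """
    selected: Set[int] = set()
    remaining = set(range(len(costs)))
    spent = 0.0
    current = 0.0
    while remaining:
        best, best_ratio, best_value = None, 0.0, current
        for i in sorted(remaining):
            if spent + costs[i] > budget:
                continue  # sentence does not fit in the remaining budget
            value = coverage(selected | {i}, concepts)
            ratio = (value - current) / (costs[i] ** r)
            if ratio > best_ratio:
                best, best_ratio, best_value = i, ratio, value
        if best is None:
            break  # nothing feasible adds positive gain
        selected.add(best)
        remaining.discard(best)
        spent += costs[best]
        current = best_value
    # Cost-scaled greedy is commonly paired with a final comparison
    # against the best single feasible sentence.
    singles = [i for i in range(len(costs)) if costs[i] <= budget]
    if singles:
        top = max(singles, key=lambda i: coverage({i}, concepts))
        if coverage({top}, concepts) > current:
            return [top]
    return sorted(selected)


# Hypothetical toy input: each sentence is a set of concept labels.
toy_concepts = [frozenset("abc"), frozenset("ab"), frozenset("d"), frozenset("cde")]
toy_costs = [3, 1, 2, 2]
print(greedy_budgeted(toy_concepts, toy_costs, budget=4))  # [1, 3]
```

On the toy input, the cheap sentence 1 is chosen first because it has the highest gain per unit cost, and sentence 3 follows because it adds three new concepts within the remaining budget.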
Notes
- 1.
The way we compute the semantic similarity between two word vectors, one representing the sentence and one representing the query, is inspired by the maximum weighted matching problem in a bipartite graph.
- 2.
This sentence is from DUC 2007, topic D0701A.
- 3.
- 4.
The ROUGE package is available at http://www.berouge.com.
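The matching idea mentioned in note 1 can be sketched as follows. This is a hedged illustration only: the brute-force search, the normalization by the longer word list, and the `word_sim` callback are assumptions introduced here, and the page does not state which word-level similarity measure or normalization the authors use. The search enumerates all one-to-one assignments, so it is exact but practical only for short inputs.

```python
from itertools import permutations
from typing import Callable, Sequence


def matching_similarity(
    sentence: Sequence[str],
    query: Sequence[str],
    word_sim: Callable[[str, str], float],
) -> float:
    """Similarity as a maximum weighted bipartite matching between words.

    word_sim is a hypothetical, symmetric word-level similarity in [0, 1].
    Each word is matched to at most one word on the other side.
    """
    if not sentence or not query:
        return 0.0
    small, large = (sentence, query) if len(sentence) <= len(query) else (query, sentence)
    best = 0.0
    # Try every injective assignment of the shorter list into the longer one.
    for assignment in permutations(range(len(large)), len(small)):
        total = sum(word_sim(small[i], large[j]) for i, j in enumerate(assignment))
        best = max(best, total)
    # Normalizing by the longer list is an assumption of this sketch.
    return best / len(large)


# Hypothetical toy similarity table with made-up scores.
TOY_SCORES = {
    frozenset({"climate", "global"}): 0.5,
    frozenset({"climate", "warming"}): 0.75,
    frozenset({"change", "warming"}): 0.5,
    frozenset({"change", "effects"}): 0.5,
}


def toy_word_sim(a: str, b: str) -> float:
    return 1.0 if a == b else TOY_SCORES.get(frozenset({a, b}), 0.0)


score = matching_similarity(["global", "warming", "effects"], ["climate", "change"], toy_word_sim)
```

In the toy example, both query words score well against "warming", but the one-to-one constraint forces "change" to pair with "effects" instead, which is the behavior that distinguishes a matching from a simple best-match average.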
© 2015 Springer International Publishing Switzerland
Cite this paper
Ghiyafeh Davoodi, F., Chali, Y. (2015). Semi-extractive Multi-document Summarization via Submodular Functions. In: Dediu, AH., Martín-Vide, C., Vicsi, K. (eds) Statistical Language and Speech Processing. SLSP 2015. Lecture Notes in Computer Science, vol 9449. Springer, Cham. https://doi.org/10.1007/978-3-319-25789-1_10
DOI: https://doi.org/10.1007/978-3-319-25789-1_10
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-25788-4
Online ISBN: 978-3-319-25789-1
eBook Packages: Computer Science (R0)