Automatic Extractive Multi-document Summarization Based on Archetypal Analysis

Canhasi, Ercan; Kononenko, Igor

doi:10.1007/978-3-662-48331-2_3

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Matrix factorization is an important tool for text summarization. In recent years, several variants of non-negative matrix factorization (NMF) have been applied to multi-document summarization (MDS). For matrix factorization to work well in MDS, it must be able to select the most typical data points from the given data space. In this chapter, we first describe archetypal analysis (AA) and its weighted version, and then present an AA-based document summarization method for the two best-known summarization tasks: general and query-focused MDS. In contrast to other NMF methods, archetypal analysis, also known as convex NMF, selects distinct (archetypal) sentences and therefore leads to variability and diversity in the content of the generated summaries. We conducted experiments on data from the Document Understanding Conference (DUC). The experimental results show that the proposed approach improves on other closely related methods, including those based on NMF.
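
To make the idea of selecting archetypal sentences concrete, the following is a minimal sketch and not the chapter's implementation. It assumes a plain bag-of-words sentence-term matrix X, fits unweighted archetypal analysis X ≈ S C X (with row-stochastic S and C) by alternating projected gradient steps, and then picks, for each archetype, the sentence with the largest coefficient in C. The function names, the toy sentences, the optimiser, and the selection rule are all illustrative assumptions; the chapter's own method may build the input matrix and rank sentences differently.

```python
import numpy as np

def project_rows_to_simplex(M):
    """Euclidean projection of each row of M onto the probability simplex."""
    n, d = M.shape
    U = np.sort(M, axis=1)[:, ::-1]            # each row sorted in descending order
    css = np.cumsum(U, axis=1) - 1.0
    ks = np.arange(1, d + 1)
    rho = (U - css / ks > 0).sum(axis=1)       # support size of each projected row
    theta = css[np.arange(n), rho - 1] / rho   # per-row shift
    return np.maximum(M - theta[:, None], 0.0)

def archetypal_analysis(X, k, n_iter=300, seed=0):
    """Approximately minimise ||X - S C X||_F^2 with row-stochastic
    S (n x k) and C (k x n). Optimiser choice is an assumption of this sketch."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    S = project_rows_to_simplex(rng.random((n, k)))
    C = project_rows_to_simplex(rng.random((k, n)))
    xxt_norm = np.linalg.norm(X @ X.T, 2)
    for _ in range(n_iter):
        Z = C @ X                              # archetypes: convex combinations of rows of X
        step_s = 1.0 / (np.linalg.norm(Z @ Z.T, 2) + 1e-12)
        S = project_rows_to_simplex(S - step_s * (S @ Z - X) @ Z.T)
        step_c = 1.0 / (np.linalg.norm(S.T @ S, 2) * xxt_norm + 1e-12)
        R = S @ (C @ X) - X                    # reconstruction residual
        C = project_rows_to_simplex(C - step_c * S.T @ R @ X.T)
    return S, C

# Toy input (made up for illustration): one row per sentence, one column per term.
sentences = [
    "the storm flooded the coastal town",
    "flood water from the storm damaged homes in the town",
    "rescue teams evacuated residents",
    "residents were evacuated by rescue teams overnight",
    "the government promised aid for rebuilding",
]
vocab = sorted({w for s in sentences for w in s.split()})
X = np.array([[s.split().count(w) for w in vocab] for s in sentences], dtype=float)

S, C = archetypal_analysis(X, k=3)

# Hypothetical selection rule: the sentence contributing most to each archetype.
picked = sorted({int(i) for i in C.argmax(axis=1)})
for i in picked:
    print(sentences[i])
```

Because each archetype is a convex combination of actual sentences, the rows of C point back to concrete sentences in the input, which is what makes this factorization a natural fit for extractive summarization.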

Author information

Authors and Affiliations

Faculty of Computer and Information Science, University of Ljubljana, Tržaška cesta 25, 1000, Ljubljana, Slovenia
Ercan Canhasi & Igor Kononenko

Corresponding author

Correspondence to Ercan Canhasi.

Editor information

Editors and Affiliations

University of Technology Sydney, Sydney, New South Wales, Australia
Ganesh R. Naik

About this chapter

Cite this chapter

Canhasi, E., Kononenko, I. (2016). Automatic Extractive Multi-document Summarization Based on Archetypal Analysis. In: Naik, G. (eds) Non-negative Matrix Factorization Techniques. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48331-2_3

DOI: https://doi.org/10.1007/978-3-662-48331-2_3
Published: 26 September 2015
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-662-48330-5
Online ISBN: 978-3-662-48331-2
eBook Packages: Engineering (R0)
