Automatic Extractive Multi-document Summarization Based on Archetypal Analysis

Non-negative Matrix Factorization Techniques

Part of the book series: Signals and Communication Technology (SCT)

Abstract

Matrix factorization is an important tool for text summarization. In recent years, several variants of non-negative matrix factorization (NMF) have been applied to multi-document summarization (MDS). For matrix factorization to work well in MDS, it must be able to select the most typical data points from the given data space. In this chapter, we first describe archetypal analysis (AA) and its weighted version, and then present an AA-based summarization method for the two best-known summarization tasks: generic and query-focused MDS. Archetypal analysis, also known as convex NMF, differs from other NMF methods in that it selects distinct (archetypal) sentences, which leads to greater variability and diversity in the content of the generated summaries. We conducted experiments on the Document Understanding Conference (DUC) data. The experimental results show that the proposed approach improves on other closely related methods, including NMF-based ones.
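The abstract describes AA only at a high level. As a rough illustration, not the authors' implementation, the sketch below assumes that sentences are the columns of a unit-normalized term-sentence matrix X and fits the standard archetypal model of Cutler and Breiman, X ≈ XCS. The columns of C and S are constrained to the probability simplex, so each archetype is a convex mixture of actual sentences and each sentence is a convex mixture of archetypes. It uses plain projected-gradient updates rather than the alternating least-squares procedure of the original AA papers. The helper names (`term_sentence_matrix`, `archetypal_analysis`, `select_sentences`), the optional per-sentence `weights`, and the sentence-selection rule are all hypothetical.

```python
import numpy as np


def project_columns_to_simplex(V):
    """Euclidean projection of each column of V onto the probability simplex."""
    m, p = V.shape
    U = -np.sort(-V, axis=0)                   # each column sorted in descending order
    css = np.cumsum(U, axis=0) - 1.0
    ind = np.arange(1, m + 1)[:, None]
    rho = (U - css / ind > 0).sum(axis=0)      # length of the active prefix per column
    theta = css[rho - 1, np.arange(p)] / rho   # per-column shift
    return np.maximum(V - theta, 0.0)


def term_sentence_matrix(sentences):
    """Hypothetical helper: bag-of-words counts with unit-length sentence columns."""
    vocab = sorted({t for s in sentences for t in s.lower().split()})
    index = {t: i for i, t in enumerate(vocab)}
    X = np.zeros((len(vocab), len(sentences)))
    for j, s in enumerate(sentences):
        for t in s.lower().split():
            X[index[t], j] += 1.0
    return X / np.linalg.norm(X, axis=0, keepdims=True)


def archetypal_analysis(X, k, weights=None, n_iter=500, seed=0):
    """Approximate X (terms x sentences) by X @ C @ S with simplex-constrained columns.

    C (n x k) mixes sentences into k archetypes; S (k x n) writes every sentence
    as a convex combination of the archetypes. Optional per-sentence weights scale
    each sentence's squared reconstruction error (an illustrative weighting, not
    necessarily the weighted AA used in the chapter).
    """
    d, n = X.shape
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    rng = np.random.default_rng(seed)
    C = np.zeros((n, k))
    C[rng.choice(n, size=k, replace=False), np.arange(k)] = 1.0  # start from k real sentences
    S = np.full((k, n), 1.0 / k)
    XtX = X.T @ X
    for _ in range(n_iter):
        # Projected gradient step on S with step 1/L, where L = ||XC||^2 * max(w).
        Z = X @ C
        R = Z @ S - X
        L_S = np.linalg.norm(Z, 2) ** 2 * w.max() + 1e-12
        S = project_columns_to_simplex(S - ((Z.T @ R) * w) / L_S)
        # Projected gradient step on C with step 1/L, where L = ||X^T X|| * ||S W S^T||.
        R = X @ C @ S - X
        L_C = np.linalg.norm(XtX, 2) * np.linalg.norm((S * w) @ S.T, 2) + 1e-12
        C = project_columns_to_simplex(C - (X.T @ (R * w) @ S.T) / L_C)
    return C, S


def select_sentences(C, S, n_select):
    """Hypothetical rule: visit archetypes by total usage (row sums of S) and take
    the not-yet-chosen sentence with the largest weight in each archetype's column of C."""
    chosen = []
    for a in np.argsort(-S.sum(axis=1)):
        for idx in np.argsort(-C[:, a]):
            if int(idx) not in chosen:
                chosen.append(int(idx))
                break
        if len(chosen) == n_select:
            break
    return chosen


sentences = [
    "the storm hit the coast on monday",
    "a strong storm reached the coast",
    "officials closed schools after the storm",
    "schools reopened on friday",
    "the election results were announced",
    "voters chose a new mayor in the election",
]
X = term_sentence_matrix(sentences)
C, S = archetypal_analysis(X, k=3)
print([sentences[i] for i in select_sentences(C, S, n_select=3)])
```

Because each column of C is a convex combination of sentences, the sentences with the largest entries in an archetype's column are the concrete, mutually distinct sentences that the archetype is built from. This is the intuition behind the diversity argument in the abstract. A query-focused variant could, under the same assumptions, pass larger `weights` for sentences that are more similar to the query.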

Author information

Correspondence to Ercan Canhasi.

Copyright information

© 2016 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Canhasi, E., Kononenko, I. (2016). Automatic Extractive Multi-document Summarization Based on Archetypal Analysis. In: Naik, G. (eds) Non-negative Matrix Factorization Techniques. Signals and Communication Technology. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-662-48331-2_3

  • DOI: https://doi.org/10.1007/978-3-662-48331-2_3

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-662-48330-5

  • Online ISBN: 978-3-662-48331-2

  • eBook Packages: Engineering; Engineering (R0)
