Improving Sentence Extraction Through Rank Aggregation

From Extractive to Abstractive Summarization: A Journey

Abstract

A plethora of extractive summarisation techniques have been developed in the past decade, but few studies have examined how these techniques differ from one another or what factors affect their performance. Such a comparison, if available, could be used to build a robust ensemble of these approaches, with the potential to consistently outperform each individual summarisation system. In this chapter we examine the roles of three principal components of an extractive summarisation technique: the sentence ranking algorithm, the sentence similarity metric and the text representation scheme. We show that combining several different sentence similarity measures, rather than choosing any particular measure, significantly improves the performance of the resultant meta-system. Even simple ensemble techniques, when used in an informed manner, prove very effective in improving the overall performance and consistency of summarisation systems. When multiple ranking algorithms or text similarity measures are aggregated, the improvement in ROUGE score is not always significant, but the resultant meta-systems are more robust than the candidate systems. The results suggest that, when proposing a sentence extraction technique, defining better sentence similarity metrics would be more impactful than designing a new ranking algorithm. Moreover, using multiple sentence similarity scores and ranking algorithms, rather than committing to one particular combination, always results in improved and more robust performance.
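To make the idea of aggregating several sentence rankings concrete, the following is a minimal sketch of one simple ensemble technique, a Borda count. The abstract does not say which aggregation methods the chapter uses, so the choice of Borda count, the function name `borda_aggregate` and the toy rankings are all assumptions made for illustration only. Each input ranking is taken to come from a different similarity measure or ranking algorithm applied to the same set of sentences.

```python
from collections import defaultdict


def borda_aggregate(rankings):
    """Combine several best-first rankings of sentence ids into one ranking.

    Hypothetical helper for illustration: a sentence at position p in a
    ranking of n sentences receives n - p points (p starts at 0), and the
    points are summed across rankings. Ties are broken by sentence id.
    """
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, sentence_id in enumerate(ranking):
            scores[sentence_id] += n - position
    # Highest total score first; smaller sentence id wins a tie.
    return sorted(scores, key=lambda s: (-scores[s], s))


# Toy input (assumed data): three rankings of four sentences, e.g. one per
# sentence similarity measure.
rankings = [
    [0, 2, 1, 3],
    [2, 0, 3, 1],
    [0, 1, 2, 3],
]

aggregated = borda_aggregate(rankings)
print(aggregated)  # [0, 2, 1, 3]
```

In this toy case sentence 0 collects 11 points, sentence 2 collects 9, sentence 1 collects 6 and sentence 3 collects 4, so the aggregated order follows the consensus even though the individual rankings disagree. An extractive summary would then take sentences from the top of the aggregated list until the length budget is reached.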

Notes

  1. ROUGE-1.5.5.pl -n 4 -m -a -x -l 100 -c 95 -r 1000 -f A -p 0.5 -t 0

  2. https://tartus.org/martin/PorterStemmer

  3. http://snowball.tartus.org/algorithms/english/stop.txt

Acknowledgements

Adapted/Translated by permission from Elsevier: Elsevier, Information Processing and Management, vol. 54(2), pp. 145–158, Effective aggregation of various summarisation techniques, Parth Mehta and Prasenjit Majumder, Copyright (2018).

Corresponding author

Correspondence to Parth Mehta.

Copyright information

© 2019 Springer Nature Singapore Pte Ltd.

Cite this chapter

Mehta, P., Majumder, P. (2019). Improving Sentence Extraction Through Rank Aggregation. In: From Extractive to Abstractive Summarization: A Journey. Springer, Singapore. https://doi.org/10.1007/978-981-13-8934-4_5

  • DOI: https://doi.org/10.1007/978-981-13-8934-4_5

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-13-8933-7

  • Online ISBN: 978-981-13-8934-4

  • eBook Packages: Computer Science (R0)
