Leveraging Content Similarity in Summaries for Generating Better Ensembles

  • Parth Mehta
  • Prasenjit Majumder


Previously, in Chap. 5, we described a technique for effectively aggregating rank lists produced by varying the sentence similarity measure, the text representation and the ranking algorithm. That technique belongs to a larger family of consensus-based summarisation systems. These systems democratically select common content from several candidate systems by taking into account the rankings each candidate produces. In this chapter, we highlight significant limitations of consensus-based systems that rely only on sentence rankings and not on the actual content of the candidate summaries. Such systems ignore the relative performance of individual systems, and they overlook the content of the candidate summaries in favour of sentence rankings; both shortcomings limit their performance in several cases. We suggest an alternative approach that can potentially overcome these limitations. We show how, in the absence of gold-standard summaries, the candidate summaries can act as pseudo-relevant summaries for estimating the performance of individual systems. We then use these estimates to generate a better aggregate. Experiments show that the proposed content-based aggregation system outperforms existing rank-list-based aggregation techniques by a large margin.
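
The idea of using candidates as pseudo-relevant summaries can be illustrated with a small sketch. This is not the authors' implementation. All function names are hypothetical, and the candidate summaries are assumed to be extractive, so a sentence chosen by two systems is the identical string. A simple ROUGE-1-style unigram recall stands in for whatever content-overlap measure the full method uses, and the three toy summaries are invented for illustration. Each system is weighted by how well its summary covers the other candidates, which serve as pseudo-references. Each sentence then scores the sum of the weights of the systems that selected it.

```python
from collections import defaultdict, Counter


def unigram_recall(candidate: list[str], reference: list[str]) -> float:
    """ROUGE-1-style recall (assumed stand-in): share of reference unigrams, clipped, found in candidate."""
    cand = Counter(w.lower() for sent in candidate for w in sent.split())
    ref = Counter(w.lower() for sent in reference for w in sent.split())
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[word]) for word, count in ref.items())
    return overlap / total


def system_weights(summaries: dict[str, list[str]]) -> dict[str, float]:
    """Hypothetical helper: weight each system by mean recall against the other candidates."""
    weights = {}
    for name, summary in summaries.items():
        # The other systems' summaries act as pseudo-references (no gold summaries needed).
        others = [s for other, s in summaries.items() if other != name]
        if not others:
            weights[name] = 0.0
            continue
        weights[name] = sum(unigram_recall(summary, ref) for ref in others) / len(others)
    return weights


def weighted_consensus(summaries: dict[str, list[str]], k: int) -> list[str]:
    """Hypothetical helper: score sentences by summed weights of the systems selecting them."""
    weights = system_weights(summaries)
    scores: dict[str, float] = defaultdict(float)
    for name, summary in summaries.items():
        for sent in summary:
            scores[sent] += weights[name]
    # Highest score first; ties broken alphabetically for a deterministic result.
    ranked = sorted(scores, key=lambda s: (-scores[s], s))
    return ranked[:k]


if __name__ == "__main__":
    candidates = {
        "sysA": ["the storm hit the coast", "thousands lost power"],
        "sysB": ["the storm hit the coast", "officials urged evacuation"],
        "sysC": ["a local team won the match", "thousands lost power"],
    }
    print(weighted_consensus(candidates, k=2))
    # ['the storm hit the coast', 'thousands lost power']
```

Setting every weight to 1 would reduce the scoring to an unweighted vote over sentences, in which every candidate system counts equally regardless of its estimated quality. The pseudo-reference weights let the content of the candidate summaries influence which sentences are selected.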



Adapted/Translated by permission from Springer Nature: Springer Nature, Advances in Information Retrieval, pp. 787–793, Content Based Weighted Consensus Summarization, Parth Mehta and Prasenjit Majumder, Copyright (2018)


Copyright information

© Springer Nature Singapore Pte Ltd. 2019

Authors and Affiliations

  1. Information Retrieval and Language Processing Lab, Dhirubhai Ambani Institute of Information and Communication Technology, Gandhinagar, India