A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization

Multi-source, Multilingual Information Extraction and Summarization

Abstract

In Chap. 1, multi-document summarization is introduced as a potential solution to the information explosion problem. A major challenge in creating a summary from information extracted from multiple sources is deciding the order in which that information must be presented in the summary. Incorrect ordering of information selected from multiple sources would lead to misunderstandings. In this chapter, we discuss the challenges involved in ordering information selected from multiple sources and present several approaches to overcoming those challenges. We also introduce several semi-automatic evaluation measures for empirically evaluating an ordering of sentences created by an algorithm.

Notes

  1. Using word frequencies instead of binary (0, 1) values as vector elements did not have a positive impact in our experiments. We think this is because a sentence typically contains far fewer words than a document, and a word rarely appears more than once in a single sentence.

  2. http://www.pascal-network.org

  3. www.mturk.com

  4. http://lr-www.pi.titech.ac.jp/tsc/tsc3-en.html

  5. http://research.nii.ac.jp/ntcir/index-en.html

  6. http://www.csie.ntu.edu.tw/~cjlin/libsvm/

Author information

Correspondence to Danushka Bollegala.

Copyright information

© 2013 Springer-Verlag Berlin Heidelberg

About this chapter

Cite this chapter

Bollegala, D., Okazaki, N., Ishizuka, M. (2013). A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization. In: Poibeau, T., Saggion, H., Piskorski, J., Yangarber, R. (eds) Multi-source, Multilingual Information Extraction and Summarization. Theory and Applications of Natural Language Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-28569-1_12

  • DOI: https://doi.org/10.1007/978-3-642-28569-1_12

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-28568-4

  • Online ISBN: 978-3-642-28569-1

  • eBook Packages: Computer Science, Computer Science (R0)
