A Bottom-Up Approach to Sentence Ordering for Multi-Document Summarization

  • Danushka Bollegala
  • Naoaki Okazaki
  • Mitsuru Ishizuka
Part of the Theory and Applications of Natural Language Processing book series (NLP)


In Chap. 1, multi-document summarization is introduced as a potential solution to the information explosion problem. A major challenge in creating a summary from information extracted from multiple sources is deciding the order in which that information should be presented. Incorrect ordering of information selected from multiple sources can lead to misunderstandings. In this chapter, we discuss the challenges involved in ordering information selected from multiple sources and present several approaches to overcoming them. We also introduce several semi-automatic evaluation measures for empirically evaluating a sentence ordering produced by an algorithm.


Keywords (added by machine, not by the authors): Support Vector Machine; Machine Translation; Source Document; Radial Basis Function; Average Continuity



Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Danushka Bollegala (1)
  • Naoaki Okazaki (2)
  • Mitsuru Ishizuka (3)
  1. Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo-ku, Japan
  2. Department of System Information Sciences, Graduate School of Information Sciences, Tohoku University, Aoba-ku, Japan
  3. Department of Information and Communication Engineering, Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo-ku, Japan
