Generating Update Summaries: Using an Unsupervized Clustering Algorithm to Cluster Sentences

Chapter
Part of the Theory and Applications of Natural Language Processing book series (NLP)

Abstract

This article presents a summarization system dedicated to update summarization. We first present CBSEAS, the method on which the system is based, and its adaptation to the update summarization task. Generating update summaries is a far more complicated task than generating “standard” summaries. We then describe the TAC 2009 “Update Task”, which was used to evaluate the system. This international evaluation campaign allowed us to compare our system with other automatic summarization systems. The results were mixed: our system ranked in the top quarter for informational content, but only above average for linguistic quality.

Keywords

Informational Content, Evaluation Campaign, Anaphora Resolution, Sentence Similarity, Summarization System

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  1. Laboratoire d’Informatique de Paris-Nord (UMR 7030, CNRS et U. Paris 13), Villetaneuse, France
