Extractive Text Summarization: Can We Use the Same Techniques for Any Text?

  • Tatiana Vodolazova
  • Elena Lloret
  • Rafael Muñoz
  • Manuel Palomar
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7934)

Abstract

In this paper we address two issues. The first is whether the performance of a text summarization method depends on the topic of a document. The second is how certain linguistic properties of a text may affect the performance of a number of automatic text summarization methods. To this end, we consider semantic analysis methods, such as textual entailment and anaphora resolution, and study how they relate to the proper noun, pronoun and noun ratios calculated over original documents grouped into related topics. From the results obtained, we conclude that our first hypothesis is not supported, since no evident relationship was found between the topic of a document and the performance of the methods employed. However, adapting summarization systems to the linguistic properties of input documents benefits the summarization process.
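The noun, proper noun and pronoun ratios mentioned above could be computed along the following lines. This is only a minimal sketch under stated assumptions: the input is already POS-tagged with Penn Treebank tags, each ratio is taken over the total number of tokens, and the name `pos_ratios` is hypothetical. The abstract does not specify the tag set or the denominator the authors actually used.

```python
from collections import Counter

# Penn Treebank tag groups (an assumption; the abstract names no tag set).
NOUN_TAGS = {"NN", "NNS"}
PROPER_NOUN_TAGS = {"NNP", "NNPS"}
PRONOUN_TAGS = {"PRP", "PRP$"}


def pos_ratios(tagged_tokens):
    """Return noun, proper-noun and pronoun ratios for one document.

    tagged_tokens is a list of (word, tag) pairs. Each ratio is the share
    of tokens carrying a tag from the group (denominator is an assumption).
    """
    total = len(tagged_tokens)
    if total == 0:
        # Avoid division by zero for empty documents.
        return {"noun": 0.0, "proper_noun": 0.0, "pronoun": 0.0}

    counts = Counter(tag for _, tag in tagged_tokens)

    def share(tags):
        return sum(counts[t] for t in tags) / total

    return {
        "noun": share(NOUN_TAGS),
        "proper_noun": share(PROPER_NOUN_TAGS),
        "pronoun": share(PRONOUN_TAGS),
    }


# Illustrative input only; not data from the paper.
example = [
    ("Mary", "NNP"),
    ("said", "VBD"),
    ("she", "PRP"),
    ("likes", "VBZ"),
    ("summaries", "NNS"),
]
ratios = pos_ratios(example)
```

Ratios of this kind, averaged over the documents of a topic group, would give one profile per group that can then be compared against summarization scores.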

Keywords

text summarization, textual entailment, anaphora resolution

Copyright information

© Springer-Verlag Berlin Heidelberg 2013

Authors and Affiliations

  • Tatiana Vodolazova (1)
  • Elena Lloret (1)
  • Rafael Muñoz (1)
  • Manuel Palomar (1)
  1. Dept. Lenguajes y Sistemas Informáticos, Universidad de Alicante, Alicante, Spain
