Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Abstract

Document summarization can be viewed as a reductive distillation of source text through content condensation, in which words that carry more information are assumed to carry more content and hence more importance. In this paper, we propose a new measure of word significance for natural language processing (NLP) tasks and apply it to an extractive text summarization approach. In a query-based summarization setting, the correlation between user queries and the sentences to be scored is established from both the micro perspective (i.e., at the word level) and the macro perspective (i.e., at the sentence level), resulting in an effective ranking formula. Experiments on a generic single-document summarization evaluation and on a query-based multi-document evaluation verify the effectiveness of the proposed measures and show that the proposed approach achieves state-of-the-art performance.
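
The abstract does not give the proposed significance measure or the ranking formula, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes, as a stand-in, that word significance is the self-information -log2 p(w) estimated from a background corpus with add-one smoothing, and that a sentence score combines the average significance of its words with the summed significance of the words it shares with the query. The function names `word_information`, `score_sentence`, and `rank_sentences` are hypothetical.

```python
import math
from collections import Counter


def word_information(background_tokens):
    """Return a function mapping a word to -log2 p(w).

    Assumption: p(w) is an add-one smoothed unigram estimate from a
    background corpus; the paper's actual measure is not given here.
    """
    counts = Counter(background_tokens)
    total = sum(counts.values())
    vocab = len(counts)

    def info(word):
        # One extra slot in the denominator reserves mass for unseen words.
        p = (counts.get(word, 0) + 1) / (total + vocab + 1)
        return -math.log2(p)

    return info


def score_sentence(sentence_tokens, query_tokens, info):
    """Illustrative score: mean word information plus query-overlap information."""
    if not sentence_tokens:
        return 0.0
    significance = sum(info(w) for w in sentence_tokens) / len(sentence_tokens)
    shared = set(sentence_tokens) & set(query_tokens)
    overlap = sum(info(w) for w in shared)
    return significance + overlap


def rank_sentences(sentences, query_tokens, info):
    """Sort tokenized sentences from highest to lowest score."""
    return sorted(
        sentences,
        key=lambda s: score_sentence(s, query_tokens, info),
        reverse=True,
    )


if __name__ == "__main__":
    background = "the the the the cat sat on the mat summarization".split()
    info = word_information(background)
    sentences = [["the", "the"], ["summarization", "cat"]]
    for sentence in rank_sentences(sentences, ["summarization"], info):
        print(sentence, round(score_sentence(sentence, ["summarization"], info), 3))
```

Under these assumptions, rare words such as "summarization" receive more weight than frequent ones such as "the", so sentences dense in rare, query-related words rank first. An extractive summarizer would then select the top-ranked sentences up to a length budget.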


Copyright information

© 2009 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, X., Webster, J.J., Kit, C. (2009). An Extractive Text Summarizer Based on Significant Words. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_16

  • DOI: https://doi.org/10.1007/978-3-642-00831-3_16

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-00830-6

  • Online ISBN: 978-3-642-00831-3

  • eBook Packages: Computer Science, Computer Science (R0)
