An Extractive Text Summarizer Based on Significant Words

Liu, Xiaoyue; Webster, Jonathan J.; Kit, Chunyu

doi:10.1007/978-3-642-00831-3_16

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 5459)

Included in the following conference series:

International Conference on Computer Processing of Oriental Languages

Abstract

Document summarization can be viewed as a reductive distillation of source text through content condensation, and words that carry more information are assumed to carry more content and therefore more importance. In this paper, we propose a new measure for quantifying word significance in natural language processing (NLP) tasks and apply it to extractive text summarization. In a query-based summarization setting, the correlation between user queries and the sentences to be scored is established from both the micro (i.e., word-level) and the macro (i.e., sentence-level) perspectives, resulting in an effective ranking formula. Experiments on a generic single-document summarization evaluation and on a query-based multi-document evaluation verify the effectiveness of the proposed measures and show that the proposed approach achieves state-of-the-art performance.
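
The abstract does not give the significance measure or the ranking formula, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes a swapped-in stand-in for word significance (self-information, -log2 p(w), with p(w) estimated from the document itself) and a simple query word-overlap boost in place of the paper's micro and macro correlation. All function names (`tokenize`, `word_information`, `score_sentence`, `summarize`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def word_information(sentences):
    """Assumed stand-in for word significance: self-information -log2 p(w),
    with p(w) estimated from the word counts of the input sentences."""
    counts = Counter(w for s in sentences for w in tokenize(s))
    total = sum(counts.values())
    return {w: -math.log2(c / total) for w, c in counts.items()}


def score_sentence(sentence, info, query_words=frozenset()):
    """Average word information, boosted by the fraction of query words
    that appear in the sentence (an assumed, simplified query correlation)."""
    words = tokenize(sentence)
    if not words:
        return 0.0
    base = sum(info.get(w, 0.0) for w in words) / len(words)
    overlap = 0.0
    if query_words:
        overlap = len(set(words) & query_words) / len(query_words)
    return base * (1.0 + overlap)


def summarize(sentences, k=1, query=""):
    """Extract the k highest-scoring sentences, returned in document order."""
    info = word_information(sentences)
    query_words = frozenset(tokenize(query))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], info, query_words),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]
```

Calling `summarize(sentences, k=2)` gives a generic extract, while passing `query="..."` shifts the ranking toward sentences that share words with the query. A more faithful version would estimate p(w) from a large background corpus rather than from the document being summarized.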

Author information

Authors and Affiliations

Department of Chinese, Translation and Linguistics, City University of Hong Kong, Tat Chee Ave., Kowloon, Hong Kong
Xiaoyue Liu, Jonathan J. Webster & Chunyu Kit

Editor information

Editors and Affiliations

Department of Computing, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Wenjie Li
Division of Information and Communication Sciences, Macquarie University, NSW 2109, Sydney, Australia
Diego Mollá-Aliod

About this paper

Cite this paper

Liu, X., Webster, J.J., Kit, C. (2009). An Extractive Text Summarizer Based on Significant Words. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_16

DOI: https://doi.org/10.1007/978-3-642-00831-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science, Computer Science (R0)
