Abstract
Document summarization can be viewed as the reductive distillation of a source text through content condensation, and words carrying a high quantity of information are believed to carry more content and hence more importance. In this paper, we propose a new measure for quantifying word significance in natural language processing (NLP) tasks and apply it to an extractive text summarization approach. In a query-based summarization setting, the correlation between a user query and each sentence to be scored is established from both the micro perspective (i.e., at the word level) and the macro perspective (i.e., at the sentence level), resulting in an effective ranking formula. Experiments on a generic single-document summarization evaluation and on a query-based multi-document evaluation verify the effectiveness of the proposed measures and show that the proposed approach achieves state-of-the-art performance.
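The abstract does not state the significance measure or the ranking formula, so the following is only a minimal sketch of the general idea, not the method proposed in the paper. It assumes plain self-information, -log2 p(w), estimated from a background corpus as a stand-in for word significance, and a simple multiplicative boost for words shared with the query as a stand-in for word-level query correlation. All names (`tokenize`, `self_information`, `score_sentence`, `summarize`, `query_boost`) are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of ASCII letters (a deliberately crude tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def self_information(background_tokens):
    """Map each word to -log2 p(word), with p estimated from a background corpus.

    Assumption: self-information stands in for the paper's significance measure.
    """
    counts = Counter(background_tokens)
    total = sum(counts.values())
    return {word: -math.log2(count / total) for word, count in counts.items()}


def score_sentence(sentence, info, query_words=frozenset(), query_boost=2.0):
    """Average word information; words shared with the query count query_boost times.

    Words missing from the background corpus contribute zero (an assumption).
    """
    tokens = tokenize(sentence)
    if not tokens:
        return 0.0
    total = 0.0
    for token in tokens:
        weight = query_boost if token in query_words else 1.0
        total += weight * info.get(token, 0.0)
    return total / len(tokens)


def summarize(sentences, info, query="", k=1):
    """Return the k highest-scoring sentences, kept in their original order."""
    query_words = frozenset(tokenize(query))
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: score_sentence(sentences[i], info, query_words),
        reverse=True,
    )
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    background = tokenize("the cat sat on the mat the dog sat")
    info = self_information(background)
    print(summarize(["the cat sat", "the dog sat"], info, query="dog", k=1))
```

Averaging over sentence length keeps long sentences from winning on length alone; a real system would also need smoothing for unseen words and some control of redundancy among the selected sentences.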
Copyright information
© 2009 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, X., Webster, J.J., Kit, C. (2009). An Extractive Text Summarizer Based on Significant Words. In: Li, W., Mollá-Aliod, D. (eds) Computer Processing of Oriental Languages. Language Technology for the Knowledge-based Economy. ICCPOL 2009. Lecture Notes in Computer Science, vol 5459. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-00831-3_16
DOI: https://doi.org/10.1007/978-3-642-00831-3_16
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-00830-6
Online ISBN: 978-3-642-00831-3
eBook Packages: Computer Science (R0)