A new evaluation measure using compression dissimilarity on text summarization

Wang, Tong; Chen, Ping; Simovici, Dan

doi:10.1007/s10489-015-0747-x

Published: 30 January 2016

Volume 45, pages 127–134, (2016)

Applied Intelligence

Tong Wang¹,
Ping Chen² &
Dan Simovici¹

Abstract

Evaluating automatic text summarization is challenging because it is difficult to compute the similarity of two texts. In this paper, we define a new measure, compression dissimilarity, to quantify the dissimilarity between documents, and we propose a new automatic evaluation method based on it. The proposed method is a complete “black box” and requires no preprocessing steps. Experiments show that compression dissimilarity can clearly distinguish automatic summaries from human summaries. The measure can evaluate an automatic summary by comparing it either with high-quality human summaries or with its original document. The evaluation results are highly correlated with human assessments, and the correlation between the compression dissimilarity of summaries and the compression dissimilarity of documents can serve as a meaningful measure of the consistency of an automatic text summarization system.
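
The abstract does not state the formula for compression dissimilarity, so the following is only a rough illustration of the general idea of comparing texts through a compressor. It substitutes the normalized compression distance (NCD), a well-known related measure, and uses Python's standard `zlib` module as the compressor. The function names `compressed_size` and `ncd` are hypothetical, and nothing here should be read as the authors' definition or implementation.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Return the length in bytes of the zlib (DEFLATE) compressed data."""
    return len(zlib.compress(data, 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two texts.

    Assumption: NCD is used here as a stand-in for the paper's
    compression dissimilarity, whose exact definition is not given
    in the abstract.
    """
    bx, by = x.encode("utf-8"), y.encode("utf-8")
    cx, cy = compressed_size(bx), compressed_size(by)
    # If y shares content with x, the concatenation compresses
    # to little more than the larger of the two alone.
    cxy = compressed_size(bx + by)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

Under this sketch, a candidate summary would be scored by calling `ncd(candidate, reference)` against a human summary or against the source document, with lower values suggesting more shared content. No tokenization, stemming, or stop-word removal is involved, which matches the "black box" character described in the abstract.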

Author information

Authors and Affiliations

Department of Computer Science, University of Massachusetts Boston, Boston, MA, USA
Tong Wang & Dan Simovici
Department of Computer Engineering, University of Massachusetts Boston, Boston, MA, USA
Ping Chen

Corresponding author

Correspondence to Tong Wang.

About this article

Cite this article

Wang, T., Chen, P. & Simovici, D. A new evaluation measure using compression dissimilarity on text summarization. Appl Intell 45, 127–134 (2016). https://doi.org/10.1007/s10489-015-0747-x

Published: 30 January 2016
Issue Date: July 2016
DOI: https://doi.org/10.1007/s10489-015-0747-x
