Abstract
This work studies textual summaries, aiming to identify the qualities that distinguish human-written multi-document summaries from automatically extracted ones. The measured features are based on a generic statistical regularity measure named Symbol Sequence Statistical Regularity (SSSR). The measure is calculated over both character and word n-grams of various ranks, given a set of human and automatically extracted multi-document summaries from two different corpora. The experimental results indicate that the proposed measure has enough discriminative power to separate human from non-human summaries. The results also hint at the qualities a human summary holds, improving intuition about how a good summary should be generated.
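To make the notion of "character and word n-grams of various ranks" concrete, the following is a minimal sketch of n-gram extraction and counting. It does not implement SSSR, whose definition the abstract does not give; the function names (`char_ngrams`, `word_ngrams`, `ngram_counts`), the whitespace tokenization, and the sample sentence are all assumptions made for illustration.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of length n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams as tuples.

    Assumption: words are separated by whitespace; a real system
    would likely use a proper tokenizer.
    """
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def ngram_counts(text, ranks, unit="char"):
    """Map each rank n in `ranks` to a Counter of its n-grams."""
    extract = char_ngrams if unit == "char" else word_ngrams
    return {n: Counter(extract(text, n)) for n in ranks}


# Hypothetical one-sentence "summary" used only as sample input.
summary = "the cat sat on the mat"
char_counts = ngram_counts(summary, ranks=(1, 2, 3), unit="char")
word_counts = ngram_counts(summary, ranks=(1, 2), unit="word")
```

Frequency tables of this kind, built per rank and per symbol type, are the sort of input a sequence-regularity measure could be computed from; how SSSR actually uses them is described in the paper itself.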
Copyright information
© 2012 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Giannakopoulos, G., Karkaletsis, V., Vouros, G.A. (2012). Detecting Human Features in Summaries – Symbol Sequence Statistical Regularity. In: Maglogiannis, I., Plagianakos, V., Vlahavas, I. (eds) Artificial Intelligence: Theories and Applications. SETN 2012. Lecture Notes in Computer Science, vol 7297. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-30448-4_15
DOI: https://doi.org/10.1007/978-3-642-30448-4_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-30447-7
Online ISBN: 978-3-642-30448-4
eBook Packages: Computer Science, Computer Science (R0)