Detecting Human Features in Summaries – Symbol Sequence Statistical Regularity

  • George Giannakopoulos
  • Vangelis Karkaletsis
  • George A. Vouros
Part of the Lecture Notes in Computer Science book series (LNCS, volume 7297)


The presented work studies textual summaries, aiming to detect the qualities that distinguish human multi-document summaries from automatically extracted ones. The measured features are based on a generic statistical regularity measure, named Symbol Sequence Statistical Regularity (SSSR). The measure is calculated over both character and word n-grams of various ranks, given a set of human and automatically extracted multi-document summaries from two different corpora. The experimental results indicate that the proposed measure has enough discriminative power to separate human from non-human summaries. The results also hint at the qualities a human summary holds, improving intuition about how a good summary should be generated.
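
As a rough illustration of the input the abstract describes, the sketch below extracts character and word n-grams of several ranks from a summary text. It does not implement SSSR, whose definition is not given on this page; the function names (`char_ngrams`, `word_ngrams`, `ngram_counts`), the whitespace tokenization, and the default ranks are all assumptions made for the example.

```python
from collections import Counter


def char_ngrams(text, n):
    """Return all overlapping character n-grams of rank n."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def word_ngrams(text, n):
    """Return all overlapping word n-grams of rank n.

    Assumption: words are split on whitespace; the paper's own
    tokenization is not specified here.
    """
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_counts(text, ranks=(1, 2, 3)):
    """Count character and word n-grams for each rank in `ranks`.

    The default ranks are illustrative, not the ones used in the paper.
    """
    return {
        n: {
            "char": Counter(char_ngrams(text, n)),
            "word": Counter(word_ngrams(text, n)),
        }
        for n in ranks
    }
```

Frequency tables of this kind could then be fed to whatever regularity measure is applied on top of them; that step is left out deliberately because it is not described here.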


Machine Translation · Computational Linguistics · Property Grammar · Input Document · Summarization System
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2012

Authors and Affiliations

  • George Giannakopoulos (1)
  • Vangelis Karkaletsis (1)
  • George A. Vouros (2)
  1. Software and Knowledge Engineering Laboratory, National Center of Scientific Research “Demokritos”, Greece
  2. Department of Digital Systems, University of Piraeus, Greece
