Structural Features for Predicting the Linguistic Quality of Text

Applications to Machine Translation, Automatic Summarization and Human-Authored Text
  • Ani Nenkova
  • Jieun Chae
  • Annie Louis
  • Emily Pitler
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5790)

Abstract

Sentence structure is considered to be an important component of the overall linguistic quality of text. Yet few empirical studies have sought to characterize how and to what extent structural features determine fluency and linguistic quality. We report the results of experiments on the predictive power of syntactic phrasing statistics and other structural features for these aspects of text. Manual assessments of sentence fluency for machine translation evaluation and of text quality for summarization evaluation are used as the gold standard. We find that many structural features related to phrase length are weakly but significantly correlated with fluency, and that classifiers based on the entire suite of structural features can achieve high accuracy in pairwise comparison of sentence fluency and in distinguishing machine translations from human translations. We also test the hypothesis that the learned models capture general fluency properties applicable to human-authored text. The results of our experiments do not support this hypothesis. At the same time, structural features and models based on them prove to be robust for the automatic evaluation of the linguistic quality of multi-document summaries.
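The abstract describes the approach only at a high level. As a rough illustration, the sketch below shows one way phrase-length statistics could be read off constituency parses and fed to a pairwise fluency comparator. The NLTK bracketed-parse input, the particular features (average NP/VP/PP length, sentence length, parse depth), the toy sentences, and the logistic-regression classifier are all assumptions made for illustration, not the authors' exact feature set or learner.

    # Minimal sketch (assumed setup, not the paper's implementation):
    # structural features from bracketed constituency parses and a
    # pairwise fluency comparator trained on feature differences.
    import numpy as np
    from nltk import Tree
    from sklearn.linear_model import LogisticRegression

    def structural_features(parse_str):
        """Average NP/VP/PP length in words, sentence length, and parse depth."""
        tree = Tree.fromstring(parse_str)
        lengths = {"NP": [], "VP": [], "PP": []}
        for sub in tree.subtrees():
            if sub.label() in lengths:
                lengths[sub.label()].append(len(sub.leaves()))
        feats = [float(np.mean(v)) if v else 0.0 for v in lengths.values()]
        feats.append(float(len(tree.leaves())))  # sentence length in words
        feats.append(float(tree.height()))       # depth of the parse tree
        return np.array(feats)

    def pairwise_data(pairs, labels):
        """Each (a, b) pair becomes a feature-difference vector; label 1 means a is judged more fluent."""
        X = np.vstack([structural_features(a) - structural_features(b) for a, b in pairs])
        return X, np.array(labels)

    # Toy bracketed parses standing in for a fluent and a disfluent rendering.
    fluent = "(S (NP (DT The) (NN report)) (VP (VBD arrived) (PP (IN on) (NP (NN time)))))"
    clumsy = "(S (NP (NN Report)) (VP (VBD arrived) (NP (NN time) (IN on))))"

    X, y = pairwise_data([(fluent, clumsy), (clumsy, fluent)], [1, 0])
    model = LogisticRegression().fit(X, y)
    print(model.predict(X))  # expected: [1 0] on this toy data

In the pairwise setup, each training instance is the difference between the feature vectors of two competing outputs, so the same model can be used to rank alternative translations or summaries of the same source.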



Copyright information

© Springer-Verlag Berlin Heidelberg 2010

Authors and Affiliations

  • Ani Nenkova (1)
  • Jieun Chae (1)
  • Annie Louis (1)
  • Emily Pitler (1)
  1. University of Pennsylvania, USA
