Behavior Research Methods, Volume 40, Issue 2, pp 597–612

What is behind a summary-evaluation decision?

  • Iraide Zipitria
  • Pedro Larrañaga
  • Ruben Armañanzas
  • Ana Arruarte
  • Jon A. Elorriaga


Research in psychology has reported that, among the variety of possibilities for assessment methodologies, summary evaluation offers a particularly adequate context for inferring text comprehension and topic understanding. However, grades obtained in this methodology are hard to quantify objectively. Therefore, we carried out an empirical study to analyze the decisions underlying human summary-grading behavior. The task consisted of expert evaluation of summaries produced in critically relevant contexts of summarization development, and the resulting data were modeled by means of Bayesian networks using an application called Elvira, which allows for graphically observing the predictive power (if any) of the resultant variables. Thus, in this article, we analyzed summary-evaluation decision making in a computational framework.
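To illustrate the kind of model the abstract refers to, the following is a minimal sketch of a naive Bayes classifier, the simplest Bayesian network classifier, predicting a global grade from discretized evaluation criteria. It is not the Elvira application and not necessarily the model used in the study: the variable names (`coherence`, `content`), the grade labels, the Laplace smoothing parameter `alpha`, and the toy data are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels, alpha=1.0):
    """Count class and per-feature value frequencies from discrete data."""
    class_counts = Counter(labels)
    # feature_counts[label][feature][value] -> number of occurrences
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # observed values per feature, used for smoothing
    for row, label in zip(rows, labels):
        for feature, value in row.items():
            feature_counts[label][feature][value] += 1
            values[feature].add(value)
    return class_counts, feature_counts, values, alpha


def posterior(model, row):
    """Return P(label | row), assuming features are independent given the label."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        p = n / total  # prior P(label)
        for feature, value in row.items():
            count = feature_counts[label][feature][value]
            # Laplace-smoothed estimate of P(value | label)
            p *= (count + alpha) / (n + alpha * len(values[feature]))
        scores[label] = p
    norm = sum(scores.values())
    return {label: p / norm for label, p in scores.items()}


# Hypothetical toy data: two discretized criteria and a global grade.
rows = [
    {"coherence": "high", "content": "high"},
    {"coherence": "high", "content": "low"},
    {"coherence": "low", "content": "high"},
    {"coherence": "low", "content": "low"},
    {"coherence": "high", "content": "high"},
    {"coherence": "low", "content": "low"},
]
labels = ["pass", "pass", "fail", "fail", "pass", "fail"]

model = train_naive_bayes(rows, labels)
post = posterior(model, {"coherence": "high", "content": "high"})
print(post)
```

Comparing the posterior with and without a given criterion in the input row gives a rough sense of how much that criterion contributes to the predicted grade, which is the sort of predictive power the abstract describes observing graphically.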


Keywords (machine-generated, not supplied by the authors): Bayesian Network; Global Score; Text Comprehension; Bayesian Classifier; Reading Text



Copyright information

© Psychonomic Society, Inc. 2008

Authors and Affiliations

  • Iraide Zipitria (1)
  • Pedro Larrañaga (1)
  • Ruben Armañanzas (1)
  • Ana Arruarte (1)
  • Jon A. Elorriaga (1)

  1. Department of Social Psychology and Behavioral Science Methodology, University of the Basque Country, Donostia, Basque Country, Spain
