Abstract
Research in psychology has reported that, among the available assessment methodologies, summary evaluation offers a particularly adequate context for inferring text comprehension and topic understanding. However, grades obtained with this methodology are hard to quantify objectively. Therefore, we carried out an empirical study to analyze the decisions underlying human summary-grading behavior. The task consisted of expert evaluation of summaries produced in critically relevant contexts of summarization development, and the resulting data were modeled by means of Bayesian networks using an application called Elvira, which allows the predictive power (if any) of the resultant variables to be observed graphically. Thus, in this article, we analyzed summary-evaluation decision making in a computational framework.
Additional information
This work was partially supported by the University of the Basque Country (Grant UE06/19) and the Spanish Ministry of Education and Science (Grant TIN2006-14968-C02-01), as well as by the Gipuzkoa Council in collaboration with the European Union and by the Etortek, Saiotek, and Research Groups 2007-2012 (IT-242-07) programs (Basque Government), TIN2005-03824 and Consolider Ingenio 2010-CSD2007-00018 projects (Spanish Ministry of Education and Science), and COMBIOMED network in computational biomedicine (Carlos III Health Institute). R.A. is supported by Basque Government Grant AE-BFI-05/430.
About this article
Cite this article
Zipitria, I., Larrañaga, P., Armañanzas, R. et al. What is behind a summary-evaluation decision? Behavior Research Methods 40, 597–612 (2008). https://doi.org/10.3758/BRM.40.2.597
DOI: https://doi.org/10.3758/BRM.40.2.597