Observing Lemmatization Effect in LSA Coherence and Comprehension Grading of Learner Summaries

  • Iraide Zipitria
  • Ana Arruarte
  • Jon Ander Elorriaga
Part of the Lecture Notes in Computer Science book series (LNCS, volume 4053)


Current work on learner evaluation in Intelligent Tutoring Systems (ITSs) is moving towards open-ended diagnosis of educational content. One of the main difficulties of this approach is automatically understanding natural language. Our work aims to produce automatic evaluation of learner summaries in Basque. Therefore, in addition to language comprehension, difficulties emerge from Basque morphology itself. In this work, Latent Semantic Analysis (LSA) is used to model comprehension in a language in which lemmatization has been shown to be highly significant. This paper tests the influence of corpus lemmatization on automatic comprehension and coherence grading. Summaries graded by human judges for coherence and comprehension have been tested against LSA-based measures from lemmatized and non-lemmatized source corpora. After lemmatization, the number of single terms known to LSA was reduced by 56% relative to its original number. As a result, LSA grades almost match human measures, producing no significant differences between the lemmatized and non-lemmatized approaches.
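The comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The four-sentence corpus, the `LEMMAS` lookup table (a stand-in for a real Basque lemmatizer), the function names, and the choice of `k = 2` dimensions are all assumptions made for illustration. The sketch builds a term-document matrix, keeps the `k` largest singular directions, projects two texts into that space, and reports their cosine similarity, once on the raw corpus and once on the lemmatized one.

```python
import numpy as np

# Toy corpus and toy lemma table: both are illustrative assumptions,
# not the corpus or lemmatizer used in the paper.
DOCS = [
    "etxea mendian dago",
    "etxean liburua dago",
    "liburuak etxeko mahaian daude",
    "mendia handia da",
]
LEMMAS = {
    "etxea": "etxe", "etxean": "etxe", "etxeko": "etxe",
    "liburua": "liburu", "liburuak": "liburu",
    "mendia": "mendi", "mendian": "mendi",
}

def lemmatize(text):
    """Replace each token by its lemma when the toy table knows it."""
    return " ".join(LEMMAS.get(tok, tok) for tok in text.split())

def build_vocab(docs):
    """Map each distinct token to a row index."""
    terms = sorted({tok for doc in docs for tok in doc.split()})
    return {term: i for i, term in enumerate(terms)}

def term_doc_matrix(docs, vocab):
    """Raw counts: rows are terms, columns are documents."""
    m = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for tok in doc.split():
            if tok in vocab:  # terms unknown to the space are ignored
                m[vocab[tok], j] += 1.0
    return m

def lsa_similarity(text_a, text_b, docs, k=2, normalize=None):
    """Cosine similarity of two texts in a k-dimensional LSA space."""
    norm = normalize or (lambda s: s)
    docs = [norm(d) for d in docs]
    vocab = build_vocab(docs)
    u, _, _ = np.linalg.svd(term_doc_matrix(docs, vocab), full_matrices=False)
    u_k = u[:, :k]  # k leading left singular vectors
    a = term_doc_matrix([norm(text_a)], vocab)[:, 0] @ u_k
    b = term_doc_matrix([norm(text_b)], vocab)[:, 0] @ u_k
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0

raw_terms = len(build_vocab(DOCS))
lemma_terms = len(build_vocab([lemmatize(d) for d in DOCS]))
print("known terms:", raw_terms, "raw,", lemma_terms, "lemmatized")

source, summary = "etxea mendian dago", "etxean mendia"
print("raw similarity:", lsa_similarity(summary, source, DOCS))
print("lemmatized similarity:",
      lsa_similarity(summary, source, DOCS, normalize=lemmatize))
```

On this toy data, lemmatization shrinks the set of known terms because several inflected forms collapse onto one lemma. The similarity scores themselves say nothing about real summaries. A realistic setup would need a large corpus, a proper morphological analyser, and term weighting.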


Keywords: Latent Semantic Analysis; Basque Country; Text Comprehension; Intelligent Tutor System; Automatic Evaluation
These keywords were added by machine and not by the authors. This process is experimental and the keywords may be updated as the learning algorithm improves.





Copyright information

© Springer-Verlag Berlin Heidelberg 2006

Authors and Affiliations

  • Iraide Zipitria (1, 2)
  • Ana Arruarte (1)
  • Jon Ander Elorriaga (1)
  1. Language and Information Systems Department, Computer Science Faculty, University of the Basque Country, Donostia, Basque Country, Spain
  2. Department of Research Methods in Education (MIDE), University of the Basque Country, Donostia, Basque Country, Spain
