User Modeling and User-Adapted Interaction, Volume 19, Issue 3, pp 243–266

Addressing the assessment challenge with an online system that tutors as it assesses

  • Mingyu Feng
  • Neil Heffernan
  • Kenneth Koedinger
Original Paper


Secondary teachers across the United States are being asked to use formative assessment data (Black and Wiliam 1998a,b; Roediger and Karpicke 2006) to inform their classroom instruction. At the same time, critics of the US government’s No Child Left Behind legislation are calling the bill “No Child Left Untested”. Among other things, critics point out that every hour spent assessing students is an hour lost from instruction. But does it have to be? What if we better integrated assessment into classroom instruction and allowed students to learn during the test? We developed an approach that provides immediate tutoring on practice assessment items that students cannot solve on their own. Our hypothesis is that we can achieve more accurate assessment by using not only data on whether students get test items right or wrong, but also data on the effort required for students to solve a test item with instructional assistance. We have integrated assistance and assessment in the ASSISTment system. The system helps teachers make better use of their time by offering instruction to students while giving teachers a more detailed evaluation of student abilities, which is impossible under current approaches. To assess student math proficiency, we use the data that our system collects through its interactions with students to estimate their performance on an end-of-year high-stakes state test. Our results show that leveraging the interaction data reliably improves predictions of student end-of-year exam scores, and that a model based only on the interaction information makes better predictions than the traditional assessment model, which uses only information about correctness on the test items.


Intelligent tutoring system, ASSISTments, Dynamic assessment, Assistance metrics, Interactive tutoring




  1. Anozie, N., Junker, B.W.: Predicting end-of-year accountability assessment scores from monthly student records in an online tutoring system. In: Beck, J., Aimeur, E., Barnes, T. (eds.) Educational Data Mining: Papers from the AAAI Workshop, pp. 1–6. AAAI Press, Menlo Park, CA, Technical Report WS-06-05 (2006)
  2. Ayers, E., Junker, B.W.: Do skills combine additively to predict task difficulty in eighth grade mathematics? In: Beck, J., Aimeur, E., Barnes, T. (eds.) Educational Data Mining: Papers from the AAAI Workshop, pp. 14–20. AAAI Press, Menlo Park, CA, Technical Report WS-06-05 (2006)
  3. Baker, R.S., Corbett, A.T., Koedinger, K.R.: Detecting student misuse of intelligent tutoring systems. In: James, C.L., Vicari, R.M., Paraguacu, F. (eds.) Intelligent Tutoring Systems: 7th International Conference ITS 2004, Maceió, Alagoas, Brazil Proceedings, pp. 531–540. Springer-Verlag Berlin Heidelberg, Berlin, Germany (2004)
  4. Baker, R.S., Roll, I., Corbett, A.T., Koedinger, K.R.: Do performance goals lead students to game the system? In: Proceedings of the 12th International Conference on Artificial Intelligence in Education, pp. 57–64. Netherlands, Amsterdam (2005)
  5. Beck, J.E., Sison, J.: Using knowledge tracing in a noisy environment to measure student reading proficiencies. Int. J. Artif. Intell. Educ. 16, 129–143 (2006)
  6. Beck, J.E., Jia, P., Mostow, J.: Automatically assessing oral reading fluency in a computer tutor that listens. Technol. Instr. Cogn. Learn. 2, 61–81 (2004)
  7. Black, P., Wiliam, D.: Assessment and classroom learning. Assess. Educ.: Princ., Policy Pract. 5, 7–74 (1998a)
  8. Black, P., Wiliam, D.: Inside the black box: raising standards through classroom assessment. Phi Delta Kappan 80(2), 139–149 (1998b)
  9. Boston, C.: The concept of formative assessment. Pract. Assess. Res. Eval. 8(9) (2002)
  10. Campione, J.C., Brown, A.L., Bryant, R.J.: Individual differences in learning and memory. In: Sternberg, R.J. (ed.) Human Abilities: An Information-processing Approach, pp. 103–126. W. H. Freeman, New York (1985)
  11. Corbett, A.T., Bhatnagar, A.: Student modeling in the ACT Programming Tutor: Adjusting a procedural learning model with declarative knowledge. User Modeling: Proceedings of the Sixth International Conference on User Modeling UM97 Chia Laguna, Sardinia, Italy, pp. 243–254. Springer-Verlag Wien, New York (1997)
  12. Corbett, A.T., Anderson, J.R., O’Brien, A.T.: Student modeling in the ACT programming tutor. In: Nichols, P., Chipman, S., Brennan, R. (eds.) Cognitively Diagnostic Assessment. Erlbaum, Hillsdale, NJ (1995)
  13. Computing Research Association.: Cyberinfrastructure for Education and Learning for the Future: a Vision and Research Agenda. Final report of Cyberlearning Workshop Series workshops held Fall 2004—Spring 2005 by the Computing Research Association and the International Society of the Learning Sciences. Retrieved from on 10 November 2006 (2005)
  14. Embretson, S.E.: Structured Rasch models for measuring individual-difference in learning and change. Int. J. Psychol. 27(3–4), 372–372 (1992)
  15. Feng, M., Heffernan, N.T.: Towards live informing and automatic analyzing of student learning: Reporting in the assistment system. J. Interact. Learn. Res. 18(2), 207–230 (2007) AACE, Chesapeake, VA
  16. Feng, M., Heffernan, N.T., Koedinger, K.R.: Addressing the testing challenge with a web-based e-assessment system that tutors as it assesses. In: Carr, L.A., De Roure, D.C., Iyengar, A., Goble, C.A., Dahlin, M. (eds.) Proceedings of the Fifteenth International World Wide Web Conference, pp. 307–316. Edinburgh UK, 2006. ACM Press, New York, NY (2006)
  17. Feng, M., Heffernan, N., Beck, J., Koedinger, K.: Can we predict which groups of questions students will learn from? In: Baker, Beck (eds.) Proceedings of the First International Conference on Educational Data Mining, pp. 218–225. Montreal, Canada (2008)
  18. Feng, M., Beck, J., Heffernan, N., Koedinger, K.: Can an intelligent tutoring system predict math proficiency as well as a standardized test? In: Baker, Beck (eds.) Proceedings of the First International Conference on Educational Data Mining, pp. 107–116. Montreal, Canada (2008)
  19. Fischer, G., Seliger, E.: Multidimensional linear logistic models for change. Chap. 19. In: van der Linden, W.J., Hambleton, R.K. (eds.) Handbook of Modern Item Response Theory. Springer-Verlag, New York (1997)
  20. Grigorenko, E.L., Sternberg, R.J.: Dynamic testing. Psychol. Bull. 124, 75–111 (1998)
  21. Hulin, C.L., Lissak, R.I., Drasgow, F.: Recovery of two- and three-parameter logistic item characteristic curves: A Monte Carlo study. Appl. Psychol. Meas. 6(3), 249–260 (1982)
  22. Jannarone, R.J.: Conjunctive item response theory kernels. Psychometrika 55(3), 357–373 (1986)
  23. Koedinger, K.R., Aleven, V., Heffernan, N.T., McLaren, B., Hockenberry, M.: Opening the door to non-programmers: authoring intelligent tutor behavior by demonstration. In: Proceedings of the 7th International Conference on Intelligent Tutoring Systems, pp. 162–173. Maceio, Brazil (2004)
  24. Massachusetts Department of Education.: Massachusetts Mathematics Curriculum Framework. Retrieved from, 6 November 2005 (2000)
  25. MCAS technical report.: Retrieved from, 5 August 2005 (2001)
  26. Mitchell, T.: Machine Learning. McGraw-Hill, Columbus, OH (1997)
  27. Mostow, J., Aist, G.: Evaluating tutors that listen: an overview of Project LISTEN. In: Feltovich, P. (ed.) Smart Machines in Education, pp. 169–234. MIT/AAAI Press, Menlo Park, CA (2001)
  28. Olson, L.: State test programs mushroom as NCLB Mandate Kicks. In: Education Week, 20 November, pp. 10–14 (2004)
  29. Olson, L.: Special report: testing takes off. Education Week, 30 November 2005, pp. 10–14 (2005)
  30. Raftery, A.E.: Bayesian model selection in social research. Sociol. Methodol. 25, 111–163 (1995)
  31. Razzaq, L., Heffernan, N.T.: Scaffolding vs. hints in the Assistment System. In: Ikeda, Ashley, Chan (eds.) Proceedings of the 8th International Conference on Intelligent Tutoring Systems, pp. 635–644. Springer-Verlag, Jhongli, Taiwan, Berlin, Germany (2006)
  32. Razzaq, L., Feng, M., Nuzzo-Jones, G., Heffernan, N.T., Koedinger, K.R., Junker, B., Ritter, S., Knight, A., Aniszczyk, C., Choksey, S., Livak, T., Mercado, E., Turner, T.E., Upalekar, R., Walonoski, J.A., Macasek, M.A., Rasmussen, K.P.: The ASSISTment project: blending assessment and assisting. In: Proceedings of the 12th Annual Conference on Artificial Intelligence in Education. Amsterdam, The Netherlands, pp. 555–562. IOS Press, Amsterdam (2005)
  33. Razzaq, L., Heffernan, N.T., Lindeman, R.W.: What level of tutor interaction is best? In: Luckin, Koedinger (eds.) Proceedings of the 13th Conference on Artificial Intelligence in Education, pp. 222–229. IOS Press, Los Angeles, CA, Amsterdam, The Netherlands (2007)
  34. Roediger, H.L. III, Karpicke, J.D.: The power of testing memory. Perspect. Psychol. Sci. 1(3), 181–210 (2006)
  35. Sternberg, R.J., Grigorenko, E.L.: All testing is dynamic testing. Issues Educ. 7, 137–170 (2001)
  36. Sternberg, R.J., Grigorenko, E.L.: Dynamic Testing: The Nature and Measurement of Learning Potential. Cambridge University Press, Cambridge (2002)
  37. Tan, E.S., Imbos, T., Does, R.J.M.: A distribution-free approach to comparing growth of knowledge. J. Educ. Measure. 31(1), 51–65 (1994)
  38. Tatsuoka, K.K.: Rule space: an approach for dealing with misconceptions based on item response theory. J. Educ. Measure. 20, 345–354 (1983)
  39. van der Linden, W.J., Hambleton, R.K. (eds.): Handbook of Modern Item Response Theory. Springer Verlag, New York, NY (1997)
  40. Walonoski, J., Heffernan, N.T.: Detection and analysis of off-task gaming behavior in intelligent tutoring systems. In: Ikeda, Ashley, Chan (eds.) Proceedings of the 8th International Conference on Intelligent Tutoring Systems. Berlin, pp. 382–391. Springer-Verlag, Jhongli, Taiwan (2006)
  41. Zimowski, M., Muraki, E., Mislevy, R., Bock, D.: BILOG-MG 3—Multiple-Group IRT Analysis and Test maintenance for Binary Items. Scientific Software International, Inc., Lincolnwood, IL. URL (2005)

Copyright information

© Springer Science+Business Media B.V. 2009

Authors and Affiliations

  • Mingyu Feng (1, 2)
  • Neil Heffernan (1)
  • Kenneth Koedinger (3)
  1. Department of Computer Science, Worcester Polytechnic Institute, Worcester, USA
  2. Austin, USA
  3. Human Computer Interaction Institute, Carnegie Mellon University, Pittsburgh, USA
