Research in Higher Education, Volume 58, Issue 6, pp 605–616

Measuring Longitudinal Gains in Student Learning: A Comparison of Rasch Scoring and Summative Scoring Approaches

  • Yue Zhao
  • Jenny M. Y. Huen
  • Y. W. Chan


This study pioneers a Rasch scoring approach and compares it with a conventional summative approach for measuring longitudinal gains in student learning. In this methodological note, the proposed methodology is demonstrated using rating scales from a student survey administered as part of a higher education outcome assessment. Such assessments have become increasingly important worldwide for institutional accreditation and accountability to stakeholders. Data were collected in a longitudinal study that tracked the self-reported learning outcomes of individual students in the same cohort who completed the student learning experience questionnaire (SLEQ) in their first and final years. The Rasch model was employed for item calibration and latent trait estimation, together with a scaling procedure of concurrent calibration incorporating a randomly equivalent group design and a single group design, to measure the gains in self-reported learning outcomes yielded by the repeated measures. The sensitivity to change of Rasch scoring relative to the conventional summative scoring method was quantified by a statistical index, relative performance (RP). Findings indicated that Rasch scoring captured gains in learning outcomes better than the conventional summative scoring method, with RP values ranging from 3 to 17% in the cognitive, social, and value domains of the SLEQ. The Rasch scoring approach and the scaling procedure presented in the study can be readily generalised to studies that use rating scales to measure change in student learning in the higher education context. The methodological innovations and contributions of this study are discussed.
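The abstract does not give the formula for relative performance (RP). As a rough illustration only, the sketch below assumes one definition common in the relative-efficiency literature: the ratio of squared paired t-statistics for the first-to-final-year gain under the two scoring methods. The function names and all numbers are hypothetical and are not taken from the study.

```python
import math
from statistics import mean, stdev


def paired_t(first, final):
    """Paired t-statistic for the gain (final minus first) of the same students."""
    gains = [b - a for a, b in zip(first, final)]
    return mean(gains) / (stdev(gains) / math.sqrt(len(gains)))


def relative_performance(t_candidate, t_reference):
    """Assumed RP: ratio of squared t-statistics; above 1 favours the candidate."""
    return (t_candidate ** 2) / (t_reference ** 2)


# Made-up scores for five students (illustration only, not study data).
summative_first = [2.0, 3.0, 2.5, 3.5, 3.0]   # mean item ratings, year 1
summative_final = [2.5, 3.0, 3.5, 3.5, 4.0]   # mean item ratings, final year
rasch_first = [-0.8, 0.1, -0.3, 0.6, 0.2]     # Rasch person measures (logits)
rasch_final = [-0.2, 0.3, 0.6, 0.8, 1.1]

t_sum = paired_t(summative_first, summative_final)
t_rasch = paired_t(rasch_first, rasch_final)
rp = relative_performance(t_rasch, t_sum)
print(f"RP = {rp:.2f} ({(rp - 1) * 100:+.0f}% relative to summative scoring)")
```

Under this assumed definition, an RP of 1.10 would correspond to a 10% advantage in sensitivity to change for the Rasch scores over the summative scores.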


Rasch model; Measurement of change; Student learning outcomes; Higher education; Institutional assessment; Item response theory



Copyright information

© Springer Science+Business Media New York 2016

Authors and Affiliations

  1. The University of Hong Kong, Hong Kong SAR, China
