Measuring Longitudinal Gains in Student Learning: A Comparison of Rasch Scoring and Summative Scoring Approaches

Abstract

This study pioneers a Rasch scoring approach and compares it to a conventional summative approach for measuring longitudinal gains in student learning. In this methodological note, the proposed methodology is demonstrated using rating scales from a student survey administered as part of a higher education outcome assessment. Such assessments have become increasingly important worldwide for institutional accreditation and accountability to stakeholders. Data were collected from a longitudinal study tracking the self-reported learning outcomes of individual students in the same cohort, who completed the student learning experience questionnaire (SLEQ) in their first and final years. The Rasch model was employed for item calibration and latent trait estimation, together with a scaling procedure of concurrent calibration incorporating a randomly equivalent group design and a single group design, to measure gains in self-reported learning outcomes across the repeated measures. The sensitivity to change of Rasch scoring relative to the conventional summative scoring method was quantified by a statistical index, namely relative performance (RP). Findings indicated that Rasch scoring captured gains in learning outcomes better than the conventional summative scoring method, with RP values ranging from 3 to 17% in the cognitive, social, and value domains of the SLEQ. The Rasch scoring approach and the scaling procedure presented in this study can be readily generalised to studies using rating scales to measure change in student learning in the higher education context. The methodological innovations and contributions of this study are discussed.
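To make the two ideas in the abstract concrete, the following is a minimal Python sketch. It is not the authors' implementation (the study used RUMM2030 and polytomous rating-scale data): the `rasch_prob` function shows the simpler dichotomous form of the Rasch model, and `relative_performance` assumes RP is defined as the percentage gain in sensitivity given by the ratio of squared test statistics, which is one common convention in the relative-efficiency literature rather than the paper's confirmed formula.

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """Dichotomous Rasch model: probability that a person with latent
    trait `theta` endorses an item with difficulty `b`."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def relative_performance(stat_rasch: float, stat_summative: float) -> float:
    """Hypothetical RP index: percentage gain in sensitivity to change,
    assumed here to be the ratio of squared test statistics minus one."""
    return ((stat_rasch / stat_summative) ** 2 - 1.0) * 100.0

# When ability equals item difficulty, the endorsement probability is 0.5.
p = rasch_prob(0.0, 0.0)

# If Rasch scoring yields a paired t-statistic of 3.4 versus 3.2 for
# summative scoring, this assumed RP definition gives roughly a 13% gain.
rp = relative_performance(3.4, 3.2)
```

Under this assumed definition, identical test statistics give an RP of 0%, and an RP of 3 to 17% (as reported in the abstract) corresponds to a modestly but consistently more sensitive measure of change.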

Notes

  1. In each administration year, a response rate of around 61% was achieved.

  2. The other SLEQ items which are out of the scope of the current study assess students’ perceptions of their teaching and learning environment.

Author information

Corresponding author

Correspondence to Yue Zhao.

About this article

Cite this article

Zhao, Y., Huen, J.M.Y. & Chan, Y.W. Measuring Longitudinal Gains in Student Learning: A Comparison of Rasch Scoring and Summative Scoring Approaches. Res High Educ 58, 605–616 (2017). https://doi.org/10.1007/s11162-016-9441-z

Keywords

  • Rasch model
  • Measurement of change
  • Student learning outcomes
  • Higher education
  • Institutional assessment
  • Item response theory