Psychometrika, Volume 84, Issue 1, pp 186–211

Penalized Best Linear Prediction of True Test Scores

  • Lili Yao
  • Shelby J. Haberman
  • Mo Zhang

Abstract

In best linear prediction (BLP), a true test score is predicted from observed item scores and from ancillary test data. If the use of BLP rather than a more direct estimate of a true score has disparate impact on different demographic groups, then a fairness issue arises. To improve population invariance while preserving much of the efficiency of BLP, a modified approach, penalized best linear prediction (PBLP), is proposed that weights both the mean square error of prediction and a quadratic measure of subgroup biases. The proposed methodology is applied to three high-stakes writing assessments.
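
Concretely, the penalized criterion can be sketched in the following form (an illustrative formalization only; the paper's exact objective, notation, and choice of subgroup weights may differ). Let $\tau$ be the true score, let $Z$ collect the observed item scores and ancillary data, let the predictor be $c + \beta' Z$, and index subgroups by $g = 1, \dots, G$ with weights $w_g \ge 0$ and penalty parameter $\lambda \ge 0$:

\[
(\hat{c}, \hat{\beta}) \;=\; \arg\min_{c,\,\beta}\;
\underbrace{E\bigl[(\tau - c - \beta' Z)^{2}\bigr]}_{\text{mean square error of prediction}}
\;+\; \lambda \sum_{g=1}^{G} w_g
\underbrace{\bigl(E[\tau - c - \beta' Z \mid g]\bigr)^{2}}_{\text{squared subgroup bias}} .
\]

With $\lambda = 0$ this reduces to ordinary BLP; as $\lambda$ grows, the expected prediction error within each weighted subgroup is driven toward zero at some cost in overall mean square error, which is the efficiency-invariance trade-off described above.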

Keywords

true test score · PBLP · subgroup biases

Acknowledgements

Lili Yao was partially supported by the National Natural Science Foundation of China (61863012, 61263010) and by the Research Project of the Science and Technology Department of Jiangxi Province, China (20181BBE50020, 20161BBE50082, 20161BAB202067).

Copyright information

© The Psychometric Society 2018

Authors and Affiliations

  1. Educational Testing Service, Princeton, NJ, USA
  2. Edusoft, Jerusalem, Israel