Measuring Postsecondary Achievement: Lessons from Large-Scale Assessments in the K-12 Sector

Original Article, Higher Education Policy

Interest in using large-scale standardized assessments in the postsecondary sector has been growing rapidly in recent years. However, our experience is still limited, and there is a serious dearth of research investigating the characteristics and effects of testing in the postsecondary sector. We have far more extensive experience with large-scale testing in the K-12 sector, particularly in the USA. In this paper, I discuss a number of important issues that have arisen in K-12 testing and explore their implications for testing in the postsecondary sector. These include mistaking the part for the whole, overstating comparability, adding functions to extant tests without sufficient justification or validation, Campbell’s Law, and unwarranted causal inference. All of these issues are relevant to assessment in the postsecondary sector, and some are more severe in that sector than in K-12 education. I end with recommendations for productive and appropriate uses of assessments in this sector.


References

  • Allen, J. (2018) Personal communication, May 18.

  • American Educational Research Association, American Psychological Association, and National Council on Measurement in Education. (2014) Standards for educational and psychological testing (2014 edition), Washington, DC: Authors.

  • American Statistical Association. (2014) ASA statement on using value-added models for educational assessment, Author. https://www.amstat.org/asa/files/pdfs/POL-ASAVAM-Statement.pdf. Accessed 11 Apr 2019.

  • Astin, A.W. and Antonio, A.L. (2012) Assessment for excellence: the philosophy and practice of assessment and evaluation in higher education, Lanham, MD: Rowman & Littlefield.

  • Breakspear, S. (2012) The Policy Impact of PISA: An Exploration of the Normative Effects of International Benchmarking in School System Performance, Paris: OECD Publishing (OECD Education Working Papers No. 71). http://dx.doi.org/10.1787/5k9fdfqffr28-en.

  • Campbell, D.T. (1976) ‘Assessing the Impact of Planned Social Change,’ Occasional paper #8, in G.M. Lyons (ed.) Social Research and Public Policies, Hanover, NH: Dartmouth College.


  • Castellano, K.E. and Ho, A.D. (2013) A practitioner’s guide to growth models, Washington, DC: Council of Chief State School Officers.


  • Cizek, G.J. (2016) ‘Validating test score meaning and defending test score use: different aims, different methods’, Assessment in Education: Principles, Policy and Practice 23(2): 212–225.


  • Coates, H. and Mahat, M. (2014) ‘Advancing student learning outcomes’ in H. Coates (ed.) Higher education learning outcomes assessment: international perspectives, Frankfurt am Main: Peter Lang, pp. 15–32.


  • Corcoran, S.P., Jennings, J.L. and Beveridge, A.A. (2012) Teacher effectiveness on high- and low-stakes tests, New York University, working paper. Retrieved from https://www.nyu.edu/projects/corcoran/papers/Corcoran_Jennings_Houston_Teacher_Effects.pdf. Accessed 11 Apr 2019.

  • Council for Aid to Education. (2013) Performance assessment: CLA+ overview, New York: Author. Retrieved from https://2014.accreditation.ncsu.edu/pages/3.5/3.5.1/CLA.pdf. Accessed 11 Apr 2019.

  • Hamilton, L.S., Nussbaum, E.M. and Snow, R.E. (1997) ‘Interview procedures for validating science assessments’, Applied Measurement in Education 10(2): 181–200.


  • Ho, A.D. (2007) ‘Discrepancies between score trends from NAEP and state tests: a scale-invariant perspective’, Educational Measurement: Issues and Practice 26(4): 11–20.


  • Holcombe, R., Jennings, J. and Koretz, D. (2013) ‘The roots of score inflation: an examination of opportunities in two states’ tests’, in G. Sunderman (ed.) Charting reform, achieving equity in a diverse nation, Greenwich, CT: Information Age Publishing, pp. 163–189. http://dash.harvard.edu/handle/1/10880587. Accessed 11 Apr 2019.

  • Hoover, H.D., Dunbar, S.B., Frisbie, D.A., Oberley, K.R., Ordman, V.L., Naylor, R.J., Bray, G.B., Lewis, J.C., Qualls, A.L., Mengeling, M.A. and Shannon, G.P. (2003) The Iowa tests: guide to research and development, Forms A and B, Itasca, IL: Riverside Publishing.

  • Jacob, B.A. (2005) ‘Accountability, incentives and behavior: the impact of high-stakes testing in the Chicago public schools’, Journal of Public Economics 89(5–6): 761–796.


  • Judd, T. and Keith, B. (2012) ‘Student learning outcomes at the program and institutional levels’, in C. Secolsky and D.B. Denison (eds.) Handbook on measurement, assessment, and evaluation in higher education, New York: Routledge, pp. 31–46.

  • Kane, M.T. (2006) ‘Validation’, in R.L. Brennan (ed.) Educational measurement (4th ed.), Westport, CT: American Council on Education/Praeger, pp. 17–64.


  • Kane, M.T. (2016) ‘Explicating validity’, Assessment in Education: Principles, Policy and Practice 23(2): 198–211.


  • Klein, S.P., Hamilton, L.S., McCaffrey, D.F. and Stecher, B.M. (2000) What do test scores in Texas tell us? Santa Monica, CA: RAND (Issue Paper IP-202).

  • Klieme, E. (2016) TIMSS 2015 and PISA 2015: How Are They Related at the Country Level? DIPF working paper, Frankfurt, Germany: Deutsches Institut für Internationale Pädagogische Forschung.

  • Koretz, D. (2008) Measuring up: what educational testing really tells us, Cambridge, MA: Harvard University Press.


  • Koretz, D. (2016) ‘Making the term “validity” useful’, Assessment in Education: Principles, Policy and Practice 23(2): 290–292.


  • Koretz, D. (2017) The testing charade: pretending to make schools better, Chicago: University of Chicago Press.


  • Koretz, D. and Barron, S.I. (1998) The validity of gains on the Kentucky Instructional Results Information System (KIRIS), Santa Monica, CA: RAND (MR-1014-ED).

  • Koretz, D. and Hamilton, L.S. (2006) ‘Testing for accountability in K-12’, in R.L. Brennan (ed.) Educational measurement (4th ed.), Westport, CT: American Council on Education/Praeger, pp. 531–578.


  • Koretz, D., Linn, R.L., Dunbar, S.B. and Shepard, L.A. (1991) ‘The effects of high-stakes testing: preliminary evidence about generalization across tests,’ in R.L. Linn (chair), The effects of high stakes testing, symposium presented at the annual meetings of the American Educational Research Association and the National Council on Measurement in Education, Chicago, April. http://dash.harvard.edu/handle/1/10880553. Accessed 11 Apr 2019.

  • Lindquist, E.F. (1951) ‘Preliminary considerations in objective test construction’, in E.F. Lindquist (ed.) Educational measurement, Washington, DC: American Council on Education, pp. 119–184.


  • Lockwood, J.R., McCaffrey, D.F., Hamilton, L.S., Stecher, B., Le, V. and Martinez, J.F. (2007) ‘The sensitivity of value-added teacher effect estimates to different mathematics achievement measures’, Journal of Educational Measurement 44(1): 47–67.


  • Massachusetts Department of Elementary and Secondary Education. (2018) 2019 Next-generation MCAS test information for Grade 10 Mathematics, Malden, MA: Author (revised September 7). Retrieved from http://www.doe.mass.edu/mcas/tdd/math.html?section=nextgen. Accessed 11 Apr 2019.

  • McCaffrey, D.F., Lockwood, J.R., Koretz, D.M. and Hamilton, L.S. (2003) Evaluating value-added models for teacher accountability, Santa Monica, CA: RAND (MG-158-EDU). Retrieved from http://www.rand.org/pubs/monographs/MG158.html. Accessed 11 Apr 2019.

  • Messick, S. (1989) ‘Validity’, in R. Linn (ed.) Educational measurement (3rd ed.), Washington, DC: American Council on Education, pp. 13–100.


  • Moore, K., Coates, H. and Croucher, G. (2014) ‘Understanding and improving higher education productivity’, in E. Hazelkorn, H. Coates and A.C. McCormick (eds.) Research handbook on quality, performance and accountability in higher education, Cheltenham, UK: Edward Elgar, pp. 161–177.

  • Mullis, I.V.S., Martin, M.O. and Foy, P. (2008) TIMSS 2007 International Mathematics Report, Newton, MA: TIMSS & PIRLS International Study Center, Boston College.

  • Reardon, S.F. (2011) ‘The widening academic achievement gap between the rich and the poor: new evidence and possible explanations’, in R. Murnane and G. Duncan (eds.) Whither opportunity? Rising inequality and the uncertain life chances of low-income children, New York: Russell Sage Foundation, pp. 91–116.


  • Rothstein, R. (2008) Holding accountability to account: How scholarship and experience in other fields inform exploration of performance incentives in education, Nashville: National Center on Performance Incentives, Vanderbilt Peabody College. Retrieved from http://www.epi.org/files/2014/holding-accountability-to-account.pdf. Accessed 11 Apr 2019.

  • Rubinstein, J. (2000) Cracking the MCAS Grade 10 Math, New York: Princeton Review Publishing.


  • Secolsky, C. and Denison, D.B. (eds.) (2012) Handbook on measurement, assessment, and evaluation in higher education, New York: Routledge.

  • Shavelson, R.J. (2010) Measuring college learning responsibly, Stanford, CA: Stanford University Press.


  • Tremblay, K., Lalancette, D. and Roseveare, D. (2012) AHELO feasibility study report, Volume 1, Paris: OECD.


  • U.S. Department of Education. (2006) A test of leadership: charting the future of U.S. higher education, Washington, DC: Author.

  • Waldow, F. (2009) ‘What PISA did and did not do: Germany after the “PISA-Shock”’, European Educational Research Journal 8(3): 476–483. Published online 1 January. http://dx.doi.org/10.2304/eerj.2009.8.3.476.

  • Williams, R. (2014) ‘Comparing and benchmarking higher education systems’, in E. Hazelkorn, H. Coates and A.C. McCormick (eds.) Research handbook on quality, performance and accountability in higher education, Cheltenham, UK: Edward Elgar, pp. 178–188.


  • Wu, M. (2009) A Critical Comparison of the Contents of PISA and TIMSS Mathematics Assessments, unpublished working paper, University of Melbourne. Retrieved from https://edsurveys.rti.org/PISA/documents/WuA_Critical_Comparison_of_the_Contents_of_PISA_and_TIMSS_psg_WU_06.1.pdf. Accessed 11 Apr 2019.

  • Yamada, R. (2014) ‘Comparative analysis of learning outcomes assessment policy contexts’, in H. Coates (ed.) Higher education learning outcomes assessment: international perspectives, Frankfurt am Main: Peter Lang, pp. 33–48.



Author information


Correspondence to Daniel Koretz.


Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article


Cite this article

Koretz, D. Measuring Postsecondary Achievement: Lessons from Large-Scale Assessments in the K-12 Sector. High Educ Policy 32, 513–536 (2019). https://doi.org/10.1057/s41307-019-00142-4
