
Ethics and Fairness in Assessing Learning Outcomes in Higher Education

  • Original Article
  • Published in Higher Education Policy

Fairness is one of the most important quality criteria for assessments and a necessary condition for valid test score interpretations. In this paper, we describe findings from an assessment of N = 7664 beginning business and economics students at 46 universities across Germany using a domain-specific higher education entry test. From the perspective of test fairness as defined in the internationally established validation standards of AERA et al. (Standards for Educational and Psychological Testing, AERA, Washington, DC, 2014), we identify which students had particular difficulty completing the test, taking gender- and language-related factors into account when evaluating their test performance. Our results highlight particular challenges in admissions testing in higher education, one of which is finding a suitable way to address the disadvantages experienced by various groups of students so as to guarantee fairness and ethical integrity when developing and administering tests. Recent migration trends and the overall internationalization of higher education have led to increasingly heterogeneous student bodies. These challenges are therefore of particular relevance for education policy and practice in ensuring fairness in the assessment of learning outcomes for all student groups, including in the context of commonly used standardized assessments and established examination practices in higher education.
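
Gender- and language-related fairness analyses of this kind are typically operationalized through differential item functioning (DIF) checks, as surveyed in the cited works by Walker (2011) and Zumbo (2007). The following is a minimal illustrative sketch, not the instrument or code used in this study: a classical Mantel-Haenszel DIF statistic on the ETS delta scale, assuming dichotomously scored items and a total-score matching criterion. The function and variable names (mantel_haenszel_dif, item, group, total) are hypothetical.

```python
# Illustrative only: a standard Mantel-Haenszel DIF check, not the authors' code.
import numpy as np

def mantel_haenszel_dif(item, group, total):
    """Return the ETS delta-scale MH D-DIF statistic for one item.

    item  -- 0/1 scores on the studied item
    group -- 0 = reference group, 1 = focal group (e.g., split by gender or language)
    total -- matching criterion, typically the total test score
    """
    num, den = 0.0, 0.0
    for k in np.unique(total):                       # one 2x2 table per score stratum
        s = total == k
        a = np.sum(s & (group == 0) & (item == 1))   # reference group, correct
        b = np.sum(s & (group == 0) & (item == 0))   # reference group, incorrect
        c = np.sum(s & (group == 1) & (item == 1))   # focal group, correct
        d = np.sum(s & (group == 1) & (item == 0))   # focal group, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("degenerate strata: MH odds ratio is undefined")
    alpha_mh = num / den                             # common odds ratio across strata
    return -2.35 * np.log(alpha_mh)                  # ETS delta metric; |D| >= 1.5 suggests large DIF

# Usage with simulated data (no DIF built in, so D should be close to 0):
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 5000)
total = rng.integers(0, 11, 5000)
item = (rng.random(5000) < total / 10).astype(int)
print(round(mantel_haenszel_dif(item, group, total), 2))
```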


References

  • Abedi, J. (2006) ‘Language issues in item development’, in S.M. Downing and T.M. Haladyna (eds.) Handbook of test development, New Jersey: Lawrence Erlbaum Associates, pp. 377–398.

  • American Educational Research Association (AERA) (2011) ‘Code of ethics’, Educational Researcher 40(3): 145–156.

  • American Educational Research Association (AERA), American Psychological Association (APA), National Council on Measurement in Education (NCME) and Joint Committee on Standards for Educational and Psychological Testing (2014) Standards for educational and psychological testing, Washington, DC: AERA.

  • Avenia-Tapper, B. and Llosa, L. (2015) ‘Construct relevant or irrelevant? The role of linguistic complexity in the assessment of English language learners’ science knowledge’, Educational Assessment 20(2): 95–111.

  • Baker, F.B. and Kim, S.-H. (2004) Item response theory: parameter estimation techniques, New York: Dekker.

  • Boe, E.E., May, H. and Boruch, R.F. (2002) Student task persistence in the third international mathematics and science study: a major source of achievement differences at the national, classroom, and student levels, University of Pennsylvania: Center for Research and Evaluation in Social Policy.

  • Brückner, S. and Zlatkin-Troitschanskaia, O. (2018) ‘Threshold concepts for modeling and assessing higher education students’ understanding and learning in economics’, in O. Zlatkin-Troitschanskaia, M. Toepper, H.A. Pant, C. Lautenbach and C. Kuhn (eds.) Assessment of learning outcomes in higher education. Methodology of educational measurement and assessment, Cham: Springer, pp. 103–121.

  • Brückner, S., Förster, M., Zlatkin-Troitschanskaia, O. and Walstad, W.B. (2015a) ‘Effects of prior economic education, native language, and gender on economic knowledge of first-year students in higher education. A comparative study between Germany and the USA’, Studies in Higher Education 40(3): 437–453.

  • Brückner, S., Förster, M., Zlatkin-Troitschanskaia, O., Happ, R., Walstad, W.B., Yamaoka, M. and Asano, T. (2015b) ‘Gender effects in assessment of economic knowledge and understanding: differences among undergraduate business and economics students in Germany, Japan, and the United States’, Peabody Journal of Education 90(4): 503–518.

  • Byrnes, J.P., Miller, D.C. and Schafer, W.D. (1999) ‘Gender differences in risk taking: a meta-analysis’, Psychological Bulletin 125(3): 367–383.

  • Camilli, G. (2006) ‘Test fairness’, in R.L. Brennan (ed.) Educational measurement, Westport, CT: American Council on Education, pp. 220–256.

  • Childs, R.A. (1990) Gender bias and fairness, Washington, DC: ERIC Clearinghouse on Tests, Measurement, and Evaluation. http://www.ericdigests.org/pre-9218/gender.htm. Accessed 1 June 2018.

  • Cohen, J. (1988) Statistical power analysis for the behavioral sciences, Hillsdale, NJ: Erlbaum.

  • Cole, N.S. and Zieky, M.J. (2001) ‘The new faces of fairness’, Journal of Educational Measurement 38(4): 369–382.

  • Council for Economic Education (CEE) (2010) Voluntary national content standards in economics. https://www.councilforeconed.org/wp-content/uploads/2012/03/voluntary-national-content-standards-2010.pdf. Accessed 1 June 2018.

  • Crocker, L. (2003) ‘Teaching for the test: Validity, fairness, and moral action’, Educational Measurement: Issues and Practice 22(3): 5–11.

  • Crooks, T.J., Kane, M.T. and Cohen, A.S. (1996) ‘Threats to the valid use of assessments’, Assessment in Education: Principles, Policy & Practice 3(3): 265–286.

  • DFG (2013) Safeguarding good scientific practice: recommendations of the commission on professional self-regulation in science, Weinheim: Wiley.

  • Edwards, D., Coates, H. and Friedman, T. (2013) ‘Using aptitude testing to diversify higher education intake — An Australian case study’, Journal of Higher Education Policy and Management 35(2): 136–152.

  • Eklöf, H. (2010) ‘Skill and will: test-taking motivation and assessment quality’, Assessment in Education: Principles, Policy & Practice 17(4): 345–356.

  • Ercikan, K. and Pellegrino, J.W. (2017) Validation of score meaning for the next generation of assessments: The use of response processes, New York: Routledge.

  • European Group on Ethics in Science and New Technologies (2015) Statement on the formulation of a code of conduct for research integrity for projects funded by the European Commission. http://ec.europa.eu/research/ege/pdf/research_integrity_ege_statement.pdf#view=fit&pagemode=none. Accessed 1 June 2018.

  • Federal Office of Statistics [Statistisches Bundesamt (Destatis)] (2017) Education and Culture: Students at Universities. Preliminary Report Winter Term 2016/17 (subject series 11, series 4.1), Wiesbaden: Destatis.

  • Federal Office of Statistics [Statistisches Bundesamt (Destatis)] (2018) Education and Culture: Students at Universities (subject series 11, series 4.1), Wiesbaden: Destatis.

  • Finn, B. (2015) Measuring motivation in low-stakes assessments (ETS Research Report RR-15-19), Princeton, NJ: Educational Testing Service.

  • Förster, M., Zlatkin-Troitschanskaia, O., Brückner, S., Happ, R., Hambleton, R.K., Walstad, W.B. et al. (2015) ‘Validating test score interpretations by cross-national comparison: comparing the results of students from Japan and Germany on an American test of economic knowledge in higher education’, Zeitschrift für Psychologie (German Journal of Psychology) 223(1): 14–23.

  • German Council of Science and Humanities (2012) Arbeitsbericht: Prüfungsnoten an Hochschulen im Prüfungsjahr 2010 [Work report: Examination grades at universities in the examination year 2010]. https://www.wissenschaftsrat.de/download/archiv/2627-12.pdf. Accessed 30 May 2019.

  • Hambleton, R.K. and Zenisky, A.L. (2010) ‘Translating and adapting tests for cross-cultural assessments’, in D. Matsumoto and F. van de Vijver (eds.) Cross-cultural research methods in psychology, Cambridge: Cambridge University Press, pp. 46–70.

  • Happ, R., Zlatkin-Troitschanskaia, O. and Schmidt, S. (2016) ‘An analysis of economic learning among undergraduates in introductory economics courses in Germany’, Journal of Economic Education 47(4): 300–310.

  • Happ, R., Zlatkin-Troitschanskaia, O. and Förster, M. (2018) ‘How prior economic education influences beginning university students’ knowledge of economics’, Empirical Research in Vocational Education and Training 10(5): 1–20.

  • Harkness, J. (2003) ‘Questionnaire translation’, in J. Harkness, F. van de Vijver and P. Mohler (eds.) Cross-cultural survey methods, Hoboken, NJ: Wiley, pp. 35–56.

  • Hubley, A.M. and Zumbo, B.D. (2011) ‘Validity and the consequences of test interpretation and use’, Social Indicators Research 103(2): 219–230.

  • Hunter, J.E., Schmidt, F.L. and Rauschenberger, J.M. (1977) ‘Fairness of psychological tests: implications of four definitions for selection utility and minority hiring’, Journal of Applied Psychology 62(3): 245–260.

  • IBM Corp. (2017) IBM SPSS Statistics for Windows, version 25.0, Armonk, NY: IBM Corp.

  • International Test Commission (ITC) (2005) International Test Commission guidelines for translating and adapting tests. http://www.intestcom.org/files/guideline_test_adaptation.pdf. Accessed 1 June 2018.

  • Kane, M.T. (2013) ‘Validating the interpretations and uses of test scores’, Journal of Educational Measurement 50(1): 1–73.

  • Kim, H. and Lalancette, D. (2013) Literature review on the value-added measurement in higher education. http://www.oecd.org/education/skills-beyond-school/Litterature%20Review%20VAM.pdf. Accessed 1 June 2018.

  • Kong, X.J., Wise, S.L., Harmes, J.C. and Yang, S. (2006) ‘Motivational effects of praise in response-time based feedback: A follow-up study of the effort-monitoring CBT’, in Annual Meeting of the National Council on Measurement in Education; 8–10 April 2006; San Francisco, USA.

  • Kunnan, A.J. (2010) ‘Test fairness and Toulmin’s argument structure’, Language Testing 27(2): 183–189.

  • Linn, R.L. (2008) Validation of uses and interpretations of state assessments, Washington, DC: Council of Chief State School Officers.

  • Mercer, J.R. (1978) ‘Test validity, bias and fairness: an analysis from the perspective of the sociology of knowledge’, Interchange 9(1): 1–16.

  • Messick, S. (2000) ‘Consequences of test interpretation and use: the fusion of validity and values in psychological assessment’, in R.D. Goffin and E. Helmes (eds.) Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy, Boston: Kluwer Academic Publishers, pp. 3–20.

  • Michelsen, S., Sweetman, R., Stensaker, B. and Bleiklie, I. (2016) ‘Shaping perceptions of a policy instrument: the political–administrative formation of learning outcomes in higher education in Norway and England’, Higher Education Policy 29(3): 399–417.

  • Moosbrugger, H. and Höfling, V. (2010) ‘Standards für psychologisches Testen’ [Standards for psychological testing], in H. Moosbrugger and A. Kelava (eds.) Test- und Fragebogenkonstruktion [Test and questionnaire construction], Berlin: Springer, pp. 204–222.

  • Musekamp, F. and Pearce, J. (2016) ‘Student motivation in low-stakes assessment contexts: an exploratory analysis in engineering mechanics’, Assessment & Evaluation in Higher Education 41(5): 750–769.

  • Mutz, R., Bornmann, L. and Daniel, H.-D. (2015) ‘Testing for the fairness and predictive validity of research funding decisions: a multilevel multiple imputation for missing data approach using ex-ante and ex-post peer evaluation data from the Austrian Science Fund’, Journal of the Association for Information Science and Technology 66(11): 2321–2339.

  • OECD (2017) Education at a glance 2017: OECD indicators, Paris: OECD Publishing.

  • Orley, G.J. (2017) ‘Multiple imputation of the guessing parameter in the case of missing data’, Master of Arts thesis, College of Education and Human Sciences, University of Nebraska.

  • Owen, A.L. (2012) ‘Student characteristics, behavior, and performance in economics classes’, in G.M. Hoyt and K. McGoldrick (eds.) International handbook on teaching and learning economics, Northampton, MA: Edward Elgar, pp. 341–350.

  • Pellegrino, J.W. (2010) The design of an assessment system for the race to the top: a learning sciences perspective on issues of growth and measurement, Princeton: Educational Testing Service.

  • Powell, M. and Ansic, D. (1997) ‘Gender differences in risk behavior in financial decision-making: an experimental analysis’, Journal of Economic Psychology 18(6): 605–628.

  • Sawyer, R.L., Cole, N.S. and Cole, J.W.L. (1976) ‘Utilities and the issue of fairness in a decision theoretic model for selection’, Journal of Educational Measurement 13(1): 59–76.

  • Schipolowski, S., Wilhelm, O. and Schroeders, U. (2017) Berliner Test zur Erfassung fluider und kristalliner Intelligenz ab der 11. Jahrgangsstufe (BEFKI 11+) [Berlin test of fluid and crystallized intelligence for grades 11 and above], Göttingen: Hogrefe.

  • Schütte, K., Zimmermann, F. and Köller, O. (2017) ‘The role of domain-specific ability self-concepts in the value students attach to school’, Learning and Individual Differences 56: 136–142.

  • Shepard, L.A. (1987) ‘The case for bias in tests of achievement and scholastic aptitude’, in S. Modgil and C. Modgil (eds.) Arthur Jensen: Consensus and controversy, London: Falmer Press, pp. 210–226.

  • Spiel, C. and Schober, B. (2018) ‘Challenges for evaluation in higher education: entrance examinations and beyond: the sample case of medical education’, in O. Zlatkin-Troitschanskaia, M. Toepper, H. Pant, C. Lautenbach and C. Kuhn (eds.) Assessment of learning outcomes in higher education. Cross-national comparisons and perspectives, Cham: Springer, pp. 59–71.

  • Stata Corp (2013) Stata statistical software: release 13, College Station, TX: StataCorp LP.

  • Suarez Enciso, S. (2016) ‘The effects of missing data treatment on person ability estimates using IRT models’, Master of Arts thesis, College of Education and Human Sciences, University of Nebraska.

  • Vanclay, F., Baines, J.T. and Taylor, C.N. (2013) ‘Principles for ethical research involving humans: ethical professional practice in impact assessment Part I’, Impact Assessment and Project Appraisal 31(4): 243–253.

  • Verhoeven, B.H., Verwijnen, G.M., Scherpbier, A.J.J.A. and Van der Vleuten, C.P.M. (2002) ‘Growth of medical knowledge’, Medical Education 36: 711–717.

  • Walker, C. (2011) ‘What’s the DIF? Why differential item functioning analyses are an important part of instrument development and validation’, Journal of Psychoeducational Assessment 29(4): 364–376.

  • Walstad, W.B., Rebeck, K. and Butters, R.B. (2013) Test of economic literacy: Examiner’s manual, New York: Council for Economic Education.

  • Walstad, W.B. and Robson, D. (1997) ‘Differential item functioning and male-female differences on multiple-choice tests in economics’, Journal of Economic Education 28(2): 155–171.

  • Walstad, W.B., Schmidt, S., Zlatkin-Troitschanskaia, O. and Happ, R. (2018) ‘Pretest-posttest measurement of the economic knowledge of undergraduates — Estimating guessing effects’, in Annual AEA Conference on Teaching and Research in Economic Education; 5–7 January 2018; Philadelphia, USA.

  • Walstad, W.B. and Wagner, J. (2016) ‘The disaggregation of value-added test scores to assess learning outcomes in economics courses’, Journal of Economic Education 47(2): 121–131.

  • Walstad, W.B., Watts, M. and Rebeck, K. (2007) Test of understanding in college economics: Examiner’s manual, New York: National Council on Economic Education.

  • Wise, S.L. and DeMars, C.E. (2005) ‘Low examinee effort in low-stakes assessment: problems and potential solutions’, Educational Assessment 10(1): 1–17.

  • Wise, S.L. and Kong, X. (2005) ‘Response time effort: a new measure of examinee motivation in computer-based tests’, Applied Measurement in Education 18(2): 163–183.

  • You, Z. and Hu, Y. (2013) ‘Walking a policy tightrope: the dilemma of balancing diversification and equality in Chinese college entrance examination reform’, Higher Education Policy 26(3): 309–324.

  • Zieky, M.J. (2006) ‘Fairness review in assessment’, in S.M. Downing and T.M. Haladyna (eds.) Handbook of test development, New Jersey: Lawrence Erlbaum Associates, pp. 359–376.

  • Zlatkin-Troitschanskaia, O., Förster, M., Brückner, S. and Happ, R. (2014) ‘Insights from a German assessment of business and economics competence’, in H. Coates (ed.) Higher education learning outcomes assessment: International perspectives, Frankfurt/Main: Peter Lang, pp. 175–197.

  • Zlatkin-Troitschanskaia, O., Jitomirski, J., Happ, R., Molerov, D., Schlax, J., Kühling-Thees, C., Pant, H.A., Förster, M. and Brückner, S. (2019) ‘Validating a test for measuring knowledge and understanding of economics among university students’, Zeitschrift für Pädagogische Psychologie (German Journal of Educational Psychology), in press.

  • Zlatkin-Troitschanskaia, O. and Pant, H.A. (2016) ‘Measurement advances and challenges in competency assessment in higher education’, Journal of Educational Measurement 53(3): 253–264.

  • Zlatkin-Troitschanskaia, O., Pant, H.A., Lautenbach, C., Molerov, D., Toepper, M. and Brückner, S. (2017) Modeling and measuring competencies in higher education: Approaches to challenges in higher education policy and practice, Wiesbaden: Springer.

  • Zlatkin-Troitschanskaia, O., Shavelson, R.J. and Pant, H.A. (2018) ‘Assessment of learning outcomes in higher education. International comparisons and perspectives’, in C. Secolsky and D.B. Denison (eds.) Handbook on measurement, assessment, and evaluation in higher education, New York: Routledge, pp. 686–698.

  • Zumbo, B.D. (2007) ‘Three generations of differential item functioning (DIF) analyses: considering where it has been, where it is now, and where it is going’, Language Assessment Quarterly 4(2): 223–233.

Acknowledgements

We would like to thank the anonymous reviewers and the editor for their constructive feedback and helpful guidance during the revision of this paper. The study was funded by the German Federal Ministry of Education and Research under Grant Number 01PK15001A.

Author information

Corresponding author

Correspondence to O. Zlatkin-Troitschanskaia.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zlatkin-Troitschanskaia, O., Schlax, J., Jitomirski, J. et al. Ethics and Fairness in Assessing Learning Outcomes in Higher Education. High Educ Policy 32, 537–556 (2019). https://doi.org/10.1057/s41307-019-00149-x
