
Ethics and Fairness in Assessing Learning Outcomes in Higher Education

  • Original Article
  • Published in Higher Education Policy

Fairness is one of the most important quality criteria for assessments and a necessary condition for valid test score interpretations. In this paper, we describe findings from an assessment of N = 7664 beginning business and economics students at 46 universities across Germany using a domain-specific higher education entry test. From the perspective of test fairness as defined in the internationally established validation standards of AERA et al. (Standards for Educational and Psychological Testing, AERA, Washington, DC, 2014), we identify which students had particular difficulty completing the test, taking gender- and language-related factors into account when evaluating their test performance. Our results highlight particular challenges in admissions testing in higher education, one of which is finding a suitable way to address the disadvantages experienced by various groups of students so as to guarantee fairness and ethical integrity when developing and administering tests. Recent migration trends and the overall internationalization of higher education have led to increasingly heterogeneous student bodies. These challenges are therefore of particular relevance for education policy and practice in ensuring fairness in the assessment of learning outcomes for all student groups, including in the context of commonly used standardized assessments and established examination practices in higher education.
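
Gender- and language-related fairness analyses of this kind are typically operationalized through differential item functioning (DIF) checks, as surveyed in the cited works by Walker (2011) and Zumbo (2007). The following is a minimal illustrative sketch, not the instrument or code used in this study: a classical Mantel-Haenszel DIF statistic on the ETS delta scale, assuming dichotomously scored items and a total-score matching criterion. The function and variable names (mantel_haenszel_dif, item, group, total) are hypothetical.

```python
# Illustrative only: a standard Mantel-Haenszel DIF check, not the authors' code.
import numpy as np

def mantel_haenszel_dif(item, group, total):
    """Return the ETS delta-scale MH D-DIF statistic for one item.

    item  -- 0/1 scores on the studied item
    group -- 0 = reference group, 1 = focal group (e.g., split by gender or language)
    total -- matching criterion, typically the total test score
    """
    num, den = 0.0, 0.0
    for k in np.unique(total):                       # one 2x2 table per score stratum
        s = total == k
        a = np.sum(s & (group == 0) & (item == 1))   # reference group, correct
        b = np.sum(s & (group == 0) & (item == 0))   # reference group, incorrect
        c = np.sum(s & (group == 1) & (item == 1))   # focal group, correct
        d = np.sum(s & (group == 1) & (item == 0))   # focal group, incorrect
        n = a + b + c + d
        if n > 0:
            num += a * d / n
            den += b * c / n
    if num == 0 or den == 0:
        raise ValueError("degenerate strata: MH odds ratio is undefined")
    alpha_mh = num / den                             # common odds ratio across strata
    return -2.35 * np.log(alpha_mh)                  # ETS delta metric; |D| >= 1.5 suggests large DIF

# Usage with simulated data (no DIF built in, so D should be close to 0):
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 5000)
total = rng.integers(0, 11, 5000)
item = (rng.random(5000) < total / 10).astype(int)
print(round(mantel_haenszel_dif(item, group, total), 2))
```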


References

  • Abedi, J. (2006) ‘Language issues in item development’, in S.M. Downing and T.M. Haladyna (eds.) Handbook of test development, New Jersey: Lawrence Erlbaum Associates, pp. 377–398.

  • American Educational Research Association (AERA) (2011) ‘Code of ethics’, Educational Researcher 40(3): 145–156.

  • American Educational Research Association (AERA), American Psychological Association (APA), National Council on Measurement in Education (NCME) and Joint Committee on Standards for Educational and Psychological Testing (2014) Standards for educational and psychological testing, Washington, DC: AERA.

  • Avenia-Tapper, B. and Llosa, L. (2015) ‘Construct relevant or irrelevant? The role of linguistic complexity in the assessment of English language learners’ science knowledge’, Educational Assessment 20(2): 95–111.

  • Baker, F.B. and Kim, S.-H. (2004) Item response theory: parameter estimation techniques, New York: Dekker.

  • Boe, E.E., May, H. and Boruch, R.F. (2002) Student task persistence in the third international mathematics and science study: a major source of achievement differences at the national, classroom, and student levels, University of Pennsylvania: Center for Research and Evaluation in Social Policy.

  • Brückner, S. and Zlatkin-Troitschanskaia, O. (2018) ‘Threshold concepts for modeling and assessing higher education students’ understanding and learning in economics’, in O. Zlatkin-Troitschanskaia, M. Toepper, H.A. Pant, C. Lautenbach and C. Kuhn (eds.) Assessment of learning outcomes in higher education. Methodology of educational measurement and assessment, Cham: Springer, pp. 103–121.

  • Brückner, S., Förster, M., Zlatkin-Troitschanskaia, O. and Walstad, W.B. (2015a) ‘Effects of prior economic education, native language, and gender on economic knowledge of first-year students in higher education. A comparative study between Germany and the USA’, Studies in Higher Education 40(3): 437–453.

  • Brückner, S., Förster, M., Zlatkin-Troitschanskaia, O., Happ, R., Walstad, W.B., Yamaoka, M. and Asano, T. (2015b) ‘Gender effects in assessment of economic knowledge and understanding: differences among undergraduate business and economics students in Germany, Japan, and the United States’, Peabody Journal of Education 90(4): 503–518.

  • Byrnes, J.P., Miller, D.C. and Schafer, W.D. (1999) ‘Gender differences in risk taking: a meta-analysis’, Psychological Bulletin 125(3): 367–383.

  • Camilli, G. (2006) ‘Test fairness’, in R.L. Brennan (ed.) Educational measurement, Westport, CT: American Council on Education, pp. 220–256.

  • Childs, R.A. (1990) Gender bias and fairness, Washington, DC: ERIC Clearinghouse on Tests, Measurement, and Evaluation. http://www.ericdigests.org/pre-9218/gender.htm. Accessed 1 June 2018.

  • Cohen, J. (1988) Statistical power analysis for the behavioral sciences, Hillsdale, NJ: Erlbaum.

  • Cole, N.S. and Zieky, M.J. (2001) ‘The new faces of fairness’, Journal of Educational Measurement 38(4): 369–382.

  • Council for Economic Education (CEE) (2010) Voluntary national content standards in economics. https://www.councilforeconed.org/wp-content/uploads/2012/03/voluntary-national-content-standards-2010.pdf. Accessed 1 June 2018.

  • Crocker, L. (2003) ‘Teaching for the test: Validity, fairness, and moral action’, Educational Measurement: Issues and Practice 22(3): 5–11.

  • Crooks, T.J., Kane, M.T. and Cohen, A.S. (1996) ‘Threats to the valid use of assessments’, Assessment in Education: Principles, Policy & Practice 3(3): 265–286.

  • DFG (2013) Safeguarding good scientific practice: recommendations of the commission on professional self-regulation in science, Weinheim: Wiley.

  • Edwards, D., Coates, H. and Friedman, T. (2013) ‘Using aptitude testing to diversify higher education intake — An Australian case study’, Journal of Higher Education Policy and Management 35(2): 136–152.

  • Eklöf, H. (2010) ‘Skill and will: test-taking motivation and assessment quality’, Assessment in Education: Principles, Policy & Practice 17(4): 345–356.

  • Ercikan, K. and Pellegrino, J.W. (2017) Validation of score meaning for the next generation of assessments: The use of response processes, New York: Routledge.

  • European Group on Ethics in Science and New Technologies (2015) Statement on the formulation of a code of conduct for research integrity for projects funded by the European Commission. http://ec.europa.eu/research/ege/pdf/research_integrity_ege_statement.pdf#view=fit&pagemode=none. Accessed 1 June 2018.

  • Federal Office of Statistics [Statistisches Bundesamt (Destatis)] (2017) Education and Culture: Students at Universities. Preliminary Report Winter Term 2016/17 (subject series 11, series 4.1), Wiesbaden: Destatis.

  • Federal Office of Statistics [Statistisches Bundesamt (Destatis)] (2018) Education and Culture: Students at Universities (subject series 11, series 4.1), Wiesbaden: Destatis.

  • Finn, B. (2015) Measuring motivation in low-stakes assessments (ETS Research Report RR-15-19), Princeton, NJ: Educational Testing Service.

  • Förster, M., Zlatkin-Troitschanskaia, O., Brückner, S., Happ, R., Hambleton, R.K., Walstad, W.B. et al. (2015) ‘Validating test score interpretations by cross-national comparison: comparing the results of students from Japan and Germany on an American test of economic knowledge in higher education’, Zeitschrift für Psychologie (German Journal of Psychology) 223(1): 14–23.

  • German Council of Science and Humanities (2012) Arbeitsbericht: Prüfungsnoten an Hochschulen im Prüfungsjahr 2010 [Work report: Examination grades at universities in the examination year 2010]. https://www.wissenschaftsrat.de/download/archiv/2627-12.pdf. Accessed 30 May 2019.

  • Hambleton, R.K. and Zenisky, A.L. (2010) ‘Translating and adapting tests for cross-cultural assessments’, in D. Matsumoto and F. van de Vijver (eds.) Cross-cultural research methods in psychology, Cambridge: Cambridge University Press, pp. 46–70.

  • Happ, R., Zlatkin-Troitschanskaia, O. and Schmidt, S. (2016) ‘An analysis of economic learning among undergraduates in introductory economics courses in Germany’, Journal of Economic Education 47(4): 300–310.

  • Happ, R., Zlatkin-Troitschanskaia, O. and Förster, M. (2018) ‘How prior economic education influences beginning university students’ knowledge of economics’, Empirical Research in Vocational Education and Training 10(5): 1–20.

  • Harkness, J. (2003) ‘Questionnaire translation’, in J. Harkness, F. van de Vijver and P. Mohler (eds.) Cross-cultural survey methods, Hoboken, NJ: Wiley, pp. 35–56.

  • Hubley, A.M. and Zumbo, B.D. (2011) ‘Validity and the consequences of test interpretation and use’, Social Indicators Research 103(2): 219–230.

  • Hunter, J.E., Schmidt, F.L. and Rauschenberger, J.M. (1977) ‘Fairness of psychological tests: implications of four definitions for selection utility and minority hiring’, Journal of Applied Psychology 62(3): 245–260.

  • IBM Corp. (2017) IBM SPSS Statistics for Windows, version 25.0, Armonk, NY: IBM Corp.

  • International Test Commission (ITC) (2005) International Test Commission guidelines for translating and adapting tests. http://www.intestcom.org/files/guideline_test_adaptation.pdf. Accessed 1 June 2018.

  • Kane, M.T. (2013) ‘Validating the interpretations and uses of test scores’, Journal of Educational Measurement 50(1): 1–73.

  • Kim, H. and Lalancette, D. (2013) Literature review on the value-added measurement in higher education. http://www.oecd.org/education/skills-beyond-school/Litterature%20Review%20VAM.pdf. Accessed 1 June 2018.

  • Kong, X.J., Wise, S.L., Harmes, J.C. and Yang, S. (2006) ‘Motivational effects of praise in response-time based feedback: A follow-up study of the effort-monitoring CBT’, in Annual Meeting of the National Council on Measurement in Education; 8–10 April 2006; San Francisco, USA.

  • Kunnan, A.J. (2010) ‘Test fairness and Toulmin’s argument structure’, Language Testing 27(2): 183–189.

  • Linn, R.L. (2008) Validation of uses and interpretations of state assessments, Washington, DC: Council of Chief State School Officers.

  • Mercer, J.R. (1978) ‘Test validity, bias and fairness: an analysis from the perspective of the sociology of knowledge’, Interchange 9(1): 1–16.

  • Messick, S. (2000) ‘Consequences of test interpretation and use: the fusion of validity and values in psychological assessment’, in R.D. Goffin and E. Helmes (eds.) Problems and solutions in human assessment: Honoring Douglas N. Jackson at seventy, Boston: Kluwer Academic Publishers, pp. 3–20.

  • Michelsen, S., Sweetman, R., Stensaker, B. and Bleiklie, I. (2016) ‘Shaping perceptions of a policy instrument: the political–administrative formation of learning outcomes in higher education in Norway and England’, Higher Education Policy 29(3): 399–417.

  • Moosbrugger, H. and Höfling, V. (2010) ‘Standards für psychologisches Testen’ [Standards for psychological testing], in H. Moosbrugger and A. Kelava (eds.) Test- und Fragebogenkonstruktion [Test and questionnaire construction], Berlin: Springer, pp. 204–222.

  • Musekamp, F. and Pearce, J. (2016) ‘Student motivation in low-stakes assessment contexts: an exploratory analysis in engineering mechanics’, Assessment & Evaluation in Higher Education 41(5): 750–769.

  • Mutz, R., Bornmann, L. and Daniel, H.-D. (2015) ‘Testing for the fairness and predictive validity of research funding decisions: a multilevel multiple imputation for missing data approach using ex-ante and ex-post peer evaluation data from the Austrian Science Fund’, Journal of the Association for Information Science and Technology 66(11): 2321–2339.

  • OECD (2017) Education at a glance 2017: OECD indicators, Paris: OECD Publishing.

  • Orley, G.J. (2017) ‘Multiple imputation of the guessing parameter in the case of missing data’, Master of Arts thesis, College of Education and Human Sciences, University of Nebraska.

  • Owen, A.L. (2012) ‘Student characteristics, behavior, and performance in economics classes’, in G.M. Hoyt and K. McGoldrick (eds.) International handbook on teaching and learning economics, Northampton, MA: Edward Elgar, pp. 341–350.

  • Pellegrino, J.W. (2010) The design of an assessment system for the race to the top: a learning sciences perspective on issues of growth and measurement, Princeton: Educational Testing Service.

  • Powell, M. and Ansic, D. (1997) ‘Gender differences in risk behavior in financial decision-making: an experimental analysis’, Journal of Economic Psychology 18(6): 605–628.

  • Sawyer, R.L., Cole, N.S. and Cole, J.W.L. (1976) ‘Utilities and the issue of fairness in a decision theoretic model for selection’, Journal of Educational Measurement 13(1): 59–76.

  • Schipolowski, S., Wilhelm, O. and Schroeders, U. (2017) Berliner Test zur Erfassung fluider und kristalliner Intelligenz ab der 11. Jahrgangsstufe (BEFKI 11+) [Berlin test of fluid and crystallized intelligence for grades 11 and above], Göttingen: Hogrefe.

  • Schütte, K., Zimmermann, F. and Köller, O. (2017) ‘The role of domain-specific ability self-concepts in the value students attach to school’, Learning and Individual Differences 56: 136–142.

  • Shepard, L.A. (1987) ‘The case for bias in tests of achievement and scholastic aptitude’, in S. Modgil and C. Modgil (eds.) Arthur Jensen: Consensus and controversy, London: Falmer Press, pp. 210–226.

  • Spiel, C. and Schober, B. (2018) ‘Challenges for evaluation in higher education: entrance examinations and beyond: the sample case of medical education’, in O. Zlatkin-Troitschanskaia, M. Toepper, H. Pant, C. Lautenbach and C. Kuhn (eds.) Assessment of learning outcomes in higher education. Cross-national comparisons and perspectives, Cham: Springer, pp. 59–71.

  • Stata Corp (2013) Stata statistical software: release 13, College Station, TX: StataCorp LP.

  • Suarez Enciso, S. (2016) ‘The effects of missing data treatment on person ability estimates using IRT models’, Master of Arts thesis, College of Education and Human Sciences, University of Nebraska.

  • Vanclay, F., Baines, J.T. and Taylor, C.N. (2013) ‘Principles for ethical research involving humans: ethical professional practice in impact assessment Part I’, Impact Assessment and Project Appraisal 31(4): 243–253.

  • Verhoeven, B.H., Verwijnen, G.M., Scherpbier, A.J.J.A. and Van der Vleuten, C.P.M. (2002) ‘Growth of medical knowledge’, Medical Education 36: 711–717.

  • Walker, C. (2011) ‘What’s the DIF? Why differential item functioning analyses are an important part of instrument development and validation’, Journal of Psychoeducational Assessment 29(4): 364–376.

  • Walstad, W.B., Rebeck, K. and Butters, R.B. (2013) Test of economic literacy: Examiner’s manual, New York: Council for Economic Education.

  • Walstad, W.B. and Robson, D. (1997) ‘Differential item functioning and male-female differences on multiple-choice tests in economics’, Journal of Economic Education 28(2): 155–171.

  • Walstad, W.B., Schmidt, S., Zlatkin-Troitschanskaia, O. and Happ, R. (2018) ‘Pretest-posttest measurement of the economic knowledge of undergraduates — Estimating guessing effects’, in Annual AEA Conference on Teaching and Research in Economic Education; 5–7 January 2018; Philadelphia, USA.

  • Walstad, W.B. and Wagner, J. (2016) ‘The disaggregation of value-added test scores to assess learning outcomes in economics courses’, Journal of Economic Education 47(2): 121–131.

  • Walstad, W.B., Watts, M. and Rebeck, K. (2007) Test of understanding in college economics: Examiner’s manual, New York: National Council on Economic Education.

  • Wise, S.L. and DeMars, C.E. (2005) ‘Low examinee effort in low-stakes assessment: problems and potential solutions’, Educational Assessment 10(1): 1–17.

  • Wise, S.L. and Kong, X. (2005) ‘Response time effort: a new measure of examinee motivation in computer-based tests’, Applied Measurement in Education 18(2): 163–183.

  • You, Z. and Hu, Y. (2013) ‘Walking a policy tightrope: the dilemma of balancing diversification and equality in Chinese college entrance examination reform’, Higher Education Policy 26(3): 309–324.

  • Zieky, M.J. (2006) ‘Fairness review in assessment’, in S.M. Downing and T.M. Haladyna (eds.) Handbook of test development, New Jersey: Lawrence Erlbaum Associates, pp. 359–376.

  • Zlatkin-Troitschanskaia, O., Förster, M., Brückner, S. and Happ, R. (2014) ‘Insights from a German assessment of business and economics competence’, in H. Coates (ed.) Higher education learning outcomes assessment: International perspectives, Frankfurt/Main: Peter Lang, pp. 175–197.

  • Zlatkin-Troitschanskaia, O., Jitomirski, J., Happ, R., Molerov, D., Schlax, J., Kühling-Thees, C., Pant, H.A., Förster, M. and Brückner, S. (2019) ‘Validating a test for measuring knowledge and understanding of economics among university students’, Zeitschrift für Pädagogische Psychologie (German Journal of Educational Psychology), in press.

  • Zlatkin-Troitschanskaia, O. and Pant, H.A. (2016) ‘Measurement advances and challenges in competency assessment in higher education’, Journal of Educational Measurement 53(3): 253–264.

  • Zlatkin-Troitschanskaia, O., Pant, H.A., Lautenbach, C., Molerov, D., Toepper, M. and Brückner, S. (2017) Modeling and measuring competencies in higher education: Approaches to challenges in higher education policy and practice, Wiesbaden: Springer.

  • Zlatkin-Troitschanskaia, O., Shavelson, R.J. and Pant, H.A. (2018) ‘Assessment of learning outcomes in higher education. International comparisons and perspectives’, in C. Secolsky and D.B. Denison (eds.) Handbook on measurement, assessment, and evaluation in higher education, New York: Routledge, pp. 686–698.

  • Zumbo, B.D. (2007) ‘Three generations of differential item functioning (DIF) analyses: considering where it has been, where it is now, and where it is going’, Language Assessment Quarterly 4(2): 223–233.

Acknowledgements

We would like to thank the anonymous reviewers and the editor for their constructive feedback and helpful guidance during the revision of this paper. The study was funded by the German Federal Ministry of Education and Research under Grant Number 01PK15001A.

Author information

Corresponding author

Correspondence to O. Zlatkin-Troitschanskaia.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zlatkin-Troitschanskaia, O., Schlax, J., Jitomirski, J. et al. Ethics and Fairness in Assessing Learning Outcomes in Higher Education. High Educ Policy 32, 537–556 (2019). https://doi.org/10.1057/s41307-019-00149-x
