Abstract
Authentic assessments evaluate learners using methods and contexts that mimic the way the tested content and skills will be used in the real world. While authenticity has long been a goal of assessors across the education spectrum, educators have struggled with the supposed tradeoff inherent to authentic assessment: reliability versus validity. This tradeoff was particularly concerning in the large-scale assessment that characterizes K-12 education, but it also worried assessors in the professions, who feared that by making their assessments authentic, they made them irreproducible and therefore unreliable. Forty-plus years after the arrival of authenticity on the professional assessment scene, the discussion has changed. Rigorous investigation into assessment techniques, particularly in medical education, has demonstrated that the authenticity tradeoff as originally argued is a fallacy. Medical educators have discovered a variety of ways to imbue authentic assessments with reliability, and vice versa. This chapter traces the historical debate around authenticity and looks closely at three signature assessments in medical education to glean lessons for assessors in other professions in bridging this supposed divide.
Notes
- 1.
As with many educational concepts, authentic assessment is poorly and inconsistently defined in the literature (Frey et al. 2012). It has been, and continues to be, conflated with “performance assessment” and “formative assessment” (Baker and O’Neil 1996). This is an understandable development of usage, as the three movements arose from similar motivations. But it is important to remember that performance can be independent of context, whereas authentic assessment is always crafted with an eye towards the real world context of implementation (Wiggins 1991) and formative assessment more accurately describes the intended use of the assessment rather than what is being assessed.
- 2.
It is important to note that Fig. 4.1 describes a simplified state of relative validity and reliability risk. As we will see later in this chapter, there is no reason that an inauthentic assessment could not be made highly valid, nor is there any reason that a truly authentic assessment could not be made highly reliable. But when comparing two assessments at either end of the continuum, the difference in relative risks of invalidity and unreliability is worth addressing. Additionally, it is important to note that this model lumps various types of validity together, but it is probably more descriptive of content and construct validity than of other types of validity.
- 3.
Indeed, the roots of this movement run as far back as the 1950s, with Lindquist (1951; p. 152) arguing that “it should always be the fundamental goal of the achievement test constructor to make the elements of his test series as nearly equivalent to, or as much like, the elements of the criterion series as consequences of efficiency, comparability, economy, and expediency will permit.” (quote found by this author in Linn et al. 1991).
- 4.
Following Gipps (1995), I use reliability in “relation to consistency as a basis for comparability; issues of consistent and comparable administration, comparability of the task, and comparability of assessment of performance (among raters)… rather than technical test-retest or split-half measures of reliability.” Likewise, rather than parse validity into differing measures of construct, content, and criterion-related validity, I will instead use validity in its most general application of how well the test or measure in question is subjectively viewed to cover the concept it is claiming to measure, so-called face validity. For an exceptional overview of the technical aspects of validity as they relate to authentic/performance assessment, I refer the reader to Moss (1992); additionally, Linn et al. (1991) broaden the consideration of assessment beyond reliability and validity in ways that are illuminating but beyond the aims of this chapter.
- 5.
Note that more recent analyses in the field of medicine, such as those by Wimmers et al. (2007), suggest that content specificity alone does not completely explain differences in clinical performance. There is some X-factor unique to each learner that we must consider as well, and that X-factor is likely some generalizable skill that each learner possesses to a greater or lesser degree.
- 6.
Example downloaded from http://medicine.tufts.edu/~/media/TUSM/MD/PDFs/Education/OEA/Faculty%20Development/Evaluation_Writing%20Exam%20Questions%20for%20Basic%20Sciences.pdf on December 17, 2015.
References
Al Ansari, A., Ali, S. K., & Donnon, T. (2013). The construct and criterion validity of the mini-CEX: a meta-analysis of the published research. Academic Medicine, 88(3), 468–474.
Archbald, D. A., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school. Washington DC: Office of Educational Research and Improvement.
Baker, E. L., & O’Neil Jr., H. F. (1996). Performance assessment and equity. Implementing performance assessment: Promises, problems, and challenges, 183–199.
Baron, M. A., & Boschee, F. (1995). Authentic assessment: The key to unlocking student success. Lancaster, PA: Order Department, Technomic Publishing Company, Inc.
Black, H., Hale, J., Martin, S., & Yates, J. (1989). The quality of assessment. Edinburgh: Scottish Council for Research in Education.
Broadfoot, P. (1996). Education, assessment and society: A sociological analysis. Open University Press.
Burke, J., & Jessup, G. (1990). Assessment in NVQs: Disentangling validity from reliability. Assessment Debates, 188–196.
Case, S. M., & Swanson, D. B. (1998). Constructing written test questions for the basic and clinical sciences (2nd ed.). Philadelphia, PA: National Board of Medical Examiners.
Clarke, L., & Wolf, A. (1991). Blue Badge Guides: Assessment of national knowledge requirements. Final project report to the Department of Employment (unpublished).
Cohen, R., Reznick, R. K., Taylor, B. R., Provan, J., & Rothman, A. (1990). Reliability and validity of the Objective Structured Clinical Examination in assessing surgical residents. The American Journal of Surgery, 160, 302–305.
Cunnington, J. P. W., Neville, A. J., & Norman, G. R. (1997). The risks of thoroughness: Reliability and validity of global ratings and checklists in an OSCE. Advances in Health Sciences Education, 1, 227–233.
Darling-Hammond, L., Ancess, J., & Falk, B. (1995). Authentic assessment in action: Studies of schools and students at work. Teachers College Press.
Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment of teaching in context. Teaching and teacher education, 16(5), 523–545.
Dong, T., Swygert, K. A., Durning, S. J., Saguil, A., Gilliland, W. R., Cruess, D., et al. (2014). Validity evidence for medical school OSCEs: Associations with USMLE® step assessments. Teaching and Learning in Medicine, 26(4), 379–386.
Elstein, A. S., Shulman, L. S., & Sprafka, S. A. (1978). Medical problem solving: An analysis of clinical reasoning. Harvard University Press.
Epstein, R. M. (2007). Assessment in medical education. New England Journal of Medicine, 356(4), 387–396.
Frey, B. B., Schmitt, V. L., & Allen, J. P. (2012). Defining authentic classroom assessment. Practical Assessment, Research & Evaluation, 17(2), 2.
Gibbons, M., Limoges, C., Nowotny, H., Schwartzman, S., Scott, P., & Trow, M. (1994). The new production of knowledge: The dynamics of science and research in contemporary societies. Sage.
Gipps, C. (1995). Reliability, validity, and manageability in large-scale performance assessment. Evaluating authentic assessment, 105–123.
Gibbs, G. (1999). Using assessment strategically to change the way students learn. Assessment Matters in Higher Education, 41–53.
Gipps, C., McCallum, B., McAlister, S., & Brown, M. (1991). National assessment at seven: some emerging themes. In C. Gipps (Ed.), British Educational Research Association Annual Conference.
Glew, R. H., Ripkey, D. R., & Swanson, D. B. (1997). Relationship between students’ performances on the NBME Comprehensive Basic Science Examination and the USMLE Step 1: A longitudinal investigation at one school. Academic Medicine, 72(12), 1097–1102.
Greeno, J. G. (1989). A perspective on thinking. American Psychologist, 44(2), 134.
Gulikers, J. T., Bastiaens, T. J., Kirschner, P. A., & Kester, L. (2008). Authenticity is in the eye of the beholder: Student and teacher perceptions of assessment authenticity. Journal of Vocational Education and Training, 60(4), 401–412.
Harden, R. M. (1988). What is an OSCE? Medical Teacher, 10(1), 19–22.
Harden, R. M., & Gleeson, F. A. (1979). Assessment of clinical competence using an objective structured clinical examination (OSCE). Medical Education, 12, 41–54.
Hodkinson, P. (1991). NCVQ and the 16‐19 curriculum. British Journal of Education and Work, 4(3), 25–38.
Jozefowicz, R. F., Koeppen, B. M., Case, S., Galbraith, R., Swanson, D., & Glew, R. H. (2002). The quality of in-house medical school examinations. Academic Medicine, 77(2), 156–161.
Khan, K. Z., Gaunt, K., Ramachandran, S., & Pushkar, P. (2013). The Objective Structured Clinical Examination (OSCE): AMEE Guide No. 81. Part II: Organisation & Administration. Medical Teacher, 35(9), e1447–e1463.
Khan, K., & Ramachandran, S. (2012). Conceptual framework for performance assessment: competency, competence and performance in the context of assessments in healthcare–deciphering the terminology. Medical teacher, 34(11), 920–928.
Kibble, J. D., Johnson, T. R., Khalil, M. K., Peppler, R. D., & Davey, D. D. (2014). Use of the NBME Comprehensive Basic Science Exam as a progress test in the preclerkship curriculum of a new medical school. Advances in Physiology Education, 38, 315–320.
Kroboth, F. J., Hanusa, B. H., Parker, S., Coulehan, J. L., Kapoor, W. N., Brown, F. H., et al. (1992). The inter-rater reliability and internal consistency of a clinical evaluation exercise. Journal of General Internal Medicine, 7(2), 174–179.
Lee, M., & Wimmers, P. F. (2011). Clinical competence understood through the construct validity of three clerkship assessments. Medical Education, 45(8), 849–857.
Levine, H. G., McGuire, C. H., & Nattress Jr, L. W. (1970). The validity of multiple choice achievement tests as measures of competence in medicine. American Educational Research Journal, 69–82.
Lindquist, E. F. (1951). Preliminary considerations in objective test construction. Educational Measurement, 119–158.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.
Maclellan, E. (2004). Authenticity in assessment tasks: A heuristic exploration of academics’ perceptions. Higher Education Research & Development, 23(1), 19–33.
Marzano, R. J., Pickering, D. J., & McTighe, J. (1993). Assessing student outcomes: Performance assessment using the dimensions of learning model. Aurora, CO: McREL Institute.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed.). Washington DC: Oryx Press.
Miller, G. E. (1990). The assessment of clinical skills/competence/performance. Academic Medicine, 65(9), S63–S67.
Moss, P. A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62(3), 229–258.
Myles, T., & Galvez-Myles, R. (2003). USMLE Step 1 and 2 scores correlate with family medicine clinical and examination scores. Family Medicine, 35(7), 510–513.
Newmann, F. M., & Archbald, D. A. (1992). The nature of authentic academic achievement. Toward a new science of educational testing and assessment, 71–83.
Norman, G. R., Smith, E. K. M., Powles, A. C. P., Rooney, P. J., Henry, N. L., & Dodd, P. E. (1987). Factors underlying performance on written tests of knowledge. Medical Education, 21(4), 297–304.
Norman, G. R., Muzzin, L. J., Williams, R. G., & Swanson, D. B. (1985). Simulation in health sciences education. Journal of Instructional Development, 8(1), 11–17.
Norcini, J. J., Blank, L. L., Duffy, F. D., & Fortna, G. S. (2003). The mini-CEX: A method for assessing clinical skills. Annals of Internal Medicine, 138(6), 476–481.
Norcini, J. J., Blank, L. L., Arnold, G. K., & Kimball, H. R. (1995). The mini-CEX (clinical evaluation exercise): A preliminary investigation. Annals of Internal Medicine, 123(10), 795–799.
Norcini, J. J. (2005). Current perspectives in assessment: the assessment of performance at work. Medical Education, 39(9), 880–889.
Norcini, J. J., & McKinley, D. W. (2007). Assessment methods in medical education. Teaching and teacher education, 23(3), 239–250.
Pell, G., Fuller, R., Homer, M., & Roberts, T. (2010). How to measure the quality of the OSCE: A review of metrics-AMEE guide no. 49. Medical Teacher, 32(10), 802–811.
Prais, S. J. (1991). Vocational qualifications in Britain and Europe: theory and practice. National Institute Economic Review, 136(1), 86–92.
Resnick, L. B., & Resnick, D. P. (1992). Assessing the thinking curriculum: New tools for educational reform. In Changing assessments (pp. 37–75). Netherlands: Springer.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 22–27.
Simon, S. R., Volkan, K., Hamann, C., Duffey, C., & Fletcher, S. W. (2002). The relationship between second-year medical students’ OSCE scores and USMLE Step 1 scores. Medical Teacher, 24(5), 535–539.
Smee, S. (2003). ABC of learning and teaching in medicine: skill based assessment. BMJ: British Medical Journal, 326(7391), 703.
Svinicki, M. D. (2004). Authentic assessment: Testing in reality. New Directions for Teaching and Learning, 2004(100), 23–29.
Terwilliger, J. S. (1998). Rejoinder: response to Wiggins and Newmann. Educational Researcher, 27(6), 22–23.
Van Der Vleuten, C. P., & Schuwirth, L. W. (2005). Assessing professional competence: from methods to programmes. Medical Education, 39(3), 309–317.
Van der Vleuten, C., van Luyk, S., van Ballegooijen, A., & Swanson, D. B. (1989). Training and experience of examiners. Medical Education, 23(3), 290–296.
Vu, N. V., Steward, D. E., & Marcy, M. (1987). An assessment of the consistency and accuracy of standardized patients’ simulations. Academic Medicine, 62(12), 1000–1002.
Wiggins, G. (1991). Teaching to the (authentic) test. Educational Leadership, 46, 41–47.
Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75(3), 200–208.
Wiggins, G. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco, CA: Jossey-Bass.
Winckel, C. P., Reznick, R. K., Cohen, R., & Taylor, B. (1994). Reliability and construct validity of a structured technical skills assessment form. The American Journal of Surgery, 167(4), 423–427.
Wimmers, P. F., Splinter, T. A., Hancock, G. R., & Schmidt, H. G. (2007). Clinical competence: General ability or case-specific? Advances in Health Sciences Education, 12(3), 299–314.
Wolf, A. (1995). Authentic assessments in a competitive sector: Institutional prerequisites and cautionary tales. In H. Torrance (Ed.), Evaluating authentic assessment: Problems and possibilities in new approaches to assessment. Open University Press.
Wolf, A., & Silver, R. (1986). Work based learning: Trainee assessment by supervisors.
© 2016 Springer International Publishing Switzerland
O’Neal, C. (2016). Beyond Authenticity: What Should We Value in Assessment in Professional Education? In: Wimmers, P., & Mentkowski, M. (Eds.), Assessing Competence in Professional Performance across Disciplines and Professions. Innovation and Change in Professional Education, vol. 13. Springer, Cham. https://doi.org/10.1007/978-3-319-30064-1_4
Print ISBN: 978-3-319-30062-7
Online ISBN: 978-3-319-30064-1