Abstract
Authentic assessments evaluate learners using methods and contexts that mimic the way the tested content and skills will be used in the real world. While authenticity has long been a goal of assessors across the education spectrum, educators have struggled with the supposed tradeoff inherent to authentic assessment: reliability versus validity. This tradeoff was particularly concerning in the large-scale assessment that characterizes K-12 education, but it also worried assessors in the professions, who feared that by making their assessments authentic, they made them irreproducible and therefore unreliable. Forty-plus years after the arrival of authenticity on the professional assessment scene, the discussion has changed. Rigorous investigation into assessment techniques, particularly in medical education, has demonstrated that the authenticity tradeoff as originally argued is a fallacy. Medical educators have discovered a variety of ways to imbue authentic assessments with reliability, and vice versa. This chapter traces the historical debate around authenticity and looks closely at three signature assessments in medical education to glean lessons for assessors in other professions in bridging this supposed divide.
Notes
- 1.
As with many educational concepts, authentic assessment is poorly and inconsistently defined in the literature (Frey et al. 2012). It has been, and continues to be, conflated with “performance assessment” and “formative assessment” (Baker and O’Neil 1996). This is an understandable development of usage, as the three movements arose from similar motivations. But it is important to remember that performance can be independent of context, whereas authentic assessment is always crafted with an eye towards the real world context of implementation (Wiggins 1991) and formative assessment more accurately describes the intended use of the assessment rather than what is being assessed.
- 2.
It is important to note that Fig. 4.1 describes a simplified state of relative validity and reliability risk. As we will see later in this chapter, there is no reason that an inauthentic assessment could not be made highly valid, nor is there any reason that a truly authentic assessment could not be made highly reliable. But when comparing two assessments at either end of the continuum, the difference in relative risks of invalidity and unreliability is worth addressing. Additionally, it is important to note that this model lumps various types of validity together, but it is probably more descriptive of content and construct validity than of other types of validity.
- 3.
Indeed, the roots of this movement run as far back as the 1950s, with Lindquist (1951; p. 152) arguing that “it should always be the fundamental goal of the achievement test constructor to make the elements of his test series as nearly equivalent to, or as much like, the elements of the criterion series as consequences of efficiency, comparability, economy, and expediency will permit.” (quote found by this author in Linn et al. 1991).
- 4.
Following Gipps (1995), I use reliability in “relation to consistency as a basis for comparability; issues of consistent and comparable administration, comparability of the task, and comparability of assessment of performance (among raters)… rather than technical test-retest or split-half measures of reliability.” Likewise, rather than parse validity into differing measures of construct, content, and criterion-related validity, I will instead use validity in its most general application of how well the test or measure in question is subjectively viewed to cover the concept it is claiming to measure, so-called face validity. For an exceptional overview of the technical aspects of validity as they relate to authentic/performance assessment, I refer the reader to Moss (1992); additionally, Linn et al. (1991) broaden the consideration of assessment beyond reliability and validity in ways that are illuminating but beyond the aims of this chapter.
- 5.
Note that more recent analyses in the field of medicine, such as those by Wimmers et al. (2007), suggest that content specificity alone does not completely explain differences in clinical performance. There is some X-factor unique to each learner that we must consider as well, and that X-factor is likely some generalizable skill that each learner possesses to a greater or lesser degree.
- 6.
Example downloaded from http://medicine.tufts.edu/~/media/TUSM/MD/PDFs/Education/OEA/Faculty%20Development/Evaluation_Writing%20Exam%20Questions%20for%20Basic%20Sciences.pdf on December 17, 2015.
References
Al Ansari, A., Ali, S. K., & Donnon, T. (2013). The construct and criterion validity of the mini-CEX: a meta-analysis of the published research. Academic Medicine, 88(3), 468–474.
Archbald, D. A., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school. Washington DC: Office of Educational Research and Improvement.
Baker, E. L., & O’Neil Jr., H. F. (1996). Performance assessment and equity. Implementing performance assessment: Promises, problems, and challenges, 183–199.
Baron, M. A., & Boschee, F. (1995). Authentic assessment: The key to unlocking student success. Lancaster, PA: Order Department, Technomic Publishing Company, Inc.
Black, H., Hale, J., Martin, S., & Yates, J. (1989). The quality of assessment. Edinburgh: Scottish Council for Research in Education.
Broadfoot, P. (1996). Education, assessment and society: A sociological analysis. Open University Press.
Burke, J., & Jessup, G. (1990). Assessment in NVQs: Disentangling validity from reliability. Assessment Debates, 188–196.
Case, S. M., & Swanson, D. B. (1998). Constructing written test questions for the basic and clinical sciences (2nd ed.). Philadelphia, PA: National Board of Medical Examiners.
Clarke, L., & Wolf, A. (1991). Blue Badge Guides: Assessment of national knowledge requirements. Final project report to the Department of Employment (unpublished).
Cohen, R., Reznick, R. K., Taylor, B. R., Provan, J., & Rothman, A. (1990). Reliability and validity of the Objective Structured Clinical Examination in assessing surgical residents. The American Journal of Surgery, 160, 302–305.
Cunnington, J. P. W., Neville, A. J., & Norman, G. R. (1997). The risks of thoroughness: Reliability and validity of global ratings and checklists in an OSCE. Advances in Health Sciences Education, 1, 227–233.
Darling-Hammond, L., Ancess, J., & Falk, B. (1995). Authentic assessment in action: Studies of schools and students at work. Teachers College Press.
Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment of teaching in context. Teaching and teacher education, 16(5), 523–545.
Dong, T., Swygert, K. A., Durning, S. J., Saguil, A., Gilliland, W. R., Cruess, D., et al. (2014). Validity evidence for medical school OSCEs: Associations with USMLE® step assessments. Teaching and Learning in Medicine, 26(4), 379–386.
Elstein, A. S., Shulman, L. S., & Sprafka, S. A. (1978). Medical problem solving: An analysis of clinical reasoning. Harvard University Press.
Epstein, R. M. (2007). Assessment in medical education. New England Journal of Medicine, 356(4), 387–396.
Frey, B. B., Schmitt, V. L., & Allen, J. P. (2012). Defining authentic classroom assessment. Practical Assessment, Research & Evaluation, 17(2), 2.
Gibbons, M., Limoges, C., Nowotny, H., Schwartzman, S., Scott, P., & Trow, M. (1994). The new production of knowledge: The dynamics of science and research in contemporary societies. Sage.
Gipps, C. (1995). Reliability, validity, and manageability in large-scale performance assessment. Evaluating authentic assessment, 105–123.
Gibbs, G. (1999). Using assessment strategically to change the way students learn. Assessment Matters in Higher Education, 41–53.
Gipps, C., McCallum, B., McAlister, S., & Brown, M. (1991). National assessment at seven: some emerging themes. In C. Gipps (Ed.), British Educational Research Association Annual Conference.
Glew, R. H., Ripkey, D. R., & Swanson, D. B. (1997). Relationship between students’ performances on the NBME Comprehensive Basic Science Examination and the USMLE Step 1: A longitudinal investigation at one school. Academic Medicine, 72(12), 1097–1102.
Greeno, J. G. (1989). A perspective on thinking. American Psychologist, 44(2), 134.
Gulikers, J. T., Bastiaens, T. J., Kirschner, P. A., & Kester, L. (2008). Authenticity is in the eye of the beholder: Student and teacher perceptions of assessment authenticity. Journal of Vocational Education and Training, 60(4), 401–412.
Harden, R. M. (1988). What is an OSCE? Medical Teacher, 10(1), 19–22.
Harden, R. M., & Gleeson, F. A. (1979). Assessment of clinical competence using an objective structured clinical examination (OSCE). Medical Education, 12, 41–54.
Hodkinson, P. (1991). NCVQ and the 16‐19 curriculum. British Journal of Education and Work, 4(3), 25–38.
Jozefowicz, R. F., Koeppen, B. M., Case, S., Galbraith, R., Swanson, D., & Glew, R. H. (2002). The quality of in-house medical school examinations. Academic Medicine, 77(2), 156–161.
Khan, K. Z., Gaunt, K., Ramachandran, S., & Pushkar, P. (2013). The Objective Structured Clinical Examination (OSCE): AMEE Guide No. 81. Part II: Organisation & Administration. Medical Teacher, 35(9), e1447–e1463.
Khan, K., & Ramachandran, S. (2012). Conceptual framework for performance assessment: competency, competence and performance in the context of assessments in healthcare–deciphering the terminology. Medical teacher, 34(11), 920–928.
Kibble, J. D., Johnson, T. R., Khalil, M. K., Peppler, R. D., & Davey, D. D. (2014). Use of the NBME Comprehensive Basic Science Exam as a progress test in the preclerkship curriculum of a new medical school. Advances in Physiology Education, 38, 315–320.
Kroboth, F. J., Hanusa, B. H., Parker, S., Coulehan, J. L., Kapoor, W. N., Brown, F. H., et al. (1992). The inter-rater reliability and internal consistency of a clinical evaluation exercise. Journal of General Internal Medicine, 7(2), 174–179.
Lee, M., & Wimmers, P. F. (2011). Clinical competence understood through the construct validity of three clerkship assessments. Medical Education, 45(8), 849–857.
Levine, H. G., McGuire, C. H., & Nattress Jr, L. W. (1970). The validity of multiple choice achievement tests as measures of competence in medicine. American Educational Research Journal, 69–82.
Lindquist, E. F. (1951). Preliminary considerations in objective test construction. Educational Measurement, 119–158.
Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.
Maclellan, E. (2004). Authenticity in assessment tasks: A heuristic exploration of academics’ perceptions. Higher Education Research & Development, 23(1), 19–33.
Marzano, R. J., Pickering, D. J., & McTighe, J. (1993). Assessing student outcomes: Performance assessment using the dimensions of learning model. Aurora, CO: McREL Institute.
Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed.). Washington DC: Oryx Press.
Miller, G. E. (1990). The assessment of clinical skills/competence/performance. Academic Medicine, 65(9), S63–S67.
Moss, P. A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62(3), 229–258.
Myles, T., & Galvez-Myles, R. (2003). USMLE Step 1 and 2 scores correlate with family medicine clinical and examination scores. Family Medicine, 35(7), 510–513.
Newmann, F. M., & Archbald, D. A. (1992). The nature of authentic academic achievement. Toward a new science of educational testing and assessment, 71–83.
Norman, G. R., Smith, E. K. M., Powles, A. C. P., Rooney, P. J., Henry, N. L., & Dodd, P. E. (1987). Factors underlying performance on written tests of knowledge. Medical Education, 21(4), 297–304.
Norman, G. R., Muzzin, L. J., Williams, R. G., & Swanson, D. B. (1985). Simulation in health sciences education. Journal of Instructional Development, 8(1), 11–17.
Norcini, J. J., Blank, L. L., Duffy, F. D., & Fortna, G. S. (2003). The mini-CEX: A method for assessing clinical skills. Annals of Internal Medicine, 138(6), 476–481.
Norcini, J. J., Blank, L. L., Arnold, G. K., & Kimball, H. R. (1995). The mini-CEX (clinical evaluation exercise): A preliminary investigation. Annals of Internal Medicine, 123(10), 795–799.
Norcini, J. J. (2005). Current perspectives in assessment: the assessment of performance at work. Medical Education, 39(9), 880–889.
Norcini, J. J., & McKinley, D. W. (2007). Assessment methods in medical education. Teaching and teacher education, 23(3), 239–250.
Pell, G., Fuller, R., Homer, M., & Roberts, T. (2010). How to measure the quality of the OSCE: A review of metrics-AMEE guide no. 49. Medical Teacher, 32(10), 802–811.
Prais, S. J. (1991). Vocational qualifications in Britain and Europe: theory and practice. National Institute Economic Review, 136(1), 86–92.
Resnick, L. B., & Resnick, D. P. (1992). Assessing the thinking curriculum: New tools for educational reform. In Changing assessments (pp. 37–75). Netherlands: Springer.
Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 22–27.
Simon, S. R., Volkan, K., Hamann, C., Duffey, C., & Fletcher, S. W. (2002). The relationship between second-year medical students’ OSCE scores and USMLE Step 1 scores. Medical Teacher, 24(5), 535–539.
Smee, S. (2003). ABC of learning and teaching in medicine: skill based assessment. BMJ: British Medical Journal, 326(7391), 703.
Svinicki, M. D. (2004). Authentic assessment: Testing in reality. New Directions for Teaching and Learning, 2004(100), 23–29.
Terwilliger, J. S. (1998). Rejoinder: response to Wiggins and Newmann. Educational Researcher, 27(6), 22–23.
Van Der Vleuten, C. P., & Schuwirth, L. W. (2005). Assessing professional competence: from methods to programmes. Medical Education, 39(3), 309–317.
Van der Vleuten, C., van Luyk, S., van Ballegooijen, A., & Swanson, D. B. (1989). Training and experience of examiners. Medical Education, 23(3), 290–296.
Vu, N. V., Steward, D. E., & Marcy, M. (1987). An assessment of the consistency and accuracy of standardized patients’ simulations. Academic Medicine, 62(12), 1000–1002.
Wiggins, G. (1991). Teaching to the (authentic) test. Educational Leadership, 46, 41–47.
Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75(3), 200–208.
Wiggins, G. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco, CA: Jossey-Bass.
Winckel, C. P., Reznick, R. K., Cohen, R., & Taylor, B. (1994). Reliability and construct validity of a structured technical skills assessment form. The American Journal of Surgery, 167(4), 423–427.
Wimmers, P. F., Splinter, T. A., Hancock, G. R., & Schmidt, H. G. (2007). Clinical competence: General ability or case-specific? Advances in Health Sciences Education, 12(3), 299–314.
Wolf, A. (1995). Authentic assessments in a competitive sector: Institutional prerequisites and cautionary tales. In H. Torrance (Ed.), Evaluating authentic assessment: Problems and possibilities in new approaches to assessment. Open University Press.
Wolf, A., & Silver, R. (1986). Work based learning: Trainee assessment by supervisors.
© 2016 Springer International Publishing Switzerland
O’Neal, C. (2016). Beyond Authenticity: What Should We Value in Assessment in Professional Education? In: Wimmers, P., & Mentkowski, M. (Eds.), Assessing Competence in Professional Performance across Disciplines and Professions. Innovation and Change in Professional Education, vol. 13. Springer, Cham. https://doi.org/10.1007/978-3-319-30064-1_4
Print ISBN: 978-3-319-30062-7
Online ISBN: 978-3-319-30064-1