Beyond Authenticity: What Should We Value in Assessment in Professional Education?

  • Chapter
Assessing Competence in Professional Performance across Disciplines and Professions

Part of the book series: Innovation and Change in Professional Education (ICPE, volume 13)

Abstract

Authentic assessments evaluate learners using methods and contexts that mimic the way the tested content and skills will be used in the real world. While authenticity has long been a goal of assessors across the education spectrum, educators have struggled with the supposed tradeoff inherent to authentic assessment: reliability versus validity. This tradeoff was particularly concerning in the large-scale assessment that characterized K-12 education, but it worried assessors in the professions as well, who feared that by making their assessments authentic, they made them irreproducible and therefore unreliable. More than forty years after the arrival of authenticity on the professional assessment scene, the discussion has changed. Rigorous investigation into assessment techniques, in medical education in particular, has demonstrated that the authenticity tradeoff as it was originally argued is a fallacious one. Medical educators have discovered a variety of ways to imbue authentic assessments with reliability, and vice versa. This chapter reviews the historical debate around authenticity and looks closely at three signature assessments in medical education to glean lessons that can help assessors in other professions bridge this supposed divide.

Notes

  1.

    As with many educational concepts, authentic assessment is poorly and inconsistently defined in the literature (Frey et al. 2012). It has been, and continues to be, conflated with “performance assessment” and “formative assessment” (Baker and O’Neil 1996). This is an understandable development of usage, as the three movements arose from similar motivations. But it is important to remember that performance can be independent of context, whereas authentic assessment is always crafted with an eye towards the real-world context of implementation (Wiggins 1991), and formative assessment more accurately describes the intended use of an assessment rather than what is being assessed.

  2.

    It is important to note that Fig. 4.1 describes a simplified state of relative validity and reliability risk. As we will see later in this chapter, there is no reason that an inauthentic assessment could not be made highly valid, nor is there any reason that a truly authentic assessment could not be made highly reliable. But when comparing two assessments at either end of the continuum, the difference in the relative risks of invalidity and unreliability is worth addressing. Additionally, it is important to note that this model lumps various types of validity together; it is probably most descriptive of content and construct validity.

  3.

    Indeed, the roots of this movement run as far back as the 1950s, with Lindquist (1951, p. 152) arguing that “it should always be the fundamental goal of the achievement test constructor to make the elements of his test series as nearly equivalent to, or as much like, the elements of the criterion series as consequences of efficiency, comparability, economy, and expediency will permit” (quote found by this author in Linn et al. 1991).

  4.

    Following Gipps (1995), I use reliability in “relation to consistency as a basis for comparability; issues of consistent and comparable administration, comparability of the task, and comparability of assessment of performance (among raters)… rather than technical test-retest or split-half measures of reliability.” Likewise, rather than parse validity into differing measures of construct, content, and criterion-related validity, I will instead use validity in its most general application: how well the test or measure in question is subjectively viewed to cover the concept it claims to measure, so-called face validity. For an exceptional overview of the technical aspects of validity as they relate to authentic/performance assessment, I direct the reader to Moss (1992); additionally, Linn et al. (1991) broaden the consideration of assessment beyond reliability and validity in ways that are illuminating but beyond the aims of this chapter. (A brief illustrative sketch of these “technical” reliability measures follows these notes.)

  5.

    Note that more recent analyses in the field of medicine, such as those done by Wimmers et al. (2007), suggest that content specificity alone does not completely explain differences in performance in the clinic. There is some X-factor particular to each learner that we must consider as well, and that X-factor is likely to be some generalizable skill that each learner possesses to a greater or lesser degree.

  6.

    Example downloaded from http://medicine.tufts.edu/~/media/TUSM/MD/PDFs/Education/OEA/Faculty%20Development/Evaluation_Writing%20Exam%20Questions%20for%20Basic%20Sciences.pdf on December 17, 2015.
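
    To make the distinction drawn in note 4 concrete, the sketch below computes a split-half coefficient (standing in for the "technical" test-retest/split-half family that note sets aside) alongside a chance-corrected measure of inter-rater agreement of the kind Gipps's consistency-as-comparability framing emphasizes. This is a minimal illustrative sketch, not material from the chapter: the Python code, the invented OSCE ratings and item scores, and the choice of Cohen's kappa as the agreement index are all this editor's assumptions.

    from collections import Counter
    from statistics import correlation  # requires Python 3.10+

    def cohen_kappa(rater_a, rater_b):
        """Chance-corrected agreement between two raters on the same items."""
        n = len(rater_a)
        observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
        # Agreement expected by chance if each rater assigned categories
        # independently, in proportion to their actual category frequencies.
        freq_a, freq_b = Counter(rater_a), Counter(rater_b)
        expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
        return (observed - expected) / (1 - expected)

    def split_half_reliability(item_scores):
        """Spearman-Brown corrected correlation of odd- vs. even-item half scores."""
        odd = [sum(row[0::2]) for row in item_scores]
        even = [sum(row[1::2]) for row in item_scores]
        r = correlation(odd, even)  # Pearson's r between the two half scores
        return 2 * r / (1 + r)

    # Hypothetical judgments by two raters on ten OSCE stations (invented data).
    rater_a = ["pass", "pass", "fail", "pass", "borderline",
               "pass", "fail", "pass", "pass", "borderline"]
    rater_b = ["pass", "pass", "fail", "borderline", "borderline",
               "pass", "fail", "pass", "fail", "borderline"]

    # Hypothetical right/wrong scores for five examinees on an eight-item test.
    item_scores = [
        [1, 1, 0, 1, 1, 0, 1, 1],
        [0, 1, 0, 0, 1, 0, 0, 1],
        [1, 1, 1, 1, 1, 1, 0, 1],
        [0, 0, 0, 1, 0, 0, 1, 0],
        [1, 0, 1, 1, 0, 1, 1, 1],
    ]

    print(f"Inter-rater agreement (Cohen's kappa): {cohen_kappa(rater_a, rater_b):.2f}")
    print(f"Split-half reliability (Spearman-Brown): {split_half_reliability(item_scores):.2f}")

    The point of the sketch is only that rater consistency and internal test consistency are different quantities, estimated from different data; a single "reliability" number does not capture both.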

References

  • Al Ansari, A., Ali, S. K., & Donnon, T. (2013). The construct and criterion validity of the mini-CEX: A meta-analysis of the published research. Academic Medicine, 88(3), 468–474.

  • Archbald, D. A., & Newmann, F. M. (1988). Beyond standardized testing: Assessing authentic academic achievement in the secondary school. Washington, DC: Office of Educational Research and Improvement.

  • Baker, E. L., & O’Neil, H. F., Jr. (1996). Performance assessment and equity. In Implementing performance assessment: Promises, problems, and challenges (pp. 183–199).

  • Baron, M. A., & Boschee, F. (1995). Authentic assessment: The key to unlocking student success. Lancaster, PA: Technomic Publishing Company.

  • Black, H., Hale, J., Martin, S., & Yates, J. (1989). The quality of assessment. Edinburgh: Scottish Council for Research in Education.

  • Broadfoot, P. (1996). Education, assessment and society: A sociological analysis. Open University Press.

  • Burke, J., & Jessup, G. (1990). Assessment in NVQs: Disentangling validity from reliability. Assessment Debates, 188–196.

  • Case, S. M., & Swanson, D. B. (1998). Constructing written test questions for the basic and clinical sciences (2nd ed.). Philadelphia, PA: National Board of Medical Examiners.

  • Clarke, L., & Wolf, A. (1991). Blue Badge Guides: Assessment of national knowledge requirements. Final project report to the Department of Employment (unpublished).

  • Cohen, R., Reznick, R. K., Taylor, B. R., Provan, J., & Rothman, A. (1990). Reliability and validity of the Objective Structured Clinical Examination in assessing surgical residents. The American Journal of Surgery, 160, 302–305.

  • Cunnington, J. P. W., Neville, A. J., & Norman, G. R. (1997). The risks of thoroughness: Reliability and validity of global ratings and checklists in an OSCE. Advances in Health Sciences Education, 1, 227–233.

  • Darling-Hammond, L., Ancess, J., & Falk, B. (1995). Authentic assessment in action: Studies of schools and students at work. Teachers College Press.

  • Darling-Hammond, L., & Snyder, J. (2000). Authentic assessment of teaching in context. Teaching and Teacher Education, 16(5), 523–545.

  • Dong, T., Swygert, K. A., Durning, S. J., Saguil, A., Gilliland, W. R., Cruess, D., et al. (2014). Validity evidence for medical school OSCEs: Associations with USMLE® Step assessments. Teaching and Learning in Medicine, 26(4), 379–386.

  • Elstein, A. S., Shulman, L. S., & Sprafka, S. A. (1978). Medical problem solving: An analysis of clinical reasoning. Harvard University Press.

  • Epstein, R. M. (2007). Assessment in medical education. New England Journal of Medicine, 356(4), 387–396.

  • Frey, B. B., Schmitt, V. L., & Allen, J. P. (2012). Defining authentic classroom assessment. Practical Assessment, Research & Evaluation, 17(2), 2.

  • Gibbons, M., Limoges, C., Nowotny, H., Schwartzman, S., Scott, P., & Trow, M. (1994). The new production of knowledge: The dynamics of science and research in contemporary societies. Sage.

  • Gibbs, G. (1999). Using assessment strategically to change the way students learn. Assessment Matters in Higher Education, 41–53.

  • Gipps, C. (1995). Reliability, validity, and manageability in large-scale performance assessment. Evaluating authentic assessment, 105–123.

  • Gipps, C., McCallum, B., McAlister, S., & Brown, M. (1991). National assessment at seven: Some emerging themes. In C. Gipps (Ed.), British Educational Research Association Annual Conference.

  • Glew, R. H., Ripkey, D. R., & Swanson, D. B. (1997). Relationship between students’ performances on the NBME Comprehensive Basic Science Examination and the USMLE Step 1: A longitudinal investigation at one school. Academic Medicine, 72(12), 1097–1102.

  • Greeno, J. G. (1989). A perspective on thinking. American Psychologist, 44(2), 134.

  • Gulikers, J. T., Bastiaens, T. J., Kirschner, P. A., & Kester, L. (2008). Authenticity is in the eye of the beholder: Student and teacher perceptions of assessment authenticity. Journal of Vocational Education and Training, 60(4), 401–412.

  • Harden, R. M. (1988). What is an OSCE? Medical Teacher, 10(1), 19–22.

  • Harden, R. M., & Gleeson, F. A. (1979). Assessment of clinical competence using an objective structured clinical examination (OSCE). Medical Education, 13(1), 41–54.

  • Hodkinson, P. (1991). NCVQ and the 16–19 curriculum. British Journal of Education and Work, 4(3), 25–38.

  • Jozefowicz, R. F., Koeppen, B. M., Case, S., Galbraith, R., Swanson, D., & Glew, R. H. (2002). The quality of in-house medical school examinations. Academic Medicine, 77(2), 156–161.

  • Khan, K. Z., Gaunt, K., Ramachandran, S., & Pushkar, P. (2013). The Objective Structured Clinical Examination (OSCE): AMEE Guide No. 81. Part II: Organisation and administration. Medical Teacher, 35(9), e1447–e1463.

  • Khan, K., & Ramachandran, S. (2012). Conceptual framework for performance assessment: Competency, competence and performance in the context of assessments in healthcare – deciphering the terminology. Medical Teacher, 34(11), 920–928.

  • Kibble, J. D., Johnson, T. R., Khalil, M. K., Peppler, R. D., & Davey, D. D. (2014). Use of the NBME Comprehensive Basic Science Exam as a progress test in the preclerkship curriculum of a new medical school. Advances in Physiology Education, 38, 315–320.

  • Kroboth, F. J., Hanusa, B. H., Parker, S., Coulehan, J. L., Kapoor, W. N., Brown, F. H., et al. (1992). The inter-rater reliability and internal consistency of a clinical evaluation exercise. Journal of General Internal Medicine, 7(2), 174–179.

  • Lee, M., & Wimmers, P. F. (2011). Clinical competence understood through the construct validity of three clerkship assessments. Medical Education, 45(8), 849–857.

  • Levine, H. G., McGuire, C. H., & Nattress, L. W., Jr. (1970). The validity of multiple choice achievement tests as measures of competence in medicine. American Educational Research Journal, 69–82.

  • Lindquist, E. F. (1951). Preliminary considerations in objective test construction. Educational Measurement, 119–158.

  • Linn, R. L., Baker, E. L., & Dunbar, S. B. (1991). Complex, performance-based assessment: Expectations and validation criteria. Educational Researcher, 20(8), 15–21.

  • Maclellan, E. (2004). Authenticity in assessment tasks: A heuristic exploration of academics’ perceptions. Higher Education Research & Development, 23(1), 19–33.

  • Marzano, R. J., Pickering, D. J., & McTighe, J. (1993). Assessing student outcomes: Performance assessment using the dimensions of learning model. Aurora, CO: McREL Institute.

  • Messick, S. (1989). Validity. In R. Linn (Ed.), Educational measurement (3rd ed.). Washington, DC: Oryx Press.

  • Miller, G. E. (1990). The assessment of clinical skills/competence/performance. Academic Medicine, 65(9), S63–S67.

  • Moss, P. A. (1992). Shifting conceptions of validity in educational measurement: Implications for performance assessment. Review of Educational Research, 62(3), 229–258.

  • Myles, T., & Galvez-Myles, R. (2003). USMLE Step 1 and 2 scores correlate with family medicine clinical and examination scores. Family Medicine, 35(7), 510–513.

  • Newmann, F. M., & Archbald, D. A. (1992). The nature of authentic academic achievement. Toward a new science of educational testing and assessment, 71–83.

  • Norcini, J. J. (2005). Current perspectives in assessment: The assessment of performance at work. Medical Education, 39(9), 880–889.

  • Norcini, J. J., Blank, L. L., Arnold, G. K., & Kimball, H. R. (1995). The mini-CEX (clinical evaluation exercise): A preliminary investigation. Annals of Internal Medicine, 123(10), 795–799.

  • Norcini, J. J., Blank, L. L., Duffy, F. D., & Fortna, G. S. (2003). The mini-CEX: A method for assessing clinical skills. Annals of Internal Medicine, 138(6), 476–481.

  • Norcini, J. J., & McKinley, D. W. (2007). Assessment methods in medical education. Teaching and Teacher Education, 23(3), 239–250.

  • Norman, G. R., Muzzin, L. J., Williams, R. G., & Swanson, D. B. (1985). Simulation in health sciences education. Journal of Instructional Development, 8(1), 11–17.

  • Norman, G. R., Smith, E. K. M., Powles, A. C. P., Rooney, P. J., Henry, N. L., & Dodd, P. E. (1987). Factors underlying performance on written tests of knowledge. Medical Education, 21(4), 297–304.

  • Pell, G., Fuller, R., Homer, M., & Roberts, T. (2010). How to measure the quality of the OSCE: A review of metrics. AMEE Guide No. 49. Medical Teacher, 32(10), 802–811.

  • Prais, S. J. (1991). Vocational qualifications in Britain and Europe: Theory and practice. National Institute Economic Review, 136(1), 86–92.

  • Resnick, L. B., & Resnick, D. P. (1992). Assessing the thinking curriculum: New tools for educational reform. In Changing assessments (pp. 37–75). Netherlands: Springer.

  • Shavelson, R. J., Baxter, G. P., & Pine, J. (1992). Performance assessments: Political rhetoric and measurement reality. Educational Researcher, 22–27.

  • Simon, S. R., Volkan, K., Hamann, C., Duffey, C., & Fletcher, S. W. (2002). The relationship between second-year medical students’ OSCE scores and USMLE Step 1 scores. Medical Teacher, 24(5), 535–539.

  • Smee, S. (2003). ABC of learning and teaching in medicine: Skill based assessment. BMJ: British Medical Journal, 326(7391), 703.

  • Svinicki, M. D. (2004). Authentic assessment: Testing in reality. New Directions for Teaching and Learning, 2004(100), 23–29.

  • Terwilliger, J. S. (1998). Rejoinder: Response to Wiggins and Newmann. Educational Researcher, 27(6), 22–23.

  • Van der Vleuten, C. P., & Schuwirth, L. W. (2005). Assessing professional competence: From methods to programmes. Medical Education, 39(3), 309–317.

  • Van der Vleuten, C., Van Luyk, S., Van Ballegooijen, A., & Swanson, D. B. (1989). Training and experience of examiners. Medical Education, 23(3), 290–296.

  • Vu, N. V., Steward, D. E., & Marcy, M. (1987). An assessment of the consistency and accuracy of standardized patients’ simulations. Academic Medicine, 62(12), 1000–1002.

  • Wiggins, G. (1991). Teaching to the (authentic) test. Educational Leadership, 46, 41–47.

  • Wiggins, G. (1993). Assessment: Authenticity, context, and validity. Phi Delta Kappan, 75(3), 200–208.

  • Wiggins, G. (1998). Educative assessment: Designing assessments to inform and improve student performance. San Francisco, CA: Jossey-Bass.

  • Wimmers, P. F., Splinter, T. A., Hancock, G. R., & Schmidt, H. G. (2007). Clinical competence: General ability or case-specific? Advances in Health Sciences Education, 12(3), 299–314.

  • Winckel, C. P., Reznick, R. K., Cohen, R., & Taylor, B. (1994). Reliability and construct validity of a structured technical skills assessment form. The American Journal of Surgery, 167(4), 423–427.

  • Wolf, A. (1995). Authentic assessments in a competitive sector: Institutional prerequisites and cautionary tales. In H. Torrance (Ed.), Evaluating authentic assessment: Problems and possibilities in new approaches to assessment. Open University Press.

  • Wolf, A., & Silver, R. (1986). Work based learning: Trainee assessment by supervisors.

Author information

Correspondence to Christopher O’Neal.

Copyright information

© 2016 Springer International Publishing Switzerland

About this chapter

Cite this chapter

O’Neal, C. (2016). Beyond Authenticity: What Should We Value in Assessment in Professional Education? In: Wimmers, P., Mentkowski, M. (eds) Assessing Competence in Professional Performance across Disciplines and Professions. Innovation and Change in Professional Education, vol 13. Springer, Cham. https://doi.org/10.1007/978-3-319-30064-1_4

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-30062-7

  • Online ISBN: 978-3-319-30064-1
