Advances in Health Sciences Education, Volume 12, Issue 2, pp 239–260

Broadening Perspectives on Clinical Performance Assessment: Rethinking the Nature of In-training Assessment

  • Marjan J. B. Govaerts
  • Cees P. M. van der Vleuten
  • Lambert W. T. Schuwirth
  • Arno M. M. Muijtjens



In-training assessment (ITA), defined as multiple assessments of performance in the setting of day-to-day practice, is an invaluable tool in assessment programmes that aim to assess professional competence in a comprehensive and valid way. Research on clinical performance ratings, however, consistently reveals weaknesses in accuracy, reliability and validity. Attempts to improve the psychometric characteristics of ITA by focusing on standardisation and objectivity of measurement have thus far resulted in only limited improvement of ITA practices.


The aim of this paper is to demonstrate that the psychometric framework may limit more meaningful educational approaches to performance assessment, because it does not take into account key issues in the mechanics of the assessment process. Based on insights from other disciplines, we propose an approach to ITA that takes a constructivist, social-psychological perspective and integrates elements of theories of cognition, motivation and decision making. A central assumption in the proposed framework is that performance assessment is a judgment and decision-making process, in which rating outcomes are influenced by interactions between individuals and the social context in which assessment occurs.


The issues raised in this article and the proposed assessment framework have several implications for current performance assessment practice. It is argued that focusing on the context of performance assessment may be more effective in improving ITA practices than focusing strictly on raters and rating instruments. Furthermore, the constructivist approach towards assessment has important implications for assessment procedures as well as for the evaluation of assessment quality. Finally, it is argued that further research into performance assessment should contribute to a better understanding of the factors that influence rating outcomes, such as rater motivation, assessment procedures and other contextual variables.


Keywords: clinical competence, clinical education, clinical ratings, competence assessment, educational measurement, in-training assessment, performance assessment, rating process





The authors would like to thank Mereke Gorsira for critically reading and correcting the English manuscript.



Copyright information

© Springer Science+Business Media, Inc. 2006

Authors and Affiliations

  • Marjan J. B. Govaerts (1), corresponding author
  • Cees P. M. van der Vleuten (1)
  • Lambert W. T. Schuwirth (1)
  • Arno M. M. Muijtjens (1)

  1. Department of Educational Development and Research, Faculty of Medicine, Maastricht University, Maastricht, The Netherlands
