Advances in Health Sciences Education, Volume 14, Issue 2, pp 219–232

The reliability of workplace-based assessment in postgraduate medical education and training: a national evaluation in general practice in the United Kingdom

  • Douglas J. Murphy
  • David A. Bruce
  • Stewart W. Mercer
  • Kevin W. Eva
Original Paper


To investigate the reliability and feasibility of six potential workplace-based assessment methods in general practice training: criterion audit, multi-source feedback from clinical and non-clinical colleagues, patient feedback (the CARE Measure), referral letters, significant event analysis, and video analysis of consultations. The performance of GP registrars (trainees) was evaluated with each tool to assess the tools' reliability and their feasibility, given the raters and number of assessments needed; participants' experience of the process was determined by questionnaire. 171 GP registrars and their trainers, drawn from nine deaneries (representing all four countries in the UK), participated. The ability of each tool to differentiate between doctors (reliability) was assessed using generalisability theory. Decision studies were then conducted to determine the number of observations required to achieve an acceptably high reliability for "high-stakes assessment" with each instrument. Finally, descriptive statistics were used to summarise participants' ratings of their experience of using these tools. Multi-source feedback from colleagues and patient feedback on consultations emerged as the two methods most likely to offer a reliable and feasible opinion of workplace performance. Reliability coefficients of 0.8 were attainable with 41 CARE Measure patient questionnaires, and with six clinical and/or five non-clinical colleagues per doctor when assessed on two occasions. For the other four methods tested, 10 or more assessors were required per doctor to achieve a reliable assessment, making their feasibility for high-stakes assessment extremely low. Participant feedback did not raise any major concerns regarding the acceptability, feasibility, or educational impact of the tools.
The combination of patient and colleague views of doctors’ performance, coupled with reliable competence measures, may offer a suitable evidence-base on which to monitor progress and completion of doctors’ training in general practice.
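The decision studies described above project how reliability grows as more observations per doctor are averaged, which determines how many questionnaires or assessors are feasible. A minimal sketch of that projection in Python, using the standard Spearman-Brown style formula; the single-observation coefficient used below is an illustrative assumption, not a variance component reported by the study:

```python
import math

def projected_reliability(g_single: float, n: int) -> float:
    """D-study projection: reliability of the mean of n independent
    observations, given the generalisability coefficient of a single
    observation (Spearman-Brown prophecy formula)."""
    return n * g_single / (1 + (n - 1) * g_single)

def observations_needed(g_single: float, target: float = 0.8) -> int:
    """Smallest number of observations whose projected reliability
    reaches the target (0.8 is a common high-stakes threshold)."""
    return math.ceil(target * (1 - g_single) / (g_single * (1 - target)))

# Illustrative input: a single-questionnaire coefficient of 0.09 would
# require 41 observations to reach 0.8, matching the scale reported
# for the CARE Measure (the 0.09 itself is a hypothetical value).
n = observations_needed(0.09, target=0.8)
print(n, round(projected_reliability(0.09, n), 3))  # → 41 0.802
```

The same arithmetic explains the feasibility verdict: when a single observation discriminates between doctors only weakly, the required number of assessors grows quickly, which is why tools needing 10 or more assessors per doctor were judged impractical for high-stakes use.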


Keywords: Workplace-based assessment · Multi-source feedback · Patient satisfaction · Medical education · Physician assessment



The completion of the pilot was made possible thanks to the help and enthusiasm of 171 GP registrars and staff from the Wales, Northern Ireland, Mersey, KSS, East Scotland, North and North East Scotland, South East Scotland and West Midlands Deaneries.

The authors would like to thank Mrs. Angela Inglis (Team Leader and Personal Assistant to Dr. David Bruce, GP Director in the East of Scotland Deanery) and her team (Lee-Ann Troup, Linda Kirkcaldy, Susan Smith, Carol Ironside and Gill Ward) for their help, support, and contribution to the work contained in this paper.

© CARE SW Mercer, Scottish Executive 2004: The CARE Measure was originally developed by Dr. Stewart Mercer and colleagues as part of a Health Services Research Fellowship funded by the Chief Scientist Office of the Scottish Executive (2000–2003). The intellectual property rights of the measure belong to the Scottish Ministers. The measure is available for use free of charge for staff of the NHS and for research purposes, but cannot be used for commercial purposes. Anyone wishing to use the measure should contact and register with Stewart Mercer (email:

© MSF Tool—NHS Education for Scotland 2005–2006: This two-question Multi-Source Feedback (MSF) tool was developed by Drs. Douglas Murphy, David Bruce, and Kevin Eva on behalf of NHS Education Scotland (2005–2006). The measure is available for use free of charge for staff of the NHS and for research purposes, but cannot be used for commercial purposes. Anyone wishing to use the measure should contact and register with Douglas Murphy or David Bruce.

Ethical approval: A formal research proposal was submitted and ethical approval was granted for all of the work contained in this paper by the NHS Ethics Committee (Glasgow West).

Conflict of interest and source of funding statement

NHS Education Scotland and The Royal College of General Practitioners (RCGP) funded this study. DM was, and DB is, employed by NHS Education Scotland. DM and SWM are supported by a Primary Care Research Career Award from the Chief Scientist Office, Scottish Executive Health Department. The RCGP had no role in study design, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data and had final responsibility for the decision to submit for publication. Contributors: D. Murphy and K. Eva designed the studies. Data collection was done by D. Murphy and D. Bruce. Data were analysed by D. Murphy and K. Eva. Data were interpreted by D. Murphy, D. Bruce, S. Mercer and K. Eva. The manuscript was written by D. Murphy, D. Bruce, S. Mercer and K. Eva. All authors were involved in the decision to submit the manuscript for publication.

Supplementary material

10459_2008_9104_MOESM1_ESM.doc (DOC, 396 kb)



Copyright information

© Springer Science+Business Media B.V. 2008

Authors and Affiliations

  • Douglas J. Murphy (1)
  • David A. Bruce (2)
  • Stewart W. Mercer (3)
  • Kevin W. Eva (4)
  1. Community Health Sciences Division, University of Dundee, Dundee, UK
  2. NHS Education for Scotland, East of Scotland Deanery, Dundee, Scotland, UK
  3. Section of General Practice and Primary Care, Division of Community-Based Sciences, University of Glasgow, Glasgow, Scotland, UK
  4. Department of Clinical Epidemiology and Biostatistics, Programme for Educational Research and Development, McMaster University, Hamilton, Canada
