Background

There is a plethora of educational programmes and implementation strategies aimed at improving the quality of care delivered by health care professionals. A number of these are delivered via information technology systems, with the use of video as an educational medium well established [1–3]. A new educational tool, made possible by multimedia advances in the last decade, is the audio-visual demonstration of signs and symptoms in patients, referred to as Patient Video Cases (PVCs) [4]. They are easily displayed via online platforms, are widely used, and have been endorsed by the National Patient Safety Agency [5] as an example of good practice. However, there has been little academic study of their effectiveness. Given the financial pressures affecting all health care agencies, it is important to know whether these resource-intensive e-learning strategies give demonstrable benefit to patients or health care professionals.

Theoretical constructs exist to evaluate interventions designed to improve clinical performance, but no single approach is followed, owing to the wide range of individual and organisational factors that affect the outcomes before, during and after the intervention [6]. Kirkpatrick’s training evaluation is defined by four distinct levels of outcome, to be approached in a stepwise fashion [7]. The four key domains of the Kirkpatrick model are learner satisfaction, learner knowledge, learner behaviour change and organisational change. Although others have argued that contextual factors not classified under these domains may be significant [6], the Kirkpatrick model remains a valid methodology, and systematic reviews have used it to examine training effectiveness [8]. A healthcare-relevant modification of the Kirkpatrick model has been used in a study of inter-professional education in health and social care [9]. When using the Kirkpatrick model, or any other relevant framework, to assess an educational or training intervention, the outcome measures and the methodology by which they are obtained must be valid. Internal and construct validity are classifications with direct relevance to outcome measures and are components of the methodological quality criteria used by the Campbell Collaboration [10, 11]:

  i. Internal Validity is the extent to which the observed change can reliably be ascribed to the intervention.

  ii. Construct Validity relates to the association between the concept being investigated and the measures used to test it, i.e. do the data collected accurately reflect the chosen outcome measure?

Other forms of validity exist but are not directly relevant to the quality of the outcome measures chosen; for example, good external validity would imply that using PVCs could be beneficial in different populations, but would give no information on whether the initial outcome measure was fit for purpose.

The aim of this work is to answer the question “What is the validity and quality of outcome measures that have been used to evaluate interventions based on PVCs?” This literature review will be used to identify which outcome measures are most valid in assessing the clinical effectiveness of an intervention based on PVCs. It will also help identify areas where more methodological research is needed to enable future studies to demonstrate high internal and construct validity.

Methods

This review was performed in three stages: stage one collated the relevant literature, stage two appraised the quality of each individual study, and stage three summarised the overall validity of the studies.

Stage one

Stage one identified literature relevant to the use of PVCs in health care settings. The definition of a health care setting used was ‘any location or environment where students or graduates are practising or learning medicine’. The definition of a PVC was ‘any pre-recorded or live video footage of a patient used for the purposes of demonstrating a sign or symptom’; it did not include footage recorded for the purposes of educating other patients or families. Inclusion criteria were:

  i. Humans;

  ii. The study described the use of PVCs in a training, educational (undergraduate or postgraduate) or implementation capacity or environment.

As PVCs relate to the demonstration of signs and symptoms in patients, studies using video to demonstrate verbal communication, non-lexical utterances or solely history taking between a patient and doctor or patient and patient were excluded, as were non-English language papers which could not be translated. The full literature search was developed in conjunction with a senior NHS Librarian and is available on request. The following general search terms were used: (Video* OR Video record* OR video clip OR digital* record* OR analogue recording OR patient video clip) AND (Educat* OR Train* OR learn* OR teach* OR inservice training). The following databases were searched: Medline, British Nursing Index (BNI), EMBASE, Health Management Information Consortium (HMIC), CINAHL, NIHR Health Technology Assessment Programme (HTA), Database of Abstracts of Reviews of Effects (DARE), Scopus, The Cochrane Library and the Education Resources Information Centre (ERIC). Internet search engines and NHS Evidence were used to identify publications or articles related to the search terms. The search strategy was not limited to any particular research methodology. The last search was performed on 27th July 2012 by the principal author. In all phases of the study, any uncertainty as to the classification or indexing of information was discussed with the collaborating authors.
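Purely as an illustration, the two concept blocks above can be combined into a single Boolean query. The sketch below reconstructs the query from the terms listed; the exact syntax required varies between databases, so this is a hypothetical composition rather than the strategy as executed.

```python
# Hypothetical reconstruction of the combined Boolean query from the
# terms listed above; the exact syntax differs between databases.
video_terms = [
    "Video*", "Video record*", "video clip", "digital* record*",
    "analogue recording", "patient video clip",
]
education_terms = [
    "Educat*", "Train*", "learn*", "teach*", "inservice training",
]

def or_block(terms):
    """Join a list of search terms into a parenthesised OR block."""
    return "(" + " OR ".join(terms) + ")"

# The two concept blocks are combined with AND, as in the review.
query = or_block(video_terms) + " AND " + or_block(education_terms)
print(query)
```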

Articles with a relevant abstract (any detail relating to the recording and utilisation of video clips of patients) underwent a complete paper review, as did any abstracts about which there was uncertainty over inclusion. Information on aim, health care user, educational purpose, modified Kirkpatrick training level domain, type of study, outcome measure and conclusions was extracted from each paper, as shown in Table 1. The educational purpose was subdivided into three categories (see Table 2).

Table 1 Studies by Healthcare professional grouping

Stage two

To enable an objective review of the articles and determine the aspects of validity under study, the following domains were used; each represents a feature that can reduce the internal validity of a study. They were adapted from the list described by Farrington [12], which was chosen because it is based on Cook and Campbell’s original work on methodological quality. Although other methods of analysis are available, this is a widely used and accepted process which allows an objective assessment to be applied (a sketch of how such an appraisal might be recorded follows the list below).

  1. Selection: Does the outcome measure allow for control between groups?

  2. History: Does the outcome measure allow for the effects caused by some event occurring at the same time as the intervention?

  3. Maturation: Does the outcome measure allow for natural progression in learning and knowledge?

  4. Instrumentation: Is the outcome measure reproducible?

  5. Testing: Does the outcome measure itself affect the results?

  6. Differential attrition: Can the outcome measure control for differing numbers of participants in control and experimental groups (if present) or large drop-out rates?

The extraction of information was undertaken by the principal author.
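As a sketch only, the stage two appraisal can be thought of as a per-study checklist across the six domains. The record structure below is hypothetical and its field names are illustrative, not taken from the review protocol.

```python
from dataclasses import dataclass

# Hypothetical per-study record for the stage two appraisal; one boolean
# per internal-validity domain described above (illustrative only).
@dataclass
class InternalValidityAppraisal:
    study_id: str
    selection: bool               # allows control between groups
    history: bool                 # allows for concurrent events
    maturation: bool              # allows for natural progression in learning
    instrumentation: bool         # outcome measure is reproducible
    testing: bool                 # measure does not itself affect results
    differential_attrition: bool  # controls for group sizes / drop-out

    def domains_satisfied(self) -> int:
        """Count how many of the six domains the study satisfies."""
        return sum([self.selection, self.history, self.maturation,
                    self.instrumentation, self.testing,
                    self.differential_attrition])
```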

Stage three

Once this process was complete, a number of more global questions were asked of each paper to determine whether the authors had evaluated the outcome measures they had chosen, and to allow an assessment of the construct validity of the study (a sketch of how the answers might be recorded follows the list below):

  a) How was the choice of outcome measure justified?

  b) Did the choice predetermine the results the study aimed to investigate?

  c) To what extent were the writers aware of the disadvantages as well as the advantages of the outcome measures chosen?

  d) How did they overcome the disadvantages?
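Again as a hypothetical sketch, the stage three questions can be recorded alongside the stage two checklist; the structure and field names below are illustrative assumptions, not part of the original protocol.

```python
from dataclasses import dataclass

# Hypothetical record of the stage three construct-validity questions;
# field names are illustrative and map to questions (a)-(d) above.
@dataclass
class ConstructValidityAppraisal:
    study_id: str
    justification_of_measure: str       # (a) how the choice was justified
    choice_predetermined_results: bool  # (b) did the choice determine results
    limitations_acknowledged: bool      # (c) disadvantages acknowledged
    mitigation_of_limitations: str      # (d) how disadvantages were overcome
```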

Results

Figure 1 shows the flow of articles from the initial search to the final selection. The types of healthcare professionals studied are shown in Table 1, and the number of studies classified by educational purpose and Kirkpatrick level is shown in Table 2. Two studies evaluated both undergraduate and basic postgraduate trainees, giving a total of 21 studies of health care professional groups, and two studies evaluated both learner knowledge and learner behaviour, giving a total of 20 studies at the relevant Kirkpatrick levels.

Figure 1 Literature Search Flow Diagram.

Table 2 Classification of studies

The purpose of this work was to be as inclusive as possible so as to capture all outcome measures used. Although twenty-two articles (twenty-three studies) underwent thorough analysis in stage two, half of these require further clarification as to the reasons for their inclusion. These articles were all reviewed by all three authors and a collaborative decision was reached on their inclusion. Under the inclusion criteria, it had not been the intention to include animal studies in the protocol. However, one study in the field of veterinary medicine [13] used PVCs in precisely the context in which human patient clips would be used, with an accompanying relevant and feasible methodology. It has been included in the final review as it was decided that methodology, rather than context, was being investigated. The search was repeated with the ‘human only’ limitation removed, but no other veterinary articles of relevance were found.

One study examining an intervention to improve the physical examination component of a medical student exam via a web-based video did not specifically use abnormal or normal clinical signs [14]. The study looked at outcomes across a whole year group in a before-and-after cohort design, and has been included because the methodology could easily have been used in a PVC-related intervention. A study using video to demonstrate a specific clinical examination was also included, although it could be argued that the precise aim of the tool was not to demonstrate specific clinical signs but a methodology for eliciting them. The methodology used, a Solomon four-group design [15], was considered relevant to defining robust outcome measures in future PVC studies.
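For readers unfamiliar with the design, the Solomon four-group layout randomises participants to four arms, crossing pre-testing with the intervention so that the effect of pre-testing itself can be isolated. The sketch below shows the standard layout of the design, not data from the study.

```python
# Standard layout of a Solomon four-group design: every group receives
# a post-test; pre-testing and the intervention are crossed across arms.
solomon_groups = {
    1: {"pretest": True,  "intervention": True},
    2: {"pretest": True,  "intervention": False},
    3: {"pretest": False, "intervention": True},
    4: {"pretest": False, "intervention": False},
}

# Comparing groups 1 and 3 (or 2 and 4) isolates any effect of the
# pre-test itself on post-test performance.
for group, arms in solomon_groups.items():
    print(group, arms)
```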

Finally, six studies [16–21], although in different patient groups (ankylosing spondylitis, rheumatoid arthritis, fibromyalgia), used exactly the same methodology as two initial studies into osteoarthritis by the same investigators. These were studies in the validation of an examination methodology in both medical students and consultants. Although the actual data were different, the papers used exactly the same introduction, methods and discussion. In terms of the narrative review, these eight journal articles represent only one methodological approach in two different cohorts of participants. Given the lack of difference in the wording of the arthritis publications, it was felt these should be considered as two studies, one representing undergraduates and the other trained doctors’ continuing professional development. Noting the reasons given above, the total number of articles evaluated was 17 (involving 18 distinct studies).

Table 3 contains the descriptive results for the reviewed articles and Table 4 contains the overall judgement on each of the articles. The analysis of the validity of the outcome measures can be found in the Additional file 1: Appendix.

Table 3 Identification of health care settings in which educational patient video clips have been utilised
Table 4 Review of methodological quality of studies using outcome measures to assess the impact of PVCs

Discussion

This review examined the evidence on how to measure outcomes when Patient Video Cases (PVCs) are used in healthcare settings. The evidence base was small and extremely heterogeneous, and insufficient to specify the best outcome measures to use. The heterogeneity of the articles arose from the diversity of the health care professionals involved, varying educational purposes, different types of intervention, a wide range of outcome methodologies, differing internal and construct validity, and a variety of results. Each of these is examined in turn.

Type of healthcare professional

The preponderance of projects in undergraduate education is likely related to the large number of medical education academics at these institutions, access to a ‘captive group’ of subjects and the greater ease of assessing undergraduate outcomes. Further investigation into the use of PVCs at postgraduate level and among other healthcare professions is clearly warranted. For all health care professionals, it is also reasonable to attribute the lack of studies to the difficulties in designing [36] and funding studies evaluating PVCs.

Educational purposes and types of intervention

Given the small number of studies, it is difficult to identify clear trends in educational purpose or type of intervention. Learner satisfaction and knowledge gain are the easiest of the Kirkpatrick training outcomes to measure, as they do not require external observation or intervention. However, these domains are the lowest in the hierarchy of evidence needed to confirm that a training process has been truly effective [37]. No study looked at organisational change, which is in keeping with previous literature: a review aiming to identify methods used to measure change in the clinical practices of health professionals found only 17.6% looked at changes at an organisational level [38]. In this review, only one study attempted to look at more than one level of training outcome. A systematic review of evaluation in formal continuing medical education [39] noted that 28% of the studies reviewed looked at two levels and only 6% looked at three.

Methods for determining and assessing outcome measures

Reflecting the wide range of study types, the validity of the outcome measures used was variable. This reflects the difficulty of examining interventions related to education and training. In clinical practice, the gold standard approach to assessing the effectiveness of a medication is the randomised controlled trial, with the primary outcome measure being an objective endpoint such as a defined reduction or gain in a physiological parameter. In training interventions, a single endpoint as an outcome requires considerable interpretation and invites criticism. For example, learner satisfaction does not necessarily equate to knowledge change, nor does it correlate directly with change in practice. The absence of a gold standard measure for assessing training interventions may have led researchers to be opportunistic in their choice of outcome measures. In this review, seven studies gave no justification for the outcome measure used [13, 15, 25, 26, 29, 30]. In addition, comments by the authors themselves on limitations of the outcome measures were absent in five of the studies [13, 26, 30, 31].

Only one study looked at more than one discrete domain of the Kirkpatrick training evaluation framework [29]. In this work, both learner knowledge and learner satisfaction were assessed by different measures (a video test, a written test and a course evaluation). Three other studies [14, 25, 31] had more than one outcome measure, although these were all subtle variations on a theme, such as scores in different types of clinical examination within the same test.

Only two of the studies [27, 33] satisfied all domains when deciding whether internal and construct validity had been achieved. Three other papers [15, 23, 29] raised minor concerns, generally relating to the extent to which the outcome measure itself affected the results. Questionnaire studies reflecting learner satisfaction tended not to perform well, as control between groups was not possible and confounding factors were very difficult to assess.

Results of the interventions

Nearly all papers were positive regarding the use of PVCs (regardless of whether the analysis above had revealed concerns over the validity of the outcome measure). The medical student studies regarding critical analysis and thinking showed strong results in favour of the use of PVCs. The underlying hypotheses of these studies [23, 24, 27, 28, 32] were plausible and the methodologies used rigorous. A researcher independent of these groups has also recently shown that students prefer this use of PVCs to current problem-based learning techniques [30], so triangulation has in some respects been achieved in this field. A recent paper demonstrating that experts focus more on the relevant clinical features within patient video clips has been further supported by as yet unpublished evidence that eye movement modelling may improve diagnostic reasoning. This methodology, in which the minute movements of the eye are tracked while observing dynamic images, has strong construct validity. It is felt the cognitive ‘load’ of dynamic video clips may encourage cognitive processing [40], and therefore methodologies to explore the extent of the load created by PVCs are welcome. Future research must be cognisant of the fact that under- or over-load may occur depending on the capacity of the individual engaging in the activity. Extraneous cognitive load [41] may be controlled to some extent by investigators, and this will aid determination of its impact on the outcome of the intervention.

Studies concerning testing methods and clinical examination showed no obvious differences between PVCs and current assessment methods. The potential difficulty and cost of placing video clips into examinations (whether formative or summative) may have limited the number of validation studies in this area. In studies of clinical examination technique which aimed to show improvement following a PVC intervention, there was supportive evidence, although initial skill sets tended to be relatively high. The importance of controlling for this was demonstrated by the use of the Solomon four-group design on a video intervention to improve examination of the plantar reflex [15]; in this study an effect was seen only when pre-intervention performance was assessed.

The video-based training method for improving the detection of depression in residents of long-term care facilities demonstrated an increase in the performance of the intervention group in both knowledge assessments [29]. Direct patient benefit was not assessed, so an improvement in clinical care as a result of using PVCs cannot be claimed. However, given the good levels of satisfaction on questionnaire testing, it is likely that participants would not have been averse to incorporating their newly acquired learning into day-to-day practice.

Limitations

The heterogeneity of the current published evidence made a robust narrative review extremely difficult. Apart from the work on how PVCs encourage discourse and critical thinking, there were no common themes from which to extract information and analyse composite outcomes. This may reflect the difficulty of undertaking research in the field (the cost of producing video clips), the difficulty of defining valid outcome measures, or publication bias due to a paucity of positive outcomes. It also exemplifies the challenge that much medical education research is Action Research, i.e. research based on the instructors’ own practice.

Publication bias is unlikely to be significant, as there is literature that is positive [42] regarding the use of video and online technologies, but negative publications [43] also exist. It would seem unlikely that a particular modality of online or audiovisual learning would be subject to a different research agenda.

The main limitation of this study is the low number of articles found. The search strategy used was expansive, although “Patient Video Clip” and similar terms are not used by all researchers in the field. It is possible that terms other than those searched have been used, although the number of papers missed is likely to be very small. Extraction of data was performed by a sole reviewer, so it is possible that errors of typology were made, although the small number of final articles allowed extensive examination of the papers by all the authors.

Conclusion

This review process has demonstrated the diverse nature of research into the effectiveness of PVCs in education. Medical education occurs in a variety of environments, and the complicated interplay of confounding variables makes interpretation of outcomes difficult. The following recommendations would enable the production of a standard conceptual framework to guide future research in the area.

  • Studies should classify which facet of training or educational outcome the study is aiming to explore.

  • Studies should aim to validate a particular outcome measure, preferably by reproducing previous work rather than adopting new methods.

  • A description of the validity of the chosen outcome measure should be included in the study protocol.

  • Although control groups are useful for demonstrating the benefit of a PVC intervention, more evidence is needed on whether the outcome measure demonstrates construct validity.

  • Studies on PVCs should take account of cognitive theory, with the cognitive processing enhancement demonstrated in a number of the medical student papers tested at postgraduate level. Although pragmatic outcome measures are easier to achieve, explanatory trials are needed.

  • Prior-knowledge and behaviour testing is vital to demonstrate improvement.