Traditionally, the quality of surgery is assessed on morbidity and mortality data (MMD) [1, 2]. Useful as it is in hospital surgical practice, MMD is limited as a performance index by its retrospective nature. Learning curves (LC) are often used by surgeons who are ‘learning’ (i.e., gaining proficiency) in the execution of an operation, as performance improves with increasing experience [3,4,5,6].

Neither MMD nor LCs can provide objective information on the nature of intraoperative errors and their mechanisms when these adversely affect patient outcome. Specifically, they fail to differentiate the exact role of technical errors from other components of surgical competence, e.g., non-technical skills [7,8,9,10], or the level of proficiency of surgeons by proficiency–gain curves (P-GCs) (Fig. 1). The P-GC of an individual surgeon for an operation represents the time course, over repeat executions, through which the trainee reaches the proficiency zone and is then able to perform the operation consistently well with good patient outcome, as benchmarked by Surgical Colleges and required by Credentialing Committees and National Licensing Bodies. In essence, these bodies safeguard society from surgeons who cannot operate, or have lost the ability to operate, safely and to the ‘accepted standard of care’ [7]. The underlying root causes of adverse events are technical errors, which often also provide key information on learning opportunities to prevent or reduce adverse events [11,12,13,14].

Fig. 1 PRISMA guidelines-based selection of publications for systematic reviews

An alternative approach to human error reduction is offered by human reliability analysis (HRA) techniques [15,16,17,18,19,20]. These are widely used in risk management of safety–critical systems, e.g., the nuclear power industry, the aviation industry, and military operations. HRA techniques determine the impact of human error within a system. The techniques are those of systems engineering and cognitive and behavioral science. They are used to analyze and understand the human contribution to the system’s reliability and safety [19, 20]. The common steps of the HRA process are problem definition, specification and modeling of the task, human error identification and analysis, human error quantification, and error management.

The first study to use HRA techniques in laparoscopic surgery was published in 1998. It analyzed surgical task performance based on technical errors during laparoscopic cholecystectomy (L chole) [15]. Subsequent research from the Surgical Skills Unit in Dundee was directed towards increasing the clinical relevance of HRA. This was necessary as HRA is essentially predictive, i.e., its objective is to ensure that the activity, e.g., civilian flight, space flight etc., is as safe as is humanly possible before the aircraft takes off. In sharp contrast, all operations can nowadays be assessed objectively from an unedited video recording using established human factors (cognitive) engineering expertise. This approach renders HRA observational and specific to an operator. Hence this modified HRA is referred to as ‘Observational Clinical – Human Reliability Assessment’ (OCHRA) [16, 21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. The purpose of this review was to analyze the current state, uptake and limitations of the use of OCHRA to assess intraoperative technical errors, hazard zones of operations and proficiency–gain curves of operations.

Methods

Search strategy and criteria

The review was performed using the guidelines outlined in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Fig. 1) [43]. Only publications related to assessment of surgical task performance and surgical operations by identifying technical errors using HRA and OCHRA were included across specialties: General Surgery, Colorectal Surgery, Bariatric Surgery, Urology, Ophthalmic Surgery, Pediatric Surgery, and Otorhinolaryngology. Surgical tasks in surgical training programs and surgical performance in experimental surgical studies were also included. Exclusions were publications on non-surgical performance, descriptive publications without data, conference abstracts, letters, editorials and commentaries, and non-English publications.

Since this study was a systematic review with no human subjects involved, institutional review board (IRB) approval and written consent were not required.

Eligibility criteria

An initial search was carried out on PubMed, EMBASE, Web of Science and the Cochrane Library for English language articles published from January 1998 to January 2019. Search terms included ‘human reliability analysis (HRA),’ ‘observational clinical human reliability analysis (OCHRA),’ ‘human error in surgery,’ ‘adverse events,’ ‘human error identification,’ ‘technical error in surgery,’ ‘surgical performance,’ ‘task analysis in surgery,’ and ‘competency assessment.’ A further search used terms such as ‘patient safety,’ ‘hazard zones in surgery,’ ‘human factors in surgery,’ ‘proficiency–gain curves in surgery,’ and ‘surgical skills training.’ All the key search terms were subsequently combined.

The culled publications were retrieved in full text for further assessment for eligibility. Following review, relevant references cited in the included articles were also retrieved and scrutinized.

Data extraction and synthesis

Studies describing use of HRA or OCHRA for direct assessment of surgical operations were grouped together. Other publications in which HRA or OCHRA were used as one of the methods to assess surgical task performance for research projects were grouped separately. Microsoft Excel 2016 (Microsoft Corp, Redmond, WA) was used to manage the extracted data. Risk of bias within individual or across studies was not specifically assessed.

Assessment of methodological quality

The Medical Education Research Study Quality Instrument (MERSQI) [44] was applied to assess the quality of studies conducted using OCHRA. The MERSQI contains 10 items that reflect 6 domains of study quality: study design, sampling, type of data, validity, data analysis, and outcomes. The maximum score for each domain is 3, giving a maximum overall score of 18 with a potential range from 5 to 18. The overall MERSQI scores of the publications included in the review are shown in Table 1.
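The scoring arithmetic described above can be sketched as follows. This is a minimal illustration, assuming a reviewer has already assigned a score to each of the six domains; the function name `mersqi_total` and the example scores are hypothetical and are not taken from Table 1.

```python
# The six MERSQI domains named in the text; each contributes at most 3 points.
MERSQI_DOMAINS = (
    "study design",
    "sampling",
    "type of data",
    "validity",
    "data analysis",
    "outcomes",
)
MAX_PER_DOMAIN = 3.0


def mersqi_total(domain_scores):
    """Return the overall MERSQI score from a dict of per-domain scores."""
    if set(domain_scores) != set(MERSQI_DOMAINS):
        raise ValueError("scores must cover exactly the six MERSQI domains")
    for name, score in domain_scores.items():
        if not 0 <= score <= MAX_PER_DOMAIN:
            raise ValueError(f"{name!r} score {score} is outside 0..{MAX_PER_DOMAIN}")
    return float(sum(domain_scores.values()))


# Hypothetical domain scores for a single study (illustrative only).
example_study = {
    "study design": 1.5,
    "sampling": 2.0,
    "type of data": 3.0,
    "validity": 2.0,
    "data analysis": 3.0,
    "outcomes": 1.5,
}
print(mersqi_total(example_study))  # 13.0
```

A study scoring the maximum in every domain would reach the ceiling of 18 stated in the text.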

Table 1 Synthesis and analysis of publications on HRA and OCHRA

Results

A total of 2341 publications were screened, of which 297 were read in full text. Of these, 82 studies were excluded as not relevant. After the eligibility criteria of inclusion and exclusion were applied, a total of 26 studies were selected in the final data set for analysis (Fig. 1), with the majority (73%) being clinical. Thirty-one percent of these were performed by consultant surgeons and 69% by surgical trainees in established surgical training programs. OCHRA as the only assessment method was used in 54% of the 26 publications (Fig. 2).

Fig. 2 Analysis of published studies included in review

OCHRA was applied to 719 surgical operations for direct analysis of the technical errors, hazard zones, external error modes and P-GCs. The data also included a range of experimental research projects carried out by 265 surgical trainees, the vast majority of which used OCHRA, with HRA being used in only 3 publications to evaluate surgical task performance.

Sixteen different surgical operations were analyzed using OCHRA across the following specialties: General Surgery, Colorectal Surgery, Bariatric Surgery, Urology, Ophthalmic Surgery, Pediatric Surgery, and Otorhinolaryngology. During execution of these operations, 7869 consequential errors were identified and analyzed (Table 1). Error rates and external error modes varied depending on the type of operation and the level of experience of the operator. In general, surgical trainees committed twice as many technical errors as specialist consultant/attending surgeons [16, 22].

The consequential error rate averaged 11 per procedure with a wide range of 4–34 (Table 1) depending on the complexity of the operation and the level of expertise and skill of the operator. In one case series of 200 laparoscopic cholecystectomies (L chole) [16], the inter-rater consistency of OCHRA was 85% and a strong correlation was observed between proficiency and error frequency upon test-retest analysis (r = 0.79, P < 0.001) [25]. In a similar study evaluating performance in advanced laparoscopic surgery, analysis of 335 execution errors showed a significant correlation between error frequency and mesorectal specimen quality (Rs = 0.52, P = 0.02) and with blood loss (Rs = 0.609, P = 0.004) [25]. Classification of intraoperative adverse events using OCHRA was agreed by 84% of 34 European Association for Endoscopic Surgery (EAES) experts in laparoscopic surgery [19]. Error rates and external error modes varied, depending on the type of operation and the level of experience of the operator. In general, the technical error rate of surgical trainees was twice that of specialists [14, 22].
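The average quoted above can be checked against the totals reported earlier in the Results. The sketch below assumes, as the text implies, that the 7869 consequential errors were all recorded in the 719 operations analyzed; the function name is hypothetical.

```python
def mean_errors_per_procedure(total_errors, total_procedures):
    """Average number of errors per operation."""
    if total_procedures <= 0:
        raise ValueError("total_procedures must be positive")
    return total_errors / total_procedures


# Totals quoted in the text: 7869 consequential errors, 719 operations.
rate = mean_errors_per_procedure(7869, 719)
print(f"{rate:.2f} consequential errors per operation")  # 10.94
```

Rounded to the nearest whole number, this agrees with the reported average of 11 per procedure.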

Only two publications reported on External Error Mode (EEM), both on laparoscopic colorectal resections. The first study reported on EEM at different levels of expertise and was based on 32 video-recorded laparoscopic colorectal resections performed by experts and delegates of the National Training Program in England [28]. The analysis covered tissue-handling errors, instrument-misuse errors, and the times spent on dissecting (D) and exposure (E); the ratio of these two times, a new performance variable, was referred to as the D/E ratio. Two independent expert surgeons globally assessed each video in terms of competency (pass vs. fail). The study identified 399 errors and reported significant differences in total errors between experts, pass candidates, and fail candidates, with median errors of 4, 10, and 17, respectively (P < 0.001). Comparison between the pass and fail candidates showed more tissue-handling errors in the fail group (7 vs. 12; P = 0.005), but no difference in consequential and instrument-handling errors. As expected, the D/E ratio was significantly lower for delegates than for experts (0.6 vs. 1.0; P = 0.001) [28]. When all 4 independent variables were used to predict which delegates passed or failed the assessment, the area under the receiver operating characteristic curve was 0.867, with a sensitivity of 71.4%, a specificity of 90.9%, and an overall predictive accuracy of 84.4%. Thus, OCHRA provides significant discriminative power (construct validity) between competent and non-competent performance [28].
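For readers less familiar with the discrimination measures quoted above, the sketch below shows how sensitivity, specificity, and overall accuracy follow from a two-by-two table of predicted versus assessed outcomes. It assumes that a 'fail' assessment is treated as the positive class, which the source does not state, and the counts are hypothetical; they are not the data of the study cited [28].

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Predicted vs. assessed outcome; 'fail' is the positive class (assumption)."""
    true_fail: int   # assessed fail, predicted fail
    false_pass: int  # assessed fail, predicted pass
    true_pass: int   # assessed pass, predicted pass
    false_fail: int  # assessed pass, predicted fail


def sensitivity(c):
    """Share of failed candidates that the model flags as fail."""
    return c.true_fail / (c.true_fail + c.false_pass)


def specificity(c):
    """Share of passed candidates that the model predicts as pass."""
    return c.true_pass / (c.true_pass + c.false_fail)


def accuracy(c):
    """Share of all candidates classified correctly."""
    total = c.true_fail + c.false_pass + c.true_pass + c.false_fail
    return (c.true_fail + c.true_pass) / total


# Hypothetical counts for illustration only (not the study data).
counts = ConfusionCounts(true_fail=8, false_pass=2, true_pass=18, false_fail=2)
print(f"sensitivity {sensitivity(counts):.1%}")
print(f"specificity {specificity(counts):.1%}")
print(f"accuracy    {accuracy(counts):.1%}")
```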

The second, a single-center study, used OCHRA to identify technical errors enacted in unedited videos of 20 consecutive laparoscopic rectal cancer resections [25]. The study identified 335 execution errors with a median of 15/operation. More errors were enacted during pelvic compared with abdominal steps (P < 0.001). Additionally, more errors were observed during dissection on the right than the left side of the pelvis (P = 0.03).

Hazard zones and difficult tasks were identified in all major commonly performed laparoscopic operations such as general surgical, colorectal, bariatric and ENT operations [16, 21, 25, 27, 29, 30, 32, 33]. Examples include dissection of the triangle of Calot during L chole, dissection of the right side of the pelvis during laparoscopic resection of rectal cancer, mobilization of the greater curvature and stapling of the stomach during sleeve gastrectomy, and access to the nasal cavity during endoscopic dacryocystorhinostomy (DCR). Difficult tasks were also identified by OCHRA, e.g., intracorporeal sutured laparoscopic anastomosis and laparoscopic gastric bypass [33, 34].

The data also confirmed that OCHRA can be used to quantify the P-GC for a laparoscopic operation indicated by reaching the proficiency zone, when the individual surgeon attains maximal optimal performance in the execution of a specific procedure (Fig. 3) [34, 45]. It has also been suggested that OCHRA is a valid tool for assessing competency level in advanced specialist surgery, e.g., laparoscopic colorectal surgery [23, 25, 28].
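One way to read a proficiency zone off a series of OCHRA assessments is sketched below. The criterion used here, a sustained run of consecutive cases with no more than a threshold number of consequential errors, is an assumption made for illustration and is not the definition used in the cited studies; the function name, parameters, and error counts are all hypothetical.

```python
def procedures_to_proficiency(consequential_errors, threshold=0, sustain=3):
    """Return the 1-based case number that starts the first run of `sustain`
    consecutive cases with at most `threshold` consequential errors,
    or None if no such run occurs (illustrative criterion only)."""
    run = 0
    for case_number, errors in enumerate(consequential_errors, start=1):
        if errors <= threshold:
            run += 1
            if run == sustain:
                return case_number - sustain + 1
        else:
            run = 0  # an error-laden case interrupts the run
    return None


# Hypothetical consequential error counts for 14 consecutive cases.
series = [9, 7, 8, 5, 6, 3, 4, 2, 1, 0, 1, 0, 0, 0]
print(procedures_to_proficiency(series))  # 12
```

Requiring a sustained run rather than a single error-free case avoids declaring proficiency on the strength of one good operation.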

Fig. 3 Attainment of proficient execution of palliative laparoscopic bilio-enteric bypass by a surgical fellow (MT), indicating that this surgeon needed to perform 13 such procedures to reach a nadir of a few inconsequential errors [34]. Reproduced by permission of Editor in Chief/Publisher of Surgical Endoscopy

Discussion

OCHRA assesses the quality of execution by a surgeon (performance level) by detection and characterisation of the technical errors enacted by the operator during the operation, both by mode (procedural/execution) and by effect (consequential/inconsequential) [16, 21,22,23,24,25,26,27,28,29,30,31,32,33,34]. In this process, OCHRA divides the continuum of an operation into steps, tasks and hazard zones, the last referring to sections of an operation where major errors, some causing catastrophic iatrogenic injuries, occur most commonly [16, 21, 25,26,27,28,29,30,31,32,33].

The reported significant correlation between OCHRA error rates and quality of total mesorectal excision also confirms the clinical relevance of the technique in quality assessment of surgical performance [25, 28]. It also detects the attainment of complete proficiency reached by a surgeon indicated by a nadir of only a few inconsequential errors. This ability of OCHRA is currently underutilized in both surgical training and higher surgical specialization [22, 45,46,47,48,49].

In the OCHRA paradigm, technical errors are classified as consequential (needing remedial action by the surgeon) and inconsequential [16, 21, 22]. Any action or omission that causes an adverse event, or that increases the duration of the surgical procedure by necessitating a corrective action that falls outside the ‘acceptable limits’, constitutes a consequential error. Inconsequential errors are actions or omissions that increase the likelihood of a negative consequence and, under slightly changed circumstances, could result in an adverse effect on patient outcome. Inconsequential errors are important as they serve as ‘near misses’, providing key learning opportunities for reduction of future adverse events [11, 15,16,17,18,19,20].

Technical errors associated with inability of the surgeon to execute the component steps in the correct order are categorized as ‘procedural error modes,’ while ‘execution error modes’ reflect ineffective/traumatic manipulations [15, 16, 22]. Surgical trainees committed twice as many technical errors as consultant/attending surgeons [16, 22].
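The two classification axes described above, error mode and consequence, can be represented as a small data structure. The sketch below is illustrative only: the class names, field names, and the logged errors are hypothetical and do not reproduce any published OCHRA coding sheet.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ErrorMode(Enum):
    PROCEDURAL = "procedural"  # component steps not executed in the correct order
    EXECUTION = "execution"    # ineffective or traumatic manipulation


class Consequence(Enum):
    CONSEQUENTIAL = "consequential"      # needs remedial action by the surgeon
    INCONSEQUENTIAL = "inconsequential"  # a 'near miss'


@dataclass(frozen=True)
class TechnicalError:
    step: str  # free-text label for the operative step (hypothetical)
    mode: ErrorMode
    consequence: Consequence


def tally(errors):
    """Count errors in each (mode, consequence) cell."""
    return Counter((e.mode, e.consequence) for e in errors)


def near_misses(errors):
    """Return the inconsequential errors, i.e. the learning opportunities."""
    return [e for e in errors if e.consequence is Consequence.INCONSEQUENTIAL]


# Hypothetical error log for a single operation (illustrative only).
log = [
    TechnicalError("dissection", ErrorMode.EXECUTION, Consequence.CONSEQUENTIAL),
    TechnicalError("dissection", ErrorMode.EXECUTION, Consequence.INCONSEQUENTIAL),
    TechnicalError("clipping", ErrorMode.PROCEDURAL, Consequence.INCONSEQUENTIAL),
]
for (mode, consequence), count in tally(log).items():
    print(mode.value, consequence.value, count)
```

Keeping the two axes as separate fields allows the same log to be summarized by mode, by consequence, or by operative step.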

Underlying mechanisms, which provide a deeper understanding of the likelihood of occurrence of technical errors, were reported in some studies; the factors identified include applying excessive force, incorrect order of steps, concentration lapses, misjudgements, and poor instrument selection. Several hazard zones have also been described (Table 1) [15, 16, 21, 22, 25,26,27,28,29,30] and difficult tasks highlighted [27, 34]. OCHRA enables differentiation between LCs and P-GCs. Learning an operation goes beyond cognitive knowledge: the individual becomes able to execute the procedure safely without having to think about it. In this process, the surgeon progresses from the controlled conscious mode (exhausting, cerebrally intensive and subject to fatigue) to the automatic mode, characterized by smooth effortless execution [49].

The study by Miskovic et al., which evaluated the performance of specialists executing live operations in the operating room, confirmed the validity of OCHRA in adjudicating surgical performance at a specialist level and suggested that this method could be implemented for competency assessment within a clinical training program [28]. Potentially, it can also be used for recertification and re-validation.

Equally important, the review highlights the current limitations of OCHRA, including its labor-intensive nature, which requires human factors scientists to identify and categorize errors from unedited videos of operations using established criteria [15, 16, 21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. In this respect, OCHRA will eventually benefit from progress in artificial intelligence (AI) and machine learning (ML) [50]. This development is considered essential for the wider uptake of OCHRA. The review confirms that OCHRA, by its documentation and characterisation of errors enacted by the operator, constitutes a valid technique for objective assessment of competence in the execution of operations at both consultant and trainee levels (Fig. 4).

Fig. 4 Proficiency gain curves defined by OCHRA: A attainment of the proficiency zone by the majority of trainees (80%) for a specific operation; B earlier attainment of the proficiency zone by naturally gifted trainees with high level innate aptitude for the same operation as in (A); C inability to reach the proficiency zone by surgically untrainable surgical trainees (11%), who should be identified at an early stage and advised accordingly; D loss of proficiency by previously competent surgeons usually due to disease including alcoholism and other addiction

Conclusions

Increased uptake and use of OCHRA would enhance patient outcome after surgery in routine hospital surgical practice and surgical training, aside from being a useful tool for privileging, accreditation and re-validation. The low uptake of OCHRA, despite its ability to assess the execution quality of operations, is attributed to its labor-intensive nature, which requires human factors (cognitive engineering) expertise. This issue can only be resolved by the development of smart video recorders equipped with AI and ML, based on incorporated and/or Wi-Fi-accessible large data sets of unedited recorded operations.