Although cardiopulmonary resuscitation (CPR) measures have improved continuously since the 1960s, in-hospital cardiac arrest (IHCA) remains associated with high morbidity and mortality.1 A recent meta-analysis found that the overall survival rate from IHCA at one year was 13.4%, with survival more likely from events of cardiac origin (39.3%) than from those of noncardiac origin (10.7%).2 Ninety-two percent of survivors were found to have a cerebral performance category score of 1 or 2, corresponding to a good neurologic prognosis.2

The availability of automated external defibrillators (AEDs) outside of hospitals has been associated with a clear improvement in survival after cardiorespiratory arrest, as witnesses can perform early defibrillation before the emergency services arrive.3,4 In contrast, studies of in-hospital AED use have mostly shown no benefit in terms of survival.5,6,7,8 The proportion of shockable rhythms differs between in-hospital cardiac arrests (20%) and out-of-hospital cardiac arrests (37%).6,9,10

In French hospitals, when AEDs are not available, first assessments of cardiac rhythm are carried out by medical emergency teams led by anesthesiologist-intensivists, who are doubly qualified in critical care medicine and anesthesiology. Many hospitals in France are nevertheless equipped with AEDs, allowing defibrillation to be performed in some cases before the emergency team arrives.

The hypothesis of this study was that anesthesiologist-intensivists would have a lower diagnostic performance than AEDs, but would make decisions faster. Therefore, the primary objective was to determine the diagnostic performance of anesthesiologist-intensivists in identifying rhythms as shockable or nonshockable. The secondary objectives were to analyze decision-making times, to estimate the sensitivity and specificity of the decisions for subgroups of cardiac arrest rhythms, and to search for demographic factors associated with performance.

Methods

Study design

This was a simulation-based, multicentre, prospective, observational study that took place between May 2019 and March 2020. Junior and senior anesthesiologist-intensivists in six French hospitals (four university hospitals and two military hospitals) were sent a link to an online AED simulator along with a standardized questionnaire to record respondents’ age, sex, level of experience, number of CPRs performed per year, preferred mode of defibrillator operation (manual, semiautomatic, or situation-dependent), and main activity (intensive care or operating room). Email reminders were sent every month in the absence of a response. Respondents were excluded if their responses to one or more tasks were incomplete. We chose a convenience sample of 100 participants for this study.

Ethical approval

The study was approved on 10 June 2019, by the research ethics committee of the French Society of Anesthesiology and Intensive Care (Société Française d'Anesthésie et de Réanimation, IRB N° 00010254-2019-099), Paris, France (Chairman Prof. J. E. Bazin), and registered with the French Data Protection Agency (Commission Nationale de l’Informatique et des Libertés). All data were anonymized.

Simulator

The simulator, which is accessible online (https://simul-shock.firebaseapp.com/), presented as a manual defibrillator showing a series of 60 electrocardiograms as recorded in real time, with two buttons to either deliver a shock (“shock”) or not shock and resume chest compressions (“no shock”) (Electronic Supplementary Material [ESM], eFig. 1). The time taken to decide whether to shock was also recorded. Before starting the simulation, participants had to read instructions stating that whatever the proposed rhythm, the patient was in clinical cardiac arrest, therefore unconscious and without a palpable pulse.

The electrocardiogram recordings shown by the simulator were obtained during real-world use of AEDs (Defigard Touch 7, Schiller, Wissembourg, France) in patients with cardiac arrest. The chosen recordings lasted approximately 10 sec and did not contain signs of shock delivery or chest compression artifacts. Three expert physicians analyzed an initial set of 62 recordings on the simulator to determine whether the rhythms were shockable. If there was disagreement (n = 4), the recordings were assessed a second time, and eliminated if no consensus could be reached (n = 2). Experts’ consensus decisions were defined as the gold standard, with 100% sensitivity and specificity. The testing dataset therefore consisted of 60 electrocardiograms (Fig. 1): 14 (23%) showed asystole, 29 (48%) showed pulseless electrical activity (PEA), four (7%) showed coarse ventricular fibrillation (VF), five (8%) showed fine VF, and the remaining eight (13%) showed ventricular tachycardia (VT).

Fig. 1 Several rhythms on the simulator

Outcome measures

The main outcome measure was the performance of participants in diagnosing rhythms as shockable or nonshockable, defined as the overall sensitivity and specificity of their decisions over the entire test dataset. Secondary outcome measures were the sensitivity and specificity of participants’ decisions for each rhythm category, and their decision-making times, measured from the moment each electrocardiogram was shown on the screen to the moment participants pressed the “shock” or “no shock” button.

Statistical analyses

For each participant, responses to the 60 electrocardiograms were used to calculate individual sensitivity (defined as the proportion of decisions to shock for shockable rhythms) and individual specificity (defined as the proportion of decisions not to shock for nonshockable rhythms). Overall diagnostic performance was then presented as the overall median sensitivity and the overall median specificity of the participants together with the interquartile range. Results for each type of rhythm were statistically weighted to match the proportions reported in the literature: asystole (34.9%), PEA (46.5%), VF (10.4%), and VT (8.1%).11 This artificially inflated the number of recordings in each category without affecting the overall diagnostic performance. The likelihood ratios were calculated from the median values of sensitivity and specificity.
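The per-participant calculation described above can be sketched in Python as follows. This is a minimal illustration rather than the authors' R code: the function names `individual_performance` and `summarize`, the boolean encoding of rhythms and decisions, and the toy data are assumptions made for the example, and the weighting by literature proportions is not reproduced.

```python
from statistics import median, quantiles


def individual_performance(truth, decisions):
    """Return (sensitivity, specificity) for one participant.

    truth[i] is True when recording i is shockable (expert consensus);
    decisions[i] is True when the participant pressed "shock".
    """
    pairs = list(zip(truth, decisions))
    tp = sum(t and d for t, d in pairs)          # shockable, shocked
    fn = sum(t and not d for t, d in pairs)      # shockable, not shocked
    tn = sum(not t and not d for t, d in pairs)  # nonshockable, not shocked
    fp = sum(not t and d for t, d in pairs)      # nonshockable, shocked
    return tp / (tp + fn), tn / (tn + fp)


def summarize(values):
    """Median and interquartile range (Q1, Q3) across participants."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)


# Hypothetical toy data: 4 shockable and 6 nonshockable recordings.
truth = [True] * 4 + [False] * 6
participants = [
    [True, True, True, True] + [False] * 6,             # all correct
    [True, True, True, False] + [True] + [False] * 5,   # one miss, one false shock
    [True, True, False, False] + [False] * 6,           # two misses
]
results = [individual_performance(truth, d) for d in participants]
print(summarize([sens for sens, _ in results]))
print(summarize([spec for _, spec in results]))
```

Computing the two proportions per participant first, and only then taking the median across participants, mirrors the order of operations described in the text.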

The sensitivity for each type of shockable rhythm (VT, coarse VF, and fine VF) and the specificity for each type of nonshockable rhythm (asystole and PEA) were also calculated. Univariate associations between demographic variables and participants’ sensitivity and specificity were investigated. Decision-making times are presented as the median value of all delays for each rhythm category.

Continuous variables are presented as median [interquartile range (IQR)] and categorical variables as number (%). Decision-making times were compared using the Wilcoxon–Mann–Whitney test. All tests were two-sided. As our study was exploratory, with a target sample size that was set arbitrarily, differences were considered significant at P < 0.01. All statistical analyses were performed using the R software version 3.6.1 (R Foundation for Statistical Computing, Vienna, Austria).
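For readers unfamiliar with the Wilcoxon–Mann–Whitney test, the following Python sketch shows the underlying rank-sum computation with a normal approximation for the two-sided P value. It is an illustration only: the study used R, the function name `mann_whitney_u` and the example timings are hypothetical, and the tie-corrected normal approximation without continuity correction is an assumption rather than the authors' exact settings.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Wilcoxon-Mann-Whitney test (normal approximation).

    Returns (U statistic for sample x, approximate two-sided P value).
    """
    n1, n2 = len(x), len(y)
    n = n1 + n2
    counts = Counter(list(x) + list(y))

    # Assign each distinct value the average of the 1-based ranks it occupies.
    rank_of = {}
    position = 0
    for value in sorted(counts):
        c = counts[value]
        rank_of[value] = position + (c + 1) / 2
        position += c

    rank_sum_x = sum(rank_of[v] for v in x)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2

    # Variance of U under the null hypothesis, corrected for ties.
    tie_term = sum(c**3 - c for c in counts.values())
    variance = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:  # all observations identical
        return u_x, 1.0

    z = (u_x - n1 * n2 / 2) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u_x, p_value


# Hypothetical decision times (sec) for two rhythm categories.
coarse_vf_times = [1.8, 2.0, 2.1, 2.3, 1.9]
pea_times = [3.1, 3.6, 2.9, 4.0, 3.4]
u, p = mann_whitney_u(coarse_vf_times, pea_times)
print(u, p, p < 0.01)  # compared against the study's 0.01 threshold
```

The normal approximation is only reasonable for moderately large samples; for very small groups, statistical software typically computes an exact P value instead.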

Results

Flow chart

Among the 267 anesthesiologist-intensivists contacted to participate in the study, 186 (70%) participated, seven (4%) of whom did not complete the test (ESM eTable). The final number of participants was 179 (response rate of 67%) (Fig. 2).

Fig. 2 Study flow chart

Demographic characteristics

The demographic characteristics of the participants are summarized in Table 1. The median age was 32 yr and most respondents were male (113/179, 63%) and senior physicians (124/179, 69%), with a median of four years of seniority. Most participants performed fewer than six CPRs per year (n = 100, 56%), and half preferred to use defibrillators in manual mode (n = 89, 50%).

Table 1 Demographic characteristics of participants

Sensitivity and specificity

The median [IQR] overall sensitivity was 88 [79–95]% and the median overall specificity was 86 [77–92]%. The positive likelihood ratio was 6.29, and the negative likelihood ratio was 0.14. The corresponding receiver operating characteristic curve is shown in Fig. 3.
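The likelihood ratios follow directly from the median sensitivity and specificity reported above. The short Python check below uses the standard definitions, LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity; the function name `likelihood_ratios` is illustrative.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from proportions in (0, 1)."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Median overall sensitivity (88%) and specificity (86%) from the study.
lr_pos, lr_neg = likelihood_ratios(0.88, 0.86)
print(round(lr_pos, 2), round(lr_neg, 2))  # prints 6.29 0.14
```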

Fig. 3 Receiver operating characteristic curve for anesthesiologist-intensivists in identifying shockable and nonshockable rhythms. This curve was constructed from the [sensitivity/(1-specificity)] coordinates of each anesthesiologist-intensivist. It gives a graphical impression of group performance through the area under the curve (AUC). Nevertheless, because no regression function represented by a mathematical model could be obtained, no AUC value can be provided.

Among shockable rhythms, the median [IQR] sensitivity was 100 [100–100]% for VT, 100 [100–100]% for coarse VF, and 60 [20–100]% for fine VF. The median [IQR] specificities for nonshockable rhythms were 93 [86–100]% for asystole and 83 [72–86]% for PEA.

Table 2 presents the results of the univariate analysis of the association between sensitivity, specificity, and demographic variables. Senior status was significantly associated with higher sensitivity. There were no significant differences according to the annual number of CPRs performed, main activity, or preferred defibrillator mode.

Table 2 Demographic subgroups analysis for sensitivity and specificity

Decision-making times

Decision-making times (Fig. 4) differed between rhythm categories. The most rapidly recognized rhythms were coarse VF and VT. The median decision times ranged from 2.0 to 3.5 sec. Most responses were made within five seconds (ESM eFig. 2).

Fig. 4 Box plots showing physicians' decision-making times for each rhythm category. The black line indicates the median value. The box edges indicate the 25th and 75th percentiles (Q25 and Q75), and the whiskers indicate minimum and maximum values.

Discussion

In the present simulation-based multicentre study, the median overall sensitivity and specificity of the studied anesthesiologist-intensivists in recognizing shockable and nonshockable rhythms were 88% and 86%, respectively. By comparison, a study of two AEDs reported sensitivities and specificities of 91% and 99% for the first device, and 100% and 96% for the second.12 Compared with US recommendations for the diagnostic performance of AEDs, the sensitivity of this group of anesthesiologist-intensivists was adequate for VT and coarse VF, but the specificity was insufficient, particularly for PEA.13

The 100% sensitivity determined here for VT and coarse VF suggests that anesthesiologist-intensivists correctly decide to deliver shocks in these situations. In contrast, the specificities of 93% for asystole and 83% for PEA suggest that in some of these cases, shocks are delivered when they should not be. Inappropriate shocks are deleterious during CPR because they require the unwarranted cessation of cardiac massage, resulting in a longer period of no-flow.14,15

The sensitivity of participants for fine VF (60%) is more difficult to interpret. Indeed, the 2015 European Resuscitation Council recommendations (which applied during the study) were to not shock in cases of diagnostic uncertainty between asystole and very fine VF.16 The fact that the electrocardiograms were presented by the simulator without a y-axis scale or gridlines made it much harder to differentiate these two rhythms and to decide whether to shock, and the substantially lower sensitivity observed for fine VF may thus be explained by a corresponding increase in false negatives. The latest 2021 European Resuscitation Council recommendations state that when the rhythm is clearly judged to be VF, a shock should be given.17

The availability of AEDs in hospitals has so far not had a significant effect on IHCA survival rates, which have not improved since 2010.5,6,7,8,18 The most important factor in improving survival from IHCA seems to be reducing the time between cardiac arrest and the first shock in cases of VF or VT. Indeed, survival is significantly reduced if the shock is administered more than two minutes after the start of CPR.19,20 In our study, the differences in decision times between rhythms were statistically significant but clearly not clinically relevant: most response times were below five seconds. In comparison, recent AEDs can assess electrocardiogram rhythms and advise on whether to shock within five seconds of the interruption of chest compressions.21 Decision-making times are therefore very similar for anesthesiologist-intensivists and AEDs. The second issue in improving survival rates is to limit the time without chest compressions. Operating the defibrillator in manual mode saves time by avoiding the spoken instructions and the sometimes lengthy cardiac rhythm analysis in semiautomatic mode, but increases the number of inappropriate shocks and associated interruptions in chest compressions.14,22 The latter risk is highlighted by the limited specificity of anesthesiologist-intensivists measured here. Koller et al. found error rates of 6–11% in the analyses of five AEDs.23 External artifacts, notably those due to chest compressions continued despite instructions from the AED to stop, can also lead to incorrect decisions.24 A further issue in this context is the training of medical emergency teams. Return of spontaneous circulation and survival at one year after IHCA are more likely when CPR is delivered by teams trained in advanced cardiac life support.25 Continuing education for anesthesiologist-intensivists who do not regularly treat IHCAs may increase diagnostic performance and thereby the likelihood of correct defibrillation decisions.

Several strategies have been investigated with AEDs to limit or avoid interruptions in chest compressions during cardiac rhythm analysis or to detect returns of effective spontaneous circulation during rhythm analysis.21,26,27,28 Automatic analysis while chest compressions are ongoing, along with a high diagnostic performance, would certainly reduce the cognitive load of physicians during CPR, leaving more time to lead the emergency team and to identify and treat the cause of cardiac arrest, thereby increasing patient survival rates.29,30,31 At present, limitations in rhythm interpretation by AEDs should encourage practitioners to be cautious about their use. It is important to remember that responsibility for decision-making rests with the physician and that survival rates for in-hospital cardiac arrest have not been improved by the introduction of AEDs in wards.5,6,7,8,18

The strengths of the study include its size (n = 179) and high response rate (67%). In terms of methodology, the original design and use of real electrocardiogram recordings are also strengths.

The study is limited by its observational nature and the fact that the tests were conducted under simulated conditions. An investigation of anesthesiologist-intensivists’ decisions in the management of real IHCAs would better reflect actual clinical practice. Nevertheless, decisions taken under real conditions may be difficult to analyze retrospectively because of CPR artifacts, with the added complication that all clinical situations are unique. In contrast, these simulations allowed participants’ performance to be evaluated uniformly, in a controlled environment. Another limitation of the study is the fact that the 60 recordings were analyzed in sequence, and a drop in performance at the end of the test may have led to an underestimation of sensitivity and specificity. Our study may be underpowered with respect to showing differences in diagnostic performance between groups of physicians. A dedicated analysis of the group of critical care consultants, who are responsible for managing cardiac arrest in their daily practice, would have been particularly interesting, but would only have been possible with more participants. Our participants may be younger and less experienced than those in other centers, and this may limit the generalizability of our results. Furthermore, while medical emergency teams often include intensivists who are not anesthesiologists, or emergency physicians, the performance of these categories of physicians was not evaluated. Finally, as pointed out above, the electrocardiograms were displayed by the simulator without a y-axis scale, which made very fine VF (nonshockable) difficult to distinguish from fine VF (shockable).

In conclusion, the anesthesiologist-intensivists who participated in this simulation-based study classified rhythms as “shockable” or “nonshockable” with a median overall sensitivity of 88%, and a median overall specificity of 86%. Participants’ sensitivity in deciding to deliver shocks for VT and coarse VF was excellent, while the specificity of their decisions to not shock for PEA was inadequate, implying that shocks would have been delivered inappropriately. Their decision-making times were below five seconds. Theoretical and practical training in recognizing cardiac arrest rhythms should be strengthened for anesthesiologist-intensivists who use defibrillators in manual mode. After validation in a larger cohort, the online simulation tool created for this study could be used as part of a continuing education program.