Clinicians’ overestimation of febrile child risk assessment

We aimed to estimate clinician-based risk thresholds at which febrile children would be managed as having serious bacterial infections (SBI), to determine influencing characteristics and to compare thresholds with prediction model (Feverkidstool) risk estimates. Twenty-one video vignettes of febrile children visiting the emergency department (ED) were assessed by 42 (40.4 %) international paediatricians/paediatric emergency clinicians. Questions related to SBI management decisions and to clinical risk scores of the child having SBI on visual analogue scales. Feverkidstool risk scores were based on clinical signs/symptoms and C-reactive protein. Amongst vignettes assigned to SBI management, the median risk was 60 % (interquartile range (IQR) 30.0–80.5), compared with 16.0 % (IQR 5.0–32.0) when vignettes were not managed as SBI. Ill appearance and aberrant circulatory signs were the most influential factors, whereas age and duration of fever were the least influential factors on SBI management decisions. Feverkidstool risk scores varied from 13 % (IQR 7.7–28.1) for SBI management to 7.3 % (IQR 5.7–16.3) for no SBI management.
Conclusion: Clinicians assigned high risk scores to children whom they would have managed as SBI, mostly influenced by ill appearance and aberrant circulation. In contrast to the SBI risk assessment of the Feverkidstool, clinicians appeared to apply a more stepwise assessment of the risk of presence/absence of SBI at different steps in the diagnostic and therapeutic process. Uniform risk thresholds at which one should start SBI management in febrile children remain unclear; risk thresholds at which we refrained from SBI management were more consistent.
What is Known:
• Only a small proportion of febrile children presenting to the emergency department will have serious bacterial infections (SBI), and uniform risk thresholds to start or withhold SBI treatment are not known.
• The low prevalence of SBI and consequently the low exposure of clinicians to these infections make them rely more on alarming signs or clinical decision rules.
What is New:
• Previously identified model predictors for SBI appeared to be significantly influencing factors in clinicians' febrile child management in emergency care.
• Clinicians applied higher risk thresholds for SBI management of febrile children than reflected by the clinical prediction model, while smaller differences in risk thresholds between clinical and model prediction were observed when clinicians refrained from SBI management.


Introduction
The febrile child is a common presentation to emergency departments (ED), with 10 to 20 % of all paediatric patients presenting because of febrile illness alone [14,17,28]. Most children suffering from simple self-limiting infections do not need treatment.
However, a small proportion will have serious bacterial infections (SBI), which require investigation, hospital admission, antibiotics and in some cases intensive care admission. Understanding health care professionals' decision making, particularly regarding diagnosis, treatment and follow-up, is of vital importance, particularly as EDs become increasingly overcrowded [33,34]. Moreover, diagnostic errors, especially in infectious diseases, are amongst the most common medical misadventures leading to malpractice lawsuits in paediatrics [16].
To support decision making in febrile children, different clinical prediction models have been developed in the past decade [4,7,12,19,30,31]. Although most studies on prediction models report good accuracy and high compliance, implementation in paediatric emergency care is limited. One of the reasons might be that clinicians' intuitive estimation of probabilities may be as good as, or better than, prediction models [15,21,27]. Moreover, the lack of evidence on clinically based decision thresholds makes the application process of prediction models in clinical practice complex.
The aim of this study was to estimate risk thresholds at which children would be managed as SBI according to clinicians' judgement by assessment of video vignettes of febrile children visiting the ED. Secondary measures included determining the effect of investigations by recording risk estimations after information on C-reactive protein value, determining the presenting characteristics that influence these risks and comparing clinician perceived risk with risk estimates using a validated prediction model (Feverkidstool) [19].

Study design and setting
We performed a cross-sectional study with real-life video vignettes of febrile children who presented to the children's ED of the Leicester Royal Infirmary in Leicester, UK. All parents had given formal consent for the video images to be viewed by healthcare professionals under trust policy guidelines via a previously published process [13]. Ethical approval for the video image collection process had been granted by the National Research Ethics Committee East Midlands.

Study population
Paediatricians and paediatric emergency clinicians from the source population of the REPEM network (Research in Paediatric Emergency Medicine, Europe; www.pemdatabase.org/REPEM.html), and paediatricians at teaching hospitals with an interest in acute and emergency care in the Netherlands and United Kingdom, were invited (104 invitations). Non-responders were sent reminders at 4-week intervals, for a maximum of four mailings per subject.

Study intervention-video vignettes
Twenty-one online video vignettes of febrile children were shown to the study participants. The vignettes were a mix of children in different age categories with potential SBI and children with simple self-limiting problems reflecting the different levels of severity in febrile child presentations in practice. The videos, with a mean duration of about 30 s, were originally recorded for educational purposes of paediatricians in training as part of the REMIT (Refining Evaluation Methodologies for Practice Changing Interventions) study (ISRCTN94772165). Background history and vital signs were reported as added text or could easily be interpreted from the video vignettes.
Initially, the participants were asked whether they would manage the febrile child as having a SBI based on the vignette and background history (e.g. duration of fever) alone. Next, they were asked to assess the actual risk of the child having a SBI on a visual analogue scale (VAS 1). Finally, we added different values of C-reactive protein (CRP) and asked whether their risk assessment would change (VAS 2). The online vignettes and the responses were hosted on a secure password-protected server.

Data collection
All data collected online were exported in an anonymised format as an Excel file. We collected answers to the following questions: (1) Would you manage this child as having a serious bacterial infection? (Answers: yes/no). (2) Which diagnostics or therapeutics would you perform? (Options: no action and/or discharge; antipyretic; fluid trial; blood tests; chest radiography; lumbar puncture; urine dipstick; oral antibiotics; intravenous antibiotics; admission). Study participants could tick as many items as they judged relevant. (3) What is the chance of SBI in this child? (Answer: 0-100 % on a VAS (VAS 1)) [1]. As CRP is the strongest predictor of the Feverkidstool, we studied the additional value of CRP in clinicians' management decisions with the following question: (4) A CRP is taken and returns at (continuous value) mg/l. What is the chance of SBI in this child? (Answer: 0-100 % VAS (VAS 2)).
Participants' background information was collected after they had finished the video vignettes. These questions included (1) Are you an Emergency Medicine clinician or a paediatrician?; (2) How long have you been working as an Emergency Medicine clinician/paediatrician? (Options: <5 years; 5-10 years; 10-15 years; >15 years); (3) Have you ever missed a serious infection or recognised it too late? (Options: yes/no).

Definitions and outcome measures
All participants were informed about the predefined SBI definition in the letter for the study invitation: culture or radiographically proven bacterial infection (e.g. meningitis, sepsis, bacteremia, pneumonia, urinary tract infection, bacterial gastroenteritis, osteomyelitis or ethmoiditis). The outcome SBI in the vignettes was defined as management of the child as having a SBI.
Detailed descriptions of the Feverkidstool development and validation have been published earlier [19]. The originally reported discriminative ability of the model, according to the area under the receiver operating characteristic curve (AUC), was 0.81 (standard error 0.04) for predicting pneumonia and 0.86 (standard error 0.03) for other SBI [19]. As the Feverkidstool was based on a polytomous logistic regression model, two risk scores were calculated, one for pneumonia and one for other SBI (e.g. urinary tract infection). We used the higher of the two risk scores in the comparison with the VAS risk scores of the video vignettes. We dichotomised the outcome of performed diagnostics and/or therapeutics. This outcome was scored 'present' if participants ticked fluid trial, blood tests, chest radiography, lumbar puncture, urine dipstick, administration of oral/intravenous antibiotics and/or admission. When 'no action and/or discharge and/or antipyretics' was chosen, the outcome was scored as 'not present'.
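The two scoring rules described above (dichotomising the ticked items and taking the higher of the two model risks) can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code: the function and variable names are hypothetical, and the item labels simply mirror the questionnaire options listed in the Data collection section.

```python
# Items that count as a performed diagnostic/therapeutic action.
# Labels mirror the questionnaire options; the set name is hypothetical.
ACTION_ITEMS = {
    "fluid trial",
    "blood tests",
    "chest radiography",
    "lumbar puncture",
    "urine dipstick",
    "oral antibiotics",
    "intravenous antibiotics",
    "admission",
}


def action_present(ticked_items):
    """Return True ('present') if at least one action item was ticked.

    Ticking only 'no action and/or discharge' or 'antipyretic'
    yields False ('not present').
    """
    return any(item in ACTION_ITEMS for item in ticked_items)


def model_risk_for_comparison(risk_pneumonia, risk_other_sbi):
    """Return the higher of the two polytomous-model risk estimates."""
    return max(risk_pneumonia, risk_other_sbi)
```

For example, `action_present(["antipyretic"])` is scored as not present, whereas `action_present(["antipyretic", "blood tests"])` is scored as present.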
All vignettes had a statement on age, temperature and duration of fever. Abnormal clinical signs and symptoms were distributed amongst the different vignettes, with ten vignettes having one alarming sign, four vignettes with two alarming signs and seven vignettes having three or more alarming signs.

Statistical analysis
First, we assessed the range of estimated median risks by clinical judgement (VAS) and the risk with the added value of CRP. Second, we determined the patient characteristics that prompt SBI management using discrete choice experiment (DCE) analysis. Finally, we compared VAS risk scores with prediction model-based judgement (Feverkidstool).
DCEs are a quantitative approach to assessing preferences for, e.g., medical interventions and are increasingly used in health care [10]. In DCEs, it is assumed that important items influencing medical interventions, such as vital signs, can be described by their characteristics (i.e. attributes) [24]. Those characteristics are further specified by variants of those characteristics (i.e. attribute levels). A second assumption is that an individual's preference for a medical intervention is determined by the levels of those attributes [24]. We studied the clinical variables of the Feverkidstool (www.erasmusmc.nl/feverkidstool) as attributes of the decision whether or not to manage the febrile children in the vignettes as a SBI [13]. All DCE data were analysed by taking each choice between the two management alternatives as an observation. Using the Nlogit software (http://www.limdep.com/), the observations were analysed by a logit model. As there was a lack of diversity in the clinical variables 'oxygen saturation' and 'tachypnoea' between the vignettes, we could not analyse these variables. The variables tachycardia and prolonged capillary refill were combined into one clinical variable as their correlation was too high. The influence of the different variable coefficients was tested for statistical significance (p value ≤0.05). As no formal statistical methods to determine sample sizes for DCEs currently exist, our study aimed to reach at least 40 respondents, in line with previous studies [6,26].
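To make the logit model concrete, the following Python sketch shows how a binary logit turns attribute levels into the probability of choosing SBI management, and how the log-likelihood of observed choices is computed. It is a generic illustration under stated assumptions, not the Nlogit specification used in the study: the function names, attribute names and coefficient values are hypothetical and are not study results.

```python
import math


def choice_probability(coefficients, intercept, attributes):
    """Probability of choosing 'manage as SBI' under a binary logit model.

    coefficients and attributes are dicts keyed by attribute name;
    attribute values are the levels shown in one vignette.
    """
    utility = intercept + sum(
        coefficients[name] * level for name, level in attributes.items()
    )
    return 1.0 / (1.0 + math.exp(-utility))


def log_likelihood(coefficients, intercept, observations):
    """Sum of log-probabilities of the observed yes/no choices.

    Each observation is a pair (attributes, managed_as_sbi).
    """
    total = 0.0
    for attributes, managed_as_sbi in observations:
        p = choice_probability(coefficients, intercept, attributes)
        total += math.log(p if managed_as_sbi else 1.0 - p)
    return total


# Hypothetical coefficients, for illustration only (not study estimates).
example_coefficients = {"ill_appearance": 1.5, "fever_duration_days": 0.1}
example_vignette = {"ill_appearance": 1, "fever_duration_days": 2}
print(choice_probability(example_coefficients, -1.0, example_vignette))
```

Fitting the model amounts to finding the coefficients that maximise this log-likelihood over all observed choices; a larger positive coefficient means the attribute pushes the decision more strongly towards SBI management.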

Results
Of the 104 invited participants, 50.4 % agreed to participate and 42 (40.4 %) participants finished the online video vignettes. The 42 final participants included 83 % paediatricians and 17 % paediatric emergency medicine physicians. Fifty per cent of the participants had a working experience of more than 10 years. Almost half of the participants had at least once missed a serious infection or recognised it too late (Table 1).

Study intervention-video vignettes
In Table 2, clinical characteristics of the video vignettes are summarised. Median age of the children was 12.0 months (interquartile range (IQR) 2.0-72.0), 57 % were boys and the median C-reactive protein level (CRP) was 60 mg/l (IQR 10.0-110.0). Answers to the four questions of the video vignettes are summarised in Table 3. Forty-one per cent of the video vignettes were managed as having a SBI according to the participants. Diagnostics and/or therapeutics were started in 77 % of the video vignettes. Median risk before the knowledge of CRP (VAS 1) was 20.0 % (IQR 9.0-50.0), and with CRP information the risk (VAS 2) increased to 30.0 % (IQR 10.0-60.0). As CRP values were already available in the first video for vignettes 3 and 21, no change in risk could be measured for these vignettes. Details of performed diagnostics, therapeutics and follow-up are described in Table 4. More diagnostics and/or therapeutics were performed when the child was managed as SBI. Antipyretics were given in 65 % of the video vignettes, with no differences when stratifying by outcome (SBI M). In 94 % of the video vignettes that were managed as SBI, blood tests were done, and 71 % were hospitalised (Table 4).
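Summary statistics of the kind reported here (median with IQR) can be computed from raw VAS scores along the following lines. This is a minimal Python sketch, not the authors' analysis code: the function name and the example scores are hypothetical, and the quartile method ("inclusive") is an assumption, as the text does not state which method was used.

```python
import statistics


def median_iqr(scores):
    """Return (median, first quartile, third quartile) of a list of scores."""
    q1, _, q3 = statistics.quantiles(scores, n=4, method="inclusive")
    return statistics.median(scores), q1, q3


# Hypothetical VAS scores (0-100 %), for illustration only.
example_scores = [5, 10, 20, 40, 60]
print(median_iqr(example_scores))
```

Different quartile conventions can give slightly different IQR bounds on small samples, so the chosen method should be reported alongside the results.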

Clinical judgement versus different levels of CRP
In Fig. 1, the differences in clinical risk scores are plotted against different levels of CRP. The median clinical risk differences (VAS 2 minus VAS 1) were positively correlated with CRP level (SBI M yes: Pearson correlation 0.53 (p < 0.001) and SBI M no: Pearson correlation 0.68 (p < 0.001)). Risk scores of children initially classified as being managed as SBI were influenced only by high levels of CRP (>65 mg/l), whereas risk scores of children not initially managed as SBI were influenced by lower CRP levels (>40 mg/l) (Fig. 1).
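The correlation between the change in clinical risk score and the CRP level can be computed as in the following Python sketch. It is an illustration only: the function names are hypothetical, no study data are reproduced, and the standard sample Pearson formula is assumed.

```python
import math


def risk_differences(vas1_scores, vas2_scores):
    """Per-vignette change in clinical risk score after CRP is revealed."""
    return [v2 - v1 for v1, v2 in zip(vas1_scores, vas2_scores)]


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)
```

In use, `pearson_r(crp_values, risk_differences(vas1, vas2))` would be evaluated separately for vignettes with and without initial SBI management, where `crp_values`, `vas1` and `vas2` are placeholder names for the per-vignette data.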

Discrete choice experiment-video vignettes
The discrete choice experiment was based on 20 video vignettes, as the clinical variables of one video were too strongly correlated. Almost all clinical variables of the Feverkidstool could be tested with DCE analysis, except for CRP, oxygen saturation and tachypnoea. The ranking and coefficients of the variables influencing the management decision for febrile children according to the DCE analysis are presented in Table 5. All tested clinical variables significantly influenced the decision on management of febrile children. Ill appearance and the combined variable of prolonged capillary refill and tachycardia were the most influential factors, and age and duration of fever the least influential factors.

Risk scores video vignettes-risk scores Feverkidstool
The median clinical risk score (VAS 2) according to the participants amongst those video vignettes that were managed as SBI was 60.0 % (IQR 30.0-80.5), compared to a risk score according to the Feverkidstool of 12.7 % (IQR 7.7-28.1) (Table 6). When the video vignettes were not managed as SBI, the clinical risk score (VAS 2) amounted to 16.0 % (IQR 5.0-32.0), compared to a risk of 7.3 % (IQR 5.7-16.3) according to the Feverkidstool (Table 7). The largest differences between the clinical risk scores of the vignettes and the risk scores according to the Feverkidstool were seen for video vignettes with (various levels of) decreased consciousness or agitation. This item is clearly observed when watching the video vignettes, but this clinical variable is not included in the predictors of the Feverkidstool. Finally, no differences were found in median clinical risk scores when stratified by previously missed diagnoses of the participant (p = 0.218).

Main findings
This is the first study using real-life video vignettes to determine the febrile child characteristics that drive clinicians' management decisions. Clinicians assigned high clinical risk scores to the febrile children they would manage as SBI. All tested clinical variables of the Feverkidstool significantly influenced clinicians' management decisions for febrile children, with ill appearance and aberrant circulatory signs being the most important. Moderate CRP levels influenced risk scores in children who were initially not managed as SBI, whereas high CRP levels were needed to influence risk scores in children who were initially already managed as SBI. In children managed as SBI, risk thresholds judged by the clinician were higher than the predicted risk thresholds according to the Feverkidstool. Clinical risk thresholds of children not managed as having a SBI were more comparable to prediction model-based risk thresholds.

Comparison with literature
In this study, we aimed to gain insight into patient characteristics and contextual factors influencing management decisions for the febrile child at the ED. One way to approach this process of diagnostic reasoning is decision making [11]. Decision making has been influenced by statistical models of reasoning under uncertainty using pre- and post-test probability according to Bayes' theorem. This model deals with two major classes of errors in clinical reasoning: in the assessment of either pretest probability or the strength of the evidence [11]. Although the pretest probability of having SBI (prevalence of disease) depends on several factors, for example age and relevant medical history, the pretest probability determined by health care setting was considered stable in the vignettes. We therefore focused on clinicians' interpretation of the strength of evidence for the probability of a serious infection. For this decision process, we performed discrete choice experiment (DCE) analysis, which is an increasingly used method applied in studies where clinicians weigh clinical information in the diagnostic work-up [3].
In the literature on diagnostic reasoning, evidence-based medicine is the most successful educational method in the translation of statistical decision theory into clinical practice [25]. Within this translation, we aimed to elaborate on the determination of quantitative decision thresholds, which proved to be a complex topic. Most studies used optimised performance measures such as the area under the receiver operating characteristic curve (AUC) or sensitivity/specificity to establish these thresholds. Other studies described Delphi procedures to determine their clinically based cutoff points [5,18,20,22,32]. In our study, we described clinicians' assigned median risk estimates according to which patients would have been managed as SBI. We observed agreement between clinical and prediction model-based risk thresholds when clinicians decided not to manage the febrile child as a SBI. However, the clinical risk threshold to manage the child as SBI was much higher than that of prediction model-based judgement. This phenomenon is well recognised: as clinicians do not want to miss serious but treatable diseases, there is a tendency to overestimate the probability of these diseases [11].
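The Bayesian updating referred to above, in which a pretest probability is combined with the strength of the evidence to give a post-test probability, can be written in its odds form as a short Python sketch. The function name and the example numbers are hypothetical and are not taken from the study; the strength of evidence is expressed here as a likelihood ratio, which is one common formulation.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Update a pretest probability with a likelihood ratio (Bayes' theorem).

    post-test odds = pretest odds * likelihood ratio
    """
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if likelihood_ratio < 0.0:
        raise ValueError("likelihood ratio must be non-negative")
    pre_test_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical numbers, for illustration only.
print(post_test_probability(0.10, 9.0))
```

The two classes of error mentioned in the text correspond to the two inputs: a misjudged pretest probability or a misjudged likelihood ratio both shift the resulting post-test probability.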

Clinical and research implications
The most important finding of this study is the high risk scores clinicians assigned to those children whom they would have managed as SBI (median risk 60.0 % (IQR 30.0-80.5)). This observation contrasts with our hypothesis that very low risk thresholds might be chosen for specific diagnoses with high morbidity/mortality (e.g. meningitis). Apparently, clinicians create more dichotomous risk estimations (high risk or low risk) for the management of specific serious infections, with reassessment of risk estimates after every diagnostic step. Clinicians used a stepwise approach in the management of febrile children, rather than considering one risk threshold for SBI in general. We observed agreement in the predictive value of all tested clinical predictor variables in the detection of children with SBI, for both clinical and prediction model-based judgement. Clinicians were guided by ill appearance and aberrant circulatory signs in their febrile child evaluation, which were not the most influential factors according to the Feverkidstool. For the Feverkidstool, respiratory predictors such as chest wall retractions and oxygen saturation were more powerful influencing factors. Furthermore, we found that CRP levels influenced clinical risk scores differently in children with or without initial SBI management, with a higher influence of clinical factors than of CRP value. In our study population, this approach was not enhanced by experiences of errors in the past. These insights into influencing factors in the clinical prediction of febrile children at risk for SBI help us to understand, review and evaluate clinical management decisions. Clinical risk thresholds of children who were not managed as having a SBI were more comparable to prediction model-based risk scores, ranging from 7 to 16 %. We might have to conclude that this risk threshold is justified as a SBI rule-out threshold, but no agreement can be defined on rule-in thresholds, as there appears to be too much difference between the prediction model and the clinical stepwise risk assessment in children managed as SBI.

Strengths and limitations
The main strength of this study is the use of real-life videos instead of paper-case patients. This approach is a more representative way of portraying real life, and there is an evolving evidence base on the use of patient video cases as educational interventions [8,23].
A second strength of the study is the use of the Feverkidstool as an arithmetic model against which to compare the subjective overall assessment of the clinician when evaluating the febrile child. In a review describing vignette studies on medical decision behaviour, it was concluded that most studies on this topic did not compare their results to some sort of normative benchmark [3]. Moreover, the role of prediction models becomes greater, as clinicians may increasingly rely on alarming signs and symptoms described in (inter)national clinical guidelines and prediction models due to decreasing incidence of SBI. However, there was a discrepancy in risk assessment for some video vignettes (e.g. vignettes 7, 11 and 18), probably due to the absence of variables such as decreased consciousness or agitation in the Feverkidstool.
There are some other limitations in this study. Videos still lack some aspects of real life, such as observation time or concise descriptions of patients' history. However, from the literature, we know that more detailed case descriptions will be assigned a higher subjective probability of disease than a brief abstract of the same case, even if they contain the same disease information [11]. Another limitation is that some clinical variables were determined by the clinicians' judgement (ill appearance, chest wall retractions and capillary refill time). In this way, misclassification of these clinical predictors could have occurred. However, this approach does reflect clinical practice and therefore may strengthen the generalisability of our results.
Next, the DCE analysis had to be performed with a limited number of available video vignettes. As a consequence, we were forced to exclude or merge some predictor variables (e.g. oxygen saturation and tachypnoea) to meet DCE design requirements. Furthermore, although a response rate of 50 % for clinicians was similar to other DCE studies, this response rate is not optimal [2,9,29]. However, due to the experienced background of all participants, we assume limited answer variability, resulting in representative study results.

Conclusion
In this study on real-life video vignettes, we observed high risk scores in clinicians' risk estimation of SBI management in febrile children, and these risks are mostly influenced by the clinical characteristics ill appearance and aberrant circulatory signs. Uniform risk thresholds at which one should start SBI management in febrile children remain unclear, as the concept of clinicians' dichotomous risk thresholds was hardly comparable to the overall SBI risk assessment of the prediction model. However, more consistent results were found for clinical and prediction model-based risk thresholds at which we refrain from SBI management in the febrile child visiting the emergency department.