Background

The pressure on the out-of-hours (OOH) healthcare services, i.e. OOH primary care (OOH-PC), emergency departments (EDs), and prehospital emergency medical services (EMS), is increasing in many countries [1, 2]. Telephone triage plays a pivotal role in managing patient flows and workload [1, 3, 4]. Securing safe and efficient telephone triage is a challenge, as it must keep undertriage to a minimum to ensure high patient safety while keeping overtriage at an acceptably low level. Existing OOH-PC services vary and use different triage models [5, 6], and the involvement of general practitioners (GPs) is debated [1, 4, 7]. Many countries experience an increasing shortage of GPs [8, 9], and GPs report high self-perceived stress and multiple burnout symptoms [10, 11]. In most countries, telephone triage in OOH-PC services is performed by nurses using a computerised decision support system (CDSS) [6]. In Denmark, GPs primarily perform the telephone triage [3].

Previous studies have explored the safety and efficiency of telephone triage in OOH-PC services [12,13,14,15,16,17,18,19,20,21,22,23]. Some have questioned the safety of telephone triage conducted by nurses [13, 24], especially for high-risk calls [24]. Newer studies suggest that nurse triage is safe [1, 12, 20, 25], and concerns mainly regard efficiency [1, 26]. However, most previous studies have described only nurse-led telephone triage conducted in study settings using vignettes [17, 18], simulated patients [13, 15, 19], or review of patient records [20]. This approach has provided little uniformity of outcome measures regarding accuracy of triage, patient safety, and efficiency [16]. Moreover, comparative studies of nurse- and GP-led telephone triage are sparse and mostly describe the quality of telephone triage in daytime [25, 27] rather than OOH [12]. To our knowledge, no existing studies have compared telephone triage by physicians with different specialities. Consequently, comparative studies of the quality of OOH telephone triage by GPs, nurses, and physicians with different specialities in natural settings with real patient calls are needed.

After a reorganisation in 2014, two organisations for OOH-PC exist side by side in Denmark: one with nurse-led telephone triage using CDSS and physician-led triage, and one with GP-led telephone triage. This situation made it possible to explore the quality of the two OOH telephone triage models in a natural setting. In this study, we aim to explore and compare the safety, efficiency, and health-related quality of telephone triage at OOH-PC services performed by GPs, nurses using CDSS, or physicians with different medical specialities.

Methods

Design and setting

We conducted a natural quasi-experimental study in two OOH-PC services in Denmark. We selected the general practitioner cooperative (GPC) in the Central Denmark Region, which uses GP-led telephone triage, and the medical helpline 1813 (MH-1813) in the Capital Region of Denmark, which uses telephone triage performed by registered nurses with a CDSS and by physicians with different medical specialities (see Table 1).

Table 1 Description of the OOH organisations in two included telephone triage models

In 1992 a reform introduced large-scale GPCs, with GP specialists performing telephone triage [3]. In the Capital Region of Denmark, a reorganisation in January 2014 formed the MH-1813, where nurses answer calls using a CDSS with the option to redirect calls to physicians on duty. All triage nurses are registered nurses, which indicates a completed 3.5-year professional bachelor’s degree, and have completed a 6-week introductory course when employed at MH-1813; MH-1813 also conducts regular audits of nurse calls. Besides answering the redirected calls from nurses, physicians answer approximately one third of all calls to the MH-1813 directly. Physicians employed at MH-1813 have different medical specialties (e.g. internal medicine, pediatrics, anesthesiology, surgery) and varying experience (including junior physicians), with only a minority being GPs. We refer to this group as physicians in the rest of the article. The CDSS is also accessible for physicians, without an obligation to use it [Personal communication with MH-1813] [31]. The GPC and the MH-1813 are open outside office hours, i.e. on weekdays from 4 pm to 8 am, weekends all day, and national holidays, offering telephone consultations, clinic consultations and home visits. The MH-1813 is accessible 24 h per day, but only calls outside office hours were included to match the GPC opening hours. The OOH-PC services routinely audio-record all calls and have an administrative registration system. We were unable to access patient information on ethnicity, educational level, socio-economic status, or comorbidity.

Selection of calls

We aimed to include an equal distribution of calls triaged by GPs, nurses, and physicians. For our power calculation, we used the level of undertriage, as this potentially has the most clinical implications. Based on the literature, we assumed an undertriage rate of approximately 9.5% and required the ability to detect a 5% difference in undertriage between triage professionals with a power of 0.8 and an alpha of 0.05. Thus, 435 calls per group of triage professionals were needed. All calls answered directly by a triage professional at GPC or MH-1813 outside office hours during the inclusion period (MH-1813: 23 November to 8 December 2016; GPC: 23 November to 7 December 2016) were eligible (Fig. 1). For calls redirected by a nurse to a physician at MH-1813, only the part conducted by the nurse was eligible. Based on available registration information, we selected eligible calls (Table 2). From all eligible calls, we randomly selected 500 calls per triage professional group, matching the overall distribution of day of week (i.e. weekend/not weekend) and time of day (i.e. day, evening, night), using STATA. We selected 525 GPC calls, as we expected more exclusions due to the lack of a separate direct telephone number for nursing homes. Each selected call had a unique identification number that was used to identify the corresponding audio-recorded call.
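The power calculation above can be illustrated with a minimal sketch, assuming the standard normal-approximation formula for comparing two independent proportions without continuity correction. The function name `calls_per_group` and the two readings of the "5% difference" (five percentage points below or above 9.5%) are assumptions made for illustration. The text does not state the exact formula, software options, or direction of the difference, so this sketch need not reproduce the reported 435 calls per group.

```python
from math import ceil
from statistics import NormalDist


def calls_per_group(p_ref, p_alt, alpha=0.05, power=0.80):
    """Sample size per group for a two-sided comparison of two proportions.

    Normal approximation, unpooled variance, no continuity correction.
    Hypothetical helper; not necessarily the method used in the study.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    variance = p_ref * (1 - p_ref) + p_alt * (1 - p_alt)
    n = (z_alpha + z_beta) ** 2 * variance / (p_ref - p_alt) ** 2
    return ceil(n)


# Assumed reading 1: 9.5% undertriage versus a rate 5 percentage points lower.
n_lower = calls_per_group(0.095, 0.045)
# Assumed reading 2: 9.5% undertriage versus a rate 5 percentage points higher.
n_higher = calls_per_group(0.095, 0.145)
print(n_lower, n_higher)
```

Under these assumptions, detecting a lower rate needs fewer calls per group than detecting a higher rate, because the variance term shrinks as the proportions approach zero.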

Fig. 1

Flowchart of selection and exclusion of calls from the GPC and MH-1813. Note: For definition of exclusion criteria see Table 2; £More calls were selected from the GPC, to account for the higher expected number of calls from other health professionals

Table 2 Exclusion criteria

Three master's students of medicine used beep tones to mask triage profession, OOH organisation, and patient identification information in the audio recordings. These medical students were trained, and DSG supervised each student in the masking and exclusion process for the first 20 calls. If a call fulfilled the exclusion criteria, or if the student was in doubt whether it did, the final decision on exclusion was made by the first author (DSG) or, if DSG was in doubt, by consensus between DSG and AFP. Due to an unforeseen partial failure of the IT system at MH-1813 lasting 3 days, we were unable to retrieve the audio recordings of 194 selected calls (22% of all MH-1813 calls). We substituted these with randomly selected calls from the following week, matching on day of week and time of day.

Assessment tool

Assessments were performed using the tool “Assessment of Quality in Telephone Triage” (AQTT) and the accompanying rating manual printed in a booklet (Appendices 1 and 2 provide an overview of the 24 items and the general rating scale for most specific items). The AQTT was thoroughly developed and tested, with satisfactory inter-rater agreement when distinguishing poor from sufficient performance [32]. The AQTT comprises 24 items: eleven specific items assessing the health-related quality, nine specific items assessing the quality of communication, and four overall items capturing the assessors’ general perception of the quality of communication, health-professional quality, patient safety, and efficiency. The majority of items are rated on a 5-point Likert scale with an additional category “not applicable” (“n/a”), used if an item is correctly found not relevant or if the available information is insufficient for assessment. The accuracy of the triage decision (item 11) is assessed on a 7-point scale to differentiate between levels of undertriage and overtriage (defined in footnotes of Table 6). The AQTT provides explicit definitions of when to apply the specific ratings for each item, including when to score “n/a”. Overall items are measured on a 10-point visual analogue scale, representing the general perception of the assessor after scoring all specific items. We present results on the eleven health-related items and three overall items (Table 3).

Table 3 Overview of specific health-professional items and items assessing overall quality

Assessment panel

We recruited 24 physicians for the assessment panel among triage professionals from the GPC and MH-1813 using two inclusion criteria: > 1 year experience and currently active in telephone triage in OOH-PC. An email invitation was sent to all GPs and physicians by their organisers. Using STATA we randomly selected 16 GPs from the 56 interested GPs at the GPC, matching age and sex distribution. At the MH-1813, we included all eight physicians fulfilling our inclusion criteria from the ten interested physicians. All assessors followed a two-day training course providing knowledge on telephone triage and communication, introducing the AQTT and rating manual, and assessing triage calls individually and in plenary, focusing on achieving consistency.

Assessment process

After collection, we renamed all audio-recorded calls and distributed them at random to the assessment panel, regardless of OOH service, with one assessor per call. Thus, each assessor assessed calls by all triage professionals. Information on age and sex of the patient, day of week, and the time of each call was available. Assessors made their assessments at home; each assessed a median of 53 (range: 48 to 61) calls during a median period of 111 days.

Statistical analyses

For health-related specific items, we categorized the outcomes into poor quality (rated “1” or “2”) and sufficient quality (rated “3”, “4”, or “5”). “Not applicable” (“n/a”) was recoded into “missing”. Accuracy of triage decision (item 11) was categorised into clinically relevant undertriage (rated “1” or “2”) and clinically relevant overtriage (rated “6” or “7”). These categorizations were based on the satisfactory inter-rater agreement of the AQTT [32].
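The recoding rules above can be written out as a minimal sketch. The function names and the string labels are hypothetical, and the label "neither" for ratings 3 to 5 on item 11 is an assumption covering all calls without clinically relevant under- or overtriage.

```python
def dichotomise_item(rating):
    """Recode a specific-item rating: 1-2 -> 'poor', 3-5 -> 'sufficient'.

    'n/a' is treated as missing and returned as None.
    """
    if rating == "n/a":
        return None
    if rating in (1, 2):
        return "poor"
    if rating in (3, 4, 5):
        return "sufficient"
    raise ValueError(f"unexpected rating: {rating!r}")


def classify_triage_accuracy(rating):
    """Recode item 11 (7-point scale): 1-2 -> undertriage, 6-7 -> overtriage.

    Ratings 3-5 are labelled 'neither' (an assumed label); 'n/a' is missing.
    """
    if rating == "n/a":
        return None
    if rating in (1, 2):
        return "undertriage"
    if rating in (6, 7):
        return "overtriage"
    if rating in (3, 4, 5):
        return "neither"
    raise ValueError(f"unexpected rating: {rating!r}")


# Hypothetical ratings, for illustration only.
print([dichotomise_item(r) for r in (1, 3, "n/a", 5)])
print([classify_triage_accuracy(r) for r in (2, 4, 7)])
```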

We used descriptive analyses to describe patient and call characteristics stratified by triage professional group. We conducted an overall comparison of patient and call characteristics using the chi-squared test for categorical variables and the Kruskal-Wallis test for continuous variables (significance level < 0.05). In case of a significant difference, we conducted post-hoc pairwise comparisons using the chi-squared test for categorical variables and the Mann-Whitney U-test for continuous variables with a Bonferroni-adjusted significance level (< 0.025). We also used descriptive analysis to describe the ordinal-scaled health-related specific items, excluding the rating “n/a” from our analyses. We calculated the relative risk (RR) of having poor quality (i.e. rated “1” or “2”) versus sufficient quality (i.e. rated “3”, “4” or “5”) on the health-related specific items and of clinically relevant undertriage or overtriage (vs. not clinically relevant undertriage or overtriage) for the three groups of triage professionals, using binomial regression. All comparative analyses were conducted pairwise using GP-led triage as reference group. The items measuring the overall perceived quality were compared between triage professionals using the non-parametric Mann-Whitney U-test (rank-sum), as most distributions did not follow a normal distribution.
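To illustrate what a crude RR with GP-led triage as reference expresses, the following minimal sketch computes the ratio of two proportions with a 95% confidence interval on the log scale. This is a plain two-by-two calculation swapped in for illustration, not the binomial regression run in STATA, and it cannot reproduce the adjusted estimates. The function name and the counts are hypothetical and are not study data.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def relative_risk(events_group, n_group, events_ref, n_ref, level=0.95):
    """Crude relative risk of a group versus a reference group.

    Returns (rr, lower, upper), with the interval computed on the log scale
    from the usual large-sample standard error of log(RR).
    """
    rr = (events_group / n_group) / (events_ref / n_ref)
    se_log_rr = sqrt(
        1 / events_group - 1 / n_group + 1 / events_ref - 1 / n_ref
    )
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    lower = exp(log(rr) - z * se_log_rr)
    upper = exp(log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts: 20 poor-quality calls out of 400 in the compared group
# versus 40 out of 400 in the reference group.
rr, lower, upper = relative_risk(20, 400, 40, 400)
print(f"RR = {rr:.2f}, 95% CI: {lower:.2f}-{upper:.2f}")
```

An interval that excludes 1 corresponds to a statistically significant difference from the reference group at the chosen level.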

We noticed a tendency for assessors from the GPC (i.e. GPs) to overestimate the quality of GP-led triage compared with assessors from MH-1813, as well as the reverse: assessors from MH-1813 overestimated the quality of physician-led triage compared with GP assessors. We concluded that a “similar-to-me” bias was present in the data, i.e. assessors giving a slight bonus to triage led by a triage professional similar to themselves [33]. Since the dataset is unbalanced (GPC: 16 vs. MH-1813: 8) and, more importantly, since nurses could never receive such favorable assessment, we decided to adjust the RR estimates of poor quality and of clinically relevant under- and overtriage for whether or not the assessor had the same professional background as the triage professional. All analyses were performed in STATA 14.2 (StataCorp. 2015. Stata Statistical Software: Release 14.2. College Station, TX: StataCorp LP).
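The adjustment variable described above can be sketched as a simple indicator per assessed call. The names and the mapping are assumptions for illustration: assessors recruited from the GPC are treated as GPs and assessors recruited from MH-1813 as physicians, so the indicator can never be true for a nurse-led call. How the indicator entered the regression model is not specified in the text and is not shown here.

```python
# Assumed mapping from the assessor's OOH service to professional background.
ASSESSOR_BACKGROUND = {"GPC": "GP", "MH-1813": "physician"}


def same_background(assessor_service, triage_professional):
    """True if the assessor shares the triage professional's background."""
    return ASSESSOR_BACKGROUND[assessor_service] == triage_professional


# Hypothetical calls: (assessor's service, triage professional on the call).
calls = [("GPC", "GP"), ("GPC", "nurse"), ("MH-1813", "physician")]
print([same_background(service, prof) for service, prof in calls])
```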

Results

Population

In our final analyses, we included 423 calls triaged by GPs, 430 by nurses, and 441 by physicians of different medical specialties (Fig. 1). No differences in triage calls were identified between GPs and nurses or between GPs and physicians concerning patients’ age and sex and time of call (Table 4). An explorative analysis comparing calls of nurses and physicians revealed a significant difference in patients’ sex (p = 0.006; not shown in table). Nurse telephone calls were significantly longer (mean = 4 min 44 s, SD: 168 s) than calls triaged by GPs (mean = 2 min 57 s, SD: 105 s) and physicians (mean = 4 min 1 s, SD: 146 s).

Table 4 Baseline distribution of patient and call characteristics, stratified by triage professional group

Health-related specific items

Figure 2 shows the distribution of ratings for each group of triage professionals, with varying use of “n/a” between items and between triage professionals. For four items the RR of poor quality was significantly lower for nurses compared with GPs: “asks to speak to patient” (RR = 0.68, 95% CI: 0.52–0.89), “identifies problems” (RR = 0.66, 95% CI: 0.52–0.83), “asks essential questions” (RR = 0.77, 95% CI: 0.63–0.94), and “asks about medical history” (RR = 0.82, 95% CI: 0.68–0.97) (Table 5). Physicians had a significantly higher RR of poor quality than GPs for four items (i.e. 6, 7, 8, 9). Table 5 additionally shows the RR estimates adjusted for assessor background (GPC, MH-1813) (i.e. similar-to-me) and for the uneven composition of the assessment panel (assessors from GPC:MH-1813, 16:8).

Fig. 2

Distribution of assessments when item was applicable. Note: Distribution of ratings for each specific health-related item. When an item was scored as “not applicable”, the call was excluded from the distribution for that particular item. Items 1 and 2: The scale for items 1 and 2 ranges from only one to three, as performance can only be insufficiently performed or performed but with no possibility to excel (thus, “good” or “optimal” performance is not possible). Item headlines in abbreviated form. For full length headlines, see Table 4

Table 5 Assessment of percentage poor and relative risk (RR) of poor quality of health-related items for different triage professionals

Accuracy of triage outcome

Only 3.7% of calls triaged by nurses were clinically relevant undertriaged, whereas GPs (7.3%) and physicians (6.1%) had higher percentages (Table 6). Consequently, the risk of clinically relevant undertriage was significantly lower for nurses compared to GPs (RR = 0.51, 95% CI: 0.28–0.93). Compared to GP-led triage, the risk of being clinically relevant overtriaged was significantly higher for nurse-led (RR = 2.13, 95% CI: 1.22–3.73) and physician-led triage (RR = 1.93, 95% CI: 1.10–3.39).

Table 6 Assessed triage decision and relative risk (RR) of optimal triage, undertriage and overtriage for triage professionals

Overall perceived quality

The overall perceived health-professional quality and efficiency of telephone triage was significantly lower for both nurses and physicians compared with GPs (Table 7). The overall perceived patient safety was significantly lower for physicians compared with GPs.

Table 7 Assessed overall health-related quality, safety, and efficiency per triage professional

Discussion

Principal findings

We found a significantly lower risk of poor quality for nurse triage compared to GP triage in four out of ten health-related items that focus on identifying and uncovering the problem and requesting to talk directly to the patient. In four out of ten items, the risk of poor quality was significantly higher in calls triaged by physicians with different medical specialities compared to GPs. The risk of clinically relevant undertriage was significantly lower for nurses compared to GPs. However, compared to GPs, both nurses and physicians had significantly more clinically relevant overtriage. In addition, the calls were significantly longer for nurses compared to GPs, and the overall perceived efficiency was significantly higher in GP-led telephone triage compared to nurse-led and physician-led triage. The overall perceived safety was significantly lower in physician-led triage and tended to be higher in nurse-led triage compared to GP-led triage.

Strengths and weaknesses of the study

To our knowledge, this is the first study to compare the quality of OOH telephone triage performed by GPs, nurses using CDSS, and physicians in a real-life setting. Major strengths are the use of randomly selected real-life calls as opposed to the constructed setup used in previous studies [18,19,20, 34, 35] and the assessment of a range of outcome measures. Additional strengths are the study size with 1294 calls and the meticulous assessment process using the validated AQTT tool combined with a comprehensive rating manual that included clear definitions per answering category for each item, thus reducing the subjectivity of the assessments.

Our study also had some limitations. Multiple assessors per call would have been preferable, but due to the thorough assessment process, this was not feasible. Thus, each call was only assessed by a single assessor. We took several precautions to ensure consistency of assessments; the assessors followed a comprehensive training course, assessments followed the carefully developed and validated AQTT [32], and audio-recordings were attempted masked for information about organisation and triage professional. Moreover, in comparative analyses we dichotomised ratings (distinguishing poor from sufficient quality), which was supported by the satisfactory inter-rater agreement of the AQTT [32].

Post-hoc sensitivity analyses revealed a similar-to-me cognitive bias [33], indicating that the risk of poor quality tended to be assessed as lower when the assessor had the same professional background as the triage professional. Furthermore, the decision to include only physicians (GPs from GPC and physicians from MH-1813) in the assessment panel may have induced cognitive bias when assessing nurse-led triage. We chose these assessors as no consensus exists on the best professional for assessing quality of telephone triage [13,14,15, 17, 36], and physicians or GPs have most frequently been used in other studies [13,14,15]. Moreover, our assessment panel was unbalanced, with more assessors from the GPC than from MH-1813 (16:8). We adjusted for the similar-to-me bias and for the uneven distribution of assessors. The adjusted RRs of poor quality and of clinically relevant undertriage and overtriage generally favour nurse triage, with lower RRs of poor quality. The adjusted RRs were comparable to the crude estimates but point towards a smaller difference between GPs and physicians for most items. However, the use of the non-parametric rank-sum test for the overall perceived quality items did not allow these adjustments. As these items encompass a high level of subjectivity, we assume that adjustment for these factors may have increased differences between the triage professionals.

No differences in calls were seen between the compared groups concerning age, sex, and time of call. We know that populations in the different regions differ, as the percentage of immigrants and the level of education are higher in the Capital Region (MH-1813) [28]. If these differences also exist for callers to the OOH services, this could potentially result in a case mix with different levels of difficulty in triage contacts. Moreover, data on other factors like co-morbidity and socioeconomic status were regrettably not available. In addition, some items had considerable proportions of “n/a” assessments, as intended, with significant differences between triage professionals in four items. Thus, some differences in case mix cannot be ruled out and should be considered, especially when interpreting comparisons based on a small number of calls. Furthermore, we did not have access to background characteristics of the triage professionals, such as age, gender, experience, and education. The management of “n/a” was ambiguous, as it could reflect a correct performance (i.e. “correctly found not relevant”) but could also cover a poor performance (i.e. “available information is insufficient for assessment”). In the testing of the reliability of AQTT, “n/a” was recoded into “3”, but for the purpose of this paper, we chose to exclude “n/a”. Managing “n/a” as “sufficient quality” could overestimate the quality. In a post-hoc sensitivity analysis of the inter-rater reliability (ICC), excluding “n/a” changed the reliability only slightly, and always towards higher reliability. In the analyses we have performed many tests, so significance by chance cannot be excluded. A solution could be adjusting significance levels by Bonferroni consistently throughout all analyses, but this has been suggested to be too conservative and associated with increased risk of type-2 errors [37].

Interpretation and comparisons of results

Our study revealed that the quality of nurse-led triage using CDSS was higher than GP-led triage for most items and tended to be lower for physicians. However, we cannot say whether these differences are attributed to (non-)use of CDSS, differences in educational background, personality, and/or organisational conditions. CDSSs are developed to support health professionals in asking all essential questions [38] and ensuring consistency [39]. This corresponds to our finding that nurses are better at identifying and uncovering the problems. The differences between physicians and GPs, who did not use CDSS, could suggest that the medical background may be of relevance. The better ability of GPs to prioritise the problems and collect sufficient and complete information compared to physicians with different medical specialities could be attributed to GPs having more experience with similar unvisited patient populations in the daytime.

The rate of cumulated undertriage was 10.3% for nurses, 17.8% for GPs, and 17.8% for physicians, which is in line with other studies of nurse triage in controlled settings (12 to 41%) [13, 17, 18, 40]. To our knowledge undertriage has not been explored in GP triage. Two large-scale register-based randomised controlled trials comparing GP- and nurse-led telephone triage in daytime [25] and OOH [12] also suggested that nurse-led triage is safe, finding no excess deaths, hospital admissions, or increased ED attendance attributable to nurse-led triage.

Efficient OOH telephone triage incorporates multiple indicators, including overtriage and length of call. We found that the rate of cumulated overtriage was lowest in GP triage (GP: 11%, nurse: 23%, physician: 20%). The overtriage rate in other studies ranges from 12.5 to 19.3% in nurse-led triage [13, 17, 18]. Telephone calls triaged by nurses were significantly longer than calls triaged by GPs, which is supported by one study [41] but contradicted by another [42]. The interpretation of the length of a call is ambiguous. A longer call that sufficiently resolves the problem may be more efficient than a shorter call that does not, as the latter may lead to a new contact.

Future research and practical implications

Our results show that decision-makers should be aware that different triage professionals can cause differences in the quality of telephone triage and may influence the distribution of workload in primary and secondary OOH services. Nurse-led triage as a solution for high GP workload seems feasible, but further research is needed in this field as fewer GPs are required in telephone triage but more GPs may be needed in face-to-face consultations.

Future research should compare the long-term outcomes following a telephone call to OOH primary care related to safety (e.g. mortality, hospital admission rates, and adverse events), efficiency (e.g. influence on GP workload, workload in the OOH services, and follow-up contacts), and patient satisfaction. Additionally, future research should investigate influence of using a CDSS and factors associated with potentially unsafe and inefficient calls, including the characteristics of the triage professional and the type of call.

Conclusion

Keeping limitations in mind, our explorative study indicated that nurses using CDSS performed better than GPs in telephone triage, especially in four out of ten specific health-related items concerning identification and uncovering of the problem. Moreover, nurse-led triage was characterised by a lower level of clinically relevant undertriage, but more clinically relevant overtriage, and was perceived less efficient compared to GP-led triage. Calls triaged by physicians with different medical specialities were perceived less safe and less efficient compared to GPs and tended to receive lowest ratings on most specific items. The use of different triage professionals can influence the quality of telephone triage, and may influence the distribution of workload in primary and secondary OOH services. Future research could compare the long-term outcomes following a telephone call to OOH-PC related to safety and efficiency.

Definitions

“Health-related quality”: the term health-related quality refers to the measured quality in the specific items (used in the red specific items in appendix 1).

“Health-professional quality”: the term health-professional quality refers to the measured quality exclusively in item 22 assessing the overall perceived health-professional quality.