Background

Non-attendance, defined as a missed appointment without prior notification, is an important obstacle to the adequate management of healthcare centers. High non-attendance rates are associated with increased waiting lists and healthcare and societal costs, as well as reduced effectiveness and efficiency of the healthcare system [1, 2]. At the patient level, missed appointments may lead to inadequate follow-up and late diagnosis or complication management, thus increasing the health risk of non-attendees. Reported non-attendance rates worldwide are highly heterogeneous and range from 13.2% (average across countries in Oceania) to 43.0% (Africa); the estimated average rate in Europe is 19.3% [3].

Various authors have proposed interventions to reduce the harmful effects of non-attendance, such as overbooking [4] and open access [5], or to improve attendance rates directly, for example, by providing information, reminders, and incentives to patients [6,7,8]. Among these, appointment reminders based on short message service (SMS) and telephone calls have been widely used [9,10,11]. Although current evidence suggests equal effectiveness of both interventions, reported results are heterogeneous, and most studies have low-quality designs [10].

Regardless of the reminding strategy, identifying patients at higher risk of non-attendance may reduce costs and resources, thus increasing the sustainability of the intervention. The determinants of non-attendance are complex and may include patient-related factors (e.g., age and gender), their previous attendance history, and factors associated with the given appointment (e.g., time elapsed since the scheduling date, and the weekday and season of the appointment) [12, 13]. In the last few years, a growing number of models for predicting no-shows have been proposed; however, most of them achieved an accuracy lower than the attendance rate [14]. The poor performance may be attributed to multiple factors that challenge model development, such as the type of data available or the sample size. Furthermore, the high variability of non-attendance rates worldwide suggests that behavioral determinants of non-attendance and the effectiveness of mitigating measures may depend on the country and healthcare system organization. Unlike traditional statistics for predicting outcomes, which rely on a predetermined equation as a model, machine learning algorithms adaptively improve their performance as the number of samples available for learning increases. These techniques are particularly suitable for predicting complex outcomes, such as those that depend on human behavior [15, 16].

Therefore, we aimed to develop a machine learning model for predicting patients’ non-attendance and assess the effectiveness of selective phone calls to patients at high risk of non-attendance according to the resulting model.

Methods

Overview of study design

This study was conducted at two outpatient services (i.e., dermatology and pneumology) of the Hospital Municipal de Badalona (Spain) and included three stages: (1) the development of a non-attendance predictive model for each outpatient service, (2) the prospective validation of the resulting models, and (3) a pilot study to assess the effectiveness of integrating the predictive model into the organization of the healthcare provider.

Candidate models were developed using retrospective data from appointments scheduled between January 1, 2015, and November 30, 2018. Data were randomly assigned to one of the following two sets: 75% of the collected data were used for model building and algorithm training, and the remaining 25% were used in a retrospective validation of the model. The predictive capacity of the selected model was then validated prospectively using data from appointments scheduled between January 7 and February 8, 2019. Finally, we conducted a pilot study to assess the effectiveness of a preventive intervention based on selective phone call reminders to patients identified as high-risk of non-attendance according to the selected model. The pilot study was conducted between February 25 and April 19, 2019.
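The 75/25 random split described above can be sketched as follows. This is a minimal illustration in Python (the study's analyses were performed in R), and the use of integer appointment IDs is a hypothetical stand-in for the real appointment records:

```python
import random

def split_appointments(records, train_fraction=0.75, seed=42):
    """Randomly assign records to a training set and a held-out
    retrospective validation set."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    shuffled = records[:]      # copy so the input order is untouched
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Example with hypothetical appointment IDs
train, test = split_appointments(list(range(1000)))
```

Holding the 25% subset out entirely during training is what allows the retrospective validation to estimate out-of-sample performance.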

All data, including retrospective information for model building and prospective information of the pilot study, were collected in a pseudonymized way and handled according to the General Data Protection Regulation 2016/679 on data protection and privacy for all individuals within the European Union and the local regulatory framework regarding data protection. The pilot study included in this report was not intended to change biomedical or health-related outcomes; therefore, the research committee of Badalona Serveis Assistencials waived the need for ethics committee approval.

Variables collected for model development and validation

We collected three types of variables from the Electronic Medical Record database: sociodemographic characteristics of patients, characteristics of the appointment, and history of patients’ attendance. Sociodemographic characteristics included gender, age, nationality, marital status, and home address, which was used to calculate the distance from the patient’s home to the hospital. Characteristics of the appointment included hour, weekday, month, type of visit (first, second, successive), the reason for the visit, treatment category, physician, lead time (days since scheduling until the appointment date), and rescheduling. Variables regarding the record of patient’s attendance included the history of previous attendance, number of prior visits, days since the last appointment, and the last appointment status.

Predictive model development and validation

We conducted bivariate analyses to identify relationships between the available variables and non-attendance and correlations between covariates to rule out strong interactions. All variables with a significant association with non-attendance were included in training algorithms based on the following models: decision trees, XGBoost, Support Vector Machines (SVM), and k-nearest neighbor (kNN). For each learning algorithm, a 5-fold cross-validation and a grid search for hyperparameter optimization was used in the training, considering all significant variables. Class imbalance (approximately, 80% of attendees and 20% of non-attendees) was addressed by stratified random sampling.
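The stratified random sampling used to address class imbalance can be sketched as follows; this is an illustrative pure-Python version (the study used R), with class labels chosen to mirror the approximate 80/20 attendee/non-attendee split described above:

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split indices so that each class keeps its original proportion
    in both subsets (stratified random sampling)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = int(len(idxs) * train_fraction)
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return train, test

# ~80/20 class imbalance, as in the study's retrospective data
labels = ["attended"] * 800 + ["missed"] * 200
train_idx, test_idx = stratified_split(labels)
```

Stratifying guarantees that the minority class (non-attendees) is represented in the same proportion in every fold, which keeps the cross-validation estimates stable.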

The performance of the obtained model was retrospectively assessed using the dataset reserved for this purpose. Because the model was intended to identify patients at high risk of non-attendance, specificity, defined as the proportion of real non-attendees correctly identified by the algorithm as high-risk, was used for measuring performance. Sensitivity (i.e., the proportion of real attendees correctly identified as low-risk) and accuracy (i.e., the proportion of appointments predicted correctly) were also estimated. Model selection was based on a balance between (1) maximizing specificity and accuracy and (2) the explainability and interpretability of the algorithms. The model performance in predicting non-attendance was prospectively validated using the same definitions of performance as for the retrospective validation. The only exception was considering balanced accuracy instead of raw accuracy because of class imbalance in the prospective validation.
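Under the definitions above (note that the paper applies the terms specificity and sensitivity to the non-attendance and attendance classes, respectively), the metrics can be computed from the four cells of the contingency table. A minimal sketch, with the cell names as assumptions:

```python
def performance(tp, fn, tn, fp):
    """Performance metrics as defined in the text, taking non-attendance
    as the class of interest:
      tp: non-attendees flagged high-risk   fn: non-attendees flagged low-risk
      tn: attendees flagged low-risk        fp: attendees flagged high-risk
    """
    specificity = tp / (tp + fn)   # correctly identified non-attendees
    sensitivity = tn / (tn + fp)   # correctly identified attendees
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced = (specificity + sensitivity) / 2
    return specificity, sensitivity, accuracy, balanced

# Toy contingency table (hypothetical counts)
spec, sens, acc, bal = performance(tp=80, fn=20, tn=160, fp=40)
```

Balanced accuracy averages the two class-wise rates, so it is not inflated by the majority (attendee) class the way raw accuracy can be.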

Pilot study

The pilot study included all consecutive patients with at least one appointment scheduled between February 25 and April 19, 2019, in either of the two involved services. The primary endpoint of the pilot study was the reduction of the non-attendance rate among patients at high risk of non-attendance according to the predictive model obtained. The week before the appointment, patients who were considered at high risk of non-attendance were randomly assigned to either a control or intervention group, balanced regarding age and gender. Right after randomization (i.e., one week before the appointment), patients allocated to the intervention group received a reminder phone call (up to three contact attempts) in which they were encouraged to either attend or cancel the visit early, whereas those in the control group did not receive any reminder. The outcomes related to the appointment reminder (i.e., whether the patient was reached, appointment cancellation or rescheduling, appointment attendance) were recorded. A post-intervention self-guided debriefing session was conducted on April 26, 2019, following a 3-phase conversational structure, including reaction, analysis, and summary phases [17]. Two dermatology and two pneumology specialists, together with the head of administrative management and three directors (Medical Officer, Information Officer, and Management Officer), participated in the conversation.
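One simple way to obtain allocation balanced on age and gender, as described above, is to randomize within (gender, age-band) strata. The sketch below is an illustrative implementation, not the study's actual procedure, and the patient tuples are hypothetical:

```python
import random

def stratified_randomize(patients, seed=1):
    """Assign patients to intervention/control, balancing age and gender
    by alternating assignment within each (gender, age-band) stratum.
    `patients` is a list of (patient_id, gender, age) tuples."""
    rng = random.Random(seed)
    strata = {}
    for pid, gender, age in patients:
        strata.setdefault((gender, age // 10), []).append(pid)
    allocation = {}
    for members in strata.values():
        rng.shuffle(members)  # random order within the stratum
        for i, pid in enumerate(members):
            allocation[pid] = "intervention" if i % 2 == 0 else "control"
    return allocation

# Hypothetical cohort
patients = [(i, "F" if i % 2 else "M", 20 + (i % 60)) for i in range(200)]
alloc = stratified_randomize(patients)
```

Alternating within each shuffled stratum guarantees that no (gender, age-band) cell differs by more than one patient between arms.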

Statistical analysis

Continuous variables were presented as the mean and standard deviation (SD), and categorical variables as frequency and percentage. Non-attendance rates were calculated by dividing the number of non-attended visits by the number of scheduled visits in a given period. Data from remote appointments and from appointments with negative waiting times (i.e., entered into the system after the visit) were excluded from the analysis. Specificity, sensitivity, and accuracy were estimated directly from the contingency table of predicted and real missed appointments, whereas balanced accuracy was calculated as (sensitivity + specificity)/2. For variable selection, categorical variables were compared using the Chi-Square test, whereas continuous variables were compared using analysis of variance (ANOVA). Correlations between quantitative variables were analyzed using the Pearson correlation test, whereas correlations between qualitative variables were analyzed with Cramér's V coefficient. The significance threshold was set at a bilateral alpha value of 0.05. All analyses were performed using the R software (version 3.6.1).
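Cramér's V, used above to assess association between qualitative variables, is derived from the chi-square statistic of the contingency table: V = sqrt(chi² / (n · (min(r, c) − 1))). A minimal pure-Python sketch (the study used R):

```python
import math

def cramers_v(table):
    """Cramér's V for an r x c contingency table (list of rows of counts).
    Ranges from 0 (independence) to 1 (perfect association)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(table), len(table[0]))
    return math.sqrt(chi2 / (n * (k - 1)))
```

For example, a table with counts concentrated on the diagonal yields V close to 1, while a table matching the product of its margins yields V = 0.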

Results

Variable analysis

Non-attendance algorithms were developed using data from 33,329 appointments scheduled in the dermatology service and 21,050 in the pneumology service. The global non-attendance rates of these appointments were 20.90% and 18.37% for the dermatology and pneumology outpatient services, respectively. When comparing the sociodemographic characteristics, appointment characteristics, and attendance history of patients who attended the appointment in the dermatology outpatient service and those who did not, significant differences were observed in all variables except gender and marital status (Table S2, Supplementary file 1). Similarly, all variables showed a significant association with non-attendance in appointments in the pneumology outpatient service, except gender, physician, and number of reschedules (Table S3). We found no strong correlations between variables, either categorical or numerical (Table S4).

Model and prediction performance

After assessing both (1) the specificity, sensitivity, and accuracy, and (2) the explainability and interpretability of the four training algorithms, we selected the decision tree algorithm for model development. The kNN algorithm yielded unacceptable results in terms of sensitivity, whereas XGBoost and SVM resulted in performance values similar to those of decision trees. Table S1 (Supplementary file 1) summarizes the performance values of each model. Figures 1A and 2A show the design of the resulting predictive models for the dermatology and pneumology outpatient services, respectively. In the dermatology predictive model, the patient’s history of previous attendance was the most relevant factor for predicting future non-attendance, followed by major ambulatory surgery, the status of the last appointment, the number of prior visits, and age (Fig. 1B). This model displayed a specificity of 79.90%, a sensitivity of 67.09%, and an accuracy of 73.49%. Similarly, in the pneumology predictive model, the patient’s previous attendance history was also the most important variable for predicting non-attendance, followed by lead time, the status of the last appointment, the number of prior visits, and the number of days since the last visit (Fig. 2B). The specificity, sensitivity, and accuracy of this model were 71.38%, 57.84%, and 64.61%, respectively.

Fig. 1

Dermatology model for predicting the non-attendance risk. A Relative importance of variables, according to the Gini index. B Decision tree representation; each leaf includes the following information: probability of the model (true: > 0.5; false: < 0.5), probability of each class within the node (values between 0 and 1), and percentage of observations of the node
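The variable importances shown in the figures are based on the Gini index, i.e., the total decrease in Gini impurity attributable to each variable across the tree's splits. A minimal sketch of the underlying quantities, for illustration only:

```python
def gini_impurity(class_counts):
    """Gini impurity of a node: 1 - sum(p_k^2). Zero for a pure node."""
    total = sum(class_counts)
    return 1.0 - sum((c / total) ** 2 for c in class_counts)

def gini_gain(parent_counts, children_counts):
    """Impurity decrease achieved by a split; accumulated per variable
    over the whole tree, this gives a Gini-based importance ranking."""
    n = sum(parent_counts)
    weighted = sum(sum(ch) / n * gini_impurity(ch) for ch in children_counts)
    return gini_impurity(parent_counts) - weighted
```

For a balanced two-class node (impurity 0.5), a split that separates the classes perfectly yields the maximum gain of 0.5, while a split that leaves both children with the parent's class mix yields a gain of 0.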

Fig. 2

Pneumology model for predicting the non-attendance risk. A Relative importance of variables, according to the Gini index. B Decision tree representation; each leaf includes the following information: probability of the model (true: > 0.5; false: < 0.5), probability of each class within the node (values between 0 and 1), and percentage of observations of the node

Model validation

The prospective validation of the non-attendance predictive models included 758 and 637 appointments in the services of dermatology and pneumology, respectively. In the dermatology service, the predictive model identified 348 (45.91%) appointments at high risk (i.e., ≥50% likelihood) of non-attendance, 123 of which were actually missed appointments. The total number of real non-attendances was 157, thus yielding a specificity of the model of 78.34% (95%CI 71.07, 84.51). The sensitivity and balanced accuracy of this model were 62.56% (95%CI 71.07, 84.51) and 70.45%, respectively. Correspondingly, 283 (44.43%) appointments scheduled in the pneumology service were identified as at high risk of non-attendance, 81 of which were missed appointments. The total number of real non-attendances was 116, resulting in a specificity of 69.83% (95%CI 60.61, 78.00). The sensitivity and balanced accuracy of the pneumology model were 61.23% (95%CI 56.89, 65.43) and 65.53%, respectively. Compared with the retrospective validation used during model development, specificity in the prospective validation was reduced by approximately 1.5 percentage points.
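The dermatology figures above can be reproduced from the reported counts. The sketch below does so in Python and adds a normal-approximation confidence interval for illustration only (the text does not state which interval method was used):

```python
import math

# Reported counts, dermatology prospective validation:
# 758 appointments, 348 flagged high-risk, 123 of those actually missed,
# 157 missed appointments in total.
total, high_risk, hits, missed = 758, 348, 123, 157

tn = (total - high_risk) - (missed - hits)   # attendees flagged low-risk
specificity = hits / missed                  # 123 / 157
sensitivity = tn / (total - missed)          # 376 / 601
balanced_accuracy = (specificity + sensitivity) / 2

def wald_ci(p, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a proportion; illustrative,
    since the interval method used in the study is not specified."""
    half = z * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

lo, hi = wald_ci(specificity, missed)
```

The derived values match the reported 78.34% specificity, 62.56% sensitivity, and 70.45% balanced accuracy.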

Pilot study

During the study period, 1,311 individuals had at least one appointment at either the dermatology or pneumology outpatient service that was identified as at high risk of non-attendance according to the selected model. Among them, 1,108 (805 and 303 in the dermatology and pneumology services, respectively) had available data and were, therefore, included in the analysis. Of the 805 patients with scheduled visits in the dermatology service, 390 (48.45%) were allocated to the intervention group and 415 (51.55%) to the control group. Correspondingly, 303 individuals had scheduled visits to the pneumology service, with 146 (48.18%) and 157 (51.82%) allocated to the intervention and control groups, respectively. Table 1 summarizes the baseline characteristics of the individuals enrolled in the pilot study. None of the variables showed significant differences between the control and intervention groups, except the time from the last visit among individuals seen at the pneumology service, which was higher in the intervention group than in the control group.

Table 1 Baseline characteristics of the pilot study population

In the dermatology setting, 267 (68.46%) individuals allocated to the intervention arm were successfully contacted by phone. Of these, 251 attended the appointment, and 16 missed it (non-attendance rate 5.99%). Regarding the pneumology service, 95 (65.07%) individuals of the intervention group were successfully contacted; 86 of them attended the appointment, and 9 did not (non-attendance rate 9.47%). Table 2 summarizes the non-attendance rate of each group in each clinical setting. Overall, the interventions applied resulted in a significant decrease of the non-attendance rate for both the dermatology and pneumology services, with reductions of non-attendance of 50.61% and 39.33%, respectively. In both services, non-attendance rates were significantly lower among individuals in the intervention group who were successfully contacted than among those who could not be reached (79.54% and 62.85% reductions for the dermatology and pneumology services, respectively).

Table 2 No-shows in the pilot study, No. (%)

All participants in the post-study debriefing consistently perceived the intervention as successful. However, two issues were identified: (1) the overload of the hospital agenda after preventing no-shows, and (2) the overburden of the administrative staff associated with phone calls to patients at high risk of non-attendance.

Discussion

We found that the models that best predicted non-attendance in the dermatology and pneumology outpatient services were based on decision trees and included the following variables: patient’s history of previous attendance, major ambulatory surgery, status of the last appointment, number of previous visits, and age, for dermatology; and patient’s history of previous attendance, lead time, status of the last appointment, number of previous visits, and number of days since the last visit, for pneumology. The use of the prediction models to identify individuals at high risk of non-attendance for subsequent selective phone call reminders reduced the non-attendance rate by approximately 50% and 40% in the dermatology and pneumology services, respectively.

The systematic review conducted by Carreras et al. showed that at least half of the studies on no-show prediction identified age, gender, distance from home to the healthcare center, weekday, visit time, lead time, and history of previous attendance as predictors of non-attendance; marital status and visit type (first or successive) were also frequently used [14]. Our findings were mostly in line with the results reported by Carreras et al., although we did not find an association between gender and non-attendance, as reported elsewhere [18, 19]. Other studies described that non-attendance was associated with the number of previous appointments [20, 21], the status of the last appointment [22, 23], and the treatment category (e.g., surgery) [24], which was also consistent with our results. Regarding the relative importance of each variable in the model, the status of the last appointment, age, time of the day, lead time, and history of previous attendance are among the most important variables in the non-attendance predictive models presented in various analyses [12, 22, 25]. In our study, the history of previous attendance and the status of the last appointment also had a high weight in both models. In contrast, lead time and age were mainly important in pneumology and dermatology models, respectively. The time of the day had a small weight in both models.

Based on the performance results of the training algorithms, we chose decision trees to build our models, which was the second most frequently used algorithm for developing predictive models in the review of Carreras et al., after logistic regression [14]. The accuracy values reported in the review for models based on decision trees ranged from 76.5% to 89.6%, higher than the accuracy found in our analysis. However, most studies had a limited sample size and/or used the same dataset for training the algorithms and assessing their performance. In contrast to this approach, which may lead to overfitting, we used an independent dataset for model validation. Therefore, although lower than reported elsewhere, we think our results may better reflect the expected accuracy of the model when applied in the real world.

Regardless of the validation approach, most studies reported accuracy values lower than the attendance rate [14]. This trend, also observed in our analysis, may be explained by the lack of data from other domains such as social, cultural, and socioeconomic factors that might have a relevant contribution to non-attendance behavior. Finally, we observed a poorer performance of the pneumology model compared with the dermatology model, which might also be due to differences in outpatient procedures and patient complexity between services. These findings suggest that service-specific characteristics and predictors from other domains should be included in the development of prediction models for non-attendance.

Like in our pilot study, other authors have reported non-attendance reductions after implementing reminding strategies based on phone calls [26] or, most frequently, short message services (SMS) [9,10,11]. However, phone calls are more expensive than SMS [9, 27], and both interventions have high costs for healthcare centers. Irrespective of the type of reminder, predictive algorithms may help to prioritize patients at higher risk of non-attendance, which is likely to improve the cost-effectiveness of the intervention. Furthermore, the quantitative approach to the prediction of non-attendance allows combining more or less compelling interventions based on different thresholds of non-attendance risk (e.g., SMS at risk between 50%-90%, and phone calls at risk ≥90%).
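The threshold-based combination of reminders suggested above can be expressed as a simple routing rule. This is an illustrative sketch using the example thresholds from the text, not a scheme evaluated in the study:

```python
def choose_reminder(risk, sms_threshold=0.50, call_threshold=0.90):
    """Route a patient to a reminder channel based on the predicted
    non-attendance risk, using the illustrative thresholds from the text
    (SMS for risk in [0.50, 0.90), phone call for risk >= 0.90)."""
    if risk >= call_threshold:
        return "phone_call"   # most compelling, and most expensive, option
    if risk >= sms_threshold:
        return "sms"
    return "none"
```

Tuning the two thresholds trades intervention cost against the expected number of prevented no-shows, which is the kind of cost-effectiveness question discussed below.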

A remarkable consequence of our intervention for reducing non-attendance was the overloading of hospital agendas, highlighted during the debriefing held after the pilot study. This perception, which is consistent with the effectiveness of the measure, indicates that medical appointments were routinely scheduled on an overbooking basis, assuming a certain level of non-attendance. Hence, the potential consequences of improving efficiency in healthcare systems should be considered before implementing these types of solutions. Another concern raised during the debriefing session was the cost (in terms of time spent by administrative staff) associated with phone calls to individuals at higher risk of non-attendance. The economic impact of this solution can be minimized by implementing call centers shared among various centers or by investigating the optimal cut-off of non-attendance risk for a patient to be included in the intervention. For cut-off selection, other approaches such as the efficiency curve (similar to the Lorenz curve used in economics) could be explored [28]. Nevertheless, cost-effectiveness analyses that consider the cost associated with non-attendance should be conducted before drawing conclusions on the actual economic impact of this intervention.

The interpretation of our results is limited by the simultaneous assessment of the predictive model and the intervention itself (i.e., the phone call reminder), which precluded appraising the contribution of each feature to the non-attendance reduction. However, the main purpose of our pilot study was to assess the applicability of the whole concept to day-to-day practice. Another limitation was the unavailability of data with potential influence on the non-attendance rate, such as economic status [29, 30], education level [31, 32], or certain medical conditions [20, 33]. As discussed previously, the lack of social information is common in the development of predictive algorithms elsewhere. Regardless of the future inclusion of these data, the model should undergo continual learning through retraining to ensure its validity over time, including the seasonal perspective, which is likely to influence the outcomes. The model must accommodate new patients and categorical features, and consider up-to-date data, to capture the latest trends of non-attendance in each hospital service. Alternative analytical approaches, such as logistic regression analysis, could also be explored.

Conclusions

The results of our study show that non-attendance predictive models can be a valuable tool for identifying patients at higher risk of missing a medical appointment, who should therefore be prioritized for active reminders such as phone calls. The overloading of the hospital agenda experienced as a consequence of the effectiveness of the intervention underscores the need to consider organizational changes when implementing interventions for reducing non-attendance rates. The free availability of our algorithm warrants future research to adapt it to other patient profiles and to assess the cost-effectiveness of interventions based on patient stratification according to the risk of non-attendance.