Patient-reported outcome (PRO) measurements in chronic and malignant diseases: ten years’ experience with PRO-algorithm-based patient-clinician interaction (telePRO) in AmbuFlex

Background: Patient-reported Outcome (PRO) measures may be used as the basis for out-patient follow-up instead of fixed appointments. The patients attend follow-up from home by filling in questionnaires developed for that specific aim and patient group (telePRO). The questionnaires are handled in real time by a specific algorithm, which assigns an outcome color reflecting clinical need. The specific questionnaires and algorithms (named solutions) are constructed in a consensus process with clinicians. We aimed to describe AmbuFlex's telePRO solutions and the algorithm outcomes and variation between patient groups, and to discuss possible applications and challenges.

Methods: TelePRO solutions with more than 100 processed questionnaires were included in the analysis. Data were retrieved together with data from national registers. Characteristics of patients, questionnaires and outcomes were tabulated for each solution. Graphs were constructed depicting the overall and within-patient distribution of algorithm outcomes for each solution.

Results: From 2011 to 2021, 29 specific telePRO solutions were implemented within 24 different ICD-10 groups. A total of 42,015 patients were referred and answered 171,268 questionnaires. An existing applicable instrument with cut-off values was available for four solutions, whereas items were selected or developed ad hoc for the other solutions. Mean age ranged from 10.7 (pain in children) to 73.3 years (chronic kidney disease). Mortality among referred patients varied between 0 (obesity, asthma, endometriosis and pain in children) and 528 per 1000 patient years (lung cancer). There was substantial variation in algorithm outcome across patient groups, while different solutions within the same patient group varied little.

Discussion: TelePRO can be applied in diseases where PRO can reflect clinical status and needs. Questionnaires and algorithms should be adapted for the specific patient groups and clinical aims. When PRO is used as a replacement for clinical contact, special care should be taken with respect to patient safety.

Supplementary Information: The online version contains supplementary material available at 10.1007/s11136-022-03322-9.


Background
The term Patient-reported Outcome (PRO) was coined by the US Food and Drug Administration to standardize the use of such data to support labeling claims in medical product development [1]. Interest in using PRO data, also at the individual patient level, is growing [2][3][4]. PRO data can be used during

Use of telePRO
Algorithm-based telePRO consists of three elements: the PRO data, the PRO-based algorithm, and the presentation of the PRO measures in a graphical overview [17]. The technology for the elements is generic, but configurable for each solution (each specific patient group and clinical aim), e.g., screening for symptom deterioration, determining the type of contact needed, or supporting treatment decisions. In a solution with the main purpose of screening for the patients' need of contact, a green, yellow, or red algorithm outcome color is used based on a "red flag" approach. A questionnaire has a red outcome if just one item in the algorithm is flagged red, while a green outcome is applied if all flags are green. All other questionnaires have a yellow algorithm outcome. A green outcome reflects no current need of clinical attention. However, the patients are allowed to overrule the PRO-based algorithm by indicating a wish for contact. Since the algorithms are solution-specific, the meaning and consequence of the outcome colors differ between solutions. In some solutions, green outcomes are handled automatically by the AmbuFlex software, while yellow and red outcomes are reviewed and evaluated by a clinician. The principle of AmbuFlex is further explained in Figs. 1 and 2. The development of the solution-specific questionnaires and algorithms is described elsewhere [17].
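As an illustration only, the red-flag rule described above can be sketched in a few lines of Python. The names `Flag` and `algorithm_outcome` are hypothetical and are not part of the AmbuFlex software. Treating a patient's wish for contact as a red outcome is an assumption modeled on the epilepsy example; the actual flags and their consequences are solution-specific.

```python
from enum import IntEnum
from typing import Iterable


class Flag(IntEnum):
    """Color assigned to a single response category (hypothetical encoding)."""
    GREEN = 0
    YELLOW = 1
    RED = 2


def algorithm_outcome(item_flags: Iterable[Flag], wishes_contact: bool = False) -> Flag:
    """Combine item-level flags into one questionnaire-level outcome color.

    Red if at least one item is red, green if all items are green,
    yellow otherwise. Assumption: a wish for contact overrules the
    algorithm and is treated as red.
    """
    flags = list(item_flags)
    if not flags:
        raise ValueError("at least one flagged item is required")
    if wishes_contact or any(f == Flag.RED for f in flags):
        return Flag.RED
    if all(f == Flag.GREEN for f in flags):
        return Flag.GREEN
    return Flag.YELLOW
```

For example, a questionnaire with only green items yields a green outcome, while the same questionnaire with `wishes_contact=True` is escalated under the stated assumption.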

Aim
The aim of this paper was (1) to provide an overview of all AmbuFlex's specific telePRO solutions, (2) to describe the algorithm outcomes and variation in outcomes, (3) to discuss similarities and differences between patient groups in terms of demographic characteristics and algorithm outcomes, and (4) to highlight possibilities and challenges in the use of telePRO.

Selection of solutions
Included in the analysis were AmbuFlex solutions using algorithms developed for research or routine use if more than 100 processed questionnaires were available. Solutions in identical patient groups using similar questionnaires and algorithms were merged before analysis.

Data collection
Questionnaire data and algorithm results were retrieved from the internal database together with information on each patient's sex, age, and vital status; the data were last updated on January 15, 2022. Information on vital status is automatically retrieved online by the AmbuFlex system from the Danish civil registration system [18]. Mortality of referred patients was calculated for each solution, with person-years measured from the date of response to the patient's first questionnaire to the date of death or the last vital status update. Total observation time in AmbuFlex is the sum, over patients, of the time span between the dates of the first and last answered questionnaires in each solution. Information on algorithm outcome is recorded in the AmbuFlex system for each questionnaire with the outcome colors green, yellow, or red.
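The mortality calculation can be illustrated with a minimal sketch, assuming each patient is represented by the date of the first questionnaire response and an optional date of death. The function name, the record layout, and the 365.25-day year are assumptions made for this example, not details taken from the AmbuFlex system.

```python
from datetime import date
from typing import Iterable, Optional, Tuple

DAYS_PER_YEAR = 365.25  # assumed conversion from days to years

# (date of first questionnaire response, date of death or None if alive)
Record = Tuple[date, Optional[date]]


def mortality_per_1000_person_years(records: Iterable[Record],
                                    last_status_update: date) -> float:
    """Deaths per 1000 person-years, following each patient from the first
    response to death or to the last vital status update."""
    deaths = 0
    person_days = 0
    for first_response, death in records:
        died = death is not None and death <= last_status_update
        end = death if died else last_status_update
        if end < first_response:
            raise ValueError("follow-up ends before the first response")
        person_days += (end - first_response).days
        deaths += int(died)
    if person_days == 0:
        raise ValueError("no observation time")
    return 1000.0 * deaths / (person_days / DAYS_PER_YEAR)
```

The total observation time in AmbuFlex would be computed analogously, with the date of the last answered questionnaire as the end point instead of death or the status update.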

Data analysis
Descriptive tables were constructed using AmbuFlex's own software [15]. Algorithm outcomes were anonymized and transferred for further analysis in the R statistical software package [19]. The ranking of the three algorithm outcome colors is the same for all solutions (red is more severe than yellow, which is more severe than green). In most solutions, the difference in consequences between a yellow and a red algorithm outcome is smaller than the difference between a green and a yellow outcome. In some solutions only two colors were applied (green/red or yellow/red). To allow comparison across solutions, severity grade values of 0, 2, and 3 were assigned to green, yellow, and red outcomes and used to rank the questionnaires from each patient (Table 1). Each questionnaire can have one of three outcome colors, and therefore a patient with at least three answered questionnaires may have one of seven combinations of algorithm outcomes (severity group). Graphs were constructed for each solution depicting the frequency and variation in algorithm outcomes. Before plotting, patients were sorted by severity group. The total area of each color represents the overall proportion of that algorithm outcome, while the within-group variation is represented for each severity group. Components of variation in algorithm outcome severity score (within- and between-patient) were calculated for solutions with more than one answer from each patient. The anovaVCA function in the R VCA package was used to calculate components of variation in unbalanced designs [19,20]. The square root of the variance was used for tables and plots to maintain interpretable values (severity grade).

Fig. 1
Patients complete a telePRO questionnaire developed for that specific patient group and aim at pre-defined individual intervals, e.g., 3 months. The system prompts patients to fill in the PRO through "e-Boks" (secure national e-mail platform). The epilepsy telePRO includes 47 items covering number of seizures, medicine adherence, symptoms, general health, and psychosocial function measured using the WHO-5, items from the SF-36, SCL-92 and ad hoc developed items. An item covers the patient's wish for contact to ensure that patients can always get an appointment. As part of development, an expert group has marked the response categories in the telePRO with a green, yellow, or red color based on a flag approach. Red flag: need of clinical attention (e.g., planning pregnancy, seizure impairments, suicidal thoughts, or if the patient wishes contact). A green flag indicates no need of clinical attention, a yellow flag possible need of attention, and a red flag need of attention. "All-green" outcomes are managed automatically by the AmbuFlex system and a new telePRO is sent to the patient at the pre-defined interval, while red and yellow algorithm outcomes are reviewed by a clinician (Fig. 2). (Color figure online)

Fig. 2
Screenshot of the clinician's PRO overview. Example: AmbuFlex/epilepsy. The telePRO responses are presented in a graphic overview inside the electronic health record (EHR) system. All red and yellow algorithm outcomes are shown to the clinicians on an alert list. For red outcomes, the clinicians contact the patient either by telephone or by an in-clinic appointment. For yellow outcomes, the clinicians evaluate the PRO data together with other available data and contact the patient if necessary. (Color figure online)
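The severity grading and grouping described under Data analysis can be sketched as follows. The seven severity groups correspond to the 2^3 - 1 = 7 non-empty subsets of the three colors. The names `SEVERITY_GRADE`, `severity_group`, and `all_severity_groups` are hypothetical, and the order in which Table 1 ranks the groups is not reproduced here.

```python
from itertools import combinations
from typing import Iterable, List, Tuple

# Severity grades as stated in the text: green = 0, yellow = 2, red = 3.
SEVERITY_GRADE = {"green": 0, "yellow": 2, "red": 3}


def severity_group(outcomes: Iterable[str]) -> Tuple[str, ...]:
    """Return the distinct outcome colors seen for one patient,
    ordered from least to most severe."""
    colors = set(outcomes)
    if not colors:
        raise ValueError("patient has no answered questionnaires")
    unknown = colors - set(SEVERITY_GRADE)
    if unknown:
        raise ValueError(f"unknown outcome colors: {sorted(unknown)}")
    return tuple(sorted(colors, key=lambda c: SEVERITY_GRADE[c]))


def all_severity_groups() -> List[Tuple[str, ...]]:
    """Enumerate every non-empty combination of the three colors."""
    colors = list(SEVERITY_GRADE)
    return [combo
            for size in range(1, len(colors) + 1)
            for combo in combinations(colors, size)]
```

A patient whose questionnaires were green, green, and red would fall in the group `("green", "red")`, and `all_severity_groups()` lists the seven possible groups.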

Results
A total of 29 specific solutions in 24 diagnostic groups were included covering 42,015 referred patients from 89 hospital departments all over Denmark. One department may refer patients to more than one solution and the number of unique departments was 48 while the number of unique hospitals was 22. Also, the same patient may be referred to more than one solution, e.g., cancer patients may attend different solutions at different disease stages, one during active treatment and another during follow-up. Furthermore, patients may have several diseases corresponding to different solutions. There were 41,144 unique patients, 871 of whom had attended more than a single solution, and 16 had been referred to three solutions.

Algorithm aims
The aims for the algorithms could be divided into four groups, shown by examples in Table 2 and tabulated for each solution in Table 3. The first aim, need of clinical attention ("Need"), represents the original purpose of AmbuFlex, namely PRO-based out-patient follow-up, where PRO, not hospital visits, forms the basis for the contact. In some solutions, questionnaires with a green algorithm outcome were handled automatically by AmbuFlex's web server, and a new questionnaire was scheduled after a patient-specific interval (e.g., 3 months) (n = 7 solutions, Tables 2 and 3), while in 14 solutions questionnaires with green outcomes were reviewed and the green color was used to support the decision on whether a visit was indicated. The second aim ("Path," n = 3 solutions) used telePRO to select the most relevant type of clinical path, e.g., a telephone or in-clinic consultation with a doctor or a nurse. The third aim ("Treatment", n = 2 solutions) used telePRO to decide if, e.g., planned antineoplastic treatment should be postponed. Frequently, side effects incompatible with a treatment are not discovered before the patient shows up for treatment, wasting time as well as expensive prepared medicine. The fourth aim ("Instruction") used algorithms to generate patient-specific on-screen messages or letters with instructions to the patient based on the PRO. This was implemented in three disease groups: bladder cancer [21,22], immune therapy for malignant melanoma [23], and screening for depression in patients with ischemic heart diseases [24].

Diseases
TelePRO was implemented in a broad range of conditions including nearly all ICD-10 main groups, the highest number of solutions being in malignant (n = 8) and neurological diseases (n = 7) (Table 3). The most diverse use was in malignant diseases, which apart from out-patient follow-up also applied telePRO during active treatment (IT and M3, Tables 2 and 3) and to detect disease progression (PW). AmbuFlex is also used among cancer inpatients and patients attending palliative care, although without use of algorithms.

Table 1
Grouping of telePRO outcomes by severity based on algorithm outcome colors in all questionnaires from each patient. Green, yellow, and red algorithm outcomes were assigned the severity grade values of 0, 2, and 3 to reflect that the difference in consequences between a green and a yellow algorithm outcome is larger than the difference between yellow and red in all solutions. Each questionnaire can have one of three algorithm outcome colors, and hence patients with at least three answered questionnaires may have one of seven combinations of algorithm outcomes that define the patient's severity group.
a All items in algorithm with green color codes
b At least one item with red color code

Table 4) was in sleep disorder (SA). The median number of questionnaires from each patient ranged from a single questionnaire to 86 in patients with COPD (KO). In lung cancer (PW), 55% of questionnaires came from patients delivering 50 or more responses (Table 5), while the same was the case for 96% in COPD (KO). At the beginning of the period, most responses were collected by paper questionnaires (up to 92% in the patients with knee arthrosis, a solution that ran from 2011 to 2013), while in the current solutions nearly all patients are contacted by secure e-mail and questionnaires are answered online. This significant development in our PRO data collection is described elsewhere [16].

The algorithms
The algorithms were unique for each solution because they are based on specific questionnaires [9,17,25,26]. Examples of algorithms and meaning of color codes are shown in Table 2 and Supplemental Table 1. In four solutions, the core of the algorithm was based on group-validated questionnaires with fixed threshold values (Table 3). In the remaining solutions, no relevant instruments or threshold score values were available, and the algorithms were constructed as series of single items or scales, each addressing a clinical issue. We used SF-36 [27], SCL-90 [28] and the EORTC Item Library to select items [29]. If an item could not be located, a new item was created ad hoc, typically with response categories adapted from EORTC ("Not at all/A little/Quite a bit/Very much"). Questions regarding general health were collected from SF-36 [27]. At least one question regarding general health was asked in 19 (66%) of the solutions. All three colors were used in 23 solutions, green and red in 5, and yellow and red in one solution (Table 4).

Algorithm outcomes
The algorithm outcomes for each solution are listed in Table 6. The content and purpose of the algorithms were heterogeneous. Accordingly, the proportion of green outcomes varied between 1 and 59%. A graphical "fingerprint" of algorithm outcomes and intra-group variation is displayed in Fig. 3 for each solution. The total area of each color represents the proportion of that outcome. The within-group variation may be read vertically for each severity group. Some solutions were dominated by one algorithm outcome, e.g., breast cancer (AB) and ischemic heart disease (AK). Little or no intra-patient variation (AK, DP) was seen if there was only a single questionnaire for each patient or the patient had been referred recently. In lung cancer (PW), more than 95% of the responses came from patients with variation in algorithm outcomes. Different solutions within the same patient group had similar "fingerprints" although questionnaires and algorithms differed (Table 3). In prostate cancer (P2/P3 and PC), the solutions had a similar distribution of outcomes and a similar pattern within severity groups. The most important difference was a larger proportion of patients with all-red algorithm outcomes in PC, which may reflect referral of more patients with advanced disease. The variation in outcomes (severity grade, defined in Table 1) is described in Table 6 and Fig. 4. The largest variation in severity was found in lung cancer (PW) and the lowest in the proxy solution in epilepsy (EP). After breaking down the total variation into within- and between-patient components, the highest within-patient variation was 50% (bladder cancer, B3), while the lowest was 29% in patients with multiple sclerosis (SC).
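To make the split into within- and between-patient variation concrete, the sketch below applies the classical ANOVA (method-of-moments) estimator for a one-way random-effects model with unequal numbers of questionnaires per patient. It is a simplified stand-in written for illustration, not a reimplementation of the R function used in the analysis, and the function name and returned keys are hypothetical.

```python
from math import sqrt
from statistics import fmean
from typing import Dict, Mapping, Sequence


def variance_components(grades_by_patient: Mapping[str, Sequence[float]]) -> Dict[str, float]:
    """Estimate within- and between-patient variance of severity grades
    (one-way random-effects ANOVA, unbalanced design allowed)."""
    groups = [list(g) for g in grades_by_patient.values() if len(g) > 0]
    k = len(groups)                      # number of patients
    n_total = sum(len(g) for g in groups)
    if k < 2 or n_total <= k:
        raise ValueError("need >= 2 patients and at least one repeated answer")

    grand_mean = fmean([v for g in groups for v in g])
    means = [fmean(g) for g in groups]

    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ms_within = ss_within / (n_total - k)
    ms_between = ss_between / (k - 1)

    # Effective group size; equals the common n when the design is balanced.
    n0 = (n_total - sum(len(g) ** 2 for g in groups) / n_total) / (k - 1)

    var_within = ms_within
    var_between = max(0.0, (ms_between - ms_within) / n0)  # truncate at zero
    total = var_within + var_between
    return {
        "var_within": var_within,
        "var_between": var_between,
        "sd_within": sqrt(var_within),    # square roots keep the grade scale
        "sd_between": sqrt(var_between),
        "within_share": var_within / total if total > 0 else float("nan"),
    }
```

With two patients answering (green, yellow) and (red, red), i.e., grades [0, 2] and [3, 3], the estimator gives a within-patient variance of 1.0 and a between-patient variance of 1.5, so 40% of the total variation is within-patient.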

Discussion
TelePRO has been applied in 29 specific AmbuFlex solutions in 24 different patient groups, thus covering 12 of the first 19 ICD chapters. There were large variations between solutions with respect to patient characteristics (ICD-10 group, age, gender, mortality) as well as questionnaire and algorithm content and algorithm outcomes.

Variation in algorithm outcomes
Variation in algorithm outcomes may be divided into within-patient, between-patient, and between-solution variation. Except for screening purposes with just one measurement, a certain degree of within-patient variation over time is a prerequisite for repeated measurements to be meaningful and was met in most solutions, while the considerable between-patient and between-solution variation merely reflects the wide range of applicability of algorithm-based telePRO.

The four different aims of telePRO
Aim "Need", where telePRO is used to evaluate the patient's need for clinical attention, was used in the majority of the implementations. Denis et al. evaluated weekly symptoms reported by patients with lung cancer [12]. Twelve symptom items automatically triggered an alert to the clinicians if a pre-defined threshold was exceeded. A similar set-up was described in a study by Basch et al. [13]. In this study, patients could self-report side-effect symptoms weekly after chemotherapy, and e-mail alerts were sent to clinicians if symptom scores worsened by a pre-defined threshold. Armstrong et al. described use of remote PRO with a mobile app during the first 30 days following ambulatory breast reconstruction [11]. Patients reported pain on a visual analog scale and quality of recovery on a nine-item questionnaire daily for 2 weeks and thereafter weekly for 2 weeks. Clinicians were alerted by red flags, and abnormally high pain scores or low recovery scores prompted in-person follow-up. A similar approach was applied in an Australian study [30]. Brundage et al. summarize experiences [31] and point out that if PRO data are used remotely between visits, it is important to use pre-defined threshold levels. Decisions regarding the definition of these thresholds must be made by experts with sufficient expertise to weigh the implications of false-positive versus false-negative alerts [32]. In AmbuFlex, clinical experts are involved in defining the PRO-based algorithm thresholds and decide whether a specific response category should be given a green, yellow, or red color. In solutions where green outcomes are handled automatically ("Needauto"), false-negative cases matter more than false-positive cases, and high sensitivity should be a key consideration. Regarding the aim "Instruction," the telePRO algorithm generates an instruction to the patient instead of an alert to the clinician, which poses essentially the same demands on sensitivity.
PRO-based alerts in the "eRAPID" system [33] included PRO data about adverse events related to chemotherapy treatment. The system provided tailored feedback to patients if they reported severe symptoms. In the case of less severe symptoms, the patients were asked to follow self-management advice. Thus, alerts based on PRO data can be tailored not only to clinicians but also to patients. on when, how, and to whom alerts are directed and whether PRO data are combined in the algorithm with other important data, e.g., a blood test or data from the patients' medical record [31]. In the two aim types ("Path" and "Treatment"), all questionnaires are individually evaluated and therefore false negatives are less problematic.
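The trade-off between false-negative and false-positive alerts can be quantified as sensitivity and specificity once algorithm outcomes are compared with a reference judgment. The sketch below is hypothetical: it assumes that each questionnaire is recorded as a pair (flagged by the algorithm, truly in need of attention), where the reference standard, e.g., a clinician's assessment, is an assumption of this example and not something reported in the present study.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(cases: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) from (flagged, needed) pairs.

    flagged: the algorithm outcome was not green (an alert was raised).
    needed:  the patient truly needed clinical attention (reference standard).
    """
    tp = fn = fp = tn = 0
    for flagged, needed in cases:
        if needed and flagged:
            tp += 1          # correctly alerted
        elif needed:
            fn += 1          # missed need: the critical error when green is automatic
        elif flagged:
            fp += 1          # unnecessary alert: costs clinician time
        else:
            tn += 1          # correctly left alone
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both patients with and without need are required")
    return tp / (tp + fn), tn / (tn + fp)
```

In a solution where green outcomes are handled automatically, thresholds would be chosen to push the first value toward 1, accepting a lower second value.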

Limitations
Out-patient groups are the main target for telePRO-based follow-up, but not all diseases and patients are suitable. For a disease to be relevant, evaluation of the patient's state must rely on measures reportable as PRO, which may also include self-measurements. In two solutions we were able to identify the source population of referred patients: rheumatoid arthritis (RA) and epilepsy (AE). Successful referral was related to young age and low disease activity [9,34] and higher socioeconomic status [34]. Target groups were not intended to include very sick patients, and a solution should not be "one-size-fits-all". Each patient should be evaluated before referral and allowed to return to standard follow-up whenever he or she wishes to do so. This is for ethical reasons, but is also a way to monitor and evaluate the telePRO solution.
PRO-based follow-up requires a mentally capable patient. However, in patient groups with mentally disabled persons, proxy versions of the questionnaire may be applied. We did this in the pain in children (SM) and patients with epilepsy (EP) solutions in 231 referred patients compared to 6222 in the main solution (AE) [17].

Questionnaires and algorithms
Traditionally validated questionnaires have been validated for purposes other than telePRO, where the main question in aim "Need" may be expressed as: "Does this patient need clinical attention at the moment?", in aim "Path": "Which type of clinical contact is most relevant?", in aim "Treatment": "Is this patient ready for the planned treatment?", and in aim "Instruction": "What is the most relevant instruction to the patient?". We based the algorithm on a traditionally validated questionnaire and cut-off values in screening for depression [35], hip and knee alloplastic operations [36,37], and rheumatoid arthritis [38]. In all other solutions, algorithms were based on series of single items adapted from item libraries or developed together with clinicians [39]. When using the single-item approach, each item is given its own cut-off value, making it possible for clinicians to achieve consensus regarding items, cut-off values, and hence the whole algorithm. This process runs in parallel with the development and revision of the questionnaire and takes years to mature. The first epilepsy solution (AE) was launched in 2011 and has been revised four times. After 5 years without any changes, a national revision is now in progress.

Fig. 4
Standard deviation and components of variation (within- and between-patient) in algorithm outcome. Algorithm outcome for each questionnaire is measured as a discrete variable, severity grade, where green = 0, yellow = 2 and red = 3 (see Table 1). (Color figure online)

Length of questionnaires
Doctors and nurses will often focus on the length of the questionnaire as a critical factor and on the clinical relevance of each item. From our experience, patients are more concerned with the latter issue than the former and accept long questionnaires if they find the questions relevant. Questionnaires in research-initiated solutions are often longer, which may be accepted by the participating patients because they volunteered to participate, while several of the clinical solutions have become standard care and the patient has to explicitly opt out. A good reason for clinicians to prefer short questionnaires is that both patients and clinicians will expect action to be taken if the patient reports a problem. Examples are depressive symptoms or sexual problems in solutions in specialized departments, where some clinicians expect such issues to be handled by the family doctor.
There is no simple solution to this problem. In some cases, explicit guidelines have been developed [40,41].

TelePRO vs PRO for consultation support
In most AmbuFlex telePRO solutions, PRO is also used as a tool to enhance the consultation process. During the last decade, an increase in the use of PRO at the patient level has been seen in clinical care. However, PRO has no value in itself; it is the context and actual use that make the difference. If PRO is an add-on to existing clinical practice, the implementation is very dependent on the commitment of the individual clinicians, and in some implementations only a minority of responses are ever seen by a clinician [16,42].
In telePRO-based follow-up, PRO constitutes the basis itself for the follow-up. Each time a questionnaire is received, it is either handled automatically (green response) or put on an alert list, like incoming lab tests, where it remains until a clinician has reviewed it and decided whether the patient should be contacted or not. Therefore, virtually all questionnaires are used: automatically, as a decision tool, and/or as a basis for patient-clinician interaction in the consultation.

Patient safety
Questionnaires with a calculation of scores or a color code for decision aid are considered medical devices if collected electronically and used in the treatment of patients. As such, telePRO solutions must ensure patient safety and comply with the EU Medical Device Regulation (MDR). Patient safety is a cornerstone, also within the application of PRO in clinical practice. The questionnaire and color code must cover the defined aim and be understandable and meaningful to patients and clinicians, and the IT system must be reliable and secure. There are standards for the development and testing of IT systems, while deciding how to validate questionnaires and algorithms is an ongoing process, especially with respect to the green algorithm outcomes, where the patient may not be contacted. We are in the middle of this process. In outpatient follow-up, patients are instructed to contact the department, emergency room, or their family doctor in the event of sudden health deterioration between appointments. This also addresses a potential hazard for PRO-based follow-up if a questionnaire is lost for some reason. In most solutions, non-responding patients are assigned a specific code on the alert list. Also, only patients capable of evaluating and reporting their health should be referred.

The patient perspective
Two of the aims of AmbuFlex are to optimize the use of resources and to promote patient-centered care. Is there a contradiction between the patient's interests and the interests of clinicians and hospital owners? In AmbuFlex's very first years, health administrators and hospital owners in Denmark to some degree considered AmbuFlex an easy way to cancel appointments for patients with little or no need of clinical attention, but did not acknowledge the resources needed to implement and run it. This view has changed, and telePRO is now seen primarily as a tool for achieving better quality of care. Few patients are interested in fixed consultations when there is no need [43], and those who are should be offered standard follow-up. Clinicians also need to see less complicated cases to be able to experience the whole spectrum of a disease; otherwise, they will develop a biased picture of prognosis [44].

Conclusion
TelePRO can be applied in any setting where PRO can be used to evaluate patient clinical status and needs. Solutions are unique with respect to questionnaire content, algorithms, clinical purpose, and patient characteristics. Questionnaires and algorithms should be adapted for each specific patient group and aim.