Key Points

MyFORTA confirms most errors reported by manual assessment of the FORTA score, and adds reliable error information mainly resulting from data gaps on medications.

MyFORTA is the first validated automated tool to assess and propose complex medication schemes in older people.

1 Introduction

Multimorbidity is highly prevalent in aging Western societies and often leads to polypharmacy, meaning the use of multiple medications, in older patients [1,2,3,4,5]. Polypharmacy is often associated with inappropriate drug treatment and frequently results in preventable adverse clinical outcomes such as functional status decline, impaired physical function, cognitive decline, hospitalization and even death [2, 6,7,8,9,10]. Listing approaches have been developed to cope with this challenge and, in general, support the individualization of drug treatment. Most of them do not require intricate patient knowledge and focus on de-prescribing only; only a few require intricate knowledge of diagnoses, severity, functionality and the patient’s wishes/needs and therefore help physicians to address undertreatment as well as overtreatment. Some listing approaches, such as the Beers Criteria® [11], mainly represent a negative list of medications. In contrast, the Screening Tool to Alert Doctors to the Right Treatment (START)/Screening Tool of Older Persons’ Prescriptions (STOPP) criteria [12] or the Fit fOR The Aged (FORTA) list [13] combine positive and negative labeling of drug treatments. The latter lists require more intricate knowledge of patients’ diagnoses, their severity and physical and mental abilities and therefore cannot be applied by interpreting the drug list alone. Since successful clinical validation has been largely restricted to the latter approaches, we have proposed using the term ‘patient-in-focus listing approach’ as opposed to ‘drug-oriented listing approach’ [4]. As an example, FORTA has been validated in a randomized controlled trial (VALFORTA) in older hospitalized patients, in which it significantly improved the quality of medication as measured by the FORTA score. In addition, important clinical outcomes such as adverse drug reactions and the activities of daily living (ADL) were significantly improved by the FORTA intervention [14].

The quantitative assessment of FORTA errors yields the FORTA score (the sum of overtreatment and undertreatment errors). Because this assessment requires skill, experience and time, wider implementation is probably restricted.

Here, an automated, algorithm-based tool (MyFORTA [MF]) is validated against the manually determined FORTA scores (gold standard [GS]/current standard) in patients from the AgeCoDe study for the first time.

2 Methods

2.1 Study Population

The Study on Ageing, Cognition and Dementia (AgeCoDe) in Primary Care Patients was a multi-centered (Bonn, Düsseldorf, Hamburg, Leipzig, Mannheim and Munich), population-based longitudinal cohort study starting in 2003/2004. For this study, primary care patients aged 75 years and older who had no dementia at baseline were recruited via general practitioners’ (GP) offices. Follow-up assessments took place on average every 1.5 years and the sixth follow-up concluded in January 2014 [15]. The AgeCoDe study was later extended by the “Study on Needs, Health Service Use, Costs and Health-Related Quality of Life in a Large Sample of Oldest-Old Primary Care Patients (85+)” (AgeQualiDe). Relevant data collected at the sixth follow-up included drug use (ATC codes), age, gender, GP diagnoses and blood pressure; data were electronically entered into the database. Dosages were not documented. In general, FORTA does not address disease severity, which is a function of the demand analysis (see below); the exceptions (depression, dementia) were covered in AgeCoDe.

Not all FORTA diagnoses were covered in the database. The alignments of diagnoses have been described in Pazan et al. [25]; for example, gastritis, reflux gastritis, reflux, esophageal carcinoma and gastrointestinal bleeding were considered to reflect the FORTA diagnosis ‘gastrointestinal disease,’ and stroke, cerebellar infarction, stenosis of the afferent cerebral arteries and transient ischemic attacks were aligned to the FORTA diagnosis ‘stroke.’

Further details of these studies are provided elsewhere [15,16,17,18,19,20,21,22,23,24], and in the first paper on the FORTA analysis of those data [25].

These studies were approved by the ethics committees of all the participating centers [18] and were conducted in accordance with The Code of Ethics of the World Medical Association [26]. In addition, this study was covered by the ethics approvals that were previously received (reference number: 2007-253E-MA) and it was recently re-evaluated and approved by the ethics committee in Mannheim, University of Heidelberg (reference number: TEMP558252-AF 11).

According to a sample size analysis, 310 patients from this cohort would enable the detection of a minimal clinically important difference of one score point at a standard deviation of 2.7 (power = 0.9).
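As a plausibility check of this figure, the following minimal sketch applies the standard normal-approximation formula for comparing two group means. The source does not state which formula, significance level or design was used; the two-sided α = 0.05 and the two-group design are assumptions made here for illustration only, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd, alpha=0.05, power=0.9):
    """Normal-approximation sample size per group for a two-sided
    comparison of two means (assumed design, not stated in the paper)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for alpha
    z_beta = NormalDist().inv_cdf(power)           # quantile for the power
    return math.ceil(2 * ((z_alpha + z_beta) * sd / delta) ** 2)


# Minimal clinically important difference of 1 score point, SD 2.7, power 0.9
n = n_per_group(delta=1.0, sd=2.7)
print(n, 2 * n)  # per-group and total sample size
```

Under these assumptions, the total comes out within a few patients of the reported 310; a t-distribution correction or a different design would shift the number slightly.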

Of 504 participants from the sixth follow-up of the AgeCoDe study (for which the quality of drug treatment according to FORTA was previously determined by a person/evaluator), 331 were randomly assigned to this study by patient ID and included.

These evaluator-based FORTA scores, which were manually determined (gold standard [GS]/current standard), have already been published by Pazan et al. [25].

2.2 Determination of the FORTA Score

The FORTA list assigns four FORTA classes to drugs that are defined as follows:

  • Class A (A-bsolutely) = indispensable drug, clear-cut benefit in terms of efficacy/safety ratio proven in elderly patients for a given indication

  • Class B (B-eneficial) = drugs with proven or obvious efficacy in the elderly, but limited extent of effect or safety concerns

  • Class C (C-areful) = drugs with questionable efficacy/safety profiles in the elderly, to be avoided or omitted in the presence of too many drugs, lack of benefits or emerging side effects; review/find alternatives

  • Class D (D-on’t) = avoid in the elderly, omit first, review/find alternatives

These assessments are specific for age-relevant diagnoses and a given medication list can be checked against these recommendations. The related FORTA score is the sum of medication errors classified as overtreatment and/or undertreatment errors in an individual patient as checked against these labels. An error was counted if an indication was not appropriately treated though beneficial options (FORTA A or B) exist (undertreatment) or if a prescription was suboptimal regarding the FORTA categories (e.g., FORTA C, though A or B drugs exist) or not indicated (overtreatment) [14]. Proton pump inhibitors (PPIs) are often not indicated and would trigger an overtreatment error; oral anticoagulation is strictly indicated in atrial fibrillation, and the absence of a positively labeled oral anticoagulant (e.g., apixaban) would be considered an undertreatment error. To apply FORTA, a demand analysis for drug treatment is required and the determination of the FORTA score relies on it. It depends on (i) relevant diagnoses, (ii) severity and current level of control as checked against current recommendations and practice, (iii) current treatments including drugs and reported experiences, (iv) co-morbidity and functional status with particular focus on geriatric syndromes. Further details about the FORTA score are provided elsewhere [14, 27], and in the first paper on FORTA in AgeCoDe [25].
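The score definition above can be sketched in a few lines. This is an illustration of the counting rule only; the class and function names are hypothetical, and the two example errors are the PPI and anticoagulation cases mentioned in the text.

```python
from dataclasses import dataclass


@dataclass
class MedicationError:
    kind: str    # "overtreatment" or "undertreatment"
    detail: str  # free-text description of the error


def forta_score(errors):
    """FORTA score = number of overtreatment plus undertreatment errors."""
    over = sum(1 for e in errors if e.kind == "overtreatment")
    under = sum(1 for e in errors if e.kind == "undertreatment")
    return over + under


example = [
    MedicationError("overtreatment", "PPI without indication"),
    MedicationError("undertreatment",
                    "atrial fibrillation without oral anticoagulant"),
]
print(forta_score(example))  # 2
```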

The recently developed MyFORTA algorithm [https://optimedis.de/forta/] (MF) was applied to the data of the 331 patients from follow-up visit 6 by inputting ATC and ICD codes and by answering the provided clinical questions through web-based access to the program. As MF is proprietary, its code (including assessments and clinical questions) is not public, and its use needs to be licensed. This algorithm was developed by the senior author and OptiMedis AG (Hamburg, Germany) from the simpler algorithm FORTA-EPI [28], which detects deviations of medications from the FORTA-based recommendations just by aligning ATC and ICD codes. FORTA-EPI does not cover individual characteristics such as actual blood pressure values, but detects deviations from optimal medication if (a) the best FORTA drugs are not used to treat a given FORTA diagnosis, or (b) no adequate diagnosis can be found for a given drug (e.g. PPIs not indicated by reflux or prophylactic indications), or (c) FORTA D drugs are present. It has proven useful for the analysis of large data samples and provides epidemiologically relevant information on medication quality in larger patient cohorts, but not on medication adequacy for individual patients.
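Because the MF and FORTA-EPI code is not public, the three deviation types (a) to (c) can only be illustrated with a toy sketch. Everything below is hypothetical: the lookup table, the drug and diagnosis labels and the function name are invented for illustration, and only the metformin (FORTA B) and DPP4 inhibitor (FORTA A) classes are taken from this paper. Treating an untreated diagnosis as a case of rule (a) is likewise an assumption.

```python
# Toy lookup: (diagnosis, drug) -> FORTA class. Illustrative entries only.
FORTA = {
    ("type 2 diabetes", "dpp4_inhibitor"): "A",
    ("type 2 diabetes", "metformin"): "B",
    ("gastrointestinal disease", "ppi"): "A",  # class assumed for the example
    ("chronic pain", "drug_d"): "D",           # placeholder FORTA D drug
}


def forta_epi(diagnoses, drugs):
    """Return (rule, message) findings for rules (a), (b) and (c)."""
    findings = []
    for dx in diagnoses:
        options = {drug: cls for (d, drug), cls in FORTA.items() if d == dx}
        best = min(options.values(), default=None)  # "A" sorts before "B"
        if best not in ("A", "B"):
            continue  # no positively labeled option known for this diagnosis
        used = [options[drug] for drug in drugs if drug in options]
        if not used or min(used) > best:
            findings.append(("a", f"best FORTA {best} drug not used for {dx}"))
    for drug in drugs:
        entries = {d: cls for (d, dr), cls in FORTA.items() if dr == drug}
        if entries and not set(entries) & set(diagnoses):
            findings.append(("b", f"no adequate diagnosis for {drug}"))
        if "D" in entries.values():
            findings.append(("c", f"FORTA D drug present: {drug}"))
    return findings


for rule, message in forta_epi(["type 2 diabetes"], ["metformin", "ppi"]):
    print(rule, message)
```

In this toy case, metformin alone triggers rule (a) and the PPI without a matching diagnosis triggers rule (b), mirroring the kinds of findings reported in the Results.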

MyFORTA adds clinical information to this initial assessment by FORTA-EPI: the program generates clinical questions from the encoded FORTA diagnoses, to be answered by the patient’s doctor, in most cases the general practitioner. These questions are kept as simple as possible, utilizing tick boxes (e.g. diagnosis X treated well or not sufficiently) or entries of simple numbers (e.g. blood pressure). Examples of questions generated by the MyFORTA algorithm are “Do medication intolerances (allergies, previous negative therapy attempts due to intolerance) exist?”, “How is systolic/diastolic blood pressure”, “Is the renal clearance below 30 mL/min?”, “How severe is the pain on a scale from 0 (no pain) to 10 (most severe pain imaginable)?” or “Do you consider treatment of depression successful, moderately successful or insufficient?”. The answers to these questions are assessed by the algorithm to determine the individual demand for drug treatment of the patient’s diagnoses. It is programmed to compare the entries with accepted standards of treatment, mainly covered by geriatric recommendations. Some examples are given here: a systolic blood pressure value in excess of 140 mmHg would be identified as needing one additional blood pressure-lowering drug, which would lead to the recommendation of the best, so far unused drug in the FORTA list. If the numerical pain scale shows ‘0’, overtreatment is assumed, and deprescribing of the pain medication with the worst FORTA class, or dose reduction, would be recommended by the algorithm. If the pain scale shows 4, an additional, as-yet unused pain medication with the best FORTA class would be recommended. The programmed rules for demand assessment are subject to update if guideline recommendations change and, where guideline recommendations for older people are missing or vague, must be considered consensual. These clinical rules are not contained in the FORTA list or are given only in exceptional cases (drug treatment for depression only in moderate or severe stages). The same rules have been applied in the manual assessment of the FORTA score as the gold standard; as such, they have evolved over the 15 years of manual FORTA assessments by our group. Their implementation in the algorithm followed these established rules as closely as possible; yet some exceptions or ramifications in clinical routines may have been missed for the sake of implementability: the questions should (a) be few, (b) be tickable or allow for entry of simple numbers and (c) reflect simple categories (good/intermediate/bad).
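The blood pressure and pain examples given above can be written as a small rule function. This is a minimal sketch, not the proprietary MF code: the function name and message strings are hypothetical, and treating every pain score of 4 or higher like the stated example of 4 is an assumption, as the text does not say how scores of 1 to 3 or above 4 are handled.

```python
def demand_recommendations(systolic_bp=None, pain_score=None):
    """Map simple physician entries to recommendations (illustrative rules)."""
    recs = []
    # Systolic blood pressure above 140 mmHg: one additional drug is needed
    if systolic_bp is not None and systolic_bp > 140:
        recs.append("add best-rated, as-yet unused antihypertensive")
    if pain_score is not None:
        if pain_score == 0:
            # No pain reported: overtreatment is assumed
            recs.append("deprescribe or reduce dose of worst-rated analgesic")
        elif pain_score >= 4:  # threshold assumed from the example of 4
            recs.append("add best-rated, as-yet unused analgesic")
    return recs


print(demand_recommendations(systolic_bp=150, pain_score=0))
```

Missing entries are passed as `None` and produce no recommendation in this sketch; how MF handles unanswered questions is not described for these items.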

It is expected that, on top of routine questions (e.g. on known allergies), three or four questions per patient need to be answered by the responsible physician; answering them should not take more than 1–2 min if the patient is known and the records (e.g. lab values) are at hand.

2.3 Statistical Analysis

Score comparisons were performed by the Wilcoxon rank-sum test. Statistical significance was assumed at p < 0.05, and the correlation between GS and MF was analyzed by Pearson correlation. Statistical analyses were performed using SAS Version 9.4 software for Windows (SAS Institute Inc., Cary, NC, USA).
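The analyses were run in SAS; for readers who want to reproduce the correlation step elsewhere, the Pearson coefficient can be sketched as follows. The score lists are made-up toy numbers, not study data, and the function name is hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Toy paired scores for five hypothetical patients (NOT data from the study)
gs_scores = [4, 6, 5, 8, 7]
mf_scores = [6, 7, 7, 10, 8]
print(round(pearson_r(gs_scores, mf_scores), 2))
```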

3 Results

The mean age of the 331 patients was 88.0 years (29.1% male) and thus not statistically different from the entire cohort [25]. The previously determined [30] evaluator-based mean FORTA score (gold standard [GS]/current standard) for the 331 participants was 6.02 ± 2.52. The mean algorithm-based FORTA score (MF score) for the same participants was 9.01 ± 2.91, with this difference being statistically significant (p < 0.00001). The MF score was reduced to 7.5 ± 2.7 after removing the undertreatment errors detected by the MF algorithm for calcium/vitamin D (n = 244; see also Discussion section) and for influenza/pneumococcal vaccination (n = 262). GS was corrected by removing the errors for undertreatment with calcium/vitamin D (n = 14); vaccination undertreatment errors were not present in GS. Consequently, GS remained stable at 5.98 ± 2.55. The difference between the corrected scores remained significant (p < 0.00001). In addition, a strong correlation (correlation coefficient = 0.63339, p < 0.0001) between the corrected GS and MF FORTA scores was observed. A scatter plot of GS and MF values is provided in Supplementary Figure 1 (see electronic supplementary material [ESM]).
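The corrected means follow from the reported error counts by simple arithmetic, as the short check below shows. It assumes that each removed error lowers the cohort total by one score point and that N = 14 is the number of calcium/vitamin D errors removed from GS; the variable names are hypothetical.

```python
n_patients = 331

mf_mean = 9.01
mf_removed = 244 + 262  # calcium/vitamin D + vaccination errors (MF)
gs_mean = 6.02
gs_removed = 14         # calcium/vitamin D errors (GS), as assumed above

# Removing k errors from the cohort lowers the mean by k / n_patients
mf_corrected = mf_mean - mf_removed / n_patients
gs_corrected = gs_mean - gs_removed / n_patients
print(round(mf_corrected, 2), round(gs_corrected, 2))  # 7.48 5.98
```

Both values agree with the reported corrected means of 7.5 and 5.98, leaving a difference of about 1.5 score points.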

We investigated the reasons for the remaining deviations between MF and GS by analyzing a sub-group of 100 patients (‘top 100’) with the highest degree of deviation in the FORTA scores (MF vs GS).

Figure 1 shows the most relevant diagnoses with the highest difference between MF and GS.

Fig. 1 Most relevant diagnoses for which differences in the evaluation of the MyFORTA (MF) algorithm-based FORTA score and the evaluator-based FORTA score were detected. The total number of deviations is aligned to the diagnosis indicated for the top 100 patients with the largest score differences

As can be seen, cardiovascular diagnoses were leading the list, followed by diabetes mellitus type II and chronic pain.

Figure 2 shows the items which created the largest individual deviations. Omitted nitro spray accounted for 84 errors (in 100 patients) detected only by MF. This simply reflects the fact that acute application of nitro spray must be possible, as this drug is FORTA A, but in most cases there is no entry in the recent ATC drug list by GPs, as it may have been prescribed 2 years ago and still be available. The GS evaluator suppressed this error because the omission obviously did not indicate undertreatment in most cases (the prescription may have occurred years before the assessment and would no longer be in the list), whereas the computer applied the rule rigidly and consistently.

Fig. 2 Most frequent individual drug errors. Blue columns are those detected by the MF algorithm, but not by the manual evaluation (GS); those detected by GS, but not by MF, are depicted in orange (for the top 100 patients). ARB angiotensin II receptor blocker, GS gold standard/manual evaluation, MF MyFORTA, PPI proton pump inhibitor, 1 Chronic therapy after myocardial infarction, 2 Arterial hypertension, 3 Chronic pain, 4 Dementia, 5 Depression, 6 Atrial fibrillation, 7 Acute coronary syndrome, 8 Heart failure, 9 Diabetes mellitus type II, 10 Gastrointestinal illness

Undertreatment by platelet inhibitors was observed in 66 cases by MF only; MF is clearly instructed to ask for timelines, as platelet inhibition for cardiac patients must continue as single platelet inhibition beyond 1 year after a myocardial infarction, a fact that seemingly has not been detected as precisely by GS.

In addition, MF was able to detect 43 more cases of drugs without diagnosis as compared with GS. In 30% of the cases, the detected overtreatments were due to the use of PPIs (pantoprazole/omeprazole) without proper indication (e.g. more than one antithrombotic drug). In one participant, the use of PPI was correct (MF incorrect) because acetylsalicylic acid was used concomitantly with other antithrombotics in this patient. In 16% of cases, high-dose acetylsalicylic acid (500 mg) was prescribed without appropriate diagnosis (e.g. pain). This was not obvious to the evaluator (GS) as the listed drugs did not usually include their dosage. In contrast, MF was able to detect the dosage of some drugs by using their ATC code (e.g. N02BA01) instead of their name. Consequently, seven cases of drugs without diagnosis and seven cases of acetylsalicylic acid or clopidogrel missing (see Fig. 2) were detected by MF but not by GS.

Moreover, the MF algorithm identified the absence of a DPP4 inhibitor in 20 patients diagnosed with type 2 diabetes. In half of these cases, the MF algorithm flagged the sole use of metformin and recommended the additional administration of a DPP4 inhibitor, as metformin is FORTA B and should be preceded by DPP4 inhibitors (FORTA A). Additionally, exclusive insulin therapy (FORTA B) in the remaining 10 cases similarly resulted in an error classification of "DPP4 inhibitor missing."

The MF algorithm detected missing β-blockers in 16 cases. Of those, 11 cases with chronic therapy after myocardial infarction were missing β-blockers although β-blocking agents should be given for 3 years after the acute event according to the FORTA list. This undertreatment error was assumed by the MF algorithm as patient data on the time that had elapsed after the acute event were missing in these cases. Of course, those patients might not qualify for this undertreatment error as the acute event may have occurred over 3 years ago. To increase sensitivity, the MF algorithm was designed to assume the worst case if data were missing so as to generate a notification to check this. Therefore, the MF algorithm worked as intended. It is notable that the use of β-blockers after myocardial infarction is debatable according to newer guidelines, and the FORTA list and subsequently the MF algorithm will be updated to reflect this. Three patients had additional atrial fibrillation and heart failure and one patient had only additional atrial fibrillation; in these diagnoses, β-blocker therapy is absolutely indicated, and its absence was correctly evaluated by the MF algorithm as a missing mandatory medication.
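The worst-case handling of missing data described here can be sketched as follows. The function and parameter names are hypothetical and the logic is a simplified reading of the text, not the MF implementation; in particular, whether the 3-year boundary itself is inclusive is an assumption.

```python
def beta_blocker_error_after_mi(on_beta_blocker, years_since_mi=None,
                                window_years=3):
    """Return True if an undertreatment notification should be raised."""
    if on_beta_blocker:
        return False
    if years_since_mi is None:
        # Time since the event is unknown: assume the worst case so that
        # the physician is prompted to check
        return True
    return years_since_mi <= window_years


print(beta_blocker_error_after_mi(False))                    # True
print(beta_blocker_error_after_mi(False, years_since_mi=5))  # False
```

The design trades specificity for sensitivity: a missing date produces a notification that the physician can dismiss, rather than a silent pass.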

According to the FORTA list, the MF algorithm deems atorvastatin a necessary medication for the management of patients with acute coronary syndrome. This led to 14 divergent assessments because GS regarded simvastatin as an equivalent option in those patients, although this statin is no longer the optimal one. Thus, MF correctly indicated this error.

In 12 cases, MF detected that the same or similar drug was prescribed twice. Regarding the prescription of torasemide to four patients with the diagnoses arterial hypertension and heart failure, the MF algorithm criticized it due to the simultaneous administration of hydrochlorothiazide (HCT, sequential nephron blockade). However, manual evaluation considered this dual therapy in this patient group as a logical escalation of treatment, though the clinical input did not indicate the need for escalation.

For another group of eight patients receiving both long-acting and short-acting insulins for the treatment of type 2 diabetes, the automated assessment regarded the dual medication as non-compliant with FORTA guidelines. Manual evaluation, on the other hand, assessed the dual medication in both groups as a logical escalation of therapy. This is the only example in which the MF algorithm overreacted: the concomitant use of long- and short-acting insulins is standard, but both types are summarized into one line and FORTA does not allow two drugs from the same group. The MF algorithm can be improved on this point with a new exclusion rule.
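One way such an exclusion rule could look is sketched below. The grouping labels, the allowed-combination set and the function name are all hypothetical; the sketch only illustrates the idea of flagging two drugs from the same group unless the combination is explicitly permitted.

```python
from collections import defaultdict

# Hypothetical whitelist of same-group combinations that are standard care
ALLOWED_COMBINATIONS = {
    frozenset({"long_acting_insulin", "short_acting_insulin"}),
}


def duplicate_group_errors(prescriptions, allowed=ALLOWED_COMBINATIONS):
    """prescriptions: list of (drug, group). Return the groups that are
    prescribed more than once and are not covered by an exclusion rule."""
    by_group = defaultdict(set)
    for drug, group in prescriptions:
        by_group[group].add(drug)
    return [group for group, drugs in by_group.items()
            if len(drugs) > 1 and frozenset(drugs) not in allowed]


insulins = [("long_acting_insulin", "insulin"),
            ("short_acting_insulin", "insulin")]
diuretics = [("torasemide", "diuretic"), ("hydrochlorothiazide", "diuretic")]
print(duplicate_group_errors(insulins))                 # []
print(duplicate_group_errors(insulins, allowed=set()))  # ['insulin']
print(duplicate_group_errors(diuretics))                # ['diuretic']
```

Without the whitelist, the insulin pair is flagged just like the diuretic pair, which corresponds to the behavior reported for the current MF version.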

The absence of escitalopram was criticized by the MF algorithm in 11 cases. Manual evaluation, on the other hand, deemed these cases correct; the severity of depression had not been included in the MF algorithm input. In these 11 cases, only mild depression was present, which did not require treatment with escitalopram. The discrepancy was therefore attributed to an input error: when information about disease severity is provided, the MF algorithm reaches the same result as the manual evaluation.

According to the FORTA list, the treatment of heart failure requires, among other things, the prescription of a diuretic if blood pressure is sufficiently high. In 11 cases, manual evaluation did not consider the absence of a diuretic an error, while the MF algorithm, in accordance with guidelines, indicated it as missing mandatory medication.

In nine cases, PPI was missing according to MF. The prophylactic indication for the administration of PPIs was identified in four patients who were also receiving a selective serotonin reuptake inhibitor (SSRI) and acetylsalicylic acid (ASA). In one case, a potentially bleeding-promoting combination of clopidogrel and ibuprofen was identified and criticized. In two other patients, a prescription of diclofenac in combination with ASA was observed. This combination would no longer be considered faulty if diclofenac were substituted following the guidelines, but the MF algorithm correctly evaluated it as such. The simultaneous use of ibuprofen and prednisolone was also recognized as increasing the risk of gastrointestinal bleeding, thus indicating a prophylactic PPI therapy in this case. Furthermore, the MF algorithm correctly detected a potentially bleeding-promoting combination of phenprocoumon and diclofenac and evaluated it as an error.

Table 1 lists the major diagnoses in those top 100 patients included in the former analysis, showing the predominant diagnoses to be cardiovascular diagnoses, pain and dementia. The latter is expected as these patients participated in a study focused on dementia.

Table 1 The most frequent diagnoses in the top 100 patients, exposing the largest deviation of FORTA scores when MyFORTA (MF) and their manual determination (gold standard [GS]) were compared

4 Discussion

The need for improving drug treatment, in particular for older, multimorbid patients, is evident, and most efforts so far seem to concentrate on ‘de-prescribing’, where ‘bad’ drugs are removed; this approach has been disappointing [4, 29, 30], whereas combined approaches addressing both overtreatment and undertreatment seem to be clinically more efficient [4, 14, 31]. OPERAM and SENATOR, two recent large clinical trials on START/STOPP (a ‘patient-in-focus listing approach’), failed despite this fact [32, 33]. In both studies, the main reason for failure was seen in the inefficiency of implementing the recommendations in clinical reality; the time to read the issued documents may have been too long, the clinical relevance of recommendations was seen to be too low, and the most important driver of implementation (personal interaction/training) was missing or too weak. It is therefore pivotal not only to design tools that are principally capable of improving both overtreatment and undertreatment, but also to address implementability.

An automated approach to apply such a tool seems to be very attractive in this respect. Here, we validated MyFORTA against the gold standard evaluator-based FORTA scoring and could demonstrate a surprisingly high concordance (with moderate correlation) of results. Importantly, MyFORTA not only reports the number of FORTA-related errors (overtreatment and undertreatment) and, thus, the FORTA score, it also gives advice on how to replace a drug with a better one, it points to caveats in its correct use and it comments on common problems such as allergies. MyFORTA therefore provides the physician with very easy instructions for use, resulting in a medication proposal that is individualized, optimized and streamlined to help older patients with complex multimorbidity and functional disabilities.

The differences between MyFORTA and GS ratings were related to understandable problems in medication reporting, with the computer algorithm/MF being more objective than interpretative manual evaluation. From the results reported here, about 1.0 of the 1.5 FORTA score points of difference were explainable by such observations. No error cluster seemed to render the MF algorithm insufficient in particular aspects, with the circumscribed exception of the concomitant use of short- and long-acting insulins, which it did not allow; thus, the findings did not prompt major changes to the MF algorithm.

A detailed analysis of the discrepancies between the gold standard and the MF algorithm showed that critical questions should be asked, in particular if missing information on drug therapy is to be expected; for example, nitro spray must have been prescribed to coronary patients, but that may have happened long ago and is then not detected by the current medication analysis. The GS evaluator simply assumed in good faith that ‘someone’ had prescribed nitro spray in the past and did not rate its absence as an undertreatment error. Obviously, the ATC annotations by GPs only list those drugs prescribed in the most recent 3-month period, but certainly not a treatment that was prescribed 2 years previously. To ask if this is correct is beneficial, not superfluous, rendering the MF algorithm more useful than the gold standard. The computer generates an error that only indicates that the doctor should check whether that prescription has been dispensed (and the patient is still in possession of the spray). We think this slight ‘over-alerting’ is still useful.

MF also seemed to be more stringent in checking on prophylactic (e.g. to counteract bleeding by antithrombotics) PPI indications, a very relevant area mainly of overtreatment, but sometimes also of undertreatment.

This also applies to DPP4 inhibitors, the only FORTA A antidiabetic drugs. This may be counterintuitive if newer guidelines are consulted, but reflects the limited life expectancy of FORTA patients (geriatric patients), in whom endpoint effects become less important. The most recent German guideline (Nationale VersorgungsLeitlinie [NVL] Typ-2-Diabetes, Version 3; Leitlinien.de) [34] reflects this preference for DPP4 inhibitors by the societies of diabetes and internal medicine, though not general medicine.

The MF algorithm created more undertreatment notifications, namely for vaccinations and vitamin D/calcium, accounting for half of the original score difference of 3. As the database did not contain information on vaccination, and an undertreatment notification would have been generated in all cases, it was decided to remove this item. In clinical practice it is difficult to obtain information on vaccination status from the available data sources because, similar to the nitro spray case, the vaccination may have happened long before the medication assessment was performed and is thus no longer present. It should be encoded as an ICD code; however, this is frequently missing. In practice, the MF algorithm would not be changed and the doctor would be asked via the error notification to check the vaccination status of the patient. Regarding vitamin D/calcium, the MF algorithm is instructed to rate missing vitamin D/calcium as an error in all patients over the age of 75 years, while the GS required a related diagnosis (osteoporosis), which was much rarer than old age, as all patients had to be over 75 years old in this study. This is still a matter of debate, although the authors tend to support the stricter handling of the issue by the MF algorithm, which will not be changed in this regard.

The amount of time required for the application of MyFORTA depends on the availability and interfacing of ATC and ICD codes. Apart from this technically manageable problem, the only time required of the physician is to tick boxes or answer questions by providing simple numbers. This may be done within 1–2 min if the patient report is at hand.

5 Limitations

As the analysis was performed on a completed study, not all information and diagnoses relevant to the FORTA assessments were available; some FORTA diagnoses had to be constructed from available ones. The alignment of diagnoses may have affected the results for both MF and GS. Thus, this validation strictly applies only to those diagnoses available and, thus, may not be representative for patients in whom all FORTA diagnoses are principally accessible.

The same holds true for the medication assessment, which may have been incomplete at the data source and cannot be corrected after completion of the study milestones.

The cases included were selected from a larger cohort depending on the availability of information essential for this secondary analysis. This selection may have conveyed further bias.

The MF algorithm reflects current gerontopharmacological practice, in particular regarding the demand analysis for drug therapy. As is common in geriatrics, several aspects of this analysis are consensual and lack a sound evidence base; they may also reflect the biases of the developers of the MF algorithm.

6 Conclusion

MyFORTA confirms most errors reported by manual assessment of the FORTA score, and adds reliable error information mainly resulting from data gaps on medications. The MF algorithm thus may be considered to be validated against the gold standard of assessment. It is the first validated automated tool to assess and propose complex medication schemes in older people.