Introduction

Acute myeloid leukemia (AML) is the second most common type of adult leukemia in the USA, with an estimated 19,520 new cases and 10,670 AML-related deaths in 2018. The prognosis is poor, with a 5-year survival rate of 27% [1, 2]. AML is characterized by bone marrow failure and immature myeloid cell proliferation. The heterogeneity of AML is widely recognized, making cytogenetic and molecular testing an important prognostic tool for guiding post-induction treatment [3].

The risk of developing AML increases with age, with over 70% of new AML cases being diagnosed in adults ≥ 55 years [4]. The prognosis is worse for older patients, with a 5-year survival rate of 3–8% in patients > 60 years compared with nearly 50% in patients < 60 years [5]. Approximately one-third of AML patients express a mutated form of FMS-like tyrosine kinase-3 (FLT3) [6]. This mutation can confer an adverse prognosis [7,8,9]. However, recent advances in targeted drug development are expected to improve or maintain both clinical and health-related quality of life (HRQoL) outcomes for these patients [5, 10, 11].

AML treatment is aggressive, and older patients are often unable to tolerate aggressive therapies [12, 13]. Treatment options for older patients with AML in the USA include standard induction therapy (for fit candidates), a hypomethylating agent (HMA), supportive care or novel therapies [14, 15].

Improvements to HRQoL may be of particular value in patients whose life expectancy is expected to be shorter and in those for whom the likelihood of cure is lower [16]. Fatigue is a particularly common symptom that significantly impacts HRQoL and is reported by > 90% of adults undergoing treatment for AML [16]. In addition, multiple features of AML can influence HRQoL, including disease-related symptoms (e.g., fatigue, fever or weight loss) and the effects of disease symptoms, treatment and prognosis on emotional and mental well-being. As the physical and functional abilities of the patient may be further impaired by the toxicity of currently available AML treatments, maximizing HRQoL should be a particularly important goal for the patient [16, 17]. Treatments that actively improve HRQoL because of decreased toxicity, while still maintaining effectiveness, or that result in improvements in clinical symptoms and reduced burden of the disease (such as a decrease in transfusion dependence) may be a key addition to the treatment algorithms of AML.

The primary objective of this study was to understand the impact of a range of factors that influence AML patients’ HRQoL. In addition, this study investigated the HRQoL of patients with different clinical characteristics [i.e., relapsed/refractory disease, presence of FLT3-internal tandem duplication (ITD) mutation] and during different types/intensities of treatment regimens, i.e., low vs. high intensity, HMA monotherapy or stem cell transplantation (SCT). A second objective of the study was to assess the degree of concordance regarding patient-experienced symptoms as reported by AML patients themselves and the recording of the same symptoms by their treating physicians.

Methods

Data Source and Populations

The Adelphi AML Disease Specific Programme (DSP) captured data from the perspective of physicians and their consulting patients between February and May 2015 in the USA. DSPs are large point-in-time surveys conducted to provide impartial observations of real-world clinical practice from a physician and matched patient viewpoint. The DSP is not conducted to test any pre-specified hypotheses and is not set up to demonstrate cause and effect; rather it is designed to provide a holistic, benchmark view of contemporary AML management via physician- and patient-reported record forms [18]. A complete description of the survey methods has been published previously [18,19,20]. The research was done following the appropriate ethical and legal guidelines. The collection of patient data is compliant with the Health Insurance Portability and Accountability Act of 1996 and the Health Information Technology for Economic and Clinical Health Act of 2009 [21, 22]. In addition, informed consent was obtained from the participants involved.

Physicians (hematologist-oncologists and hematologists) were identified through publicly available lists, and their recruitment reflected nationally representative samples subject to meeting the DSP inclusion criteria: physicians had to have qualified between 1978 and 2011, be licensed to practice in the USA, be personally responsible for prescribing decisions for patients with AML and see a minimum of two patients with AML in a typical week. For inclusion in the current AML DSP study, patients were required to be adults ≥ 18 years with a physician-confirmed diagnosis of AML, to have received active AML drug therapy (although they may not have been receiving active therapy at the time of consultation and inclusion), to have completed a patient self-completion (PSC) form and not to be currently enrolled in a clinical trial.

Physician and Patient-Reported Outcomes

Physicians were requested to complete a detailed patient record form (PRF) for the next 6–8 consecutive consulting patient visits for patients eligible for inclusion. Data captured by the physician include demographic and clinical information, current disease management, symptom profile and AML treatment patterns. For each patient, the consulting physicians also completed details on how the diagnosis of AML was made and if cytogenetic and molecular analysis [e.g., FLT3-ITD, Isocitrate Dehydrogenase (IDH)1/2, Neuroblastoma Rat Sarcoma (NRAS), c-KIT] was used to establish this diagnosis.

After providing informed consent, patients completed a questionnaire immediately after the consultation, independently from their physician; the questionnaire was placed in an envelope that was sealed by the patient and left with the receptionist in the physician’s office before being returned to the fieldwork agency. The PSC included validated measures that assessed HRQoL and therapy satisfaction. HRQoL was measured using the Functional Assessment of Cancer Therapy–Leukemia (FACT-Leu) Questionnaire and the 5-Dimension EuroQoL Questionnaire (EQ-5D-3L). Therapy satisfaction was measured using the Cancer Therapy Satisfaction Questionnaire (CTSQ).

The FACT-Leu Questionnaire [23] [consisting of the general FACT Questionnaire (FACT-G) plus the leukemia-specific (Leu) subscale] is a validated FACT-leukemia-specific questionnaire that measures the most common and important HRQoL concerns of patients with leukemia. The FACT-Leu has multiple scoring components: functional well-being (FWB), social well-being (SWB), emotional well-being (EWB), physical well-being (PWB), the Leu subscale, the total FACT-Leu scale score and the trial outcomes index (TOI; TOI = PWB + FWB + Leu subscale). Each subscale has a minimally important difference (MID). The concept of an MID is to identify a difference in score which is considered to be clinically meaningful. The MIDs for each FACT-Leu subscale are: FWB (2–3), SWB (not available), EWB (2), PWB (2–3), FACT-G Total (3–7), FACT-TOI (5–6) and FACT-Leu Total (6–12) [24, 25].
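As an illustration of how the summary scores relate to the subscales, the following minimal Python sketch combines subscale scores and compares a between-group difference against the MIDs quoted above. The function names are hypothetical, the FACT-G total is assumed to be the sum of the four well-being subscales, and the choice to compare against the upper bound of each MID range is a conservative assumption made here, not a rule stated in this study.

```python
# MID ranges (lower, upper) as quoted in the text; SWB has no published MID.
MID = {
    "PWB": (2, 3),
    "FWB": (2, 3),
    "EWB": (2, 2),
    "FACT-G": (3, 7),
    "TOI": (5, 6),
    "FACT-Leu": (6, 12),
}


def fact_leu_summary(pwb, swb, ewb, fwb, leu):
    """Combine subscale scores into the FACT-G, TOI and FACT-Leu totals."""
    # Assumption: FACT-G is the sum of the four well-being subscales.
    fact_g = pwb + swb + ewb + fwb
    return {
        "FACT-G": fact_g,
        "TOI": pwb + fwb + leu,    # TOI = PWB + FWB + Leu subscale
        "FACT-Leu": fact_g + leu,  # FACT-G plus the leukemia-specific subscale
    }


def exceeds_mid(scale, score_a, score_b):
    """Return True if the absolute difference reaches the upper MID bound.

    Using the upper bound is an illustrative, conservative choice.
    """
    _, upper = MID[scale]
    return abs(score_a - score_b) >= upper
```

For example, the physical well-being scores of 13 and 17.6 reported for relapsed/refractory versus non-relapsed/refractory patients differ by 4.6 points, so `exceeds_mid("PWB", 13, 17.6)` evaluates to true under this rule.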

Responses on symptomatology were elicited from the “Additional Concerns” section of the FACT-Leu. This includes 17 physical symptoms (fevers, bleeding, general pain, stomach area pain, chills, night sweats, bruising, lymph node lumps/swelling, weakness, tiredness, weight loss, appetite, shortness of breath, functional ability, diarrhea, concentration and mouth sores) and 10 emotional/social concerns (frustration with activity limitation, discouraged by illness, future planning, uncertainty, worry about illness, emotional ups and downs, isolation, concern about infertility, worry about family and worry about infections) [22]. This aspect of the FACT-Leu asks patients to provide responses for each item on a scale of 0–4 where 0 represents “not at all,” 1 represents “a little bit,” 2 represents “somewhat,” 3 represents “quite a bit” and 4 represents “very much” [21]. To enable a two-way comparison, patient responses were (conservatively) grouped as follows: a response of 0, 1 or 2 indicated that the patient was not experiencing the symptom (or not severely enough to warrant attention); a response of 3 or 4 was assumed to indicate the presence of the symptom. Additionally, as a sensitivity analysis, we applied a more stringent grouping regarding the presence or absence of symptoms, comparing a response of 0 (no symptom at all) with a response of 1–4 (any sign/severity of a symptom).
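The two groupings described above can be sketched as a single thresholding rule. This is a minimal illustration only; the function name and the `threshold` parameter are hypothetical and are not part of the FACT-Leu instrument.

```python
def symptom_present(response, threshold=3):
    """Dichotomize a FACT-Leu item response (0-4) into present/absent.

    threshold=3 reproduces the core grouping (0-2 absent, 3-4 present);
    threshold=1 reproduces the sensitivity analysis (0 absent, 1-4 present).
    """
    if response not in (0, 1, 2, 3, 4):
        raise ValueError("FACT-Leu item responses must be integers from 0 to 4")
    return response >= threshold


responses = [0, 1, 2, 3, 4]
core = [symptom_present(r) for r in responses]
sensitivity = [symptom_present(r, threshold=1) for r in responses]
```

Under the core grouping only responses of 3 and 4 count as a symptom being present, whereas under the sensitivity grouping every response other than 0 does.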

The EQ-5D-3L [23, 26] is a five-item assessment of a patient’s general health status (mobility, self-care, daily activities, pain/discomfort and anxiety/depression), which are reported on a scale of 0–1, with 1 indicating “full health,” and an MID of 0.074 [27, 28]. The EQ-5D-3L includes a visual analog scale (VAS) to rate an individual’s health state on a scale from 0 to 100, with an MID of 7 [27, 29].

The CTSQ [30] measures treatment satisfaction and includes three domains: expectations of therapy (ET), satisfaction with therapy (SWT) and feelings about side effects (FSE). Each item is scored on a scale from 1 to 5 with a value of 1 being the worst response and a value of 5 being the best response. Domain scores are calculated by the following formula: (mean of completed item scores − 1) × 25, resulting in a domain score of 0–100, with a higher score representing a better outcome [31]. Mean MIDs for the CTSQ domains are reported as ET (14.3), FSE (8.5) and SWT (5) [31].
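The domain score formula can be written out as a short Python sketch. The function name is hypothetical, and representing an unanswered item as `None` is an assumption made for illustration; the instrument's own rules for handling missing items are not described here.

```python
def ctsq_domain_score(item_scores):
    """Compute a CTSQ domain score on a 0-100 scale.

    Applies (mean of completed item scores - 1) * 25, where each
    completed item is scored from 1 (worst) to 5 (best).
    Unanswered items are passed as None and skipped (an assumption).
    """
    completed = [s for s in item_scores if s is not None]
    if not completed:
        return None  # no completed items, so no score can be derived
    if any(not 1 <= s <= 5 for s in completed):
        raise ValueError("CTSQ item scores must lie between 1 and 5")
    return (sum(completed) / len(completed) - 1) * 25
```

With this rule, a domain in which every item is scored 1 yields 0, every item scored 5 yields 100, and items scored 3 and 4 (with one item unanswered) yield 62.5.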

Statistical Analysis

Patient characteristics were analyzed descriptively for the total study sample. The total population was divided into subgroups depending on the nature of the analysis. These subgroups included relapsed/refractory status, treatment intensity, HMA monotherapy status, SCT status and FLT3-ITD status.

Where any patient groups were compared, bivariate analyses were used to identify significant differences. For numerical variables, t tests were performed. For nominal/ordinal variables, Pearson’s chi-square tests were performed. Effect size (ES) between groups was also calculated by dividing the difference in means between groups by the standard deviation. As effect size quantifies the difference between two groups independently from the sample size, it holds more clinical meaning than statistical significance alone. The kappa statistic measure of inter-rater agreement (adjusting for random agreement) was calculated to express the differences and level of concordance between physicians and patients with respect to the presence of key AML symptoms. Symptom severity was not reported by the physician, so analysis was conducted on presence only. Kappa values near 0 indicate low agreement, and values approaching 1 indicate high agreement; magnitude guidelines were used in interpretation [32]. Logistic regression analysis was used to identify whether physician-patient discordance could be independently associated with QoL outcomes, using individual disagreements and the number of disagreements as predictors.
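The two summary statistics described above can be sketched in Python as follows. This is not the Stata code used in the study: the function names are hypothetical, the effect size is assumed to use the pooled standard deviation (the text does not specify which standard deviation was used) and is returned as an absolute value, and the kappa function handles only binary present/absent ratings.

```python
from math import sqrt
from statistics import mean, stdev


def effect_size(group_a, group_b):
    """Absolute difference in means divided by the pooled SD (assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * stdev(group_a) ** 2 + (n_b - 1) * stdev(group_b) ** 2
    ) / (n_a + n_b - 2)
    return abs(mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def cohen_kappa(patient, physician):
    """Cohen's kappa for two paired lists of 0/1 (absent/present) ratings."""
    n = len(patient)
    # Observed proportion of pairs in which both raters agree.
    observed = sum(p == d for p, d in zip(patient, physician)) / n
    # Agreement expected by chance from each rater's marginal rates.
    rate_pat = sum(patient) / n
    rate_phy = sum(physician) / n
    expected = rate_pat * rate_phy + (1 - rate_pat) * (1 - rate_phy)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

In a toy example with four hypothetical patients, ratings of `[1, 1, 0, 0]` from patients and `[1, 0, 0, 0]` from physicians give 75% observed agreement but a kappa of only 0.5, because half of that agreement would be expected by chance alone. This is why a symptom can show high percentage agreement alongside a modest kappa.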

All analyses used Stata Statistical Software: Release 15 (StataCorp LP, College Station, TX, USA).

Results

Sample Characteristics

A total of 61 physicians included 457 patients receiving treatment for AML in the DSP. Fifty-seven of the physicians described themselves as hematologist-oncologists, and the remaining four as hematologists. The median number of leukemia patients seen by the physician per week was 40, with a median of 38% of these being patients with AML.

Patient demographics are shown in Table 1; 44% were female, the mean age was 60 years, 56 (12%) were considered by the physician to be relapsed/refractory, and 7% had received SCT. Of the 237 (51%) patients who had been tested for cytogenetics, 11% tested positive for the FLT3-ITD mutation (a lower proportion than expected although it is not known how many patients underwent the FLT3-ITD test).

Table 1 Patient demographics

While 457 patients were included in the study, only 82 agreed to complete a PSC form and could therefore be included in the analysis. Of those who completed the PSC, 37 (45%) were female, mean age was 58.2 years, 19 (23%) had relapsed/refractory AML, and mean time since diagnosis was 10.6 months. Characteristics of patients completing a PSC form were similar to those for whom a PSC form was not available, with the exceptions that more patients completing versus patients not completing a PSC form were relapsed/refractory (23% vs. 10%; P = 0.0024) and had been diagnosed for longer (10.6 months vs. 6.4 months; P < 0.001) (Supplementary Table 1).

Overall Burden of AML

Relapsed/Refractory Patients and Non-relapsed/Refractory AML Patients

Relapsed/refractory AML patients had lower QoL scores on all FACT-Leu domains than non-relapsed/refractory AML patients (Table 2). Physical well-being was lower for patients with relapsed/refractory AML compared with non-relapsed/refractory AML patients (13 vs. 17.6; P = 0.0053; ES = 0.729), and the difference was also clinically meaningful as it exceeded published MIDs of between 2 and 3 [24]. Although not statistically significant, the differences in other domains that exceeded published MIDs for relapsed/refractory AML compared with non-relapsed/refractory AML were FACT-G (56 vs. 63.3; P = 0.0842; ES = 0.458) [23], trial outcome index (63.8 vs. 73; P = 0.0802; ES = 0.464) [23] and overall FACT-Leu (92.5 vs. 103.7; P = 0.0982; ES = 0.439) [24]. No differences were noted in relapsed/refractory patients regarding expectations of therapy scores (64.7 vs. 67.7; P = 0.6172; ES = 0.134) and satisfaction with current therapy scores (60.3 vs. 66.5; P = 0.1581; ES = 0.376).

Table 2 Overall burden of AML

AML Subgroups by Treatment Regimen

For the 39 patients receiving low-intensity therapy, those receiving HMA monotherapy had worse HRQoL than those on therapies other than HMA therapy in EQ-5D-3L (0.70 vs. 0.79; P = 0.0928; ES = 0.558) [29], overall FACT-Leu QoL (89.9 vs. 112.9; P = 0.0021; ES = 0.971) [24] and the FACT-Leu domains [24]: physical well-being (13 vs. 19.3; P = 0.0006; ES = 1.063), social well-being (17 vs. 19.9; P = 0.0851; ES = 0.567), FACT-G (56.8 vs. 68.5; P = 0.0143; ES = 0.790), leukemia specific (33.1 vs. 44.4; P = 0.0007; ES = 1.050) and trial outcome index (61.2 vs. 80.5; P = 0.0007; ES = 1.055). The differences were clinically significant as measured by the MID in all scales and subscales. In addition, patients who received HMA monotherapy also reported lower expectations of therapy (54.3 vs. 77.3; P = 0.0007; ES = 1.061) [31] and less satisfaction with therapy (56.3 vs. 71.9; P = 0.0029; ES = 0.943) [31] than those who did not (Table 2).

When assessing patients on high-intensity treatment compared with those on low-intensity treatment, there were no clinically important differences in HRQoL measures. The only notable difference was that patients on high-intensity treatment had higher expectations of therapy than those on low-intensity treatment (68 vs. 63.4; P = 0.3409; ES = 0.213).

Prior Stem-Cell Transplantation

Our cohort included 10 patients who had received prior SCT and 72 patients who had not. Patients who had received a prior SCT had clinically meaningful decreases in HRQoL measures compared with those who had not received SCT regarding emotional well-being (9.5 vs. 12.3; P = 0.0664; ES = 0.619) [24], FACT-G (55.5 vs. 61.7; P = 0.2360; ES = 0.402) [24] and EQ-5D-3L (0.65 vs. 0.75; P = 0.1288; ES = 0.514) [28]. However, due to the small sample size, the differences were not statistically significant. In addition, no differences were reported in patients who had received SCT regarding expectations of therapy (72 vs. 65; P = NS; ES = 0.325) [31], although this may reflect the small sample size used for these analyses.

AML by FLT3-ITD Mutation Status

Of the 82 patients included in this analysis, 54 were reported to have molecular testing (to establish their AML diagnosis). Of these, seven patients tested positive for the FLT3-ITD mutation. Although the sample size was small, HRQoL was lower, by clinically meaningful margins, in patients who were FLT3-ITD positive compared with those without the mutation: EQ-5D-VAS (47.6 vs. 63.7; P = 0.0428; ES = 0.816), EQ-5D-3L (0.64 vs. 0.76; P = 0.1629; ES = 0.568), overall FACT-Leu (85.5 vs. 100; P = 0.1484; ES = 0.588) and several FACT-Leu domains, including physical well-being (12 vs. 16.9; P = 0.0711; ES = 0.730), FACT-G (53.8 vs. 60.3; P = NS; ES = 0.423), leukemia-specific (31.7 vs. 39.7; P = 0.0717; ES = 0.729) and trial outcome index (57.0 vs. 70.9; P = NS; ES = 0.708). In addition, patients with the FLT3-ITD mutation had clinically meaningful differences in feelings about side effects compared with those without the mutation (40.2 vs. 48.1; P = 0.1810; ES = 0.545). Again, although sample sizes were small, the directionality of the results was strong and was statistically significant in the case of the EQ-5D-VAS score.

Agreement and Discordance Between Patient-Physician Symptom Reporting

Both physicians and patients were asked to report on current symptoms at the time of data collection (i.e., at the same time point). As shown in Fig. 1, for symptoms where physician and patient responses could be linked (bruising, fatigue, fever, bleeding, weight loss and appetite loss), patients were more likely than physicians to report the presence of a symptom. Agreement on individual symptoms varied considerably, with low kappa scores observed for bruising (κ = 0.1292), fatigue (κ = 0.0836), bleeding (κ = 0.0177), weight loss (κ = 0.0821) and appetite loss (κ = − 0.0246), indicating substantial disagreement between these patient- and physician-reported symptoms. Agreement was observed for fever (88%; P = 0.0016; κ = 0.3121), with 12% of patients self-reporting this symptom compared with 7% of physician-reported instances; the kappa score was considered to be less than moderate [21]. The lowest level of agreement observed was for appetite loss (29%; P = NS; κ = − 0.0246), for which 74% of patients reported this symptom compared with only 11% of physician-reported instances. A higher level of disagreement was observed for more subjective symptoms such as appetite loss and fatigue than for objective variables such as fever and weight loss (Fig. 1). As expected, applying a more stringent cutoff for sensitivity analysis to the reporting of symptoms by patients, whereby patients reported either no symptom at all (0) or any mention of a symptom (1–4), resulted in less agreement between patient- and physician-reported symptoms (Supplementary Table 2).

Fig. 1 Concordance and discordance between patient-physician symptom reporting. Black and white bars represent discordance between patient and physician reporting of signs and symptoms, and shaded bars represent concordance

Impact of Patient-Physician Disagreement

HRQoL as determined by FACT-Leu was associated with patient-physician disagreement on the symptoms of bleeding (− 14.12; P = 0.046), weight loss (− 21.22; P = 0.001) and appetite loss (− 12.58; P = 0.027) (Table 3). In each of these cases, in which there was a disagreement on symptoms, the patient reported a lower HRQoL compared with cases where the patient and physician were in agreement about the presence/absence of symptoms. Similar results were observed for satisfaction with therapy if there was patient-physician disagreement with symptoms of fever (− 11.66; P = 0.026), weight loss (− 8.08; P = 0.043) and appetite loss (− 10.30; P = 0.007) (Table 3), again with presence of symptom disagreement being associated with a worse outcome.

Table 3 Impact of discordance on HRQoL and satisfaction with therapy

Discussion

The present study of adult patients receiving therapy for AML demonstrates that patients with relapsed/refractory AML and those who have had SCT have a worse QoL than those without relapsed/refractory disease or prior SCT. Table 2 shows the overall FACT score was 92.5 for relapsed/refractory AML, although it was even lower for the subgroups analyzed. This compares to a score of 147.2 for the de novo AML population who have received treatment for 5–6 months [33]. Similarly, the EQ-5D-3L score was 0.71 for relapsed/refractory patients (and only 58.6 on the EQ-5D-VAS) compared with a value of 0.83 for AML patients in complete response (survivors with no relapse), with a lower score indicative of a worse QoL [34]. Similar results were suggested in patients with the FLT3-ITD mutation, although the sample size was very small. However, the strength of the directional trends suggests that further investigation may be warranted regarding the impact of this mutation on HRQoL. There were no differences in HRQoL in patients based on intensity of treatment. However, among the patients who received different low-intensity regimens, HRQoL was lower in patients receiving HMA monotherapy. Although these results are not surprising, they suggest that HRQoL is low in AML patients in general and may be particularly poor in specific subgroups, a factor that physicians should bear in mind in the context of overall patient management and support.

To the best of our knowledge, our study is the first to report a real-world HRQoL assessment of patients with the FLT3-ITD mutation, albeit in a very limited sample of patients. A larger sample with verified molecular testing could provide more information about the HRQoL in this patient subgroup.

Our results are consistent with other studies that observed lower QoL in patients with SCT compared with those receiving chemotherapy. Also, there was a worse HRQoL in patients with relapsed and refractory AML compared with those who had not relapsed or become refractory [35, 36], although ours is the first study to report directional trends suggesting clinically meaningful differences in HRQoL for both relapsed and refractory patients. There is also evidence that the symptoms of appetite loss and fatigue have the most detrimental impact on HRQoL [36].

As expected, patients without relapsed/refractory AML, those receiving high-intensity therapy or those patients treated with non-HMA low-intensity therapy have higher expectations of, and more satisfaction with, their therapy. Prior SCT and FLT3 status did not have an impact on therapy expectation or satisfaction. This is of interest, as the prognosis differs in patients who are FLT3-ITD-mutation positive, so one would expect that there would be a difference in therapy expectations. This may be due to a variety of factors, which could include a lack of clear prognosis description by the physician, the psychologic difficulty patients encounter when facing a poor prognosis or simply a lack of understanding in the patient population of the poor prognosis of FLT3-ITD mutations.

Substantial differences were observed between frequencies of physician- and patient-reported symptoms, with patients being considerably more likely to report experiencing symptoms to some degree, in particular appetite loss. This lack of agreement shows there is a substantial discordance between AML patients and their physicians. While this could be expected in patient-provider interactions (patients reporting more symptoms than their physicians), this study has sought not only to quantify the level of discordance, but also the impact of this discordance on the HRQoL of patients. The level of discordance implies that either patients may be underreporting or not reporting their symptoms to physicians or physicians may not recognize or place as much emphasis as the patient on these symptoms.

However, it must be recognized that in all cases the reporting of symptoms is subjective and that in our study the physicians provided responses on the presence of symptoms by checking a pre-coded list, whereas the patients provided responses within the FACT-Leu. As such, it can be argued that the different methods used to gather patient and physician data in this study could account for some of the discordance. Given that the analysis refers solely to the differences in symptom prevalence and not to symptom severity, the differences in methodology described above may not be a significant limitation, as both the patient and physician data collection methods allow for the presence or absence of symptoms to be derived. Indeed, a sensitivity analysis applying a more stringent grouping of the patient-reported data, namely taking into account only the presence or absence of symptoms (ignoring severity), yielded similar results to the original core analysis. This issue is common in many studies where an exact match between the physician-reported and patient-reported outcomes tools is difficult to obtain [37].

In our study, we observed differences in HRQoL when there was patient-physician discordance on symptoms of bleeding, weight loss and appetite loss whether using a relaxed or stringent cut-off, indicating the validity of our results. It should be noted that for the symptom bruising, which is strongly associated with AML, there was no impact on HRQoL even though we observed a disconnect in reporting.

While differences in reporting may be logical in the sense that the primary objective for physicians will generally be to treat and manage the AML itself, recognition and subsequent management of specific symptoms important to the patient may also improve patient QoL, as this would allow symptoms to be addressed more accurately. This discordance could be due to a variety of factors, and a likely factor could be that there is not clear communication of symptoms in a patient’s interaction with their physician. Improved communication between patient and physician may help to reduce discordance, a finding consistent with other published evidence [38], whereas discordance in symptom detection may lead to suboptimal symptom management, which could potentially adversely impact patients’ HRQoL.

This study had some limitations, with an obvious limitation being the small sample size. Regarding the patient-reported self-completion questionnaire, upon which much of this analysis is based, it must be recognized that this was voluntary, so while all patients included were invited to undertake this task, only a limited number agreed to do this. However, characteristics of patients who did and did not agree to complete the form were similar in most of the categories as shown in the results (Table 1) and Supplementary Table 1. We also recognize the small numbers of patients in the SCT (10 out of 82) and FLT3-ITD (7 out of 54) subgroups. More research is needed in these patient subgroups to confirm our results so that our findings can be more widely generalized. In addition, we did not have a true random sample of physicians or patients. While minimal inclusion criteria governed the selection of the participating physicians, participation was dictated by their willingness to complete the data collection. This recruitment bias is common to all research, since regardless of the study type, research design and methodology, there will always be groups of physicians who are willing to participate and other groups who are not. It is also possible that the patients included may consult their physician more frequently and may be more severely affected than those who do not consult their physician as frequently. However, this patient group is representative of the population actively consulting with their physician regarding their AML. While the cross-sectional design of this study prevented any conclusions on causal relationships, we are still able to identify associations between the AML patient subgroups and HRQoL. Finally, although we use a survey methodology that can be subject to recall bias, this is not an issue for this study as all data used relate to variables that do not require any recall, such as presence/absence of current symptoms and HRQoL.

The key strength of this study is that patients have a confirmed diagnosis of AML and that cases reflect the real-world population from clinical practice rather than studying populations in controlled conditions. This enables insight into HRQoL of patients being treated for AML and their journey.

Conclusion

Our study demonstrates that HRQoL is low for patients with AML in general, and particularly for patients with relapsed/refractory disease, patients harboring the FLT3-ITD mutation and those who have had SCT. There is a lack of concordance between patient and physician reporting of symptoms, which may be due to communication. Lack of concordance can potentially have detrimental effects, as a lack of awareness of symptoms on the physician’s part means these symptoms are not addressed. Improved patient symptom reporting tools and heightened awareness of symptom impact on the patient by the physician may improve the HRQoL of the patient, and possibly their disease course.