Neurocognitive impairment and patient–proxy agreement on health-related quality of life evaluations in recurrent high-grade glioma patients

Purpose: The rate of missing data on patient-reported health-related quality of life (HRQOL) in brain tumor clinical trials is particularly high over time. One solution to this issue is the use of proxy (i.e., partner, relative, informal caregiver) ratings in lieu of patient-reported outcomes (PROs). In this study we investigated patient–proxy agreement on HRQOL outcomes in high-grade glioma (HGG) patients.
Methods: Generic and disease-specific HRQOL were assessed using the EORTC QLQ-C30 and QLQ-BN20 in a sample of 501 patient–proxy dyads participating in EORTC trials 26101 and 26091. Patients were classified as impaired or intact, based on their neurocognitive performance. The level of patient–proxy agreement was measured using Lin’s concordance correlation coefficient (CCC) and the Bland–Altman limits of agreement. The Wilcoxon signed-rank test was used to evaluate differences between patients’ and proxies’ HRQOL.
Results: Patient–proxy agreement in all HGG patients (N = 501) ranged from 0.082 to 0.460. Only 18.8% of all patients were neurocognitively intact. Lin’s CCC ranged from 0.088 to 0.455 in cognitively impaired patients and their proxies and from 0.027 to 0.538 in cognitively intact patients and their proxies.
Conclusion: While patient–proxy agreement on health-related quality of life outcomes is somewhat higher in cognitively intact patients, agreement in high-grade glioma patients is low in general. In light of these findings, we suggest caution in using proxy evaluations in lieu of patient-reported outcomes, regardless of the patient’s neurocognitive status.
Supplementary Information: The online version contains supplementary material available at 10.1007/s11136-022-03197-w.


Introduction
The prognosis for high-grade glioma patients is dire, with a median survival time of 15 months and a usually rapid decline in general health. It is therefore not surprising that health-related quality of life (HRQOL) has become an important secondary outcome measure in high-grade glioma clinical trials, as a measure of patients' functioning [1]. Traditional clinical trial outcomes such as progression-free survival or overall survival do not provide a complete picture of the patient's functioning and well-being, which is why outcomes such as HRQOL and neurocognitive functioning are now typically included in brain tumor clinical trials. Unfortunately, many brain tumor clinical trials still suffer from a substantial amount of missing HRQOL data over time [2]. Restricting analyses to patients able to provide complete patient-reported outcomes (PROs) might introduce bias in clinical trials, since important HRQOL evaluations of patients with poor neurological and/or neurocognitive functioning might be missing [3]. One proposed solution is to use evaluations provided by partners, relatives, or informal caregivers, collectively referred to as "proxies," as substitute data when patient-reported scores are missing.
However, it is unclear to what extent proxy-reported outcomes are representative of the patients' self-perceived outcomes. Previous studies in low-grade glioma patients showed that the level of neurocognitive functioning determines the degree of patient-proxy agreement. Moderate to high patient-proxy agreement was found in neurocognitively intact patients, while in patients with neurocognitive impairment, patient and proxy ratings differed regarding emotional functioning [4][5][6][7]. In general, there seems to be agreement between patients and proxies concerning physical functioning and symptoms [6], but the same cannot be said regarding less visible issues, such as mood and emotional functioning [8].
Furthermore, the debate concerning the subjectivity of HRQOL is still open. While HRQOL ratings are by definition subjective and should in principle be reported by the patient him- or herself [9], HGG patients often face neurological and neurocognitive deterioration that could force clinicians to resort to proxy ratings because the patient is unable to self-report.
The aim of this study is to investigate patient-proxy HRQOL agreement in a large sample of patients with recurrent HGG. Previous findings from a study in low-grade glioma patients, in which some of the authors of the present study collaborated, suggest that neurocognitive impairment influences HRQOL patient-proxy agreement. Therefore, we divided patients participating in two EORTC-coordinated clinical trials into neurocognitively impaired and intact groups and used an approach similar to the one previously implemented in the study by Ediebah and colleagues. Our a priori hypothesis was that neurocognitively impaired patients would have lower levels of patient-proxy agreement than neurocognitively intact patients [4][5][6][7].

Patients and methods
The initial sample of EORTC trials 26101 and 26091 combined consisted of 731 patients. Most patients had prior chemotherapy and radiotherapy and, in both trials, evaluation prior to randomization and every 12 weeks thereafter included neurocognitive, HRQOL, and full clinical assessment.
In addition to the criteria set for inclusion in the two clinical trials briefly described below [10,11], only HGG patients (i.e., WHO grade III and grade IV) were selected for this study. Since data were collected prior to 2016, the 2007 WHO tumor classification was used [12]. All variables were measured within two weeks prior to randomization. We set a time window of ± 7 days between neurocognitive functioning (NCF) evaluation and QLQ-C30 and QLQ-BN20 administration to ensure concurrent measurements, given the one-week time frame of the QLQ questions.
EORTC trial 26101 (EudraCT number 2009-017422-39) was a randomized phase III trial investigating whether the combination of lomustine plus bevacizumab compared to lomustine alone would result in better overall survival in glioblastoma patients with first progression [10]. EORTC 26091 (EudraCT number 2010-023218-30) was a randomized, open-label phase II trial comparing temozolomide alone to the combination of temozolomide and bevacizumab in patients with a first recurrence of a locally diagnosed WHO grade II or III glioma without 1p/19q co-deletion [11].

Ethics
These trials were approved by the institutional review boards and ethics committees of all participating centers and the respective authorities. The trials were completed according to the Declaration of Helsinki and all patients provided written informed consent.

Neurocognitive assessment
Neurocognitive functioning (NCF) was assessed using a widely accepted clinical trial battery for testing NCF in patients with intracranial or extracranial tumors. The tests were selected because of their wide use in clinical trials and their sensitivity to the impact of tumor and treatment-related variables [13,14]. The battery consists of the Hopkins Verbal Learning Test-Revised (HVLT-R) [15] for total recall, delayed recall, and delayed recognition, which indexes verbal learning and memory; the Trail Making Test (TMT part A and part B) [16], which measures attention, speed, and mental flexibility; and the Controlled Oral Word Association Test (COWAT) [17], which evaluates the spontaneous production of words under restricted search conditions. These tests were administered by centrally trained and certified health-care personnel, e.g., research nurses and neuropsychologists.

Health-related quality of life assessment for patients
HRQOL was measured using the generic EORTC QLQ-C30 questionnaire [18] and the EORTC QLQ-BN20 module specific for brain tumor patients [19]. The former is a 30-item questionnaire developed to assess the quality of life of cancer patients. The latter is a 20-item module that addresses problems specific to brain tumors, their treatment, and its consequences.
The QLQ-C30 is divided into functioning and symptom scales, while the QLQ-BN20 is a symptom-only questionnaire. On functioning scales, a higher score indicates better functioning; on symptom scales, a higher score indicates more symptoms.
HRQOL questionnaires were filled out at the hospital when patients had scheduled visits. Patients completed the questionnaire in the clinic, ideally in a quiet, private room; questionnaires were given to the patient before meeting with the physician, ensuring that the patient had enough time to complete the questionnaire. If the patient received a therapy, the questionnaire was filled out before administration of the treatment. The questionnaire could not be taken home and/or mailed.

Health-related quality of life assessment for proxies
Consenting patients were requested to identify a significant other (i.e., spouse or other person in close relationship to the patient), whom physicians asked to participate in the study. The significant others were also provided with verbal and written information on the study.
Patients' proxies were asked to complete the EORTC QLQ-C30 and EORTC QLQ-BN20 at each assessment point at the same time as the patient, at baseline and at 12-weekly follow-ups. Because the questions are always formulated in the first person, proxies were instructed to complete the questionnaire by putting themselves in the patient's position.
Raw scores of the six NCF test outcomes were transformed into Z-scores using available normative data [15][16][17]. A Z-score of 1.5 SD or more below the normative mean was used as the cut-off to define NCF impairment. Based on the presence of impaired test outcomes, patients were subsequently divided into two groups. Patients who had no impairment on any of the six test outcomes were defined as 'intact,' while patients showing at least one impaired test outcome were defined as 'impaired.' The QLQ-C30 and QLQ-BN20 are questionnaires based on a Likert-scale answer system, from which multi-item and single-item subscales are formed that address general functioning as well as symptoms. A higher score on a functioning scale corresponds to better functioning, and a higher score on a symptom scale corresponds to more symptoms.
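The classification rule described above can be illustrated with a minimal Python sketch. It is not the authors' code: the dictionary keys are hypothetical names for the six test outcomes, Z-scores are assumed to be oriented so that lower values mean worse performance (timed tests such as the TMT would need their sign aligned first), and the handling of missing outcomes is an assumption.

```python
from typing import Mapping, Optional

# Cut-off from the text: 1.5 SD or more below the normative mean.
IMPAIRMENT_CUTOFF = -1.5


def classify_patient(z_scores: Mapping[str, Optional[float]]) -> str:
    """Return 'impaired' if at least one available outcome is at or below the cut-off.

    Assumption: missing outcomes (None) are ignored here; the study excluded
    patients with more than half of the indices unavailable before this step.
    """
    available = [z for z in z_scores.values() if z is not None]
    if any(z <= IMPAIRMENT_CUTOFF for z in available):
        return "impaired"
    return "intact"


# Hypothetical example patient (outcome names are illustrative only).
example = {
    "hvlt_total_recall": -0.4,
    "hvlt_delayed_recall": -1.7,
    "hvlt_recognition": 0.1,
    "tmt_a": -0.9,
    "tmt_b": None,
    "cowat": 0.3,
}
print(classify_patient(example))
```

A single outcome at or below the cut-off is enough to label a patient as impaired, which helps explain why most of the sample falls into the impaired group.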
Patients with more than half of the indices of neurocognitive functioning (NCF), EORTC Quality of Life Core Questionnaire (QLQ-C30), or Quality of Life brain module (QLQ-BN20) evaluations unavailable were excluded from the analyses. QLQ-C30 and QLQ-BN20 raw scores were transformed into a linear scale ranging from 0 to 100 [20]. Mean differences between patients and proxies, with their standard deviations, were calculated. The proportion of patient-proxy dyads whose difference was within 0, 10, 20, and more than 20 units was summarized. Then, scores of patients and proxies on all QLQ-C30 and QLQ-BN20 scales were compared using Lin's concordance correlation coefficient (CCC) and the Bland-Altman limits of agreement.
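As an illustration of the two descriptive steps above, the following Python sketch shows a linear 0 to 100 transformation and the grouping of patient-proxy differences. It assumes the usual EORTC approach of averaging the item responses of a scale and rescaling by the item range (3 for items answered on a 1 to 4 scale), with functioning scales reversed so that higher is better. The function names and the exact category boundaries (for example, whether a difference of exactly 10 counts as "within 10") are assumptions, not details taken from the study.

```python
from statistics import mean
from typing import Sequence


def scale_score(items: Sequence[int], item_range: int, functioning: bool) -> float:
    """Linearly transform the mean item response of one scale to 0-100.

    item_range is the maximum minus the minimum possible response
    (e.g. 3 for items scored 1-4). Functioning scales are reversed.
    """
    fraction = (mean(items) - 1) / item_range
    return (1 - fraction) * 100 if functioning else fraction * 100


def difference_category(patient: float, proxy: float) -> str:
    """Group the absolute patient-proxy difference (boundaries are an assumption)."""
    diff = abs(patient - proxy)
    if diff == 0:
        return "equal"
    if diff <= 10:
        return "within_10"
    if diff <= 20:
        return "within_20"
    return "over_20"


# Hypothetical two-item symptom scale answered by a patient and a proxy.
patient_score = scale_score([1, 2], item_range=3, functioning=False)
proxy_score = scale_score([2, 3], item_range=3, functioning=False)
print(round(patient_score, 1), round(proxy_score, 1),
      difference_category(patient_score, proxy_score))
```

Because single-item scales can only take a few distinct values after transformation, the share of dyads in each category depends partly on how many scores a scale allows.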
The Wilcoxon signed-rank test was used to compare the distributions of patient and proxy scores and, more importantly, to identify any systematic bias. Such bias can arise, for instance, when the median of the proxy scores is higher than that of the patient scores.
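For readers unfamiliar with the test, the sketch below shows one common way to compute a Wilcoxon signed-rank statistic for paired scores in plain Python. It is an illustration under stated assumptions, not the procedure used in the study: zero differences are dropped, tied absolute differences receive average ranks, and the two-sided p-value comes from the normal approximation with a tie correction rather than from an exact distribution.

```python
import math
from typing import Sequence, Tuple


def wilcoxon_signed_rank(patient: Sequence[float],
                         proxy: Sequence[float]) -> Tuple[float, float]:
    """Return (W_plus, two-sided p) using the normal approximation."""
    # Zero differences carry no sign information and are dropped.
    diffs = [a - b for a, b in zip(patient, proxy) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1

    # Sum of ranks of positive differences (patient higher than proxy).
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_two_sided


# Hypothetical scale scores for four dyads.
w, p = wilcoxon_signed_rank([50, 60, 70, 80], [40, 40, 40, 40])
print(w, round(p, 3))
```

A W_plus far from its expected value n(n+1)/4 indicates that one member of the dyad tends to score systematically higher, which is the kind of bias the test is used to detect here.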
The Bland-Altman limits of agreement indicate the range within which 95% of all differences in ratings are expected to fall, assuming that the differences are normally distributed. They were used to assess patient-proxy agreement by providing plausible ranges for differences in scores [21][22][23].
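A minimal Python sketch of the limits of agreement, assuming the standard formulation (mean difference plus or minus 1.96 times the sample standard deviation of the differences). The function name and the example data are illustrative only.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def limits_of_agreement(patient: Sequence[float],
                        proxy: Sequence[float]) -> Tuple[float, float, float]:
    """Return (lower limit, mean difference, upper limit) for paired scores."""
    diffs = [a - b for a, b in zip(patient, proxy)]
    bias = mean(diffs)      # systematic patient-minus-proxy difference
    spread = stdev(diffs)   # sample SD of the differences (n - 1 denominator)
    return bias - 1.96 * spread, bias, bias + 1.96 * spread


# Hypothetical scale scores for three dyads.
lower, bias, upper = limits_of_agreement([60, 70, 80], [50, 70, 70])
print(round(lower, 2), round(bias, 2), round(upper, 2))
```

Narrow limits centered near zero suggest that proxy scores could stand in for patient scores on that scale; wide limits mean that individual dyads may differ by a clinically relevant amount even when the mean difference is small.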
Finally, Lin's CCC was used to test the concordance between patient and proxy ratings. A score below 0.40 indicated poor to fair agreement; 0.41-0.60, moderate agreement; 0.61-0.80, good agreement; and 0.81-1.00, excellent agreement [21].
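To make the coefficient concrete, the sketch below computes Lin's CCC as 2·s_xy / (s_x² + s_y² + (mean_x − mean_y)²), using 1/n variances and covariance, and maps the result onto the agreement categories listed above. Treating a value of exactly 0.40 as "poor to fair" is an assumption, since the text leaves that boundary open; the function names are illustrative.

```python
from statistics import fmean
from typing import Sequence


def lins_ccc(x: Sequence[float], y: Sequence[float]) -> float:
    """Lin's concordance correlation coefficient for paired ratings."""
    n = len(x)
    mean_x, mean_y = fmean(x), fmean(y)
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    # The squared mean difference penalizes systematic shifts between raters.
    return 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def agreement_label(ccc: float) -> str:
    """Map a CCC value to the categories used in the text (0.40 boundary assumed)."""
    if ccc <= 0.40:
        return "poor to fair"
    if ccc <= 0.60:
        return "moderate"
    if ccc <= 0.80:
        return "good"
    return "excellent"


# Hypothetical data: proxies score exactly 10 points higher than patients.
patient = [10.0, 20.0, 30.0]
proxy = [20.0, 30.0, 40.0]
value = lins_ccc(patient, proxy)
print(round(value, 3), agreement_label(value))
```

In this example the two raters are perfectly correlated, yet the CCC is well below 1 because of the constant 10-point shift. This is the property that distinguishes the CCC from an ordinary correlation coefficient and makes it suitable for assessing agreement.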

Results
From the initial cohort of 731 patients and 601 proxies, 127 patients were excluded due to completely or extensively missing NCF, QLQ-C30, and/or QLQ-BN20 evaluations, and 103 patients did not meet the histological criteria of WHO grade III and IV high-grade gliomas. After exclusion of these patients and their significant others, a total of 501 patients were selected: 470 with first recurrence of glioblastoma (EORTC 26101) and 31 patients with first recurrence of a locally diagnosed WHO grade II or III glioma without 1p/19q co-deletion (EORTC 26091). Patients included in the current analyses (n = 501) had a median age of 56 years (21-82) and 202 (37.3%) were female. Further detailed clinical information can be found in Table 1. Table 2 summarizes the QLQ-C30 and QLQ-BN20 outcomes for all 501 patients meeting the inclusion criteria, regardless of their neurocognitive status.

Agreement between patients and proxies
Differences in scores of patients and proxies were observed on various functioning and symptoms scales, with patients reporting a higher score (better functioning/more symptoms) than their proxies on Emotional functioning, Cognitive functioning, Nausea and Vomiting, Dyspnea, and Appetite loss scales of the QLQ-C30 and on Seizures, Itchy skin, and Bladder control scales of the QLQ-BN20.
Conversely, patients reported lower scores (worse functioning/fewer symptoms) than their proxies on the Fatigue and Insomnia scales of the QLQ-C30 and on the Future uncertainty, Motor dysfunction, Communication deficit, and Drowsiness scales of the QLQ-BN20.
Lin's CCC showed poor to fair agreement, ranging from r = 0.082 (Nausea and vomiting) to r = 0.46 (Financial difficulties).
Finally, the Bland-Altman limits of agreement revealed low agreement between patients and proxies in all HRQOL domains, with a few exceptions: Global health, Cognitive functioning, and Diarrhea on the QLQ-C30, and Seizures on the QLQ-BN20.

HRQOL agreement between neurocognitively impaired patients and proxies
In total 94 patients were neurocognitively intact, while 407 were impaired according to our definition. Differences in scores of neurocognitively impaired patients and their proxies were observed on more than half of the functioning and symptoms scales, with patients reporting a higher score than their proxies on the Physical functioning, Role functioning, Emotional functioning, Cognitive functioning, Nausea and vomiting, and Financial difficulties scales of the QLQ-C30 and on the Seizures and Itchy skin scales of the QLQ-BN20.
On the other hand, neurocognitively impaired patients reported lower scores than their proxies on the Fatigue and Insomnia scales of the QLQ-C30 and on the Future uncertainty, Motor Dysfunction, Communication deficit, and Drowsiness scales of the QLQ-BN20.
Lin's CCC showed poor to fair agreement, with the exception of the Financial difficulties scale (which showed moderate agreement), ranging from r = 0.088 (Nausea and vomiting) to r = 0.452 (Financial difficulties).
Again, the Bland-Altman limits of agreement revealed low agreement between patients and proxies in all HRQOL domains, except for Constipation and Diarrhea on the QLQ-C30 and Seizures on the QLQ-BN20.
The difference between patients and proxies was calculated, and the proportion within 0, 10, 20, and more than 20 units was summarized. Table 3 shows the scores of neurocognitively impaired patients and their proxies.

HRQOL agreement between neurocognitively intact patients and proxies
As can be seen in Table 4, the analysis of the scores of neurocognitively intact patients and their proxies showed significant differences on five functioning and symptom scales, with patients reporting a higher score than their proxies on the Nausea and vomiting and Dyspnea scales of the QLQ-C30 and on the Itchy skin scale of the QLQ-BN20, and a lower score than their proxies on the Physical functioning and Role functioning scales of the QLQ-C30.
Lin's CCC ranged from poor to moderate, with the lowest agreement on the Appetite loss scale (r = 0.027) and the highest on the Diarrhea scale (r = 0.538).
The Bland-Altman limits of agreement revealed agreement between patients and proxies on all functioning scales (Physical functioning, Role functioning, Emotional functioning, Cognitive functioning, and Social functioning) and on the Fatigue, Pain, and Diarrhea scales of the QLQ-C30, as well as on the Visual disorder, Motor dysfunction, Headache, Seizures, Drowsiness, Itchy skin, Weakness of legs, and Bladder control scales of the QLQ-BN20.

Discussion
In this study, we aimed to assess patient-proxy HRQOL agreement in a large sample of high-grade glioma (HGG) patients with and without neurocognitive impairment. To achieve this, we compared the baseline scores of patients and proxies from EORTC trials 26101 and 26091 on the QLQ-C30 and QLQ-BN20 questionnaires. Our findings primarily suggest that in general there is little agreement between HGG patients and proxies on generic and disease-specific HRQOL outcomes.
These results are only partially in line with other studies on patient-proxy agreement in brain cancer patients [7,24]. Indeed, when comparing the general agreement of HGG patients with their proxies to the results published by Brown and colleagues in a similar population and by Sneeuw and colleagues in a generic cancer population, the agreement reported in our sample is lower. Using a similar statistical approach, the first study reported an ICC between patients and proxies greater than 0.5 on 80% of the measurements; the second showed ICCs ranging from 0.46 to 0.73, indicating a moderate to good level of agreement between patients and proxies. In our study, the agreement ranged from 0.08 to 0.46, with only two scales surpassing the 0.40 threshold that marks the transition from poor/fair to moderate agreement.
According to the literature comparing patient and proxy evaluations in brain tumor patients [4,6,23,24], neurocognitive impairment is considered to affect patient-proxy agreement. Therefore, we expected neurocognitive impairment to be an influencing factor. However, the findings of the present study do not show such a clear-cut difference between cognitively impaired and intact patients. Although intact patients showed a moderate level of agreement, while impaired patients reached a similar level of agreement on only two subscales, the difference between the two groups was not as profound as in other studies. Even though the number of scales showing moderate agreement was higher in neurocognitively intact patients, the lower and upper limits of the range of Lin's CCC across all scales did not differ much between neurocognitively impaired and intact patients.
Results obtained using the Wilcoxon signed-rank test and the Bland-Altman limits of agreement follow a similar pattern, with fewer scales showing significant differences and more scales showing good levels of agreement, respectively, for cognitively intact patients than for impaired ones, but without establishing a clear difference.
It is hard to determine whether our expectations were met, since neurocognitively impaired patients showed lower levels of patient-proxy agreement than neurocognitively intact patients, but agreement was poor altogether, independent of patients' neurocognitive functioning. The results stress how far HRQOL evaluations from patients and proxies are from being aligned, and this offers the chance to discuss how this divergence can be interpreted. Perfect patient-proxy agreement is unlikely, and the direction of the difference in scores determines the interpretation: patients showing higher functioning scores and lower symptom scores than proxies are considered to underestimate their condition, and proxies scoring lower on functioning scales and higher on symptom scales than patients are considered to overestimate patients' status. The opposite interpretation is possible as well, with proxies underestimating or overestimating the patient's functioning and symptoms.
We believe that this way of interpreting the difference in scoring is inadequate, since it implies that one of the two perspectives must be wrong. Clearly, this deviates from the very purpose of evaluating HRQOL, which is a subjective concept by nature. We expected the present study to confirm, as reported in the literature, agreement between patients and proxies, or perhaps smaller differences, on scales concerning aspects of physical functioning and symptoms [6], and discrepancy or greater differences on scales and symptoms related to mood and emotional functioning [8]. The difference in mean scores of patients and proxies observed in this study supports this pattern. Indeed, we believe that it could be easier for a proxy to recognize a patient's physical distress or functional impairment in the activities of daily living than to perceive mood changes or to be emotionally empathic.
Table footnotes: % 0, cases of no difference between patient and proxy evaluation; % < 10, cases with less than 10 points of difference between patient and proxy evaluation; % < 20, cases with less than 20 points of difference between patient and proxy evaluation; % > 20, cases with more than 20 points of difference between patient and proxy evaluation; CCC, concordance correlation coefficient; LL, lower limit; UL, upper limit; C.I, confidence interval.
Altogether, the lack of patient-proxy agreement on HRQOL reflected in the results of the present study, together with the importance of not considering a patient untrustworthy solely because of their condition, highlights the lack of a tool to establish the reliability of PROs.
It is important to mention that there are several limitations to this study. The fact that treatment course and disease duration prior to inclusion may have been different between patients in the two trials might have impacted sample homogeneity. Additionally, a selection bias due to missing NCF data might have affected the results.
Data concerning the level of kinship of proxies were not recorded at the EORTC headquarters. At the time of the design of these studies, registering demographic data on the proxies was regarded as counterproductive, as that would have required informed consent by the proxy as well, possibly negatively influencing recruitment rates of these EORTC studies. No information was recorded about the specific procedures used to check whether questionnaires were completed independently. However, in each institution, one person was appointed as responsible for the local organization of HRQOL data collection. This could have been a physician, data manager, (research) nurse, or psychologist.
The percentage of mean differences between dyads (0, 10, 20, or more than 20) might have been influenced by the number of possible scores on a scale [23]. Furthermore, no direct measure of mood, which has been shown to offer even more insight into patient-proxy levels of agreement, was collected [8]. Generalizability might be limited due to the selection bias that characterizes clinical trial populations in general. Finally and importantly, our definition of impairment is arbitrary. It is possible that grading the extent of neurocognitive impairment in levels rather than as a dichotomous variable would yield different results; unfortunately, in our case this was not possible. Nevertheless, a sensitivity analysis raising the threshold for neurocognitive dysfunction per test (> 2 SD) was performed. By making the definition of neurocognitive impairment stricter than what is commonly considered the impairment threshold in the clinical environment, we aimed to include only those with a clearly impaired performance, even if on only one of the NCF tests. Raising the neurocognitive impairment threshold produced little difference compared to the methodology implemented in the present study, which suggests that the threshold adopted here does not limit the study's message.
The aim of this study was to assess patient-proxy HRQOL agreement in a large sample of high-grade glioma (HGG) patients with and without neurocognitive impairment. The intrinsic subjectivity of health-related quality of life evaluation makes it difficult to establish what the 'truth' is. Our initial assumption was based on a syllogism according to which a cognitively intact patient could be considered a reliable source on his/her own quality of life and a caregiver should be a reliable observer, at least for those scales describing functioning aspects and observable symptoms.
However, the question that follows is a predictable one: would it be legitimate not to rely on a patient's evaluation of his/her own well-being because of neurocognitive impairment?
The results of this study suggest that the level of patient-proxy agreement in HGG patients is low in general. When patients were divided into cognitively impaired and intact groups, the latter showed agreement with their proxies on more scales of the questionnaires, but the level of agreement remained low, suggesting, in contrast with previous literature, that cognitive impairment might influence but not preclude agreement. We hope that future studies will address the lack of a quantitative measure of the reliability of PROs in patients at risk for neurocognitive impairment. Moreover, in light of these findings, we suggest caution in using proxy evaluations in lieu of PROs, at least until a measure to establish reliability is developed.