Measuring context that matters: validation of the modular Tele-QoL patient-reported outcome and experience measure

Purpose A setting-sensitive instrument for assessing Quality of Life (QoL) in Telemedicine (TM) was unavailable. To close this gap, a content-valid “add-on” measure was developed. In parallel, a brief index was derived featuring six items that summarise the main content of the multidimensional assessment. After pre- and pilot-testing, the psychometric performance of the final measures was investigated in an independent validation study.
Methods The questionnaires were applied along with other standardised instruments of similar concepts as well as associated, yet disparate concepts for validation purposes. The sample consisted of patients with depression or heart failure, with or without TM (n = 200). Data analyses were aimed at calculating descriptive statistics and testing the psychometric performance on item, scale, and instrument level, including different types of validity and reliability.
Results The proposed factor structure of the multidimensional Tele-QoL measure has been confirmed. Reliability coefficients for internal consistency, split-half, and test-retest reliability of the subscales and index reached sufficient values. The Tele-QoL subscales and the index demonstrated Rasch scalability. Validity of both instruments can be assumed. Evidence for discriminant construct validity was provided. Known-groups validity was indicated by respective score differences for various classes of disease severity.
Conclusion Both measures show convincing psychometric properties. The final multidimensional Tele-QoL assessment consists of six outcome scales and two impact scales assessing (un-)intended effects of TM on QoL. In addition, the Tele-QoL index provides a short alternative for outcome assessment. The Tele-QoL measures can be used as complementary modules to existing QoL instruments capturing healthcare-related aspects of QoL from the patients’ perspective.
Supplementary Information The online version contains supplementary material available at 10.1007/s11136-023-03469-z.


Background
Existing health-related or disease-specific quality of life (QoL) questionnaires assess the patient-reported impact of diseases or treatments on QoL [1,2]. Aspects of the healthcare context that might influence QoL beyond treatment have hardly been considered so far [3][4][5]. As part of the digitalization of healthcare, medical procedures and therapeutic treatment strategies are made available within the context of telemedicine (TM; [6]). Furthermore, additional health services are provided through innovative solutions, like telemonitoring [7][8][9][10][11]. This digital transformation has led to a change in healthcare contexts which is widely neglected in TM evaluations [12]. An extensive review including 293 TM studies [4,5] indicated that TM-sensitive instruments were used in only about 5% of the included articles. Moreover, these instruments were only available for a limited range of concepts, as the majority was solely directed at assessing satisfaction [13]. Thus, TM-specific aspects of care are not yet sufficiently covered by existing instruments. Moreover, even though QoL is frequently considered a core patient-reported outcome [14] in TM [3,15,16], no QoL instrument is available for telehealth in particular. For this reason, we emphasize that more attention should be paid to contextual factors of healthcare and their influence on patients' experiences and health outcomes [12,17,18].
Klara Greffin and Holger Muehlan have shared first authorship.

The aim of the "Tele-QoL" project was to close this gap by developing a suitable "add-on" QoL instrument to enable a setting-sensitive evaluation of TM applications [19]. As such, this modular questionnaire is intended to assess the QoL of patients with chronic conditions or mental illnesses in the context of TM care. This validation paper aims to document the psychometric performance of the Tele-QoL measures in terms of different forms of reliability and validity.

Instrument development
The development of the Tele-QoL instrument was based on current recommendations for patient-reported outcome measures [20,21] and was to some extent inspired by a needs-based approach to QoL assessment [22]. The items of the Tele-QoL questionnaires were directly derived from qualitative interviews and focus groups, and assess various facets of the healthcare-related domain of QoL [12]. After the initial version had been developed, an expert workshop for external validation (n = 6), an online expert survey to test the instrument's content validity (n = 15), and a pre-test of the initial items with a sample of patients (n = 32) were conducted. Subsequently, the revised version of the questionnaire was piloted. For this purpose, a sample of patients with depression or heart failure with or without TM care (n = 200) was recruited. As a result, we identified an appropriate measurement model.
The questionnaire opens with a short instruction explaining its objective and how to complete it; this is followed by the respective items. In addition, the four-week reference period is repeated at the beginning of each page. Patients rate their healthcare-related experiences of the last four weeks on a 4-point Likert scale ranging from 1 = "Do not agree" to 4 = "Highly agree". The objective of this paper is to document the evaluation of the performance and psychometric properties of the modular Tele-QoL instruments, including the multidimensional Tele-QoL measure with six outcome scales and two impact scales as well as the brief Tele-QoL index with six items. This alternative short version represents the main content of the outcome subscales as closely as possible with one item per dimension, excluding the content of the impact dimensions.

Data sample
For the validation study, patients with chronic heart failure or depression (n = 200), with (Tele-QoL version A) or without (Tele-QoL version B) TM care, were recruited. Recruitment took place in several hospitals of the project's consortium partners (Brandenburg an der Havel, Greifswald, Leipzig) as well as at ambulatory healthcare facilities, all located in Northeastern Germany. In addition to the disease and treatment criteria mentioned above, a minimum age of 18 was an inclusion criterion; cognitive impairment and severe cognitive comorbidities as well as insufficient proficiency in German were exclusion criteria [19].
Treatment providers or study nurses in the recruitment centers informed interested patients, according to pre-defined criteria, in person or via telephone about the purpose of the study, voluntariness, dropout options, and compensation. In addition, the patients received the information in written form along with phone and e-mail contact details of the recruiting centers and the scientific research assistant. After the patients had given informed consent, the questionnaires were handed out with the request to fill them in. Personal codes were generated for a pseudonymous assignment of the follow-up survey, which was scheduled four weeks later. Personal assistance during the completion of the survey was available upon request. Completed questionnaires were mailed or dropped off in a prepaid envelope. After the questionnaires had been received, data were entered into an Excel spreadsheet and stored on a secured file server. Finally, the original questionnaires were stored in lockable cabinets.

Applied measures
Whereas all measures were applied in the first wave of the validation study, only some of them were also used in the second wave after four weeks to assess test-retest reliability, stability over time, and sensitivity to change. All instruments included in the validation study are described in the study protocol [19]. Therefore, we provide only a short overview here.
Sociodemographic characteristics were assessed based on recommended demographic standards in Germany [24]. The "Goldman Specific Activity Scale" (original version: [25]) was used to assess severity of heart failure. Participants were asked to rate whether they were able to perform specific daily activities and, based on their answers, were classified according to four Specific Activity Scale Functional Classes (Class I = least burdened; Class IV = most burdened). It was complemented by the "New York Heart Association Classification" (NYHA; original version: [26]; German version: [27]). Based on their answers, the participants were assigned to one of four classes (NYHA 1 = least burdened; NYHA 4 = most burdened). Depressive symptoms were assessed using the "Patient Health Questionnaire 9 (PHQ-9)" [28]. The "SeCu-20" Questionnaire (German original version: [29]) was used to assess perceived security in experiences with TM care. Patient satisfaction was assessed by the "Questionnaire to evaluate patient satisfaction (ZUF-8)" (original version: [30]; German version: [31]). Additionally, the general item of the "Youth Health Care Measure (YHC-SUN)" [32] was used to assess the general satisfaction with treatment. Patient activation was assessed using the "Patient Activation Measure (PAM13-D)" (original version: [33]; German version: [34]). To assess body-related self-consciousness, the subscale "private" of the "Body-related Self-Consciousness (KSA)" questionnaire (German original version: [35]) was used. From the "Body-related Locus of Control (KLC)" questionnaire (German original version: [36,37]) for assessing the body-related locus of control, the subscale "health" was used. The "European Health Literacy Survey (HLS-EU-Q6)" (original version in multiple languages: [38]) was used to assess health literacy. In addition, we used a newly adapted version of HLS-EU-Q6 for digital healthcare, referred to as D-HLS-EU-Q6.
The "WHO-Five Well-Being Index (WHO-5)" [39,40] was used to assess QoL of participants with mental health issues. The "Minnesota Living with Heart Failure questionnaire (MLHFQ)" was used to assess QoL of patients with heart failure [41][42][43]. The "Veterans RAND 12 Item Health Survey (VR-12)" (original version: [44]; German version: [45]) was used to assess the subjective health status of the participants. Health-related QoL was assessed with the "European Quality of Life 5 Dimensions (EQ-5D)" (original version: [46,47]). The short form of the "World Health Organization Quality of Life (WHOQOL-BREF)" (original version in multiple languages: [48]) was used to assess the general QoL in four different life domains (physical, psychological, social, environmental).

Data analyses
Factorial validity was investigated by conducting a confirmatory factor analysis with maximum likelihood estimation to test the multidimensional measurement model. Among the fit indices [49], we inspected the comparative fit index (CFI), the Tucker-Lewis index (TLI), and the root mean square error of approximation (RMSEA). Discriminant validity was investigated by calculating Pearson correlation coefficients for associations between Tele-QoL scores and various indicators of general, health-related, and disease-specific QoL as well as measures related to the assessment of satisfaction with care, patient activation, and health literacy. All concepts were assumed to show low or moderate associations with the Tele-QoL scores.
Concerning convergent validity, we assumed high associations with the subscales of a setting-sensitive measure for patient experiences in TM. Finally, we tested for correlations with further associated constructs, including self-monitoring and locus of control.
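The correlational checks described above can be illustrated with a minimal sketch. It computes a Pearson coefficient in plain Python and labels it with the cut-offs given in the note to Table 5 (r < 0.50 low, 0.50-0.70 moderate, > 0.70 high). The function names are hypothetical, and applying the cut-offs to the absolute value of r is an assumption of this sketch, not something stated in the paper.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equally long score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squared deviations (the common 1/(n-1) cancels).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def interpret_r(r):
    """Label a coefficient with the Table 5 cut-offs.

    Assumption of this sketch: the cut-offs are applied to |r|.
    """
    size = abs(r)
    if size < 0.50:
        return "low"
    if size <= 0.70:
        return "moderate"
    return "high"
```

Under this reading, a coefficient such as r = 0.36 would count as low and r = 0.90 as high.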
To examine known-groups validity with respect to different clinical variables known for differences in QoL, standardized effect sizes for differences of two independent means were estimated using Cohen's d [50]. We expected patients with greater disease severity to show lower Tele-QoL outcome scores and higher impact scores.
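As an illustration of the effect size used for the known-groups comparisons, the following sketch computes Cohen's d for two independent groups with the pooled standard deviation. The function name and the example scores are hypothetical; the paper does not report which variant of the pooled estimate was used, so the classical pooled-variance form is assumed here.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d for two independent groups (pooled-SD version, assumed)."""
    n_a, n_b = len(group_a), len(group_b)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical subscale scores for a less and a more severely affected group.
less_severe = [2, 4, 6]
more_severe = [1, 3, 5]
effect = cohens_d(less_severe, more_severe)
```

A positive value indicates a higher mean in the first group; the sign therefore depends on the order in which the groups are passed.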
Rasch analysis was used to detect possible misfit at item level. The partial credit model was applied to the data, using Q index statistics and threshold ordering estimation to detect item misfit [51].
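To make the partial credit model more concrete, the sketch below computes the category probabilities of a single item for a given person parameter and checks whether the item's thresholds are ordered. For a 4-point response scale an item has three thresholds. The function names and threshold values are hypothetical, and the sketch does not reproduce the Q index or the estimation routine of the software used in the study.

```python
from math import exp


def pcm_category_probabilities(theta, thresholds):
    """Category probabilities P(X = 0..m) of one item under the partial credit model.

    theta: person parameter in logits; thresholds: item thresholds in logits.
    """
    # Cumulative sums of (theta - threshold_j); the empty sum for category 0 is 0.
    cumulative = [0.0]
    for delta in thresholds:
        cumulative.append(cumulative[-1] + (theta - delta))
    numerators = [exp(value) for value in cumulative]
    total = sum(numerators)
    return [value / total for value in numerators]


def thresholds_ordered(thresholds):
    """True if each threshold is strictly larger than the preceding one."""
    return all(a < b for a, b in zip(thresholds, thresholds[1:]))


# Hypothetical thresholds for one item with four response categories.
example_thresholds = [-1.5, 0.0, 1.5]
example_probs = pcm_category_probabilities(0.0, example_thresholds)
```

Disordered thresholds would mean that at least one response category is never the most likely one at any point of the latent trait, which is why threshold ordering is inspected alongside item fit.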
For reliability testing, homogeneity of the subscales was investigated by computing Cronbach's alpha coefficient (α). Split-half reliability was determined by the correlation between both test halves. The intraclass correlation coefficient (ICC) was used to estimate test-retest reliability of the Tele-QoL scores.
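The two internal reliability estimates can be sketched as follows. Cronbach's alpha is computed from the item variances and the variance of the sum score; split-half reliability is computed as the correlation between two half-test sum scores. How the halves were formed is not reported, so an odd-even item split is assumed, and the optional Spearman-Brown step-up is an addition of this sketch rather than a procedure described in the paper. All names and data are hypothetical.

```python
from math import sqrt
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of per-item score lists (same respondents)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def _pearson(x, y):
    """Pearson correlation of two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


def split_half_reliability(items, spearman_brown=False):
    """Correlation between odd- and even-item half scores (split is an assumption)."""
    first_half = [sum(scores) for scores in zip(*items[0::2])]
    second_half = [sum(scores) for scores in zip(*items[1::2])]
    r = _pearson(first_half, second_half)
    # Optional step-up to full test length; not described in the paper.
    return 2 * r / (1 + r) if spearman_brown else r


# Hypothetical responses of three patients to two items.
example_items = [[1, 2, 3], [2, 1, 3]]
```

With only four items per subscale, each half consists of two items, so the uncorrected half-test correlation tends to underestimate the reliability of the full subscale.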

Statistical software
Descriptive statistics and item-scaling analysis were performed using IBM SPSS Version 28.0 [52]. Confirmatory factor analysis was conducted using IBM AMOS Version 28 [53]. For Rasch analysis, the WINMIRA software package was used [54].
Some patients did not want to participate in this study, mostly due to a lack of interest or because they felt overwhelmed by the number of study participation requests they received from the hospitals. A third group was already stressed due to COVID-19 and related circumstances, which was also a factor for non-participation. Unfortunately, the number of these individuals was not systematically recorded in the clinics.

Data
Factorial validity was explored by applying confirmatory factor analysis (CFA). We used maximum-likelihood parameter estimation for testing the model. Although the items deviated from a normal distribution, this method can be applied because it is assumed to be robust to violations of the normality assumption. The model fitted the data well (χ²(df = 436) = 696.53, p < 0.001, CFI = 0.94, TLI = 0.93, RMSEA = 0.056 [0.048; 0.064]).
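For readers who want to relate the reported χ² statistic to the RMSEA, a minimal sketch of the common point-estimate formula is given below. The function name is hypothetical, and the use of n - 1 in the denominator is an assumption; some programs use n instead, and the exact convention and effective sample size behind the reported value are not stated here.

```python
from math import sqrt


def rmsea(chi_square, df, n):
    """Point estimate of the RMSEA from a model chi-square.

    Assumption of this sketch: the denominator uses (n - 1).
    """
    # The estimate is set to 0 when chi-square does not exceed its degrees of freedom.
    return sqrt(max(chi_square - df, 0.0) / (df * (n - 1)))


# Values reported for the multidimensional measurement model.
estimate = rmsea(696.53, 436, 200)
```

With the reported values and n = 200 this gives roughly 0.055, in the neighbourhood of the reported 0.056; small differences of this kind can arise from the effective sample size or from software conventions.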
The six "outcome" subscale scores of the multidimensional Tele-QoL instrument correlate with each other with r = 0.39-0.81, the two "impact" subscales with r = 0.44 (see Table 2). The Tele-QoL index score correlates moderately to highly with all outcome scales of the multidimensional Tele-QoL (r = 0.59-0.83), but slightly negatively with both impact scales (r = −0.12 and r = −0.16).
Rasch analysis (Partial Credit Model) with emphasis on the operational characteristics of the items showed that none of the items in any of the Tele-QoL subscales or the Tele-QoL index displays misfit, indicating no substantial deviation from the model. The range of item locations for most of the scales is moderate (< 2 logits), but the effective range covered by the threshold distributions along the latent traits varies between > 4 and < 11 logits. The ordering of thresholds is in accordance with the model assumptions for all items in all subscales and in the index (Table 3).
For reliability testing, the internal consistency was calculated using Cronbach's alpha (α) coefficient for all subscales and the index score. For the Tele-QoL index, a value of α = 0.90 was obtained, and for the Tele-QoL subscales values between α = 0.84 and 0.95. Thus, the internal consistencies for all scales of the Tele-QoL instruments can be judged as very good. All subscales of the Tele-QoL measure as well as the Tele-QoL index also yielded very good values for split-half reliability, which varied between 0.81 and 0.91. Test-retest reliability was determined over a period of approximately four weeks, controlling for the course of the disease. The corresponding intraclass correlation coefficients vary between 0.64 and 0.74 and are thus sufficient to good. t-tests indicate no significant differences between test and retest scores, except for the subscale "Patient Burden & Limitation" (p = 0.01). All reliability coefficients for the subscales of the Tele-QoL and the Tele-QoL index are also depicted in Table 3.
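The test-retest coefficients reported above can be illustrated with a sketch of one common ICC variant. The paper does not state which ICC form was computed, so the two-way random-effects, absolute-agreement, single-measure form, often written ICC(2,1), is assumed here. The sketch also ignores the adjustment for the course of the disease mentioned above. Function name and data are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    scores: one row per respondent, one column per measurement occasion.
    The choice of this ICC form is an assumption of the sketch.
    """
    n = len(scores)
    k = len(scores[0])
    grand_mean = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares of the two-way ANOVA without replication.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical test and retest scores of four patients (retest shifted by +1).
shifted_scores = [[1, 2], [2, 3], [3, 4], [4, 5]]
```

Because this form measures absolute agreement, a systematic shift between test and retest lowers the coefficient even when the rank order of patients is perfectly preserved, as in the hypothetical data above.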
Evidence for known-groups validity of the Tele-QoL measure is provided by the expected group differences (d = 0.01 to 0.44) in the Tele-QoL scores for patients with different disease severity (Table 4). With regard to discriminant construct validity related to QoL, results show low to moderate correlations with different indices of general QoL (WHOQOL-BREF), health-related QoL (EQ-5D, VR-12), disease-specific QoL, and well-being (MLHFQ, WHO-5) (Table 5). These findings indicate sufficient divergent validity of the Tele-QoL instruments, since they capture different aspects of QoL than other instruments. Discriminant construct validity related to patients' experiences with healthcare provision was also investigated using measures of related concepts assessing satisfaction with healthcare (YHC-SUN), patient satisfaction (ZUF-8), and patient activation (PAM13-D). For almost all correlations with the Tele-QoL outcome subscale scores and the index score, coefficients indicate low to moderate associations (r = 0.22-0.61). In addition, discriminant construct validity was investigated with regard to selected patient experiences covered by the Tele-QoL scales.
Considering "Information & Education", health literacy (HLS-EU-Q6) as well as digital health literacy (D-HLS-EU-Q6) were assessed. Correlation coefficients indicate very low associations (r < 0.10). With respect to "Perceived Control & Monitoring", we applied instruments assessing related concepts such as private body-related self-monitoring (KSA) as well as internal and external health-related locus of control (KLC). Again, correlation coefficients indicate very low associations (r < 0.10).
All six outcome subscale scores of the Tele-QoL instrument and the index score correlate with the three subscales of the SeCu instrument assessing patient experiences in TM (r = 0.36-0.90). This supports the assumption of convergent validity. The absence of substantial correlations (r = −0.06 to 0.07) with the SeCu subscale assessing negative experiences in TM ("Technology Anxiety") indicates divergent validity. Correspondingly, the subscale "Data Processing & Surveillance" of the Tele-QoL instrument shows a low correlation with "Technology Anxiety" (r = 0.31), but both impact subscales correlate slightly negatively with the three "positive" SeCu subscales (r = −0.30 to −0.08).

Main results
With the Tele-QoL measures, we provide a modular instrument that assesses the impact of the TM healthcare context on QoL of patients, beyond the effects of the disease and the treatment [12]. This study tested and determined the psychometric properties of the Tele-QoL measures when applied to a sample of German patients with depression or heart failure.
Summarizing the results of this study, the Tele-QoL measures show promising psychometric performance. Our results confirm the factorial structure of the multidimensional measure. This suggests that the Tele-QoL captures distinct domains of healthcare-related QoL in a reliable and valid manner. The reliabilities of all subscales and of the index are satisfactory, with very good alpha coefficients and split-half reliabilities and sufficient to good test-retest reliabilities. This indicates that the items within each subscale are highly consistent in measuring the intended construct. Considering that each subscale consists of only four items, these results are quite convincing. Also, the operational characteristics of the items for all scales were in line with the assumptions of the Rasch model. Furthermore, this validation study suggests that the Tele-QoL measures possess both discriminant and convergent validity. There is reasonable evidence that the concept of healthcare-related QoL and the domains representing this construct in the measurement model are not identical with related constructs and are sufficiently distinguished from each other in terms of discriminant validity. This implies that these measures effectively distinguish between various aspects of QoL. In addition, the data provide evidence supporting incremental validity, indicating that the Tele-QoL measures offer unique insights beyond what is already captured by other QoL questionnaires. Moreover, a high level of content validity can be assumed, as the Tele-QoL questionnaires underwent a rigorous development process, drawing on extensive qualitative material collected in previous studies of the Tele-QoL project [12]. Notably, the observed correlations among the Tele-QoL outcome subscales imply that there are substantial interrelationships among various aspects of healthcare-related QoL in TM contexts, which is in line with our previous qualitative study [12]. Collectively, these findings affirm the robustness and utility of the Tele-QoL measures in assessing healthcare-related QoL.

How can the Tele-QoL measures benefit the evaluation of TM applications?
According to a modern understanding, the majority of patients are considered active protagonists who no longer want to be treated passively, but also want to make their own contribution to their health [55,56]. With a long-lasting illness, however, the needs and challenges in everyday life that a patient is confronted with also increase [57]. For this reason, the purpose of TM care for long-term illnesses is to support patients in the management of their illness and the needs associated with it [58]. In order to assess whether and to what extent TM applications are able to provide this support, appropriate assessments are needed that reflect the patient's perspective [59]. Therefore, the development and implementation of setting-sensitive questionnaires like the Tele-QoL measures are crucial, as they allow for a more valid assessment in TM studies. In this way, the healthcare context is included in the evaluation of care components, in addition to the effects of the disease and the respective treatment. As a result, the demand for a valid and quantitative summative evaluation of the medical benefit can now be better met [59].
In general, patients using TM will have the opportunity to better represent the impact of TM on their QoL via the Tele-QoL questionnaire. The extended conceptualization of QoL in TM settings may also lead to potential improvements of TM applications and individualized TM care for patients with chronic diseases and mental illnesses.

Strengths and limitations
The Tele-QoL was developed based on an extensive mixed-methods approach, which is a strength in terms of content validity [20,21]. Moreover, patients were included in all stages of the development and validation process. Another advantage is the sample composition for validation, consisting of respondents with complementary diseases and forms of treatment. Thus, half of the sample received TM care and half received standard treatment, and within each group half of the patients were chronically physically ill and half were mentally ill. Amongst patients with TM care, half were treated with an active TM approach (regular phone calls) and the other half with a passive TM application (remote vital monitoring). The aim was to represent all potential user groups and to test whether the questionnaire can be used independently of the disease and the treatment.
However, our validation study also has limitations. First of all, in planning the project, a compromise had to be made between an adequate sample size and the feasibility of data collection. A sample of n = 200 is considered fair [60] and is therefore sufficient, but it can be expanded. Future evaluation of the psychometric properties should be based on larger samples, including more disease groups and other TM settings. Moreover, other important properties of the measures need to be investigated, such as readability or responsiveness.
To assess test-retest reliability, patients were asked to complete a second questionnaire four weeks after the initial survey. The date for the second questionnaire was written on the instrument. In addition, after completing the second questionnaire, patients were asked to write the current date under the questionnaire's items. Unfortunately, not all patients did so. Therefore, we cannot be sure in every case that the questionnaires were filled out exactly four weeks later.
The severity of the respective disease, which was used for calculating the known-groups validity, was based on patients' self-reports, assessed via patient-reported outcome measures. The data may be biased, for example, by how someone feels on a particular day. In addition, the validation was conducted as a questionnaire study in which patients were asked to fill out different questionnaires one after the other. We arranged the order of the questionnaires in such a way that the questions proceeded from general health towards specific health questions in order to cause as little priming as possible. Nevertheless, answering one questionnaire may have an impact on answering subsequent questionnaires.
It remains unclear what effect the SARS-CoV-2 pandemic outbreak had on our sample. The recruiting institutions had the impression that more severely burdened patients were less willing to participate in the study than before the pandemic, but this circumstance was not systematically recorded. Nevertheless, it should be reported that in this context there may have been a selection and nonresponse bias in our sample regarding the severity levels included. Besides, TM was suddenly used as a substitute, not as a complement.
In summary, this instrument development demonstrates that the psychometric properties of the Tele-QoL measures are convincing. However, it is only the first step towards a fully validated questionnaire [61].

Conclusion & outlook
The modular Tele-QoL instrument represents a methodologically sound measure to assess QoL in TM settings. It can be used as a complementary module to existing QoL instruments to assess healthcare-related aspects of QoL from the patients' perspective in telehealth contexts. It is an important and necessary contribution to developing, implementing, and evaluating digital health applications. In the future, the Tele-QoL approach will be further adapted so that it can also be used for children and adolescents (new development of a Tele-QoL Kids) as well as in other countries (cultural adaptation and translation) facing similar challenges.
The measurement model comprises six factors related to patient-relevant "outcomes" (Information & Education, Perceived Safety & Well-Being, Needs Orientation & Trust, Perceived Control & Monitoring, Patient Relief & Independence, Cooperation & Communication), and two factors related to the unintended "impact" of telehealth on patients (Data Processing & Surveillance, Patient Burden & Limitation) [23]. The Tele-QoL measure aims to assess healthcare-related aspects of QoL in the context of TM applications (version A) or standard care (comparison version B). It is used as an "add-on instrument" to already existing QoL questionnaires. The target group of the Tele-QoL are patients aged 18 years and older who receive TM care (version A). It is irrelevant whether the patients are being treated for chronic physical or mental illnesses. For now, the Tele-QoL instruments are available in German in the form of a short index or a full version.

Table 1
Sociodemographic characteristics of the Tele-QoL validation study sample (n = 200)
*Data referring to frequencies and percentages. Absolute frequencies vary as a function of the amount of missing data for each variable
**Sum of percent values may vary as a result of rounding of single percent rates

As shown in Table 5, most coefficients for correlations between the six Tele-QoL outcome subscales and scores from other QoL measures are notably higher for those domains related to mental issues (WHOQOL-BREF: Mental/Psychological Domain, VR-12 Mental Health Status) than for domains related to physical issues (WHOQOL-BREF: Physical Domain, VR-12 Physical Health Status). Also, domains related to social or environmental issues show higher correlations than domains related to physical issues, but not as high as the "mental" domains. Correlation coefficients with physical domains of QoL are generally weak or low (r = −0.17 to 0.27). Both impact scales of the multidimensional Tele-QoL measure show low negative correlations with almost all QoL scores (r = −0.25 to 0.02).

Table 3
Rasch analysis and reliabilities of the multidimensional Tele-QoL subscales and the Tele-QoL index (n = 200)

Table 5
Intercorrelations of the Tele-QoL scores with subscale scores of other measures for convergent and discriminant validation (n = 200)
*For patients with chronic heart failure only; **for patients with depression only; ***for patients with telemedical care only. Interpretation of correlation coefficients: r < 0.50: low; r = 0.50-0.70: moderate; r > 0.70: high. In bold print: r ≥ 0.30.
WHOQOL-BREF World Health Organization Quality of Life measure (short version); EQ-5D