Plain English summary

Many people have been affected by COVID-19, with an impact on their health-related quality of life. Health-related quality of life includes physical, emotional, and social elements. It is important that we can ask the right questions to assess quality of life after having had COVID-19. Our work focuses on testing the questionnaire we have developed, to make sure it includes the right questions, and that patients understand them. We asked patients from eleven countries to complete the questionnaire. A total of 425 patients completed the questionnaire. Sixty-one of the 80 questions were kept, and the wording of several items was changed. The final 61-item questionnaire includes the right questions, is complete, and is acceptable to patients. This questionnaire is valid and can be used with patients during and after COVID-19 to measure their health-related quality of life.

Introduction

For over two years, the SARS-CoV-2 virus has continued its worldwide spread, causing a heavy burden for the many affected by symptomatic disease (COVID-19). Vaccination programs have reduced the risk of severe disease, hospitalization, death [1,2,3], and disease transmission [4]. Fully vaccinated individuals still experience symptoms ranging from mild to serious [5], especially if they are older (≥ 65 years) and have comorbidities [6]. The most frequent symptoms reported in COVID-19 are fever, headache, cough, myalgia, dyspnea, and loss of taste and smell [7,8,9]. Patients may also suffer from, e.g., sore throat, runny nose, gastrointestinal problems, and chest pain [7,8,9,10], which can have a considerable impact on physical, emotional, and social functioning [11, 12]. In April 2020, we conducted a literature review and identified publications related to studies of symptoms and problems with functioning in patients with COVID-19 from a large variety of clinical settings, cultures, and countries, and from all continents. As the pandemic continued to evolve with new variants of the virus, we updated our literature review twice during the first year [9].

Given the impact of COVID-19 and the potential for persisting problems, it is crucial to evaluate the patient’s perspective using validated patient-reported outcome measures (PROMs). Such measures can be used to assess symptoms and other relevant health-related quality of life (HRQoL) issues over time and in response to therapeutic interventions. They help ensure that the patient’s perspective remains central and mitigate the effects of underreporting by health-care professionals [13, 14].

To date, there is no cross-culturally validated COVID-19-specific HRQoL questionnaire available. Although several COVID-19-related clinical outcome assessments have been published [15], they focus on the pandemic’s impact on the general population [16, 17] and mental health [18] or describe more general recommendations [19]. Generic HRQoL measures, along with symptom checklists, have been employed to assess issues in patients with COVID-19 [20, 21], but they may not capture the range of potential symptoms and HRQoL issues associated with COVID-19. Moreover, ad hoc non-COVID-specific measures may fail to demonstrate the psychometric properties required for a PROM [22].

In April 2020, our multilingual and multicultural research group, involving clinicians, psychometricians, statisticians, and HRQoL specialists, set out to develop a questionnaire to assess HRQoL and symptoms in patients with COVID-19, from diagnosis through active disease and recovery. The first two phases of the questionnaire development have been published [9, 23]. In this paper, we present the next phase of the development process with pretesting of our provisional COVID-19 questionnaire and preliminary psychometric testing of the validity and reliability in an international sample of patients.

Methods

The current study is a multicenter international methodology study for a questionnaire development. Guidelines from the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Group (QLG) were followed [24]. The four-phase procedure covers general principles for questionnaire development and is supported by the Food and Drug Administration [25]. The first two phases (gathering relevant issues through literature review, interviews with health-care workers and patients in seven countries, and operationalization from issues to items) were performed from April to October 2020 and have been published [9, 23].

To make sure that the questionnaire also covered potential new important issues evolving during the pandemic, the literature review was updated in October 2020, with results that were further explored in this phase III study, and in February 2021, without any new issues being reported [9].

The resulting 80-item provisional weekly (PW) questionnaire was named the OSLO COVID-19 QLQ-PW80© (short name QLQ-PW80) and copyrighted by the Oslo University Hospital, Norway that coordinated the international development. In the current phase III study, this work continued with further pretesting (pilot testing in a small sample) and validation of the provisional questionnaire. The aim was to identify missing or redundant items and ensure comprehensibility (phase IIIA) and perform initial psychometric testing (phase IIIB). All study documents were developed in English. The questionnaire was translated from English into the required languages (n = 9) using a modified forward/backward translation procedure based on international guidelines [26] and reviewed by an experienced translation officer before the patient enrollment started.

Countries and participants in phase IIIA and phase IIIB

Countries from all continents were approached through professional networks, including the World Health Organization (WHO) COVID-19 clinical management team network. Partners from 11 countries (Austria, Croatia, Germany, Ghana, India, Norway, Palestine, Spain, Sweden, the Philippines, and the United Kingdom) enrolled patients, ensuring representation from different cultural areas.

The target population was ≥ 18 years, with verified SARS-CoV-2 infection (according to local/national standards) and active or previous symptomatic COVID-19. Patients in hospitals, nursing homes, at home, or in COVID-19 centers were eligible if they were able to read and comprehend the study documents. Patients in intensive care units could be enrolled after discharge. To ensure content validity of the final questionnaire according to guidelines, pre-specified enrollment matrices were used [24]. These matrices were set up to ensure sufficient sample sizes with adequate distribution of participants to represent the target population. In phase IIIA (Online Appendix 1), we aimed to include 10–15 patients in each cell of the sample matrix, ≥ 5 patients per country, and ≥ 45 patients in total. In phase IIIB (Online Appendix 2), we aimed to include 15 patients per cell of the sample matrix and ≥ 300 patients in total [24], to allow for preliminary evaluation of the psychometric properties, in particular the hypothesized scale structure. In phase IIIB, participants in recovery who were expected to have stable disease, i.e., who were unlikely to experience any changes in their physical well-being, were asked to complete the questionnaire a second time after 14 days (± 2 days) to measure test–retest reliability. The target sample size for this part was 50 patients.

Phase IIIA: procedure, collection of data

Data collection in phase IIIA was performed from November 2020 until June 2021. The researchers in the 11 countries approached patients in hospital, nursing homes, and patients staying at home. After written informed consent, patients completed the provisional questionnaire, the QLQ-PW80, in their native language and recorded the completion time. This was followed by structured debriefing interviews by the research team documented with field notes (Online Appendix 3). Patients’ background information was collected (age, gender, hospitalization, time since diagnosis, disease severity, comorbidity). Whether the patients had experienced each item was assessed considering the whole disease period and reported as relevance (1 = not at all, 2 = a little, 3 = quite a bit, 4 = very much). If experienced, the extent they were troubled by this was regarded as a measure of importance (0 = not experienced, 1 = not at all to 4 = very much). We defined relevant items as those with relevance scores 2–4, and important items as those with importance scores 3 and 4. Patients were asked if any of the items were difficult to understand or confusing, annoying, or upsetting, and if any of the items overlapped with other items. The interviewer also asked about relevance and importance of any other issues not listed, but still experienced by the patient. Six items were assessed in more detail to explore how the patients interpreted/understood these issues (Appendix 4).
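The relevance and importance definitions above can be illustrated with a short Python sketch. It is not the study's analysis code: the function name, the example ratings, and the choice of all interviewed patients as the denominator are assumptions made for illustration only.

```python
def summarize_item(relevance, importance):
    """Share of patients rating one item as relevant and as important.

    relevance:  scores 1-4 (1 = not at all ... 4 = very much)
    importance: scores 0-4 (0 = not experienced, 1 = not at all ... 4 = very much)
    Assumption: proportions are taken over all patients who rated the item.
    """
    n = len(relevance)
    relevant = sum(1 for r in relevance if 2 <= r <= 4)    # relevance scores 2-4
    important = sum(1 for i in importance if i in (3, 4))  # importance scores 3 and 4
    return {"relevant": relevant / n, "important": important / n}


# Hypothetical ratings from five patients for a single item
relevance = [1, 2, 3, 4, 1]
importance = [0, 1, 3, 4, 0]
summary = summarize_item(relevance, importance)
print(summary)
```

In this made-up example, three of five patients rate the item as relevant and two of five as important.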

Phase IIIA: decision rules and criteria for retention of items

A set of six decision rules and criteria for retention of items in phase IIIA was set up as recommended [24]; the first four criteria considered most important (Table 1A). Since COVID-19 is a new disease and a limited number of patients were to be enrolled in phase IIIA, care was taken not to exclude important items simply because of low prevalence. The decision on whether an item should be retained, modified, or removed was made by consensus in the project group, including review of potentially overlapping items and rewording of difficult and annoying items.

Table 1 Criteria for retention of items in the Oslo COVID-19 QLQ phase IIIA and IIIB

Phase IIIB: procedure and collection of data

Data collection in phase IIIB was performed over six months from July 2021 to January 2022. An electronic data capture system, Ledidi®, that fulfilled the necessary national and EU security and privacy policy requirements was used [27]. After written informed consent, patients in hospital, at home, or in a nursing home were asked to complete questionnaires on paper or digitally in the Ledidi® system. The set of questionnaires consisted of (1) background information (as described for phase IIIA, supplemented by vaccination status), (2) the provisional questionnaire resulting from phase IIIA, and (3) a debriefing questionnaire. The debriefing questionnaire documented the time and help needed to complete the questionnaire, whether there were difficult, annoying, or overlapping items, and any additional issues not covered by the questionnaire. In addition, patients in recovery (more than three months after being infected) were asked to point out any additional issues they had experienced more than three months after being infected.

Phase IIIB: decision rules and criteria for retention of items

In phase IIIB, the research group used seven criteria for retention to decide on items that were candidates for removal (Table 1B). The summarized results and consistency across languages were discussed, and decision trails from earlier phases of development consulted. An item was kept if clinically relevant, fulfilling ≥ 5 of the retention criteria, or found important to subgroups of patients, and not overlapping with other items. An item was modified if > 5% of the patients found it difficult or annoying.
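As a rough illustration, the screening part of these rules could be expressed as the Python sketch below. The final decisions in the study were made by group discussion, so this is only a screening aid under stated assumptions: the function and argument names are hypothetical, clinical relevance is not modelled, and the reading "(≥ 5 criteria or important to a subgroup) and not overlapping" is one interpretation of the rule.

```python
def screen_item(criteria_met, important_to_subgroup, overlaps, share_difficult_or_annoying):
    """Screen one item against the phase IIIB rules (illustrative reading only).

    criteria_met:                 number of the seven retention criteria fulfilled
    important_to_subgroup:        True if found important to a subgroup of patients
    overlaps:                     True if the item overlaps with another item
    share_difficult_or_annoying:  proportion (0-1) of patients finding it difficult or annoying
    """
    keep = (criteria_met >= 5 or important_to_subgroup) and not overlaps
    if not keep:
        return "candidate for removal"
    # Items flagged by more than 5% of patients are kept but reworded
    if share_difficult_or_annoying > 0.05:
        return "keep, reword"
    return "keep"


# Hypothetical items
print(screen_item(6, False, False, 0.01))
print(screen_item(4, False, False, 0.00))
print(screen_item(4, True, False, 0.08))
```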

Statistical analyses

Descriptive analyses

Descriptive analyses were presented as frequencies and proportions for categorical data and means, standard deviations, and range for continuous data. For the analysis of content validity, interviews in phase I and pilot testing by the user group in phase II laid the foundations [23]. We investigated this further by exploring feasibility, i.e., patients who needed > 30 min to complete the questionnaire, patients who needed help to understand the items, and whether there were systematic patterns of missing values. We explored possible differences in response patterns between countries by qualitative review of responses and comments in the debriefing questionnaire. New issues raised by the patients were considered if not covered by existing items, not excluded in previous phases, and not clinical parameters or other distinct conditions.

Psychometric analyses

The scale structure analyses were based on the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) taxonomy [28]. In addition to the descriptive analyses, we tabulated the results per item along with the number of fulfilled criteria (Online Appendix 5). The multi-language clinical expert group proposed scales from a clinical point of view. The scale structure was modified based on the following analyses. We assessed the validity by correlation-based methods (e.g., multitrait analysis) using the Pearson correlation. Item convergent validity was supported by a correlation of ≥ 0.40 between an item and its own scale (corrected for overlap). Scales that are conceptually related (e.g., fatigue and malaise) are expected to have a correlation ≥|0.40| while scales that are conceptually different (e.g., worries and temperature) would have a correlation <|0.40|. Scaling errors were calculated as the percentage of items that correlated higher with a different scale than with their own scale, corrected for overlap.
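The item-scale correlation corrected for overlap and the scaling error described above can be sketched in plain Python as follows. This is an illustrative sketch, not the software used in the study; the scale names, item names, and responses are invented, scale scores are assumed to be item means, and the code assumes complete data with at least two items per scale.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def scale_score(data, items, exclude=None):
    """Per-patient mean of the listed items, optionally leaving one item out."""
    used = [name for name in items if name != exclude]
    n_patients = len(data[used[0]])
    return [sum(data[name][p] for name in used) / len(used) for p in range(n_patients)]


def multitrait(data, scales):
    """Return corrected item-own-scale correlations and the scaling error (%)."""
    own, errors, n_items = {}, 0, 0
    for scale, items in scales.items():
        for item in items:
            n_items += 1
            # Correct for overlap: drop the item from its own scale score
            own[item] = pearson(data[item], scale_score(data, items, exclude=item))
            others = [pearson(data[item], scale_score(data, other_items))
                      for other, other_items in scales.items() if other != scale]
            # Scaling error: the item correlates higher with another scale
            if others and max(others) > own[item]:
                errors += 1
    return own, 100 * errors / n_items


# Hypothetical responses (1-4) from six patients on two two-item scales
data = {
    "f1": [1, 2, 3, 4, 2, 1], "f2": [1, 2, 4, 4, 3, 1],
    "w1": [4, 1, 2, 1, 3, 2], "w2": [3, 1, 2, 2, 4, 2],
}
scales = {"fatigue": ["f1", "f2"], "worries": ["w1", "w2"]}
own_corr, scaling_error = multitrait(data, scales)
convergent_ok = {item: r >= 0.40 for item, r in own_corr.items()}
print(convergent_ok, scaling_error)
```

In this toy data set every item correlates more strongly with the rest of its own scale than with the other scale, so no scaling errors are counted.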

Known group comparisons were performed for pre-defined groups (Online Appendix 6) based on the patient matrix (Online Appendix 2), using independent sample t tests.

To test the reliability, we calculated the internal consistency using Cronbach’s alpha for each multi-item scale. Values > 0.70 were considered acceptable for group comparisons. Test–retest reliability was assessed by calculating intra-class correlation coefficients (ICC) for scores at first and second time points for the multi-item scales.
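For readers unfamiliar with the internal consistency measure, Cronbach's alpha for one multi-item scale can be computed as in the sketch below. The responses are invented and complete data are assumed; the ICC calculation is not shown because the specific ICC model is not stated here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for one scale.

    items: list of item response lists, one list per item, same patients in the same order.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score)
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # per-patient sum score
    item_variance_sum = sum(variance(item) for item in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


# Hypothetical responses (1-4) from five patients on a three-item scale
scale_items = [
    [1, 2, 3, 4, 2],
    [1, 2, 4, 4, 3],
    [2, 2, 3, 4, 2],
]
alpha = cronbach_alpha(scale_items)
acceptable = alpha > 0.70  # threshold used for group comparisons
print(round(alpha, 2), acceptable)
```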

Confirmatory factor analyses were performed to explore the dimensionality of the questionnaire. We calculated standardized factor loadings for each item with regard to the corresponding scale and considered loadings > 0.40 to be sufficient. Model fit was assessed by the Comparative Fit Index (CFI) and the Tucker-Lewis Index (TLI), with both indices considered to indicate good fit if > 0.95 as well as by the Root-Mean-Squared Error of Approximation (RMSEA) which is recommended to be < 0.06 [29].
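The fit thresholds above can be written as a small check. This sketch only compares already estimated indices with the stated cutoffs; it does not fit a factor model, and the function name is an assumption. The example values are the indices reported in the Results of this study.

```python
def meets_fit_criteria(cfi, tli, rmsea):
    """Compare fit indices with the thresholds stated above.

    CFI and TLI indicate good fit if > 0.95; RMSEA is recommended to be < 0.06.
    """
    return {"CFI": cfi > 0.95, "TLI": tli > 0.95, "RMSEA": rmsea < 0.06}


# Indices reported for the final scale structure in this study
fit = meets_fit_criteria(cfi=0.85, tli=0.83, rmsea=0.06)
print(fit)
```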

The final scale structure was based on the psychometric tests and clinical evaluation, and a scoring manual was developed (Online Appendix 7). All items had responses on a four-point Likert scale ranging from 1 = ‘not at all’ to 4 = ‘very much,’ except the two items in the overall quality of life scale, which ranged from 1 = ‘very poor’ to 7 = ‘excellent.’ The scores from scales and single items were linearly transformed to scores ranging from 0 to 100. A higher score represented worse symptoms and more problems on functioning scales. For the overall quality of life scale, a higher score represented better quality of life.
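The linear transformation can be illustrated as follows. The authoritative rules are those in the scoring manual (Online Appendix 7); this sketch assumes that the raw scale score is the mean of the item responses and that it is rescaled as (raw − minimum) / (maximum − minimum) × 100, and it does not handle missing items.

```python
def transformed_score(responses, minimum=1, maximum=4):
    """Linearly transform item responses for one scale to a 0-100 score.

    Assumption: the raw score is the mean of the item responses.
    Use maximum=7 for the two overall quality of life items.
    """
    raw = sum(responses) / len(responses)
    return (raw - minimum) / (maximum - minimum) * 100


# Hypothetical responses
symptom_score = transformed_score([2, 3, 4])         # four-point items
qol_score = transformed_score([7, 7], maximum=7)     # seven-point quality of life items
print(round(symptom_score, 1), qol_score)
```

With this transformation, answering "not at all" to every item of a symptom scale gives 0 and answering "very much" to every item gives 100.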

Results

Phase IIIA

Research teams in 11 countries interviewed 54 patients (Table 2). Adult males and females of all ages, with active ongoing disease or in recovery, admitted to hospital or isolated at home, were included. Twenty-one patients reported various comorbidities. The mean time needed to complete the QLQ-PW80 was 16 min (range 4–45). The patient-reported relevance and importance, and item score distribution with final decisions, are presented in Online Appendix 8. In summary, 21 items were candidates for removal according to criteria outlined in Table 1A. Patients identified possible overlap for five groups of items. Nine items were removed (Fig. 1) due to low importance (n = 4, blocked nose, sneezing, shaking hands, heartburn), overlap (n = 2, abdominal discomfort, abandoned by family and friends), and an overall consideration with low mean score, floor effect, limited importance, and few patients experiencing them (n = 3, constipation, dysuria, and hair loss). Five items identified as difficult or annoying were reworded, resulting in the revised provisional 71-item questionnaire named the Oslo Covid-19 QLQ-PW71© (QLQ-PW71).

Table 2 Patient characteristics in phase IIIA and IIIB
Fig. 1 Overview of the development of the international COVID-19 specific health-related quality of life questionnaire

Phase IIIB

A total of 371 patients from 10 countries with ongoing disease or in recovery, admitted to hospital or isolated at home, were interviewed (Table 2, Online Appendix 9). Patients who reported unspecified comorbidity (n = 49) commented, e.g., that they had slow metabolism, lumbar prolapse, fibromyalgia, Crohn’s disease, and myalgic encephalomyelitis. The six patients who needed > 30 min to complete the questionnaire (Table 3) were elderly (67–106 years old) and needed assistance with completion. Forty-four patients needed help to understand the items, mainly from Ghana (n = 15), India (n = 8), and Germany (n = 7); all age groups were represented. Patients in hospital (27/90) and nursing home/other institution (3/17) needed assistance more often than patients at home (14/263). Even though they needed assistance, 63% completed the questionnaire within 15 min.

Table 3 Phase IIIB Feasibility of the OSLO COVID-19 QLQ-PW71 in 371 patients

Most patients (279, 75%) completed all 71 items. Of the 92 patients with missing values, 68 had only one or two missing. Seven of the 92 patients had completed only the first page of the digital questionnaire (items 1–23), missing the subsequent 48 items. For the others, there were no patterns of missing data. There were no clear differences between the countries regarding number and pattern of missing values.

Additional patient-reported problems

Some patients described additional symptoms or problems experienced in the first three months (n = 56) and/or more than three months after being diagnosed (n = 36). Two of these problems were new: menstrual disturbances (n = 2) and word-finding problems or aphasia (n = 2). Other proposed issues were either covered by the questionnaire (n = 34), excluded previously in phase I or phase IIIA (n = 11), or were not patient-reported symptoms but objective signs such as low oxygen saturation and weight loss (n = 11) (Online Appendix 10). The project group agreed that there was not enough evidence to include the two new issues.

Evaluation of items and (initial) scale structure

Of the 71 items, 55 items fulfilled ≥ 5/7 criteria for retention (Online Appendix 5). Twenty items had a mean score > 1.5. Across all items, the whole range of responses was used and patients did not find them difficult or annoying in any language. Compliance was high (> 95%) for all but two items (light and heavy housework). All items were reviewed quantitatively and qualitatively. It was decided to remove 10 items that did not fulfill more than 4/7 criteria for retention, and were either not important to subgroups of patients (n = 8) or had low compliance and overlapped with other items (n = 2). For the initial proposed scales, the internal consistency was satisfactory (Cronbach’s alpha > 0.70) for all but one scale (Sensory) (Online Appendix 5). Nevertheless, the scale was retained because it was considered clinically meaningful.

For the initial proposed scales, the item-scale correlations were satisfactory for all but nine items (drowsy, feeling ill or unwell, headache, chest pain, weakness in hands or feet, appetite loss, carrying a heavy bag upstairs, social activity, worry about financial difficulties). Three were moved to another scale (drowsy, headache, chest pain) while six were kept in the initial scale, as this was considered clinically more meaningful. Three items (palpitation, burning and sore eyes, skin problems) were kept as single items. The ‘Respiratory’ scale was divided into ‘upper’ (throat) and ‘lower’ (chest), as this was clinically more meaningful, and the item-scale correlations were virtually unchanged.

The validity and reliability of the final questionnaire

The final 61-item weekly questionnaire resulting from our current phase IIIB development now consists of 15 multi-item scales and six single items and was named the Oslo COVID-19 QLQ-W61© (QLQ-W61) (Table 4). All scales had acceptable item convergent validity (≥ 0.40) except for one item (financial difficulties) in the ‘Worries’ scale. The scaling error was low (0–3.3%). Discrimination across pre-defined groups was observed for most of the scales (Temperature disruption, Sensory, Gastrointestinal, Physical and Social functioning), but not for all (Pain, Worries, Emotional, and Cognitive functioning) (Table 5). The ‘Respiratory lower, chest’ scale reached significance for disease status, but not for comorbidity and gender.

Table 4 Reliability and scale structure of the Oslo COVID-19 QLQ-W61
Table 5 Known group validity; Comparisons of scales within the Oslo COVID-19 QLQ-W61 for clinically distinct groups

Cronbach’s alpha was > 0.70 for 14 of the 15 multi-item scales (range 0.68–0.92) (Table 4). The test–retest reliability was good: ICCs between the two time points were > 0.70, except for problems with eyes (ICC 0.44). In the confirmatory factor analysis, all standardized factor loadings exceeded the threshold of 0.40 (range 0.42–0.95), supporting the hypothesized scale structure. The CFI was 0.85 and the TLI was 0.83, both below the pre-defined threshold for good fit (> 0.95), and the RMSEA was 0.06, at rather than below the recommended value (< 0.06).

Discussion

This questionnaire was developed to capture HRQoL of patients with COVID-19 from diagnosis through active disease and the recovery period. We have demonstrated that the questionnaire captured relevant and important issues to the international sample of COVID-19 patients, that the final number of items is manageable, and that the instrument shows promising psychometric properties.

To allow the questionnaire to be feasible and valid in a heterogeneous international setting, we successfully ensured that patients involved represented a good distribution of age, gender, disease phases and severity, comorbidities, and countries. Cross-cultural acceptability was supported by patients from 11 countries in three continents reporting the items to be relevant and important, and discussions among the involved researchers from different cultural areas. Special attention was given to the wording of five items pointed out as difficult to understand to improve cross-cultural acceptance. The word ‘stigmatised’ was difficult to understand in some languages and the group decided to include ‘or judged negatively’ to improve the comprehension of the concept (Online Appendix 8, item 71). The word ‘abandoned’ was described as too offensive in some languages and, based on earlier input from patients, this was changed to ‘not receive sufficient attention’ (Online Appendix 8, item 77).

Even though elderly patients might need some assistance, the high compliance, few missing values, and the finding that most patients used ≤ 15 min to complete the questionnaire, supported that it is easy to understand and fill in, making it suitable for clinical studies but also for descriptive purposes in clinical practice. Since the digital version of the questionnaire was challenging for some patients, more focused instructions and user-friendly layout are recommended in future studies.

Our finding that participants regarded the functional and overall HRQoL items as relevant and important is supported by other studies using, e.g., the EQ-5D [30]. It is evident that the HRQoL issues relevant to COVID-19 patients are not limited to symptoms alone; how COVID-19 has affected these patients’ functioning and overall quality of life remains an important factor in assessing the patients’ experiences.

The preliminary psychometric properties of the QLQ-W61 were robust. All but one of the multi-item scales had acceptable internal consistency (Cronbach’s alpha > 0.70). Results from known group comparisons supported the pre-defined hypotheses for the majority of the scales, but not for all. For ‘Pain’ and ‘Worries,’ no differences were found between patients with acute disease and those in recovery; both groups showed elevated levels. This suggests that pain (e.g., muscle aches and pain) and worries (e.g., worries about health) may be issues that need to be flagged for post-COVID-19 condition (Long COVID). Younger and elderly patients had similar results on ‘Cognitive functioning.’ This may be a result of selection bias where elderly patients who could respond to the COVID-19 questionnaire (reflecting fewer problems with cognitive functioning) were selected for this study. For ‘Emotional functioning,’ an expected gender difference was not found even though many studies report more emotional issues among females compared to males [31, 32]. This implies that the emotional stress of having COVID-19 may have a similar impact on both genders, but this has to be further explored in larger patient populations. As expected, patients with acute disease had a higher mean score on the ‘Respiratory lower, chest’ scale (e.g., shortness of breath, chest pain) than those in recovery. Males and patients with comorbidities are at risk of having more severe COVID-19 [33, 34], and lower respiratory symptoms such as shortness of breath and chest pain are regarded as symptoms of severe disease. However, we were surprised to find that there were no differences regarding gender or comorbidities. The test–retest reliability for both multi-item scales and single items was at an acceptable level, except for the eye item. This shows that the assessment is stable over time unless clinical changes occur in the well-being of the patient.
Although results from the initial confirmatory factor analysis showed sub-optimal fit (TLI < 0.90, CFI < 0.90), this may be explained by the low sample size relative to the complexity of the scale structure for this questionnaire. Furthermore, testing of differences between countries will be performed in the next phase of the questionnaire development involving larger samples of patients from finalized clinical studies in COVID-19 patients.

One limitation might be the questionnaire’s ability to cover issues from new variants of COVID-19, and issues related to Long COVID, although we performed two updates of the literature review to reduce this risk. For example, with the Omicron variant, the most frequent symptoms were runny nose, headache, fatigue, sneezing, and sore throat, which were different from the dominant symptoms in the earlier COVID-19 Alpha variant (i.e., fever, cough, and loss of sense of smell or taste) [35]. Also, symptoms of dry and red eyes have been mentioned as more frequent with Omicron than earlier variants of the virus [36]. The QLQ-W61 captures most but not all Omicron symptoms. Runny nose was excluded after phase I, and sneezing after phase IIIA, as patients who had experienced these symptoms did not regard them as important.

In addition, due to time constraints, there are limitations in the analyses of the questionnaire’s ability to capture all symptoms of Long COVID. Long COVID was unknown at the start of the pandemic and consequently not included in the first two phases of this project. To compensate for this in the current phase, we specifically asked the 181 patients in recovery (more than three months post infection) to point out any additional issues they had experienced more than three months after being infected. The two new issues that were proposed were not included. Menstrual disturbances were more likely to be related to vaccination [37] or secondary to psychological distress [38]. Word-finding problems could be related to general fatigue or be part of a cerebrovascular incident secondary to COVID-19-related thrombotic complications [39]. In a study on breast cancer patients with Long COVID not yet published (JIA, personal communication), the participants completed the QLQ-PW80 7 or 10 months after being diagnosed with COVID-19. They were asked to describe any new issues, but none were reported. Long COVID has been shown to affect patients in all age groups, with varying background characteristics and levels of disease severity [40]. The most commonly seen symptoms are fatigue, reduced physical and cognitive functioning, shortness of breath, and palpitations [41], but also psychological symptoms such as anxiety and depression. Other symptoms reported are loss of taste and smell, muscle pain, headache, skin problems, and hair loss [41]. Almost all symptoms of Long COVID are covered in the QLQ-W61, except less common symptoms such as hair loss (removed in phase IIIA) and conditions such as post-traumatic stress disorder.

Since this process included a broad sample of international patients in various stages of COVID-19, including recovery, we believe the questionnaire fits the needs of various stakeholders. However, a wider distribution of participants would have been optimal in a study of a worldwide pandemic. More countries were approached, but due to ethical approval obstacles, unfortunately, some colleagues were not able to participate and this is a limitation. Some patients involved in this study had comorbidities or pre-existing conditions, and we cannot ascertain that all reported issues are caused by COVID-19. However, since issues were reported as relevant and important by a number of patients, it is ensured that the reported issues are common to at least a subgroup of COVID-19 patients, regardless of their pre-existing conditions.

The Oslo COVID-19 QLQ-W61 is considered to have the properties needed in clinical trials as a standardized, internationally developed tool to evaluate HRQoL of patients during active disease, in the recovery phase, and even in a Long COVID setting. It is a comprehensible instrument that is easy to fill in, and we believe that it could be useful in clinical practice as well. In a clinical setting, the questionnaire could be used to monitor patients, e.g., for symptom control in the clinic and to evaluate whether their HRQoL issues are changing over time. This has to be explored in future studies.

Additional psychometric testing in a large international sample would be preferable, but is considered time consuming and resource intensive and may be difficult to perform. Therefore, further validation will be based on data from ongoing clinical studies where patients fill in this COVID questionnaire, and data from two such studies (not yet published) are already available. In this study, we tested the questionnaire with a weekly (W) time frame (QLQ-W61), but in future studies, we may test the daily version (QLQ-D61).

Conclusion

The Oslo COVID-19 QLQ-W61© is a stand-alone, multidimensional HRQoL questionnaire that can assess the symptoms, functioning, and overall quality of life of COVID-19 patients. The questionnaire is applicable for clinical trials and clinical practice, covers relevant COVID-19 issues, and is acceptable to a broad population of COVID-19 patients from many countries. Although the questionnaire still needs to go through a final development phase of international psychometric validation in a large patient sample, this provisional questionnaire is now available for use with nine completed translations.