Background

Assessing patient-reported outcomes (PRO) has become standard in both clinical trials and routine care worldwide [1, 2]; it remains relevant many years after the diagnosis [3]. In a busy clinic or in multinational trials, electronic data capture is often preferred over paper for several reasons: real-time assessment and consequently immediate feedback on the patients’ well-being [4], easier logistics once the system is established [5], the possibility of adaptive presentation of questions [6], avoidance of secondary data entry errors [7], and ease of using different language versions.

However, there are also disadvantages of this mode of data capture: costs and time involved to develop and set up the information technology (IT) infrastructure [8], increased workload for the health care professionals [9, 10], challenges with data security [11] or digital exclusion [12]. There are also reports that electronic assessments may be more challenging for some patients [13], although other studies found that most patients prefer electronic symptom monitoring [14, 15]. Generally, it seems that patients agree to report on their symptoms electronically but they want to talk to a person about their problems as soon as there is something of concern [16, 17]. Preferences and participation are also related to the patients’ age, education, and digital literacy [9, 18, 19].

Another much-debated question is whether electronic data capture yields outcomes equivalent to those obtained with paper and pen or a live telephone interview. While most studies find that equivalence is sufficient [20,21,22,23,24,25,26,27], others have reported that this is not always the case [28,29,30,31]. For example, a randomized trial found that patients reported more severe problems with an automated voice response system than during a nurse-led live telephone interview [30].

There is conflicting evidence on whether the mode of assessment affects the proportion of missing data. While some studies report better completion with electronic data capture [32, 33], others found the opposite [34]. Forced answer options usually result in no missing items at all; participants might, however, simply stop completing the questionnaire.

Regarding the time needed to complete a questionnaire, evidence is also inconclusive. In a systematic review, only two out of nine studies found that the completion time was shorter with electronic devices [35]. This question, however, depends on what time is taken into account, just the time needed for filling out the questionnaire or also the time a tablet takes to start or an online version to load.

Due to these open questions, it is generally recommended to avoid mixing the modes of data capture in a given study, unless there is evidence indicating that, for the particular instrument and patient population, the discrepancies are minor [7, 36]. The aim of the present analysis was to investigate whether there were differences in time required, or the need for assistance in completing a newly developed questionnaire measuring quality of life in thyroid cancer patients [37,38,39]. We also examined the proportion of missing data by mode of data capture. In particular, our questions were:

  1. Is the type of data capture (electronic vs. paper-based) associated with the time required to complete the questionnaire?

  2. Do patients more frequently complete the questionnaire themselves (as opposed to the researcher asking the questions and entering the data/completing the forms on behalf of the patient) when they use electronic rather than paper-and-pen data capture?

  3. Is the type of data capture (electronic vs. paper-based) associated with whether or not help is required to complete the questionnaire?

  4. Does the proportion of items missing a response vary by type of data capture?

Methods

Study design

We used data from the phase IV field validation study of the European Organisation for Research and Treatment of Cancer (EORTC) Quality of Life Questionnaire Thyroid Module (EORTC QLQ-THY34; study number EORTC 002/2017) to address our research questions. This was a prospective multinational study including patients with thyroid cancer from 21 different institutions in 17 countries.

Two groups of patients were enrolled. The first group comprised patients about to undergo treatment, named Group 1 (treatment). They completed the questionnaire on three occasions: before onset of treatment or best supportive care (t1) as well as 6 weeks (t2) and 6 months thereafter (t3). If a collaborating institution was not able to include patients at t1, patients could be enrolled at t2 (then defined as 6 weeks after first day of initial treatment). The second group comprised people who had been diagnosed with thyroid cancer ≥24 months prior to enrollment, without structural evidence of disease based on imaging (Footnote 1), and without anti-neoplastic treatment during the past 12 months, named Group 2 (survivors). They completed the questionnaire only once (Footnote 2). As there is no generally accepted definition of when a patient could be considered to be a survivor, we chose the 24-month time interval, together with the other criteria, assuming that the participants would then not be in an acute situation anymore.

Eligibility criteria for all participants were: diagnosed thyroid cancer, sufficient language proficiency (the questionnaire was available in the local language of the participating institutions) and cognitive functioning to understand and complete the forms (as judged by the local investigator), age ≥16 years, and written informed consent.

Institutional Review Board Approval was obtained from the Ethics Committee of Rhineland-Palatinate Medical Association with reference number 837.406.17 (11240) as well as from all participating sites, according to the respective national requirements. More details about the study design and conduct are published elsewhere [37].

Mode of questionnaire administration and data capture

Electronic data capture was undertaken using the Computer Based Health Evaluation System (CHES) [40, 41]. Each site was free to decide what type of data capture (paper vs. electronic) they used, depending on the local infrastructure.

At the first time-point (t1), all patients were seen in person as per the study protocol. Electronic data capture could be used by handing them a tablet or using an online system.

Instruments

The participants received two questionnaires: the core instrument of the EORTC, the EORTC QLQ-C30 [42], and the thyroid cancer-specific module, the EORTC QLQ-THY34 [37].

The thresholds published by Giesinger et al. [43] were used to define which patients had clinically relevant impairments of their subjective emotional and cognitive functioning; these thresholds are 71 and 75, respectively.
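The threshold rule can be illustrated with a minimal sketch. It assumes that functioning scales are scored from 0 to 100 with higher values indicating better functioning, so that a score below the threshold is flagged as clinically relevant impairment; the function and dictionary names are hypothetical and not part of the study's analysis code.

```python
# Thresholds as quoted in the text (Giesinger et al. [43]).
THRESHOLDS = {"emotional_functioning": 71, "cognitive_functioning": 75}


def is_clinically_impaired(scale, score):
    """Flag a 0-100 functioning score as clinically relevant impairment.

    Assumption: higher scores mean better functioning, so a score BELOW
    the threshold is flagged. A missing score (None) stays unclassified.
    """
    if score is None:
        return None
    return score < THRESHOLDS[scale]


print(is_clinically_impaired("emotional_functioning", 66.7))  # True
print(is_clinically_impaired("cognitive_functioning", 83.3))  # False
```

Participants without a computable scale score stay unclassified rather than being counted as unimpaired.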

The local investigator asked a few debriefing questions at t1 regarding the time needed to complete the questionnaire (core questionnaire and module together), type of completion (self-completed vs. researcher read questions to participants and recorded their response), and whether any help was required to complete it (no help, practical help [for example when the patient did not have their glasses so the questionnaire had to be read to them, or they had shaky hands and had difficulty writing], supportive help [such as interviewer just sitting with the patient while they completed the questionnaire], or help with understanding the questionnaire).

Statistical analysis

Data analysis comprising descriptive statistics and multivariable binary logistic regression analyses was performed with STATA (StataCorp. 2017. Stata Statistical Software: Release 16. College Station, TX: StataCorp LP).

The exposure variable was the type of data capture (paper vs. CHES) and the outcomes were: time needed (<10 vs. ≥10 min), type of completion (self-completed vs. orally), help required (any help vs. no help), and proportion of missing scale scores. According to the EORTC scoring manual [44], a score of a certain scale can be calculated when at least half of the items of that scale are completed. Hence, if more than half of the items are not completed, no score is computed and is therefore missing. We first calculated the number of missing scores per participant and then created a binary variable “any missing score vs. no missing score”. The latter was used as an outcome variable.
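The half-rule for missing scale scores can be sketched as follows. This is an illustration only: the linear 0 to 100 transformation and the item range of 3 (items scored 1 to 4) follow the general EORTC scoring approach and are assumptions here, and the function names and example responses are hypothetical.

```python
def scale_score(item_responses, item_range=3, functioning=False):
    """Score one multi-item scale on a 0-100 metric.

    item_responses: item values (e.g. 1-4), with None for unanswered items.
    Returns None (missing score) when fewer than half of the items
    were answered.
    """
    answered = [r for r in item_responses if r is not None]
    if len(answered) * 2 < len(item_responses):
        return None
    raw = sum(answered) / len(answered)      # mean of the answered items
    scaled = (raw - 1) / item_range * 100    # linear transformation to 0-100
    return 100 - scaled if functioning else scaled


def any_missing_score(scores):
    """Binary outcome: True if at least one scale score is missing."""
    return any(s is None for s in scores.values())


# Hypothetical participant: scale A has 2 of 4 items answered (scored),
# scale B has 1 of 3 items answered (not scored).
scores = {
    "scale_A": scale_score([1, 2, None, None]),
    "scale_B": scale_score([3, None, None]),
}
print(scores)
print(any_missing_score(scores))  # True
```

The binary variable derived this way corresponds to the outcome "any missing score vs. no missing score" described above.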

We explored the potential for effect modification by the following variables: age (<75 vs. ≥75 years), education (<10 years, 10 years, >10 years), clinical impairment of cognitive or emotional functioning at baseline; effect modification was explored using the Mantel-Haenszel method and consequently tested with likelihood ratio tests in the regression models. If there was no evidence for effect modification, we did not specifically mention that in the following results section.
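To illustrate the idea behind this exploration, the sketch below compares stratum-specific odds ratios with the pooled Mantel-Haenszel odds ratio. All counts are hypothetical and chosen only for illustration; they are not study data, and the formal likelihood ratio test within the regression models is not reproduced here.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for one 2x2 table.

    a, b: exposed with / without the outcome
    c, d: unexposed with / without the outcome
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(tables):
    """Pooled Mantel-Haenszel odds ratio across strata of 2x2 tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Hypothetical counts per stratum of a potential effect modifier.
strata = {
    "stratum 1": (10, 10, 50, 100),
    "stratum 2": (20, 5, 60, 90),
}

for name, table in strata.items():
    print(name, odds_ratio(*table))
print("pooled (Mantel-Haenszel):", round(mantel_haenszel_or(strata.values()), 2))
```

Stratum-specific odds ratios that differ markedly from each other, as in this constructed example, point towards effect modification. In that case, stratum-specific estimates are more informative than a single pooled estimate.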

The following variables were adjusted for: language, UICC stage, performance status, comorbidity (ascertained using the Charlson Comorbidity Score), and exhaustion at t1 (measured with the EORTC QLQ-THY34). We did not adjust for further individual patient characteristics because, for logistic reasons, the type of data capture did not vary within one center: once a center had decided on the type of data capture, this was not changed later on. For that reason, the results are not confounded by this.

As patients with anaplastic thyroid cancer and those receiving best supportive care differ in certain clinical aspects from those with other histologies and treatments, we explored in a sensitivity analysis whether the effect of data capture on the various outcomes is different in these patients.

Results

Sample characteristics

A total of 437 patients participated (see Fig. 1 for details). The majority (84%) had differentiated thyroid cancer, 11% had medullary, 4% anaplastic, and 1% other types of thyroid cancer. About 9% were 75 years or older. The median age was 51 years (mean: 51 years, standard deviation 16). Most (71%) had received more than 10 years of education (Table 1). At the time of entry into the study, 278 (64%) participants had received total thyroidectomy, 35 (8%) partial thyroidectomy, 72 (16%) no surgery, and for 52 (12%) no information about surgery was available. By the end of the study, 364 (83%) had received total thyroidectomy, 44 (10%) partial thyroidectomy, and 29 (7%) no surgery.

Fig. 1 Patient flow through the study

Table 1 Demographic and clinical characteristics of the participants by type of data capture

Before t1, 10 (2%) patients had had radiotherapy for local control and 8 for distant metastases; 10 (2%) had received tyrosine kinase inhibitors (TKI); radioactive iodine (RAI) was received by 36 (8%) for ablation and 69 (16%) for therapy. By the end of the study, 275 (63%) participants had ever received RAI and 36 (8%) TKI. Four patients received best supportive care.

Clinically relevant impairment of cognitive and emotional functioning as defined by Giesinger et al. (2020) was present at baseline in 36% and 50%, respectively (Table 2).

Table 2 Aspects of quality of life by type of data capture

Type of data capture

A total of 54 (12%) participants from three sites (Kobe, Japan; Innsbruck, Austria; Pamplona, Spain) used CHES for electronic data capture. The remaining data were captured on paper.

Description of outcomes

Time required

About 42% of the participants needed <10 min and 55% needed ≥10 min. For 12 participants (3%), the time required to complete the questionnaire was not documented. The numbers and percentages broken down by type of data capture are displayed in Table 3.

Table 3 Time and help required as well as proportion of missing scores by type of data capture

Type of completion

The questionnaires were self-completed by 68% of participants, by the researcher in 30%, and in 2% this was not documented (Table 3).

The type of completion was associated with the language used. The proportion of oral completion (researcher read out the questions) was as follows: Swedish 75%, Greek 61%, Portuguese 53%, French 30%, Spanish 30%, Italian 28%, German 28%, Japanese 27%, English 7%, Dutch 7%, Norwegian 4%, Arabic 2% (p < 0.01).

Help needed

The majority (74%) of the participants required no help to complete the questionnaires. A quarter (24%) needed some type of help, and for 2% whether help was needed was not documented (Table 3).

Proportion of missing scale scores

At t1, there were 16 participants (4% of those participating at t1) with missing scale scores; in 13 of these cases, only 1 score was missing.

At t2, there were 10 participants (3% of those participating at t2) with missing scale scores; in 8 of them, only 1 score was missing.

At t3, there were 14 participants (6% of those participating at t3) with missing scale scores; in 11 of them, only 1 score was missing.

For the 6 participants who entered the study at t2, their t2 completion data were used for the following analyses. None of them had missing scores at t2. Hence, the final number for the regression models was 16 (4%) with at least one missing scale score.

Association of type of data capture with time or help required and with missing scores

Is the type of data capture associated with the time data collection needs?

There was a social gradient, with less time needed the higher the education level was (≥10 min needed in participants with less than 10 years versus 10 years versus more than 10 years of schooling: 76% vs. 60% vs. 53%), but effect modification was not observed. A similar pattern was seen regarding cognitive functioning. Emotional functioning, however, did modify the effect of data capture on time needed. Consequently, stratum-specific effect estimates for patients with and without clinical impairment of emotional functioning are reported in the following, and the other variables were treated as potential confounders. Hence, the final list of variables adjusted for in the regression model was: age, education, cognitive functioning, language, UICC stage, status of disease, performance status, comorbidity, and exhaustion.

With this model, there was no evidence that patients without clinically relevant emotional problems differed in the time needed to complete the questionnaires (adjusted odds ratio [ORadj] 1.9, 95% confidence interval [CI] 0.3–11.4; p = 0.48). In contrast to that, patients with clinically relevant emotional problems more often needed ≥10 min to complete the questionnaires when they used electronic data capture compared to paper and pencil (ORadj 24.0, CI 2.4–235.8; p = 0.006; Fig. 2).

Fig. 2 Proportion of patients needing less or more than 10 min to complete the questionnaire, by type of data capture and level of emotional problems. The figures inside the columns indicate the absolute numbers of participants within that category

When we ignore effect modification by emotional functioning, the odds of needing ≥10 min for questionnaire completion were about 5.5 times higher in patients who used electronic data capture compared to paper and pencil (ORadj 5.5, CI 1.1–27.3; p = 0.04).
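As a plausibility check for readers, the reported odds ratios, confidence intervals, and p-values for time needed can be compared with each other. The sketch below assumes Wald-type 95% confidence intervals that are symmetric on the log-odds scale; this is an assumption about the regression output, not something stated in the methods, and the function name is hypothetical. Because the published figures are rounded, only approximate agreement can be expected.

```python
import math


def wald_check(or_est, lower, upper, z_crit=1.96):
    """Back-calculate CI midpoint and two-sided p-value from an OR and its CI.

    Assumes a Wald-type interval: log(OR) +/- z_crit * SE.
    Returns (geometric midpoint of the CI, approximate p-value).
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(or_est) / se
    p = math.erfc(abs(z) / math.sqrt(2))   # two-sided normal p-value
    midpoint = math.sqrt(lower * upper)    # should be close to the OR
    return midpoint, p


reported = {
    "no emotional problems": (1.9, 0.3, 11.4),
    "emotional problems": (24.0, 2.4, 235.8),
    "ignoring effect modification": (5.5, 1.1, 27.3),
}
for label, (or_est, lo, hi) in reported.items():
    mid, p = wald_check(or_est, lo, hi)
    print(f"{label}: OR {or_est}, CI midpoint {mid:.1f}, p ~ {p:.3f}")
```

Under this assumption, the geometric midpoint of each interval should lie close to the reported odds ratio, and the back-calculated p-value should lie close to the reported one.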

Do patients more frequently self-complete the questionnaire when they use electronic data capture?

The odds of the researcher reading the questions out (instead of the patients completing the questionnaire themselves) were considerably lower when electronic data capture was used (ORadj 0.1, CI 0.03–0.6; p = 0.01) compared to paper-based data collection.

Is the type of data capture associated with the help required for data collection?

The odds of needing any help were lower when electronic data capture was used (ORadj 0.1, CI 0.02–0.6; p = 0.01) compared to paper-based data collection.

Does the proportion of missing scale scores differ by type of data capture?

The proportion of missing scale scores was similar in both types of data capture: 4% vs. 2% had at least one missing scale score in paper vs. electronic data capture (see Table 3 for details). In line with that, when adjusting for potential confounders, we found no evidence for an effect of data capture on the likelihood of missing scores (ORadj 0.4, CI 0.04–4.0; p = 0.42).

Sensitivity analysis

Anaplastic cancer versus other types of histology

Of the 19 patients with anaplastic cancer, 3 had used CHES and the remaining 16 used paper for their data capture. More than half (58%, n = 11) reported clinically relevant levels of emotional problems. The majority (84%, n = 16) needed ≥10 min to complete the questionnaires. Fifty-eight percent (n = 11) completed the questionnaires on their own. Ten percent (n = 2) had at least one missing score at t1. The anaplastic cancer patients were considerably older than the patients with other histologies (median 72 years, mean 70, range 43–83 years).

Due to the low number of patients using CHES in this group of patients, regression models using the same specifications as with the entire sample (variables taken into account for adjustment and effect modification) could not be computed. We had to restrict the adjustment variables to age and stratify by emotional functioning for the model where we had found effect modification in the entire sample. In those with increased levels of emotional problems, the age-adjusted OR for electronic data capture on time needed to complete the questionnaire was 0.13 (p = 0.24). For the eight patients with sub-threshold emotional problems and for all other outcomes, no regression models could be computed due to the low number of cases.

Best supportive care versus other treatment

All four patients receiving best supportive care had used paper for data capture. Hence, no comparisons between different types of data capture were possible here.

Discussion

This analysis set out to investigate the effect of electronic versus paper-based data capture on the time and help needed to complete the questionnaires for a newly developed questionnaire to measure the quality of life in thyroid cancer patients [37]. We were also interested in whether the proportion of missing scores differs between these two options.

We found that the effect of data capture on the time needed is probably modified by the emotional well-being of the patients. In those without clinically relevant mental health problems, the questionnaire completion time is similar for both types of data capture. This is in contrast to other studies where less time was needed for electronic data capture [45], though the majority of studies did not find a difference in time for completion [35]. Patients with clinically relevant emotional problems, however, needed more time in our study when electronic data capture was used. This finding is of relevance because emotional problems are common among cancer patients; about a third suffer from co-morbid mental health conditions [46,47,48,49]. As thyroid cancer involves the endocrine system, which is related to mental health, this topic is of particular concern in this group of patients [50,51,52,53,54,55,56]. They often suffer from depression, anxiety and poor emotional functioning, even for long periods of time after the diagnosis [3, 57,58,59]. In our study, half of the participants indicated emotional problems of clinical relevance. The emotional functioning scale of the EORTC QLQ-C30, the instrument used here, covers the following constructs: tension, worry, irritability, and depression. Irritability and tension in particular are known to be related to hyperthyroidism, while depression is more common during hypothyroid states. Thyroid cancer patients can suffer from both.

It is well known that individuals with emotional problems may be slower in processing thoughts and making decisions [60,61,62,63]. Recent data on digital interventions also suggest that patients with depression need professional support by a human being and should not be left alone with a tablet or computer [64]. This underlines that using the latest technology is not always the best option for data capture [24] and researchers need to think carefully about their target population when deciding about the methods of data collection, be it “paper or plastic” [23].

Apart from that particular problem, our data show that electronic data capture may be associated with needing less help in filling out the questionnaires and with a higher probability of self-completion, which is usually desired in large-scale observational studies and clinical trials. This might seem somewhat contradictory to the findings discussed above about the time needed. It might imply that the electronic data capture procedures were clear to the patients and the material was easy to use, but that some of the patients still needed more time to complete the forms electronically, for other reasons than needing help.

The proportion of missing scores was similar in both types of data capture, which is in contrast to Blondin’s study where more missing values were found with electronic data capture [34]. This difference could be explained by the different modes of data acquisition: they used an interactive voice response system, whereas we presented a questionnaire on a tablet. Moreover, they elicited daily reports from the patients, whereas our study only had three timepoints for data collection, with a longer time in between.

The results of our study should be interpreted in the light of its limitations. First of all, the type of data capture was not randomly allocated to either institutions or individual patients. The differences found could therefore also be due to residual confounding. There could also be more effect modifiers that we did not take into account. We selected only four variables for tests of effect modification in order to keep the power of the tests at an acceptable level. This brings us to the next limitation, which is the relatively low proportion of sites (and therefore participants) using electronic data capture. Consequently, the statistical tests used do not have the ability to detect smaller effects and effect estimates may be imprecise. Moreover, the three sites using CHES are most likely not representative of other institutions because they had a preference for this type of data capture. We also learned that one of these sites included patients at the nuclear medicine department, where internet connections were slow in some rooms due to necessary radiation protection measures there. Finally, the time required for completing the questionnaires was not measured exactly but recorded by the researchers. It is possible that it was remembered differently when using paper versus electronic data capture, thereby introducing an information bias that cannot be controlled for.

Strengths of our study include that all data collection that we considered in this analysis was done in the hospitals, so the home environment of the patients could not affect the results.

In summary, the obvious advantages of electronic data capture such as real-time assessment [4] and fewer data entry errors [7] may come at the price of additional time required for data collection on the part of the patients when they have mental health problems. As such problems are very common among thyroid cancer patients [57, 65, 66], researchers and clinicians should take these aspects into consideration when choosing a type of data capture for a particular research question or clinical application, respectively, in this patient population.