Key Summary Points

Why carry out this study?

Sjögren’s Syndrome Symptom Diary (SSSD) and Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) are patient-reported outcome (PRO) instruments used to assess Sjögren’s symptoms.

Currently, qualitative evidence supporting the content validity of new and updated SSSD items and FACIT-F in a Sjögren’s population is lacking.

This study investigated the appropriateness of the new and updated SSSD items and FACIT-F in a Sjögren’s population, as well as patient perceptions of meaningful change on SSSD.

What was learned from the study?

The new and updated SSSD items and FACIT-F demonstrated good content validity. Most patients considered a two-point improvement on most SSSD items meaningful, as well as a one- or two-point improvement in total score.

This qualitative evidence supports the use of SSSD and FACIT-F as clinical trial endpoints, in clinical practice, and in other research settings. The qualitative data exploring meaningful change will also be valuable in supporting psychometrically derived responder definitions.

Introduction

Sjögren’s is a chronic autoimmune disease characterized by lymphoid infiltration and progressive destruction of the exocrine glands [1]. Sjögren’s affects approximately 0.5–1% of the population, with a strong female predominance (female:male ratio of approximately 9:1) [2]. Sjögren’s symptoms are highly heterogeneous, including dryness of the eyes, mouth, skin, and (in females) genitalia [1, 3,4,5]. Patients also report joint/muscle pain and fatigue, with approximately 70% of patients describing fatigue as a pronounced and incapacitating component of the disease [6]. Approximately 30–40% of patients with Sjögren’s also have potentially serious systemic organ complications that can greatly affect morbidity and mortality, such as interstitial lung disease, peripheral and central nervous system inflammation, arthralgia, and myalgia [7,8,9,10].

The Sjögren’s Syndrome Symptom Diary (SSSD) is a novel patient-reported outcome (PRO) instrument in the form of a daily diary, being developed in line with regulatory guidance [4, 11, 12]. SSSD assesses the severity of eye, mouth, skin, and genital dryness (females only), fatigue, and muscle/joint pain over the past 24 h on a numerical rating scale (NRS) ranging from 0 (no symptom) to 10 (worst possible symptom). Although there is no agreed scoring approach, various methods have been explored using phase II clinical trial data (NCT02962895), including a six-item or five-item (excluding genital dryness) average total score and tracking changes in the symptom(s) rated most severe at baseline. To explore alternative, personalized scoring approaches, two new supplementary standalone items have been developed for administration alongside SSSD. These items ask patients which of the symptoms included in SSSD is ‘most bothersome’ and which is ‘most important to improve’. Further, the SSSD ‘fatigue’ item has been updated to assess ‘tiredness’, as most patients in the development interviews used this term [4].
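To make the scoring approaches described above concrete, the following Python sketch computes a six- or five-item average total score and the change in the symptom(s) rated most severe at baseline. It is illustrative only and is not the scoring algorithm applied to the NCT02962895 data: the item keys, the example ratings, and the decision to retain all tied items as "most severe" are assumptions.

```python
# Hypothetical keys for the six SSSD items, each rated on a 0-10 NRS
# (0 = no symptom, 10 = worst possible symptom).
SSSD_ITEMS = ("eye_dryness", "mouth_dryness", "skin_dryness",
              "genital_dryness", "tiredness", "muscle_joint_pain")


def average_total_score(ratings, include_genital=True):
    """Six-item average, or five-item average excluding genital dryness."""
    items = [i for i in SSSD_ITEMS if include_genital or i != "genital_dryness"]
    for item in items:
        if not 0 <= ratings[item] <= 10:
            raise ValueError(f"{item} rating outside 0-10: {ratings[item]}")
    return sum(ratings[item] for item in items) / len(items)


def most_severe_items(baseline):
    """Item(s) with the highest baseline rating; ties are all retained."""
    top = max(baseline[item] for item in SSSD_ITEMS if item in baseline)
    return [item for item in SSSD_ITEMS if baseline.get(item) == top]


def change_in_most_severe(baseline, follow_up):
    """Follow-up minus baseline for the item(s) most severe at baseline.

    Negative values indicate improvement, because lower scores mean
    less severe symptoms.
    """
    return {item: follow_up[item] - baseline[item]
            for item in most_severe_items(baseline)}


# Hypothetical ratings for one patient.
baseline = {"eye_dryness": 8, "mouth_dryness": 6, "skin_dryness": 4,
            "genital_dryness": 5, "tiredness": 7, "muscle_joint_pain": 6}
follow_up = dict(baseline, eye_dryness=5)

print(average_total_score(baseline))                         # 6.0
print(average_total_score(baseline, include_genital=False))  # 6.2
print(change_in_most_severe(baseline, follow_up))            # {'eye_dryness': -3}
```

A personalized variant based on the new supplementary items could be sketched the same way by tracking the item a patient selects as 'most bothersome' instead of the highest-rated item. How missing diary days or ties should be handled in practice is not specified here.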

Functional Assessment of Chronic Illness Therapy-Fatigue (FACIT-F) version 4 is a well-established, 13-item PRO instrument widely used in clinical trials and clinical practice to assess fatigue across a range of health conditions [13,14,15,16,17,18]. Items assess aspects of physical and mental fatigue and their impact on daily activities and functioning over a 7-day recall period. Each item is rated on a five-point Likert scale ranging from 0 (not at all) to 4 (very much), resulting in a transformed total score from 0 to 52. FACIT-F has demonstrated content and psychometric validity in a number of rheumatic diseases including systemic lupus erythematosus [14,15,16], psoriatic arthritis [17], and rheumatoid arthritis [18]. The instrument has also been used in several Sjögren’s clinical trials to assess fatigue more granularly than the single fatigue items included in SSSD and the European Alliance of Associations for Rheumatology (EULAR) Sjögren’s Syndrome Patient Reported Index (ESSPRI) [19,20,21,22,23]. FACIT-F has been included in various European Medicines Agency (EMA)-approved product labels for rheumatic diseases such as rheumatoid arthritis [24,25,26,27,28] and systemic lupus erythematosus [29], and in two Food and Drug Administration (FDA)-approved labels for rheumatoid arthritis [30, 31].
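The following Python sketch shows how a 0–52 FACIT-F total can arise from 13 items rated 0–4. It assumes the conventional FACIT scoring rules as we understand them: negatively worded items are reverse-scored, items 7 ("I have energy") and 8 ("I am able to do my usual activities") are treated as the only positively worded items, and the score is prorated when more than half of the items are answered, so that higher scores indicate less fatigue. These rules are assumptions to be checked against the official FACIT scoring guidelines before any real use.

```python
# Assumption (verify against the official FACIT scoring guidelines): items 7
# and 8 are the only positively worded items; all others are reverse-scored.
N_ITEMS = 13
POSITIVE_ITEMS = {7, 8}


def facit_f_score(responses):
    """Return a FACIT-F total on the 0-52 scale (higher = less fatigue).

    responses maps item number (1-13) to a 0-4 rating (0 = not at all,
    4 = very much); unanswered items are simply left out of the mapping.
    """
    for item, rating in responses.items():
        if item not in range(1, N_ITEMS + 1) or rating not in range(5):
            raise ValueError(f"invalid item/rating: {item}={rating}")
    if len(responses) <= N_ITEMS / 2:
        raise ValueError("more than half of the items must be answered")
    # Reverse-score negatively worded items so that higher = less fatigue.
    item_scores = [rating if item in POSITIVE_ITEMS else 4 - rating
                   for item, rating in responses.items()]
    # Prorate so that partially completed questionnaires stay on the 0-52 scale.
    return sum(item_scores) * N_ITEMS / len(responses)


# Hypothetical extreme response patterns.
least_fatigue = {i: (4 if i in POSITIVE_ITEMS else 0) for i in range(1, N_ITEMS + 1)}
most_fatigue = {i: (0 if i in POSITIVE_ITEMS else 4) for i in range(1, N_ITEMS + 1)}

print(facit_f_score(least_fatigue))  # 52.0
print(facit_f_score(most_fatigue))   # 0.0
```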

Although the original SSSD items have demonstrated content validity, additional qualitative evidence is required to support the new and updated SSSD items. Further, there is no evidence supporting the content validity of FACIT-F in a Sjögren’s population. Regulators such as the FDA are increasingly emphasizing the importance of qualitative evidence demonstrating the content validity of PRO instruments, an important criterion to support the use of instruments as clinical trial endpoints and their inclusion in product labelling [11, 12, 32,33,34,35]. Furthermore, the FDA recommends qualitative evidence on meaningful changes in PRO scores, and on how such changes relate to how patients feel, function, and survive, to support and complement within-patient meaningful change thresholds (i.e., responder definitions) generated using anchor-based statistical analysis [32, 33]. As SSSD is still in development, such evidence was also lacking for this instrument. Finally, a preliminary assessment of physicians’ preferences among the potential SSSD scoring approaches was considered valuable to support ongoing psychometric validation.

This paper describes qualitative research conducted with patients with Sjögren’s and expert Sjögren’s physicians to address these evidence gaps related to SSSD and FACIT-F.

Methods

Sample and Recruitment

Two non-interventional, cross-sectional, qualitative interview studies were conducted with patients with Sjögren’s and expert Sjögren’s physicians. MedQuest Global Market Research, Inc recruited patients from diverse locations across the United States (US) (Baltimore, MD; Chicago, IL; Los Angeles, CA; Pittsburgh, PA; and St Louis, MO) via physician referrals. In accordance with best-practice guidelines for collecting comprehensive and representative input [36], sampling quotas relating to key demographic and clinical characteristics were employed to ensure insights were gathered from a diverse sample of patients who were representative of the wider Sjögren’s population (Table 2).

Expert Sjögren’s physicians from the US, UK, and Germany were approached by the study sponsor to participate in an interview. Although not included in this paper, the content validity of ESSPRI [19] and the EULAR Sjögren’s Syndrome Disease Activity Index (ESSDAI) [37] was also assessed during the interviews [38]. Because the perspectives of European physicians were well incorporated during ESSDAI and ESSPRI development, a sampling quota was employed to ensure greater representation of US physicians and thus provide clinical perspectives from this region. Patients and physicians were required to meet pre-defined eligibility criteria (Table 1).

Table 1 Patient and physician eligibility criteria for qualitative interviews

Although there is no minimum sample size required for qualitative interview studies, it is recognized that Sjögren’s can be highly heterogeneous, meaning a relatively large number of patients would be needed to fully explore the disease experience. However, the focus of this study was to debrief SSSD and FACIT-F using cognitive interview methods and research suggests that 7–10 participants are adequate to comprehensively assess the content validity of Clinical Outcome Assessment (COA) instruments [39]. Additionally, previous qualitative research into SSSD and FACIT-F has been conducted and the present interviews aimed to supplement this evidence and gain additional insights on specific areas of interest in Sjögren’s. Therefore, the patient (n = 12) and physician (n = 10) samples were considered sufficient.

Qualitative Interviews

Interviews were conducted by trained Adelphi Values interviewers via telephone or Microsoft Teams video call, between March and June 2021. Discussions with patients (approximately 90 min) and physicians (approximately 60 min) were guided by separate semi-structured interview guides, employing cognitive interviewing methods. Interviews were designed and conducted in accordance with best-practice guidelines for qualitative research [32], such as ensuring questions were framed in an unbiased manner by using open-ended and non-leading questions.

During the interviews, the new and updated SSSD items were debriefed with patients to explore content validity (i.e., understanding, appropriateness, and relevance of item wording, recall period, and response options). Patient perceptions of meaningful change at the item and total score level were also explored, along with how such changes would relate to how patients feel, function, and survive. Furthermore, expert Sjögren’s physicians were asked for feedback on the most appropriate approaches to scoring SSSD. FACIT-F was also debriefed with patients to gain evidence of understanding, interpretation, and relevance of the items to their experiences of Sjögren’s.

Qualitative Analysis

All interviews were audio-recorded and transcribed verbatim to allow for qualitative analysis using ATLAS.ti software [40]. Separate coding schemes were developed for the patient and physician interviews and were used throughout the analysis process to ensure that trained and experienced researchers applied and grouped codes consistently. Framework and thematic analysis methods were used to analyze the interview transcripts [41, 42]. Framework analysis involves the allocation of dichotomous codes to capture participants’ responses to a question (e.g., whether an item was relevant to a patient or not) [41]. Thematic analysis involves the identification of patterns across interviews to synthesize in-depth qualitative data in a flexible manner (e.g., themes surrounding perceptions of meaningful change) [42]. An induction–abduction approach was taken, identifying themes both from topics and issues emerging directly from the data (inductive inference) and by applying prior knowledge (abductive inference).
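Because framework-analysis codes are dichotomous, the findings in the Results are reported as counts over the number of participants who were asked a given question ("n = k/N; %"). A minimal Python sketch of that tabulation follows, assuming each participant's response has been coded True/False (or None when the question was not asked). The participant IDs are hypothetical, and rounding halves up to whole percentages is an assumption that is consistent with reported values such as 3/8 shown as 38%.

```python
from typing import Optional


def format_n_pct(k: int, n: int) -> str:
    """Format k of n as 'n = k/n; p%', rounding the percentage half up."""
    if n == 0:
        raise ValueError("no participants were asked this question")
    pct = (200 * k + n) // (2 * n)  # integer arithmetic: round half up
    return f"n = {k}/{n}; {pct}%"


def tally(codes: dict[str, Optional[bool]]) -> str:
    """Tally one dichotomous code across participants.

    codes maps a participant ID to True/False, or to None when the question
    was not asked; None entries are excluded from the denominator.
    """
    asked = [value for value in codes.values() if value is not None]
    return format_n_pct(sum(asked), len(asked))


# Hypothetical coded responses to "Is this item relevant to you?"
relevance = {"P01": True, "P02": False, "P03": None, "P04": True}
print(tally(relevance))     # n = 2/3; 67%
print(format_n_pct(3, 8))   # n = 3/8; 38%
```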

Ethical Approval

Ethical approval and oversight were provided by Salus Independent Review Board (IRB), an international centralized IRB (physician and patient protocol IDs: NO9051A and NO9052A, respectively), and both studies were designed and conducted in accordance with best-practice guidelines [43, 44] and the ethical principles laid down in the Declaration of Helsinki and its later amendments [45]. All participants provided oral and written informed consent prior to the conduct of any research activities.

Results

Patient Characteristics

A total of 12 patients with Sjögren’s were interviewed. All demographic and clinical sampling quotas implemented to promote heterogeneity and representation of the Sjögren’s population (Table 2) were met or exceeded, except for the education level category ‘completed high school or below only’, which was narrowly missed (target: ≥ 3; actual: 2). Patients were predominantly female (n = 8/12; 67%), with a mean age of 56.1 years (range, 20–80 years). Because Sjögren’s is more common in White individuals [2], a racial quota relating to White and non-White participants was applied; White and non-White individuals were equally represented (n = 6/12; 50% each). Most patients (n = 10/12; 83%) were diagnosed with Sjögren’s within the last 10 years and classified as having moderate (n = 5/12; 42%) or high (n = 4/12; 33%) disease activity based on a Physician Global Assessment (PhGA) score. At screening, most patients (n = 9/12; 75%) had an unsatisfactory symptom state (ESSPRI score ≥ 5) [37], reporting symptoms of eye dryness (n = 12/12; 100%), tiredness/fatigue (n = 11/12; 92%), and mouth dryness (n = 8/12; 67%), among others.

Table 2 Demographic and clinical characteristics as reported by patients at screening (N = 12)

Physician Characteristics

A total of ten physicians were interviewed, and all sampling quotas were met. Physicians were all rheumatologists from the US (n = 8/10; 80%), Germany (n = 1/10; 10%), and UK (n = 1/10; 10%), and five (n = 5/10; 50%) physicians were female. All physicians had been qualified for at least 10 years, treating patients with Sjögren’s for at least 5 years, and treating patients with Sjögren’s on a weekly (n = 9/10; 90%) or monthly (n = 1/10; 10%) basis at the time of interview. On average, physicians treated at least 20 patients with Sjögren’s per month and worked in a range of settings including academia (n = 8/10; 80%), private practice (n = 3/10; 30%), and/or hospital-based care (n = 2/10; 20%).

SSSD

New and Updated SSSD Items

In general, the new and updated SSSD items were well understood by all patients, and most suggested it was easy to provide a response. Specifically, all patients understood the updated ‘tiredness’ item (n = 12/12; 100%), all who were asked understood its 24-h recall period (n = 8/8; 100%), and most considered the item relevant to both their overall Sjögren’s experience and their experience of Sjögren’s in the past 24 h (n = 11/12; 92%). The 0–10 NRS was generally reported as an appropriate response scale (n = 11/12; 92%), and all patients suggested that it would be easy to remember their tiredness over the 24-h recall period (n = 12/12; 100%). All patients (n = 12/12; 100%) demonstrated a good understanding of the new ‘most bothersome’ and ‘most important to improve’ symptom items, and most suggested it would be easy to choose a response for these items using the options provided (n = 10/12; 83%, and n = 10/10; 100%, respectively). All patients selected the same symptom as their most bothersome and most important to improve, with eye dryness selected most frequently (n = 7/12; 58%). However, a third of patients (n = 4/12; 33%) selected a different symptom as their most bothersome from the one they rated as most severe at the time of interview.

Meaningful Change on Individual SSSD Items

Meaningful change was explored in relation to patients’ 0–10 NRS score on each individual SSSD item at the time of interview. All patients (n = 12/12; 100%) reported that a two-point improvement on the eye dryness item would be important to them (Fig. 1). Similarly, most patients who experienced mouth dryness and muscle/joint pain reported that a two-point improvement would be meaningful to them (n = 7/9; 78%, range, 2–4 points and n = 7/11; 64%, range, 1–6 points, respectively). For the skin dryness and tiredness items, perceptions of a meaningful improvement varied, ranging from 1 to 6 points and from 1 to 5 points, respectively. Finally, of the three female patients (n = 3/8; 38%) who experienced genital dryness, two (n = 2/3; 67%) reported that a two-point improvement would be important to them (range, 2–5 points). The minimum improvement that would make taking a treatment worthwhile was also explored during the patient interviews. Responses varied across patients, ranging from 1 to 6 points on the eye dryness and tiredness items, 1 to 5 points on the mouth dryness and muscle/joint pain items, 2 to 6 points on the skin dryness item, and 2 to 5 points on the genital dryness item.

Fig. 1 Meaningful improvement on individual SSSD items

Patients were also asked about meaningful worsening on each item. For the eye dryness item, all patients who were asked (n = 11/11; 100%) reported that worsening by two points would be important. Similarly, most patients considered a two-point worsening important on the mouth dryness (n = 9/12; 75%, range, 1–5 points) and tiredness (n = 9/12; 75%, range, 1–3 points) items, as did half of patients on the skin dryness item (n = 6/12; 50%, range, 1–3 points). However, responses were more varied for the pain and genital dryness items, ranging from 1 to 8 points and from 1 to 7 points, respectively.

Meaningful Change on SSSD Total Score

Most patients reported that a two-point (n = 6/12; 50%) or one-point (n = 4/12; 33%) improvement in their total SSSD score would be meaningful (range, 1–7 points), mostly as it would improve how they feel (n = 6/10; 60%; Table 3). Patients also discussed the symptoms that would be most important to see improvement in for the change to be meaningful (Table 3). Eye dryness was reported most frequently (n = 7/12; 58%), followed by muscle/joint pain (n = 3/12; 25%), skin dryness (n = 1/12; 8%), and tiredness (n = 1/12; 8%). Of note, these findings were in line with patients’ responses to the ‘most important to improve’ item.

Table 3 Impact of improvement in SSSD total score on how patients feel/function and most important symptom to see change in

Most patients who were asked reported that a two-point (n = 5/11; 45%) or one-point (n = 3/11; 27%) improvement in their SSSD total score would be worth taking a treatment for (range: 1–4 points). Similarly, most patients who were asked reported that a one-point (n = 3/11; 27%) or two-point (n = 5/11; 45%) worsening would be important (range: 1–7 points).

Best Approach to Calculating an SSSD Score

Most physicians suggested that the most appropriate scoring approach would be tracking changes in individual SSSD item scores (n = 7/10; 70%), followed by tracking average total scores (n = 4/10; 40%) or tracking domain scores (n = 1/10; 10%); some physicians endorsed more than one approach. Of those who felt tracking changes in individual item scores would be optimal, three (n = 3/7; 43%) explained that patients’ experiences of the symptoms assessed by SSSD vary greatly and so should be tracked separately, and that improvements in individual symptoms may be masked by a total score that includes symptoms which are not relevant to a patient’s experience or which remain unchanged. Additionally, physicians noted that some symptoms may be influenced by environmental factors (e.g., skin dryness might depend on use of artificial heaters or lack of moisturizer use), so capturing individual item data is important for monitoring long-term improvement (n = 2/7; 29%).

Interestingly, two physicians (n = 2/10; 20%) felt that a hybrid approach of tracking both individual item scores and average total score would be optimal. Both physicians felt it important to include an individual symptom score, given that improvement in one symptom does not equate to improvement in another, while acknowledging the importance of an overall total score, particularly in monitoring improvement longitudinally.
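The masking concern raised by physicians can be illustrated with simple arithmetic. In the hypothetical Python sketch below, a 3-point improvement in eye dryness with all other items unchanged shifts the six-item average by only 0.5 points. The item keys and ratings are invented, the 2-point threshold is illustrative (it reflects the item-level improvement most patients in this study described as meaningful) rather than a validated responder definition, and applying the same threshold to the average total score is a deliberate simplification.

```python
# Hypothetical keys for the six SSSD items (0-10 NRS, higher = worse).
SSSD_ITEMS = ("eye_dryness", "mouth_dryness", "skin_dryness",
              "genital_dryness", "tiredness", "muscle_joint_pain")


def average_total(ratings):
    """Six-item average total score."""
    return sum(ratings[item] for item in SSSD_ITEMS) / len(SSSD_ITEMS)


def hybrid_summary(baseline, follow_up, threshold=2.0):
    """Report item-level and average-total changes side by side.

    threshold is an illustrative improvement size in NRS points, not a
    validated responder definition. Negative changes indicate improvement.
    """
    item_changes = {item: follow_up[item] - baseline[item] for item in SSSD_ITEMS}
    total_change = average_total(follow_up) - average_total(baseline)
    return {
        "item_changes": item_changes,
        "improved_items": [item for item, d in item_changes.items() if d <= -threshold],
        "total_change": total_change,
        "total_improved": total_change <= -threshold,
    }


# Hypothetical patient: only eye dryness improves, by 3 points.
baseline = {"eye_dryness": 8, "mouth_dryness": 6, "skin_dryness": 4,
            "genital_dryness": 5, "tiredness": 7, "muscle_joint_pain": 6}
follow_up = dict(baseline, eye_dryness=5)
summary = hybrid_summary(baseline, follow_up)

print(summary["improved_items"])  # ['eye_dryness']
print(summary["total_change"])    # -0.5
print(summary["total_improved"])  # False
```

Reporting both views side by side mirrors the hybrid approach favored by two physicians: the item-level view captures the clinically noticeable change in one symptom, while the average total remains available for longitudinal monitoring.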

FACIT-F

Due to interview time constraints, FACIT-F was debriefed with n = 11/12 (92%) patients.

Understanding

FACIT-F items were generally well understood by patients (n ≥ 7/11; ≥ 64%, Fig. 2). However, some patients reported that they did not understand the term ‘listless’ (n = 4/11; 36%) in Item 3 (‘I feel listless [washed out]’), with one stating that they “have never actually heard that word before”. Of note, one of these patients was able to infer the meaning of this item from the term ‘washed out’, hence the total number of patients not understanding this item was three (n = 3/11; 27%).

Fig. 2 Overall understanding for each FACIT-F item

Due to interview time constraints, it was not possible to assess patients’ understanding of the response options and recall period for each individual FACIT-F item. However, as the response options and recall period are consistent across FACIT-F items, it was deemed sufficient to explore patients’ understanding in relation to Item 1 (‘Fatigue’) only. All patients who were asked demonstrated a good understanding of the FACIT-F response options (n = 11/11; 100%) and 7-day recall period (n = 6/6; 100%). Appropriateness and ease of the response options and recall period were also assessed more generally, considering all FACIT-F items. All patients who were asked reported that the response options were appropriate (n = 10/10; 100%) and most reported that it was easy to answer the questions using the response options (n = 4/6; 67%). All patients who were asked (n = 8/8; 100%) also reported that it was easy to remember their experience of fatigue over the 7-day recall period.

Relevance

Most FACIT-F items were relevant to most patients (Fig. 3). However, Item 10 (‘I am too tired to eat’) was not relevant to any patients (n = 0/11; 0%), and Item 2 (‘I feel weak all over’), Item 3 (‘I feel listless [washed out]’), and Item 11 (‘I need help doing my usual activities’) were relevant to fewer than half of patients (n = 5/11; 45%, n = 3/11; 27%, and n = 4/11; 36%, respectively).

Fig. 3 Overall relevance for each FACIT-F item

Throughout the interviews, four patients (n = 4/11; 36%) noted that it was difficult to distinguish whether their experiences of fatigue, as explored by FACIT-F, were caused by Sjögren’s or other reasons (i.e., age, co-morbid conditions, and/or general tiredness). Specifically, difficulties were raised in relation to Item 1 (‘fatigue’; n = 1/4; 25%), Item 2 (‘weak all over’; n = 1/4; 25%), Item 5 (‘trouble starting things’; n = 2/4; 50%), Item 6 (‘trouble finishing things’; n = 1/4; 25%) and Item 11 (‘need help doing usual activities’; n = 1/4; 25%).

Discussion

The SSSD [4], a novel PRO instrument assessing the severity of six key Sjögren’s symptoms, is currently being developed in line with regulatory guidance and the original items have demonstrated content validity in a Sjögren’s population [11, 12, 35]. Importantly, SSSD assesses dryness more granularly than alternative PRO instruments such as ESSPRI, where dryness is assessed by a single item. Further, the 24-h recall period of SSSD allows the assessment of symptoms over a shorter time period, thus reducing the risk of recall error/bias.

In line with terminology used by patients in the development interviews, the ‘fatigue’ item was updated to assess ‘tiredness’, and the data from the present study supports the content validity and use of this updated item. Similarly, the data supports the content validity of the new supplementary items assessing the SSSD symptoms that are ‘most bothersome’ and ‘most important to improve’ for patients. Therefore, the newly developed items are appropriate for administration at baseline alongside SSSD in clinical trials to support the exploration of personalized endpoints. Personalized endpoints are of growing interest due to the increasing emphasis on greater patient-centricity in drug development and regulatory decision-making [32]. The approach is also of particular interest in Sjögren’s due to the heterogeneous nature of symptom presentation. The ‘most bothersome’ and ‘most important to improve’ items would also be valuable in routine clinical practice to ensure patient care is tailored to individuals. Understandably, all patients reported that their most bothersome symptom was also the most important to improve, and most patients reported this to be eye dryness. However, patients’ most severe symptom was not necessarily their most bothersome, suggesting the approach of tracking patients’ most severe symptoms at baseline may not be appropriate in the Sjögren’s population. Further, the clinical insights obtained in this study relating to the most appropriate SSSD scoring approach will be valuable to support ongoing psychometric validation and use of SSSD as a Sjögren’s clinical trial endpoint. More broadly, the findings support the content validity of the updated ‘tiredness’ item and the new supplementary items, complementing existing evidence supporting the content validity of the original SSSD items and ultimately supporting use of the instrument in clinical trials, routine clinical practice, and other research settings.

In line with FDA’s guidance for industry and the greater emphasis on the use of qualitative data to support and contextualize meaningful change thresholds [11], this study also generated qualitative data regarding changes at the SSSD item and total score level that would be meaningful to patients, and how this relates to changes in their symptoms, and how they feel, function, and survive. Although responses varied across items, patients generally reported that relatively small improvements at both the item and total score level would be meaningful. As such, this qualitative data will be valuable for aiding interpretation of psychometrically derived responder definitions and ensuring greater patient-centricity when defining clinical trial endpoints [32, 33].

Although FACIT-F is a well-established instrument with evidence of content and psychometric validity in a number of rheumatic diseases [14,15,16,17,18], this paper is the first to present qualitative data demonstrating the content validity of FACIT-F in a Sjögren’s population. The findings therefore address a key evidence gap and are valuable in supporting use of the instrument to assess fatigue more granularly than SSSD where desired, such as in the context of clinical trials, routine clinical practice, and other research settings. Although FACIT-F items were generally considered relevant to most patients, scoring algorithms excluding items that demonstrated lower relevance in this study (particularly Items 2, 3, 10, and 11) could be explored in the future to maximize relevance to individuals with Sjögren’s. A minority of patients described difficulties distinguishing whether their tiredness and fatigue were related to Sjögren’s or due to other factors such as co-morbid conditions and general tiredness, which may influence their responses on FACIT-F. However, these findings are in line with previous research suggesting that the experience of fatigue in Sjögren’s and other rheumatic diseases is complex and likely to have several contributing factors [46,47,48].

Limitations

The relatively small size of the patient sample could be considered a limitation of this study, as it limits the generalizability of the findings, particularly given that the fluctuating and heterogeneous nature of Sjögren’s symptoms means a relatively large number of patients would be needed to fully explore the disease experience. However, fully exploring the disease experience was not the aim of this study. Instead, the aim was to debrief SSSD and FACIT-F using cognitive interview methods, and previous research suggests that a sample of 7–10 participants is adequate to comprehensively assess the relevance and understanding of COA instruments [39]. Additionally, sampling quotas relating to key demographic and clinical characteristics were implemented to ensure insights were obtained from a diverse sample of patients who were representative of the wider Sjögren’s population. Further, previous qualitative research into SSSD and FACIT-F has been conducted, and this study aimed to supplement this evidence and gain additional insights on specific areas of interest in Sjögren’s. Therefore, the sample sizes were deemed sufficient to build on existing evidence to support the content validity of SSSD and FACIT-F in a Sjögren’s population. However, as previously mentioned, all patients and most physicians who were interviewed were from the US. Although this was suitable for the aims of this study, future research could use translated versions of SSSD and FACIT-F to confirm their content validity in patients with Sjögren’s from different countries.

As Sjögren’s has a strong female predominance (approximate female:male ratio of 9:1) [2] it could be argued that the sample was not representative of the Sjögren’s population in terms of sex (female:male ratio of 2:1). However, this is less of a concern for the cognitive interviewing methods used in this study, aimed at exploring the content validity of COA instruments, compared to concept elicitation interviews aimed at exploring the patient experience of a condition. Furthermore, previous research suggests that Sjögren’s is heterogeneous regardless of sex, with the only noted difference being that females may experience genital dryness [49]. Additionally, the sample met the quota regarding sex, ensuring females were adequately represented to provide input on meaningful improvement in relation to the SSSD genital dryness item.

Conclusions

In conclusion, the qualitative data presented in this paper builds on existing evidence supporting the content validity of SSSD, its suitability as a Sjögren’s clinical trial endpoint, and its use in routine clinical practice and other research settings. Further, patient perceptions of meaningful change at the item and total score level will be valuable in interpreting and supporting psychometrically derived responder definitions and ensuring these are patient-centric. Expert physician opinions on SSSD scoring approaches will also be useful in supporting ongoing psychometric validation. Finally, this paper is the first to present qualitative data supporting the content validity of FACIT-F in a Sjögren’s population. The findings support use of the instrument to assess fatigue more granularly compared to other PRO instruments in the context of clinical trial endpoints, routine clinical practice, and other research settings. Taken together, these qualitative findings fill important evidence gaps and provide valuable insights which can inform use of SSSD and FACIT-F to assess Sjögren’s symptoms in various settings.