Key Summary Points

Why carry out this study?

A review of existing patient-reported outcome measures used in dermatological conditions indicated that there were some existing measures that could be appropriate for use in lichen planus; however, further qualitative and psychometric testing was required to address evidence gaps.

This study aimed to assess the content validity and psychometric measurement properties of the Dermatology Life Quality Index, Epworth Sleepiness Scale, Scalpdex and Oral Lichen Planus Symptom Severity Measure across three lichen planus subpopulations: cutaneous lichen planus, lichen planopilaris and mucosal lichen planus.

What was learned from the study?

The findings recommend the use of the Scalpdex and the Oral Lichen Planus Symptom Severity Measure with lichen planopilaris and oral mucosal lichen planus patients, respectively, and the Dermatology Life Quality Index in general lichen planus populations, with caveats. The Epworth Sleepiness Scale demonstrated weak psychometric properties and content validity when utilised with lichen planus patients.

This study highlights the importance of assessing the appropriateness of non-specific disease patient-reported outcome measures in disease-specific populations.

Introduction

Lichen planus (LP) is an inflammatory skin disorder estimated to affect between 0.5 and 1% of the population worldwide [1, 2]. LP can present in various forms across the body [3]. Cutaneous LP (CLP) lesions are the most common type of LP and are characterized by polygonal purple papules on the skin, often associated with severe itch and typically affecting flexor surfaces including the wrists, ankles and lower back [4]. Lichen planopilaris (LPP) is a follicular variant of LP and is most common in females [2]. LPP can present as painful and itchy patches of hair loss, predominantly localized to the centre of the scalp, along the frontal hair line and/or in the eyebrows [5]. If untreated, LPP can lead to irreversible scarring and alopecia [4]. Mucosal LP (MLP) lesions typically present as asymptomatic bilateral white striations or painful plaques localized in mucosal areas including buccal mucosa, tongue and gingivae, genitalia and conjunctiva [2, 4, 6]. Individuals may be diagnosed with more than one LP subtype, based on the clinical presentation [4].

Given the range of LP signs and symptoms (including itch, pain and a burning sensation at the affected areas) [1, 8,9,10,11], LP can have a significant impact on patients’ health-related quality of life (HRQoL) [4]. While qualitative literature is limited, there is evidence that LP patients, particularly CLP and MLP patients, experience psychological impacts including anxiety and depression [12]. Patients with oral MLP also report experiencing significant impacts on daily activities such as discomfort when having certain foods and drinks, which in some cases can result in depression and high levels of stress and anxiety [13, 14]. LPP patients have reported impacts on social interactions and daily activities as a result of scarring and hair loss, causing patients to have low self-esteem and feel self-conscious [15].

Patient-reported outcome measures (PROMs) are commonly used in routine medical practice and clinical studies to measure symptoms and HRQoL from the patient perspective. It is important that PROMs are appropriate and fit for purpose in terms of content validity and psychometric validity in the context of use [16]. A review of existing PROMs used in LP and other similar dermatological conditions identified several PROMs that could be appropriate for use in LP clinical development programs. Specifically, dermatological measures such as the Dermatology Life Quality Index (DLQI) [17] and Scalpdex [18], and non-specific disease measures such as the Epworth Sleepiness Scale (ESS) [19], have been used to assess HRQoL in LP patients [15, 20,21,22,23]. While there is some evidence of content validity and psychometric properties for these measures in some dermatological conditions [23, 24], there is limited evidence to support their use in an LP population [25]. In contrast, while existing LP-specific PROMs such as the recently developed Oral Lichen Planus Symptom Severity Measure (OLPSSM) have strong content validity [8, 26], there is no published additional evidence of psychometric validation in an LP (nor any other) population.

To address the gaps in evidence and align with regulatory standards [16, 27], the current study aimed to assess the content validity and psychometric measurement properties of the DLQI, ESS, Scalpdex and OLPSSM in an LP population through the conduct of qualitative patient interviews and psychometric analysis of data from an international Phase 2 LP clinical study. Aligned with the United States Food and Drug Administration (FDA) patient-focused drug development (PFDD) guidance documents, a mixed-method approach was used to ensure that the patient voice was represented in the evaluation of the select PROMs and in future clinical study design in LP [28,29,30,31].

Methods

Study Design

This study was conducted in two phases. In the quantitative phase, the psychometric properties of the DLQI, ESS, Scalpdex and OLPSSM were assessed in an LP population. In the qualitative phase, the content validity of the measures was evaluated via cognitive debriefing interviews.

Compliance with Ethics Guidelines

Ethical approval and oversight were obtained for the clinical study including exit interviews ([clinicaltrials.gov ID: NCT04300296, EUDRACT: 2019-003588-24]) and the independent qualitative interviews (Western Copernicus Group Independent Review Board [WCG IRB; reference: 20216826]). The studies were performed in accordance with the Helsinki Declaration of 1964 and its later amendments, and all participants provided informed consent indicating their data will be used for medical research purposes and the study results may be published.

Quantitative Phase

The quantitative phase used data collected from a global, randomized, double-blind, placebo-controlled, multi-centre, parallel-group Phase 2 clinical study involving 111 adults with biopsy-proven forms of moderate to severe LP (based on Investigator Global Assessment [IGA] rating of ≥ 3) who were eligible for systemic therapy and not adequately controlled with topical corticosteroids of high-ultrahigh potency in the opinion of the investigator. The study consisted of three cohorts (CLP, MLP and LPP) and two treatment periods (treatment period 1: baseline to Week 16; treatment period 2: Week 16 to Week 32) (Supplementary Material). For the psychometric analyses, treatment period 1 data were used. The PROMs selected were included as secondary or exploratory study endpoints.

Overview of PROMs

Table 1 provides a brief description of the PROMs included in the planned analyses and the cohorts they were administered to within the clinical study. Licenses to use the PROMs in the clinical study were obtained.

Table 1 Overview of the PROMs included in the quantitative phase of the study

Anchor Measures

Anchor measures were developed and administered in the LP clinical study to the full clinical sample to support psychometric evaluation of the PROMs [16]. This included a five-point patient global impression of severity (PGI-S) item, a five-point patient global impression of change (PGI-C) item, a five-point Investigator’s Global Assessment (IGA) scale and Item 1 of the DLQI (‘Over the last week, how itchy, sore, painful or stinging has your skin been?’). The PGI-S and the IGA were administered at baseline and at Week 2, 4, 8, 12 and 16; the PGI-C was administered at Week 2, 4, 8, 12 and 16.

Psychometric Analysis

Item- and scale-level psychometric analyses were conducted (Table 2). Unless noted otherwise, Week 4 data were used, as this time point was identified to provide a greater range of scores. As the PROMs were not appropriate for use in all LP types, analyses were conducted with different patient samples, e.g., DLQI and ESS with all LP types (n = 111), Scalpdex with LPP only (n = 37) and OLPSSM with MLP patients with oral LP (n = 33). The aim of this study was not to evaluate the structure of the questionnaires; therefore, factor analyses were not conducted.

Table 2 Summary of psychometric analyses

Qualitative Phase

The qualitative phase assessed the content validity of the PROMs via cognitive debriefing interviews. Given that the DLQI, ESS, Scalpdex and OLPSSM are existing validated measures, only relevance is reported here, as evidence of understanding is already available from the original development studies and subsequent studies evaluating their use. An overview of the study procedure is provided in Supplementary Material, with further detail described in the subsequent sections.

Sample and Recruitment

A subset of patients (n = 13) enrolled in the Phase 2 LP clinical study in the US were invited to participate in an exit interview once they had completed all treatment visits to Week 32 but before their Week 40 follow-up visit. Participation was voluntary and patients could opt out of taking part in an interview; patients who withdrew from the clinical study early were not eligible to participate in an exit interview. To increase the sample size, an additional, independent sample of patients (n = 45) was recruited by third-party recruitment agencies via referring clinicians in the US and Germany to participate in a qualitative interview. Inclusion and exclusion criteria for the independent interviews were broadly reflective of the LP clinical study eligibility criteria. Based on previous research, the sample included was deemed sufficient for assessing the content validity of the PROMs [32].

Interview Procedure

Interviews lasted 60 min and were conducted via telephone by trained qualitative interviewers in the patient’s native language using a semi-structured interview guide to facilitate the discussions. The cognitive debriefing (CD) section of the interview, which aimed to explore the relevance of the concepts assessed in the PROMs, lasted approximately 30 min and consisted of direct and focused questions.

Qualitative Analysis

All interviews were audio-recorded and transcribed verbatim with identifiable information redacted; the German interviews were then translated into English. Interview transcripts were analysed in Atlas.ti (Version 22) [33] following a framework approach [34]. Dichotomous codes were assigned to each item, instruction, response option(s) and recall period to indicate whether it was understood, relevant and/or appropriate, and why. Further codes captured any suggested changes.

Results

Participant Demographic and Clinical Characteristics

Overviews of the demographic and clinical characteristics for the qualitative interviews (N = 58: exit interviews, n = 13; independent qualitative interviews, n = 45) are presented in Tables 3 and 4, respectively. Age was lower for MLP participants and there was a higher proportion of females, reflecting the female predominance of LP [35]. Most participants enrolled were in the US and were Black or African American. There was a higher proportion of participants with ‘moderate’ LP, as confirmed by IGA severity scores at recruitment. The clinical study sample (N = 111; n = 37 in each LP cohort) was comparable with the qualitative samples; these data will be presented elsewhere.

Table 3 Participant demographic characteristics
Table 4 Participant clinical characteristics

Quantitative Phase

Item-Level and Dimensionality Analyses

Inter-Item Correlations

As expected, items within the DLQI domains (Table 5) correlated well with each other, particularly ‘Leisure’ (r = 0.894) and ‘Personal relationships’ (r = 0.890). Items in the domains ‘Symptoms and feelings’ (r = 0.479) and ‘Daily activities’ (r = 0.579) correlated moderately; however, ‘Daily activities’ items correlated most strongly with Item 2 (‘Embarrassed or self-conscious’), which was part of the ‘Symptoms and feelings’ domain (range: r = 0.721–0.848). The ESS (Table 6) had a few weak correlations, with the weakest (r = 0.311) being observed between Item 2 (‘Watching TV’) and Item 6 (‘Sitting and talking to someone’). The majority of correlations were in the range of r = 0.60–0.70. No correlations in the ESS exceeded 0.80. For the Scalpdex (Tables 7 and 8), inter-item correlations ranged from −0.226 to 0.935. Items within Scalpdex domains overall correlated moderately, but this varied. Item 19 (‘I feel that my knowledge for caring for my scalp is adequate’), Item 20 (‘The cost of caring for my scalp condition bothers me’) and Item 8 (‘My scalp condition bleeds’) had the lowest correlations with the remainder of the items, suggesting they measure concepts dissimilar to other items in the Scalpdex. A number of strong correlations were observed, suggesting potential redundancies. As shown in Table 9, the OLPSSM had few weak correlations < 0.40, with the weakest correlation (r = 0.136) being observed between Item 1 (‘When you brushed your teeth’) and Item 6 (‘When you talked’). The majority of correlations were in the range of r = 0.50–0.60, with Item 2 (‘When you ate food’) and Item 7 (‘When it was touched’) having the strongest correlation (r = 0.889), indicating possible redundancy.

Table 5 Inter-item correlations of the DLQI at Week 4—total sample (n = 108)
Table 6 Inter-item correlations of the ESS at Week 4—total sample (n = 108)
Table 7 Inter-item correlations of the Scalpdex Items 1–12 at Week 4—LPP sample (n = 37)
Table 8 Inter-item correlations of the Scalpdex Items 13–23 at Week 4—LPP sample (n = 37)
Table 9 Inter-item correlations of the OLPSSM at Week 4—MLP sample with OLP (n = 33)
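The inter-item screening described above can be illustrated with a minimal sketch. It assumes Pearson correlations (the type of coefficient is not stated in this section) and uses 0.40 and 0.80 as illustrative cut-offs for weak and potentially redundant pairs. The function names and the response data are hypothetical and are not taken from the study.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of item scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def inter_item_correlations(items):
    """Return {(item_a, item_b): r} for every pair of items.

    `items` maps an item label to its responses, one per participant,
    with participants in the same order for every item.
    """
    return {
        (a, b): pearson_r(items[a], items[b])
        for a, b in combinations(items, 2)
    }


def flag_pairs(correlations, low=0.40, high=0.80):
    """Split pairs into weak (< low) and possibly redundant (> high).

    The thresholds are illustrative assumptions, not study-defined rules.
    """
    weak = [pair for pair, r in correlations.items() if r < low]
    redundant = [pair for pair, r in correlations.items() if r > high]
    return weak, redundant


# Hypothetical responses from six participants to three items (0-4 scale).
responses = {
    "item_1": [0, 1, 2, 3, 4, 4],
    "item_2": [0, 1, 2, 3, 4, 3],
    "item_3": [2, 0, 3, 1, 2, 1],
}
correlations = inter_item_correlations(responses)
weak, redundant = flag_pairs(correlations)
```

In this toy data set, item_1 and item_2 are flagged as potentially redundant, while item_3 correlates weakly with both and would be reviewed as a possibly dissimilar concept.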

Scale-Level Analyses

Internal Consistency Reliability

Internal consistency was examined using Cronbach's alpha to assess the homogeneity of items belonging to the total measure score or domain score (Table 10). As Cronbach’s alpha cannot be used for domains with fewer than three items, this was not assessed for DLQI domain scores. Alpha coefficients surpassed 0.70, indicating good internal consistency (DLQI total score = 0.920, ESS total score = 0.859, Scalpdex ‘Functioning’ domain score = 0.823, Scalpdex ‘Emotions’ domain score = 0.941, OLPSSM total score = 0.877), except for the Scalpdex ‘Symptoms’ domain score (0.655). However, this domain is only composed of three items, and therefore lower reliability was expected. The measure with the highest reliability coefficient was the DLQI total score.

Table 10 Internal consistency using Cronbach’s alpha for DLQI, ESS, Scalpdex and OLPSSM

The alpha-if-deleted method was also conducted to assess whether the internal consistency of each total score or domain would improve with the removal of each item in turn (Supplementary Material). The overall internal consistency improved slightly with the removal of: DLQI Item 10 (‘Over the last week, how much of a problem has the treatment for your skin been, for example by making your home messy, or by taking up time?’) (0.921); ESS Item 6 (‘Sitting and talking to someone’) (0.864); Scalpdex ‘Symptoms’ domain Item 8 (‘My scalp condition bleeds’) (0.695); Scalpdex ‘Functioning’ domain Item 15 (‘My scalp condition affects the color of clothes I wear’) (0.844); Scalpdex ‘Emotions’ domain Item 19 (‘I feel that my knowledge for caring for my scalp is adequate’) (0.949) and Item 20 (‘The cost of caring for my scalp condition bothers me’) (0.949). However, given the marginal difference in the Cronbach's alpha coefficient, these results were not considered problematic.
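As an illustration of the two reliability checks above, the following sketch computes Cronbach's alpha from item and total-score variances and then recomputes it with each item removed in turn. It uses the standard alpha formula with sample variances; the function names and the example scores are hypothetical.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items (each a list of participant scores)."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Total score per participant across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


def alpha_if_deleted(items):
    """Alpha recomputed with each item left out in turn."""
    return [
        cronbach_alpha(items[:i] + items[i + 1:])
        for i in range(len(items))
    ]


# Hypothetical three-item domain answered by five participants.
domain = [
    [1, 2, 3, 4, 5],
    [2, 3, 4, 5, 6],
    [3, 1, 4, 2, 5],
]
alpha = cronbach_alpha(domain)
alpha_without_each = alpha_if_deleted(domain)
```

Here the third item is less consistent with the other two, so alpha rises when it is removed. This is the same pattern the alpha-if-deleted analysis looks for.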

Test-retest Reliability

Test-retest reliability was evaluated to examine the stability of scores either between Week 2 and Week 4, and between Week 4 and Week 8, for the scales assessed at those three time points (ESS total score and OLPSSM total score), or between Week 4 and Week 8 for the scales not assessed at Week 2 (DLQI total score and Scalpdex ‘Total’, ‘Symptoms’, ‘Emotions’ and ‘Functioning’ domain scores).

When stability was defined using the IGA, PGI-S, PGI-C or DLQI Item 1, all ICCs surpassed 0.75, indicating good test-retest reliability [36] (Table 11). Pearson’s correlation coefficients were similar to the ICCs, providing further evidence of the reproducibility of measure scores in stable participants.
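The test-retest approach can be sketched as follows. Because the ICC form is not specified in this section, the sketch assumes a two-way, single-measure, absolute-agreement ICC and defines stable participants as those with an unchanged anchor rating between the two visits. All names and data are hypothetical.

```python
def stable_pairs(scores_t1, scores_t2, anchor_t1, anchor_t2):
    """Keep score pairs only for participants whose anchor rating did not change."""
    kept = [
        (s1, s2)
        for s1, s2, a1, a2 in zip(scores_t1, scores_t2, anchor_t1, anchor_t2)
        if a1 == a2
    ]
    first = [s1 for s1, _ in kept]
    second = [s2 for _, s2 in kept]
    return first, second


def icc_absolute_agreement(first, second):
    """Two-way, single-measure, absolute-agreement ICC for two time points."""
    n, k = len(first), 2
    rows = list(zip(first, second))
    grand = (sum(first) + sum(second)) / (n * k)
    row_means = [sum(r) / k for r in rows]
    col_means = [sum(first) / n, sum(second) / n]
    # Partition the total sum of squares into participants, time points and error.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for r in rows for v in r)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical total scores and PGI-S ratings at two visits.
week4 = [10, 12, 15, 20, 8, 30]
week8 = [12, 14, 17, 22, 10, 5]
pgis_week4 = [2, 2, 3, 4, 1, 4]
pgis_week8 = [2, 2, 3, 4, 1, 1]

first, second = stable_pairs(week4, week8, pgis_week4, pgis_week8)
icc = icc_absolute_agreement(first, second)
```

The last participant changes PGI-S category and is excluded. The remaining scores shift by a constant two points, so the absolute-agreement ICC falls slightly below 1 even though the rank order is perfectly preserved.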

Table 11 Test-retest reliability for the DLQI total score, ESS total score, Scalpdex domain scores and OLPSSM total score

Concurrent Validity

The ESS total score had weak correlations (≤ 0.250) with all convergent measures (Table 12). The Scalpdex total score correlated strongly with the DLQI total score (0.801) and moderately with the OLPSSM total score (0.353). Both the Scalpdex total score and the OLPSSM total score correlated moderately with the DLQI Item 1 (range: 0.473–0.504) and the PGI-S (range: 0.609–0.637), while both had weak correlations with the PGI-C (range: 0.173–0.290). The IGA correlated moderately with the OLPSSM total score (0.552) and weakly with the Scalpdex total score (0.030).

Table 12 Concurrent validity correlations of DLQI, ESS, Scalpdex and OLPSSM with hypothesized convergent measures

Known-Group Validity

Known-group analyses compared DLQI total score, ESS total score, Scalpdex total and domain scores and OLPSSM total score, according to groups defined by IGA and PGI-S disease severity scores (Table 13). The DLQI total score, ESS total score, Scalpdex total score, Scalpdex ‘Symptoms’ domain score and Scalpdex ‘Emotions’ domain score differed significantly (p < 0.05) among groups defined by the PGI-S, with moderate to large between-group effect size estimates. In contrast, differences in mean scores on the target PROMs among groups defined by the IGA were non-significant, with negative moderate to small between-group effect size estimates, suggesting that the IGA cannot discriminate between groups. Of note, due to the sample size for the OLPSSM, more weight should be given to the between-group effect size values to interpret validity; as such, OLPSSM scores show evidence of being able to discriminate between groups for the PGI-S known groups and the IGA known groups.
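A known-group comparison of this kind can be sketched as below. The sketch assumes Cohen's d with a pooled standard deviation as the between-group effect size and a one-way ANOVA F statistic for the overall group difference; the exact estimators used in the study are not stated in this section. Group labels and scores are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Between-group effect size: mean difference over the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical PROM total scores grouped by PGI-S severity category.
mild = [2, 4, 6]
moderate = [5, 7, 9]
severe = [8, 10, 12]

d_severe_vs_mild = cohens_d(severe, mild)
f_stat, df_between, df_within = one_way_anova_f([mild, moderate, severe])
```

The p value would then be read from the F distribution with the returned degrees of freedom, for example with a statistics package. With small groups, as noted for the OLPSSM, the effect size is the more informative quantity.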

Table 13 Known-group analysis comparisons of DLQI total score, ESS total score, Scalpdex domain scores and OLPSSM total score

Ability to Detect Change

Within-group effect sizes [37] and between-group one-way ANOVA F-tests were calculated to evaluate the magnitude and significance of the differences in change scores between each group (improved/worsened versus stable participants) (Table 14).

Table 14 Ability to detect change for the DLQI, ESS, Scalpdex and OLPSSM
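The grouping and within-group effect size used for this analysis can be sketched as follows. The sketch assumes that anchor change groups are derived from the difference between two ratings on an anchor where higher values mean worse disease, and that the within-group effect size is the mean change divided by the baseline standard deviation. The exact formula from [37] is not reproduced in this section, so both choices are assumptions, and all names and data are hypothetical.

```python
from statistics import mean, stdev


def classify_by_anchor(anchor_baseline, anchor_followup):
    """Label each participant by the change in an anchor where higher = worse."""
    labels = []
    for before, after in zip(anchor_baseline, anchor_followup):
        if after < before:
            labels.append("improved")
        elif after > before:
            labels.append("worsened")
        else:
            labels.append("stable")
    return labels


def within_group_effect_size(baseline, followup):
    """Mean change divided by the SD of the baseline scores."""
    changes = [after - before for before, after in zip(baseline, followup)]
    return mean(changes) / stdev(baseline)


# Hypothetical anchor ratings (e.g. a five-point severity item) at two visits.
labels = classify_by_anchor([3, 3, 2, 4], [2, 3, 3, 2])

# Hypothetical PROM scores for one change group at baseline and follow-up.
baseline_scores = [10, 12, 14, 16]
followup_scores = [6, 8, 10, 12]
effect_size = within_group_effect_size(baseline_scores, followup_scores)
```

A negative value indicates that scores decreased over time. A change item such as the PGI-C already reports the direction of change, so it would not need the differencing step.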

For the DLQI total score, change scores between groups were statistically significant for both the PGI-S and PGI-C. For the ESS, small effect sizes were observed for all groups in the PGI-S and PGI-C and in the improved group for the DLQI Item 1. However, effect sizes were either non-significant (DLQI Item 1), in an unexpected direction (PGI-S) or similar for the stable and improved/worsened groups (PGI-C), suggesting that the ESS has limited ability to detect change in these anchor measures. For the OLPSSM, both the PGI-S and PGI-C showed a statistically significant difference between groups; however, statistical significance was not achieved for the DLQI Item 1.

For the Scalpdex total score, small effect sizes were found across the three groups for the DLQI, PGI-S and PGI-C, except for a moderate effect in the stable group for the DLQI. The DLQI and the PGI-C demonstrated some evidence of ability to detect change. The Scalpdex ‘Symptoms’ score had a large effect size for improved groups in all measures (DLQI Item 1, PGI-S, PGI-C). Worsened groups demonstrated a large effect size in the DLQI Item 1, a moderate effect size in the PGI-C and a small effect size in the PGI-S. A small effect size was observed for the stable groups in all three anchor measures. Change scores between groups were statistically significant for the DLQI Item 1 and PGI-C but not for PGI-S; however, the PGI-S p value may have been impacted by the low sample size for the worsened group. For the Scalpdex ‘Functioning’ score, all groups (improved, worsened, stable) had small effect sizes in all three anchors (DLQI Item 1, PGI-S, PGI-C). The only statistically significant difference between groups was for the PGI-S. For the Scalpdex ‘Emotions’ score, the DLQI Item 1 had a statistically significant change between groups with a small effect size reported for the improved and stable groups and a moderate effect size for the worsened group. The PGI-S and PGI-C had small effect sizes for all groups, with the change scores between groups being statistically significant for the PGI-S and not statistically significant for the PGI-C.

Qualitative Phase

DLQI

The DLQI was cognitively debriefed with all exit interview participants (n = 13). Individual items did not perform well in terms of relevance; i.e., most items (n = 8/10, 80%) were considered relevant to less than half of participants. The least relevant items were Item 6 (‘Over the last week, how much has your skin made it difficult for you to do any sport?’) and Item 7 (‘Over the last week, has your skin prevented you from working or studying?’) (n = 1/13, 7.7% per item). The most relevant items were Item 1 (‘Over the last week, how itchy, sore, painful or stinging has your skin been?’; n = 11/11, 100.0%) and Item 2 (‘Over the last week, how embarrassed or self-conscious have you been because of your skin?’; n = 9/13, 69.2%), both of which are included in the DLQI ‘Symptoms and feelings’ domain.

ESS

The ESS was cognitively debriefed with a total of 49 participants (CLP participants during the exit interviews: n = 4, all participants during the independent interviews: n = 45). Relevance was mixed, with just over half of items (n = 5/8, 62.5%) being considered relevant to at least half of participants. The item that demonstrated the highest relevance was Item 5 (‘Lying down to rest in the afternoon when circumstances permit’; n = 43/49, 87.8%). Item 6 (‘Sitting and talking to someone’; n = 8/48, 16.7%) demonstrated the lowest relevance. Some participants were also asked additional probes about sleepiness with almost all participants reporting never feeling sleepy because of LP (n = 19/20, 95%) and most participants reporting never dozing off or falling asleep due to LP (n = 11/13, 84.6%).

Scalpdex

The Scalpdex was cognitively debriefed with a total of 19 LPP participants (exit interviews: n = 4, independent interviews: n = 15). Relevance was high, with almost all items (n = 21/23, 91.3%) being considered relevant to at least half of participants. The most relevant items were Item 3 (‘My scalp itches’), Item 6 (‘I am frustrated by my scalp condition’) and Item 9 (‘I am annoyed by my scalp condition’) (n = 18/19, 94.7% per item). The least relevant item was Item 15 (‘My scalp condition affects the color of clothes I wear’; n = 7/19, 36.8%).

OLPSSM

The OLPSSM was cognitively debriefed with MLP participants with oral involvement during the exit interviews (n = 5). Just over half of the items (n = 4/7, 57.1%) were considered relevant to at least half of participants. Almost all participants considered Item 4 (‘When you smiled?’; n = 4/5, 80.0%) and Item 6 (‘When you talked?’; n = 4/5, 80.0%) relevant to their experience of MLP, while Item 5 (‘When you breathed through your mouth?’; n = 2/5, 40.0%) was considered least relevant.

Of note, participant quotes to support the qualitative results are presented in Supplementary Material.

Discussion

There are limited disease-specific PROMs that assess HRQoL in LP patients and a scarcity of psychometric evidence for the use of generic HRQoL PROMs in this population. The analyses described in this study evaluated the content validity and psychometric properties of the DLQI, ESS, Scalpdex and OLPSSM to assess appropriateness of use in clinical trials with LP patients. Importantly, the mixed-methods approach adopted allows for the patient voice to be represented not only in this study but in future clinical study designs, as recommended by the PFDD guidance documents [28,29,30,31], and follows the FDA recommendation for evidence-based rationale when proposing a clinical outcome assessment (COA) as fit for purpose [30]. Specifically, the approach adopted allowed for the assessment of whether the PROMs capture all important aspects of the concept of interest; that the method of scoring is appropriate and sufficiently sensitive to reflect clinically meaningful change within the context of use; that respondents understand the items as intended; that differences in scores can be interpreted in terms of impact on patients’ experience and that scores correspond to specific health experiences of patients [30]. The study also included exit interviews, which the FDA have noted as a valuable tool to contribute cumulative evidence on aspects of the patient experience; inform development or refinement of COAs; add greater depth to data in diseases, such as LP, that do not have much qualitative patient input; and to obtain patient input on meaningful outcomes [29].

While the DLQI is one of the most widely used PROMs in multiple dermatological indications and has also been commonly used with LP patients [17], content and psychometric evidence of its appropriateness in LP patients for usage in clinical studies is limited [21]. The current study on the one hand supports the use of the DLQI in LP patients, as findings provide strong evidence of reliability and construct validity. The DLQI domain ‘Symptoms and feelings’ performed particularly well. On the other hand, the psychometric data do not confidently support that the DLQI can detect change over time in the specific context of use for adults with LP as high inter-item correlations between some items suggest potential redundancies. The qualitative interview data further suggest that patients did not consider most items relevant to their disease experience of LP. Given the modular nature of the DLQI, the study data support the use of the ‘Symptoms and feelings’ domain as an independent module with LP patients, where necessary and appropriate.

Even though the ESS demonstrated evidence of reliability in other populations, convergent validity was poor in this study. Furthermore, known-group comparisons showed evidence of the ESS’ ability to discriminate between groups for the PGI-S but not the IGA; ability to detect change was limited or null. These findings suggest that the ESS may not be appropriate for use in clinical trials with LP patients. This is supported by the qualitative findings where most participants reported that they never felt sleepy or wanted to fall asleep because of their LP, although some patients did spontaneously report sleep-related impacts, such as sleep disturbance (i.e., sleep quality and/or sleep quantity). It is suggested that measures that assess sleep rather than daytime sleepiness should be used in clinical studies with LP patients. However, further research is needed to ascertain whether sleep is a meaningful and important concept of LP, as data are scarce [20].

The Scalpdex performed relatively well when psychometrically evaluated in the study’s LPP patient sample, demonstrating evidence of internal consistency, test-retest reliability and convergent validity (although only weak correlations with PGI-C and IGA). Evidence for its ability to differentiate between known groups and to detect change was mixed. Not all items may be appropriate for use with LPP patients. For example, inter-item correlations for Item 19 and Item 20 were much weaker than the rest of the items, while Item 15 demonstrated weak correlations with the other ‘Functioning’ domain items and Item 8 had overall very weak correlations, including with other ‘Symptoms’ domain items, which is particularly concerning as the ‘Symptoms’ domain only consisted of three items. These findings are not surprising as the Scalpdex was originally developed with patients with seborrheic dermatitis and scalp psoriasis [18]. Clinical characteristics present in these patients, such as desquamation and bleeding [23], may not be relevant to LPP patients. This finding is supported by the qualitative CD interviews and the original Scalpdex development study whereby the impact of desquamation, as assessed via Item 15, was reported as not relevant by a high percentage of patients [18]. Based on the study findings, it is suggested that the Scalpdex may be used with caution with LPP patients and that further evidence is needed when it is used in clinical trials. A potential further limitation of the Scalpdex is its length: its 23 items might be viewed as burdensome by many patients, particularly if some items are deemed not relevant. Similar to the DLQI, the Scalpdex ‘Symptoms’ domain performed better than the measure as a whole, but caution should be taken if the acceptable performance of the measure total score is purely driven by the ‘Symptoms’ domain-specific items.

Lastly, the OLPSSM, as psychometrically evaluated in MLP patients with oral involvement, had evidence of good reliability, construct validity and ability to detect change over time (PGI-S and PGI-C). It is not surprising that the OLPSSM performed well as it was designed specifically for patients with oral lichen planus and has been previously used within similar populations [8, 38]. However, despite the psychometric validity of this measure, it is worth noting that not all items may be relevant to all patients with oral involvement. For example, Item 4 and Item 5 have been noted in the literature and supported by the qualitative interviews in the current study as triggers least likely to cause soreness and are associated more with patients with severe OLP [8]. Furthermore, inter-item correlations between Item 1 and Item 6 were weak, suggesting that these two items might measure dissimilar concepts, while correlations between Item 2 and Item 7 were very high, suggesting potential redundancy. Lastly, the OLPSSM is limited in its use to patients with oral involvement [8, 38], leaving a gap for other LP patients. Overall, the data suggest that the OLPSSM is a valid HRQoL PROM for use with patients with OLP.

Study Limitations

Given the relatively small sample size of some LP cohorts in the current study, particularly for the OLPSSM and Scalpdex, future research in a larger sample is recommended to strengthen the findings. Further research is also recommended to review other existing HRQoL measures that may be used in LP patients.

Conclusion

The results of our study contribute to the literature by providing novel insights into the appropriateness of existing PROMs commonly used with LP patients. Our study further highlights the need for additional psychometric evaluation and qualitative evidence to assess whether PROMs under consideration are “fit for purpose” for use in future LP clinical studies and to support the development of additional LP-specific HRQoL PROMs.