Key Summary Points

Why carry out this study?

To characterize patients’ experiences with alopecia areata (AA), including the psychosocial and functional impacts of the disease, it is important to capture their perspectives using a rigorously developed and validated AA-specific patient-reported outcome measure

What was learned from the study?

This psychometric evaluation demonstrated the disease-specific Alopecia Areata Patient Priority Outcomes (AAPPO) to be reliable and valid in measuring symptom severity and impacts in adults and adolescents with AA

Findings from this study support use of the AAPPO in clinical trials to show treatment benefit from a patient perspective

Introduction

Alopecia areata (AA) is an autoimmune condition that targets the hair follicles, with an estimated self-reported point prevalence of approximately 1% [1]. Studies have shown that people living with AA are at a higher risk than the general population of developing depression, anxiety, and social phobia; living with AA has also been associated with much higher levels of body dissatisfaction and concern with general appearance because of the associated perception of hair loss [2,3,4]. The patient burden of AA, coupled with a lack of highly effective treatment options, represents a significant unmet medical need [5].

To fully characterize the patient experience of AA, including the psychosocial and functional impacts of this disease, it is important to capture patients’ perspectives directly [6]. Existing AA-specific patient-reported outcome (PRO) measures are missing concepts that are a high priority to individuals with AA or employ response options and recall periods that may not sufficiently capture the impacts of AA. Thus, a novel AA-specific PRO measure, the Alopecia Areata Patient Priority Outcomes (AAPPO) tool, was developed to assess hair loss signs as well as the emotional symptoms and activity limitations from the patient’s perspective [7]. Development of the AAPPO met the requirements described in the Food and Drug Administration (FDA) patient-focused guidance and was adherent to the principles of the FDA Patient-Focused Drug Development initiative [6, 8].

The objective of this study was to conduct a noninterventional quantitative evaluation consistent with the requirements of the FDA’s patient-focused guidance to determine the optimal structure and scoring algorithm and to assess the psychometric properties of the AAPPO, including reliability and construct validity.

Methods

Study Design

This study was a prospective, noninterventional, web-based study with two assessment time points: baseline and follow-up 2 weeks later. The target sample size was 120 patients with a dermatologist-confirmed diagnosis of AA, recruited in the US. Enrollment was planned to include approximately 90 adult patients and 30 adolescent patients. Patients were recruited through dermatology practices that partnered with the Global Perspectives research database organization. Dermatology practices were responsible for identifying, from their clinical records, potentially eligible patients who had previously agreed to be contacted for studies.

The study was evaluated and deemed exempt from full review by the RTI International Institutional Review Board (IRB; IRB ID MOD00000707 for 20712). All participants provided informed consent.

Study Population

Eligible patients were adults (aged ≥ 18 years) or adolescents (aged 12–17 years) with a dermatologist-confirmed diagnosis of AA who had experienced at least 6 weeks of hair loss. In addition, recruitment targets were applied to achieve a mix of participants with the following conditions: ≥ 25% scalp hair loss as measured by the Severity of Alopecia Tool (SALT) [9] within the past 30 days; alopecia totalis (AT), defined as complete (100%) scalp hair loss; and alopecia universalis (AU), defined as complete (100%) scalp, facial, and body hair loss [10,11,12]. Patients were ineligible if they were participating in a clinical trial, had undergone treatment with a Janus kinase (JAK) inhibitor in the past 90 days, or had other forms of alopecia.

Clinical Outcomes Assessment Measures

SALT

The SALT was developed by the National Alopecia Areata Foundation [9, 13] to quantitatively assess AA severity based on terminal scalp hair loss. The dermatologist provided ratings of hair loss in four areas of the scalp: the back, top, and two sides; each area represents a percentage of the total scalp surface area: 24%, 40%, 18%, and 18%, respectively. The SALT total score is the summed percentage of hair loss on the scalp in each of the four areas weighted by their respective surface area.
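The weighted sum described above can be sketched in a few lines. This is only an illustration of the arithmetic, using the area weights stated in the text; the function name `salt_total` and the area labels are hypothetical and are not part of the SALT itself.

```python
# Scalp-area weights (percent of total scalp surface) as given in the text:
# back 24%, top 40%, and 18% for each of the two sides.
SALT_AREA_WEIGHTS = {"back": 24, "top": 40, "left_side": 18, "right_side": 18}


def salt_total(percent_loss_by_area):
    """Return the SALT total score (0-100).

    percent_loss_by_area maps each area name to the rated percentage of
    hair loss within that area (0-100). Names are illustrative assumptions.
    """
    total = 0
    for area, weight in SALT_AREA_WEIGHTS.items():
        pct = percent_loss_by_area[area]
        if not 0 <= pct <= 100:
            raise ValueError(f"hair loss for {area} must be between 0 and 100")
        total += weight * pct
    # Weights sum to 100, so dividing by 100 yields a percentage of the scalp.
    return total / 100


example = {"back": 50, "top": 25, "left_side": 0, "right_side": 100}
print(salt_total(example))
```

For the example ratings, the contributions are 12 (back), 10 (top), 0 (left side), and 18 (right side), for a total of 40% scalp hair loss.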

Dermatologists provided each patient’s SALT score assessed within 30 days before the baseline PRO assessments. Participants were classified into three groups (tertiles) based on the SALT total score: 25–49%, 50–75%, and 76–100%.
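The grouping rule can be written as a small lookup. The sketch below assumes whole-number SALT scores, because the text does not state how fractional scores falling between bands were handled; the function name `salt_group` is hypothetical.

```python
def salt_group(salt_score):
    """Assign a whole-number SALT total score to one of the three study groups.

    Assumption: scores are integers in the 25-100 range used for the
    SALT-based grouping in this study.
    """
    if not isinstance(salt_score, int):
        raise TypeError("this sketch assumes whole-number SALT scores")
    if not 25 <= salt_score <= 100:
        raise ValueError("SALT score outside the 25-100 range used for grouping")
    if salt_score <= 49:
        return "25-49%"
    if salt_score <= 75:
        return "50-75%"
    return "76-100%"


print([salt_group(s) for s in (25, 49, 50, 75, 76, 100)])
```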

Patient-Reported Outcomes

The PRO assessments administered at baseline and 2 weeks included the AAPPO, the Alopecia Areata Symptom Impact Scale (AASIS), the 36-item Acute Short Form (SF-36v2 Acute), the Patient Global Impression of Severity (PGIS) item, and 2 Patient Global Impression of Change (PGIC) items.

AAPPO

The 11-item AAPPO [7] contains four items, categorized as “Hair Loss” from (1) the scalp, (2) eyebrows, (3) eyelashes, and (4) body, and asks the patient to describe the current amount of hair loss using a 5-point response scale that ranges from 0 (no hair loss) to 4 (complete hair loss: “I do not have any hair on my [insert hair loss area]”). Four items ask the patient to rate Emotional Symptoms of AA over the past week on a 5-point scale ranging from “Never” to “Always”. Three items ask the patient to rate Activity Limitations on a 5-point scale ranging from “Not at all” to “Completely (did not do any outdoor activities because of hair loss/did not do any physical activities because of hair loss/did not interact with others at all because of hair loss)”.

AASIS

The 13-item AASIS asks patients with AA about the severity of their signs and symptoms and how AA interfered with their daily functioning in the past week [14]. For signs and symptom ratings, the measure uses a numeric rating scale of 0 (sign/symptom has not been present) to 10 (the sign/symptom was as “bad as you can imagine it could be”). For the interference with daily functioning ratings, the measure uses a numeric rating scale of 0 (did not interfere) to 10 (interfered completely). The AASIS was designed to enable patients, clinicians, and researchers to make informed decisions about evaluating newer therapies specifically designed for the treatment of AA [14]. Users can calculate a mean total score and four subscale scores (2-item hair loss, 5-item symptoms, 7-item symptoms, 6-item interference), each ranging from 0 to 10 points, with higher scores indicating worse AA-specific health status.

SF-36v2 Acute

The Medical Outcomes Study (MOS) SF-36v2 Acute is a generic health status instrument that measures concepts of health-related quality of life over the past week for 8 general health domains: (1) physical functioning, (2) role limitations due to physical health, (3) bodily pain, (4) general health perceptions, (5) vitality, (6) social functioning, (7) role limitations due to emotional problems, and (8) mental health [15, 16]. These domains can also be summarized as Physical and Mental Component Summary (PCS and MCS) scores. The recommended normed scores were used, ranging from 0 to 100, with higher scores indicating better health status [16].

Global Items: PGIS and PGIC

Participants provided an overall assessment of the severity of their hair loss on the PGIS item “I consider my current hair loss to be: [none, mild, moderate, severe, extremely severe].” This single-item assessment was completed by all patients at baseline and at week 2 of the study. Scores ranged from 0 (none) to 4 (extremely severe). Patients also provided an overall retrospective assessment of their AA on the PGIC items. On the baseline questionnaire, they were asked to answer, “In the past 30 days, my alopecia areata has [greatly improved, moderately improved, slightly improved, not changed, slightly worsened, moderately worsened, greatly worsened].” Patients selected one response that best described their experience. On the follow-up questionnaire, they were asked to reply to a different PGIC item: “Since the start of the study, my alopecia areata has [greatly improved, moderately improved, slightly improved, not changed, slightly worsened, moderately worsened, greatly worsened].” Scores ranged from 1 (greatly improved) to 7 (greatly worsened).

Psychometric Analyses

Prior content validity work in the development of the AAPPO with adults and adolescents indicated that the AAPPO appropriately assesses disease status in both age groups [7]. Therefore, analyses planned to establish the AAPPO scoring algorithm (i.e., response distributions, inter-item correlations, and factor analyses) and assess reliability and construct validity were conducted with data pooled across both age groups using SAS v9.4 for Windows statistical software [17], with sensitivity analyses conducted in the separate adult and adolescent samples.

Distributional characteristics of the AAPPO responses were evaluated for possible response biases, including floor and ceiling effects (overall and by age group). A priori, the threshold for a potentially problematic floor or ceiling effect was set as ≥ 40% of participants (given a uniform distribution) selecting the best (ceiling) or worst (floor) response category [18].
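As an illustration of this screening rule, the sketch below computes the share of respondents in the best and worst categories of a single item and flags either share when it reaches the 40% threshold. It assumes the 0–4 item coding described for the AAPPO (0 = best, 4 = worst) and that missing responses are excluded from the denominator; the function name and the returned keys are hypothetical.

```python
from collections import Counter


def floor_ceiling_effects(responses, best=0, worst=4, threshold=0.40):
    """Flag floor (worst category) and ceiling (best category) effects.

    responses: item scores for one item, with None for missing values.
    Assumption: missing values are dropped before proportions are computed.
    """
    valid = [r for r in responses if r is not None]
    if not valid:
        raise ValueError("no nonmissing responses")
    counts = Counter(valid)
    floor_share = counts[worst] / len(valid)
    ceiling_share = counts[best] / len(valid)
    return {
        "floor_share": floor_share,
        "ceiling_share": ceiling_share,
        "floor_effect": floor_share >= threshold,
        "ceiling_effect": ceiling_share >= threshold,
    }


# Made-up responses for illustration only (not study data).
item_responses = [4, 4, 4, 3, 2, 1, 0, 4, 4, 0]
print(floor_ceiling_effects(item_responses))
```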

To inform the AAPPO structure and provide scoring recommendations, inter-item polychoric correlations were computed, and a series of factor-analysis models were estimated with mean- and variance-adjusted weighted least squares estimation in Mplus version 7.4 [19]. Exploratory factor analyses (EFAs) were performed on baseline item scores (overall and for adults), and an increasing number of factor solutions were extracted with oblique quartimin rotation for comparison. Based on the EFA results, confirmatory factor analyses (CFAs) were conducted on 2-week follow-up data (overall and for adults), and the results were interpreted using model fit indices, including the root mean square error of approximation [20, 21], comparative fit index [22], Tucker-Lewis Index [23], and standardized and weighted root mean square residual [20, 24, 25] as well as the magnitude and pattern of the factor loadings.

To evaluate the repeatability of scores (i.e., test-retest reliability), weighted kappa and intraclass correlation coefficients (ICCs) were computed using the complete data and for subsets of patients with: (1) PGIS scores that were equal at baseline and 2-week follow-up, (2) PGIC scores that were equal at baseline and 2-week follow-up, and (3) either a 1-point change or no change in the AASIS hair loss subscale between baseline and follow-up. For the AAPPO item-level scores, weighted kappa coefficients were computed using quadratic weights [26,27,28]. For the AAPPO multi-item domain scores, a two-way mixed-effects analysis of variance model with absolute agreement for single measures was used [29, 30]. According to Landis and Koch [31], kappa coefficients can be interpreted such that < 0 indicates poor agreement, 0–0.20 indicates slight agreement, 0.21–0.40 indicates fair agreement, 0.41–0.60 indicates moderate agreement, 0.61–0.80 indicates substantial agreement, and 0.81–1.00 indicates almost perfect agreement. It is generally recommended that ICCs be at least 0.70 for multi-item scales [32, 33].
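The two test-retest statistics can be sketched as follows. This is a minimal illustration, not the SAS code used in the study: the kappa function uses the standard quadratic disagreement weights, and the ICC function uses the common single-measure, absolute-agreement formula computed from two-way ANOVA mean squares, which is assumed here to correspond to the model described above. Function names and the example data are hypothetical.

```python
def quadratic_weighted_kappa(test, retest, n_categories=5):
    """Weighted kappa with quadratic weights for scores coded 0..n_categories-1."""
    k, n = n_categories, len(test)
    observed = [[0] * k for _ in range(k)]
    for a, b in zip(test, retest):
        observed[a][b] += 1
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(k)) for j in range(k)]
    disagree_obs = disagree_exp = 0.0
    for i in range(k):
        for j in range(k):
            weight = (i - j) ** 2 / (k - 1) ** 2  # quadratic disagreement weight
            disagree_obs += weight * observed[i][j]
            disagree_exp += weight * row_totals[i] * col_totals[j] / n
    # Undefined (division by zero) if all responses fall in a single category.
    return 1 - disagree_obs / disagree_exp


def icc_absolute_single(test, retest):
    """Single-measure, absolute-agreement ICC for two occasions."""
    n, k = len(test), 2
    pairs = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    ss_rows = k * sum(((a + b) / k - grand) ** 2 for a, b in pairs)
    ss_cols = n * sum((sum(col) / n - grand) ** 2 for col in (test, retest))
    ss_total = sum((x - grand) ** 2 for pair in pairs for x in pair)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)                # between-patient mean square
    ms_cols = ss_cols / (k - 1)                # between-occasion mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Made-up data for illustration only (not study data).
print(quadratic_weighted_kappa([0, 1, 2, 3, 4], [0, 1, 2, 3, 3]))
print(icc_absolute_single([1.0, 2.0, 3.0], [2.0, 3.0, 4.0]))
```

In the second example every retest score is exactly 1 point higher than the test score. A consistency-type ICC would equal 1 for these data, whereas the absolute-agreement form is penalized by the systematic shift.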

To evaluate internal consistency reliability, Cronbach’s coefficient alpha was computed to evaluate the cohesiveness of the resulting multi-item domains [34]. Cronbach’s alpha estimates > 0.70 indicate a set of strongly related items capable of supporting a unidimensional scoring structure [35].
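For reference, coefficient alpha for a k-item domain is k/(k − 1) multiplied by one minus the ratio of the summed item variances to the variance of the summed score. The sketch below implements that textbook formula using sample variances and assumes complete responses; the function name and the example data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's coefficient alpha.

    rows: one list of item scores per respondent, all of the same length,
    with no missing values (an assumption of this sketch).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha requires at least two items")
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)


# Made-up responses to a three-item domain (not study data).
responses = [[0, 1, 0], [2, 2, 3], [4, 3, 4], [1, 1, 2], [3, 4, 3]]
print(round(cronbach_alpha(responses), 3))
```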

Convergent and discriminant validity analyses aided in the evaluation of relationships among multiple indicators of similar and dissimilar constructs and the degree to which they followed hypothesized patterns. Moderate to strong correlations were anticipated between the AAPPO Hair Loss subscale and the PGIS. Moderate to strong correlations were also hypothesized between the AAPPO Emotional Symptoms and Activity Limitation domain scores and: (1) the AASIS symptoms and interference subscale scores and the AASIS total score; (2) the norm-based SF-36v2 Acute MCS score; and (3) the norm-based SF-36v2 Acute domain scores closely related to MCS (i.e., vitality, social functioning, role-emotional, mental health) [16, 36]. Smaller correlations were anticipated between the AAPPO domain scores and the norm-based SF-36v2 Acute PCS score and domain scores closely related to PCS score (i.e., physical functioning, role-physical, bodily pain, general health perceptions) [16, 36]. Correlation coefficients (absolute value) ≥ 0.50 were considered large, 0.30–0.49 were considered moderate, 0.10–0.29 were considered small, and < 0.10 were considered trivial [37].
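The magnitude bands above can be expressed as a simple classifier. The sketch treats each lower bound as inclusive, so values falling between the printed two-decimal bands (e.g., 0.295) are assigned to the lower band; that handling is an assumption, and the function name is hypothetical.

```python
def correlation_magnitude(r):
    """Label a correlation coefficient by its absolute size."""
    if not -1 <= r <= 1:
        raise ValueError("correlation must lie between -1 and 1")
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "small"
    return "trivial"


for r in (0.58, -0.37, 0.19, -0.06):
    print(r, correlation_magnitude(r))
```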

Known-groups validity examines the ability of the AAPPO scores to discriminate among groups of AA patients who differ on external criteria or known groups. It was hypothesized that AAPPO domain scores would differentiate between patients: (1) with lower SALT scores (SALT 25–49%) versus those with higher SALT scores (76–100%; greatest scalp hair loss); (2) who reported less interference versus those who reported greater interference with daily functioning, as assessed by the AASIS interference subscale (scores ≤ 1 vs. ≥ 5); and (3) who had higher MCS scores versus those who had lower MCS scores (≥ 50 vs. ≤ 30).

Results

Patient Characteristics

The study population included 121 patients with AA (85 adults aged ≥ 18 years and 36 adolescents aged 12–17 years) (Table 1). A mix of adult and adolescent patients with AA were enrolled: 57.9% (adults, 49; adolescents, 21) had ≥ 25% scalp hair loss (based on dermatologist-confirmed diagnosis of AA), 33.1% (adults, 28; adolescents, 12) had AT, and 9.1% (adults, 8; adolescents, 3) had AU. Furthermore, 37 (30.6%) patients (adults, 31 [36.5%]; adolescents, 6 [16.7%]) were in the SALT 25–49% tertile, 16 (13.2%) patients (adults, 13 [15.3%]; adolescents, 3 [8.3%]) were in the SALT 50–75% tertile, and 68 (56.2%) patients (adults, 41 [48.2%]; adolescents, 27 [75.0%]) were in the SALT 76–100% tertile. The mean number of years since diagnosis of AA was 12 years (adults, 15 years; adolescents, 6 years); the duration since diagnosis ranged from < 1 year to 58 years for adults and < 1 year to 15 years for adolescents.

Table 1 Patient demographics and characteristics

Of the 121 patients, 88 (72.7%) described themselves as White and 22 (18.2%) described themselves as Black. In the adult cohort, 14 (16.9%) had a high school education or equivalent (e.g., GED), 26 (31.3%) had an undergraduate degree, more than half (57.8%) were employed full time, and 21 (25.3%) were single or never married. In the adolescent cohort, 34 (94.4%) were students, with the majority (88.6%) not yet having completed high school.

Item-Level Distribution

As expected and given study inclusion criteria, item-level floor effects (≥ 40% at the worst health level) were observed on the AAPPO Item 1 assessing scalp hair loss (i.e., “a great deal” [46%] or “complete” [42%]) (Table S1, Supplementary Material). At baseline, 33.9% of patients reported complete hair loss of the eyebrows (Item 2), 29.8% reported complete hair loss of the eyelashes (Item 3), and 25.6% reported complete hair loss on the body (Item 4). The Emotional Symptoms items revealed an age group split, with adult endorsement levels of the most severe category (“always”) considerably higher than those of the adolescent group: self-conscious (Item 5; 42 vs. 17%), embarrassed (Item 6; 33 vs. 17%), sad (Item 7; 33 vs. 14%), and frustrated (Item 8; 38 vs. 17%). Finally, ceiling effects (≥ 40% at the best health level) were observed in both the adult and adolescent responses related to limitations due to hair loss in outdoor activities (Item 9), exercise (Item 10), and interaction with others (Item 11). For example, at baseline 40.0% of adults and 63.9% of adolescents responded “not at all” to limitations in outdoor activities because of hair loss (Item 9).

Inter-Item Correlations

In general, inter-item correlations were positive and strong in magnitude (|r|≥ 0.50) (Table 2). The inter-item correlations were positive and strong among the four Hair Loss items (Items 1–4) for adults, ranging from 0.73 to 0.92, and moderate to strong (|r|≥ 0.30) for adolescents, ranging from 0.31 to 0.94 (Table S2, Supplementary Material). The inter-item correlations were also positive and strong in magnitude between the respective Emotional Symptoms and Activity Limitations items (Items 5–8 and 9–11), ranging from 0.70 to 0.98.

Table 2 AAPPO inter-item correlations at baseline: overall (n = 121)

The observed negative correlations between the Hair Loss items (Items 1–4) and the Emotional Symptoms and Activity Limitations items (Items 5–11), ranging from − 0.06 to − 0.37, were not expected (Table S2, Supplementary Material). This finding may suggest adaptation to the effects of hair loss among patients with more severe hair loss.

Exploratory and Confirmatory Factor Analyses

The EFA results using baseline data supported a three-factor solution for the AAPPO (Table S3, Supplementary Material), and CFAs fitted to the follow-up overall data confirmed this structure (Table 3). Although the CFA results provided support for consideration of an overall hair loss subscale (Items 1–4), patients with AA present clinically with hair loss on the scalp and/or any hair-bearing area on the body [38]. Moreover, the qualitative evidence obtained during the AAPPO development process demonstrated that not all patients experienced hair loss in each measured location (i.e., scalp, eyebrows, eyelashes, and the body) or prioritized hair loss from each area equally [7]. Therefore, the decision was made to score the four individual Hair Loss items separately and not as a four-item summed domain score.

Table 3 CFA results at follow-up: overall sample

Description of Recommended AAPPO Domain Scoring

Based on the content validity results identifying the 11 AAPPO items as distinct and important concepts, as well as the overall pattern of inter-item and construct validity correlations and the EFA and CFA results, all 11 items were retained, with 6 independent AAPPO scores: (1) Hair Loss on the Scalp (Item 1); (2) Hair Loss on the Eyebrows (Item 2); (3) Hair Loss on the Eyelashes (Item 3); (4) Hair Loss on the Body (Item 4); (5) Emotional Symptoms domain computed as the mean of Items 5–8, with the requirement that at least 2 domain items have nonmissing responses; and (6) Activity Limitations domain computed as the mean of Items 9–11, with the requirement that at least 2 domain items have nonmissing responses. Each domain score ranges from 0 to 4, and a total score from the 11 items is not recommended.
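The scoring rules above can be sketched as follows. The sketch assumes item responses are already coded 0–4 and keyed by item number, with None marking a missing response; the function names and output labels are hypothetical and do not represent an official scoring program.

```python
def domain_mean(item_scores, min_answered=2):
    """Mean of the nonmissing items, or None if fewer than min_answered exist."""
    answered = [score for score in item_scores if score is not None]
    if len(answered) < min_answered:
        return None
    return sum(answered) / len(answered)


def score_aappo(responses):
    """Return the six AAPPO scores from a dict of item number -> 0-4 or None."""
    return {
        "hair_loss_scalp": responses.get(1),
        "hair_loss_eyebrows": responses.get(2),
        "hair_loss_eyelashes": responses.get(3),
        "hair_loss_body": responses.get(4),
        # Emotional Symptoms: mean of Items 5-8.
        "emotional_symptoms": domain_mean([responses.get(i) for i in range(5, 9)]),
        # Activity Limitations: mean of Items 9-11.
        "activity_limitations": domain_mean([responses.get(i) for i in range(9, 12)]),
    }


# Made-up respondent for illustration only (not study data).
patient = {1: 4, 2: 3, 3: 0, 4: 1, 5: 4, 6: 3, 7: None, 8: 2, 9: 0, 10: None, 11: None}
print(score_aappo(patient))
```

In this example the Emotional Symptoms score is the mean of the three answered items (3.0), while the Activity Limitations score is left missing because only one of its three items was answered. No total score is computed, in line with the recommendation above.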

Reliability

Using baseline (test) and follow-up (retest) data, ICC values estimating the test-retest reliability for the six AAPPO scores were acceptable (≥ 0.78) for the full sample as well as for the patient subgroups selected a priori to demonstrate stability over 2 weeks using the PGIS, PGIC, or the AASIS hair loss subscale assessments (Table 4). Similar ICC results were observed within each age group (Table S4, Supplementary Material). Internal consistency reliability was also strong for the two multi-item domain scores, Emotional Symptoms and Activity Limitations, with Cronbach’s alpha ranging from 0.87 to 0.96 at baseline and at week 2 (Table S5, Supplementary Material). Although alpha levels > 0.90 may indicate redundancy [27], patients differentiated between, and affirmed the importance of, each of the Emotional Symptoms and Activity Limitations domain items during cognitive debriefing interviews; therefore, no items were removed [7].

Table 4 Six AAPPO domain scores: test-retest reliability

Validity

The four AAPPO Hair Loss item scores demonstrated moderate to strong construct validity (r ≥ 0.34) compared with the AASIS hair loss subscale score, with similar moderate to strong associations with the PGIS (r ≥ 0.41) (Table 5). Moreover, the four Hair Loss item scores had notably weaker relationships with the PCS score (|r|≤ 0.06) and with each of the PCS-related domains of the SF-36v2 Acute (|r|≤ 0.14). As hypothesized, Emotional Symptoms and Activity Limitations domain scores were strongly correlated with the SF-36v2 Acute MCS score (|r|≥ 0.58), AASIS interference subscale and total scores (r ≥ 0.68), and AASIS symptoms subscale scores (r ≥ 0.51) and moderately to strongly correlated with MCS-related SF-36v2 Acute domain scores (|r|≥ 0.44). The Emotional Symptoms and Activity Limitations domain scores had much weaker correlations with the AASIS hair loss subscale score (r ≤ 0.19) and the PCS score (|r|= 0.10). The PCS-related domains of the SF-36v2 Acute demonstrated generally moderate relationships (0.18 ≤|r|≤ 0.42) with the Emotional Symptoms and Activity Limitations domain scores (Table S4, Supplementary Material). These trends were similar for both adults and adolescents (Table S6, Supplementary Material).

Table 5 Six AAPPO domain scores: construct validity results at baseline

The first set of known-groups validity analyses, which tested the hypothesized differences between groups known to differ on a key variable of interest (scalp hair loss), provided important insights. As predicted, the four AAPPO Hair Loss item mean scores were better (lower) for patients in the 25–49% SALT tertile compared with those in the highest SALT tertile (76–100%; p < 0.0001). However, AAPPO Emotional Symptoms and Activity Limitations domain mean scores tended to be worse (higher) for participants in the 25–49% SALT tertile compared with those who had higher SALT scores (Table 6; Table S7, Supplementary Material). Additional known-groups analyses for the Emotional Symptoms and Activity Limitations domains, comparing adults and adolescents with (1) higher versus lower AASIS interference scores and (2) higher versus lower MCS scores (≥ 50 vs. ≤ 30), confirmed known-groups expectations for these two AAPPO domains in each age group (Table 7). Patients with lower (better) AASIS Interference scores had lower (better) AAPPO Emotional Symptoms and Activity Limitations mean domain scores compared with the subgroup with higher AASIS Interference scores (p < 0.0001). Similarly, the subgroup with higher (better) MCS scores had lower (better) AAPPO Emotional Symptoms and Activity Limitations mean domain scores compared with the subgroup of patients with lower MCS scores (p < 0.0001), with similar relationships demonstrated for both the adults and the adolescents (Table 7).

Table 6 Six AAPPO domain scores: known-groups validity at baseline by SALT subgroup

Table 7 Six AAPPO domain scores: known-groups validity at baseline by AASIS interference and MCS

Discussion

The performance of the AAPPO was evaluated using standard psychometric methods on data collected in the context of a prospective, noninterventional, web-based study. A total of 121 adults (n = 85) and adolescents (n = 36) with a dermatologist-confirmed diagnosis of AA were recruited in the US. A mix of patients with AA were enrolled: 37 (30.6%) had 25–49% scalp hair loss, 16 (13.2%) had 50–75% scalp hair loss, and 68 (56.2%) had 76–100% scalp hair loss based on their SALT total scores.

Reflecting the distribution of the SALT scalp hair loss scores provided by their clinicians, the majority of adults and adolescents considered their scalp hair loss as severe or extremely severe at the baseline and week 2 assessments, resulting in anticipated floor effects for the AAPPO Hair Loss items. Descriptive statistics also revealed ceiling effects (no limitation reported) for some AAPPO Emotional Symptoms and Activity Limitations items, most notably for the adolescent group. Taking into account the item-level correlations and the factor analysis, as well as the qualitative research conducted in the development of the AAPPO, six AAPPO scores are recommended to reflect their unique content: Hair Loss on the Scalp, Hair Loss on the Eyebrows, Hair Loss on the Eyelashes, Hair Loss on the Body, Emotional Symptoms domain, and Activity Limitations domain. A mean scoring algorithm is proposed for each domain, with scores ranging from 0 to 4 and higher scores indicating greater impacts.

The test-retest reliability coefficients (≥ 0.78) were adequate for demonstrating the reproducibility of the six scores, and internal consistency results (Cronbach’s alpha ≥ 0.87) were supportive of the two multi-item domains. Strong convergent and discriminant validity correlations and several known-group analyses provide additional empirical evidence that the AAPPO domains were measuring what they were intended to measure.

Although it was anticipated that greater hair loss severity, as indicated by the highest SALT tertile (76–100%), would yield the highest mean scores on the Emotional Symptoms and Activity Limitations domains, the pattern of these domain scores across the SALT tertiles was reversed. Specifically, Emotional Symptoms and Activity Limitations scores tended to be worse (higher) for those in the 25–49% SALT tertile than in the most severe SALT tertile (76–100%). These results demonstrated a greater emotional and activity impact on patients with 25–49% scalp hair loss compared with those with greater scalp hair loss, as well as the pressing need for safe and efficacious treatments to alleviate the burden of patients at this moderate hair loss level [39]. Although a possible explanation for this finding is adaptation to life with AA by the patients in the highest SALT tertile, our preliminary analyses of the impact of years since diagnosis on the relationship between SALT tertiles and the AAPPO Emotional Symptoms and Activity Limitations domain scores did not reveal statistically significant trends in mean differences (p > 0.05). One exception to this conclusion was the trend for higher (worse) AAPPO mean Activity Limitations scores for the binary subgroup of patients with ≤ 10 years since diagnosis compared with patients with > 10 years since diagnosis (p = 0.0494; analyses available on request).

Another plausible explanation for this unexpected SALT 25–49% tertile finding is the greater daily emotional and activity-limiting burden of cosmetically concealing and managing smaller patchy areas of hair loss, compared with the burden on patients with far greater or complete scalp hair loss (AT/AU). The latter group (SALT 76–100%) may, in general, focus less on these concealment challenges and instead sport a bald or prosthesis-covered scalp, thus reducing: (1) the amount of time focused on daily concealing activities [40], (2) the emotional concerns of being “found out” if the concealment is imperfect or becomes disrupted, and (3) the activity limitations adopted to avoid water, sweat, and/or wind that could disrupt cosmetic concealment of scalp hair loss. These possible explanations highlight the need for future research to better understand this finding of greatest emotional and activity impact in the 25–49% SALT tertile.

An additional interesting finding in these analyses was the trivial-to-small relationships of the Emotional Symptoms and Activity Limitations domain scores with the PCS score (r = − 0.10; Table 5); these relationships differed in magnitude from the generally moderate correlations observed for the PCS-related domains of the SF-36v2 Acute (0.18 ≤|r|≤ 0.42; Table 5). Because the PCS score is computed using all eight SF-36v2 Acute domain scores, with positive weights for the physical domains and negative weights for the mental domains, its relationship to the Emotional Symptoms and Activity Limitations domain scores is more complex than a simple examination of the correlations of the SF-36v2 Acute domains considered related to the PCS. This known challenge in interpreting the PCS score has been reported by others [41, 42].

In addition to the AAPPO, other AA-specific patient-reported measures are available, including the AASIS [14], the Alopecia Areata Quality of Life Index [43], and the Alopecia Areata Patients’ Quality of Life instrument [44], although these measures have not been extensively validated or used frequently in studies evaluating HRQoL in patients with AA [45]. The AAPPO has established content validity, reflects the symptoms and impacts that qualitative research has shown matter most to patients, and has been rigorously evaluated for reproducibility and cross-sectional measurement properties.

Limitations of this study include (1) the modest sample size (n = 121), (2) an adult sample that was primarily composed of females, (3) a greater proportion of patients with AU/AT than is reflected in a recent study of the AA population in the US [1], potentially limiting generalizability, and (4) a lack of longitudinal analyses to investigate the AAPPO domain scores’ ability to detect change over time and to explore meaningful within-patient change thresholds. Nonetheless, the AAPPO is currently being administered in a longitudinal, interventional study to investigate meaningful within-patient change thresholds [46, 47], providing the opportunity to investigate and understand these important measurement properties in the AAPPO domain scores.

Conclusion

The AAPPO is a novel, AA-specific PRO measure with domains that capture the outcomes of importance to patients with AA. This psychometric evaluation demonstrated the reliability and validity of the AAPPO to measure symptom severity and impacts in adults and adolescents with AA, supporting its use in clinical trials to show treatment benefit from a patient perspective.