Key Summary Points

Patients with ankylosing spondylitis and non-radiographic axial spondyloarthritis report a similar range of symptoms, which are assessed using the Assessment of SpondyloArthritis International Society Health Index (ASAS HI) and Ankylosing Spondylitis Quality of Life (ASQoL) tools.

While these patient-reported outcome tools assess a similar range of measures, it is not known how they perform relative to each other across the different disease groups, or whether one can be used to predict the other when only one is available.

We developed a cross-walk scoring algorithm approach using non-parametric local-linear regression to allow for comparisons to be made across different studies using ASQoL or ASAS HI to assess patient-reported outcomes.

The results of our study demonstrate a scoring algorithm for cross-walking the ASAS HI to the ASQoL and vice versa, which allows for comparisons across studies using either ASAS HI or ASQoL to be made.

Digital Features

This article is published with digital features, including a summary slide, to facilitate understanding of the article. To view digital features for this article go to https://doi.org/10.6084/m9.figshare.14339597.

Introduction

Axial spondyloarthritis (axSpA) is a chronic inflammatory rheumatic disease that predominantly affects the spine and sacroiliac joints and encompasses patients with ankylosing spondylitis (AS) and non-radiographic axial spondyloarthritis (nr-axSpA) [1, 2]. Both AS and nr-axSpA patients present with similar symptoms, such as back pain, morning stiffness, reduced mobility, fatigue, difficulty sleeping, and impaired social activity and work productivity, all resulting in decreased health-related quality of life (HRQoL) [3, 4]. Clinical manifestations and disease activity measures of early AS and nr-axSpA are comparable, indicating that both are part of the same disease spectrum [5]. Although not all patients with nr-axSpA progress to modified New York criteria-defined AS, the disease activity, functional impairment, HRQoL, and health status have been reported to be similar between patients with nr-axSpA and AS [3, 6, 7, 8].

HRQoL of AS and nr-axSpA patients has been assessed in previous studies using patient-reported outcome (PRO) measures such as the 36-Item Short Form Health Survey (SF-36) (general HRQoL) [9, 10], Ankylosing Spondylitis Quality of Life (ASQoL) (disease-specific measure) [11], and Assessment of SpondyloArthritis International Society Health Index (ASAS HI) (disease-specific measure) [12]. These measures were developed in order to assess HRQoL and impact of the disease on daily activities and/or social participation from a patient’s perspective, as well as to quantify the relative impact of therapeutic response to treatment [11,12,13]. These factors are not always adequately assessed by traditional physician-focused disease-specific questionnaires as physicians are more likely to focus on measurable aspects of care such as pain, disability, and level of impairment [14].

The ASAS HI and the ASQoL quantify patient-reported HRQoL using similar parameters, have a similar scoring range, and have been validated in their respective patient cohorts [11,12,13,14,15]. However, their performance relative to each other is unknown. Since the ASQoL and ASAS HI are both used in clinical trials and real-world evidence studies of axSpA patients, it is important to be able to compare the results across studies using these measures and to assess equivalency of outcomes. Therefore, the objective of this study was to conduct a cross-walk analysis of the ASAS HI to ASQoL, and vice versa, in patients with physician-confirmed AS and nr-axSpA.

Methods

Study Design

Data were used from the Adelphi axSpA Disease Specific Programme (DSP™) surveys collected in the United States (US) between June and August 2018 from rheumatologists and their consulting AS and nr-axSpA patients using a combination of rheumatologist-completed patient record forms and voluntary patient self-completed forms. DSPs are large, retrospective, point-in-time surveys; the full methodology has been described and published previously [16, 17].

Rheumatologists who saw ≥ 10 axSpA patients in a typical month were eligible to participate in the study. Eligible rheumatologists completed patient record forms for their next ten consecutive patients (five AS and five nr-axSpA), with a rheumatologist-confirmed diagnosis of AS or nr-axSpA. Rheumatologists reported information on patient demographic characteristics (age, sex, body mass index [BMI], employment status, and ethnicity), disease status (severity, time since diagnosis) and current treatment patterns. Patients with a rheumatologist-confirmed diagnosis of AS and nr-axSpA were invited to complete a self-completed form independent of their rheumatologists, in which they completed patient-reported outcomes including the ASQoL and ASAS HI.

Ethics

Ethical approval was sought and granted through Western Institutional Review Board (WIRB—Study ID number 1183030). The non-interventional, observational nature of the data collection did not result in patients being placed at risk from the study. Physicians and patients provided informed consent to participate in the study and did not provide any personally identifiable information. All responses were anonymized to preserve respondent (physician and patient) confidentiality and all participating physicians and patients were assigned a study number to aid anonymous data collection and allow linkage of data during data collection and analysis, as well as other regional guidelines and regulations where appropriate.

Outcome Measures

ASQoL

The ASQoL questionnaire was developed in 2003 and is a self-administered 18-item questionnaire based on the needs-based QoL model. The ASQoL has been utilized in both clinical research and daily practice to assess patient-reported HRQoL [11]. It is well accepted by AS patients, taking about 4 min to complete, and is a valuable tool for assessing the impact of interventions for AS and evaluating models of service delivery, covering functional, emotional, pain and fatigue domains, and the impact of AS on a patient’s HRQoL [18]. Patients are asked to respond “Yes” (equaling a score of 1) or “No” (equaling a score of 0) to each statement based on how they feel at the time of completion. All item scores are summed to give a total score ranging from 0 (good HRQoL) to 18 (poor HRQoL) [19]. A total score cannot be calculated if more than three items (i.e., 20% of the data) are missing. Published research has indicated that differences of 1–2 points in ASQoL scores are clinically significant [20].
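The scoring rule described above can be illustrated with a minimal Python sketch. The function name `asqol_total` and the constants are hypothetical, and the handling of one to three missing items (summing the answered items only) is an assumption, since the text states only when a total cannot be calculated.

```python
from typing import Optional, Sequence

ASQOL_ITEMS = 18   # number of yes/no statements in the ASQoL
MAX_MISSING = 3    # more than three missing items: no total score


def asqol_total(responses: Sequence[Optional[bool]]) -> Optional[int]:
    """Return the ASQoL total (0 = good HRQoL, 18 = poor HRQoL), or None.

    `responses` holds True for "Yes", False for "No", and None for a
    missing item. Assumption: with one to three missing items, the
    answered items are simply summed.
    """
    if len(responses) != ASQOL_ITEMS:
        raise ValueError(f"expected {ASQOL_ITEMS} responses, got {len(responses)}")
    missing = sum(r is None for r in responses)
    if missing > MAX_MISSING:
        return None  # too much missing data to calculate a total
    return sum(1 for r in responses if r is True)


# Hypothetical patient answering "Yes" to six statements.
example = [True] * 6 + [False] * 12
print(asqol_total(example))  # 6
```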

ASAS HI

The ASAS HI is a new health index tool based on the International Classification of Functioning, Disability and Health (ICF) core set developed specifically for patients with axSpA to measure the concepts of functioning, disability, and health [12]. The ASAS HI is a linear composite questionnaire covering 17 categories, forming a unidimensional scale providing a sum score representing a wide spectrum of different levels of functioning, including pain, function, sleep, mobility, self-care, and social interactions [21]. Patients are asked to respond “Yes” or “No” to a list of statements based on how they feel at the time of completion. All statements have an equal value, with “Yes” equaling a score of 1 and “No” equaling a score of 0. All item scores are summed to give a total score ranging from 0 (good HRQoL) to 17 (poor HRQoL). Items No 7 (I have lost interest in sex) and 8 (I have difficulty operating the pedals in my car) are not applicable to all patients. For those patients who ticked the response “not applicable”, the sum score is analyzed based on n = 16 or n = 15, respectively. An overall score is calculated, with a lower score representing better functioning and higher HRQoL [12]. A total score cannot be calculated if more than three items (i.e., 20% of the data) are missing. The ASAS HI has been validated in a cross-sectional international observational study in patients with AS, nr-axSpA, and peripheral SpA (pSpA) across 23 countries [15]. Reliability in patients with stable disease who had repeat measurements was good; values ≤ 5.0 distinguish good health, as opposed to moderate health, with values ≥ 12.0 representing poor health. The smallest detectable change in the measure has been reported to be 3 points [12, 15].
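As a rough illustration of the ASAS HI rules above, the following Python sketch returns the sum score together with the number of applicable items (17, 16, or 15) and classifies a score using the thresholds quoted in the text. The function names and the string coding of responses are hypothetical, and counting missing items as applicable is an assumption.

```python
from typing import Optional, Sequence, Tuple

ASAS_HI_ITEMS = 17
OPTIONAL_ITEMS = (7, 8)   # 1-based items that may be answered "not applicable"
MAX_MISSING = 3


def asas_hi_score(responses: Sequence[Optional[str]]) -> Optional[Tuple[int, int]]:
    """Return (sum score, number of applicable items), or None if not scorable.

    Each response is "yes", "no", "na" (items 7 and 8 only), or None (missing).
    """
    if len(responses) != ASAS_HI_ITEMS:
        raise ValueError(f"expected {ASAS_HI_ITEMS} responses, got {len(responses)}")
    for item, r in enumerate(responses, start=1):
        if r not in ("yes", "no", "na", None):
            raise ValueError(f"item {item}: unknown response {r!r}")
        if r == "na" and item not in OPTIONAL_ITEMS:
            raise ValueError(f"item {item} cannot be 'not applicable'")
    if sum(r is None for r in responses) > MAX_MISSING:
        return None
    applicable = ASAS_HI_ITEMS - sum(r == "na" for r in responses)
    return sum(r == "yes" for r in responses), applicable


def health_status(score: float) -> str:
    """Classify a sum score with the thresholds quoted in the text."""
    if score <= 5.0:
        return "good"
    if score >= 12.0:
        return "poor"
    return "moderate"
```

For example, a patient who answers "yes" to items 1 to 6, marks items 7 and 8 as not applicable, and answers "no" to the rest would be scored as 6 out of 15 applicable items.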

Statistical Analysis

Patient Demographic and Clinical Characteristics

Patient demographic and clinical characteristics were reported for three groups: AS, nr-axSpA, and all patients combined. Continuous variables were summarized by the number of patients, mean, standard deviation, minimum, and maximum. Categorical variables were summarized by count and percentage within each group. Base sizes vary between characteristics because of missing values in physician-completed patient record forms.

Mapping Between ASAS HI and ASQoL Using Cross-validation

K-fold cross-validation is a technique in which patients in the dataset are randomly partitioned into k sub-samples [22]. One sub-sample is set aside as the validation data for testing the predictive model and the remaining k − 1 sub-samples are used to generate the predictive model. The cross-validation process is then repeated k − 1 more times, with each of the k sub-samples used exactly once as validation data. Model accuracy measures are calculated on the validation data. These measures are considered “out of sample” as they are calculated on data not used in the predictive model.
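The partitioning step can be sketched in a few lines of Python. This is an illustration of the general technique, not the study's Stata code; the function name `k_fold_splits` and the fixed random seed are assumptions made for reproducibility of the example.

```python
import random
from typing import Iterator, List, Tuple


def k_fold_splits(n: int, k: int, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (training indices, validation indices) for each of k folds.

    The n patient indices are shuffled once and dealt into k sub-samples;
    each sub-sample serves exactly once as the validation data.
    """
    if not 2 <= k <= n:
        raise ValueError("k must be between 2 and n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


# Tenfold split of a cohort of 557 patients (the cohort size reported in this study).
for training, validation in k_fold_splits(n=557, k=10):
    assert len(training) + len(validation) == 557
```

A model would be fitted on each `training` set and its accuracy measured on the matching `validation` set, so that every accuracy figure is out of sample.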

Tenfold cross validation was used to determine the best models in mapping between ASAS HI and ASQoL [23]. Mapping was conducted from ASAS HI to ASQoL and also from ASQoL to ASAS HI.

A range of models was assessed. The mean score was generated first and served as a benchmark for the other models: any model that could not perform better than assigning the mean score to the data was considered a poor model.

Regression models were generated, utilizing ordinary least squares and generalized linear models [24]. For generalized linear models, Gaussian, Poisson, negative binomial, gamma, and inverse Gaussian families were used, with identity, log, and power (from − 1 to 1.2 in increments of 0.05) links utilized.
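To make the grid of power links concrete, the sketch below enumerates the powers from −1 to 1.2 in steps of 0.05 and defines a power link with its inverse. The function names are hypothetical, and treating a power of 0 as the log link is an assumed convention, not something stated in the text.

```python
import math
from typing import List


def power_grid(start: float = -1.0, stop: float = 1.2, step: float = 0.05) -> List[float]:
    """Powers from start to stop inclusive, rounded to avoid float drift."""
    n_steps = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(n_steps + 1)]


def link(mu: float, power: float) -> float:
    """Power link g(mu) = mu**power; power 0 is treated as log (assumption)."""
    return math.log(mu) if power == 0 else mu ** power


def inverse_link(eta: float, power: float) -> float:
    """Map the linear predictor back to the mean scale (requires eta > 0 for power != 0)."""
    return math.exp(eta) if power == 0 else eta ** (1.0 / power)


grid = power_grid()
print(len(grid))  # 45
```

Under this convention, a power of 1 coincides with the identity link, so the grid overlaps with the separately listed identity and log links.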

A fractional logit model was also fitted [25] to scores rescaled to the [0, 1] interval, with robust standard errors.

Nonparametric local-linear regression using the Epanechnikov kernel was also undertaken [26]. Like linear regression, nonparametric regression models the mean of the outcome conditional on the covariates, but unlike linear regression, it makes no assumptions about the functional form of the relationship between the outcome and the covariates. Nonparametric local-linear regression may be used to model the mean of a continuous, count, or binary outcome.
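The idea can be sketched for a single predictor in plain Python: at each target point, a straight line is fitted by weighted least squares, with weights from the Epanechnikov kernel, and the fitted line is evaluated at that point. This is an illustration only, not the study's Stata implementation. The function names are hypothetical, the bandwidth is supplied by hand rather than selected from the data, and the kernel is the textbook form on [−1, 1], which may differ in scaling from the one Stata uses.

```python
from typing import Sequence


def epanechnikov(u: float) -> float:
    """Textbook Epanechnikov kernel: 0.75 * (1 - u^2) on |u| < 1, else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) < 1.0 else 0.0


def local_linear_predict(x: Sequence[float], y: Sequence[float],
                         x0: float, bandwidth: float) -> float:
    """Estimate E[y | x = x0] from a kernel-weighted straight-line fit around x0."""
    w = [epanechnikov((xi - x0) / bandwidth) for xi in x]
    sw = sum(w)
    if sw == 0.0:
        raise ValueError("no observations within the bandwidth of x0")
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw   # weighted mean of x
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw   # weighted mean of y
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    if sxx == 0.0:
        return ybar  # all weight on one x value: fall back to a local constant
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return ybar + slope * (x0 - xbar)


# Hypothetical, exactly linear data: the local fit recovers the line.
xs = list(range(18))
ys = [1.0 + 0.9 * xi for xi in xs]
estimate = local_linear_predict(xs, ys, x0=5.0, bandwidth=3.0)
```

Because no global functional form is imposed, the fitted curve can bend wherever the data do, at the cost of having to choose a bandwidth.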

In the models described above, covariates were also utilized, including age, sex, BMI, employment status, ethnicity, disease severity, and treatment. Splines of continuous predictors/covariates were also investigated, including linear splines and restricted cubic splines, with knots derived from percentiles of independent variables. Interaction terms between predictors/covariates were also investigated.

The performance of all models was assessed using root-mean-square error (RMSE) as the accuracy measure. The RMSE is the square root of the variance of the residuals and can be interpreted as the standard deviation of the unexplained variance [27]. Therefore, lower values of RMSE indicate better fit, and well-fitting models have an RMSE which is lower than the clinically meaningful difference in the ASAS HI or ASQoL. Additionally, the RMSE was examined at the lower, middle, and upper thirds of the predicted scale. This allowed us to determine if the model fitted well over the full range of values.
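A minimal Python sketch of this accuracy measure follows. The function names are hypothetical, and so is the definition of the thirds: here they are equal-width bands of the observed score on the scale being predicted, which is one plausible reading of the text and not a confirmed detail of the analysis.

```python
import math
from typing import Dict, List, Optional, Sequence


def rmse(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Square root of the mean squared difference between observed and predicted."""
    if len(observed) != len(predicted) or len(observed) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


def rmse_by_thirds(observed: Sequence[float], predicted: Sequence[float],
                   scale_max: float) -> Dict[str, Optional[float]]:
    """RMSE within the lower, middle, and upper third of a 0..scale_max scale.

    Assumption: patients are assigned to a band by their observed score.
    A band with no patients gets None.
    """
    bands: Dict[str, List[int]] = {"lower": [], "middle": [], "upper": []}
    for i, o in enumerate(observed):
        if o < scale_max / 3:
            bands["lower"].append(i)
        elif o < 2 * scale_max / 3:
            bands["middle"].append(i)
        else:
            bands["upper"].append(i)
    return {
        name: rmse([observed[i] for i in idx], [predicted[i] for i in idx]) if idx else None
        for name, idx in bands.items()
    }


# Hypothetical ASQoL scores (scale 0 to 18) and model predictions.
observed = [0, 3, 6, 9, 12, 18]
predicted = [1, 3, 8, 9, 15, 14]
overall = rmse(observed, predicted)
by_third = rmse_by_thirds(observed, predicted, scale_max=18)
```

Comparing the three band-level values shows whether a model that looks acceptable overall is noticeably less accurate at one end of the scale.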

The model with the lowest RMSE in cross-validation was deemed the best. However, an alternative model would have been selected if the RMSE values indicated that the model performed particularly poorly in certain ranges of the scale. For the selected model, appropriate summary output statistics and regression tables are given. A summary of how values map from one scale to the other and scatter plots of recorded values against predicted values are also presented.

By convention, an alpha level of 0.05 was considered significant in this study. However, the methods used were not driven by significance testing. Additionally, no adjustments for multiplicity were made. Analysis was undertaken in Stata 16.1 (StataCorp. 2019. Stata Statistical Software: Release 16. College Station, TX, USA).

Results

Demographics

In total, 63 rheumatologists completed patient record forms (PRFs) and participated in this survey, with patient self-completed forms (PSCs) from 283 AS patients and 274 nr-axSpA patients included in the cross-walk analysis. Patient demographic and clinical characteristics of the overall patient cohort (n = 557), and of the two patient sub-groups, are shown in Table 1. The mean age of the cohort was 45.3 years, and the mean time since diagnosis was 5.1 years. In total, 60.7% of the cohort were male, 82.2% were Caucasian/white, and the mean BMI was 27.3. Overall, 64.1% were categorized with mild disease, 32.5% with moderate disease, and 3.4% with severe disease. The full-time employment rate was 68.6%.

Table 1 Patient demographics

Description of ASQoL and ASAS HI Scores

The distribution of ASQoL and ASAS HI scores across both patient groups is shown in Fig. 1. There was no significant difference in the distribution of scores between the two groups. The overall mean ± standard deviation (SD) ASQoL score was 6 ± 5.4; AS patients had a mean score of 6.1 ± 5.4, and nr-axSpA patients had a mean score of 5.8 ± 5.3 (p = 0.5). The mean ± SD ASAS HI score across both patient populations was 5.5 ± 4.2; patients with AS had a mean score of 5.7 ± 4.4, and nr-axSpA patients had a mean score of 5.3 ± 4.1 (p = 0.23).

Fig. 1

Distribution of ASQoL and ASAS HI scores. The distribution of ASQoL and ASAS HI scores reported across both patient groups. ASAS HI Assessment of SpondyloArthritis International Society Health Index, ASQoL Ankylosing Spondylitis Quality of Life

Regression Analysis

Over 600 regression models were tested for predicting ASQoL and ASAS HI. Table 2 gives the results of this analysis. Models both with and without covariates had superior performance when predicting lower values and performed worst when predicting higher values (RMSE 1.597, 1.871, 2.871 across thirds for ASAS HI without covariates; and 1.719, 2.577, 3.140 for ASQoL without covariates). Overall, lower and middle RMSE values for models with and without covariates were lower than clinically significant differences for ASQoL and ASAS HI, as described above. Model predictions are shown in Figs. 2 and 3 for predicting ASQoL and ASAS HI, respectively, indicating the line of best fit between the two parameters based on the performance of each model with and without covariates. Figures 2 and 3 also include observed values and the 95% confidence intervals of predicted values. Full regression tables for the best-performing models can also be found in the appendix.

Table 2 Regression model cross-validation results
Fig. 2

Models for predicting ASQoL scores from ASAS HI scores. Models for predicting ASQoL scores from ASAS HI scores, indicating the line of best fit between the two parameters based on the performance of each model. Observed values shown in green and 95% confidence intervals of predicted values are shown in pink. a Mean model. b Non-parametric local-linear regression, without covariates. c GLM with Gaussian family and power link (^0.55), with covariates. ASAS HI Assessment of SpondyloArthritis International Society Health Index, ASQoL Ankylosing Spondylitis Quality of Life

Fig. 3

Models for predicting ASAS HI scores from ASQoL scores. Models for predicting ASAS HI scores from ASQoL scores, indicating the line of best fit between the two parameters based on the performance of each model. Observed values shown in green and 95% confidence intervals of predicted values are shown in pink. a Mean model. b Non-parametric local-linear regression, without covariates. c GLM with Gaussian family and power link (^1.15), with covariates. ASAS HI Assessment of SpondyloArthritis International Society Health Index, ASQoL Ankylosing Spondylitis Quality of Life

Discussion

The ASQoL and ASAS HI are both used in clinical trials and real-world evidence studies of axSpA patients. While many of the questions asked by the ASQoL and ASAS HI are comparable, and the two scales have a similar scoring range, it is important to be able to compare the results across studies using these measures and to assess equivalency of outcomes. This is the first study demonstrating a cross-walk scoring algorithm for the ASQoL to the ASAS HI, and vice versa. The ASAS HI tool was first released in 2014, and thus the published evidence is more limited than for the ASQoL tool, which has been in use since 2003. This cross-walk approach is of value in that it facilitates comparisons of patient-reported outcomes relating to disease severity and treatment response between different studies published over time and conducted in different patient groups using either the ASQoL or ASAS HI scales, thereby increasing transparency and reducing uncertainty surrounding decision-making processes.

In a previous study of 140 axSpA patients, ASAS HI scores were obtained and compared alongside other disease outcome activity measures, with a correlation coefficient of 0.784 observed between ASAS HI and ASQoL [28]. In addition, a systematic review of various PRO measures for SpA identified a wide range of QoL domains of relevance, with minimal differences observed between patients with axial and peripheral SpA [29]. The authors concluded that peripheral SpA PROs can be used to measure QoL of patients with axial SpA. However, our study builds on both of these by mapping the two domains directly against each other in the same patient population, thereby creating a mathematical model in order to better understand the relationship between the two scoring systems.

We found that when predicting ASAS HI scores using ASQoL values, the best model was non-parametric local-linear regression, with an overall RMSE of 1.851. When predicting ASQoL scores using ASAS HI values, the best model was likewise non-parametric local-linear regression, with an overall RMSE of 2.254. The RMSE is the square root of the variance of the residuals and can be interpreted as the standard deviation of the unexplained variance, with lower values representing a better fit. In this study, the RMSE observed in both best fit models was less than the minimally observed clinical difference for both scales (two for ASQoL and three for ASAS HI), indicating that outcomes of our analyses can be considered well-fitting models.

In predicting ASQoL and ASAS HI, our models performed better at the lower end of the predicted scale. One explanation is that more patients fell at the lower end of the distribution, so there were more comparable data points to match. Patients with more severe disease had a greater range of potential disease variables; therefore, the cross-walk matching between the two scales may not have been as precise.

There was no significant difference in the distribution of ASQoL and ASAS HI scores across both patient groups. The mean overall ASQoL score was 6 ± 5.4, and the mean overall ASAS HI score was 5.5 ± 4.2, which corresponds with the trends and distributions seen in previously published literature. In the original study detailing the development of the ASQoL tool [11], the UK subset (n = 210) had a mean of 9.5 ± 5.3 at time 1 and 8.8 ± 5.7 at time 2, and the Dutch subset (n = 158) had a mean of 6.7 ± 4.8 at time 1 and 6.2 ± 4.8 at time 2. In the global study validating the ASAS HI undertaken in 23 different countries (n = 1548 patients), the mean score was 6.7 ± 4.3 [15]. Taken together, these indicate that the population used in this study was representative of, and comparable with, the wider patient population.

In our study, some differences in clinical and demographic characteristics were observed between AS and nr-axSpA patients. For example, AS patients were older, had been diagnosed longer, were more likely to be retired and male, and had a higher disease severity than nr-axSpA patients. However, all of these findings were to be expected; AS is a more progressed condition than nr-axSpA and these differences are well described in the existing scientific literature [6, 30]. In any case, these differences observed between the AS and the nr-axSpA patients do not influence the outcome of the crosswalk methodology, as all patients completed both PRO tools. The overall distribution of scores for each instrument was skewed towards higher values for AS patients, but this simply reflects the higher disease severity. Including both types of patients strengthens the robustness and the reliability of the analysis, since we are able to analyze scores both at the upper end and lower end of each scale.

Strengths and Limitations

One of the key strengths of the data set used is that patients were enrolled by their treating rheumatologist, so only patients with a confirmed diagnosis of AS or nr-axSpA could be included. As mentioned above, patients presented with all degrees of severity, leading to a distribution of scores for both instruments across the whole range of values. The proposed relationship is strongly driven by the collected data; however, there may be rare instances where it does not hold true. A general limitation, one in common with all real-world research, is that analysis is restricted only to those patients who are willing to participate. However, since it is the same patients completing both instruments, the study is internally consistent; as the analysis includes patients from all severity levels, the patients are representative of the whole AS and nr-axSpA population. One other limitation that needs to be mentioned is that the study included patients only from the US. If the results were to be applied to other populations, then some caution might be needed with interpretation if it is likely that an ex-US patient might score one of the instruments differently.

Conclusions

The results of this study demonstrate a scoring algorithm for cross-walking the ASAS HI to the ASQoL and vice versa, with the approach allowing for comparisons across studies using either ASAS HI or ASQoL to be made. Future research is needed to confirm our findings and the clinical usefulness of this study.