This was a multi-site, observational, prospective, longitudinal cohort study in patients with AUD who were followed in specialized care in Japan for up to 3 months (± 2 weeks). Patients were treated according to routine practice, i.e., treatment was not determined in advance by the study protocol. The study was conducted between October 4, 2016 and September 5, 2017, at 15 outpatient sites across Japan.
Patients
Patients were male and female Japanese adults (aged ≥ 20 years) with a diagnosis of AUD according to the Diagnostic and Statistical Manual of Mental Disorders, 5th Edition (DSM-5). Eligible patients either had to be on an established outpatient treatment plan that they intended to follow for 3 months, or had to have an established treatment plan starting within 1–4 weeks after the baseline visit. Key exclusion criteria were any serious or unstable psychiatric disorder (such as drug addiction) or physical disorder that prevented participation in the study; learning difficulties that prevented the patient from reading and understanding questionnaires (e.g., dementia); and, in the physician’s opinion, an inability to be followed for the whole duration of the study.
Assessments
Assessments were conducted as part of routine practice visits at baseline, 2 weeks (± 1 week), and 3 months (± 2 weeks). Patients self-rated their HR-QoL using the AQoLS-Japan, which has 34 items across 7 dimensions: activities (items 2–7, 13, 15, and 25–26), relationships (items 1, 8–11, and 27), living conditions (items 16–18 and 24), negative emotions (items 22–23), control (items 28–32), sleep (items 33–34), and self-esteem (items 12, 14, and 19–21) (Supplementary Appendix Table e1). Each item has four response categories (not at all, a little, quite a lot, and very much), with a 4-week recall period. Dimension and total (sum of 34 items) scores are linearly transformed to a 0–100 range, with higher scores indicating poorer HR-QoL.
Other HR-QoL assessments included the Japanese versions of two generic HR-QoL measures: the EuroQol questionnaire (EQ-5D-3L) [12] and the SF-36 Health Survey version 2 (SF-36) [13]. Clinicians and patients rated their global impressions of severity using the 7-point Clinical Global Impression of Severity (CGI-S) and the 5-point Patient Global Impression of Severity (PGI-S), respectively. Impressions of change were assessed using the Clinical Global Impression of Improvement (CGI-I) and the Patient Global Impression of Change (PGI-C) [14], both rated on a 7-point scale (1 = very much improved/better, 2 = much improved/better, 3 = minimally improved/better, 4 = no change, 5 = minimally worse, 6 = much worse, and 7 = very much worse).
Levels of alcohol consumption were evaluated using the Timeline Follow-Back method over the past 28 days [15]. Drinking risk levels (DRLs) were defined according to WHO criteria for daily alcohol consumption (male/female): low, ≤ 40 g/≤ 20 g; medium, 41–60 g/21–40 g; high, 61–100 g/41–60 g; and very high, > 100 g/> 60 g per day [2]. A DRL response was defined as a shift from very high to medium or low DRL, from high to low DRL, from medium to low DRL, or from low DRL to no alcohol consumption (0 g/day).
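The scoring and categorization rules above can be sketched in Python. This is a minimal illustration, not the official AQoLS scoring algorithm: it assumes the four response categories are coded 1 (not at all) to 4 (very much), and all function and variable names are hypothetical. The item lists per dimension and the DRL cut-offs are taken from the text.

```python
# Minimal sketch: AQoLS-Japan 0-100 rescaling and WHO drinking risk levels (DRLs).
# ASSUMPTION: item responses are coded 1 (not at all) to 4 (very much).
ITEM_MIN, ITEM_MAX = 1, 4

# Item numbers per dimension, as listed in the text (34 items in total)
DIMENSIONS = {
    "activities": [2, 3, 4, 5, 6, 7, 13, 15, 25, 26],
    "relationships": [1, 8, 9, 10, 11, 27],
    "living_conditions": [16, 17, 18, 24],
    "negative_emotions": [22, 23],
    "control": [28, 29, 30, 31, 32],
    "sleep": [33, 34],
    "self_esteem": [12, 14, 19, 20, 21],
}


def rescale(items: list[int]) -> float:
    """Linearly map a raw item sum onto 0-100; higher = poorer HR-QoL."""
    lowest, highest = ITEM_MIN * len(items), ITEM_MAX * len(items)
    return 100.0 * (sum(items) - lowest) / (highest - lowest)


def score(responses: dict[int, int]) -> dict[str, float]:
    """Dimension and total scores from a {item_number: coded_response} mapping."""
    scores = {
        name: rescale([responses[i] for i in items])
        for name, items in DIMENSIONS.items()
    }
    scores["total"] = rescale([responses[i] for i in range(1, 35)])
    return scores


# WHO cut-offs in g of alcohol per day: upper bounds of the low, medium, high DRLs
DRL_CUTOFFS = {"male": (40, 60, 100), "female": (20, 40, 60)}


def drl(grams_per_day: float, sex: str) -> str:
    """Classify mean daily consumption into a DRL."""
    low_max, medium_max, high_max = DRL_CUTOFFS[sex]
    if grams_per_day <= low_max:
        return "low"
    if grams_per_day <= medium_max:
        return "medium"
    if grams_per_day <= high_max:
        return "high"
    return "very high"


def drl_response(baseline_g: float, followup_g: float, sex: str) -> bool:
    """Apply the DRL response definition given in the text."""
    before, after = drl(baseline_g, sex), drl(followup_g, sex)
    if before == "very high":
        return after in ("medium", "low")
    if before in ("high", "medium"):
        return after == "low"
    return followup_g == 0  # low DRL at baseline: response means abstinence
```

For example, a man drinking 120 g/day at baseline and 50 g/day at follow-up moves from very high to medium DRL, which counts as a response under this definition.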
Statistical analyses
Sample size estimation
The target sample size was determined by the requirement for approximately 60 patients for the assessment of test–retest reliability and responsiveness [16]. Based on the results of two previous clinical trials in patients with AUD [17, 18], approximately 59% of patients were expected to maintain their baseline DRL after 2 weeks of follow-up, and 58% to achieve a treatment response at 3 months. Assuming an 18% withdrawal rate at 3 months, a minimum of 127 patients was therefore estimated to be needed. To allow for possible cultural differences between Japan and the countries involved in those clinical trials (e.g., higher loss to follow-up), the minimum enrollment was increased to 150 patients.
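The arithmetic behind the 127-patient minimum can be reconstructed as follows. This is one calculation consistent with the reported figure, not a confirmed description of the authors’ method; it assumes that the responsiveness requirement (60 responders remaining at 3 months after withdrawals) drives the estimate, and it ignores withdrawal before the 2-week retest, which the text does not quantify. The function name is hypothetical.

```python
import math

# Inputs reported in the text
TARGET_N = 60          # patients needed for test-retest and responsiveness analyses
P_STABLE_2W = 0.59     # expected proportion with stable DRL at 2 weeks
P_RESPONSE_3M = 0.58   # expected proportion of responders at 3 months
WITHDRAWAL_3M = 0.18   # expected withdrawal rate at 3 months


def required_enrollment(target: int, proportion: float, retention: float = 1.0) -> int:
    """Smallest n such that n * proportion * retention >= target."""
    return math.ceil(target / (proportion * retention))


# Responsiveness: 60 / (0.58 * 0.82) is about 126.2, rounded up
n_responsiveness = required_enrollment(TARGET_N, P_RESPONSE_3M, 1 - WITHDRAWAL_3M)
# Test-retest: 60 / 0.59 is about 101.7, rounded up (no 2-week withdrawal assumed)
n_retest = required_enrollment(TARGET_N, P_STABLE_2W)
print(n_responsiveness, n_retest)  # 127 102
```

The larger of the two requirements governs enrollment. With 127 patients, the expected number of evaluable responders is 127 × 0.58 × 0.82 ≈ 60.4; the planned enrollment of 150 adds a margin for higher loss to follow-up.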
Analysis population
The analysis population included all patients who met the selection criteria and completed the baseline assessment. Psychometric validation analyses were performed in all patients who completed all study visits.
Descriptive statistics
Standard descriptive statistics were used to describe the distributional properties of AQoLS-Japan item, dimension, and total scores at each study visit, and of changes from baseline in AQoLS-Japan dimension and total scores at follow-up visits.
Descriptive statistics were also used to summarize the HR-QoL and clinical status of patients with AUD (AQoLS-Japan, EQ-5D, SF-36, PGI-S, and PGI-C) at baseline and 3 months.
Psychometric validation
Dimensional structure was assessed through inter-item and item-scale correlations at each visit and through confirmatory factor analysis (CFA) of baseline data. Adequacy of model fit was evaluated using the model chi-square test statistic, the comparative fit index (CFI ≥ 0.95), the Tucker–Lewis index (TLI ≥ 0.95), and the root mean square error of approximation (RMSEA ≤ 0.06). Internal consistency reliability of the AQoLS-Japan was evaluated using Cronbach’s alpha coefficients; an alpha between 0.70 and 0.90 indicates a set of items that are strongly related but not redundant and that can support a unidimensional scoring structure [19].
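Cronbach’s alpha can be computed from a patients × items response matrix with the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of the summed score). The following is a minimal standard-library sketch; the example data are made up, and computing a per-dimension alpha by passing only that dimension’s item columns is an assumption about how the dimension-level coefficients would be obtained.

```python
from statistics import variance


def cronbach_alpha(responses: list[list[float]]) -> float:
    """Cronbach's alpha; responses[p][i] is patient p's answer to item i."""
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def alpha_in_target_range(alpha: float) -> bool:
    """True if alpha lies in the 0.70-0.90 band cited in the text."""
    return 0.70 <= alpha <= 0.90


# Hypothetical 5 patients x 3 items, responses coded 1-4
example = [
    [1, 2, 1],
    [2, 2, 3],
    [3, 3, 3],
    [4, 3, 4],
    [2, 1, 2],
]
alpha = cronbach_alpha(example)
print(round(alpha, 3), alpha_in_target_range(alpha))
```

Perfectly parallel items give α = 1, while mutually uncorrelated items give α = 0, which is why values above about 0.90 are read as item redundancy rather than as better reliability.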
Test–retest reliability was evaluated in patients whose condition remained stable on PGI-C, PGI-S, CGI-I, and DRL between baseline (test) and 2 weeks (retest), by estimating intraclass correlation coefficients (ICCs) between test and retest scores; an ICC ≥ 0.70 was taken to represent adequate reliability. Construct validity was investigated by testing a priori hypotheses about the direction and strength of the relationships between AQoLS-Japan scores and scores on comparator measures (SF-36, EQ-5D, PGI-S, CGI-S, and alcohol consumption), using Pearson product-moment correlations at each visit. The strength of correlations was assessed according to Cohen’s criteria [20]: correlations (in absolute value) between 0.10 and 0.29 were considered small, between 0.30 and 0.49 moderate, and of 0.50 or greater strong. The a priori hypotheses were based on the literature and on findings from the UK and French AQoLS psychometric validation studies [8, 11]. We hypothesized that there would be:
1. Moderate to strong negative correlations between the AQoLS and the SF-36 mental and role-social components.
2. Moderate negative correlations between the AQoLS and the SF-36 role emotional, vitality, mental health, and social functioning components.
3. A small to moderate negative correlation between the AQoLS and the EQ-5D visual analogue scale.
4. A small to moderate positive correlation between the AQoLS and the CGI-S, and a stronger correlation between the AQoLS and the PGI-S.
5. Small to moderate positive correlations between the AQoLS and measures of alcohol consumption.
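The test–retest and construct-validity criteria described above can be illustrated with two small helpers. The text does not state which ICC model was used; this sketch assumes the Shrout–Fleiss two-way random-effects, absolute-agreement, single-measure form, ICC(2,1), with test and retest treated as two occasions. The hypothetical `cohen_strength` helper applies the Cohen thresholds to the absolute value of a correlation; its "negligible" label for |r| < 0.10 is an added assumption, since the text defines only small, moderate, and strong.

```python
def icc_2_1(test: list[float], retest: list[float]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n, k = len(test), 2  # n patients, k = 2 occasions (test, retest)
    pairs = list(zip(test, retest))
    grand = (sum(test) + sum(retest)) / (n * k)
    row_means = [(a + b) / k for a, b in pairs]
    col_means = [sum(test) / n, sum(retest) / n]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between patients
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between occasions
    ss_total = sum((x - grand) ** 2 for pair in pairs for x in pair)
    ss_error = ss_total - ss_rows - ss_cols                   # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def cohen_strength(r: float) -> str:
    """Label a correlation using Cohen's thresholds on |r|."""
    size = abs(r)
    if size >= 0.50:
        return "strong"
    if size >= 0.30:
        return "moderate"
    if size >= 0.10:
        return "small"
    return "negligible"


# Hypothetical AQoLS-Japan totals: retest is systematically 5 points higher
print(round(icc_2_1([10, 20, 30, 40], [15, 25, 35, 45]), 3))
print(cohen_strength(-0.45))
```

With a constant 5-point shift between occasions, the absolute-agreement ICC falls below 1 (about 0.93 here), whereas a consistency-type ICC would still equal 1; which behavior is appropriate depends on the model the analysis actually used.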
Because there is a well-established, ‘dose-related’ continuum of health impact for AUD [21], we aimed to check scale validity across the spectrum of drinking behaviors, from mild to more severe disease and at different levels of alcohol consumption. Known-groups validity was evaluated by testing, with t tests at each visit, the significance of differences in AQoLS-Japan scores between the two most extreme groups of each prespecified subgrouping: disease severity (based on PGI-S and CGI-S) and level of consumption (based on number of heavy drinking days [HDDs], number of drinking days, and DRL).
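Each known-groups comparison reduces to a two-sample t statistic between the extreme subgroups. The text says only "t tests"; this sketch assumes Welch’s unequal-variance version and computes the statistic and its Welch–Satterthwaite degrees of freedom with the standard library. The two-sided p-value would then be read from the t distribution, for example with SciPy’s `scipy.stats.ttest_ind(a, b, equal_var=False)`, which is not called here. The subgroup scores are made up.

```python
import math
from statistics import mean, variance


def welch_t(group_a: list[float], group_b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(group_a) / len(group_a)  # squared standard error of mean A
    se2_b = variance(group_b) / len(group_b)  # squared standard error of mean B
    t_stat = (mean(group_a) - mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t_stat, df


# Hypothetical AQoLS-Japan totals (higher = poorer HR-QoL) in the two
# most extreme PGI-S groups
most_severe = [62.7, 70.6, 55.9, 66.7, 73.5]
least_severe = [25.5, 31.4, 20.6, 35.3, 28.4]
t_stat, df = welch_t(most_severe, least_severe)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

A positive t here means the more severe group reports poorer HR-QoL, which is the direction expected for a valid scale.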
The ability of the AQoLS-Japan to detect change was evaluated using Pearson product-moment correlation coefficients between the Month-3 change from baseline in AQoLS-Japan scores and the Month-3 change on comparator measures (SF-36, EQ-5D, alcohol consumption, PGI-C, and CGI-I). Changes in AQoLS-Japan scores between baseline and 3 months were also computed for patients who improved on PGI-C, who improved on CGI-I, and who reduced their alcohol consumption during the 3-month follow-up period, and the significance of change was tested using paired t tests. In addition, the effect size (ES) and standardized response mean (SRM) were computed for the mean change in AQoLS dimension and total scores between baseline and 3 months. Responsiveness effect sizes were interpreted according to published guidelines [22]: ≥ 0.20 to < 0.50 represents a small effect, ≥ 0.50 to < 0.80 a moderate effect, and ≥ 0.80 a large effect.
The minimal clinically important difference (MCID) is defined as the smallest change or difference in scores of a measure perceived by patients as beneficial or harmful [23]. MCID estimates for evaluating group-level change over time on the AQoLS total score were derived from the mean AQoLS total change in patients who had a small improvement between baseline and Month 3 on each anchor judged to be adequate (defined as PGI-S = 1, PGI-C = 3, CGI-S = 1 or 2, CGI-I = 3, or a DRL improvement of one category) [23,24,25]. Estimates for responder definitions were based on mean changes, receiver operating characteristic (ROC) analysis, and cumulative distribution function plots.
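The ES and SRM can be computed from paired baseline and Month-3 scores. This sketch uses the conventional definitions (ES = mean change / SD of baseline scores; SRM = mean change / SD of change scores), which the text names but does not define, so treat them as an assumption. Because higher AQoLS-Japan scores indicate poorer HR-QoL, improvement yields negative values, and magnitude is judged on the absolute value using the 0.20/0.50/0.80 thresholds above; the "below small" label is an added assumption. The data and function names are hypothetical.

```python
from statistics import mean, stdev


def responsiveness(baseline: list[float], month3: list[float]) -> dict[str, float]:
    """Mean change, effect size (ES) and standardized response mean (SRM)."""
    change = [after - before for before, after in zip(baseline, month3)]
    mean_change = mean(change)
    return {
        "mean_change": mean_change,
        "effect_size": mean_change / stdev(baseline),  # scaled by baseline SD
        "srm": mean_change / stdev(change),            # scaled by SD of change
    }


def magnitude(value: float) -> str:
    """Label |ES| or |SRM| with the small/moderate/large thresholds."""
    size = abs(value)
    if size >= 0.80:
        return "large"
    if size >= 0.50:
        return "moderate"
    if size >= 0.20:
        return "small"
    return "below small"


# Hypothetical AQoLS-Japan total scores for four patients
baseline = [50.0, 60.0, 70.0, 80.0]
month3 = [40.0, 48.0, 62.0, 70.0]
result = responsiveness(baseline, month3)
for name in ("effect_size", "srm"):
    print(name, round(result[name], 2), magnitude(result[name]))
```

The SRM is usually larger than the ES when patients change by similar amounts, because the SD of change scores is then much smaller than the between-patient SD at baseline.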
All statistical tests were two-sided and conducted at the 5% significance level. Missing data were not imputed, except for the SF-36, for which missing item-level data were handled in accordance with standard scoring guidelines [13]. Analyses were performed using SAS® version 9.4.