The HUPE/UERJ research ethics committee approved the present study (CEP/HUPE no. 2674). The participants were informed of the study aims and methods before signing an informed consent document.
Patient selection
A total of 70 literate, physically active male and female patients who reported hip pain and were diagnosed with either FAI or GTPS (confirmed by radiograph, tomography, or magnetic resonance imaging) were selected. The participants were recruited from the Orthopedic Institute of Tijuca, a private hip outpatient clinic in Rio de Janeiro. Data were collected between December 2015 and June 2016.
Patients were excluded if they showed visual or cognitive disorders that impaired the reading and interpretation of the questions; hip arthrosis, characterized by a minimum joint space of < 1.5 mm and a severe limitation in hip range of motion [17]; or incomplete responses to the questionnaires on day 1 and 48 h after the first application.
Study protocol
The study protocol comprised completion of an identification form, which recorded the patients’ clinical characteristics, and the application of three quality-of-life assessments: the HOS-Brazil and the Brazilian-validated versions of the 12-Item Short-Form Health Survey (SF-12) and the Nonarthritic Hip Score (NAHS) [16, 18, 19]. The participants were instructed to complete all three questionnaires (1st application or test). Approximately 48 h later, they completed only the HOS-Brazil via e-mail (2nd application or retest).
HOS-Brazil
The HOS is a self-report questionnaire composed of 28 questions divided into two subscales: Activities of Daily Living (ADLs; 19 items) and Sports (nine items) [1, 16]. The total score of each subscale ranges from 0 to 100, where higher scores denote better hip function. The scores for each subscale were calculated separately [6].
The response options are the same for all 28 items, and each option is assigned a specific score; the item scores are summed at the end of the assessment. Responses to the 19 items of the ADL subscale are scored from 0 to 4, where 4 is “No difficulty at all” and 0 is “Unable to perform.” The scores of the individual items are added to obtain the total score. The number of items answered is multiplied by 4 to obtain the highest potential score; if the patient responds to all 19 items, the highest potential score is 76. The total score is divided by the highest potential score, and the resulting value is multiplied by 100 to express the score as a percentage. The nine items of the Sports subscale are scored in the same way, with a highest potential score of 36. A higher final score represents a better level of physical functioning with regard to both the ADL and Sports subscales [6].
In addition, the HOS includes two questions regarding how respondents rate their current level of functioning during ADLs and sports from 0 to 100 as well as one qualitative question asking them to rate their current level of functioning (normal, nearly normal, abnormal, or severely abnormal). The responses to these three questions are not considered in the HOS final score [20].
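The subscale scoring described above can be illustrated with a minimal Python sketch. The function name `hos_subscale_score` and the use of `None` for an unanswered item are assumptions made for illustration only; they are not part of the HOS or of the study's procedures.

```python
def hos_subscale_score(responses):
    """Return a HOS subscale score as a percentage (0 to 100).

    `responses` holds one entry per item: an integer from 0
    ("Unable to perform") to 4 ("No difficulty at all"), or None for an
    unanswered item (assumed convention, not taken from the source).
    """
    answered = [r for r in responses if r is not None]
    if not answered:
        raise ValueError("at least one item must be answered")
    if any(r not in (0, 1, 2, 3, 4) for r in answered):
        raise ValueError("item scores must be integers from 0 to 4")

    total = sum(answered)                  # sum of the item scores
    highest_potential = 4 * len(answered)  # 76 for 19 ADL items, 36 for 9 Sports items
    return 100.0 * total / highest_potential
```

For example, a respondent who marks "No difficulty at all" on all 19 ADL items obtains 76/76, i.e., a score of 100, whereas a respondent who scores 2 on every item obtains 38/76, i.e., a score of 50.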
Psychometric properties
To validate the psychometric properties of the HOS-Brazil, its reliability and validity were assessed according to the COSMIN checklist [13,14,15].
Reliability is a psychometric property that measures the degree to which a questionnaire is free from measurement error; furthermore, this process establishes whether the scores remain similar after repeated application to the same sample on a different occasion and without the influence of treatment. The reliability of the HOS-Brazil was assessed based on the following properties: internal consistency, intra-rater test-retest reliability, measurement error, and concordance [12, 14, 21].
Internal consistency assesses the ability of a set of questions to measure a similar concept. Test-retest reliability is a measurement property that assesses the ability of a questionnaire to yield similar results when the same respondents are assessed on a different occasion without undergoing any change in health. Concordance is related to systematic and random errors in the respondents’ scores that are not attributed to true changes in the construct to be measured [12, 14].
The reliability of the HOS-Brazil was investigated using a sample of 70 patients who responded to the questionnaire twice with a 48-h interval. In between applications, no new medications, therapies, or procedures were introduced that were likely to induce rapid changes to the patients’ clinical conditions. The interval between the test and retest was selected based on two criteria: it had to be long enough that the respondents would not remember their previous responses, yet brief enough that the patients’ clinical conditions would not change [12, 14].
The psychometric property of validity concerns the degree of instrument precision (i.e., whether it conserves the precision of the concept that it intends to measure). Validity assesses whether a new instrument retains the characteristics of the original version and is composed of three measurement properties: construct validity, content validity, and criterion validity [12, 14].
Construct validity corresponds to the degree to which an instrument’s scores are consistent with the hypotheses based on the assumption that the instrument measures the intended construct. Content validity estimates the degree to which the content of a measurement instrument is considered as an adequate reflection of the construct to be measured. Criterion validity determines the degree to which an instrument’s score adequately reflects the instrument considered as a “gold standard.” Criterion validity was not assessed in the present study because, according to the COSMIN formulators, no health assessment instrument is considered as a “gold standard” [12, 14, 21]. Therefore, the validity of the HOS-Brazil was assessed based on only construct and content validity.
Statistical analyses
The intraclass correlation coefficient (ICC) and Pearson’s correlation coefficient were used to analyze test-retest reliability [22, 23]. Internal consistency was assessed using Cronbach’s alpha [24, 25]. This statistical technique is based on the number of items and their homogeneity within a scale. A paired-sample Student’s t-test was used to compare the scores obtained for the first and second applications of the HOS-Brazil [23].
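As an illustration of how internal consistency depends on the number of items and their homogeneity, the following Python sketch computes Cronbach's alpha from its standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of the total score). The function name and the data layout (one list of item scores per respondent) are assumptions for illustration; the study itself performed the analysis in GraphPad Prism.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a respondents-by-items table.

    `item_scores` is a list of respondents, each a list of k item scores
    (hypothetical layout chosen for this sketch).
    """
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least two items and two respondents")

    # Sample variance of each item across respondents.
    item_columns = list(zip(*item_scores))
    sum_item_variances = sum(variance(col) for col in item_columns)

    # Sample variance of the summed (total) score across respondents.
    total_variance = variance([sum(row) for row in item_scores])

    return k / (k - 1) * (1 - sum_item_variances / total_variance)
```

When all items rank the respondents identically, the item variances account for only a small share of the total-score variance and alpha approaches 1; weakly related items push alpha toward 0.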
Measurement error was calculated based on the standard error of measurement (SEM) and minimal clinically important difference (MCID). The SEM was calculated by multiplying the standard deviation (SD) of the scores obtained for the first application of the HOS-Brazil by the square root of 1 minus the ICC, i.e., SEM = SD × √(1 − ICC). The MCID was calculated by multiplying the SEM by 1.96, which is the z-score corresponding to the 95% confidence level, and by the square root of 2, i.e., MCID = 1.96 × √2 × SEM [12, 26].
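The two measurement-error formulas can be written as a short Python sketch. The function names and the numeric inputs in the usage note are illustrative assumptions, not values from the study.

```python
import math


def standard_error_of_measurement(sd_first_application, icc):
    """SEM = SD of the first-application scores x sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd_first_application * math.sqrt(1.0 - icc)


def mcid_from_sem(sem, z=1.96):
    """MCID as defined in the text: z x sqrt(2) x SEM, with z = 1.96."""
    return z * math.sqrt(2.0) * sem
```

With a hypothetical SD of 10 points and an ICC of 0.91, the SEM would be 10 × √0.09 = 3 points and the resulting threshold would be approximately 8.3 points.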
Concordance was assessed based on the graphical representation of the measurement error between test and retest via Bland-Altman and concordance-survival plots [27,28,29]. The former plots the difference between the two assessments against their mean and quantifies concordance using limits of concordance, which are calculated from the mean and standard deviation of the differences. A linear regression line was fitted to the Bland-Altman plot to assess the presence of proportional bias [27, 28]. The independent variable (x-axis) was the mean of both assessments, and the dependent variable (y-axis) was the difference between both assessments. The null hypothesis stated that the slope of the regression line would not differ from zero. Proportional bias refers to a situation in which the difference between the two measurements is not constant across the full range of possible scores; it was considered present when the regression slope differed significantly from zero (p < 0.05). If the difference between the scores obtained at the two measurements is constant and independent of the scores’ magnitude, then it is described as a fixed bias [27, 28].
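A minimal Python sketch of the Bland-Altman quantities described above is given below. It rests on several assumptions that the text does not specify: the difference is taken as retest minus test, the limits of concordance use the conventional multiplier of 1.96, and the function name and returned keys are hypothetical. The sketch returns only the regression slope; testing that slope against zero requires a t-distribution with n − 2 degrees of freedom, which is left to statistical software.

```python
from statistics import mean, stdev


def bland_altman_summary(test, retest):
    """Bias, limits of concordance, and regression slope of a Bland-Altman plot."""
    if len(test) != len(retest) or len(test) < 3:
        raise ValueError("need paired scores from at least three respondents")

    diffs = [b - a for a, b in zip(test, retest)]        # y-axis: difference (assumed retest - test)
    means = [(a + b) / 2.0 for a, b in zip(test, retest)]  # x-axis: mean of both assessments

    bias = mean(diffs)   # mean difference (fixed bias)
    sd_diff = stdev(diffs)
    lower = bias - 1.96 * sd_diff  # assumed conventional 95% limits
    upper = bias + 1.96 * sd_diff

    # Ordinary least-squares slope of difference on mean; a slope that
    # differs from zero suggests proportional bias.
    mean_x = mean(means)
    sxx = sum((x - mean_x) ** 2 for x in means)
    sxy = sum((x - mean_x) * (d - bias) for x, d in zip(means, diffs))
    slope = sxy / sxx

    return {"bias": bias, "lower": lower, "upper": upper, "slope": slope}
```

A slope close to zero together with a nonzero mean difference corresponds to the fixed-bias case, in which the retest scores are shifted by a roughly constant amount regardless of score magnitude.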
Construct validity, both convergent and divergent, was assessed using Pearson’s correlation coefficient. The HOS-Brazil was compared with the NAHS and SF-12, which have already been validated for Brazilian Portuguese. The aim of the construct validity assessment was to investigate the convergence and divergence of the HOS-Brazil relative to the NAHS and SF-12 [23]. Content validity was assessed based on the presence of completed questionnaires scored as zero or 100 (maximum score), i.e., floor or ceiling effects, respectively [30].
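Both validity checks can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that each respondent's subscale score is already expressed on the 0 to 100 scale; it does not reproduce the GraphPad Prism analysis used in the study.

```python
import math


def pearson_r(x, y):
    """Pearson's correlation coefficient between two paired score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired lists with at least two values")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def floor_ceiling_percentages(scores, minimum=0.0, maximum=100.0):
    """Percentage of respondents at the lowest and highest possible score."""
    n = len(scores)
    floor_pct = 100.0 * sum(1 for s in scores if s == minimum) / n
    ceiling_pct = 100.0 * sum(1 for s in scores if s == maximum) / n
    return floor_pct, ceiling_pct
```

In this setting, `pearson_r` would be applied to the HOS-Brazil scores paired with the NAHS or SF-12 scores of the same respondents, and `floor_ceiling_percentages` to each HOS-Brazil subscale separately.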
A descriptive statistical analysis was performed to characterize the study sample. The psychometric properties of reliability and validity were analyzed using GraphPad Prism software, version 7.00 for Windows (GraphPad Software, La Jolla, California, USA). The significance level was set at 0.05.