Key Points

Valid self-reported outcome measures can contribute to a greater understanding of the impact of illness and injury on athletic performance.

There is currently no universally accepted self-reported outcome measure of athlete performance.

The Oslo Sports Trauma Research Centre (OSTRC) overuse injury questionnaire has the potential to be developed for use across different sports.

1 Background

Athlete-reported measures of health, wellbeing and performance can add meaningful information to that obtained from traditional physiological and biochemical performance measures [1, 2]. Research which includes the athlete’s perspective has contributed to a greater understanding of development and performance along with issues pertaining to athlete welfare and wellbeing [1, 3].

Validity and reliability are key characteristics of self-reported outcome measures [4], and questionnaires with evidence of validity and reliability in a general population, or even in a younger active population, have previously been used in the sporting setting. However, their length, narrow focus or lack of specificity to the athlete population has led to widespread use of study-specific questionnaires within sports medicine. While this reflects an attempt to reduce the burden on the athlete and increase relevance, it may compromise validity and reliability [2, 5].

The scores obtained from these self-reported measures should allow valid inferences to be made, including hypothesis testing, and they should therefore be assessed for validity in the particular population of interest. Evidence of validity accumulates over time from multiple studies [4, 5], so there is a need for consensus regarding the methods used to record and measure health-related incidents and their consequences for athletes [46]. Used together, values such as the smallest detectable change and the minimal important change describe change that can be distinguished from measurement error and is important to athletes [6].

Athletes are different from the general population [7, 8]: they have higher levels of physical function, psychological function and perceived health. Physical activity is often their main employment, so the morbidity consequences of injury and illness tend to be high [9]. Athletes may not manifest symptoms during activities of daily living, and existing outcome measures may not detect problems resulting from the demands of their training and competition [10]. The development of outcome measures that are specific to high-performance sport could therefore be important [9, 11–13].

The negative consequences of health problems include impairment, activity limitation and participation restrictions [11, 12]. Information regarding the prevalence and impact of health-related incidents is important to establish the burden of health problems and to inform appropriate prevention and health promotion strategies [13–17]. However, athletes may not always seek medical care or present as patients, so patient-reported outcome measures (PROMs) may not be sufficient to capture all available information [9, 11, 18–21]. Additional barriers to the use of self-reported outcome measures include the time needed to complete them and lack of accessibility [2, 22].

Measures that are easy to understand, administer, score and interpret are more likely to be useful to all stakeholders in sport, including athletes, clinicians, researchers, support staff, funding bodies and policy makers [9]. We aimed to review the evidence to determine which athlete-reported outcomes have been used to evaluate the impact of health problems on performance in sport. A secondary objective was to evaluate eligible outcome measures for evidence of validity and potential for future research.

2 Methods

In order to address the first objective we conducted a systematic review to answer the focused question: “Which athlete-reported outcome measures of performance have been used to measure the impact of injury and illness on performance in sport?”

Studies were included if they met the following eligibility criteria: (1) participants were currently competing, or had competed, at an elite level as able-bodied athletes, where elite level was defined as competing at Olympic, international, national or professional level [7]; (2) any outcome measure of performance, assessed in relation to illness, injury or a related intervention, was reported by the athlete, including functional and generic patient-reported outcome measures (PROMs), athlete diaries, interviews and patient satisfaction surveys; and (3) the study was published in English. Studies were excluded if: (1) participants were under the age of 16 years; (2) participants were competing at a recreational level; or (3) the study was undertaken with a heterogeneous sample (e.g. elite and non-elite, able-bodied and disabled, under and over 16 years of age) without reporting groups separately.
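The inclusion and exclusion rules above can be expressed as a single screening predicate. The following Python sketch is illustrative only: the `StudyGroup` record, its field names and the level labels are hypothetical and were not part of the review's protocol. It assumes each record describes the participant group actually analysed, which is the whole sample when elite and non-elite athletes were not reported separately.

```python
from dataclasses import dataclass

# Competition levels that count as "elite" under the review's definition.
ELITE_LEVELS = {"olympic", "international", "national", "professional"}


@dataclass
class StudyGroup:
    """Hypothetical screening record for the participant group a study reports on.

    If a study did not report subgroups separately, the record describes the
    whole (possibly heterogeneous) sample.
    """
    levels: set                          # competition levels present in the group
    min_age: int                         # age of the youngest participant, in years
    all_able_bodied: bool                # False if any participant has a disability
    athlete_reported_performance: bool   # performance outcome reported by the athlete
    published_in_english: bool


def is_eligible(group: StudyGroup) -> bool:
    """Return True only if every inclusion criterion holds and no exclusion applies."""
    only_elite = bool(group.levels) and group.levels <= ELITE_LEVELS
    return (
        only_elite                      # excludes recreational or mixed-level samples
        and group.min_age >= 16         # excludes participants under 16 years
        and group.all_able_bodied       # excludes mixed able-bodied/disabled samples
        and group.athlete_reported_performance
        and group.published_in_english
    )


if __name__ == "__main__":
    elite = StudyGroup({"professional"}, 18, True, True, True)
    mixed = StudyGroup({"national", "recreational"}, 18, True, True, True)
    print(is_eligible(elite), is_eligible(mixed))  # True False
```

Writing the criteria as one predicate makes the treatment of unseparated heterogeneous samples explicit: a mixed sample fails the same checks that exclude non-elite, younger or disabled participants.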

2.1 Search Methods for Identification of Studies

2.1.1 Electronic Searches

The MEDLINE (Ovid version), EMBASE, CINAHL Plus, SPORTDiscus with Full Text and Cochrane Library databases were searched up to 26 January 2016. A sensitive search strategy was initially devised in MEDLINE using search terms including self-report*, athlete* and patient reported outcome measure*, and was then used in subsequent searches. An overview of the search strategy is available on request.

2.1.2 Searching Other Resources

The reference lists of included studies were checked for other papers that might be suitable for inclusion.

2.2 Data Extraction

Titles and abstracts were screened for eligibility by one of the authors (JG). The full text of all potentially eligible studies was assessed for inclusion independently and in duplicate by two authors (JG and RGS), with disagreements resolved by discussion. Where resolution could not be achieved, a third author (IN), experienced in conducting systematic reviews, arbitrated. For included studies, data were extracted, again independently and in duplicate by two reviewers, using a specially designed form that had been piloted before use. Where information in a paper was unclear, the corresponding author was contacted for clarification. Extracted data included the type of study, the setting in which the study took place, the sport, the population, the injury or illness (regardless of need for medical attention) and details of the outcome measure.

2.3 Quality Assessment

To address our second objective, we assessed the evidence for the validity of the outcome measures. Aspects of validity were evaluated using a pre-defined checklist based on the taxonomy and criteria for evaluating the measurement properties of health status questionnaires proposed by Terwee et al. [23, 24].

2.3.1 Validity

There are many types of validity evidence [6], including face validity (the instrument appears to measure the intended construct), content validity and construct validity. We considered evidence for content validity to include a clear description of the measurement aim, the target population, the concepts being measured and item selection; in addition, the target population should have been involved in item selection. Evidence for internal consistency required factor analysis to have been applied, with a Cronbach's alpha value between 0.70 and 0.95. Ideally, there should be at least 50 participants and minimal floor or ceiling effects [21].
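As an illustration of the internal-consistency criterion, the sketch below computes Cronbach's alpha from a matrix of item scores using the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items, and then applies the 0.70 to 0.95 range and the 50-participant guideline. The function names are hypothetical, and the factor analysis that the checklist also requires is not shown.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of rows (one per athlete) of item scores.

    Uses sample variances throughout; the ratio is unaffected by the choice
    of denominator as long as it is applied consistently.
    """
    k = len(responses[0])  # number of questionnaire items
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two items")
    items = list(zip(*responses))                          # one tuple of scores per item
    sum_item_var = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in responses])  # variance of total scores
    return (k / (k - 1)) * (1 - sum_item_var / total_var)


def meets_internal_consistency(alpha, n_participants):
    """Checklist thresholds: 0.70 <= alpha <= 0.95 and at least 50 participants."""
    return 0.70 <= alpha <= 0.95 and n_participants >= 50


if __name__ == "__main__":
    toy = [[1, 2], [2, 1], [3, 3]]  # three athletes, two items (toy data)
    alpha = cronbach_alpha(toy)
    print(round(alpha, 3), meets_internal_consistency(alpha, len(toy)))  # 0.667 False
```

Alpha above 0.95 fails the check as well as alpha below 0.70, reflecting the view that very high values suggest redundant items rather than better measurement.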

Evidence for construct validity included reporting of values showing convergent validity (agreement with scores from other outcome measures that aim to assess similar constructs) and/or divergent validity (low correlation with scores from outcome measures that assess different constructs). Correlation coefficients such as Spearman's rho or Pearson's r are most commonly reported in construct validation studies [6]. There should be at least 50 participants, and at least 75% of the results should support previously defined hypotheses [21].
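The construct-validity criterion can be sketched as follows: compute Spearman's rho between the new measure and each comparison measure, test each result against a hypothesis stated in advance, and check whether at least 75% of hypotheses are supported in a sample of at least 50. The hypothesis format and the example thresholds in this Python sketch (rho of at least a given value for convergent comparisons, absolute rho below a given value for divergent ones) are hypothetical choices for illustration, not values taken from the checklist.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho computed as the Pearson correlation of the ranks."""
    return pearson_r(ranks(x), ranks(y))


def hypothesis_supported(rho, kind, threshold):
    """Hypothetical rule: convergent needs rho >= threshold, divergent |rho| < threshold."""
    if kind == "convergent":
        return rho >= threshold
    if kind == "divergent":
        return abs(rho) < threshold
    raise ValueError(f"unknown hypothesis type: {kind}")


def construct_validity_met(supported, n_participants):
    """At least 75% of predefined hypotheses supported, with at least 50 participants."""
    return sum(supported) / len(supported) >= 0.75 and n_participants >= 50
```

Spearman's rho is used here because questionnaire scores are often ordinal; Pearson's r could be substituted when scores are approximately interval-scaled.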

2.3.2 Reproducibility (Agreement and Reliability)

The outcome measure scores should reflect real change rather than change due to measurement error. Evidence for agreement required at least 50 participants and reporting of the standard error of measurement (SEM), together with the smallest detectable change (SDC) and the minimal important change (MIC), or convincing arguments that agreement is acceptable. Evidence for reliability required at least 50 participants and a reported intra-class correlation coefficient (ICC) of at least 0.70 [21].
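The reproducibility quantities named above can be related through commonly used formulas. This Python sketch, offered as one possible approach rather than the method used in the review, computes a two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1) in Shrout and Fleiss notation) from test-retest scores, derives the SEM as SD × √(1 − ICC) and the SDC for an individual as 1.96 × √2 × SEM. Other SEM formulations exist, and the function names are hypothetical.

```python
from math import sqrt


def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measures.

    scores: one row per athlete, one column per measurement occasion
    (e.g. [test, retest]).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between athletes
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between occasions
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols                  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def sem_from_icc(sd, icc):
    """Standard error of measurement from the score SD and a reliability coefficient."""
    return sd * sqrt(1 - icc)


def sdc_individual(sem):
    """Smallest detectable change for an individual at the 95% level."""
    return 1.96 * sqrt(2) * sem


def reliability_met(icc, n_participants):
    """Checklist thresholds: ICC >= 0.70 with at least 50 participants."""
    return icc >= 0.70 and n_participants >= 50
```

The absolute-agreement form penalises systematic differences between occasions (for example, every athlete scoring one point higher at retest), which is why it is often preferred for test-retest designs.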

2.3.3 Responsiveness (Longitudinal Validity)

Evidence that the outcome measurement instrument can detect clinically important change over time included correlation with scores from other outcome measures of the same construct. Interpretability was assessed from evidence that a score, or a change in score, was clinically meaningful, along with means and standard deviations (SDs) of scores for reference populations and participant subgroups. In addition, an MIC should have been defined [21].
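For interpretability, the sketch below summarises scores as means and SDs per subgroup and estimates an MIC with an anchor-based approach, taking the mean change score among athletes who rate themselves as minimally changed on an external anchor question. This is one common way of defining an MIC, not necessarily the one used in the studies reviewed; the anchor wording and function names are hypothetical. The final check, that the MIC exceeds the SDC, indicates whether an important change can be distinguished from measurement error.

```python
from statistics import mean, stdev


def subgroup_summary(scores_by_group):
    """Mean and SD of scores for each reference population or participant subgroup."""
    return {group: (mean(s), stdev(s)) for group, s in scores_by_group.items()}


def anchor_based_mic(change_scores, anchor_ratings, minimal="slightly better"):
    """Mean change score of athletes whose anchor rating marks a minimal change.

    The anchor category label is a hypothetical example of a global rating item.
    """
    selected = [c for c, a in zip(change_scores, anchor_ratings) if a == minimal]
    if not selected:
        raise ValueError("no athletes in the minimal-change anchor category")
    return mean(selected)


def important_change_detectable(mic, sdc):
    """True when the MIC is larger than the SDC."""
    return mic > sdc


if __name__ == "__main__":
    changes = [2, 5, 8, 1, 6]
    anchor = ["no change", "slightly better", "much better",
              "slightly better", "slightly better"]
    mic = anchor_based_mic(changes, anchor)
    print(mic, important_change_detectable(mic, sdc=3.5))
```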

2.4 Data Synthesis and Reporting

In keeping with the aims of the review, findings from eligible studies were combined narratively using tables of evidence. Results were synthesised according to the characteristics of the outcome measures and the evidence for their validity. Reporting of the review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [25].

3 Results

The search strategies yielded 6536 results. After removal of duplicates and titles clearly not relevant to the research question, 1358 articles were screened by title and abstract for full-text consideration. The full text of 159 articles was assessed against the eligibility criteria, and 21 articles were finally included [26–46]. Agreement on article inclusion was high (0.8). Reasons for exclusion of full-text studies are given in Fig. 1.
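The screening counts reported above can be checked with simple arithmetic, and inter-reviewer agreement on inclusion is often summarised with Cohen's kappa. The review reports agreement as 0.8 without naming the statistic, so treating it as kappa is an assumption. The Python sketch below, with hypothetical function names, shows how kappa corrects raw agreement for agreement expected by chance and how many records were excluded at each stage.

```python
def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two reviewers' categorical decisions on the same items."""
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    expected = sum(
        (rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories
    )
    return (observed - expected) / (1 - expected)


def exclusions_per_stage(counts):
    """Records removed between consecutive stages of a PRISMA flow."""
    return [earlier - later for earlier, later in zip(counts, counts[1:])]


if __name__ == "__main__":
    # Records identified, screened, assessed in full text and included.
    flow = [6536, 1358, 159, 21]
    print(exclusions_per_stage(flow))  # [5178, 1199, 138]
    a = ["in", "in", "out", "out"]
    b = ["in", "out", "out", "out"]
    print(cohens_kappa(a, b))  # 0.5
```

In the toy example the reviewers agree on three of four decisions (75%), but because half of that agreement would be expected by chance, kappa is only 0.5.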

Fig. 1 PRISMA flow chart

3.1 Characteristics of Included Studies

The studies came from a range of countries, most frequently the USA. Health problems in the categories of hip and groin, knee, shoulder, lower back, eyes, oral health, overuse injuries and illness were studied across 34 different sports (Table 1). Ten of the 20 outcome measures were used in evaluations of medical interventions [31, 32, 34, 36–38, 41, 43–45].

Table 1 Characteristics of the studies

3.2 Characteristics of the Athlete-Reported Outcome Measures

Athlete-reported outcome measures of performance included return to play, time to return to training/competition, level of competition, perception of performance compared to pre-injury, participation limitation, reduction in volume of training and impact on performance. A summary of the athlete-reported outcomes identified by the search is presented in Table 2.

Table 2 Characteristics of the self-reported outcome measures

3.3 Evaluation of Athlete-Reported Outcome Measures Used in Health Surveillance

Nine different athlete-reported outcome measures were used in ten observational (epidemiological or surveillance) studies [26, 27, 29, 30, 33, 35, 39, 40, 42, 46]. However, most were designed for use in individual studies without reference to evidence of validity. Self-reported information was used in one qualitative investigation of rugby players’ experiences following anterior cruciate ligament injury and repair, conducted over a period of rehabilitation and return to competition [28]. Quality criteria based on a pre-defined checklist [23, 24] were applied to the four questionnaires where the study had included a reference to evidence of validity of the outcome measure (Table 3).

Table 3 Validity checklist applied to eligible outcome measures

3.4 Athlete- Versus Patient-Reported Outcomes to Evaluate Medical Interventions

None of the athlete-reported outcomes of performance used in the evaluation of medical interventions cited an assessment of validity; seven were used in conjunction with PROMs, not all of which cited validity in a sporting population (Table 2). However, three of the functional PROMs identified by this review, the International Hip Outcome Tool (iHOT-12), the Copenhagen Hip and Groin Outcome Score (HAGOS) and the Victorian Institute of Sport Assessment-Patellar Tendinopathy (VISA-P), have evidence of validity in a younger active population [48–50]. The three generic PROMs used in the studies, the 12-Item Short Form Health Survey (SF-12), the 36-Item Short Form Health Survey (SF-36) and the EuroQol (EQ-5D) health questionnaire, have been reviewed elsewhere and found to have limited validity in a sport and recreation population [9]. The Hip Sports Activity Scale (HSAS), used to identify level of sporting activity (Table 4), has evidence of validity in young patients with hip disease [47].

Table 4 Potential utility as an athlete-reported outcome measure of performance

4 Discussion

Our key finding is that most of the athlete-reported outcome measures used to assess the impact of illness and injury on performance in sport identified in this review were developed for use in individual studies. No single study can validate an outcome measure; rather, evidence of the validity and reliability of the inferences drawn from the data accumulates over time with use in multiple studies, which in turn allows meaningful comparison across studies. One self-reported measure of the impact of oral health on performance was used in Olympic athletes and professional footballers, but its validity has been assessed in a general population only. Functional PROMs such as iHOT-12, HAGOS and VISA-P, developed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines, demonstrate validity in young, active populations but not specifically in elite sport groups (Table 4). The HSAS self-reported measure of athletic capability has evidence of validity and reliability and could be a useful model for a tool to report the level of competition of athletes in research studies. Although rich in qualitative information, athlete interviews require a substantial time commitment from both the athlete and the researcher, as does the use of multiple PROMs.

Consistent use of outcome measures with evidence of validity and reliability could help to quantify the burden of injury and illness and the relative risk in athletes across different sporting activities. Researchers should aim to identify and use outcome measures with evidence of validity in the target group in which they are to be used. Three athlete-reported outcome measures of impact on performance demonstrate validity in a high-performance athletic population: the OSTRC overuse injury questionnaire, the OSTRC questionnaire on health problems and the Kerlan-Jobe Orthopaedic Clinic (KJOC) shoulder and elbow questionnaire, although the KJOC questionnaire is specific to overhead throwing athletes. All three are short and straightforward to complete, and they measure impact on performance in terms of athlete-reported pain/symptoms, participation, and volume and quality of training/competition.

4.1 Strengths and Limitations of the Included Evidence

There are challenges to drawing robust conclusions from the included evidence. In general, the data regarding the outcome measures were drawn from their use in single studies, although one measure of the impact of oral health on performance was used in two separate studies. Few questionnaires reported development using a structured approach with involvement of the target population, which limits confidence in their validity.

4.2 Strengths and Limitations of the Review

4.2.1 Eligibility Criteria; Performance Level

To keep the review focused, we limited participants to high-performance, able-bodied athletes. As a result, several studies were excluded because they included participants with disabilities, participants under the age of 16 years or recreational sportspeople who could not be separated from the highest-level athletes.

4.2.2 Performance Versus Functional Outcomes

Return to play depends on a number of factors, most of which are outside an athlete's control. Included studies had to demonstrate that a self-reported outcome measure was used to evaluate the impact on performance in elite athletes. This resulted in the exclusion of studies that included heterogeneous samples and reported on the development of functional outcome measures using the COSMIN criteria, such as the Functional Assessment Scale for Acute Hamstring Injuries (FASH) [52] and the Victorian Institute of Sport Assessment-Achilles Tendinopathy (VISA-A).

4.2.3 Risk of Bias and Quality Assurance

We attempted to minimise bias by developing the protocol a priori and employing duplicate full-text screening and data abstraction. However, initial eligibility assessment of titles and abstracts was carried out by one researcher (JG), which might have introduced bias in study selection.

4.2.4 Comparison with Other Reviews

This review supports the findings of related reviews. One systematic review of PROMs used to assess the management of Achilles tendon rupture [53] applied COSMIN criteria to 17 region-specific and condition-specific outcome measures; the authors found that only four were presented in articles that referenced development and/or validation of the outcome measure, and of these only one was developed using recognised methodology for outcome measure development. A systematic review of instruments used to assess outcomes of sport and active recreation injury [9] listed seven different health status and health-related quality-of-life measures, five different functional outcome measures and three physical activity measures; the authors stated that none had been specifically designed to measure injury outcomes in a general sport and active recreation population. One recent study of low back pain in international-level rowers [54] recommended using the OSTRC overuse injury questionnaire, supporting its potential for use across different sports.

5 Conclusion

Within the limits of this review, there is currently no universally accepted athlete-reported outcome measure of the impact of injury/illness on performance in sport. Most questionnaires were designed for individual studies, and evidence to support their validity, reliability and responsiveness has not been reported. The KJOC shoulder and elbow questionnaire has evidence to support its validity, reliability and responsiveness but is specific to professional baseball players. Consistent use of self-reported outcome measures with evidence of validity, reliability and responsiveness would lead to more reliable and comparable evidence. Despite some limitations, the OSTRC questionnaire on overuse injuries provides a model for measuring athlete-reported impact on performance across a variety of sports, and it could be adapted to evaluate the impact of any pre-defined health problem on athletic performance. The addition of items related to impact on quality of life could add value in terms of understanding the negative consequences of injury and illness in sport.