Self-Reported Outcome Measures of the Impact of Injury and Illness on Athlete Performance: A Systematic Review

Background Self-reported outcome measures of athlete health, wellbeing and performance add information to that obtained from clinical measures. However, valid, universally accepted outcome measures are required. Objective To determine which athlete-reported outcome measures of performance have been used to measure the impact of injury and illness on performance in sport and assess evidence to support their validity. Methods The authors searched Ovid MEDLINE, Ovid EMBASE, CINAHL Plus, SPORTDiscus with Full Text and Cochrane library to January 2016. Predefined inclusion and exclusion criteria were applied and papers included if an outcome measure of performance, assessed in relation to illness, injury or a related intervention, was reported by an elite, adult, able-bodied athlete. A checklist was used to assess eligible outcome measures for aspects of validity. Reporting of this study was guided by PRISMA guidelines for systematic reviews. Results Twenty athlete-reported outcome measures in 21 papers were identified. Of these 20, only four cited validation. Of these four, three reported evidence to support validity in elite athlete groups as defined by the predetermined checklist. Fifteen patient-reported outcome measures were identified, of which four demonstrated validity in young athletic populations. Conclusions Most athlete-reported outcome measures of performance have been designed for individual studies with no reported assessment of validity. Despite some limitations, the Oslo Sports Trauma Centre overuse injury questionnaire demonstrates validity and potential utility to investigate the self-reported impact of pre-defined conditions on athletic performance across different sports.


Background
Athlete-reported measures of health, wellbeing and performance can add meaningful information to that obtained from traditional physiological and biochemical performance measures [1,2]. Research which includes the athlete's perspective has contributed to a greater understanding of development and performance along with issues pertaining to athlete welfare and wellbeing [1,3].
Validity and reliability are key characteristics of self-reported outcome measures [4], and questionnaires with evidence of validity and reliability in a general population or even a younger active population have previously been used in the sporting setting. However, their length, narrow focus or lack of specificity to the athlete population has led to widespread use of study-specific questionnaires within sports medicine. While this reflects an attempt to reduce the burden on the athlete and increase the relevance, it may compromise validity and reliability [2,5]. The scores obtained from these self-reported measures should allow valid inferences to be made, including hypothesis-testing; therefore, they should be assessed for validity in the particular population of interest. Evidence of validity accumulates over time from multiple studies [4,5]; therefore, there is a need for consensus regarding the methods used to record and measure health-related incidents and their consequences for athletes [4-6]. Used together these values describe change that can be distinguished from measurement error and is important to athletes [6].
Athletes are different from the general population [7,8]. They have higher levels of physical function, psychological function and perceived health. Physical activity is often their main employment; therefore, the morbidity consequences of injury and illness tend to be high [9]. Athletes may not manifest symptoms during activities of daily living, and existing outcome measures may not detect problems resulting from the demands of their training and competition [10]. The development of outcome measures that are specific to high performance sport could therefore be important [9,11-13].
The negative consequences of health problems include impairment, activity limitation and participation restrictions [11,12]. Information regarding the prevalence and impact of health-related incidents is important to establish the burden of health problems and inform appropriate preventive and health promotion strategies [13-17]. However, athletes may not always seek medical care or present as patients; therefore, patient-reported outcome measures (PROMs) may not be sufficient to capture all available information [9,11,18-21]. Additional barriers to the use of self-reported outcome measures include time to complete and lack of accessibility [2,22].
Measures that are easy to understand, administer, score and interpret are more likely to be useful to all stakeholders in sport, including athletes, clinicians, researchers, support staff, funding bodies and policy makers [9]. We aimed to review the evidence to determine which athlete-reported outcomes have been used to evaluate the impact of health problems on performance in sport. A secondary objective was to evaluate eligible outcome measures for evidence of validity and potential for future research.

Methods
In order to address the first objective we conducted a systematic review to answer the focused question: "Which athlete-reported outcome measures of performance have been used to measure the impact of injury and illness on performance in sport?" Studies were included if they met the following eligibility criteria: (1) participants were currently or had been competing at an elite level as able-bodied athletes, with elite level defined as competitive at Olympic, international, national or professional level [7]; (2) any outcome measure of performance, assessed in relation to illness, injury or a related intervention, was reported by the athlete, including functional and generic PROMs, athlete diaries, interviews and patient satisfaction surveys; (3) the study was published in English. Studies were excluded from the review based on the following criteria: (1) participants were under the age of 16 years; (2) participants were competing at a recreational level; (3) the study was undertaken with a heterogeneous sample (e.g. elite and non-elite, able-bodied and disabled, under and over age 16 years) without reporting groups separately.

Electronic Searches
The databases of MEDLINE (Ovid version), EMBASE, CINAHL Plus, SPORTDiscus with Full Text, and Cochrane library were searched to 26 January 2016. A sensitive search strategy was devised initially in MEDLINE, including the following search terms: self-report*, athlete*, patient reported outcome measure*; it was then used in subsequent searches. An overview of the search strategy is available on request.

Searching Other Resources
The reference lists of included studies were checked for other papers that might be suitable for inclusion.

Data Extraction
Titles and abstracts were screened for eligibility by one of the authors (JG). The full text of all potentially eligible studies was assessed for inclusion by two authors in duplicate and independently (JG and RGS), resolving disagreements by discussion. Where resolution could not be achieved, a third author, experienced in conducting systematic reviews, arbitrated (IN). For included studies, data were extracted using a specially designed form (piloted before use) also in duplicate and independently by two reviewers. Where information in a paper was unclear, the corresponding author was contacted for clarification. Data extraction related to type of study, setting where the study took place, sport, population, injury or illness regardless of need for medical attention and details of the outcome measure.

Quality Assessment
In order to address our second objective, validity of development of outcome measures was assessed. Aspects of validity were evaluated using a pre-defined checklist based on the taxonomy and criteria proposed by Terwee et al. [23,24] for evaluation of measurement properties of health status questionnaires.

Validity
There are many types of validity evidence [6] including face validity (the instrument appears to measure the intended construct), content and construct validity. We considered evidence for content validity to include a clear description of the measurement aim, the target population, the concepts being measured and item selection. In addition, the target population should have been involved in item selection. Evidence for internal consistency required factor analysis to be applied, with a Cronbach's alpha value between 0.7 and 0.95. Ideally there should be at least 50 participants and minimal floor or ceiling effects [21].
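As an illustration of the internal consistency criterion, the following minimal Python sketch computes Cronbach's alpha from item-level scores and checks it against the 0.7 to 0.95 range and the guide figure of 50 participants. The function names and the data layout are assumptions made for illustration; the review did not publish code, and the factor analysis step is not covered here.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a questionnaire.

    items: one list per questionnaire item, each holding one score per
    respondent, with respondents in the same order in every list.
    """
    k = len(items)
    if k < 2:
        raise ValueError("need at least two items")
    n = len(items[0])
    if n < 2 or any(len(scores) != n for scores in items):
        raise ValueError("each item needs the same number (>= 2) of scores")
    # Sum of the sample variances of the individual items.
    item_variance_sum = sum(variance(scores) for scores in items)
    # Sample variance of each respondent's total score across items.
    totals = [sum(row) for row in zip(*items)]
    total_variance = variance(totals)
    if total_variance == 0:
        raise ValueError("total scores do not vary between respondents")
    return (k / (k - 1)) * (1 - item_variance_sum / total_variance)


def meets_internal_consistency(alpha, n_participants,
                               low=0.70, high=0.95, min_participants=50):
    """True when alpha lies in [low, high] and the sample is large enough."""
    return low <= alpha <= high and n_participants >= min_participants
```

The upper bound is included in the check because a very high alpha can indicate redundant items rather than better measurement, which is why the criterion is a range and not a simple minimum.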
Evidence for construct validity included reporting of values to show convergent validity (agreement in scores from other outcome measures which aim to assess similar constructs) and/or divergent validity (low correlation with scores from outcome measures which assess different constructs). Correlation coefficients such as the Spearman rho or Pearson r are most commonly reported in construct validation studies [6]. There should be at least 50 participants and at least 75% of the results should support a previously defined hypothesis [21].
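The construct validity criterion can be sketched in the same way. The hypothetical helpers below compute a Pearson correlation between scores from two instruments and then apply the two thresholds named above (at least 50 participants and at least 75% of predefined hypotheses supported). How each hypothesis is judged as supported is left to the analyst; the names and the list-of-booleans input are assumptions for illustration, not part of the checklist.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with >= 2 scores")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant scores")
    return sxy / math.sqrt(sxx * syy)


def construct_validity_supported(hypothesis_results, n_participants,
                                 min_share=0.75, min_participants=50):
    """hypothesis_results: one boolean per predefined hypothesis
    (True when the observed result supported that hypothesis)."""
    if not hypothesis_results:
        return False  # no predefined hypotheses, so nothing to support
    supported = sum(1 for ok in hypothesis_results if ok)
    share = supported / len(hypothesis_results)
    return share >= min_share and n_participants >= min_participants
```

A convergent hypothesis might be recorded as supported when `pearson_r` exceeds a threshold fixed in advance, and a divergent hypothesis when it falls below one; those thresholds are a study design decision and are not specified here.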

Reproducibility (Agreement and Reliability)
The outcome measure scores should reflect changes where real change has occurred rather than changes due to measurement error. Evidence for agreement included at least 50 participants and the standard error of measurement (SEM) to be reported along with smallest detectable change (SDC) and minimal important change (MIC) or convincing arguments that agreement is acceptable. Evidence for reliability required at least 50 participants and an intra-class correlation coefficient (ICC) of at least 0.7 to be reported [21].
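The checklist names the SEM, SDC and MIC without giving formulas. The sketch below uses one commonly cited formulation, assumed here rather than taken from the review: SEM = SD x sqrt(1 - ICC) and SDC = 1.96 x sqrt(2) x SEM. It also flags whether the MIC exceeds the SDC, which is one common reading of "change that can be distinguished from measurement error". All function and key names are hypothetical.

```python
import math


def sem_from_icc(sd, icc):
    """Standard error of measurement from the score SD and the ICC."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC is expected to lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def smallest_detectable_change(sem, z=1.96):
    """Individual-level SDC; sqrt(2) reflects two measurements per person."""
    return z * math.sqrt(2.0) * sem


def agreement_summary(sd, icc, mic, n_participants,
                      min_icc=0.70, min_participants=50):
    """Collect the reproducibility quantities described in the checklist."""
    sem = sem_from_icc(sd, icc)
    sdc = smallest_detectable_change(sem)
    enough = n_participants >= min_participants
    return {
        "sem": sem,
        "sdc": sdc,
        # Reliability criterion: ICC of at least 0.7 with enough participants.
        "reliability_ok": icc >= min_icc and enough,
        # Important change is only detectable if it exceeds measurement error.
        "mic_exceeds_sdc": mic > sdc,
    }
```

For example, `agreement_summary(sd=10, icc=0.75, mic=15, n_participants=60)` would report whether a 15-point change, taken here as a purely hypothetical MIC, is larger than the change attributable to measurement error.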

Responsiveness (Longitudinal Validity)
Evidence for the outcome measurement instrument to detect clinically important change over time included correlation with scores from other outcome measures of the same construct. Interpretability was assessed from evidence that a (change in) score was clinically meaningful along with means and standard deviations (SDs) of scores of reference populations and participant subgroups. In addition an MIC should be defined [21].

Data Synthesis and Reporting
In keeping with the aims of the review, findings from eligible studies were combined narratively using tables of evidence. The characteristics of the outcomes were used to synthesise results as well as validity outcomes. Reporting of the review was guided by the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guideline [25].

Results
The adopted search strategies yielded 6536 results. After removal of duplicates and titles clearly not relevant to the research question, 1358 articles were further screened by title and abstract for consideration in full text screening. The full text of 159 articles was assessed against eligibility criteria and 21 articles were finally included. Agreement on article inclusion was high (0.8). Reasons for exclusion of full text studies are given in Fig. 1.
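The screening counts can be checked for internal consistency with a few lines of Python. The sketch below uses only the numbers reported in the Results and in Fig. 1; the variable and function names are illustrative assumptions.

```python
# Counts as reported in the Results and in Fig. 1 of the review.
FULL_TEXT_SCREENED = 159
INCLUDED = 21

EXCLUSION_REASONS = {
    "participants at lower level of performance": 46,
    "outcome not athlete reported": 35,
    "participants under 16 years": 31,
    "outcome measures were neuropsychological factors": 10,
    "impact of injury/illness on performance in dance (not sport)": 4,
    "impact of anxiety on performance in sport": 3,
    "impact of sleep on performance in sport": 3,
    "impact of personality on performance in sport": 1,
    "impact of psychological skills on performance in sport": 1,
    "outcome measure was quality of life": 1,
    "outcome measure was failure-based depression": 1,
    "outcome measure was emotional distress": 1,
    "participants included para-athletes": 1,
}


def check_flow(screened, included, reasons):
    """Return the number excluded and whether the reasons add up to it."""
    excluded = screened - included
    return excluded, sum(reasons.values()) == excluded


excluded, consistent = check_flow(FULL_TEXT_SCREENED, INCLUDED, EXCLUSION_REASONS)
print(excluded, consistent)  # prints "138 True"
```

The 138 exclusions implied by 159 articles screened and 21 included match the sum of the individual exclusion reasons.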

Characteristics of Included Studies
The studies represented a range of countries, with the USA being the most frequent. Seven categories of health problems including hip and groin, knee, shoulder, lower back, eyes, oral health, overuse injuries and illness were represented across 34 different sports (Table 1). Ten of the 20 outcome measures were used in evaluations of medical interventions [31, 32, 34, 36-38, 41, 43-45].

Characteristics of the Athlete-Reported Outcome Measures
Athlete-reported outcome measures of performance included return to play, time to return to training/competition, level of competition, perception of performance compared to pre-injury, participation limitation, reduction in volume of training and impact on performance. A summary of the athlete-reported outcomes identified by the search is presented in Table 2.

Evaluation of Athlete-Reported Outcome Measures Used in Health Surveillance
Nine different athlete-reported outcome measures were used in ten observational (epidemiological or surveillance) studies [26,27,29,30,33,35,39,40,42,46]. However, most were designed for use in individual studies without reference to evidence of validity. Self-reported information was used in one qualitative investigation of rugby players' experiences following anterior cruciate ligament injury and repair, conducted over a period of rehabilitation and return to competition [28]. Quality criteria based on a pre-defined checklist [23,24] were applied to the four questionnaires where the study had included a reference to evidence of validity of the outcome measure (Table 3).
Fig. 1 Reasons for exclusion of full text studies. Articles screened by full text, n = 159; articles excluded following full text screening, n = 138. Reasons: included participants at lower level of performance, n = 46; outcome not athlete reported, n = 35; included participants <16 years, n = 31; outcome measures = neuropsychological factors, n = 10; impact of injury/illness on performance in dance (not sport), n = 4; impact of anxiety on performance in sport, n = 3; impact of sleep on performance in sport, n = 3; impact of personality on performance in sport, n = 1; impact of psychological skills on performance in sport, n = 1; outcome measure = quality of life, n = 1; outcome measure = failure-based depression, n = 1; outcome measure = emotional distress, n = 1; participants included para-athletes, n = 1.

Athlete-Versus Patient-Reported Outcomes to Evaluate Medical Interventions
None of the athlete-reported outcomes of performance used in evaluation of medical interventions cited assessment of validity; seven were used in conjunction with PROMs, not all of which cited validity in a sporting population (Table 2). However, three of the functional PROMs identified by this review, namely the International Hip Outcome Tool (iHOT-12), Copenhagen Hip and Groin Outcome Score (HAGOS) and Victorian Institute of Sport Assessment-Patellar Tendinopathy (VISA-P), have evidence of validity in a younger active population [48-50]. The three generic PROMs used in the studies, namely the Short Form (12) Health Survey (SF-12), Short Form (36) Health Survey (SF-36) and EuroQol (EQ-5D) Health Questionnaire, have been reviewed by another author and found to have limited validity in a sport and recreation population [9]. The Hip Sports Activity Scale (HSAS) used to identify level of sporting activity (Table 4) has evidence of validity in young patients with hip disease [47].

Discussion
Our key finding is that most of the athlete-reported outcome measures identified in this review for assessing the impact of illness and injury on performance in sport were developed for use in individual studies. There can never be a single study which validates an outcome measure; however, evidence of validity and reliability of the inferences drawn from the data accumulates over time with use in multiple studies, thereby allowing meaningful comparison across studies. One oral health self-reported measure of impact on performance was used in Olympic athletes and professional footballers, but evidence of its validity has been assessed in a general population only. Functional PROMs such as iHOT-12, HAGOS and VISA-P, developed using the COnsensus-based Standards for the selection of health Measurement INstruments (COSMIN) guidelines, demonstrate validity in young, active populations but not specifically in elite sport groups (Table 4). The HSAS self-reported measure of athletic capability has evidence of validity and reliability and could be a useful model for a tool to report the level of competition of athletes in research studies. Although rich in qualitative information, athlete interviews require a substantial time commitment from both the athlete and the researcher, as does the use of multiple PROMs. Consistent use of outcome measures with evidence of validity and reliability could help to quantify the burden of injury and illness and relative risk in athletes across different sporting activities.
Researchers should aim to identify and use outcome measures with evidence of validity in the target group in which they are to be used. Three athlete-reported outcome measures of impact on performance demonstrate validity in a high performance athletic population: the OSTRC overuse injury questionnaire, the OSTRC questionnaire on health problems and the KJOC shoulder and elbow questionnaire. However, the KJOC questionnaire is specific to overhead throwing athletes. All are short and straightforward to complete and measure impact on performance in terms of athlete-reported pain/symptoms, participation, volume and quality of training/competition.

Strengths and Limitations of the Included Evidence
There are challenges to drawing robust conclusions from the included evidence. In general, the data regarding the outcome measures were drawn from their use in single studies, although one measure of the impact of oral health on performance was used in two separate studies. Few questionnaires reported development using a structured approach and involvement of the target population, limiting their validity.

Eligibility Criteria; Performance Level
To keep the review focused, we restricted participants in the included studies to high performance, able-bodied athletes. This focus resulted in several studies being excluded because they included participants with disabilities, participants under the age of 16 years or recreational sports people who could not be separated from the highest level athletes.

Performance Versus Functional Outcomes
Return to play is dependent on a number of factors, most of which are outside an athlete's control. Included studies had to demonstrate that a self-reported outcome measure was used to evaluate the impact upon performance in elite athletes. This resulted in exclusion of studies which included heterogeneous samples and reported on the development of functional outcome measures using the COSMIN criteria, such as the Functional Assessment Scale for Acute Hamstring Injuries (FASH) [52] and Victorian Institute of Sport Assessment-Achilles Tendinopathy (VISA-A).

Risk of Bias and Quality Assurance
We attempted to minimise bias by developing the protocol a priori and employing duplicate full-text screening and data abstraction. However, initial eligibility assessment of titles and abstracts was carried out by one researcher (JG), which might have introduced bias in study selection.

Comparison with Other Reviews
This review supports the findings of related reviews. One systematic review of PROMs used to assess Achilles tendon rupture management [53] applied COSMIN criteria to 17 region-specific and condition-specific outcome measures; the authors found only four were presented in articles that referenced development and/or validation of that outcome measure and of these only one was developed using recognised methodology for outcome measure development. A systematic review of instruments used to assess outcomes of sport and active recreation injury [9] listed seven different health status and health-related quality-of-life measures, five different functional outcome measures and three physical activity measures; the authors stated that none have been specifically designed to measure injury outcomes in a general sport and active recreation population. One recent study of low back pain in international level rowers [54] recommended using the OSTRC overuse injury questionnaire, demonstrating its potential for use across all sports.

Conclusion
Within the limits of this review there is currently no universally accepted athlete-reported outcome measure of the impact of injury/illness on performance in sport. Most questionnaires were designed for individual studies and evidence to support their validity, reliability and responsiveness has not been reported. The KJOC shoulder and elbow questionnaire has evidence to support its validity, reliability and responsiveness but is specific to professional baseball players. Consistent use of self-reported outcome measures with evidence of validity, reliability and responsiveness would lead to more reliable and comparable evidence. Despite some limitations, as a potential tool to measure athlete-reported impact on performance across a variety of sports, the OSTRC questionnaire on overuse injuries forms a model that could be adapted to evaluate the impact of any pre-defined health problem on athletic performance. The addition of items related to impact on quality of life could add value in terms of understanding the negative consequences of injury and illness in sport.
Author contributions Ian Needleman conceived the study. Robbie Lumsden assisted with formulating the systematic search strategy. Julie Gallagher and Ruben Garcia Sanchez were responsible for duplicate screening and data extraction. Julie Gallagher prepared the first draft of the study protocol and the manuscript and was chiefly responsible for the conduct of the review. Ian Needleman, Paul Ashley and Julie Gallagher contributed to the final draft of the protocol and manuscript.

Compliance with Ethical Standards
Funding The research project that resulted in this review was funded by an investigator-led grant from GlaxoSmithKline (Award Number 157871) and an Impact Award from University College London.
Conflict of interest Julie Gallagher, Paul Ashley, Ruben Garcia Sanchez, Robbie Lumsden and Ian Needleman declare that they have no conflicts of interest relevant to the content of this review.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.