Key Points

  • Strength assessment, performance test analysis and sport-specific skills evaluation can be considered helpful in RTP assessment in athletes with long-standing adductor-related groin pain (LARGP).

  • Imaging methods are not considered useful in the RTP decision-making process.

  • Agreement established between experts in groin pain can assist clinicians in RTP decision making.

Background

Groin pain poses a major challenge for all those involved in the diagnosis, rehabilitation and physical preparation of athletes at all levels, owing to the complex anatomy of the groin region and the poor understanding of the adverse mechanisms that predispose the athlete to injury [1, 2].

Studies in professional sports have found groin injury to be the fourth most common injury in soccer [3] and the third most common injury in Australian rules football [4]; it has also been shown to have a high prevalence in ice hockey [5] and rugby [6].

Long-standing adductor-related groin pain (LARGP) is a persistent clinical condition with gradual or sudden onset characterised by adductor tenderness and pain on resisted adduction testing [7]. It is a frequent complaint in athletes involved in multidirectional field sports that require multiplanar movement patterns, such as change of direction (COD) [8, 9], high-speed sprinting [4, 10] and kicking [11].

In accordance with the Strategic Assessment of Risk and Risk Tolerance (StARRT) framework [12, 13], return to play (RTP) decision making is a complex process based on the evaluation of health and activity risks, but it is also influenced by the assessment of risk tolerance modifiers.

Combining information from biological, psychological and social standpoints can help all RTP decision-makers (clinician, physiotherapist, coach) to make optimal and shared decisions [14].

Nevertheless, RTP criteria for many common injuries like groin pain are not based on solid scientific evidence due to the lack of clarity and consensus on the term ‘return to play’ [14]. So far, no studies have specified which criteria should be assessed by clinicians to allow an athlete suffering from groin pain a timely and full RTP.

The aims of this Delphi study were to reach an agreement between a panel of experts, based on opinion and practical experience, and suggest potential criteria that could be taken into consideration by clinicians in the RTP decision-making process in athletes suffering from LARGP.

Methods

Purpose and Rationale

The Delphi is a group facilitation technique that seeks to obtain consensus on the opinion of “experts” through a series of structured questionnaires commonly referred to as “rounds” [15]. The Delphi is therefore an interactive multistage process designed to combine opinion into group consensus [16, 17]. The initial questionnaire may also collect qualitative comments, which are fed back to the participants in a quantitative form through a second questionnaire [15].

This scientific method has been effectively used in Sports Medicine research [18,19,20,21].

The whole process lasted from February 2020 to July 2020. A total of 3 rounds were carried out using the Google Forms platform (https://www.google.com/intl/it/forms/about/).

Steering Committee

The Delphi survey was created by a 5-member steering committee consisting of four sports physiotherapists and one sports physician, all with a background in clinical research and elite sport.

Expert Panel and Procedure

In accordance with previously published Delphi studies [21,22,23], only healthcare practitioners meeting the following inclusion criteria were deemed eligible to participate in the study: (1) 2 or more peer-reviewed publications in the field of groin pain in athletes and (2) experience in scientific methodology and/or (3) status as a clinical expert and designated member of the conference organising committee and (4) use of evidence to guide their clinical decisions and (5) sufficient knowledge of the English language.

According to the “snowballing method” [24], each expert contacted could, in turn, invite 3 additional experts, who were then screened against the inclusion criteria.

The experts (26 directly invited, a further 14 suggested by experts; n = 40) were contacted via e-mail, asked whether they were willing to participate in the study and given information about its aim and methodology. Participants were given 1 month to complete the questionnaire in each round, with e-mail reminders sent to non-responders after 10 and 20 days.

Round 1

Round 1 was the only round prepared before the beginning of the study because each subsequent round was dependent on the responses from the previous one.

A written explanation of the experimental procedure was provided to each individual; this included the aims of the study, the experimental procedures to be used and a clear explanation of the definitions of LARGP [7] and RTP [25] adopted. Individuals then provided written, informed consent before participating in the study.

The first round was divided into 2 parts. The first investigated the “demographic” characteristics of the participants: profession, affiliation, years of experience in the field of sports medicine, number of athletes with groin pain treated per year and number of peer-reviewed studies published on groin pain (Table 1).

Table 1 Demographic characteristics of participants

The second part included 38 questions divided into 9 different sections (palpation, flexibility, strength, patient-reported outcome measures (PROMs), imaging, intersegmental control, performance tests, sport specific skills, training load). All sections were selected based on the literature, with the objective of investigating the clinical assessment of each researcher in the evaluation of RTP.

During the first round, both closed-ended and open-ended questions were used.

The sentence “Do you use/analyse “X” when evaluating RTP in LARGP?” was the first closed-ended question put at the beginning of each section.

Each section also contained open-ended questions, giving respondents the opportunity to justify their answers and/or indicate aspects not covered by the question asked.

In accordance with Joyner et al. [26], the answers to each open-ended question were divided into categories. In order to reduce categorisation bias, responses were independently coded by 2 different researchers (MZ and MC) and compared only at the end to discuss the final categorisation [20]. The 3 categories with the highest consensus were then included in the second round and submitted to the other researchers [26].
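The selection of the 3 highest-ranked categories can be illustrated with a minimal sketch. It assumes that "highest consensus" means the categories mentioned by the largest share of respondents, which the text does not state explicitly; the function name `top_categories` and the example answers are hypothetical and are not study data.

```python
from collections import Counter


def top_categories(coded_answers, k=3):
    """Return the k most frequently mentioned categories.

    coded_answers: one list of category labels per respondent
    (already coded by the researchers). Each respondent counts at
    most once per category. Returns (category, percent) pairs.
    """
    n_respondents = len(coded_answers)
    counts = Counter(cat for cats in coded_answers for cat in set(cats))
    return [(cat, count * 100 / n_respondents)
            for cat, count in counts.most_common(k)]


# Hypothetical coded answers (assumption, for illustration only).
coded = [
    ["squeeze test", "side-lying eccentric"],
    ["squeeze test"],
    ["squeeze test", "dynamometry"],
    ["dynamometry", "side-lying eccentric"],
    ["squeeze test", "palpation"],
]

for category, percent in top_categories(coded):
    print(f"{category}: {percent:.1f}%")
```

Counting each respondent once per category keeps the percentages comparable with the per-expert percentages reported for the closed-ended questions.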

Round 2

At the beginning of the second round, the categories that reached the cut-off value were listed and the aim of the study was explained again. The round 2 questionnaire investigated only the categories that reached the cut-off value. The first question in each section asked whether or not the researcher considered the category concerned as an RTP criterion. The following questions were formulated based on the answers given in the round 1 questionnaire, feedback and suggestions, in order to go into more detail on each of the categories.

Round 3

The round 3 questionnaire investigated only the answers that had reached the cut-off value in round 2.

For all items that reached the cut-off value in round 2, researchers were asked to express their degree of consensus using a Likert scale [27] with values from 1 to 5 (Strongly Disagree, Disagree, Neutral, Agree, Strongly Agree).

At the end, participants were given the opportunity to share comments on the whole Delphi process.

Data Analysis

Data from all Delphi rounds were collected using Google online forms and exported to IBM SPSS V.21 for statistical analysis. Two of the steering committee members independently performed content analyses, and a third investigator was consulted whenever there was disagreement or ambiguity around tagging, categorising and interpreting the responses.

For closed-ended questions (yes/no options or specific items to be selected from a list), the frequency of each response was recorded and converted to a percentage (%). For open-ended questions, following recommendations by Côté et al. [28], qualitative data (i.e., expert answers, justifications and suggestions) were coded, listed and compared in order to produce clusters of similar concepts that adequately represented the information received from experts. If analogous responses reached the ≥ 70% threshold [18, 20, 23, 29, 30], that particular item/criterion was considered to have reached consensus among the experts and was retained and elaborated on in further rounds, while concepts not reaching consensus were discarded. Content analysis was used throughout rounds 1 and 2.

For round 3, ratings for each coded item (1–5) were expressed as means with standard deviation (SD). Consensus between participants was measured using the coefficient of variation (CV%) and percentage agreement (%AGR) [31]: CV% is a measure of dispersion, and %AGR was defined as the percentage of responses falling within the top two categories of the 5-point scale (Agree and Strongly Agree).
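The quantities described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the SPSS procedure used in the study: the function names and the example data are hypothetical, and the sample SD (n − 1 denominator) is assumed because the text does not say which SD was used.

```python
from statistics import mean, stdev


def percent_yes(answers):
    """Percentage of 'yes' answers to a closed-ended question."""
    return sum(1 for a in answers if a == "yes") * 100 / len(answers)


def reaches_consensus(answers, threshold=70.0):
    """Apply the >= 70% cut-off used in rounds 1 and 2."""
    return percent_yes(answers) >= threshold


def likert_summary(ratings):
    """Summarise 1-5 Likert ratings for one round 3 item."""
    m = mean(ratings)
    sd = stdev(ratings)  # assumption: sample SD (n - 1)
    cv_pct = sd / m * 100  # coefficient of variation, in percent
    # %AGR: share of ratings in the top two categories (4 or 5)
    agr_pct = sum(1 for r in ratings if r >= 4) * 100 / len(ratings)
    return {"mean": m, "sd": sd, "cv_pct": cv_pct, "agr_pct": agr_pct}


# Hypothetical data (assumption, not study data).
answers = ["yes"] * 31 + ["no"]      # 32 respondents
ratings = [5, 5, 4, 4, 4, 3, 5, 4]   # 8 Likert ratings for one item

print(round(percent_yes(answers), 1), reaches_consensus(answers))
print(likert_summary(ratings))
```

A lower CV% means ratings cluster more tightly around the mean, so it complements %AGR, which only counts how many ratings fall in the top two categories.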

Agreement between participants was also evaluated across all items using Kendall’s coefficient of concordance (W), a non-parametric statistic used to assess the strength of, and changes in, agreement between raters [31]. In round 3, mean rating ≥ 3.5, CV% ≤ 30%, %AGR ≥ 70% and W < 0.05 were defined as concurrent requirements for consensus, in order to define a final agreement between experts for RTP in LARGP. Statistical significance was set at p < 0.05.
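For readers unfamiliar with Kendall’s W, the following sketch computes it from a raters-by-items table of ratings and checks the three per-item thresholds listed above. It is an illustration only: the text does not describe how ties were handled, so the usual tie correction is applied here as an assumption, the W requirement itself is deliberately not encoded, and all names and example ratings are hypothetical.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def kendalls_w(ratings):
    """Kendall's W with tie correction.

    ratings: one row per rater, one column per item.
    Undefined (division by zero) if every rater ties all items.
    """
    m, n = len(ratings), len(ratings[0])
    rank_rows = [average_ranks(row) for row in ratings]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(n)]
    mean_sum = m * (n + 1) / 2
    s = sum((r - mean_sum) ** 2 for r in rank_sums)
    # Tie term: sum of (t^3 - t) over each group of t tied values per rater
    ties = sum(row.count(v) ** 3 - row.count(v)
               for row in ratings for v in set(row))
    return 12 * s / (m ** 2 * (n ** 3 - n) - m * ties)


def item_meets_thresholds(mean_rating, cv_pct, agr_pct):
    """Per-item requirements: mean >= 3.5, CV% <= 30, %AGR >= 70."""
    return mean_rating >= 3.5 and cv_pct <= 30.0 and agr_pct >= 70.0


# Hypothetical ratings: 4 raters x 3 items (assumption, not study data).
example = [[5, 4, 3], [5, 3, 4], [4, 4, 2], [5, 4, 3]]
print(kendalls_w(example))
print(item_meets_thresholds(4.25, 16.6, 87.5))
```

W ranges from 0 (no agreement in how raters rank the items) to 1 (identical rankings). Its significance is commonly tested with a chi-square statistic, m(n − 1)W on n − 1 degrees of freedom, where m is the number of raters and n the number of items.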

Results

Thirty-two of the 40 invited experts (80%) accepted the invitation to participate in the study, and the response rate across the three rounds was 100%.

Of the 9 different criteria proposed for RTP, full consensus was achieved on strength, performance tests, sport-specific skills (positive agreement) and imaging (negative agreement).

Round 1

The sections that reached positive consensus in round 1 were: Palpation (78%), Strength (97%), PROMs (72%), Intersegmental Control (72%), Performance tests (78%), Sport-specific skills (87.5%).

The Imaging section reached a negative consensus (75%), whereas the Flexibility and Training Load sections did not reach any consensus.

As reported in Table 2, consensus was achieved by 1 item (1/2) in the Palpation section, 5 items (5/21) in the Strength section, 2 items (2/8) in Intersegmental Control, 2 items (2/8) in Performance tests and 2 items (2/2) in Sport-specific skills.

Table 2 Expert panel answers in round 1

Furthermore, Table 2 contains the list of items (top 3) approved by 2 researchers through independent coding, based on participants’ answers and suggestions. The list was included in the round 2 questionnaire.

Items from sections that achieved negative or no consensus were not carried forward to round 2.

However, all the items included in round 1 are available in Additional file 1.

Round 2

In round 2, 4 out of the 7 sections reached consensus as RTP criteria while the other 3 sections did not reach it.

A positive consensus was confirmed for the sections Strength (94%), Performance tests (91%) and Sport-specific skills (91%).

A negative consensus was confirmed for Imaging (78%).

Palpation, PROMs and Intersegmental Control lost the consensus obtained in round 1.

In the Strength section, 1 item (1/29) reached consensus; in Performance tests, 1 item (1/7); and in Sport-specific skills, 3 items (3/5).

Percentages are reported in detail in Table 3.

Table 3 Expert panel answers in round 2

A list of all items and full percentages is available within the Additional file 1.

Therefore, a form with 4 sections and 11 items was finalised for round 3 (agreement round).

Round 3

Kendall’s W was significant at 0.03 (p < 0.001).

Round 3 final agreement is presented in Fig. 1.

Fig. 1 Round 3 final expert agreement on RTP criteria

Agreement was established for the Strength section with 3 items, the Performance tests section with 3 items and the Sport-specific skills section with 2 items.

A negative consensus was established for the Imaging section.

Discussion

The aim of this Delphi study was to achieve an agreement between experts on RTP criteria in LARGP.

The main finding was that assessment of strength, performance tests and sport-specific skills would seem to be a sine qua non in the complex RTP process in athletes affected by LARGP.

As reported in Fig. 2, it was established that, during strength evaluation, it would seem crucial to analyse adductor isometric and eccentric strength, considering “side-to-side symmetry”.

Fig. 2 RTP criteria in long-standing adductor-related groin pain (physical assessment)

Planned/unplanned COD analysis seems to be considered as a criterion when performance tests are evaluated; athletes should be confident during completion and totally pain free.

At the same time, during sport-specific skills analysis, athletes should be confident and completely pain free during execution.

Although only a few items and 4 out of 9 categories reached final agreement, the low CV% (mean 18.3%, range 12.9–28.7%), high %AGR (mean 84.4%, range 65.6–96.9%) and W = 0.03 (p < 0.001) show the robustness of the consensus established.

Strength

Experts agreed on the importance of strength assessment as an RTP criterion (96.9%). Specifically, at the end of 3 rounds, consensus was achieved for the evaluation of hip adductor isometric strength (75%) and eccentric strength (84.4%).

These findings are in line with several studies [32,33,34] and are supported by evidence that highlights the usefulness of strength as both an outcome measure [35, 36] and a rehabilitation criterion in groin pain [34, 37, 38].

Although no agreement was established on which strength tests to use, the squeeze test at 0° for isometric strength (66.7% of answers) and eccentric strength assessment in the side-lying position (53.3%) would seem to be the assessment methods with the widest consensus among experts.

Side-to-side symmetry is a discriminating factor in RTP: 87.6% of participants consider this parameter a criterion to analyse during the RTP process.

Although several studies support strength assessments of other hip muscle groups [39, 40], in our study none of these groups achieved expert consensus.

No final agreement was established for strength analysis of other muscle groups; nevertheless, trunk flexors received a high rate of positive responses in round 2 (90%): 18 of the 20 participants considered the strength of this muscle group important to evaluate. This could be a useful clinical consideration even though no final consensus was achieved.

Imaging

Imaging is the only section that achieved negative consensus (93.7%). In fact, experts strongly agreed that imaging methods should not be considered or included among RTP criteria.

Although imaging can be a valid diagnostic tool to support the clinical examination and identify red flags [41], to date no study supports its use in RTP decisions. Therefore, our finding would seem to be in agreement with the literature [42, 43].

Performance Tests

Experts agree that analysis of performance tests can be considered as a criterion to establish RTP readiness in athletes suffering from LARGP (93.7%).

No specific test reached agreement across the 3 rounds, but a strong consensus (96.9%) was achieved on the use of planned/unplanned COD at varying angles (45°, 90°, 110°, 180°).

These data seem to agree strongly with the current evidence [8, 44, 45]; COD is considered an evocative and provocative movement in groin pain [46, 47] and both a sport-specific movement and a reliable outcome measure [48, 49].

Experts agree that athletes must be fully asymptomatic (78.1%) and confident (93.7%) during COD execution. This seems to be confirmed by Serner et al. [50], who used COD, absence of symptoms and athlete confidence among RTP criteria, even though their study concerned acute adductor injuries.

Sport-specific Skills

To date, no study in the literature has thoroughly examined the use of sport-specific skills in RTP in LARGP.

However, skills such as “kicking a ball” are considered potential causes of groin pain onset [51].

Buckthorpe et al. [52] recommended the analysis of sport-specific movements to allow athletes a full and safe return to sport (RTS).

In the present study, the sport-specific skills section achieved solid consensus (96.9%). In addition, experts agree that athletes must be asymptomatic (75%) and self-confident (96.9%) during the execution of sport-specific tasks.

Even if parameters such as quality of movement and performance in skills execution did not reach agreement, the percentage obtained among participants (65.6%) suggests that these aspects could play a role as well.

No Agreement Sections

Three categories that did not achieve consensus (palpation, PROMs and intersegmental control) would seem to be in some way relevant in RTP decision-making although they are not considered as criteria.

It was established that 78.1% of experts use palpation in the RTP stage but only 68.8% of them use it as a criterion. Although the literature seems to agree that pain-free palpation is important during RTP for other muscle injuries [18, 53], no expert consensus was achieved for LARGP. In round 1, 56.5% of respondents (13/23) allowed pain on palpation, while 43.5% (10/23) required a complete absence of symptoms.

A total of 71.9% of experts use PROMs but only 59.4% of them use them as criteria in RTP. The PROM most used by clinicians and researchers (91.3%, 21/23) is HAGOS [54].

Intersegmental control analysis seems to be useful in managing groin pain [8]. Even though no agreement was established for this category, intersegmental control is used by 71.9% of the sample. In particular, 78.3% of respondents (18/23) use the single-leg squat as a test to assess motor control.

Analysis of flexibility did not reach consensus on its use. Nevertheless, evidence highlights the importance of achieving total hip range of motion to avoid recurrent episodes of groin pain [35].

Even if training load (TL) cannot be considered a valid tool to assess injury risk [55], as reported by Cummins et al. [56], load management could represent a helpful tool to manage RTP progression. However, in the present Delphi study neither internal nor external load parameters reached consensus.

Discussion of Limitations

The over-representation of some geographical regions, work settings and areas of expertise across different healthcare professions could have introduced unintended bias, as could the restriction to experts with knowledge of the English language (required to understand the survey).

Conclusion

Our research showed agreement among experts on 4 out of 9 sections. As suggested by our expert panel, the RTP framework is a complex process composed of several decision modifiers; however, these findings could be a useful practical tool (Fig. 2) for clinicians in the “first-step” planning of the assessment of the physical aspects of RTP. Nevertheless, it would be desirable to establish a more solid and broader expert consensus, including assessment of other items, psychosocial factors and a wider and more heterogeneous expert cohort.