Key Points

Twenty-nine different strength tests and 31 different power tests were identified in elite soccer.

Isokinetic knee extensor strength, isokinetic knee flexor strength and Nordic hamstring test represent the three most frequent strength tests.

Countermovement jump, squat jump and vertical jump represent the three most frequent power tests.

1 Introduction

Soccer is an intermittent sport in which high- to maximum-intensity bouts (i.e. jumping, passing, shooting, tackling, turning, sprinting and changing pace) are interspersed with low-intensity activity [1]. Although the explosive actions executed during a soccer match account for only a small percentage of the total distance covered, their role is pivotal because many of them are deemed to be key determinants of success at both an individual and team level [2]. Specifically, male players competing in the top European leagues cover distances of approximately 9–14 km, with approximately 900 m at high speed (> 19.8 km/h) and 300 m at sprinting speed (> 25.2 km/h) [3,4,5]. In addition, it is common for over 700 changes of direction to be performed in a single match, although some variation exists between playing positions [6]. Furthermore, the physical demands of elite soccer are increasing, requiring players to perform a greater quantity and quality of explosive actions [7, 8]. Consequently, possessing a well-developed set of physical attributes such as strength and power is essential for optimising performance and increasing the chances of a long and successful career at the elite level.

Strength and power are key components of an elite soccer player’s physical profile as they largely underpin the successful completion of many of the crucial actions that occur during the game, such as sprinting, jumping, turning, winning physical duels and scoring goals [9,10,11]. The effectiveness of strength and power interventions in improving the execution of various explosive actions such as acceleration, top speed, jumping ability and change of direction has been well documented in elite soccer [12,13,14,15]. In addition, previous research has revealed differences in strength and power levels between starting and non-starting, senior and youth, and professional and amateur soccer players [16,17,18]. Nevertheless, the significance of strength and power is not confined to performance enhancement. Multiple studies have shown that strength and power can help mitigate injury risk [19,20,21,22]. Given that elite soccer players are regularly exposed to a congested match and training schedule, maintaining a sufficient level of strength and power is likely to have an influential role in ensuring players are physically robust, thus also reducing the chance of non-contact injuries. Therefore, special consideration needs to be paid to optimising the strength and power outputs of soccer players, which can have a significant impact on both performance and availability to train and compete.

With this in mind, fitness testing constitutes an integral component of the physical development process, as it facilitates the objective assessment of individual and team fitness levels, the comparison of athletes’ performance to normative data, the identification of strengths and weaknesses, and the evaluation of the effectiveness of a training intervention [23, 24]. This can inform decision making on whether to continue or modify a training programme, helping to promote an individualised approach to training prescription [25]. A recent survey examining the practices of strength and conditioning coaches in professional soccer settings revealed the importance placed by practitioners on strength and power assessments [26]. However, despite the well-established role of testing in determining the efficacy of a training programme [24, 27, 28], no large-scale scoping or systematic review has been conducted on the most appropriate and reliable strength and power protocols for soccer. This is somewhat surprising given the popularity of soccer and the vast quantity of assessment methods available to practitioners. Such assessment methods include, but are not limited to: isokinetic dynamometry, repetition maximum (RM) back squat, a variety of isometric strength testing protocols, use of barbell velocity for 1RM estimation, eccentric knee flexor strength via the Nordic hamstring exercise and a plethora of different jumping protocols [24, 29]. This lack of uniformity poses a significant challenge to practitioners, as inconsistent test selection and administration do not allow for the establishment of normative standards. While a standardised testing battery could be valuable for benchmarking purposes, its realisation can be difficult owing to practical constraints, such as time scarcity and equipment availability. As such, test selection and implementation must be tailored to the specific needs and resources of each setting [30].
In addition, the test selection process should be influenced by the reliability or repeatability of a test [31], as well as its sensitivity, which refers to the ability of a test to detect small but important changes in performance [32]. If a test cannot be reliably reproduced, practitioners cannot be confident that the test score accurately reflects an athlete’s ability or that any subsequent performance changes are real. Hence, practitioners must have a good understanding of these core concepts.
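To illustrate how reliability and sensitivity interact when judging whether a change in test score is real, the minimal Python sketch below derives a standard error of measurement (SEM) and a minimal detectable change at 95% confidence (MDC95) from a between-athlete standard deviation and an intraclass correlation coefficient (ICC). The formulas used (SEM = SD × √(1 − ICC) and MDC95 = 1.96 × √2 × SEM) are common conventions in the reliability literature and are assumed here rather than taken from this review; the function names and input values are hypothetical.

```python
import math


def sem_from_icc(sd: float, icc: float) -> float:
    """Standard error of measurement, assuming SEM = SD * sqrt(1 - ICC)."""
    if not 0.0 <= icc <= 1.0:
        raise ValueError("ICC must lie between 0 and 1")
    return sd * math.sqrt(1.0 - icc)


def mdc95(sem: float) -> float:
    """Minimal detectable change at 95% confidence, assuming 1.96 * sqrt(2) * SEM."""
    return 1.96 * math.sqrt(2.0) * sem


def exceeds_mdc(change: float, mdc: float) -> bool:
    """True if the absolute change is larger than the minimal detectable change."""
    return abs(change) > mdc


# Hypothetical inputs: jump height SD of 4.0 cm across a squad, ICC of 0.91.
sem = sem_from_icc(sd=4.0, icc=0.91)
mdc = mdc95(sem)
print(f"SEM = {sem:.2f} cm, MDC95 = {mdc:.2f} cm")
print("2.5 cm change exceeds MDC95:", exceeds_mdc(2.5, mdc))
```

With these hypothetical inputs, the SEM is 1.2 cm and the MDC95 is approximately 3.3 cm, so an observed improvement of 2.5 cm would not be distinguishable from measurement error under these assumptions.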

Although previous work has shed light on strength and power testing in soccer [24, 29], a comprehensive and systematic search of strength and power testing in elite soccer is still missing. A systematic review of the literature could offer valuable insights to practitioners working in elite soccer on testing selection, by providing a clear and comprehensive picture of all the available options for strength and power assessments. Furthermore, a potential investigation of the reliability and sensitivity of these tests can support evidence-informed decisions on the strength and power assessments to be used. In addition, as one of the main responsibilities of soccer practitioners is to prepare physically robust athletes that can withstand the demands of the contemporary game, reporting and summarising normative data from studies performed in elite soccer settings could further facilitate the process of strength and power benchmarking. With this in mind, the aims of this systematic review were to: (1) identify the tests and outcome variables used to assess strength and power of elite male soccer players; (2) provide normative values for the most common tests of strength and power across different playing levels; and (3) report the reliability values of these strength and power tests.

2 Methods

2.1 Design and Search Strategy

A systematic review was conducted in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement [33]. A search of the academic databases MEDLINE, CINAHL, SPORTDiscus, Web of Science and OVID was performed from the earliest record to August 2023 to identify English-language, peer-reviewed, original research studies that evaluated strength and/or power ability in elite male soccer players. Keywords employed for the identification of the studies are shown in Table 1. Search levels 1–5 were all linked by the Boolean operator ‘AND’, whereas search terms within each search level were joined with ‘OR’ or ‘NOT’. All search results were extracted and imported into reference management software (RefWorks, ProQuest LLC, Ann Arbor, MI, USA).
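The way the search levels combine can be sketched in Python as follows. This is only an illustration of the Boolean structure described above: the terms shown are hypothetical placeholders rather than the keywords of Table 1, only three levels are shown instead of five, and the way ‘NOT’ is attached to a level is an assumption, as exact syntax differs between databases.

```python
def build_level(include, exclude=()):
    """Join the terms of one search level with OR; optionally exclude terms with NOT."""
    clause = "(" + " OR ".join(f'"{term}"' for term in include) + ")"
    if exclude:
        excluded = " OR ".join(f'"{term}"' for term in exclude)
        clause = f"({clause} NOT ({excluded}))"
    return clause


def build_query(levels):
    """Link all search levels with AND."""
    return " AND ".join(build_level(include, exclude) for include, exclude in levels)


# Hypothetical placeholder terms, NOT the actual Table 1 keywords.
levels = [
    (["soccer", "football"], ["rugby"]),
    (["strength", "power"], []),
    (["elite", "professional"], []),
]
query = build_query(levels)
print(query)
```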

Table 1 Search strategy terms

2.2 Study Selection

Following the removal of duplicates, two reviewers (NA and CB) independently screened all titles and abstracts against the inclusion and exclusion criteria of the review. Studies that did not meet the inclusion criteria were removed. Conflicts were resolved through discussion, or via a third reviewer (AT). The full texts of the articles that were not excluded during this process were subsequently reviewed for eligibility. Supplementary to the systematic search, the reference lists of the included papers were reviewed to identify potentially eligible articles. With regard to the first objective of the review, studies were eligible for inclusion if: (1) they were original research studies, published in a peer-reviewed journal and written in English; (2) their primary aim was to assess strength and/or power; (3) players were male and older than 17 years of age (i.e. mean age of the group), in line with a previous systematic review on fitness testing (Altmann et al., 2019) and to minimise any potential influence of maturation; and (4) playing level was defined as “professional”, “international” or “elite”. In contrast, studies were excluded from the review if: (1) they were narrative or systematic reviews and/or meta-analyses; (2) they assessed physical characteristics as part of other research aims (i.e. fatigue, recovery, nutrition and genome); (3) the sample consisted of different team sports; (4) players were semi-professional; or (5) players were younger than 17 years of age. For the second objective, studies were eligible if they reported the mean result of the tests under consideration and clearly distinguished between different groups (i.e. professional vs amateur, senior vs youth, male vs female). As such, only normative data for elite male soccer players older than 17 years were recorded. For the third objective, studies were included if they provided reliability statistics (i.e. within-day and/or between-day).

2.3 Assessment of Methodological Quality

A modified version of the Downs and Black [34] assessment scale was used to evaluate the methodological quality of included articles. This checklist has been used previously in systematic reviews with similar research aims [35, 36] and can be adapted to the scope and the needs of the systematic review [37]. More specifically, 11 questions (1–4, 6, 7, 10, 11, 16, 18, 20) from the traditional version of the checklist were considered relevant to the specific aims of this systematic review and were therefore used to grade the methodological quality of the included studies (Table S1 of the Electronic Supplementary Material [ESM]). For the purposes of this review, question 4 was used to assess whether the testing procedures in each study were clearly described. Each question was scored as either a ‘1’ (yes) or a ‘0’ (no or unable to determine). Scores were summed for each study, with a total score of ‘11’ representing the highest possible score.
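The scoring rule described above can be expressed as a short Python sketch. The item numbers and the 1/0 scoring follow the text; the function name, the answer labels and the example study are hypothetical.

```python
# Checklist items retained from the Downs and Black scale, as listed in the text.
ITEMS = (1, 2, 3, 4, 6, 7, 10, 11, 16, 18, 20)


def quality_score(answers: dict) -> int:
    """Sum the item scores: 'yes' scores 1; 'no' or 'unable to determine' scores 0."""
    missing = [item for item in ITEMS if item not in answers]
    if missing:
        raise ValueError(f"Missing answers for items: {missing}")
    return sum(1 for item in ITEMS if answers[item] == "yes")


# Hypothetical study: 'yes' on every item except 11 and 20.
example = {item: "yes" for item in ITEMS}
example[11] = "no"
example[20] = "unable to determine"
print(quality_score(example), "out of", len(ITEMS))
```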

2.4 Data Extraction

Data were extracted and documented using a Microsoft Excel 365 spreadsheet (Microsoft Corporation, Redmond, WA, USA). Extracted data from each study included research design, publication details (authors and year of publication), sample information (number of participants, age of the sample, playing level), tests performed to evaluate strength and/or power ability, outcome measures derived from each test, normative values for each test and, where available, reliability values (i.e. intraclass correlation coefficient [ICC], coefficient of variation [CV], standard error of measurement [SEM], minimal detectable change [MDC], Pearson’s r and Cronbach’s alpha [α]). Playing level was classified into two distinct categories: (a) senior professionals (i.e. players who were regular members of the first team of a professional soccer club and/or a national team’s senior squad) and (b) elite youth (i.e. players over 17 years of age who were members of the youth department of a professional soccer club but not regular members of the first team, were members of a junior national team squad or were defined as “elite” by the authors of the study). This distinction was made to account for physiological and training age differences, which is crucial for the contextualisation of normative data and more accurate benchmarking. If more than one group of players was investigated in a study, only the groups with a mean age of 17 years or older were considered. To fulfil the purpose of reporting normative values, the mean of each group (i.e. senior professionals vs elite youth) was recorded. When a study consisted of multiple groups of the same playing level, the average of the means and the pooled standard deviation were recorded. In intervention studies, the baseline values were recorded to eliminate any intervention bias. When a repeated-measures design with no intervention was implemented (e.g. during seasonal variation studies), the most recent testing point was recorded (except when the most recent point was after a congested fixture period).
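As an illustration of how several groups of the same playing level might be combined into a single normative entry, the Python sketch below averages the group means and pools the standard deviations. The text does not state the exact formulas, so two assumptions are made: the group means are averaged without weighting, and the pooled standard deviation is the conventional one, weighted by degrees of freedom (n − 1). The function name and the example values are hypothetical.

```python
import math


def combine_groups(groups):
    """Combine groups given as (n, mean, sd) tuples.

    Returns (average of the group means, pooled standard deviation).
    Assumes an unweighted average of means and a pooled SD weighted by (n - 1).
    """
    if not groups:
        raise ValueError("At least one group is required")
    if any(n < 2 for n, _, _ in groups):
        raise ValueError("Each group needs at least two participants")
    mean_of_means = sum(mean for _, mean, _ in groups) / len(groups)
    numerator = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    denominator = sum(n - 1 for n, _, _ in groups)
    return mean_of_means, math.sqrt(numerator / denominator)


# Hypothetical example: two senior groups with CMJ height in cm.
combined_mean, pooled_sd = combine_groups([(12, 40.0, 4.0), (10, 42.0, 5.0)])
print(f"mean = {combined_mean:.1f} cm, pooled SD = {pooled_sd:.2f} cm")
```

If group sizes differ substantially, a sample-size-weighted mean would give a different value from the unweighted average used in this sketch.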

3 Results

3.1 Identification and Selection of Articles

The selection process flowchart is presented in Fig. 1. The initial search of databases identified 4217 articles. After removing the duplicates (1468 articles), the titles and the abstracts of 2749 articles were screened. This resulted in the selection of 224 articles to be assessed for eligibility through full-text review. Furthermore, 13 studies were identified through reference lists for full-text eligibility assessment. Following full-text screening, 194 studies were included for the aim of identifying the tests and outcome variables used to assess strength and power in elite male soccer. Additionally, 139 of these were included for the purpose of reporting normative values for the most common strength and power tests.
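The counts reported above can be checked with simple arithmetic, sketched below in Python. The identified, duplicate, screened, full-text and included counts are taken from the text; the derived totals (full texts assessed and full texts excluded) are not reported in this section and rest on the assumption that the 13 studies from reference lists were assessed in addition to the 224 and that all 194 included studies came from this combined set.

```python
identified = 4217            # records from the database search
duplicates = 1468            # duplicates removed
screened = identified - duplicates
assert screened == 2749      # matches the number of titles/abstracts screened

selected_for_full_text = 224
from_reference_lists = 13
included = 194

# Derived values (assumption: both sources were assessed at full text).
excluded_at_screening = screened - selected_for_full_text
full_text_assessed = selected_for_full_text + from_reference_lists
excluded_at_full_text = full_text_assessed - included

print("Excluded at title/abstract screening:", excluded_at_screening)
print("Full texts assessed:", full_text_assessed)
print("Excluded at full-text review:", excluded_at_full_text)
```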

Fig. 1 Flow of selection process of eligible studies for a qualitative and quantitative synthesis

3.2 Evaluation of Methodological Quality

Assessment of quality scores can be found in Table S2 of the ESM, with an observed range from 4 to 10 for the 11 items assessed.

3.3 Characteristics of Included Studies

Table S3 of the ESM provides a summary of characteristics of the studies included in this systematic review. The range of sample size was 10–939, with a median of 29 participants. A total of 120 studies included senior professionals as participants, 56 included elite youth, while 18 studies involved a group of both. The age range of the samples involved in the studies was 17.0 to 28.3 years, with a median age of 20.7 years. From a study design standpoint, 100 (52%) studies used a cross-sectional design, 44 (22.9%) were intervention studies, 43 (22.4%) used a repeated-measures design, 5 (2.0%) were reliability studies and 2 (1%) were validity studies. The studies took place in 39 different countries, with Brazil, Spain, Qatar, England and Portugal being the countries with most occurrences.

3.4 Tests and Outcome Variables Used to Assess Strength

Evaluation of strength was performed in 115 studies (59.2%) [Table 2]. A total of 29 different tests were used to assess strength, further illustrating the wide range of assessment methods for the evaluation of this physical characteristic. Four different types of strength were evaluated (i.e. isokinetic, isometric, dynamic and eccentric). Isokinetic strength was the most frequently evaluated, being present in 62 studies (54.9%), followed by isometric strength in 29 studies (25.7%), dynamic strength (i.e. the ability to produce force during dynamic movements that include both an eccentric and a concentric phase, such as the squat or bench press) in 27 studies (23.9%), and eccentric strength in 19 studies (15.9%). Isokinetic strength was evaluated by eight different tests, isometric strength by 11 different tests, dynamic strength by seven different tests and eccentric strength by three different tests. The three most frequently occurring tests were: (1) knee extensor isokinetic strength (58 studies); (2) knee flexor isokinetic strength (55 studies); and (3) knee flexor eccentric strength (14 studies). It is noteworthy that the hip adductor strength test and the half-back squat test were also frequently employed (12 studies each). Of note, all of the studies that assessed eccentric hamstring strength with the Nordic hamstring exercise, as well as those that assessed isometric and eccentric strength for the hip adductors and abductors, were performed during the last decade. Isokinetic dynamometry was the predominant measurement method across the studies that assessed knee extensor and flexor isokinetic strength. In contrast, the Nordic hamstring exercise was the primary measurement method to assess knee flexor eccentric strength (13 studies). For knee extensor isokinetic strength, peak concentric torque (48 studies), conventional strength hamstrings/quadriceps ratio (28 studies) and relative peak concentric torque (17 studies) were the main outcome variables.
In terms of knee flexor isokinetic strength, the three main outcome variables were peak concentric torque (46 studies), peak eccentric torque (28 studies) and relative peak concentric torque (16 studies). Last, peak force (13 studies) was the main outcome variable in the assessment of eccentric knee flexor strength via the Nordic hamstring exercise.

Table 2 Strength tests and outcome variables

3.5 Tests and Outcome Variables used to Assess Power

Evaluation of power was performed in 127 studies (65.4%) [Table 3]. Thirty-one different tests were used to assess power in elite soccer players, employing primarily various types of jumps (24 different in total). The countermovement jump (CMJ) with hands fixed on hips (99 studies), squat jump (SJ) (48 studies) and vertical jump with the use of an arm swing (VJ) [29 studies] were the most frequently utilised. Among these, jump height was by far the most common outcome variable reported in 95 studies in the CMJ, 47 studies in the SJ and 27 studies in the VJ. However, it is important to note that the calculation of jump height was based on different methods (e.g. impulse-momentum method, flight time method) owing to the different equipment used. Furthermore, two additional commonly reported outcome variables in CMJ were relative peak power (W/kg) [nine studies] and peak power (W) [five studies]. Among unilateral tests, the single-leg CMJ test (SLCMJ) was the most frequently implemented, featuring in 12 studies. It is noteworthy that all of those studies were performed during the last decade. Finally, the drop jump test (used to assess reactive strength ability) [38] was reported in eight studies.
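To make the difference between the two jump height calculations concrete, the Python sketch below implements both from basic projectile mechanics: the flight time method (h = g·t²/8) and the impulse-momentum method (take-off velocity v = net impulse / body mass, then h = v²/(2g)). These are the standard textbook equations and are assumed here rather than taken from the included studies; the function names and example inputs are hypothetical.

```python
G = 9.81  # gravitational acceleration in m/s^2


def height_from_flight_time(flight_time_s: float) -> float:
    """Flight time method: h = g * t^2 / 8 (metres)."""
    return G * flight_time_s ** 2 / 8.0


def takeoff_velocity_from_impulse(net_impulse_ns: float, body_mass_kg: float) -> float:
    """Impulse-momentum theorem: v = net impulse / mass (m/s)."""
    return net_impulse_ns / body_mass_kg


def height_from_takeoff_velocity(velocity_ms: float) -> float:
    """Take-off velocity method: h = v^2 / (2 * g) (metres)."""
    return velocity_ms ** 2 / (2.0 * G)


# Hypothetical jump: 0.55 s flight time; 200 N*s net impulse for a 75 kg player.
h_flight = height_from_flight_time(0.55)
v_takeoff = takeoff_velocity_from_impulse(200.0, 75.0)
h_impulse = height_from_takeoff_velocity(v_takeoff)
print(f"Flight time method: {h_flight * 100:.1f} cm")
print(f"Impulse-momentum method: {h_impulse * 100:.1f} cm")
```

The flight time method assumes that the body is in the same position at take-off and landing, so the two methods can return different heights for the same jump, which is one reason why values obtained with different equipment are not directly interchangeable.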

Table 3 Power tests and outcome variables

3.6 Reliability Data

Reliability statistics reported for the strength and power tests can be found in Tables S4 and S5 of the ESM, respectively. For strength tests, reliability statistics were reported in 15 studies (13%). Intra-day reliability was the most common reliability type reported in nine studies, while inter-day reliability was determined in five studies. One study also assessed inter-season reliability. In terms of specific reliability metrics, the reported metrics were ICC (13 studies), SEM (seven studies), CV (four studies), MDC (two studies) and Cronbach’s alpha (one study). Knee extensor isokinetic strength (four studies), knee flexor isokinetic strength (four studies), half-back squat (four studies) and Nordic hamstring testing (three studies) were the tests for which reliability values were most reported. Intra-day reliability values (i.e. ICC, CV, SEM) were higher compared to inter-day and inter-season reliability for all these tests. In terms of power tests, reliability values were reported in 34 studies (27%). Intra-day reliability was the most reported type with 30 studies, whereas a considerably lower number of studies reported values for inter-day (three studies) and inter-season (one study) reliability. The ICC (29 studies) and CV (22 studies) were the most reported metrics, followed by SEM (four studies), Cronbach’s alpha (three studies), Pearson’s r (two studies) and MDC (one study). Countermovement jump (27 studies), SJ (15 studies) and SLCMJ (five studies) represented the tests with the highest availability of reliability values. Specifically, intra-day reliability for CMJ height ranged from 0.80 to 0.99 (ICC) and from 1.8 to 15% (CV), with a SEM that ranged from 0.6 to 1.4 cm. In contrast, the only study that investigated the inter-day reliability in CMJ height reported values of 0.83 (ICC) and 4.3% (CV), with a SEM of 1.7 cm. With respect to SJ height, intra-day reliability ranged from 0.75 to 0.99 (ICC) and from 2.12 to 13.2% (CV), with a SEM of 0.6 cm. 
Similar to CMJ, only one study examined inter-day reliability, reporting an ICC value of 0.89, a CV of 3.7% and a SEM of 1.4 cm. Last, only intra-day reliability was reported for SLCMJ height. In particular, ICC exhibited a range from 0.74 to 0.99, a CV from 1.98 to 9.63% and a SEM from 0.3 to 1 cm.

3.7 Normative Values for Strength in Elite Male Soccer Players

3.7.1 Knee Extensor Isokinetic Strength via Isokinetic Dynamometry

Table 4 provides the normative values for the knee extensor isokinetic strength test. A range of different angular velocities was observed in the studies that reported normative values for knee extensor isokinetic strength testing. However, the majority of these studies were conducted at 60°/s. In senior professionals, the mean values ranged from 212.9 to 364 Nm in peak concentric torque (32 studies), from 2.45 to 3.62 Nm/kg in relative peak concentric torque (15 studies) and from 54.0 to 65.5% for the conventional strength hamstrings-to-quadriceps ratio (17 studies). In elite youth, the mean values ranged from 208 to 331 Nm in peak concentric torque (six studies). In terms of relative peak concentric torque, only one study reported normative values, with a value of 3.14 Nm/kg. Finally, the mean values of the conventional strength hamstrings-to-quadriceps ratio ranged from 50 to 60.5% in elite youth soccer players (three studies).

Table 4 Normative values for peak concentric torque, relative peak concentric torque and conventional strength hamstrings/quadriceps ratio during the knee extensors isokinetic strength test

3.7.2 Knee Flexor Isokinetic Strength via Isokinetic Dynamometry

Normative values reported for the knee flexor isokinetic strength test can be found in Table 5. As with knee extensor isokinetic strength testing, a variety of different velocities were used in the studies that reported normative values for knee flexor isokinetic strength test. Angular velocity of 60°/s had the greatest number of available normative data. For senior professionals at 60°/s, the mean values for peak concentric torque ranged from 113.2 to 190.5 Nm (30 studies), from 153 to 213.4 Nm for peak eccentric torque (15 studies) and from 1.2 to 2.1 Nm/kg for relative peak concentric torque (11 studies). Conversely, elite youth players had average peak concentric torque values that ranged from 114 to 187.4 Nm (four studies). Only two studies reported a normative value in elite youth for eccentric peak torque (range 149–177.1 Nm). No study reported relative peak concentric torque values for elite youth soccer players.

Table 5 Normative values for peak concentric torque, peak eccentric torque and relative peak concentric torque during the knee flexors isokinetic strength test

3.7.3 Knee Flexor Eccentric Strength via Nordic Hamstring Exercise

Results of the peak force (N) attained by elite soccer players can be found in Table 6. As can be observed, different equipment has been used to assess eccentric knee flexor strength. In senior professionals, the average values ranged from 277.5 to 403.7 N (seven studies). Only two studies reported values for peak force in elite youth soccer players. These studies used different devices and yielded markedly different values, with one reporting a value of 338.2 N and the other 636.5 N.

Table 6 Normative values for peak force during the Nordic hamstring strength test

3.8 Normative Values for Power in Elite Male Soccer Players

Normative values for the CMJ, SJ and VJ tests are presented in Tables 7, 8 and 9, respectively. For CMJ, the average values of jump height observed in senior professional soccer players ranged from 33.6 to 57.2 cm across 54 studies, while the mean values ranged from 34.8 to 58.6 cm across 33 studies in elite youth soccer players. In terms of relative peak power during CMJ, the average values in senior professional soccer players ranged from 26.3 to 54.5 W/kg across four studies. However, only one study reported the relative peak power value in elite youth, which was 55.1 W/kg. In addition, the average peak power values in senior professionals ranged from 3474 to 5029 W (four studies), while the only study that reported a value in elite youth yielded a value of 3778 W. For SJ, the average jump height in senior players ranged from 29.8 to 44.1 cm (23 studies), whereas it ranged from 34.3 to 52.8 cm in youth (16 studies). Last, the average VJ jump height values in senior players ranged from 41.1 to 56.4 cm across 13 studies, while the mean values in elite youth ranged from 41.6 to 65 cm across 13 studies.

Table 7 Normative values for jump height, peak power and relative peak power during the countermovement jump
Table 8 Normative values for jump height during the squat jump
Table 9 Normative values for jump height during the vertical jump with free arms

4 Discussion

The aims of this systematic review were to: (1) identify the tests and outcome variables used to evaluate strength and power in elite male soccer players; (2) provide normative values on the most common strength and power tests; and (3) report the reliability values of strength and power tests used in elite soccer. In summary, the large volume of studies included in this review (194 studies) is indicative of the high level of interest in strength and power assessment in soccer within the scientific community. A wide variety of tests were employed to assess strength and power, which was to be expected given the various time and financial constraints, as well as the different approaches to training and testing in soccer. A considerable amount of variability was also evident in the methods used to calculate the outcome variables of a test, as well as in the terminology used to describe the test. For instance, two distinct methods were identified in the calculation of jump height (take-off velocity and flight-time method), while the terms “back squat” and “half-back squat” were sometimes used interchangeably. A total of 29 different tests were identified for strength assessment, of which the isokinetic strength test for knee extensors, the isokinetic strength test for knee flexors and the eccentric strength test for knee flexors were the most commonly used. In addition, 31 different tests were utilised to assess power, with CMJ, SJ and VJ being the most frequently employed. However, it is noteworthy that the majority of the studies included in this review failed to report reliability values, concealing valuable information that could assist in determining test accuracy and consistency.

4.1 Testing Methods and Outcome Variables

As strength and power can support both performance enhancement [2, 9, 18] and injury risk minimisation [19,20,21], a valid and reliable assessment of strength and power ability can form the basis for effective prescription of training interventions. A plethora of strength and power tests were identified in our systematic review, reflecting a high level of interest in researching these attributes. However, this large disparity highlights the inherent complexities in the assessment of strength and power, as well as the lack of consensus on the optimal testing protocols for strength and power profiling in elite soccer players. This variation can be attributed to several factors, such as equipment availability and facilities, time constraints, safety and a competitive schedule among others [24]. Finally, cultural and philosophical differences may also have contributed to the wide range of different tests observed, as the included articles originated from 39 different countries.

4.1.1 Strength Assessment

Based on the results of this systematic review, isokinetic strength assessments of the knee extensor and flexor muscles represent the most popular testing methods to assess strength (58 and 55 studies, respectively). The large number of studies that have evaluated the strength of the knee extensors and flexors highlights the importance of these muscle groups in the execution of fundamental soccer-specific actions as well as in the prevention of common soccer injuries. In particular, knee extensors are involved in many soccer actions such as acceleration, deceleration, jumping and kicking, while knee flexors are highly recruited during running at higher velocities and provide additional support to the stabilisation of the knee joint during landing, deceleration and cutting actions [39]. In addition, the anterior cruciate ligament and the hamstring muscle group represent two of the most affected areas in soccer injuries [40, 41], which further highlights the necessity to assess knee extensor and flexor muscle function in soccer. The combination of the two measurements can be used for the calculation of the conventional strength ratio, enabling the determination of strength imbalances between knee extensor and knee flexor muscles. The conventional strength ratio is typically calculated by dividing the concentric peak torque of knee flexors by that of knee extensors [42, 43]. Its use is further supported by the findings of this systematic review, as it represents the second most frequent outcome variable in the isokinetic assessment of knee strength. It has been suggested that greater strength imbalances are associated with an increased risk of anterior cruciate ligament and hamstring injuries, although the overall research findings are inconsistent [20, 42, 44].
The assessment of peak concentric torque during knee flexor isokinetic strength and, subsequently, the conventional strength ratio, may fail to consider the main mechanism of hamstring strain injuries, where an eccentric muscle action occurs. This is further linked with increased demands of high-speed running (> 19.8 km/h) in modern soccer, where the hamstring muscles are subjected to additional eccentric loading. This appears to be the reason why the assessment of peak eccentric torque is another variable of interest (i.e. the second most investigated) in relation to the isokinetic strength of the knee flexors. Furthermore, peak eccentric torque is used for the calculation of the functional ratio, where the eccentric knee flexors torque is evaluated in relation to the concentric knee extensors torque. However, obtaining the functional ratio can significantly extend the duration of an already time-demanding test. Indeed, a recent systematic review [44] demonstrated no difference in association with anterior cruciate ligament and hamstring injuries between the conventional and functional ratios. Although isokinetic testing represents a valid and reliable method of assessing muscle strength at both slow and high contraction velocities, it is not without issues. Isokinetic testing necessitates the use of isokinetic dynamometers, which are costly, lack portability and demand a significant amount of time to complete their various testing protocols. Consequently, clubs with limited resources may not have access to this equipment. This has led to the search for alternative and more practical solutions, and as such, the eccentric knee flexor strength test via the use of the Nordic hamstring exercise has emerged [45]. 
Nordic hamstring testing enables the functional assessment of eccentric hamstring strength, a critical factor given the high prevalence of hamstring injuries in soccer and the constantly increasing amount of high-speed running performed in the modern game. Nordic hamstring testing is gaining popularity and, based on the results of our systematic review, currently represents the third most common method to assess strength in elite soccer. Its simplicity of use, the growing availability of Nordic measurement devices in elite soccer environments (e.g. Nordbord), and the ability to assess large groups of athletes in a time-efficient manner may have contributed to its rise. Furthermore, the Nordic hamstring exercise is a staple exercise in many strength and conditioning programmes in elite soccer [26, 46], and therefore no additional time for familiarisation is typically required. Finally, its well-established effectiveness in reducing the incidence of hamstring injuries [47] further validates the increased interest in assessing the amount of force produced in this exercise.
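The isokinetic outcome variables discussed in this section can be summarised in a short Python sketch: the conventional ratio (concentric knee flexor torque divided by concentric knee extensor torque), the functional ratio (eccentric knee flexor torque divided by concentric knee extensor torque) and relative peak torque (torque divided by body mass). The definitions follow the text; expressing the ratios as percentages, the function names and the example values are assumptions made for illustration only.

```python
def conventional_ratio(flexor_concentric_nm: float, extensor_concentric_nm: float) -> float:
    """Conventional hamstrings-to-quadriceps ratio, as a percentage."""
    return 100.0 * flexor_concentric_nm / extensor_concentric_nm


def functional_ratio(flexor_eccentric_nm: float, extensor_concentric_nm: float) -> float:
    """Functional ratio (eccentric flexor vs concentric extensor torque), as a percentage."""
    return 100.0 * flexor_eccentric_nm / extensor_concentric_nm


def relative_peak_torque(peak_torque_nm: float, body_mass_kg: float) -> float:
    """Peak torque normalised to body mass (Nm/kg)."""
    return peak_torque_nm / body_mass_kg


# Hypothetical player: 75 kg, tested at a single angular velocity.
extensor_con, flexor_con, flexor_ecc, mass = 250.0, 150.0, 190.0, 75.0
print(f"Conventional ratio: {conventional_ratio(flexor_con, extensor_con):.1f}%")
print(f"Functional ratio: {functional_ratio(flexor_ecc, extensor_con):.1f}%")
print(f"Relative extensor torque: {relative_peak_torque(extensor_con, mass):.2f} Nm/kg")
```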

The review of the literature revealed a growing interest in the assessment of isometric and eccentric hip adductor and abductor strength over the last decade in elite soccer players. The rise in their popularity may be attributed to several factors. In addition to hamstring injuries, hip and groin injuries are also common in professional soccer, and result in long absences from training and matches [48]. The assessment of hip muscle strength, especially in the hip adductor and abductor muscle groups, plays a critical role in the clinical evaluation of groin-related issues. In fact, lower hip adduction isometric and eccentric strength values, as well as lower isometric hip adduction/abduction ratios, have been reported in athletes with groin pain [49,50,51]. Furthermore, hip adductors and abductors have an important function as frontal plane stabilisers, as they facilitate the prevention of excessive knee valgus during landing and cutting tasks [52]. These muscles also contribute significantly to the effective execution of COD tasks, potentially by assisting in the generation of propulsion in the lateral plane [53]. In addition, the increased availability of specialised equipment (e.g. ForceFrame, GroinBar, Kangatech KT360) in the field has led to an easier assessment of these muscles. However, the available literature suggests a lack of standardised protocols in isometric adductor and abductor strength testing. Different joint angles (hip: 0–60°, knee: 0–90°), durations of force application (3 vs 5 s), measurement devices (hand-held dynamometers, ForceFrame, GroinBar), limb engagement (unilateral vs bilateral) and outcome variables (e.g. peak force vs peak torque) have been identified in the examined literature. In view of these inconsistencies, standardisation of the overall process of assessing isometric hip adductor and abductor strength seems to be necessary.

While the aforementioned strength tests offer valuable insights into the function of specific muscle groups, they fail to provide an indicator of overall system strength. In this regard, the squat test provides a more holistic assessment of lower-body strength. The investigated literature revealed distinct squat testing methods such as the half-back squat (11 studies), back squat (eight studies) and isoinertial loading squat (three studies). However, when delving deeper into the testing protocols, a lack of clear and consistent nomenclature is evident, especially when differentiating between the half-back squat and the back squat. More specifically, in the studies that reported the “back squat” as the selected testing method, the depth of the squat varied (i.e. 90° vs thighs below parallel vs no information provided on the depth of the movement). Different squat depths have been shown to result in varying levels of muscle activation in the lower limb muscles, with greater depths leading to an increased activation of the quadriceps, hamstring and gluteal muscles [54, 55]. Furthermore, individuals are able to lift heavier loads when the range of motion is shorter [56]. This can lead to inconsistent testing results and an inability to perform reliable comparisons. In addition, this discrepancy can have significant implications for the findings of this review by affecting the ranking in test frequency. In fact, better defined and standardised protocols could place either the half-back squat or the back squat among the three most popular testing methods in elite soccer. Overall, further standardisation of the squat test is necessary, taking into account the various 1RM calculation methods (direct assessment of 1RM vs estimation of 1RM [i.e. 6RM] vs assessment of barbell velocity using linear position transducers) and setups (barbell vs Smith machine vs Keiser) observed in our systematic review.

Finally, a lack of emphasis on upper body strength and multi-joint isometric assessment seems to exist in elite soccer. The limited number of studies evaluating upper body strength, using exclusively the bench press test, may be attributed to the specific demands of soccer, where the involvement of the upper body is minimal compared with the lower body. In contrast, similar previous work in rugby and basketball reported that the bench press test is one of the key tests in the assessment of strength [35, 36]. Isometric testing can serve as a quicker and less exhaustive alternative to dynamic testing, and both the isometric midthigh pull (IMTP) and isometric squat have been shown to be reliable options [57]. A recent survey investigating the fitness testing practices of elite male soccer practitioners identified the IMTP as the most commonly used test to assess strength [58]. Nevertheless, this systematic review identified only a single study utilising the IMTP [59], illustrating a discrepancy between research and practice. The specialised equipment required (i.e. force plates) to administer the IMTP may have rendered this test less viable in smaller clubs, accounting for the lower prevalence of the IMTP in this systematic review. In addition, isometric tests have the potential to be used in conjunction with other strength and power assessments to provide a more comprehensive picture of an athlete’s strength and power capabilities, as well as informing the training prescription; for example, with the CMJ for the assessment of a Dynamic Strength Index [60]. However, this systematic review failed to identify any studies in elite male soccer using a Dynamic Strength Index. Furthermore, the IMTP offers the ability to record multiple variables such as peak force, force at specific timepoints, rate of force development and impulse, and enables the identification of interlimb asymmetries. Therefore, more nuanced insights on force production can be provided.
However, caution should be exercised in the use of time-dependent metrics such as rate of force development and impulse, as it has been demonstrated that their reliability is lower compared with non-time-dependent metrics, such as peak force [57, 61].
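
To illustrate how an isometric test can be combined with the CMJ, the sketch below computes a Dynamic Strength Index. It assumes the commonly used definition of ballistic (CMJ) peak force divided by isometric (IMTP) peak force; the function name and force values are hypothetical, and the exact formulation used in [60] should be checked before applying it.

```python
def dynamic_strength_index(cmj_peak_force_n: float, imtp_peak_force_n: float) -> float:
    """Ratio of CMJ peak force to IMTP peak force (assumed definition)."""
    if imtp_peak_force_n <= 0:
        raise ValueError("IMTP peak force must be positive")
    return cmj_peak_force_n / imtp_peak_force_n


# Hypothetical peak forces (N), for illustration only
print(dynamic_strength_index(2400.0, 3200.0))
```

Both inputs are peak force values, which is consistent with the caution above regarding the lower reliability of time-dependent metrics.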

4.1.2 Power Assessment

Jump tests represent the main method for assessing power in elite soccer, with the CMJ, SJ and VJ being the most popular protocols. Jump height, measured in centimetres, was the primary outcome variable in these tests and the CMJ was by far the most commonly employed method for assessing power in elite soccer, having been featured in 99 studies. The CMJ is an easy to administer and time-efficient test that requires minimal familiarisation. Furthermore, it provides valuable insights into an athlete’s ability to utilise the stretch–shortening cycle. As hands are typically fixed on the hips during the execution of the test, this elimination of arm swing adds further standardisation to the test in assessing lower body power. A range of different equipment types, including force plates, photoelectric systems and jump mats, has been employed in CMJ testing in the examined literature [62,63,64,65,66]; however, force plates are considered the gold-standard equipment for measuring vertical jump height [67]. In terms of calculating jump height, the take-off velocity and flight-time methods constitute the two primary approaches [68]. Overall, practitioners are encouraged to use the take-off velocity method with force plates [69], although force plates are often inaccessible in applied settings. As a result, flight time represents the most frequently used method in the calculation of jump height. However, this approach is not without its limitations. In particular, the flight-time method requires an individual to maintain the same position at take-off and landing, yet the landing position typically differs owing to preparation for ground contact (i.e. ankle dorsiflexion and hip and knee flexion) [69]. This leads to an overestimation of the jump height. As a result, jump scores obtained using the flight-time method should not be compared with those obtained using the take-off velocity method, unless a correction equation is implemented [70].
Recently, there has been a call in the field to move beyond jump height and delve deeper into more nuanced metrics, in order to assess and report the movement strategy of the jump [71]. In this way, a more comprehensive understanding of the specific factors underlying a jump can be achieved, thereby leading to more targeted and individualised training interventions. Our literature search identified 12 additional variables, with peak power, both in absolute and relative terms, being the most frequently reported. However, similar to jump height, peak power is classified as an outcome variable that does not reveal the underlying kinetics and kinematics of the jump. Interestingly, the vast majority of these metrics have been reported in studies conducted within the last 10 years, possibly indicating the increased availability of force plates, as well as a shift towards a more holistic assessment of jumping ability. However, given the high degree of variability found within some strategy metrics compared with jump height [72, 73], careful consideration is warranted in the selection of these.
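
The two jump-height calculations described above can be sketched with the standard projectile-motion equations. This is a minimal illustration, assuming g = 9.81 m/s² and, for the flight-time method, that the centre of mass is at the same height at take-off and landing; the function names and input values are hypothetical.

```python
G = 9.81  # gravitational acceleration (m/s^2), assumed value


def jump_height_from_flight_time(flight_time_s: float) -> float:
    """Flight-time method: h = g * t^2 / 8 (metres)."""
    return G * flight_time_s ** 2 / 8.0


def jump_height_from_takeoff_velocity(takeoff_velocity_ms: float) -> float:
    """Take-off velocity method: h = v^2 / (2 * g) (metres)."""
    return takeoff_velocity_ms ** 2 / (2.0 * G)


# Hypothetical inputs, for illustration only
print(round(jump_height_from_flight_time(0.50) * 100, 1), "cm")
print(round(jump_height_from_takeoff_velocity(2.45) * 100, 1), "cm")
```

Because flight time enters the first equation squared, any lengthening of flight time caused by a flexed landing position inflates the computed jump height, which is the overestimation noted above.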

Power is a multi-faceted concept, and as such, a single test is unlikely to provide a comprehensive assessment of power ability. This is further supported by the different types of jumps identified in this literature review. One such example is the SJ, which theoretically evaluates an athlete’s explosive ability in the absence of a stretch–shortening cycle, as no countermovement is allowed. Based on our systematic review, there is a high prevalence of SJ testing in elite soccer, with 48 studies utilising this test to assess power. The different insights provided compared with the CMJ may contribute to a more comprehensive profile of power ability. Nevertheless, strict compliance with the SJ protocol (i.e. isometric hold of 2–3 s prior to the jump) is necessary, as a small-amplitude counter-movement has been shown to affect the jump height achieved [74]. In particular, the authors found that 55% of the SJ trials in their study consisted of a small-amplitude counter-movement when a gross observation was used. However, the occurrence of a small-amplitude counter-movement increased to 89% when the trials were analysed using force plates and to 99% when using linear position transducers. This can have significant implications in practical settings, where access to specialised equipment and resources to analyse each jump are limited. In light of these considerations, practitioners should critically evaluate the value of the information provided by the SJ. In addition, a number of studies performed the assessment of SJ under loaded conditions, which is commonly referred to as the jump squat test, using linear position transducers. In this way, an individual’s force–velocity profile and theoretical optimum power zone can be determined [75], subsequently informing targeted training interventions. Last, the VJ is another test commonly performed in elite soccer, featuring in 29 studies in this systematic review.
The VJ has many similarities to the CMJ, except that an arm swing is allowed. This inclusion of the arms introduces a coordinative element to the movement and can facilitate the attainment of a higher jump height owing to the increased work output of the lower limbs that results from the use of the arm swing [76].

In recent years, the SLCMJ has garnered an increased amount of attention as a method to evaluate unilateral power in elite soccer. In fact, all 12 studies that used SLCMJ testing were conducted within the last 9 years, further highlighting the growing popularity of this test. Compared with other popular jump tests, the SLCMJ enables the assessment of power in a unilateral manner, something that can be of value given the requirement for unilateral movement competency in soccer. Moreover, the detection of interlimb asymmetries can support injury prevention and return-to-play strategies [66, 77]. Jump height, measured in centimetres, was found to be the main outcome variable obtained from the SLCMJ test. More importantly, jump height (when assessed unilaterally) has been shown to be a sensitive measure for assessing changes in performance of elite soccer players when in a fatigued state [64], encouraging the use of the test in settings where there is a lack of specialised equipment such as force plates.

Interestingly, our systematic review revealed that the assessment of reactive strength ability, which represents the ability to transition rapidly from an eccentric to a concentric muscle action, does not appear to be prioritised in elite soccer players. The drop jump represents one of the most popular tests to evaluate reactive strength and provide insights into an athlete’s fast stretch–shortening cycle ability [37]. However, it was reported in only eight studies of elite soccer. In terms of outcome variables, jump height (six studies) and contact time (four studies) were the most frequently reported. The combination of these can be used to calculate the reactive strength index, which was reported in three studies and provides a measure to evaluate an individual’s reactive strength ability. Nevertheless, the fact that the reactive strength index is a ratio and is deemed as an outcome variable points to the need to examine each component separately and delve deeper into metrics that provide insights into the strategy used, such as ground contact time and leg stiffness [71].
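
As a brief illustration of why the reactive strength index should be examined alongside its components, the sketch below computes it as jump height divided by ground contact time. This assumes the jump height-based formulation (a flight time-based variant also exists); the function name and values are hypothetical.

```python
def reactive_strength_index(jump_height_m: float, contact_time_s: float) -> float:
    """Drop jump height (m) divided by ground contact time (s)."""
    if contact_time_s <= 0:
        raise ValueError("Contact time must be positive")
    return jump_height_m / contact_time_s


# Two hypothetical athletes with different strategies but a near-identical index
athlete_a = reactive_strength_index(0.30, 0.20)  # lower jump, shorter contact
athlete_b = reactive_strength_index(0.45, 0.30)  # higher jump, longer contact
print(round(athlete_a, 2), round(athlete_b, 2))
```

The two athletes obtain essentially the same index through different combinations of jump height and contact time, which the ratio alone does not reveal.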

4.2 Normative Values for Strength and Power Tests

Normative standards can serve as an important tool in the athletic development process, enabling benchmarking and a data-informed approach to athletic development. Given the potential of strength and power to distinguish between different playing levels [16, 17], availability of normative data can provide multiple benefits to key stakeholders such as coaching and management staff. In particular, normative values for elite soccer players can support practitioners in setting training priorities and objectives, which can lead to the implementation of targeted training interventions. In addition, knowledge of the strength and power outputs of soccer players competing at the highest level can be of great value, including for practitioners working with developmental players. This can enable the reverse engineering of the strength and power development process so that players are ready to cope with the physical demands of elite soccer. Therefore, this review also provides a summary of normative values of strength and power. Owing to the large discrepancy in testing methods identified, only normative values of the most commonly implemented tests and outcome variables were reported. Overall, the biggest challenge encountered in the establishment of normative standards lies in the wide variability of testing protocols and measurement devices. Therefore, readers are referred to Tables 4–9 for more in-depth information on the values reported by each study for each test.

A variety of angular velocities, with a range from 30°/s to 300°/s, have been used in isokinetic strength testing, enabling practitioners to gain insight into muscle strength capabilities at different speeds. As the majority of isokinetic strength values were reported at 60°/s in this literature review, the mean values reported correspond to this angular velocity. The substantially smaller number of studies reporting isokinetic strength values in elite youth soccer players could possibly indicate a research area where more work needs to be performed in the future. In terms of Nordic hamstring strength, the difficulty in drawing conclusions was arguably greater. More specifically, the range of mean values of peak force is large (338.2 vs 636.5 N) in the two identified studies performed in elite youth soccer players, perhaps in part as a consequence of the different equipment used (Nordic assessment device vs acceleration leg curl/extension, neuroexcellence) or the training approaches adopted by the club. In a similar manner, the variety of equipment used in jump testing such as force plates, jump mats and photoelectric cells can introduce a varying degree of measurement error during a jump assessment. In addition, readers should take into account the detailed discussion on the intricacies of jump height calculation provided in Sect. 4.1.2. The range of values observed for SJ, CMJ and VJ corroborates this observation, and as such, generalisation of these results should be avoided.

Although the normative values presented in this review can offer valuable insights to practitioners, thereby enhancing the practical utility of this work, they are subject to many limitations. Careful interpretation and application of these results is therefore recommended. Additional research is required to establish specific thresholds for each playing standard. Finally, further standardisation of data analysis is required, as it was observed that some studies reported the mean value of the trials performed. Currently, it may be advisable to determine club-specific standards and compare players against these, thus accounting for differences in test equipment, methodology and the adopted culture and philosophy of training and testing [78].

4.3 Reliability Data

Reliability is an important concept in the overall testing process, especially in high-performance sport where success depends on marginal differences. The use of reliable tests and outcome variables can ensure that the data collected reflects an athlete’s true capacity, therefore guiding effective decision making. Nevertheless, our systematic review revealed that a relatively small number of studies reported reliability data for strength and power tests (15 and 34 studies, respectively), impairing confidence in the interpretation of test results and performance changes. This finding highlights the need to generate awareness of the utility of these metrics within the prescription and reporting of testing. It is of paramount importance for practitioners to establish their own reliability measures within their specific contexts, as the characteristics of each setting and athlete sample are unique.

Intra-day reliability was the most common type of reliability. The predominance of intra-day reliability could be attributed to the inherent complexity of conducting between-day reliability studies in elite sports. More specifically, when aiming to undertake inter-day reliability assessments, the second assessment is usually performed within 3–7 days of the first [79], which is not always feasible because of demanding training and competition schedules. Determining the between-day variability, though, can promote a more holistic and evidence-informed interpretation of performance changes, as the between-day variability is not typically the same as within-day data because biological variation is also factored in. In fact, our review revealed generally higher values for the intra-day reliability of the CMJ and SJ height compared with the inter-day reliability values reported in one study [80]. A similar observation was made in Nordic hamstring strength testing, where intra-day reliability values were considerably higher than inter-day reliability values. To address this, more ecological approaches to between-day reliability testing have been recently introduced in elite soccer [80] and rugby union [72], by integrating the reliability testing within the microcycle, where normal training is undertaken in the days prior to the re-assessment.

In terms of reliability metrics, the ICC was the most frequently reported. The ICC is a measure of relative reliability, which is the extent to which an individual maintains their ranking over the course of repeated trials. Although what constitutes an acceptable ICC value is debatable, it is generally accepted that an ICC ≥ 0.75 indicates “good” reliability, with an ICC ≥ 0.90 indicating “excellent” reliability [81]. Nonetheless, the ICC is influenced by group homogeneity and does not provide any information on the variation between efforts of an individual. Therefore, it is crucial for absolute reliability to also be established. Based on our findings, the CV and SEM were the most common metrics to evaluate absolute reliability. The CV indicates the relative dispersion of the data points around the mean by expressing the SD as a percentage of the mean, while the SEM provides an index of the precision of the measurement by estimating the range in which the population’s true score is expected to lie, within a defined level of confidence. In addition, these measures are more relevant to practice, as they are used for the assessment of sensitivity. Although the scientific community seems to broadly recognise a CV of ≤ 10% as an acceptable threshold, this threshold appears to be rather arbitrary and a more nuanced and context-specific interpretation is required [82]. The paper by Mercer et al. [73] demonstrated that although certain CMJ variables produce a CV > 10%, they are still sensitive to training changes, justifying their use in practice. Readers are directed to this article to gain insights on how to determine the signal-to-noise ratio with their own athletes in a manner that is ecologically valid and does not disrupt the training process.
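
The reliability metrics described above can be sketched as follows. The CV is computed as the SD expressed as a percentage of the mean, as defined in the text; the SEM uses the common formulation SD × √(1 − ICC), which is an assumption here because other formulations exist; and the ICC labels follow the thresholds cited from [81]. All function names and data are hypothetical.

```python
import statistics


def coefficient_of_variation(trials: list[float]) -> float:
    """SD of repeated trials expressed as a percentage of their mean."""
    return statistics.stdev(trials) / statistics.mean(trials) * 100.0


def standard_error_of_measurement(between_subject_sd: float, icc: float) -> float:
    """SEM = SD * sqrt(1 - ICC) (assumed formulation)."""
    return between_subject_sd * (1.0 - icc) ** 0.5


def icc_label(icc: float) -> str:
    """Label an ICC using the thresholds cited in the text."""
    if icc >= 0.90:
        return "excellent"
    if icc >= 0.75:
        return "good"
    return "below good"


# Hypothetical CMJ heights (cm) from three trials of one athlete
print(coefficient_of_variation([38.0, 40.0, 42.0]))
print(round(standard_error_of_measurement(4.0, 0.91), 2))
print(icc_label(0.91))
```

Note that the SEM shrinks as the ICC rises, so the same test can yield a different SEM in a more homogeneous squad, which is one reason for establishing reliability within each specific context.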

Regarding strength testing, the smaller number of studies reporting reliability values means that conclusions should be drawn with caution. The half-back squat seems to possess high levels of intra-day (ICC [0.94–0.97], CV [1.8–3.1], SEM [1.71]) and inter-day reliability (ICC [0.99], CV [1.8], SEM [2]). Additionally, the Nordic hamstring strength test appears to have high intra-day relative reliability levels (ICC [0.97–0.99]) in conjunction with a small CV value (1.0–3.2). In terms of inter-day reliability of the Nordic hamstring strength test, the only study performed demonstrated moderate levels of relative reliability, but with a CV value below 10%. Regarding power testing, the reported ICC ranges in CMJ (0.80–0.99), SJ (0.75–0.99) and SLCMJ (0.74–0.99) height, coupled with their CV (CMJ [1.8–15], SJ [2.1–13], SLCMJ [1.9–9.6]) and SEM (CMJ [0.6–1.4], SJ [0.6], SLCMJ [0.3–1]) values identified in this systematic review, confirm their high level of reliability. The increased availability of force plates in elite soccer will warrant the determination of reliability, particularly between days, in metrics other than jump height, representing an area where future research in elite soccer should focus. Although the current reliability data are generally robust, practitioners should still validate these measures within their specific context to ensure the accuracy and applicability of the data.

Finally, only a very limited number of studies reported the MDC. Minimal detectable change, calculated from the SEM, illustrates the minimal amount of change in performance required to be confident that the change can be considered as real at a predetermined probability level (usually 90% or 95%). This may raise the need to further determine MDC of strength and power tests in the context of elite soccer, as this will allow practitioners to identify normal variations or true changes in performance. However, it should be acknowledged that such a high confidence threshold may not be suitable for high-performance settings where a high level of physical performance has already been established and training interventions can therefore only elicit a certain degree of positive adaptations. This can lead to tiny but significant positive changes being labelled as “noise”, resulting in the discontinuation of certain training interventions that are actually working.
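
To show how the chosen probability level affects the MDC, the sketch below uses the common formulation MDC = SEM × z × √2, with z = 1.645 for 90% and z = 1.96 for 95% confidence. The formulation, function name and SEM value are assumptions for illustration and are not taken from the included studies.

```python
Z_SCORES = {0.90: 1.645, 0.95: 1.96}  # two-sided z-values for each confidence level


def minimal_detectable_change(sem: float, confidence: float = 0.95) -> float:
    """MDC = SEM * z * sqrt(2) (assumed formulation)."""
    if confidence not in Z_SCORES:
        raise ValueError("Supported confidence levels: 0.90 and 0.95")
    return sem * Z_SCORES[confidence] * 2 ** 0.5


# Hypothetical SEM of 1.0 cm for CMJ height
print(round(minimal_detectable_change(1.0, 0.90), 2), "cm")
print(round(minimal_detectable_change(1.0, 0.95), 2), "cm")
```

A lower confidence level gives a smaller MDC, which illustrates the trade-off noted above between guarding against noise and detecting small but meaningful changes in highly trained players.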

4.4 Limitations

Although this systematic review provided a comprehensive picture of strength and power testing in elite soccer, there are several limitations that should be acknowledged. To begin with, as Boullosa et al. [83] indicate, the conclusions of a systematic review can be influenced by the inclusion or exclusion of a few studies. In this sense, the terms “elite” and “professional” are often used interchangeably across the literature. It is likely, though, that these terms are used differently in different geographic regions and leagues. This can be considered as a limitation, as well as a reflection of the existing soccer literature, highlighting the need for a standardised terminology for “elite” and “professional”. The large variability in equipment is another challenge in the establishment of normative standards, complicating the direct translation of these findings into practice. In addition, because of the heterogeneity of testing methods identified in the present literature review, it was not possible to carry out a meta-analysis. Last, in strength testing, a substantially lower number of studies reporting normative values was available. This discrepancy may interfere with the ability to perform reliable comparisons between senior and youth elite soccer players.

4.5 Directions for Future Research

This systematic review has identified several areas that require further investigation. There is a need to standardise several aspects of strength and power testing to improve the comparability and the application of the results. This includes standardising the definitions, such as the distinction between the half-back squat and the back squat, and standardising procedures for certain tests, such as the isometric adductor strength test, the Nordic hamstring test and the CMJ. Future research should focus on establishing a hybrid testing framework that incorporates standardised “core” tests for benchmarking and large-scale comparisons, while allowing practitioners to introduce additional context-specific tests tailored to the unique dynamics of their settings. A scarcity of robust reliability data is evident in elite soccer. Practitioners need to establish their own reliability measures, and subsequently the sensitivity of those, within their specific contexts, to enhance confidence in assessing performance changes and reduce reliance on published reliability thresholds. This will assist in identifying any measures that do not offer particular value in decision making, removing any redundant processes and data. A standardised data analysis process should also be adopted, as there is no consensus on the optimal approach to analyse strength and power testing results (i.e. best trial vs average of trials). Future studies should therefore examine the ramifications of each approach. Last, future studies should investigate the most effective methods of reporting the testing results to the key stakeholders to enhance the impact of testing in the training process. These studies have the potential to reshape the strength and power assessment procedures in elite soccer, enabling more robust and informed practices.

5 Conclusions

This systematic review, as illustrated in the infographic in Fig. 2, provides a comprehensive overview of the tests and outcome variables used to assess strength and power in elite male soccer. The wide variety of different tests employed combined with the multitude of different outcome variables indicates the lack of a consensus in strength and power testing in elite soccer. This may arise from the diverse training needs of each specific setting, as well as the different testing philosophies across cultures. In terms of frequency, isokinetic knee (extensors and flexors) strength testing and CMJ were the most administered strength and power tests, respectively. The normative values provided for these tests enhance the practicality of this review. However, the application of these normative values warrants careful consideration, as different testing protocols and instruments have been utilised. Future research should focus on the development of a hybrid testing approach to strength and power testing, combining standardised tests for benchmarking purposes, while allowing for flexible testing selection based on the unique requirements of each specific context to enable a holistic profiling of strength and power.

Fig. 2

Strength and power testing in elite male soccer. CMJ countermovement jump, CV coefficient of variation, ICC intraclass correlation coefficient, SEM standard error of measurement, SJ squat jump, VJ vertical jump