Background

Rugby (either rugby union or league) is a popular sport played professionally or otherwise at both junior and senior levels worldwide [1]. It is generally considered a physical sport characterised by multiple high-intensity activities interspersed with low-intensity activities [2,3,4,5]. The players engage in physically demanding contests such as tackles, rucks and mauls with the primary objective of gaining possession of the ball [6]. These contests require players to possess a wide range of physiological characteristics, such as strength, power and endurance, which allow them to be stronger and more fatigue-resistant [7,8,9,10].

Numerous studies have provided scientific evidence on the physiological characteristics of rugby players. This has been necessitated by the drive to understand the physiological factors that differentiate between playing levels (talent identification) and the physiological characteristics associated with optimal performance [1, 2, 7, 10,11,12,13,14,15,16,17,18]. For example, Gabbett and Seibold [15] postulated that lower-body power, upper-body strength-endurance, and prolonged high-intensity intermittent running ability discriminated players for team selection in semi-professional rugby league (RL) players. Smart et al. [17] found correlations between speed, repeated-sprint ability and game performance statistics such as tackle breaks and tries scored in rugby union (RU). Furthermore, Till et al. [18] compared longitudinal changes in physical qualities with career attainment status and found that advanced physical qualities such as absolute strength during adolescence contributed significantly to the attainment of professional status in rugby. All these findings suggest an important relationship between physiological characteristics and future career success, physical performance and team selection [15, 17, 18].

Today, physiological profiling of rugby players has become an integral aspect of the contemporary sport of rugby. It allows coaches to identify “competent” players who have the enhanced physiological capacities needed to withstand the high-intensity demands of the sport and who can win trophies for team, club or country [6, 7]. This forms the hallmark of talent identification programmes. Secondly, understanding the physiological qualities needed in the sport of rugby may specifically inform training development practices of future professional players [18]. With the surge in physiological profiling and the proliferation of talent identification and development programmes for young rugby players [18], there is a need for the identification and use of physical tests with known measurement properties (reliability, validity and responsiveness). A scoping review of the literature showed that there are multiple tests available for measuring the same physiological characteristic. For example, agility is a fundamental physiological characteristic required for optimal performance by rugby players. The construct has been evaluated in the literature using different tests such as the ‘L’ run, Illinois agility run test, agility 505 test, modified 505 test and change of direction speed test [6, 10, 16, 18,19,20,21,22]. To understand the basis for selecting tests, it may be important to have an overview of all the tests that measure a specific physiological construct and to evaluate systematically the measurement properties of the identified tests, in order to identify the test(s) with the strongest level of evidence per construct. This information may help explain, in terms of measurement properties, why particular tests are selected for the measurement of a specific physiological characteristic. To our knowledge, no systematic review has been conducted to provide such information.
Therefore, this systematic review was conducted with the aim of addressing the following research questions:

  1. What physiological characteristics of rugby players are evaluated in the literature and which tests are used to measure each identified characteristic?

  2. What is known about the measurement properties (reliability, validity and responsiveness) of each identified physiological test in the sport of rugby? If there is no information on the measurement properties for each test in rugby, is there any evidence available from other closely-related intermittent, collision team sports to rugby such as Australian Rules football, American football or Soccer? In case of multiple tests measuring the same construct, which test(s) has the strongest level of evidence in terms of the measurement properties?

Stage 1: Methods

This systematic review was registered on PROSPERO with the registration number CRD 42015029747 [21]. The review was organised in two stages. Stage 1 presents an overview of the physiological characteristics commonly evaluated in rugby and the corresponding tests. Stage 2 presents an overview of the measurement properties of the identified physiological tests. Each stage was written in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines by Moher et al. [23].

Literature search

A literature search was conducted using the following databases: Scopus, Medline via EBSCOhost and via PubMed, Academic Search Premier via EBSCOhost, CINAHL (Cumulative Index to Nursing and Allied Health Literature) via EBSCOhost and Africa-Wide Information via EBSCOhost. The review included studies published between January 1, 1995, and December 31, 2016. Additionally, a hand search of the reference lists of selected articles was conducted to augment the literature search.

Selection criteria for the studies

Sports context

There are two major variants of rugby, namely, RU and RL. Although RU differs significantly from RL in team sizes, scoring and in certain situations of tackling and when the ball goes out, there are striking similarities in game duration, field size, player positions, and goal posts [24]. There are also similarities in the physical demands and physiological responses elicited during game play as both sports are predominantly aerobic in nature interspersed with high-intensity efforts [5, 24]. The objective in both is to get the ball over the opposition’s goal line by carrying, passing, kicking and grounding the ball. Therefore, because of the resemblance we included studies on RU and RL. However, studies on the sport of rugby “sevens” were excluded.

Physiological characteristics

Rugby requires a blend of physiological characteristics for players to cope with the demands of the game [1]. The studies included had to report on at least one physiological characteristic, operationally defined as measures that assess speed, repeated-sprint ability, prolonged high-intensity intermittent running ability, agility, muscular strength, power and endurance and maximal aerobic capacity. In addition, to be included, studies had to report the name of the test used to measure the physiological construct and include a detailed, reproducible description of the test procedure. No restriction on study design was applied during study selection. However, editorials, book chapters, poster and oral conference abstracts, unpublished theses, dissertations, and case studies were excluded. Studies published in languages other than English were also excluded.

Participants

Since rugby is played competitively at junior and senior levels worldwide, studies included in this review had to involve male rugby participants from the age of 10 years and above (adolescents to adults) from any country. Studies involving rugby participants living with disabilities were excluded.

Search strategy

The search strategy was developed in consultation with an expert librarian in systematic reviews from the University of Cape Town (UCT) libraries. The search strategy (see Additional file 1, designed for Medline via PubMed) consisted of a combination of the following search themes connected with the Boolean operator AND:

  i. Construct-related general search terms: physical characteristics OR physiological characteristics.

  ii. Construct-related specific search terms: speed OR agility OR flexibility.

  iii. Target population-related search terms: adult OR adolescent OR youth.

  iv. Sport-related search terms: rugby OR rugby union OR rugby league.

Selection of articles

The selection process was conducted stepwise based on recommendations for performing systematic reviews by van Tulder et al. [25] and Reimers et al. [26]. The first author (MC) ran the search strategy across all databases. Two reviewers (JD and EB) independently reviewed the search results in two steps. The first step involved applying the inclusion criteria to select potentially relevant articles from titles. The abstracts of studies with titles considered relevant were retrieved for further inspection in the second step [26]. Provided that the abstract fulfilled the eligibility criteria or had insufficient information for a selection decision to be made, both reviewers retrieved the full text to further assess for eligibility [26]. Disagreements between the two reviewers were first discussed between them at the end of the selection process. In the case of further disagreement, a third reviewer (TM) intervened until a mutual consensus was reached. In addition, all retrieved articles were then reviewed again against the inclusion criteria by the lead investigator (MC).

Data extraction

Data extraction was performed independently by two reviewers (TM and JD). Extracted data were recorded on a Microsoft Excel data extraction form. The following data were captured for the first objective: publication details of the study (first author, year of publication), the name(s) of the physiological characteristic examined in the study (captured as originally described by the authors) and the name of the corresponding test(s), as described in the study, used to measure the physiological characteristics. To enable the description of studies, additional information on sport context, age of participants, country, target population, study design and sample size was also extracted. The primary author (MC) acted as the data verifier, assessing the exhaustiveness and accuracy of data extracted from the included articles. Discrepancies identified by the verifier were communicated to the two data extractors and disagreements were resolved by mutual consensus.

Results: Stage 1

Since Stage 1 results were used to inform the methods and selection criteria for studies in the second stage of the systematic review, the results for Stage 1 are presented here. The electronic searches identified 23,976 studies and, after initial selection based on abstract and title, 1909 studies were potentially eligible (Fig. 1). After full-text evaluation, 70 studies were included. The majority of the excluded studies did not meet the inclusion criteria because they did not report on physiological characteristics (Fig. 1).

Fig. 1 Flow chart of the search and selection process for stage 1 articles

Description of included studies

The general characteristics of the 70 included studies are shown in Table 1. Briefly, half of the included studies (n = 35, 50.0%) were conducted in Australia alone. Only three (4.29%) studies were conducted in an African country, namely, South Africa [7, 27, 28]. Of the 70 studies, 34 (48.6%) had adolescents as participants and six (8.57%) used both adults and adolescents. The sample sizes varied greatly across studies, from 12 to 1172 participants, depending on study design. Study designs included retrospective, prospective cohort and experimental designs, with the preponderance of the studies being cross-sectional. The majority of studies (n = 50, 71.4%) involved RL participants. Two studies had participants drawn from both RL and RU [24, 29].

Table 1 General characteristics of included studies

Physiological characteristics and the corresponding tests

Table 2 provides an overview of physiological characteristics, corresponding tests used to measure each construct in rugby and the absolute number of studies that used a specific physiological test. This review identified 15 physiological characteristics commonly evaluated among rugby players. These include speed, repeated-sprint and effort ability, repeated high-intensity exercise performance, prolonged high-intensity intermittent running ability/endurance, anaerobic endurance, maximal aerobic power and speed, agility, lower-body muscular power and strength, upper-body muscular strength and power, upper-body muscular endurance and abdominal endurance. However, there were no studies evaluating muscle flexibility of the rugby players that met the inclusion criteria.

Table 2 An overview of tests used to measure specific physiological characteristics as described in the included studies

The majority of these physiological characteristics had multiple tests for measurement. Overall, the 70 studies included in the review described 63 physiological tests: speed (8), upper-body muscular endurance (8), agility/change of direction speed (7), upper-body muscular power (6), upper-body muscular strength (5), prolonged high-intensity intermittent running ability/endurance (5), lower-body muscular strength (5), anaerobic endurance (4), maximal aerobic power (4), lower-body muscular power (3), repeated high-intensity exercise performance (3), repeated-sprint ability (2), repeated-effort ability (1), maximal aerobic speed (1) and abdominal endurance (1). Table 3 summarises the procedures for administering each physiological test identified.
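As a quick arithmetic check, the per-characteristic counts quoted above can be tallied to confirm that they add up to 63 tests across 15 characteristics. The sketch below only restates the numbers from the text; the dictionary name is an illustrative choice.

```python
# Number of distinct tests per physiological characteristic, as listed in the text.
tests_per_characteristic = {
    "speed": 8,
    "upper-body muscular endurance": 8,
    "agility/change of direction speed": 7,
    "upper-body muscular power": 6,
    "upper-body muscular strength": 5,
    "prolonged high-intensity intermittent running ability/endurance": 5,
    "lower-body muscular strength": 5,
    "anaerobic endurance": 4,
    "maximal aerobic power": 4,
    "lower-body muscular power": 3,
    "repeated high-intensity exercise performance": 3,
    "repeated-sprint ability": 2,
    "repeated-effort ability": 1,
    "maximal aerobic speed": 1,
    "abdominal endurance": 1,
}

total_tests = sum(tests_per_characteristic.values())
n_characteristics = len(tests_per_characteristic)
print(total_tests, n_characteristics)  # → 63 15
```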

Table 3 A descriptive summary of procedure for the tests identified as commonly used in the included studies

Speed

Running speed was the most common physiological characteristic evaluated among rugby players. Of the 70 studies, 51 (72.9%) examined the speed characteristics of rugby players. Straight-line sprinting was commonly measured over eight distances of 5 m, 10 m, 15 m, 20 m, 30 m, 40 m, 50 m and 60 m recorded using dual beam electronic timing gates (Tables 2 and 3). Of the 50 studies, 98% assessed the speed of rugby players over multiple distances. Twelve (24%) studies specifically used multiple linear distances of 10 m, 20 m and 40 m [30,31,32,33,34,35,36,37,38,39,40,41] and eight (16%) used the 10 m, 20 m, 30 m and 60 m sprint tests for the speed evaluation of rugby players [41,42,43,44,45,46,47,48].

Repeated sprint and effort ability

Seven (10.0%) studies evaluated the repeated-sprint ability of rugby players. However, only two tests were commonly used in these studies to evaluate the construct. The Repeated 20 m Sprint test was used in five of the seven studies [16, 29, 49,50,51]. The test involves players performing 10 or 12 maximal effort sprints over a 20 m distance, with each sprint performed on a 20 or 30 s cycle [16, 29, 49,50,51]. In addition, two studies evaluated the repeated-sprint ability of rugby participants using the Rugby-Specific Repeated Speed (RS2) test [17, 52]. The Repeated-Effort Ability test was used in one study to investigate the physiological characteristic of repeated-effort ability in rugby players [51]. The protocol comprises 12 × 20 m sprints and tackles, with each sprint commencing every 20 s and a tackle performed after each 20 m sprint [51].

Repeated high-intensity exercise performance

The ability to perform repeated high-intensity exercises by rugby players was assessed using specifically developed Repeated High-Intensity Exercise (RHIE) tests. Three tests were used in a study by Austin et al. [24] and were modified for RU backline players, RU forward players and RL forward players.

Prolonged high-intensity intermittent running ability/endurance

Fourteen (20.0%) studies investigated the measurement of a physiological characteristic termed “prolonged high-intensity intermittent running ability” or endurance [15, 16, 18, 24, 49, 50, 53,54,55,56,57,58,59,60]. Of the 14 included studies, eight used the Yo-Yo Intermittent Recovery Level 1 (Yo-Yo IRT1) test [15, 18, 53,54,55,56, 59, 60] and three utilised the Repeated 12 s Sprint Shuttle Speed test [15, 49, 50]. The Yo-Yo IRT1 involves performing 2 × 20 m runs back and forth at a progressively increasing speed, keeping to a series of beeps/audio signals from a compact disc [15, 53, 54]. The Repeated 12 s Sprint Shuttle Speed test involves players performing 8 × 12 s maximal effort shuttles (sprinting forward 20 m, turning 180 degrees and sprinting 20 m), with each shuttle performed on a 48 s cycle [16, 49, 50]. In addition, only one study evaluated the construct of “prolonged high-intensity intermittent running ability” using the Yo-Yo Intermittent Recovery Level 2 (Yo-Yo IRT2) test [24].

Maximal aerobic power and speed

Of the 70 studies, 32 (45.7%) estimated the maximal aerobic power of rugby players. Of these studies, 29 (90.6%) used the Multistage Fitness test [7, 8, 10, 16, 27, 30,31,32,33,34,35,36,37, 40, 41, 43,44,45,46, 48,49,50, 61,62,63,64,65,66,67]. Other tests used in single studies to estimate maximal aerobic power included the 30–15 Intermittent Fitness test (30–15IFT) [68], the 1500 m run [42] and the Yo-Yo IRT1 [69]. Maximal aerobic speed was evaluated using the 30–15IFT [53, 59]. The test involves performing 30 s shuttle runs conducted at a pace governed by a pre-recorded beep and interspersed with 15 s periods of passive recovery. The test begins at 8 km/h and the speed increases by 0.5 km/h at each successive running shuttle [53].
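The speed progression of the 30–15IFT can be written as a simple formula. The sketch below assumes a starting speed of 8 km/h and an increment of 0.5 km/h per successive 30 s running stage, with 15 s of passive recovery after each stage. The function names and the 1-based stage numbering are illustrative choices and not part of the published protocol.

```python
START_SPEED_KMH = 8.0   # starting running speed reported in the text
INCREMENT_KMH = 0.5     # assumed speed increase per successive stage
RUN_S = 30              # duration of each shuttle-run stage, in seconds
RECOVERY_S = 15         # passive recovery after each stage, in seconds


def stage_speed_kmh(stage: int) -> float:
    """Running speed (km/h) for a 1-based stage number."""
    if stage < 1:
        raise ValueError("stage numbers start at 1")
    return START_SPEED_KMH + INCREMENT_KMH * (stage - 1)


def elapsed_time_s(stage: int) -> int:
    """Seconds from the start of the test to the end of the run in this stage."""
    if stage < 1:
        raise ValueError("stage numbers start at 1")
    return (stage - 1) * (RUN_S + RECOVERY_S) + RUN_S


for stage in (1, 2, 10):
    print(stage, stage_speed_kmh(stage), elapsed_time_s(stage))
```

Under these assumptions, stage 10 is run at 12.5 km/h and ends 435 s after the start of the test.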

Anaerobic endurance

Three (4.29%) studies assessed the anaerobic endurance of rugby players. One study compared results of rugby players on two tests of anaerobic endurance: the Triple 120 m (T120S) test and the Wingate 60 (w60) cycle test [70]. Other tests used in single studies included the 300 m Shuttle Run test [71] and the 400 m Sprint test [42].

Change of direction speed/agility

Change of direction speed/agility was the third most commonly measured physiological characteristic in the included studies. In total, 33 (47.1%) studies examined the change of direction speed or agility of rugby players. Of these studies, 17 (51.5%) used the 505 test [16, 19, 36, 37, 41, 43,44,45,46,47,48,49, 53, 65,66,67, 72] and seven (21.2%) used the L-run test [19, 31, 32, 34, 35, 40, 58]. The 505 test involves players assuming a starting position 10 m from the timing gates, accelerating as quickly as possible along the 15-m distance, pivoting on the 5 m line or turning 180 degrees at the 15 m mark, and returning as quickly as possible through the timing gates placed 5 m from the designated turning point [16, 19, 36, 37, 49, 53, 72]. The L run, on the other hand, involves three cones placed 5 m apart in an ‘L’ shape; players run as quickly as possible along the first 5 m, turn left, run forward 5 m, turn 180 degrees and follow the same course back to the finish [19, 31, 32, 34, 35, 40]. Other tests used in the included studies were the Illinois Agility test (n = 3) [27, 30, 64], the Modified 505 test (n = 2) [19, 73] and the Change of Direction Speed test (CODS) (n = 2) [6, 74].

Lower-body muscular power and strength

Lower-body muscular power was the second most commonly investigated physiological characteristic in rugby participants. Of the 70 studies included in this review, 42 (60.0%) examined this construct. Of these studies, 15 (35.7%) used the Vertical Jump (VJ) test [15, 16, 30,31,32,33,34,35,36, 40, 42, 49, 61, 64, 65, 73]. The VJ is performed using a Yardstick device or a board: players stand with feet flat on the ground and with arms and hands fully extended, and their standing reach height is marked. After assuming a crouch position, players spring upward and touch the Yardstick device or the board at the highest possible point [15, 16, 30,31,32,33,34,35,36, 40, 42, 49, 61, 64, 65, 73]. Twenty-two (52.4%) studies used the Countermovement Jump (CMJ) test [18, 38, 39, 41, 43,44,45,46,47,48, 53, 55,56,57, 59, 60, 62, 63, 66, 67, 69, 75, 76]. The difference between the two vertical jump tests is that in the CMJ participants stand with their hands positioned on the hips and usually jump as high as possible from a jump mat [18]. The Jump Squat (JS) test was used in five studies [13, 75, 77,78,79].

Of the 70 studies, 14 (20.0%) assessed the lower-body muscular strength of rugby players. The most frequently used test was the One Repetition Maximum Back Squat (1RM BS), which was used in nine of the fourteen studies [5, 17, 18, 38, 55, 56, 69, 77, 80]. Using an Olympic bar or free weights, players are instructed to back squat until the top of the thigh is parallel with the ground and return to a standing position to record the 1RM [5, 17, 38, 55, 56, 69, 77, 80]. In addition, two studies each used the 1RM Box Squat [13, 42] and the 3RM Back Squat [15, 60].

Upper-body muscular power and strength

Nineteen (27.1%) studies evaluated the upper-body muscular strength of rugby players. Of these studies, 13 (68.4%) used the 1RM Bench Press (1RM BP) [5, 7, 17, 18, 27, 38, 42, 55, 56, 58, 69, 78, 80]. The 1RM BP test involves players lying supine with feet flat on the floor and hips and shoulders in contact with the bench. The players are instructed to lower the bar to touch the chest and push the bar until the elbows are locked out, recording the 1RM [5, 7, 17, 27, 38, 42, 55, 56, 69, 78, 80]. Two studies each used the 1RM Chin-Up test [17, 42] and the 3RM Bench Press [15, 60]. In addition, 12 (17.1%) studies examined the upper-body muscular power of rugby players. The most frequently used test in the included studies was the 2 kg Medicine Ball Chest Throw [41, 43,44,45,46,47,48, 57, 66]. Other tests used in single studies included the 20 s Push-Up and 20 s Chin-Up tests [36], the Overhead Medicine Ball Throw test [73] and the Bench Throw test [13].

Upper-body and abdominal muscular endurance

Of the included studies, only five (7.14%) assessed upper-body muscular endurance. One study utilised two tests: the 60 s Push-Up and Chin-Up tests [36]. Another study used the 1RM Bench Press Repetitions-to-Fatigue test at 60 kg, 102.5 kg and at 60% of 1RM [81]. Other tests used in single studies included the Pull-Up test [7], the body mass Bench Press with repetition test [15] and the 30 s Plyometric Push-Up test [58]. Abdominal endurance was evaluated in one study and was assessed using the 60 s Sit-Up test [58].

Stage 2: Methods

Stage 1 allowed us to identify tests commonly used for the measurement of the physiological characteristics of speed, repeated-sprint and repeated-effort ability, repeated high-intensity exercise performance, prolonged high-intensity intermittent running ability/endurance, maximal aerobic power and speed, anaerobic endurance, change of direction speed/agility, lower- and upper-body muscular strength and power, and abdominal endurance. Briefly, the second stage of the systematic review was conducted to provide evidence on the measurement properties of each physiological test identified in Stage 1. The ultimate aim, however, was to identify, through best evidence synthesis, one physiological test per physiological construct with the strongest level of evidence on measurement properties.

Literature search, search strategy and eligibility criteria

The electronic databases used for the literature search in Stage 1 were also used for Stage 2. Initially, we searched specifically for full-text studies with the primary purpose of investigating the measurement properties (reliability, validity and responsiveness) of the previously identified physiological tests in male rugby participants. This was done to determine which physiological tests had been validated in the population of interest to the researcher (MC) for his future studies using rugby participants [21, 82]. However, where no satisfactory information was found on the measurement properties of certain physiological tests in rugby studies, it was pre-planned that we would search for evidence from clinimetric studies on related intermittent, collision team sports such as Australian Rules football (AFL), American football, Gaelic football and Soccer. Included studies from related sports had to describe the test procedure in a similar way to the rugby-related studies. Studies in which, according to the researcher (MC), the test procedure had undergone major adjustments between sports were excluded. A search strategy proposed by Terwee et al. [83] guided the selection of keywords (see Additional file 2). The strategies for searching clinimetric studies in rugby and in related sports consisted of combinations of the following search themes, (i, ii, iii, iv) and (i, ii, iv, v), respectively, connected with the Boolean operator AND:

  i. Test-specific terms: Vertical jump test OR Yo-Yo intermittent recovery test OR repeated 20 m sprint test.

  ii. Measurement property-related terms: Psychometric* OR measurement* OR clinimetric*.

  iii. Rugby-related terms: rugby OR rugby union OR rugby league.

  iv. Target population-related search terms: adult OR adolescent OR male.

  v. Other team sport-related terms: Australian Rules football OR American football OR Soccer.
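The way the themes are combined can be sketched in a few lines: terms within a theme are joined with OR, and the selected themes are joined with AND. The term lists below contain only the example terms quoted in the list above (the full strategy is in Additional file 2). The helper name `build_query` and the generic query syntax are assumptions, since each database has its own field tags and syntax.

```python
# Example terms per search theme, copied from the list above (not the full strategy).
THEMES = {
    "i": ["Vertical jump test", "Yo-Yo intermittent recovery test",
          "repeated 20 m sprint test"],
    "ii": ["Psychometric*", "measurement*", "clinimetric*"],
    "iii": ["rugby", "rugby union", "rugby league"],
    "iv": ["adult", "adolescent", "male"],
    "v": ["Australian Rules football", "American football", "Soccer"],
}


def build_query(theme_keys):
    """Join terms within a theme with OR and the chosen themes with AND."""
    blocks = []
    for key in theme_keys:
        # Quote multi-word terms so they are treated as phrases (hypothetical syntax).
        terms = " OR ".join(f'"{t}"' if " " in t else t for t in THEMES[key])
        blocks.append(f"({terms})")
    return " AND ".join(blocks)


rugby_query = build_query(["i", "ii", "iii", "iv"])           # clinimetric studies in rugby
related_sports_query = build_query(["i", "ii", "iv", "v"])    # related team sports
print(rugby_query)
print(related_sports_query)
```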

Data extraction

The selection of the identified articles was conducted as described previously in Stage 1. Subsequently, data extraction was conducted independently by two people (SO and TM). All the extracted data were entered into Microsoft Excel and given to two other independent assessors (JD and TM) to verify their accuracy. The following data were extracted: publication details (first author, year of publication), title, purpose of the study, age of the participants, country, sport context, physiological construct evaluated, test(s) used to measure the construct, and the measurement properties assessed (reliability, validity and responsiveness). For the measurement properties, the following data were extracted: type of reliability or validity, interval period for test-retest and inter-rater studies, sample size and the results obtained for each physiological test.

Quality assessment of the clinimetric studies and measurement properties

The Consensus-based Standards for the Selection of health Measurement Instruments (COSMIN) checklist was used to evaluate the methodological quality of the included studies. Briefly, the COSMIN evaluates nine measurement property items (internal consistency, reliability, measurement error, content validity, construct validity (i.e. structural validity, hypothesis testing, cross-cultural validity), criterion validity and responsiveness) (Table 4). It also provides standardised information for evaluating the quality of each item based on design requirements and statistical methods [84, 85]. The COSMIN scoring system per measurement property is based on a four-point rating scale (poor, fair, good or excellent), and the overall rating for the methodological quality of each study is obtained by taking the lowest score [83, 84].
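The “lowest score” rule described above can be illustrated with a short sketch. It assumes the four rating labels poor, fair, good and excellent, ordered from worst to best. The item names and ratings in the example are invented for illustration and do not come from any included study.

```python
RATING_ORDER = ["poor", "fair", "good", "excellent"]  # worst to best (assumed labels)


def overall_quality(item_ratings):
    """Return the lowest rating among the checklist items of one measurement property."""
    if not item_ratings:
        raise ValueError("at least one item rating is required")
    return min(item_ratings.values(), key=RATING_ORDER.index)


# Hypothetical item ratings for a single reliability study.
example = {
    "sample size": "poor",
    "independent administrations": "excellent",
    "time interval": "good",
    "statistical method": "good",
}
print(overall_quality(example))  # → poor
```

A single poorly rated item, such as a small sample size, is therefore enough to give the whole study a “poor” rating for that measurement property.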

Table 4 Rating of the Quality of the statistical outcomes to determine measurement properties

Two reviewers (JD and TM) with prior COSMIN experience evaluated the methodological quality of each study included in Stage 2. It was pre-planned that disagreements would be resolved by discussion with a third person (CT) until a consensus was reached. In addition to the methodological quality assessment with the COSMIN, the checklist of quality criteria for rating measurement properties given by Terwee et al. [86] was used to rate each measurement property in the included articles as ‘positive’, ‘negative’ or ‘questionable’ depending on the results reported for the property (Table 4). Studies of “poor” methodological quality were not analysed for the quality of the results on the measurement properties.

Best evidence synthesis: levels of evidence

To help synthesise results from numerous studies on the same physiological construct, a “best evidence synthesis” was performed by the primary author (MC). The best evidence synthesis rating was determined based on the number of studies that had investigated the measurement property, the overall COSMIN score, and the rating and consistency of the measurement property result (positive, indeterminate, and negative) [87]. The possible levels of evidence are “strong” (consistent findings in multiple studies of good methodological quality or in one study of excellent methodological quality), “moderate” (consistent findings in multiple studies of fair methodological quality or in one study of good methodological quality), “limited” (only one study of fair methodological quality), “conflicting” (conflicting findings) and “unknown” (only studies of poor methodological quality or no studies) [87].
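The levels of evidence listed above can be expressed as a small decision rule. The sketch below is one possible reading of those definitions, not the authors' procedure: the function name, the exclusion of “poor” studies before grading, and the handling of mixed-quality sets (for example, one good and one fair study graded as “moderate”) are assumptions made for illustration.

```python
def evidence_level(study_qualities, consistent=True):
    """Map the methodological quality ratings of studies on one measurement
    property to a level of evidence.

    study_qualities: list of "poor", "fair", "good" or "excellent".
    consistent: whether the usable studies report consistent findings.
    """
    usable = [q for q in study_qualities if q != "poor"]  # poor studies carry no weight
    if not usable:
        return "unknown"        # only poor-quality studies, or no studies at all
    if not consistent:
        return "conflicting"
    n_excellent = usable.count("excellent")
    n_good_or_better = n_excellent + usable.count("good")
    if n_excellent >= 1 or n_good_or_better >= 2:
        return "strong"         # one excellent study, or multiple good studies
    if n_good_or_better == 1 or len(usable) >= 2:
        return "moderate"       # one good study, or multiple fair studies
    return "limited"            # a single fair study


print(evidence_level(["poor"] * 6))        # → unknown
print(evidence_level(["fair"]))            # → limited
print(evidence_level(["fair", "fair"]))    # → moderate
print(evidence_level(["good", "good"]))    # → strong
```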

Results: Stage 2

Characteristics of included studies

Figure 2 shows a flow chart for the selection of the studies. Of 824 studies identified from the electronic databases, 20 met the inclusion criteria. The majority of the excluded studies did not meet the inclusion criteria because they did not report on measurement properties. The general characteristics of the included studies and the measurement properties evaluated in each study are summarised in Table 5. The studies were conducted in Australia (n = 9); Denmark, Brazil and Belgium (n = 2 each); and Norway, Ireland, Iran, Italy and Croatia (n = 1 each). The age of the participants in the included studies ranged from 12 to 36 years.

Fig. 2 Flow chart for the search and selection of stage 2 articles

Table 5 Characteristics of included studies from stage 2 and the psychometric properties assessed

The 20 studies described the measurement properties of only 21 of the 63 tests identified in stage 1. The tests were the 5 m, 10 m, 20 m and 30 m Speed tests (speed), 20 m Repeated-Sprint test (repeated-sprint ability), Repeated-Effort test (repeated-effort ability), three Repeated High-Intensity Exercise tests (repeated high-intensity exercise performance), Yo-Yo IRT1 and 2 (prolonged high-intensity running ability), T120 s (anaerobic endurance), 505 test (agility), Modified 505 test (agility), L run (agility), Change of Direction Speed test (agility), Sergeant Jump test (lower-body muscular power), and three Bench Press Repetition-to-Fatigue tests (upper-body strength-endurance).

Of the 21 tests, 18 were studied for their measurement properties in rugby. The Yo-Yo Intermittent Recovery Level 1 and 2 and the Sergeant Jump tests had their measurement properties derived from other related sports (Soccer and Australian Rules football). For all the other tests identified in stage 1, there was no evidence on the measurement properties either in rugby or in related sports. None of the 21 tests identified in stage 2 had all the measurement properties investigated. Seven studies investigated the reliability and validity of one or more physiological tests [6, 19, 74, 88,89,90,91].

Measurement properties and methodological quality assessments

Tables 6 and 7 provide an overview of the measurement properties for the identified physiological tests and the COSMIN rating of methodological quality for the studies per measurement property. Table 8 shows rating of the quality of the results on the measurement properties based on the quality rating criteria of measurement properties checklist given by Terwee et al. [86]. The results on the measurement properties for the physiological tests derived from studies of “poor” methodological quality were excluded from the rating.

Table 6 Measurement properties (reliability and measurement error) of the physiological tests and methodological quality scores
Table 7 Measurement properties (validity and responsiveness) of the physiological tests and methodological quality scores
Table 8 Overall quality score by study and rating of measurement properties for the physiological tests

Yo-Yo intermittent recovery level 1 (Yo-Yo IR1) test

Of the 20 studies included in the review, seven investigated at least one measurement property of the Yo-Yo IR1 test (Table 5). Validity was the most commonly studied measurement property, with six studies evaluating at least one type of validity [88, 89, 92,93,94,95]. There was evidence on the known-group [88, 92, 93], convergent [89, 94, 95] and criterion validity [89] of the Yo-Yo IR1 test. However, all six studies were rated “poor” on methodological quality, mainly because of the inadequate sample sizes used in the validity analysis. Reliability was the second most commonly studied measurement property, with four studies evaluating test-retest reliability (Table 5) [88, 89, 94, 96]. The test-retest intervals ranged from within one week up to eight days [88, 89, 94, 96]. On methodological quality, all the studies investigating the reliability of the Yo-Yo IR1 were rated “poor”. In all these studies, the sample size item had the lowest score and therefore determined the total score for the study. Responsiveness of the Yo-Yo IR1 test was also investigated, but only in two studies of “poor” methodological quality [94, 95].
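The scoring rule referred to above, in which the lowest-scoring item determines the total score for a study, can be sketched as follows. This is a minimal illustration that assumes the four-point COSMIN scale (poor, fair, good, excellent); the function name, item names and item scores are hypothetical and are not taken from any of the included studies.

```python
# Ordered from worst to best. The four-point scale is an assumption based on
# the COSMIN checklist; this review reports "poor", "fair" and "good" ratings.
COSMIN_SCALE = ["poor", "fair", "good", "excellent"]


def overall_quality(item_scores):
    """Return the lowest item score, which sets the study's overall rating."""
    if not item_scores:
        raise ValueError("at least one item score is required")
    # list.index gives each label its rank; min() then picks the worst label.
    return min(item_scores.values(), key=COSMIN_SCALE.index)


# Hypothetical item scores for a single reliability study.
example_items = {
    "sample_size": "poor",
    "independent_administrations": "good",
    "icc_reported": "excellent",
}
print(overall_quality(example_items))  # poor
```

Under this rule, a single weak item such as a small sample is enough to give a study an overall “poor” rating, whatever its other strengths.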

Yo-Yo intermittent recovery level 2 (Yo-Yo IR2) test

Of the 20 studies included in the review, four provided evidence on at least one measurement property of the Yo-Yo IR2 test (Table 5) [91, 94, 97, 98]. Validity and reliability were the most commonly studied measurement properties of the test [91, 94, 97, 98]. Three studies evaluated the test-retest reliability of the Yo-Yo IR2 with a seven-day interval between the assessments [91, 94, 98]. However, all three studies were rated “poor” on methodological quality, mainly because of the small sample sizes used for the reliability analysis. Four studies investigated the validity of the Yo-Yo IR2 test (Table 5) [91, 94, 97, 98]. Two studies each provided evidence on the convergent [94, 97] and criterion [97, 98] validity of the test. In addition, single studies investigated the known-group validity [97] and concurrent validity [91] of the test. All the studies were, however, rated “poor” on methodological quality. Responsiveness of the Yo-Yo IR2 test was examined in one study of “poor” methodological quality [94].

Speed tests

5 m sprint test

Only one “fair” study investigated the measurement properties (reliability and validity) of the 5 m sprint test (Table 5) [19]. The 5 m sprint test was found to have a positive rating [i.e. Intraclass Correlation Coefficient (ICC) > 0.70] for test-retest reliability (Tables 6 and 8) [19]. The same study provided evidence on the construct validity of the test (Table 7). A positive rating for known-group validity was found for the 5 m sprint test, as specific hypotheses were formulated and at least 75% of the results were in accordance with these hypotheses (Table 8). No evidence was found on the responsiveness of the test.
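The two rating criteria applied above (ICC > 0.70 for reliability, and at least 75% of results in accordance with the stated hypotheses for construct validity) can be written as simple decision rules. The sketch below is illustrative only: the function names and the "+", "-" and "?" symbols are assumptions for this example, and the handling of studies that state no hypotheses is likewise an assumption.

```python
def rate_reliability(icc, threshold=0.70):
    """Positive ('+') if the ICC exceeds the threshold, otherwise negative ('-')."""
    return "+" if icc > threshold else "-"


def rate_hypothesis_testing(confirmed, total, required_share=0.75):
    """Positive ('+') if at least 75% of the stated hypotheses were confirmed."""
    if total <= 0:
        # Assumption: without formulated hypotheses the rating is indeterminate.
        return "?"
    return "+" if confirmed / total >= required_share else "-"


print(rate_reliability(0.87))         # +
print(rate_hypothesis_testing(3, 4))  # +
print(rate_hypothesis_testing(0, 1))  # -
```

The first call uses the ICC of 0.87 reported for the 10 m sprint test; the hypothesis counts in the other two calls are made up for illustration.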

10 m sprint test

Three different studies investigated the measurement properties of the 10 m sprint test (Table 5) [6, 19, 55]. Reliability was the most commonly studied measurement property. All three studies provided test-retest reliability evidence for the 10 m sprint test, with an interval of two to seven days between the assessments [6, 19, 99]. However, two of the studies were rated “poor” on methodological quality [6, 99]. In one “fair” study, a positive rating for the test-retest reliability (ICC = 0.87) of the 10 m sprint test was found [19]. Validity of the 10 m sprint test was assessed in two studies [6, 19]. The most common type of validity studied was construct validity (known-group validity). One study was rated as “poor” on methodological quality [6]. In that study, a positive rating of construct validity was found for the 10 m sprint test. No evidence was found on the responsiveness of the test.

20 m sprint test

Only one “fair” study investigated the measurement properties (reliability and validity) of the 20 m sprint test (Table 5) [19]. The 20 m sprint test was found to have a positive rating for test-retest reliability (Tables 6 and 8) [19]. The same study provided evidence on the construct validity of the test (Table 7). A positive rating for known-group validity was found for the 20 m sprint test, as specific hypotheses were formulated and at least 75% of the results were in accordance with these hypotheses (Table 8). No evidence was found on the responsiveness of the test.

30 m sprint test

Test-retest reliability evidence of the 30 m sprint test was provided by one study rated “poor” on methodological quality [6]. The study used a sample size of 11 participants to establish the reliability of the test with three days between the test-retest assessments. In the same study, the 30 m sprint test was also assessed for its known-group validity [6]. However, the study was also rated “poor” on quality for the construct validity. There was no evidence found on the responsiveness of the test.

Repeated-sprint ability (RSA) test

One study assessed the test-retest reliability of the repeated-sprint ability test, with assessments conducted seven days apart (Tables 5 and 6) [51]. The study was rated as being of “poor” methodological quality, mainly because of the small sample size used in the reliability analysis. No evidence was found on the validity or responsiveness of the test.

Repeated-effort ability (REA) test

One study assessed the test-retest reliability of the repeated-effort ability test, with assessments conducted seven days apart [51]. The study was rated as being of “poor” methodological quality, mainly because of the small sample size used in the reliability analysis. No evidence was found on the validity of the test.

Repeated high-intensity exercise (RHIE) tests

One study evaluated the test-retest reliability of three different repeated high-intensity exercise tests, namely, the repeated high-intensity exercise backs test, repeated high-intensity exercise rugby union forward test, and the repeated high-intensity exercise rugby league forward test [24]. The quality of the study was, however, rated “poor” mainly because of the small sample size per reliability analysis utilised for each test. There was no information on the validity or responsiveness of any of these tests in the literature.

30–15 intermittent fitness test (30–15 IFT)

One study assessed the test-retest reliability of the 30–15 Intermittent Fitness test with nine days separating the two assessments [68]. For the measure of reliability for the primary outcome of maximal intermittent running velocity (VIFT), the study was rated as of “good” methodological quality. A positive rating (ICC = 0.89) for the test-retest reliability was reported for the test. Validity of the test was assessed in one study (Tables 5 and 7) [95]. The study was, however, rated “poor” on quality for the convergent validity of the 30–15 Intermittent Fitness test [95].
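To make the reported reliability coefficients more concrete, the sketch below shows one way an ICC can be computed from test-retest data. It assumes the ICC(2,1) form (two-way random effects, absolute agreement, single measurement); the text above does not state which ICC form the included studies used, and the sprint times in the example are hypothetical.

```python
def icc_2_1(scores):
    """ICC(2,1) for a list of rows: one row per player, one column per session."""
    n = len(scores)      # number of players
    k = len(scores[0])   # number of test sessions
    if n < 2 or k < 2:
        raise ValueError("need at least two players and two sessions")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares (players, sessions, residual).
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical 10 m sprint times (s) for five players tested on two days.
times = [[1.82, 1.80], [1.95, 1.97], [1.76, 1.78], [2.01, 1.99], [1.88, 1.90]]
print(round(icc_2_1(times), 2))
```

Because this form penalises systematic shifts between sessions as well as random error, it is a conservative choice for test-retest designs; other ICC forms would give different values for the same data.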

Triple 120-m shuttle test (T120S)

One study examined the test-retest reliability of the Triple 120 m shuttle test for anaerobic endurance using a four-day interval between assessments [70]. The same study also evaluated the criterion validity of the test against the Wingate 60 s (W60) cycle test. The study used a small sample of 12 rugby league players for both the reliability and the validity analyses and was rated “poor” on methodological quality. No information was found on the responsiveness of the test.

Agility/change of direction speed tests

505 test

One study examined both the test-retest reliability (over two days) and the construct validity of the 505 test [19]. The study was rated “fair” on methodological quality and a positive rating (ICC = 0.90) was reported for the test-retest reliability. For construct validity, a negative rating was found for the 505 test: the results showed an unexpectedly marginal effect size (ES = 0.28), with no significant difference between groups in performance on the test. No information on responsiveness was found for the test.
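Known-group validity of the kind described above rests on the size of the difference between groups. The sketch below computes a standardised effect size as Cohen's d with a pooled standard deviation. This choice of formula is an assumption, since the text does not state how the reported ES values were calculated, and the 505 test times in the example are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardised mean difference between two groups, using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance is the sample variance (n - 1 in the denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical 505 test times (s) for two playing levels.
first_grade = [2.31, 2.28, 2.35, 2.30, 2.33]
second_grade = [2.33, 2.30, 2.36, 2.29, 2.35]
print(round(cohens_d(first_grade, second_grade), 2))
```

A small absolute value of d indicates that the test separates the two groups poorly, which is the situation that led to the negative construct validity rating above.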

Modified 505 test

Reliability of the Modified 505 test was investigated in one study [19]. The study was rated “fair” on methodological quality because of the large sample size. A positive rating (ICC = 0.92) for test-retest reliability was found for the test. The same study investigated the construct validity of the test and was also rated “fair” on methodological quality for validity. A negative rating of construct validity (known-group validity) was found for the Modified 505 test, as there was no significant difference between groups (ES = 0.32). Therefore, less than 75% of the results were in accordance with the hypotheses. No information was found on the responsiveness of the test.

L run test

One study examined both the test-retest reliability (over two days) and the construct validity of the L run test [19]. The study was rated “fair” on methodological quality and a positive rating (ICC = 0.95) was reported for the test-retest reliability. For construct validity, a negative rating was found for the L run test, as the results showed an unexpectedly marginal effect size (ES = 0.28). No information was found on the responsiveness of the test.

Change of direction speed test

Two studies reported on the reliability of the change of direction speed test [6, 74]. The test-retest interval ranged from three to seven days. The same studies provided evidence on the construct validity (known-group validity) of the test [6, 74]. However, the two studies were rated “poor” on methodological quality for both reliability and validity. No information was found on the responsiveness of the test.

Sargent (vertical) jump test

For the Sargent Jump test, only one study was found, which evaluated the inter-rater and intra-rater reliability of the test [90]. Intra-rater reliability was assessed with testing sessions separated by two hours, whilst inter-rater reliability assessments were separated by two days. The study was rated “fair” on methodological quality. A positive rating for intra-rater reliability (ICC = 0.99) and inter-rater reliability (ICC = 1.00) was reported for the test. The same study evaluated the validity of the Sargent Jump test and showed positive criterion validity against the Jump Platform (JP) test using 45 soccer participants. The study was rated “fair” on methodological quality for criterion validity. No information was found on the responsiveness of the test.

Bench press repetitions-to-fatigue tests

One study examined the construct validity of three different upper-body strength-endurance tests, namely, bench press repetitions-to-fatigue at 60% of one repetition maximum test (BP RTF 60% 1RM), bench press repetitions-to-fatigue at 60 kg (BP RTF 60) and bench press repetitions-to-fatigue at 102.5 kg (BP RTF 102.5) [81]. For the BP RTF 60 and 102.5, the study was rated “fair” on methodological quality because of the adequate sample size (n = 38). A positive rating of construct validity was found for the two tests. However, for the construct validity of the BP RTF 60% 1RM test, the study was rated “poor”. There was no information on the reliability or responsiveness of the three tests in measuring upper body strength-endurance.

Best evidence synthesis: level of evidence

A summary of the best evidence synthesis is presented in Table 9. The synthesis was derived from the ratings of the methodological quality of the studies and the results on the measurement properties of the tests. Only studies with “fair” to “good” methodological quality were used to determine the level of evidence per test for each studied measurement property. Best evidence synthesis showed moderate evidence to support the test-retest reliability of the 30–15 IFT. Limited evidence was found to support the test-retest reliability and the known-group validity of the 5 m sprint test, 10 m sprint test, 20 m sprint test, 505 test, Modified 505 test and the L run test. There is also a limited level of evidence for the inter-rater and intra-rater reliability and criterion validity of the Sargent (vertical) jump test. Furthermore, there was limited evidence on the known-group validity of the upper-body strength-endurance tests of bench press repetitions-to-fatigue at 60 and 102.5 kg. The level of evidence on the measurement properties of all the other tests identified in stage 1 is unknown.
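The logic of the synthesis described above can be sketched as a small decision function. Only three of its outcomes are supported directly by this review: a single “good” study gives moderate evidence, a single “fair” study gives limited evidence, and tests with only “poor” studies have an unknown level of evidence. The remaining branches (strong and conflicting evidence, and moderate evidence from several “fair” studies) are assumptions modelled on commonly used best evidence schemes, and the function name is hypothetical.

```python
def level_of_evidence(studies):
    """studies: list of (quality, rating) tuples, e.g. ("fair", "+")."""
    # Results from "poor" studies are excluded from the synthesis.
    usable = [(q, r) for q, r in studies if q != "poor"]
    if not usable:
        return "unknown"
    # Assumption: disagreeing ratings across usable studies are "conflicting".
    if len({r for _, r in usable}) > 1:
        return "conflicting"

    qualities = [q for q, _ in usable]
    n_good = qualities.count("good")
    n_fair = qualities.count("fair")
    # Assumption: strong evidence needs one excellent or several good studies.
    if "excellent" in qualities or n_good >= 2:
        return "strong"
    if n_good == 1 or n_fair >= 2:
        return "moderate"
    return "limited"


print(level_of_evidence([("good", "+")]))                  # moderate
print(level_of_evidence([("poor", "+"), ("fair", "+")]))   # limited
print(level_of_evidence([("poor", "+"), ("poor", "+")]))   # unknown
```

Note that the level of evidence describes how well supported a rating is, not its direction: a single “fair” study yields limited evidence whether the rating itself is positive or negative.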

Table 9 Best level synthesis for the physiological tests

Discussion

The aim of the present systematic review was two-fold. Firstly, we systematically reviewed 70 studies in Stage 1 to identify physiological characteristics evaluated in rugby and the corresponding tests used to measure each construct. Thereafter, 20 studies were systematically reviewed in Stage 2 to provide an overview on the measurement properties of the physiological tests identified in the studies. Most of the included studies from stage 1 were from Australia, United Kingdom, New Zealand, and South Africa. This probably reflects the popularity of the sport of rugby in these respective countries. The fact that there were an almost equal number of adult and adolescent rugby studies indicates that rugby is extensively studied in junior and senior players. It is also possible to speculate that the sport is equally popular among junior and senior players.

One of the most important findings to emerge from stage 1 was that a number of physiological characteristics are commonly investigated among rugby players. Fifteen physiological characteristics were identified. This breadth probably reflects the wide interest researchers have in physiological characteristics. The interest could be linked with suggestions that success in rugby is highly dependent on physiological characteristics [75]. With increased professionalism and competition, there has been extensive investment in research towards establishing physical qualities important for successful performance in professional rugby. Moreover, this breadth of physiological characteristics under investigation potentially highlights the physical nature of the sport and the diversity of attributes needed to meet the physical demands of the game. It is well-established that rugby is a physical sport requiring participants to partake in challenging physical collisions such as scrummaging, tackling, aggressive mauling and rucking, which require optimal muscular strength, power and endurance [5]. This provides a rationale for the preponderance of studies investigating lower and upper body muscular power [15, 16, 30,31,32,33,34,35,36, 40, 49, 61, 64, 73], lower and upper body muscular strength [5, 7, 18, 27, 38, 42, 55, 56, 69, 78, 80] and muscular endurance [7, 15, 36, 81]. In addition, rugby players variably cover 5000 to 7000 m during match play and engage intermittently in high-intensity efforts which require exceptional agility, anaerobic and aerobic capacity, speed, repeated sprinting and effort ability, and the generation of high levels of concentric and eccentric force [53, 75]. This also provides justification for the numerous studies investigating attributes such as speed, agility, prolonged high-intensity intermittent running ability, repeated sprint ability and explosive lower leg power [7, 16, 19, 30,31,32,33,34,35,36,37,38, 40, 49, 51, 53, 70, 72, 76].

Stage 1 findings also showed that almost all physiological characteristics had multiple tests for measurement. For example, this review showed that change of direction speed/agility is often evaluated using the 505 test, Modified 505 test, Illinois Agility test and change of direction speed test, among other tests. However, it was surprising to discover that, of all the tests identified in Stage 1, none had all the measurement properties (reliability, validity and responsiveness) investigated using rugby participants. In addition, of the 63 tests identified in Stage 1, only 21 had information on at least one of the measurement properties from rugby and related sports. This suggests that there is limited reporting in the literature of the measurement properties of tests commonly used in rugby. This was particularly evident for the property of responsiveness. All these findings are interesting and raise questions about the rationale for the selection of tests by researchers in the field of rugby. For example, speed was the most commonly studied physiological characteristic in the included studies. It was frequently measured over linear distances varying between 5 m and 60 m (Table 2). The most commonly tested sprinting distances for speed were, however, 10 m, 20 m and 40 m. Professional rugby studies have provided evidence that players seldom sprint distances greater than 40 m in a single bout [100]. This probably justifies the predominant use of the 10 m, 20 m and 40 m sprint tests in assessing rugby players in the literature [30,31,32,33,34,35,36,37,38,39,40]. In addition, straight-line sprinting is reported to be broken down into three phases: acceleration, attainment of maximal speed, and maintenance of maximal speed [101]. This also possibly justifies the use of more than one sprinting distance for assessing speed, as these distinct qualities of speed should be evaluated separately.
Although there could be plenty of reasons why researchers prefer a specific test over others, the literature generally recommends the use of feasible, reliable, valid and responsive tests [102]. This review found that there is a dearth of high-quality studies (according to the COSMIN scoring system) investigating the measurement properties of speed tests using rugby participants. Best evidence synthesis showed only limited evidence for the test-retest reliability and the known-group validity of the 5 m, 10 m and 20 m sprint tests.

Repeated-sprint ability has also been reported to be extremely important in rugby given the high-intensity and intermittent nature of the sport [100]. This review showed that the construct is commonly measured using the Repeated 20 m sprint test and the Rugby-Specific Repeated Speed test. No high-quality studies were found investigating the measurement properties of these tests in rugby. Only one study, of “poor” methodological quality, was found evaluating the test-retest reliability of the repeated 20 m sprint test, using 12 rugby participants [51]. Caution is therefore needed when adopting or using these tests in future studies of rugby players, and high-quality future studies may need to explore their measurement properties. Repeated-sprint ability tests have been reported to underestimate the repeated high-intensity exercise demands of rugby [24]. To overcome the shortcomings of the repeated 20 m sprint test, Austin et al. [24] assessed the reliability of three repeated high-intensity exercise tests specifically developed for backline players, RU forward players and RL forward players. The study was, however, rated as being of “poor” methodological quality because of the small sample size per reliability analysis of each test and the short interval (2 days) between the test-retest assessments.

There is a dearth of high-quality studies investigating the measurement properties of the Yo-Yo intermittent recovery (Level 1 and 2) tests in rugby. This is despite the popularity of the tests in assessing prolonged high-intensity intermittent running ability/endurance and maximal aerobic power among rugby players [15, 24, 53,54,55,56, 69]. This creates a need for future studies to specifically evaluate the measurement properties of these tests using rugby participants. Much of the information on the measurement properties of these tests reported in rugby studies is referenced from validation studies conducted using participants from other sports. There are multiple studies providing evidence on the measurement properties (reliability, validity and responsiveness) of the tests in other related intermittent sports such as soccer and Australian Rules football [88, 89, 91,92,93,94,95,96,97,98]. However, no high-quality studies were found evaluating the measurement properties of the tests according to the COSMIN guidelines. All the studies included in this review assessing the measurement properties of the tests showed “poor” methodological quality. The major drawbacks in all these studies were inadequate sample sizes and the lack of a clear description of the expected hypotheses. There were also no studies evaluating the measurement properties of other tests of prolonged high-intensity intermittent running ability, such as the repeated 12 s sprint shuttle speed tests.

Four tests were identified for estimating the maximal aerobic power of rugby players: the Multistage fitness test, Yo-Yo intermittent recovery level 1 test, 30–15 intermittent fitness test (30–15 IFT) and the 1500 m run. The Multistage fitness test was used in a number of studies [7, 8, 10, 16, 27, 30,31,32,33,34,35,36,37, 40, 49, 50, 61,62,63,64]. However, there is a paucity of information on the measurement properties of tests of maximal aerobic power in rugby or related sports. Only one study, of “good” methodological quality, assessed the reliability and the usefulness of the 30–15 IFT in rugby participants [68]. Best evidence synthesis showed moderate evidence to support the test-retest reliability of the 30–15 IFT. There were no high-quality studies providing evidence on the measurement properties of the tests identified for measuring anaerobic endurance, such as the T120S, Wingate 60 s cycle, 300 m Shuttle Run and 400 m Sprint tests. Holloway et al. [70] evaluated the validity of the T120S test against the Wingate 60 s cycle test. According to the COSMIN guidelines, the study was rated as being of “poor” methodological quality because it had only 12 participants.

A number of studies evaluated the agility/change of direction speed of rugby players. The tests commonly used included the 505 test, Modified 505 test, Illinois Agility test, Change of Direction Speed test and Agility test [6, 16, 19, 32, 34, 35, 40, 53, 74, 77]. There were no high-quality studies evaluating the measurement properties of these tests in rugby. This is despite the importance of agility as a physiological skill in the sport of rugby. Only one study, of “fair” methodological quality according to the COSMIN guidelines, evaluated the measurement properties of the 505 test, Modified 505 test and L run test. The study showed a positive rating for the test-retest reliability of these three agility tests. However, there was a negative rating for the known-group validity of these tests. These findings underlie the best evidence synthesis result that there is limited evidence on the reliability and construct validity of these tests in assessing the agility of rugby players. There is still a need for further high-quality studies evaluating the measurement properties of these tests in rugby players.

Lower-body muscular power was the second most commonly studied physiological characteristic among rugby players in the studies included in this review. Although three tests estimating lower-body muscular power were identified in the included studies, we found no studies evaluating the measurement properties of any of the three tests in rugby. Evidence on the measurement properties was found in one “fair” study evaluating the intra-rater and inter-rater reliability and criterion validity of the Vertical Jump test among soccer players. A positive rating was found for the intra-rater and inter-rater reliability of the test. Evidence on criterion validity was found to be questionable (Table 8), as there was no convincing argument that the gold standard test used was truly “gold”. Overall, best evidence synthesis indicates a limited level of evidence for the inter-rater and intra-rater reliability and criterion validity of the Sargent (vertical) jump test.

No clinimetric studies were found testing the measurement properties of tests for lower-body muscular strength or upper-body muscular strength and power. However, one study of “fair” methodological quality provided evidence on the known-group validity of two tests of upper-body muscular endurance (bench press repetitions-to-fatigue at 60 kg and 102.5 kg). Best evidence synthesis indicates that there is limited evidence to support the validity of these two tests in evaluating upper-body strength-endurance.

Limitations

The results of this review paper should be interpreted with the understanding of a number of important limitations. Currently, there are no published reviews investigating measurement properties of performance-based tests measuring physiological characteristics in rugby. This renders comparisons with other review studies impossible. However, it suffices to suggest that these results expose a research gap on high-quality studies evaluating measurement properties for physiological tests commonly used in rugby. Although it could also be a major strength for this review, the inclusion criteria only considered full-text peer reviewed articles and completely excluded grey literature. This publication bias likely threatens internal validity of results obtained on measurement properties for this review as unpublished studies are more likely to report negative or unfavourable results. Although the COSMIN has been developed for the evaluation of measurement properties and has been generally used in the literature for that purpose, the guidelines appear well-suited and more applicable for appraising the quality of questionnaire-based studies. In the context of performance-based tests such as used in rugby, the applicability of the COSMIN as a quality rating tool for the studies on measurement properties still requires careful consideration.

Conclusion

This review identified 15 physiological characteristics commonly evaluated among rugby players. These include speed, repeated sprint and effort ability, repeated high-intensity exercise performance, prolonged high-intensity intermittent running ability, endurance, anaerobic endurance, maximal aerobic power and speed, agility, lower-body muscular power and strength, upper-body muscular strength and power and upper-body muscular endurance. The majority of these physiological characteristics had multiple tests for measurement. Overall, there is a paucity of high-quality clinimetric studies evaluating the measurement properties of commonly used physiological tests in rugby. Among the tests that had evidence on measurement properties, none was evaluated with respect to all measurement properties. More studies are required to evaluate the measurement properties of the physiological tests commonly used in the sport of rugby. The 30–15 intermittent fitness test (30–15 IFT) was the best-rated test of maximal aerobic power, with moderate evidence supporting its test-retest reliability. The 5 m, 10 m and 20 m sprint tests were the best tests for assessing speed, although with only limited evidence supporting their test-retest reliability and known-group validity. The 505 test, Modified 505 test and L run test were the best tests for measuring agility, but with only limited evidence supporting their test-retest reliability. The Vertical jump test was the best test for assessing lower-body muscular power, although with only a limited level of evidence for inter-rater reliability, intra-rater reliability and criterion validity. Furthermore, there is limited evidence on the known-group validity of the upper-body strength-endurance tests of bench press repetitions-to-fatigue at 60 and 102.5 kg.