1 Introduction

Football is a global sport played across many countries and climates. The game has traditionally been played on natural turf; however, climate conditions and intensive use can make it challenging to maintain a good-quality surface [1]. To overcome these issues, third-generation synthetic turf (3G turf) football surfaces have been introduced. Because these surfaces vary widely in design and construction, the Fédération Internationale de Football Association (FIFA) has developed a testing programme to ensure that surface properties meet suitable standards of safety, performance and durability [1,2,3].

Understanding how players perceive different surfaces is important to ensure that the ongoing development of synthetic surfaces, and the setting of appropriate test standards, align with players’ needs [4]. Previous studies have used a combination of interviews, focus groups and questionnaires to investigate players’ perceptions of synthetic surfaces and have highlighted several themes with intertwining relationships between players, equipment and surfaces [5,6,7,8]. This research has also revealed differing viewpoints associated with factors such as ability level [9], playing position [7], country [8], sex [10], age and surface experience [7,8,9]. Furthermore, attempts to link players’ subjective perceptions with objective measures of performance have highlighted discrepancies, with player dissatisfaction not reflected in the measured physiological variables [10]. Many of the negative attitudes towards 3G turf are thought to stem partly from a cognitive bias: players typically favour natural turf pitches because they perceive a greater risk of injury on 3G pitches [4,11,12,13], despite little evidence supporting this perception in studies of injury rates on different surfaces [14].

Other sectors, such as the food and beverage industry, have developed experimental methods designed to reduce potential bias and improve reliability of perception data [15, 16]. Descriptive Analysis techniques, for example, all share a common framework involving the selection and training of a panel to quantify defined sensory attributes of a product [15,16,17]. Suitable panellists are typically identified through a screening process to determine their sensitivity to differences in a product and their consistency in decision making. Rather than studying user preferences (likes and dislikes), investigators work with the panel to formulate a commonly understood vocabulary to define the different sensory attributes of a product [17]. The panel then proceeds to undertake further targeted sensory training exercises to become more discriminant, repeatable and consensual in their decisions [16, 17]. This approach has not been widely implemented in the sporting goods sector, perhaps due to the time and effort required. A trained panel was successful in determining differences in the cushioning of sports shoes [17]. However, the study did not attempt to quantify the benefits of the training process on the quality and reliability of the subjective data collected.

Given the challenges faced in obtaining reliable data regarding players’ perceptions of surface properties, the establishment of a sensory panel to provide more consistent responses and improved discrimination between surfaces could be of benefit in the ongoing development of surfaces and test standards. The aim of this study, therefore, was to develop a suitable process and evaluate the merits of establishing a sensory panel to assess the subjective attributes of 3G turf surfaces used in football. The development of a sensory panel was split into four phases: attribute generation, screening, training and evaluation. Each stage of the process had a distinct objective for the development of the panel (Fig. 1).

Fig. 1

The four stages used in the development of a sensory panel in this study with the objectives for each session listed

2 Methods

2.1 Surfaces

The study was conducted in two test areas, one outdoor and one indoor, both on the Loughborough University campus. The outdoor test area was a newly constructed (< 2 years old) 3G turf football pitch with a 60 mm monofilament carpet and granulated thermoplastic infill (Table 1). The indoor test area contained ten 3G turf test surfaces, constructed specifically for this study to provide a range of surface properties that varied in a controlled manner. A Rotational Traction Tester (RTT) and an Advanced Artificial Athlete (AAA) were used to quantify shock absorption (SA) and vertical deformation (VD), typically referred to within the industry as measures of surface ‘hardness’, and peak torque under loading, typically referred to as a measure of ‘traction’. Testing with the RTT (one test per location) and the AAA (three drops per location) was performed at five equally spaced locations across each surface, following the test protocol given in the current FIFA standards [2]. Four surfaces had similar peak torque but varied in SA (H1–H4, Table 1), whilst a further four had similar SA but differed in peak torque (T1–T4, Table 1). Surfaces were constructed in lanes 5 × 1.2 m in size to provide sufficient room for players to perform movements, with run-off areas at the end of each lane for safety (Fig. 2). After every 2–4 player sessions, a day was set aside for surface reconditioning and testing to ensure that surface properties remained similar for all participants and throughout the process.
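The per-surface shock absorption values in Table 1 summarise several AAA drops at several locations. A minimal sketch of that aggregation is shown below, assuming the FIFA-style definition SA = (1 − F_max/F_ref) × 100, where F_max is the peak impact force on the test surface and F_ref is the peak force on a rigid floor. The reference force of 6760 N and the choice to average every drop are assumptions to check against the current FIFA handbook [2], and the force readings are hypothetical.

```python
from statistics import mean

# Assumed reference peak force (N) for the AAA on a rigid floor; verify against the FIFA handbook.
F_REF_N = 6760.0


def shock_absorption(f_max: float, f_ref: float = F_REF_N) -> float:
    """Shock absorption (%) for a single drop: (1 - F_max / F_ref) * 100."""
    return (1.0 - f_max / f_ref) * 100.0


def surface_shock_absorption(drops_by_location: list[list[float]]) -> float:
    """Mean SA for one surface: average the drops at each location, then the locations.

    Simplification (assumption): every drop is averaged; the standard may specify
    which drops to keep.
    """
    location_means = [mean(shock_absorption(f) for f in drops) for drops in drops_by_location]
    return mean(location_means)


# Hypothetical peak forces (N) for one lane: five locations x three drops.
lane_forces = [
    [2720.0, 2705.0, 2700.0],
    [2690.0, 2700.0, 2710.0],
    [2730.0, 2715.0, 2705.0],
    [2700.0, 2695.0, 2690.0],
    [2710.0, 2700.0, 2705.0],
]
print(f"SA = {surface_shock_absorption(lane_forces):.1f} %")
```

Averaging within each location first keeps every location equally weighted even if a drop had to be repeated at one of them.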

Table 1 Surface construction details and surface properties for each lane obtained from the RTT and the AAA using the current FIFA standards [2]
Fig. 2

Layout of indoor surface test area used for player perception and mechanical measurement of surface properties

2.2 Participants

Participants were recruited from the men’s and women’s student football teams at Loughborough University. University players were used for two reasons. First, they were more accessible than local players: they could participate during working hours and were able to complete multiple sessions. Second, most players were selected from the first and second teams, which play at a relatively high standard (tiers 2 and 3 for women and level 9 for men in the English football pyramid). All of the players had many years of experience on synthetic turf football pitches, with most training and playing on them multiple times a week.

A total of 12 males and 13 females participated in the attribute generation sessions and 11 males (20 ± 2 years, 74.3 ± 6.1 kg, 180 ± 5 cm) and 7 females (20 ± 2 years, 64.6 ± 5.3 kg, 173 ± 10 cm) completed the full sensory panel programme outlined in Fig. 1. Unfortunately, a number of participants dropped out due to injury or were unavailable for all stages of the programme, hence the reduction in numbers. No participants were removed based on their performance in the screening session. Approval was obtained from the Loughborough University Ethics Review Sub-Committee before the study commenced (Ref: 2021–4389-373) and voluntary written informed consent was obtained from all participants at the start of each session.

2.3 Attribute generation

Focus groups were conducted with groups of four to eight same-sex players on the outdoor 3G turf pitch (Table 1), and each lasted approximately one hour. The key objectives were to identify the surface attributes the players deemed most important and to determine the language and terminology they used to describe the sensations associated with these attributes (Fig. 1). The protocol followed a methodology previously found to be successful at gathering athlete feedback on sporting products [5, 12, 18, 19]. After an initial briefing to introduce the session, the trained investigator initiated discussion amongst the group by asking open-ended questions. At intervals, the groups were given the chance to perform movements on the outdoor 3G turf pitch, which enabled the players to gain immediate sensory feedback instead of relying on their memory. As the session progressed, the investigator probed player responses to gain further clarity or detail [12].

“Hardness” and “grip” emerged as key themes during the discussions, but definitions varied between players. Further probing of the players’ responses enabled five key sensory attributes to be defined (Table 2). For hardness, the shock felt in the lower body and the deformation felt under foot were identified as key attributes. For grip, the speed of and confidence in movements, alongside the likelihood of slipping, were deemed the most important stimuli. Each attribute was given a name matching the players’ definitions so that it could be identified easily (Table 2).

Table 2 Sensory attributes generated by the players in relation to the themes of hardness and grip

2.4 Screening

Screening sessions took place in the indoor test area and consisted of hour-long individual player test sessions. The objective of screening was two-fold: to familiarise the players with the attributes, movements and protocol of the testing sessions, and to evaluate the players’ ability to identify differences between the surface samples with respect to the sensory attributes generated (Fig. 1 and Table 2). At the start of each session, the investigator worked through the attributes with the player to confirm their understanding. To ensure the players were using the same stimuli to judge the surfaces, two movements were selected to assess surface attributes (Table 2), based on observations and discussions with players in the focus groups and on the practicality of performing each movement in the indoor test area. A jump and landing, equivalent to a simulated header, was performed to assess the attributes associated with surface hardness, whilst a 180° stop and turn was used to assess the attributes associated with grip (Table 2). For each movement, only general standards were prescribed by the investigators, allowing the players to use the technique they felt was most appropriate. Once the player was satisfied with the attributes and movements, the session protocol was explained.

Two groups of four surfaces were used to evaluate players’ perceptions of surface attributes. Surfaces T1–T4, which differed in peak torque, were considered the most likely to generate perceivable differences in Slip, Movement Speed and Movement Confidence, and were therefore used to evaluate these attributes. The sensory attributes Leg Shock and Give were deemed to be closely associated with the measured shock absorption and vertical deformation respectively, so surfaces H1–H4 were selected for evaluating these attributes (Table 1). A two-alternative forced-choice (2-AFC) approach was used because it enabled participants to compare surfaces directly, reducing the reliance on perceptual memory and making differences simpler to detect and record than with other discrimination tests [20]. With four surfaces per attribute, this resulted in six pairwise comparisons for each attribute and a total of 30 pairs across the whole session.
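The pair count follows from comparing every surface in a group of four with every other: C(4, 2) = 6 pairs per attribute, and five screening attributes × 6 = 30 pairs. The sketch below simply enumerates these 2-AFC trials; the dictionary layout is illustrative, and because the text does not say which lane of each pair was visited first, presentation order is not modelled.

```python
from itertools import combinations

GRIP_SURFACES = ["T1", "T2", "T3", "T4"]       # differ mainly in peak torque
HARDNESS_SURFACES = ["H1", "H2", "H3", "H4"]   # differ mainly in shock absorption

# Screening attributes and the surface group used to judge each one.
ATTRIBUTE_SURFACES = {
    "Slip": GRIP_SURFACES,
    "Movement Speed": GRIP_SURFACES,
    "Movement Confidence": GRIP_SURFACES,
    "Leg Shock": HARDNESS_SURFACES,
    "Give": HARDNESS_SURFACES,
}


def session_pairs(attribute_surfaces: dict[str, list[str]]) -> dict[str, list[tuple[str, str]]]:
    """Every unordered surface pair (one 2-AFC trial each) for every attribute."""
    return {attr: list(combinations(surfaces, 2)) for attr, surfaces in attribute_surfaces.items()}


pairs = session_pairs(ATTRIBUTE_SURFACES)
print(len(pairs["Slip"]))                    # 6 pairs per attribute
print(sum(len(p) for p in pairs.values()))   # 30 pairs across the session
```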

Each attribute was assessed in a random order but alternated between grip and hardness attributes. All six comparisons were performed consecutively for each attribute to avoid switching between attributes and causing confusion. Players were requested to select a surface even if little or no difference was perceived between the pair of surfaces. Answers were recorded by the investigator and players were not allowed to change their answers retrospectively. Results were processed to determine the level of internal consistency in each participant’s responses and how well the group, as a whole, could discriminate differences between the surfaces.
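The text does not specify how the random, alternating attribute order was generated. One simple scheme, shown here as a hypothetical sketch, shuffles the grip and hardness attributes separately and then interleaves them, starting with the larger group so that strict alternation is possible (three grip and two hardness attributes during screening).

```python
import random

GRIP_ATTRIBUTES = ["Slip", "Movement Speed", "Movement Confidence"]
HARDNESS_ATTRIBUTES = ["Leg Shock", "Give"]


def alternating_order(grip: list[str], hardness: list[str], rng: random.Random) -> list[str]:
    """Random attribute order that alternates between grip and hardness attributes."""
    g, h = grip[:], hardness[:]
    rng.shuffle(g)
    rng.shuffle(h)
    # The larger group must lead; with equal sizes, pick the leading group at random.
    if len(g) > len(h) or (len(g) == len(h) and rng.random() < 0.5):
        first, second = g, h
    else:
        first, second = h, g
    if len(first) - len(second) > 1:
        raise ValueError("group sizes differ by more than one; strict alternation is impossible")
    order: list[str] = []
    for i, attr in enumerate(first):
        order.append(attr)
        if i < len(second):
            order.append(second[i])
    return order


rng = random.Random(2021)  # seeded only so the example is reproducible
print(alternating_order(GRIP_ATTRIBUTES, HARDNESS_ATTRIBUTES, rng))
# Evaluation session: Movement Confidence dropped, leaving two attributes per group.
print(alternating_order(GRIP_ATTRIBUTES[:2], HARDNESS_ATTRIBUTES, rng))
```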

2.5 Training

The objective of training was to refine the movements and attributes developed during the focus groups and screening (Fig. 1). The hour-long sessions were conducted in groups of 3–5 players in the indoor test area. Results from the screening sessions identified which attributes and surfaces the players struggled to discriminate consistently and consensually. For each attribute, the investigator asked the players to conduct two paired comparisons of surfaces: one pair for which strong agreement had been observed and one for which agreement had been weak. The players recorded their answers on their personal smart devices using an anonymous polling system to reduce the influence of dominant players on the opinions of others. Polling results were then shown to the players and, where unanimous agreement was not reached, a discussion was initiated. The discussion aimed to identify sources of disagreement and to provide strategies to increase the level of agreement going forward. This led to two key changes: the attribute definitions were refined (Table 2) and greater control over the movement standards was introduced. Movement Confidence was dropped as an attribute altogether because the screening results showed low consistency for it. This decision was supported by anecdotal feedback during the training sessions, where it became clear that players were using the attribute to identify their preferred surface rather than to describe any specific sensory feedback they were receiving.
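The text does not state how ‘strong’ and ‘weak’ agreement were quantified. A minimal sketch, assuming agreement is the proportion of players who chose the majority surface in a 2-AFC pair, picks the two training pairs and checks whether an anonymous poll was unanimous; the function names and the screening tallies are hypothetical.

```python
def agreement(votes_for_first: int, n_players: int) -> float:
    """Proportion of players who chose the majority surface in one 2-AFC pair (0.5 to 1.0)."""
    return max(votes_for_first, n_players - votes_for_first) / n_players


def training_pairs(pair_votes: dict[tuple[str, str], int], n_players: int):
    """Return (strong, weak): the pairs with the highest and lowest agreement."""
    ranked = sorted(pair_votes, key=lambda pair: agreement(pair_votes[pair], n_players))
    return ranked[-1], ranked[0]


def is_unanimous(poll_answers: list[str]) -> bool:
    """True if every anonymous poll answer names the same surface (no discussion needed)."""
    return len(set(poll_answers)) == 1


# Hypothetical screening tallies for one attribute: pair -> players choosing the first surface.
slip_votes = {
    ("T1", "T2"): 10, ("T1", "T3"): 16, ("T1", "T4"): 18,
    ("T2", "T3"): 12, ("T2", "T4"): 15, ("T3", "T4"): 9,
}
strong, weak = training_pairs(slip_votes, n_players=18)
print(strong, weak)                        # ('T1', 'T4') ('T3', 'T4')
print(is_unanimous(["T3", "T3", "T4"]))    # False, so a discussion would follow
```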

The jump and landing movement standard was refined to ensure players landed two-footed with minimal bending of the knees and ankles. This change aimed to reduce the amount of shock absorbed through bending of the lower leg and so force the surface to absorb more of the impact energy. For the stop and turn, foot placement perpendicular to the direction of travel and acceleration out of the turn were enforced more stringently to ensure the technique was consistent between players. Objective measures of kinetic or kinematic parameters were not considered suitable for monitoring the repeatability and comparability of the players’ movements, as the instrumentation would most likely interfere with their natural movement. Following refinement of the attributes and movement standards, players were given an opportunity to practise and ask any outstanding questions related to the changes.

2.6 Evaluation

By the evaluation stage, all players had received training on the movement standards and sensory attributes. The objective of the session was, therefore, to evaluate the surfaces in the indoor test area and determine whether the players could discriminate between them in relation to the sensory attributes (Fig. 1). To ensure the players did not base decisions on memory recall from previous sessions, the surfaces were repositioned within the lane structure so that no surface occupied the same position as in the screening session (Fig. 2). The hour-long testing protocol was the same as in the screening session, with a randomised order of attributes. Players completed all six pairs for each attribute before moving on to the next attribute. To minimise the influence of fatigue, players were allowed to rest for a self-determined period between paired comparisons; the next comparison was performed only when the participant indicated they were ready. Results were processed to determine the level of consistency and discrimination in the players’ responses and were compared with the screening results to evaluate any improvements.
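Repositioning the surfaces so that none stays in its previous lane amounts to drawing a random derangement. The sketch below does this by rejection sampling (reshuffling until every lane holds a different surface, which takes about e ≈ 2.7 attempts on average). It is an illustrative way to generate such a layout, not necessarily how the positions in Fig. 2 were chosen, and the lane order shown is hypothetical and lists only the eight surfaces named in the text.

```python
import random


def reposition(layout: list[str], rng: random.Random) -> list[str]:
    """Shuffle surfaces across lanes so that no surface keeps its previous lane."""
    if len(layout) < 2:
        raise ValueError("at least two lanes are needed to move every surface")
    while True:
        candidate = layout[:]
        rng.shuffle(candidate)
        # Accept only a derangement: every lane now holds a different surface.
        if all(new != old for new, old in zip(candidate, layout)):
            return candidate


# Lane-by-lane order of surfaces during screening (hypothetical layout).
screening_layout = ["H1", "T1", "H2", "T2", "H3", "T3", "H4", "T4"]
evaluation_layout = reposition(screening_layout, random.Random(42))
print(evaluation_layout)
```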

2.7 Data processing

Results from the screening and evaluation sessions were recorded in Microsoft Excel and subsequently processed in MATLAB (MathWorks Inc, Natick, MA, USA). Players’ responses were evaluated using two statistical tests. First, intra-player (within-player) consistency was assessed by determining the number of circular triads present in each player’s pairwise comparisons for each attribute (Fig. 3). The number of circular triads was then used to calculate Kendall’s coefficient of consistence, where a value of one indicates complete consistency (zero circular triads) in the player’s answers and a value of zero indicates maximum inconsistency (Fig. 3) [21]. Kendall’s coefficients for all four attributes were summed to produce an overall consistency score for each player, with a value of four representing perfect consistency across all attributes (Table 2). Occasional inconsistencies in a player’s responses do not necessarily indicate that the player is a poor judge: if two samples are so similar that they are barely distinguishable, a player may understandably be unable to identify the difference consistently.
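The consistency calculation can be sketched as follows. For n surfaces with win counts a_i (the number of pairs in which surface i was judged to have ‘more’ of the attribute), the number of circular triads is d = [n(n − 1)(2n − 1) − 6Σa_i²]/12, and Kendall’s coefficient of consistence is ζ = 1 − 24d/(n³ − 4n) for even n (1 − 24d/(n³ − n) for odd n). With n = 4, at most two circular triads are possible, so ζ = 1 − d/2. This is a minimal Python sketch of these standard formulas, not the authors’ MATLAB code; the data structures are assumptions.

```python
from itertools import combinations


def circular_triads(surfaces: list[str], choices: dict[tuple[str, str], str]) -> int:
    """Number of circular triads in one player's 2-AFC choices for one attribute.

    choices maps each pair from combinations(surfaces, 2) to the surface judged 'more'.
    """
    wins = {s: 0 for s in surfaces}
    for pair in combinations(surfaces, 2):
        wins[choices[pair]] += 1
    n = len(surfaces)
    # 12d = n(n-1)(2n-1) - 6 * sum(a_i^2); the right-hand side is always a multiple of 12.
    return (n * (n - 1) * (2 * n - 1) - 6 * sum(a * a for a in wins.values())) // 12


def kendall_consistence(d: int, n: int) -> float:
    """Kendall's coefficient of consistence: 1 = no circular triads, 0 = the maximum possible."""
    if n < 3:
        raise ValueError("consistence is undefined for fewer than three items")
    denominator = n**3 - n if n % 2 else n**3 - 4 * n
    return 1 - 24 * d / denominator


def consistency_score(per_attribute: list[tuple[list[str], dict]]) -> float:
    """Sum of coefficients over attributes (4.0 = perfect consistency on four attributes)."""
    return sum(kendall_consistence(circular_triads(s, c), len(s)) for s, c in per_attribute)


T = ["T1", "T2", "T3", "T4"]
# Consistent ranking T1 > T2 > T3 > T4: zero circular triads.
ordered = {pair: pair[0] for pair in combinations(T, 2)}
# T1 > T2 and T2 > T3, but T3 > T1 (a circular triad); T4 is judged lowest every time.
circular = {("T1", "T2"): "T1", ("T1", "T3"): "T3", ("T1", "T4"): "T1",
            ("T2", "T3"): "T2", ("T2", "T4"): "T2", ("T3", "T4"): "T3"}
print(circular_triads(T, ordered), circular_triads(T, circular))   # 0 1
print(kendall_consistence(1, 4))                                    # 0.5
```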

Fig. 3

Representation of a player’s responses where each arrow represents the comparison made between the two surfaces connected and the direction of the arrow indicates the surface perceived to have ‘more’ of a particular attribute. a Inconsistent response resulting in a circular triad and b a consistent response

Second, the ability of the group of players as a whole to discriminate between the surfaces for each attribute was investigated. To determine if the players could identify differences in the surfaces, Friedman’s T statistic was calculated, followed by Tukey’s post hoc analysis to determine if, and subsequently where, significant differences were found (p < 0.05) [21]. A larger value of Friedman’s T statistic is also indicative of improved discrimination between surfaces. Changes in the level of player consistency and discrimination between screening and evaluation sessions for the eight surfaces that were compared (H1–H4 and T1–T4, Table 1) were used to assess the impact of training.
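For a design in which each of J players judges all t(t − 1)/2 pairs once, a common form of this analysis in sensory-evaluation texts (assumed here to match the variant in [21]) assigns rank 1 to the surface chosen as having ‘more’ of the attribute and rank 2 to the other, sums the ranks R_i for each surface, and computes T = 4/(Jt) ΣR_i² − 9J(t − 1)². T is compared with the χ² critical value for t − 1 degrees of freedom (7.81 for t = 4 at p < 0.05, consistent with the 7.8 quoted in Fig. 5), and surfaces i and j are taken to differ when |R_i − R_j| exceeds Tukey’s HSD = q(α; t, ∞)·√(Jt/4). The sketch hard-codes tabulated critical values and uses hypothetical responses.

```python
from itertools import combinations
from math import sqrt

CHI2_CRIT_3DF = 7.815   # chi-square critical value, alpha = 0.05, t - 1 = 3 degrees of freedom
Q_CRIT_4 = 3.633        # studentized range q(0.05; 4 means; infinite df), from standard tables


def rank_sums(surfaces: list[str], players: list[dict[tuple[str, str], str]]) -> dict[str, int]:
    """Rank sums over all players: rank 1 for the surface judged 'more', rank 2 for the other."""
    sums = {s: 0 for s in surfaces}
    for choices in players:
        for a, b in combinations(surfaces, 2):
            chosen = choices[(a, b)]
            other = b if chosen == a else a
            sums[chosen] += 1
            sums[other] += 2
    return sums


def friedman_t(sums: dict[str, int], n_players: int) -> float:
    """T = 4 * sum(R_i^2) / (J t) - 9 J (t - 1)^2, approximately chi-square with t - 1 df."""
    t = len(sums)
    return 4 * sum(r * r for r in sums.values()) / (n_players * t) - 9 * n_players * (t - 1) ** 2


def tukey_hsd(n_players: int, t: int, q: float = Q_CRIT_4) -> float:
    """Honestly significant difference between two rank sums."""
    return q * sqrt(n_players * t / 4)


def significant_pairs(sums: dict[str, int], hsd: float) -> list[tuple[str, str]]:
    """Surface pairs whose rank sums differ by more than the HSD."""
    return [(a, b) for a, b in combinations(sorted(sums), 2) if abs(sums[a] - sums[b]) > hsd]


# Hypothetical data: 18 players who all rank T1 > T2 > T3 > T4 for one attribute.
surfaces = ["T1", "T2", "T3", "T4"]
players = [{pair: pair[0] for pair in combinations(surfaces, 2)}] * 18
sums = rank_sums(surfaces, players)
t_stat = friedman_t(sums, len(players))
print(sums, round(t_stat, 1), t_stat > CHI2_CRIT_3DF)
print(significant_pairs(sums, tukey_hsd(len(players), len(surfaces))))
```

Because each comparison contributes exactly one rank 1 and one rank 2, the direction of the ranking convention does not affect T or the HSD comparison.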

3 Results

The consistency of players’ responses was generally strong in both screening and evaluation phases with only players W3 and W4 achieving a consistency score less than three in both sessions (Fig. 4). Player M2 improved considerably, but for the majority of players (over 75%), their total consistency score remained similar or identical across the two testing phases, including all of the females (W1–W8). Three players, however, had moderate decreases in consistency (M5, M8 and M10).

Fig. 4

Kendall’s coefficient of consistence for each player during screening and evaluation phases. Coefficients were summed across all four attributes to give a score from 0–4, with 4 indicating perfect consistency

Players identified significant differences (p < 0.05) between the surfaces for all attributes during both screening and evaluation sessions (Fig. 5). The highest T statistic was for the attribute Give, indicating players were more able to discriminate between the surfaces for that attribute (Fig. 5). Increases in the T statistic were also seen across all attributes between the screening and evaluation phases with Slip producing the largest increase of 14.5 (Fig. 5).

Fig. 5

Friedman’s T statistic to determine if significant differences were found between surfaces for each attribute. T exceeded the critical value of 7.8 (p < 0.05) for all attributes and for both screening and evaluation phases

Post hoc analysis revealed which surfaces players were able to discriminate between for each attribute (Fig. 6). For the two grip-related attributes, Movement Speed and Slip, two significantly different pairs emerged during screening, increasing to four pairs in the evaluation phase (Fig. 6a, b). During the screening phase, the smallest perceivable differences in Movement Speed and Slip corresponded to changes of 11 and 13 N·m in peak torque, respectively (Table 1 and Fig. 6a, b). During the evaluation phase, the smallest perceivable differences reduced to 7 and 4 N·m for the same attributes (Table 1 and Fig. 6a, b).
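One way to read ‘smallest perceivable difference’ is as the smallest measured property difference among the surface pairs found to be significantly different in the post hoc analysis. A minimal sketch of that reading follows; the peak torque values and significant pairs are hypothetical and are not the Table 1 measurements.

```python
from typing import Optional


def smallest_perceivable_difference(
    sig_pairs: list[tuple[str, str]], prop: dict[str, float]
) -> Optional[float]:
    """Smallest absolute property difference among significantly different surface pairs."""
    if not sig_pairs:
        return None  # no pair was discriminated, so no threshold can be read off
    return min(abs(prop[a] - prop[b]) for a, b in sig_pairs)


# Hypothetical peak torque (N·m) and post hoc results, for illustration only.
peak_torque = {"T1": 30.0, "T2": 34.0, "T3": 41.0, "T4": 47.0}
significant = [("T1", "T3"), ("T1", "T4"), ("T2", "T4"), ("T3", "T4")]
print(smallest_perceivable_difference(significant, peak_torque))   # 6.0
```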

Fig. 6

Rank Sum ± ½ Tukey’s honestly significant difference for a Movement Speed b Slip c Leg Shock and d Give for both screening and evaluation phases of testing. Rank sums have been normalised on the scale 0–100. Surfaces are considered to be significantly different (p < 0.05) if the bars do not overlap. The most closely related measured property for each surface is listed in the centre of each figure

For the hardness-related attributes, Leg Shock and Give, four significantly different pairs of surfaces were identified in both screening and evaluation phases, although for Give, the pairs differed between screening and evaluation. Significant differences in perceived Leg Shock and Give corresponded to changes as low as 2% in shock absorption and 1.5 mm in vertical deformation during both phases of the study (Table 1 and Fig. 6c, d). During screening, two clusters of surfaces were perceived to be similar (H1 and H4, H2 and H3) but with significant differences between the two clusters (H2 and H3 were perceived to deform less and result in greater Leg Shock than H1 and H4). During the evaluation phase, however, greater discrimination was found, with surface H3 moving further towards the extremity of the scale and H2 emerging as having a more intermediate ranking, sitting between the other surfaces (Table 1 and Fig. 6c, d).

4 Discussion

The aim of this study was to develop a suitable process and evaluate the merits of establishing a sensory panel to assess the subjective attributes of 3G surfaces used in football. A four-phase process was developed (attribute generation, screening, training and evaluation), and the panel’s ability to discriminate between surfaces improved from the initial screening session to the final evaluation session.

The aim of attribute generation was to identify the key movements used by players to evaluate surface attributes and to develop a language to describe the sensory feedback they receive (Fig. 1). Players identified surface “grip” and surface “hardness” as key themes related to player–surface interactions. Because players still differed in how they defined these themes, identifying the specific sensory attributes they used to assess “grip” and “hardness” provided added confidence that they were judging the same sensation. Related industry terms such as ‘traction’ were rarely used by the players, highlighting the importance of using player-generated descriptors to avoid misinterpretation and ambiguity. A study on shoe ‘cushioning’ reached similar conclusions about the importance of language [17]. Giving the players sufficient time to perform movements between periods of discussion appeared to be beneficial, as it allowed them to focus on the specific sensory feedback they were receiving from the surface without distraction from the ball or gameplay. As the study progressed through its subsequent phases (Fig. 1), the players became further accustomed to the sensory feedback they were receiving, which allowed the attributes to be adjusted further.

The screening session tested the players’ ability to discriminate between surfaces using the previously generated sensory attributes. It also gave the players an opportunity to become accustomed to the protocol and movement standards used in later sessions. Screening has typically been used to remove panel members who do not display a required level of consistency or discrimination in their answers [15]. In this study, however, all participants remained part of the panel after assessment of the screening results (Figs. 4, 5, 6). This decision was made for several reasons. First, four sensory attributes were assessed, so some panel members could discriminate one attribute better than another; instead of removing such players, it was hypothesised that sensory training could improve their ability to discriminate between surfaces for a given attribute. Second, the aim of this study was to establish the merits of a sensory panel, so determining whether training could improve players’ ability to discriminate between surfaces was a key outcome. The decision not to deselect any players following screening was partly justified by player M2, who improved considerably and achieved the maximum consistency score in the evaluation session.

Players identified significant differences between the surfaces in both the screening and evaluation phases, but the level of discrimination increased after training (Fig. 5). Whilst players could discriminate between surfaces with large differences in properties during screening, training helped them discriminate between smaller differences during the evaluation phase (Fig. 6). Anecdotal evidence from the players during the evaluation phase also supported these results, with many commenting that training made it easier for them to distinguish differences between the surfaces.

The key outcomes from the creation of a trained sensory panel were the common language created and the level of discrimination achieved after training. The difference in discrimination between the untrained panel during the screening phase and the trained panel during the evaluation phase demonstrates how this method of collecting perception data can be beneficial in characterising complex sensory attributes of surfaces [17]. Whilst training can be more time-intensive than traditional approaches [8,9,10,11,12], this can be balanced by the smaller number of participants needed to produce quality perception data [15, 17].

Collecting subjective feedback on different 3G turf surfaces presents both practical and logistical challenges. The screening, training and evaluation phases of the study were undertaken in an indoor test area, which provided two major advantages over testing in the field. First, surface properties could be carefully controlled by the investigators during both construction and maintenance. The properties of 3G turf can vary between areas of a pitch depending on usage and level of maintenance, and also with environmental conditions [4]; using an indoor test area alongside a regular maintenance and testing protocol meant surface properties could be controlled more closely. It was important for the surfaces to be constructed to isolate one property at a time to reduce the effect of confounding variables. This was only possible with careful planning and construction of surfaces to specifications not typically found in the field: real-world installations are constructed to provide desirable playing conditions, and it can be hard to find surfaces that cover the range of properties investigated in this study. Care was taken, however, to ensure that the test surfaces visually resembled typical 3G turf surfaces found across the United Kingdom. The second advantage was the proximity of the surfaces to one another, which allowed an immediate, direct sensory comparison without reliance on memorised feedback. The smallest perceivable differences in peak torque, shock absorption and vertical deformation identified in this study are likely to be greater when comparing real-world installations, because judging differences is more complex when multiple properties change at once and when an immediate comparison cannot be made due to the time taken to travel between installations.

An important next stage of the research, therefore, will be to evaluate players’ perceptions of real-world installations and compare them with the results of this study to further validate the findings. The player perception data can also be used to validate the objective measures from existing surface test devices, such as the AAA or RTT, or to develop new measures that correlate better with players’ perceptions. These methods will also be invaluable for addressing a major environmental challenge faced by the sports surfaces industry. The European Commission has recently proposed a ban on the microplastic styrene–butadiene rubber crumb used in 3G synthetic surfaces [22], so more sustainable alternatives will need to be identified in the coming years. The techniques developed in this research programme can be used to ensure that new materials and technologies for synthetic turf meet the needs and desires of players. The methods may also need to be developed further, in particular to incorporate the views of a wider population, such as professional players or ‘naïve consumers’, that more fully represents the global footballing community.

5 Conclusions

This study investigated the merits of establishing a trained sensory panel to capture reliable player perceptions of 3G turf. The results highlighted how targeted training can improve players’ ability to perceive subtle differences between surfaces. A key aspect of the sensory panel was the opportunity for the players to define the attributes themselves, using their own language, which was refined over time as they became more experienced in interpreting the sensory feedback from a surface. The development of specific sensory attributes added confidence that the panel was using the same sensory feedback to evaluate surfaces, rather than individual interpretations of terms such as “grip” and “hardness”. A drawback, however, was the time required to work with the players and the level of control needed over the variables to ensure surface properties remained consistent. The language developed in this study could be useful in future studies, particularly those using untrained players, as it may be more meaningful to them than industry-generated terms. Further research to validate this study’s findings in the field, across different surface types and amongst a broader sample of players, should also be considered.