A total of 24 male elite youth football players participated. They were recruited from the under-13 team of BSC Young Boys, a professional football club in Switzerland. All participants were part of a talent-development program and engaged, on average, in four football-specific training sessions and one match per week. From the initial sample, five participants were not available for the posttest and three participants were excluded because they attended fewer than five of the six training sessions. A complete data set of 16 players (M_age = 12.90 ± 0.27 years) was available for the final analyses. The study was approved by the University’s ethics committee and carried out in accordance with the Declaration of Helsinki. Accordingly, permission to participate was obtained from the players’ parents in advance.
After excluding dropouts, eight participants remained per experimental group (DT and FS). The groups did not differ in terms of age (M_DT = 12.93 ± 0.29 years; M_FS = 12.87 ± 0.27 years; t(14) = 0.41, p = 0.69), club-football experience (M_DT = 8.43 ± 0.80 years; M_FS = 7.62 ± 1.25 years; t(14) = 1.54, p = 0.15), general technical skill as assessed by a standardized passing and dribbling test conducted before the interventions (Forsman, Blomqvist, Davids, Liukkonen, & Konttinen, 2016) (M_DT = 40.69 ± 3.51 s; M_FS = 40.92 ± 2.43 s; t(14) = −0.15, p = 0.88) or the amount of football played outside of regular training during the time of the study (M_DT = 2.44 ± 0.68 h per week; M_FS = 2.88 ± 0.99 h per week; t(14) = −1.03, p = 0.32).
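The baseline comparisons above can be cross-checked from the reported summary statistics alone. The following is a minimal sketch, assuming a pooled-variance (Student's) independent-samples t-test with n = 8 per group, which is consistent with the reported df = 14. The function name is illustrative, and small deviations from the published t values are to be expected because the reported means and standard deviations are rounded.

```python
import math

def t_from_summary(m1, sd1, n1, m2, sd2, n2):
    """Student's t (pooled variance) from group means, SDs and sizes."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Summary statistics as reported in the text: (mean DT, SD DT, mean FS, SD FS)
comparisons = {
    "age (years)": (12.93, 0.29, 12.87, 0.27),
    "club experience (years)": (8.43, 0.80, 7.62, 1.25),
    "technical skill (s)": (40.69, 3.51, 40.92, 2.43),
    "extra football (h/week)": (2.44, 0.68, 2.88, 0.99),
}

N_PER_GROUP = 8  # eight players per group after dropouts

results = {}
for label, (m_dt, sd_dt, m_fs, sd_fs) in comparisons.items():
    t, df = t_from_summary(m_dt, sd_dt, N_PER_GROUP, m_fs, sd_fs, N_PER_GROUP)
    results[label] = (t, df)
    print(f"{label}: t({df}) = {t:.2f}")
```

The p values are not recomputed here because that would require the t distribution, which is not part of the Python standard library.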
Tasks and materials
Football-specific divergent thinking task
Similar to previous studies of creativity in sports, a video-based task was used to assess each player’s football-specific DT (Furley & Memmert, 2015, 2018; Hüttermann, Nerb, & Memmert, 2018, 2019; Klatt et al., 2019; Memmert et al., 2013). In this task, participants were individually shown 20 video clips of attacking game situations that were temporally occluded at key moments by freezing the final video frame. The participants’ task was to imagine themselves as the player in ball possession and to name as many possible solutions as they could think of within a 45 s time interval.
Due to data protection issues, the authors were not provided with the video scenes used by Memmert and colleagues (e.g., Memmert et al., 2013) and needed to construct their own test battery. In order to select video clips of game situations in which (a) the player in ball possession has multiple appropriate solutions and (b) these solutions can be attributed to varying solution categories, seven football experts with longstanding coaching (M = 17.36 ± 7.48 years; UEFA A or B qualifications) and playing (M = 25.43 ± 5.60 years) experience were recruited. In a first step, the experts independently evaluated 40 attacking scenes (approximately 10 s long) from Swiss first (i.e., Super League, n = 25) and second division (i.e., Challenge League, n = 15) matches of the season 2018/19. These scenes were preselected by the first author based on the aforementioned criteria. For every scene, experts indicated all possible options for the player in ball possession and rated these options in terms of quality (1–5; 5 = excellent, 1 = not good at all). In addition, they were asked to rate the feasibility of each of six predefined solution categories for the respective scene (shot on goal, dribbling, short pass, feint followed by a pass, lob, cross; 0–5; 5 = excellent, 0 = not feasible; cf. Memmert et al., 2013). The experts were generally encouraged to comment if the respective scene could lead to any ambiguities. Based on this evaluation, scenes were omitted if they (a) led to ambiguous interpretations (qualitative criterion, e.g., the player in ball possession seems to be off-balance in the freeze frame), (b) had fewer than three solution categories with a rating score of ≥ 2 or (c) did not show sufficient agreement between experts’ ratings of quality of solution categories (i.e., intraclass correlation [ICC] < 0.90).
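The three exclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: the scene records, field names and ratings are hypothetical, the ICC is taken as a precomputed value per scene (the text does not specify the ICC model), and it is assumed that category ratings are averaged across experts before the threshold of 2 is applied.

```python
from statistics import mean

MIN_CATEGORIES = 3   # criterion (b): at least three categories ...
MIN_RATING = 2       # ... with a feasibility rating of >= 2
MIN_ICC = 0.90       # criterion (c): required agreement between experts

def keep_scene(scene):
    """Return True if a scene passes exclusion criteria (a) to (c)."""
    if scene["ambiguous"]:  # (a) qualitative flag raised by the experts
        return False
    # Assumption: ratings are averaged across experts before thresholding
    mean_ratings = [mean(r) for r in scene["category_ratings"].values()]
    feasible = sum(1 for r in mean_ratings if r >= MIN_RATING)
    if feasible < MIN_CATEGORIES:  # (b)
        return False
    return scene["icc"] >= MIN_ICC  # (c)

# Hypothetical scenes, each rated by three experts on the 0-5 feasibility scale
scenes = [
    {"id": "scene_01", "ambiguous": False, "icc": 0.95,
     "category_ratings": {"shot on goal": [4, 5, 3], "dribbling": [3, 2, 4],
                          "short pass": [5, 4, 4], "lob": [0, 1, 0]}},
    {"id": "scene_02", "ambiguous": False, "icc": 0.93,
     "category_ratings": {"shot on goal": [1, 0, 1], "dribbling": [4, 4, 5],
                          "short pass": [3, 3, 2], "cross": [1, 1, 0]}},
    {"id": "scene_03", "ambiguous": True, "icc": 0.96,
     "category_ratings": {"shot on goal": [4, 4, 4], "dribbling": [3, 3, 3],
                          "short pass": [5, 5, 5]}},
    {"id": "scene_04", "ambiguous": False, "icc": 0.82,
     "category_ratings": {"shot on goal": [4, 2, 5], "dribbling": [3, 5, 2],
                          "short pass": [2, 5, 3]}},
]

retained = [s["id"] for s in scenes if keep_scene(s)]
print(retained)
```

In this toy data, scene_02 fails criterion (b), scene_03 fails criterion (a) and scene_04 fails criterion (c).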
The remaining 33 scenes were subsequently ranked according to the number of potential solutions, weighted by the respective quality rating: for each scene, the quality ratings for all possible options were first summed for each expert individually and then averaged across experts. For validation purposes, we asked the same expert panel to individually view the top 24 scenes a second time and to express agreement with the average ratings of solution categories for each scene (1–6; 6 = absolute agreement, 1 = absolute disagreement). As the experts did not show considerable disagreement (M = 5.7, SD = 0.7), the top 20 clips were finally selected for the DT task. In these scenes, the agreement between experts on the quality of solution categories was very strong (ICC = 0.95).
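The ranking rule described above (sum the quality ratings per expert, then average across experts) can be sketched as follows. The scene labels and ratings are hypothetical, and the cut-off of two scenes merely stands in for the selection of the top-ranked clips.

```python
from statistics import mean

def scene_score(option_ratings_by_expert):
    """Sum each expert's quality ratings over all named options,
    then average these sums across experts."""
    return mean([sum(ratings) for ratings in option_ratings_by_expert])

# Hypothetical data: for each scene, one list of quality ratings (1-5) per
# expert, with one rating for every option that expert named.
scenes = {
    "scene_A": [[4, 3, 5], [5, 4], [3, 3, 4]],
    "scene_B": [[5, 5], [4, 4], [5, 3]],
    "scene_C": [[3, 4, 4, 2], [4, 4, 3], [5, 3, 3, 2]],
}

# Rank scenes from highest to lowest weighted number of solutions
ranking = sorted(scenes, key=lambda s: scene_score(scenes[s]), reverse=True)
top_scenes = ranking[:2]  # placeholder cut-off for this toy example
print(ranking, top_scenes)
```

Because the ratings are summed rather than averaged within an expert, a scene scores highly when it offers many options, good options, or both.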
Functionality and creativity ratings of on-field actions
To assess the functionality and creativity of participants’ actions, the players took part in a semi-structured, 2‑vs‑1 game situation on the field (for a similar task, see Laakso, Davids, Liukkonen, & Travassos, 2019). The task goal for the two attacking players of interest was to outplay a defender within a predefined area on the wing to create a goal-scoring opportunity. No specific instructions on how to reach this goal were provided and players were encouraged to compete as in a match. Starting positions were standardized across trials, as illustrated in Fig. 1. The ball-carrying attacker (A1) started the trial by dribbling into the defined 2‑vs‑1 area (24 m × 10 m) on the right or left wing. As soon as A1 began, the defender (D1) and the second attacker (A2) came into play. To increase task representativeness, a third attacker (A3) and a second defender (D2) were placed in the centre of the pitch outside the 2‑vs‑1 area. Players were only permitted to dribble into the penalty area or pass the ball to A3 after the ball carrier passed the dotted line. Furthermore, players A1 and A2 were not allowed to enter the penalty area before the ball did and players A3 and D2 were not allowed to enter the 2‑vs‑1 area. Only the actions performed by players A1 and A2 were later assessed for functionality and creativity, whereas the other four players served to create a realistic game situation.
The actions performed in the on-field task were video recorded. Experts were then asked to rate the two players’ individual actions in each situation in terms of functionality and creativity. The expert rating was conducted in accordance with the Consensual Assessment Technique (CAT; Amabile, 1996; Hennessey, Amabile, & Mueller, 2011), which has been deemed the gold standard in creativity research (Baer, 2016; Kaufman, Plucker, & Baer, 2008). In contrast to DT tests, the CAT is not tied to any particular theory (Kaufman et al., 2008) but grounded in a consensual definition of creativity, with the understanding that ‘a product or response is creative to the extent that appropriate observers independently agree it is creative’ (Amabile, 1996, p. 33). Based on this notion, the CAT provides a number of procedural requirements (cf. Hennessey et al., 2011): In essence, a panel of experts is asked to rate the creativity of products (in the present study, players’ actions) (a) independently, (b) relative to one another (as opposed to rating against some absolute standard), and (c) solely based on one’s subjective conception of creativity, i.e., on what one perceives as creative in the present context and on the basis of one’s experience in the respective domain. Therefore, it is essential for the validity of the CAT that experts are neither trained in advance to agree with one another in the assessment nor instructed to use specific criteria against which the creativity of actions should be assessed (cf. Amabile, 1996).
In coordination with the club’s coaches, play and practice tasks were designed that aimed at enhancing either players’ football-specific DT or their motor skills (for in-depth descriptions of exemplary tasks, see the appendix).
Training sessions for the DT group comprised playful and pronouncedly variable game forms, stimulating players to come up with a variety of new solution ideas across a wide range of situations. The game forms were developed based on the methodological principles of the TCA (e.g., Memmert, 2015b). Principles were combined and incorporated into sessions based on deliberate play (including variable numbers of players; see Greco, Memmert, & Morales, 2010) or one-dimension games (e.g., identification of gaps; see Memmert, 2015b). In these games, a wide range of stimuli was introduced by, for example, adding unexpected environmental changes, introducing various possibilities to collect points or alternating between using feet and hands, as recommended by the diversification principle of the TCA. Additionally, inspired by creativity training in school settings (e.g., Fasswald-Magnet, Hefler, Papousek, Weiss, & Fink, 2014; Fink, Reim, Benedek, & Grabner, 2020) and social priming (Furley & Memmert, 2018), game-like tasks stimulating DT in football situations were designed and integrated into certain game forms. All DT sessions also aimed to increase players’ breadth of attention (cf. Memmert, 2007; see also the deliberate coaching principle of the TCA) by creating practice environments that demanded a wide focus of attention and by refraining from providing feedback and instructions during play. Furthermore, in accordance with the deliberate motivation principle of the TCA (see also Hüttermann et al., 2018), all games were instructed with a promotion rather than a prevention focus (e.g., ‘you can collect points for your team by’ or ‘your goal is to’, as opposed to ‘you have to go to the middle zone if you lose the ball’ or ‘I expect from you to’).
In contrast, the training sessions for the FS group consisted of motor skill-related practice tasks. The focal points of these sessions were derived from situational task demands and to-be-achieved action goals (cf. Hossner, Schiebl, & Göhner, 2015), including, for example, de-stabilizing the direct opponent when dribbling towards him. Hence, the ‘functional’ aspect of the FS training focused on guiding players to expand and stabilize their own functional task-solutions rather than practicing ‘ideal techniques’. Based on the generally accepted recommendations for motor-skill practice from both movement science literature (e.g., Davids, Button, & Bennett, 2008; Hossner, Kredel, & Franklin, 2020; Williams & Hodges, 2005) and football coaching literature (e.g., Daniel, Peter, & Vieth, 2014), players were confronted with representative tasks and given instructions in terms of the intended movement effects, i.e., towards desired states (e.g., to bring the opponent off-balance). Given multiple attempts in each situation, players were encouraged to explore different, though still functional, ways to solve the task at hand. Meanwhile, variability was introduced by systematically changing task-relevant constraints (e.g., distances and angles between the attacker and the defender). Accordingly, players were ‘forced’ to continuously adapt and explore alternative solutions to reach the task goal in a functional manner.
The study was carried out over 4 weeks, with testing and training sessions fully integrated into the regular club training. The 3‑week intervention phase comprised six 20 min training sessions. Players were randomly assigned to one of the two experimental groups (DT vs. FS) after the pretest. Both the DT and the FS sessions were delivered in small groups (5–6 players; for games requiring more players, additional players who did not participate in the study joined). DT and FS sessions were conducted in parallel and, thus, under exactly the same weather and pitch conditions (artificial turf). Two instructors delivered the intervention sessions. Both instructors were football-experienced sports students. Although they were not blinded to the experimental hypotheses, the instructors were provided with a clear protocol that was collaboratively designed by the researchers and the club’s coaches, and they affirmed that they had delivered the training sessions accordingly. To further ensure that treatment effects were independent of the instructors, the two instructors swapped groups halfway through the intervention.
The pretest was conducted one week before the start of the intervention over two consecutive days. Both field-based assessments were carried out on the first day, which included the standardized technical skill test followed by the on-field game situation. The standardized technical skill test was conducted only to ensure that both experimental groups did not differ in general technical skill. Participants were randomly allocated to groups of five players for the pretest (independent of the later formed experimental groups). After a collective warm-up of 20 min, the first two groups partook in the standardized technical skill test whilst the remaining players engaged in their regular training. Following the technical skill test, the 2‑vs‑1 situations were presented. Instructions on the task goal and rules were given, and players were assigned to their starting positions in the task (as illustrated in Fig. 1). After every trial, players rotated positions. After a full rotation, the procedure was repeated on the wing on the other side of the field where the initial order was reversed and players D1–A2 and D2–A3 switched positions to ensure new attacker-defender pairings. Consequently, each participant performed four trials in attacking positions of interest, i.e., each player completed two trials as A1, and two trials as A2. After approximately 40 min, the next two groups followed the same procedure. All trials were recorded on video (GoPro Hero 4, 1920 × 1080, 25 fps) from the nearest sideline (Fig. 1).
On the second day, the DT test was conducted. To this end, participants attended individual sessions that lasted about 30 min. After being provided with instructions and a demo scene, participants were presented with 20 test scenes on a standard tablet (9.7-inch). Each scene lasted approximately 10 s before the video stopped at a key moment of action. Participants were asked to imagine themselves as the player with the ball. For the subsequent frozen-frame period of 45 s, a countdown was visible on the screen. Participants were instructed to name all options they could think of within the given time period. Furthermore, they were asked to indicate the option they would finally choose; however, in accordance with established procedures (e.g., Hüttermann et al., 2019), the latter was not included in the data analysis. The verbal responses were recorded on audio tape and in the experimenter’s notes on a response sheet.
The exact same procedures were followed for the posttest, which started two days after the last training session. Although the procedure ensured that both experimental groups were tested and treated under the same weather and pitch conditions, the comparability of environmental factors at pre- and posttest could not be guaranteed. Specifically, in comparison to the pretest, the weather conditions during the on-field assessment at posttest were extremely poor (i.e., heavy rain followed by a substantial temperature drop). Consequently, pre–posttest main effects may not only reflect learning but also the more or less pronounced adverseness of weather conditions, such that the interpretation of group differences should be based on the interaction between group (DT vs. FS) and time of measurement (pre vs. post).
Football-specific divergent thinking task
To quantify players’ DT ability from their responses in the video-based task, the three DT components (fluency, flexibility and originality) were assessed following the standard procedure in sports-related creativity research (cf. Memmert et al., 2013). Accordingly, participants’ audiotaped verbal responses, supplemented by the experimenters’ notes on a response sheet, were first coded according to the classification of options defined in the test construction. Fluency was evaluated as the number of solution ideas a player generated for each scene. For the flexibility score, each response was grouped into a solution category (i.e., shot on goal, dribbling, short pass, feint followed by a pass, lob, cross; cf. Memmert et al., 2013) and one point was given for every distinct category in which a player had generated a solution. For the originality score, each proposed solution was rated by two independent experts (coaching experience: M = 27.00 ± 3.00 years, UEFA A and B+ level) on a Likert scale (1–5; 5 = very original, 1 = not original) and then the two ratings were averaged to obtain an originality value for every solution (ICC = 0.74). In sum, for all 20 scenes, we assessed the number of ideas (fluency), the number of different categories of ideas (flexibility) and the unusualness of the ideas (originality) that the participants came up with. The three component scores (fluency, flexibility and originality) were first independently calculated, then z‑standardized and averaged to obtain an overall DT-score for each participant (cf. Memmert et al., 2013).
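The scoring pipeline described above can be illustrated with a short sketch. Several details are assumptions rather than statements from the text: fluency and flexibility are averaged over scenes, originality is averaged over all solutions a player named, and z-standardization uses the sample standard deviation across participants. The participant labels and responses are hypothetical, and the sketch assumes every participant names at least one solution.

```python
from statistics import mean, stdev

def component_scores(scenes):
    """scenes: one list per scene of (category, rating_expert_1, rating_expert_2)."""
    fluency = mean([len(scene) for scene in scenes])
    # One point per distinct solution category within a scene
    flexibility = mean([len({cat for cat, _, _ in scene}) for scene in scenes])
    # Average the two expert ratings per solution, then across solutions
    originality = mean([(r1 + r2) / 2 for scene in scenes for _, r1, r2 in scene])
    return fluency, flexibility, originality

def z_scores(values):
    """z-standardize values across participants (sample SD, an assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

# Hypothetical responses of three participants to two scenes
responses = {
    "P1": [[("short pass", 2, 2), ("dribbling", 3, 2), ("short pass", 1, 2)],
           [("shot on goal", 2, 3)]],
    "P2": [[("short pass", 1, 1)],
           [("cross", 2, 2)]],
    "P3": [[("lob", 4, 5), ("dribbling", 3, 3), ("cross", 2, 3), ("short pass", 2, 2)],
           [("shot on goal", 3, 3), ("feint followed by a pass", 5, 4)]],
}

names = list(responses)
components = [component_scores(responses[n]) for n in names]
# Standardize each component separately, then average the three z-scores
z_by_component = [z_scores(list(column)) for column in zip(*components)]
dt_scores = {n: mean([z[i] for z in z_by_component]) for i, n in enumerate(names)}
print(dt_scores)
```

Standardizing before averaging gives the three components equal weight even though they are measured on different scales.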
Functionality and creativity ratings of on-field actions
In order to rate players’ actions in the on-field task, five football experts (coaching experience: M = 22.20 ± 11.05 years, UEFA A and B level; playing experience: M = 27.20 ± 7.63 years) who did not personally know the players were recruited. After the posttest, the video footage from both pre- and posttest was cut into separate video clips for each single trial and reassembled in random orders. To ensure that the experts were completely neutral, we did not inform them about the research question and the experimental groups. In fact, we only informed the experts after completion of their ratings that the videos originated from two different times of measurement and that a training intervention was conducted.
The experts were asked to rate the actions of players A1 and A2 in the game situations in terms of functionality, creativity and technical quality, with the final category included to disguise the experimenters’ research focus. In accordance with the CAT guidelines (Amabile, 1996), no prior training or instruction was provided for the experts to suggest any criteria or definitions of the three qualities. Rather, experts were asked to rely on their expert understanding of how functional, creative and technically well-executed the actions were in relation to the other actions presented and to the specific situational context. At the beginning of the expert rating, 16 randomly selected video clips were presented in order to familiarize the experts with the game situation and the level of the players. Subsequently, the test video comprising 64 relevant clips (2 test times × 4 trials × 16 participants/2 participants per scene) was shown, with each clip played twice. To reduce sequence effects, the assortment of video clips and the three quality categories (functionality, creativity and technical quality) were presented in a different random order for each expert. Experts were asked to make their judgements intuitively by providing written marks on continuous scales (1–5; 5 = very functional/very creative/very well executed technically; 1 = not functional/not creative/not well executed technically). Furthermore, experts were asked to rate actions of the same player independently of the player’s previously shown actions. From the experts’ responses on the continuous scales, rating scores (1–5) were measured to two decimal places. Each trial was thus rated by all five experts in terms of functionality (ICC = 0.69) and creativity (ICC = 0.63), whilst the additional ratings for technical quality were not further considered. The ratings of the five experts were then combined by computing a mean value for every action.
For each participant, the highest functionality rating and the highest creativity rating from the four trials at pretest and at posttest, respectively, were used as measures for further analyses.
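The aggregation of the expert ratings can be sketched as follows: the five expert ratings are averaged per action, and the highest of a player's four trial means at a given test time is retained. The ratings shown are hypothetical, and the variable names are illustrative.

```python
from statistics import mean

# Clip count as given in the text: 2 test times x 4 trials x 16 participants,
# with two rated players (A1 and A2) appearing in each clip
N_TEST_TIMES, N_TRIALS, N_PARTICIPANTS, RATED_PER_CLIP = 2, 4, 16, 2
n_clips = N_TEST_TIMES * N_TRIALS * N_PARTICIPANTS // RATED_PER_CLIP

def best_rating(trial_ratings):
    """trial_ratings: one list of expert ratings per trial.
    Average across experts per trial, then keep the highest trial mean."""
    return max(mean(experts) for experts in trial_ratings)

# Hypothetical creativity ratings (1-5, two decimals) of one player at pretest:
# four trials, each rated by five experts
pretest_creativity = [
    [2.10, 2.50, 1.90, 2.30, 2.20],
    [3.40, 3.10, 2.90, 3.60, 3.00],
    [1.50, 2.00, 1.80, 1.70, 2.00],
    [2.90, 3.10, 2.70, 3.00, 2.80],
]

print(n_clips, best_rating(pretest_creativity))
```

The same function would be applied separately to the functionality and creativity ratings at pretest and at posttest.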
For each of the three dependent variables (football-specific DT score, functionality rating and creativity rating of on-field actions), a 2 (group: DT vs. FS) × 2 (time of measurement: pre vs. post) ANOVA (analysis of variance) with repeated measures on the second factor was conducted. The significance level was fixed a priori at α = 0.05 and the initial sample size had been determined in advance to ensure sufficient power to detect medium-to-large interaction effects (α = 0.05, 1‑β = 0.80, f = 0.30). Significant interaction effects were further analysed with planned t-tests. Furthermore, when the groups differed at pretest (i.e., in their baseline level), we additionally conducted an ANCOVA to compare pre–posttest differences, while controlling for pretest scores. One-tailed tests of significance were conducted for a priori predicted differences and two-tailed tests for further revealed effects. Effect sizes are reported as partial eta squared (ηp²) and Cohen’s d.
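In a 2 × 2 design with one between-subjects and one within-subjects factor, the group × time interaction is equivalent to an independent-samples t-test on the pre–post gain scores, with F(1, N − 2) = t². The sketch below uses this equivalence to illustrate the interaction test and the corresponding partial eta squared. It is not the authors' analysis script; the data and function name are hypothetical, and p values are omitted because they require the F distribution, which is not in the Python standard library.

```python
import math
from statistics import mean, variance

def interaction_from_gains(pre_a, post_a, pre_b, post_b):
    """Group x time interaction of a 2 x 2 mixed design via gain scores."""
    gains_a = [post - pre for pre, post in zip(pre_a, post_a)]
    gains_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(gains_a), len(gains_b)
    df_error = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(gains_a)
                  + (n_b - 1) * variance(gains_b)) / df_error
    t = (mean(gains_a) - mean(gains_b)) / math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    f = t ** 2                      # interaction F with (1, df_error) df
    eta_p2 = f / (f + df_error)     # partial eta squared for a 1-df effect
    return {"t": t, "F": f, "df": (1, df_error), "eta_p2": eta_p2}

# Hypothetical overall DT scores (z units) for four players per group
pre_dt, post_dt = [0.1, -0.3, 0.0, 0.2], [0.6, 0.1, 0.5, 0.5]
pre_fs, post_fs = [0.0, 0.2, -0.1, 0.1], [0.1, 0.2, 0.1, 0.1]

result = interaction_from_gains(pre_dt, post_dt, pre_fs, post_fs)
print(result)
```

With the study's final sample of eight players per group, the error degrees of freedom for the interaction would be 14.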