1 Background

Organizations expect lifelong learning and lifelong remembering from their employees. But many skills are only required infrequently, for instance due to a high level of automation in production (Kluge et al. 2009), demands of worker flexibility and a high task variety (Karuppan 2011), or long periods of non-use during daily operations. Accordingly, skill retention becomes a challenge (Arthur and Day 2013a; Arthur et al. 2010; Kim et al. 2013). In some organizations, knowledge and skill retention are crucial for safeguarding people’s lives, for example in hospital emergency rooms (e. g. Kaye et al. 1987; Perez et al. 2013; Wollard et al. 2006), military and combat environments (e. g. Arthur et al. 1998; Farr 1987; Hoffman et al. 2010; Villado et al. 2013; Wang et al. 2013), general aviation (Casner et al. 2013; Childs et al. 1983; Fanjoy and Keller 2013; King 2015; Walley 1995) and process industries, e. g. chemical or power plants (Bainbridge 1983; Kluge and Frank 2014; Kluge et al. 2014, 2015). Moreover, skill retention also needs to be supported in modern advanced manufacturing technology environments (Jaber and Kher 2004; Karuppan 2011; Nembhard and Uzumeri 2000).

But what is the best way to support skill retention? The cited literature and previous articles (Kluge and Frank 2014; Kluge et al. 2015) already summarize the current state of the art concerning factors that influence skill retention. In the present paper, we therefore compare the design of interventions to support skill retention across three studies.

Use of refresher interventions to prevent loss of access to skills after periods of non-use

According to the New Theory of Disuse, learned knowledge and skills, once acquired, remain in long-term memory, and difficulties in recalling knowledge and skills are entirely determined by the current retrieval strength (Bjork 2011). The problem is not the “loss” of knowledge and skill per se, but rather the loss of access (Bjork 2009, 2011; Bjork and Bjork 1992). Access to items in memory is lost due to interference from competing information and altered stimulus conditions such as recency and current cues (Kluge 2014). As one learns new information, procedures and skills, there is potential for competition with related information, procedures and skills that already exist in memory (Bjork 2011). Arthur and Day (2013b) describe decay as a matter of competition between relevant and irrelevant information and processes stored in memory. As new items are learned and added to memory, or as the retrieval strength of certain items is increased, for example by more frequent recall, other items become less retrievable (Bjork and Bjork 1992).

Loss of access can be prevented by interventions to refresh the access and to retain a high retrieval strength, e. g. by means of repetition, retraining and rehearsal (Stammers 1981), also called refresher interventions. An intervention is “each kind of externally controlled, goal-oriented and systematic impact on individuals […]” that consists of a number of tasks with instructions (Hager and Hasselhorn 2008, p. 41). A refresher intervention (RI) aims to re-establish a specific skill level that was acquired at the end of initial training, which should be re-established after a period of non-use in which recall of the skill was not required (Kluge et al. 2012, p. 2437).

Research on the delivery methods of refresher interventions has been neglected so far. While there is a comprehensive body of literature explaining conditions and effects of training and development (Bell et al. 2017), studies which attempt to understand and explain the design and delivery methods of refresher interventions are relatively sparse. Exceptions are papers by Bodilly et al. (1986), Ginzburg and Dar-El (2000), and Schendel and Hagman (1982). It is commonly agreed that “the effectiveness of a particular training delivery method depends on the skill or task being trained and the desired learning outcomes” (Bell et al. 2017, p. 315). In contrast, with respect to refresher interventions, organizations seem to apply delivery methods on a more or less common-sense basis, while evidence-based decisions regarding delivery seem to be few and far between. The objective of the present paper is to compare three different refresher intervention delivery methods regarding their impact on skill retention. Such a comparison should help practitioners to select refresher intervention delivery methods in an evidence-based manner.

To systematically investigate the effects of refresher intervention methods on skill retention, we selected the three which are assumed to be the most effective (e. g. Farr 1987): 1. Practice, based on the theoretical assumptions regarding skill proceduralisation (Anderson 1982; Schneider 1985; Sun et al. 2001, 2005); 2. Testing, based on the theoretical assumptions of the testing effect (Bjork and Bjork 2006; Roediger and Karpicke 2006); and 3. Symbolic Rehearsal, based on the theoretical assumption of mental practice (Driskell et al. 1994; Farr 1987; Naylor and Briggs 1961).

The underlying principles of practice, testing and symbolic rehearsal

Practice during initial training means repetitive work on a task until a certain proficiency level is reached (Kluge et al. 2009). A Practice refresher intervention is a practical repetition of learned material (Gonzalez et al. 2003; Kluge 2014), which supports the additional proceduralisation of skills (Anderson 1982; Kim et al. 2013). Repetitions have proven to be effective when applied both before and after task proficiency has been achieved (Hagman and Rose 1983). When used as a refresher intervention, Practice can be regarded as a variant of “distributed practice”, in which Practice is interrupted by longer periods of non-use. In general, retention can be enhanced by increasing the amount of training through task repetition, and studies have provided consistent evidence that Practice refresher interventions support skill retention (e. g. Foss et al. 1989; Kontogiannis and Shepherd 1999; Mattoon 1994; Morris and Rouse 1985; Kluge and Frank 2014; Sauer et al. 2008).

Testing skills is proposed as a promising approach to support knowledge and skill retention (Bjork and Bjork 2006; Farr 1987; Roediger and Karpicke 2006) and is assumed to be superior to Practice. Research results on the “testing effect” (Roediger and Karpicke 2006) from the field of instructional psychology suggest that Tests that are applied after an initial training phase support knowledge retention more strongly than additional practice of the learning material (McDaniel et al. 2007; Pashler et al. 2007; Roediger and Karpicke 2006). The testing effect is explained by 1) the intense retrieval effort which learners have to invest in the testing situation in order to retrieve information from long-term memory, and 2) a transfer-enhancing processing of information, which is identical in the refresher situation and the later retention assessment situation (Bjork and Bjork 2006; Roediger and Karpicke 2006). So far, the testing effect has mainly been investigated using traditional, non-dynamic verbal learning material, and findings on the superiority of Tests over Practice for complex skills are inconsistent (Karpicke and Aue 2015; Leahy et al. 2015; van Gog et al. 2015; van Gog and Sweller 2015). Our own previous research showed that Testing a skill (a Skill Test refresher intervention) by executing the procedure is more effective than Testing the procedural knowledge only theoretically without executing the procedure (Kluge and Frank 2014).

Refresher interventions can also be designed as Symbolic Rehearsal (Driskell et al. 1994; Farr 1987; Naylor and Briggs 1961), in which a person visualizes how to perform a task, takes notes, or makes a drawing of how to perform a task without actually performing it (Annett 1979; Driskell et al. 1994). Symbolic Rehearsal appears to be particularly promising for the retention of cognitive skills, mental operations (Driskell et al. 1994; Farr 1987; Kluge et al. 2012; Naylor and Briggs 1961) and sequences of actions (Cooper et al. 2001). Previous research by Kluge et al. (2012) and Kluge et al. (2015) showed that Symbolic Rehearsal attenuates skill decay, but is less effective than Practice and Testing (Naylor and Briggs 1961).

As the objective of our research was to find the “one best way” to support skill retention, we conducted three studies comparing these three refresher interventions across three different tasks (a fixed-sequence, a contingent-sequence and a parallel-sequence task), which are required, for example, in process control environments.

The research cited above originates from different learning and retention contexts, such as military skills and basic educational/school contexts. Thus, it is unsurprising that answers to the question of which refresher intervention is the most effective are mixed. The strongest evidence can be found for the hypothesis that refresher interventions in general support skill retention: Refresher interventions counteract skill decay (H1). The evidence for Practice refresher interventions is unanimous and consistent, whereas the empirical evidence for the testing effect and the superiority of Testing is mixed. Therefore, we assume that: Practice refresher interventions support skill maintenance better than Skill Test refresher interventions (H2). Finally, based in particular on our own research showing that Testing is more effective than Symbolic Rehearsal, we assume that Skill Test refresher interventions support skill maintenance more efficiently than Symbolic Rehearsal refresher interventions (H3).

2 Method

Refresher interventions are especially important for procedural skills, which are susceptible to skill decay even after short periods of non-use (Farmer et al. 1999; Farr 1987). In the present studies, the participants were required to learn such a complex skill. Complex skills were applied in a fixed-sequence, a contingent-sequence, and a parallel-sequence task (Omerod and Shephard 2004). The tasks are summarized in Table 1.

Table 1 Description of tasks

Based on findings from preliminary studies, medium effect sizes were expected (η2p between 0.13 and 0.45, see Kluge et al. 2012). For the experiments, we therefore assumed a medium effect size of f = 0.25, a significance level of α = 0.05 and a statistical power of 0.95, which yielded a required total sample size of at least 80 participants for each study (Faul et al. 2007).
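The effect sizes in this paragraph are given in two metrics, partial eta squared (η2p) and Cohen's f. The following minimal sketch converts between them using the standard identity f = sqrt(η2 / (1 − η2)); the function names are illustrative and not part of the original study, and the sketch does not reproduce the sample size calculation itself.

```python
import math


def eta_sq_to_f(eta_sq: float) -> float:
    """Convert (partial) eta squared to Cohen's f."""
    if not 0.0 <= eta_sq < 1.0:
        raise ValueError("eta squared must lie in [0, 1)")
    return math.sqrt(eta_sq / (1.0 - eta_sq))


def f_to_eta_sq(f: float) -> float:
    """Convert Cohen's f back to (partial) eta squared."""
    if f < 0.0:
        raise ValueError("Cohen's f must be non-negative")
    return f * f / (1.0 + f * f)


if __name__ == "__main__":
    # Range of effect sizes reported for the preliminary studies.
    for eta_sq in (0.13, 0.45):
        print(f"eta2p = {eta_sq:.2f} -> f = {eta_sq_to_f(eta_sq):.3f}")
    # Effect size assumed for the sample size calculation.
    print(f"f = 0.25 -> eta2p = {f_to_eta_sq(0.25):.3f}")
```

Under this conversion, the assumed f of 0.25 corresponds to a smaller η2p than the lower bound of 0.13 reported for the preliminary studies, so the assumption appears to be a conservative one.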

2.1 Samples

Studies 1 to 3 were conducted from October 2014 to December 2015. The three studies differed only with respect to the learned task type (Table 2).

Table 2 Samples of Studies 1–3

Study 1. Eighty engineering students (24 female) from the Ruhr-University Bochum participated in the study from October 2014 to December 2014. The participants were recruited by postings on social networking sites and flyers handed out on the university campus. To ensure basic technical understanding, only engineering students were eligible to participate. Participants received 25 € (Control Groups) or 30 € (Refresher Groups) for taking part. The study was approved by the local ethics committee. Participants were informed about the purpose of the study and told that they could discontinue participation at any time (in terms of informed consent). All participants were novices in learning the process control task used in the study. The recruitment was similar for all three studies.

Study 2. Eighty students (28 female) took part in the study from April to July 2015. Four participants were excluded based on the selection criteria (Table 2).

Study 3. Eighty students (18 female) took part in the study from October to December 2015. Seven participants were excluded based on the selection criteria (Table 2).

2.2 Task: Waste water treatment simulation (WaTrSim)

Main task: The complex cognitive skill in the present studies was performed in a simulated process control task embedded in the simulation WaTrSim (Fig. 1; Kluge and Frank 2014).

Fig. 1 WaTrSim screenshot with valves (V1–V9), heaters (HB1, K1, W1, W2) and tanks (Ba, Bb, R1, HB1, Bc, Be, Bd, Bh, Bj, Bk, Bf, Bg)

The participants were trained to execute the particular start-up procedure in a given order and instructed on how to interact with the interface. The procedure comprised the start-up of the plant, which is assumed to be a non-routine task that requires skill retention (Wickens and Hollands 2000). In WaTrSim, the operator’s task is to separate waste water into fresh water and gas by starting up, controlling and monitoring the plant (Kluge and Frank 2014). The operation goal is to maximize the amount of purified gas and to minimize the amount of waste water by executing the start-up procedure in the correct order while considering the right timing for execution. The time permitted to start up the plant is 180 s. The start-up procedure differed in all three studies (Fig. 2).

Fig. 2 Experimental procedure in Studies 1–3 (RI = Refresher Intervention)

Fig. 3 Description of the fixed-sequence task, contingent-sequence task and parallel-sequence task

Secondary (monitoring) task: To measure mental workload and to produce a realistic work setting, the participants were required to perform a secondary task in addition to the main task. The secondary task was performed by monitoring the tank level of tank Ba every 50 s (Table 2 and Fig. 1; tank Ba can be found on the top left, scoring 0–3 times). The performance measure in this case was the frequency of monitoring the tank level.

Participants were told that their objective was to perform the main and the secondary task in parallel. A video of the two tasks is provided as supplementary material.

Study 1. The operation included the start-up procedure of the plant as a fixed-sequence task comprising 13 steps (Fig. 2). Performing the WaTrSim start-up procedure correctly and in a timely manner led to a production outcome of a minimum of 200 l of purified gas. The minimum amount of purified gas in the initial training was used as selection criterion (≥200 l). The start-up time was max. 180 s.

Study 2. The operation included the start-up procedure of the plant as a contingent-sequence task, comprising 13 steps followed by four further steps for each condition. Which four steps had to be executed depended on the condition: heating W1 > 15 °C or heating W2 < 70 °C. Once one of the conditions had occurred, the corresponding four steps had to be executed (Fig. 2). Performing the WaTrSim start-up procedure correctly and in a timely manner led to a production outcome of a minimum of 100 l of purified gas. The minimum amount of purified gas in initial training was used as selection criterion (≥100 l). The start-up time was max. 240 s.

Study 3. The operation included the start-up procedure of the plant as a parallel-sequence task. Two sequences had to be operated in parallel: 13 steps for sequence A and three steps for sequence B. Sequence B had to be executed when the level of tank Bf had reached >75% or <25%. After one of the conditions had occurred, the correct two steps had to be executed (Fig. 2). Performing the WaTrSim start-up procedure (both sequences in parallel) correctly and in a timely manner led to a production outcome of a minimum of 200 l of purified gas. The minimum amount of purified gas in initial training was used as selection criterion (≥200 l). The start-up time was max. 240 s.
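The task parameters of the three studies can be summarized in a small lookup structure. The sketch below is illustrative only: the names `TaskSpec`, `TASKS` and `meets_selection_criterion` are hypothetical, and applying the criterion to the best training trial is an assumption, since the text only states that the minimum amount of purified gas in initial training served as the selection criterion.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TaskSpec:
    task_type: str
    min_gas_litres: float      # selection criterion for purified gas
    max_startup_seconds: int   # maximum permitted start-up time


# Values taken from the study descriptions above.
TASKS = {
    1: TaskSpec("fixed-sequence", 200.0, 180),
    2: TaskSpec("contingent-sequence", 100.0, 240),
    3: TaskSpec("parallel-sequence", 200.0, 240),
}


def meets_selection_criterion(study: int, training_trials_litres: Sequence[float]) -> bool:
    """Return True if the best training trial reaches the study's minimum output.

    Assumption: the criterion is checked against the best of the training trials.
    """
    if not training_trials_litres:
        raise ValueError("at least one training trial is required")
    return max(training_trials_litres) >= TASKS[study].min_gas_litres


if __name__ == "__main__":
    # Hypothetical production outcomes (litres) from four training trials.
    print(meets_selection_criterion(1, [150.0, 210.0, 190.0, 205.0]))
    print(meets_selection_criterion(3, [120.0, 199.9]))
```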

2.3 Procedure

Studies 1–3

Participants in the Refresher Conditions operated WaTrSim at three time points (Fig. 2): All participants took part in the initial training (week 1); participants of the experimental group received the refresher intervention one week later (week 2); and after another week, all participants underwent the retention assessment (week 3, Table 3). The control group did not receive a refresher intervention.

Table 3 Experimental procedure and time points of the studies

Initial training (Table 3)

The initial training phase lasted for 120 min (Fig. 2). Upon arrival, participants were welcomed and introduced to WaTrSim. After completing tests concerning variables measuring individual differences relevant for the study (general mental ability), participants explored and familiarized themselves with the simulation twice. After this, they trained the start-up procedure (fixed-sequence task, contingent-sequence task or parallel-sequence task) with a manual. Following the training, the participants had to perform the start-up procedure four times without help. They were required to produce a minimum of purified gas (Study 1: 200 l, Study 2: 100 l, Study 3: 200 l). The best trial of this series was used as the reference level of skill mastery after training.

Refresher intervention

The refresher interventions are described in Table 4. The control group received no refresher intervention.

Table 4 Description of refresher interventions

Retention assessment (Fig. 3)

Two weeks after the initial training (one week after the refresher intervention), the retention assessment took place, which lasted for approximately 30 min. After the participants had been welcomed, they were asked to start up the plant. The first trial was used to assess skill retention/decay.

Studies 1–3

The experimental groups received one of three refresher interventions in week 2 (Practice refresher intervention, Skill Test refresher intervention and Symbolic Rehearsal refresher intervention), which are described in Table 4.

The dependent variables included the effects of the complex cognitive skill execution of the main and secondary task, which are production outcome, start-up mistakes and monitoring. The first trial of retention assessment was used for the calculations: The production outcome was measured by the produced amount of purified gas at initial training (week 1) and retention assessment (week 3). The minimum production outcome at the initial training varied for each task: Study 1: 200 l, Study 2: 100 l, Study 3: 200 l. The start-up mistakes are the sum of incorrect valve adjustments and procedure mistakes, such as adjustment of the incorrect valve flow rate, and also varied for each task: Study 1: 0–15 mistakes, Study 2: 0–19 mistakes, Study 3: 0–21 mistakes (recalculated into percentages, 0–1). Monitoring tank level: The secondary task was measured by the number of times the secondary task was performed (Study 1: 0–3, Study 2: 0–4, Study 3: 0–4) by reading out the data from the logfiles.
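Because the maximum number of start-up mistakes differs between the tasks, the counts are rescaled to proportions (0–1). A minimal sketch of this rescaling is given below; the names `MAX_MISTAKES` and `mistake_proportion` are hypothetical, and dividing by the task-specific maximum is an assumption about how the recalculation was done.

```python
# Maximum number of start-up mistakes per study, as stated above.
MAX_MISTAKES = {1: 15, 2: 19, 3: 21}


def mistake_proportion(study: int, mistakes: int) -> float:
    """Rescale a raw mistake count to the 0-1 range for the given study."""
    max_mistakes = MAX_MISTAKES[study]
    if not 0 <= mistakes <= max_mistakes:
        raise ValueError(f"mistakes must lie between 0 and {max_mistakes}")
    return mistakes / max_mistakes


if __name__ == "__main__":
    # The same raw count weighs differently depending on the task.
    for study in (1, 2, 3):
        print(study, round(mistake_proportion(study, 3), 3))
```

Rescaling of this kind is what makes start-up mistakes comparable across the three tasks in a pooled analysis.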

The control variables were retentivity and start-up time. Retentivity was measured with the Wilde Intelligence Test-2, which consists of verbal, numerical and figural information (Kersting et al. 2008). First, the participants had to memorize the verbal, numerical and figural information for four minutes. After a disruption phase of 17 min, they then answered reproduction questions related to the memorized information, choosing one of six response options (scores from 0–21; identical for Studies 1–3). The start-up time was measured for the trial with the best production outcome in the initial training (week 1) and for the first trial of the retention assessment (week 3). The start-up time was limited depending on the task: Study 1: 0–180 s, Study 2: 0–240 s, Study 3: 0–240 s.

3 Results

Data from 226 of 240 participants were included in the following calculations. Three participants were excluded in Study 1, four participants were excluded in Study 2 and seven participants were excluded in Study 3. The selection criterion was the produced amount of purified gas as described above. Descriptive statistics are given in Table 5. To ensure that all groups started under the same conditions, the groups were compared regarding the control variables at the initial training. The groups in Studies 1, 2 and 3 were randomized and did not differ significantly in terms of the control variables (p > 0.05), with the exception of retentivity in Study 1 (F(3,73) = 5.53, p = 0.002, η2p = 0.19).
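The reported effect sizes can be cross-checked against the F statistics using the standard identity η2p = (F · df_effect) / (F · df_effect + df_error). The sketch below applies it to the retentivity comparison in Study 1; the function name is illustrative, and small deviations from reported values are to be expected because the published F values are rounded.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # Retentivity difference in Study 1: F(3,73) = 5.53, reported eta2p = 0.19.
    print(round(partial_eta_squared(5.53, 3, 73), 2))
```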

Table 5 Descriptive statistics: means, standard deviations and minimum–maximum in brackets

3.1 Hypothesis-testing

To test the hypotheses, an ANCOVA with planned contrasts was conducted in each study for each of the dependent variables (production outcome, start-up mistakes and monitoring) measured at retention assessment (RA), with production outcome measured at initial training as covariate. As the groups in Study 1 differed significantly in retentivity, retentivity was also included as a covariate there.
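The three hypotheses map onto three planned contrasts between the four groups. The sketch below shows one plausible set of contrast weights; the weights are an assumption (the text does not report them), the group means are hypothetical, and in the ANCOVA the contrasts would be applied to covariate-adjusted means rather than raw means.

```python
# Group order used for all weight vectors below.
GROUPS = ("Practice", "Skill Test", "Symbolic Rehearsal", "Control")

# Assumed contrast weights, one vector per hypothesis.
CONTRASTS = {
    "H1": (1, 1, 1, -3),   # all refresher interventions vs. control group
    "H2": (1, -1, 0, 0),   # Practice vs. Skill Test
    "H3": (0, 1, -1, 0),   # Skill Test vs. Symbolic Rehearsal
}


def contrast_estimate(weights, group_means):
    """Weighted sum of group means; the weights of a contrast must sum to zero."""
    if len(weights) != len(group_means):
        raise ValueError("one weight per group is required")
    if sum(weights) != 0:
        raise ValueError("contrast weights must sum to zero")
    return sum(w * m for w, m in zip(weights, group_means))


if __name__ == "__main__":
    # Hypothetical production outcomes (litres) per group, for illustration only.
    hypothetical_means = (300.0, 280.0, 250.0, 200.0)
    for name, weights in CONTRASTS.items():
        print(name, contrast_estimate(weights, hypothetical_means))
```

A positive estimate indicates an advantage for the groups with positive weights; whether it is significant depends on the error variance of the model, which this sketch does not compute.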

3.1.1 Study 1—Fixed sequence

Production outcome: The analysis regarding production outcome showed a significant difference between groups (F(3,71) = 3.27, p = 0.026, η2p = 0.13; Fig. 4). The planned contrast between the refresher interventions and the control group showed that the refresher interventions supported production outcome significantly better than no intervention (hypothesis 1: p = 0.029). The planned contrast between the Practice and Skill Test refresher interventions showed no significant difference between the two groups (hypothesis 2: p = 0.079). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions also revealed no significant difference (hypothesis 3: p = 0.900).

Fig. 4 Production outcome, start-up mistakes and monitoring performance in Study 1. IT Initial Training, RA Retention Assessment, P Practice, ST Skill Test, SR Symbolic Rehearsal, CG Control Group

Start-up mistakes: The analysis of start-up mistakes showed a significant difference between groups (F(3,71) = 5.24, p = 0.003, η2p = 0.18; Fig. 4). The planned contrast between the refresher interventions and the control group showed that the refresher interventions supported performance regarding start-up mistakes significantly better than no intervention (hypothesis 1: p = 0.001). The planned contrast between the Practice and Skill Test refresher interventions showed no significant difference between the two groups (hypothesis 2: p = 0.138). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions also showed no significant difference (hypothesis 3: p = 0.872).

Monitoring: The analysis of monitoring showed a significant difference between groups (F(3,70) = 4.56, p = 0.006, η2p = 0.16; Fig. 4). The planned contrast between the refresher interventions and the control group showed that the refresher interventions supported monitoring performance significantly better than no intervention (hypothesis 1: p = 0.004). The planned contrast between the Practice and Skill Test refresher interventions showed no significant difference between the two groups (hypothesis 2: p = 0.238). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions also showed no significant difference (hypothesis 3: p = 0.472).

In conclusion, the results of Study 1 indicate that a refresher intervention supports performance better than no intervention. However, Practice and Skill Test refresher interventions support performance equally well. There was no difference in performance between the Skill Test refresher intervention and the Symbolic Rehearsal refresher intervention.

3.1.2 Study 2—Contingent sequence.

Production outcome: The analysis regarding production outcome showed a significant difference between groups (F(3,67) = 5.09, p = 0.003, η2p = 0.18; Fig. 5). The planned contrast between the refresher interventions and the control group showed no significantly superior performance of the refresher interventions (hypothesis 1: p = 0.074). The planned contrast between the Practice and Skill Test refresher interventions showed no significant difference between the two groups (hypothesis 2: p = 0.247). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions showed a significantly superior performance of the Skill Test refresher intervention (hypothesis 3: p = 0.001).

Fig. 5 Production outcome, start-up mistakes and monitoring performance in Study 2. IT Initial Training, RA Retention Assessment, P Practice, ST Skill Test, SR Symbolic Rehearsal, CG Control Group

Start-up mistakes: The analysis of start-up mistakes showed a significant difference between groups (F(3,67) = 4.25, p = 0.008, η2p = 0.16; Fig. 5). The planned contrast between the refresher interventions and the control group showed no significantly superior performance of the refresher interventions (hypothesis 1: p = 0.085). The planned contrast between the Practice and Skill Test refresher interventions showed a significantly superior performance of the Practice refresher intervention (hypothesis 2: p = 0.007). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions showed no significant difference (hypothesis 3: p = 0.900).

Monitoring: The analysis of monitoring showed a significant difference between groups (F(3,70) = 4.80, p = 0.004, η2p = 0.17; Fig. 5). The planned contrast between the refresher interventions and the control group showed that the refresher interventions supported monitoring performance significantly better than no intervention (hypothesis 1: p = 0.028). The planned contrast between the Practice and Skill Test refresher interventions showed no significant difference between the two groups (hypothesis 2: p = 0.100). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions showed a significantly superior performance of the Skill Test refresher intervention (hypothesis 3: p = 0.004).

The results of Study 2 demonstrate that refresher interventions support the performance of the secondary task (monitoring) better than no intervention. However, the refresher interventions were not superior to no intervention with regard to production outcome and start-up mistakes. Additionally, the Practice and Skill Test refresher interventions supported performance equally well in terms of production outcome and monitoring, but Practice supported performance regarding start-up mistakes better. A comparison of the Symbolic Rehearsal refresher intervention and the Skill Test refresher intervention revealed a superior effect of the latter for production outcome and monitoring.

3.1.3 Study 3—Parallel sequence.

Production outcome: The analysis regarding production outcome showed a significant interaction of time and group (F(3,68) = 12.10, p < 0.001, η2p = 0.35; Fig. 6). The planned contrast between the refresher interventions and the control group showed that the refresher interventions supported production outcome significantly better than no intervention (hypothesis 1: p < 0.001). The planned contrast between the Practice and Skill Test refresher interventions showed a significantly superior performance of the Practice refresher intervention (hypothesis 2: p < 0.001). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions showed no significant difference (hypothesis 3: p = 0.858).

Fig. 6 Production outcome, start-up mistakes and monitoring performance in Study 3. IT Initial Training, RA Retention Assessment, P Practice, ST Skill Test, SR Symbolic Rehearsal, CG Control Group

Start-up mistakes: The analysis regarding start-up mistakes showed a significant interaction of time and group (F(3,68) = 4.80, p = 0.004, η2p = 0.18; Fig. 6). The planned contrast between the refresher interventions and the control group showed that the refresher interventions supported performance regarding start-up mistakes significantly better than no intervention (hypothesis 1: p = 0.001). The planned contrast between the Practice and Skill Test refresher interventions showed no significant difference between the two groups (hypothesis 2: p = 0.082). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions showed no significant difference (hypothesis 3: p = 0.716).

Monitoring: The analysis regarding monitoring showed a significant interaction of time and group (F(3,68) = 4.21, p = 0.009, η2p = 0.16; Fig. 6). The planned contrast between the refresher interventions and the control group showed no significant difference (hypothesis 1: p = 0.469). The planned contrast between the Practice and Skill Test refresher interventions showed a significantly superior performance of the Practice refresher intervention (hypothesis 2: p = 0.028). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions showed no significant difference (hypothesis 3: p = 0.235).

In conclusion, the results of Study 3 suggest that refresher interventions support performance better than no intervention with regard to production outcome and start-up mistakes, but not with regard to monitoring. Additionally, the Practice refresher intervention supported performance better than the Skill Test refresher intervention with respect to production outcome and monitoring; the two groups performed equally well regarding start-up mistakes. The comparison of the Skill Test refresher intervention and the Symbolic Rehearsal refresher intervention revealed no significant difference for any of the three measures (hypothesis 3: all p ≥ 0.235).

3.1.4 Comparison refresher interventions independent of task

Finally, the data from all three studies were combined and the groups were compared. As production outcome requirements, start-up time and monitoring differed between the three studies, the only measure which permitted a comparison between all studies was the percentage of start-up mistakes. The analysis regarding start-up mistakes showed a significant interaction of time and group (F(3,218) = 12.54, p < 0.001, η2p = 0.15; Fig. 7). The planned contrast between the refresher interventions and the control group showed that the refresher interventions supported performance regarding start-up mistakes significantly better than no intervention (hypothesis 1: p < 0.001). The planned contrast between the Practice and Skill Test refresher interventions showed a significantly superior performance of the Practice refresher intervention (hypothesis 2: p = 0.001). The planned contrast between the Skill Test and Symbolic Rehearsal refresher interventions showed no significant difference (hypothesis 3: p = 0.790).

Fig. 7 Comparison of refresher interventions independent of task regarding start-up mistakes. IT Initial Training, RA Retention Assessment, P Practice, ST Skill Test, SR Symbolic Rehearsal, CG Control Group

This indicates that each refresher intervention supports performance with regard to start-up mistakes better than no intervention. Moreover, a Practice refresher intervention supports an error-free performance better than a Skill Test refresher intervention. However, no superior effect of Practice and Skill Test was found compared to Symbolic Rehearsal.

The results of the three studies and the overall analysis are summarized in Table 6, ordered according to the hypotheses.

Table 6 Summary of study results for the hypotheses

4 Discussion

The aim of the present paper was to find the best way to support skill retention. We therefore compared the effect of a Practice, a Skill Test and a Symbolic Rehearsal refresher intervention in a fixed-, a contingent-, and a parallel-sequence task. All three tasks consisted of a main task (production outcome, start-up mistakes) and a secondary task (monitoring). To summarize, we found the following: Hypothesis 1: Refresher interventions were more effective than no intervention in the fixed- and parallel-sequence tasks with regard to the main task, but were not superior to no intervention in the contingent-sequence task. With regard to the secondary task, refresher interventions supported performance in the fixed- and the contingent-sequence task. The overall comparison suggests that refresher interventions support performance of the main task better than no intervention. Hypothesis 2: The assumed superiority of the Practice refresher intervention over the Skill Test refresher intervention was only found for start-up mistakes in the contingent-sequence task and for production outcome and monitoring in the parallel-sequence task. Performance in the fixed-sequence task was supported equally well by the Practice and the Skill Test refresher intervention, as were production outcome and monitoring in the contingent-sequence task and start-up mistakes in the parallel-sequence task.

The overall comparison indicates that the Practice refresher intervention supported error-free performance better than the Skill Test refresher intervention. In discussions with colleagues from the field of instructional psychology (Frank et al. 2016), we also heard arguments that the testing effect seems more likely to occur with static learning material, and less likely to occur when refreshing dynamic tasks, as is required in the operation of WaTrSim. Hypothesis 3: Moreover, the comparison of the Skill Test and the Symbolic Rehearsal refresher intervention indicated that the former did not support performance better than the latter in the fixed-sequence task. Nevertheless, in the contingent-sequence task, the Skill Test refresher intervention supported performance better in both the main and the secondary task. However, the overall comparison showed no superior effect of the Skill Test over the Symbolic Rehearsal refresher intervention or the Practice refresher intervention, as is assumed by scholars promoting the testing effect. This means that refresher interventions in general were shown to be important, but the impact and effectiveness of a specific refresher intervention vary with the specific task and its cognitive demands. For example, the three sequences differed with respect to length and attention management (e. g. attention sharing in the parallel-sequence task). The more the task required remembering steps (in the contingent sequence), the more effective the Skill Test was. The more important attention management was (in the parallel sequence), the more effective Practice was, as only further practice enables the operator to rehearse the attention-sharing demands. In conclusion, further research should address the multiple cognitive and information-processing demands of a task and their relative importance for successful task execution. A cognitive task analysis with an emphasis on the memory requirements of subtasks is assumed to be a promising basis for designing refresher interventions. In that respect, combinations of refresher interventions and their distribution over time might compensate for the shortcomings of a single intervention applied alone.

4.1 Limitations

Although it may be criticized that the results of three different studies were analyzed to compare the effect of the developed refresher interventions, it should be noted that the study procedures and designs are comparable and differ only with respect to the task types. Nevertheless, it would be valuable to conduct a study with extended retention intervals in order to create a more realistic setting.

So far, the results are limited to a prototypical cognitive task which is found in process control and is skill- and rule-based. Although sequences are also found in many other occupations, e. g. aviation, shipping and many other commercial or military fields, further evidence and empirical work are needed to find the one best way to support skill retention, if indeed there is one. Moreover, for the time being, the generalizability of our results is limited to periods of weeks of non-use, and we cannot transfer our findings to tasks with retention intervals of months or years. Finally, our participants were all novices. It is assumed that ongoing or continuing job experience attenuates skill loss at least a little. However, job experience does not fully attenuate skill loss (Casner et al. 2013; Kluge et al. 2014), as it is known that time on task is more important than tenure.

4.2 Implications

Implications for further research, but also for practical applications, can be derived from the limitations: Future research should include different tasks, longer periods of non-use and experienced operators. Further challenges are the possible interaction between concrete on-the-job experiences in daily operations and skill retention, the interaction between person-related variables, on-the-job experiences and skill retention, as well as reasonable combinations and sequences of refresher interventions. New measurement and intervention methods, such as experience sampling methods implemented on mobile devices, could be used to track the on-the-job tasks that operators actually perform over longer periods of time and relate them to target performance measures which require skill retention.

As skill retention might be a lifelong challenge, evidence-based rules for combining and alternating refresher interventions should be investigated. These might also be implemented, for example, with the help of computer-based methods or mobile devices.

In this respect, our three studies made a first attempt to compare refresher intervention effects systematically, but the many different work settings which were not addressed in the present work leave plenty of scope for more research to support lifelong skill retention.

5 Conclusion

To summarize, there is no one best way to support skill retention, but it appears that a Skill Test refresher intervention might be the most worthwhile: Although it is not superior to Practice, it does seem to be more efficient in some cases (in the more difficult contingent-sequence task), as it requires less time to apply. Nevertheless, if companies are keen to use skill tests, they should keep in mind that these might come at a cost, for instance more start-up mistakes and possibly a higher workload.

In conclusion, refresher interventions and the issue of skill retention should receive as much attention as initial training and learning processes in vocational and occupational settings. We do not know as much about skill retention as we know about training, as summarized, for instance, by Bell et al. (2017). This gap matters especially for organizations that expect lifelong remembering of skills that are required only infrequently, for instance due to a high level of automation in production, demands of worker flexibility and a high task variety, or long periods of non-use. Further research can support evidence-based decisions on the design of refresher interventions that best fit the needs of the organization and the task in terms of effectiveness and efficiency.