Learning a skill to the point at which it can be completed independently without prompting from the instructor involves the transfer of stimulus control from the instructor’s prompts to the less identifiable “practice cues” resulting from the subject’s own behavior of frequently repeating the same response pattern (Reid, Nill, & Getz, 2010). When the individual or animal can complete the task without external guidance, the mastered skill is termed autonomous to indicate unassisted competence. Autonomy is desired in nearly all skill learning, so it is important to know how it occurs (Casey, 2009; Reid, Folks, & Hardy, 2014).

Mastering the motor aspects of a skill is not sufficient; one still needs to know when and in which situations one should carry out the task. Control by these situational variables is necessary for the skill to be used in functional ways. Autonomy and proper stimulus control are often problems for developmentally disabled individuals who may depend excessively on the instructor’s prompts, or who may engage in the behavior at inappropriate times or places. As a result, the ways in which stimulus control can be transferred appropriately have stimulated extensive applied and basic research as well as the development of useful procedures and comparisons between them. However, the understanding of how transfer actually occurs lags behind.

In a classic study, Touchette (1971) measured the transfer of stimulus control in three boys with intellectual disabilities by progressively delaying the prompt. He was able to identify the trial in which the participant first responded before the delayed prompt would have occurred. This procedure involved a delay timer, and each reinforced trial increased the delay in the next trial by 0.5 s and errors decreased the delay by 0.5 s. A series of successful prompted trials produced a delay of sufficient duration to “encourage” the participant to respond before the delay was over. This procedure has become a widely used prompting procedure in applied behavior analysis to help children with autism spectrum disorder (ASD) or other disabilities to produce appropriate behavior in everyday situations. The many published studies using this procedure have given it various names: time-delay fading, the delayed-cue procedure, the delayed prompting procedure, the progressive prompt delay (PPD) procedure, the progressive time delay (PTD) procedure, and its close relative, the constant time delay (CTD) procedure.

Unfortunately, the procedure does not always work (Glat, Gould, Stoddard, & Sidman, 1994; Oppenheimer, Saunders, & Spradlin, 1993; Touchette & Howard, 1984), and sometimes it has the harmful effect of creating prompt dependence (Fisher, Kodak, & Moore, 2007; MacDuff, Krantz, & McClannahan, 2001). Researchers disagree about the controlling factors and the characteristics of prompts that usually make the procedure successful (Brown & Rilling, 1975; Glat et al., 1994; MacDonall & Marcucella, 1976; Touchette, 1971; Touchette & Howard, 1984). Improved understanding of these controlling factors is the primary objective of this study. For example, if it is true that the procedure produces delays to “encourage” the participant to respond before the delay is over, then what role does the actual delayed prompt have? This study asked whether it matters whether an actual delayed prompt is presented or not.

Most basic research investigating how transfer of stimulus control works involves transfer from one well-defined conditional discrimination to another well-defined discrimination. An example is the transfer of a color discrimination to a line-orientation discrimination. Although this is an excellent experimental design, the goal of applied research is often to help children with disabilities to acquire autonomy of a more complex behavior chain (a skill) that had been learned correctly through shaping and prompting techniques. Autonomy in this case implies the ability to complete the chain “at the proper time and situation,” yet without the instructor’s prompt. The new controlling stimuli are not as well defined as the conditional discriminations in the controlled experiment. They are commonly described in such terms as “the relevant natural stimuli,” “task-related stimuli,” “task-intrinsic stimuli,” and “contextual cues.” They are analogous to the less identifiable “practice cues” produced by the subject’s own behavior of repeating the same response pattern, mentioned above.

Researchers have recently used a different methodology to study guided skill learning and the effectiveness of cues on the development of autonomy in rats and pigeons (Alonso-Orozco, Martínez-Sánchez, & Bachá-Méndez, 2014; Bachá-Méndez & Reid, 2006; Fox, Reid, & Kyonka, 2014; Reid et al., 2010; Reid et al., 2014; Reid, DeMarco, Smith, Fort, & Cousins, 2013). Of particular relevance for this study, the experiments in Reid, Rapport, and Le (2013) focused on the question of why some guiding cues are substantially more effective at controlling response sequences than others, even though the information provided by the guiding cues should, at face value, be the same. Why would animals be insensitive to highly predictive cues? They demonstrated that rats could learn a Left–Right (L–R) lever-press sequence quickly in a simple “Follow the Light” prompting condition in which the correct L–R lever-press sequence could be produced by “following” the illuminated panel lamp (S+) over the effective lever while the other lamp was extinguished (S−). On the other hand, simply reversing the order of illumination of the panel lights (“Reversed Lights”) produced a substantially less effective prompting condition. When the same L–R lever-press sequence required “following” an extinguished panel lamp (S+) while the other lamp was illuminated (S−), rats required about twice as long to learn the response sequence to criterion (26 sessions vs. 13 sessions).

Experiment 3 of Reid, Rapport, et al. (2013) compared the effectiveness of four patterns of guiding cues during acquisition of the L–R lever-press sequence in rats: the follow-lights condition, the reversed-lights condition, and two types of no-cue conditions: one with both panel lamps illuminated and one with both lamps extinguished. They found that acquisition was substantially faster in the follow-lights condition than in the reversed-lights condition. Acquisition in the reversed-lights condition was significantly faster than in either of the no-cue conditions, which demonstrated that the reversed-lights condition did serve as a beneficial guiding cue during acquisition, compared to conditions providing no differential cues (FL >> RL > (BL = NL)). Similarly, Experiment 4 compared acquisition of the L–R response sequence in the follow-lights versus reversed-lights conditions on the front versus rear walls. They found that acquisition in the follow-lights condition was faster on both the front and rear walls than it was in the reversed-lights condition.

These several studies demonstrate unequivocally that the follow-lights condition provides more effective stimulus control than does the reversed-lights condition during acquisition of the L–R response sequence. Researchers continue to debate, however, what causes this difference in effectiveness. Reid, Rapport, et al. (2013) compared both stimulus procedures in AB and ABA designs to test different explanations of why one guiding cue would be so much more effective than a similar guiding cue: sign tracking, feature-positive discrimination bias, spatial S-R compatibility and the Simon effect. They concluded that none of these explanations adequately explained existing data involving transfer of stimulus control, especially in ABA designs in which path dependence has a prominent effect. However, several explanations were compatible with acquisition data. Early explanations (Hearst, 1978, 1991; Jenkins & Sainsbury, 1969, 1970) provided compelling evidence that the presence versus absence of a stimulus feature (such as illuminated versus extinguished panel lights) is an important factor for stimulus control. It could be that the presence or absence of a stimulus is a more important factor affecting the effectiveness of guiding cues during the original acquisition of a skill than in multiphase transfer-of-control procedures, such as those involving the development of behavioral autonomy and the elimination of prompt dependence.

Also of particular relevance to this study, recent research with this same skill (the L–R response sequence) has demonstrated that rats develop autonomy faster with the less effective guiding cue conditions than with the more effective guiding cue conditions (Reid, DeMarco, et al., 2013). Similarly, Reid et al. (2014) demonstrated that the more difficult skill in pigeons (a simultaneous chain) produced faster autonomy than an easier skill (a simple serial successive chain). They summarized these observations in the following way: Holding your child’s hand too much seems to slow his or her development of autonomy. The human literature also indicates that providing less guidance with a task can ultimately result in more robust autonomy (see review in Schmidt & Bjork, 1992). Although differences in terminology and procedures make the relevance of this research to nonhumans questionable, our ultimate goal is to understand prompt dependence in children with disabilities, where that research should be more relevant. We wanted to determine whether the effectiveness of the prompt would influence the development of L–R autonomy in the progressively delayed prompting procedure with rats.

This experiment used this recent guided-skills approach to further explore the controlling factors that allow a prompted behavior chain to become autonomous in Touchette’s (1971) delayed prompting procedure. We trained 20 rats to complete a left–right (L–R) lever-press sequence guided by panel lights (the prompt). Half of the subjects were randomly assigned to the follow-lights condition as described above, and half were assigned to the reversed-lights condition. Once response sequence accuracy was high and stable in a multiple baseline across-subjects design, all rats were exposed to an autonomy procedure consisting of four experimental groups in a 2 (follow lights vs. reversed lights) × 2 (delayed prompt vs. no prompt) factorial design. We manipulated (a) the effectiveness of the guiding lights prompt and (b) the presence or absence of a progressively delayed prompt. Thus, the autonomy procedure involved four groups: lights with delayed prompts (L–DP), reversed lights with delayed prompts (RL–DP), and the two control groups, lights with no prompts (L–NP) and reversed lights with no prompts (RL–NP). By manipulating the effectiveness of the prompt, we hoped to simulate situations in applied settings in which children may respond more quickly and accurately to one-word prompts than to more complex ones. By manipulating the presence or absence of the delayed prompt, we could assess whether the prompt is necessary or whether the delay itself is the motivating factor that leads to autonomy.

We predicted that subjects in the two groups that were trained in the reversed-lights condition would display greater autonomy than those trained in the follow-lights condition. We also predicted that L–R accuracy would be higher for prompted trials than for unprompted trials. We predicted that the groups that received delayed prompting would display higher L–R accuracy during unprompted trials (greater autonomy) than the control groups that never received prompts. Finally, reinforcement rate and trial duration were dependent variables controlled by the subjects in both prompted and unprompted conditions. Both variables have been claimed to be controlling factors that make delayed prompting procedures successful. Therefore, we explored the interactions between these variables to better understand how they contribute to autonomy in delayed prompting procedures.

Method

Subjects

Twenty naïve 4-month-old female Long Evans rats (Rattus norvegicus) were housed in individual polycarbonate cages in an animal facility that maintained constant temperature and humidity and a 12:12-h light:dark cycle. We maintained each subject at approximately 85 % of its free-feeding body weight by providing food (Teklad Rodent Diet) after each daily session in home cages. Water was freely available in the home cages. Only 18 subjects completed the study, as two subjects failed to achieve our accuracy criterion.

Apparatus

The experiment used four standard Med Associates modular test chambers for rats measuring 30 × 24 × 22 cm. Each chamber was located inside an isolation chamber containing a ventilation fan, a 7-W, 120-V nightlight, and a miniature TV camera on the ceiling. A sound generator produced constant white noise at approximately 65 dB. Each operant chamber contained two retractable levers on the front wall and two nonretractable levers on the rear wall. Each pair of levers was separated by 16.5 cm, center to center, and located 6 cm above the floor. The magazine hopper, 5 × 5 cm, was centered between the two response levers on the front wall, 3 cm above the floor. One round 28-V white stimulus lamp, 2.5 cm in diameter, was located 2.5 cm above each of the four levers, and a 28-V houselight (GE1819) was located at the center top of the rear wall. The pellet dispenser dispensed 45-mg Research Diet (Formula A/1) pellets. All four operant chambers were controlled by a single Dell personal computer (Pentium 4) located in an adjacent room and programmed in MED-PC IV, which controlled all of the experimental conditions and recorded every event and its time of occurrence with 10-ms resolution.

Procedure

Twenty naïve rats were randomly assigned to four experimental groups in a 2 (follow lights vs. reversed lights) × 2 (delayed prompt vs. no prompt) factorial design. This design involved four groups: (a) lights with delayed prompts (L–DP), (b) reversed lights with delayed prompts (RL–DP), and the two control groups, (c) lights with no prompts (L–NP) and (d) reversed lights with no prompts (RL–NP). Our procedure consisted of training followed by the autonomy procedure. Training included shaping the lever-press response on the right lever on the front wall, followed by training on the rear wall, and finally training to complete a left–right (L–R) lever-press sequence on the rear wall, all in the presence of the correct panel lights, until baseline accuracy was established. Once each subject met our accuracy and stability criteria for baseline L–R sequence accuracy in the multiple-baseline design, our autonomy procedure began, as described below.

Training

Shaping

We used a successive approximations procedure to train all rats to press the right lever on the front wall, adjacent to the hopper. For subjects assigned to groups involving the follow-lights condition (L–DP and L–NP), the panel lamp over that lever was illuminated (for the entire session) to indicate S+, and the panel lamps over the three (ineffective) levers remained off, indicating S−. Illumination of the four panel lamps was reversed for the two reversed-lights groups (RL–DP and RL–NP) such that the extinguished panel light indicated S+ and the three illuminated panel lights indicated S−. Lever-press training continued until subjects earned 45 pellets of food for two consecutive sessions.

Rear wall

Subsequently, each subject was exposed to a single session of Fixed Ratio-1 (FR-1) for pressing the right lever on the rear wall, while the levers on the front wall were retracted for the duration of the experiment. For subjects in the two follow-lights groups, the lamp over this lever was illuminated to indicate S+, and the lamp over the left lever was off to indicate S− (reversed for the two reversed-lights groups). This single session lasted the earlier of 45 min or until the subject received 45 food pellets. The purpose of this session was to ensure that all subjects were given approximately the same amount of exposure to the reinforcement conditions on the right-rear lever before the experiment proper began, given that subjects required varying amounts of lever-press training on the front wall.

Switch between levers

Subjects were then exposed to a discrete trials training condition that delivered a food pellet for each switch from left press to right press or vice versa, without regard to perseveration on a lever. A 50-ms tone accompanied pellet delivery. For subjects in the follow-lights groups, the two panel lights remained on, but a lever press briefly pulsed that light off (0.2 s) to indicate that the press was effective. For subjects in the reversed-lights groups, the two panel lights remained off, but a lever press briefly pulsed that light on (0.2 s). No time-outs occurred during this procedure. Every trial ended with pellet delivery, followed by a 1-s intertrial interval (ITI) in which the houselight was off and lever presses had no programmed consequences. This training procedure terminated when the subject earned all 45 pellets in three consecutive sessions.

Baseline

At the beginning of each discrete trial in the follow-lights condition, the houselight and the panel light above the left lever were turned on, while the panel light above the right lever was off. A press to either lever turned off the left panel lamp and illuminated the right lamp. A second lever press turned off the houselight and ended the trial, either delivering a food pellet followed by a 3-s ITI or beginning a 3-s time-out (TO). During ITIs and TOs the panel lights and the houselight were off (the nightlight in the isolation chamber continued to provide general illumination), and lever presses had no programmed consequences. Only L–R lever-press sequences produced food. No feedback about response accuracy was provided until two lever presses had been completed. The onset and offset of the panel lights during trials were exactly reversed for the reversed-lights condition. Sessions lasted for the earlier of 45 min or until 45 pellets were delivered.

Each rat was exposed to the follow-lights condition or to the reversed-lights condition until L–R sequence accuracy was high and appeared to asymptote over 75 % for five consecutive sessions with no increasing or decreasing trends. Percentage L–R sequence accuracy was calculated by totaling the number of trials in which the L–R lever-press sequence occurred (thus, ending with reinforcement) divided by the total number of trials in the session and multiplying by 100. The last five sessions of this condition for each rat represented its baseline L–R accuracy, and the rat was then exposed to the autonomy procedure.
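The accuracy measure and the numerical part of the baseline criterion can be sketched as follows. This is only an illustration under stated assumptions: the function names are hypothetical, and the requirement of no increasing or decreasing trend is not modeled, because the text does not say how trends were assessed.

```python
def lr_accuracy(n_lr_trials: int, n_trials: int) -> float:
    """Percentage of trials in which the L-R sequence occurred (and was reinforced)."""
    if n_trials <= 0:
        raise ValueError("a session must contain at least one trial")
    return 100.0 * n_lr_trials / n_trials


def meets_threshold(session_accuracies, threshold=75.0, window=5):
    """True if each of the last `window` sessions exceeds `threshold` percent.

    ASSUMPTION: only the numerical part of the criterion is checked here;
    the absence of increasing or decreasing trends is not evaluated.
    """
    recent = session_accuracies[-window:]
    return len(recent) == window and all(a > threshold for a in recent)
```

For example, a session with 36 L–R sequences in 45 trials yields `lr_accuracy(36, 45) == 80.0`.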

Autonomy procedure

As described earlier, 20 rats had been randomly assigned to four experimental groups in a 2 (follow lights vs. reversed lights) × 2 (delayed prompt vs. no prompt) factorial design, which we call the autonomy procedure. In all conditions, a food pellet was provided for the completion of a left-right (L–R) lever-press sequence (the skill) guided by the differential illumination of panel lamps over the respective levers (the prompt). Half of the subjects were trained to complete this task in the follow-lights condition and half in the reversed-lights condition. These two stimulus conditions also defined the guiding cue prompts in the autonomy procedure exactly as described above in the baseline procedure: Groups L–DP and L–NP were exposed to the follow-lights condition, and groups RL–DP and RL–NP were exposed to the reversed-lights condition.

These stimulus conditions (defined by the order of illumination of two panel lamps) served as guiding cues, or prompts, which could be provided either at the beginning of a trial, delayed for some seconds, or eliminated altogether. Two groups of rats (L–DP and RL–DP) were exposed to the delayed prompts condition, which implemented a discrete-trials version of Touchette’s (1971) progressively delayed prompting procedure in which the contingent response was a L–R lever-press sequence rather than a single press, and erroneous sequences produced TO. The first trial of each session provided the guiding cues prompt without delay. However, trials containing a nonzero programmed delay began in a no-cues condition in which both panel lamps were illuminated until the delay timer timed out—the event that produced the guiding cue prompt. Each trial ending in reinforcement increased the programmed delay of the prompt in the next trial by 2 s, and each TO reduced its programmed delay by 2 s. Unprompted trials resulted when rats completed any 2-lever-press sequence before the programmed delay timer provided the prompt. Prompted trials resulted when rats completed a sequence after the prompt was provided. The consequences of completing a response sequence were the same whether it occurred before or after the programmed delay timer timed out. For both delay groups (L–DP and RL–DP), both panel lamps were illuminated as a no-cues condition at the beginning of trials containing programmed delays (delay > 0 s) until the prompt timer timed out (which initiated the prompt, changing the lights) or until a response sequence ended the trial. A goal of this condition was to “encourage” the rats to complete the sequence correctly before the prompt was provided.
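The delay-adjustment rule and the classification of trials described above can be sketched as follows. This is a minimal illustration, not the MED-PC program used in the experiment. The function names are hypothetical, and two details are assumptions because the text does not specify them: that the programmed delay cannot fall below 0 s, and that a sequence completed exactly when the timer times out counts as prompted.

```python
STEP_S = 2.0  # step size stated in the text (2 s per trial)


def next_delay(delay_s: float, reinforced: bool, step_s: float = STEP_S) -> float:
    """Programmed prompt delay for the next trial.

    Reinforced trials lengthen the delay; time-outs shorten it.
    ASSUMPTION: the delay is floored at 0 s.
    """
    change = step_s if reinforced else -step_s
    return max(0.0, delay_s + change)


def classify_trial(completion_time_s: float, delay_s: float) -> str:
    """Label a trial by whether the two-press sequence ended before the timer timed out.

    ASSUMPTION: a tie (completion exactly at time-out) is treated as prompted.
    """
    return "unprompted" if completion_time_s < delay_s else "prompted"


def delay_trajectory(outcomes, start_delay_s: float = 0.0):
    """Programmed delays for a run of trials, starting from an undelayed first trial."""
    delays = [start_delay_s]
    for reinforced in outcomes:
        delays.append(next_delay(delays[-1], reinforced))
    return delays
```

With outcomes of pellet, pellet, time-out, pellet, the programmed delay would run 0, 2, 4, 2, 4 s under these assumptions. In the no-prompt groups the same timer logic applies, but the labels correspond to before and after trials because no prompt is ever presented.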

Two other groups of rats (L–NP and RL–NP) were exposed to the no-prompt condition, which served as control groups for the delayed-prompts condition. The programmed delay timer worked in exactly the same manner as described above, but when the timer timed out, it never initiated a prompt or altered the panel lights. Both panel lamps were illuminated as a no-cues condition at the beginning of each trial, and they remained illuminated until a response sequence ended the trial. We distinguished each trial as a before trial or an after trial, depending upon whether the response sequence was completed before or after the programmed delay timer timed out. This control condition allowed us to separate the effects of providing the prompt from the potential effects of trial duration, which was controlled by each subject.

Our primary measures were the percentage accuracy of the L–R response sequence, the programmed delay, and the obtained trial duration. Rats were exposed to this procedure for 12 daily sessions, and each session terminated after the delivery of 45 pellets or after 45 min, whichever came first.

Results

Eighteen of the 20 subjects met our accuracy/stability criteria and completed the autonomy procedure. Two subjects, both from the RL–NP group, were dropped from the study because they failed to meet accuracy/stability criteria during baseline. As a result, this group consisted of three rats, whereas five rats were in each of the other groups.

Figure 1 shows the percentage L–R sequence accuracy for each of the four groups across the last five sessions of baseline and 12 sessions of the autonomy procedure. The dotted vertical line represents the transition from the baseline to the autonomy procedure. In each panel, overall accuracy is depicted as black filled circles. In the two left panels, open circles depict accuracy during unprompted trials, which was calculated by dividing the number of pellets obtained before the programmed delay timer timed out by the total number of unprompted trials in the session and converting to percentages. Accuracy during unprompted trials was one of our measures of autonomy. Similarly, gray filled circles depict accuracy during prompted trials, which was calculated by dividing the number of pellets obtained after the delayed prompt began by the total number of prompted trials in the session and converting to percentage. The two right panels depict groups which never received prompts. However, the programmed delay timer divided the trials into those in which the response sequence occurred before the timer timed out (corresponding to the unprompted trial durations in the left panels) or after the timer timed out (corresponding to the prompted trial durations in the left panels). Accuracy in before and after trials was calculated the same way as during unprompted and prompted trials, and they were also measures of autonomy given that prompts were never provided.
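The separate accuracy measures described above can be sketched as follows, assuming each trial is recorded with two flags. The `Trial` record and its field names are hypothetical and are not taken from the original data files.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    before_timer: bool  # sequence completed before the programmed delay timer timed out
    reinforced: bool    # the sequence was L-R and produced a pellet


def split_accuracy(trials):
    """Return (before_pct, after_pct) for one session.

    'Before' corresponds to unprompted trials and 'after' to prompted trials
    in the delayed-prompt groups. A category with no trials yields None.
    """
    def pct(subset):
        if not subset:
            return None
        return 100.0 * sum(t.reinforced for t in subset) / len(subset)

    before = [t for t in trials if t.before_timer]
    after = [t for t in trials if not t.before_timer]
    return pct(before), pct(after)
```

Returning `None` for an empty category is a design choice for this sketch; the text does not say how sessions without trials of one type were handled.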

Fig. 1

Percentage of L–R lever-press accuracy for the baseline and autonomy procedures for each of the four groups of rats. The dotted vertical line represents the transition from the last five baseline sessions to the 12 autonomy sessions. “Unprompted” and “Before” represent trials terminating before the delay timer timed out. “Prompted” and “After” represent trials terminating after the delay timer timed out. Error bars represent SEM.

Comparison of L–R accuracy during unprompted versus prompted trials (left panels of Fig. 1) reveals that accuracy appeared systematically lower during prompted trials than during unprompted trials, a counterintuitive finding since prompts should improve accuracy rather than lower it. Similarly, the two groups represented in the right panels never received a prompt, but accuracy after the programmed delay timer timed out appeared lower than before it timed out. We carried out an omnibus mixed ANOVA comparing accuracy in these trial types for all groups across sessions. Accuracy after the timer timed out (in prompted trials or after trials with longer durations) was significantly lower than before the timer timed out (in unprompted trials or before trials with shorter duration), F(1, 19) = 31.204, p < .001, ηp² = .518. There was no effect of sessions and no Session × Group interaction.

We carried out a similar mixed ANOVA on each of the individual groups. Accuracy during unprompted trials in group L–DP (top left panel) was higher than during prompted trials, but the difference only approached statistical significance, F(1, 6) = 4.676, p = .074, ηp² = .438. Accuracy during unprompted trials in group RL–DP (bottom left panel) was significantly higher than during prompted trials, F(1, 8) = 5.791, p = .043, ηp² = .420. The groups depicted in the right panels of Fig. 1 served as controls for providing delayed prompts. Thus, no prompts were provided to subjects in the two groups depicted in the right panels, although the same programmed delay timer separated the trials into those ending before the timer timed out and after it timed out. In group L–NP (top right panel), accuracy during trials ending before the programmed delay timer timed out was significantly higher than during trials after it timed out, F(1, 5) = 24.672, p = .004, ηp² = .420. Similarly, in group RL–NP (bottom right panel), accuracy during trials ending before the programmed delay timer timed out was significantly higher than during trials after it timed out, F(1, 4) = 9.106, p = .039, ηp² = .695. Therefore, the difference in accuracy between these types of trials was consistent across groups, whether delayed prompts were provided (left panels) or not (right panels). Thus, the presence or absence of the prompt was not responsible for these observed differences in L–R accuracy. Although the two groups in the right panels never received a prompt during the autonomy procedure, the programmed delay timer separated shorter trial durations from longer trial durations using the same criteria that separated prompted from unprompted trials (left panels). Therefore, the differences in L–R accuracy in each group could be due to differences in trial duration, controlled by the subject, rather than an effect of providing prompts after a delay. We evaluate this hypothesis below.

We expected delayed prompts to improve L–R sequence accuracy across the 12 sessions of the autonomy procedure compared to the groups that did not receive prompts. For example, visual analysis of overall accuracy (filled black circles) in group L–DP (top left panel) seemed to indicate a slight increase across sessions, yet it looked fairly constant in group L–NP (top right panel). Therefore, we looked for a Session × Group interaction comparing the two follow-lights groups, and we separately compared the two reversed-lights groups. A mixed ANOVA comparing overall accuracy for group L–DP with that of group L–NP (top panels) showed no Session × Group interaction, F(11, 66) = 1.287, p = .252, ηp² = .177. Similarly, comparing overall accuracy for group RL–DP with that of group RL–NP (bottom panels) showed no Session × Group interaction, F(11, 66) = 1.093, p = .380, ηp² = .154. Therefore, the presence or absence of a prompt did not seem to affect the rate of learning to complete the L–R sequence across the sessions of the autonomy procedure for either follow-lights or reversed-lights groups.
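For readers who want to relate the reported effect sizes to the test statistics, partial eta squared can be recovered from an F ratio and its degrees of freedom through the standard identity ηp² = (F × df_effect) / (F × df_effect + df_error). The sketch below is a generic illustration of that identity, not the authors' analysis script, and the function name is hypothetical.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Partial eta squared implied by an F ratio and its degrees of freedom."""
    if f_value < 0 or df_effect <= 0 or df_error <= 0:
        raise ValueError("F must be non-negative and both df must be positive")
    return (f_value * df_effect) / (f_value * df_effect + df_error)
```

Applied to the two interaction tests above, `partial_eta_squared(1.287, 11, 66)` and `partial_eta_squared(1.093, 11, 66)` round to .177 and .154, respectively.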

Figure 1 displays another result worth noting. Focusing on overall accuracy (black filled circles), we observed an immediate pronounced drop in L–R accuracy in the transition from the baseline to the autonomy procedures for both follow-lights groups (top panels), but this drop did not occur for either reversed-lights group (bottom panels). A chi-squared test measured the difference between the average accuracy in the five-session baseline and the first session of the autonomy procedure. Accuracy decreased significantly for the L–DP group, χ²(1) = 5.825, p = .016, and for the L–NP group, χ²(1) = 5.753, p = .016, but not for either of the reversed-lights groups (bottom panels), χ²(1) < 1, p > .37. This difference in follow-lights and reversed-lights conditions has been demonstrated before (e.g., Reid, DeMarco, et al., 2013; Reid, Rapport, et al., 2013) and may have resulted from the extended training required in the reversed-lights condition to fulfill our accuracy/stability criteria. The average number of sessions required to meet these criteria for the follow-lights condition (M = 22.7, SD = 3.23) was significantly less than the number required for the reversed-lights condition (M = 26.3, SD = 2.31), t(16) = 2.71, p = .007.

This difference in overall accuracy between the follow-lights and reversed-lights groups helps illustrate how the programmed delay timer was influenced by the different groups. Recall that the programmed delay was increased by 2 s following each pellet and was decreased by 2 s following each TO. Figure 2 displays the changes in the mean programmed delay across the first 50 trials of each session for the four groups. Both reversed-lights groups produced nearly linear programmed delay curves with greater slopes than either of the follow-lights groups. This observation is consistent with, and probably caused by, the greater overall L–R accuracy observed with the reversed-lights groups (described above), leading to more 2-s increases in the programmed delay.
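The link between overall accuracy and the slope of the programmed delay can be made explicit with a short worked expression. It is a simplification under two assumptions that the text does not state: the probability p that a trial ends with a pellet is treated as constant within a session, and any lower bound on the delay is ignored.

```latex
% Expected change in programmed delay per trial, in seconds,
% where p is the probability that a trial ends in reinforcement:
\mathbb{E}[\Delta d] = (+2)\,p + (-2)\,(1 - p) = 2\,(2p - 1)
% Examples: p = 0.6 gives 0.4 s per trial; p = 0.8 gives 1.2 s per trial.
```

Under these assumptions a constant accuracy produces a linear increase in the mean programmed delay, and higher accuracy produces a steeper line, which matches the pattern described for the reversed-lights groups.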

Fig. 2

Symbols show how the mean programmed delay tended to increase across the first 50 trials of the sessions for all four groups in the autonomy procedure.

Figure 3 demonstrates that median obtained trial durations were fairly constant as each session progressed. That is, after the first trial, the median speed of completing the response sequences did not become noticeably faster or slower across the session for any group.

Fig. 3

Median obtained trial duration across the first 50 trials of each session in the autonomy procedure.

We hypothesized above that the observed differences in L–R accuracy between prompted and unprompted trials across sessions in each group (Fig. 1) could be related to differences in trial duration (how much time elapsed before the response sequence was completed), rather than an effect of actually providing delayed prompts. Figure 4 shows the obtained relation between L–R accuracy and trial duration for each group. Surprisingly, L–R accuracy was higher in unprompted trials and in before trials (unfilled triangles) than in prompted trials or after trials (filled triangles) across nearly all trial duration bins. These curves were derived from two repeated measurements (prompted vs. unprompted accuracy) from the same subjects across each trial duration, so we compared the curves using a paired two-sample t test. Although the accuracy differences were not statistically significant for group L–DP, t(6) = 1.50, p = .185, r² = .273, they were statistically significant for all other groups: group L–NP: t(6) = 4.255, p = .0054, r² = .751; group RL–DP: t(6) = 5.758, p = .0012, r² = .847; and group RL–NP: t(6) = 5.159, p = .0021, r² = .816. Therefore, the observed differences in L–R accuracy between prompted and unprompted trials across sessions in each group (Fig. 1) cannot be explained by differences in trial duration. Also, these observed differences could not be due to presentation of a delayed prompt because when no prompt was provided (right panels), before trials also had greater accuracy than after trials. Even when prompted and unprompted trials (or before and after trials) shared the same duration, accuracy in unprompted (before) trials was higher.

Fig. 4

L−R accuracy as a function of trial duration for all groups. Accuracy was calculated for each point as the number of pellets received in the category, divided by the number of trials that occurred in that category, and converted to percentages

Figure 4 identifies a critical feature of trial duration. In every group, overall accuracy (filled circles) decreased systematically as obtained trial duration increased. As a result, shorter trials were associated with greater accuracy, which would generate higher overall reinforcement rates for faster responding. Waiting for the prompt timer to time out (filled triangles) was associated with lower overall reinforcement rate by (a) decreasing accuracy (at any trial duration) and by (b) lengthening trials, even if no prompt was actually provided (e.g., right panels). Unfortunately, the direction of causality in these correlations is unknown. For example, accuracy could be reduced in long trials (perhaps due to distraction or working memory limitations), which would lower overall reinforcement rate even beyond the effect of the trial being long. Alternatively, higher reinforcement rates and higher accuracy during shorter trials could differentially reinforce faster responding.

We may more fully understand the negative relation between trial duration and overall accuracy, measured as a percentage, by separating the two measures used to calculate this percentage: number of pellets divided by number of trials, at each trial duration bin. For example, knowing the percentage of accuracy does not tell us how many pellets or trials contributed to that percentage, and these values could vary substantially across bins of trial durations. Figure 5 displays the number of trials per subject at each trial duration for the four groups. The highly skewed distributions contain many more short trials (3–6 s) than long trials. Also, there were many more short unprompted trials (and short before trials) than short prompted trials (or after trials). Short prompted trials were infrequent, as one would expect from a delayed prompting procedure, even when no actual prompt was provided. This is because high overall accuracy (60–80 %, cf. Fig. 1) would increase the values of the programmed delay timer to produce long programmed delays and fewer prompts.
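
The decomposition described here can be sketched in a few lines of Python. This is only an illustration: the record layout (one `(duration_s, reinforced)` pair per trial), the 1-s bin width, and the function name `tally_by_duration` are assumptions, and the example trials are made up rather than taken from the experiment.

```python
from collections import defaultdict
import math


def tally_by_duration(trials, bin_width=1.0):
    """Return {bin_start: (n_trials, n_pellets, accuracy_percent)}.

    `trials` is an iterable of (duration_s, reinforced) pairs. The record
    layout and the default 1-s bin width are illustrative assumptions.
    """
    n_trials = defaultdict(int)
    n_pellets = defaultdict(int)
    for duration_s, reinforced in trials:
        bin_start = math.floor(duration_s / bin_width) * bin_width
        n_trials[bin_start] += 1
        n_pellets[bin_start] += int(reinforced)
    # Accuracy is pellets divided by trials in the bin, as a percentage.
    return {
        b: (n_trials[b], n_pellets[b], 100.0 * n_pellets[b] / n_trials[b])
        for b in sorted(n_trials)
    }


# Made-up trials, for illustration only: (duration in s, pellet earned?).
example = [(3.2, True), (3.8, True), (3.5, False),
           (4.1, True), (9.6, False), (9.1, True)]

for bin_start, (n, pellets, pct) in tally_by_duration(example).items():
    print(f"{bin_start:.0f}-s bin: {n} trials, {pellets} pellets, {pct:.0f} % accurate")
```

Keeping the two counts alongside the percentage makes it visible when a bin's accuracy rests on very few trials, which is the concern raised above for long trial durations.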

Fig. 5

The total number of trials per subject of each duration in each category. Overall trials (filled circles) represent the sum of unprompted and prompted trials (left panels) or before and after trials (right panels)

Figure 6 shows that the skewed distributions of the number of pellets per subject earned across trial durations closely reproduced the distributions of the number of trials per subject of each category (Fig. 5). That is, the number of earned pellets varied proportionally with the number of trials at each trial duration. When rats completed the response sequence quickly, producing short 3–6 s trials, most of those trials earned pellets rather than TO, and they mostly occurred before the programmed delay timer timed out—whether prompts were available or not. The ratio of the curves in these two figures (Pellets/Trials), of course, reproduces Fig. 4, in which overall L–R accuracy decreased across bins of increasing trial duration, yet the differences in accuracy between trial types were generally maintained.

Fig. 6

The total number of reinforced trials (# pellets) per subject at each trial duration in each category. Overall pellets (filled circles) represent the sum of unprompted and prompted pellets (left panels) or before and after pellets (right panels)

Figure 7 summarizes many of the findings above by displaying strong interactions between trial type and the consequences of the trial (food, TO). An example may help clarify the graphs. Considering the bottom left panel, the proportion of total trials is divided into the four types: prompted trials followed by food, prompted trials followed by TO, unprompted trials followed by TO, and unprompted trials followed by food. This last category represents trials that were completed successfully and autonomously. As proportions, these values sum to 1.0 in each panel. Each panel demonstrates that about 20 % of all trials were prompted or occurred after the timer timed out, and food delivery and timeout occurred about equally often in these trials (overlapping symbols imply accuracy of about 50 %). The proportion of TOs was not higher for the more frequent unprompted (or before) trials, with TOs remaining at approximately 20 % for all groups. However, the proportion of these unprompted trials ending in food delivery (autonomous successes) was considerably higher, particularly for the two reversed-lights groups (bottom panels) and reflects their higher overall L–R accuracy displayed in Fig. 1.
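
As a hedged illustration of how the four proportions in each panel relate to accuracy within a trial type, the sketch below tallies trials by trial type and consequence. The counts are invented to resemble the pattern described in the text and are not data from the experiment; the function name `outcome_proportions` is likewise hypothetical.

```python
from collections import Counter


def outcome_proportions(trials):
    """Proportion of all trials in each (trial_type, consequence) cell."""
    counts = Counter(trials)
    total = sum(counts.values())
    return {cell: n / total for cell, n in counts.items()}


# Invented counts, for illustration only.
example = (
    [("unprompted", "food")] * 60
    + [("unprompted", "TO")] * 20
    + [("prompted", "food")] * 10
    + [("prompted", "TO")] * 10
)

props = outcome_proportions(example)

# Accuracy within a trial type is the food proportion divided by the
# summed food and TO proportions for that type.
for trial_type in ("prompted", "unprompted"):
    food = props[(trial_type, "food")]
    timeout = props[(trial_type, "TO")]
    print(f"{trial_type}: {food + timeout:.2f} of trials, "
          f"accuracy {100 * food / (food + timeout):.0f} %")
```

With these invented counts, equal food and TO proportions in prompted trials correspond to 50 % accuracy, whereas the same TO proportion in the more frequent unprompted trials corresponds to 75 % accuracy.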

Fig. 7

Proportion of trials for each trial type. The four points in each panel represent the two possible consequences (food or timeout) for the two trial types identified on the abscissa. In each panel, the four proportions sum to 1.0. Error bars represent SEM

Figure 8 compares the total number of earned pellets per session for the two trial types, averaged across each rat for the four groups. Independent t tests demonstrated that overall reinforcement per session was significantly greater for rats in unprompted trials than in prompted trials for the L−DP group (unprompted: M = 30.7, SD = 1.36; prompted: M = 13.6, SD = 1.28), t(116) = 9.148, p < .001, and for the RL−DP group (unprompted: M = 32.8, SD = 1.25; prompted: M = 12.1, SD = 1.25), t(112) = 11.73, p < .001. Similarly, overall reinforcement per session was significantly greater for rats in before trials than in after trials for the L−NP group (before: M = 26.8, SD = 1.98; after: M = 15.42, SD = 1.61), t(112) = 4.44, p < .001, and for the RL−NP group (before: M = 38.1, SD = 0.86; after: M = 5.93, SD = 0.71), t(69) = 13.582, p < .001.

Fig. 8

Total number of earned pellets per session, averaged across each subject, for the two trial types in each group. Error bars represent SEM. *** represents statistical significance at α = .001

Discussion

The purpose of this experiment was to explore the controlling factors that allow a prompted behavior chain to become autonomous in Touchette’s (1971) delayed prompting procedure, using a recently developed methodology for studying guided skill learning in rats (e.g., Reid et al., 2010; Reid, DeMarco, et al., 2013; Reid, Rapport, et al., 2013). Using a 2 (follow lights vs. reversed lights) × 2 (delayed prompt vs. no prompt) between-group factorial design, we first asked how the effectiveness of the stimulus prompts would influence the development of autonomy. We also asked whether actually providing a delayed prompt was necessary for this delayed prompting procedure to produce autonomy, or whether the success of the procedure depended upon the delay rather than the pending prompt. We explored the roles of trial duration and reinforcement rates as possible controlling factors.

Figure 1 shows that the effectiveness of the stimulus prompts did, indeed, affect the development of autonomy, but in a way that may seem counterintuitive. We measured autonomy as percentage L–R sequence accuracy in unprompted trials, which occurred before the programmed delay timer timed out to initiate the prompt (this measure was called “anticipations” in Glat et al., 1994, and others). Although the more effective follow-lights condition produced faster acquisition of the L–R sequence than did the less effective reversed-lights condition during baseline training, autonomy was higher for both of the reversed-lights groups than for either of the follow-lights groups. This indicates that providing less effective guidance can ultimately result in more robust autonomy. This counterintuitive observation was demonstrated in a similar guided skills study with rats (Reid, DeMarco, et al., 2013). It has also been observed and prominently discussed in the human literature concerning the role of practice on skill learning (see review by Schmidt & Bjork, 1992). Although the delayed prompting procedure is normally used with humans, this study used it with rats. It is gratifying to observe that the development of skill autonomy in rats and humans may depend similarly on the degree of effectiveness of stimulus control by prompts.

The progressively delayed prompt procedure is often assumed to work because its delays motivate the participant to respond before the delay is over. This study asked whether it matters whether an actual delayed prompt is presented or not; that is, could behavioral autonomy develop in the absence of those prompts? Contrary to our prediction, Fig. 1 shows that the presence and absence of a delayed prompt produced nearly identical levels of autonomy; the prompt was not necessary for its development. Overall L–R accuracy in both prompted and unprompted trials (left panels) was closely duplicated in the corresponding groups in which no prompt was actually provided (after and before trials, respectively, right panels). Even the differences between prompted vs. unprompted trials within each left panel were maintained in the corresponding right panel.

Because the presence and absence of the delayed prompt appeared to produce the same results, how can we be confident that we have not made a Type II error? Our confidence is strengthened substantially by the fact that several different dependent variables, not just one, showed nearly identical results. The presence and absence of the delayed prompt produced nearly identical effects on median trial duration (Fig. 3), the shapes of the distributions relating number of trials per subject to trial duration (Fig. 5), and the distributions relating number of pellets per subject to trial duration (Fig. 6). Nevertheless, groups were significantly affected by the degree of effective stimulus control by the prompts (follow lights vs. reversed lights), as described above. We conclude with confidence that the ability of the progressively delayed prompt procedure to produce behavioral autonomy depended upon characteristics of the obtained delay (trial duration) rather than on the pending prompt itself.

We predicted that accuracy in prompted trials would be greater than in unprompted trials. However, Fig. 1 shows that the opposite was true in both groups that received prompts (left panels). Accuracy was reliably higher in unprompted trials. This difference was maintained even in the two groups that received no prompts (right panels). We consider two potential explanations for this difference: (a) providing a delayed prompt (changing stimulus conditions from both panel lights on to only one light on) could have “confused” the subjects. If so, then we would expect accuracy in prompted trials (Fig. 1, left panels) to be lower than accuracy in after trials (right panels). However, a decrement due to presenting a prompt was not observed, so we reject this explanation. (b) Prompted and unprompted trials could differ in their durations. We suspected that differences in trial duration might be responsible for differences in accuracy within all four groups. We explored this hypothesis in Fig. 4 by measuring accuracy as a function of obtained trial duration. However, accuracy in unprompted and before trials was higher than in prompted and after trials across nearly all trial durations. The difference in accuracy between trial types was not a direct result of trial duration, although overall accuracy did decrease systematically as trial duration increased for all groups. Figure 4 shows that trial duration was important: The combination of shorter trials and their greater accuracy would be correlated with higher overall reinforcement rates for faster responding. This would be expected if rats completed their response sequence before the delay timer timed out (unprompted trials). Waiting for delayed prompts (even if no prompt was actually provided) would be associated with lower overall reinforcement rate by lengthening trials and by decreasing accuracy.

This study was not designed to experimentally control reinforcement rate or trial duration, so we are unable to identify the direction of causality between these variables. Consider reinforcement rate as the controlling factor: Higher reinforcement rates and higher accuracy during shorter trials could differentially reinforce faster responding. This claim was proposed by Touchette and Howard (1984), who argued that reinforcement density per unit of time is a critical variable in producing and maintaining stimulus control in delayed prompting procedures (see also Brown & Rilling, 1975, which extended this to secondary reinforcement).

Alternatively, consider temporal variables such as trial duration as controlling factors. Long trials not only reduced reinforcement rate (as described above) but also substantially reduced overall L–R accuracy. It is interesting to consider why this would be true. Recall that the operant task was the correct completion of a left and then right lever-press sequence; all other sequences ended in TO. No feedback was provided with regard to response accuracy until two presses had occurred, and subjects had to remember their order within each trial—a working memory task. Informal observations indicated that long trial durations were often associated with changes in behavior such as grooming, exploration, or freezing due to distracting noises, indicating motivational control by other behavior systems (Timberlake, 1983, 1993, 2001; Timberlake & Lucas, 1989)—the rats were no longer “on task.” These changes are likely to have interfered with memory processes related to the ability to complete the sequence correctly.

The influence of temporal variables such as the delay interval was identified by MacDonall and Marcucella (1976) using rats in a similar progressively delayed prompt procedure with a conditional discrimination requiring a single lever press in each trial. They argued that reinforcing responding during the delay interval would reinforce both (a) responding and (b) responding at that particular delay interval. Shorter response latencies would be differentially reinforced, leading to skewed distributions of latencies (which would be equivalent to trial durations in our study), just as we observed in Fig. 5.

We selected the follow-lights and the reversed-lights conditions because prior research had indicated that these stimulus conditions differ in their effectiveness during acquisition and in the development of autonomy of the L−R sequence (e.g., Fox et al., 2014; Reid et al., 2010; Reid, Rapport, et al., 2013; Reid, DeMarco, et al., 2013; Reid et al., 2014). Some of this research was described in the introduction. Each of the changes in overall accuracy depicted in Fig. 1 was consistent with this prior research and supports our claim that the two conditions differ in their effectiveness as stimulus control conditions. However, other explanations of the current data are possible. For example, we observed a pronounced drop in accuracy when the two follow-lights groups were exposed to the autonomy procedure, but this drop was not observed with either reversed-lights group. We concluded that the less effective reversed-lights condition (less effective because it required more training) was better in leading to autonomy, consistent with the conclusions of Reid, DeMarco, et al. (2013) and research with humans (e.g., Schmidt & Bjork, 1992). An insightful anonymous reviewer proposed that perhaps the reversed-lights groups never learned to follow the panel lights as a prompt. Perhaps they ignored the panel lights and learned the L−R lever-press sequence without those guiding cues. This experiment did not contain a test to ensure that these rats were actually attending to the panel lights. This could explain why there was no difference between the RL–DP and RL–NP groups, because the absence of an unnoticed prompt should have no effect. However, this account cannot explain why there was no difference between the L–DP and L–NP groups, in which (a) their faster acquisition during baseline and (b) their drop in accuracy when exposed to the autonomy procedure provided clear evidence of stimulus control by the panel lights.

This explanation claims that the reversed-lights condition is not effective as a guiding-cues condition; thus, acquisition rates during the reversed-lights condition should be equal to those in an unguided “no-cues” condition. Experiment 3 of Reid, Rapport, et al. (2013) tested this claim directly. They found that acquisition of the L−R sequence in the reversed-lights condition was slower than in the follow-lights condition, but significantly faster than in two no-cues conditions (both panel lights on and both panel lights off, which produced equivalent acquisition rates).

A second alternative interpretation of our results is the possibility that our choice of illuminating both panel lamps (“both-lights”) as our no-cues condition may have facilitated autonomy in the reversed-lights groups relative to the follow-lights groups. Perhaps, then, if we had presented both lights off (“no-lights”) as our no-cues condition, we would have observed greater autonomy in the follow-lights groups instead. This is because cue tracking may have been generalized to the illuminated panel lights in follow-lights trained rats, but no such generalization would have been expected from reversed-lights trained rats. Interestingly, most guiding-cues experiments with rats appear to have matched follow-lights conditions with no-lights conditions, and reversed-lights conditions with both-lights conditions, but the rationale for these associations is questionable. To our knowledge, the only autonomy study that has provided no-light and both-light options is the experiment by Reid, DeMarco, et al. (2013), but luckily their results directly test this prediction. The experiment used a 2 (condition: follow lights, reversed lights) × 2 (probe type: both lights, no lights) factorial design; thus, both no-cues conditions were paired with both guiding-cues conditions. They separately measured the development of stimulus control by panel lights on guiding-cues trials and the development of stimulus control by practice cues (autonomy) in no-cue probe trials within the same session. Greater autonomy developed in both reversed-lights groups than in the follow-lights groups. In the two groups containing no-light conditions (follow-lights with no-light probes and reversed-lights with no-light probes), only the reversed-light group demonstrated autonomy.

In a subsequent phase that compared L−R accuracy levels in reversed-lights versus follow-lights conditions when all trials in each session were composed of these probe conditions, accuracy (autonomy) in the reversed-lights trained rats was significantly higher than in the follow-lights trained rats. Therefore, the only experiment that addresses this alternative interpretation provides direct evidence against it. Our choice of illuminating both panel lamps as our no-cues condition was not responsible for greater autonomy in the reversed-lights groups relative to the follow-lights groups. Greater autonomy is observed in reversed-lights groups even when the no-light condition is used.

This experiment involved novel features for delayed prompt research: Rather than a simple operant response, we required a fixed behavior chain that needs practice because sequence errors are common, and we asked for a complex transfer of stimulus control from guiding cues to developing practice cues to produce skill autonomy. Early interest in the delayed prompting procedure was based, in part, on the possibility that it would lead to errorless transfer of stimulus control (Terrace, 1963a, 1963b), as Touchette (1971) observed with some of his participants. Errorless learning may not be a goal (and may not even be possible) when more complex discriminations, complex operants, and complex transfer are involved. However, these features may have more ecological relevance to teaching everyday skills to children with disabilities.

It is interesting to ask why we observed only about 20 % prompted trials in our four groups, as indicated by Figs. 5 and 7. Prompted trials occurred when the subject completed the L–R sequence after the programmed delay timer had timed out. Of course, the purpose of the progressively delayed prompting procedure is to encourage subjects to respond early, during the unprompted period, so that responding becomes autonomous. Thus, low numbers of prompted trials may be an indicator that the procedure is successful. At the same time, the number of prompted trials depended upon the programmed value in the delay timer, which was incremented by 2 s following reinforced trials and decremented by 2 s following TO. An increase in this 2-s programmed increment would result in accumulations yielding longer programmed delays (see the slopes in Fig. 2) and fewer prompted trials; whereas a decrease in its size would probably yield more prompted trials (depending upon the speed and accuracy of responding). We hope future research will manipulate this programmed value (say, from 0.5 to 1.5 s) to discover whether the low number of prompted trials we observed is a general feature of the progressively delayed prompting procedure, or whether it was somehow unique to our choice of programming a 2-s delay. This research might help us understand when this procedure will be effective or not, and when it might be harmful by encouraging prompt dependence (Fisher et al., 2007; Glat et al., 1994; MacDuff et al., 2001; Oppenheimer et al., 1993; Touchette & Howard, 1984).
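
The adjusting rule for the programmed delay can be made concrete with a minimal Python sketch. The 2-s step after reinforced trials and after TO follows the description above. Everything else is an assumption made for illustration: the floor at 0 s, the starting delay of 0 s, the fixed probability of reinforcement on every trial, and the function names `next_delay` and `simulate`.

```python
import random


def next_delay(delay_s, reinforced, step_s=2.0, floor_s=0.0):
    """Adjust the programmed prompt delay after one trial.

    The 2-s step follows the text; the floor at 0 s is an assumption.
    """
    change = step_s if reinforced else -step_s
    return max(floor_s, delay_s + change)


def simulate(p_correct, n_trials=50, step_s=2.0, seed=0):
    """Programmed delay after n_trials under a fixed reinforcement probability.

    A constant p_correct and a starting delay of 0 s are simplifying
    assumptions, not features of the actual procedure.
    """
    rng = random.Random(seed)
    delay = 0.0
    for _ in range(n_trials):
        delay = next_delay(delay, rng.random() < p_correct, step_s)
    return delay


# Compare hypothetical step sizes at one hypothetical accuracy level.
for step in (0.5, 1.0, 2.0):
    print(f"step = {step} s: delay after 50 trials = {simulate(0.7, step_s=step)} s")
```

Under this simplified model, and ignoring the floor, the expected change in the programmed delay per trial is step × (2p − 1), where p is the probability of reinforcement. Any accuracy above 50 % therefore drives the delay upward on average, and a larger step does so faster, which is in line with the argument that high accuracy combined with a 2-s step yields long programmed delays and few prompted trials.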

To conclude, this study showed that the less effective reversed-lights condition produced greater autonomy than the more effective follow-lights condition. This was another example whereby providing less effective guidance can ultimately result in more robust autonomy. The ability of the progressively delayed prompt procedure to produce behavioral autonomy depended upon characteristics of the obtained delay (trial duration) rather than on the pending prompt itself. Overall accuracy decreased systematically as trial duration increased for all groups. The interacting effects of trial duration and reinforcement rate could not be separated into independent causal factors with our experimental design, but their interaction influenced most of our measures in ways that only future research can separate. For example, was the significantly higher proportion of reinforced unprompted (and before) trials because these trials were shorter in duration, or was it because they were reinforced more often? Shorter trials and their greater accuracy were correlated with higher overall reinforcement rates for faster responding. Waiting for delayed prompts (even if no actual prompt was provided) was associated with lower overall reinforcement rate by decreasing accuracy and by lengthening trials. These findings extend results from previous studies regarding the controlling factors in delayed prompting procedures applied to children with disabilities.