Participants and Setting
Three male students with English as their first language participated in the study. Alfred was 9 years and 6 months old; Nick was 8 years and 8 months old; and Gavin was 9 years and 5 months old. Pseudonyms were used. All students had a diagnosis of an autism spectrum disorder (ASD) and had been issued an Education, Health and Care Plan that provided information about their diagnosis, level of ability, and needs. Alfred and Nick were Caucasian, and Gavin was mixed race.
Participants’ performance on two mathematical assessments highlighted that they were of different mathematical abilities (see Table 1). Alfred’s score on the Test of Early Mathematics Ability-3 (TEMA-3; Ginsburg and Baroody 2003) suggests that he had mastered basic addition and subtraction, but not basic multiplication (division is not assessed in TEMA-3). Moreover, his performance on the Test of Mathematical Abilities-3rd Edition (TOMA-3; Brown et al. 2013) was below average. Nick’s performance on TEMA-3 suggests that he had mastered basic addition, subtraction, and multiplication, while his performance on TOMA-3 was average. Finally, Gavin’s performance on TEMA-3 also suggests that he had mastered basic addition, subtraction, and multiplication, while his performance on TOMA-3 was above average. Before the study commenced, participants had received instruction in all basic mathematical procedures, including addition, subtraction, multiplication, and division, as part of their educational provision. Their teacher reported that they were fluent with the easier multiplication tables, but not with the more complex ones or with the division tables.
The study was conducted at the participants’ school in England, which provides special education services for students aged 3–19 years. The curriculum includes self-help, vocational, social, and academic skills. Sessions took place in a 3 × 3 m room, equipped with a camera, a desk, two chairs, and two storage cupboards with all the necessary resources.
For inclusion in the study, students needed to have (a) a diagnosis of ASD, (b) completed at least 50 of the 72 items of the TEMA-3, (c) participated in at least one week of formal lessons on multiplication and division, and (d) not exhibited challenging behavior that would hinder engagement with the instructional procedures. The last three criteria were applied to ensure that students could participate successfully in all stages of the study. The study received a favorable ethical opinion from the University of Kent ethics committee. Following parental consent, participants were invited to take part and agreed to do so.
Along with the TEMA-3, which was used to help determine inclusion, a series of standardized assessments was used descriptively to provide more information on general ability. The TEMA-3 is a 72-item test measuring an individual’s mathematical ability, including (a) counting proficiency, (b) cardinality, (c) number comparison facility, and (d) elementary arithmetic (Libertus et al. 2013). Its administration lasts approximately 40 min; internal consistency has been reported to be between 0.94 and 0.96, and test–retest reliability between 0.82 and 0.93 (Ginsburg and Baroody 2003).
The TOMA-3 is a 145-item test assessing (a) mathematical symbols and concepts, (b) computation, (c) mathematics in everyday life, (d) word problems, and (e) attitude toward mathematics (a supplemental subtest). Its administration lasts approximately 90 min; internal consistency was reported at 0.96, and test–retest reliability at 0.89, except for the mathematics in everyday life subtest (0.73; Brown et al. 2013). This tool provided additional information on the participants’ mathematical abilities.
The Vineland Adaptive Behavior Scales-II Teacher Rating Form (VABS-II TRF; Sparrow et al. 2005) is a 233-item scale measuring adaptive behavior. It includes four domains related to adaptive behavior, namely (a) communication, (b) daily living skills, (c) socialization, and (d) motor skills, and produces an adaptive behavior composite. Administration lasts approximately 20 min, and standard scores are available for the domain and composite scores. Internal consistency has been reported at 0.98, while test–retest reliability of the adaptive behavior composite has been reported at 0.91 (Sparrow et al. 2005).
The Gilliam Autism Rating Scale-2nd Edition (GARS-2; Gilliam 2006) is a 42-item scale measuring the severity of symptoms related to ASD. It contains three behavioral subscales aligned with the Diagnostic and Statistical Manual of Mental Disorders, fourth edition (DSM-IV), and an early developmental history subscale. The four subscales’ internal consistency has been reported at 0.94 on average, and test–retest reliability of the total score at 0.88 (Gilliam 2006). The VABS-II TRF and GARS-2 provided, respectively, information on the participants’ adaptive behavior and symptoms related to ASD.
General Classroom Materials
Participant materials were stored in ring binders sized 21 × 29.7 cm. Pencils, erasers, notebooks, and digital timers were used. A laminated ‘class-shop’ catalog, sized 21 × 29.7 cm, was created with pages in portrait orientation and 28-point Times New Roman font; a picture in the middle of each page, sized 13 × 15 cm, showed each available item or activity. Finally, a points board was made in portrait orientation with 12-point Times New Roman font and a 6 × 6 grid.
A datasheet, a timings graph, and a daily graph were constructed. The datasheet had a 10 × 10 table divided into five vertical sections, one for each day of the week. Each section was divided into two columns (i.e., corrects–incorrects) with five rows, one for each timing. The datasheet also included areas to record the set-criterion timings as well as performance across all relevant testing procedures. Participants were not asked to record how many facts they skipped, to reduce the complexity of the practice; however, we collected data on skipped facts by counting them on each recording sheet (skipped facts are not presented in Fig. 3, to reduce the complexity of the graph).
For the timings graph, the x-axis represented the timings completed each day. The axis was divided into five days and allowed participants to graph up to five timings per day. At the end of each week, the graph was replaced with a new one. For the daily graph, the x-axis represented school days (i.e., Monday to Friday) and was replaced with a new one every four weeks. Both graphs had a logarithmic y-axis. Graphs were constructed based on the timings and daily standard celeration charts (Calkin 2005) but were simplified for ease of use. The datasheets and graphs associated with the MCL approach were always printed in color, and the datasheets and graphs associated with the BPB approach were always printed in black and white, to optimize discrimination between the two approaches.
Materials for Mathematical Practice
All worksheets were created using Microsoft Excel™ and Microsoft Word™. For the untimed practice, we created a laminated page, sized 21 × 29.7 cm, with four 1 × 5 tables, each 5 cm high and 11 cm wide. On the top row, we wrote the three numbers that could create a number family (e.g., 18, 2, 36), and the participants wrote all four possible combinations in the remaining rows (i.e., 18 × 2 = 36, 2 × 18 = 36, 36 ÷ 2 = 18, 36 ÷ 18 = 2).
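The expansion of one number family into its four facts, as described above, can be sketched programmatically. The function name and output format below are ours, purely for illustration of the practice structure:

```python
def family_combinations(a, b, product):
    """Expand one number family (e.g., 18, 2, 36) into its four facts.

    Assumes a * b == product, as in the families used for practice.
    """
    return [
        f"{a} × {b} = {product}",
        f"{b} × {a} = {product}",
        f"{product} ÷ {b} = {a}",
        f"{product} ÷ {a} = {b}",
    ]

print(family_combinations(18, 2, 36))
# → ['18 × 2 = 36', '2 × 18 = 36', '36 ÷ 2 = 18', '36 ÷ 18 = 2']
```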
For the timed practice and specifically number writing, the worksheet was in landscape orientation and had eight blank rows per page at a 2.0 spacing, resembling the lined paper of a notebook, for a total of 15 pages. For the number families, worksheets were in portrait orientation. Multiplication and division facts were aligned to the left and presented horizontally in 20-point black Arial font, with blank space on the right for participants to write their answers. Each page had ten facts, presented in random order, and each worksheet had 35 pages in total, ensuring that no artificial ceiling would limit participants’ performance. Finally, for the application assessment, a separate worksheet was created through www.themathworksheetsite.com. That worksheet was in portrait orientation and had 30 multiplication and division facts per page, presented vertically and in random order, for a total of ten pages.
Two mathematical skills were assessed: a basic skill (i.e., number writing) and a complex skill (i.e., multiplication/division). Number writing was pinpointed as ‘Free-Writes numbers 0–9 in ascending sequence and with correct formation on the worksheet.’ The dependent variable was correct and incorrect written digits per minute. Digits were scored as correct if they were written in the appropriate sequence (e.g., 0, 1, 2, 3) and with correct number formation (e.g., fully formed and within the lines). Performance criteria were not set for number writing as it was only assessed; readers should note, however, that the criterion for this skill is 130–160 correctly formed digits per minute (Johnson and Street 2013). This skill was assessed because it underlies many other mathematical skills.
Multiplication/division was pinpointed as ‘See-Writes answer to multiplication or division fact presented in random order on the worksheet.’ The dependent variable was correct and incorrect written digits per minute, and we also recorded skipped facts per minute. Number formation was not assessed for the multiplication/division skill. Performance criteria were set at a frequency of 80–100 correct digits per minute (Johnson and Street 2013). This range was highlighted with a yellow marker on the daily graph so that participants were consistently aware of the expected ultimate performance.
Four multiplication/division tables (×÷13, ×÷14, ×÷18, and ×÷19) were chosen because participants had never practiced them. Of the four tables, two were ultimately targeted for practice and one acted as a control. Specifically, ×÷18 and ×÷19 were chosen for practice as they were considered of equal difficulty, having an equal number of digits per multiplication fact, while ×÷14 served as the control based on participants’ low baseline performance. To make practice easier for participants, we separated tables 18 and 19 into smaller parts called slices. Each table (i.e., ×÷18 and ×÷19) consisted of two slices and a review slice. Slices 1 and 2 included four number families each, creating 16 combinations each. Slice 1 included families ranging from ×÷2 to ×÷5 (e.g., 18 × 2 = 36 or 90 ÷ 5 = 18), and slice 2 included families ranging from ×÷6 to ×÷9 (e.g., 18 × 6 = 108 or 144 ÷ 18 = 8). Finally, the review slice included all eight number families, ranging from ×÷2 to ×÷9 and creating 32 combinations.
An adapted alternating treatments design with a control condition (Cariveau et al. 2020) was embedded in a concurrent multiple baseline across participants design (Carr 2005). The order of practice was alternated each day randomly, and the control condition was probed three times a week.
Two goal-setting approaches were compared, namely the MCL approach and the BPB approach. The MCL approach set weekly celeration expectations that participants had to meet. Specifically, participants were expected to double their performance from Monday to Friday following a ×2 celeration. The BPB approach set expectations based on participants’ previous best score. In this case, participants were expected to increase their performance by one more digit than their previous best score. We randomly assigned each approach either to the ×÷18 or ×÷19 multiplication/division tables via an online dice roller (https://www.random.org). That way, participants would practice each table with a specific approach. Alfred used the MCL approach with ×÷19 and the BPB approach with ×÷18. Nick and Gavin used the MCL approach with ×÷18 and the BPB approach with ×÷19. Finally, the ×÷14 table was assigned as a control condition with no goal-setting procedure associated with it.
During the study, participants did not receive classroom practice on multiplication and division, as their teacher focused on other aspects of the curriculum, such as counting, units of measurement, or telling the time. During baseline, participants were provided with one 30 s timing for each skill and were told to perform at their natural pace until they heard the timer; no instruction or feedback was provided. For number writing, baseline data were collected for five days across two weeks for all participants. For multiplication/division, baseline data were collected in a staggered fashion following the experimental design: Alfred’s performance was assessed for 5 successive school days, Nick’s for 9 days, and Gavin’s for 15 days.
The lesson was delivered in a 1:1 format by an experienced Board Certified Behavior Analyst (BCBA). Throughout the session, the instructor was present, delivering instruction, praise, corrective feedback, and points depending on performance. Corrective feedback during untimed practice consisted of saying the correct answer and asking the participants to write it down before proceeding to the next multiplication/division fact. Corrective feedback during timed practice was provided after the timing was completed, in the form of saying the correct answer to the participants. Points were delivered only during sessions with the instructor, for engaging in untimed practice, timed practice, data collection, and graphing, on a variable-ratio schedule of reinforcement (VR3). Thus, reinforcement was contingent on engaging with all the practice components, and in some cases participants acquired the backup reinforcer despite not having met the performance criterion of the day. That decision was made to keep participants motivated throughout the course of the study. When participants met their daily criterion, they received additional praise and two or three additional points. Overall, participants acquired enough points to access the backup reinforcer in all sessions.
For clarity, we will report the common features of both goal-setting approaches and then each one’s unique features. At the beginning of each week, participants engaged in two consecutive set-criterion timings that lasted 30 s each. Once both timings were completed, the instructor calculated the performance criteria, and participants started their daily practice, which included an untimed element and a timed element. At the early stages of the study, the instructor modeled both the timed and untimed activities and provided additional guidance to students that was faded out over time. During the untimed practice, participants were asked to simultaneously write and say all possible multiplication and division combinations of each number family, for a total of four families. For example, participants were provided with numbers 18, 2, 36 (which is one number family) and then had to write and say each possible combination. Participants were expected to practice with four number families because slices 1 and 2 had four number families each. In the review slice, where all eight number families were included, participants practiced them in random order. Once they completed a round of untimed practice, they engaged in one 30 s timing and subsequently wrote their correct and incorrect digits on their datasheet and graphed their performance on the timings graph. This process of untimed and timed practice was repeated until participants either met their daily criterion or completed five timings. At the end of their practice, participants graphed their best score of the day on the daily graph. Upon completion of their daily practice, participants exchanged their points for a preferred activity or item from the class-shop catalog. The catalog included items such as board games, the iPad, Legos, and playing football on the playground. Practice on each slice lasted 10 days, for a total of 30 days.
The effectiveness of this multicomponent intervention was monitored through the use of Precision Teaching and specifically the use of pinpoints that combined movement cycles and learning channels, as well as the use of the standard celeration chart and behavioral metrics.
Minimum Celeration Line Approach
Despite the common features presented above, each goal-setting approach also had unique features in setting performance criteria and graphing. For the MCL approach, the daily criteria were calculated for the whole week using Microsoft Excel™ based on a ×2 celeration. In terms of graphing, we used the goal box and the minimum celeration line. We drew the goal box on each day’s last line, on the timings graph, to show participants what their daily criterion was (see Fig. 1). Once participants graphed the day’s first timing, we connected that datum point with the goal box. That way, participants could see the minimum celeration line, which showed them what their performance’s trajectory should be for them to meet their daily criterion. Participants were told that their performance should stay on or above the minimum celeration line. If participants did not meet their daily criterion, they still had to increase their performance to meet the next day’s criterion, which had already been determined.
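The study does not report the exact spreadsheet formula, but doubling from Monday to Friday under a ×2 weekly celeration implies a constant daily multiplier of 2^(1/4) across the five school days. One plausible sketch of the weekly criterion calculation (the function name and rounding are our assumptions, not the authors’ procedure):

```python
def mcl_daily_criteria(monday_score, weekly_celeration=2.0, days=5):
    """Daily criteria for one week under a minimum celeration line.

    Assumes a constant daily multiplier so the Friday criterion equals
    weekly_celeration times the Monday score -- our interpretation of the
    ×2 celeration rule; the study's exact Excel formula is not given.
    """
    daily_multiplier = weekly_celeration ** (1 / (days - 1))
    return [round(monday_score * daily_multiplier ** d, 1) for d in range(days)]

print(mcl_daily_criteria(20))  # Monday 20 → Friday 40
```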
Beat Your Personal Best Approach
Contrary to the MCL approach, performance criteria under the BPB approach were calculated daily by increasing the previous day’s best score. In terms of graphing, the score of each timing and the goal box were used. Specifically, participants graphed their performance by plotting each datum point on the timings graph and writing their score above it. This approach also used the goal box to show participants their daily criterion; the difference was that no data points were connected to the goal box, and the criterion number was written above it. Also, participants wrote their score above each datum point on the daily graph (see Fig. 2). If participants did not meet their criterion for the day, it stayed the same for the next day. That way, we prevented participants from deliberately lowering their performance to decrease the next day’s criterion.
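The BPB update rule described above reduces to a simple conditional: beat the criterion and tomorrow’s target becomes your best score plus one digit; miss it and the target is unchanged. A sketch under that reading (function name ours):

```python
def bpb_next_criterion(previous_criterion, best_score_today):
    """Next day's criterion under the Beat Your Personal Best approach.

    If the participant met or beat today's criterion, the next criterion is
    the best score plus one digit; otherwise it stays the same, so a
    deliberately low score cannot lower tomorrow's target.
    """
    if best_score_today >= previous_criterion:
        return best_score_today + 1
    return previous_criterion
```

For example, a criterion of 30 beaten with a best score of 32 yields a next-day criterion of 33, while a best score of 25 leaves the criterion at 30.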
Assessment of Mastery
When participants completed their practice with the review slice, their performance was assessed for the by-products of fluency through the test of maintenance, endurance, stability, and application (MESA). Following the guidelines of Fabrizio and Moors (2003), endurance was assessed by asking participants to complete a 90 s timing, three times as long as their typical timing. Stability was assessed by asking participants to complete a 30 s timing in the presence of distracting stimuli; during this assessment, music played on the iPad, and we also said random numbers to the participants for the whole duration of the timing. The third assessment was application, for which participants completed a 30 s timing with an untaught worksheet in a different format from their typical worksheet, assessing the application of skills to novel materials. Finally, maintenance was assessed 1, 2, 10, 11, and 12 weeks after practice concluded. Participants were asked to engage in two 30 s timings, to account for the lack of practice during this phase of the study; that way, they had the opportunity for a warm-up timing, allowing a more accurate evaluation of their performance.
From the outset of the study, a protocol was in place to account for any school absence due to illness or other reasons. If participants missed one or two days of practice, then on their return to school, they engaged in one or two double sessions accordingly (e.g., morning and afternoon) to catch up. If participants missed three days of school, then they restarted their weekly practice once they were available. Alfred and Nick did 4 double sessions, and Gavin did 3. The practice was restarted only once for Nick when he was practicing slice 1.
Interobserver agreement (IOA) was calculated for all participants and across all phases of the study for M = 36% (range, 35.5–38%) of the total number of sessions. A BCBA with over ten years of experience independently scored video recordings of the sessions. Agreement was calculated in two steps. First, agreement on correct digits, incorrect digits, and skipped facts was calculated separately by dividing the smaller by the larger number and multiplying by 100. The three percentages were then averaged to produce the overall agreement for each skill. This process was repeated for each phase of the study (i.e., baseline, practice, and maintenance), and the overall average agreement was calculated by averaging the three phase scores. The average agreement for Alfred was 97% (range, 93–100%), for Nick 88% (range, 82–100%), and for Gavin 94% (range, 87–100%).
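The two-step agreement calculation described above can be expressed compactly. The sketch below follows the smaller-over-larger method per measure and then averages the three percentages; the function names and tuple layout are ours:

```python
def measure_agreement(primary, secondary):
    """Smaller/larger agreement (%) for one measure, e.g., correct digits."""
    if primary == secondary:
        return 100.0  # also covers the 0-vs-0 case
    return min(primary, secondary) / max(primary, secondary) * 100

def session_agreement(primary_counts, secondary_counts):
    """Average the three per-measure agreements into one session score.

    Counts are (correct digits, incorrect digits, skipped facts) tuples
    from the primary and secondary observers.
    """
    scores = [measure_agreement(p, s)
              for p, s in zip(primary_counts, secondary_counts)]
    return sum(scores) / len(scores)
```

For instance, observers recording (50, 0, 0) and (40, 0, 0) agree 80% on correct digits and 100% on the other two measures, for an overall session agreement of about 93.3%.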
Procedural fidelity was assessed for all participants and across all phases of the study for M = 36% (range, 35.5–38%) of the total number of sessions, by the same BCBA who collected the IOA data. The baseline checklist included 11 steps, the intervention checklist 14 steps, and the maintenance checklist six steps. The intervention checklist included the same number of steps for both the MCL and BPB approaches, as both procedures followed the same sequence of untimed practice, timed practice, graphing on the timings graph, and graphing on the daily graph. The BCBA scored each checklist by writing yes or no for each step. Procedural fidelity was 100% across all participants and all phases of the study.
At the end of the study, participants were given a questionnaire (see Appendix) that included 20 questions about all aspects of their training (e.g., ‘how do you feel about graphing your scores?’). Thirteen questions had a scale from 1 to 10 with an unhappy face to the left of number 1 and a happy face to the right of number 10; the happy/unhappy faces were used to help participants discriminate how the scale worked. There were also five questions with two options, one of which students had to circle (e.g., Yes/No or Easy/Hard). Finally, two open-ended questions required a written answer from the participants (e.g., what was your favorite part of the practice?). Before participants were left to answer the questionnaire, the instructor said:
There are some questions on this paper about our practice together. I want you to circle your answer. I want you to read the question out loud, and if there is something you did not like, you go toward number 1. The closer you are to number 1, it means that you really did not like something. If there is something you liked, you go toward number 10. The closer you are to number 10, it means that you really liked something. If you circle number five, it means that you did not mind. For some other questions, you will have two choices, and you will need to circle one. Finally, there are some questions for which you need to write your own answer.
The instructor was present during the process to provide additional clarification but minimized their interaction to avoid affecting the way participants answered the questions.
Data were plotted using online software called PrecisionX, which provided the standard celeration chart for visual analysis and calculated a series of behavioral metrics. PrecisionX was used only by the researchers; the students used paper graphs. The primary metrics were level, celeration, and the level change multiplier. Level shows the average performance of the individual across time; the geometric mean was used because it is more appropriate for data plotted on the standard celeration chart and is less affected by extreme values (Clark-Carter 2005). Celeration (i.e., (count/unit of time)/unit of time) is a frequency-derived measure quantifying students’ learning rate across time and can be calculated across days, weeks, months, or even years. In this study, daily celeration was calculated during baseline and practice, and weekly celeration during maintenance, as performance in that phase was assessed across weeks rather than days. The level change multiplier is a ratio showing how much average performance changed from one phase to another (e.g., baseline to intervention); it was calculated by dividing the higher number by the lower number and assigning a multiplication (×) or division (÷) sign to indicate an increase or decrease in average performance across time (Kubina and Yurich 2012). For ease of interpretation, all ratios were transformed into percentages: for example, a ×2 weekly celeration indicates an increase of 100% per week, while a ÷2 celeration indicates a 50% reduction in performance.
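The level and level change multiplier calculations can be sketched directly from their definitions, the level as a geometric mean of a phase’s frequencies and the multiplier as a signed ratio of two levels (function names ours; PrecisionX’s internals are not described in the source):

```python
import math

def level(frequencies):
    """Phase level as the geometric mean of its frequencies (per minute)."""
    return math.exp(sum(math.log(f) for f in frequencies) / len(frequencies))

def level_change_multiplier(baseline, intervention):
    """Ratio of the two phase levels, signed × (increase) or ÷ (decrease)."""
    b, i = level(baseline), level(intervention)
    ratio = max(b, i) / min(b, i)
    sign = "×" if i >= b else "÷"
    return sign, ratio
```

For example, a baseline phase of 2 and 8 correct digits per minute has a level of 4, and an intervention phase of 8 and 32 a level of 16, giving a level change multiplier of ×4 (a 300% increase when expressed as a percentage).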
In addition to these metrics, the Non-overlap of All Pairs (NAP) was used to calculate the effect of each goal-setting approach on participants’ performance. NAP is an appropriate effect size measure for single-case research and correlates highly with the R² effect size index (Parker and Vannest 2009). NAP was calculated only for participants’ correct digits by comparing data from the baseline condition with data from the maintenance condition, separately for each goal-setting approach. Effect sizes were interpreted following Parker and Vannest (2009): weak effects range from 0 to 0.65, moderate effects from 0.66 to 0.92, and strong effects from 0.93 to 1.0.
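NAP can be computed directly from its definition: the proportion of all baseline–intervention data-point pairs in which the intervention point is higher, with ties counted as half an overlap. A sketch of that calculation (ours, not the software the authors used):

```python
def nap(baseline, intervention):
    """Non-overlap of All Pairs (Parker and Vannest 2009).

    Compares every baseline point with every intervention point; an
    intervention point above baseline scores 1, a tie scores 0.5.
    """
    pairs = [(b, t) for b in baseline for t in intervention]
    score = sum(1.0 if t > b else 0.5 if t == b else 0.0 for b, t in pairs)
    return score / len(pairs)

print(nap([1, 2], [3, 4]))  # complete non-overlap → 1.0
```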