The mand repertoire is essential for children diagnosed with autism spectrum disorder (ASD) and limited communication skills because: (1) improvements often reduce challenging behaviors, thus increasing social initiations and spontaneous language; and (2) children with ASD often do not readily acquire mand skills without specific training (Albert et al., 2012; Michael, 1988). Although behavior analytic training is effective in improving children’s mand skills, parental involvement is essential to promote the transfer and maintenance of these skills (Fava et al., 2011).

Behavioral skills training (BST) is an evidence-based training model that consists of: (1) instruction; (2) modeling; (3) rehearsal; and (4) feedback (Erhard et al., 2019). It has been empirically evaluated across a wide variety of populations (e.g., parents and teachers, adults with ASD) and skills (e.g., implementing discrete trial teaching procedures, functional communication training, and computer skills). Although parents can learn and successfully implement communication training based on applied behavior analysis (ABA; Gerow et al., 2018), ABA-service providers are scarce in Japan, which limits the dissemination of ABA (Kuma, 2019) and parent training (Asperger Society Japan, 2015). To further complicate matters, the COVID-19 pandemic posed challenges for parents in terms of accessing appropriate interventions for their children.

BST, coupled with telehealth, could mitigate the aforementioned barriers by improving access to ABA services. Studies support the effectiveness of the BST-based telehealth parent-training program in improving children’s communication skills (e.g., Hoffmann et al., 2019). However, additional research is necessary to ensure current and prospective clients receive quality behavior-analytic services to improve children’s communication skills. For example, Akemoglu et al. (2020) reviewed the literature on telehealth and parent-implemented language and communication interventions, and reported that some of the studies included in the review lacked rigor because they employed nonexperimental designs (i.e., A-B design) and did not contain crucial information such as procedural integrity data or information regarding the technology, tools, and platforms utilized in the study.

Although some evidence supports the effectiveness of BST-based telehealth parent training in improving children’s communication skills, cultural influences must also be considered in ethical and analytical behavior analysis (Brodhead et al., 2018). Culture not only “shapes and maintains the behavior of those who live in it” (Skinner, 1971, p. 143) but also affects what people consider to be appropriate behavior in social situations (Glenn, 2004) and their likelihood of seeking help or treatment (Fong et al., 2017). Furthermore, cultural competency is a necessity for behavior analysts. For example, the Behavior Analyst Certification Board’s (2020) ethical code states that “behavior analysts actively engage in professional development activities to acquire knowledge and skills related to cultural responsiveness and diversity” (Code 1.07). Sivaraman and Fahmie (2020) conducted a systematic review of cultural adaptations in the application of ABA-based telehealth services for individuals with ASD outside the United States. All the studies included in the review made some cultural adaptations such as using translated materials or matching the trainer with the participants in terms of birthplace, ethnicity, or gender.

Although the influences of culture and the importance of cultural competency are recognized, only 3%–10.7% of studies published in behavioral journals provide cultural information such as race or ethnicity and linguistic information (Najdowski et al., 2021). Ignoring cultural influences when conducting research impedes the external validity and promotes racism (Najdowski et al., 2021). Thus, the evaluation of empirically validated procedures across various cultures is warranted.

To extend previous parent-training research conducted in Japan, the current study developed and conducted a telehealth parent-training program targeting mand training by using BST and a within-subject experimental design. The current study evaluated the effects of the telehealth parent-training program on changes in the children’s vocal-mand repertoire, as well as parent procedural integrity and parent acceptability of the training program.



Our experimental protocol was approved by the ethics committee of Keio University (No. 19024) before the study commenced. Informed consent was obtained from all the participants included in the study. Seven parent–child dyads were recruited from a university program in Tokyo and three clinics located in Tokyo and Saitama prefectures. The inclusion criteria for children were as follows: (1) aged between 3 and 7 years; (2) diagnosed with ASD by an independent clinician; (3) possessed limited vocal-mand repertoire; and (4) did not engage in severe challenging behaviors. The inclusion criteria for parents were as follows: (1) Wi-Fi access at home; (2) ability to attend a weekly online meeting with the researcher; and (3) ability to perform the procedure at home at least twice a week with 10 trials per day. Of the seven participants, four parent–child dyads met the inclusion criteria (Tables 1 and 2). The ethnicity (Japanese) and native language (Japanese) of the parents and the trainer were matched. Both parent–child dyads and the trainer lived in Tokyo or its peripheral region. All of the participants lived within 85 miles (one way) of Keio University. None of the parents had prior experience with mand training or any of the procedures used in this study.

Table 1 Children’s characteristics
Table 2 Parents’ characteristics

Settings and Materials

The study was conducted online. Ryo, Gaku, and Yuma’s parents used their laptops and Jin’s parent used a smartphone to participate in the study from their respective homes. At the onset, a parent-training package was sent to each participant that included a 32 GB Apple iPod Touch with the Box application (a cloud-hosted file storage service), a stand for the iPod Touch, and a parent manual. A password-protected Box account was set up for each participant. The parents used the iPod Touch and the Box application to video record and share their interactions with their children at home on a weekly basis. Once a video was uploaded to Box, the researcher downloaded it to an encrypted external hard drive and permanently deleted the video data from Box. The participants met with the primary researcher, who was a board-certified behavior analyst-doctoral and a clinical psychologist certified in Japan. The parent manual contained an overview of the study, information regarding ABA and verbal behavior, how to use Webex, and how to record and share videos with the researcher. In addition, three supplemental lecture videos were created using Microsoft PowerPoint and were shared with the parents via Google Classroom, covering the following aspects: (1) introduction to ABA and verbal behavior; (2) demonstrations of recording and sharing videos; and (3) demonstrations of the procedure. As a visual prompt for Gaku, his parent wrote “grape” in Japanese (budoh) on an A4-size paper.

Dependent Variables

The two dependent variables were: (1) children’s responses, and (2) the accuracy of the parents’ implementation of the procedures (procedural integrity). Parents implemented the procedure during the daily routine and without the researcher’s presence. Parents video recorded their interactions with their children and sent the videos to the researcher for later viewing and scoring.

A frequency measure was used to score children’s responses as independent-mand, prompted-mand, or incorrect or no response. Independent-mand was defined as the child engaging in the target vocal response (Table 3) without or before the prompt. Prompted-mand was defined as the child engaging in the target vocal response after the parent delivered a prompt (e.g., saying “juice” or saying “juice” while pointing to the juice). Incorrect or no response was defined as the child failing to produce the target vocal response or engaging in behavior other than the target vocal response. The percentage of occurrences of each type of response was calculated by dividing the number of responses of that type by the total number of trials conducted in the session and then multiplying the result by 100.
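The percentage calculation described above can be sketched as follows. This is a minimal illustration, not the scoring tool used in the study: the function name, the category labels, and the example session are all hypothetical.

```python
from collections import Counter

# Hypothetical labels for the three response categories defined in the text.
CATEGORIES = ("independent", "prompted", "incorrect_or_none")


def response_percentages(trial_codes):
    """Return the percentage of trials in one session scored in each category."""
    if not trial_codes:
        raise ValueError("a session needs at least one scored trial")
    unknown = set(trial_codes) - set(CATEGORIES)
    if unknown:
        raise ValueError(f"unknown response codes: {sorted(unknown)}")
    counts = Counter(trial_codes)
    total = len(trial_codes)
    # Responses of each type divided by total trials, multiplied by 100.
    return {category: 100 * counts[category] / total for category in CATEGORIES}


# Illustrative session of 10 trials (made-up data, not from the study).
session = ["prompted"] * 6 + ["independent"] * 3 + ["incorrect_or_none"]
print(response_percentages(session))
# {'independent': 30.0, 'prompted': 60.0, 'incorrect_or_none': 10.0}
```

Because every trial receives exactly one code, the three percentages for a session sum to 100.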

Table 3 Target behavior and reinforcers for each child

The researcher viewed all videos and scored every trial conducted by the parent using a procedural integrity checklist (Tables 4, 5 and 6). Correct implementation of the procedure was defined as the parent implementing each of the procedural steps (i.e., 1A, 1B, 2A, and 2B) as listed in Tables 4, 5 or 6. Incorrect implementation was defined as the parent failing to implement one or more of the procedural steps as listed in Tables 4, 5 or 6. Procedural integrity was calculated by dividing the number of trials implemented with 100% accuracy by the total number of trials and multiplying the result by 100 to obtain the procedural integrity percentage.
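As a rough sketch of the procedural integrity calculation, a trial counts as correct only when every scored step was implemented correctly. The data layout below (one dictionary of step labels per trial) and the example values are assumptions made for illustration; the actual checklists are those in Tables 4, 5 and 6.

```python
def procedural_integrity(trials):
    """Percentage of trials in which every scored step was implemented correctly.

    `trials` is a list of dicts mapping a step label (e.g., "1A") to True/False.
    This layout is hypothetical; only the steps scored for a trial are listed.
    """
    if not trials:
        raise ValueError("at least one scored trial is required")
    # A trial is implemented with 100% accuracy only if all its steps are True.
    correct_trials = sum(1 for steps in trials if all(steps.values()))
    return 100 * correct_trials / len(trials)


# Illustrative data (not from the study): the third trial has one step error.
example = [
    {"1A": True, "1B": True, "2A": True, "2B": True},
    {"1A": True, "1B": True, "2A": True, "2B": True},
    {"1A": True, "1B": False, "2A": True, "2B": True},
    {"1A": True, "1B": True, "2A": True, "2B": True},
]
print(procedural_integrity(example))  # 75.0
```

Scoring at the trial level is stricter than scoring at the step level: in the example, 15 of 16 steps are correct, yet integrity is 75% because one trial contains an error.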

Table 4 Procedural integrity sheet for Phase I
Table 5 Procedural integrity sheet for Phase II
Table 6 Procedural integrity sheet for Supplemental Phase

Social Validity

Social validity was assessed at all three levels (Wolf, 1978): the social significance of the goals, the social appropriateness of the procedures, and the social importance of the effects. The parents answered a seven-question survey based on a 5-point Likert-type scale after the study was completed. There are several validated methods to assess social validity, such as the Behavior Intervention Rating Scale (Von Brock & Elliott, 1987), Treatment Acceptability Rating Form (Reimers & Wacker, 1988), and Treatment Evaluation Inventory (Kazdin, 1980). However, in this study, a social validity survey was developed by the researcher for two reasons. First, the existing published measures are not validated in the Japanese language. Second, there are no validated and published measures available in Japanese to assess social validity at all three levels. The first three questions in the survey pertained to the social significance of the goals, the following two questions focused on the social appropriateness of the procedures (with question 5 being a reverse item), and the last two questions were related to the social importance of the effects. Social importance of the effects was also evaluated by a research assistant (RA), who was unfamiliar with the study purpose. In particular, the RA viewed two videos of each participant and selected the one that showed better performance. All videos selected for viewing were at least 1 min long (range: 1 min–10 min) and were presented in random order. The video selection process was as follows: the researcher (1) identified all videos spanning at least 1 min in the baseline and the last phase of the intervention for the target behavior; (2) assigned a number to each video identified; and (3) used a random number generator to select a number from each phase (i.e., baseline and the last phase of the intervention) for each participant.
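The three-step video selection process could be reproduced along the following lines. This is only a sketch under stated assumptions: the record layout, phase labels, and file names are hypothetical, and Python's `random` module stands in for whichever random number generator was actually used.

```python
import random


def select_videos(videos, min_seconds=60, rng=None):
    """Pick one eligible video per phase for a participant using a random number."""
    if rng is None:
        rng = random.Random()
    selected = {}
    for phase in ("baseline", "last_intervention_phase"):
        # Step 1: identify all videos in this phase lasting at least 1 min.
        eligible = [v for v in videos
                    if v["phase"] == phase and v["seconds"] >= min_seconds]
        if not eligible:
            raise ValueError(f"no eligible video for phase {phase!r}")
        # Step 2: assign a number (1..n) to each identified video.
        numbered = dict(enumerate(eligible, start=1))
        # Step 3: draw one number at random and keep the matching video.
        selected[phase] = numbered[rng.randint(1, len(numbered))]["name"]
    return selected


# Illustrative records (hypothetical names and durations, not study data).
videos = [
    {"name": "b1.mp4", "phase": "baseline", "seconds": 95},
    {"name": "b2.mp4", "phase": "baseline", "seconds": 40},
    {"name": "b3.mp4", "phase": "baseline", "seconds": 130},
    {"name": "i1.mp4", "phase": "last_intervention_phase", "seconds": 300},
    {"name": "i2.mp4", "phase": "last_intervention_phase", "seconds": 75},
]
print(select_videos(videos, rng=random.Random(0)))
```

Passing a seeded generator makes a particular draw repeatable, which can help when documenting how the videos shown to the RA were chosen.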


Prebaseline Assessments

Each participant was assessed individually. To collect demographic information and select the target behavior(s) for each participant, interviews regarding the child’s behavior and daily environment were conducted and the Japanese version of the Vineland Adaptive Behavior Scale (2nd ed.) was administered to the parents (Table 7). The researcher conducted the interviews and administered the Vineland Adaptive Behavior Scale. To identify the preferred items or activities for each participant, the parents were asked to list the child’s highly preferred items and activities and to withhold access to them (for a minimum of a few hours) prior to the interview (all parents brought at least three highly preferred items and/or activities to the interview). During the interview, a single-stimulus preference assessment was conducted by instructing the parents to provide a “sample” of each preferred item or activity (e.g., they offered a small piece of candy or played with the child for a brief period) to the child for 30 s. The items and/or activities the child did not engage with during the preference assessment were excluded from the list of highly preferred items and activities. Following the preference assessment, the parents withheld access to the highly preferred items or activities to conduct an informal-reinforcer assessment in the following manner. If the child engaged in a mand response already in the child’s repertoire (e.g., reaching for or trying to grab the item), the parents provided the “sample” again. This procedure was repeated until the child stopped manding for the item or activity, or until the item or activity had been delivered to the child three times in a row. If the child did not engage in a mand response, the parents presented the next “sample” of the preferred item or activity. The researcher observed and recorded the items and activities presented and the child’s responses. Items or activities that the child requested more than twice during the informal-reinforcer assessment were considered to be reinforcers, and the most frequently requested item(s) for each child were used as that child’s reinforcer(s) in subsequent phases.
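The decision rule for the informal-reinforcer assessment (requested more than twice, with the most frequently requested items carried forward) can be sketched as below. The function name and the request counts are hypothetical, and returning all tied items is an assumption, since the text does not state how ties were resolved.

```python
def identify_reinforcers(request_counts):
    """Return (all reinforcers, most frequently requested reinforcers).

    `request_counts` maps an item or activity to the number of times the
    child requested it during the informal-reinforcer assessment.
    """
    # Items requested more than twice are treated as reinforcers.
    reinforcers = {item: n for item, n in request_counts.items() if n > 2}
    if not reinforcers:
        return [], []
    top = max(reinforcers.values())
    # Assumption: every item tied for the highest count is kept.
    most_requested = sorted(item for item, n in reinforcers.items() if n == top)
    return sorted(reinforcers), most_requested


# Illustrative counts (made-up data, not from the study).
counts = {"grape": 3, "juice": 3, "ball": 1, "tickles": 2}
all_reinforcers, used_in_training = identify_reinforcers(counts)
print(all_reinforcers)   # ['grape', 'juice']
print(used_in_training)  # ['grape', 'juice']
```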

Table 7 Adaptive behavior composite and communication scores on the Vineland Adaptive Behavior Scale for each child

In addition, the parents were asked to complete the following procedures after a reinforcer(s) for the child was identified: (1) call the child by their name; (2) show and withhold the preferred item until the child requested it or until 3 s had elapsed since its presentation; and (3) vocally label the preferred item while showing it and wait until the child responded or until 3 s had elapsed after the vocal model. The researcher observed the children’s responses and recorded the data.


After the prebaseline assessments, each parent received the parent-training package, and a meeting with the researcher was scheduled. During the meeting, based on the information obtained during the prebaseline assessments, the target behavior(s) was determined for each child (Table 3) in collaboration with the respective parent to ensure that they were socially valid and functional for the participant.


After selecting the target behavior(s), the parents received training on how to perform the baseline probes, during which they held the child’s preferred item up and waited until the child engaged in a mand response or until 3 s elapsed. The mand response did not need to be the target vocal response. Any mand response (i.e., the target vocal response as well as any mand response that was already in the child’s repertoire) was followed by access to a preferred item or activity for 30 s. After the probe was conducted by the parent without error for three consecutive trials in the researcher’s online presence, the parent completed at least 10 baseline trials per day with the child twice a week. Those baseline trials were conducted by the parents during the daily routine, and without the researcher’s presence. Parents video recorded those trials and shared the files with the researcher for later viewing and scoring.

General Procedure

Training with Researcher

After obtaining adequate baseline data, the researcher uploaded the last lecture video and trained the parent to implement the procedure. Parent training was conducted during the weekly meeting with the researcher and every time a phase change occurred. In the initial parent-training session, a written description of the procedure was shared using Webex’s screen-sharing function. In addition, the rationale was explained, followed by a demonstration of the procedure. During the demonstration, the researcher played the role of both the child and the parent (voicing the words “child” and “mom” and switching from one role to the other) and demonstrated how the parent should act when the child engaged in independent mand, prompted mand, and incorrect or no responses. Following the demonstration, the researcher and parents role-played practice scenarios. After each practice trial, vocal praise or correction was given as performance feedback. The training continued until the parent implemented the procedure with 100% accuracy for at least three consecutive trials in the meeting with the researcher (these data are not included in the procedural integrity figure).

Parent Independent Trials

After meeting the mastery criterion (i.e., 100% accuracy for at least three consecutive trials) during the training with the researcher, the parents began the intervention by conducting at least 10 trials of the procedure per day, twice a week, during the daily routines and without the researcher’s presence. The parent was instructed to implement the procedure at a convenient time during their daily routine. While implementing the procedure, the parent video recorded the interactions with their child. The video recording was analyzed by the researcher weekly to evaluate the child’s behavior and the procedural integrity. Across all intervention phases, the ranges and means of the total number of trials per session (excluding the probe trials for Gaku) were 10–17 (M = 10.2), 3–23 (M = 12.4), 2–20 (M = 6.8), and 2–21 (M = 10.3) for Gaku, Yuma, Jin, and Ryo, respectively. If the parent had any questions regarding the procedure outside of the weekly meeting with the researcher, they could email the researcher or wait until the weekly meetings to ask those questions.

After the parents began implementing the procedure with their child without the researcher’s presence, the researcher met with the parents weekly for an hour and provided feedback using graphs and videos of parent–child interactions. When the researcher’s analysis of the video recordings indicated that the child spontaneously emitted the target response for two consecutive sessions, the parent received training in the next phase. The same BST training procedure and mastery criterion as the previous phase were employed for parent training.
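Both decision rules in this section, the parent mastery criterion (100% accuracy for at least three consecutive trials) and the phase-change criterion (spontaneous target responses in two consecutive sessions), are checks for a run of consecutive successes. The sketch below is illustrative only: the helper name and data are hypothetical, and treating a session as meeting the criterion when it contains at least one independent mand is an assumption, not something the text specifies.

```python
def meets_consecutive_criterion(outcomes, required):
    """Return True if `outcomes` contains `required` successes in a row."""
    run = 0
    for success in outcomes:
        # Extend the current run on success; reset it on any failure.
        run = run + 1 if success else 0
        if run >= required:
            return True
    return False


# Parent mastery: rehearsal trials scored as fully accurate (True) or not.
# Illustrative data, not from the study.
rehearsal = [True, False, True, True, True]
print(meets_consecutive_criterion(rehearsal, required=3))  # True

# Phase change: independent mands per session (hypothetical counts).
# Assumption: a session counts when it has at least one independent mand.
independent_per_session = [0, 1, 0, 2, 3]
sessions_ok = [n > 0 for n in independent_per_session]
print(meets_consecutive_criterion(sessions_ok, required=2))  # True
```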

Specific Phases

Phase I: No Delay

Phase I of the procedure consisted of: (1) withholding access to the reinforcer; (2) conducting a single-stimulus preference assessment trial; (3) presenting a vocal model (prompt); and (4) providing differential reinforcement of alternative behavior (see Table 4 for a more detailed description).

Phase II: Delay

In Phase II, the procedure was identical to Phase I, except a prompt delay was added: after showing the preferred item, the parent waited until the child produced a vocal response or until 3 s had elapsed, before delivering a vocal model as needed (see Table 5 for a more detailed description).

Supplemental Phase for Gaku’s First Target Behavior

The procedures used with other participants (i.e., Phase I followed by Phase II) were ineffective in improving Gaku’s first target behavior. The children’s benefit was the priority of this study. Hence, Phase II was terminated and a Supplemental Phase was introduced for Gaku’s first target behavior. Before initiating the Supplemental Phase with the child, the parent received training from the researcher in a weekly meeting. The training continued until the parent could implement the procedure with 100% accuracy for at least three consecutive trials in the meeting with the researcher (these data are not included in the procedural integrity data reflected in Fig. 2). Correct implementation was defined as the parent implementing each of the procedural steps (i.e., 1A, 1B, 2A, and 2B) as listed in Table 6. Incorrect implementation was defined as the parent failing to implement one or more of the procedural steps as listed in Table 6.

As Gaku could identify written letters (hiragana letters), a visual prompt was used along with the vocal prompt employed in the previous phase (see Table 6 for a more detailed description). Moreover, because the primary researcher was scheduled to leave the institution where this study was conducted, the study had to end after three sessions of the Supplemental Phase for Gaku. Hence, a probe trial was conducted by Gaku’s parents to determine whether Gaku engaged in the first target behavior independently during the final online meeting. During the probe trial, the parent showed the child’s preferred item without the visual or vocal prompt, then waited up to 3 s for the child’s response. The researcher observed the probe trial and recorded the data.

Research Design

A nonconcurrent multiple-baseline-across-participants design was adopted to evaluate the effectiveness of the program. This design was selected for two reasons: participants entered the study and began the baseline probes at different times, and the design minimized the participants’ waiting time before the intervention.

Interobserver Agreement (IOA)

Two independent observers viewed videos of at least 25% of the trials in each phase for all the participants. The observers scored the child’s response as independent-mand, prompted-mand, or incorrect or no response for each trial. Trial-by-trial interobserver agreement (IOA) was calculated for children’s responses by dividing the number of trials with agreement by the total number of trials in each phase. The output was then multiplied by 100 to obtain a trial-by-trial IOA percentage, which was above 93% for each phase for all participants (range: 93.4%–100%).
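The trial-by-trial IOA calculation can be illustrated with a short sketch. The function name and the two observers' records below are hypothetical; they only show the arithmetic described above.

```python
def trial_by_trial_ioa(observer_a, observer_b):
    """Percentage of trials on which two observers recorded the same code."""
    if len(observer_a) != len(observer_b):
        raise ValueError("both observers must score the same trials")
    if not observer_a:
        raise ValueError("at least one trial is required")
    # Count the trials where both observers assigned the same category.
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * agreements / len(observer_a)


# Illustrative codes for 10 trials (made-up data): I = independent-mand,
# P = prompted-mand, N = incorrect or no response.
first = ["P", "P", "I", "N", "P", "I", "I", "P", "N", "I"]
second = ["P", "P", "I", "N", "P", "I", "P", "P", "N", "I"]
print(trial_by_trial_ioa(first, second))  # 90.0
```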


The children’s independent and prompted mands are shown in Fig. 1. All the children acquired a novel mand response. Gaku acquired two novel responses. Although parents were asked to conduct at least 10 trials of the procedure a day, only Gaku’s parent met this requirement. To supplement the visual analysis of the graphs, the percentage of nonoverlapping data (PND; Scruggs & Mastropieri, 1998) was used to estimate between-condition level changes. PND was calculated for each child’s independent mands, separately for each pair of adjacent phases, in the following manner. The range of data points was determined for the preceding phase (e.g., baseline), and the data points in the subsequent phase (e.g., Phase I) that fell inside and outside of that range were counted. The number of data points that fell outside of the range was divided by the total number of data points in the subsequent phase. The output was then multiplied by 100 to obtain the PND for the two adjacent phases.
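The PND calculation for two adjacent phases, as described above, can be sketched as follows. The function name and the example percentages are hypothetical. The sketch follows the description in the text literally by counting every point outside the preceding phase's range; PND is often computed by counting only points beyond the most extreme preceding point in the expected direction of change, and the two approaches coincide when, as here, the preceding phase sits at zero.

```python
def pnd_between_phases(preceding, subsequent):
    """PND for two adjacent phases: % of subsequent-phase points that fall
    outside the range (min to max) of the preceding phase."""
    if not preceding or not subsequent:
        raise ValueError("both phases need at least one data point")
    low, high = min(preceding), max(preceding)
    # A point overlaps when it lies inside the preceding phase's range.
    outside = sum(1 for value in subsequent if value < low or value > high)
    return 100 * outside / len(subsequent)


# Illustrative percentages of independent mands per session (made-up data).
baseline = [0, 0, 0]
phase_one = [0, 10, 20, 30, 40]
print(pnd_between_phases(baseline, phase_one))  # 80.0
```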

Fig. 1 Changes in independent and prompted mands for each child. Percentages of incorrect or no responses are not included in this figure. Phase I: vocal prompt; Phase II: prompt delay; Supplemental Phase: visual plus vocal prompts for Gaku

In baseline, none of the children engaged in independent mands. In Phase I, when parents provided immediate prompts to mand vocally, immediate and increasing trends and levels were observed for all children except Ryo (PND was 100% for Gaku’s first target behavior, 80% for Yuma, 28.6% for Jin, 12.5% for Ryo, and 100% for Gaku’s second target behavior). In Phase II, when delays were included, independent mands increased for Jin and Ryo (PND was 71.4% for Jin and 62.5% for Ryo). Phase II did not improve the independent mands for Gaku’s first target behavior (PND was 16.7%). Vocal prompts (Phase I) followed by prompt delays (Phase II) were used to establish independent responses for all children. However, as this approach did not improve Gaku’s first target behavior adequately, a visual prompt was employed for him (Supplemental Phase in Table 6). After three sessions of visual and vocal prompts, he independently engaged in the target response in a probe trial conducted by the parent. As he acquired the second target response during Phase I, Phase II was not necessary for that response.

The procedural integrity data are shown in Fig. 2. The data confirm that none of the parents used the Phase I procedure during the baseline. Jin and Ryo’s parents implemented Phase I with an average procedural integrity of 86.1% and 84.7%, respectively. Gaku and Yuma’s parents had low procedural integrity for this phase with an average of 29.1% and 61.3%, respectively, which gradually improved. All the parents implemented Phase II with an average procedural integrity of 71% or above (Gaku = 79.0%; Yuma = 100%; Jin = 71.0%; Ryo = 92.7%). The procedural integrity for Jin was highly variable (range: 33.3%–100%) throughout the intervention phases.

Fig. 2 Procedural integrity of parent-implemented mand training. Phase I = vocal prompt; Phase II = prompt delay; Supplemental Phase = visual plus vocal prompts for Gaku. The procedural integrity for Gaku’s second target response was not collected. The procedural integrity for Gaku’s probe trial (sessions 7 and 21) is not included in this figure. Graphs indicate the procedural integrity when the parents implemented the procedures in the researcher’s absence

In general, the mand-training procedure used in this study resulted in all children acquiring novel vocal-mand responses. However, a modification to the procedure was necessary for Gaku’s first target vocal mand. Although the procedural integrity for Jin’s parent was highly variable, the parent-training procedures improved the parents’ implementation skills. All parents rated the goals, procedures, and effects of the study highly. The social validity survey questions and parents’ responses are presented in Table 8. All the videos selected by the RA (i.e., those with better performance) were from the intervention phase of the study, thus supporting the social importance of the effects for all the participants.

Table 8 The social validity survey questions and parents’ responses


This study evaluated whether a telehealth-parent-training program developed and conducted in Japan would (1) improve the children’s vocal-mand repertoire and (2) allow the parents to implement the program accurately. Moreover, it assessed the social validity of the program at three levels (Wolf, 1978). The results of this study correspond with previous findings that parents can implement behavioral procedures and improve their children’s communication skills after BST-based telehealth-parent training (e.g., Hoffmann et al., 2019). All the parents implemented the procedure with a high procedural integrity, and all the children acquired at least one novel response. One participant, Gaku, acquired two novel responses, thereby providing additional support for the reliability and validity of the program. All the parents gave the highest rating to questions regarding the social significance of the goals and the social importance of the effects. Although some questions pertaining to the social appropriateness of the procedures did not receive the highest rating, all of them were highly rated.

Although all parents implemented the procedure with high fidelity, Jin’s procedural integrity data were highly variable due to two factors: (1) a parent who did not attend the weekly meeting implemented the procedure with low procedural integrity; and (2) the parents struggled to identify the examples and nonexamples of the target response while implementing the procedures. To address the latter, the operational definition of the target response was revised. Moreover, multiple-exemplar training using videos of the child’s responses was employed to train the parents during the weekly meeting. Despite these efforts, this issue remained unresolved. If a similar issue is encountered, practitioners could try in-vivo feedback while the parents interact with their child and increase the frequency of parent training.

There are a few notable features of this study. First, the researcher evaluated the parents’ implementation of the procedures in the absence of the behavior analyst. This method is beneficial because it allows the behavior analyst to analyze complex parent–child interactions and the reliability of the data at the behavior analyst’s convenience. In addition, because recording videos in the presence of the researcher may cause reactivity effects, observing parent–child interactions at home may reduce such effects (Rentzsch & Schütz, 2009, as cited in Sommer et al., 2016). Lerman et al. (2020) reported that a 7-year-old child engaged in some reactivity behaviors (e.g., making comments about the video feed and attempting to close the laptop) during videoconferencing. This suggests that even the virtual presence of the researcher without a physical presence at home can cause reactive behaviors. Whether parents’ self-recording of interactions in their homes could reduce reactive effects requires further investigation. Second, the procedures were simple, and the inexpensive materials used in this study are commercially available in many countries, which facilitates the replication of this study in Japan and other countries where the number of professional behavior analysts is limited. Third, procedural integrity was evaluated for every mand-training trial conducted by the parents. Procedural integrity is often assessed intermittently in early interventions, for example, once a month (Lemire et al., 2020). Evaluating procedural integrity for every trial is resource intensive and not feasible in many applied settings. However, for our research purpose, it allowed us to estimate the procedural integrity completely and to capture some of the errors that would have been missed had the data been collected in a conventional manner (i.e., intermittently).
Finally, the primary researcher made appropriate cultural adaptations for the participants, which included but were not limited to: (1) using the participants’ native language (i.e., Japanese) and language that is familiar to non-behavior analysts throughout the study and (2) calling the parents by their last names and with appropriate titles (calling parents by their first names or by their last names without appropriate titles is often considered offensive in Japan). As cultural competency is a requirement for BCBAs, future studies could attempt to identify specific cultural adaptations or skills that affect the treatment outcome for members of various cultures. For practitioners, we recommend conducting research among members of diverse cultures across borders. Such collaborative efforts could improve the researcher’s cultural competence and help consumers of the research increase their cultural awareness.

Besides providing additional support to highlight the effectiveness and social validity of BST-based telehealth parent training for improving children’s communication skills, we found that after meeting the mastery criterion, additional training was necessary for parents to achieve adequate procedural integrity. In other words, to achieve high procedural integrity, parents had to implement the procedure with their children and receive performance feedback from the researcher. This observation supports the recommendations regarding generalization and training (e.g., Stokes & Baer, 1977). In some types of parent training, parents do not have the opportunity to practice skills with their children (Haraguchi et al., 2013). However, we argue that practice opportunities with children and receiving feedback on those interactions must accompany training and role-play with the trainer.

Although procedural integrity was highly variable for Jin’s parent throughout the intervention phases and improved only gradually for Gaku’s and Yuma’s parents (i.e., some of the sessions were not implemented with high fidelity), the children acquired at least one novel mand response. There are a few plausible explanations for this observation. First, although procedural integrity was low, the parents often implemented the steps crucial for the acquisition of mand responses (e.g., initiating mand training when the child showed interest in the preferred item and reinforcing the target response). Second, for Jin, the parent who did not attend the weekly meeting implemented the procedure with low procedural integrity, whereas the parent who attended the meeting with the researcher implemented it with relatively high procedural integrity. Third, although Jin’s parents had difficulty identifying examples and nonexamples of the target behavior, they reinforced the target behavior when it was clearly pronounced. It is possible that this reinforcement was sufficient to teach Jin a novel mand.

There are several limitations to this study and recommendations for future research. First, it should be noted that all children had vocal-imitation skills at the onset of the study, although some of them (i.e., Ryo and Gaku) had limited vocal-mand skills. This indicates that additional considerations and training would be required for children without any imitation or socially appropriate skills. Second, although parents were required to conduct at least 10 trials of the procedure in a day, none of them except Gaku’s met this requirement. Although we believe that this is valuable information for practitioners and applied researchers, a replication of this study that controls the total number of trials conducted in a day may be beneficial. When the researcher reminded parents to conduct 10 trials of the procedure per day, they reported that although they knew they had to do so, they were too busy to conduct and video record the trial when the child requested the reinforcer. Therefore, competing contingencies that affected the parents’ behavior seemed to impede them from implementing the required number of trials. To overcome this issue and exert greater experimental control, future studies could incorporate a performance management contingency (e.g., if an honorarium is available for the participants, a portion of it can be made contingent on the participant completing the required number of trials per day and throughout the research period). In addition, future studies could evaluate whether using different methods and devices to video record parent–child interactions would affect the number of parent-delivered training trials. Third, data were recorded only in the home environment, and follow-up data were not available. Therefore, the transfer and maintenance of the participants’ skills were not evaluated in this study. Replications of this study with data obtained in various environments and with follow-up data would be beneficial.
Fourth, the social validity survey developed by the researcher was not validated. Future studies could validate the survey developed in this study, or previously published methods of assessing social validity, in the Japanese language. Finally, a cost-benefit analysis was not performed in this study; such an analysis would be beneficial in determining whether the benefit of adopting this program would outweigh the cost.

The findings from this study support those of previous studies that demonstrate the effectiveness of BST-based telehealth parent training in improving children’s communication skills (e.g., Hoffmann et al., 2019). Moreover, this study contributes to the literature by advancing the international evaluation of behavior analytic training as pioneering research conducted in Japan that includes the following features: (1) online program design; (2) mand training; (3) BST model; (4) session-by-session data on children’s behavioral change and procedural integrity; (5) within-subject experimental design; and (6) social validity evaluation. Considering the potential benefits of the training, replications and refinements of the program in Japan and across various cultures may promote the international dissemination of behavior analysis.