Children without disabilities naturally develop critical social skills in the context of interactions with peers and through observation and imitation of others (Pierce-Jordan and Lifter 2005). According to Walker (1983), social skills are “a set of competencies that: a) allow an individual to initiate and maintain positive social relationships, b) contribute to peer acceptance and to a satisfactory school adjustment, and c) allow an individual to cope effectively with the larger social environment” (p. 27). Successfully navigating numerous social situations necessitates an awareness of the individual’s own emotions and the emotions of others and an ability to make decisions based on the social context in order to enable the individual to establish positive relationships with other people (Zins et al. 2004). Children without disabilities learn to adjust their social communication based on the social context or environment and its prescribed rules by understanding both verbal and nonverbal feedback from other children and adults (Haney 2013). The setting (e.g., the classroom, the playground, the home) and the communication partners involved (e.g., teachers, classmates, parents, siblings) dictate the formality of speech and the vocabulary used in that conversational exchange (Winner 2002).

The classroom is one of countless social contexts where individuals develop and practice these social skills. Before beginning conversational exchanges, at the preschool age (i.e., 3–5 years of age), children show affection, concern, and a wide range of emotions. Social reciprocity, which may involve simply exchanging sounds or smiles with another individual, is one of the earliest stages and basis for social interactions. In addition, the ability to imitate adults and peers, engage in pretend play and cooperative play with peers, and demonstrate a desire to please friends are additional instances of interaction among young children (Division of Birth Defects 2014).

Social skills are not only imperative in daily interaction but also greatly impact a child’s success in a number of facets for the entirety of that individual’s lifespan. Academic achievement, building lasting friendships and relationships, resolving conflicts, and how an individual will overall navigate dynamic environments, such as their place of employment, their community, or their home environment, are some of the complexities that demand social skills (McKown et al. 2009).

Characteristics of Individuals with Autism Spectrum Disorder

For the rapidly growing and heterogeneous population of individuals with autism spectrum disorder (ASD), social skills are distinguishing deficits (McConnell 2002). Individuals with ASD exhibit deficits in social skills acquisition and in performing or applying learned social skills to applicable social contexts and situations (Bellini 2006). According to the fifth edition of the Diagnostic and Statistical Manual (DSM), social communication and interaction deficits include three symptoms: (a) deficits in social-emotional reciprocity, (b) deficits in nonverbal communicative behaviors used for social interaction, and (c) deficits in developing and maintaining relationships (American Psychiatric Association 2013). Among these core deficits in social communication and interaction, Barton et al. (2012) identify the ability to attend to relevant cues, to imitate, to understand language, and to participate in functional pretend play to be other fundamental goals for young children with ASD that are also all related to social skills.

Individuals with ASD are often observed displaying minimal interest in engaging in social reciprocity with peers or adults and present a preference for social isolation or detachment (Dawson et al. 2004). The earliest signals of delayed social development in young children with ASD are a lack of joint attention (JA) and of expressions of positive affect, such as smiling or laughter. Children participating in JA shift their gaze from an object and make eye contact with a communication partner and use gestures such as pointing in order to engage with another individual (Krstovska-Guerrero and Jones 2013). According to Mundy et al. (1992), individuals who fail to engage in JA also fail to display affective responding, which draws the attention of other adults and peers and is pivotal in increasing opportunities for social interaction.

Theory of Mind (ToM), an ability to identify another’s perspective or read their mind in order to empathize and understand another individual’s knowledge and beliefs, is also associated with the social skill deficits of this population (Baron-Cohen et al. 1985). For preschool children, ToM plays an important role in the success of engaged and sustained play with peers. Deficits in ToM exhibited by individuals with ASD are thus associated with significant delays in social development (Myszak 2010). The inability to empathize and to deduce the emotional state of others by understanding nonverbal communication cues, such as facial expressions, eye gaze, body language, and gestures, severely impacts the social skills of individuals with ASD.

Individuals with ASD who are motivated and interested in engaging with other adults or peers may struggle with a range of anxieties and difficulties with social contexts and situations. Children with ASD may not be able to successfully gain the attention of a peer and initiate a conversation. Others may find maintaining a conversation to be difficult and understanding the nuances of social situations with differing peer groups to be challenging (Haney 2013).

Social skills deficits persist for individuals with ASD as they develop and may be further hindered by circumscribed interests or an abnormal fixation on a specific subject or object that relates to rigidity in behaviors (Jones and Klin 2013; Sasson et al. 2008). Some individuals with ASD may be consumed with information on specific topics, such as trains or dinosaurs, and thus commandeer a social interaction and ignore social-emotional reciprocity or the give-and-take in a conversation. Individuals with ASD may not be able to recognize social cues that relate to the emotions of a communication partner and may not initiate or respond to such cues in order to show care or concern (Haney 2013). A restricted fascination with specific toys or objects is also associated with repetitive or ritualistic manipulation of toys, such as spinning or arranging toys, instead of playing with toys for their intended use (Lydon et al. 2011). Such behaviors often preoccupy the individual with ASD and therefore limit their opportunities for social interaction with peers.

Importance of Social Skills Interventions

The successful development of social communication and interaction in individuals with ASD is a lifelong objective, and the acquisition of such skills can have a lasting impact on other critical areas of need that are defining characteristics of this population. Social skills are associated with cognitive, physical, and emotional development (Fragale 2014). As a pivotal skill, targeting social skills produces broad improvements in other areas, such as pro-social behavior, appropriate communication with both peers and adults, and cooperative and functional play (Jung and Sainato 2013). Through social skills instruction, individuals with ASD may be taught to appropriately communicate and initiate interactions, participate in turn-taking, make requests, and ask questions instead of resorting to less preferred behaviors, such as tantrums and aggressions (Egel et al. 2012; Reichow and Volkmar 2010; White et al. 2007). Social skills instruction therefore would lead to increased acceptance from typically developing peers and broaden the opportunities from which an individual with ASD may practice social interaction and communication and build meaningful friendships with peers (Jordan 2003).

Social Skills Interventions for Individuals with ASD

Given the importance of developing social skills for individuals with ASD, the amount of research on interventions targeting such skills has increased exponentially. Numerous interventions targeting social skills have been examined; however, only a small number have been identified to meet evidence-based criteria. The National Professional Development Center on Autism Spectrum Disorders (NPDC) is considered an authoritative source on evidence-based practices and autism. According to the NPDC (2014), peer-mediated instruction and intervention (PMII), prompting, reinforcement, self-management, social narratives, social skills groups, and video modeling (VM) are effective practices that may aid in the acquisition of social skills. PMII consists of training peers without disabilities to be responsive communication partners by increasing opportunities for individuals with ASD to socialize. Used in combination with other evidence-based practices, prompting procedures are a method to assist individuals with ASD with learning and performing behaviors and skills. Reinforcement serves as a method to increase the probability of the future performance of the behavior or skill by the individual. Self-management targets the ability of the individual with ASD to autonomously regulate his or her behaviors in multiple contexts. Social narratives are individualized and brief descriptions of a social situation to prepare an individual and emphasize important cues and appropriate responses. Social skills groups are an opportunity for a small group of individuals with ASD to learn and practice appropriate social skills with the guidance of an adult facilitator. Lastly, VM is an instructional approach using recorded videos. A VM commonly includes a desired skill or replacement behavior presented to students in a video format. Students are provided with opportunities to observe the video repeatedly and then participate in sessions which allow the student to imitate and practice the skill or behavior shown in the VM (Hine and Wolery 2006).

Despite the social skills interventions available for individuals with ASD, more research on interventions focusing on social skills is warranted (Jung and Sainato 2013; White et al. 2007). The aforementioned evidence-based interventions addressing social skills for this population do not fully remedy the social skills deficits representative of this population. In a comprehensive review of the intervention research on social development that spanned from 1985 to 2006, White et al. (2007) concluded that there was still much to research in regard to effective intervention approaches. The authors also emphasized a need to conduct replication and elaborative studies and more methodologically rigorous studies.

Video Modeling as a Means to Develop Social Skills in Individuals with ASD

Among the evidence-based interventions necessitating additional research is VM. In a literature review, Fragale (2014) found VM to be an effective intervention for improving play-related skills, such as solitary play and social play, of children with ASD. Based on the results of three separate meta-analyses, Bellini and Akullian (2007), Wang and Spillane (2009), and Reichow and Volkmar (2010) found that VM is an evidence-based practice for individuals with ASD, which aligns with the NPDC. Specifically, Wang and Spillane found VM to be highly effective for this population. In addition, Scheflen et al. (2012) found VM to be more effective than in vivo modeling, which is where live models perform the target behavior.

There are a number of types or methods in which VMs may be presented to individuals with ASD. One type may include adults as the model, where an educator, staff member, or parent models the preferred behavior or targeted skill. Another type of VM is peers as a model, which includes a peer who may be the same age and gender, such as classmates or siblings, modeling the behavior or skill in focus. A video of the actual recipient of the instruction engaging in the preferred behavior or skill is known as video self-modeling (VSM). Point-of-view video models or first-person perspective video modeling is a video of what the recipient of the instruction would actually see if he or she were engaging in the behavior or skill (Shukla-Mehta et al. 2012). This form of VM may include hands demonstrating the skill and using the relevant materials or other individuals connected to performing the skill or behavior (McCoy and Hermansen 2007).

VM may be used in isolation or as part of an instructional package and may be accompanied with additional instruction, prompting, and reinforcement (Wilson 2013). VM may be presented in two common ways, which are both effective for individuals with ASD (Mason et al. 2013; Sancho et al. 2010). Video priming occurs when the individual is presented with the entire video prior to imitating and practicing the desired skill or behavior. Video prompting involves segmenting the video into a task analysis in order to scaffold the learning of a targeted skill or behavior (Mason et al. 2013).

Employment of VM as an instructional tool by educators has become more frequent due to increased access to technology and its cost effectiveness. Instructors may record a number of VMs in a variety of naturalistic settings that are applicable to the individual student (Scheflen et al. 2012). The VM may be used for more than one student or may be edited to better individualize the product by adding preferred music or video clips to encourage the student to attend to the videos (Hine and Wolery 2006).

The use of VM takes into account the preference for visual stimuli typically shown by individuals with ASD. It is also known that this population does not commonly engage in incidental learning; therefore, VM is an approach that directly teaches the skill or behavior to be imitated and to be applied in the naturalistic setting (Hine and Wolery 2006; McCoy and Hermansen 2007). In addition, individuals with ASD frequently struggle with attending to important and relevant cues in their environment (Haney 2013). Video modeling also aims to help children with ASD better identify significant cues by limiting extraneous stimuli shown in the video (Barton et al. 2012; Mason et al. 2013).

Additionally, Shane et al. (2012) stated that the use of unwieldy, more traditional augmentative and alternative communication (AAC) devices (e.g., GoTalk®, DynaVox®) may stigmatize the individual with disabilities. Therefore, the authors emphasized the need to use commonplace and less stigmatizing consumer-level hardware (e.g., laptop computer, cellular phone, tablet) to provide instruction, specifically social skills, language, and communication, for individuals with ASD. VMs are oftentimes presented on handheld technologies, such as iPads, and therefore, the use of socially acceptable technologies that are rather unobtrusive does not limit the individual’s opportunities to interact with peers (Shane et al. 2012).

Advantages of Point-of-View Video Modeling

Recognizing relevant cues in environments that often include both relevant and irrelevant stimuli is a general deficit for individuals with ASD. Therefore, it is suggested that when compared to other forms of VM, point-of-view video models may be most effective in limiting the irrelevant stimuli and drawing children’s attention to the relevant stimuli (Rayner et al. 2009; Tetreault and Lerman 2010). By filming a VM from the student’s perspective, point-of-view video models may better support the learning of the targeted behavior or skill than any other form of VM. Point-of-view video modeling may provide a clear frame of reference to facilitate imitation, which is another obstacle for individuals with ASD (McCoy and Hermansen 2007). Additionally, according to Ayres and Langone (2007), video models recorded from the student’s perspective are more effective not only in emphasizing the relevant stimuli that require attention but also in reducing the need for the recipient of the intervention to have ToM.

Despite the evidence supporting the use of point-of-view video models as an effective intervention for individuals with ASD, VMs employing adults, peers, and VSMs are the most frequently used interventions for social skills instruction (Fragale 2014; Mason et al. 2013). In a meta-analysis of the efficacy of point-of-view video modeling, Mason et al. (2013) identified a study by Tetreault and Lerman (2010) as the only study which examined point-of-view video modeling and social skills. Mason et al. suggested that the effectiveness of point-of-view video modeling in teaching social skills was inconclusive given the limited research. Nonetheless, Mason et al. indicated that this form of VM was promising for individuals with ASD, which aligned with the conclusions of past meta-analyses of the efficacy of video modeling by McCoy and Hermansen (2007) and Shukla-Mehta et al. (2012).

Purpose of the Review of Literature

Social communication and interaction have been identified as distinguishing impairments for individuals with ASD that pervasively affect the individual’s success in countless contexts and the building of relationships and friendships throughout the course of an individual’s lifetime. Early targeting of social skills may be a key to understanding and ameliorating the significant deficits in the ASD population, such as appropriate use of language and communication and engagement in pro-social behaviors. Point-of-view video modeling has the potential to address these deficits and ultimately improve social communication and interaction in individuals with ASD. Therefore, the purposes of this literature review were to investigate the research regarding social skills instruction through the application of point-of-view video modeling and identify key questions that remain unanswered and warrant further investigation.

Guiding Questions

The following questions guided the evaluation of the effectiveness of point-of-view video modeling in teaching social skills to children with ASD:

  1. Do point-of-view video models effectively teach social communication and interaction skills to preschool children with ASD?

  2. What social skills do point-of-view video models effectively teach and are the social skills being targeted simple functional play skills (e.g., playing with toys appropriately) or complex play skills (e.g., reciprocal and cooperative play)?

  3. What child characteristics or prerequisites are required for point-of-view video models to be successful?

  4. What is the appropriate length of a point-of-view video model for a preschool child and how frequently should the child view the video for each session?

  5. In order to avoid prompt dependence, how are point-of-view video models faded to guide children towards more independent functioning?

  6. Do social skills gained through the implementation of point-of-view video models generalize to different settings, people, and similar scenarios?

  7. Are the social skills gained from point-of-view video models maintained after a period of time?

  8. Based on the current extent of research, what questions relating to point-of-view video models and social skills remain unanswered?

Method

Empirically based literature on point-of-view video models targeting social skills was selected through electronic and ancestral searches of literature published between 2004 and 2014. Because research on point-of-view video models and social skills is limited, the search window was set at 10 years to better identify the existing research. The following databases were used: Education Research Complete (EBSCO), ERIC, JSTOR, MAS Ultra School Edition, MLA International Bibliography, Primary Search, PsycINFO, and Social Science Citation Index. The keywords used to generate the electronic search included autism, autistic, autism spectrum disorder, ASD, point-of-view video modeling, first person perspective video modeling, and social skills.

The articles resulting from the electronic search were included based on the relevance of the title. More specifically, titles which included the abovementioned keywords and did not focus on neurology or genetics were included for closer examination. Other criteria for inclusion included: (a) studies which included participants diagnosed with ASD, (b) studies that specifically addressed social skills (e.g., social communication, interaction with peers or adults), and (c) studies which examined point-of-view video modeling as the only independent variable (i.e., no additional instruction or program package). Studies that employed supplementary reinforcement (e.g., non-contingent, contingent) and prompting in addition to point-of-view video modeling were included due to their recurrent use in many video modeling intervention studies. Determining whether the resulting articles met the inclusion criteria was done by reviewing the abstract and, if necessary, reviewing the entire article. For the purpose of the review, only articles published in English from peer-reviewed journals were incorporated. Together, the electronic search and ancestral search yielded five empirically based research articles evaluating the effectiveness of point-of-view video modeling in teaching social skills. The periodicals in both the electronic search and ancestral search included Education and Training in Autism and Developmental Disabilities, Education and Treatment of Children, Research in Autism Spectrum Disorders, and Topics in Early Childhood Special Education. Figure 1 presents the detailed process used to identify these five studies, which were selected for inclusion.
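The inclusion criteria described above can be read as a simple screening rule. The following Python sketch is illustrative only and is not the procedure the reviewers actually ran; the record fields, the component labels, and the example records are hypothetical names introduced here for the illustration.

```python
from dataclasses import dataclass, field

# Hypothetical labels: point-of-view VM must be present, and only
# supplementary reinforcement and prompting may accompany it.
ALLOWED_COMPONENTS = {"pov_video_modeling", "reinforcement", "prompting"}


@dataclass
class Record:
    """One candidate article; all field names are assumptions."""
    title: str
    year: int
    english: bool
    peer_reviewed: bool
    participants_with_asd: bool
    targets_social_skills: bool
    components: set = field(default_factory=set)


def meets_criteria(r: Record) -> bool:
    """Apply the stated criteria: 2004-2014, English, peer reviewed,
    ASD participants, social skills focus, and no intervention
    components beyond point-of-view VM, reinforcement, and prompting."""
    return (
        2004 <= r.year <= 2014
        and r.english
        and r.peer_reviewed
        and r.participants_with_asd
        and r.targets_social_skills
        and "pov_video_modeling" in r.components
        and r.components <= ALLOWED_COMPONENTS
    )


# Hypothetical records, not drawn from the reviewed literature.
included = Record("Example A", 2010, True, True, True, True,
                  {"pov_video_modeling", "prompting"})
packaged = Record("Example B", 2012, True, True, True, True,
                  {"pov_video_modeling", "social_narratives"})
too_old = Record("Example C", 2001, True, True, True, True,
                 {"pov_video_modeling"})
```

In this sketch, a study that pairs point-of-view video modeling with any other instructional component (such as the hypothetical "Example B") is excluded, which mirrors criterion (c).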

Fig. 1 Literature review process

Results

The following five empirically based studies investigated the effectiveness of point-of-view video modeling in instructing social skills to children with ASD. Table 1 presents an overview of these five studies.

Table 1 Summary data for point-of-view video modeling studies

Study 1: Hine and Wolery (2006)

Hine and Wolery (2006) conducted a multiprobe design across two behaviors and across two participants. Two main research questions guided their study: (a) Will preschoolers with ASD readily imitate actions seen through point-of-view video modeling? and (b) Will any acquired skills generalize to the children’s classroom sensory activities and across untrained materials? The study included two female participants identified with autism based on the DSM-IV. Both participants attended an inclusive, full-day preschool, but in separate classrooms that included 10–14 children with approximately half the class being children with disabilities. At the commencement of the study, Christine was 30 months old and Kaci was 43 months old. Based on teacher reports, both participants engaged in stereotypic behaviors during play periods and showed preferences for videos. The Motor Imitation Scale (Stone et al. 1997) was administered to test the participants’ abilities to imitate, and the results indicated that both participants were capable of imitating simple actions observed from adults or materials.

The materials used in the sessions were identified as sensory toys and consisted of a gardening set (e.g., shovels, planter pots, plants) and a cooking set (e.g., utensils, bowls, plates, pots). These materials were placed in a sensory bin filled with potting soil. Baseline data were collected by the investigators. In the preschool therapy room, the investigators placed the set of gardening toys into the sensory bin and verbally prompted the participant to play. During the 2-min baseline probe, the investigators did not provide any additional prompting on how to use the set of toys. After the probe with the gardening toys, the materials were removed and the participant was permitted to watch a cartoon for another 2 min. The procedure for the set of gardening toys was repeated using the set of cooking toys.

Prior to each intervention session, Hine and Wolery (2006) conducted a daily treatment probe to identify the participant’s performance without immediately seeing the video before imitating or practicing the targeted behaviors. The daily treatment probe mirrored the baseline procedures. The investigators reinforced the participants if they were contacting the toys and remaining at the sensory bin, and verbal praise and tangible rewards were provided for on-task behavior.

In the preschool therapy room, the intervention sessions included the participant, the first author, and an observing graduate student. The independent variable was the point-of-view video models. Prior to the video models, a 2-min cartoon was shown to help the participant attend to the video. The point-of-view video models included a female voice stating, “Play with your toys!” The video then showed a pair of adult hands appropriately manipulating one toy from either the aforementioned gardening or cooking sensory toys in the sensory bin. After modeling appropriate manipulation with each toy, the same female voice stated, “Great job playing with your toys!” The same cartoon then played for a total of 60 s. Each video was no more than 2 min in length and included three exemplars of how the participants were expected to manipulate the same set of toys.

In each intervention session, each participant viewed the two videos before beginning the practice session. During the practice session, the procedures used in the baseline phase were repeated; however, the practice sessions were 3 min in length, and the participants received prompts for standing at the bin and playing with toys. Reinforcement was not provided by the investigators when the participants imitated the modeled behaviors from the videos.

The dependent measure in the study was the number of performed actions mirroring what was modeled in the point-of-view video models. In order to collect and code the data, the daily probe and practice session were video recorded. The first author and a trained graduate student coded any imitated actions in the video recordings. There were six possible exemplars to imitate for the gardening set and five for the cooking set.

Kaci exhibited satiation with the same materials being presented repeatedly, which the investigators stated led to decreased responding during the intervention phase. In order to address this, the investigators introduced a new material by changing the potting soil to colored rice. The authors also used a different and more specific prompt (i.e., “Do what you saw on the video.”) and changed the procedures for Kaci to provide verbal praise and edibles for imitating the modeled actions from the videos.

The investigators probed the participants’ ability to maintain any gained play skills by withdrawing the treatment and practice sessions and returning to baseline procedures. In order to assess generalization, probes were conducted in the participants’ classroom with similar sets of gardening and cooking toys. The investigators also conducted procedural fidelity assessments and administered a social validity questionnaire. Using a 5-point Likert-type scale, 20 special education graduate students viewed and rated videotapes of the participants’ performance before and after the intervention based on their “engagement, manipulation of materials, appropriate use of materials, enjoyment of the activity, and need for help using the materials” (Hine and Wolery 2006, p. 88).

The results of the study indicated that point-of-view video modeling was effective in teaching the participants to appropriately manipulate the sets of gardening and cooking toys. Kaci successfully imitated the modeled actions using the set of gardening toys, and Christine was observed playing appropriately with both sets of toys. The alteration to the study materials, prompts, and reinforcement aided Kaci in imitating the modeled behaviors with the set of cooking toys. Hine and Wolery (2006) stated that the presentation of multiple examples of the targeted behavior through the point-of-view video models led to generalization; however, only skills gained with the set of gardening toys generalized to the classroom setting. Both participants performed with inconsistency in the maintenance probes, thus making it difficult to draw conclusions about the effectiveness of the intervention in promoting maintenance of gains. Results from the procedural fidelity assessments showed that all phases of the study were conducted with 95 % accuracy. Based on the social validity questionnaires, the raters found the intervention to be socially valid in increasing engagement with the activity, manipulating the materials multiple times, appropriate use of materials, and enjoyment of the activity. The raters also found that the participants did not require as much assistance using the materials.

The results of the study are promising. However, the conclusions of the study, which used a multiple probe design across two participants and two behaviors—playing with a set of gardening toys and a set of cooking toys—may be made stronger with additional replications (Kratochwill et al. 2010). The adjustments to the study procedures for Kaci also weaken the overall results and highlight the potential need to provide more specific prompting and praise in order to promote skill acquisition. The authors attempted to identify prerequisites and noted that the participants exhibited basic imitation skills with adults as models prior to the study. However, further research needs to be conducted on whether this is an accurate prerequisite for point-of-view video modeling to be effective. The authors also mentioned the limited number of probes conducted in the phases of the study and the need to collect data on other facets of social skills, such as engagement in functional play and social interactions with peers. Lastly, generalization, maintenance, and examining the impact of more cues and reinforcement continue to be areas warranting further research.

Study 2: Sancho et al. (2010)

In another study, Sancho et al. (2010) used an adapted alternating treatments design and multiple baseline design to teach play skills to two children. At the time of the study, Mark was 5 years, 4 months old and Erin was 5 years, 11 months old. Both had been diagnosed with autism by an independent agency and were selected due to their limited imaginative play. Following assessments targeting attention and imitation, it was determined that the participants had the ability to attend to a television for at least 2 min and imitate a minimum of 20 motor movements and 10 simple phrases.

Two play sets, a play house and a circus, which contained five characters per set, were used in the point-of-view video models. Prior to beginning the intervention phase, the investigators collected baseline data by placing a play set before the child and providing the instruction, “It’s time to play.” The participant was observed for 4 min, and no prompts, reinforcement, or further directions were provided.

In the intervention phase, the point-of-view video models presented 2-min play scenarios containing 10 scripted actions with the play set and characters and 10 vocal scripts. The video models contained two adult hands using the play set and characters to model the scripted actions. In addition to filming the video from the first-person perspective, the investigators also recorded the video from an additional three different angles (i.e., in front of the set, to the right of the set, and from the left of the set).

Within the intervention phase, the participants took part in both a simultaneous video modeling procedure and a video priming procedure. With simultaneous video modeling, the participant viewed a video once with the play set and corresponding characters also placed in front of them. While the participant was viewing the video, the investigator would manually prompt and reinforce the play actions with the characters. The prompts were systematically faded and reinforcement was provided contingent on prompted and independent responding. A correction procedure was used for any error by rewinding the video to the specific action and having the student imitate the action. Following the intervention session, Sancho et al. (2010) returned to baseline procedures to collect the post-session data. With video priming, the participant did not have access to the play set and characters while viewing the video model. The investigator did not provide any manual prompts. Reinforcers were provided every 10 s contingent only on the child’s attention to the video, and not on the child’s imitations of the play actions or scripts. It is important to note that the edible reinforcers that were provided during the intervention sessions were placed in a clear cup near the DVD player. The participant was permitted to consume the reinforcers only after the session. Following the intervention session, the investigators returned to baseline procedures to collect the post-session data.

Data collection was facilitated by video recordings of all sessions. The dependent measures included attending to the video or play set characters, imitation of vocal scripts, unscripted verbalizations, imitation of actions with the characters, and unscripted actions with the characters. Data were collected using a 10-s momentary time sampling procedure and frequency counts. Additionally, interobserver agreement, treatment fidelity, and social validity were assessed by the investigators.
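To make these two measures concrete, the following is a minimal sketch of how momentary time sampling and interobserver agreement are commonly computed. The formulas are standard conventions assumed here for illustration, not details reported by Sancho et al. (2010), and the function names, session length, and observation data are hypothetical.

```python
def momentary_time_sample(observations):
    """Percentage of sampled moments at which the behavior was occurring.

    `observations` holds one True/False entry per sampled moment
    (e.g., one entry every 10 s).
    """
    if not observations:
        raise ValueError("at least one sampled moment is required")
    return 100.0 * sum(1 for o in observations if o) / len(observations)


def interval_agreement(observer_a, observer_b):
    """Interval-by-interval agreement (%) between two observers.

    Assumed formula: agreements / total intervals * 100.
    """
    if len(observer_a) != len(observer_b) or not observer_a:
        raise ValueError("records must be non-empty and of equal length")
    agreements = sum(1 for a, b in zip(observer_a, observer_b) if a == b)
    return 100.0 * agreements / len(observer_a)


# Hypothetical 2-min session sampled every 10 s (12 moments).
primary = [True, True, False, True, False, False,
           True, True, True, False, True, True]
# Hypothetical second observer who disagrees on one moment only.
secondary = primary[:-1] + [False]

percent_attending = momentary_time_sample(primary)
agreement = interval_agreement(primary, secondary)
print(f"attending: {percent_attending:.1f} %, agreement: {agreement:.1f} %")
```

Momentary time sampling records only what is happening at the instant each interval ends, so it estimates the proportion of time spent in a behavior rather than counting discrete responses, which is presumably why frequency counts were collected alongside it.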

In order to probe for generalization, five additional settings were selected: the classroom, conference room, office, gymnasium stage, and a multipurpose room in each participant’s home. Novel instructors and similar play sets and characters were also used. Both simultaneous video modeling and video priming procedures were used as described above in the generalization probes. One and 2 weeks after the study, maintenance probes were conducted for the participants.

Based on the results of their study, Sancho et al. (2010) concluded that both video modeling procedures (i.e., simultaneous video modeling and video priming) were effective in teaching and maintaining play skills for the two participating children. For Mark, both types were effective in teaching and maintaining scripted play actions. However, for Erin, simultaneous video modeling was more effective. Unfortunately, engagement in unscripted play actions and vocal scripts occurred rarely. However, simultaneous video modeling led to higher scripted verbalizations in the generalization sessions, while unscripted play actions remained low. In addition, Sancho et al. stated that generalization did not occur for novel play sets. Results from assessments on interobserver agreement showed a total range of 97–100 % and treatment fidelity also showed an overall range of 97–100 %. The results of the Likert-type scale social validity assessment, which was completed by 16 teachers, identified that the educators were willing to implement simultaneous video modeling procedures and video priming procedures.

The findings of the study do not provide any further clarity on whether video priming or a form of video prompting is more efficacious. The authors also mention that the prompting and reinforcement may have impacted their data and may have potentially led to multiple treatment interference. Additionally, like the previous study by Hine and Wolery (2006), the study was rather small, including only two participants with two play sets. According to Kratochwill et al. (2010), at least three replications are necessary to strengthen the conclusions made in multiple baseline studies. However, the study did attempt to address and teach both functional play and social scripts. The authors also collected data on unscripted play actions and vocalizations, which is another step to further improving social communication and interaction in individuals with ASD. Both generalization and maintenance were assessed, which are additional factors in identifying whether point-of-view video modeling is effective for this population.

Study 3: Scheflen et al. (2012)

In the most recently published article on point-of-view video modeling and the teaching of play and social skills, Scheflen et al. (2012) used a multiple baseline design with four male participants. The authors believed that by creating VMs which followed the developmental sequence of play skills, individuals with ASD would be able to acquire such skills, which would also translate to improvements in language. The authors also incorporated past research in directly teaching language through video modeling. The participants were randomly sampled from an ASD treatment program and were between 3 and 5 years of age (37 to 69 months). Ian was 59 months at the time of the study, exhibited no initiation of communication, required the highest level of prompting for one-word utterances, did not use multiple-word phrases, and engaged in echolalia. Jeremy was 69 months, exhibited no initiation of communication or pragmatics, had poor verbal skills, and engaged in echolalia. At the time of the study, Ryan was 37 months, had severe apraxia, made basic requests, and also engaged in echolalia. Jonah was 37 months, was able to make requests but not comments and questions, and engaged in echolalia. The investigators also collected additional information by administering the Psychoeducational Profile Revised (PEP-R; Schopler et al. 1990), the Vineland II (Sparrow et al. 2005), and the Preschool Language Scale, 4th Edition (PLS-4; Zimmerman et al. 2002).

Prior to beginning the intervention phase, the authors collected baseline data by observing the participants during a 15-min free play session in the classroom and a therapy room. Both settings included different types of toys. The authors’ aim was to determine the participants’ play levels in different settings.

Scheflen et al. (2012) created video models demonstrating sequences of play that corresponded with each level of play according to the developmental sequence established by Kasari et al. (2006). The levels of play are presented in Table 2.

Table 2 Video models of play based on developmental sequence

The point-of-view video models contained adult hands manipulating different toys or sets of toys, accompanied by scripted language. The intervention sessions took place in the speech therapy room twice a week for 15 min. Based on the play level observed during baseline, participants watched the video model of a play skill at the next level of the developmental play sequence. Participants watched videos targeting one play skill representing the corresponding level of play with three separate toy models two times each. Each toy model was approximately 30 s in length. After watching one video using the first of three toy models twice, the participant was given the same toys for 2 min to imitate what was modeled. During this time, the investigator provided the instruction, “Time to play!” No further prompts were provided; however, participants were reinforced contingent on imitation of play actions. This procedure was repeated until all three toy model videos had been shown and the participant had practiced with each set of toys for 2 min. Mastery was determined after the participant was observed engaging in that specific level of play with three differing toys not seen in the video models in the therapy room and in the classroom.

The dependent variables in the study included engagement in play actions according to the student’s level and vocalizations that related to the play actions. Both maintenance and generalization were assessed, in addition to procedural fidelity and social validity.

Based on the results of the study by Scheflen et al. (2012), the video modeling procedures were effective in teaching functional play with toys and developing language during play. A notable strength of the study was the inclusion of detailed demographic and assessment information, which may shed light on the prerequisite skills needed for point-of-view video modeling to be effective. However, the small sample size is a limitation. In addition, the authors acknowledged that the participants received intensive speech therapy during the time of the study, which may influence the interpretation of the results and the conclusions about the impact of point-of-view video modeling.

Study 4: Tereshko et al. (2010)

Tereshko et al. (2010) used a multiple baseline design to investigate the impact of point-of-view video modeling on teaching functional play skills. The study included four male preschool participants diagnosed with ASD with the Autism Diagnostic Observation Schedule (ADOS; Lord et al. 2001). At the time of the study, Mark was 6 years old and was assessed to be at 43 months with the Peabody Picture Vocabulary Test-Third Edition Form IIIB (PPVT-IIIB; Dunn 1997). Alex was 4 years of age and was assessed at 71 months with the Peabody Picture Vocabulary Test-Third Edition Form IVB (PPVT-IVB; Dunn 1997). Tom was 6 years old and was able to inconsistently identify pictures after being provided with a verbal discriminative stimulus with the PPVT-IIIB. Joe was 5 years of age and was assessed at 65 months with the Peabody Picture Vocabulary Test-Third Edition Form IIIA (PPVT-IIIA; Dunn 1997).

The sessions took place in the school’s therapy room. Pre-assessment data were collected to determine the participants’ abilities to discriminate objects, identify pictures on a computer screen, and attend to a video shown on a DVD player. Mega Bloks® were used to construct four different toy structures consisting of eight pieces each. In the baseline phase, the investigators placed a disassembled toy structure and a picture of the completed project before the participant. After the investigator directed the participant to play, no further prompts were provided. After 2 min, a non-contingent reinforcer was provided and the baseline procedures were repeated for two additional toy structures.

The point-of-view video model presented adult hands using the Mega Bloks® to construct three separate toy structures. The investigators zoomed in on specific actions to help the participant attend to the relevant stimuli. Each full model of an entire toy structure being constructed was then segmented into a response chain. The first video chain included step one only. The second video chain included steps one and two. The third video chain included steps one through three. The video response chains continued in this way until all eight steps were covered and the final product had been constructed.

Participants were first presented with the full video model. Prompts were only used to redirect the participants’ attention to the video. Baseline procedures were used during the practice session to collect data. Once the participant performed at a stable level with fewer than 50 % of the steps completed, the participant proceeded to view the segmented videos. With the video segments, the participant watched each chain and then proceeded to the practice session, which replicated the baseline procedures. The first sessions involved completing only the first step in building the toy structure. Once the participant was able to complete the first step with 100 % accuracy across two consecutive trials, the participant proceeded to the next chain until all eight steps and the toy structure were completed. Once the participant was able to follow all response chains and build the toy structure with 100 % accuracy across two consecutive sessions, the video model was removed and the participant was instructed to build the toy structure with only the picture. The participant was able to proceed to the next toy structure after building the toy structure with 100 % accuracy across two consecutive sessions.
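The segmenting and advancement rules described for this study can be sketched in a few lines. This is an illustrative reading of the procedure only: the function names and the accuracy data are hypothetical, and the step labels stand in for the eight building actions.

```python
def forward_chains(steps):
    """Return cumulative chains: [step 1], [step 1, step 2], and so on."""
    return [steps[:k] for k in range(1, len(steps) + 1)]


def session_criterion_met(accuracies, run=2):
    """Return the 1-based session at which `run` consecutive sessions
    at 100 % accuracy have been completed, or None if never reached."""
    streak = 0
    for n, accuracy in enumerate(accuracies, start=1):
        streak = streak + 1 if accuracy == 100 else 0
        if streak >= run:
            return n
    return None


# Eight placeholder steps, matching the eight-piece toy structures.
steps = [f"step {i}" for i in range(1, 9)]
chains = forward_chains(steps)

# Hypothetical accuracy (%) on one chain across five practice sessions.
met_at = session_criterion_met([50, 100, 75, 100, 100])
print(len(chains), chains[2], met_at)
```

Because each chain repeats every earlier step before adding one new step, a participant rehearses the mastered portion of the sequence on every viewing, which is the defining feature of forward chaining.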

A response blocking procedure was used for three participants to prevent repeated mistakes or attempts to reach for the incorrect Mega Bloks®. If the participant made a mistake in three out of five consecutive sessions on a single step, the investigator blocked the next incorrect response, but did not provide any prompting or redirection to the correct piece.

All sessions were recorded to allow for data collection. Tereshko et al. (2010) collected data on the construction of the toy structure and attention to the video model. The investigators also calculated interobserver agreement. In addition, generalization probes were conducted in the participants’ classrooms once the participant had reached mastery with a toy structure. A fourth toy structure was used for this generalization probe.

Tereshko et al. (2010) indicated their study demonstrated that segmenting the point-of-view video models into forward response chains was effective for teaching functional play with the Mega Bloks®. All participants were able to build all three toy structures and generalize those skills to the classroom setting. For Tom and Joe, the segmented videos were needed to build the first two toy structures. However, on the third, these two participants were able to accurately build the structure by just viewing the full video. The authors suggested that the use of chaining led to greater imitation skills and attention to relevant stimuli. The segmented videos also scaffolded learning and only allowed the participants to proceed to the next step upon mastering the previous, foundational steps.

The results of the study provide promising evidence that point-of-view video modeling, coupled with segmenting or forward response chaining, is effective in teaching children with ASD to imitate skills and play functionally. However, the use of a photograph to emphasize the final product may have affected the results of the study by providing added support and may threaten internal validity through multiple treatment interference. Additionally, the use of a photograph may not be applicable to building more social play skills or pretend play with other students, since those cannot be as concretely depicted. The authors also failed to address maintenance of skills. Nonetheless, the study by Tereshko et al. (2010) provides a different perspective on how point-of-view video modeling and forward response chaining may effectively teach play skills.

Study 5: Tetreault and Lerman (2010)

In another recently published study, Tetreault and Lerman (2010) examined the impact of point-of-view video modeling in teaching three children diagnosed with autism to initiate and maintain social interactions with others by implementing a multiple baseline design across three behaviors and three participants. The participants, who were attending a private behavior analytic services center, were diagnosed by an independent psychologist. According to the Childhood Autism Rating Scale (CARS; Schopler et al. 1988), Zhane and Janet fell within the severe range of symptomology and Randall fell within the mild–moderate range of symptomology. The Preschool Language Scale, Fourth Edition (PLS-4; Zimmerman et al. 2002) was administered to all of the participants. At the time of the study, Randall was 8 years, 2 months and his receptive and expressive language abilities were assessed to be at the age equivalent of 3 years, 4 months and 3 years, 1 month, respectively. Zhane was 5 years, 5 months and his receptive and expressive language abilities were assessed to be at the age equivalent of 2 years, 3 months and 2 years, 9 months, respectively. Janet was 4 years, 4 months and both her receptive and expressive language abilities were assessed to be at the age equivalent of 3 years, 10 months. All three participants exhibited minimal social initiations, but were able to imitate three- to four-word sentences. Prior to the study, none of the three participants had received instruction through video models.

Tetreault and Lerman (2010) selected three scripts or opportunities for the participants to initiate and maintain a social interaction that would be modeled using point-of-view video modeling, and each script included corresponding materials. The three scripts were entitled: “Get Attention,” “Request Assistance,” and “Share a Toy.” The aim of the script Get Attention was to have the participant obtain a conversant’s attention to show him or her a drawing on a dry erase board. In the Request Assistance script, the goal was to have participants ask for a closed box containing a bottle of bubbles. Lastly, the Share a Toy script asked participants to share a Mr. Potato Head® doll with a conversant and then to request it back. Each script included a form of greeting and five concrete exchanges, which the authors defined as making eye contact and a vocalization with a conversant.

All sessions were conducted in a small room at the treatment center. In the baseline phase, each participant was placed at a table containing the toys that would later be used in the point-of-view video models. The participants were informed that a conversant would leave and then enter and that they needed to play at the table with that individual. Every 10 s, the conversant would state the assigned line in the script regardless of the participant’s performance.

During the intervention phase, a portable DVD player was used to show the video models. The independent variable was the point-of-view video models, which began with a brief visual cue or transition into the video model. The video models were no more than 3 min in length and, as noted above, included the verbalized scripts of a conversation pertaining to gaining attention, seeking assistance, and sharing. In the point-of-view video models, the first author verbalized the script to be imitated by the participants but did not appear on screen, and an unfamiliar graduate student served as the conversant, who was visible in the video models. The videos also simulated head movements (e.g., nodding and making eye contact with the conversant) by moving the recording equipment accordingly during filming.

Practice sessions were conducted following the viewing of a video model and contained the same materials used in the particular video model shown. Practice sessions were recorded for data collection purposes. Greetings were scored as correct if the participant vocalized an appropriate greeting. For exchanges (i.e., eye contact and vocalization), a vocalization was scored as correct if “the child said the exact sentence from the video or a sentence that differed by no more than two words (added or deleted) from the target script” (Tetreault and Lerman 2010, p. 399). Eye contact was scored as correct if the participant was observed making eye contact with the conversant for any amount of time during the vocalization. Practice sessions mirrored the baseline procedures. Additionally, if the participant did not imitate the exchange after 10 s, the trainer provided a cue for the conversant to proceed to the next statement. This was done with the use of an index card displaying the subsequent statement, which was presented in a manner that could not be seen by the participant (Tetreault and Lerman 2010). The authors identified the mastery criterion to be any 8 out of 10 exchanges (i.e., either eye contact or vocalizations) occurring per session across three consecutive sessions.
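The vocalization scoring rule and the mastery criterion lend themselves to a short worked sketch. The code below is one possible interpretation, not the authors’ scoring procedure: it assumes that a substituted word counts as one deletion plus one addition, that case and punctuation are ignored, and that the example script line is hypothetical.

```python
import string
from difflib import SequenceMatcher


def _words(sentence):
    """Lowercase the sentence and strip punctuation from each word."""
    cleaned = (w.strip(string.punctuation) for w in sentence.lower().split())
    return [w for w in cleaned if w]


def words_changed(target, spoken):
    """Count words added to or deleted from the target script line."""
    a, b = _words(target), _words(spoken)
    changed = 0
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changed += (i2 - i1) + (j2 - j1)  # deleted + added words
    return changed


def vocalization_correct(target, spoken, tolerance=2):
    """Correct if the utterance differs by no more than `tolerance` words."""
    return words_changed(target, spoken) <= tolerance


def mastery_reached(correct_per_session, required=8, run=3):
    """True once `run` consecutive sessions each have at least
    `required` correct responses (out of 10 scored opportunities)."""
    streak = 0
    for count in correct_per_session:
        streak = streak + 1 if count >= required else 0
        if streak >= run:
            return True
    return False


target_line = "Look at my drawing"  # hypothetical script line
print(vocalization_correct(target_line, "Look at my nice drawing!"))
print(vocalization_correct(target_line, "Look"))
print(mastery_reached([6, 8, 9, 10]))
```

Under these assumptions, an utterance with one extra word is scored as correct, whereas dropping three of the four target words is not.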

At the beginning of the intervention phase, the first author provided reinforcement contingent on attention to the video model. During the practice sessions, reinforcement was provided to the participant if they engaged in the scripted exchange. Only one participant, Janet, began to speak with the first author and not the graduate student serving as the conversant. The authors believed this was due to her associating the first author with the reinforcers. Therefore, the authors decided to remove the reinforcers and only provide the video models. When Janet was unable to reach mastery in the previous two phases of the intervention, the authors provided least-to-most prompting when Janet did not engage in an exchange after 10 s.

In order to probe for generalization, the materials initially used were replaced with similar items, such as Play-Doh®, a screw-top plastic container, or a toy bus for the respective scripts. In both the generalization and maintenance phases, the authors returned to baseline procedures.

Tetreault and Lerman (2010) concluded that the impact of point-of-view video modeling on initiating and maintaining social interactions with a conversant was unconvincing. Each participant required some level of additional support through reinforcement or prompting or some modification to the script, thus making the results of the study difficult to interpret. The authors indicated the intervention was most successful in increasing and generalizing eye contact among the participants. However, for the vocal exchanges, the authors believed that they were not as concrete or easily discernable as the video movements mimicking eye contact with a conversant. Generalization was minimal for the three participants. The authors also stated the inconclusive results of their study may be in part due to the complexity of the targeted social skills, which have not been studied in the past. This emphasizes the need for further research to better understand the effectiveness of point-of-view video modeling in teaching more complex social skills.

Although the targeted skills, the procedures, and the outcomes of these studies varied, one clear theme emerged: students with ASD, to some degree, showed improvements in social skills following point-of-view video modeling. However, the research pertaining to this intervention had several weaknesses. Small sample sizes, although typical for single-subject studies, were a common weakness. In addition, where generalization or maintenance of the acquired skills was assessed, little evidence of either was found. Likewise, the authors of several studies pointed out the need for future research. A more comprehensive discussion of the reviewed studies is provided in the following section.

Discussion

The purpose of this literature review was to investigate the impact of point-of-view video modeling on social skills and to find answers to the guiding questions outlined earlier despite limited research in this area. The five articles which did address this form of VM provide inconclusive results on the effectiveness of this intervention. However, the limited research does provide a foundation for teaching social skills to students with ASD, and a number of noteworthy points may be gleaned from the review of the literature.

Guiding Questions

The existing literature collectively supports the effectiveness of point-of-view video modeling in teaching play skills and social skills to children with ASD. However, questions still remain as to the breadth of this intervention’s impact on teaching the complexities of social skills to this population. Of the five studies, studies 1 through 4 (Hine and Wolery 2006; Sancho et al. 2010; Scheflen et al. 2012; Tereshko et al. 2010) targeted solitary play and study 5 (Tetreault and Lerman 2010) targeted social play (i.e., initiating and maintaining social interaction). The four studies which focused on solitary play also targeted more simple functional play skills (i.e., playing with gardening and cooking sets, play with character toys, building toy structures) instead of more complex play skills that are more reciprocal and cooperative in nature (e.g., role-playing, dress-up games with peers). However, it could be reasonably contended that study 4, which was conducted by Tereshko et al. (2010), differs from its other counterparts due to the level of concrete structure or requirements of the point-of-view video model. Nonetheless, the article focuses on functional play skills that may lead to greater socialization with other peers through the appropriate usage of the building blocks included in the study.

Studies 2 and 5, which were conducted by Sancho et al. (2010) and Tetreault and Lerman (2010), did include more complex play and social skills. Sancho et al. targeted both scripted play actions and vocalizations with a play set and corresponding toys. Tetreault and Lerman used social scripts to target gaining a conversant’s attention, seeking help, and sharing a toy with another. However, it is still difficult to determine how effective point-of-view video modeling is in teaching more complex social skills. The studies by Sancho et al. and Tetreault and Lerman showed mixed results and, overall, minimal evidence of both generalization and maintenance. It is also important to note that the ability to engage in simple functional skills is necessary before progressing to complex social skills. Sancho et al. mention that the two participants did not engage in imaginative play, and Tetreault and Lerman do not specifically address the participants’ simple social skills. Therefore, the varied results of these studies may be due to the incomplete examination of simple social or play skills as prerequisite skills during the sampling of participants.

Through pre-assessments, observations, and parent and teacher reports, the authors of these studies attempted to determine the prerequisites required for a child to be ideal for point-of-view video modeling. In study 1, Hine and Wolery (2006) identified whether the participants were capable of imitating simple actions observed from adults or materials. Studies 2 and 4 (Sancho et al. 2010; Tereshko et al. 2010) determined whether participants could attend to a video or television. Additionally, Tereshko et al. (2010) assessed the participants’ abilities to discriminate objects and identify pictures on a computer screen. In study 5, Tetreault and Lerman (2010) identified participants’ receptive and expressive language, and in study 3 Scheflen et al. (2012) provided detailed demographic and assessment data. The studies included in the review of literature used a number of different assessments to determine the appropriateness of the intervention, and it still remains unclear whether the prerequisite skills assessed in these studies had a positive or negative impact on the concluding results.

The length of a video model is important in helping an individual with ASD attend to the video and may facilitate imitation of the targeted behavior or skill. Studies 1 and 2 (Hine and Wolery 2006; Sancho et al. 2010) used videos no more than 2 min in length and study 5 (Tetreault and Lerman 2010) included videos no more than 3 min in length. However, studies 3 and 4 (Scheflen et al. 2012; Tereshko et al. 2010) did not clearly report the length of their videos. Several studies were also unclear about the number of times the participant viewed the video models in a single session. However, it was clear that repeated viewings of the video models were necessary to facilitate skill acquisition.

Prompting and reinforcement were used in all five studies. However, only studies 2 and 5 (Sancho et al. 2010; Tetreault and Lerman 2010) included procedures to fade prompting and increase independent functioning. Additionally, the studies included varied results in regard to both generalization and maintenance. Study 1 (Hine and Wolery 2006) showed generalization with one set of toys and study 5 (Tetreault and Lerman 2010) showed generalization only with making eye contact. Additionally, only study 2 (Sancho et al. 2010) showed positive results for maintenance.

These five studies provide preliminary research demonstrating the potential effectiveness of point-of-view video modeling. However, a number of questions remain unanswered and future research continues to be warranted.

Implications for Future Research

The literature provides limited research on how point-of-view video modeling may teach more complex social skills that include social communication and interaction with other adults or peers. In addition, there is little research on whether learning such skills through this form of VM may lead to unscripted play behavior and communication. Future research should be conducted on the extent to which point-of-view video modeling can teach social play and how this evidence-based intervention may further develop unscripted and novel play.

The studies included in this review also address a number of prerequisites that may aid in identifying whether point-of-view video modeling is an effective intervention for an individual with ASD. One potential skill a child may need to have in his or her repertoire is the ability to attend to video shown on a computer screen, portable DVD player, or television. However, McCoy and Hermansen (2007) and Plavnick (2012) stated that it remains inconclusive as to whether there is a relationship between the ability to attend to a video and the imitation of the skill or behavior being targeted in the video model. It is also unclear what verbal skills an individual must have to imitate vocalizations from video models. Therefore, more research needs to be conducted to identify the optimal characteristics of an individual with ASD for point-of-view video modeling to be a viable intervention in teaching play and social skills. To facilitate this, Mason et al. (2013) stated that future research should also include more detailed diagnostic and assessment information on each participant.

Additional research comparing the use of video priming and prompting, which was only minimally addressed by study 2 (Sancho et al. 2010), is still needed. In addition, future research must be conducted to determine the appropriate length of a video model and the number of times a participant should view the model before practicing the targeted skill or behavior. The results of the literature review do not shed any conclusive light on this matter.

Several studies employed unique video editing to further facilitate skill acquisition. In study 2, Sancho et al. (2010) filmed the video models from the first-person perspective and three additional angles. In study 4, Tereshko et al. (2010) zoomed in on relevant actions and visual stimuli to ensure participants attended to specific details of building a toy structure. In study 5, Tetreault and Lerman (2010) mimicked head nodding and the making of eye contact. Studies 1 and 5 (Hine and Wolery 2006; Tetreault and Lerman 2010) used a visual cue before presenting the video model to gain the attention of the participant. It is not clear whether these differences in the video models led to positive results; therefore, further research should investigate when such edits to the video models are warranted.

All of the included studies in the literature review were coupled with both reinforcement and prompting. This consistency among the studies highlights the potential need to provide specific reinforcement and praise to promote skill acquisition. However, future research must provide procedures to fade reinforcement and prompting. Addressing this may also lead to further generalization and maintenance, which are both areas that continue to require future research to better show evidence of the effectiveness of point-of-view video modeling.