Augmentative and alternative communication (AAC) systems are often used by individuals with developmental disabilities who have complex communication needs (CCN) to support the development of communicative repertoires. It is estimated that approximately 33% of individuals with intellectual and developmental disabilities (Blackwell et al., 1989; Ganz et al., 2017) and 30% of individuals with autism spectrum disorder (ASD; Wodka et al., 2013) do not fully develop functional vocal speech. Thus, the use of AAC systems is often recommended to support the development of their communication repertoires (Beukelman & Mirenda, 2005; Ganz et al., 2012).

AAC includes an array of options, ranging from unaided AAC, such as manual sign or gestures, to aided AAC, such as picture exchange and speech-generating devices (SGDs). Both aided and unaided AAC systems have been effectively used to establish basic communication skills (Ganz, 2015; Gevarter et al., 2013; Schlosser & Koul, 2015). AAC is listed as an evidence-based practice (EBP) by the National Clearinghouse on Autism Evidence and Practice (Steinbrenner et al., 2020) and considered an emerging practice by the National Standards Project, Phase 2 (Wong et al., 2015).

In the last decade, there has been a substantial increase in the availability of portable electronic aided AAC devices (speech-generating devices), such as tablets (e.g., iPad) and portable touch-screen devices (e.g., iPod), relative to non-electronic AAC options (e.g., PECS; Lorah et al., 2022; Shane et al., 2012; van der Meer & Rispoli, 2010). These trends parallel changes in the use of technology in society as a whole; thus, it is not surprising to see increases in research utilizing smart devices in communication-based interventions (Achmadi et al., 2012; Kagohara et al., 2012; van der Meer et al., 2012, 2013).

To date, several researchers have conducted reviews of the literature on AAC interventions (see Crowe et al., 2021; Lorah et al., 2022; Morin et al., 2018; Schlosser & Koul, 2015) to evaluate the body of literature and quality of the evidence. For example, Schlosser and Koul (2015) provided a scoping review of AAC research to evaluate the effectiveness of interventions, identify gaps in the literature, and provide suggestions for future research. Their primary findings indicated a robust body of high-quality studies aimed at teaching requesting skills using speech output technologies. Similarly, Morin et al. (2018) conducted a review of the literature focused on appraising the quality of single-case research on high-tech AAC interventions. Findings of this review indicated implementation of high-tech AAC may be considered an EBP for individuals with autism or intellectual disabilities who have CCN.

In contrast, Crowe et al. (2021) conducted a mega-review of literature reviews, systematic reviews, and meta-analyses, published from 2000 to 2020, that used AAC interventions with children with disabilities. The MeaSurement Tool to Assess Systematic Reviews Revised (AMSTAR 2; Shea et al., 2017) was used to assess the rigor of the 84 reviews selected for inclusion. Overall, the authors noted that although slight increases in methodological rigor have occurred, several methodological weaknesses continue to exist in the body of literature that purports to support AAC interventions as evidence-based for individuals with developmental disabilities and CCN. However, these reviews were each limited in their analysis of the intervention components used to teach the use of AAC systems and in their analysis of the system components (e.g., vocabulary used, display settings, features).

More recently, Lorah et al. (2022) conducted a systematic review of evidence-based teaching procedures for individuals with autism using mobile AAC systems. In this synthesis, 38 studies were reviewed, and findings indicated the most frequently used evidence-based teaching strategies were prompting (N = 38), time delay (N = 32), reinforcement (N = 24), and differential reinforcement (N = 14). However, similar to the previously mentioned reviews, an analysis of the type of vocabulary used within the interventions was not included.

To date, most of the research on procedures to teach the use of AAC systems has evaluated learning to request (also known as manding; Skinner, 1957) preferred items using nouns (e.g., foods, object names, important people; Ganz, 2015; Mirenda, 2017; Schlosser & Koul, 2015). This approach aligns with patterns observed in early development of expressive communication, especially in the development of English speakers (Bloom, 2000; Fenson et al., 2007). Given the importance of establishing requesting skills for a functional repertoire (Carnett et al., 2019; Sundberg & Michael, 2001), this may simply be a result of the relevant research considering the needs of the population with which it has mostly been conducted; namely children with developmental disabilities.

More recently, research has extended to investigate the type of vocabulary used within interventions for AAC. Specifically, studies have incorporated the use of core vocabulary based AAC systems for individuals with developmental disabilities (Laubscher & Light, 2020). Core vocabulary is defined in the literature as the “lexical items which are accepted as being central and indispensable to language use” (Bell, 2012, p. 1). This vocabulary includes more general words that can be used for multiple purposes and contexts (e.g., A, GO, KNOW, WITH). Laubscher and Light (2020) provided a narrative review of the literature to evaluate core vocabulary lists for young children using AAC systems. Specifically, the authors evaluated five studies, only one of which included participants who were classified as having CCN. Their main findings suggested that many of the categories of words that predominate early vocabulary development were under-represented in the core vocabulary, and thus may not be an appropriate guide to AAC vocabulary selection for early learners of symbolic (picture/icon-based) communication. Although these findings have important implications for AAC users, a systematic review of interventions that utilize core vocabulary in the context of AAC systems for individuals with CCN has yet to be conducted. Likewise, the quality and rigor of this body of literature has yet to be evaluated. Thus, there are two aims of the current review: (a) to synthesize and analyze the current research, and (b) to evaluate the quality of research on this topic to provide guidance on interventions that incorporate core vocabulary within AAC systems.

Method

Search Procedures

Systematic searches were conducted using the following databases: Education Resources Information Center (ERIC), PsycINFO, Linguistic and Language Behavior Abstracts (LLBA), and ProQuest Dissertations. Database searches were limited to English-language peer-reviewed journals, except ProQuest Dissertations. The first author searched each database using the combined search terms: “core word*” or “core vocabulary” or “Makaton vocabulary” or “LAMP” and “auti*” or “ASD” or “cerebral palsy” or “communication disorder” or “complex communication needs” or “deaf” or “developmental delay” or “developmental disability” or “disability” or “down syndrome” or “minimally verbal” or “nonverbal” or “intellectual disability” or “speech disorder” and “treatment” or “teach*” or “therapy” or “intervention”. The database yields were then independently screened by abstract to determine consideration for full screening. Duplicates, reviews, and concept papers were removed; however, dissertations were included because intervention research on this topic is relatively recent. In total, 94 articles were examined to determine initial inclusion for full analysis. Reference checks were conducted for each article that met the inclusion criteria to identify any additional relevant studies. A total of 23 studies from the initial database searches and eight studies from the hand searches were then identified for further screening. Lastly, the first author conducted targeted searches of relevant journals from the last two years to account for any additional relevant studies that did not include the identified key terms (see Fig. 1).

Fig. 1 Search graphic

Screening and Inclusion Criteria

The first three authors then completed a second full screening process, in which the 31 studies were screened to determine if the study met the following inclusion criteria: (a) included at least one participant with a diagnosis of a disability that warrants a communication intervention (e.g., ASD, developmental disability, communication disorder), (b) included an intervention in which an independent variable and a dependent variable could be identified, and (c) included the use of core vocabulary. Full article screening involved checking each article for the inclusion components previously listed. For example, if the article described the use of core vocabulary as a dependent variable, it was included. Application of these inclusion criteria resulted in 10 studies being included in this review (see Table 1). Agreement for the application of the inclusion or exclusion criteria was obtained for 29 out of the 31 studies (94%). For articles in which there was disagreement, the third author reviewed the study for consensus to be reached.

Table 1 Summary of reviewed literature

Data Extraction

Data were extracted for the 10 included studies and summarized in terms of the following variables: (a) participant characteristics, (b) dependent variables, (c) intervention components, (d) research design, (e) rigor, and (f) study outcomes. Data were extracted to a summary table by the first author and checked for accuracy by the second and third authors.

Similar to other published systematic reviews, each study was evaluated for overall effects (certainty of evidence) and classified as positive, mixed, or negative based on visual analysis of graphed results for single case studies (Gast & Ledford, 2009), and for group designs (Davis et al., 2013; Lang et al., 2012). Studies were coded as yielding positive results if all participants showed increases (above baseline levels) on all dependent variables or if statistically significant increases were shown for all dependent variables in a group design. Studies were coded as mixed if some participants showed increases and others did not or if increases were found in some dependent variables, but not all. For group design coding, mixed results were identified if some increases in some of the dependent variables were statistically significant, but others failed to reach statistical significance. Studies were coded as yielding negative results if participants did not show increases on any of the dependent variables or if a group design failed to yield statistically significant improvement in any of the dependent variables.
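The classification rules above amount to a simple three-way decision procedure. The sketch below illustrates the logic for single-case data (a hypothetical illustration; the function and variable names are ours, not part of any published coding manual):

```python
def classify_outcome(gains):
    """Classify a study's certainty of evidence from per-participant,
    per-dependent-variable results.

    gains: list of booleans, one entry per participant/DV combination,
    True if increases above baseline levels were observed.
    """
    if all(gains):
        return "positive"   # every participant improved on every DV
    if any(gains):
        return "mixed"      # some improved, others did not
    return "negative"       # no increases on any DV
```

For group designs, the same structure applies with “statistically significant increase” substituted for “increase above baseline levels.”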

Additionally, a quality review of each study was completed using the criteria defined by the Council for Exceptional Children (CEC) Evidence-Based Practices (EBP) workgroup (Cook et al., 2015a, b). Specifically, there are eight dimensions included for a study to be identified as high-quality (i.e., context and setting, participants, intervention agents, description of practice, implementation fidelity, internal validity, measurement, and data analysis). The CEC quality standards were selected given the population of focus for this review and because most of the studies took place within school settings.

Results

Table 1 provides a summary of each study.

Participants and Settings

A total of 67 participants with a variety of diagnoses (e.g., ASD, developmental disability, intellectual disability) were included; the most frequently reported diagnosis was ASD (63%). Of the studies in which gender was reported (8 studies), 80% (N = 35) of participants were male and 20% (N = 9) were female; however, it should be noted that gender was not reported for 23 (33%) of the participants (i.e., Lal, 2010; Riccelli-Sherman, 2017). Participant age ranged from 3 to 12 years; however, a mean age could not be determined given the lack of specific participant demographic data. Some information on communication abilities was provided, but most studies relied on anecdotal information or Individualized Education Program (IEP) information. Two studies provided assessment information (e.g., Mullen Scales of Early Learning, Verbal Behavior Milestones Assessment and Placement Program, Vineland Adaptive Behavior Scales) related to communication abilities (i.e., Karnes, 2019; Tan et al., 2014), and three studies used established pre- and post-assessments (e.g., Communication Matrix, Language Assessment Test for Children with Autism).

For settings, 80% (N = 8) of the included studies conducted sessions within school settings, and two studies (i.e., Karnes, 2019; Tan et al., 2014) conducted sessions in a clinical setting.

Communication Systems

Table 2 provides a summary of the AAC systems used and related information. For each of the included studies, the dependent variable involved an AAC system, such as picture exchange (PE), manual sign, an aided language board, or a speech-generating device (SGD). For all studies, participants’ target behavior included the use of an AAC system in conjunction with core vocabulary communication targets. Within the reviewed studies, the most common modality was SGD-based systems (e.g., Accent, VantageLite, Proloquo2Go), which were used in four (40%) of the studies as the specific communication device (i.e., Bedwani et al., 2015; Hammond, 2017; Karnes, 2019; Mason, 2016). The second most common modality was an exchange-based system (e.g., picture/symbol exchange, Picture Exchange Communication System [PECS; Bondy & Frost, 1994], or tactile symbol exchange), which was used in three (30%) of the studies (i.e., Dorney & Erickson, 2019; Snodgrass et al., 2013; Willis, 2020). The remaining studies used manual sign (i.e., Lal, 2010; Tan et al., 2014) and communication boards (i.e., Dorney & Erickson, 2019; Riccelli-Sherman, 2017). Of note, Dorney and Erickson (2019) reported a variety of AAC systems.

Table 2 Summary of AAC systems and communication targets

For studies in which SGDs, communication boards, and PECS systems were used, most provided little to no information on the grid organization and number of symbols on the grid. Of the two studies that provided grid display information, Dorney and Erickson (2019) specified that a total of 65 symbols (54 core vocabulary symbols and 5 color symbols) were used. Karnes (2019) specified that 60 symbols were presented in the grid display, but the study included three phases of stimulus fading (i.e., Phase 1 included 1 symbol; Phase 2 included 1 symbol with 59 dimmed symbols; Phase 3 included 60 symbols).

Dependent Variables

All the studies included in this review used core vocabulary communication targets as part of the dependent variables (DVs). Half of the studies included teaching only core vocabulary words; however, only 60% of the studies reported the specific words targeted during intervention sessions. Of the studies that used both core and fringe vocabulary (N = 5), only two provided information on the actual vocabulary words (i.e., Hammond, 2017; Lal, 2010). Of the studies that included only core vocabulary (N = 5), four provided information on the specific words targeted.

The most commonly reported DV was pre/post measures of communication (i.e., vocabulary use or formal assessment), seen in 60% of the studies. The second most common DV reported was direct measurement of communication (e.g., frequency, rate, percentage of independent responses), seen in 50% of the studies. Of note, 30% of the studies included frequency logs of word use (i.e., Bedwani et al., 2015; Hammond, 2017; Mason, 2016).

Independent Variables

Table 3 provides a breakdown of intervention components for the studies. A variety of evidence-based teaching procedures (Steinbrenner et al., 2020) were reported across the included studies. Most (90%) of the studies used a combination of intervention procedures. The most frequently reported component was prompting, used in seven studies (70%), followed by modeling, naturalistic instruction, and reinforcement, each used in five studies (50%). The least frequently used components were time delay and differential reinforcement, each used in one study (10%).

Table 3 Summary of intervention components

Research Designs

Of the 10 included studies, six (60%) employed single-case research designs (e.g., multiple baseline design; Bedwani et al., 2015; Hammond, 2017; Karnes, 2019; Mason, 2016; Tan et al., 2014), and four (40%) utilized group designs (e.g., quasi-experimental, convergent mixed-method design; i.e., Dorney & Erickson, 2019; Lal, 2010; Riccelli-Sherman, 2017).

Interobserver Agreement and Procedural Integrity

Four studies (40%) (i.e., Dorney & Erickson, 2019; Hammond, 2017; Snodgrass et al., 2013; Tan et al., 2014) reported interobserver agreement (IOA). For each of these studies, IOA was calculated for at least 20% of the sessions (range: 20% to 37%), with mean agreement at or above 87% (range: 87% to 100%). Four of the studies (i.e., Hammond, 2017; Karnes, 2019; Snodgrass et al., 2013; Tan et al., 2014) reported treatment fidelity data. In Tan et al., treatment fidelity data were collected for 20% of the sessions; in Snodgrass et al., fidelity data were collected for 30% of sessions. In the Hammond study, treatment fidelity data were collected for only an average of 12 trials per participant (out of an average of 680 trials per participant), which is generally considered insufficient. Fidelity was reported at or above 95% for Hammond and Snodgrass et al.; however, in Tan et al., fidelity was collected using a 5-point Likert scale (e.g., 1 = strongly disagree, 5 = strongly agree), which averaged 3.7 (74%). One study (i.e., Karnes, 2019) reported a variation of the standard treatment fidelity method, in that a self-assessment score was reported. Lastly, Mason reported that fidelity data were coded from four video-recorded sessions for each participant (out of an average of 53 sessions); however, the author reported only anecdotal results.
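The percentage figures above follow standard formulas: point-by-point percentage agreement for IOA, and a Likert mean expressed as a proportion of the scale maximum. A minimal sketch, with helper names of our own choosing:

```python
def percent_agreement(n_agreements, n_total):
    # Interobserver agreement: agreements divided by total
    # scored opportunities, expressed as a percentage
    return n_agreements / n_total * 100

def likert_to_percent(mean_rating, scale_max=5):
    # Express a Likert-scale mean as a percentage of the scale maximum
    return mean_rating / scale_max * 100

# e.g., a mean rating of 3.7 on a 5-point scale corresponds to 74%,
# matching the fidelity figure reported for Tan et al.
print(round(likert_to_percent(3.7)))  # 74
```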

Study Outcomes

The majority of the reviewed studies (60%) were coded as showing mixed results (i.e., Bedwani et al., 2015; Dorney & Erickson, 2019; Hammond, 2017; Mason, 2016; Tan et al., 2014; Willis, 2020) when applying the certainty of evidence standards (Davis et al., 2013; Lang et al., 2012). The remaining four studies were each coded as showing positive results (i.e., Karnes, 2019; Lal, 2010; Riccelli-Sherman, 2017; Snodgrass et al., 2013). Additionally, the majority of the studies (70%) did not collect generalization or maintenance data. Hammond reported both generalization and maintenance results; however, there was no clear evidence of substantial generalization, and maintenance effects were mixed. Mason reported generalization; however, there was no clear evidence of substantial generalization. Snodgrass et al. also reported data for both generalization and maintenance, which were coded as mixed results because direct training was needed for some of the targets and stability was not observed until the end of the study. Additionally, Tan et al. reported both maintenance and generalization, each of which was scored as mixed. For generalization, only two out of three participants continued to engage in core signs across toy sets, and for maintenance, two participants showed decreases in their rates of responding.

Quality Indicators

Each article was also evaluated for quality using the CEC quality standards across the eight dimensions specified for high-quality studies (see Figs. 2 and 3 for percentage scores). Coding definitions were derived from the descriptions provided by Cook et al. (2015a, b) for both single-case and group designs. Of the reviewed studies, three (i.e., Karnes, 2019; Snodgrass et al., 2013; Tan et al., 2014) scored 90% or better within the eight dimensions of quality indicators, two studies (i.e., Hammond, 2017; Riccelli-Sherman, 2017) scored 80% or better, and the five remaining studies (i.e., Bedwani et al., 2015; Dorney & Erickson, 2019; Lal, 2010; Mason, 2016; Willis, 2020) scored 60% or less (range: 41% to 58%). For a specific breakdown of the scores, see Tables 4 and 5.

Fig. 2 Percentage of CEC quality standards included for single-case studies. Note. *Denotes doctoral dissertation

Fig. 3 Percentage of CEC quality standards included for group studies. Note. *Denotes doctoral dissertation

Table 4 Item analysis of CEC quality standards and percentage included for single-case studies
Table 5 Item analysis of CEC quality standards and percentage included for group studies

Discussion

This systematic review identified and summarized 10 studies that taught the use of core vocabulary within an AAC intervention to individuals with developmental disabilities. Although various systems (i.e., manual sign, PE, SGDs) were used in the reviewed studies, each included the use of core vocabulary as a dependent variable.

The first aim of the current review was to synthesize the literature to evaluate the evidence base for interventions to teach core vocabulary to individuals with language delays and developmental disabilities. The current findings indicate a very limited research base for AAC interventions aimed at teaching core vocabulary. Specifically, of the 10 reviewed studies, only five were peer-reviewed journal articles, which highlights the need for further research, given the popularity of core vocabulary based AAC systems in practice (AssistiveWare, 2023; Brydon & Pretorius, 2021; Thistle & Wilkinson, 2015; Tobii Dynavox Global, 2023).

The second aim of this systematic review was to evaluate the quality of the literature. For the 10 studies that met the inclusion criteria, only five studies met 80% or more of the CEC quality standards. The CEC workgroup specified that 100% of the criteria should be met to achieve acceptable quality standards for research supporting evidence-based practices (Cook et al., 2015a, b). Thus, these findings indicate an unacceptable level of rigor and methodological quality based on current quality standards in the field (Reichow et al., 2008; What Works Clearinghouse, 2017). Moreover, these results indicate limits of the generality and utility of interventions that aim to teach core vocabulary to AAC users.

Further, articles that had higher quality indicator scores (i.e., Karnes, 2019; Snodgrass et al., 2013; Tan et al., 2014) employed three or more evidence-based intervention components (e.g., stimulus fading, systematic prompting, task analysis, reinforcement). In contrast, studies with the lowest quality indicators (Bedwani et al., 2015; Lal, 2010; Mason, 2016; Willis, 2020) each had issues with implementation fidelity, internal validity, measurement, and data analysis, which likely account for the mixed results reported. Overall, given the lack of clarity of effect, future high-quality research is needed to help establish best practice standards for the procedures selected to teach individuals to use AAC systems. Doing so would help practitioners and teachers more effectively support individuals learning to use AAC systems.

Additionally, although all the articles included in this review used at least one evidence-based teaching component, it should also be noted that some studies included non-evidence-based practices, such as motor planning (e.g., Bedwani et al., 2015; Karnes, 2019; Mason, 2016). It is unclear what impact these procedures have on the acquisition of core vocabulary use, since these studies did not include a specific dependent variable that measured motor behavior related to AAC use (e.g., activation precision, activation error analysis). These findings are consistent with previous research, since, to date, there is a lack of evidence specific to motor learning in research on AAC (Thistle & Wilkinson, 2015). Although AAC is classified as an evidence-based practice (Steinbrenner et al., 2020; Wong et al., 2015), it should also be emphasized that AAC systems are a set of tools that require the use of evidence-based teaching procedures to achieve their purpose. Simply providing access to an AAC system may not rise to the level of evidence-based practice if the intervention procedures for teaching the use of the system do not have sufficient research showing their effectiveness (Ledford et al., 2021). Thus, these findings also highlight the importance of selecting evidence-based teaching procedures (e.g., systematic prompting, time delay, reinforcement) based on individual needs.

Lastly, these findings also indicate that caution should be used until scientifically sound demonstrations of effectiveness have been published and replicated sufficiently to meet the requirements specified in evidence-based practice standards (Horner et al., 2005). When coupled with the mixed findings from the analysis of certainty of evidence, the outcome of the current review indicates lack of evidence.

Implications

In light of the current findings, discussion of the implications for future research and practice is warranted. First, more research is needed to evaluate the outcomes of core vocabulary targets within AAC interventions. The limited current body of literature on AAC interventions that incorporate core vocabulary targets does not allow for firm conclusions to be made about the characteristics of learners who might benefit from these types of interventions. This limitation is compounded by the lack of rigor seen in the reviewed studies. Thus, more research is needed, and studies should focus on the inclusion of quality standards for designing and conducting research (Cook et al., 2015a, b; Ganz et al., 2023; Reichow et al., 2008; What Works Clearinghouse, 2017).

Further, although half of the studies included in this review incorporated only core vocabulary targets, research indicates the need to include both core and fringe vocabulary based on the individualized needs of the learner (Laubscher & Light, 2020). Thus, practitioners should consider current recommendations, such as selecting individualized communication targets and teaching a variety of vocabulary types, until research provides further clarity on selecting vocabulary targets. Utilizing a standard set of core vocabulary may not account for prerequisite skills, individual preferences, or current communication needs (Laubscher & Light, 2020; Thistle & Wilkinson, 2015). Additionally, only half of the studies in this review included the use of a formal communication assessment to identify the participants’ current abilities and needs; moreover, a specific rationale for the vocabulary selection was not provided in any of the reviewed studies. In future research, it will be important to conduct a thorough communication assessment that evaluates current functioning levels, abilities, and preferences prior to selecting communication targets.

Lastly, a variety of evidence-based teaching components were used across the included studies; however, the rationale for selecting each component was unclear, which may be important when considering the types of prompts selected for an intervention (e.g., model, hand-over-hand, stimulus prompting). For example, some of the studies included in this review provided access to a variety of core words at the start of the intervention, whereas other studies used a more systematic approach that involved teaching one core word at a time before introducing new targets. Further, one study (Karnes, 2019) included in this review used a systematic teaching approach that not only taught one core word at a time, but also involved three phases of stimulus fading (e.g., hiding the other vocabulary and slowly making it visible), which might have impacted the levels of vocabulary acquisition achieved. For current practice, this review is consistent with previous research, which highlights the need to include effective evidence-based teaching strategies and to consider individual learner profiles when selecting components for teaching (Ganz, 2015; Light et al., 2019; Lorah et al., 2022; McNaughton et al., 2019).

Taken as a whole, these findings highlight the need to consider how best to individualize AAC interventions, including the vocabulary taught and the teaching components used, to ensure that AAC interventions meet the unique needs of each individual.

Limitations and Future Directions

Currently, it is unclear what learner prerequisite skills are needed and what considerations should be made when selecting core vocabulary words for AAC users. Previous research has indicated a just-in-time approach may be useful for vocabulary expansion (Laubscher & Light, 2020; Light & McNaughton, 2012; Schlosser et al., 2016). Thus, future research should evaluate this approach for targeting core vocabulary.

Additionally, this review is limited by the lack of information provided on participants and vocabulary selection, which prevented in-depth analysis of these specific variables. Further, since most of the studies had mixed findings, it is unclear what intervention components accounted for gains in learner acquisition. Rigorous future research is needed that provides more in-depth and assessment-based participant information to help understand what intervention components best complement learner characteristics and communication needs.

Although some of the studies included in this review reported gains in the use of core vocabulary, it is unclear how these gains might compare to individualized vocabulary targets. Thus, future research is needed to help evaluate acquisition, frequency of use, and maintenance of communication skills learned through AAC interventions that target core vocabulary.

Lastly, our analysis is limited in that the current version of the CEC quality standards only allows for analysis in terms of the presence or absence of individual study qualities and does not specify the relative importance of individual items for determining study rigor. For example, items such as design and within-study variability may be more important indicators for establishing strong evidence for experimental control and the establishment of functional relations relative to some other items (Zimmerman et al., 2018). However, establishing the importance of the individual quality indicators is outside the scope of this review, and future research is needed to analyze the relative importance of study features in determining the overall quality of research studies. Because the current review included a very small number of studies meeting high quality standards, this kind of analysis would be inconclusive. Additional research with a larger set of published studies is needed to provide guidance on predictive study features.

Conclusion

In sum, this review summarizes the current research on teaching core vocabulary within AAC interventions and highlights the need for more rigorous research that aligns with relevant quality standards. As research continues to progress, these areas of concern should be prioritized to ensure that services aimed at teaching the use of AAC systems are grounded in evidence-based practices and tailored to the specific needs of the learner.