In researching and developing intervention programs to establish early repertoires of language and cognition, behavior analysts have a long and rich empirical history to draw upon, with work on addressing social and communicative behavior going back to the inception of applied behavior analysis (e.g., Hart & Risley, 1968; Risley, 1968; Schreibman & Carr, 1978; Schumaker & Sherman, 1970; Wheeler & Sulzer, 1970; Wolf et al., 1964). The output from this research was incorporated into comprehensive intervention programs (Lovaas, 1987) and has evolved over decades to become the collection of tactics referred to as “progressive discrete trial teaching” (Leaf et al., 2016, 2022). Additional programming recommendations and assessments have been developed based on Skinner’s (1957) analysis of verbal behavior (VB; e.g., Barbera & Rasmussen, 2007; Partington, 2006; Sundberg, 2008, 2016; Sundberg & Michael, 2001; Sundberg & Partington, 1998/2010). These programs use Skinner’s (1957) analysis and classification of verbal operants, emphasize the analysis of motivating operations during teaching (Carbone, 2013; Sundberg, 2004), and include the use of natural environment training as an important focus of programming (Sundberg & Partington, 1999). Teaching approaches based on verbal behavior development theory (Greer & Ross, 2008; Greer & Speckman, 2009) have also emerged from empirical work focused on establishing the integration of speaker–listener repertoires as a critical learned foundation (“cusp”) for generative language.
In addition, over the last several decades, relational frame theory (RFT; Dymond & Roche, 2013; Hayes et al., 2001) has amassed overwhelming evidence implicating relational responding in a number of phenomena including perspective-taking, academic performance, and psychological well-being (see, e.g., Stewart, 2016), and this evidence has converged in a relational account of language and cognition that in recent years has been applied to a range of educational and intervention programs (e.g., Barron et al., 2019; Cassidy et al., 2011; Cassidy et al., 2016; Colbert et al., 2018; Hayes & Stewart, 2016; Paliliunas et al., 2022).

Skinner’s analysis of verbal behavior and RFT have historically been viewed as theoretically in conflict (see Gross & Fox, 2009, for an overview of this controversy). However, we have seen that there are many points of overlap as these theories have been applied (e.g., see Sivaraman et al., 2023, for a discussion of the similarities and differences between verbal behavior development theory and RFT). Moreover, as RFT-based curricula for language intervention programs (e.g., Dixon, 2015, 2016; Ming et al., 2019, 2022) have become available, practitioners have shown increasing interest in how to apply RFT within behavior analytic programs for autistic children and other individuals with deficits in generative language repertoires. We (the authors) have focused in our research and practice on the development of early intervention programs that specifically promote generative language. In the current article, we focus on this research and the related recommendations, and in particular on integrating RFT with the Skinnerian verbal behavior perspective and with the collection of programming recommendations based on that analysis, commonly known as the “verbal behavior approach” (Barbera & Rasmussen, 2007; Sundberg & Partington, 2010) or the “applied verbal behavior” approach (AVB; LeBlanc et al., 2006; Shillingsburg et al., 2022).

To begin with, we must clarify what we mean by generative. Many different behavioral processes can result in responding that has not been directly taught, including stimulus generalization, recombinative generalization, response induction, and observational learning. All these are critically important in any educational or intervention program, particularly with young learners or any individuals with limited language repertoires; however, our focus is not only on the generalization of taught responding to novel settings or stimuli (as important as that is) but also, and more centrally, on the generativity of language. Generative language involves both producing and understanding the infinite variety of completely novel utterances characteristic of fluent speaker and listener behavior (Malott, 2003; Stewart et al., 2013).

In emphasizing language generativity, we ground our approach in RFT and thus place a central focus on assessing and establishing, as well as strengthening and broadening, a repertoire of arbitrarily applicable relational responding (also known as relational framing) through multiple exemplar training. Our approach and related practical tools are described in detail in our handbooks (Ming et al., 2019, 2022), and the basic tenets of relational frame theory are also described in detail in many other sources (e.g., Ming et al., 2023; Zettle et al., 2016), but we will briefly outline critical points here. Relational responding (or “relating”) involves responding to one stimulus in terms of another. This can be on the basis of their physical properties (as in identity matching), termed nonarbitrary relational responding (NARR), or on the basis of contextual cues that specify an “arbitrary” (i.e., socially determined) relation, referred to as arbitrarily applicable relational responding (AARR). AARR is seen by RFT as critical to human language and cognition, and it includes equivalence responding as well as multiple other patterns of derived relational responding—a classic example being comparative value such as with different countries’ currencies being worth “more” or “less” than others.

As an operant, AARR is established through multiple learning opportunities. In the course of typical development and interactions with parents, caregivers, and the environment, children learn to relate words to objects and to other words, and to relate numerous concepts to each other in multiple different ways under contextual control. The earliest examples of these operants to develop are naming (Horne & Lowe, 1996; Miguel, 2016)—“responding to the symmetrical relation between words and their referents” (Barnes-Holmes & Barnes-Holmes, 2002, p. 35)—and equivalence (Sidman, 1971, 1994, 2000), in which three or more stimuli are interrelated and substitutable for one another. In each of these examples, novel responses may be derived that have not been taught. In the case of naming, a child can respond by tacting or discriminating an object when that object has simply been named by another, without explicit training. In equivalence relations, having been taught that one stimulus (A) can be related to two others (B and C), those two other stimuli may then be related without explicit training (i.e., having been taught A = B and A = C, B = C may be derived).
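The logical structure of this derivation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: trained coordination relations are written as ordered pairs of stimulus labels, and the function name `derive_coordination` and the pair representation are hypothetical conveniences rather than part of RFT or of any published protocol. The sketch models which relations are available to be derived, not whether a given learner will in fact derive them.

```python
from itertools import permutations


def derive_coordination(trained):
    """Split untrained 'same as' relations into mutual and combinatorial entailments.

    `trained` is a list of (sample, comparison) pairs that were directly taught.
    This is an illustrative model of the relational structure only.
    """
    # Group stimuli that training has linked, directly or indirectly, into classes.
    classes = []
    for a, b in trained:
        linked = [c for c in classes if a in c or b in c]
        merged = {a, b}.union(*linked)
        classes = [c for c in classes if c not in linked] + [merged]

    taught = set(trained)
    # Every ordered pair within a class that was not itself taught is derivable.
    untrained = {p for c in classes for p in permutations(sorted(c), 2)} - taught
    # Mutual entailment: the reverse of a taught relation (taught A = B, derived B = A).
    mutual = {(x, y) for (x, y) in untrained if (y, x) in taught}
    # Combinatorial entailment: relations between stimuli never paired in training.
    combinatorial = untrained - mutual
    return mutual, combinatorial


mutual, combinatorial = derive_coordination([("A", "B"), ("A", "C")])
print(sorted(mutual))         # [('B', 'A'), ('C', 'A')]
print(sorted(combinatorial))  # [('B', 'C'), ('C', 'B')]
```

With only two relations taught, four further relations are available to be derived, which is the source of the teaching efficiency discussed throughout this article.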

Such repertoires are learned in typical development through the natural multiple exemplar training provided by a child’s verbal community. Many early language activities involve relating animals with their names and sounds (e.g., American Speech-Language-Hearing Association, n.d.), and identifying animals by name and by the sounds they make is common in early childhood songs and books. Although initially children will need to be explicitly taught to tact animals by name and identify the sounds they make as tacts, listener responses, and intraverbals, with a sufficient learning history in this pattern a child could then be taught only a specific subset of relations—such as tacting the name of the animal and selecting a picture of the animal based on the sound it makes—and then the remaining relational responses can be derived, such as responding intraverbally to a question about what sound a particular animal makes. This example highlights one of the benefits of integrating an AVB perspective with RFT, to which we will return shortly—in order for relational responses to be derived (e.g., intraverbals), other relations must be taught (e.g., tacts or listener responses), and the AVB approach provides an excellent foundation for teaching these early verbal operants.

From the RFT perspective, sameness/equivalence (in RFT termed a frame of coordination) is only one type of relational frame. There are many others for which empirical evidence has been provided, including for example distinction (e.g., Roche & Barnes, 1997), comparison (e.g., Barnes-Holmes et al., 2004a; Berens et al., 2007; Gale & Stewart, 2020), opposition (e.g., Barnes-Holmes et al., 2004b; Kirsten & Stewart, 2021), analogy (e.g., Stewart & Barnes-Holmes, 2004; Persicke et al., 2012), temporality (O’Hora et al., 2004; Neufeld et al., 2023a), spatial relations (May et al., 2017), and deixis (McHugh et al., 2004). RFT argues that this variety of relational patterns or frames underlies the diversity, complexity, and generativity of human language. Again, multiple exemplar training (MET) provides the basis for these repertoires of relational responding. For example, in the relational frame of comparison, if I tell you that my pet snake John is longer than my pet raccoon Blake then you could derive that Blake is shorter than John, without seeing either of them. In this case, you are responding in terms of a comparative relation between them that is based on the cue “longer” rather than on physical properties, and you can do so because of a lifetime of exposure to a pattern in which if X is longer than Y then Y is shorter than X. When you initially learned this pattern of relating, physical objects were likely involved, but after enough exemplars, you could respond to verbally stated relations without seeing physical objects.
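The comparison example can be written out as a small Python sketch to make the pattern explicit. It is a hedged illustration only: the tuple format `(X, cue, Y)`, the names `REVERSED_CUE`, `mutually_entail`, and `combinatorially_entail`, and the third pet "Rex" are all hypothetical additions for the purpose of the example and are not drawn from the studies cited above.

```python
# Each comparative cue is paired with the cue that reverses it.
# Only "longer"/"shorter" come from the example in the text.
REVERSED_CUE = {"longer": "shorter", "shorter": "longer"}


def mutually_entail(relation):
    """Given (X, cue, Y), return the reversed relation (Y, reversed cue, X)."""
    x, cue, y = relation
    return (y, REVERSED_CUE[cue], x)


def combinatorially_entail(first, second):
    """Combine (X, cue, Y) and (Y, cue, Z) into (X, cue, Z) when they chain.

    Returns None when the two relations do not share a middle term and cue,
    in which case nothing is derived by this simple rule.
    """
    x, cue_1, y = first
    y_2, cue_2, z = second
    if y == y_2 and cue_1 == cue_2:
        return (x, cue_1, z)
    return None


stated = ("John", "longer", "Blake")
print(mutually_entail(stated))  # ('Blake', 'shorter', 'John')

# "Rex" is a hypothetical third pet added only to illustrate chaining.
extra = ("Blake", "longer", "Rex")
print(combinatorially_entail(stated, extra))  # ('John', 'longer', 'Rex')
```

Note that, unlike coordination, the reversed relation here carries a different cue ("shorter" rather than "longer"), which is one reason comparison is treated as a distinct relational pattern rather than as a variant of sameness.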

Critical to understanding relational framing as an operant is that it is not something that one either does or does not do; like any other operant, it is a skill that can be fragile (e.g., less widely generalized, with a slow rate of responding) or stronger (e.g., more widely generalized, fluent). Thus, practice in relational framing would be expected to strengthen the operant and result in more fluent responding in any context in which deriving relations is relevant—that is, almost any language-based task. Over the last decade, research on relational training has suggested not only that multiple exemplar training can result in the acquisition of framing repertoires if not already present, but also that strengthening existing relational repertoires can result in significant increases in standardized cognitive and academic measures across a range of populations (e.g., Amd & Roche, 2018; Brooks-Newsome et al., 2014; Cassidy et al., 2011; McLoughlin et al., 2022; Roche et al., 2023; see Beck et al., 2023, for a review) as well as improved engagement in acceptance and commitment therapy sessions with children with autism (Gilsenan et al., 2021). Such work supports the core RFT thesis that relational framing is central to human language in all its potential generativity and complexity and hence is a key operant on which to focus when we need to establish or strengthen these critical aspects in a child’s repertoire.

Language as Behavior: Synthesizing Skinnerian VB and RFT

As noted, in our approach we integrate research on relational framing with research on other important foundational skills for linguistic generativity from the behavior analytic literature, including research and applied work using a Skinnerian analysis of verbal behavior (for consistency’s sake, and to clarify that these are based on but not delineated in Skinner’s work, we will use the term “AVB” for such applied programs). Some see the application of Skinnerian VB as somewhat in opposition to work in RFT (see Gross & Fox, 2009, for an overview of the controversy historically surrounding work in RFT). However, we believe these approaches are compatible, particularly (but not exclusively) in early intervention programs for teaching language, focusing on frames of coordination (sameness, equivalence) as the first relational pattern to assess, teach, and capitalize on. At the simplest level, combining the AVB approach with RFT has considerable merit to the extent that it brings together decades of research on how to best teach early verbal operant responses with the literature on relational framing and deriving relations. A number of studies have investigated the use of relational framing paradigms synthesized with Skinner’s analysis of verbal behavior (e.g., Murphy et al., 2005; Rosales & Rehfeldt, 2007), including using multiple exemplar training (MET) to train coordinate relations with auditory and visual stimuli—that is, derived tact and derived intraverbal responding (for reviews, see Belisle et al., 2020; Ming et al., 2014; Raaymakers et al., 2019). However, there are several much broader aspects common to both approaches that we would like to highlight: first, an emphasis on joint attention and cooperation in the development of language; second, the importance of understanding the broad context and multiple sources of control for any particular verbal response; and finally, the use of multiple exemplar training to establish generalized operants.

Joint Attention, Cooperation, and Early Language Development

A hallmark of AVB early intervention programs is to begin by focusing on increasing cooperation, using an analysis of the child’s motivation (Carbone, 2013; Kelly et al., 2015; Sundberg & Partington, 1998/2010), and emphasizing child engagement and initiation of interactions (such as independent mands) as critical measures (Shillingsburg et al., 2022). As Barbera and Rasmussen (2007) note, “the goal of any academic program is that the child is a happy and willing learner” (p. 65). Before beginning intensive instruction in specific target responses (such as tacts or listener discriminations), these programs aim to establish instructors and the instructional setting as sources of reinforcement, through a process generally referred to as “pairing” (Barbera & Rasmussen, 2007; Pennsylvania Training & Technical Assistance Network, n.d.). Pairing procedures involve initially delivering reinforcement contingent only on reaching for the stimulus as provided by the instructor, and then shaping approach behaviors within the natural environment in the context of fun activities (e.g., orienting to the instructor, walking to an instructor a few feet away, seeking the instructor out within the room; Pennsylvania Training & Technical Assistance Network, n.d.). Given the assumption within AVB programs that verbal behavior requires socially mediated reinforcement, the pairing process also has the explicit aim of conditioning social stimuli such as adult facial expressions and verbal interactions as reinforcers, and AVB programs view interest in social stimuli as pivotal for repertoires of cooperation, language, and play as well as joint attention more broadly (Ward, 2008).

AVB programs also begin instruction with an emphasis on manding (Barbera & Rasmussen, 2007; Pennsylvania Training & Technical Assistance Network, n.d.; Sundberg & Partington, 1998/2010). In doing so, such programs begin with activities that naturally target broad repertoires of joint attention and social referencing—behaviors that serve a common function of recruiting or responding to bids for attention within a shared experience (Jones & Carr, 2004; see Dube et al., 2004; and Holth, 2005, for operant analyses of joint attention). Verbal behavior development theory also emphasizes the critical nature of early orienting behaviors and the conditioning of adult voices and faces as reinforcers for observing behavior in the development of generative language (e.g., Greer et al., 2011; Maffei et al., 2014; Sivaraman et al., 2023). This focus is entirely congruent with an RFT approach, which similarly identifies joint attention and social referencing as necessary precursors to derived relational responding (Barnes-Holmes & Harte, 2022; Pelaez, 2009; Pelaez & Monlux, 2018).

From an RFT perspective, cooperation is not only a necessary prerequisite component of early language development, but also a driving evolutionary force for the development of human language (Hayes & Sanford, 2014). In early intervention, this should translate into a heavy emphasis on activities that promote the earliest repertoires of cooperation—bidirectional, reciprocal interactions that establish what Barnes-Holmes and Harte (2022) term “mutually entailed orienting and evoking.” Through these interactions, as children orient to particular stimuli, caregivers establish stimulus functions (appetitive or aversive) while establishing early listener repertoires. In doing so, a foundation is set for learning repertoires of derived relational responding, and for the transformation of functions through relational framing. We contend that the “pairing” activities generally associated with AVB programs naturally provide a kind of loose multiple exemplar training (MET) for this early cooperative behavior, as approach and orienting repertoires are shaped and instructors pair words and facial expressions with reinforcement across numerous short interactions during fun activities. When pairing, instructors “attend closely to the interests and preferences of their learner, to create optimal conditions for interaction to occur” (Shillingsburg et al., 2022, p. 57), and as Ward (2008) emphasizes, “skilled instructors who spend time engaging their students in fun interactions with objects can establish [joint attention] repertoires” (pp. 20–21).

Understanding Multiple Sources of Control

Another hallmark of AVB approaches is an emphasis on identifying and analyzing the sources of control for particular responses, in order to precisely teach Skinnerian verbal operants as the targets of intervention, and to effectively remediate problems of faulty stimulus control (Sundberg & Partington, 1998/2010). A strength of these programs is thus in identifying systematic programming for establishing functional control for each of the verbal operants—that is, prescribing how to teach initial language skills. Because all derived relational responding requires some initial teaching of baseline relations, this is another reason we believe that combining the AVB and RFT approaches can be a natural fit for early intervention programs. Taking the previous example of animals with their names and sounds, a frame of coordination frequently includes a combination of auditory and visual stimuli, and thus includes two of the primary verbal operants (tacts and intraverbals) as well as listener behavior (see Fig. 1). These may then be further classified as either taught through direct contingency training, or derived based on relational framing, as suggested by Barnes-Holmes et al. (2000). So, for example, one might teach related tacts (and use strategies developed by AVB programs, e.g., Sundberg et al., 2000; Pistoljevic & Greer, 2006), and then probe for the derivation of a related intraverbal. We will expand on this more below.

Fig. 1 Skinnerian Verbal Operants as Relational Responding

Skinner’s (1957) focus of analysis was not merely on developing a taxonomy of verbal operants under specific sources of control, however—rather, in identifying these potential sources of control, one can then see how most language is under the control of multiple variables. The AVB approach thus also emphasizes an analysis of response patterns that occur under convergent and divergent multiple control (Axe, 2008; Michael et al., 2011). This expands the focus of analyzing any given response to include all the potential influencing variables in a given context, including an extremely broad range of discriminative stimuli that are immediately present in the external environment (including audience variables) and one’s own private verbal behavior, as established and influenced by one’s individual learning history. This expansive understanding of multiple control in the context for responding is clearly consistent with the functional contextualist foundation of RFT, which would explicitly add relational framing with respect to those stimuli, based on control by contextual cues (e.g., Ming et al., 2023). Both approaches also thus emphasize the need for increasing flexibility and fluency in responding to changing sources of stimulus (or contextual) control, and this is an important aspect of any programming in early intervention.

Multiple Exemplar Instruction

As noted previously, RFT views relational framing as an operant established through multiple exemplar training, also termed multiple exemplar instruction (see LaFrance & Tarbox, 2020, for a discussion of how these two terms have been used as well as the importance of MET and MEI in behavior analytic intervention programs). Programs based on verbal behavior development theory also emphasize such instruction to establish naming and have developed well-tested protocols for doing so (see Sivaraman et al., 2023, for further discussion of these programs). In addition, a hallmark of programs within the verbal behavior approach is the coordination of teaching targets for tact and listener discriminations (and mands), as well as mixing verbal operant targets in “verbal modules” (Sundberg & Partington, 1998/2010) and “intraverbal webbing” (Alzrayer, 2020; Sweeney-Kirwan, 2008). As Sundberg (2008) notes, the goal of procedures to transfer control from one verbal operant to another is to “achieve untrained transfer between the language skills” (p. 140); Alzrayer (2020) describes a benefit of intraverbal webbing procedures as being to “accelerate stimulus and response generalization” as well as establishing responding to more complex antecedent verbal stimuli.

We would view these types of AVB programs as providing loose MET for naming and combinatorial entailment. When tacts and listener discrimination targets are coordinated, learners receive instruction that naturally provides for establishing bidirectional relations (within a teaching session, learners will both tact and select the same object multiple times). With verbal modules and intraverbal webbing, learners are taught to respond in a variety of forms to a number of different relations among stimuli that are interrelated. For example, Sundberg and Partington (1998/2010, pp. 132–133) give an example of a verbal module that involves a sequence of instructions about a cat and a dog, including asking the learner to “give me an animal” (listener discrimination), answer the question “what animal is this?” (tact), “touch the cat” (listener discrimination), and find each animal based on the sound it makes, in essentially the same format that might be used to provide MET on equivalence relations of A (picture) = B (name), A (picture) = C (category), and A (picture) = D (sound). They go on to note that additional instructions relevant to the features, functions, or categories of dog and cat could be worked into additional teaching sessions, that such questions could be asked about other animals, and that the trials should be varied from session to session in as natural a format as possible. Intraverbal webbing requires students to fill in related statements about the features, functions, or class of a particular theme/topic (Alzrayer, 2020).

For many children, these methods seem to be sufficient to establish early coordinate relational framing repertoires (as demonstrated by Alzrayer, 2020, in which all participants emitted at least some untrained intraverbal responses having been taught related tact, listener and intraverbal responses) and we believe they are a likely reason for the success of AVB programming in establishing what is termed “generalization across verbal operants” (Sundberg, 2008). However, for students unable to show mutually entailed tact/listener responding despite having been taught many coordinated tacts and listener discriminations, or unable to show combinatorially entailed intraverbal (or tact or listener) responding, a more targeted teaching procedure may be required, and that is where an RFT lens can be extremely beneficial.

A growing body of research provides evidence for the effectiveness of MET in establishing a variety of relational frames (e.g., Barnes-Holmes et al., 2004a, 2004b; Berens & Hayes, 2007; Gorham et al., 2009; Weil et al., 2011), as well as establishing bidirectional stimulus relations such as tact/listener and intraverbal/reverse intraverbal relations (e.g., Allan et al., 2014; Greer & Ross, 2008; Greer et al., 2005; Greer et al., 2007; Luciano et al., 2007; Pérez-González et al., 2007). For example, learners who use pictures for manding have been taught to select a picture (A) when the vocal name (B) is stated, and taught to select a text card (C) when the vocal name (B) is stated, and have then shown not only derivation of the picture–text relation, but also derived manding using the text card (Rehfeldt & Root, 2005; Rosales & Rehfeldt, 2007). Other studies have used abstract symbols in place of pictures or text, and MET has also been shown to be effective in establishing a derived manding repertoire when learners did not immediately derive the mand using this type of framework (Murphy et al., 2005; Murphy & Barnes-Holmes, 2009, 2010). Similar procedures have been used to transfer discriminative functions from picture schedules to text-based schedules (Sprinkle & Miguel, 2013).

There is also evidence for establishing derived intraverbal responding using textual stimuli (Walsh et al., 2014) as well as vocal stimuli (Shillingsburg et al., 2018) at the level of equivalence; however, there is as yet only limited empirical work examining the effectiveness of MET for derived intraverbals at the level of equivalence using auditory (or vocal) stimuli, and only extremely limited evidence of effectiveness with children whose language skills are at an early developmental level. Nonetheless, given the importance of relational framing to the flexibility of conversational skills, it is critical to explore training methods that could lead to flexibility and generativity of tact, listener, and intraverbal responding. Our lab has contributed to research identifying potentially effective protocols for MET with early learners (Ming, 2015) but much remains to be done and this is certainly an area ripe for further research.

Having highlighted a number of broad elements common to both RFT and AVB, namely, joint attention and cooperation in language development, the importance of the context for any particular verbal response, and the use of multiple exemplar training, in the next section we will examine some of what we see as key potential benefits of combining RFT and AVB approaches, particularly in early intervention.

Teaching Language to Young Children: Curriculum Implications

Integrating RFT with programs based on an AVB approach can have significant benefit in terms of analyzing and developing curricula for early language intervention. When working with early language learners, we consider three aspects of how to apply RFT to our analysis of intervention needs and subsequent curriculum development: how to assess for early repertoires of AARR, how to establish such repertoires when they are absent, and how to capitalize on and generalize such repertoires to more efficiently teach the wide range of content that all young children need to learn. Here again, the practical strategies and focus of AVB programs can inform RFT-based programming as well as vice versa. We have already discussed the benefits of an RFT lens for bringing more systematic multiple exemplar instruction into verbal behavior programming to establish repertoires of naming and equivalence, as well as the necessity of establishing repertoires of joint attention within a cooperative learning context. The practical lens of AVB programs, however, can also highlight the need for modifications to both assessment and teaching protocols when working with young learners with limited language repertoires. Even given a reasonable level of cooperation, the kinds of protocols developed in experimental work with adults or older children may still require modification to maintain motivation, such as by interspersing breaks or using noncontingent reinforcement during testing (LeBlanc et al., 2003). We outline suggestions for assessing early repertoires of AARR elsewhere (see Ming et al., 2019), and so we will now turn to the issues of determining curricular content and efficiently teaching that content in ways that are likely to establish and capitalize on the generativity of relational framing.

Efficiently Teaching New Content: Equivalence-Based Teaching

As noted, one important area of programming in early intervention is simply how to efficiently teach the wide range of content that children need to learn. AVB programs emphasize the use of natural environment training (Sundberg & Partington, 1999), and we have found this emphasis on functional, natural contexts for language intervention to be particularly important in the development of RFT-based programs. A major benefit of an RFT lens is prioritization of function over form or content—experimental work uses completely abstract and arbitrarily selected stimuli, for example—which then leaves a practitioner free to use content that is relevant to an individual learner. This is particularly important when implementing equivalence-based teaching (EBT), which is now well-established as a means of rapidly teaching new content across numerous educational and therapy settings (for reviews, see Belisle et al., 2020; Brodsky & Fienup, 2018; Ming et al., 2014; Raaymakers et al., 2019). For example, for a student whose family includes sports fans, learning about team logos and cities may be an extremely functional and useful expansion of their language skills. In contrast, if the only reason to teach particular content is that it is the next thing on the list of programs, then it is unlikely to be maintained in the natural environment and unlikely to enter into any additional relational networks. This orientation tends to be a strength of AVB programs, which emphasize natural environment training and the use of individual learners’ motivation, and which also provide loose EBT through the same procedures (verbal modules, intraverbal webbing) that are likely to provide loose MET for coordination. Shillingsburg et al. (2018) also describe systematically using an EBT framework within an AVB program to establish untrained intraverbal responding with respect to features and functions of stimuli.

However, the RFT lens allows us to more precisely examine EBT procedures, and problem-solve when equivalence is not demonstrated in new contexts—when generalization is not occurring “across verbal operants,” in the parlance of AVB programming. One aspect that may not be as systematic as needed in AVB programs is including an analysis of contextual cues for particular relational patterns. As discussed above, contextual cues for specific relational patterns are established through MET and can come to evoke appropriate relational responses. For example, the cues “is” or “goes with” clearly indicate coordination/equivalence. Many cues used in verbal module training sessions, however, might indicate a different pattern (e.g., hierarchy or containment) with which a student has not yet had sufficient history. It is important to recognize that much common “LRFFC” (listener responding by function, feature, and class) and intraverbal content may not, strictly speaking, represent relations of equivalence. For example, categories and features that are identified as parts of a whole should ultimately form hierarchical relations—a bus and a plane are types of vehicles, but not all vehicles are buses/planes; a bus has wheels, but wheels do not “have” a bus. RFT allows such relations to be analyzed in detail in terms of not only multiple patterns of relations, but also interactions between relations. Early in language development, these common features and categories do begin in relations essentially of equivalence. Very young typically developing children as well as individuals with autism are able to identify the categories and features of items, and tend to respond to those relations as if they were equivalent, before being able to respond in accordance with true hierarchical relations, as we have seen in our research on class inclusion (Ming et al., 2018).
Nonetheless, if there are problems with the derivation in EBT, it should be considered whether one is working within the frame of coordination, and what contextual cues are being used. It is important to ensure that similar cues are used in training and testing (as also recommended by Shillingsburg et al., 2018). For example, if the selection of a car has been taught in response to the question “what has wheels?,” it might be useful to ask, “what’s something that has wheels?” as the intraverbal test (i.e., trained LRFFC and tact relations of A: “has wheels” → B: car; B: car → C: “car” and a tested relation of A: “has wheels” → C: “car”). This ensures that the cue “has wheels” is clearly present in both the trained and tested relations, rather than other cues such as “where can you find wheels?” or “there are wheels on a. . . .” One might also first work on establishing relations that would more clearly involve cues for coordination, such as “goes with”—for example, teaching a match of socks to shoes, and testing “what goes with socks?”
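The logic of the trained and tested relations in this example can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the stimulus labels and the function name `derive_coordination` are hypothetical, every trained pair is treated as a relation of coordination, and the sketch says nothing about whether a given learner will actually show the derived responses or about the contextual cues used at test.

```python
from itertools import permutations

def derive_coordination(trained):
    """Return the untrained ordered pairs entailed by trained sameness pairs."""
    # Build classes: stimuli linked by any chain of trained pairs share a class.
    classes = []
    for a, b in trained:
        merged = {a, b}
        rest = []
        for cls in classes:
            if cls & merged:
                merged |= cls
            else:
                rest.append(cls)
        classes = rest + [merged]

    # Every ordered pair within a class that was not directly trained is a
    # candidate derived relation (mutual or combinatorial entailment).
    trained_set = set(trained)
    derived = set()
    for cls in classes:
        for x, y in permutations(cls, 2):
            if (x, y) not in trained_set:
                derived.add((x, y))
    return derived

# Hypothetical labels for the A, B, and C stimuli described in the text.
A = "spoken cue: has wheels"
B = "picture: car"
C = "spoken word: car"

derived = derive_coordination([(A, B), (B, C)])
print((A, C) in derived)  # the tested intraverbal relation A -> C
```

Listing the candidate derived relations in this way may help a practitioner decide which untrained relations to probe after teaching, and which trained relations to revisit when a probe fails.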

In addition to these issues specific to patterns of relational framing, if one is having great difficulty establishing AARR, one might also examine other prerequisite skills to strengthen—and these skills are generally the focus of AVB programming. As noted, joint attention is critical. Flexibility of tacting under different sources of stimulus control may be particularly critical as one moves into EBT within verbal module sessions. Building a solid foundation of these other skills may allow for a more fluent repertoire of relational framing later.

Despite an emphasis on sameness relations in early intervention programming, it is not only frames of coordination that are relevant to early language development, as already seen in our examples above with respect to categorization, parts of a whole, and so on. Across relational patterns, there is evidence that nonarbitrary relational patterns are learned before arbitrary relations (Kent et al., 2017; Pomorska et al., 2020), and many are learned quite early—in the 3–4-year-old range (Kirsten & Stewart, 2021), placing these well within the realm of early intervention programs. AARR that combines coordination and distinction (i.e., same vs. different) as well as responding to comparison, opposite, spatial, and temporal relations, also begin to emerge in the developmental range (before age 5–6) typically targeted by early intervention programs (Barnes-Holmes et al., 2004a, 2004b; Kirsten & Stewart, 2021), as do the earliest forms of hierarchical responding such as category naming (e.g., Bornstein & Arterberry, 2010). In developing protocols for teaching these relations to young children, we have continued to find it useful to conceptualize programming from a perspective that integrates work from both Skinnerian VB and RFT.

Topographies of Responding

One of the ways in which we have found it important to integrate these two approaches is simply by establishing a common language to describe the variety of response topographies that relational responding may involve. As we have described elsewhere (Ming & Stewart, 2017; Ming et al., 2022), experimental work has included a variety of topographies of relational responding, including matching stimulus pairs, selection of stimuli on the basis of contextual cues for particular relations, producing or selecting the names of relations in response to stimulus sets, and using yes/no responses to identify a relation as consistent with a specified cue. These different topographies of responding are similar to the extent that they constitute contextually controlled relational responses. However, some theories would make substantial distinctions between them, and various terms are used depending on the orientation of the research. For simplicity’s sake, we have chosen to use terms that may be more familiar to clinicians, including terms based on Skinnerian verbal operants. As described in our previous review of the literature on distinction relations (Ming & Stewart, 2017), we use the term relational matching to describe pair matching on the basis of the relation exemplified by the pairs involved (e.g., matching AB [different] with XY [different] rather than ZZ [same]). We refer to the selection of comparison stimuli in a match to sample format, under the control of a specific relational cue (such as “find the one that is worth more”) as relational listener discriminations. Finally, we refer to the production/selection of the name of a relation (e.g., “different”) in response to two stimuli in a pair (e.g., two physically dissimilar stimuli) as relational tacting, whether the response to a pair/set of stimuli is topographical (such as saying “different”) or selection-based (such as selecting the textual stimulus “different”).

Another topography of relational responding involves “yes/no” responding, which is a special case and merits some discussion because it is a common component of most ABA-based early intervention programs. For example, “Yes/No” programs in Leaf and McEachin (1999, pp. 217–218) include phases for responding to yes/no with respect to increasingly complex requirements, beginning with desires (“Do you want this?”) and progressing to identity of objects (“Is this a truck?”), and questions about things that are not present (“Do birds have wings?”). This type of responding is typically classified as an autoclitic from a Skinnerian verbal behavior perspective, and primarily has been examined in terms of how responses of yes/no function within and across different verbal operants. That is, a mand using yes/no has a different function than a tact or an intraverbal using yes/no, and responses using yes/no may not simply generalize across these different contexts (Neef et al., 1984; Shillingsburg et al., 2009). However, from an RFT perspective, this type of responding is one way to evaluate the “coherence” of a particular relation or relational network—that is, the extent to which it is consistent with one’s previous learning history (e.g., being presented with a picture of a cow, and asked, “Is this a cow?”) or not (e.g., being asked the question, “Do pigs fly?”; see Maloney & Barnes-Holmes, 2016; Hayes et al., 2017). Therefore, yes/no responding is termed a “relational coherence indicator” and as such has been incorporated into a number of procedures in common use in RFT research (see e.g., Barnes-Holmes et al., 2001; Barnes-Holmes et al., 2010), and also is a part of the PEAK assessment and curriculum (Dixon, 2015, 2016) as well as our own protocols (Ming et al., 2022).

The RFT perspective thus provides an additional lens with which to analyze AVB programs that involve “yes/no” questions, potentially allowing practitioners to more clearly determine why such responding may not generalize across verbal operants. For example, when applied to tacting an object, yes/no responding indicates the coherence of the coordinate network between a word and that object. When applied to intraverbals, yes/no responding may indicate coherence of any number of relations. Thus, it is particularly important to break down such questions further into the relations and functions that may be cued in order to determine both how best to teach yes/no responding as well as how to determine remediation when yes/no intraverbals may not be consistently correct. For example, I could tell you that unobtainium is rarer than kryptonite, and kryptonite is rarer than osmium, and then I might ask you if osmium is more plentiful than unobtainium—which would require derivation and then evaluation of comparative relations. If a learner does not yet have a well-generalized repertoire of arbitrary relational responding in accordance with frames of comparison, then it would be no surprise that they would have difficulty with this type of question, even if they could answer other types of intraverbal “yes/no” questions that either do not require derivation or involve less complex levels of relational responding.
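The derivation required by the unobtainium example can be made explicit with a short Python sketch. It is offered only as an illustration of the logical steps, not as a model of how learners respond; the function names are hypothetical, and a result of `False` means only that the relation cannot be derived from the stated premises.

```python
def is_rarer(premises, x, y):
    """True if 'x is rarer than y' follows from the premises by chaining."""
    frontier, seen = [x], {x}
    while frontier:
        current = frontier.pop()
        for a, b in premises:
            if a == current and b not in seen:
                if b == y:
                    return True
                seen.add(b)
                frontier.append(b)
    return False

def is_more_plentiful(premises, x, y):
    # Mutual entailment: "x is more plentiful than y" holds exactly when
    # "y is rarer than x" holds.
    return is_rarer(premises, y, x)

# Each pair (a, b) states "a is rarer than b", as in the example in the text.
premises = [("unobtainium", "kryptonite"), ("kryptonite", "osmium")]

# Combining the two premises gives "unobtainium is rarer than osmium";
# reversing that relation answers the question that was asked.
answer = "yes" if is_more_plentiful(premises, "osmium", "unobtainium") else "no"
print(answer)
```

The sketch makes visible that a correct “yes” depends on two separate steps, chaining the stated relations and then reversing the result, either of which could be the source of difficulty for a particular learner.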

Expanding Relational Responding Repertoires: Curricular Sequencing

As we examine relations other than sameness, a primary task in early intervention is to establish contextual control over nonarbitrary relational responses as a foundation for contextual control in relational framing. The distinction between arbitrary and nonarbitrary relational responding thus becomes critical to curriculum development. RFT argues that contextually controlled nonarbitrary relational responding, that is, responding on the basis of physical relations between stimuli, provides the foundation for later relational framing (for example, being able to respond to things as physically bigger or smaller supports learning to relationally frame things such as coins in terms of their comparative value). This is a key element of the protocols we use, which have been based on our own work as well as reviews of the literature on relational framing as applied to teaching individuals with developmental delay (e.g., Kilroe et al., 2014; Luciano et al., 2009; May & Dymond, 2014; Ming et al., 2014). All our protocols (as described in detail in Ming et al., 2022) follow a general sequence of establishing fluent nonarbitrary relational responding across multiple contexts with multiple topographies of responding, in familiar contexts, before establishing arbitrary relational responding in familiar and then more generalized contexts. Thus, when examining teaching targets typical of early intervention curricula, it can be useful to first identify what relational pattern is being targeted, then determine if a nonarbitrary or arbitrary relation is targeted, and then determine the most appropriate means of establishing generalized contextual control over the relational pattern.

Comparison

Most language intervention programs would teach tacting and listener responding to adjectives. For example, the VB-MAPP (Sundberg, 2008) includes milestones for manding, tacting, and listener discriminations that involve adjectives, and identifies the purpose of such programs as being to respond to the “relative properties of objects” (p. 76) and the “comparison of those properties of one object to the properties of another” (p. 74). At the earliest level, this might simply involve tacting or manding based on a formal property of an item: “big” and “hot” are very early words for most children (Stanford Wordbank, http://wordbank.stanford.edu/), possibly because of common admonitions by parents to avoid things that are “too hot” or asking if a child wants a “big” piece of cookie, pointing out a “big” truck, and so on. However, once programs target comparing items as being “bigger,” we are shifting more clearly into the realm of relational responding. From a Skinnerian verbal behavior perspective, both the dimensional quality and the grammatical tag “er” in English would be considered autoclitics, modifying the primary tact (e.g., “big” modifies the tact “truck” to indicate an unusual size; in addition, adding “er” modifies the tact “truck” in terms of its size in comparison to another truck). As such, we would expect this skill to require fluency in the relevant primary verbal operants such as tacting the property itself, and subsequently require MET to establish abstraction of the autoclitic frame (e.g., Speckman et al., 2012). From an RFT perspective, the “er” functions as a relational cue for “more,” that is, continuing farther along the continuum in a particular direction. As a contextual cue for the relation, we would thus also expect the need for MET in the particular pattern across many stimulus sets and points along the continuum; that is, establishing contextual control over nonarbitrary comparison relations.
Thus far, a number of studies have shown the establishment via MET of comparison relations in young children, both with and without developmental delay (see, e.g., Barnes-Holmes et al., 2004a, 2004b; Berens & Hayes, 2007; Gale & Stewart, 2020; Murphy & Barnes-Holmes, 2010). Furthermore, in the case of a number of the participants involved, exposure to NARR was a key support for training of AARR (see e.g., Berens & Hayes, 2007).

Opposite

Intraverbal antonyms (e.g., hot is the opposite of cold) are common in preschool programming, and as noted above, adjectives such as hot/cold, wet/dry, big/small are also common to early intervention programming and may also be taught as opposites. Pérez-González et al. (2007; Pérez-González & Garcia-Asenjo, 2016) describe programming from an equivalence perspective to teach antonyms and test for derived responses (such as being able to say that “hot is the opposite of cold” after being taught that “cold is the opposite of hot”) both with respect to taught/tested intraverbals as well as through programming that incorporated words in relation to pictures of the relevant stimulus item. However, there is an important distinction to be made between responding to a simple intraverbal question about the opposite of something (“the opposite of wet is dry,” in the absence of actual things that have those properties) and responding to the cue “opposite” in the context of a task in which actual physical objects that are physically same or opposite each other along a particular dimension are present. Children may indeed learn to “say their opposites” as a common preschool task, and may even be able to provide a reverse intraverbal statement without teaching. However, this does not necessarily mean that they are able to respond to relations of opposite, such as by identifying the “opposite” of a sample jar that is full of cookies when presented with comparison jars that are full, partially full, or empty (without those labels necessarily being stated). To date, there is little research on teaching either nonarbitrary or arbitrary opposite relations as functional units in the RFT sense. Barnes-Holmes et al. (2004a, 2004b) trained opposition AARR with typically developing 4–6-year-old children and showed considerable generalization of the core repertoire. Kilroe et al. (2014) employed a novel protocol to teach opposition and other patterns of both nonarbitrary and arbitrary relational responding to young children with autism with some success. Nonetheless, we would advise proceeding with flexibility and caution in teaching these relations. We also advise clearly defining the responses being taught as either opposite relations or as intraverbal opposites (or antonyms), because these two response classes may have quite different functions.

Temporal Relations

The issue of discriminative intraverbal versus contextual control for a relation is likewise important when examining programs that teach sequencing. As with opposites, children may learn to sequence the days of the week or months of the year as taught simple intraverbal chains as when learning the alphabet song. Likewise, they may learn to describe familiar stories and routines like getting dressed or baking a cake with first/next/last as taught intraverbal chains within a variety of common sequencing activities (see Speech & Language Kids, n.d., for examples). The words associated with temporal relations—first, last, before, after—emerge early in language development (in some cases, by 30 months; Stanford Wordbank), but this does not necessarily mean children will be able to identify what comes before or after another event in a sequence of novel events that is presented to them. That is, generalized contextual control over temporal relations may still need to be explicitly and systematically taught. As for other relations mentioned above, there is relatively little work showing the establishment of temporal relations in young children but what work there is provides a useful guide; for example, Neufeld et al. (2023b) recently trained temporal AARR based on “before” and “after” cues in typically developing 5-year-old children and showed that much more work was needed to train “after” than “before.”

Spatial Relations

Likewise, interventions that focus on “prepositions” can benefit from a deeper analysis of the relations involved, in this case spatial. It is important to note that commonly used initial “preposition” programs, either in curriculum guides (e.g., Leaf & McEachin, 1999), or the procedures in most of the empirical literature, teach relations unidirectionally, by using a small set of stimuli as “targets” arranged in a specific relation to another stimulus item, designated as the “base.” In unidirectional teaching the teacher always focuses on a relation in one direction only. Consider the example “The book is on the table.” In this case the book is the target and the table is the base. However, we could also describe the spatial relation between these two objects by saying that “The table is under the book.” In this case the table is the target and the book is the base and the focus is on the relation in the other direction, from table to book. Teaching spatial relations unidirectionally means teaching in one of these two directions only when referring to the relation between two objects and ignoring the other possible relational direction. However, an RFT analysis (as also suggested by Barnard & Garofalo, 2004) would focus on MET in the bidirectional patterns of spatial relations across multiple sets of stimuli, rather than targeting various base/target pairs. In this example, we would not only teach that “The book is on the table” but also teach, alongside this, that “The table is under the book,” as well as many other random and novel pairs of stimuli in under/on relations. We must note, however, that there is only one study in the RFT literature on teaching spatial relations (May et al., 2017), which trained adult participants to respond to abstract contextual cues with nonarbitrary spatial relations prior to then using those cues to establish arbitrary relational networks. Nonetheless, this study is consistent with our approach of first establishing contextual control over nonarbitrary relational responding and then arbitrary.
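As a planning aid, the bidirectional pattern described in this section can be sketched in Python. The cue pairs in `CONVERSE` and the function name `bidirectional_targets` are hypothetical choices made for illustration; the sketch simply generates both descriptions of a single physical arrangement so that each direction can be taught and probed.

```python
# Hypothetical cue pairs: each spatial cue mapped to the cue that describes
# the same arrangement in the reversed direction.
CONVERSE = {
    "on": "under",
    "under": "on",
    "in front of": "behind",
    "behind": "in front of",
}

def bidirectional_targets(target, cue, base):
    """Return both statements that describe one arrangement of two objects."""
    return [
        f"The {target} is {cue} the {base}.",
        f"The {base} is {CONVERSE[cue]} the {target}.",
    ]

# Novel object pairs can be rotated through the same pattern across exemplars.
for statement in bidirectional_targets("book", "on", "table"):
    print(statement)
```

Generating both statements for many arbitrary object pairs is one way to keep the focus on the relational pattern rather than on a fixed set of base/target pairs.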

Categorization

Teaching category naming is also common in early intervention and preschool programs. For example (and as discussed above with respect to multiple exemplar training), programs for “listener responding by feature, function and class” (LRFFC; see Sundberg & Partington, 1998/2010), along with transfers of LRFFC responses to intraverbal responses, form a significant part of AVB programs. There is also considerable behavior analytic work on early repertoires of category naming from an equivalence perspective, for example by training tact or listener skills with the names and categories of items (e.g., tacting a stimulus as a “hammer” and also tacting that stimulus and others as “tool”), establishing equivalence classes among visual stimuli, and testing for the emergence of intraverbal categorization (e.g., responding to the question “tell me a tool”; e.g., Carp & Petursdottir, 2015; Kobari-Wright & Miguel, 2014; Miguel et al., 2005; Miguel et al., 2008; Miguel & Kobari-Wright, 2013; Pérez-González et al., 2018; Petursdottir et al., 2008). The emergence of such derived intraverbals in this context has been termed “intraverbal bi-directional naming” (I-BiN; Miguel, 2016).

Intraverbal naming of this type is an important first step, but from an RFT perspective, it is only a beginning repertoire in learning hierarchical framing, which involves not only generalized equivalence classes (which in turn entail similarities and differences among and between classes) but also establishing relational networks in which classes are contained within classes. For example, the class “animals” contains the subclasses “dogs,” “cats,” “horses” etc., and each of these in turn contains further subclasses corresponding to specific types of each (e.g., Alsatian, Manx, Mustang). Members of any particular class are different from each other in terms of their specific characteristics but are the same in the context of the higher order class (e.g., dogs, cats and horses are distinct types based on different physical characteristics but they are the same in the context of the overarching class “animal”). As such, hierarchical framing involves multiple contextually controlled relations.
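The asymmetry that distinguishes hierarchical framing from equivalence can be illustrated with a brief Python sketch built on the classes named in this paragraph. The data structure and function names are hypothetical, and the sketch captures only the containment logic, not the contextual control that must be established through teaching.

```python
# Each class is mapped to the class that contains it (examples from the text).
PARENT = {
    "Alsatian": "dog",
    "Manx": "cat",
    "Mustang": "horse",
    "dog": "animal",
    "cat": "animal",
    "horse": "animal",
}

def is_type_of(item, category):
    """True if 'item' falls within 'category' at any level of the hierarchy."""
    current = PARENT.get(item)
    while current is not None:
        if current == category:
            return True
        current = PARENT.get(current)
    return False

def same_in_context(a, b, category):
    """Two classes are 'the same' only relative to a class containing both."""
    return is_type_of(a, category) and is_type_of(b, category)

print(is_type_of("Alsatian", "animal"))        # containment holds upward
print(is_type_of("animal", "Alsatian"))        # but not in reverse
print(same_in_context("dog", "cat", "animal"))
```

Unlike a relation of coordination, the containment relation here is not reversible, which is why responding correctly to “a dog is a type of animal” does not entail responding to “an animal is a type of dog.”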

Our research has examined teaching children to respond to subclasses as belonging to larger classes (class inclusion; Ming et al., 2018; Zagrabska-Swiatkowska et al., 2020) by increasing the saliency of a nonarbitrary containment relation (including having children tact a category as a group of items within a physical container), as well as teaching children to respond to arbitrary containment relations and hierarchical class relations (Mulhern et al., 2018). Paliliunas et al. (2022) used hierarchical relational training to establish multiple categories and showed a transformation of stimulus functions in accordance with those categories. By seeing categorization as a complex repertoire proceeding from a variety of nonarbitrary relations to arbitrary hierarchical responding, rather than simply the intraverbal responses relevant to intraverbal category naming, a more comprehensive set of programming can be developed.

Conclusions and Future Directions

Integrating RFT with AVB approaches to early intervention allows us to use the strengths of each approach as we focus on establishing generative language in intervention programming. AVB programs have much to say about how to analyze, capture, and contrive motivation for establishing cooperative repertoires, including joint attention. They have excellent recommendations for how to analyze and establish functional control for each Skinnerian verbal operant—that is, how to effectively teach early language skills. Also, we would argue that they provide loose multiple exemplar training for coordinate relational responding patterns, as well as loose equivalence-based teaching, which may be sufficient for many learners.

Adding the lens of RFT allows us to keep the focus of programming on generative language from a functional perspective, rather than checklists of isolated skill targets as sets of discriminative stimuli. RFT also gives us the tools to precisely analyze many early language development programs as teaching multiple patterns of relational responding. When we can view even highly complex language as responding to relational patterns, that allows us to identify likely component prerequisite repertoires and establish a strong foundation in early intervention through multiple exemplar training.

However, much further study is needed with respect to the acquisition and training of relational framing in early language development. More work is needed on the most effective and efficient strategies for establishing relational framing repertoires when absent. Although there are some well-researched protocols for establishing naming (e.g., Greer & Ross, 2008; Miguel & Petursdottir, 2009; see LaFrance & Tarbox, 2020, for further discussion), there has been limited work on establishing early responding in frames of coordination, particularly with vocal responding. That is, there have been relatively few studies that have used MET to establish coordinate combinatorial entailment (rather than using equivalence-based teaching to capitalize on existing repertoires), and most have used selection-based responding with visual stimuli such as text (Walsh et al., 2014) or objects (Luciano et al., 2007), or have used other nonvocal responses such as actions (Gómez et al., 2007). In fact, we are aware of only one recent study that used MET to establish a generalized repertoire of coordination with combinatorially entailed vocal intraverbals (Cho & LeePark, 2023). Given this relative lack of research on early repertoires of generative language, there are many questions to be answered. For example, how does the training structure, when combined with vocal or other topographical response forms for tacting and intraverbal responding, influence the likelihood of showing derived relational responding, and influence the efficiency of MET for establishing frames of coordination?

Moreover, there has been relatively little research examining the best comprehensive sequence of training nonarbitrary and early arbitrary relational responding or how repertoires of nonarbitrary and arbitrary relational responding across multiple frames might interact with one another. For example, is a well-established repertoire of relational framing in one pattern necessary before moving on to teach another—for instance, is combinatorial entailment with respect to same/different necessary before establishing mutual entailment in another frame (such as comparison)? Does an earlier developing repertoire of AARR (such as coordination) need to be generalized beyond familiar, naturalistic contexts before introducing more complex repertoires of AARR (such as comparison)? Likewise, is it necessary or facilitative for responding to nonarbitrary relations in patterns of lower complexity, such as sameness or difference, to be well-generalized across multiple contextual cues (e.g., responding to cues such as “goes with,” “like,” “belong together,” “isn’t,” rather than only “same” or “different”) before introducing new, more complex relational patterns such as comparison? Would a stronger and more flexible repertoire of nonarbitrary same and different responding improve not only arbitrary responding in frames of coordination or distinction, but also derived relational responding in frames of opposition, comparison, or hierarchy? Pomorska et al. (2020) comment that there may be “a dynamical and perhaps even idiosyncratic relationship between the development of nonarbitrary and arbitrary responding” (p. 22), and there is clearly much work to be done in this area to inform curricular sequencing.

At these early levels of language development, many questions also arise with respect to the intersection of taught Skinnerian verbal operant repertoires and both nonarbitrary and arbitrary relational responding repertoires. For example, how are relational responding repertoires affected by learning to tact with increasing levels of conditional discrimination requirements and increasing complexity of multiple control (e.g., tacting simple objects vs. ongoing actions vs. sensory qualities of stimuli across modalities)? Is it helpful to teach “adjectives” such as big/small, hot/cold, wet/dry simultaneously as comparative relational responding repertoires, or are tacts of these qualities necessary or helpful prerequisites to such relational responding? How might teaching “intraverbal antonyms” (e.g., “hot is the opposite of cold”) affect both nonarbitrary and arbitrary opposite relational responding? Is it best to begin with unidirectional or bidirectional spatial relations when planning teaching of “prepositions”? Is teaching familiar sequences a necessary prerequisite for temporal relational responding?

Categorization and hierarchy are particularly important as examples of how verbal behavior programming might integrate work from an RFT perspective, and further research is needed on relevant early teaching procedures. For example, an RFT approach would emphasize the use of specific contextual cues for hierarchical and containment relations (such as “category,” “type of,” “belongs to,” “part of,” “has,” “contains”) that are not always used systematically in AVB programs when teaching category names or features of items. Would the use of such cues improve the efficiency of both taught and derived responding? Would a focus on contextually controlled tacting of groups as categories, rather than tacting the category name for singular items, improve the flexibility of categorization and variability of intraverbal category naming?

Finally, as noted above, there is evidence that relational training can affect measures of language and cognition as well as other academic skills such as reading (Brooks Newsome et al., 2014). However, further research is needed on the practical impact of relational training especially with respect to real-life academic skills and additional repertoires such as psychological flexibility. We should mention that we have seen children benefit from learning to respond even to nonarbitrary difference relations as they start to attend to their environment in broader ways, commenting on and responding to sameness and differences observed among stimuli. We’ve even heard reports that learning to respond to contextual cues to engage in “different” behavior (such as building a “different” block structure) can allow a learner to later use that same cue in a self-rule when coping with denial of reinforcement—“It’s ok, I can do something different.” These possibilities speak to critical issues for intervention programs working with autistic individuals. Might establishing new relational responding repertoires at either nonarbitrary or arbitrary levels lead to increased flexibility/reduced rigidity as measured by autism diagnostic scales? More important, how can we measure the impact of such training from a perspective of social validity—how does it affect individuals’ quality of life, and engagement with the activities that are most meaningful and important to them?

We believe that integrating AVB and RFT approaches can create a whole that is greater than the sum of its parts. We hope that more labs and applied researchers take on the challenge of answering the many questions that arise from both theory and practice when developing language intervention programming.