Developmental disorders (DD) are a heterogeneous group of conditions including cerebral palsy, developmental coordination disorder, language delay and disorders, and a host of other problems. Intellectual disabilities (ID) and autism spectrum disorder (ASD) are among the most common and debilitating of these conditions [1-4]. These two conditions overlap to a large degree [5-8]. In fact, ID and ASD are known to co-occur with a number of other conditions, such as mental health disorders [9-15]. One of the most notable of these comorbid disorders is a group of varied overt behaviors often referred to as challenging behaviors (CB) or problem behaviors [16-19]. Among the most common of these are aggression, self-injurious behavior, feeding problems, and tantrums [20, 21]. All of these issues impede development and adjustment, and thus they are a high priority for intervention.

Because CB impacts quality of life to such a profound degree, researchers and clinicians have tried a variety of treatment options in the hopes of resolving or at least minimizing these behaviors [22, 23]. Medication has proven to have only minor benefits [24, 25]. The reason for this appears to be that CB tends to be less an artifact or symptom of the mental health or developmental disorder than a behavior maintained by environmental conditions. Researchers have noted that some behavioral phenotypes, including CB, are frequently correlated with certain genetic disorders. For example, self-injury is frequently noted in individuals with fragile X syndrome, Prader-Willi syndrome, and Cornelia de Lange syndrome [26]. Aggression commonly occurs in individuals with ASD and Angelman and Smith-Magenis syndromes [26]. The frequency of CB in the context of certain disorders, however, does not imply that phenotypic behaviors are not being maintained by environmental conditions. Rather, the inherent deficits in persons with developmental disabilities, such as communication and social skills deficits and poor problem-solving skills, tend to be the primary drivers of these CB [27-32]. This fact has led researchers and clinicians to explore other avenues with respect to resolving these specific behaviors.

Functional assessment and functional analysis are similar but distinct methods of investigating the cause(s) of behavior. In a functional assessment, information and data are collected from a variety of sources in a non-experimental manner – e.g., scatter plot analysis, antecedent-behavior-consequence (ABC) sheets, direct observation, and interviews – in order to identify the function of a behavior. A functional assessment may include a functional analysis, especially when the function of a behavior remains unclear. For example, various methods of assessment may have narrowed the number of behavior functions to two (e.g., attention and escape), but assessors are uncertain which function is primarily maintaining the behavior. In this situation, a functional analysis may be conducted. Functional analysis is experimental in nature and involves manipulating specific variables to ascertain with more certainty what is maintaining the behavior.

Direct Observation Methods

Functional analysis is an offshoot of applied behavior analysis. Montrose Wolf is often credited with developing the first method for conducting this type of assessment [33], and nearly two decades later, Iwata embarked on a series of studies that was largely responsible for popularizing the technology. The first section of this review focuses on the development and popularization of experimental functional analysis (EFA).

Experimental Functional Analysis (EFA)

EFA derives directly from applied behavior analysis [34]. The assumption is that much of what constitutes CB has environmental factors that both trigger and maintain the behavior of interest to the clinician. The focus of EFA, therefore, is to systematically vary stimuli, one at a time, in an attempt to determine which environmental stimuli are likely to covary most often with the CB. The stimulus or stimuli that covary most frequently are studied, and other “replacement” or adaptive behaviors are selected and trained as alternative client responses. The purpose of this technique is to teach or train the client to use these replacement behaviors rather than CB to achieve desired outcomes, and the goal is to make the selection of reinforcement methods more systematic, precise, and effective [35].

The study most often cited as the template for this methodology was published by Iwata, Dorsey, Slifer, Bauman, and Richman [36]. In this approach, five standard conditions are presented in vivo to the client by the therapist. The attention condition is designed to test the reinforcement magnitude of attention by others with respect to the CB. Both frequency and intensity are evaluated. Demand is a condition that refers to caregivers making requests or placing requirements on the client’s behavior(s). Alone pertains to whether the CB is self-reinforcing or self-stimulating. The tangible condition uses edibles, toys, or other items to establish their reinforcing value. The final condition is a noncontingent reinforcement or play condition in the presence of a therapist. During this phase, attention and tangibles are provided in a thick reinforcement schedule with no demands. Each condition is presented for 5–15 minutes, typically two to three times. The sequence in which they are presented is varied to eliminate order effects.
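The session structure described above can be sketched in a few lines of code. The following is a minimal illustration only, not the procedure used by Iwata and colleagues: the function name `build_session_schedule`, the choice to shuffle the conditions within repeated blocks, and the optional seed are all assumptions made for the example.

```python
import random

# The five conditions named in the text.
CONDITIONS = ["attention", "demand", "alone", "tangible", "play"]


def build_session_schedule(repetitions=3, seed=None):
    """Return a list of condition names in which every condition appears
    `repetitions` times, shuffled within each block so that the
    presentation order is randomized from block to block.

    Hypothetical helper for illustration; not a published protocol.
    """
    rng = random.Random(seed)
    schedule = []
    for _ in range(repetitions):
        block = CONDITIONS[:]   # copy so the master list is untouched
        rng.shuffle(block)      # randomize the order within this block
        schedule.extend(block)
    return schedule


if __name__ == "__main__":
    for number, condition in enumerate(build_session_schedule(repetitions=2, seed=1), 1):
        print(f"session {number}: {condition}")
```

Shuffling within blocks, rather than across the whole list, keeps the number of presentations equal across conditions while still varying the sequence. Session length (5–15 minutes in the text) is deliberately left out of the sketch.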

The conditions noted above are not evident in all EFA studies, but the notion of in vivo alteration of one independent variable at a time in order to establish what is reinforcing the behavior is fairly standard. Length and number of sessions vary widely across studies. Literally hundreds of studies, typically with two to six participants, have been published over the last three decades [37•].

EFA has proven to be a very popular methodology among researchers in the field of applied behavior analysis. Implementation, however, has been more problematic. EFAs are in common use in clinical settings such as small inpatient units of major medical schools, but they are rarely used in the broader context of group homes, outpatient clinics, and developmental centers. This phenomenon has little to do with the efficacy of the methods. Rather, the sticking point involves practical considerations.

Several points are at issue here. A room is needed, and optimal conditions require that it be devoid of all but the most essential items needed for the assessment, such as chairs and a table. Sharp or protruding objects such as door hinges can be problematic. In addition, EFA requires at least two trained investigators who would typically have at least some post-bachelor’s degree experience and training in applied behavior analysis. The process of setup, implementation, and administration involves several hours. Each assessment, therefore, requires both considerable time and considerable expense. It is also the case that low-rate, high-intensity behaviors may not be appropriate for EFA. Eliciting the behavior may be difficult and, in some cases, may expose the client and staff to potential physical and/or emotional harm. Another concern is providing contingent rewards, such as attention, that may not have maintained the behavior in the past but which may shape new maintaining variables. In addition, the assessment occurs in a novel environment around people who are not the client’s normal care providers, and these factors may elicit atypical responses. Finally, these methods increase arousal and therefore may increase the potential for CB after the session.

All of these factors support the conclusion that EFA has an important but limited role in functional analysis moving forward. Certain case studies have employed abbreviated EFAs – namely, single-variable EFA – in a manner to avoid some of these issues. In one case study, self-injurious scratching continued for more than two years following exposure to poison oak [38]. To investigate whether social attention maintained the behavior, frequency of attention contingent on CB was manipulated systematically over the course of one hour using a reversal design. In this case, an abbreviated single-variable EFA in the child’s natural environment was sufficient for devising an efficacious treatment plan. However, not all CB are so easily or quickly understood, particularly if CB occurs with less frequency. Accordingly, other methods of functional assessment were needed to complement the EFA. The answer was standardized tests of functional assessment, which is the topic that will be addressed next.

Scaling Methods

Tests of intelligence, education, and mental health have traditionally followed a systematic set of established rules. These rules result in measures, referred to as standardized tests, in which there is a “standard” by which each client’s behavior(s) and/or syndrome can be judged. This method requires the development of a norm group, a cutoff score to denote the severity of a problem, and evidence of reliability and validity to demonstrate that the test is consistent and measures what it purports to measure. Finally, factor analytic procedures are often also used to establish subtests or subcomponents of the scale.

This approach has proven fruitful with respect to functional assessment. A number of tests have been developed, the most-studied of which is the Questions About Behavioral Function (QABF) [39]. Other measures reported in the literature include the Motivation Assessment Scale (MAS) [40], the Questions About Behavior Function – Mental Illness (QABF-MI) [41], the Functional Assessment for multiple CausaliTy (FACT) [42], the Motivation Analysis Rating Scale (MARS) [43], and the Functional Analysis Screening Tool (FAST) [44]. These measures and specific research on each will be reviewed next.

Questions About Behavioral Function (QABF)

The most heavily researched of the standardized measures, the QABF is modeled after the EFA in that it is designed to assess the same functions, with the addition of physical nonsocial/automatic negative reinforcement. Factors of the QABF include attention, escape, nonsocial, physical, and tangible functions. This scale has been extensively researched, with a greater number of studies on this test than on all of the other standardized tests combined [45].

Paclawskyj, Matson, Rush, Smalls, and Vollmer [46] demonstrated good reliability of the scale. A Spanish version of the QABF found similar reliability and factor structure to U.S. studies [47], while Matson and Wilkins [30] found reliability to be better for high- versus low-rate behavior. In addition, aggression was more reliably rated than self-injury. A large validity study compared 120 treatment plans devised with and without an initial QABF assessment [48]. Half of the adults with intellectual disabilities were given the QABF, with a treatment plan then specifically designed for each client based on the functions identified by the assessment. The other half of the sample received no functional assessment. Rather, they underwent a generic treatment plan of blocking problem behaviors, redirecting the person, and placing them on a reinforcement schedule. At the end of the treatment period, greater reduction in CB was achieved in the QABF group.

In comparisons of the QABF with EFAs, these two methods produced similar results [49-51]. An Irish study by Healy, Brett, and Leader [52•] compared QABF and EFA results for 32 people with autism who evinced aggression/destruction, self-injury, and stereotypies. There was agreement on primary function in 24 of the 32 cases, leading the authors to conclude that the QABF was an effective tool for ascribing behavioral function.
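The agreement figure reported by Healy and colleagues reduces to simple arithmetic: 24 matching cases out of 32 is 75 % agreement. The sketch below shows one way such a case-by-case comparison could be computed. The function name `percent_agreement` and the example labels are hypothetical; the case-level data from the study are not reproduced here.

```python
def percent_agreement(method_a, method_b):
    """Percentage of cases (0-100) in which two assessment methods
    name the same primary function.

    Each argument is a list of function labels, one per case, in the
    same case order. Hypothetical helper for illustration.
    """
    if len(method_a) != len(method_b):
        raise ValueError("both methods must rate the same cases")
    if not method_a:
        raise ValueError("at least one case is required")
    matches = sum(a == b for a, b in zip(method_a, method_b))
    return 100.0 * matches / len(method_a)


if __name__ == "__main__":
    # Placeholder labels arranged to give the same count as the study
    # (24 of 32 matching); these are NOT the published case-level results.
    qabf = ["attention"] * 24 + ["escape"] * 8
    efa = ["attention"] * 24 + ["tangible"] * 8
    print(percent_agreement(qabf, efa))  # 75.0
```

Simple percent agreement does not correct for matches expected by chance, so it is best read as a descriptive summary rather than a reliability coefficient.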

The QABF produces interpretable results in most cases. Paclawskyj, Matson, Rush, Smalls, and Vollmer [50] reported identifiable functions in 84 % of adults with ID who evinced aggression, self-injury, or stereotypies. Wilke, Tarbox, Dixon, Kenzer, Bishop, and Kakavand [53•] tested 53 adolescents and children with ASD and reported interpretable results for stereotypies in 90 % of the cases.

The QABF and MAS have been compared to EFA in the same cases. Smith, Smith, Dracobly, and Pace [54] found that the QABF produced results more in line with an EFA than did the MAS. Wasano, Borrero, and Kohn [55] conducted a similar comparison of three people with pica and found similar results in all cases.

Function of Behavior

A topic that has gained considerable attention concerns the variables involved in maintaining specific topographies of CB. Much of this research has been conducted with the QABF and is predicated on the underlying theory regarding behavior analysis in general and functional analysis specifically. The assumption is that the bulk of the CB evinced by persons with ID or other developmental disabilities is caused and maintained by environmental factors [56]. For example, Applegate, Matson, and Cherry [57] found that aggression was externally driven, while stereotypies, pica, and rumination were maintained by nonsocial/automatic reinforcement. Dawson, Matson, and Cherry [58] found similar results, as did Embregts, Didden, Schreuder, Huitink, and van Nieuwenhuijzen [59]. Food refusal was associated with self-injury and aggression as a means of escaping the task [60, 61].

Etiological Factors

Etiological factors have also been examined with regard to the QABF [62•]. Medeiros and associates [62•] determined that lower levels of IQ within the ID range resulted in more functions that maintained aggression and self-injury, which may be the result of poorer verbal skills and more limited overall repertoires. Langthorne and McGill [63] found that maintaining variables for CB were affected by specific genetic disorders. For example, in persons with Smith-Magenis syndrome who evinced CB, the cause was much more likely related to physical discomfort. Persons with fragile X syndrome were much less likely to display attention-maintained behavior. Greater social behavioral deficits and mental health problems also exacerbate CB and affect the factors most likely to maintain the behavior [64, 65], and thus various genetic and mental health disorders, as well as IQ level, should be taken into account when exploring the maintaining variables of CB.

A modified QABF for individuals with mental illness, the Questions About Behavior Function – Mental Illness (QABF-MI), is a scale that is essentially the QABF normed on a mentally ill inpatient population of 135 individuals. The scale has also been factor-analyzed, and the subscales were an identical match to the QABF [41]. The measures are scored in the same manner, on a 4-point Likert scale ranging from 0 to 3, with scores corresponding to frequency descriptors of never, rarely, some, and often.
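To make the scoring scheme concrete, the sketch below totals 0–3 frequency ratings within each of the five subscales named earlier and reports which subscale or subscales score highest. It is an illustration under stated assumptions, not the published scoring procedure: the assignment of items to subscales is supplied by the caller, the function names are hypothetical, and the rule of reading the highest total as the likely function is a simplification.

```python
# Frequency descriptors and their scores, as described in the text.
FREQUENCY_SCORES = {"never": 0, "rarely": 1, "some": 2, "often": 3}

# Subscale names as listed for the QABF.
SUBSCALES = ("attention", "escape", "nonsocial", "physical", "tangible")


def score_subscales(ratings):
    """Sum the 0-3 ratings within each subscale.

    `ratings` is a list of (subscale, descriptor) pairs, one per item.
    Which item belongs to which subscale is an input here; the sketch
    does not encode the real item-to-subscale key.
    """
    totals = {name: 0 for name in SUBSCALES}
    for subscale, descriptor in ratings:
        if subscale not in totals:
            raise ValueError(f"unknown subscale: {subscale}")
        totals[subscale] += FREQUENCY_SCORES[descriptor]
    return totals


def highest_subscales(totals):
    """Return every subscale tied for the highest total.

    More than one name in the result suggests multiple possible functions.
    """
    top = max(totals.values())
    return [name for name, value in totals.items() if value == top]


if __name__ == "__main__":
    example = [
        ("attention", "often"),
        ("attention", "some"),
        ("escape", "rarely"),
        ("tangible", "never"),
    ]
    totals = score_subscales(example)
    print(totals)
    print(highest_subscales(totals))
```

Returning all tied subscales, rather than a single winner, reflects the point made elsewhere in this review that a given CB may have more than one maintaining function.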

Motivation Assessment Scale (MAS)

The MAS is designed to be rated by the caregiver and is composed of 16 items on a 7-point Likert scale ranging from 0 to 6. The scores correspond with frequency ranging from never to always. Four factors are included and are based on face validity. The scales are escape from aversive events (demand), accessing attention, tangible rewards, and sensory reinforcement. Durand and Crimmins [66] established reliability on 50 children with self-injurious behavior, reporting good reliability, while Zarcone, Rodgers, Iwata, Rourke, and Dorsey [67] reported poor reliability for the measure. Sigafoos, Kerr, and Roberts [68] also reported lower reliability than the measure’s authors did, and they questioned its reliability. Conroy, Fox, Bucklin, and Good [69] produced results supporting Sigafoos, Kerr, and Roberts [68]. Conversely, Shogren and Rojahn [70] reported acceptable reliability, with results more in line with the scale’s developers. Because the available psychometric data are mixed at best, the psychometric properties of the MAS need further study.

Functional Assessment for multiple CausaliTy (FACT)

This scale was identified as a method to follow and complement the QABF. The scale’s purpose is to determine the hierarchical nature when multiple functions exist for a given CB and thus two or more factors receive high scores on the QABF. The stated goal of the FACT is important, as up to half of the CB displayed may have two or more functions [71]. In cases of clients with multiple CB, this approach may be essential for streamlining treatment plans and decreasing the likelihood of overwhelming staff tasked with implementing the program. The FACT has 35 forced-choice items (e.g., “engages in the behavior more when he/she does not want to do something, or more when you have something he/she wants, or neither”). The forced choice between two functions in each question serves to identify which of multiple maintaining factors is likely primary and should be addressed first in a treatment plan. Good psychometrics have been established, and factor analysis has established five subscales [42], which are attention, physical, tangible, escape, and nonsocial.

Motivation Analysis Rating Scale (MARS)

The MARS was one of the first published standardized functional assessments [43]. The scale comprises six items that are rated on a Likert scale, addressing the maintaining factors of social, tangible, escape, and self-stimulation. Little research has appeared on the MARS, and its psychometric properties are still in question.

Functional Analysis Screening Tool (FAST)

The FAST has 27 items covering social influence to obtain access to specific activities or items or to escape demands from others. The measure also covers social reinforcement (attention) and nonsocial reinforcement via sensory stimulation or pain attenuation. The scale’s developers have published only one study, which found mean inter-rater reliability of 71.5 % [44]. In the same study, the condition of the functional analysis in which the highest rate of problem behaviors occurred was predicted by FAST scores 63.8 % of the time [44]. Although the measure is in fairly broad clinical use, further research is needed, and it cannot be recommended at this point.

Psychometrics Between Standardized Instruments

Some studies have been published comparing psychometrics of the various scales. Shogren and Rojahn [70] compared psychometric properties of the QABF and MAS, which were administered to 20 adults with ID and CB, including aggression, self-injury, or property destruction. Good test-retest reliability and internal consistency were reported for both scales. Freeman, Walker, and Kaufman [71] also found good reliability for the QABF and MAS, and they found acceptable convergent validity for the two measures as well. Koritsas and Iacono [72] replicated the factors of the QABF and MAS, and their factor analysis supported the five factors reported for the QABF. The MAS produced a four-factor solution, but only one factor was clearly established. The researchers found low agreement between the MAS and QABF.

Perhaps the most ambitious of the functional assessment scale comparisons was made by Zaja, Moore, van Ingen, and Rojahn [73]. The QABF, FACT, and FAST were studied in 130 adults with ID who evinced self-injury or stereotypic behavior. Overall, the QABF and FACT demonstrated better inter-rater and test-retest reliability, as well as better internal consistency, than the FAST.

Observational Methods

Contingency (A-B-C) Event Recording

Contingency event recording was one of the first methods of functional behavior assessment [74] and remains one of the most commonly used non-experimental observation methods. More frequently referred to as Antecedent-Behavior-Consequence (A-B-C) recording, this method involves observing the individual in his or her natural environment. Like functional analysis, A-B-C recording puts a particular emphasis on antecedents and other environmental-setting events that induce or trigger CB [32]. The primary difference between functional analysis and A-B-C recording is that functional analysis is experimental in nature. In A-B-C recording, the issue of retrospective bias inherent in interviews is avoided by recording real-time data. A number of A-B-C formats and checklists have been created [34], and these generally include columns to record behaviors, predictors/antecedents, and actual consequences, with all targeted CB listed. Some forms also include a column for perceived functions [75]. Relatively little research has been conducted to investigate the differences between structured and unstructured A-B-C recordings [34]. In a study of special-education teachers and paraprofessionals, Lerman and colleagues [76] found that the checklist format was preferred over unstructured A-B-C data collection due to greater perceived efficiency.
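The column layout of an A-B-C form maps naturally onto a simple record structure. The sketch below is a hypothetical illustration of how such observations might be stored and tallied; the class name `ABCRecord`, the field names, and the example entries are assumptions and do not correspond to any published form.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class ABCRecord:
    """One observed incident, mirroring the columns described in the text."""
    antecedent: str
    behavior: str
    consequence: str
    perceived_function: Optional[str] = None  # optional column on some forms


def tally_by_behavior(records, field):
    """Count how often each value of `field` (for example "antecedent"
    or "consequence") accompanies each recorded behavior."""
    tallies = {}
    for record in records:
        counts = tallies.setdefault(record.behavior, Counter())
        counts[getattr(record, field)] += 1
    return tallies


if __name__ == "__main__":
    # Invented observations for illustration only.
    log = [
        ABCRecord("demand placed", "aggression", "task removed", "escape"),
        ABCRecord("demand placed", "aggression", "task removed"),
        ABCRecord("left alone", "self-injury", "staff attention", "attention"),
    ]
    print(tally_by_behavior(log, "antecedent"))
    print(tally_by_behavior(log, "consequence"))
```

A tally of this kind is descriptive only. As the text notes, A-B-C recording is non-experimental, so recurring antecedent or consequence patterns suggest hypotheses about function rather than confirming them.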

Sequential Analysis

Sequential analysis is particularly useful in cases of infrequent CB when a functional analysis is needed but a full EFA is impractical. Some researchers have utilized this method with the assistance of various technologies for recording and coding CB (e.g., video recording, ObsWin software) [77, 78]. In sequential analysis, correlations between the target CB and an environmental variable are corroborated by additional analyses of reinforcement contingencies. Sequential analysis can be used to determine the conditional probability of the suspected maintaining variables at established intervals before, during, and after a CB. The output is then examined for specific response patterns with regard to antecedents and consequences [79, 80].
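The conditional-probability idea can be illustrated with interval-coded data. The sketch below is a simplified example, not the algorithm used by ObsWin or by the cited studies: it assumes that the session has been coded into equal intervals, that each interval is scored 0 or 1 for the CB and for one environmental event, and that comparing the lagged probability with the event's overall base rate is an acceptable first look. The function names are hypothetical.

```python
def lagged_conditional_probability(behavior, event, lag):
    """Estimate P(event in interval t + lag | behavior in interval t).

    `behavior` and `event` are equal-length sequences of 0/1 codes, one
    per observation interval. A negative lag looks before the behavior,
    0 looks during it, and a positive lag looks after it. Returns None
    if the behavior never occurs at an interval where the lag is in range.
    """
    if len(behavior) != len(event):
        raise ValueError("both streams must cover the same intervals")
    hits = 0
    opportunities = 0
    for t, occurred in enumerate(behavior):
        target = t + lag
        if occurred and 0 <= target < len(event):
            opportunities += 1
            hits += event[target]
    return hits / opportunities if opportunities else None


def base_rate(event):
    """Unconditional proportion of intervals in which the event occurs."""
    return sum(event) / len(event)


if __name__ == "__main__":
    # Invented codes for ten intervals, for illustration only.
    cb = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]
    attention = [0, 0, 1, 0, 0, 1, 0, 0, 0, 1]
    for lag in (-1, 0, 1):
        print(lag, lagged_conditional_probability(cb, attention, lag))
    print("base rate", base_rate(attention))
```

In this invented example, attention follows the CB in two of three opportunities at a lag of one interval, well above its base rate of 0.3, which is the kind of pattern that would prompt a closer look at attention as a consequence. A real analysis would also need to consider whether the difference exceeds what chance alone would produce.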


A number of structured interviews have been developed to aid in functional assessment. The use of these procedures ensures a more systematic evaluation by individuals who have less training than is required with some of the other methods of investigating behavioral function (e.g., EFA).

Functional Assessment Interview (FAI)

One such measure is the Functional Assessment Interview (FAI) [75], which is an extension of the Functional Analysis Interview published in 1990. The FAI test is a paper-and-pencil measure and requires a 45–90-minute interview. Eleven topics are covered, including a description of the target behaviors, possible maintaining variables, factors predicting the CB, the CB function, how effective the behavior is in eliciting the desired outcome, possible replacement behaviors, the individual’s communication skills, actions to avoid, possible reinforcers, and a diagram for antecedents, behaviors, and consequences. A study of 21 mother/toddler pairs [81] found an 85 % agreement between the FAI and MAS. In a literature review of studies using the MAS and FAI, Floyd and colleagues [82] surmised that the measure is thorough in guiding informants through descriptions and prioritization of CB and salient variables. However, they also noted that many of the studies using the FAI failed to address specific effects in treatment development and outcome or to control for relationships with other variables (e.g., informants or interviewers having knowledge of results of other assessment methods) [82].

Student-Assisted Functional Assessment Interview

The Student-Assisted Functional Assessment Interview, which was the first such measure, is less popular than the FAI [83]. The method takes 20–30 minutes to administer and is divided into four sections. Topics include the student’s schoolwork, why and when the CB occurs, what behaviors need to be modified in the school environment to change the behavior, and what class or classes the child enjoys most. The fact that the interview is restricted to one setting and one group of people may largely explain why this interview has seen limited use. Additionally, little research on the measure is available.

Other Interviews

A second classroom-focused measure is the Student-Guided Functional Assessment Interview [84]. The primary focus of this scale was CB such as talking out of turn, teasing/bullying, not following directions, and not completing schoolwork. This two-part measure may be administered to both the student and teacher.

Other interviews exist and are all similar in design and administration. Questions are posed in-person to caregivers by the clinician or researcher conducting the assessment. Responses are typically open-ended and can provide more comprehensive information regarding the target CB and environmental variables. The primary issue with classroom-focused methods is the lack of research since the inception of these methods. None of the structured interviews have well-established reliability or validity, and given that most of these measures have been available for many years, it is unlikely that such data will be available in the future.


Over the last two decades, the functional assessment and analysis of CB has likely been the most prevalent and intensely studied method for treating persons with ID. Simultaneously, the period has witnessed what is probably the greatest advance in applied behavior analysis. While multiple strategies and methods of assessment/analysis have been proposed, the two methods that have emerged as the most researched and most popular for clinical use are EFAs and standardized scaling methods (although the scaling methods themselves vary considerably in quality). Some professionals fall into camps that see these methods as an either/or proposition. The bulk of professionals, including the current authors, see value in both methods. It is incumbent upon the clinician or researcher to consider the appropriateness of a measure in any given situation. Some measures have better-established psychometrics than others, and not all measures are equally suited for each setting.

Clinically, it would appear that the most cost-effective approach is to start with a standardized scale or scales. Some instances will lend themselves to a preliminary A-B-C analysis as well, particularly in residential or treatment facilities where caregivers are familiar with such analysis and it is often integrated into incident reports. Identifying the maintaining variable may be more difficult for individuals with a longer history of the CB, as additional maintaining variables may have been established over time. Infrequently occurring CB may be analyzed via time-lag sequential analysis. If these approaches do not produce the desired results and the CB is particularly problematic, an EFA would be in order. In instances in which a single or primary maintaining variable is not established, treatments for multiply maintained CB have proven efficacious. Indeed, these techniques have been successfully used in treating CB since the advent of applied behavior analysis, often with no preliminary analysis to identify primary function [85]. Functional assessment methodology, however, ensures a greater probability of success, with less trial and error and lower risk of behavior substitution.

Also emerging from the literature is a roadmap with regard to the maintaining functions that will likely serve as establishing operations and/or setting events. Level of ID, cause of ID, and type of CB all contribute to this profile, which refers to the interaction between etiology, type, and function of CB. In situations where a clear function cannot be readily established for a CB in an individual with a known disorder, clinicians should consult this body of research to help determine which variables are most likely to be maintaining the CB. For example, in individuals with Prader-Willi syndrome, nonsocial automatic reinforcement is most often the maintaining variable for skin-picking [86], and this factor should be taken into consideration during treatment planning. Research in CB, functional analysis, and the areas noted in this paper will continue to undergo study and refinement. As this happens, the already-powerful tool of functional assessment will become even more efficient and effective.