Play is a fundamental part of children’s lives. Children learn about their worlds through play, developing knowledge about objects, people, and events, furthering cognitive and socio-emotional development. Play affords opportunities for children to make developmental gains in all areas. The knowledge gleaned from play contributes to the content that children draw upon to communicate with the people in their worlds (Bloom & Tinker, 2001; McCune, 1995).

Many researchers have examined children’s object play (what children do with toys and other objects) from infancy through the preschool years (e.g., Lifter et al., 2022; Belsky & Most, 1981; Fenson et al., 1976; Lifter & Bloom, 1989; McCune, 1995; Nicolich, 1977; Watson & Fischer, 1977). Results were described in terms of qualitatively different categories of play, expanding upon Piaget’s (1962) categories of “sensory-motor activity,” “symbolic or imaginary play,” and “games with rules” (p. 2). Their findings provided a window into cognitive development and learning, including developments in mental representation and symbolic function, which can be seen on the continuum of increased complexity in object play. Because object play is embedded ubiquitously into intervention for children with delays, it is all the more important to learn how object play is described, assessed, and subsequently used to guide intervention choices.

Researchers also conducted studies on the play of children developing with delays. The categorical terms used in the studies of typically developing (TD) children were often used to describe the play of children with delays. Results demonstrated that children with delays evidence less elaborated and varied play (e.g., Libby et al., 1998; Williams et al., 2001). Several problems have been identified across both groups of studies, which render their translation to practice difficult, namely inconsistencies in descriptions. Such inconsistencies have been identified in the past, especially in serving children developing with delays (e.g., Barton, 2010, focused on pretense play; and Vig, 2007, focused on object play) and still occur. For example, an activity such as feeding a doll with a spoon would be regarded as Presymbolic for Harrop et al. (2017) and Functional for Campbell et al. (2016). These discrepancies become especially problematic when attempting to establish standardized play assessment and intervention activities. For example, if there is no consensus definition of Functional Play, establishing a child’s readiness to learn that category based on assessment would be inherently inconsistent. Further, the choice of intervention activities based on variable descriptions would be inconsistent and potentially ineffective.

The present study first aimed to systematically review studies of the play of children developing with delays to identify the delay groups and categorical terms used, building on the reviews by Barton (2010) and Vig (2007). A second aim was to compare the descriptions found with 21 categories from a recent study on typically developing children (Lifter et al., 2022), with the objective of creating a clearer category system for play descriptions. Below is an overview of several factors that contribute to inconsistent descriptions, implications for assessment and intervention, and a summary of our recent study.

Factors Contributing to Inconsistent Descriptions

Definitions of Categories

A central factor that contributes to inconsistencies in descriptions is determining the meanings of a category and the nature of activities that constitute the category. Definitions vary widely. For example, Hill and McCune-Nicolich (1981) termed actions that involved the use of toys in a conventional way Single-scheme Symbolic, whereas Thiemann-Bourque et al. (2012, 2019) described such actions as Functional Combinatorial. The “use of toys in a conventional way” also is subject to interpretation.

Another problematic factor centers on the extent of descriptions offered in category definitions. Sidhu et al. (2022) reviewed 146 empirical, peer-reviewed play studies that used the term Functional Play as a descriptor. Fewer than half provided a definition of Functional Play. Of those that did, most could be interpreted in terms of “appropriate use of toys.” Sidhu et al. concluded that the toy-directed focus subsumed in appropriate play is subject to variation in interpretation. They suggested that Functional Play should not be used as a category of play, consistent with Vig’s (2007) proposal.

Global Versus Differentiated Categories

Several studies used global descriptions of children’s play as opposed to more differentiated categories. For example, Libby et al. (1998) used five categories (Exploration, Sensorimotor, Relational, Functional, and Symbolic) to describe the play of toddlers with autism, those with Down syndrome, and those who were TD. Dominguez et al. (2006) drew their categories from the Libby et al. (1998) study. They reported no differences in the Functional Play of children with autism compared to the TD children they observed. Other studies, however, provide support for the differentiation of categories for describing play. Williams et al. (2001) subcategorized the global category of Functional Play into Simple Functional Play (consisting of Functional Association and Functional Use of a Single Object) and Elaborated Functional Play (consisting of Functional Use of Multiple Objects, Functional Acts Supported by Appropriate Vocalization/Gesture, and Doll-Directed Functional Acts). This differentiation supported their conclusion that the play of children with autism, compared to TD children and children with Down syndrome, “was less elaborated, less varied, and less integrated than that of the controls” (p. 67). The foregoing inconsistencies are related to the definitions used and the extent of differentiation of those categories.

Characterizations of the Children Studied

Studies have compared children across one or more clinical groups. Factors contributing to inconsistencies include the various methods used to determine the nature of children’s delays. For example, some studies employed highly detailed diagnostic measures (Campbell et al., 2016; Hobson et al., 2013; Naber et al., 2008), while others used simpler measures such as parent rating scales (Charman et al., 1997; Papaeliou et al., 2019).

In addition, studies have included children from 2 to 10 years of age. Given the emphasis on children developing with delays, most studies appeared to focus on preschool-age children, the period when delays generally manifest. Others focused primarily on toddlers in their second and third years. The variability in populations of interest and differences across age groups can lead to different play descriptions and inconsistencies in resulting findings.

Implications for Research, Assessment, and Intervention

The term “play” and its various categorical descriptions are used widely and variably in serving young children developing with delays, affecting play’s use in research and practice. Consistent descriptions are imperative for understanding the nature of play and, more specifically, for ensuring standardized assessment and intervention activities. Considerable effort is dedicated to assessment and intervention activities to support these children’s development (e.g., Barton, 2015; Kasari et al., 2013; Pane et al., 2021; Pierucci et al., 2015). Evaluating play from infancy through the preschool period, with clearly specified categories, would provide a useful guide for intervention.

Contemporary Study of Children’s Play

In our recent study (Lifter et al., 2022), we exhaustively analyzed 30-min, videorecorded play samples of 289 children taken in their homes, at 10 age points between 8 and 60 months of age. We justified the need for a fresh look at play in TD children, given the limitations in the older studies. Analyses across four groups of toys within the 30-min samples focused on object play. We identified 27 mutually exclusive, qualitatively different categories of play. Exploratory factor analysis revealed five major clusters: Perceptually-based; Representationally-based; Pretense; Role-play; and Construction. The first three clusters noted here emerged in a sequential order. The Role-play and Construction clusters occurred minimally. These clusters included 21 of the 27 categories identified (see Table 1), accounting for 52.4% of the variance, which served to organize the development of the categories. They represent a highly differentiated set of categories for describing children’s play.

Table 1 The set of 21 play categories coded for correspondences

Purpose of This Review

Given the inconsistencies in descriptions across studies regarding the play of children with delays, together with our new descriptive results for TD children, we undertook a scoping review of play studies centered on children developing with delays. The following research questions guided this scoping review: (a) Which populations and age groups of children with delays in play have been studied? (b) How has their object play been observed and categorized, and what findings have been reported? (c) How do the play categorizations identified in the reviewed studies correspond to the 21 categories generated in Lifter et al. (2022)?

Method

We used the Preferred Reporting Items for Systematic Reviews and Meta-Analyses Extension for Scoping Reviews (PRISMA-ScR) checklist guidelines (Tricco et al., 2018).

Search Methods

PRISMA guidelines were followed to review the literature, which included determining the search terms, databases, and the eligibility criteria for study inclusion. The search strategies were drafted through consultation with an experienced university librarian, and further refined through team discussion. The electronic search using ERIC, OVID, PubMed, Wiley, and PsycINFO was conducted in November 2021, repeated in February 2022, and repeated in March 2023. Initial search terms included “object play”, “children”, and “delay”. Eligibility criteria centered on empirical, peer-reviewed, descriptive studies focused on children and published in English between 1964 and 2022. The reference list of each included article was screened for further eligible studies.

First, we identified articles that included the above search terms in the five databases, which yielded 42,278 studies. We then screened the articles with an expanded set of search terms (“object play”, “children”, “developmental delay”, and “play”) in the abstract or title of the article, searching for studies that examined the role of play in children’s development and the connections between play and other developmental domains. Second, we searched through review articles and removed duplicates. Third, we manually reviewed articles (n = 489) to determine whether the study was descriptive; whether the sample included children with any type of delay; whether the methodology included spontaneous object play observed in a naturalistic context, as opposed to elicited play; and whether the play activities were coded categorically. Four reviewers (authors and graduate assistants) independently reviewed the titles and abstracts obtained from the third round of searches, with at least two reviewers evaluating each article found. When these criteria were unclear from the abstract, the authors reviewed the entire article to determine whether it met the eligibility criteria. When evaluations of full text differed, consensus was reached through discussion. Thirty-four studies met inclusion criteria. A fifth reviewer completed inter-rater reliability for the database search. The PRISMA flow chart is shown in Fig. 1.

Fig. 1 PRISMA flow chart

Data Extraction Process

Study Characteristics

For each study, we recorded the name of the study, year of publication, purpose and design, and a brief summary of results. Also recorded were participant variables: type of delay; diagnostic parameters for determining delay (if specified); chronological ages; numbers of participants; whether a TD control group was included; and assessments used. We defined descriptive studies, either cross-sectional or longitudinal, based on analyses of spontaneous play activities that were conducted in a research laboratory or home setting. Studies that described play activities in relation to another domain (e.g., language) were included if they categorized play and provided descriptions of the categories. Intervention studies were not included. One author, who was not involved in the search and recording of characteristics, reviewed each study for the same variables. Agreement was determined through discussion and consensus.

Definitions and Origins of Play Categories

We recorded the definitions and any included examples provided for the play categories. Each study was evaluated to determine how the researchers identified their play categories.

Analyses of Studies

The definitions and categories presented in the studies were then evaluated in terms of their correspondences to the 21 categories generated from our earlier study (Lifter et al., 2022). The 21 categories are presented in Table 1, grouped according to the clusters in which they factored. The Perceptually-based cluster was the first to emerge in the children’s play. Its categories have a perceptual basis: the activities were supported by the physical characteristics of the toys and by the perceptually-based combinations afforded by the presentation of the toys (e.g., putting a puzzle back together; moving objects in and out of the dumper of the dump truck). These categories were distinguished from those in the Representationally-based cluster in that they did not require calling forth information from memory; the potential relations to be constructed were apparent in the environment.

The Representationally-based cluster followed in development, and included categories that indicated the children’s abilities to mentally represent the objects, people, and events of their cultures, whether as individual activities or linking activities together. The Pretense cluster emerged thereafter, which included categories of object substitution and the attribution of animacy to dolls and people. The categories in the Role-play cluster required some form of role-play activity and occurred minimally. Finally, the Construction cluster, which also occurred minimally, consisted of two categories that involved assembling objects into some kind of configuration that was greater than the characteristics of the objects involved.

The process of matching the categories presented in each study to the 21 categories listed in Table 1 was based on the definitions the researchers provided, along with examples if available. We depended on the explicit descriptions, and the examples offered, to make these correspondences. Where a correspondence could be made, the category name was entered, corresponding to the relevant category in our 21-category coding sheet. For example, Campbell et al.’s (2016) definition of Functional Play included “stacking blocks.” Thus, their term Functional Play was entered into our coding sheet to correspond with our category Specific Physical. Charman et al. (1997) provided the example “makes doll cook” within their category of Pretend Play; therefore, Pretend Play was entered into the coding sheet corresponding to Doll-as-Agent. This process involved two authors evaluating each study. A third author reviewed these correspondences. Agreement was determined by consensus (please see supplementary materials for the tables of correspondences).
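The coding sheet described above can be pictured as a small lookup structure. The following is a minimal sketch, not the authors' actual instrument: the function name `build_coding_sheet` and the tuple layout are illustrative assumptions, and the only entries are the two correspondences named in the text.

```python
from collections import defaultdict

# Each row: (study, the study's own term, matched category in the 21-category framework).
# Only the two correspondences given as examples in the text are listed here.
correspondences = [
    ("Campbell et al. (2016)", "Functional Play", "Specific Physical"),
    ("Charman et al. (1997)", "Pretend Play", "Doll-as-Agent"),
]


def build_coding_sheet(rows):
    """Group each study's term under the framework category it was matched to."""
    sheet = defaultdict(list)
    for study, term, category in rows:
        sheet[category].append((study, term))
    return dict(sheet)


coding_sheet = build_coding_sheet(correspondences)
for category, entries in coding_sheet.items():
    for study, term in entries:
        print(f"{category}: '{term}' in {study}")
```

Keying the sheet by framework category, rather than by study, makes it easy to see how many different terms the literature uses for one and the same activity.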

Inter-Rater Agreement

We completed inter-rater agreement for (a) the search procedures and (b) the correspondences to our 21-category framework, described below.

Database Search and Study Inclusion

A doctoral student in applied psychology, who was not part of the original search, completed an independent review of the procedures. This student replicated the database search steps and inclusion criteria for 33% of the databases to ensure that all eligible studies had been included. After replicating the search terms and inclusion criteria in Fig. 1, the percentage of overall inter-reviewer agreement on the inclusion of articles was 88%.

Category Correspondence

Inter-rater agreement for correspondences to our 21 categories was completed in two phases. In the first, a doctoral student in applied psychology, who was not part of the original search, underwent guided training with the first two authors. Training involved a review of the coding procedures, guided practice in determining correspondences, and independent practice in coding three samples to criterion. For the inter-observer agreement (IOA) check, a random sample of six articles was identified to be coded. Agreement ranged from 0.48 to 0.96, with an overall average of 0.75. We concluded that we needed to provide more detail on how the correspondences were to be made.

For the second phase, we followed the same training procedures with a new student, but we provided increased specification for how to match the studies’ descriptions of categories to our 21 categories. The student independently coded a random sample of 12 studies, representing 35% of the total sample. Agreement ranged from 0.38 to 0.90, with an overall average of 0.74. This process continued to be difficult. It reaffirmed the inconsistencies across studies in the comprehensiveness, or lack thereof, of the definitions provided, which contributed to the consequent difficulties in making these correspondences.
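The text does not state which agreement index was computed. As one plausible reading, the sketch below assumes simple proportion agreement per study: the share of framework categories for which two coders entered the same term. The function name and the example ratings are hypothetical and are not data from the review.

```python
def proportion_agreement(coder_a, coder_b):
    """Share of categories on which two coders entered the same term.

    Each argument maps a framework category to the term a coder entered for
    one study, or None when the coder judged the category as not included.
    """
    if coder_a.keys() != coder_b.keys():
        raise ValueError("both coders must rate the same set of categories")
    if not coder_a:
        raise ValueError("no categories to compare")
    matches = sum(coder_a[cat] == coder_b[cat] for cat in coder_a)
    return matches / len(coder_a)


# Hypothetical ratings for a single study, limited to four categories for brevity.
coder_a = {
    "Discriminative": "Functional",
    "Takes Apart": "Exploration",
    "Specific Physical": "Functional",
    "Doll-as-Agent": "Pretend",
}
coder_b = {
    "Discriminative": "Functional",
    "Takes Apart": None,  # second coder found no correspondence
    "Specific Physical": "Functional",
    "Doll-as-Agent": "Pretend",
}

agreement = proportion_agreement(coder_a, coder_b)
print(agreement)
```

A chance-corrected index such as Cohen's kappa would be an alternative choice; it is not used here because the source reports only raw agreement values.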

Appraisal of the Studies

The Joanna Briggs Institute Critical Appraisal Tools (Joanna Briggs Institute, 2017) were used to assess the methodological quality of the studies. The potential appraisal value for the Briggs checklists ranges from 0 to 8, with a score of 8 considered the highest in terms of methodological and ethical quality. For the studies in our review, we used the validated eight-component Quality Assessment Tools for analytical cross-sectional studies, cohort studies, and quasi-experimental studies (Joanna Briggs Institute, 2017). The appraisal process was conducted by three graduate research assistants in applied psychology. Each student independently completed a Joanna Briggs checklist for each study (n = 34) and then compared ratings and conferred on a final appraisal value through discussion and consensus. Many of the studies were published in the 1980s and 1990s and did not attain the maximum score of 8. In general, the older studies scored lower because they did not explain potential confounders or account for them, whether by adjusting the study design or by acknowledging limitations on the generality of results. We included studies that met a score of 4 or higher, which represented at least 50% of the 8 component criteria. The appraisal values ranged from 4 to 8, with an average of 5.5. None of the 34 studies was excluded for scoring below 4.
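The inclusion rule can be made concrete with a short sketch. It assumes, as a simplification, that each of the eight checklist components is reduced to met or not met; the actual checklists also allow responses such as "unclear", which this sketch does not model. The names and the example checklist are hypothetical.

```python
MAX_SCORE = 8            # eight checklist components
INCLUSION_THRESHOLD = 4  # at least 50% of the components


def appraisal_score(checklist):
    """Count how many of the eight components a study met (True = met)."""
    if len(checklist) != MAX_SCORE:
        raise ValueError(f"expected {MAX_SCORE} components, got {len(checklist)}")
    return sum(bool(item) for item in checklist)


def meets_threshold(score):
    """Apply the inclusion rule: a score of 4 or higher."""
    return score >= INCLUSION_THRESHOLD


# Hypothetical consensus ratings for one study.
example_checklist = [True, True, True, False, True, False, False, True]
example_score = appraisal_score(example_checklist)
print(example_score, meets_threshold(example_score))
```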

Results

Study Selection

The PRISMA flow diagram (Page et al., 2021) that formed a guide to our search is presented in Fig. 1. It illustrates the screening steps and corresponding number of studies at each step of the process. After duplicates were removed and manual screening of the articles occurred, 34 studies met inclusion criteria and were assessed further. The most frequent reasons for exclusion were that the sample did not include children with any type of delay, the methodology did not include spontaneous play, or that the study did not code play categorically.

The Participating Children

The 34 studies included children with chronological ages between 11 months and 16 years, 5 months. The majority focused on children in the second and third years of life. Studies that included older children (e.g., older than 5 years) with developmental delays set out to match their developmental levels to those of younger children without delays. The age ranges, delay groups, control groups, and sample sizes are listed in Table 2.

Table 2 Study characteristics

Delays and Diagnostic Parameters

The studies included children with a variety of delays: children with autism, children with Down syndrome (DS), children with language or cognitive delays, children with intellectual disabilities, children with moderate learning difficulties, and children with hearing or language impairments. Studies also included children with Williams syndrome (WS) and fragile X syndrome. Some studies included children with multiple delays, either disaggregating results or combining them into a broader definition of delay type (Tilton & Ottinger, 1964; Sigman & Ungerer, 1984; Wilson et al., 2017; Hobson et al., 2009; Hobson et al., 2013).

The diagnostic parameters to determine delay type for each study’s sample of children varied extensively. A few studies evaluated children for autism through a complete clinical evaluation with licensed practitioners, including the Autism Diagnostic Observation Schedule, 2nd edition (Campbell et al., 2016; Hobson et al., 2013; Naber et al., 2008). Other studies used rating scales to determine if children met criteria for autism or they did not report diagnostic parameters for autism or developmental/cognitive delays (Charman et al., 1997; Papaeliou et al., 2019; Sigman & Ungerer, 1984; Stone et al., 1990).

How Object Play Was Studied: Methodologies

Design of the Studies

All 34 studies included naturalistic observation of spontaneous play, either by direct observation or retrospective video analysis. Twenty-six studies included TD children as a control group. Sample sizes varied from 16 total participants (Mundy et al., 1987) to 145 participants (Campbell et al., 2016).

Origins of Play Categories

The majority of studies identified the origin of the categories they used. Origins included previous literature as well as play categories specifically generated or designed for a particular study. Explanations and justification for play category choice varied widely. Studies that stated origins of coding schemes primarily used categorical descriptions taken from earlier studies, except for four studies (Hobson et al., 2009; Rescorla & Goossens, 1992; Manning & Wainwright, 2010; Lewis & Boucher, 1988).

Of the studies citing previous literature as the basis for their play categorizations, two studies used Belsky and Most’s (1981) categorization scheme (Campbell et al., 2016; Gowen et al., 1992). Two other studies used variations of the Developmental Play Assessment categories (Lifter, 2000; Thiemann-Bourque et al., 2012, 2019). Multiple studies used different combinations of previous literature categories, citing varying other literature for play categories in functional, relational, and manipulative play (Williams et al., 2001; Charman et al., 1997; Harrop et al., 2017; Jarrold et al., 1996; Libby et al., 1998; Naber et al., 2008). Fanning et al. (2021) justified their specifically designed coding scheme by citing a lack of clear-cut organization across the previous play literature and differing operationalizations of categories. Some studies listed categories of play but did not explain the origin of, or justification for, their coding scheme (Lewis & Boucher, 1988; Manning & Wainwright, 2010; Rescorla & Goossens, 1992; Papaeliou et al., 2011, 2019). Overall, there was no clear pattern or consensus in terms of the origins of play categories for the 34 studies analyzed.

Findings of the Studies

Overall, the studies confirmed that children with delays have delays in play, with some variation and inconsistencies across different categories of play. For the most part, researchers reported delays for children in their play activities, the exception being Thiemann-Bourque et al. (2019), who identified similarities across children in functional play. A summary of this review’s results is presented in Table 3.

Children with Autism

The studies examining children with autism compared to TD controls reported significantly fewer instances of and diversity in pretend play, less complex functional play, and less engagement in social interactions during play (Baron-Cohen, 1987; Blanc et al., 2005; Campbell et al., 2016; Fanning et al., 2021; Hobson et al., 2013; Jarrold et al., 1996; Manning & Wainwright, 2010; Stone et al., 1990; Williams et al., 2001). Several studies used assessed mental age to match children with autism to children with the same mental age who had other disabilities, and/or to typically developing peers whose chronological age matched those groups’ mental ages (Baron-Cohen, 1987; Fanning et al., 2021; Williams et al., 2001). Consequently, the children with disabilities in these studies were on average 2–3 years older than the TD children.

Other studies of children with autism showed conflicting patterns of play. Thiemann-Bourque and colleagues (2012) did not find significant differences in play when comparing 3–6-year-old children with autism to children with other delays when matched by age, expressive vocabulary, and scores on the Mullen Scales of Early Learning (MSEL). A follow-up study matched children with autism to TD controls using scores on the MSEL and on Preschool Language Scale-4th Edition assessments of expressive language. Of note, the children with autism were on average 74 months old, while their matched peers were 34 months old (Thiemann-Bourque et al., 2019). These authors found that children with autism had functional play skills similar to those of TD controls but struggled to match the controls’ symbolic play skills (Thiemann-Bourque et al., 2019).

Children with Down Syndrome, Williams Syndrome, Language Delays, and Other Delays

Studies including children with DS or WS frequently did not focus on these groups’ play behaviors or how their play compares to that of TD peers. Instead, these groups were often used as comparisons to further describe the play behaviors of children with autism. Hill and McCune-Nicolich (1981) investigated the play of children with DS, without comparing them to other groups, and found delays in symbolic play that correlated with developmental level. Studies on children with WS demonstrated that they exhibited significantly less spontaneous and imaginary play compared to TD children (Papaeliou et al., 2011; Fanning et al., 2021). Fanning et al. (2021) matched the children with WS to TD controls based on chronological age, while Papaeliou et al. (2011) used raw scores on the MSEL to match between groups.

Rescorla and Goossens (1992) focused primarily on children with expressive specific language impairment (SLI-E). They found no differences in Functional Conventional, but did find less well-developed Sequential Play, and fewer occurrences of Symbolic Play transformations in SLI-E. Other studies assessed play as a secondary level of analysis, focusing on play’s relation to attachment, gender differences, social engagement, or cognitive and language abilities (Campbell et al., 2016; Gowen et al., 1992; Harrop et al., 2017; Hill & McCune-Nicolich, 1981; Malone & Langone, 1995; Mundy et al., 1987; Naber et al., 2008; van der Kooji, 1978).

Summary of Findings

In general, the differences between and among groups largely centered on symbolic play as opposed to functional play. The studies by Williams et al. (2001) and Fanning et al. (2021) were the exceptions, identifying differences between and among groups in the children’s functional play (see Table 3 for a full description of the studies’ results).

Analysis of Play Categories: Category Correspondence

The results of the analyses comparing the categories gleaned from the 34 studies to the 21 categories from our earlier work are presented below. There were three major clusters where we saw the majority of play activities: Perceptually-based; Representationally-based; and Pretense categories (Lifter et al., 2022). As a visual example, the Perceptually-based cluster is depicted as a sunburst graph, in which the play categories from our earlier study are depicted in the inner rings, and the corresponding different descriptors identified across the 34 studies are depicted in the outer rings. The number of segments in the outer rings shows the diversity of descriptors, and the relative size of the segments indicates the number of studies that used a particular descriptor. Studies that each used a unique descriptor for a given category are all counted under the “Unique Term” category. When a correspondence to one of the 21 categories could not be made, the study was counted in the “Not Included” category for that play behavior (please see ESM Appendix 2 for sunburst charts for the two other clusters).
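The counting rule behind the outer-ring segments can be sketched as follows. This is an illustration of the rule as stated, not the authors' analysis code; the function name and the example studies ("Study A" and so on) are hypothetical.

```python
from collections import Counter


def tally_descriptors(term_by_study):
    """Tally the descriptors used across studies for one framework category.

    term_by_study maps a study to the term it used for the category,
    or to None when no correspondence could be made.
    """
    counts = Counter(t for t in term_by_study.values() if t is not None)
    tally = Counter()
    for term, n in counts.items():
        if n == 1:
            # A term used by only one study is pooled under "Unique Term".
            tally["Unique Term"] += 1
        else:
            tally[term] = n
    not_included = sum(t is None for t in term_by_study.values())
    if not_included:
        tally["Not Included"] = not_included
    return dict(tally)


# Hypothetical terms for a single category across five studies.
example = {
    "Study A": "Functional",
    "Study B": "Functional",
    "Study C": "Exploration",
    "Study D": "Manipulative",
    "Study E": None,
}
segments = tally_descriptors(example)
print(segments)
```

Each value in the returned dictionary corresponds to the size of one outer-ring segment, so the values sum to the number of studies tallied.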

Perceptually-Based Cluster

The five categories included in the Perceptually-based cluster are presented in Fig. 2.

Fig. 2 Perceptually-based cluster

This cluster was the first to emerge in the children’s play. The categorical descriptions that aligned with the Discriminative, Takes Apart, Presentation Combinations, General Combinations, and Pretend Self categories were each described by researchers using multiple overlapping and unique terms. Discriminative activities were frequently not included (n = 11), or they were labeled as Functional (n = 11), Functional Conventional (n = 2), or given a unique label (n = 10). Behaviors aligned with the Takes Apart category were often not included (n = 19), labeled as Exploration (n = 4), or categorized using nine different terms across the remaining 11 studies. Presentation Combinations were most often labeled Functional (n = 10), Construction (n = 4), or not included. General Combinations were not included in 7 studies or were labeled as Relational (n = 5) or Construction (n = 4), with the remaining 16 studies using 11 different terms. Pretend Self activities were frequently categorized as Functional (n = 7), Pretend (n = 4), described with one of 15 different terms by 19 studies, or not included (n = 2).

Representationally-Based Cluster

The five categories that comprise the Representationally-based cluster are Child-as-Agent, Learned Combinations, Specific Physical, Single-scheme Sequence, and Multi-scheme Simple. Play behaviors that lined up within these categories were labeled with many distinct terms or were not included in the reviewed studies. Activities that corresponded to Child-as-Agent and Learned Combinations were often considered Functional Play (n = 10; n = 11), Pretend Play (n = 4; n = 4), or described using 17 and 15 distinct terms across 18 and 17 studies, respectively. Activities that corresponded to the Specific Physical category were most often considered Functional Play (n = 11), not included (n = 7), or labeled as Construction (n = 4), where the remaining 12 studies used 11 unique terms. Single-scheme Sequences and Multi-scheme Simple were both frequently not included (n = 19; n = 20), were included as Sequence: Single-scheme (n = 4) or Sequence: Ordered Multi-Scheme (n = 4), or described using one of 10 and 9 distinct terms for Single-scheme Sequence and Multi-scheme Simple, respectively.

Pretense Cluster

The categorical descriptions that aligned with the Pretense cluster categories of Substitution +, Substitution −, and Doll-as-Agent were each often categorized, respectively, as Symbolic (n = 13; n = 12; n = 7), Pretend (n = 9; n = 10; n = 7), or not included (n = 4; n = 6; n = 12). Activities corresponding with Multi-scheme Complex were not included in 23 studies, and they were coded as Sequence: Ordered Multi-scheme in 4. Person-as-Agent play was not included in any reviewed study.

In summary, the correspondences between the categories presented in the 34 studies and the 21 categories identified in Lifter et al. (2022) varied widely. The number of terms used to describe any of the 21 categories ranged from 1 (Hobson et al., 2009, 2013, who focused only on pretend play) to 10 (Gowen et al., 1992; Rescorla & Goossens, 1992). The overall average across the 34 studies was 3.9 categorical descriptions that corresponded to one or another of the 21 categories.

There was overlap in the terms used to describe the different categories. Functional Play stood out as a general descriptor of many categories. This term was also used to describe play activities in different clusters, namely Discriminative activities, Presentation Combinations, and Pretend Self in the Perceptually-based cluster and all five categories in the Representationally-based cluster. Symbolic Play was the predominant term used to describe the categories in the Pretense cluster, except for Person-as-Agent, which was not captured in any of the studies.

Discussion

This review focused on the object play of young children developing with delays, yielding 34 studies that met criteria for inclusion. These studies contributed important information: children who are developing with delays also consistently have delays in play, and the categories and definitions used to describe their play vary significantly. It is therefore critical to document these play delays with a higher degree of consensus for the development of intervention plans and to contribute clarity to the play literature.

The 21-category coding scheme afforded the opportunity to organize the descriptions across studies. Overall, we join these researchers in seeing the same activities in the children’s play, but we are all using different names to describe these same activities. Our results complement and extend the work of prior reviews by Barton (2010) and Vig (2007); the identified inconsistencies in play terminology render comparisons across studies and conclusions about play trajectories difficult to draw. Limitations of the studies included varying ages of children and differences in cognitive or language matching or lack thereof. Further, there were vague descriptors of play categories and studies that were mainly focused on different outcomes (e.g., language, gender) rather than purely describing play as related to the respective clinical group. This review highlights practical implications from these inconsistencies in the literature that center on how to evaluate a child’s progress systematically and consistently in play to determine goals for intervention. This discussion is organized by the parameters of analysis, followed by recommendations, limitations, and implications for assessment and intervention.

Children Developing with Delays

Delays in play were observed for a wide range of children identified as developing with delays. Various clinical groups were studied, with and without control groups. Of note, most studies focused on children with autism (n = 25) and compared them to TD children or those with other delays. The prominence of studies of children with autism may be related to the increasing numbers of children who are identified with autism earlier than in the past and the deficits in spontaneous play often seen in these children (e.g., Libby et al., 1998; Fanning et al., 2021).

Several studies reported on children with different kinds of delays (e.g., Down syndrome, Fragile X syndrome, language delay: Naber et al., 2008; Malone & Langone, 1995; Stone et al., 1990), but these groups were frequently a secondary or tertiary level of analysis. Further research is warranted that focuses directly on children with delays other than autism, such as language and/or motor delays and genetic syndromes, rather than primarily using these children as comparison groups.

Although the focus of this review was on inconsistencies across descriptors, additional inconsistencies were introduced with variable assessments of diagnostic characteristics, which was especially the case for the children on the autism spectrum. Children were categorized as having autism or other delays by a variety of diagnostic parameters, suggesting very heterogeneous populations across studies and making it difficult to draw conclusions to then apply clinically. Further, TD comparison groups varied widely. Because some studies matched TD control groups by mental age, others by chronological age (i.e., children with delays were often older), and still others by language ability, it is difficult to compare results across descriptive play studies and make inferences for clinical application across children with varying delays.

Origins and Definitions of Play Categories

There was considerable variability in the origins of the play categories offered across studies. Researchers borrowed coding schemes from other studies or integrated their descriptions from multiple studies. Alternatively, researchers generated their own coding schemes. These methodological differences contribute to discrepancies in descriptions. These patterns reveal that there is no consensus on how to describe children’s play. The 21-category scheme used here was generated from a review of the literature to begin with, and then revised and expanded to account for every activity seen in the samples of children’s play. We suggest that this process resulted in a comprehensive, empirically-based set of categories spanning the age range of 8 to 60 months.

Further, the extent of explanations of the definition of categories across studies differed. Some studies provided thorough descriptions with several examples (e.g., Fanning et al., 2021; Harrop et al., 2017; Thiemann-Bourque, 2012) while other studies provided limited information, directing the reader to a study from which they drew their categories or not providing detailed information or justifications (e.g., Papaeliou et al., 2011; Manning & Wainright, 2010; Williams et al., 2001; Charman et al., 1997). The inconsistencies in the presentation of coding schemes contributed to the difficulties in establishing inter-rater agreement.

Comparisons Across Categories

There was considerable variability across studies in how play activities were described. Some studies focused exclusively on one type of play, whether functional play or pretend play, which accounted for the lack of correspondence of their categories to the full 21-category scheme presented here. For example, Blanc et al. (2005) and Hobson et al. (2013) focused only on pretend play. The mental ages of the children in Blanc et al. (2005) were approximately 40 months, which corresponds to the developmental period when symbolic play emerges. Conversely, other studies focused on developmentally younger children (Gowen et al., 1992; Wilson et al., 2017). Therefore, it was not surprising that the more complex categories were not accounted for in their coding schemes. Further, sequences were not included in some studies, and Person-as-Agent activities were not accounted for at all. Person-as-Agent emerged from analysis of children’s play activities in Lifter et al. (2022) and appeared to complement Doll-as-Agent activities.

These inconsistencies and the varying ages of the children studied have implications for: category correspondence to the 21-category scheme; comparisons across studies in the meaning of functional play and symbolic play, including the value of global versus differentiated descriptions; and clinical application of play assessment and intervention.

Dominance of Functional Play

Functional Play was a global category used to describe categories that cut across our clusters. Several studies regarded the categories we would call Discriminative, Presentation Combinations, and Pretend Self, from the Perceptually-based cluster, as Functional Play. Similarly, several studies used the same term for categories in the Representationally-based cluster (i.e., Learned Combinations, Child-as-Agent, Specific Physical), which emerged after the Perceptually-based cluster. This level of overlap suggests that functional play, as a descriptor of play, is not specific enough to capture the nuances in children’s progress in play. These inconsistencies support the need for increased specificity in definitions of categories.

Nevertheless, several researchers have pointed to the importance of functional play for children with delays and disabilities but cautioned that more research is needed. Williams et al. (2001) differentiated functional play into several subcategories. They found there were no group differences in the proportion of play time in functional play or the number of acts performed, but the functional play of children with autism was less elaborate and less varied. Dominguez et al. (2006) and Fanning et al. (2021) suggested that functional play needs to be studied further as it applies to prerequisites for symbolic play. They noted that the differences across groups of children found in pretend play are harder to interpret without knowing the contribution of earlier play skills (i.e., functional play). The foregoing studies provide support for a differentiation of categories within what is called functional play, to study the prerequisites more carefully.

Pretense (Symbolic) Play

There appeared to be greater consistency across studies in the definition of symbolic or Pretense play. Several studies drew from Leslie (1987), where object substitution is a central component of definitions. The inclusion of object substitution added clarity to the definitions of symbolic play, as well as greater correspondence to our two categories of substitution.

The category Doll-as-Agent was specified less frequently and Person-as-Agent not at all. These categories might not have been identified due to the relatively younger ages of children in some studies. Doll-as-Agent was observed at later ages than the two forms of object substitution (48 months) in our earlier study. Where it was identified in this review, it was often designated as symbolic play, along with object substitution. Thiemann-Bourque et al. (2019) distinguished Doll-as-Agent. They reported that the children with autism had a “specific deficit” in this aspect of symbolic play, stating “the children with autism engaged in significantly less symbolic play (than TD children) and this difference was related to one category, doll as agent” (p. 11).

The attribution of animacy to doll figures and other persons suggests a still more advanced level of cognitive understanding than the two forms of object substitution, rendering it distinct from object substitution. Researchers have explained these developments as required for the development of theory of mind (Bialystok, 2001; Farhadian et al., 2010). The fact that it emerged in our study (Lifter et al., 2022) following the two forms of object substitution supports the claim of increased cognitive complexity required for the attribution of animacy as well as the importance of increased differentiation of categories.

The omission of Person-as-Agent in all studies may serve to underestimate children’s capacities for the attribution of animacy. We were mystified that we did not see Person-as-Agent activities in some of the children we observed, even though they appeared to be sophisticated players. Ultimately, some children were engaging their caregivers more in their play—seemingly expecting the caregivers to be actors in the play activities—from which we generated the category. These differences may be the result of cultural differences and/or family differences in the way caregivers play with their children, which warrants further study.

Clinical Application of Play Assessment and Intervention

Using more differentiated categorical descriptions of play inherently increases precision in how play is assessed, monitored, and understood. Given the complexity and diversity of play, comprehensive and standardized play assessment is necessary to formulate individualized intervention choices that are based on the child’s specific skills, goals, and development. More specific categorization and assessment of play behaviors can help clinicians improve treatment efficacy by accurately identifying a child’s present and emerging play skills. Further, developmentally relevant play activities can be more engaging and reinforcing for children, helping them attain fluency in specific play skill areas that scaffold more advanced ones or supplement other domains (e.g., language, social-emotional; Pierucci et al., 2015).

More practical relevance is evident in results from intervention studies. For example, through assessment, Pane et al. (2021) distinguished between developmentally relevant and age-appropriate play activities; they targeted both kinds of activities in their intervention study with children with autism. Their results supported acquisition of the developmentally relevant as opposed to the age-appropriate activities. Lifter et al. (1993) provided similar results, while Pierucci et al. (2015) emphasized the importance of choosing intervention goals that are matched to both developmental level and play skills by using comprehensive assessment. Accurately understanding a child’s progress in play can help families and practitioners set realistic expectations for that child, more effectively improve play skills, and target other developmental domains through play more successfully.

Limitations

A limitation of this review centers on the difficulty in obtaining high inter-rater reliability in making the category correspondences. These difficulties were related to the varying levels of specificity in the definitions offered across the studies. The 21 categories that were used for the correspondences describe distinct play behaviors. However, the variety and lack of definitional specificity in play behaviors across the reviewed studies led to challenges in consistently establishing these correspondences. The authors (composed of researchers and students) reached initial agreement on the category correspondences through consensus where the correspondences were difficult to make. However, for the inter-rater reliability analyses, the students worked independently without the opportunity to reach consensus on questions.

Further, the use of 21 categories, despite the level of precision, runs the risk of being too numerous for clinical purposes. These highly differentiated categories did provide a starting point to review the wide array of descriptions in the literature. However, there is still potential for certain play behaviors to be double coded or challenging to accurately categorize (e.g., a child pretending to make food with a doll and caregiver could involve play behaviors across multiple categories). Some form of systematic reduction of the categories is required for clinical application by external practitioners, to improve usability without sacrificing specificity.

Conclusions and Future Directions

This review affirmed the importance of studying the play of children developing with delays, particularly due to the extra supports these children may need. Given the inconsistencies across studies, we conclude that play is a complex construct to define. Researchers have struggled with coding schemes to capture the nature of children’s play activities. Nevertheless, the variable categories and definitions of play skills compromise the generality of study findings, both to compare across research studies and for clinical utilization.

Future studies should maintain specific categories based on empirical evidence, but the number of categories should be reduced to identified relevant categories. A reduction could be accomplished by: emphasizing categories that appear for all children; combining categories based on conceptual similarities; and eliminating categories that are expressed by only select children. For example, we plan to remove Social Engagement from the categories, given that it really is a social aspect that overlays the focus on object play. We included it in our coding scheme to capture comprehensively all the activities the children expressed. In addition, we plan to combine the two forms of Substitution, given that they emerge in a similar developmental trajectory. Further reduction will require analysis of the conceptual requirements, as well as developmental trajectories, of children’s progress in the categories.

Clear and specific descriptions of play are needed to assess children’s play as well as to guide intervention choices, given that play is essential to development. Such descriptions will allow for systematic comparisons across different clinical groups of children. In addition, it is important to optimize the success of interventions by targeting activities that correspond to children’s current developmental profiles (activities that children are ready to learn), as determined from a child’s progress in play.

As such, play is a key component in growth across all developmental domains for children with delays and disabilities, as well as for typically developing children. Without an empirically-based standard format, it remains difficult to fully understand and leverage the multitude of benefits that assessing play can have in research and clinical work involving young children.

Table 3 Study Results