1 Introduction

This study investigates the extent to which disruptive behavior in German elementary school students can be reduced through a behavior-modifying intervention, the Daily Behavior Report Card (DBRC) [1]. Given the high prevalence of mental health issues among students and the implementation of Multi-tiered Systems of Support (MTSS) in Germany, exploring the effectiveness of DBRC is both reasonable and novel, providing valuable insights into behavior management in a different cultural setting [2]. Previous studies have primarily focused on the American education system, leaving a gap in understanding the effectiveness of interventions like DBRC in Germany. This study addresses this gap by using a single-case design to investigate DBRC in German elementary schools. Disruptive behavior refers to actions and conduct exhibited by students that disrupt the learning process or create a negative classroom environment [3]. These behaviors are often described as inappropriate, distracting, or disrespectful, and they impede the ability of teachers and students to engage in educational activities effectively. Examples of disruptive behaviors exhibited by elementary school students include speaking out of turn, not following instructions or rules, engaging in physical aggression towards peers or staff, and struggling to remain focused on tasks [4, 5]. Disruptive behavior not only diminishes the quality of instruction [6, 7] but also has long-lasting detrimental effects on the well-being of teachers [8]. Studies have demonstrated its negative impact on teacher burnout and self-esteem [9,10,11]. Other effects of frequent disruptive behavior include reduced social integration [12, 13] and deterioration of the overall school and classroom climate [14, 15].
Identifying the underlying causes of disruptive behavior is essential, as these behaviors can arise from a variety of factors, including a student's temperament, social and emotional challenges [16], and learning difficulties [17, 18]. Conventional approaches to behavior management encompass several strategies, including student-centered approaches, such as differential reinforcement and time-out, and classroom- and teacher-centered interventions, such as establishing rules and routines and setting clear expectations [19].

2 The Daily Behavior Report Cards as a school-based method of assessment and intervention

Daily Behavior Report Cards (DBRC) [20, 21] can be effective in reducing disruptive behavior and are widely used for documenting and providing feedback on student behavior. DBRC procedures involve clearly defining one or more target behaviors, using a rating scale or frequency counts for periodic assessment, implementing a daily monitoring system, and fostering communication between students' teachers and their homes [1]. Teachers and students set one or more goals for student behavior (e.g., completing classroom assignments with fewer than two prompts). The exhibited behavior is then assessed on an individual scale, and daily feedback is provided to the students. DBRC forms serve as the basis for feedback conversations, which may involve parents and other professionals. DBRC stands apart from other behavior rating forms because of its emphasis on feedback and summative rating. Studies have demonstrated the efficiency and effectiveness of DBRCs in improving academic skills and addressing behavioral issues at the same time (e.g., [22, 23]). Additionally, the DBRC facilitates communication between homes and schools [20, 24]. DBRCs are known for their flexibility, adaptability, and cost-effectiveness, making them efficient in delivering direct feedback on students’ academic and social behaviors [1]. Therefore, the DBRC can also be considered an evaluation tool [21].

The DBRC has also proven to be a sensitive progress-monitoring measure, given the correspondence between assessment and intervention targets [25]. This characteristic provides information concerning the specific behaviors targeted for intervention but does not indicate how the intervention may impact the broader constructs of interest. Two levels of measurement specificity are therefore required to evaluate the effects of school-based interventions. First, a highly sensitive measure with a high degree of overlap with treatment targets is needed to enable agile decision-making. Second, a more global measure is needed to determine the extent to which the intervention impacts the overall problem domain [26, 27].

3 Research on the Daily Behavior Report Cards

Extensive research has been conducted on DBRC, particularly in the United States. Several review papers summarize studies on DBRC with different focuses. The meta-analysis by Vannest, Davis et al. [28] evaluated the effectiveness of DBRC within single-case research designs. The authors examined 17 studies involving 107 participants, yielding 48 separate effect sizes. The mean improvement rate difference for all studies was 0.61, indicating a 61% improvement rate from the baseline to the intervention phase on a range of outcomes. The range of improvement was 56–66%. They found that high levels of home involvement and broad use throughout the day were associated with stronger intervention outcomes.

In addition, a meta-analysis by Pyle and Fabiano [29], which included 14 articles providing data on 40 individual cases of students with Attention Deficit Hyperactivity Disorder (ADHD), concluded that the DBRC is an effective and acceptable intervention for school settings. The findings indicate that using DBRC generally led to improvements, although the effect size varied depending on the estimation method. Implementing the DBRC led to significant behavioral changes, with an increase of nearly 30 percentage points in desirable behavior from baseline to intervention. The effects were consistently large, as indicated by effect sizes ranging from 0.59 to 0.94 and an overall Hedges’ g of 2.19. This study highlights the importance of single-case studies, particularly in ADHD research, where most studies use a single-case design.

Riden, Taylor et al. [30] examined studies published between 2007 and 2013 that included single-case and group designs. They analyzed 11 studies, three of which were single-case designs and eight of which were group designs. The total number of participants was 390, with 11 participants in single-case studies and 379 in group studies. The study participants were classified as at-risk individuals, eligible for special education services, and/or having ADHD or other emotional and behavioral disorders (EBD). In summary, the meta-analysis revealed a wide range of effect sizes across studies, suggesting that the effectiveness of DBRC interventions can vary depending on the research design and participant characteristics.

Another meta-analysis by Iznardo, Rogers et al. [31] examined the efficacy of DBRC in children with ADHD. The authors analyzed seven group-design evaluations of DBRC interventions involving 272 participants with an average age of 7.9 years. These studies used various methods, including randomized group assignments, pre- and post-treatment scores, and observational coding, to assess the impact of DBRCs. The dependent measures were teacher ratings and systematic direct observations of academic and social behaviors. The results showed that DBRCs reduced teacher-rated ADHD symptoms with a moderate effect size (Hedges' g = 0.36). However, two studies that used observational coding instead of standardized tests were excluded. A separate analysis using systematic direct observation as a measure found a large effect size (Hedges' g = 1.05) but with high heterogeneity. Another analysis found differences in the effects of DBRCs on comorbid externalizing symptoms with a moderate effect size (Hedges' g = 0.34) and high heterogeneity. In conclusion, DBRCs effectively reduced the frequency and severity of ADHD symptoms in classroom settings. They also have a significant impact on co-occurring externalizing behaviors.

The latest meta-analysis on DBRC was conducted by Ackley [20] and included 11 studies from 2007 to 2022 with a total of 39 participants from preschool to sixth grade. Most studies employed single-case research designs, with the multiple-baseline design being the most common (63.6%), followed by the AB design (9.1%), reversal design (9.1%), changing criterion design (9.1%), and multiple-probe multiple-baseline design (9.1%). The results, determined through standardized mean difference calculations, demonstrated a significant positive impact of DBRCs on student outcomes. The analysis also considered several moderators, multiple baseline design types, and publication statuses that accounted for some of the variability observed across studies. In summary, this meta-analysis confirms the efficacy of DBRC.

In summary, extensive research on DBRC has been conducted, with multiple reviews summarizing the findings. Meta-analyses by Vannest, Davis et al. [28], Pyle and Fabiano [29], Riden, Taylor et al. [30], and Iznardo, Rogers et al. [31] evaluated the effectiveness of DBRC interventions, particularly for students with ADHD and EBD. These studies consistently support the positive impact of DBRCs on behavior, albeit with varying effect sizes. They also emphasize the importance of considering factors such as home involvement, research design, and participant characteristics when implementing DBRC interventions.

Another noteworthy factor is the predominance of single-case designs among the intervention studies included in the reviews above. These designs offer significant value in the context of DBRC research for several reasons. First, single-case design studies allow the examination of individual developmental trajectories [32]. Second, single-case studies enable data collection without external reference norms such as a control group. This approach is particularly advantageous because it is ethically feasible within the context of school intervention [33]. In addition, it is possible to trace the course of development precisely in connection with the intervention. However, single-case designs also present unique challenges, such as the delay of intervention in multiple baseline designs (MBD) and the complexities of treatment removal in reversal designs. Despite these challenges, it is beneficial to employ differentiated and robust regression-based analyses developed specifically for single-case data. This methodology facilitates accurate statements about student development using regression models [34,35,36]. Additionally, various overlap indices can be applied alongside visual inspection (Parker et al., 2011), which further strengthens single-case evaluations. These indices allow phase differences to be systematically expressed as the percentage of overlap between phases, and different overlap indices provide different types of information regarding data development [37].

4 Objectives of the study

The systematic reviews and meta-analyses above show that DBRC has a significant impact on reducing undesirable behaviors and promoting desirable ones. In addition, the review papers indicate a scarcity of studies conducted in Europe, and no studies conducted in Germany specifically examine the effectiveness of DBRC using a single-case design. Several factors may contribute to this outcome, such as the teacher’s role, the nature of the pedagogical strategies employed, the prevalence of DBRC, and the degree to which DBRC is integrated into tiered systems, as seen in the USA. Thus, it is reasonable and novel to assess the effectiveness of DBRC in a sample of German elementary schools. These schools also need effective interventions in response to the high prevalence of mental health issues [38,39,40]. Germany has also begun implementing MTSS, within which DBRC is a potentially effective Tier 2 intervention for reducing disruptive behavior. International comparisons showed that various preventive concepts with tiered support have implemented DBRC or comparable interventions at the second tier [41]. DBRC has already been integrated as an indicated intervention in the Multimo MTSS approach [2, 42]. However, the findings on its effectiveness are not easily interpretable, necessitating further studies [2]. Considering the challenging situations faced by German students and teachers, a study examining the effectiveness of DBRC in a German sample would complement existing research and contribute to the understanding of intervention-related MTSS research in Germany. The following research questions are derived:

4.1 Research questions

  1. Does a DBRC intervention lead to decreases in disruptive student behavior in an inclusive elementary school setting?

  2. Does a DBRC intervention lead to decreases in the stated problem behaviors of students in an inclusive elementary school setting?

  3. To what extent do changes in specific goal behavior associated with a DBRC intervention impact a global measure of general disruptive behavior?

4.2 Hypotheses

  1. Implementing DBRC can reduce disruptive behaviors in elementary school students with disruptive behavior.

  2. Implementing DBRC can lead to a decrease in the behaviors that were set as the goal of the intervention (DBRC).

  3. Changes in the specific goal behavior (SGB) correlate with the general disruptive behavior (GDB).

5 Methods

5.1 Participants & setting

The participants in this single-case study were ten second-grade students from three classes in an inclusive elementary school in the federal state of North Rhine-Westphalia (Table 1). Each class had around 25 children. Before the study, parents and students were informed about the study objectives and the intervention and consented to participate. Six of the students (60%) were male and four (40%) were female. The average age of the students was 7.70 years (SD = 0.675, median = 8, range = 7–9 years). The school was located on the outskirts of a major city. Students were selected by their teachers based on disruptive behavior issues. Disruptive behavior was assessed using the total score of the Integrated Teacher Rating Form (ITRF) [43], and every student scored above the norm on either the overall or the disruptive behavior scale.

Table 1 Individual data of all 10 cases

5.2 Implementation

To implement DBRC in the three classes, the classroom teachers who carried out the intervention received training at the end of the baseline phase. For this purpose, a half-day in-service training program was conducted at the school with colleagues. During the training, the intervention’s mode of action and goals, including its application in everyday school life, were discussed. The training topics included positive reinforcement, feedback discussions, and understanding the development of behavioral problems. On this basis, the project team, together with the teachers, analyzed the main problem behaviors of the students in the sample. Specific student behavior goals were established with the project team as the foundation for implementing the DBRC. The strategies and tools that both the teachers and the students should be provided with were also considered. Both the project team and the teachers evaluated the training as beneficial and successful. Additionally, regular meetings were held between the project team and the teachers to ensure proper integration of the intervention into the classroom.

The quality of implementation was assessed based on the collected data (Table 2) and can be regarded as high. The implementation phase spanned 120 days across the classes. Of these 120 days, 27 were canceled for vacations, breaks, or other reasons. Of the 93 potential days on which DBRC could have been used in the classes, it was used on 89 days (95%). The four days on which it was not employed all occurred in Class B; Classes A and C implemented the DBRC on every possible day. Additionally, five supplementary components (reminders, tips, praise, reason, and next opportunity; Table 2) were included in assessing whether the intervention was used as intended. Each day, the teachers indicated whether they had implemented these aspects of the feedback in relation to the intervention with the students. The results revealed that the intervention was predominantly implemented following the prescribed feedback structure (Table 2). The criterion for praise (3) was the least implemented in all classes. This was attributed to communicative variations among teachers: it was sometimes challenging to identify positive behavioral aspects, so praise was implemented in only approximately 78% of the cases. The criterion “remind” was implemented in 88% of cases across all classes, indicating the highest level of fidelity. “Tips” (83%), “reason” (81%), and “opportunity” (81%) fell in between, with slightly lower rates of implementation.

Table 2 Implementation data

5.3 Experimental design

This study used a single-case AB design in which the baseline phase (Phase A) differed in length across cases, so that the intervention phase (Phase B) started at staggered time points. Accordingly, it was a single-case study with a multiple baseline design [32]. The baseline and intervention phases each comprised 35–45 measurement time points (Table 3).

Table 3 Descriptive statistics of the single-case data

Figure 1 presents a schematic representation of the study design. To ensure the successful implementation of the multiple baseline design, the teachers received visual and written reminders of the intervention start times. Additionally, the project team continuously monitored the integration of the research design into everyday school life.

Fig. 1 Experimental design

6 Data collection & materials

6.1 The Daily Behavior Report Cards

The DBRCs used in this study were developed jointly by the university team and the teaching staff. The project team provided templates, which were adapted by the teachers according to the specific goals of the students. All DBRCs contained a three- or five-point scale on which the behavior displayed was assessed. This scale also formed the basis for the feedback discussions and the rewards for the students. In addition, the DBRCs contained columns for the respective date and the type of behavior observed, as well as a line for the behavior target, along with the specifications for creation (Fig. 2).

Fig. 2 Example of a DBRC used in this study. This DBRC includes the student’s name, date, days of the week, goal, and rewards

6.2 Integrated Teacher Rating Form (ITRF)

To assess children exhibiting disruptive behavior, as identified by teachers, a shortened version of the German Integrated Teacher Rating Form (ITRF) was used [43,44,45]. The abbreviated form consists of two subscales: oppositional/disruptive (OD) and Academic Productivity/Disorganization (APD). It comprises 16 problem-oriented items that evaluate student behavior in the classroom. Casale, Volpe et al. [43] reported a high level of internal consistency for the oppositional and disruptive behavior (OD) scale (α = 0.96) and the academic productivity and disorganization (APD) scale (α = 0.95). The temporal stability for 2–4 weeks was deemed acceptable, with correlation values of r = 0.88 (APD) and r = 0.78 (OD).

Receiver operating characteristic curve analyses indicated that the ITRF effectively identified at-risk students, referring to those who, based on their observed behavior, should be placed in either Tier 2 or Tier 3 within the MTSS's decision-making process. Teachers rated the ITRF highly for its perceived accuracy, feasibility, acceptability, and usefulness in guiding interventions. In conclusion, the ITRF is a reliable and valuable tool for teachers to identify and support students facing behavioral challenges in the classroom [44].

6.3 Direct behavior rating (DBR)

To assess the disruptive behavior of the students, the Direct Behavior Rating was used [46,47,48]. The DBR combines systematic observation and rating methods and is considered suitable for capturing detailed behavioral processes in single case studies because of its cost-effectiveness and versatility [48, 49]. It assesses behavior at specific time points during school days. Huber and Rietz [48] conducted a comprehensive analysis of its performance in school and obtained satisfactory reliability, validity, and accuracy results. However, it is important to note that the results may vary depending on the type of behavior being evaluated and the assessor. The study focused on learning and work behaviors, and the DBR demonstrated reliable results with an ɛ2 and ɸ score of ≥ 0.80 after only four measurements [50]. Further analyses on measurement invariance also confirm the reliability of the DBR [51]. In terms of operationalizing the variables for this study, the project team adhered to the guidelines set by Chafouleas [52]. For both dependent variables, a Single-Item Scale (SIS) was employed. Research findings have consistently shown that SIS is generalizable and reliable across different raters, items, and observation times [53, 54].

To collect the data in this study, the teachers were trained in three sessions on the use of the DBR. In this process, criteria for assessing disruptive behavior were developed: Disruptive behavior was defined as behavior that disrupts the class or interferes with other children’s learning. Examples include shouting in class, fooling around, engaging in inappropriate side conversations, and not staying seated. The teachers took these criteria into account during the observation. Teachers were instructed to complete the DBR immediately after conducting the DBRC and to rate the behavior on a scale of 0 to 5 (0 = low; 5 = high). Accordingly, the DBR appears to be a suitable and valid method that can be implemented by teachers in the classroom and that can also capture behavioral characteristics of young students who are not yet able to adequately assess their own behavior or that of their classmates.

6.4 The two variables of the daily behavior rating: general disruptive behavior and specific goal behavior

Data for the single-case study were gathered using two variables, General Disruptive Behavior (GDB) and Specific Goal Behavior (SGB). The first variable, GDB, pertains to general disruptive behavior exhibited during class and is defined as: “Behavior that disrupts the class or interferes with other children’s learning. Examples include shouting in class, fooling around, engaging in inappropriate side conversations, and not staying seated.” A lower score on this scale indicates reduced disruptive behavior.

The second variable, SGB, represents the criterion for assessing the effectiveness of the DBRC concerning a specific behavior. This variable therefore measures the success of the intervention with regard to the students' individual goals. Even though the specific goals may vary from student to student, the variable nonetheless reflects the success of the intervention in relation to a particular behavior that was established as a target goal of the DBRC for each student. The interpretation of the variable remains consistent, as it is based on observed behaviors within a specific problem area. Therefore, a lower score is more desirable than a higher one. It is important to note that even if the behavior falls within the context of disruptive behavior, the variable does not reflect general disruptive behavior but focuses on a particular aspect.

To effectively capture SGB during the baseline (Phase A), a meeting was held with the teachers several weeks before the first measurement to inquire about such behaviors. The teachers described the specific disruptive behaviors exhibited by the students, which formed the basis for baseline data collection (Phase A) and goal formulation for DBRC (Phase B). For example, one teacher's description read: “The child vehemently refuses to work after a certain time in class.” The project team then extracted the core statement of “work refusal” from these descriptions and created a DBR that was used as an observational tool for teachers. Relevant information, such as the subject and class period, was also collected. This allowed for the early collection of behavioral data during the baseline phase (Phase A) on the behavior that later became the target of the DBRC during the intervention phase (Phase B). Figure 3 shows a schematic of this process.

Fig. 3 The two variables of the daily behavior rating: SGB and GDB

Thus, the two variables differ in their interpretations; however, within the observation guidelines, both can be observed well by teachers.

6.5 Data analysis

Digitally processed data were descriptively analyzed using RStudio ([56], Table 1). This analysis served as the basis for evaluating the Daily Behavior Rating, which was also performed using R. The scan package was used to analyze individual case data [56].

Initially, each student was examined individually. The mean values and the non-overlap of all pairs (NAP) for Phases A and B were calculated and evaluated. The NAP [57] is a commonly used index to assess the overlap between phases in single-case studies. To calculate the NAP, each data point in one phase is compared with each data point in the other phase; the number of pairs that do not overlap, that is, pairs showing improvement, is divided by the total number of pairs to obtain an index ranging from 0 to 1. Higher scores indicate greater non-overlap, whereas lower scores indicate greater overlap. Because the NAP uses all pairwise comparisons, it is less sensitive to outliers or extreme values than indices based on a single data point. Expressed as a percentage, a value between 50 and 65 indicated a small effect, between 66 and 92 a moderate effect, and 93 or above a large effect.
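The NAP computation can be sketched as follows. This is a minimal Python illustration, not the R code used in the study; the function name `nap`, its parameter `decrease_is_improvement`, and the example ratings are hypothetical. It assumes the pairwise formulation of NAP in which every baseline point is compared with every intervention point and tied pairs count as half.

```python
def nap(baseline, intervention, decrease_is_improvement=True):
    """Non-overlap of all pairs between two phases (0 to 1).

    Every baseline value is paired with every intervention value.
    A pair counts 1 if it shows improvement, 0.5 if tied, 0 otherwise.
    """
    total_pairs = len(baseline) * len(intervention)
    if total_pairs == 0:
        raise ValueError("both phases need at least one data point")

    score = 0.0
    for a in baseline:
        for b in intervention:
            if a == b:
                score += 0.5  # tie: half credit
            elif (b < a) == decrease_is_improvement:
                score += 1.0  # pair shows improvement
    return score / total_pairs


# Hypothetical teacher ratings on a 0-5 scale (lower = less disruptive).
phase_a = [4, 5, 3, 4]
phase_b = [2, 3, 1, 2]
print(round(nap(phase_a, phase_b) * 100, 1))  # NAP as a percentage
```

With these illustrative ratings, 15.5 of the 16 pairs count as non-overlapping, which corresponds to a NAP of about 97% and would fall in the large-effect band described above.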

Individual cases were analyzed using regression-based methods. The A and B phases for each case were compared inferentially to determine significant effects specific to the developmental trajectory of the individual case. These analyses involved calculating level, trend, and slope effects. The level effect represents the intervention's impact on behavior regarding the overall change, while the slope effect signifies the intervention's continuous development following its initiation [56]. The trend effect indicates a change across phases (A and B). Additionally, interaction effects at the case level were analyzed using regression models for each case. In these regression models, the level, slope, and trend effects were not included because the focus was on the isolated effects of the variables on each other.
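One way to picture the level, trend, and slope effects is through the predictor coding of a piecewise regression. The sketch below is a hypothetical Python illustration, not the analysis code of the study. It assumes a common coding in which the trend predictor counts measurement occasions from zero, the level predictor is a phase dummy (0 in Phase A, 1 in Phase B), and the slope predictor counts occasions since the start of Phase B. The function name `piecewise_predictors` and the phase lengths are illustrative only.

```python
def piecewise_predictors(n_baseline, n_intervention):
    """Build trend, level, and slope predictors for one AB case.

    trend: measurement occasion, starting at 0
    level: 0 during Phase A, 1 during Phase B
    slope: occasions elapsed since Phase B began (0 during Phase A)
    """
    rows = []
    for t in range(n_baseline + n_intervention):
        in_phase_b = 1 if t >= n_baseline else 0
        rows.append({
            "trend": t,
            "level": in_phase_b,
            "slope": (t - n_baseline) * in_phase_b,
        })
    return rows


# Illustrative case: 3 baseline and 3 intervention measurements.
for row in piecewise_predictors(3, 3):
    print(row)
```

Regressing the behavior rating on these three predictors yields one coefficient each for the overall trend, the immediate shift at the phase change (level), and the additional change per occasion within Phase B (slope).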

Furthermore, piecewise multilevel regression analyses were conducted across all cases to examine the changes in the entire sample. In this analysis, the level, slope, and trend effects were calculated and used to describe the observed behavior changes over time. All effects were modeled as both fixed and random effects. Random effects capture the differences between individuals in the entire sample.
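A two-level piecewise model of this kind can be written out as follows. The notation is an assumption introduced here for illustration and is not taken from the study: $y_{ti}$ is the rating of case $i$ at occasion $t$, $T_{ti}$ is the measurement occasion, $D_{ti}$ is a dummy for Phase B, $T^{B}_{i}$ is the first occasion of Phase B, $\gamma_{k}$ are fixed effects, and $u_{ki}$ are case-specific random deviations. The intraclass correlation (ICC) is shown in its usual form, with the between-case intercept variance $\tau_{00}$ and the within-case residual variance $\sigma^{2}$ taken from an intercept-only model.

```latex
% Level 1: occasion t within case i
y_{ti} = \beta_{0i} + \beta_{1i}\,T_{ti} + \beta_{2i}\,D_{ti}
       + \beta_{3i}\,(T_{ti} - T^{B}_{i})\,D_{ti} + e_{ti},
\qquad e_{ti} \sim \mathcal{N}(0, \sigma^{2})

% Level 2: case i (intercept, trend, level, slope)
\beta_{ki} = \gamma_{k} + u_{ki}, \qquad k = 0, 1, 2, 3

% Intraclass correlation from an intercept-only model
\mathrm{ICC} = \frac{\tau_{00}}{\tau_{00} + \sigma^{2}}
```

Under this reading, $\gamma_{1}$, $\gamma_{2}$, and $\gamma_{3}$ correspond to the trend, level, and slope effects, and an ICC of 0.12, for example, would mean that 12% of the total variance lies between cases.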

7 Results

Each case showed a high level of disruptive behavior, either on the OD scale or in the total ITRF score (Table 1). The mean OD scale score was 12.2 (SD = 6.14, Range = 1–23) and the mean APD scale score was 13.5 (SD = 6.54, Range = 3–23), with the following cutoffs: APD ≥ 10, OD ≥ 5, ITRF Total Score ≥ 13.

The analysis of the overlap index NAP showed a strong effect on the GDB variable in four of 10 cases (40%) and a moderate effect in six of 10 cases (60%) (Table 4). All effects were significant. Similarly, the analysis of the overlap index NAP for SGB showed a strong effect in four of 10 cases (40%) and a moderate effect in six of 10 cases (60%) (Tables 4 and 5). The NAP of Case 5 was not significant.

Table 4 General Disruptive Behavior (GDB)—Piecewise linear model for each case including Non Overlap of All Pairs (NAP)
Table 5 Specific Goal Behavior (SGB)—Piecewise linear model for each case including Non Overlap of All Pairs (NAP)

Analyses across the groups showed a significant decrease in observed problem behaviors and different effects for the GDB and SGB. For a detailed analysis, three different models were developed using the data.

7.1 Model 1 “GDB”

The coefficient of 0 for slope phase B suggests no significant slope or change in the disruptive behavior during phase B. The associated p-value of 0.97 indicates that this coefficient is not statistically significant. A value of 0 for the trend effect suggests no significant trend or change in disruptive behavior over time. The associated p-value of 0.77 indicates that this value is not statistically significant, implying that the change in disruptive behavior is unrelated to time.

The intercept coefficient of 2.5 indicates that the estimated value of GDB at the start of the analysis was 2.5 (Table 6). The coefficient of -1.94 for the level effect indicates that disruptive behavior changed significantly, by an average of -1.94, during the B phase compared to the A phase.

Table 6 Model 1 “Hierarchical piecewise linear regression for multiple single-cases at once for General Disruptive Behavior (GDB)”

In summary, the model indicates that disruptive behavior significantly changed during the B phase compared to the A phase (level phase B), but there was no significant trend or slope in disruptive behavior over time or during the B phase. The intervention appeared to impact disruptive behavior, leading to stabilization at a new level. Regarding random effects, the intercept, trend, level phase B, and slope phase B showed significant variations across the cases. The ICC value is 0.01 (p ≤ 0.001), suggesting that approximately 1% of the variability in the disruptive behavior is attributable to between-case differences.

7.2 Model 2 “SGB”

The intercept coefficient of 3.07 indicates that the estimated value of SGB at the start of the analysis was 3.07 (Table 7). The coefficient of -1.67 for the level effect in phase B indicates a significant change in SGB during the B phase compared with the A phase. An associated p value of ≤ 0.001 shows this change was statistically significant. The coefficient of -0.02 for slope phase B suggests a slight decrease in SGB in phase B, but it is not statistically significant (p = 0.08). The coefficient of 0.01 for trend mt suggests no significant trend or change in the SGB over time. Regarding the random effects, the intercept, trend effect, level effect, and slope effect of Phase B showed significant variations across cases. The ICC value was 0.12 (p ≥ 0.001), indicating that approximately 12% of the total variability in SGB was attributable to between-case differences.

Table 7 Model 2 “Hierarchical piecewise linear regression for multiple single-cases at once for Specific Goal Behavior (SGB)”

In summary, the model suggests that specific goal behavior is significantly influenced by level phase B, with a significant decrease observed during B phase compared to A phase. However, there was no significant trend or slope phase B effect on goal behavior over time. Random effects analysis indicated significant between-case variations in the intercept, level effect of phase B, and slope effect of phase B.

7.3 Model 3 "Influence of SGB on GDB”

An intercept coefficient of 1.2 indicates that the estimated value of disruptive behavior (GDB) at the start of the analysis is 1.2 (Table 8). A coefficient of zero for the trend effect suggests no significant trend over time. The coefficient of -1.22 for the level effect in phase B indicates a significant change in disruptive behavior during the B phase compared with the A phase. The coefficient of 0.01 for slope phase B suggests no significant slope or change in the disruptive behavior during phase B. The coefficient of 0.42 for the SGB indicates a significant positive effect on disruptive behavior. An associated p-value of ≤ 0.001 shows that this effect was statistically significant. Examining the interaction effects on a case-by-case basis can provide additional insights into the relationship between the two variables.

Table 8 Model 3: “Hierarchical piecewise linear regression for multiple single-cases with interaction between GDB and SGB”

Isolated, without the influence of level and slope effects at the individual level, significant effects (B = 0.10–1.05) are found in 8 out of 10 cases (Table 9). The individual models show a high model fit (one case with a small effect, one with a minor effect, and eight with a high effect), suggesting a strong influence of SGB on the GDB. The lack of influence of the specific target behavior on disruptive behavior in Case 6 can be partly explained by the choice of target: no oral participation in the introductory phases (German). The fact that the connection between this target and disruptive behavior is very distant in terms of content can explain this result. Regarding the random effects, the intercept, trend effect, level effect of Phase B, and slope effect of Phase B showed significant variations across cases. The ICC value is 0.09 (p ≤ 0.001), indicating that approximately 9% of the total variability in disruptive behavior is attributable to between-case differences.

Table 9 Isolated Interaction effects between GDB and SGB

In summary, Model 3 suggests that the GDB is influenced by both the level effect of phase B and the SGB. The level effect of phase B indicates a significant change in disruptive behavior during the B phase compared to the A phase. However, there was no significant trend effect or slope effect of phase B on disruptive behavior. Additionally, the inclusion of the SGB shows a significant positive effect on disruptive behaviors. Random effects analysis indicated significant between-case variations in the intercept, the level effect of phase B, and the slope effect of phase B, and the ICC value indicated meaningful between-case variations.

8 Discussion

The previous findings indicated a significant reduction in both GDB and SGB, thereby corroborating the efficacy of DBRC as a behavioral intervention strategy. Additionally, this study expands on previous research in several key areas. It addresses the gap in existing research, which has focused mainly on American schools, by investigating the effectiveness of DBRC in a German context. This demonstrates its applicability and effectiveness across different cultural and educational settings. The findings also support MTSS implementation in German schools and offer practical insights into usable evidence-based interventions. The long-term implementation of DBRC over 120 days highlights the sustainability of its positive effects and the importance of consistent application to achieve lasting behavioral changes. Additionally, factors such as individual development or natural maturation processes might have influenced the observed behavioral changes. In the following sections, the results will be discussed in relation to the different research questions.

8.1 Research question 1: does a DBRC intervention decrease disruptive student behavior in an inclusive elementary school setting?

At the individual case level, the following results were observed: Out of the 10 cases, 70% (7 cases) showed a negative Level Effect (Cases 2, 3, 4, 7, 8, 9, and 10) ranging from −1.25 to −3.83 (M = −2.53). On average, the displayed behavior decreased by approximately 2.5 points across these seven cases after the introduction of the intervention. Disruptive behavior was evaluated using a 6-point scale (ranging from 0 to 5), indicating a reduction of 20.8% to 63.8% (with a mean value of 42.2%) based on this scale. Additionally, Case 8 (−0.04) shows a significant Slope Effect, indicating that in this case, the behavior exhibits both a mean-level difference between the phases and a continuous further decline during the intervention phase.
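The percentages above can be reproduced with simple arithmetic. The reported values are consistent with dividing the magnitude of each level effect by the six points of the rating scale (0 to 5); that divisor is inferred here from the reported numbers rather than stated in the text, and the helper name `percent_of_scale` is hypothetical.

```python
# A 0-5 rating scale has six points; this divisor is inferred from the
# reported percentages, not stated explicitly in the study.
SCALE_POINTS = 6


def percent_of_scale(level_effect: float) -> float:
    """Express the magnitude of a level effect as a percentage of the scale points."""
    return abs(level_effect) / SCALE_POINTS * 100


for label, effect in [("smallest", -1.25), ("largest", -3.83), ("mean", -2.53)]:
    print(f"{label}: {percent_of_scale(effect):.1f}%")
```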

Furthermore, three cases (30%) demonstrate a significant Trend Effect (Case 6 = −0.03, Case 7 = 0.05, Case 8 = 0.10). In Case 6, this effect was negative, whereas it was positive in Cases 7 and 8. This implies that over the entire period (Phases A and B), disruptive behavior increases (Cases 7 and 8) or decreases (Case 6). For Cases 7 and 8, this indicates that despite the continuous slight increase, the effects concerning the reduction in disruptive behavior are still measurable. The NAP results confirmed this substantial decrease in observed disruptive behavior.

Additionally, Model 1, which focused on disruptive behavior (GDB), demonstrated that DBRC significantly influenced the reduction of disruptive behavior during the intervention phase. The level phase B coefficient indicated a significant change in disruptive behavior, whereas the trend and slope effect coefficients did not show significant effects. This suggests that DBRC effectively stabilizes disruptive behavior at a new level without significantly changing the trend or slope over time.

The results of this part of the study confirm previous findings from other studies concerning disruptive behavior [29, 58, 59, 60, 61]. The effects found in this study are even greater, and the data show a higher level of quality in direct comparison with respect to the number of data points [34].
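As background for the NAP values referred to in this section, Nonoverlap of All Pairs compares every baseline observation with every intervention observation and reports the share of pairs that show improvement, with ties counted as half. The sketch below is a minimal illustration for a behavior that is expected to decrease; the function name `nap_decrease` and the example ratings are made up and are not data from the study.

```python
from itertools import product


def nap_decrease(phase_a: list[float], phase_b: list[float]) -> float:
    """Nonoverlap of All Pairs (in percent) when lower values mean improvement."""
    pairs = list(product(phase_a, phase_b))
    score = 0.0
    for a, b in pairs:
        if b < a:        # intervention value is better than baseline value
            score += 1.0
        elif b == a:     # ties count as half an improvement
            score += 0.5
    return 100 * score / len(pairs)


baseline = [4, 3, 4, 5]          # hypothetical phase A ratings
intervention = [2, 3, 1, 2, 1]   # hypothetical phase B ratings
print(nap_decrease(baseline, intervention))
```

A value of 50 corresponds to chance-level overlap between the phases, and values approaching 100 indicate that nearly all intervention ratings lie below the baseline ratings.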

8.2 Research question 2: does a DBRC intervention decrease the stated problem behaviors of students in an inclusive elementary school setting?

Similar to the first variable, a comparable pattern was observed at the individual case level: Out of the 10 cases, 70% (7 cases) showed a negative Level Effect (Cases 1, 3, 4, 7, 8, 9, and 10) ranging from −1.53 to −3.07 (M = −2.21). Specific goal behavior was evaluated using a 6-point scale (ranging from 0 to 5), indicating a reduction of 25.5% to 51.2% (with a mean value of 36.8%) based on this scale. On average, specific goal behaviors decreased by approximately 2.21 points across these seven cases after the introduction of the intervention. Additionally, Case 6 shows a significant Slope Effect (−0.10), indicating that, in this case, the behavior exhibits a continuous decline during the intervention phase. Furthermore, Case 7 also shows a significant Trend Effect (0.05), suggesting that despite the slight continuous increase, the effect of the reduction in specific goal behavior is still measurable. The NAP results confirmed this substantial decrease in observed disruptive behavior.

In addition, Model 2, examining the SGB, revealed a significant decrease in SGB during the intervention phase compared with the baseline phase. The level effect coefficient indicated a significant change in SGB, whereas the trend and slope effect coefficients did not reach statistical significance. This highlights the effectiveness of DBRC in improving specific goal behaviors among participants. However, these results may vary among teachers.

The results of this part of the study are comparable to those of other studies on students' progress toward their goal behaviors. This confirms the effectiveness of DBRC in modifying specific behaviors. The observed effects are slightly weaker than those of the GDB variable but are consistent with the findings of other studies [23, 24, 31]. The data showed a high level of quality concerning the number of data points [34].

8.3 Research question 3: to what extent do changes in specific goal behavior associated with a DBRC intervention impact a global measure of general disruptive behavior?

Specific goal behavior varies among students according to the definition of the variable. Accordingly, a detailed examination of the respective observational variables is relevant. In five out of ten cases (Cases 2, 4, 7, 9, and 10), the chosen goal targeted a specific situation related to disruptive behavior. In the other five cases (Cases 1, 3, 5, 6, and 8), the chosen goal targeted behaviors related to work or learning. However, effects on disruptive behavior were still evident in three of these cases (Cases 3, 6, and 8). The NAP was significant and moderate in every case. Working on the specific goal behavior “Lack of attention during entry phases (Math)” with the DBRC had no measurable effect on that specific behavior but reduced GDB. A similar pattern emerged in Case 2: the specific goal behavior “Behavioral problems in open situations, such as breaks or transitions” was not measurably reduced, but the GDB decreased.

For Cases 3 and 5, no significant level or slope effects were reported for the SGB variables. Additionally, for Case 5, the NAP is not significant for the SGB variable. In all the other cases, the measured values were consistently reduced. Another noteworthy observation in the individual data concerns Case 1. Here, no measurable effects except a significant NAP were observed for the GDB variable. However, the SGB variable shows a significant effect of −1.53. This can be explained by the goal “Lack of autonomy during work periods (German & Math),” as it does not directly address disruptive behavior. Nevertheless, given that the NAP is at least significant and moderate, it suggests a certain interaction between the SGB and GDB.

Model 3, which explores the influence of SGB on GDB, provides further insights into the relationship between these variables within the context of DBRC. These results indicate that SGB and GDB are correlated. Specific goal behavior had a significant positive effect (0.42) on disruptive behavior. This means that as SGB increases, GDB increases too: participants who displayed higher levels of the targeted problem behavior also displayed more general disruptive behavior, and, conversely, reductions in SGB went along with reductions in GDB. This finding underscores the importance of considering specific goal behaviors when implementing DBRC to manage disruptive behaviors effectively. Random effects analysis revealed significant between-case variations in the intercept, level phase B, and slope phase B coefficients across all models, emphasizing individual differences among participants in responding to the DBRC. The ICC values further support the presence of meaningful between-case variation, suggesting that individual factors influence disruptive and specific goal behaviors.
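The direction of this effect can be made concrete with the fixed effects of Model 3. The sketch below is a minimal illustration, assuming that SGB enters the model as a simple additive predictor and leaving out the near-zero trend and slope terms; the function name `predict_gdb` and the example SGB ratings are hypothetical.

```python
# Fixed-effect coefficients of Model 3 as reported in the text (Table 8).
INTERCEPT = 1.2
LEVEL_B = -1.22
SGB_EFFECT = 0.42


def predict_gdb(sgb: float, in_phase_b: bool) -> float:
    """Predicted GDB for an average case, ignoring the near-zero trend and slope terms."""
    return INTERCEPT + LEVEL_B * int(in_phase_b) + SGB_EFFECT * sgb


# Within phase B, compare two hypothetical SGB ratings (3 versus 1).
high = predict_gdb(3, in_phase_b=True)
low = predict_gdb(1, in_phase_b=True)
print(round(high - low, 2))  # difference equals 2 * 0.42
```

The sketch describes an association in the fitted model only; it does not by itself show that changing SGB causes the change in GDB.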

Regarding these analyses, it is essential to emphasize that the correlation between the two collected variables is not self-evident as far as the data analysis is concerned. At first glance, both variables yielded similar results. This similarity could be due to their joint collection, which may have introduced a tendency for similar evaluations. In addition, both variables are largely characterized by conspicuous undesirable behaviors. However, upon closer examination, some differences between the variables emerged, justifying the analyses described from a methodological standpoint. The GDB variable represents general disruptive behavior that can be observed in the classroom across various situations over a comparatively extended observation period. It encompasses a wide range of behaviors that may not necessarily align with the defined target behavior of the DBRC and, therefore, are not the focus of the intervention. Therefore, even if the SGB variable appears to be related to the behavior captured by the GDB, it is not congruent, and a direct relationship cannot be assumed. This lack of congruence is partly due to differences in the definitions and psychological behavioral manifestations of specific conspicuous disruptive behaviors that are challenging to modify. By contrast, SGB is characterized by observing a particular behavior in a specific situation. DBRC explicitly aims to address and reduce this specific behavior. Although it was initially assumed that a correlation exists between these different variables, closer examination shows that evidence for this assumption is needed. Model 3 provides this evidence and confirms the presence of the correlation.

8.4 Summarized discussion

This study emphasizes the effectiveness of DBRC in reducing general disruptive behaviors and specific goal behaviors. In addition, the single-case study provides valuable insights into the interplay between the GDB and SGB and the efficacy of DBRC. Analysis of the single-case data revealed a significant decrease in GDB and SGB during the intervention phase. GDB and SGB showed strong and moderate effects, respectively, as indicated by the NAP overlap index. These findings demonstrate that DBRC is valuable for addressing disruptive and specific goal behaviors. These results contribute to the understanding and implementation of evidence-based interventions to manage disruptive behaviors. This study confirms and expands the findings of other studies [20, 28, 31]. However, further research is warranted to refine and enhance the effectiveness of DBRC and explore its applicability in diverse educational settings.

Methodologically, it is noteworthy that this study exhibited high data quality owing to the comprehensive number of measurement points in both the baseline and intervention periods. The number of students, however, was small. Power analyses conducted using the scan package in RStudio confirmed the statistical significance of individual case trajectories, thus establishing the statistical validity of the analyses across all cases [55, 56]. A multiple-baseline design was implemented across all three classes, fulfilling the requirements for a single-case study in this regard [32].

Another relevant point of discussion pertains to the varying quality of class implementation. Class A exhibited the lowest median of successfully implemented intervention criteria (66.7%), whereas Class B (97.6%) and Class C (91.2%) demonstrated higher fidelity across all criteria (1–5). In absolute terms, Class A showed the highest number of times DBRC was used (39). In fourteen lessons, at least one implementation criterion was not applied. Additionally, Class A had fewer children (2) than Classes B and C (4 each), necessitating careful consideration when comparing the effects of different children across classes. Regarding GDB, Class A achieved an average NAP of 85, Class B scored 87.5, and Class C had the highest value of 92.5. Notably, despite having the highest implementation fidelity, Class B did not achieve the highest NAP effect, whereas Class A had the lowest values for implementation and NAP. Similarly, for SGB, Class B displayed the lowest average effect (80.7), despite having the highest implementation fidelity. These descriptive facts do not conclusively establish a direct relationship between implementation data and success. Analysis of the significant slope and level effects also presents an interesting pattern. In Class A, the slope or level effect for the GDB variable was not significant in Case 1; for the SGB variable, this was also the case in Case 2. Both cases yielded significant effects on both variables in at least one instance in Class C. For Class B, none of the effects were significant for the SGB in Case 5, and the GDB showed no significant level or slope effects in Cases 5 and 6. Class C displayed significant effects in 100% of the cases for both variables, whereas Class B and Class A showed significant effects in 50% and 50% (for SGB) and 75% (for GDB) of the cases, respectively. In summary, it is challenging to interpret the nature of implementation based solely on observed effects.
While there are tendencies for Class A to exhibit lower or fewer effects in some cases and have the lowest implementation quality, further research is necessary to understand these potential correlations precisely.

9 Limitations

This study has several limitations that should be acknowledged. First, the sample size for hierarchical piecewise regression was relatively small, which may have restricted the generalizability of the results. A larger sample size is preferable to enhance the validity of the study. A second limitation is the absence of Interobserver Agreement (IOA) calculation. Although the calculation was not feasible due to the lack of additional raters, this limitation is mitigated to some extent by the selection of the DBR as a measurement tool. The DBR is validated as a scientific measurement system that reliably generates high-quality data. The methodology combines systematic behavior observation with a rating scale, which has been proven effective for detailed behavior tracking in quantitative case studies due to its flexibility and economic efficiency [48, 62]. Research findings have consistently demonstrated the DBR’s generalizability and reliability across different raters, items, and observation points, confirming its reliability and validity [54, 63]. Furthermore, teachers were trained in the application and implementation of the DBR through practical examples, ensuring a consistent assessment of the observed behaviors. Another limitation is that the study was conducted in a single elementary school, which may have limited the broader applicability of the findings. Including data from multiple schools or additional classes would increase the study's external validity. Additionally, the study relied on only three observers, which may have introduced potential biases and reduced the reliability of the data collected. A more diverse set of observers would improve the quality of the data. The specific interventions and goals used in this study may also limit the generalizability of the findings to other cases, classes, schools, or educational settings. A broader range of intervention goals would enhance the applicability of this study.
This study could benefit from employing additional methods, such as qualitative interviews or surveys, to gain a deeper understanding of the effectiveness of the intervention. This would provide more comprehensive insight into the mechanisms underlying the observed outcomes. A typical AB design is limited in terms of internal validity. Although such a design may show a clear change in the data from the baseline to the intervention phase, this change could be attributed to external events, individual development, or natural maturation processes apart from the treatment. Thus, the AB design is often considered a quasi-experimental approach with reduced internal validity. A more robust approach, a multiple baseline design, was adopted to address this limitation. This design involves introducing the intervention at staggered time points across independent behavioral domains or different participants. Consequently, the potential for attributing the observed results to external events, individual development, or natural maturation processes is significantly reduced. This enhancement ensured a clearer attribution of changes to the intervention, thus bolstering the experiment’s internal validity.

Another limitation concerns Model 3: although the effect is statistically significant, further analyses and interpretations are necessary to fully understand the precise nature and strength of this relationship. Additionally, the observed goal behaviors were associated with disruptive behaviors in only half of the cases. The effects observed here should serve as indicators and incentives for further investigation. Other potential factors contributing to disruptive behavior, which may not have been accounted for in the current model, should also be considered. In summary, although this study makes valuable contributions, these limitations should be considered when interpreting the results and considering the implications of the findings.

10 Practical implications

DBRC proved to be an effective measure for inclusive schools for children within the studied age range. The precise formulation of goals is of utmost importance as it simplifies achieving objectives. Collaborating with teachers, the university team developed goals, analyzed needs, and received positive feedback regarding helpfulness. Without employing tests or survey instruments, the needs were identified through collegial discussions resembling open consultations. Embracing these measures is crucial. Teachers were strongly committed to the study and intervention and actively sought implementation. When questions arose, the university team was approachable, encouraging practical exchanges and the sharing of experiences among colleagues in the field. Beyond the intervention, the teachers' thoughtful reflections on the emergence of specific goal behaviors proved valuable. Asking why certain behaviors manifest and what influences them, teachers found this reflective process rewarding and effective, even without a direct school intervention.

This underscores the vital aspects of inclusive education and differentiation. Teachers become aware of and respond optimally to personal well-being and psychologically oriented needs at both group and individual levels. By acknowledging and addressing these factors, schools can enhance the efficacy of inclusive practices and create supportive student environments. In this context, the DBRC is a promising and suitable intervention.

The findings of this study also suggest that this adaptable intervention can be implemented in diverse educational settings. To implement DBRC effectively, it is essential to examine the conditions under which teachers operate. This includes evaluating whether teachers have knowledge of positive reinforcement and can conduct feedback sessions. The intervention must also fit into the daily school routine and classroom environment while being suitable on an individual level. Consistent school rules and the presence of a school-wide concept, like MTSS, facilitate successful implementation by providing a framework for Tier 2 interventions like DBRC.

11 Future directions

Several areas can be explored in the future to enhance the understanding and application of DBRC as an effective intervention for managing disruptive behaviors in inclusive educational settings. To improve the generalizability and validity of the study, future research should involve larger sample sizes encompassing multiple schools or educational settings, enabling a more comprehensive examination of the effectiveness of the DBRC and its suitability in diverse contexts. Conducting follow-up studies with an extended duration would provide valuable insights into the sustainability of the effects of DBRC over time and determine its lasting benefits. Involving a more diverse set of observers in future studies would enhance the reliability of data collection and minimize potential biases in the evaluation of DBRC outcomes [20]. Supplementing quantitative data with qualitative interviews or surveys would offer a deeper understanding of the intervention mechanisms and their effectiveness from the perspectives of teachers, students, and other stakeholders. In addition, exploring a broader range of intervention goals would enhance the study's applicability and shed light on the versatility of the DBRC in addressing various behavioral challenges beyond those investigated in this study. This could be combined with the question of how individual differences in student characteristics and learning styles influence the effectiveness of DBRC, facilitating tailored interventions for different students. Future work should also examine the effects of the intervention with older groups of students. Finally, adopting a multi-method approach, such as combining behavioral observations with self-report measures or teacher ratings, would provide a more comprehensive evaluation of behavioral changes and enrich the understanding of the effects of DBRC.
By addressing these areas in future research, researchers can strengthen the evidence base for the effectiveness of DBRC and refine its implementation to manage disruptive behaviors effectively in inclusive educational settings.