Introduction

The use of information and communication technology (ICT) is increasingly widespread, and modern society demands a thorough preparation in this field at the earliest possible age (Hernandez, 2017; Kafai & Burke, 2013; Stamati, 2020; Yelland, 2005). Programming is now an integral part of everyday life (Iivari et al., 2020; Luxton-Reilly, 2016), and primary school appears to be the most suitable place to teach pupils the required competencies and skills at an early stage (Atman Uslu & Usluel, 2019; Edwards, 2005). There is a variety of didactic approaches by which learners can develop computational thinking (CT), which can contribute to their problem-solving capabilities (Estapa et al., 2018; Nouri et al., 2020). However, CT involves more than the ability to solve challenging problems using skills derived from the world of computer science (Brennan & Resnick, 2012; Israel-Fishelson & Hershkovitz, 2022; Tsarava et al., 2021). It encompasses the mental skills and practices needed to design computations that can let computers perform tasks for us, and to explain and interpret the world as a complex system of information processes (Denning & Tedre, 2019; Lai, 2021). The provision of opportunities to further stimulate the development of CT, and to enable pupils to acquire the associated skills, entails demands on both the environment and the task (Kong & Abelson, 2019; Rich & Browning, 2022; Yadav et al., 2016).

A variety of programming environments are available to teach pupils the concepts of programming at the primary education level (López et al., 2021; Wahl & Thomas, 2002). Visual programming environments are currently very popular and offer a wide range of possibilities (Chao, 2016; Ray, 2017). The basic starting point for visual programming is that definable code blocks are dragged into a worksheet using the ‘drag and drop’ method, and are arranged in the correct sequence (Weintrop, 2019; Weintrop & Wilensky, 2015). When used in this way, pupils can construct a program and execute it. A distinction can be made between (a) visual programming with a visual on-screen output, in which virtual objects are programmed and controlled, and (b) visual programming with a tangible, physically perceptible output in which concrete artefacts are programmed and controlled (Caci et al., 2013a, 2013b; Corral et al., 2019; Horn et al., 2009). This difference in the type of output determines the way in which pupils receive feedback on their programming actions (Sapounidis et al., 2015; Zhang & Nouri, 2019).

The ability to construct a computer program that anticipates changing environmental conditions by means of sensor observations demands a different computational approach than performing programming tasks in an unchanging, predictable environment (Gomes & Mendes, 2007; Kim & Kim, 2003; Kyriazopoulos et al., 2022). By using sense-reason-act (SRA) programming, a programmed artefact or a simulation of reality can react to changes in its surroundings (Fanchamps et al., 2021). The concept of SRA, derived from the world of robotics, requires the understanding that the operating program should continuously compare the external conditions with the desired conditions (Lith, 2006). Actions then become conditional rather than automatic. An SRA-programming application is characterised by an initial process in which a state is detected through sensing or sensor observations (sense); this situation is then compared with the corresponding values of the computer program (reason), and the computer program is then used to execute subsequent actions (act) (Slangen, 2016). SRA-programming requires the use of more complex forms of iterations, conditionals and functions rather than restricted linear, sequential programming structures (Martinez et al., 2015). This requires logical reasoning of the “if … then …”, “wait … until …”, “repeat … until …” kind. The complexity is rooted in the ability to think in terms of scenarios, and to understand and apply the more complex concepts of programming (Popat & Starkey, 2019).
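The sense-reason-act cycle described above can be illustrated with a minimal Python sketch. The `SimulatedRobot` class, its `distance_to_wall` sensor reading, the 10 cm stop distance and the 5 cm step are hypothetical values chosen for illustration; they are not taken from any programming environment discussed in this study.

```python
from dataclasses import dataclass


@dataclass
class SimulatedRobot:
    """Hypothetical robot on a straight track, facing a wall."""
    position_cm: float = 0.0
    wall_cm: float = 50.0

    def distance_to_wall(self) -> float:
        # Sense: read the (simulated) distance sensor.
        return self.wall_cm - self.position_cm

    def drive(self, step_cm: float) -> None:
        # Act: move forward by one step.
        self.position_cm += step_cm


def run_sra(robot: SimulatedRobot, stop_distance_cm: float = 10.0,
            step_cm: float = 5.0, max_cycles: int = 100) -> int:
    """Repeat sense-reason-act until the robot is close to the wall.

    Returns the number of drive actions that were executed.
    """
    actions = 0
    for _ in range(max_cycles):
        distance = robot.distance_to_wall()   # sense
        if distance <= stop_distance_cm:      # reason: compare with desired state
            break                             # condition met, so stop acting
        robot.drive(step_cm)                  # act
        actions += 1
    return actions


robot = SimulatedRobot()
steps = run_sra(robot)
print(steps, robot.position_cm)
```

Because the stop condition is re-evaluated in every cycle ("repeat … until …"), the same program also copes with a wall placed at a different distance, which a fixed linear sequence of drive commands would not.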

Earlier research shows that although the use of SRA contributes to the development of CT (Fanchamps et al., 2020; Slangen, 2016; Wong, 2014), pupils lack fluency in the more complicated language of conditional reasoning and nesting required for successful SRA-programming. In addition, previous research has also shown that pupils need (dedicated) assistance in order to understand the concept of SRA (Slangen, 2016). It has been demonstrated that the concept of SRA, encompassing a generalisable theory that contains several specific conceptual elements (e.g. parallel thinking, cause-effect relationships and conditional reasoning), is applicable within visually oriented programming environments where variations are compared only with a visual, on-screen output, or only with a physical tangible output (Fanchamps et al., 2019, 2020). It is questionable whether the distinction between a visual, on-screen output and a tangible output can influence a better understanding of the concept of SRA, with consequent effects on the development of CT. This study therefore aims to investigate whether the use of SRA-programming has an impact on the understanding of complex programming concepts, and whether the impact on CT when using visual programming environments depends on the use of an on-screen output or a physical, tangible output. Our research results are also compared to the performance of a control group that did not use either type of visual programming environment.

Theoretical framework

In this research, we are specifically interested in the effect on CT of applying SRA-programming in a visual programming environment, as demonstrated previously (Fanchamps et al., 2021), depending on the type of output (i.e. visual or tangible). We also want to know whether the type of output influences the understanding of the more complex programming concepts when SRA is used (Zapata-Cáceres et al., 2020).

From earlier research (Fanchamps et al., 2019), we know that primary school pupils show a higher level of CT when SRA is applied in robot programming. We also know that the application of SRA with an impact on CT depends on the design of the task and the environmental conditions in which robots are programmed (Slangen, 2016). In addition, we know that the application of SRA in a visual programming environment with visual, on-screen output enables the development of CT (Fanchamps et al., 2021).

CT is a conceptualised way of thinking with the aim of solving problems by using fundamental concepts of computer science (Hsu et al., 2018; Wing, 2006). It refers to a logical approach towards solving problems through problem formulation, data organisation, analysis and representation (Denning & Tedre, 2019; Voskoglou & Buckley, 2012). CT is a process of thinking in which problems and their solutions can be reformulated to allow them to be presented in a form that can be effectively implemented by an information processing digital agent (Dummer, 2017; Leifheit et al., 2018; Vourletsis & Politis, 2021). Skills such as problem decomposition, algorithmic thinking, pattern recognition, parallelisation and abstraction are addressed (Catlin & Woollard, 2014; Chalmers, 2018; SLO, 2017). A solution-focused ability is strengthened, and this stimulates creative thinking about the use of digital tools to solve a problem (Lee et al., 2011; Tedre & Denning, 2016).

Programming in accordance with the SRA approach in which sensor-based programming is used and where sensory input determines consecutive actions, is characterised by connecting observations of (virtual or material) sensory input (sense) to a reasoning component which initiates actions based on these observations (reason) and a process of subsequent actions based on the given inferences (act) (Krugman, 2004; Slangen et al., 2011; Wong, 2014). When applying SRA-programming, complex programming concepts such as iterations, conditionals and functions are used (Basu et al., 2016; Werner et al., 2012). SRA-programming requires conditional, causal and iterative reasoning, abstract thinking and thinking in terms of parameters and variables (Estapa et al., 2018). The ability to functionally apply SRA in programming environments requires pupils to develop logical reasoning and systematic thinking (Fanchamps et al., 2019). Anticipating the requirements of the task design, and enabling and executing targeted interventions, demands the correct selection and implementation of sensors and actuators (Durak et al., 2019; Oswald et al., 1999). The application of SRA-programming requires pupils to adopt a different approach to solving a programming problem from applying a linear programming approach (Slangen, 2016; Wyeth et al., 2003).

From research by López et al. (2021), it appears that iterations, conditionals and functions, the prominent concepts underlying SRA-programming, are difficult for primary school pupils to comprehend. More specifically, these concepts are operationalised through programming elements such as nested loops, “if–then-else”, “wait until”, “while” or functions with parameters. Pupils tend to avoid applying these complex concepts due to the higher level of abstraction involved (Werner et al., 2012). Only when a programming environment or task design appeals to the added value of complex programming concepts will pupils be triggered to use them in solving a programming problem (Wahl & Thomas, 2002). When introducing complex programming concepts to primary school pupils, game-based and robot programming environments provide a promising opportunity to illustrate and reveal their functions and applications (Chevalier et al., 2021; Dlab et al., 2019; Martinez et al., 2015). Previous research indicates that applying SRA thinking, including the use of sensory input to anticipate unforeseen, changing events in the task design, forces pupils to abandon linear thinking and offers them the opportunity to effectively understand and apply complex programming concepts in a goal-oriented way (Fanchamps et al., 2020). SRA thinking involves logical, causal and conditional reasoning and the ability to establish cause/effect relationships when using sensor input to anticipate changes in the task design.

The influence of the characteristics of the learning environment appears to be very important in programming applications (Durak et al., 2019; Gross & Powers, 2005), in order for pupils to not only comprehend the coding environment used but also the programming concepts themselves (Williams et al., 2015). The characteristics of the programming environment can influence pupils’ performance, and provide opportunities for assessing and providing feedback (Ahmed et al., 2018; Allison et al., 2002). Furthermore, the design of the programming environment can characterise the ways in which pupils perceive, interact with and respond to the environment (Gomes & Mendes, 2008). It is important that pupils develop programming skills in an environment that supports them in learning the basic concepts of programming (Gomes & Mendes, 2007; López et al., 2021; Zaharija et al., 2013). The programming learning environment must therefore create the conditions for understanding and applying certain abstract programming concepts (Sáez-López et al., 2019; Werner et al., 2012). The design of the task and the learning environment can help pupils to understand the effect of the programming intervention and can clarify the functions of concepts (Popat & Starkey, 2019; Wahl & Thomas, 2002). In terms of the comprehension and application of the basic concepts of programming, visual programming environments prove to be well suited due to their imaginative power and low level of abstraction, and are easily accessible, enabling pupils to learn the required basic concepts of programming (Kaučič & Asič, 2011; Tsai, 2019).

Direct manipulation environments (DMEs) are powerful tools for creating a learning environment that can make programming understandable. Robotic DMEs are concrete, physical artefacts (robots/constellations) that can be controlled by programming that makes use of actuators and sensors (Jonassen, 2006; Rekimoto, 2000). DMEs offer a potentially rich context for learning, understanding and practicing programming, for understanding the concepts of robotics, and for developing (general) problem-solving skills. The use of robotic DMEs provides the possibility to obtain ‘instant’ feedback from the technology on pupils’ thinking and acting (Slangen et al., 2011). Examples of such environments include TechnoLogica, K’nex, Fischertechnic, Arduino, Makeblock and Lego Mindstorms, which allow pupils to build controllable structures that can be programmed to perform predefined tasks (Jonassen, 2000; Slangen et al., 2008, 2011; Slangen et al., 2009). The efficient and yield-oriented use of DMEs places demands on the programming environment and the task design with regard to enabling a problem-solving approach. DMEs tend to use complex programming concepts (e.g. nested loops, if–then-else, wait until, while, functions with parameters, etc.).

Robotic and virtual environments are considered powerful tools for learning complex programming concepts (Caci et al., 2013a, 2013b; López et al., 2021). A tangible output can be experienced and perceived through physical representations, unlike a visual output, which uses more mental representations (Chevalier et al., 2022; Marshall, 2007). Moreover, a visual programming environment with a tangible output also differs from a visual programming environment with an on-screen output in terms of the connections between the physical and digital representations (O'Malley & Fraser, 2004). Furthermore, it can be argued that a three-dimensional, physical representation provides different forms of information, immersion and engagement from a two-dimensional, visual, on-screen representation (Price et al., 2003). Some studies indicate that learning to program is more effective and meaningful when the learner operates a tangible and meaningful object (Horn & Bers, 2019; Papert, 1980; Resnick et al., 1990); other research claims that the visual characteristics of the more virtual world are a better way to develop mental representations of the program based on the structure of the data flow (Navarro-Prieto & Cañas, 2001; Segura et al., 2020). In order to explain the learning outcomes resulting from different output modalities in programming, more research is needed into the influence of the interaction between a visual programming environment and a tangible or visual type of output (Skulmowski et al., 2016; Zhu, 2021).

Visual programming environments use on-screen code elements that help novice programmers to easily understand and construct the process of programming (Price & Barnes, 2015). The advantages of these visual programming environments are that no specific syntax needs to be mastered and that the level of abstraction is low, due to the high extent of visualisation and generalisation (Weintrop & Wilensky, 2015). Visual programming environments are also intended to support the understanding and application of control flow structures (Chao, 2016). Due to their attractiveness, transparency and clarity, visual programming environments can help to increase user engagement in solving programming tasks (Asad et al., 2016). In a visual programming environment, a computer program to solve a computational problem is constructed by manipulating visual programming elements in order to formulate and design a solution to the problem (Sáez-López et al., 2016). Through the on-screen execution of the constructed program, direct visual feedback can be obtained from which the user can anticipate and determine the subsequent interventions by means of problem-solving actions (Moreno et al., 2011; Tsai, 2019). Through simulation, visual programming environments can provide more complex and richer functionality than the physical boundaries of artefacts in the material world allow. Direct feedback obtained from visual output may be experienced as more powerful than feedback obtained via tangible output (Caci et al., 2013a, 2013b; Sefidgar et al., 2017). Visual programming environments also often include integrated, directional incentives which provide the user with instant information on whether the programming solution is the optimal one, or whether it could be constructed more efficiently. These incentives provide guidance, and the user may decide to use them when support is needed (Karalekas et al., 2020). 
Seen from these perspectives, visual programming environments offer excellent opportunities for solving challenging programming problems and acquiring CT (Papadakis et al., 2016; Rose et al., 2017).

In programming with a visual output, the application and execution of each programming operation is displayed purely on a screen (Sapounidis et al., 2015). The information obtained from the programmed operation is perceived in two dimensions (Mladenović et al., 2020; Price et al., 2003). The elaboration of programming actions appeals to the user's more abstract imaginative and reasoning capacities (Price & Barnes, 2015), but it is not possible to fall back on the tangible and physically perceptible (Horn & Bers, 2019; O'Malley & Fraser, 2004; Sefidgar et al., 2017; Skulmowski et al., 2016). Nothing can be grasped in a hands-on way, and the execution of the programming action cannot be seen from more than one point of view (Sapounidis et al., 2015). When there is a physically perceptible, tangible output of the programming operation, concrete artefacts are controlled by the computer program (Chen et al., 2017; Jonassen, 2006). The information obtained from the execution of the programming intervention is perceptible in a three-dimensional way, from all points of view (Korkmaz, 2018). Imagination and reasoning abilities are stimulated at a low level of abstraction, and the execution is tangible at any moment (Bers, 2020; Ilieva, 2010; Wang et al., 2014). In addition, users can check their expectations of the execution in physical reality, at any point in time (Marshall, 2007).

Building on the above theoretical exploration, we hypothesise a relation between the SRA approach in a visual programming environment with different types of output and an influence on computational thinking. In addition, depending on the evocation of SRA thinking, we expect an influence on CT caused by a greater understanding of the concepts of programming. It is also expected that the more complex concepts of programming will initiate a more profound development of CT, and that the direct feedback obtained from visual output during visual programming will be more powerful than the feedback that can be derived from the execution of a visual program using a physical artefact. Our conceptual model, shown in Fig. 1, provides an overview of the relationships and interconnections between the independent and dependent variables, in which some connections are reciprocal.

Fig. 1 Schematic representation of the conceptual model

Research question, sub-questions and hypotheses

Based on preliminary studies and research in the literature, our main research question is: What is the influence of the type of output in a visual SRA-programming environment on the development of CT and complex programming concepts among primary school pupils?

Supplementary to the main research question, our sub-questions are:

  1. To what extent can the influence on the development of CT be attributed to the evocation of SRA thinking and the type of output (tangible/on-screen) in SRA-programming?

  2. What is the impact of visual SRA-programming environments (task design) on the understanding of complex programming concepts?

  3. What is the influence of the understanding of complex programming concepts on CT?

These sub-questions result in the following hypotheses:

  1. Pupils who apply SRA-programming in a visual programming environment show the development of CT.

  2. Applying SRA-programming in a visual programming environment with visual, on-screen output leads to a higher level of development of CT compared to SRA-programming in a visual environment with tangible output.

  3. The application of SRA-programming in a visual programming environment with visual, on-screen output has a greater impact on the understanding of complex programming concepts than SRA-programming in a visual environment with tangible output.

Method

This research should be seen as an exploratory approach to gain more insight into the effects of the difference in output on the development of CT. To this end, an exploratory study was conducted in which, by application of a pre-/post-test questionnaire survey, quantitative data were obtained (a) to determine the effect of the intervention; (b) to assess the associated hypotheses and (c) to investigate the research questions.

As a pre- and post-assessment, a questionnaire on CT was applied. Quantitative data were collected from this exploration in order to answer the research question and sub-questions and to test the hypotheses. In order to investigate the effect of the intervention, we used a pre-test/post-test design as illustrated in Fig. 2. The dependent variable comprised the scores from a pre-/post-assessment of CT. The independent variable was the programming condition: two variations of visual SRA-programming interventions that differed in output (perception), namely Bomberbot, with on-screen output, and Lego EV-3, with tangible output, plus a control group that was not asked to program.

Fig. 2 Research design

Participants

This research was conducted with pupils aged 9 to 12 from grades 5 and 6 (N = 156), drawn from various primary schools selected at random in the South of the Netherlands, from which two experimental groups (Lego n = 47/Bomberbot n = 50) and a control group (n = 59) were randomly composed. None of the participating pupils were familiar with programming, apart from using basic computer programmes such as Word, PowerPoint and the Internet, and none had previously taken a CT test. The control group did not receive any programming offerings and followed the regular curriculum during the implementation of the study. All of the pupils from the primary schools involved were randomly assigned to one of the three conditions (Lego EV-3, Bomberbot, control group). This ensured that the control group was a realistic reflection of the target group.

Materials

To answer our research questions, we used two visual programming environments with different types of output. This allowed pupils to learn complex programming concepts by applying SRA thinking. For each application, we aimed to deduce whether there was a difference in the effect on CT due to the different type of output. We used Bomberbot©, with a visual, on-screen output, and Lego EV-3 Mindstorms© robots, with a tangible output.

Bomberbot is a game-based, visual programming environment with an on-screen output in which pupils can learn the concepts of programming in a playful way. Bomberbot is a robot simulation that can be programmed to accomplish tasks in a virtual world, such as collecting stars, smashing obstacles and gems, opening treasure chests, and sliding unexpectedly across slippery surfaces. Using the ‘drag and drop’ method, programming commands need to be dragged from a command library into a worksheet in the correct sequence (El-Hamamsy et al., 2021). The command library provides the ability to work with fundamental programming concepts such as iterations, conditionals and functions, which allows pupils to comprehend the principle of operation and effects of these concepts. Bomberbot is designed to provide the user with direct, self-correcting feedback and is characterised by a build-up from beginner to advanced level. The programming tasks to be performed are predefined, and consist of 20 missions to be carried out, with ascending complexity. The aim is to complete all missions as efficiently as possible, allowing the user to earn various numbers of stars, ranging from three stars for the most efficient solution to one star for any working solution. In Bomberbot, both the programming environment and the execution environment are represented simultaneously within the same on-screen image. Figure 3 shows the basics of programming in Bomberbot.

Fig. 3 Programming in Bomberbot©

Lego EV-3 Mindstorms is a visual robotics programming environment in which tangible, perceptible robots are controlled by means of an interface. The application of Lego EV-3 is characterised by a direct effect in the physical space. By using ‘drag and drop’ programming blocks, which are arranged into a worksheet, the program controls a set of actuators and sensors. The controllable parameters and variables inside these blocks (for instance speed, direction, rotation, detection) can be influenced. Lego EV-3 teaches the applicability of basic concepts of programming such as iterations, conditionals and functions, and either predefined assignments or self-designed tasks can be used. In this research, we used 15 training missions and five final challenges in which a predefined robot, equipped with a push-button sensor and an ultrasonic sensor, had to be programmed to navigate through various labyrinth setups. The game element of the Lego challenge tasks is that SRA-programming ultimately leads to the most efficiently constructed program. This is determined through the application of SRA, based on the runtime of the robot when successfully completing each challenge task. In Lego EV-3 Mindstorms, the programming environment is separated from the physical task environment. Figure 4 shows a programming solution devised using Lego Mindstorms software, and Fig. 5 shows an example task in Lego EV-3.

Fig. 4 SRA-programming in Lego EV-3 Mindstorms©

Fig. 5 Example of a labyrinth task in Lego EV-3©

To determine the level of CT at the pre- and post-measurements, we used the validated Computational Thinking test (CTt) (Román-González et al., 2017). This test makes it possible to generate information on the level of solving CT tasks (hypothesis 1), the understanding of the computational concepts involved (hypothesis 2), and the understanding of complex programming concepts (hypothesis 3). All pupils involved in this research completed this questionnaire individually. The questionnaire contained a total of 28 items that relate to the various computational concepts involved, i.e. basic directions (28 items), loops (repeat times: 13 items, repeat until: 12 items), conditionals (if-simple: six items, if/else-complex: four items, while: four items) and functions (simple functions: four items, functions with parameters: zero items). The existence or nonexistence of nesting can be derived (19 items). The computational task required (completion: nine items, debugging: five items, sequencing: 14 items) to provide the right solution for each of the 28 questionnaire items can be deduced. To determine the reliability of the scale, we calculated Cronbach's alpha. It should be noted that a value for Cronbach's alpha of 0.70 is considered an acceptable reliability factor (Santos, 1999). The developers of the CTt report a Cronbach's alpha of α = 0.721 for fifth and sixth graders (N = 176) (Román-González et al., 2017). We measured a value of Cronbach's alpha of α = 0.679, which is slightly below the acceptable level of internal consistency for our scale with this particular sample. An explanation for this slightly lower value for Cronbach's alpha can be found in the fact that the research sample size was smaller than that for which the designers of the CTt validated the test. In addition, the age category of the research population included in this study was towards the lower limit of the test, which may be the reason for the lower reliability. Taking this into account, the CTt performed almost as reported by the original authors, and the measurement results obtained were therefore used as such.
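For readers unfamiliar with the reliability coefficient used here, the following Python sketch shows how Cronbach's alpha can be computed from a respondents-by-items score matrix, using the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of total scores). The function name `cronbach_alpha` and the example answers are made up for illustration; they are not the study data, and the analysis reported here was carried out in SPSS.

```python
from statistics import variance


def cronbach_alpha(scores: list[list[int]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    scores[r][i] is the score of respondent r on item i (here 0 or 1).
    """
    k = len(scores[0])                        # number of items
    items = list(zip(*scores))                # transpose: one tuple per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Made-up answers of five pupils on four items (1 = correct, 0 = incorrect).
example = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(example)
print(round(alpha, 3))
```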

Procedure

As a pre-test measurement (Fig. 2), all three groups completed the CTt. Following this, the group that was to program with Lego EV-3 Mindstorms received basic instruction and then, by applying SRA-programming, completed 15 programming tasks and five final challenge assignments in five 1-h sessions. The group that was to use Bomberbot completed, after a short introduction, 10 programming missions in five 1-h sessions using SRA-programming, each consisting of 15 programming tasks. The control group did not program with either of the programming environments. At the conclusion of the investigation, all three groups completed the CTt as a post-test measurement.

Results and data analysis

Our main research question, “What is the influence of the type of output in a visual SRA-programming environment on the development of CT and complex programming concepts among primary school pupils?”, is answered by analysing the means for all the variables measured in this research. We aimed to explore (i) whether one of the two visual programming environments (Bomberbot and Lego EV-3) with different types of output, and if so which, led to significant differences with respect to the control group, and/or (ii) whether significant differences occurred in a comparison between the two programming environments. To this end, an analysis of variance (ANOVA) and Levene's test were initially conducted for all variables to obtain preliminary indications and to assess whether equal variances should be assumed. Since specific hypotheses were formulated in this study, a subsequent contrast analysis was carried out to demonstrate possible significant effects and to confirm or reject these hypotheses. To make the magnitude of the effects visible, Cohen’s d was calculated.
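To make the two preliminary statistics concrete, the sketch below computes a one-way ANOVA F statistic and Levene's W for three groups in plain Python. It is an illustration only: the analyses in this study were run in SPSS, the function names and the example scores are made up, and the mean-centred variant of Levene's test is assumed here (W is then the ANOVA F statistic computed on absolute deviations from each group mean).

```python
from statistics import mean


def anova_f(groups: list[list[float]]) -> float:
    """One-way ANOVA F statistic for a list of independent groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([x for g in groups for x in g])
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


def levene_w(groups: list[list[float]]) -> float:
    """Levene's W (mean-centred): ANOVA F on absolute deviations."""
    deviations = [[abs(x - mean(g)) for x in g] for g in groups]
    return anova_f(deviations)


# Made-up scores for three conditions; not the study data.
scores = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_value = anova_f(scores)
w_value = levene_w(scores)
print(f_value, w_value)
```

In this made-up example the three groups have identical spread, so W is zero, while the group means differ clearly, which gives a large F value.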

The pre- and post-measurement results from the CTt were entered into SPSS for quantitative data analysis, and the effects of the independent variables on the dependent variables were assessed. The differences in values were determined by comparing the means. In all of our statistical analyses, a significance level of 5% (p ≤ 0.05) was assumed. The nature of the data met the conditions for the assumption of normality, indicating that the distribution of sample means (across independent samples) was normal. We tested whether our assumptions of the homogeneity of variances were violated (p ≤ 0.05). Degrees of freedom were calculated, and a bootstrapping procedure was applied to re-estimate the standard error of the mean difference. The confidence interval was studied to assess the difference between the means and to determine whether a value of zero was within the confidence interval. The Shapiro–Wilk test, used to assess the normality of the variables, gave a value of p = 0.126, which was greater than the chosen alpha level of 0.05. The null hypothesis of normality could therefore not be rejected, that is, there was no evidence that the tested data deviated from a normal distribution. Histograms also showed a normal distribution. It could therefore be assumed that all variables were normally distributed. The effect size (Cohen's d) was calculated; d = 0.2 can be considered a small effect, d = 0.5 a medium effect, d = 0.8 a large effect, and any value above d = 1.4 a very large effect (Field, 2013).
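As a worked illustration of the effect size measure, the following Python sketch computes Cohen's d from two groups using the pooled sample standard deviation and maps the result to the verbal benchmarks quoted above. The pooled-variance formulation, the function names, the "negligible" label for values below 0.2 and the example data are assumptions made for illustration; they are not the study data or necessarily the exact procedure applied in SPSS.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a: list[float], group_b: list[float]) -> float:
    """Cohen's d using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


def effect_label(d: float) -> str:
    """Map |d| to the benchmarks quoted in the text (Field, 2013)."""
    size = abs(d)
    if size > 1.4:
        return "very large"
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "medium"
    if size >= 0.2:
        return "small"
    return "negligible"  # label assumed here; not named in the text


# Made-up scores for two groups.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(d, effect_label(d))
```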

Differences in the level of development of computational thinking

In order to provide a structured overview of the differences in the development of CT between the three groups, this aspect is divided into subcategories. The basis for this approach originates from the subdivision used by Román-González et al. (2017) in relation to the CTt. Table 1 shows the data for each of the subcategories for the three different groups.

Table 1 Differences in computational thinking

An analysis of the means of the pre- and post-measurement results reveals that in comparison with the control group, the two groups that applied SRA-programming using Bomberbot and Lego EV-3 (a) solved more CT tasks successfully, (b) showed more control over loops, conditionals and functions, (c) showed more use of nesting and (d) applied sequencing, completion and debugging more often for the required task application. This can be deduced from the data, as both groups that applied SRA-programming with Bomberbot or Lego EV-3 showed higher average scores (M) in the post-assessment than the control group for all variables measured. The mean values should be considered as the average number of correctly answered questions per respondent, normalised to a value from zero to one. This increase can also be deduced from the percentage values that were calculated for each variable separately for each intervention group. This percentage calculation was applied because the three different intervention groups differed in terms of the number of respondents (Bomberbot: n = 50; Lego EV-3: n = 47; control group: n = 59). The percentage values per category were calculated on the basis of the number of items per category compared to the total number of items in the questionnaire, and this value was multiplied by the calculated mean (M). By illustrating the differences between the two intervention groups and the control group in the form of percentage values, the effect and impact of SRA-programming on CT could be objectively compared. For the combined categories, the percentages were calculated on the basis of the weighted averages. The category "loops: combined" is an aggregation of the subcategories "loops: repeat times" (13 items) and "loops: repeat until" (12 items). Since both subcategories have an overlap of three items, the category "loops: combined" is based on a total of 22 items.

The category "conditionals: combined" is an aggregation of the subcategories "conditionals: if-simple" (6 items), "conditionals: if/else" (4 items) and "conditionals: while" (4 items). Since two items overlap between these three subcategories, the category "conditionals: combined" is assumed to contain a total of 12 items. Based on the measured values obtained, it can be stated that SRA-programming, with either Bomberbot or Lego EV-3, was the cause of this increase in comparison with the control group. This is despite the fact that for some variables, only a slight increase in the measured values in the post-test could be established. The control group did not score better than either the Bomberbot or the Lego EV-3 group in any category. In addition to the increase for the Bomberbot and Lego EV-3 programming interventions, it was striking that the control group showed a decline in the value measured for each variable.
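The item counts for the combined categories, and one possible reading of the percentage rule described above, can be sketched as follows. The function names are hypothetical, the overlap handling assumes that each overlapping item is shared by exactly two subcategories, and the mean score of 0.5 is an invented example value rather than a result from Table 1.

```python
def combined_item_count(subcategory_sizes, overlapping_items):
    """Number of distinct items when subcategories share some items.

    Assumes each overlapping item belongs to exactly two subcategories,
    so it is counted one time too many in the plain sum.
    """
    return sum(subcategory_sizes) - overlapping_items


def category_percentage(n_items, total_items, mean_score):
    """One reading of the rule: share of items times the mean, as a percentage."""
    return n_items / total_items * mean_score * 100


loops_combined = combined_item_count([13, 12], overlapping_items=3)
conditionals_combined = combined_item_count([6, 4, 4], overlapping_items=2)
print(loops_combined, conditionals_combined)  # 22 12

# mean_score=0.5 is hypothetical; 28 is the total number of CT tasks in the CTt.
example = category_percentage(n_items=13, total_items=28, mean_score=0.5)
print(round(example, 1))
```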

Development in solving computational thinking issues at a higher level

In order to gain more insight into the impact of the interventions based on SRA-programming and whether this led to a higher level of CT, a contrast analysis with a three-group comparison was performed. Table 2 shows the data for the contrast analysis as applied to all variables, in a comparison for all groups.

Table 2 Contrast analysis with a comparison of SRA-programming for all groups

Results of the contrast analysis and main significant effects

A contrast analysis of the total number of correctly solved CT tasks (28) shows that there was a significant difference, with a medium measurable effect, between the group that programmed using Bomberbot and the control group t (153) = 2.527, p = 0.013, d = 0.409, and an almost significant difference with a small effect between the group that programmed using Lego EV-3 and the control group t (153) = 1.810, p = 0.072, d = 0.293. No significant difference could be measured between the groups that programmed using Bomberbot and Lego EV-3 (p = 0.517).

A contrast analysis of the differences in the application of “loops: repeat times” shows that no significant difference could be measured between the group that programmed using Bomberbot and the control group (p = 0.247), but there was a significant difference with a small effect between the group that programmed using Lego EV-3 and the control group t (153) = 2.150, p = 0.033, d = 0.348. No significant difference was found between the groups that programmed with Bomberbot and Lego EV-3 (p = 0.333).

A contrast analysis of the differences in the combined application of “loops: repeat times” and “loops: repeat until” shows that there was no significant difference between the group that programmed using Bomberbot and the control group (p = 0.122), but a significant difference with a small effect was measured between the group that programmed using Lego EV-3 and the control group t (153) = 2.078, p = 0.039, d = 0.336. No significant difference was measured between the groups that programmed using Bomberbot and Lego EV-3 (p = 0.597).

A contrast analysis of the differences in the application of “conditionals: if-simple” shows that there was a significant difference with a medium effect between the group that programmed using Bomberbot and the control group t (153) = 3.085, p = 0.002, d = 0.499, but no significant difference between the group that programmed using Lego EV-3 and the control group (p = 0.566). There was a significant difference, with a small, negative effect, between the groups that programmed using Bomberbot and Lego EV-3 t (153) = −2.336, p = 0.019, d = −0.383.

A contrast analysis of the differences in the application of “conditionals: if/else” shows that there was a significant difference with a small effect between the group that programmed using Bomberbot and the control group t (153) = 2.059, p = 0.041, d = 0.333, but no significant difference between the group that programmed using Lego EV-3 and the control group (p = 0.236). No significant difference was measurable between the groups that programmed using Bomberbot and Lego EV-3 (p = 0.423).

A contrast analysis of the differences in the application of “conditionals: while” shows that a significant difference, with a medium to large effect, was measured between the group that programmed using Bomberbot and the control group t (153) = 3.538, p = 0.001, d = 0.572, and an almost significant difference with a small effect between the group that programmed using Lego EV-3 and the control group t (153) = 1.814, p = 0.072, d = 0.293. No significant difference was measurable between the groups that programmed using Bomberbot and Lego EV-3 (p = 0.111).

A contrast analysis of the differences in the combined application of “if-simple”, “if/else” and “while” conditionals shows that there was a significant difference, with a medium to large effect, between the group that programmed using Bomberbot and the control group t (153) = 3.831, p < 0.001, d = 0.619, and an almost significant difference, with a small effect, between the group that programmed using Lego EV-3 and the control group t (153) = 1.660, p = 0.099, d = 0.268. A significant difference, with a small, negative effect, was measurable between the groups that programmed using Bomberbot and Lego EV-3 t (153) = −2.027, p = 0.044, d = −0.328.

A contrast analysis concerning differences in the application of “nesting” shows that there is a significant difference with a medium effect between the group that programmed using Bomberbot and the control group t (153) = 3.274, p = 0.001, d = 0.529, and that a significant difference with a small effect was measurable between the group that programmed using Lego EV-3 and the control group t (153) = 1.980, p = 0.049, d = 0.320. No significant difference was measurable between the groups that programmed using Bomberbot and Lego EV-3 (p = 0.235).

A contrast analysis of the differences in the required task “completion” shows that a significant difference with a small effect was measurable between the group that programmed using Bomberbot and the control group t (153) = 2.469, p = 0.015, d = 0.399, but no significant difference between the group that programmed using Lego EV-3 and the control group (p = 0.319). No significant difference was measurable between the groups that programmed using Bomberbot and Lego EV-3 (p = 0.171).

A contrast analysis of the differences in the required task “sequencing” shows that a significant difference with a medium effect was measurable between the group that programmed using Bomberbot and the control group t (153) = 2.920, p = 0.004, d = 0.472, and a significant difference with a small effect between the group that programmed using Lego EV-3 and the control group t (153) = 2.131, p = 0.035, d = 0.345. No significant difference was measurable between the groups that programmed using Bomberbot and Lego EV-3 (p = 0.478).
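The effect sizes reported with the contrasts above appear consistent with the common conversion d = 2t/√df, with df = N − 3 = 153 for the three groups (n = 50, 47 and 59). This is an inference from the reported numbers, not a formula stated in the text, and the function name `d_from_t` is hypothetical. A minimal sketch:

```python
import math


def d_from_t(t_value, df):
    """Approximate Cohen's d from a t statistic: d = 2t / sqrt(df).

    Assumption: this conversion is inferred from the reported values;
    the text does not say how d was computed.
    """
    return 2 * t_value / math.sqrt(df)


group_sizes = {"Bomberbot": 50, "Lego EV-3": 47, "control": 59}
df = sum(group_sizes.values()) - len(group_sizes)  # 156 - 3 = 153

# (t, reported d) pairs copied from the contrasts above.
reported = [(2.527, 0.409), (3.085, 0.499), (3.538, 0.572), (-2.027, -0.328)]
for t_value, d_reported in reported:
    print(t_value, round(d_from_t(t_value, df), 3), d_reported)
```

Under this conversion, one reported pair (t = −2.336, d = −0.383) yields −0.378 instead, so that contrast may involve rounding or a different computation.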

An analysis of the data obtained from this contrast analysis indicates that the different types of output used with SRA-programming can have significant effects on the development of different characteristics of CT. Notable among these are the significant values that we have indicated with an * in Table 2. An interpretation of the reported results makes it clear that, depending on the characteristics of the programming environment, a significant development of the (sub-)characteristics of CT is measurable. A possible explanation for this may be that Bomberbot, with its visual, on-screen output, more effectively stimulates the use of conditional, cause-and-effect reasoning by means of the applied task design, and that Lego EV-3, with its tangible output, facilitates the understanding of iterations and nested loops and other types of linear thinking. The ability to perceive the impact of a programming intervention more directly in Bomberbot may also explain the significant results found. Bomberbot appears to be more focused on parallel programming interventions, whereas Lego EV-3 still leaves much room for finding a programming solution via linear programming. It is also noticeable that Bomberbot incorporates stimuli that encourage pupils to keep searching for the most efficient programming solution, which seems to have an effect on the development of sub-characteristics of CT.

Conclusions

From an examination of the data, it can be deduced that the control group initially showed a higher starting level (pre-assessment) for all variables present in this study, compared with the two other experimental conditions (Bomberbot and Lego EV-3). However, the difference in the increases between the pre- and post-assessments of CT for the three different conditions indicates that the groups that received an SRA-programming intervention involving either Bomberbot or Lego EV-3 showed more growth (difference between pre- and post-assessment) and also achieved higher end-values in the post-measurement compared to the control group. In fact, the end-values for the control group did not increase, or hardly increased (or even decreased). This supports our claim that the application of SRA-programming is the cause of this identified increase. It can therefore be assumed that due to the application of SRA-programming in both visual programming environments, pupils (1) solved more CT tasks correctly; (2) solved more questions correctly which required the application of loops, conditionals, functions and nesting and (3) showed an increase in the application of CT (e.g. completion, debugging and sequencing) compared with the control group, thus indicating that SRA-programming contributes to a better understanding of complex programming concepts.

A further interpretation of the available data shows that there was a significant increase in the level of CT as a result of the application of SRA-programming using two different visual programming environments that differed in terms of output. With regard to the application of programming concepts and CT, both environments were comparable; the differences in the effects measured for the two environments resulted from the distinguishing features of the task design. In addition, the extent and type of visualisation and the visual program itself also seem to have a strong influence. The data analysed here indicate that both visual programming environments, with either a visual or tangible output, produced a substantial development in CT compared to the control group. Significant yields were measured for all variables considered in this research. The extent of the effect of applying SRA-programming was medium to large. The findings of this research indicate an impact on CT due to the use of SRA-programming.

The hypothesis that applying SRA-programming in a visual programming environment leads to a development of CT can be confirmed. This can be deduced from the total number of correctly solved CT tasks: both the Bomberbot and Lego EV-3 groups showed a substantial increase in the number of correctly solved CT tasks compared to the control group. For the group that programmed with Bomberbot, this development was significant, whereas for the group that programmed with Lego EV-3, this development was almost significant.

The hypothesis that the application of SRA-programming in a visual programming environment with visual, on-screen output leads to a higher level of development of CT compared to SRA-programming in a visual environment with tangible output can be confirmed based on the values for the required CT tasks of “completion” and “sequencing” considered in this research. For the CT task of “debugging”, no significance could be found, despite an increase in comparison with the control group.

The hypothesis that the application of SRA-programming in a visual programming environment with visual, on-screen output has more impact on the understanding of complex programming concepts than SRA-programming in a visual environment with tangible output can be confirmed based on our results for “conditionals” and “nesting”. The values for “loops: repeat times” and “loops: combined” indicate that the application of Lego EV-3 causes a higher level of development than Bomberbot. The values for “loops: repeat until” and “functions” indicate that despite an increase in comparison with the control group, no significance can be found for either visual programming environment, i.e. Bomberbot or Lego EV-3.

In general, it can be stated that visual SRA-programming environments with either a visual output or a tangible output are characterised by their own specific yield. In comparison with the control group, both groups showed a substantial development of CT through the application of SRA-programming. The application of SRA-programming within the visual programming environment of Bomberbot, with a visual output, gave a higher development of CT and more understanding of complex programming concepts than the use of SRA-programming with the visual programming environment of Lego EV-3, with a tangible output.

Discussion

The aim of this research was to find answers to the question of whether the type of output in a visual SRA-programming environment influences the development of CT and the understanding of complex programming concepts.

Our findings indicate that the use of visual SRA-programming environments with different types of output is a suitable way to establish the relationship between SRA-programming and the development of CT. The results obtained from our research show that visual SRA-programming with on-screen output predominantly leads to a higher development of CT and better understanding of complex programming concepts than visual SRA-programming with tangible output. We can also confirm that a visual programming environment with an on-screen output can create a higher level of understanding of complex programming concepts. This is in line with the assertions of Carlisle (2009), López et al. (2021), Werner et al. (2012) and Williams et al. (2015), who state that visual programming environments with an on-screen output provide an accessible, functional way to introduce these concepts into primary education. In this way, pupils acquire functional programming skills at a higher level, with a positive effect on CT.

Our results show that following the use of visual SRA-programming in both environments, pupils subsequently show development of computational concepts and, due to the influence of the SRA approach, can solve programming tasks of higher complexity. This is in line with claims made by Caci & D'Amico (2002) and Wong (2014), who state that anticipating differences between expected and observed events in the task design triggers a reconsideration of the underlying reasoning. The subsequent programming action ensures that pupils develop a higher level of cognitive ability, resulting in the ability to solve more complex programming tasks. The questions that then arise are (1) why visual SRA-programming with an on-screen output gives rise to significant developments in the required CT tasks of “completion” and “sequencing”, (2) why it does so for the complex programming concepts of "conditionals" and "nesting", and (3) why a tangible output gives rise to significant developments in the complex programming concepts of "loops: repeat times" and "loops: combined". We conjecture that the underlying explanation is the more structured setup of the programming environment of Bomberbot, in comparison with the more open form implemented in Lego EV-3. A further explanation can be found in a study by Korkmaz (2018), who states that visual programming environments with a visual output make a more positive contribution to the development of logical-mathematical reasoning than those with a tangible output, but that a tangible output makes a more positive contribution to problem-solving skills.

Another aspect that may have influenced the results of the use of SRA-programming, apart from (1) the difference in output, (2) the similar difficulty of the task design and (3) the same drag-and-drop method of programming, is that the two visual programming environments differ in the way feedback is provided to the user. In Bomberbot, feedback acts as a constant guiding trigger, showing pupils via visualisation whether the solution they have programmed is the most efficient one. This functions as an incentive, and more or less as a form of ongoing coaching. This view is endorsed by Papadakis et al. (2014), who state that in a visual on-screen programming environment, incentives can be used as a strong motivational tool for pupils to strive for the most efficient programming solution. However, Lego EV-3 is characterised by a more open, problem-solving approach to programming assignments. Pupils only receive feedback on their program by observing the concrete, tangible robot perform the programmed task, as a reflection of the requirements of the environment and the task. This is in line with the perceptions of Asada et al. (1999), who claim that changes in a physical robotics environment evoke the programming actions to be taken, and can therefore be seen as directional feedback. The question that then arises is whether a visual incentive as a direct stimulus in a programming environment with a simulated reality works better than powerful, physical feedback as an indirect stimulus in a tangible robotics environment. In a planned follow-up study, we expect to gain a better insight into the influence and relevance of the form of guidance that such programming environments provide to pupils.

Another noteworthy question is why pupils who are already familiar with the added functionality of the more efficient SRA-programming often do not apply it on their own initiative. One explanation is that pupils get stuck in the routine of a sequential trajectory, and/or that their first programming experiences have established the basis for choosing a linear approach without question. Our findings show that when using Bomberbot, pupils begin to understand the initiated actions on which parallel programming routines rely. From a pedagogical perspective, it can be assumed that experience of programming with Bomberbot prior to experience of robotics programming could provide an opportunity to steer pupils away from a linear, sequential programming approach.
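The contrast between a linear, sequential program and an SRA-style program can be made concrete with a small sketch. It assumes that SRA stands for sense-reason-act, which this section does not spell out, and it uses a hypothetical one-dimensional corridor with a wall; neither the corridor nor the function names come from Bomberbot or Lego EV-3.

```python
def run_linear(wall_at, steps=3):
    """Fixed sequence: move forward `steps` times without sensing anything."""
    position = 0
    for _ in range(steps):
        position += 1
        if position >= wall_at:
            return position, "crashed"
    return position, "stopped"


def run_sra(wall_at, max_cycles=100):
    """Sense-reason-act loop: check the path, decide, then move or stop."""
    position = 0
    for _ in range(max_cycles):
        path_clear = (position + 1) < wall_at          # sense
        action = "forward" if path_clear else "stop"   # reason
        if action == "stop":                           # act
            return position, "stopped"
        position += 1
    return position, "timeout"


# The fixed sequence only works when the wall is where the programmer expected it.
print(run_linear(wall_at=5), run_linear(wall_at=2))
# The SRA loop adapts to either wall position.
print(run_sra(wall_at=5), run_sra(wall_at=2))
```

The linear version encodes one expected situation, whereas the loop re-evaluates a condition on every cycle, which is the kind of conditional reasoning the section associates with SRA-programming.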

For the control group, no measurable retention was caused by the application of the CTt. It may be a coincidence that this group showed a decline in CT, but it can be stated with certainty that no increase was detectable. It is also possible that pupils from the control group were less motivated to complete the CTt a second time, compared to the experimental groups. This may have been because pupils from the experimental groups were more likely to see a relationship between the assessment and the intervention. Further research is needed to explain this.

A further analysis and comparison between the use of Bomberbot and Lego reveals a number of striking findings. The in-depth examination of the contrast analysis (Table 2) shows that for the computational concept of "conditionals", Bomberbot showed significant improvement compared to Lego, and exactly the reverse was true for "loops". We conjecture that explanations for these differences can be derived from the specific characteristics of these programming environments. A primary characteristic of Bomberbot is that the visualisation of processes resulting from programming interventions, based on the on-screen display, is more imaginative. The ability to establish a relationship between a programming action and the execution of the programming process, and the perceptible effect of what each variable does, is immediately apparent in Bomberbot. Using Lego EV-3, this processing is indirectly visible via the tangible robot, and only the progress of the programming execution can be seen on the screen. From this, it can be inferred that the programming process is much more imitable in Bomberbot than in Lego.

We also believe that the design of each programming environment and the way in which information is provided to the user before, during and after the process may have a significant influence. The visualisations used and the instant feedback and support offered in Bomberbot can be conceived as direct guidance. The graphic design used for the tasks to be programmed and the requested application of conditionals in Bomberbot immediately appeal to parallel thinking as a characteristic of SRA-programming. This is in contrast to Lego Mindstorms software, where this indication of parallel thinking occurs indirectly, and the user must establish a relationship with parallel thinking independently. From this, we can conclude that the use of SRA-programming in Bomberbot is suggested more explicitly than in Lego EV-3 Mindstorms. The provision of incentives in Bomberbot also supports this statement.

This research contributes to the theory of CT and programming education. It can also give direction to how programming is operationalised within education. Based on our significant results, it can be concluded that the SRA approach is a promising concept for developing CT in a more profound and meaningful way. From the effects measured, it can be concluded that the application of SRA-programming, when programming in a visual environment, leads to an increased capability to solve programming tasks at a higher level of complexity. It does not matter what kind of output is used within a visual programming environment, as long as an SRA approach is applied. From this point of view, the user's preference for a particular type of output can be matched accordingly. Overall, it can be concluded that the design and specifications of the programming environment, rather than the difference in the type of output, partly determine the expected learning effect on specific characteristics of complex programming concepts, resulting in a higher level of CT.

Limitations and follow-up research

Several limitations and considerations can be identified regarding the results of this research. Because of these, a certain lack of generalisability of the results obtained should be taken into account.

During the course of this research, it may be the case that non-experimental variables played a determining role in the final results. Reasons for this may include that pupils continued to work with programming environments at home or within their current primary school curriculum, or that pupils have developed over time as a result of their standard educational programme. In addition, our findings could also be explained by taking into account children's familiarity with computer games with visual output representing tangibles and/or their previous computer experience.

This research made use of two visual programming environments: Bomberbot, with an on-screen output, and Lego EV-3 Mindstorms, with a tangible output. In order to be able to generalise the results of our research, it should be replicated with other visual programming environments with different types of on-screen and tangible output. The issues of whether the nature of the programming task and the level of difficulty affect the outcome and learning effects should also be examined. There are also arguments that the use of visual programming environments with incentives provides a low threshold for giving guidance to users; the user is guided more explicitly through the tasks, and it is therefore questionable whether these incentives restrict freedom of choice.

It would also be interesting to further investigate (1) whether the use of SRA-programming in a visual programming environment in which a tangible output is first used and then a visual output (or vice versa) yields a greater understanding of complex programming concepts; (2) whether this can be attributed to the application of SRA and (3) to what extent this results in the subsequent measurable development of CT. We note that SRA-programming of robotics with tangible output involves a very different form of application of SRA compared to visual SRA-programming with on-screen output. Learning to apply SRA in one environment may have (dis)advantages in the other. In follow-up research, it would be worthwhile to further clarify the relationship between the use of SRA-programming, the sequence in which the type of programming environment is applied, and possible differences in the development of CT.

In this research, there was relatively little time available (five sessions of 1 h each) for pupils to learn SRA-programming. If more time had been available, this may have led to different results; for instance, pupils may have gained more understanding of the complex concepts of programming.

In terms of defining and assessing the development of CT among primary school pupils, it should be noted that CT is still a developing concept. The findings and results reported in our research relate to the interpretation of CT as elaborated by Román-González et al. (2017), and as operationalised in their validated instrument to identify and measure the development of CT. In further research, it would be valuable to consider other interpretations relating to CT.

In this research, the researcher also acted as the supporting supervisor. It is important to consider that initiating, supporting and supervising programming activities requires specific competences from the supporting teacher. Moreover, a teacher should be well equipped to adequately facilitate and guide such activities. In subsequent research, guidance will be provided by regular teachers who are competent with regard to the requirements of pedagogical content knowledge and effective coaching.