1 Introduction

Today's society is technology-driven, which increases the need for technological literacy. Educational policy advocates the development of students' computational thinking, defined as the ability to analyze and solve problems using computational principles (Chalmers, 2018). It refers to a style of thinking and acting that is fundamental to everyone, not just computer scientists. Indeed, it implies progressing from being passive consumers of technology to understanding it (Shute et al., 2017). As a result, learning to code has attracted interest in educational systems worldwide (Grover & Pea, 2013; Papavlasopoulou et al., 2018).

Coding is usually associated with computer science subjects or majors. Yet, calls for the inclusion of coding from elementary levels onwards abound (Berland & Wilensky, 2015; Bers et al., 2014; Chalmers, 2018). In this sense, several countries have proposed coding-related school subjects (Balanskat & Engelhardt, 2014). Similarly, more recent trends integrate coding tasks into science and mathematics instruction (Li et al., 2020; Sung et al., 2017). For example, the Next Generation Science Standards in the United States provide a basic curriculum for students’ early coding literacy, in which mathematics and computational thinking are key scientific and engineering practices (NGSS Lead States, 2013). In Spain, current educational curricula include specific standards related to coding (LOMCE, 2013). A large number of EU countries, such as Denmark, Italy, and Portugal, also incorporate coding in their school curricula (Balanskat & Engelhardt, 2014).

In summary, there is a renewed interest in introducing coding into elementary curricula. This is due in part to the availability of easy-to-use educational resources, such as board games, screen-based robots, or floor robots (Casey et al., 2020; Hamilton et al., 2020). Similarly, the growing emphasis on computational thinking has recently resulted in the development of user-friendly coding environments. Examples include Scratch, Code.org, Snap, and Micro:bit MakeCode, all of which use drag-and-drop strings (Fig. 1). Such resources, known as block-based coding or visual coding, enable the introduction of coding in elementary school (Sáez-López et al., 2016; Sung et al., 2017). Unlike text-based programming languages, visual coding uses pre-designed blocks. Each block represents an action, and combining blocks forms strings of instructions. In doing so, students code using a pictographic language, which is then converted into error-free syntax.

Fig. 1 Examples of block-based coding environments

2 Statement of the Problem

Much research has been conducted on the benefits of block-based coding at early ages (Sung et al., 2017). Likewise, research on how to introduce coding in primary and secondary school abounds. Yet, students’ acceptance of such resources remains unexplored, especially in the Spanish context. This gap is partly due to a lack of valid and reliable instruments for this purpose, with the exception of a few ad-hoc measures. Hence, this study presents the development and psychometric evaluation of an instrument to measure acceptance of block-based coding.

3 Theoretical Underpinnings

The proposed instrument is rooted in the Technology Acceptance Model (TAM; Davis, 1989). The TAM was adapted from the Theory of Reasoned Action to study the acceptance of information system technologies. Specifically, it was proposed to explain behavioral intentions when using technological innovations (Davis et al., 1989). The model describes the relationships between its key determinants: perceived usefulness and perceived ease of use of the system, attitudes toward use, behavioral intentions, and actual use of the technological innovation (Fig. 2). The TAM is one of the major models used to explore the acceptance of technology (for reviews, see Al-Maroof et al., 2021; Schepers & Wetzels, 2007). Its use in educational settings has provided insight into the acceptance of a broad array of information systems. Examples include cell phones (Zogheib & Daniela, 2021), statistics learning platforms (Song & Kong, 2017), floor robots (Casey et al., 2020), and role-play games (Suki & Suki, 2019).

Fig. 2 The Technology Acceptance Model (TAM). Adapted from Davis et al. (1989). Key determinants highlighted in grey

The TAM posits three main domains affecting individuals' acceptance behaviors. The first domain is perceived usefulness, which relates to beliefs about whether the use of a certain system would improve performance. The second domain is perceived ease of use. It refers to the belief that using a particular system would be simple, and it has a significant impact on perceived usefulness. Both domains are affected by external variables, such as gender, social influence, or support (Davis et al., 1989). The last major domain is attitudes, which are shaped by the perceived usefulness and perceived ease of use of the given system (Davis, 1989). It is postulated that greater levels of perceived ease of use and usefulness foster favorable attitudes; these increase the user's behavioral intention to use the system, which leads to acceptance of the information system technology (Al-Maroof et al., 2021).

4 Method

4.1 Study Design

This study follows an instrumental design (Ato et al., 2013), that is, research that examines the psychometric properties of measurement instruments.

4.2 Sample and Context

The sample was drawn using purposive sampling from 15 primary schools located in the province of Burgos (Spain). Participants attended a week-long intensive curriculum enrichment program, which included visual coding activities within science and math lessons. The BBC micro:bit microcontroller board and its block-based environment MakeCode were used. Specifically, students programmed the BBC micro:bit boards to act as sensors that were used during hands-on laboratory investigations. For instance, a thermometer was programmed to measure the temperature of different water samples. Another example is coding a timer to determine how long it takes to filter and purify water (program detailed in Toma, 2020).

A total of 337 elementary school students participated. After removing 22 questionnaires (6.5%) with missing data or more than one response per item, a valid sample of 315 students remained. Almost half of the participants identified as girls (49.3%). Participants were enrolled in 3rd (n = 25), 4th (n = 77), 5th (n = 104), and 6th (n = 109) grades and had a mean age of 10.18 years (SD = 1.13).

A paper–pencil questionnaire was distributed on the last day of the program. To reduce social desirability bias, visiting school teachers were absent throughout data collection. Students were informed that their participation was voluntary and anonymous and that their responses would not affect school grades.

4.3 Instrument Development Framework

DeVellis's (2017) guidelines for scale development were used. The first step was to determine the construct to be studied. As mentioned earlier, the TAM was chosen. The second step was to generate a pool of items reflecting the latent variables of the TAM. The recommended minimum number of items per construct is three. Therefore, six items each were developed for perceived usefulness, ease of use, and attitudes. The items were worded to refer to the action of "coding" rather than to visual coding itself (Davis et al., 1989). The items on perceived usefulness related to different aspects (e.g., future study, school, life) to capture the phenomenon. The items on ease of use referred to students' perceived difficulty in using visual coding. Finally, the items measuring attitudes referred to positive dispositions toward visual coding, for which simple terms (e.g., enjoyment, fun, or interest) were used. Some items were worded negatively to avoid acquiescence bias. In the third step, a five-point Likert scale (1: strongly disagree; 2: disagree; 3: neither disagree nor agree; 4: agree; and 5: strongly agree) was chosen as the measurement format. The fourth step was to assess the content validity of the items. Finally, in the fifth step, the revised items were administered to a large number of participants and tested for construct validity.
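Because some items were worded negatively, their responses are commonly reverse-scored before analysis so that higher values always indicate greater acceptance. The text does not describe this step, so the following is only a minimal sketch under that assumption; the item names and answers are hypothetical.

```python
SCALE_MIN, SCALE_MAX = 1, 5  # five-point Likert scale described in the text


def reverse_score(response):
    """Mirror a response around the scale midpoint (1<->5, 2<->4, 3 stays 3)."""
    return SCALE_MIN + SCALE_MAX - response


def score_items(responses, negative_items):
    """Reverse-score the negatively worded items and leave the rest unchanged.

    responses: dict mapping item name -> response on the 1-5 scale
    negative_items: set of item names that are negatively worded (hypothetical)
    """
    return {
        item: reverse_score(value) if item in negative_items else value
        for item, value in responses.items()
    }


# Hypothetical item names and answers, for illustration only.
answers = {"useful_1": 4, "useful_2_neg": 2, "easy_1": 5, "easy_2_neg": 5}
scored = score_items(answers, negative_items={"useful_2_neg", "easy_2_neg"})
```

After this transformation, a "disagree" on a negatively worded item counts the same as an "agree" on a positively worded one.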

4.4 Data Analysis

4.4.1 Content Validity

A panel of eight experts (two university professors and six elementary school teachers) reviewed the original pool of items. The experts independently mapped each item to a TAM construct and rated its linguistic appropriateness and clarity (0: not appropriate; 1: appropriate).
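One simple way to summarize such expert judgments is the percentage of raters who chose the most common category for each item. The sketch below assumes this simple percent-agreement index; the function name, construct labels, and ratings are hypothetical.

```python
from collections import Counter


def percent_agreement(ratings):
    """Share of raters (in %) who chose the most common category for an item."""
    most_common_count = Counter(ratings).most_common(1)[0][1]
    return 100 * most_common_count / len(ratings)


# Hypothetical construct mappings by eight experts for two items.
# PU = perceived usefulness, EU = ease of use, AT = attitudes.
item_a = ["PU"] * 8           # all eight experts agree
item_b = ["EU"] * 7 + ["AT"]  # seven of eight experts agree

agreement_a = percent_agreement(item_a)
agreement_b = percent_agreement(item_b)

# The same function applies to the binary appropriateness ratings (0 or 1).
clarity_agreement = percent_agreement([1, 1, 1, 1, 1, 1, 0, 0])
```

Percent agreement is easy to interpret but does not correct for chance agreement, which is worth keeping in mind with a small panel.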

4.4.2 Construct Validity

The revised items were subjected to an exploratory factor analysis following contemporary recommendations (Gaskin & Happell, 2014). The TAM assumes that factors are correlated; therefore, principal axis factoring with oblique Promax rotation was used as the extraction method. Retention of factors was determined using the results of Horn's parallel analysis (Hayton et al., 2004). Items with communalities > 0.30, loadings > 0.40, and no cross-loadings between factors were retained. Since there are gender differences in computer use and programming (Stoilescu & Egodawatte, 2010), an exploratory factor analysis by gender was also conducted. Finally, the correlation matrix was examined for evidence of convergent and discriminant validity. Items loading on the same factor are expected to correlate more highly with one another than with items on other factors. In addition, moderate correlations (r = 0.30–0.50) between factors are preferable.
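The item retention criteria can be expressed as a short filter. This is a minimal sketch: the thresholds come from the text, whereas the definition of a cross-loading (more than one loading above 0.40 in absolute value) is an assumption, and the item names and values are made up.

```python
COMMUNALITY_MIN = 0.30  # thresholds stated in the text
LOADING_MIN = 0.40


def retain(communality, loadings):
    """Apply the retention rule to a single item.

    communality: the item's communality
    loadings: the item's loadings on each extracted factor
    The item is kept when its communality exceeds 0.30 and exactly one loading
    exceeds 0.40 in absolute value. Treating a second loading above 0.40 as a
    cross-loading is an assumption made for this sketch.
    """
    salient = [value for value in loadings if abs(value) > LOADING_MIN]
    return communality > COMMUNALITY_MIN and len(salient) == 1


# Made-up communalities and loadings on three factors, for illustration only.
items = {
    "item_1": (0.55, [0.72, 0.05, 0.10]),  # meets every criterion
    "item_2": (0.22, [0.45, 0.02, 0.08]),  # communality too low
    "item_3": (0.48, [0.46, 0.44, 0.03]),  # loads on two factors
    "item_4": (0.35, [0.31, 0.12, 0.09]),  # no loading above 0.40
}
kept = [name for name, (h2, loads) in items.items() if retain(h2, loads)]
```

In practice the rule is applied iteratively, since communalities and loadings change each time the factor analysis is rerun without the removed items.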

4.4.3 Reliability

Several indices were used to examine the internal consistency reliability of each retained factor of the proposed instrument (Hayes & Coutts, 2020). These include Cronbach’s alpha (α ≥ 0.70), McDonald’s omega (ω), and the Spearman-Brown split-half coefficient (≥ 0.60).
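The three indices can be sketched with their textbook formulas. This is an illustrative sketch, not the software procedure used in the study: the function names and example scores are hypothetical, and the omega function assumes a single factor with standardized loadings, so that each item's unique variance is 1 − λ².

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha from item-score columns (one list of scores per item)."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # total per respondent
    item_variance_sum = sum(variance(column) for column in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def spearman_brown(r_halves):
    """Step up the correlation between two half-tests to full-test reliability."""
    return 2 * r_halves / (1 + r_halves)


def mcdonald_omega(loadings):
    """Omega for one factor, assuming standardized loadings (an assumption)."""
    common = sum(loadings) ** 2
    unique = sum(1 - value ** 2 for value in loadings)
    return common / (common + unique)


# Made-up scores: four items answered by six respondents.
items = [
    [4, 5, 3, 2, 4, 5],
    [4, 4, 3, 2, 5, 5],
    [5, 5, 2, 1, 4, 4],
    [3, 5, 3, 2, 4, 5],
]
alpha = cronbach_alpha(items)
meets_alpha_cutoff = alpha >= 0.70  # cutoff stated in the text
```

Reporting several coefficients is useful because alpha assumes that all items contribute equally to the factor, whereas omega relaxes that assumption by weighting items through their loadings.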

5 Results

5.1 Content Validity

Inter-rater agreement in linking the items to the TAM constructs was high, with at least seven of the eight experts reaching consensus (87.5–100%). Inter-rater agreement on the ratings of each item for linguistic appropriateness and clarity was also adequate, with at least six of the eight experts reaching consensus (75–100%). Only a few changes were proposed. The word 'irrelevant' was changed to 'of little use to me'. Likewise, 'relevant' was replaced with 'useful'. These replacements are easier words for Spanish-speaking elementary students.

5.2 Construct Validity

The Kaiser–Meyer–Olkin measure of sampling adequacy was 0.858 and Bartlett's test of sphericity was statistically significant (χ² = 1442.397, p < 0.01), supporting the factorability of the data matrix. In the parallel analysis, only three latent factors had eigenvalues exceeding those of a randomly generated sample with the same characteristics (Fig. 3).

Fig. 3 Parallel analysis results. Note: PA: parallel analysis; EFA: exploratory factor analysis

The EFA was then repeated, retaining three factors. Three items had communalities below 0.30. After removing these items, two items with loadings below 0.40 and one item with cross-loadings emerged. Removal of these items yielded a final, conceptually meaningful three-factor solution that explained 58.46% of the variance (Table 1). The final questionnaire consisted of 12 items distributed as follows: four items measuring attitudes toward visual coding (34.52% variance explained); two positively and two negatively worded items measuring perceived usefulness of visual coding (14.89% variance explained); and three positively worded items and one negatively worded item measuring perceived ease of use of visual coding (9.05% variance explained). The Spanish version of the items is included in "Appendix 1".

Table 1 Results of exploratory factor analysis

Items within each factor were moderately to highly correlated, with adequate ranges for attitudes (0.42 ≤ r ≤ 0.61), usefulness (0.32 ≤ r ≤ 0.59), and ease of use (0.30 ≤ r ≤ 0.48). Hence, the factors exhibit convergent validity. Similarly, a moderate correlation between attitudes and usefulness (r = 0.55), and low correlations between attitudes and ease of use (r = 0.27) and between usefulness and ease of use (r = 0.22), provide evidence of discriminant validity of the retained factors.

Exploratory factor analysis by gender resulted in the same three-factor structure, explaining 60.49% and 62.49% of the variance for the girls' and boys' samples, respectively (Table 2). However, one of the items measuring ease of use had a communality below the 0.30 cutoff for the sample of boys. Since its removal affected reliability and given its acceptable factor loading, it was retained in the final questionnaire. Taken together, these findings provide evidence for construct validity.

Table 2 Results of exploratory factor analysis by gender

5.3 Reliability

Each factor exhibited adequate reliability. Cronbach’s alpha coefficients were 0.82 for the attitudes, 0.75 for the usefulness, and 0.76 for the ease of use factors. McDonald’s omega coefficients were 0.83, 0.74, and 0.68, respectively. The Spearman-Brown split-half indices were 0.80, 0.67, and 0.73, respectively. Taken together, these findings provide evidence of good to excellent reliability for each retained factor.

6 Discussion

The current study presents the development and validation of an instrument for the measurement of Spanish-speaking elementary school students' acceptance of visual coding. Using the TAM as a guiding framework, psychometric analysis suggests that the proposed instrument is robust regarding validity and reliability evidence. In particular, a panel of experts confirmed that the items developed were linguistically appropriate and consistent with the theoretical framework adopted. Such results provide evidence of content validity (DeVellis, 2017).

These outcomes were supported by the findings of exploratory factor analysis. Specifically, factorial results indicated a parsimonious three-factor latent structure. Hence, in line with theoretical expectations, usefulness, ease of use, and attitudes towards visual coding are empirically distinct. Moreover, further analyses revealed adequate convergent and discriminant validity, thereby lending support to the construct validity of the instrument (DeVellis, 2017; Gaskin & Happell, 2014). Likewise, separate factor analyses by gender found that there were no differences in the latent structure. This suggests that boys' and girls' acceptance of visual coding can be assessed using the same items.

Additionally, each subscale of the proposed instrument exhibited good to excellent reliability, as indicated by several coefficients (Hayes & Coutts, 2020). Taken together, this investigation advances a promising, valid, and reliable instrument for the assessment of visual coding acceptance.

6.1 Implications

To the best of the author’s knowledge, this study represents the first effort to advance an instrument measuring acceptance of block-based or visual coding in the Spanish context. Except for Cheng (2019), whose study includes Chinese students and uses ad-hoc measures, research on the acceptance of visual coding environments such as Scratch, Blockly, Snap, or Micro:bit MakeCode is scarce. Hence, this investigation has taken an important step toward bridging the gap in prior literature on visual coding.

Elementary school students are introduced to coding through block-based resources (Chalmers, 2018; Sáez-López et al., 2016; Sung et al., 2017). According to the TAM, students' behavioral intention to engage in coding-related activities depends on their perception of such resources as useful and easy to use. Ultimately, this will lead to positive attitudes and acceptance of visual coding (Al-Maroof et al., 2021; Davis, 1989). The proposed instrument is therefore likely to be useful in determining whether the visual coding environments that are widely used in elementary schools are, indeed, accepted by students. For example, studies comparing existing block-based resources may be informative as to which coding resources are preferred by students of different ages and genders (Yildiz Durak, 2020). Furthermore, research into the features of such resources that can improve perceived ease of use, usefulness, and students' attitudes may also be beneficial to the computational thinking research agenda. This could ultimately lead to the refinement and improvement of visual coding environments.

It would also be beneficial to investigate the psychometric properties of the instrument among secondary school students. This would contribute to the development of cross-sectional and longitudinal studies that would provide a broader picture of students' acceptance of visual coding resources.

6.2 Limitations

The present investigation has some limitations. First, the psychometric properties of the instrument were tested with students engaging with one specific visual coding resource (i.e., Micro:bit MakeCode). Therefore, the latent structure of the instrument should be confirmed in future studies using alternative visual coding environments (e.g., Scratch).

Second, sample sizes differed across school grades. Children from the last four years of Spanish primary school (3rd to 6th grade) were included. Yet, third-graders made up fewer than 8% of participants. This hampered additional analysis to establish whether the factor structure is invariant across school grades. Future studies with a larger sample size are warranted (Gaskin & Happell, 2014).