1 Introduction

One of the persistent challenges in science education research is to promote inquiry-based science teaching from the early stages of the educational system (Akerson & Bartels, 2023). Inquiry teaching is central to achieving students' scientific literacy (Schwartz et al., 2023). However, many teachers may not feel confident or prepared to teach science using inquiry (Baroudi & Helder, 2019; García-Carmona et al., 2018). This may be the case in Spanish-speaking countries, where inquiry teaching is very scarce according to international assessment studies (Mullis et al., 2020). Therefore, to tackle this issue, it is important to first develop assessment instruments that can measure Spanish-speaking teachers' self-efficacy beliefs regarding teaching science using inquiry. Recent reviews of the international (Blalock et al., 2008; Toma & Lederman, 2022) and Spanish-language literature (Toma, 2020) show a dearth of measurement instruments for such a purpose. This article, therefore, aims to address this gap by presenting the Spanish adaptation and psychometric evaluation of Aydeniz et al.'s (2021) Inquiry-Based Science Teaching Efficacy Scale (IBSTES), which is the first instrument of its kind to include self-efficacy beliefs regarding both helping students develop procedural inquiry skills and epistemological understanding of the nature of scientific inquiry.

2 Theoretical Underpinnings

2.1 Inquiry Teaching and Self-Efficacy Beliefs

The concept of “inquiry” has multiple interpretations. One perspective views it as a means of science education, which represents how science should be taught. Here, inquiry is considered a pedagogical approach that engages students in exploring scientific phenomena, asking questions, conducting investigations, and communicating findings (Crawford, 2014). In contrast to the pedagogical view, another perspective conceptualizes inquiry as the desired outcome of science education. This interpretation emphasizes the development of scientific skills, both manipulative and cognitive. Manipulative skills encompass identifying variables, formulating hypotheses, using evidence, evaluating explanations, and drawing conclusions, among others (García-Carmona, 2020; Strat et al., 2023). Cognitive skills, in turn, focus on understanding the nature of scientific inquiry (Lederman & Lederman, 2020). This includes the epistemological characteristics of the scientific endeavour, such as the distinction between data and evidence, the idea that scientific procedures can influence results, or the requirement that conclusions be consistent with the evidence gathered, among others (Lederman & Lederman, 2020).

Inquiry has important benefits. Learning science through inquiry and learning about the nature of inquiry are central components of students' scientific literacy, which is the ability to use scientific knowledge and skills to make informed decisions (Akerson & Bartels, 2023; Schwartz et al., 2023; cf. Oliver et al., 2021). Doing inquiry and understanding its nature are necessary for students to make informed decisions based on scientific evidence, impacting both personal and societal choices (Schwartz et al., 2023). Hence, inquiry prepares students for the challenges and opportunities of the twenty-first century, where they will need to be able to solve complex problems, communicate effectively, collaborate with others, and adapt to changing situations (Valladares, 2021; Yacoubian, 2018). It also improves students' attitudes, career interest in science, performance success, and achievement, as well as their understanding of the nature and processes of science (Aguilera & Perales-Palacios, 2020; Furtak et al., 2012; Lazonder & Harmsen, 2016; Lederman & Lederman, 2020; Savelsbergh et al., 2016). Such favourable outcomes also apply to pre-service teachers (Strat et al., 2023).

However, implementing inquiry in the classroom, both as a teaching approach and as content, poses many challenges for teachers, such as a lack of appropriate curriculum materials and resources, time and classroom management issues, and limited pedagogical knowledge and support to design and facilitate effective inquiry activities (Baroudi & Helder, 2019; Chichekian et al., 2016). These challenges affect teachers' self-efficacy, which is their belief in their capabilities to teach effectively (Aydeniz et al., 2021; Bandura, 1997). Self-efficacy beliefs are important because they influence teachers' motivation, persistence, instructional decisions, and teaching quality (Burić & Kim, 2020; Morris et al., 2017). Low levels of self-efficacy can hinder teachers' willingness and ability to implement inquiry and to teach about its nature (Perera et al., 2022), and thus limit the potential benefits of such an approach and content for students' scientific literacy.

2.2 Measuring Science Teachers’ Self-Efficacy Beliefs

There are various instruments that measure teacher self-efficacy, including those of Bandura (2006) and Tschannen-Moran and Hoy (2001). However, these instruments are not specific to science teaching. For science teachers, the most widely used and recognized instruments are the Science Teaching Efficacy Beliefs Instrument (STEBI-A, Enochs & Riggs, 1990a) and its version for pre-service teachers (STEBI-B, Enochs & Riggs, 1990b). These instruments assess the beliefs of in-service and prospective science teachers about their ability to teach science effectively. However, they have a limitation: they do not capture self-efficacy for teaching science as inquiry, which is a key aspect of science education (Crawford, 2014; García-Carmona, 2020).

Instruments focusing on self-efficacy beliefs regarding inquiry-based science teaching (IBST) are scarce. One of the earliest attempts to measure teachers' self-efficacy beliefs regarding inquiry teaching was made by Smolleck et al. (2006), who developed a 69-item Likert scale instrument. However, this instrument has several limitations, both conceptual and methodological. On the conceptual level, the instrument only covers procedural aspects of inquiry teaching, such as formulating scientific questions or communicating scientific explanations, and neglects other important dimensions, such as the nature of scientific inquiry (Lederman & Lederman, 2020). On the methodological level, the instrument's underlying structure has not been examined using factor analysis, and six out of the ten subscales have low Cronbach's alpha values (below 0.70), indicating internal consistency reliability issues (Taber, 2018). Moreover, the instrument is very long and may cause respondent fatigue, which may also limit its use in investigations that measure multiple outcomes.

Other attempts to measure teachers' self-efficacy beliefs regarding inquiry teaching consist of ad hoc adaptations of items. One example is the adapted version of the Dimensions of Attitude towards Science (DAS) instrument used by Van Aalderen-Smeets et al. (2017), which included four items to measure self-efficacy beliefs. However, this approach is problematic, as such instruments have not been subjected to rigorous psychometric analysis either. Hence, their validity is unknown.

2.3 The Inquiry-Based Science Teaching Efficacy Scale (IBSTES)

The IBSTES is a self-report instrument developed by Aydeniz et al. (2021) to specifically measure pre-service science teachers' self-efficacy beliefs in implementing inquiry-based science instruction. The instrument is based on Bandura's (1997) conceptualization of self-efficacy. Teacher self-efficacy was defined as "beliefs in their pedagogical capacity to implement a specific form of instruction to impact specific student outcomes" (Aydeniz et al., 2021, p. 107). The instrument consists of 29 items that use a seven-point Likert scale, ranging from 1 (the lowest level of agreement) to 7 (the highest level of agreement). The items reflect the vision of inquiry in the United States National Science Education Standards, as well as the recent emphasis on understanding the nature of scientific inquiry (Gai et al., 2022; García-Carmona, 2020; Lederman & Lederman, 2020). Therefore, it represents both behavioral engagement in inquiry practices (e.g., asking questions, designing investigations, or drawing conclusions) and active cognitive, epistemological understandings (e.g., understanding the collaborative nature of the inquiry, the role of empirical evidence, or the contribution of scientists of different backgrounds to scientific endeavor).

The IBSTES has a robust development procedure. Items were developed and their content validity was assessed with a panel of experts in science education and psychometrics, cognitive interviews, and a focus group with the target group of pre-service science teachers. Next, construct validity was examined with exploratory factor analysis, which yielded one interpretable factor that explained 56.25% of the variance, and internal consistency reliability was excellent (α > 0.97).

The efforts of Aydeniz et al. (2021) are commendable. Yet, some methodological decisions should be considered when interpreting their findings. A possible limitation is that Aydeniz et al. (2021) did not explain the criteria used for deciding how many factors to retain in their factor analysis. Moreover, they retained only one factor even though they included items that conceptually represent two distinct constructs. On the one hand, ten items seem to measure self-efficacy regarding helping students improve their understanding of the nature of scientific inquiry (all items related to ‘understand’, e.g., Item 2. I feel confident in my pedagogical knowledge and skills to help my students understand the skeptical nature of science). On the other hand, the remaining items seem to measure self-efficacy regarding helping students develop procedural skills to do inquiry (e.g., Item 3. I feel confident in my pedagogical knowledge and skills to help my students formulate scientific questions).

While related, these are conceptually different aspects (García-Carmona, 2020; Osborne, 2014). Inquiry skills represent the procedural, active process of inquiry. In contrast, the nature of scientific inquiry represents an understanding of how science works and of the scientific endeavor (Gyllenpalm et al., 2021; Lederman & Lederman, 2020). This distinction has been widely acknowledged in the literature, and there is general agreement that they constitute separate, yet complementary and equally important, dimensions of inquiry (Schwartz et al., 2023). This distinction is also reflected in the standards that guided the development of the IBSTES. Moreover, Harman's single-factor criterion suggests that if a single factor explains more than 50% of the variance in an EFA, there may be common method bias. Since only one factor explaining 56.25% of the variance was originally retained (Aydeniz et al., 2021), the unidimensional structure may be influenced by common method bias. These issues may limit the validity and reliability of the instrument, as well as its usefulness for assessing teachers' self-efficacy beliefs regarding different aspects of inquiry-based science teaching. Further exploration using confirmatory factor analysis is needed to clarify whether the items better fit a unidimensional or a two-factor structure.
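The 50% heuristic can be illustrated numerically. The following is a minimal sketch, not the procedure used by Aydeniz et al. (2021): it approximates the first unrotated factor by the first principal component of an item correlation matrix, and it uses a hypothetical matrix in which all 29 items intercorrelate at 0.5 (an assumed value chosen only for illustration). The function names are illustrative.

```python
# Hypothetical illustration of Harman's single-factor heuristic.
# Assumption: the first unrotated factor is approximated by the first
# principal component of the item correlation matrix.

def largest_eigenvalue(matrix, iterations=200):
    """Estimate the largest eigenvalue of a correlation matrix by power iteration."""
    n = len(matrix)
    # Start from equal weights; this assumes the start vector is not
    # orthogonal to the dominant eigenvector.
    vector = [1.0 / n ** 0.5] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(row[j] * vector[j] for j in range(n)) for row in matrix]
        norm = sum(x * x for x in product) ** 0.5
        vector = [x / norm for x in product]
        eigenvalue = norm  # for a unit-length vector, this converges to the largest eigenvalue
    return eigenvalue


def first_factor_share(matrix):
    """Share of total variance captured by the first component (eigenvalue / number of items)."""
    return largest_eigenvalue(matrix) / len(matrix)


# Assumed data: 29 items that all intercorrelate at 0.5 (illustrative only).
N_ITEMS, ASSUMED_R = 29, 0.5
correlations = [[1.0 if i == j else ASSUMED_R for j in range(N_ITEMS)]
                for i in range(N_ITEMS)]

share = first_factor_share(correlations)
print(f"First component share: {share:.1%}")
print("Exceeds the 50% heuristic:", share > 0.5)
```

Under these assumed correlations the first component accounts for 15/29 of the variance, about 51.7%, which would exceed the heuristic. With real response data, the correlation matrix would be estimated from the item scores rather than assumed.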

3 Methodology

3.1 Design of the Study

This study is instrumental research (Ato et al., 2013), defined as research that analyses the psychometric properties of measurement instruments to gather evidence of validity. Validity, as defined in the Standards for Educational and Psychological Testing (AERA et al., 2014), refers to “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests” (p. 11).

3.2 Participants

The participants of this study were 428 pre-service teachers enrolled in kindergarten and elementary school education programs at a public university in Spain (University of [anonymized]). They were recruited using a convenience sampling technique, which means that the participants were selected based on their availability and willingness to participate (Cohen et al., 2018). The sample consisted of 94 males, 324 females, and 10 participants who did not provide their gender information. There were 115 kindergarten pre-service teachers and 313 elementary school pre-service teachers. The mean age of the whole sample was 21.13 years (SD = 2.22). The participants had different backgrounds in terms of their university entrance studies: 234 had studied social sciences, humanities, or arts, 92 had studied science or technology, and 102 did not provide this information.

3.3 Translation of the Instrument

The original instrument was adapted to the target language using a cross-cultural translation procedure (Beaton et al., 2000). This involved a forward translation by two independent translators, followed by a reconciliation of the two versions. A back translation was then conducted by another independent translator, who was blind to the original instrument. Any discrepancies between the back-translated version and the original were resolved. Cognitive interviews were conducted with six participants from the target population until data saturation was achieved, that is, the point at which enough data had been collected to make the required adjustments. Based on participant feedback, minor language modifications were made to improve the instrument's comprehensibility and cultural suitability. The appendix provides the full text of the items.

3.4 Psychometric Analyses

Confirmatory factor analysis was chosen due to a priori knowledge about the underlying factor structure (DeVellis, 2017). This knowledge was based on two key points: (1) prior research that conceptualizes inquiry as both a teaching approach and an epistemological understanding of science (Crawford, 2014; Lederman & Lederman, 2020; Schwartz et al., 2023), and (2) the inclusion of items reflecting both conceptualizations in Aydeniz et al.'s (2021) original instrument. Two models were tested: a unidimensional model (Aydeniz et al., 2021) and a two-factor model (Fig. 1). The two-factor model had one factor for self-efficacy in teaching the nature of scientific inquiry (items 2, 8, 9, 10, 16, 22, 24, 25, 27, and 28) and another for self-efficacy in teaching inquiry skills (items 1, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 23, 26, and 29). An iterative process was used to retain items with standardized loading estimates ≥ 0.7 and r-squared values ≥ 0.4 (Hair et al., 2010). Skewness and kurtosis were within ± 2, which was taken as evidence of approximate normality. Thus, the Maximum Likelihood (ML) method was used. Model fit was assessed using multiple goodness-of-fit indicators (Marsh et al., 2004). CFI and TLI ≥ 0.95 indicated excellent fit, and ≥ 0.90 indicated acceptable fit. RMSEA ≤ 0.06 and SRMR ≤ 0.08 indicated excellent fit, and RMSEA ≤ 0.08 and SRMR ≤ 0.10 indicated acceptable fit. The model with the lowest AIC was considered to have the better fit. When possible, model fit was improved by correlating errors of items with high modification indices (Brown, 2015).
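The cut-off rules above can be written as a small decision function. This is a minimal sketch that only encodes the thresholds stated in this section; the function names and the example values are hypothetical and do not correspond to any model reported in this study.

```python
# Sketch of the fit-index cut-offs described in the text.
# All function names and example values are illustrative assumptions.

def rate_higher_is_better(value, excellent=0.95, acceptable=0.90):
    """Rate CFI or TLI, where larger values indicate better fit."""
    if value >= excellent:
        return "excellent"
    if value >= acceptable:
        return "acceptable"
    return "poor"


def rate_lower_is_better(value, excellent, acceptable):
    """Rate RMSEA or SRMR, where smaller values indicate better fit."""
    if value <= excellent:
        return "excellent"
    if value <= acceptable:
        return "acceptable"
    return "poor"


def classify_fit(cfi, tli, rmsea, srmr):
    """Return a rating for each index using the cut-offs stated in the text."""
    return {
        "CFI": rate_higher_is_better(cfi),
        "TLI": rate_higher_is_better(tli),
        "RMSEA": rate_lower_is_better(rmsea, excellent=0.06, acceptable=0.08),
        "SRMR": rate_lower_is_better(srmr, excellent=0.08, acceptable=0.10),
    }


# Hypothetical indices, chosen only to exercise each branch.
example = classify_fit(cfi=0.96, tli=0.91, rmsea=0.07, srmr=0.05)
for index, rating in example.items():
    print(f"{index}: {rating}")
```

Rating each index separately, rather than producing one overall verdict, reflects the use of multiple goodness-of-fit indicators: a model can be acceptable on some indices and poor on others.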

Fig. 1 Simplified graphical representation of the two models. Not all items are included for clarity. Results will be presented in tables

Measurement invariance for gender was evaluated for the best-fitting model. Measurement invariance ensures that an instrument measures the same construct across different groups, which is crucial for comparing results without group differences affecting the measurement of the construct (Chen, 2007). Invariance is achieved if the instrument meets the following criteria: configural invariance (same factor structure among females and males), metric invariance (same factor loadings), scalar invariance (same intercepts, allowing for means comparison), and strict invariance (equal residual variances). Invariance is confirmed if the change in CFI after each test is less than 0.01 (ΔCFI < 0.01), indicating no meaningful impact of group differences on model fit (Chen, 2007).

Two indicators, Cronbach's alpha (α) and McDonald's omega (ω), were used to assess reliability (Taber, 2018). The following cut-off scores determined the reliability level: < 0.60 indicates unacceptably low reliability, 0.60 to 0.69 is marginally reliable, 0.70 to 0.79 is reliable, 0.80 to 0.90 is highly reliable, and > 0.90 is considered very highly reliable (Cohen et al., 2018). Cronbach's α is widely used (Taber, 2018), yet McDonald's ω is more appropriate for ordinal Likert-type items (Hayes & Coutts, 2020).
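For readers less familiar with these coefficients, the sketch below computes Cronbach's α from its standard formula, α = k/(k − 1) × (1 − Σ item variances / variance of total scores), and maps a coefficient onto the cut-off labels listed above. The response data are hypothetical (five respondents, four items on a seven-point scale) and the function names are illustrative; McDonald's ω requires factor loadings from a fitted model and is not computed here.

```python
from statistics import variance

# Sketch of Cronbach's alpha and the reliability labels described in the text.
# The responses below are invented for illustration only.


def cronbach_alpha(responses):
    """Compute Cronbach's alpha; `responses` is a list of respondents' item scores."""
    k = len(responses[0])  # number of items
    item_variances = [variance([person[i] for person in responses]) for i in range(k)]
    total_variance = variance([sum(person) for person in responses])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


def reliability_label(coefficient):
    """Map a coefficient onto the cut-off labels stated in the text."""
    if coefficient < 0.60:
        return "unacceptably low"
    if coefficient < 0.70:
        return "marginally reliable"
    if coefficient < 0.80:
        return "reliable"
    if coefficient <= 0.90:
        return "highly reliable"
    return "very highly reliable"


# Hypothetical responses: 5 respondents x 4 items, seven-point scale.
hypothetical_responses = [
    [7, 6, 7, 6],
    [5, 5, 4, 5],
    [3, 4, 3, 3],
    [6, 6, 5, 6],
    [2, 3, 2, 2],
]

alpha = cronbach_alpha(hypothetical_responses)
print(f"alpha = {alpha:.3f} ({reliability_label(alpha)})")
```

For these invented responses, the item variances sum to 13 and the total-score variance is 49, so α = (4/3) × (36/49) ≈ 0.980, which falls in the "very highly reliable" band.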

4 Results

4.1 Construct Validity

The results of the CFA for the unidimensional structure are presented first. The initial model, which included all 29 items, showed a poor fit to the data, with CFI = 0.846, TLI = 0.834, SRMR = 0.051, RMSEA = 0.095, and AIC = 33404. Based on the criteria of r-squared values ≥ 0.4 and standardized loading estimates ≥ 0.7, seven items (1, 2, 6, 7, 14, 17, and 18) were removed from the model. The fit of the reduced model improved slightly but still did not meet the recommended cut-off values for TLI and RMSEA, with CFI = 0.901, TLI = 0.890, SRMR = 0.042, RMSEA = 0.089, and AIC = 24745. To further improve the model fit, the errors of items 3 and 4 were allowed to correlate, as suggested by modification indices. The final model showed a better fit to the data, but still had a high RMSEA value of 0.085, above the 0.08 threshold for acceptable fit. The other fit indices were CFI = 0.910, TLI = 0.900, SRMR = 0.040, and AIC = 24682. Therefore, it could be concluded that the unidimensional model reported by Aydeniz et al. (2021) did not display a good model fit. The standardized estimates of the final unidimensional model are displayed in Table 1.

Table 1 Standardized estimates for the unidimensional and two-factor structure

Next, the results of the CFA for the two-factor structure are presented. The initial model, which included all 29 items, likewise showed a poor fit to the data, with CFI = 0.854, TLI = 0.843, SRMR = 0.050, RMSEA = 0.093, and AIC = 33325. Eight items were removed from the model due to r-squared values < 0.4 (items 1 and 2) or standardized loading estimates < 0.7 (items 6, 7, 17, 18, 26, and 29). The revised model, which consisted of 21 items, showed an improved fit that met the acceptable-fit criteria: CFI = 0.925, TLI = 0.916, SRMR = 0.038, RMSEA = 0.079, and AIC = 23557. Furthermore, the model fit was enhanced by adding a residual covariance between items 3 and 4, which were highly correlated within the same factor. The final model, which had the best fit among the tested models, had the following fit indices: CFI = 0.932, TLI = 0.923, SRMR = 0.035, RMSEA = 0.076, and AIC = 23513. The results indicated that the two-factor structure is a valid representation of the data. The standardized factor loadings are reported in Table 1.
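The composition of the final 21-item two-factor model follows directly from the item assignments given in the Psychometric Analyses section and the eight removals reported above. The short sketch below simply performs that bookkeeping; the variable names are illustrative.

```python
# Bookkeeping sketch: which items remain on each factor after the reported removals.
# Item numbers come from the text; variable names are illustrative.

NATURE_OF_INQUIRY = {2, 8, 9, 10, 16, 22, 24, 25, 27, 28}
INQUIRY_SKILLS = {1, 3, 4, 5, 6, 7, 11, 12, 13, 14, 15, 17, 18, 19, 20, 21, 23, 26, 29}
REMOVED = {1, 2, 6, 7, 17, 18, 26, 29}

# Set difference keeps only the items that were not removed.
retained_nature = sorted(NATURE_OF_INQUIRY - REMOVED)
retained_skills = sorted(INQUIRY_SKILLS - REMOVED)

print("Nature of scientific inquiry:", retained_nature)
print("Inquiry skills:", retained_skills)
print("Total retained:", len(retained_nature) + len(retained_skills))
```

This yields nine items on the nature of scientific inquiry factor and twelve on the inquiry skills factor, consistent with the 21 items of the revised model.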

4.2 Measurement Invariance

Measurement invariance was tested for the best-fitting model. The instrument was gender invariant at all levels of measurement, as indicated by CFI changes below 0.01 between successive, increasingly constrained models (metric, scalar, and strict). The configural model (CFI = 0.918, ΔCFI = 0.014) showed an acceptable fit, indicating a similar factor structure for males and females. The metric model (CFI = 0.917, ΔCFI = 0.001) showed a good fit, suggesting invariant factor loadings. The scalar model (CFI = 0.917, ΔCFI < 0.001) showed a similar fit, indicating invariant intercepts. The strict model (CFI = 0.915, ΔCFI = 0.002) showed an adequate fit, implying invariant residual variances. These findings suggest that the Spanish IBSTES can be used with female and male pre-service teachers and that comparisons based on gender can be conducted.
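The stepwise ΔCFI comparison can be reproduced from the CFI values reported above. The following sketch is illustrative only: it computes the change between each constrained model and the preceding one and checks it against the ΔCFI < 0.01 criterion (Chen, 2007). The names are hypothetical, and differences are rounded to three decimals because the reported CFIs are themselves rounded.

```python
# Sketch of the successive delta-CFI check, using the CFI values reported in the text.
# Names are illustrative; differences are rounded to the precision of the reported values.

REPORTED_CFI = [
    ("configural", 0.918),
    ("metric", 0.917),
    ("scalar", 0.917),
    ("strict", 0.915),
]
DELTA_CFI_THRESHOLD = 0.01


def successive_cfi_changes(models):
    """Return (model name, drop in CFI relative to the previous, less constrained model)."""
    changes = []
    for (_, previous_cfi), (name, cfi) in zip(models, models[1:]):
        changes.append((name, round(previous_cfi - cfi, 3)))
    return changes


changes = successive_cfi_changes(REPORTED_CFI)
invariance_supported = all(abs(delta) < DELTA_CFI_THRESHOLD for _, delta in changes)

for name, delta in changes:
    print(f"{name}: delta CFI = {delta:.3f}")
print("All steps below threshold:", invariance_supported)
```

Because the metric and scalar CFIs are identical after rounding, the computed scalar difference is 0.000, in line with the reported ΔCFI < 0.001.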

4.3 Reliability

The two-factor model of pre-service teachers' self-efficacy beliefs, comprising the nature of scientific inquiry and procedural inquiry skills, was assessed for reliability. The results indicate that both factors had very high reliability coefficients. The nature of scientific inquiry factor had a Cronbach's α of 0.926 and a McDonald's ω of 0.927, while the procedural inquiry skills factor had a Cronbach's α of 0.944 and a McDonald's ω of 0.944. These values suggest that the two-factor model produces reliable results and can be used to measure pre-service teachers' self-efficacy beliefs.

5 Discussion

IBST is a pedagogical approach that emphasizes active student involvement in scientific investigations (Crawford, 2014; García-Carmona, 2020). Successful implementation requires focusing on both inquiry-procedural skills and the nature of scientific inquiry (Schwartz et al., 2023). Inquiry-procedural skills refer to the ability to design, conduct, analyze, and communicate scientific investigations (Osborne, 2014). The nature of scientific inquiry entails understanding the epistemological aspects of the scientific endeavor (Akerson & Bartels, 2023; Eliyahu et al., 2021; Leblebicioglu et al., 2017). However, there is a lack of measurement instruments that effectively capture both aspects of inquiry (Blalock et al., 2008; Toma & Lederman, 2022), particularly in Spanish (Toma, 2020). This study aimed to adapt and assess the psychometric properties of Aydeniz et al.'s (2021) IBSTES instrument in Spanish. The IBSTES is a promising tool for measuring teaching self-efficacy regarding both procedural and epistemological aspects of inquiry. The findings suggest that the Spanish version of IBSTES is a valid and reliable measure for kindergarten and elementary school pre-service teachers.

Regarding its adaptation and translation, the instrument demonstrated cultural relevance and comprehensibility and maintained the original items' meanings. The psychometric analysis revealed that the two-factor structure of the IBSTES, involving self-efficacy beliefs regarding (1) helping students understand scientific inquiry and (2) assisting students in developing inquiry process skills, exhibited a better fit than a unidimensional structure. This finding aligns with the vision of inquiry that is being promoted worldwide, emphasizing the epistemological and procedural aspects of scientific inquiry (Lederman & Lederman, 2020; Schwartz et al., 2023). However, Aydeniz et al. (2021) reported a unidimensional latent factor. This discrepancy potentially stems from methodological limitations in their factor analysis procedure, which did not report criteria for factor retention. Consequently, the present study implies that inquiry teaching self-efficacy is a multidimensional construct, reflecting various dimensions of teachers' confidence in implementing IBST.

This study also found that the two-factor structure of the Spanish IBSTES was invariant across gender groups (Chen, 2007). This implies that the instrument measured the same constructs in the same manner for both male and female pre-service teachers, without any biases or differences in how the two groups interpret and understand the items. This aspect supports the validity and generalizability of the Spanish IBSTES across gender groups and allows for gender comparisons. It is a noteworthy contribution because Aydeniz et al. (2021) did not investigate the measurement invariance of the IBSTES in their original study. Hence, this study adds to the literature on self-efficacy beliefs by demonstrating the cross-gender applicability of the Spanish IBSTES.

Finally, the internal consistency of the Spanish IBSTES was found to be very high, indicating that the items within each of the two constructs were consistent with one another (Taber, 2018). The reliability coefficients obtained were similar to those of the original version reported by Aydeniz et al. (2021). This suggests that the Spanish version of IBSTES is a reliable measure for research and evaluation purposes.

5.1 Contribution and Implications

This research offers important theoretical and practical contributions. First, as IBST is a core component of science curricula worldwide, the IBSTES instrument stands as a valuable tool to identify teacher self-efficacy for not only teaching using inquiry but also teaching about the nature of inquiry, a central component of scientific literacy (Gyllenpalm et al., 2021; Lederman & Lederman, 2020; Schwartz et al., 2023). The use of this tool, therefore, would allow for targeted professional development to address gaps in teacher self-efficacy. Furthermore, the tool holds particular value for Spanish-speaking countries undergoing curricular reforms that emphasize inquiry and its nature within science teaching practices.

Second, the Spanish-speaking immigrant population is increasing worldwide, especially in Europe and the USA. The diaspora of Spanish-speaking people across the globe makes for interesting opportunities to explore how Spanish-speaking teachers and pre-service teachers feel in terms of their preparation for teaching inquiry skills and the nature of inquiry given the contexts in which they prepare to teach. This aspect creates a need for Spanish-language assessment tools in science education. Therefore, this study provides a valuable contribution to the literature.

Third, in terms of theoretical implications, the findings confirm that pre-service teachers' self-efficacy beliefs in teaching science through inquiry are not a single construct, but rather two related aspects. This means that pre-service teachers may have varying levels of confidence and competence in facilitating students' understanding of scientific inquiry and in supporting their inquiry skills development. To the best of the authors’ knowledge, this has not been empirically confirmed in extant literature; hence, this represents a novel contribution. The implications are that teacher education programs should address both aspects of IBST self-efficacy and offer opportunities for pre-service teachers to develop and improve their procedural skills and nature of inquiry understandings.

Finally, in terms of methodological contributions, this study introduces a novel instrument that differentiates between epistemological and procedural aspects of inquiry. It is a significant endeavour because it allows future investigations to understand Spanish-speaking teachers' self-efficacy in inquiry teaching. The instrument demonstrates strong psychometric properties and can be used by researchers and practitioners in various ways. For instance, it can be used in descriptive studies to assess pre-service teachers' self-efficacy and investigate influencing factors. Furthermore, it can be used to evaluate the effectiveness of teacher education programs and interventions aimed at improving pre-service teachers' self-efficacy. The existing research and measurement instruments on science teacher self-efficacy focus only on procedural inquiry skills (Bleicher, 2004; Smolleck et al., 2006; van Aalderen-Smeets et al., 2017). Hence, the Spanish IBSTES allows studies to also focus on teacher self-efficacy for promoting students’ understanding of the nature of scientific inquiry, which is an aspect worth investigating.

5.2 Limitations and Avenues for Future Research

The findings of this investigation should be interpreted in light of several limitations. The instrument was adapted and validated for kindergarten and elementary pre-service teachers. Therefore, its applicability to other groups may be limited. Future research should consider testing the instrument with different samples, including in-service primary and secondary teachers. Additionally, the convenience sampling technique used in this study may have reduced the representativeness of the results. Furthermore, it should be noted that the gender distribution was imbalanced, with a majority of female pre-service teachers. Despite these limitations, the study is timely and relevant because it presents the first self-efficacy instrument for Spanish-speaking pre-service teachers that specifically measures their efficacy beliefs regarding the teaching of inquiry and the nature of inquiry.