1 Introduction

What makes it difficult to solve a problem of a scientific nature? Jonassen (2007) responds to this question by stating that the reason is the interaction between the participants, the activity, and the context. When school contexts are added to the research aspects (Östman & Wickman, 2014), the result is a very fruitful field in which to study the interrelationships, obstacles, and pupils’ perceptions during this process.

The objective of the present study was to formulate an initial theoretical approach linked to pupils’ perceptions about difficulties in problem solving (PS) in the Physics classroom, based on a set of variables considered to be relevant to the process, and then, through the existing knowledge in the scientific literature, estimate the interrelationships among those variables and submit this estimate to empirical tests by means of structural equation modeling (SEM). Underlying the set of variables is our acceptance of Bachelard’s idea (Bachelard, 1983) that the obstacles are a form of knowledge which has in general been satisfactory for solving certain problems, but unsuitable for confronting new ones.

The importance of PS, and more specifically its collaborative version, was recently the object of special attention internationally through PISA 2015 and the program “Collaborative Problem Solving”, in which the skills of providing solutions, sharing knowledge, and making efforts towards answering posed questions were assessed (OECD, 2017). More recently, PISA 2018 examined what students know in science and what they can do with what they know (Schleicher, 2019): on average across OECD countries, 78% of students attained Level 2 (basic procedural and epistemic knowledge) or higher in science (in Spain, 78.7% of students). Only 0.8% of students attained Level 6 (the highest level, at which students can evaluate competing designs of complex experiments, field studies, or simulations and justify their choices), a remarkably poor result in a society as technological as ours.

In light of these results, it would be interesting to know what perceptions students have about the obstacles in PS, in line with what some studies have found, for example, with experimental PS: assigned time, instruction sheets, attendance and preparedness (Deacon & Hajek, 2011), or that good performance on ill- and well-structured problems is sensitive to different social network configurations (Pulgar et al., 2020). As the linear problems are the exception and the complex ones the norm in Physics, a better understanding of students’ views would help to understand the processes of learning physics from a research perspective: most influential obstacles, existence of latent variables, among others, in order to ultimately know their influence on learning outcomes and to incorporate these insights into teaching focused on these perceptions.

2 Material and Methods

In our study, each variable is associated with underlying obstacles that determine students’ difficulties (Astolfi, 1999; Vázquez-Bernal & Jiménez-Pérez, 2016), as detailed in Annex 1. To begin this section, we will discuss the studies that have had a general impact on the understanding of the factors that may affect students’ problem-solving in science learning and then focus on the studies that analyse their perceptions in the field of physics, and which have some kind of relationship with the variables under study (see shortened form in Annex 1). Next, we shall discuss only those specific studies which have used SEM with a focus on mathematics, science, and technology. Finally, we will address the role of gender in PS.

2.1 Problem-Solving in Science Learning

In an attempt to model the impact on pupils’ behaviour in performing their school tasks (Tarhini et al., 2016), it was discovered that their confidence (Di), motivation (In), habit, and expectation of their performance (Co) have positive effects. Other research (Dunkley & Blankstein, 2000) illustrates how coping strategies (Pa), emotions, tasks, and evasion, associated with self-critical perfectionism can have important consequences for pupils’ experience of their daily school tasks. In studies of structural relationships between the pupils’ conceptions and their approaches to science learning (Lee et al., 2008), the results reveal that pupils who had constructivist conceptions of science learning (Kn) tended to use profound approaches to it. Another influence on the pupils seems to come from the adoption of new forms of learning, since attitude (Ef) is the variable that stands out most in some models (Park et al., 2012).

Some studies have modeled which factors could predict success in learning, finding that a relevant negative factor was a pupil’s undesirable (In) and withdrawal behaviour (Ef) (Normandeau & Guay, 1998). In this sense, models have shown that self-efficacy directly affects pupils’ achievement (Akın, 2010), as also does social support given to them (Nugent et al., 2015). Another component in PS, the speed of information processing (Pa), can predict school performance, but by way of other higher-level cognitive skills such as intelligence and creativity (Rindermann & Neubauer, 2004). In the case of Physics PS, and particularly equation solving, it has been found that, by the last year of primary education, most pupils already had a basic relational view of equivalence (Ca) and began to compare the two sides of an equation (Rittle-Johnson et al., 2011), something that they should delve into, as it is a necessity in Physics PS.

2.2 Perceptions of Students in the Field of Physics

The perceptions of students in the field of Physics have not been extensively investigated, although they are important because, according to Irving and Sayre (2015), future physicists are characterized by their commitment to research and their interest in a profound understanding of physical phenomena (In). In this sense, perceptions of students are an essential element to consider, as they largely define the rules of interaction in a classroom (Turpen & Finkelstein, 2010).

Prosser et al. (1996) determined that students perceive learning Physics as hard work (Ef), but unrelated to how to study the subject matter, seeking reproduction before understanding (Un) and combining interest in Physics (In) with success, but for others, not for oneself. In a study of perceptions with secondary school students developed by Fonseca and Conboy (2006), it was found that the major factors of failure were quality of teaching and previous student, as well as the low expectations of success and preparation for life. For their part, Silva et al. (2019) have shown that Hooke’s law is an adequate opportunity to introduce secondary school students to true investigative methodologies in the classroom, instead of more practical ways (sequence of steps), in tune with PISA level 6.

2.3 SEM for PS in Mathematics, Science, and Technology

The following paragraphs will deal with studies that used SEM for PS in three specific domains: Mathematics, Science, and Technology. In Mathematics, SEM has been used to test the predictive and mediating role that self-efficacy beliefs play in PS, with results that support the role of self-efficacy hypothesized in social cognitive theory (Pajares, 1996). It has also been used to formulate and validate four cognitive processes in PS: editing the problem, filtering important and critical information, comprehending the structural relationships in quantitative information (Pa), and translating the quantitative information (Ca) from one mode to another (Pittalis et al., 2004). In geometry, SEM has been used to examine the effects of knowledge about cognition and of the regulation of cognition on declarative, conditional, and procedural knowledge, finding that the strongest direct effect comes from knowledge about cognition (Aydin, 2007). Other applications have been: to examine the effects of two psychosocial features of the classroom environment (teacher support and personal relevance) on undergraduates’ academic self-efficacy and enjoyment of mathematics classes, finding teacher support and personal relevance to be influential (Aldridge et al., 2012); to examine and validate the explanatory and predictive relationships among the variables Mathematical Problem-Solving and Reasoning Skills, Sources of Mathematics Self-Efficacy, Spatial Ability, and Mathematics Achievements (Yurt & Sünbül, 2014); to determine the effects of mathematical PS skills (Ca) and prior mathematics knowledge on mathematics achievement (Kn) and self-efficacy (Kamalimoghaddam et al., 2016); to analyse the relationships between executive functions, IQ, and math abilities, finding that working memory (Me) and age predict number production and mental calculus, and shifting and sex predict arithmetical PS (Arán & Richaud, 2017); and to suggest that structured, semi-structured, and free problem-posing activities improve pupils’ PS skills and metacognitive awareness (Akben, 2018). In summary, most of the previous studies are predictive in character, addressing student performance in PS on the basis of different variables: self-efficacy, cognitive processes (Pa), teacher support, personal relevance, and prior mathematics knowledge (Kn).

In Science in general, SEM has been used to study the relationships between motivational beliefs, metacognitive strategies, and effort regulation (Ef) (Sungur, 2007); to explore the connections between new knowledge (Kn) and meaningful learning in Ausubel and Novak’s construct (Brandriet et al., 2013); to assess pupils’ understanding of the particulate nature of matter (Kn), and its collective properties and physical changes (Stamovlasis et al., 2012); to indicate that a motivating science class (Cl) and a family that has positive attitudes towards science and is to some degree engaged with science may influence the pupils to adopt deeper approaches to science learning (Soltani, 2018); to suggest that pupil cohesiveness, inquiry, and task orientation (Pa) are the most influential predictors of pupil motivation (In) and self-regulation in science learning (Velayutham & Aldridge, 2013); and to develop models describing academic self-concept in science (Hardy, 2014). In summary, in Science SEM has mostly been used to model some motivational aspects (In) and their relationship with science learning.

Finally, in Technology, SEM has been applied to analyse the antecedents affecting pupils’ Web-based PS performance, finding that the task-technology fit could be the main factor affecting the pupils’ intention to learn from the Web (Kuo & Hwang, 2015); and to study the theoretical framework of technological, pedagogical, and content knowledge (TPACK), finding that pedagogy is TPACK’s core element (Celik et al., 2014). That is, SEM has been used here to predict Web-based PS performance and to examine theoretical constructs about knowledge.

Concluding this review, most studies use SEM to predict student performance in PS through motivational aspects (In), whereas aspects related to how the students themselves perceive some of these variables in PS remain less explored. We believe it is important to fill this gap, since the existence of possible latent variables, together with their predictive nature, can help us to improve students’ skills in PS.

2.4 Gender

Neither has the role of gender and motivation in PS tasks gone unnoticed. The results of a study analysing the relationship between gender and entrepreneurship in these tasks cannot be explained solely by differences in motivation (Haus et al., 2013). The role of gender and emotional intelligence in tests and examinations has also been modeled, with positive predictions that favour females (Austin et al., 2005). With regard to motivation (In), it has been confirmed that pupil self-regulation is a strong predictor of science learning (Velayutham et al., 2012).

2.5 Objective

The main objective of this study was the formulation of a theoretical model for PS in the Physics classroom. Figure 1 shows this model (AMOS© notation), which has been called “Conceptual diagram of the initial Problem-Solving Difficulties Model A”. This model would be subjected to empirical testing (SEM) to check whether it is rejected or not. The variables that make up this model are based on a previous study that will be described in the “Measurement Instruments” subsection. Our main hypothesis is that all these difficulty variables are so closely interrelated that there is only a single latent factor, although there could be internal correlations in aspects not directly measured by the error variables. These errors represent more than just random fluctuations, indicating unmeasured aspects of the variables in our study. In addition, we want to know whether the parameter gender has some type of influence on any of the variables in the study.

Fig. 1 Conceptual diagram of the initial Problem-Solving Difficulties Model A

2.6 Design

According to Latorre et al. (1996), this study is an ex-post-facto non-experimental study with a quantitative methodology called SEM, a multivariate quantitative technique used to describe the relationships between observed variables, and which helps the researcher to test or validate a theoretical model for theory testing and extension (Thakkar, 2020).

Convenience sampling was used since the education centres involved formed part of a research project of the Regional Government of Andalusia (Spain) involving action research in the classroom (7 teachers specialising in Physics and Chemistry unrelated to referees). The focus of this project is the use of inquiry-based methods (Windschitl et al., 2008; Bevins & Price, 2016) that carry out practical work with pupils, examining their perceptions which affect PS, and the obstacles, emotions, self-efficacy, and self-concept, under the conceptual umbrella of the European programme “Inquiry-Based Science Education” (European Commission, 2007).

The data collection was associated with that project and lasted four school years (from 2015 to 2019). The classroom inquiry was based on the question “What is the relationship between the length a spring stretches and the force applied?” With this question as basis, the pupils were introduced into a process of inquiry in which the initial ideas are presented in the form of hypotheses, possible designs are put forward, and a consensus is reached between the teacher and the groups in class as to what the optimal design should be. Measurements are made, the results are debated and confronted with each pupil’s initial hypotheses, and a research report is written up. Later, problems are taken on that are more closed—around the particular topic of Hooke’s law. Finally, a written numerical test is given (see Annex 2).

The questionnaire was administered approximately 1 month later. The same curriculum was followed every year, and there was a constant flow of communication between researchers and teachers, with visits to classrooms, although the teaching strategies implemented by teachers could vary. We used the research report on Hooke’s law to ascertain the perceptions of the students and to discard students with a low level of commitment in the completion of the questionnaire (fewer than 10 in the sample). To keep the investigation as natural as possible, no specific test for Hooke’s law was set; instead, it was included in the test for the Unit of Forces and Movement, hence the month that passed.

2.7 Participants

The variables under study affect basic aspects of PS in a natural context, associated with school inquiry processes, in which a group of Physics and Chemistry teachers and their pupils took part. The study involved 528 pupils from various public secondary schools from the medium–low social level in the same geographical area of southern Andalusia (Spain). All the pupils were in the 2nd year of Lower Secondary Education (ESO, the acronym in Spain for compulsory secondary education) with an 8% bias in the distribution by gender towards more males, and mostly of 13 or 14 years in age. Table 1 lists the distribution of the participants by age and gender.

Table 1 Distribution by age and gender of the participants

2.8 Measurement Instruments

In the original study, the survey was developed by 15 experienced Physics teachers (referees) and was answered by 419 Spanish students between 18 and 22 years of age. Those referees used interviews with students to verify that the writing and understanding of the items was adequate. Our group verified this process by selecting three students with different competences in PS and assessing that adequacy through written interviews. The fact that the instrument was doubly validated and student-centred led us to use it, although no statistical validation data was reported in the original study (Oñorbe & Sánchez, 1996).

The questionnaire (Table 2) consists of items scored on a Likert scale with a range from 1 (no agreement) to 5 (full agreement).

Table 2 Problem-solving difficulty survey

Another aspect was the breadth of the scale. It has been pointed out that including five to seven values in the scale increases reliability and helps the data adjust to a normal distribution (Matas, 2018).

It was decided to obtain a large sample of questionnaires in order to reduce the impact of any lack of normality (Cupani, 2012), since non-normality was expected given the experience of studies with samples of very similar characteristics (Vázquez-Bernal & Jiménez-Pérez, 2016). The instrument consists of 11 variables that represent different difficulties associated with obstacles of different origins (Annex 1). With regard to the validity and reliability of the instrument, the frequency analysis showed the variables to be skewed towards the upper values of the Likert scale (3 and 4). Given then that a non-normal distribution of the population was to be expected, the non-parametric Kolmogorov–Smirnov test was applied. The null hypothesis (H0) of normality was rejected for all the variables, as they gave probability values below the 5% significance level (p < 0.001). It was therefore assumed that the sample came from a population with the variables non-normally distributed.
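As an illustration of the normality check described above, the following minimal sketch computes the Kolmogorov–Smirnov distance between a Likert-type sample and a normal distribution fitted to it. The response counts are hypothetical (only the total of 528 comes from the study), and the 1.36/√n cut-off is the usual large-sample 5% approximation, not necessarily the procedure the statistical package applied.

```python
import math

def normal_cdf(x, mean, sd):
    """Cumulative distribution function of a normal distribution."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

def ks_statistic_normal(sample):
    """Return (D, n): largest gap between the empirical CDF and a fitted normal CDF."""
    n = len(sample)
    mean = sum(sample) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    d = 0.0
    for i, x in enumerate(sorted(sample)):
        theo = normal_cdf(x, mean, sd)
        # Compare with the empirical CDF just before and just after each observation.
        d = max(d, abs(theo - i / n), abs((i + 1) / n - theo))
    return d, n

# Hypothetical counts per Likert category (1 to 5), skewed towards 3 and 4.
counts = {1: 20, 2: 60, 3: 150, 4: 200, 5: 98}
responses = [score for score, k in counts.items() for _ in range(k)]

d_stat, n = ks_statistic_normal(responses)
critical = 1.36 / math.sqrt(n)  # assumed asymptotic 5% critical value
rejects_normality = d_stat > critical
print(f"n = {n}, D = {d_stat:.3f}, critical = {critical:.3f}, reject H0: {rejects_normality}")
```

Because the mean and standard deviation are estimated from the same sample, a stricter (Lilliefors-type) critical value would apply in practice; the fixed cut-off used here is therefore conservative.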

McDonald’s omega (McDonald, 1999) is preferred to Cronbach’s alpha when the variables are ordinal or there are fewer than seven eligible alternatives. This statistic gave a value of 0.888, a very satisfactory and robust value in exploratory research (Viladrich et al., 2017). For DeCarlo (1997), Likert scales often have multiple variance problems that affect multivariate kurtosis. The kurtosis had a maximum positive value of 0.479, while negative values ranged from −0.822 to −0.180. As a value above 4 would have indicated deviation from normality, it can be argued that there is no substantial kurtosis in the distributions. Taking the variables one by one, the value of the critical ratio (CR) should be less than or equal to 5, which was met in all cases. The value of the multivariate CR, however, was 24.462, which indicated a departure from normality and justified opting for estimation with asymptotically distribution-free (ADF) techniques. In addition, it has been suggested that the sample size should be at least ten times the number of parameters estimated (i.e., 250, exceeded by our 528 cases).
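The univariate criterion above can be checked with a short sketch. It assumes that the critical ratio is the kurtosis divided by its large-sample standard error, sqrt(24/N); the text does not state the formula, so this is an assumption, and the function name is purely illustrative.

```python
import math

N = 528  # sample size reported in the study

def kurtosis_critical_ratio(kurtosis, n):
    """Kurtosis divided by its assumed large-sample standard error sqrt(24 / n)."""
    return kurtosis / math.sqrt(24.0 / n)

# Extreme kurtosis values quoted in the text.
reported_extremes = {"max positive": 0.479, "most negative": -0.822}
ratios = {label: kurtosis_critical_ratio(k, N) for label, k in reported_extremes.items()}

for label, cr in ratios.items():
    print(f"{label}: CR = {cr:.2f}, within |CR| <= 5: {abs(cr) <= 5}")
```

Under this assumption both extremes stay inside the |CR| ≤ 5 band, which is consistent with the statement that the criterion was met for every variable.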

3 Results

In the “Results” section, we will first address the descriptive statistics in relation to the study variables (difficulties) and the number of latent factors that our model must include. Secondly, we will show the SEM modeling process of the variables and, finally, the results related to gender influence.

Table 3 lists the descriptive statistics comprising the initial exploratory analysis. It can be seen that the perception of the greatest difficulty is that of Complication. In our opinion, this reflects a didactic obstacle linked to the way the pupils see the problems and to how the subject is taught in the classroom, perhaps due to the custom of using overly linear problems; whether it is an inherent obstacle or one specific to the teaching in the participating classrooms remains an open question. The least difficulty is Interest (associated with intrinsic motivation, and therefore a personal obstacle). These values coincide with the values assigned in the article by Oñorbe and Sánchez (1996), so they were not surprising.

Table 3 Descriptive statistics of the variables measured

Regarding Interest, we can ask ourselves whether the absence of this obstacle is due to the teaching of the topic related to Hooke’s law or to something more general. We are inclined towards the former, since the students were asked to focus on this single topic intensively when taking the questionnaire.

According to Muthén and Kaplan (1992), it is advisable to use polychoric correlation when the univariate distributions of ordinal items are asymmetric (skewed) or have excessive kurtosis, i.e., unless both indices are less than unity in absolute value. In Table 3, one sees that the skewness of the variable Interest exceeds unity, so we decided to work with polychoric correlation to decide which factors to extract and contrast with the original study (Oñorbe & Sánchez, 1996). The polychoric correlation matrix (for lack of space, not specified), computed using Bayesian modal estimation (Choi et al., 2011), presented values between 0.231 and 0.654, values that indicate moderate to strong correlation.
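The decision rule attributed to Muthén and Kaplan (1992) can be written as a one-line check. In the sketch below the function name and the skewness and kurtosis pairs are hypothetical; the actual values are those of Table 3.

```python
def needs_polychoric(skewness, kurtosis, threshold=1.0):
    """True unless both indices are below the threshold in absolute value."""
    return abs(skewness) >= threshold or abs(kurtosis) >= threshold

# Hypothetical (skewness, kurtosis) pairs for two items; not the values of Table 3.
items = {"Interest": (1.05, 0.48), "Memory": (-0.30, -0.82)}

flags = {name: needs_polychoric(s, k) for name, (s, k) in items.items()}
use_polychoric = any(flags.values())  # one offending item is enough
print(flags, use_polychoric)
```

A single item exceeding unity is treated here as sufficient to switch the whole matrix to polychoric correlations, which mirrors the reasoning given for Interest.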

To determine the number of factors our model should have, we used parallel analysis and Velicer’s minimum average partial test (revised). The parallel analysis based on minimum rank factor analysis (Timmerman & Lorenzo-Seva, 2011) advised the extraction of a single factor, giving a variance explained of 56.3%, with positive matching coefficients (Bartlett’s statistic 0.000010 and KMO test 0.90265). This result was corroborated by Velicer’s minimum average partial test (with values between 4.6603 and 0.3502), and refutes the five dimensions extracted in the original study. We thus proceeded in accordance with the present outcome.

We decided to apply SEM to confirm the strong correlation of the variables and the extraction of a single factor. In this approach, a model is specified from among all the pairs of variables (Arbuckle, 2011). The analysis was carried out with the AMOS 20© program, obtaining significant correlations (p < 0.01) between all the pairs of variables. Although the degree of significance was consistent with that found in the two previous analyses (parallel and Velicer), this time, the fit of the model was based on ADF techniques. The (standardized and non-standardized) residual covariance matrices were calculated, all presenting values of 0.000, ensuring that there are no significant discrepancies.

3.1 Structural Equation Modeling for Problem-Solving Difficulties: Explanation of the Initial Model

In this subsection, we shall indicate the initial hypotheses in support of the model shown in Fig. 1. The results of these preliminary analyses, together with the absence in the literature of studies that examine in depth the perceptions that adolescent students have of their difficulties in PS in Physics, and specifically on Hooke’s law and PS, led us to think that, as a first approximation, the most plausible interpretation would be that there is only one factor underlying the observed (endogenous) indicator variables. We therefore proposed a first-order factor, an unobserved (exogenous) variable called Problem Solving Factor, that covers all the indicator variables involved in the process. Each indicator variable has a dependence on a variable termed error, designed to represent not just random fluctuations but also aspects not measured in the study on which the indicator variable could depend, and assigned the arbitrary value 1 to avoid problems of identification (an indirect way of choosing a unit of measurement for the error).
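In conventional notation, a one-factor model of this kind can be sketched as follows. The symbols are assumptions introduced here for illustration and do not appear in the original: x_i is the i-th indicator, ξ the Problem Solving Factor, λ_i its loading, e_i the error term, φ the factor variance, and Θ the error covariance matrix (diagonal when the errors are uncorrelated).

```latex
\[
x_i = \lambda_i \,\xi + e_i , \qquad i = 1, \dots, 11
\]
\[
\Sigma = \Lambda \,\phi\, \Lambda^{\top} + \Theta
\]
```

Estimation then amounts to choosing the loadings and error variances so that the implied matrix Σ is as close as possible to the sample covariance matrix of the 11 indicators.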

While maximum likelihood (ML) estimation is efficient and unbiased when multivariate normality assumptions are met, multivariate normality is precisely what our sample lacks. In these circumstances, ADF techniques are preferable provided the sample is sufficiently large, as explained above. The results indicated that Model A has to be rejected because the chi-squared (χ2) value of 164.99 for a distribution with 44 degrees of freedom (df) corresponds to a probability level below the 0.05 threshold, thus rejecting the null hypothesis that the data of the sample coincide with those of the population from which it is drawn. Nonetheless, although Model A is not accepted (probability level 0.000 and a χ2/df ratio of 3.745), it is within a range that can be improved. We shall present visually only the standardized results for Model A (Fig. 2).
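The rejection of Model A can be reproduced approximately from the two figures quoted above. The sketch below assumes 22 free parameters (for example, 11 loadings and 11 error variances with the factor variance fixed), a count that is not stated in the text but is consistent with the reported 44 degrees of freedom; the tail probability uses the closed form that holds for an even number of degrees of freedom.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-squared variable; exact for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half) * total

n_indicators = 11
n_moments = n_indicators * (n_indicators + 1) // 2  # distinct variances and covariances
n_free = 22  # assumed number of free parameters, chosen to match the reported df
df = n_moments - n_free

chi2 = 164.99  # value reported for Model A
p_value = chi2_sf_even_df(chi2, df)
ratio = chi2 / df
print(f"df = {df}, p = {p_value:.2e}, chi2/df = {ratio:.2f}, reject at 5%: {p_value < 0.05}")
```

The probability is far below 0.05, in line with the reported level of 0.000, and the ratio comes out close to the reported 3.745.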

Fig. 2 Standardized outcomes for Model A (ADF estimation)

In summary, Fig. 2 shows that, for example, for the variable Understanding, the value 0.71 is the standardized regression weight: when the PS factor goes up by 1 standard deviation, Understanding goes up by 0.71 standard deviations. The value 0.51, on the other hand, is the squared multiple correlation (i.e., the predictors of Understanding explain 51% of its variance). In other words, the variance of the Understanding error is approximately 49% of the variance of Understanding itself. The values for the remaining variables fall within a reasonable range.
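For an indicator that depends on a single factor, the squared multiple correlation is simply the square of the standardized loading, which the following sketch makes explicit for Understanding. The small gap with respect to the reported 0.51 is presumably due to rounding of the loading in the figure; this is an assumption, not something stated in the text.

```python
loading = 0.71  # standardized regression weight reported for Understanding

r_squared = loading ** 2       # share of variance explained by the PS factor
error_share = 1.0 - r_squared  # share of variance left to the error term

print(f"explained: {r_squared:.4f}, unexplained: {error_share:.4f}")
```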

The first step in trying to respecify the model is assessment using modification indices. We decided to construct and test a new Model B, including only those covariances which are significant and have theoretical justification (the constraints). These were error1 (Ap)–error2 (Un), error3 (Co)–error4 (Kn), error6 (Pa)–error7 (Cl), error7 (Cl)–error8 (Me), error8 (Me)–error9 (Ef), error9 (Ef)–error10 (In), and error8 (Me)–error11 (Ca). These choices are supported by the original study by Oñorbe and Sánchez (1996), whose factor analysis found a set of underlying factors: Interest–Effort, on the one hand, and Calculus–Memory–Classroom–Pathway, on the other. Moreover, some works (Pozo & Gómez, 1998; Lee et al., 2008; Quílez, 2019) already link Application and Comprehension, an association which involves deep learning, its application, and the scientific terminology used, especially in students with little experience in PS. Meanwhile, other works (Alpaslan et al., 2017; Jonassen & Hung, 2015; Yerushalmi & Magen, 2006) link Complication and Knowledge, to the extent that students’ perceptions are compatible with naïve epistemology without contextual appropriateness and linear problem-solving.

Delving into these previous ideas, the pupils’ perceptions of their PS difficulties which are related to the Application and Understanding of these problems likely share hidden variables not measured in this study. In particular, the content (Hooke’s law) has to be clearly understood for it to be applied in the different contexts in which the problems were set. The case is similar for the pupils’ Knowledge of the concept and their perception of its Complication, which one would clearly expect to share some hidden variable. The procedure to follow in a problem’s resolution (Path) is mediated by the perception of what is done in class (Classroom), hence the covariances shown. Memory plays a central role in covariance errors between the Effort needed for resolution, the Calculations that must be done, and the type of activities that are carried out in class (Classroom), and we found that the unmeasured variables in these processes share similar origins or causes. Finally, Effort and Interest showed error covariances, as would seem reasonable since, if an effort is made, then there has been a minimum of motivation, so that again there might be shared hidden variables involved not measured in this study (Model B, Fig. 3).

Fig. 3 Conceptual diagram of the initial Problem-Solving Difficulties Model B

In light of the results presented in Table 4, the introduction of restrictions led to improvement and settling of the model, since Model B has a probability level of 6.1%, above the 5% minimum required for the model not to be rejected. We must remember that the higher the probability value associated with χ2, the closer the fit between the hypothesized model (Model A or B, assumed true under H0) and the perfect fit (Bollen, 1998). That is, we do not reject the null hypothesis that the sample data coincide with the population from which they come. In this sense, the probability level of 0.061, being greater than 0.05, shows that Model B is accepted.

Table 4 Comparative summary of the goodness-of-fit indices for Models A and B

In conclusion, all the indices evolved positively (see evolution in Table 4), validating the initial results and confirming that Model B fits the data well (Jöreskog & Sörbom, 1993; Hu & Bentler, 1995; Arbuckle, 2007). Figure 4 visually presents the standardized results of Model B. It can be seen that, for the variable Understanding, the values are somewhat lower than those shown in Fig. 2 (0.71 and 0.51, respectively). The value 0.17 shown between error1 and error2 is the estimated correlation between these two errors. Overall, the statistical values are acceptable. The study of the total effects (direct and indirect effects of any given variable on others) showed that there were no inverse effects, although the direct standardized values of two variables, Memory and Calculations, were below 0.5, i.e., 0.434 and 0.472, respectively. Table 5 lists the values of the composite reliability (CRe) and the average variance extracted (AVE) calculated following Fornell and Larcker (1981). Those authors recommend minimum values of 0.7 and 0.5 for CRe and AVE, respectively, so that a good value was obtained for CRe (0.93), but that for AVE (0.46) is somewhat low. Since this is a new study, the latter value can be taken as acceptable with caution.
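The two indices can be computed from standardized loadings with the usual Fornell and Larcker (1981) expressions, as in the sketch below. The list of loadings is hypothetical: only the values for Memory (0.434) and Calculations (0.472) are quoted in the text, so the output is not expected to reproduce the 0.93 and 0.46 of Table 5, and the error covariances of Model B are ignored.

```python
def composite_reliability(loadings):
    """(sum of loadings)^2 divided by itself plus the summed error variances."""
    total = sum(loadings)
    error = sum(1.0 - l ** 2 for l in loadings)
    return total ** 2 / (total ** 2 + error)

def average_variance_extracted(loadings):
    """Mean of the squared standardized loadings."""
    return sum(l ** 2 for l in loadings) / len(loadings)

# Hypothetical standardized loadings for the 11 indicators;
# 0.434 and 0.472 are the two values quoted in the text.
loadings = [0.70, 0.71, 0.72, 0.68, 0.74, 0.73, 0.69, 0.434, 0.66, 0.62, 0.472]

cre = composite_reliability(loadings)
ave = average_variance_extracted(loadings)
print(f"CRe = {cre:.2f} (recommended >= 0.7), AVE = {ave:.2f} (recommended >= 0.5)")
```

With loadings of this size the pattern described in the text appears naturally: the composite reliability clears its threshold comfortably, while the AVE falls just short of 0.5 because it averages squared loadings.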

Fig. 4 Standardized outcomes for Model B (ADF estimation)

Table 5 The CRe and AVE values for the reliability and validity of the (in-house developed) instrument of measurement (Model B—LISREL Notation for columns 1 to 4)

3.2 Gender Influence

As the gender data were available and there was a fairly balanced proportion in the gender of the pupils, we decided to check whether there were any statistically significant differences and, if so, the corresponding statistical powers and effect sizes. The results are presented in Table 6. The female pupils gave higher scores to their perceptions of the variables Complication, Understanding, Classroom, and Memory—a combination of didactic and psychological obstacles (see Annex 1). Nonetheless, although the statistical powers were medium–high, the effect sizes were small. A plausible explanation for these differences is that the three didactic variables are extrinsic to the learner, and the underlying obstacles are associated with the way in which the teacher develops PS in class (Un), with the kind of interactions that are established in the classroom (Cl), and how the female pupils perceive the whole process (Co); intimate aspects related to the social environment that students perceive and to which they may plausibly be more sensitive, influencing their self-efficacy. Memory (Me), for its part, may not be unaffected by this amalgam of obstacles, as it is directly dependent on the task and cognitive overload.
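A comparison of this kind can be sketched with a rank-based Mann–Whitney U computation. The two groups of scores below are hypothetical, and the effect size r = |z|/√N is one common choice rather than necessarily the one used for Table 6; the normal approximation here omits the tie correction that a full analysis of Likert data would include.

```python
import math

def rank_with_ties(values):
    """Return 1-based ranks, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for pos in range(start, end + 1):
            ranks[order[pos]] = mean_rank
        start = end + 1
    return ranks

def mann_whitney(group_a, group_b):
    """Return (U for group_a, z, r) using the normal approximation, no tie correction."""
    n1, n2 = len(group_a), len(group_b)
    ranks = rank_with_ties(list(group_a) + list(group_b))
    u_a = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_a - mean_u) / sd_u
    r = abs(z) / math.sqrt(n1 + n2)  # assumed effect-size definition
    return u_a, z, r

# Hypothetical Likert scores for one variable; not data from the study.
female = [4, 5, 3, 4, 4, 5, 3, 4]
male = [3, 4, 2, 3, 4, 3, 2, 3]

u_stat, z_score, effect_r = mann_whitney(female, male)
print(f"U = {u_stat}, z = {z_score:.2f}, r = {effect_r:.2f}")
```

These toy groups are far smaller than the study sample and produce a much larger effect than the small effect sizes reported; they serve only to show how the statistic and the effect size are obtained.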

Table 6 Descriptive statistics, Mann–Whitney U test, statistical power, and effect size of the differences in the female and male pupils’ perceptions

4 Discussion

The present study has consisted of a first part directed at using descriptive and multivariate tools to explore the viability of modeling a construct that takes into account pupils’ perception of PS in the Physics classroom. For some authors (Castro & Lizasoain, 2012), when such an approach is linked to SEM, it can be a powerful tool at the service of educational research, but it has to rely on a solid theoretical base.

In the second part, a model was presented in which a single first-order factor accounts for the closely correlated set of observed variables. Kessels et al. (2006) found that students’ explicit attitudes about Physics in general were predicted by three factors: Difficulty, Masculinity, and Heteronomy. However, we note that our study relates to only one topic and to PS, and assumes that student perceptions may be dynamic.

Classical studies of the topic in the scientific literature (Schoenfeld, 1982) have shown that experts perceive the “deep structure” of problems and implement much better metacognitive strategies than novices, strategies that prevent them from wasting their PS resources. As some meta-analyses have shown (Peltier & Vannest, 2017), these skills can be improved through instruction. In the present study, it is therefore interesting to find that this factor intertwines variables concerning knowledge and its application, the way to solve problems, the understanding of the tasks involved, and how the classroom is perceived and its suitability for learning, together with the pupil’s own confidence.

From the point of view of pedagogical potential, this study shows that teachers must approach PS in the classroom from a holistic perspective, since students perceive it that way. Different difficulties converge, intertwined in their perceptions, and imply obstacles of different natures (see Annex 1). Among them, the perception of the initial task (Complication, a didactic obstacle) stands out, whereas the influence of Memory is yet to be determined.

On the one hand, the influence of the class environment on PS has been highlighted in other modeling studies, since it is a variable that indirectly influences other variables (Wüstenberg et al., 2016), and this is coherent with certain aspects of our model. On the other hand, learning opportunities have a strong dependence on the specific topic chosen (Hooke’s law in the present case) which, according to some studies (Kang et al., 2016), must be firmly contextualized to provide such opportunities in PS contexts. As Aramendi et al. (2018) note, pupils’ attitudes towards learning improve when the search for and management of information are encouraged, and inquiry processes linked to everyday life are developed.

A study carried out by Eseryel et al. (2014) revealed the solid link between motivation and the rest of the variables. According to Abeysekera and Dawson (2015), this motivation to commit oneself to working on PS is reinforced when one participates actively in the creation and dissemination of knowledge. This predictive power of motivation is evident even at early ages (Mercader et al., 2017) and, to a large extent, motivation impregnates personal and life satisfaction (López-Cassá et al., 2018).

Pupils perceive a close relationship between the role of memory and the mathematical calculations involved in PS. Some authors (Ramirez et al., 2016), in modeling mathematical anxiety and working memory capacity, have found that pupils with greater cognitive capacity avoid using advanced PS strategies when they are very anxious about mathematics, and as a result perform more poorly. It is still a matter of debate how this could be improved through education (Cowan, 2014). In any case, there is agreement that specific activities focused on metacognition and working memory can contribute to improving arithmetic performance in PS (Cornoldi et al., 2015).

An enormous number of variables may be involved in PS since, from the solver’s perspective, there may be many types of obstacles in the way of finding the path to a solution: personal, psychological, didactic, and epistemological. For instance, in a study deriving from PISA 2012 with 85,714 participants, Philpot et al. (2017) found that reasoning skills were essential to solve complex problems. We have opted for a more pupil-centred vision, but it seems that certain findings are shared, as is reflected in the correlations between variables that are hidden in the model. Even in apparently basic phenomena, such as Hooke’s law, there are simultaneous complex causal models which, according to Perkins and Grotzer (2005), are related to four categories: mechanism, interaction pattern, probability, and agency. Recently, problem-solving experiments in Physics with collaborative groups of students have shown that students design, share, rethink, and evaluate their thought experiments, highlighting the importance of conceptual understanding, past–daily experience, logical reasoning, and conceptual–logical inference (Bancong & Song, 2020). This could explain the close relationship we found between the variables of the model and the error covariances: Memory (Me) as an obstacle binder, which would imply the redesign of problems (Cook, 2006), and the improvement of understanding (Un) through shared learning (Co), to facilitate the learning of mental models of Hooke’s law and its application (Ap), in our case to Elasticity (Batlolona et al., 2020). Therefore, we sincerely believe that this pupil-centred view can complement other perspectives.

As we have seen, the study of students’ perceptions of PS and its difficulties is not common in the field of Physics, and has been limited to finding links between memory, calculations, and anxiety. Our study delves into these difficulties from the point of view of obstacles (Bachelard, 1983), contributing the finding that these difficulties are perceived holistically and that attention should be paid to the underlying didactic obstacles. This is important because, very recently, Dávila-Acedo et al. (2021) found that young students mostly experience negative emotions when studying Forces, among other topics, possibly because solving Physics problems causes boredom, nervousness, and worry. Furthermore, we know that negative emotions are a good predictor of self-efficacy (Akin & Kurbanoglu, 2011). Based on these results, more studies on student perceptions are needed.

We are aware that more than 11 variables are involved in students’ perceptions of difficulties in PS. A recurring variable in the literature examined is self-efficacy, which, owing to its complexity, deserves a separate construct, as we will see at the end. Our approach is quite unusual in the literature and is tentative in nature. In addition, these difficulties arise from classroom practice via teachers, at a critical age when students are introduced to PS, an age that will determine, to a large extent, whether they like or reject Physics.

After these considerations, an important objection persists: if students’ perceptions are changing, can we really develop a static, quantitative model of them? We have recently concluded another investigation on PS, with students of the same age range, but on another topic (Ponderal Laws and Stoichiometry) within Chemistry. It will be interesting to see how much of the model is dynamic and what remains static, that is, how the content and topic influence the students’ perceptions of PS.

Finally, with regard to gender as a discriminating variable, although we found differences in some of the endogenous variables, those differences only had a small effect size. It is interesting to note, however, that the underlying obstacles were preponderantly didactic in nature. This finding is compatible with studies that have looked at the importance of self-efficacy. In one (Marshman et al., 2018), it was found that female pupils’ lower self-efficacy compared with similarly performing male pupils can result in detrimental short-term and long-term impacts. In another (Short-Meyerson et al., 2016), parent–child interactions were found to have an influence, with more encouragement being given to boys than to girls. There is clearly a need for further research in this area.

5 Limitations

The main limitation of the study is that the modeling is formulated from a confirmatory perspective when there is no robust supporting theoretical basis. Nevertheless, to the best of the authors’ knowledge, there have as yet been no holistic approaches to studying secondary education pupils’ PS difficulties in science. Hence, we sincerely believe that the proposal to consider all the variables of the study interrelated in a single factor, with covariances between hidden factors (errors) as constraints, constitutes a plausible starting point for future developments. It should help to be able to start with a set of strongly related variables from which to delve into the model’s internal structure in search of latent structures supported by theory. We point out that some of these constraints imply, on a fine-grained view, a relevant role for Memory (cognitive overload, a psychological obstacle) and an opportunity for further research.
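To make the structure of such a proposal concrete, the following minimal sketch builds the covariance matrix implied by a single-factor model with correlated errors, Σ = φ·λλᵀ + Θ, where off-diagonal elements of Θ are the error covariances. All names and numerical values are hypothetical and do not correspond to the estimates of Models B or C.

```python
def implied_covariance(loadings, error_variances,
                       error_covariances=None, factor_variance=1.0):
    """Model-implied covariance matrix of a one-factor model.

    error_covariances maps index pairs (i, j) to the covariance between
    the errors of indicators i and j (the off-diagonal part of Theta).
    """
    n = len(loadings)
    sigma = [[loadings[i] * loadings[j] * factor_variance for j in range(n)]
             for i in range(n)]
    for i in range(n):
        sigma[i][i] += error_variances[i]
    for (i, j), cov in (error_covariances or {}).items():
        sigma[i][j] += cov
        sigma[j][i] += cov
    return sigma


# Hypothetical three-indicator example with one error covariance.
lam = [0.5, 0.6, 0.7]
theta = [1 - value ** 2 for value in lam]  # standardized indicators
sigma = implied_covariance(lam, theta, {(0, 2): 0.10})
for row in sigma:
    print([round(value, 3) for value in row])
```

In this sketch, the covariance between the first and third indicators is the product of their loadings plus the error covariance, which is how a constraint of this kind captures association that the common factor alone does not explain.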

There are currently calls for the reliability of statistical studies in education to be improved. This is one of the purposes of reporting the statistical power, the probability (1 − β) of not committing a type II error (Aron et al., 2013; Kraemer & Blasey, 2015). For example, for the next model we computed, Model C, the RMSEA limits were LO 90 = 0.000 and HI 90 = 0.044; considering a value of α of 0.05, our study sample size (N = 528), and the degrees of freedom (df = 38), we obtain a power of 92.51%, a fairly positive value (Preacher & Coffman, 2006).
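A power computation of this kind can be sketched in the noncentral chi-square framework, with noncentrality parameter (N − 1)·df·ε² for an RMSEA value ε. This is a minimal illustration which assumes that framework applies here. The null and alternative RMSEA values are inputs chosen by the analyst and are hypothetical below, since the text does not state which were used, so the sketch does not attempt to reproduce the reported figure.

```python
import math


def reg_lower_gamma(a, x):
    """Regularized lower incomplete gamma function P(a, x)."""
    if x <= 0.0:
        return 0.0
    log_prefactor = a * math.log(x) - x - math.lgamma(a)
    if x < a + 1.0:
        # Series expansion, which converges quickly in this region.
        term = 1.0 / a
        total = term
        denom = a
        for _ in range(100000):
            denom += 1.0
            term *= x / denom
            total += term
            if abs(term) < abs(total) * 1e-15:
                break
        return total * math.exp(log_prefactor)
    # Continued fraction (modified Lentz) for the upper tail Q(a, x).
    tiny = 1e-300
    b = x + 1.0 - a
    c = 1.0 / tiny
    d = 1.0 / b
    h = d
    for i in range(1, 100000):
        an = -i * (i - a)
        b += 2.0
        d = an * d + b
        if abs(d) < tiny:
            d = tiny
        c = b + an / c
        if abs(c) < tiny:
            c = tiny
        d = 1.0 / d
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < 1e-15:
            break
    return 1.0 - math.exp(log_prefactor) * h


def ncx2_cdf(x, df, ncp):
    """Noncentral chi-square CDF as a Poisson mixture of central CDFs.

    Suitable for moderate ncp only (exp(-ncp/2) must not underflow).
    """
    if x <= 0.0:
        return 0.0
    if ncp == 0.0:
        return reg_lower_gamma(df / 2.0, x / 2.0)
    half = ncp / 2.0
    weight = math.exp(-half)  # Poisson probability of j = 0
    total = 0.0
    for j in range(100000):
        total += weight * reg_lower_gamma(df / 2.0 + j, x / 2.0)
        if j > half and weight < 1e-16:
            break
        weight *= half / (j + 1)
    return total


def ncx2_ppf(p, df, ncp):
    """Quantile of the noncentral chi-square distribution by bisection."""
    lo, hi = 0.0, df + ncp + 10.0
    while ncx2_cdf(hi, df, ncp) < p:
        hi *= 2.0
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if ncx2_cdf(mid, df, ncp) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def rmsea_power(rmsea_null, rmsea_alt, n, df, alpha=0.05):
    """Power of the RMSEA-based test of fit for given null/alternative values."""
    ncp_null = (n - 1) * df * rmsea_null ** 2
    ncp_alt = (n - 1) * df * rmsea_alt ** 2
    if rmsea_alt >= rmsea_null:
        # Reject for large test statistics.
        critical = ncx2_ppf(1.0 - alpha, df, ncp_null)
        return 1.0 - ncx2_cdf(critical, df, ncp_alt)
    # Test of not-close fit: reject for small test statistics.
    critical = ncx2_ppf(alpha, df, ncp_null)
    return ncx2_cdf(critical, df, ncp_alt)


# N and df are taken from the text; the RMSEA values are hypothetical.
print(rmsea_power(rmsea_null=0.0, rmsea_alt=0.05, n=528, df=38))
```

Because the result depends strongly on the null and alternative RMSEA values, these should be reported together with α, N, and df whenever a power value is given.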

The imposition of constraints on the variances and covariances can affect errors and residuals, and is undesirable as it distorts the model (Hoyle, 2011). Such constraints can be accepted, however, especially in emerging studies, as long as they are theoretically justified. On the other hand, we believe that the choice of a 5-point Likert scale was positive, even though the concomitant problems of normality required that ADF techniques be used. The samples that these techniques need are somewhat complicated to obtain in educational settings, which demand strong and sustained commitment over the years on the part of the teachers involved, who work with a particular set of methods.

Another limitation is the somewhat low value of the AVE (Table 5), as against the high value of CRe. This disadvantage may come from the fact that AVE only explains all the variables as a group. West et al. (2012) argue that a model must have the results of research as its basis, capturing the features of interest and having a substantive theoretical underpinning (Montesinos & Backhoff, 2010). As required by Kline (2012), the assumptions made appear to be plausible, and the findings we present appear to be within what the scientific literature considers acceptable in PS. In the near future, our intention is to use this PS model, together with one construct already completed (Vázquez-Bernal & Jiménez-Pérez, 2016) and others still under development (Emotions, Self-regulation / Self-efficacy,…), to predict pupil performance in PS, for which data are currently available.