1 Introduction

Taking students to observe volcanic activity would be an ideal learning scenario, but it is far too dangerous and costly. The study of concepts and contexts not visible in the classroom presents a significant challenge (Cheong et al., 2022), with volcanism being a prime example. Wellington and Ireson (2018) emphasise the importance of using Information and Communication Technologies (ICT) in the classroom, particularly in these learning situations. Cheng and Tsai (2013) highlight the usefulness of augmented reality (AR) in developing spatial abilities, practical skills, and understanding of concepts in science education, a view supported by Robles and Duarte-Hueros (2023). AR can also aid in the preservation of geosites under severe touristic pressure, such as the Tagoro volcano in Spain (Vegas et al., 2023). Toledo and Garcia (2017) note that AR is a transformative technological advance in education, allowing the creation of interactive, three-dimensional content while keeping students safe.

However, research over the past 30 years indicates that technology alone does not significantly affect students’ academic outcomes (Hattie, 2009, 2023; Mayer, 2009, 2014; Law & Heintz, 2021; Mayer & Fiorella, 2022; Wen et al., 2023). Significant effects are observed only when specific teaching strategies are integrated with technology (Clark, 1994; Herrington & Kervin, 2007; Miranda, 1998; Pea & Kurland, 1984; Chiang et al., 2014).

This study analyses the academic performance of students from four 7th-year basic education (BE) classes and two 10th-year secondary education (SE) classes in the thematic unit of volcanism. The main objective is to determine whether the drawing principle, when combined with AR, produces significant learning improvements compared with conventional strategies. Previous studies have shown AR to be a motivating factor (Padeste et al., 2020; De Lima et al., 2022) and to enhance learning (Ferrari et al., 2024; Gnidovec et al., 2020; Gulboy & Denizli-Gulboy, 2024; López-Cortés et al., 2021; Moser & Lewalter, 2024; Sahin & Yilmaz, 2020; Weng et al., 2020). However, these studies rarely specify the teaching strategies with which AR was combined, and this is the gap our research seeks to address in light of the work published to date. This does not mean that researchers have not used pedagogical strategies alongside AR, but these strategies have not been described, much less empirically tested. Most of the studies analysed in two systematic literature reviews, one of them with a meta-analysis (Faria & Miranda, 2023, 2024a), referred only to the use of AR in conventional teaching, and even then they did not describe what conventional teaching consisted of.

The generative drawing principle, as reviewed by Van Meter and Garner (2005), is comparable to strategies such as summarising, self-questioning, or activating prior knowledge. These authors explain that “…drawing involves constructive learning processes that engage nonverbal representational modalities and requires integration” (p. 288). Leutner and Schmeck (2014, 2022) indicate that “…the generative drawing principle states that asking students to create drawings while reading text causes generative cognitive processing that leads to better learning outcomes” (p. 446 and p. 367, respectively). Yepes-Serna et al. (2021) add that “Generative drawing involves the creation of sketches of newly acquired subject matter leading to an increased mental effort beneficial to learning” (p. 85). Van Meter and Garner (2005) note that drawing helps students select and organise elements into symbolic representations and connect them to prior knowledge. Leutner and Schmeck (2014) highlight that drawing from text content encourages deep cognitive and metacognitive engagement, enhancing understanding of the topic. However, broad drawing tasks can overwhelm students with extraneous demands, reducing learning effectiveness (Leutner & Schmeck, 2022; Moser & Lewalter, 2024). These authors therefore suggest starting with small, manageable drawing tasks to avoid cognitive overload.

In this study, AR was combined with the drawing principle in one BE experimental group, in which students drew diagrams of the volcanic apparatus based on information read in the digital manual, on experimental activities, and on two apps: Assemblr EDU: Learn in 3D & AR and Secrets of the Earth AR. The corresponding control class followed the same learning sequence without the AR apps. The other two BE classes used conventional teaching methods, one as the control group and the other as the experimental group using the same apps. In the SE classes, the drawing principle was not tested; instead, the Quiver—3D Coloring App was used with conventional teaching in the experimental class, which was compared with conventional teaching without AR in the control class.

Conventional teaching in BE followed the “I used to think… now I think” routine (Project Zero, 2024): students first sketched their initial knowledge, and this was followed by expository sessions using multimedia materials and the digital manual for exercises. Practical laboratory activities were carried out, and the students’ records were corrected by the teachers; there were also sessions for clarifying doubts and correcting exercises. In SE, multimedia presentations were combined with the digital manual and with worksheets on which students recorded what they had learned, with pair work and whole-class correction. Sessions for clarifying doubts and correcting exercises were also held.

This paper is organised as follows: the introduction reviews related studies on AR use as theoretical background and then presents the problem statement, research questions, and hypotheses. The methodology section follows, and then the results of hypothesis testing. The conclusion addresses limitations and possibilities for future research.

1.1 Theoretical background, problem, research questions and hypotheses

Augmented reality (AR) technology is described in many studies as adding value to education and teaching strategies (Marshall, 2023; Avila-Garzon et al., 2021; Arici & Yilmaz, 2022). The advantages highlighted include: (i) improved visuospatial processing in students (Fleck et al., 2014; Hung et al., 2016; Say et al., 2017; Layona et al., 2018; Herpich et al., 2021; Ferrari et al., 2024); (ii) a better connection between theory and the understanding of non-visible phenomena (Dunleavy, 2014; Martínez et al., 2016; Salmi et al., 2017; López, 2018); (iii) better learning performance (Stoyanova et al., 2015; Gopalan et al., 2016; Karagozlu et al., 2017; Morales & García, 2017; Layona et al., 2018; Gnidovec et al., 2020; Sahin & Yilmaz, 2020; Petrov et al., 2020; Weng et al., 2020; Arici et al., 2021; Keçeci et al., 2021; López-Cortés et al., 2021; Faria & Miranda, 2024a, 2024b); and (iv) other aspects such as increased motivation, collaboration and student well-being, and support for a BYOD (bring your own device) policy (LaPalnte et al., 2017; Fokides & Mastrokoukou, 2018; Chen et al., 2019; Faria & Miranda, 2024a). Reilly and Dede (2019) note that AR facilitates group work because it promotes active, interventional collaboration among students in problem-solving, and they add that there is already evidence of more focused attention, leading to meaningful learning. Osuna et al. (2016) conclude that integrating AR into teaching is made easier because most students have mobile devices and know how to install apps. Moser and Lewalter (2024), in a study with future STEM teachers, analysed learning about the cardiovascular system under three conditions: AR combined with the self-explanation learning strategy, AR combined with the drawing learning strategy, and AR used on its own.
The authors concluded that students achieved better knowledge outcomes with the self-explanation strategy combined with AR than with the other options, and the worst results when AR was used on its own. As previously mentioned, many of these studies do not analyse the methodology and teaching strategies applied with AR, although these are fundamental to establishing positive effects on the variables studied. In this study, the teaching strategies associated with AR in the teaching of volcanism were the drawing principle and conventional teaching with BE (7th-year) students, and conventional teaching only with SE students, as noted above.

The research problem consisted of developing learning environments for teaching volcanism in the 7th and 10th years that used a conventional teaching strategy and another strategy, the drawing principle, each with and without AR, and of determining their effects on students’ academic performance. An experimental method with a quasi-experimental design was used. The independent variable (IV) was the teaching strategy associated with the use of AR in the study of volcanism, which took two values: AR associated with conventional teaching, commonly used by teachers at the school where the study was carried out, and AR associated with the drawing principle. The dependent variable (DV) was students’ academic performance, measured by tests that the teachers constructed for this purpose and that are analysed later. The 7th and 10th years were chosen because volcanism is part of both curricula; the study is also intended to support the efficient use of AR technology in the classroom. The 7th-year classes were taught by two different teachers, as were the 10th-year classes, one of which was taught by the researcher. To control extraneous variables, the teachers were trained in the AR apps used and followed the same procedures in all classes, aided by worksheets that ensured the same learning sequences. Because it is not possible in a school context to randomly assign individual students to classes, randomisation was applied to the classes themselves, which were randomly designated as experimental or control.

Four research questions were defined to specify the initial problem, and each was translated into conceptual and operational hypotheses. The questions and the conceptual hypotheses derived from them are presented below. The operational hypotheses operationalise the conceptual hypotheses for both the IV and the DV. In the 7th year, the IV was operationalised in four ways: the conventional strategy and the drawing principle, each with and without augmented reality; in the 10th year, it was operationalised in two ways: the conventional teaching strategy with and without augmented reality. The DV was measured with knowledge assessment tests developed by the teachers, based on Bloom’s (1956) taxonomy of the cognitive domain, with item difficulty determined by the researcher using classical test theory (Lord & Novick, 1968).

1.1.1 Research Question 1 (Q1)

Are there differences in the academic performance of 7th-year students taught with a conventional strategy associated with certain AR apps compared to students who only use the conventional strategy?

1.1.2 Conceptual Hypothesis 1 (HC1)

There is a positive difference in the academic performance of students taught with the conventional method associated with AR compared to students who only use the conventional method.

1.1.3 Research Question 2 (Q2)

Is there a difference in the academic performance of 7th-year students taught with the drawing principle strategy when associated with augmented reality compared to students who only use the drawing principle?

1.1.4 Conceptual Hypothesis 2 (HC2)

There is a significant difference in the academic performance of 7th-year students when taught with the drawing principle strategy associated with AR compared to students who only use the drawing principle.

1.1.5 Research Question 3 (Q3)

Is there a more favourable teaching strategy for integrating augmented reality into the teaching of volcanism in the 7th year?

1.1.6 Conceptual Hypothesis 3 (HC3)

There is a more favourable teaching strategy for the integration of AR in the teaching of volcanism in the 7th-year.

1.1.7 Research Question 4 (Q4)

Is there a difference in the academic performance of 10th-year students taught with a conventional strategy when associated with augmented reality compared to students who only use the conventional strategy?

1.1.8 Conceptual Hypothesis 4 (HC4)

There is a significant difference in the academic performance of 10th-year students when taught with the conventional strategy associated with AR compared to students who only use the conventional method.

2 Methodology

To achieve the objectives and test the hypotheses, the BE classes were organised into two sets of two classes each (one control and one experimental). In one set, one class used the conventional teaching strategy (Control Group 1, CG1) and the other used conventional teaching associated with AR (Experimental Group 1, EG1); in the other set, one class was taught by applying the drawing principle (CG2) and the other by applying the drawing principle associated with AR (EG2). In SE, one class followed the unit with conventional teaching (CG) and the other with conventional teaching associated with AR (EG), without the drawing principle. The aim here was to test whether conventional teaching associated with an AR app for teaching volcanism produced better results.

2.1 Participants

The study involved students from two school years in which volcanism is taught, although the topic is developed in greater depth in the 10th year (secondary education) than in the 7th year (basic education). In basic education (7th year), with the express authorisation of their guardians, valid data were obtained and analysed for 81 students with a mean age of 12 years, 44 (54%) male and 37 (46%) female. In secondary education (10th year), also with the express authorisation of their guardians, valid data were collected and analysed for 56 students with a mean age of 15 years, 33 (59%) male and 23 (41%) female, as shown in Table 1.

Table 1 Distribution of students from the 7th and 10th-years of the CG and EG by sex and age

2.2 Design and research procedure

As already mentioned, an experimental method with a quasi-experimental design was applied. Cohen et al. (2018) indicate that a true experimental design takes place in a laboratory, in an artificial environment; by contrast, they describe the quasi-experimental design as one in which it is not possible to control all aspects of a true experimental design. These authors cite Kerlinger (1970) to clarify that this design is widely applied in educational contexts because of the difficulty of constituting random samples. Tuckman (1999, 2012) clearly explains this and other constraints in schools, pointing out that the educational system itself does not allow classes to be dissolved and randomly reorganised. These constraints are inherent to the very nature of schools and how they function. Almeida and Freire (2007) indicate that in the quasi-experimental plan, “some extraneous variables that converge or may converge with the independent variable in explaining the results are still not controlled” (p. 102). Campbell and Stanley (cited by Moreira et al., 2021) make it clear that these studies lack complete control of variables and random selection of subjects. In this study, students could not be randomly allocated to classes, because classes are defined by the school’s management; only the designation of control and experimental classes could be randomised. Randomly reallocating students would also raise ethical questions.
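The constraint described above, randomising intact classes rather than individual students, can be illustrated with a minimal Python sketch. The paper does not state how the random designation was carried out; the function `assign_conditions`, the class labels and the seed below are hypothetical, and a coin flip per pair of classes is only one possible procedure.

```python
import random

def assign_conditions(class_pairs, seed=None):
    """Randomly designate one class in each pair as control and the other as
    experimental. Students are never moved: only intact classes are randomised
    (hypothetical procedure, for illustration only)."""
    rng = random.Random(seed)
    assignment = {}
    for first, second in class_pairs:
        if rng.random() < 0.5:        # coin flip decides which class is control
            first, second = second, first
        assignment[first] = "control"
        assignment[second] = "experimental"
    return assignment

# Hypothetical labels: two BE sets (one per strategy) and the SE pair.
pairs = [("7A", "7C"), ("7B", "7D"), ("10A", "10B")]
print(assign_conditions(pairs, seed=1))
```

Because whole classes are the unit of randomisation, differences between the students in each class remain uncontrolled, which is why such a design stays quasi-experimental.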

To evaluate the effects of the learning environments on students’ knowledge, all classes took a post-test and a follow-up test. These tests were constructed by the teachers and included items at three levels of the cognitive domain of Bloom’s taxonomy: an elementary level (E), which calls for memorisation and reproduction of knowledge; an intermediate level (I), which calls for understanding, interpretation, and application of knowledge to routine situations; and a complex level (C), which calls for argumentation, formulation of value judgements, application of knowledge to new situations, and problem-solving strategies. Table 2 presents an example of an item from each category for students in the 7th and 10th years.

Table 2 Example of an item from each category applied to 7th and 10th-years

In the BE (7th-year), two classes followed the conventional teaching strategy (with and without the use of AR) and two others followed the drawing principle (with and without the use of AR). In SE (10th-year), the conventional strategy (with and without the use of AR) was followed for teaching the volcanism unit, as shown in Table 3.

Table 3 Teaching strategies used in the control and experimental groups

In BE, the volcanism unit was taught using conventional teaching and the drawing principle, each with and without AR, and was organised as shown in Table 4.

Table 4 Planning the volcanism unit at BE

In the SE, conventional teaching was planned as shown in Table 5. The experimental group used the AR application Quiver—3D Coloring App, and the students painted an image of a volcano that served as a marker to trigger AR. The students in the control group, without AR, watched a video about explosive volcanic activity and then painted and assembled a three-dimensional volcanic cone, provided on paper.

Table 5 Planning the volcanism unit at SE

2.3 Measurements and analysis

The results are presented below, starting with a descriptive analysis of each item in each test (post-test and follow-up test), using Classical Test Theory to determine item difficulty and discriminatory power. The results of the 7th-year classes are analysed first, followed by those of the 10th-year classes. Descriptive and inferential statistics for the classes’ post-test and follow-up results are then presented, first for paired classes/groups, i.e. each class’s post-test results compared with the same class’s follow-up results, and then for unpaired classes/groups, i.e. post-test and follow-up results compared between classes. To decide which statistics to use for hypothesis testing, normality was checked with the Shapiro–Wilk test and homogeneity of variance with Levene’s test. Once normality and homogeneity of variance had been confirmed, Student’s parametric t-test was applied.
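The decision rule described above (check normality and equal variances, then apply Student’s t-test) can be sketched in Python with SciPy. This is an illustrative sketch, not the authors’ analysis script: the function name, the α = 0.05 threshold, the hypothetical scores, and the Mann-Whitney U fallback for when an assumption fails are all assumptions (the study reports that both assumptions held).

```python
from scipy import stats

def compare_independent_groups(group_a, group_b, alpha=0.05):
    """Choose and run a two-group test for unpaired samples.

    Normality: Shapiro-Wilk on each group. Equal variances: Levene's test
    with center="mean" (the original Levene statistic; SciPy's default,
    center="median", is the Brown-Forsythe variant). If both assumptions
    hold, run Student's t-test; otherwise fall back to Mann-Whitney U
    (a hypothetical choice, not taken from the study).
    """
    normal = all(stats.shapiro(g).pvalue > alpha for g in (group_a, group_b))
    equal_var = stats.levene(group_a, group_b, center="mean").pvalue > alpha
    if normal and equal_var:
        res = stats.ttest_ind(group_a, group_b, equal_var=True)
        return "Student t", res.statistic, res.pvalue
    res = stats.mannwhitneyu(group_a, group_b, alternative="two-sided")
    return "Mann-Whitney U", res.statistic, res.pvalue

# Hypothetical test scores for a control and an experimental class.
control = [12, 14, 11, 15, 13, 12, 16, 14, 13, 15]
experimental = [13, 15, 14, 16, 12, 15, 17, 14, 16, 15]
test_name, statistic, p_value = compare_independent_groups(control, experimental)
print(test_name, round(statistic, 3), round(p_value, 3))
```

For the paired comparisons (post-test versus follow-up within one class), the same checks would be followed by `stats.ttest_rel` instead of `stats.ttest_ind`.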

2.3.1 Analysis of the items in the BE assessment tests

In this study, knowledge was assessed with a post-test in all classes, both in BE and SE, and a follow-up test was administered three weeks later in BE and two months later in SE to check whether the knowledge acquired was retained over time. As already mentioned, the items covered three levels of the cognitive domain of Bloom’s taxonomy. Within the framework of Classical Test Theory (Sartes & Souza-Formigoni, 2013), two parameters were determined for each item: its difficulty, ranging from 0 (not at all accessible) to 1 (very accessible), and its discrimination, ranging from -1 to +1. In the BE classes, the first assessment (the post-test, known as the evaluation form) was an instrument that assessed only volcanism; in the second (the follow-up test), volcanism was assessed in a separate section of a test that also covered other content taught. Table 6 presents the post-test (evaluation form) items for the classes taught with conventional teaching (with and without AR), and Table 7 those for the classes that used the drawing principle.
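The two Classical Test Theory parameters can be computed as in the following minimal Python sketch. The paper does not specify the exact formulas, so two common choices are assumed here: difficulty as the mean item score divided by the maximum item score (0 to 1), and discrimination as the corrected item-total correlation, i.e. the Pearson correlation between an item and the total of the remaining items (-1 to +1). The score matrix is hypothetical.

```python
from statistics import mean

def difficulty(item_scores, max_score=1.0):
    """Difficulty index: mean item score as a proportion of the maximum.
    0 = no student scored on the item, 1 = every student scored full marks."""
    return mean(item_scores) / max_score

def pearson(x, y):
    """Pearson correlation; returns 0.0 when either variable has no variance."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)

def discrimination(score_matrix, item):
    """Corrected item-total correlation for column `item` of a
    students x items score matrix (assumed discrimination index)."""
    item_col = [row[item] for row in score_matrix]
    rest_total = [sum(row) - row[item] for row in score_matrix]
    return pearson(item_col, rest_total)

# Hypothetical 0/1 scores: 5 students (rows) x 3 items (columns).
scores = [[1, 1, 0], [1, 0, 0], [1, 1, 1], [0, 0, 0], [1, 1, 1]]
for i in range(3):
    p = difficulty([row[i] for row in scores])
    d = discrimination(scores, i)
    print(f"item {i + 1}: difficulty = {p:.2f}, discrimination = {d:.2f}")
```

Excluding the item from the total avoids inflating the correlation, which matters most in short tests such as those used here.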

Table 6 Descriptive statistics for the items on the evaluation form (post-test) in the 7th-year classes with conventional teaching
Table 7 Descriptive statistics for the items on the evaluation form (post-test) in the 7th-year classes that used the drawing principle (with and without AR)

With regard to Research Question 1 (Q1), Table 6 shows no advantage in academic performance for EG1, where AR was used. In fact, post-test performance was better among students taught with the conventional strategy without AR (CG1). This may mean that the use of AR with 7th-year students (mean age 12) required greater mental effort, which translated into poorer academic performance. This result is consistent with other studies that point to cognitive overload in students using AR (Akçayır & Akçayır, 2017; Dunleavy & Dede, 2014; Dunleavy et al., 2009).

Regarding Research Question 2 (Q2), the results in Table 7 show no clear benefit in the academic performance of students who used the drawing principle associated with AR (EG2) compared with those who did not use AR (CG2). However, the item at cognitive level C (which called for argumentation, formulation of value judgements, and application of knowledge to new situations) shows better performance for EG2, which integrated AR into learning with the drawing principle, at this higher cognitive level. These results provide a positive answer to Q2 and to part of the research problem.

In the basic education post-test items, higher item difficulty values (i.e. more accessible items) were observed in the classes that used the drawing principle strategy, whereas in the conventional teaching classes this occurred only for items 2 (0.60 in the experimental group and 0.89 in the control group) and 3 (0.75 in the experimental group and 0.74 in the control group). In the classes that used conventional teaching, academic performance was better in CG1, which did not use AR, than in the class that used the same strategy with AR (EG1). For items at levels I (understanding, interpretation, and application of knowledge to routine situations) and C (argumentation, formulation of value judgements, application of knowledge to new situations, and problem-solving strategies), this difference in favour of the class that did not use AR (CG1) is more pronounced. The comparison of mean scores per post-test item, illustrated in Fig. 1, shows that in almost all cases performance is best when the drawing principle strategy is used (with and without AR) rather than conventional teaching, which reinforces the positive answer to Research Question 2 (Q2).

Fig. 1 Comparison of average results per post-test item for conventional and drawing principle teaching

Tables 8 and 9 present the follow-up test results for students taught with conventional teaching (with and without AR) and for students who used the drawing principle (with and without AR), respectively.

Table 8 Descriptive statistics for the follow-up test items in the 7th-year groups with conventional education
Table 9 Descriptive statistics for the follow-up test items in the 7th-year groups that used the drawing principle

Table 8 shows a smaller difference between the groups, indicating equivalent retention with the conventional strategy with or without AR. However, performance on item C (which calls for argumentation, formulating value judgements, applying knowledge to new situations and problem-solving strategies in Bloom’s taxonomy) is still better among the students who did not use AR (CG1). Thus, with the conventional strategy without AR, CG1 students show better retention and application on items at a higher cognitive level. A possible explanation for these results, cognitive overload associated with AR, was discussed above.

Table 9, for the follow-up test with the drawing principle strategy (with and without AR), shows better performance on item C by the students who did not use AR (CG2), indicating that, with this teaching strategy, students without AR show better retention and application on items at a higher cognitive level.

In the follow-up test, administered three weeks after the post-test, the item difficulty values are very close between groups for each learning strategy. Items 3 to 6 have the highest difficulty values, and the mean results obtained are very similar. This assessment of students’ knowledge does not allow us to conclude that integrating AR with the drawing principle teaching strategy (EG2) favours effective learning in terms of long-term memory, as can be seen in Fig. 2. This partly supports one of the research hypotheses, which states that certain teaching strategies are as important as, or more important than, the use of technology per se for students’ learning.

Fig. 2 Comparison of the average results per item of the follow-up test for conventional and drawing principle teaching

2.3.2 Analysis of the items in the SE assessment tests

In SE, conventional teaching was used with AR in the EG and without AR in the CG, and knowledge of volcanism was assessed. Table 10 analyses the post-test items and Table 11 the items of the follow-up test, which was administered two months later.

Table 10 Descriptive statistics for the test items (post-test) in the 10th-year (conventional education) groups
Table 11 Descriptive statistics for the follow-up test items in the 10th-year (conventional education) groups

In SE, item 11 (P = 0.00) proved to be the most difficult item in the post-test, followed by item 9 (P = 0.23 in the experimental group and P = 0.24 in the control group). Most of the remaining items range between 0.3 and 0.7, which corresponds to the recommended range. Regarding Research Question 4 (Q4), post-test academic performance was slightly better among the students who used AR (EG) than among those who received only conventional teaching without AR (CG). The results are more favourable to learning when AR is present, although the difference in mean item scores is not always significant, as can be seen in Fig. 3.

In the SE follow-up test, item 11 (P = 0.00) was again the most difficult, followed by item 9 (P = 0.23 in the experimental group and P = 0.24 in the control group), and most of the remaining items range between 0.3 and 0.7, the recommended range. Regarding Q4, Table 11 shows better academic performance in the follow-up test (administered two months after the post-test) among the students who used AR (EG) than among those who received only conventional teaching without AR (CG), which suggests more lasting and effective learning. Again, the results are more favourable to learning when AR is present, although the difference in mean item scores is not always significant, as can be seen in Fig. 4.
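The reading of difficulty values against the 0.3 to 0.7 band mentioned above can be made explicit with a small Python helper. The band limits are taken from the text; treating both limits as inclusive, and the labels "difficult" and "easy", are assumptions for illustration.

```python
def classify_difficulty(p, low=0.3, high=0.7):
    """Label a difficulty index P (0 = not accessible, 1 = very accessible)
    relative to the recommended band [low, high] (inclusive, assumed)."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("difficulty index must lie in [0, 1]")
    if p < low:
        return "difficult"   # below the band: few points obtained
    if p > high:
        return "easy"        # above the band: most points obtained
    return "recommended range"

# Difficulty values reported in the text for the SE items.
reported = {"item 11": 0.00, "item 9 (EG)": 0.23, "item 9 (CG)": 0.24}
for item, p in reported.items():
    print(f"{item}: P = {p:.2f} -> {classify_difficulty(p)}")
```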

Fig. 3 Comparison of the average results per item of the post-test for conventional education in SE

Fig. 4 Comparison of the average results per item of the follow-up test for conventional education in SE

2.4 To summarise

This first descriptive analysis of the knowledge assessment results, although it showed no significant results in favour of the experimental groups, partially answered the research problem and questions. The 7th-year students showed no improvement in the knowledge acquired when AR was combined with a conventional teaching strategy, and the drawing principle strategy favoured learning when it was not combined with AR. These results are in line with previous findings of cognitive overload in students using AR (Akçayır & Akçayır, 2017). Although AR did not harm knowledge acquisition in the short or long term, it did not benefit the students who used it compared with those who did not.

2.4.1 Results analysis at BE

We first present the descriptive statistics and then the inferential statistics used to test the hypotheses. First, we checked whether the samples met two criteria that determine which statistics to apply: normality, using the Shapiro–Wilk test, and homogeneity of variance, using Levene’s test. Given the results, the parametric Student’s t-test was applied both to paired groups (comparing the post-test and follow-up results of the same group) and to unpaired groups (comparing the results of the control and experimental groups). Table 12 shows the descriptive statistics for classes A and C (CG1 and EG1), which used conventional teaching with and without AR, and for classes B and D (CG2 and EG2), which used the drawing principle, also with and without AR.

Table 12 Descriptive statistics for the 7th-year class groups

Table 12 shows better results with the drawing principle strategy than with conventional teaching, in both the post-test and the follow-up test. Comparing the four groups, better retention was observed only with the conventional strategy associated with AR (EG1). However, despite this gain in school performance, the results in terms of long-term memory were more favourable with the drawing principle teaching strategy associated with AR (EG2).

Table 13 shows that the data followed a normal distribution in all classes. We report the Shapiro–Wilk test (p > 0.05) because each sample had fewer than 50 observations (Marôco, 2021). All skewness and kurtosis values lie between -2 and +2, which is consistent with a normal distribution (Sahin & Yilmaz, 2020).
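The ±2 rule of thumb for skewness and kurtosis can be checked with the following Python sketch. It uses the simple moment-based estimators g1 (skewness) and g2 (excess kurtosis); statistical packages such as SPSS report bias-adjusted versions that differ slightly in small samples, so this approximates rather than reproduces the values in Table 13. The score lists are hypothetical.

```python
from statistics import mean

def skew_kurtosis(xs):
    """Moment-based skewness g1 and excess kurtosis g2 (a normal sample gives ~0, ~0)."""
    n = len(xs)
    m = mean(xs)
    m2 = sum((x - m) ** 2 for x in xs) / n
    m3 = sum((x - m) ** 3 for x in xs) / n
    m4 = sum((x - m) ** 4 for x in xs) / n
    if m2 == 0:
        raise ValueError("scores have zero variance")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

def within_rule_of_thumb(xs, bound=2.0):
    """True if both |skewness| and |excess kurtosis| are at most `bound`."""
    g1, g2 = skew_kurtosis(xs)
    return abs(g1) <= bound and abs(g2) <= bound

symmetric = [1, 2, 3, 4, 5]
skewed = [1, 1, 1, 1, 1, 1, 1, 1, 1, 10]
print(skew_kurtosis(symmetric), within_rule_of_thumb(symmetric))
print(skew_kurtosis(skewed), within_rule_of_thumb(skewed))
```

This check complements, rather than replaces, the Shapiro–Wilk test reported in the text.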

Table 13 Inferential statistics for normality testing

The values for homogeneity of variance, or homoscedasticity (Marôco, 2021), obtained with Levene’s test were F(3, 77) = 0.772, p = 0.513 for the post-test (evaluation form) and F(3, 77) = 0.450, p = 0.718 for the follow-up test. Since p > 0.05, H0 (equal variances) is not rejected, and homogeneity of variance can be assumed.

2.4.2 Results analysis at SE

Table 14 shows the descriptive statistics for the 10th-year SE classes, both of which used conventional teaching, with AR in class B (EG) and without AR in class A (CG).

Table 14 Descriptive statistics for the 10th-year class groups with conventional teaching

Table 14 shows that, in both the post-test and the follow-up test (administered two months later), the EG students, who used AR with the conventional strategy, performed better.

Table 15 shows that the data followed a normal distribution in all samples. We report the Shapiro–Wilk test (p > 0.05) because each sample had fewer than 50 observations (Marôco, 2021). All skewness and kurtosis values lie between -2 and +2, which is consistent with a normal distribution (Sahin & Yilmaz, 2020).

Table 15 Inferential statistics for normality testing

Levene’s test for homogeneity of variance (Marôco, 2021) gave F(1, 54) = 0.698, p = 0.407 for the post-test and F(1, 54) = 0.773, p = 0.383 for the follow-up test. Since p > 0.05, H0 (equal variances) is not rejected, and homogeneity of variance can be assumed.
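Levene’s test statistic follows an F distribution with df1 = k - 1 and df2 = N - k, where k is the number of groups and N the number of students. With k = 4 classes and N = 81 students in BE, and k = 2 classes and N = 56 students in SE, this gives F(3, 77) and F(1, 54). The Python sketch below, which assumes SciPy is available, recomputes the p-values from the reported F statistics as a consistency check; it is not part of the authors’ analysis.

```python
from scipy import stats

# Reported Levene results: (label, F, df1, df2, reported p).
# df1 = number of groups - 1; df2 = number of students - number of groups.
reported = [
    ("BE post-test", 0.772, 3, 77, 0.513),   # 4 classes, 81 students
    ("BE follow-up", 0.450, 3, 77, 0.718),
    ("SE post-test", 0.698, 1, 54, 0.407),   # 2 classes, 56 students
    ("SE follow-up", 0.773, 1, 54, 0.383),
]
for label, f_value, df1, df2, p_reported in reported:
    p = stats.f.sf(f_value, df1, df2)  # upper-tail probability of the F distribution
    print(f"{label}: F({df1}, {df2}) = {f_value}, p = {p:.3f} (reported {p_reported})")
```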

3 Hypothesis test results

3.1 Results at BE

3.1.1 Paired groups

To compare each class’s performance in the post-test with its performance in the follow-up test, we used the paired-samples t-test, since the same students took both tests; the analysis was carried out separately for each combination of teaching strategy and AR use (see Table 16).
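The paired-samples t statistic is the mean of the per-student differences (post-test minus follow-up) divided by its standard error. A minimal Python sketch, with hypothetical scores for six students, computes it directly and cross-checks it against SciPy’s `scipy.stats.ttest_rel`; with this ordering, a positive t means higher post-test scores.

```python
from math import sqrt
from statistics import mean, stdev
from scipy import stats

def paired_t(post, follow_up):
    """Paired-samples t on the differences post - follow_up; returns (t, df)."""
    if len(post) != len(follow_up):
        raise ValueError("each student needs both scores")
    diffs = [a - b for a, b in zip(post, follow_up)]
    n = len(diffs)
    se = stdev(diffs) / sqrt(n)   # sample SD (n - 1 denominator) / sqrt(n)
    return mean(diffs) / se, n - 1

# Hypothetical scores of the same six students in both tests.
post = [14, 12, 15, 11, 13, 16]
follow_up = [13, 12, 13, 10, 12, 14]
t, df = paired_t(post, follow_up)
scipy_result = stats.ttest_rel(post, follow_up)
print(f"t({df}) = {t:.3f}; SciPy: t = {scipy_result.statistic:.3f}, p = {scipy_result.pvalue:.4f}")
```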

Table 16 Paired samples t test (Post-test – follow-up test)

These data show no statistically significant difference between the post-test and the follow-up in groups A and C (EG1 and CG1), where the conventional method was used (with and without AR, respectively), so we rejected the first hypothesis (HC1). There was also no statistically significant difference in group B (CG2), which followed the drawing principle without AR. Class A showed a negative mean difference, reflecting better performance in the follow-up test than in the post-test. A statistically significant difference between the post-test and the follow-up was found only in class D (EG2), where the drawing principle was combined with AR. In this class, the observed t = 5.73 exceeded the critical value of 1.706 for 26 degrees of freedom, and the p-value was below 0.05. This allows us to accept Hypothesis 2, i.e. that when the drawing principle teaching strategy was combined with certain AR applications to learn about volcanism, students improved their long-term acquisition of knowledge, which should be one of the school's objectives. Pinto (2001) points out that one of the aims of teaching is to promote the acquisition of well-organised knowledge in long-term memory. In a study of university students, Moser and Lewalter (2024) likewise reported better results when AR was combined with the drawing principle strategy than when AR was used on its own.
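The paired comparison in Table 16 can be sketched as follows. Each student contributes one difference (post-test minus follow-up), so a positive t indicates higher post-test scores, as in class D, and a negative t indicates the reverse, as in class A. The function name and example data are hypothetical; only t = 5.73 and df = 26 come from the text. For reference, 1.706 is the one-tailed critical value of t at α = 0.05 with 26 degrees of freedom.

```python
import math
from statistics import fmean, stdev


def paired_t(post, follow_up):
    """Paired-samples t statistic for post-test minus follow-up scores.

    Returns (t, df) with df = n - 1. A positive t means higher post-test
    scores; a negative t means higher follow-up scores.
    """
    if len(post) != len(follow_up):
        raise ValueError("each student needs both a post-test and a follow-up score")
    diffs = [a - b for a, b in zip(post, follow_up)]
    n = len(diffs)
    standard_error = stdev(diffs) / math.sqrt(n)  # SE of the mean difference
    return fmean(diffs) / standard_error, n - 1
```

Applied to class D, the reported t = 5.73 is well above 1.706. In SciPy, scipy.stats.ttest_rel computes the same statistic together with its p-value.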

For Table 17, we report values with Hedges' correction because group sizes were unequal and small (three of the four classes had 20 or fewer students), as suggested by Espirito-Santo and Daniel (2017) and Cohen et al. (2018). These values show no statistically significant effect in groups A, B and C (EG1, CG2 and CG1, respectively), since all their confidence intervals include zero. Only class D (EG2) shows a statistically significant effect. The value d = 1.103 (1.071 with Hedges' correction) corresponds to a large effect when the drawing principle strategy was associated with AR, which reinforces the acceptance of Hypothesis 2: when the drawing principle teaching strategy is associated with the use of AR, students improve their school performance.
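The reported effect sizes can be checked against the paired t-test. Assuming that d in Table 17 is Cohen's d_z for paired data (the mean difference divided by the standard deviation of the differences, equivalently t/√n), class D's t = 5.73 with n = 27 (df = 26) gives d ≈ 1.103, and the common approximation of Hedges' correction, J = 1 - 3/(4·df - 1), gives ≈ 1.071, matching the values in the text. This is an inference from the reported numbers rather than a description of the software the study used, and the function names are hypothetical.

```python
import math


def cohens_dz_from_t(t, n):
    """Cohen's d_z for paired data: mean difference / SD of differences = t / sqrt(n)."""
    return t / math.sqrt(n)


def hedges_correction(d, df):
    """Shrink d with the common small-sample approximation J = 1 - 3 / (4 * df - 1)."""
    correction = 1 - 3 / (4 * df - 1)
    return d * correction


# Class D as reported in the text: t = 5.73 with 26 degrees of freedom (n = 27).
d = cohens_dz_from_t(5.73, 27)
g = hedges_correction(d, df=26)
print(f"d = {d:.3f}, Hedges-corrected = {g:.3f}")  # d = 1.103, Hedges-corrected = 1.071
```

The correction matters most for small groups: with df = 26 it shrinks d by about 3%, while for much larger samples J approaches 1.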

Table 17 Paired samples effect sizes (Post-test – follow-up test)

3.1.2 Unpaired groups

Independent-samples t-tests (i.e. between groups) were then used to compare classes taught with the same strategy, with and without AR, so that students' performance could be analysed when AR is absent and when it is used in conjunction with a particular teaching strategy.

In Table 18, Levene's test gives p = 0.274 for the post-test and p = 0.583 for the follow-up test, so the variances can be considered homogeneous in both tests and the Student's t-test assuming equal variances is used (Marôco, 2021). For the post-test, the Student's t-test gives p = 0.047, which leads us to reject H0 (Marôco, 2021) and to consider the mean scores of the two groups significantly different. For the follow-up test, p = 0.324, so there is no statistical evidence to reject H0, and the mean scores of the groups cannot be considered to differ.
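The decision rule applied to Tables 18, 19 and 22 (use Student's pooled t-test when Levene's p > 0.05) can be sketched as below. The Welch branch is included only to show the usual alternative when variances differ; it was not needed in this study, since every reported Levene p-value exceeds 0.05. The function name and toy data are hypothetical, and p-values are not computed here (in SciPy, scipy.stats.ttest_ind(a, b, equal_var=True) returns them).

```python
import math
from statistics import fmean, variance


def independent_t(a, b, levene_p, alpha=0.05):
    """Pick Student's (pooled) or Welch's t-test from a Levene p-value.

    Returns (label, t, df). If levene_p > alpha, equal variances are
    assumed and the pooled Student's t-test is used; otherwise Welch's.
    """
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances (n - 1 denominator)
    diff = fmean(a) - fmean(b)
    if levene_p > alpha:
        pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
        se = math.sqrt(pooled * (1 / na + 1 / nb))
        return "student", diff / se, na + nb - 2
    # Welch–Satterthwaite degrees of freedom for unequal variances.
    qa, qb = va / na, vb / nb
    se = math.sqrt(qa + qb)
    df = (qa + qb) ** 2 / (qa ** 2 / (na - 1) + qb ** 2 / (nb - 1))
    return "welch", diff / se, df
```

With the post-test value from Table 18 (Levene's p = 0.274), the function takes the Student branch, as in the text.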

Table 18 Independent samples t test (Post-test and follow-up test) conventional education classes

In Table 19, Levene's test gives p = 0.153 for the post-test and p = 0.770 for the follow-up test, so the variances can again be considered homogeneous and the Student's t-test assuming equal variances is used (Marôco, 2021). The Student's t-test gives p = 0.917 for the post-test and p = 0.192 for the follow-up test, so there is no statistical evidence to reject H0 in either case. The mean results of the two classes that learned with the drawing principle therefore cannot be considered significantly different in either the post-test or the follow-up test. These results indicate that Hypothesis 3 (H3), which stated that there were significant differences between groups with differentiated teaching strategies with and without the use of AR, was not confirmed.

Table 19 Independent samples t test (Post-test and follow-up test) classes learning with drawing principle

3.2 Results at SE

3.2.1 Paired groups

The t-test for paired samples was used in the comparative analysis of the 10th-year students’ performance, since the aim was to analyse the students’ performance according to the conventional teaching strategy with and without the use of AR in the post-test and in the follow-up test carried out two months later. These results can be seen in Table 20.

Table 20 Paired samples t test (Post-test – follow-up test) for 10th-year with conventional teaching

The results show a statistically significant difference between the post-test and the follow-up in both classes. Because the mean difference is positive, students performed better in the post-test than in the follow-up test. The slightly higher mean difference in the class where AR was not used indicates that its results were better. This means that the use of AR associated with conventional teaching does not promote better acquisition of knowledge about volcanism than conventional teaching without AR. Both produce good results, with conventional teaching without AR producing higher results than conventional teaching with AR. One possible explanation is cognitive overload, since several sources of information were used in the EG.

For Table 21, we report values with Hedges' correction because the classes differed in size (CG with 24 students and EG with 30). Since all confidence intervals exclude zero, the values can be considered statistically significant. The d values with Hedges' correction indicate a large effect when the conventional teaching strategy is associated with AR, allowing HC4 to be accepted. This result may be associated with the fact that AR, as an innovative element for students, improves motivation, attention and engagement with activities (Yun et al., 2022; Faria & Miranda, 2023, 2024a, 2024b).

Table 21 Paired samples effect sizes (Post-test – follow-up test) for 10th-year with conventional teaching

3.2.2 Unpaired groups

The comparative analysis of students' performance using independent-samples t-tests, i.e. between the EG and CG in the post-test and follow-up test, made it possible to compare performance when conventional teaching is used with and without AR (see Table 22).

Table 22 Independent samples t test (post-test and follow-up test) classes with conventional learning

In Table 22, Levene's test gives p = 0.407 for the post-test and p = 0.154 for the follow-up test, so the variances in both tests can be considered homogeneous and the Student's t-test assuming equal variances is used. The Student's t-test gives p = 0.348 for the post-test and p = 0.294 for the follow-up test, so there is no statistical evidence to reject H0. It can be concluded that, regardless of the use of AR, the conventional method produces good results in the short-term acquisition of knowledge, but these results do not last in the long term.

4 Discussion

At this point, the four research questions and hypotheses are revisited to assess whether each question has been fully or partially answered and whether each hypothesis has been confirmed, and to advance possible explanations with reference to the adopted theoretical framework, the results of previous research, and the conditions under which the studies were carried out in the 7th- and 10th-year classes.

Research Question 1 (Q1), which posited that there could be differences in the academic performance of 7th-year students taught with a conventional strategy combined with certain AR apps compared with students who only followed the conventional strategy, was not answered positively, as students in the class where AR was used did not achieve better academic performance. Hypothesis H1, which arose from this research question, was also not confirmed. On the contrary, post-test performance was better among students taught with the conventional strategy without AR. This may mean that the use of AR with 7th-year students (average age 12) required greater mental effort, which translated into poorer academic performance. This result is consistent with previous studies that reported cognitive overload in students using AR (Akçayır & Akçayır, 2017; Dunleavy & Dede, 2014; Dunleavy et al., 2009; Keller et al., 2021). Lin et al. (2024), however, cite authors reporting contradictory findings on the cognitive load experienced by students using AR: in some studies, students under higher cognitive load performed better, in others worse, and in others no differently from control groups. Lin et al. also highlight multitasking as a significant issue in AR-based learning activities, noting its potential to act as a distraction.
Another possible explanation lies in the difficulties inherent in this age group's stage of cognitive development: students are in transition from concrete operational thought to formal operational thought, the stage in which it becomes easier to consider multiple sources of information and alternative responses simultaneously (Piaget & Inhelder, 1958; Inhelder & Piaget, 1976).

Research Question 2 (Q2), which posited that there could be differences in the academic performance of 7th-year students taught with the drawing principle strategy combined with certain AR apps compared with students who only used the drawing principle, was answered affirmatively. Post-test performance was better among students in the class where AR was used, but this difference was less pronounced in the follow-up test, applied three weeks later to both classes that used this strategy. Hypothesis H2, which arose from this research question, can be considered partially verified, as performance in the short term was better in the class that integrated AR with the drawing principle strategy. One possible explanation is that visual memory was more readily available in an assessment close to the learning moment, with AR conveying a sense of authenticity (Moser & Lewalter, 2024) and the pictographic production of knowledge promoting an active role in knowledge acquisition (Leutner & Schmeck, 2014, 2022). This improvement in performance can also be associated with the observation of phenomena that cannot be observed directly, given the dangerous nature of volcanic activity, but that were perceived through the integration of virtual images into the classroom, which enhances motivation for learning (Alkhasawneh & Khasawneh, 2024; Dreimane & Daniela, 2021; Gomes et al., 2020).

Research Question 3 (Q3), which sought to determine whether there was a more favourable strategy for integrating AR into the teaching of volcanism to 7th-year students, was also answered affirmatively. Academic performance was better in the classes that used the drawing principle strategy, both with and without AR, than in the classes that used conventional teaching. However, this improved performance was not maintained in the subsequent assessment (the follow-up test), where results became very similar across classes with different strategies, whether or not they used AR. Hypothesis H3, which arose from this research question, can be considered partially verified, as only in the long term was performance better in the class that integrated AR with the drawing principle strategy. This retention may be due to a more active visual memory. As with Q2, AR can convey a sense of authenticity (Moser & Lewalter, 2024), and the pictographic production of knowledge can promote a more active role in knowledge acquisition (Leutner & Schmeck, 2014, 2022). The improvement can also be associated with the observation, through virtual images integrated into the classroom, of volcanic phenomena too dangerous to observe directly, which enhances motivation for learning (Alkhasawneh & Khasawneh, 2024; Dreimane & Daniela, 2021; Gomes et al., 2020) and supports an understanding of the non-visible (Ferrari et al., 2024; López, 2018; Radu et al., 2022; Yoon & Wang, 2014).

Research Question 4 (Q4), which posited that there could be differences in the academic performance of 10th-year students taught with a conventional strategy combined with AR compared with students who only used the conventional strategy, was answered affirmatively: academic performance was better at both assessment moments in the class where AR was used. However, Hypothesis H4, which arose from this research question, was not confirmed. Knowledge appears to have been acquired more effectively and retained longer in the class that integrated AR (EG) than in the one that did not (CG), but the difference was not statistically significant. This improvement in performance can be associated with a greater focus on learning prompted by the integration of AR, since the activities in the learning sequence were the same for both classes, including the requirement to colour the volcanic apparatus, which in the EG served as the AR trigger, while in the CG it was the basis for assembling a three-dimensional paper structure.

A comparison of the results of the conventional strategy in the 7th and 10th years shows that 10th-year students perform better than 7th-year students when AR is used to enhance their learning. As noted above, this could be linked to cognitive overload among 7th-year students, combined with the distracting effect of multitasking, difficulties that 10th-year students may have overcome owing to their more advanced cognitive development.

5 Conclusion

This study compared two teaching strategies (conventional vs. drawing principle) in four 7th-year classes, with AR applications used in the two experimental classes, to develop learning environments about volcanism. In secondary education (10th year), the same teaching strategy (conventional) was compared with and without the use of AR applications for the study of volcanism. The results obtained from 7th-year basic education students confirmed that the drawing principle teaching strategy produces better long-term results than conventional teaching, with and without AR. At this level of education, therefore, it seems to be the teaching strategy rather than the use of AR that produces the best results. As mentioned above, this may be due to an excess of information that caused cognitive overload in students. In secondary education, conventional teaching combined with AR proved to be a more efficient strategy for promoting student performance, especially in terms of short-term results.

5.1 Limitations and future research possibilities

This study has certain limitations that warrant consideration. Firstly, the research was conducted in a single private school that follows a conventional teaching method and whose teachers are well versed in digital technologies. This context limits the generalisability of our findings to other educational settings. Secondly, the student population in this school consists predominantly of middle- and upper-middle-class students, so the socio-economic bias inherent in this sample may not reflect the more diverse demographics of public schools.

Future research should adopt the following strategies to test the robustness of our findings. Firstly, replication studies across different schools are essential: similar investigations in public schools would show whether the benefits of using AR extend beyond the context of this study. Secondly, varying the school environment (for example, urban versus rural settings, or schools in different countries) would provide a more comprehensive understanding of AR's effectiveness and help identify the factors that influence its success, so that its implementation can be tailored to diverse educational contexts. Thirdly, a larger sample covering a wider range of students would strengthen the study's validity, and comparing results across student demographics, including private versus public schools, could reveal more nuanced effects. Lastly, future studies could generate new hypotheses about AR's impact on student learning. For instance, does AR enhance students' engagement, retention or problem-solving skills differently in private and public schools? Investigating such questions will contribute to evidence-based teaching practices across various educational settings.