1 Introduction

The core goal of art education is to help students see beauty through perceptual learning, and the key to seeing beauty lies in spatial ability: the ability to use visual thinking, solve visual problems, and form mental images [1]. Spatial ability is therefore an essential skill for art students, enabling them to develop sound three-dimensional (3D) thinking and spatial perception. Some researchers believe that spatial ability predicts job performance and has some predictive power for career success [2]. Ursyn argues that learning, problem-solving, and memorisation require the ability to visualise scientific concepts [3]. Without good spatial ability, success in acquiring professional knowledge will be limited [4].

Art is often considered a discipline that requires multidimensional artistic interaction throughout the learning process. For example, in visual communication design, spatial ability is important not only because designers need to express design ideas graphically but also because they need to solve problems involving abstract objects and understand other designers' sketches and solutions [5]. Sutton and Williams argue that spatial ability plays a crucial role in architectural design and in the learning experience of art students [6]. Spatial ability allows students to convert two-dimensional (2D) images of a given location into 3D images with ease; that is, while working on a 2D layout on paper, they can readily build a 3D mental picture of the area being designed [7]. Therefore, reading and comprehending 2D information in a 3D environment (spatial ability) is essential for design communication and generation [8, 9]. For art students, cultivating spatial ability is both the first task and the ultimate goal.

However, spatial ability is not a concrete, practical skill but an abstract form of 3D thinking and spatial perception, and learning it consumes substantial cognitive resources [10, 11]. According to cognitive load theory, human cognitive resources are limited, and learning consumes them. When one activity consumes many cognitive resources, fewer remain for other activities. If the cognitive load exceeds a person's total cognitive resources, no capacity is left for learning, and learning becomes ineffective.

In art and design classes, students learn by doing and by reflecting on what they are doing [12]. Although creativity and spatial ability are important factors in the design process, research on their relationship and their roles in that process is limited [13]. Currently, the standard methods schools use to cultivate spatial ability combine teaching slides with video or 2D animation. Students can internalise the relevant knowledge only by integrating their notes, the textbook content, and their own understanding, which may increase the cognitive load of spatial learning and thereby inhibit their understanding of spatial concepts and the development of spatial ability [14].

With the evolution of science and technology, virtual reality (VR) enables novel applications in education. Its spatial visualisation and 3D interaction features offer a new way to learn spatial ability and enrich the learning experience [15]. The immersive learning environment (ILE) created by VR has several advantages that traditional education cannot provide. It offers a 3D virtual learning environment in which students have an immersive learning experience and acquire knowledge through hearing, vision, and touch. It also provides a wider range of more integrated sensory stimulation, enhancing art students' perception of space. Compared with traditional spatial ability training methods, VR therefore offers more opportunities to improve students' spatial ability and spatial learning outcomes, and some studies show that appropriate use of VR helps improve students' spatial ability [16, 17].

Dynamic visualisation research based on cognitive load theory argues that immersive learning environments give students a complete representation of external processes, which reduces the cognitive load they bear when building mental representations and thus enables them to achieve higher spatial ability and greater learning benefits [18, 19]. Improved spatial ability, in turn, gives students stronger spatial representation and supports better learning performance [20, 21].

Using cognitive load theory as the research framework, this study examines how two learning environments affect students' cognitive load and their learning performance in spatial ability: digital learning media (DLM) based on slides, and an immersive learning environment (ILE) based on VR technology. We test two hypotheses: (1) students' learning performance is higher in ILE than in DLM, and (2) students' cognitive load is lower in ILE than in DLM. The study also explores the interaction between gender, cognitive load, and learning performance in the two learning environments.

This study adopts an experimental research method. The participants, first-year university students, were divided into a control group and an experimental group on the basis of their entrance scores and gender and were assigned to DLM and ILE, respectively, so that the two learning environments contained the same number of students with similar entrance scores and the same gender composition. Specifically, this study examines students' cognitive load and learning performance in the two learning environments using identical teaching content.

2 Background

2.1 Spatial ability and art education

Art and design is characterised by cross-boundary integration and interdisciplinary specialisation, and it is highly flexible, especially in creative work. Its curriculum requires considerable freedom and emphasises verifying professional knowledge through practice. Spatial ability uniquely develops creativity and is central to art and design teaching [22]. As Buhalis et al. put it, to realise creative design schemes, designers need to estimate size and distance accurately and to visualise how complex interior spaces may look from different viewpoints [23]. Gomez-Tone argues that the ability to manipulate and rotate objects is essential to design; mentally manipulating and rotating objects and judging size and distance belong to the category of spatial cognition [24].

Whatever teaching method an art course adopts, its purpose is mainly twofold: first, to equip students with the ability to translate design ideas in their minds into visual graphic language at any time; second, to train students in spatial expression and spatial transformation. When using graphics to express ideas, students should not only attend to the size and proportion of objects and the spatial relationships between them but also be able to convert between 2D plane representations and 3D forms in both directions.

An analysis of how spatial ability is currently taught in art and design courses shows that, although most art students have good spatial ability, or greater interest in and receptiveness to spatial knowledge than students in other disciplines, their spatial ability still varies [25]. Moreover, some teachers point out that a student with good spatial ability faces fewer difficulties in learning 3D design software and creating 3D models but does not necessarily become a good designer. Therefore, improving students' spatial ability through targeted courses and training is necessary to enhance their professional competitiveness.

2.2 Function of ILE

Researchers believe VR has unique advantages in improving spatial ability and training design thinking [26, 27]. A VR-based immersive learning environment can be defined as a virtual learning environment with deep immersion, a borderless field of view, and a 360° panoramic display space [28]. It integrates sight, hearing, and touch, offers vivid situations and natural, efficient interaction, and supports a variety of learning activities; through multi-sensory collaboration and interaction with information in the virtual world, students gain a positive sense of immersion and reality [29].

In this study, we define an immersive learning environment (ILE) as the experience and feeling of students immersed in a virtual environment through a head-mounted display (HMD) [30, 31]. In contrast to other forms of digital learning, such as slideshows or videos, ILE allows students to enter an immersive 3D virtual space to view, explore, and participate in interactive processes as active elements of the environment [32]. Students can rotate, identify, compose, and disassemble materials, and this direct interaction with design objects promotes attention and concentration.

Previous studies have found that ILE has potential advantages in art education. The function of ILE is manifested in the following aspects:

(i) ILE is conducive to improving students' academic ability and learning experience. The immersion and interactivity of ILE can promote students' visualisation and mental rotation abilities [33] and support human–computer interaction through vision, voice, gesture, and other channels, giving students a strong sense of immersion and embodiment. Ahmed Abdel Rahman engaged students and teachers in experiential learning in an ILE, which enhanced their creativity and productivity [34]. Another study that used ILE in arts education reported good results for students and teachers in immersion, interactivity, and imagination [35]. These results suggest that artistic design can benefit from ILE.

(ii) ILE is conducive to expanding the learning space. ILE has no physical entity; it is a completely digital world in which everything is represented digitally [36, 37]. All kinds of learning information are embedded in context, simulating, reproducing, and even surpassing people's working, living, and learning environments. Visualisation is its most prominent characteristic. By simulating all kinds of things, ILE can significantly expand the learning space and provide new cognitive approaches and venues for learning and teaching.

(iii) ILE can promote intuitive teaching and meaningful learning [38]. Intuitive, vivid displays leave a deep visual memory and impression on students. Using ILE to explore unfamiliar areas can play an important supporting role in the learning environment [39], thus promoting meaningful learning.

(iv) ILE is conducive to creating new learning styles and teaching methods. Students achieve the best learning effect when they devote themselves entirely to ILE and fully interact with it [40]. The immersion of ILE supports new student-centred teaching models, virtual practical teaching, immersive teaching, and similar approaches. According to recent research by Rong et al., introducing VR technology into art education can encourage students to engage in deep learning [41].

2.3 ILE in art courses

Art courses require showing and explaining many artworks to students. Whether in traditional college course teaching or in the appreciation of artworks based on Western educational ideas, teaching seeks to convey a strong sense of space, which is difficult to achieve in traditional art teaching [42].

Since spatial conception is considered crucial in art education, VR technology can be used to construct a simulated learning environment that allows college students to understand various art and design concepts comprehensively in an immersive setting [43]. Combining pictures, text, sound, and moving images through VR helps create a vivid teaching situation, engages students' senses from multiple angles, and delivers lively teaching content rich in both images and text [44]. At the same time, ILE can bring students into a simulated real world through vision, hearing, touch, smell, and other senses, giving them a deeper understanding of art [45].

This teaching process can increase the amount of teaching information, enrich the teaching content as far as possible, spark students' strong interest in the content, and stimulate their thirst for knowledge [46]. An ILE built with VR technology can display the pictures in textbooks and multimedia courseware in front of students in both 2D and 3D space, so that students can fully perceive and understand them and thereby develop their visual perception and their ability to model artistic forms. In addition, in college and university art classes, VR technology can bring out artistic language such as composition, line, colour, and space in creative works, presenting each part of a work comprehensively and in detail and producing a stronger visual impact than ordinary classroom teaching can achieve [47].

However, owing to various constraints, only a small fraction of classrooms integrate modern technology into teaching. Most such applications focus on case appreciation and computer applications, and even fewer research projects use VR to design an ILE. Therefore, understanding how to design ILEs that better support the learning of spatial ability could benefit many art students.

2.4 Cognitive load theory

Human cognitive architecture includes working memory, which has limited duration and capacity [48]. Long-term memory has an unlimited capacity for storing automated schemas that can be brought into working memory for processing when needed. A schema is a cognitive structure that organises information according to how it will be used and, with sufficient practice, automates cognitive activities [49]. Cognitive load theory deals with the consequences of this limited working memory: it is an instructional theory based on knowledge of human cognitive architecture, with a core focus on the finite capacity of working memory [50]. The mental activity imposed on working memory at a given time is called cognitive load.

Students in immersive learning environments may experience cognitive overload because VR and other modern information technologies require them to process a great deal of information, and representing that information imposes additional cognitive load [51]. Spatial ability involves transforming between 2D and 3D relationships and therefore consumes more cognitive resources. Dan and Reiner showed that watching 2D instructional videos on a computer produced a much higher cognitive load index than watching 3D instructional films [52]. Therefore, in instructional design, any unnecessary load on working memory should be minimised, and opportunities to acquire and automate schemas should be maximised.

There is evidence that displaying data in virtual reality can reduce unnecessary load. Several studies have shown that using AR in education can bring specific learning benefits, such as reduced cognitive load [53]. On this basis, Darwish proposes that spatial learning can be enhanced by creating an environment that reduces cognitive load [54], especially for complex learning materials; increasing the spatial or temporal contiguity of related information elements leads to substantial learning gains [53]. Good instructional design should let working memory resources focus on learning rather than on extraneous activities unrelated to the learning process. This study applies several instructional design principles from cognitive load theory to design the VR-based learning environment and its presentation strategies.

3 Research methods

3.1 Participants

The target population of this study is first-year art majors at a university in Chongqing, China. They have moved from a general learning stage (high school) to a professional learning stage (university), and their ways of thinking and the content they learn have changed considerably. They have not yet studied their major systematically and are curious about, and have high expectations of, its professional content. Therefore, effective learning activities should be used to sustain students' interest and enthusiasm and lay the foundation for subsequent core courses.

In preliminary preparation, this study surveyed the background of 437 first-year art majors, with questions such as "Have you used VR-related software or equipment?", "How do you feel about using a 3D or immersive environment?", and "What do you think about learning in an ILE?". Of these students, 92.47% wanted to experience a 3D or immersive environment, and 98.27% were interested in using an ILE for learning and willing to try it.

The study then collected these students' admission scores. Because the students had just entered university, they had no experience with VR design software. However, they had some art foundation and had received professional training in sketching and colour, so their admission scores can reflect their initial ability. The admission scores were obtained from the university's teaching department, with the university's approval, after the purpose of the research had been explained.

Finally, 28 students were selected through purposive sampling: 14 females and 14 males, with a mean age of 19.96 years (standard deviation = 1.46 years). None had a history of mental illness, and all had normal or corrected-to-normal vision. Nearly 90% of them had never used VR before, and the rest had used it only once or twice.

To ensure that the study was conducted ethically, the teacher in charge of the investigation gave written consent and agreed to participate before students were recruited. Before the experiment, the researchers also explained the study to the participating students and obtained their written consent. None of the students had been exposed to the content of the "Composing Foundation" course before the teaching, nor had they taken the course using VR technology.

3.2 Experiment design

This study involves three variables. The first is the learning mode: slideshow-based DLM or VR-based ILE. The learning material in both modes is taken from the "Composing Foundation" course. The second is cognitive load, measured with the cognitive load scale developed by Hwang et al. [55]. The third is learning performance, measured by a test that professional teachers designed according to the course content taught in the experiment.

The same teachers taught both groups. This reduced the possible influence of other factors on the results and kept teaching style, progress, content, and difficulty the same in both conditions of the experiment, with only the form of learning (i.e., DLM or ILE) differing. The instructors have more than five years of teaching experience. Before the teaching experiment, the teachers received training in immersive teaching so that improper operation on their part would not affect the results. The specific training content is shown in Table 1.

Table 1 The experimental content and procedure

The experimental content comes from the "Composing Foundation" course, which was jointly developed by several professional teachers and includes two learning modules: three-view theory and case presentation. The theory module covers the definition of three views, the concept of projection, the classification of projection methods, the basic features of orthographic projection, the formation of three views, and their projection rules. Its purpose is to give students a conceptual overview. The case module introduces the expansion effect and drawing method of three views, progressing from simple to complex structures. In the formal experiment, the theory module lasted 60 min and the case presentation module 40 min.

In this experiment, the 28 students were divided into a control group and an experimental group, each with seven females and seven males, matched on admission score (to within 5 points), gender, age, and prior knowledge. The two groups showed no significant differences in age, gender, or basic understanding of spatial ability, and none of the students had been exposed to the "Composing Foundation" course content or to VR-based instruction before the experiment. The groups can therefore be considered comparable in baseline spatial ability. Each group experienced only one learning environment: the control group learned only in the slide-based DLM mode, and the experimental group only in the VR-based ILE mode. The test framework is shown in Fig. 1.
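The matching described above can be illustrated with a small Python sketch. It is one possible procedure, not the one used in the study: within each gender, students are sorted by admission score, neighbours are paired, each pair is checked against the 5-point tolerance, and one member of each pair goes to each group. The random choice within each pair, the `Student` record, `matched_pair_split`, and the example scores are all hypothetical; the paper states only the matching criteria.

```python
import random
from typing import List, NamedTuple, Tuple


class Student(NamedTuple):
    sid: str      # hypothetical student identifier
    gender: str   # "F" or "M"
    score: float  # admission score


def matched_pair_split(
    students: List[Student], max_gap: float = 5.0, seed: int = 0
) -> Tuple[List[Student], List[Student]]:
    """Split students into control and experimental groups of matched pairs.

    Within each gender, students are sorted by admission score and paired with
    their neighbour; each pair must differ by at most `max_gap` points. One
    member of each pair goes to each group, chosen at random (an assumption).
    """
    rng = random.Random(seed)
    control: List[Student] = []
    experimental: List[Student] = []
    for gender in sorted({s.gender for s in students}):
        pool = sorted((s for s in students if s.gender == gender), key=lambda s: s.score)
        if len(pool) % 2:
            raise ValueError(f"odd number of {gender} students; cannot pair them all")
        for a, b in zip(pool[0::2], pool[1::2]):
            if b.score - a.score > max_gap:
                raise ValueError(f"{a.sid} and {b.sid} differ by more than {max_gap} points")
            first, second = (a, b) if rng.random() < 0.5 else (b, a)
            control.append(first)
            experimental.append(second)
    return control, experimental


# Hypothetical example: four female and four male students.
cohort = [
    Student("F1", "F", 80), Student("F2", "F", 82), Student("F3", "F", 90), Student("F4", "F", 93),
    Student("M1", "M", 75), Student("M2", "M", 78), Student("M3", "M", 85), Student("M4", "M", 88),
]
control, experimental = matched_pair_split(cohort)
```

Because `control[i]` and `experimental[i]` always come from the same pair, the two lists can also serve directly as the matched pairs for a paired comparison.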

Fig. 1

Experiment framework

In this experiment, the same teacher taught the same three-view content to the control and experimental groups. Students in the control group learned the theory in a normal classroom from slides explained by the teacher and watched animations showing the 3D dynamic trajectories of objects. Students in the experimental group learned the theory in the immersive learning environment from videos played by the teacher's avatar, manipulated virtual 3D objects with VR devices, and observed the objects' 3D dynamic trajectories. At the end of the study, students completed the three-view and cognitive load tests.

3.3 Experimental procedures

3.3.1 Pre-test preparation

One week before the formal experiment, students in the experimental group learned how to use the VR devices, to reduce any effect of being unable to operate them.

3.3.2 Test content and process

In the control group (DLM mode), students learned in a regular classroom. The teacher explained three-view theory with slides and then played related videos to explain cases, showing the expansion effect of simple and complex structures in three views. By taking notes on the teaching content, students converted the theory demonstrated by the teacher into practical content on flat paper.

In the experimental group (ILE mode), students studied wearing VR headsets. Formal learning began once students fully understood and were comfortable with the VR equipment and its operation. After the students entered the virtual environment, the teacher's virtual avatar explained three-view theory. Then, under the teacher's guidance, students moved, rotated, and viewed objects through 360 degrees, and decomposed and recombined them to see the three-view expansion effect of various structures. The specific experimental contents are shown in Table 2.

Table 2 The experimental content and procedure

Table 3 describes the teaching environments of the two modes in detail. DLM and ILE differ in practical operation. Because practical operation in DLM is limited by time, place, materials, and so on, the standard practice is to observe static 3D objects and form 3D images of them through mental rotation and construction. In ILE, by contrast, virtual scenes and objects closely reproduce the real situation, and students can directly watch and experience dynamic 3D objects, which is expected to reduce the burden on working memory.

Table 3 The learning environment in DLM and ILE
Fig. 2

Normal classroom and virtual classroom

Fig. 3

Theoretical knowledge teaching

Fig. 4

Practical operation

Practical operation is thus the main difference in spatial ability learning between DLM and ILE. Keeping everything else the same limits interference from other factors and allows a more accurate analysis of how the learning environment affects students' cognitive load.

3.3.3 Three-view test

After completing the course, each student took the three-view test. Because the control group learned the same content as the experimental group, the same test was used to measure all students' learning outcomes. The questions were set by professional teachers and centred on three views; students had to perform conversions in both directions, 3D to 2D and 2D to 3D. In the 3D-2D part, students observed a given 3D model and drew its front, top, and side views within 15 min (Fig. 5). In the 2D-3D part, students drew the 3D form of an object from the front, top, and side views given in the question, also within 15 min (Fig. 6). The two parts lasted 30 min in total. The test was scored out of 100 points, with each question worth 25 points.

Fig. 5

3D-2D conversion test

Fig. 6

2D-3D conversion test

3.3.4 Instrument of cognitive load

A questionnaire was the main instrument used to evaluate cognitive load in this study. A questionnaire is a research tool consisting of questions or statements designed to gather statistical data from respondents. The instrument followed several guidelines: (i) it should be well presented and clear, (ii) it should include only items relevant to the study's objectives, (iii) it should use plain, understandable language, and (iv) it should avoid leading or loaded questions [56]. After each student completed the three-view test, we asked them to complete the cognitive load scale.

"Cognitive load" refers to the total mental activity imposed on working memory while processing information. This study adopts the cognitive load scale developed by Hwang et al. [55], which covers mental load and mental effort. The scale consists of eight items rated on a 5-point Likert scale.

Mental load. "Mental load" refers to the effect of the interaction between learning tasks and learner characteristics on students' memory capacity. This dimension has five items: (1) The learning content in this learning activity is complex for me; (2) It took me a lot of effort to answer the questions in this learning activity; (3) It bothers me to answer the questions in this learning activity; (4) I feel frustrated answering the questions in this learning activity; (5) I don't have enough time to answer the questions in this learning activity.

Mental effort. "Mental effort" refers to the cognitive capacity allocated to accommodate the demands imposed by the task. There are three items in this dimension, which are as follows: (i) In this learning activity, the teaching method or the presentation of the content of the textbook is quite difficult for me; (ii) I must put a lot of effort into accomplishing the learning activity or achieving the goal of the learning activity; (iii) The way this learning activity is taught is difficult to understand or keep up with.

Responses are scored numerically rather than through semantic-differential or verbal classifications of response positions [56, 57]. Each item was rated on a five-point scale labelled "strongly disagree," "disagree," "neither agree nor disagree," "agree," and "strongly agree," indicating the extent to which participants agreed or disagreed with the statement.
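Turning one student's eight responses into dimension scores might look like the Python sketch below. The paper does not state how item scores are aggregated, so the 1 to 5 coding of the labels and the use of item means are assumptions, as are the names `LIKERT` and `score_cognitive_load`. All eight items are worded so that agreement indicates higher load, so no item is reverse-scored here.

```python
from statistics import mean

# Assumed numeric coding of the five response labels.
LIKERT = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}

MENTAL_LOAD_ITEMS = 5    # items (1)-(5)
MENTAL_EFFORT_ITEMS = 3  # items (i)-(iii)


def score_cognitive_load(responses):
    """Return (mental_load, mental_effort, overall) as item means.

    `responses` lists the 5 mental-load answers followed by the 3 mental-effort
    answers. Averaging items is an assumed scoring convention.
    """
    expected = MENTAL_LOAD_ITEMS + MENTAL_EFFORT_ITEMS
    if len(responses) != expected:
        raise ValueError(f"expected {expected} responses, got {len(responses)}")
    values = [LIKERT[r.strip().lower()] for r in responses]
    load = mean(values[:MENTAL_LOAD_ITEMS])
    effort = mean(values[MENTAL_LOAD_ITEMS:])
    return load, effort, mean(values)


# Hypothetical answers from one student.
example = [
    "agree", "disagree", "disagree", "strongly disagree", "Neither agree nor disagree",
    "disagree", "agree", "disagree",
]
load, effort, overall = score_cognitive_load(example)
```

Here the mental-load score is the mean of 4, 2, 2, 1, and 3, and the mental-effort score is the mean of 2, 4, and 2.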

The questionnaire was administered in a controlled environment to avoid bias from classmates' influence or reference to other learning materials, and no positive feedback was provided. Students therefore completed the questionnaire in the classroom with the teacher present to observe but not to interfere with their responses, which allowed them to focus fully on the questionnaire and answer honestly.

3.4 Data collection and analysis

Three experts scored the three-view test. Answers that were completely correct received full marks and unanswered questions received zero; other answers might contain some errors in detail or be only partly correct. Because scoring such answers is somewhat subjective, each of the three experts scored them according to their actual content, and the mean of the three scores was taken as the final score, so that differences in judgement among experts would not distort the final result.

Each student's cognitive load score was obtained from the cognitive load scale they completed.

After the experimental data were organised in Microsoft Excel, paired-samples t-tests and one-way ANOVA were performed in IBM SPSS Statistics 23. P < 0.05 was considered significant and P < 0.01 highly significant. Results are expressed as mean ± SEM.
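For a sample of size n with standard deviation SD, the standard error of the mean is SEM = SD / √n. The Python sketch below applies this to the gender-subgroup SDs reported in the results (n = 7 per subgroup); the function and dictionary names are illustrative only.

```python
import math


def sem(sd: float, n: int) -> float:
    """Standard error of the mean: sample SD divided by sqrt(n)."""
    return sd / math.sqrt(n)


# Sample SDs of the three-view test reported for each gender subgroup (n = 7 each).
SUBGROUP_SD = {
    "DLM female": 6.268,
    "DLM male": 8.335,
    "ILE female": 7.718,
    "ILE male": 8.753,
}

sems = {name: f"{sem(sd, 7):.3f}" for name, sd in SUBGROUP_SD.items()}
for name, value in sems.items():
    print(f"{name}: SEM = {value}")
```

The resulting values (2.369, 3.150, 2.917, and 3.308) are the ± terms that accompany the subgroup means in the results.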

The Shapiro–Wilk test was used to check whether the data were consistent with a normal distribution, an assumption of the t-test [57]. Paired-samples t-tests were then used to compare students' learning performance and cognitive load between the DLM and ILE modes, describing how these outcomes change with the learning environment. One-way ANOVA was also used to analyse differences between male and female students' learning performance and cognitive load in the two modes.
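The paired comparison above can be sketched in plain Python. A paired test treats each control student and their matched experimental student as one pair and works on the within-pair differences d, with t = mean(d) / (SD(d) / √n) and n − 1 degrees of freedom. How pairs were formed is an assumption here, and the scores are hypothetical; in practice `scipy.stats.ttest_rel(a, b)` returns the same statistic together with a p-value.

```python
import math
from statistics import mean, stdev


def paired_t(x, y):
    """Paired-samples t statistic for matched pairs (x[i], y[i]).

    Returns (t, df), where d = x - y, t = mean(d) / (sd(d) / sqrt(n)), df = n - 1.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    if n < 2:
        raise ValueError("need at least two pairs")
    se = stdev(d) / math.sqrt(n)
    return mean(d) / se, n - 1


# Hypothetical matched pairs: DLM student i is matched with ILE student i.
dlm_scores = [30, 45, 20, 50]
ile_scores = [80, 85, 75, 90]
t, df = paired_t(dlm_scores, ile_scores)
```

Because the differences are taken as DLM minus ILE, a higher ILE score produces a negative t, which is the sign convention of the t value reported in the results.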

4 Results

4.1 Three-view test data

This study tested the simple main effects of the learning environment on spatial ability learning performance. After three professional teachers scored the students' three-view test (TVT), final scores were obtained for all 28 students. Descriptive statistics of the TVT results in DLM and ILE are shown in Table 4. In DLM mode, mean learning performance was 37.643 (SD = 15.435); in ILE mode, it was 83.143 (SD = 8.690).
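Because the results also report n, mean, and SD for the female and male subgroups of each environment (n = 7 each), the overall descriptive statistics for DLM and ILE can be cross-checked by combining the subgroups. The Python sketch below uses the standard decomposition of the total sum of squares into within-group and between-group parts; `combine_groups` is an illustrative name, not part of the original analysis.

```python
import math


def combine_groups(groups):
    """Combine (n, mean, sd) summaries into the overall n, mean, and sample SD."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / n_total
    # Total sum of squares = within-group SS + between-group SS.
    ss = sum((n - 1) * sd ** 2 + n * (m - grand_mean) ** 2 for n, m, sd in groups)
    return n_total, grand_mean, math.sqrt(ss / (n_total - 1))


# Reported subgroup summaries: (n, mean, SD) for female and male students.
dlm = combine_groups([(7, 24.429, 6.268), (7, 50.857, 8.335)])
ile = combine_groups([(7, 79.714, 7.718), (7, 86.571, 8.753)])
print(f"DLM: n = {dlm[0]}, mean = {dlm[1]:.3f}, SD = {dlm[2]:.3f}")
print(f"ILE: n = {ile[0]}, mean = {ile[1]:.3f}, SD = {ile[2]:.3f}")
```

With the reported subgroup figures this gives a DLM mean of about 37.64 with SD of about 15.435, and an ILE mean of about 83.14 with SD of about 8.690; small discrepancies in the last digit reflect rounding of the published subgroup values.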

Table 4 Descriptive statistics of TVT in DLM and ILE

This study then compared students' TVT scores in DLM and ILE with a paired-samples t-test. Beforehand, the Shapiro–Wilk test was used to check the normality of the data. In DLM mode, the TVT statistic was W = 0.954 (p = 0.629 > 0.05); in ILE mode, it was W = 0.965 (p = 0.800 > 0.05). Normality was therefore not rejected for either sample, as shown in Table 5.

Table 5 Normality test of TVT in DLM and ILE

The results of the paired sample t-test are shown in Table 6. There was a significant difference between students' TVT scores in DLM and ILE mode (t = −10.433, p < 0.001). The results show that students' learning performance varies with different learning environments. Students' learning performance after learning the three-view theory in ILE is higher than that in DLM (Fig. 7).

Table 6 Paired-samples t-test results of TVT in DLM and ILE

Fig. 7 Learning performance of students in different learning environments

This study also analysed the influence of gender on learning performance. The descriptive statistics of female and male students' TVT scores in DLM and ILE modes are shown in Table 7. In DLM mode, the mean learning performance was 24.429 (SD = 6.268) for female students and 50.857 (SD = 8.335) for male students. In ILE mode, it was 79.714 (SD = 7.718) for female students and 86.571 (SD = 8.753) for male students.
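As a consistency check, the gender subgroups in Table 7 can be combined into overall figures by adding within-group and between-group variation. The sketch below assumes seven students of each gender in each learning environment; this group size is not stated in the text and is inferred from the reported standard errors. The helper name `combine_groups` is hypothetical.

```python
import math


def combine_groups(groups: list[tuple[int, float, float]]) -> tuple[int, float, float]:
    """Merge (n, mean, SD) summaries of disjoint groups into one (hypothetical helper).

    SDs are sample SDs (n - 1 denominator), so the result equals the SD
    that would be computed directly from all the pooled raw scores.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)  # spread inside each group
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)  # spread of group means
    sd = math.sqrt((ss_within + ss_between) / (total_n - 1))
    return total_n, grand_mean, sd


# ILE subgroups from Table 7, assuming n = 7 per gender.
n, mean, sd = combine_groups([(7, 79.714, 7.718), (7, 86.571, 8.753)])
print(f"n = {n}, mean = {mean:.2f}, SD = {sd:.2f}")  # n = 14, mean = 83.14, SD = 8.69
```

This reproduces the overall ILE figures in Table 4. The same call with the ILE cognitive load subgroups in Table 11 gives a mean of about 14.36 and an SD of about 4.34, matching Table 8.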

Table 7 Descriptive statistics of TVT by gender in DLM and ILE

The results of one-way ANOVA are shown in Fig. 8 (values are mean ± SE). In DLM mode (Control Group), male students' learning performance was significantly higher than that of female students (Control Group + female: 24.428 ± 2.369; Control Group + male: 50.857 ± 3.150, p < 0.001). In ILE mode (Experimental Group), however, there was no significant difference in learning performance between female and male students (Experimental Group + female: 79.714 ± 2.917; Experimental Group + male: 86.571 ± 3.308, p > 0.05). The study also analysed the influence of the learning environment on the learning performance of each gender. The learning environment had a significant effect on the learning performance of both female students (Control Group + female: 24.428 ± 2.369; Experimental Group + female: 79.714 ± 2.917, p < 0.001) and male students (Control Group + male: 50.857 ± 3.150; Experimental Group + male: 86.571 ± 3.308, p < 0.001).
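The ± values above are consistent with standard errors of the mean, SE = SD / √n, computed from the SDs in Table 7 with n = 7 per group. The short sketch below performs that conversion; the group size of seven is an inference from these numbers rather than a figure stated in the text, and `standard_error` is a hypothetical helper name.

```python
import math

# SDs reported in Table 7 (TVT scores); n = 7 per group is an inferred assumption.
reported_sd = {
    "Control Group + female": 6.268,
    "Control Group + male": 8.335,
    "Experimental Group + female": 7.718,
    "Experimental Group + male": 8.753,
}


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: sample SD divided by the square root of n."""
    return sd / math.sqrt(n)


for group, sd in reported_sd.items():
    print(f"{group}: SE = {standard_error(sd, 7):.3f}")
```

This prints 2.369, 3.150, 2.917, and 3.308, the values reported for Fig. 8.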

Fig. 8 Learning performance of different genders in different learning environments

4.2 Cognitive load data

According to cognitive load theory, human cognitive resources are limited and are consumed during learning. This section therefore tests whether the learning environment significantly influences students' cognitive load when learning spatial ability.

The descriptive statistics of students' cognitive load when learning spatial ability in DLM and ILE are shown in Table 8. In DLM mode, the mean cognitive load was 36.214 (SD = 2.992); in ILE mode, it was 14.357 (SD = 4.343).

Table 8 Descriptive statistics of cognitive load in DLM and ILE

This study then conducted a paired-sample t-test on students' cognitive load in DLM and ILE modes. To check the normality assumption underlying this test and the one-way ANOVA, the Shapiro–Wilk test was applied first. In DLM mode, the cognitive load statistic was W = 0.913 (p = 0.176 > 0.05); in ILE mode, it was W = 0.941 (p = 0.429 > 0.05). Neither sample departed significantly from a normal distribution, as shown in Table 9.

Table 9 Normality test of cognitive load in DLM and ILE

The results of the paired-sample t-test are shown in Table 10. Students' cognitive load differed significantly between DLM and ILE modes (t = −27.178, p < 0.001), indicating that the learning environment affects students' cognitive load. Students' cognitive load was higher in DLM than in ILE (Fig. 9).

Table 10 Paired-samples t-test results of cognitive load in DLM and ILE

Fig. 9 The cognitive load of students in different learning environments

Next, this study analysed the influence of gender on cognitive load in different learning environments. The descriptive statistics of female and male students' cognitive load in DLM and ILE modes are shown in Table 11. In DLM mode, the mean cognitive load was 38.429 (SD = 1.397) for female students and 34.000 (SD = 2.449) for male students. In ILE mode, it was 16.571 (SD = 4.198) for female students and 12.143 (SD = 3.437) for male students.

Table 11 Descriptive statistics of cognitive load by gender in DLM and ILE

The results of one-way ANOVA are shown in Fig. 10. When learning spatial ability in DLM mode (Control Group), female students' cognitive load was significantly higher than that of male students (Control Group + female: 38.429 ± 0.528; Control Group + male: 31.285 ± 1.936, p < 0.05), whereas in ILE mode (Experimental Group) there was no significant difference in cognitive load between female and male students (Experimental Group + female: 16.571 ± 1.586; Experimental Group + male: 12.143 ± 1.299, p > 0.05). Across learning environments, the cognitive load of both female and male students changed significantly (Control Group + female: 38.429 ± 0.528; Experimental Group + female: 16.571 ± 1.586, p < 0.001; Control Group + male: 31.285 ± 1.936; Experimental Group + male: 12.143 ± 1.299, p < 0.001).
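For two or more groups, a one-way ANOVA F statistic can be computed directly from summary statistics. The sketch below applies this to the ILE cognitive load figures in Table 11, assuming seven students per gender (an inferred group size, not one stated in the text). The helper name is hypothetical, and a p-value would additionally need an F-distribution function, which the Python standard library does not provide.

```python
def one_way_anova_from_summary(
    groups: list[tuple[int, float, float]],
) -> tuple[float, int, int]:
    """F statistic and degrees of freedom from (n, mean, SD) per group (hypothetical helper)."""
    k = len(groups)
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * m for n, m, _ in groups) / total_n
    # Between-group and within-group sums of squares.
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between, df_within = k - 1, total_n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# ILE cognitive load (Table 11): female vs male, n = 7 each (assumed).
f_stat, df1, df2 = one_way_anova_from_summary([(7, 16.571, 4.198), (7, 12.143, 3.437)])
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

This gives F(1, 12) ≈ 4.66, just below the 5% critical value of about 4.75 for F(1, 12), which agrees with the reported p > 0.05. With only two groups, F equals the square of the pooled-variance t statistic, so this is equivalent to an independent-samples t-test.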

Fig. 10 The cognitive load of different genders in different learning environments

5 Discussion

The aim of this study is to explore the impact of ILE on students' spatial ability development. The experiment indicates that both the external learning environment and internal individual differences affect how students develop spatial ability. Students were evenly assigned to DLM and ILE to study spatial ability content. Before learning in ILE, students were systematically trained to use the VR devices, so we assume that operating the devices did not consume their cognitive resources. We therefore consider that, apart from the learning environment, there were no significant differences in external conditions.

5.1 External cause: learning environment

In this study, the learning environment affected students' cognitive load and learning performance in spatial ability. Students' cognitive load when learning spatial ability was higher in DLM than in ILE, while their learning performance was lower in DLM than in ILE. This is broadly consistent with the findings of Khalil et al. and Elford et al. [58, 59]. The results of Khalil et al. agree with cognitive load theory: compared with traditional paper-based teaching strategies, innovative computer-based teaching strategies reduced extraneous cognitive load [58]. In their experiment, Elford et al. compared observation groups taught with PPT and with AR and found significant improvements in spatial ability learning performance [59].

Cognitive resources theory holds that human cognitive resources are limited. When cognitive load is high, the higher-order cognitive resources used for attention allocation are occupied, so students cannot effectively inhibit their responses to task-irrelevant stimuli and are easily distracted by them. Conversely, when cognitive load is low, students have sufficient cognitive resources to suppress responses to task-irrelevant stimuli and focus on the learning task [60, 61].

The DLM mode (e.g., video or animation) consumes a large amount of cognitive resources, which increases cognitive load and limits how much students can learn about spatial ability [62]. According to cognitive load theory, this can decrease learning performance [63]. Cultivating spatial ability demands substantial cognitive resources: students must use mental rotation, spatial orientation, navigation, and other spatial skills to build 3D models in their minds. Sweller argues that working memory is limited, which may affect students' recognition, composition, and decomposition of 3D models [50]. ILE can provide a broader range and a higher degree of integrated sensory stimulation that enhances art students' perception of space [45]. By visualising the steps of building 3D models, it helps students construct mental models and reduces the working memory consumed in recognising 3D models, leaving enough working memory for understanding, composing, and decomposing them. As a result, students' cognitive load is reduced and their learning performance improves [62]. This experiment therefore indicates that the learning environment affects students' cognitive load and that training spatial ability in ILE can effectively enhance students' learning performance.

Learning spatial skills in ILE also gives students a positive experience. Assembling models in a virtual space may feel like an interesting novelty, more engaging than learning in a physical space. Student acceptance of the ILE model matters, as student interest in VR is one of the main drivers of its adoption in education [64].

With the rapid development of computer technology, VR has unique advantages in simulating reality, interactivity, visualisation, and extending design ideas about the objective world. The favourable results of this study may arise partly because a VR-based ILE may be a more accurate medium for assessing spatial skills than traditional teaching [65]. Innovative educational technology is therefore a clear development trend. ILE offers a new perspective on spatial ability learning and supports the feasibility and advantages of virtual reality technology in education [66].

5.2 Internal cause: individual gender

Although different learning environments lead to different cognitive loads and learning outcomes, students' individual differences also affect their learning performance in different learning environments [67]. Many studies have shown that individuals' spatial ability positively predicts their performance in tasks requiring visuospatial information processing [68]. To further understand the relationship between cognitive load and learning performance in spatial ability learning, this study therefore also analysed gender differences as a factor affecting spatial ability development. Both male and female students' cognitive load changed significantly across learning environments, being significantly higher in DLM than in ILE; conversely, learning performance was significantly lower in DLM than in ILE. The study of Chen and Huang supports this result: male and female students in the experimental group (teaching based on an assisted learning system) performed better than those in the control group (instructor-centred narrative teaching), and the experimental group also had a lower cognitive load during learning activities [69].

Comparing the two genders within the same learning environment shows that, in DLM, female students had higher cognitive load and lower learning performance than male students. This result is consistent with the experiments of Heo and Toomey and of Roach et al., in which male participants consistently outperformed female participants on the DLM learning task and men outperformed women on the spatial ability task [70, 71].

These findings suggest that understanding and grasping spatial information may be more challenging for female students. This difference may arise because men are more familiar with 3D computer simulations, as they spend more time playing video games [72, 73]. Female students may experience excessive cognitive load when learning spatial ability: they have more difficulty understanding the visual information in three-view drawings, which may interfere with their memory and comprehension, so they need more cognitive resources to construct 3D models, resulting in lower learning performance.

In basic spatial ability tests, men outperform women [71]. However, the size of this effect depends on the type of spatial ability measured and the environment in which it is measured [73]. In this study, the ILE presented the 3D model visually, which reduced female students' cognitive load in spatial perception, and there was no significant difference between female and male students in either cognitive load or learning performance. This experiment therefore suggests that the visual presentation provided by ILE can effectively reduce the influence of gender on spatial ability learning.

Several studies have found that gender differences in learning outcomes vary between static and dynamic learning resources [74, 75]. Further research is needed to isolate these gender effects from other potential factors that may influence spatial ability learning outcomes [76].

In future studies, gender should be considered as a variable in cognitive load theory and instructional visualisation research so that students have equal access to teaching resources [75]. For courses that demand high cognitive resources, relevant immersive learning environments can be built to reduce the cognitive load of female students or students with low spatial ability and to provide them with a good learning environment [70].

5.3 Outlook

In this study, the immersive learning environment is positioned as an environment for understanding and constructing spatial knowledge. Its visual impact improves students' learning interest, design ability, and listening efficiency, and greatly promotes the development of their spatial ability and design expression. Spatial ability is not unique to art students; it also strongly influences progress in science, engineering, medicine, and other fields. For example, some clinical studies have shown that VR training can enhance stereoscopic vision, spatial contrast sensitivity, and other visual functions [77]. Cevikbas et al. also showed that VR technology lets learners enter a virtual world, immerse themselves actively in it, and interact with objects, thereby improving their mathematical thinking and spatial ability [78].

The findings of this study therefore provide a reference for designing VR-supported learning environments and spatial ability training. Other disciplines can draw on this model to build an ILE for professional development, develop relevant classroom VR prototypes, and form an innovative classroom teaching model that integrates education with emerging technologies.

6 Conclusion

This study constructed an immersive learning environment using VR technology and preliminarily examined how different learning environments influence students' spatial ability development. In DLM, students' cognitive load was relatively high, and female students had higher cognitive load and lower academic performance than male students. In ILE, by contrast, middle school students' cognitive load decreased significantly and their academic performance improved accordingly, with no significant difference between female and male students in cognitive load or academic performance. This result is consistent with the predictions of cognitive load theory.

Spatial ability is an important cognitive skill, and improving it through school education has long concerned scholars. Drawing on relevant research, this study selected appropriate learning content and designed corresponding learning activities that integrate the teaching of curriculum knowledge, spatial ability training, and practical activities. The effects of immersive virtual environments on spatial ability were investigated using the perspectives and methods of educational technology and related fields.

The impact of digital technology on middle school students' creativity in art and design education is a current research topic of the twenty-first century [79]. Responding to the inadequacy of traditional teaching in cultivating students' spatial ability, this study constructed an immersive learning environment oriented towards spatial ability development and applied it in practice. The results reveal the influence of the external environment and gender differences on spatial ability learning and provide empirical evidence for using virtual reality technology to cultivate students' spatial ability in school contexts, which can encourage education researchers and teachers to explore in depth how to develop students' spatial ability.

The immersive learning environment integrates virtual reality technology, spatial learning situations, learning activities, and learning content. This study explores how information technology teaching tools can help improve spatial ability in school classrooms. Educators can use these findings to design actionable learning environments in school contexts and to develop individualised instruction, thereby providing feasible strategies for spatial ability development in both arts and non-arts majors.