1 Introduction

With the development of digitization, networking, and intelligence, society and the economy have entered the intelligent era, and higher education is similarly undergoing intelligent reform in learning, teaching, and other aspects. The emergence of new technologies has brought many possibilities for breaking through old limitations in the field of education. The Ministry of Education of the People's Republic of China (MOE) has promulgated the Ten-year Development Plan for Education Informatization (2011–2020) and the Education Informatization 2.0 Action Plan, making "education informatization" the key development direction of China's education reform. The Horizon Report, initiated and led by the American New Media Consortium (NMC), releases annual forecasts and analyses of development trends in education informatization, and is regarded as a bellwether for the construction and development of education informatization (Wang et al., 2015). In the 2020 edition of the Horizon Report, Extended Reality (XR) is listed as one of the "new technologies and practices". Extended Reality refers to a mixture of physical and virtual environments, or to environments providing a fully immersive virtual experience (Jin et al., 2020); it mainly includes Augmented Reality (AR), Virtual Reality (VR), Mixed Reality (MR), and other types of virtual technology.

Virtual technology has been extensively explored for its applications in education. Calvert and Abadia (2020) compared the learning effects, engagement, presence, and empathy of 360-degree panoramic technology (360-degree PT) and VR technology, and found that VR performed better in these aspects. Similarly, Klingenberg et al. (2020) compared immersive VR (IVR) with desktop systems and found that IVR outperformed desktop systems in intrinsic motivation, perceived enjoyment, and sense of presence, making it more popular among students. Some studies have used interviews to examine the development and performance of several virtual technologies in architectural education (Redyantanu & Asri, 2021). Others have compared simulated and real environments, investigating people's different responses across three modes (2D photos, 360-degree panoramic images, and VR), with an evaluation of the learners' sense of presence (Higuera-Trujillo et al., 2017). Albrecht et al. (2013) compared the learning effects of mobile AR (mAR) devices and traditional textbook learning among medical students. Ferrer-Torregrosa et al. (2016) compared image notes, videos, and AR technology in terms of knowledge acquisition and time spent. Alfalah et al. (2019) compared the teaching effects of VR systems and traditional physical heart models. These studies examined the effectiveness of virtual technology in subject teaching, and their results showed that virtual technology produced better learning effects across multiple dimensions, including knowledge acquisition, learning interest, and engagement (Chirico et al., 2018; Buttussi & Chittaro, 2018; Schutte & Stilinović, 2017; Allcoat & von Mühlenen, 2018). These applications indicate that more and more people are using virtual technology for learning. In conclusion, learning via virtual technology will be an essential trend in the future of education.

Traditional architecture has significant research value in the discipline of architecture, particularly in the areas of history and regionality. Buildings are dynamic entities, constantly growing and evolving to adapt to specific natural and social environments (Shi et al., 2014). Therefore, historical buildings have become unique carriers of history, representing stories of different times and places (Jia, 2019). Regionally, the formation of historical buildings is closely tied to local natural conditions and contextual factors. Given differences in natural climate and geographical features, buildings vary in terms of structural materials, layout and spatial form. In the knowledge system of architecture, historical architecture is considered foundational theoretical knowledge. This system includes cognitive theory of architecture, methodology of architectural design, and knowledge of science and technology related to architecture (Ding, 2015). Within this system, knowledge of historical architecture pertains to the cognitive theory of the basic characteristics of architecture. The knowledge system of historical architecture is broad, encompassing function, structure, decoration etc., and also related to history, culture, philosophy, and even geomancy (Lu, 2011). The study of historical building types and specific cases enables students to develop a comprehensive understanding of architectural entities and the multidisciplinary knowledge behind them, laying an important foundation for architecture students to improve their design thinking, creative ability and aesthetic sense (Wang and Huang, 2020).

The complexity of the knowledge system of historical architecture places higher demands on learning methods. Architecture is a three-dimensional space with specific content characteristics. Although it can be described in words and expressed through two-dimensional images or technical drawings, these representations are only translations of the architecture and do not convey the complete and true architectural entity. Hence, individuals must reconstruct the three-dimensional space through their imagination. In the realm of cognitive knowledge studies, Michael Polanyi, a British philosopher, introduced the theory of tacit knowledge in 1958, which has since opened up a new direction in contemporary epistemological research. Polanyi categorized knowledge into two types according to whether it can be formalized, systematized, and clearly articulated: explicit knowledge and tacit knowledge (Polanyi, 2015). Explicit knowledge can be expressed and effectively transferred through written texts, charts, mathematical formulas, and other media. Tacit knowledge, on the other hand, is experiential understanding that is difficult to document explicitly, such as experiences, sensations, and intuitions. This type of knowledge aligns closely with the emphasis on spatial experience in the field of architecture. Architectural educator Albers emphasizes the importance of immersing oneself in real architectural settings to gain firsthand experience. By observing and perceiving architectural features such as scale, materials, and structures, learners can combine this information with their existing architectural knowledge to make judgments and interpretations, ultimately developing a comprehensive understanding of the architecture. An immersive virtual environment gives the experience of sensed reality in a virtual environment. It helps the user to perceive volumetric qualities of a building or space that are hard to depict in 2D drawings, and it creates an artificial environment that imitates real-world surroundings convincingly enough that users suspend skepticism and fully engage with the created environment (Chowdhury & Schnabel, 2019). Figure 1 shows the complex multiple timber-framed structure inside ancient Chinese wooden buildings, which would be easier for students to understand if presented in a more intuitive and interactive way. Virtual technology constructs a completely virtual environment that interacts with the human sensory system and can break the spatial restrictions of the real world. The main way to achieve immersion is through immersive technologies (Suh & Prophet, 2018), which use digital perception devices and algorithms to blur the boundaries between the physical and simulated worlds, creating a sense of immersion. These technologies encompass augmented reality (AR), virtual reality (VR), and mixed reality (MR) (Margetis et al., 2020), and they generate an experience in which the real and virtual worlds merge seamlessly through digital interactive techniques (Mystakidis, 2022).

Fig. 1 Complex structural features of Chinese historical architecture (Li, 2009)

With the rapid development of digital technology, intelligent reform in the field of architecture has become urgent. However, research on applying virtual digital technology to historical buildings is still insufficient. In architecture, the ability to abstract and mentally reproduce spatial models after experiencing a spatial environment is an essential core skill. Spatial experience therefore holds significant importance in architectural education, as spatial cognition corresponds to the tacit knowledge in teaching historical architecture. The perception of spatial changes, referred to as spatial sense in the field of architecture, is a form of spatial cognitive ability. Spatial sense encompasses individuals' experiences of space (Rahimi et al., 2018), including processes such as object positioning, measurement of dimensions, and distance assessment. The discipline of architecture places a strong emphasis on spatial experience, and virtual technology, as a medium for information dissemination, is highly suitable for learning abstract, dynamic, or non-intuitive phenomena. In terms of spatial expression, it surpasses verbal or numerical expression (Tost & Economou, 2007). Currently, the application of virtual technology in the field of architecture, particularly in non-historical architectural contexts, is primarily concentrated in architectural design education. Design methodologies based on virtual reality can enhance students' ability to solve design problems (Özgen et al., 2021). Most studies on historical buildings start from the idea of cultural heritage protection and focus on the empirical application of technology. They mainly introduce the application process and methods of virtual technology through individual cases and verify the learning and application effects (Qie, 2010; Zhang, 2019; Zhang & Wu, 2019; Zhang et al., 2019). In the early stage of technological development, most studies addressed the restoration and reconstruction of virtual models of historical buildings, such as the virtual reconstruction of ancient Jerusalem (Eiteljorg, 1999) and the restoration of Phimai Temple with Augmented Reality (Noh et al., 2009). However, current studies pay little attention to spatial experience, cognitive ability, thinking mode, and related aspects. The effect of applying technology in architectural disciplines is also affected by the performance of hardware and software (Pamungkas et al., 2018), and the differences in how different types of technology perform in architectural applications have not been clearly defined. Different virtual technologies have different equipment requirements and construction methods, and their final effects may each have advantages and disadvantages. Currently, research comparing technologies is primarily focused on disciplines such as medicine and anatomy (Moro et al., 2017; Moro et al., 2021a, b; Barteit et al., 2021). Although comparative studies on different technologies and equipment have been carried out in other disciplines, they are still lacking in architecture. Because the effects of different types of virtual technologies in architectural teaching have not been fully studied, it is impossible to select suitably matched virtual technologies in teaching practice, which further limits the depth of application of virtual technologies.

The focus of this study is on the teaching content of historical buildings. Through a comparative study of several typical virtual technologies and the construction of an adaptive evaluation system, the study empirically compares their teaching applications, quantitatively analyzes the differences among them, and discusses how well specific technologies fit the teaching of historical buildings. By conducting in-depth research on the application performance and differences of different technologies in historical architectural teaching, this study aims to reveal the rules governing the adaptability of virtual technology in the teaching of historical buildings and to provide reference suggestions for future teaching practices.

2 Design of experiment framework

After a preliminary theoretical comparative analysis of typical virtual technologies, the following virtual technologies were selected for subsequent empirical research.

2.1 Typical virtual technology selection

2.1.1 Augmented Reality (AR)

Augmented Reality (AR) is a technology that enhances human perception of the real world by adding virtual and computer-generated information. Its main features include the integration of the physical real world with virtual objects, real-time operation, and allowing interaction between users and virtual objects (Liarokapis et al., 2007). In simple terms, AR technology overlays computer-generated 3D virtual objects into real scenes in real-time. By utilizing the camera functionality, the corresponding scenes or objects are recognized and virtual content is displayed on the screen. A study conducted with 87 university students learning about historical architecture in outdoor environments found that students using AR had higher learning outcomes in terms of knowledge scores, satisfaction, and motivation compared to the control group (Chang et al., 2015). The advantage of AR technology lies in its meaningful association of virtual learning content with the real environment, presenting a reform in terms of interactive experiences.

AR technology must be combined with real scenes, although its requirements are modest: virtual interaction can only be carried out by scanning real objects or photos. AR can run on common smart devices such as smartphones and tablets, and students can use it independently without teachers. However, AR cannot support roaming through spatial scenes; users can only view 3D models or read related information, which does not meet the requirements of this study.

2.1.2 Mixed reality (MR)

On the basis of Augmented Reality, Mixed Reality (MR) is a technology that further blends real and virtual environments. It is important to note that there are multiple definitions of Mixed Reality. In a broad sense, MR encompasses AR and refers to all technologies that merge real and virtual environments in different ways. However, in this paper, we specifically discuss narrow MR, which refers to a special type of augmented reality that lies between AR and VR in terms of its technology and applications. The main difference between MR and AR is that MR aims to further integrate virtual objects into the real world. For example, it can place virtual objects on a real-world table or change the material of the environment to make virtual objects appear as if they are part of the real world. The fundamental characteristics of MR technology are flexibility, immersion, interaction, coexistence, and enhancement. Its advantage lies in the ability to present high-precision holographic images. Applying MR to health science courses in higher education enables students to visualize human anatomy from various perspectives, providing them with a deeper understanding (Moro & Gregory, 2019). The anatomical models displayed in holographic lenses can be placed on surgical planes, allowing users to explore and interact with virtual anatomical structures through gestures. Compared to textbooks, this provides a more realistic and clinically relevant surgical experience (Maniam et al., 2020).

MR technology has no specific requirements for location, but because it mainly provides touch interaction in real scenes, it works best when matched with real scenes. An MR helmet or MR glasses are required, and operating them involves a learning process and appropriate assistance from teachers. Although MR can provide a first-person experience, its spatial roaming takes place mainly in real physical scenes, which does not match the requirements of this study.

2.1.3 Immersive Virtual Reality (IVR)

Immersive Virtual Reality (IVR) primarily utilizes more immersive virtual devices, such as CAVE (Cave Automatic Virtual Environment) systems and HMDs (Head-Mounted Displays). IVR offers higher technical fidelity, enabling stronger interactivity, immersive display, and a heightened sense of realism in simulated environments (Buttussi & Chittaro, 2018). CAVE is a projection-based virtual reality system that consists of projectors, multiple projection screens surrounding the user, and speakers. It is capable of providing a fully immersive virtual environment. On the other hand, HMD devices primarily use goggles that are worn on the head to directly project images onto the retina of the eyes, presenting enlarged virtual object images.

The site requirements of IVR depend on the equipment selected. CAVE equipment requires a dedicated laboratory, while HMD equipment is more convenient and requires only 6 square meters of open space. HMD equipment is therefore more in line with the requirements of this study. Using the handheld controller requires learning and adaptation, and the user is completely isolated from reality during use, which may cause discomfort. Teachers are therefore required to observe and assist throughout the process. IVR supports first-person roaming in a completely virtual scene, which is in line with the requirements of this experiment.

2.1.4 Desktop Virtual Reality (DVR)

Desktop Virtual Reality (DVR) utilizes two-dimensional display screens that are viewed directly with the naked eye, and interaction is carried out using a mouse and keyboard. DVR is more widely used due to its simple device requirements. For example, many electronic games or virtual simulation experiments are implemented using this desktop form. Moreover, due to the convenience of its equipment, this approach can also be combined with online remote learning, adapting to a wider range of application scenarios.

DVR technology uses the most common computer equipment and has no special requirements for the site. Because computers are widely available and familiar, students can quickly learn to use DVR independently. The interactive form allows for first-person roaming and can be combined with long-distance online learning, making it highly applicable on a large scale.

2.1.5 360-degree panoramic technology (360-degree PT)

360-degree panoramic technology (360-degree PT) is fundamentally an emerging photography technique that captures a complete three-dimensional space, simulating and reproducing real environments. The content can take the form of photos or videos, allowing viewers to move and observe scenes in all directions. Regarding the benefits of 360-degree panoramic videos in education, scholars have comprehensively analyzed application reports from disciplines such as medicine, natural sciences, history, sociology, computer science, and accounting. They found that 360-degree PT presents significant advantages in student performance, learning motivation, and knowledge retention, as well as heightened levels of interest, engagement, enjoyment, and sense of presence. However, its application also faces challenges, mainly attention diversion, dizziness, or discomfort, which are often linked to poor video quality and limited usability (Pirker & Dengel, 2021).

360-degree PT utilizes equipment similar to that of IVR, resulting in identical site and process requirements. Although lacking interactive features, 360-degree PT allows for easier viewing through the use of a helmet. Panoramic shots, taken from a first-person perspective, can capture the entire roaming experience, satisfying the requirements of this experiment.

In summary, considering the technical implementation conditions, DVR is the only technology that can fully meet the research requirements. For IVR and 360-degree PT, HMD equipment should be chosen, and the process needs to be assisted by teachers. MR and AR technologies should be combined with the real scene but cannot provide space interaction from the first-person perspective, making them unsuitable for historical building teaching applications. Based on these findings, AR and MR technologies that do not meet the requirements are eliminated, and 360-degree PT, IVR, and DVR are selected as representative technologies for subsequent empirical research.

Implementation conditions should meet three aspects: site requirements, process requirements, and form requirements. Site requirements refer to the difficulty of on-site visits in traditional learning methods, which often involve high time and economic costs, making it difficult to carry out on a large scale and frequently. Moreover, the focus of this experimental study is on historical buildings, which may no longer exist or have been destroyed. Therefore, the implementation requirements of virtual technology should not have too many limitations on the learning venue and do not require on-site interaction with real scenes. Process requirements state that the operational difficulty of the experience should be as simple and easy to understand as possible, facilitating independent operation by students. If the technical implementation requirements are high and require multiple technicians or teachers to provide guidance throughout the process, it would be inconvenient for large classes. Therefore, one of the process requirements for technical implementation is to enable students to operate independently as much as possible, reducing their reliance on teacher assistance, thus improving teaching efficiency. Form requirements mainly arise from the learning needs of tacit knowledge of historical buildings. Tacit knowledge refers to spatial cognitive ability, which requires an immersive spatial experience. Only by cognizing the architectural space from a first-person perspective can one experience and discover the characteristics and charm of historical buildings from a "human" standpoint, and then understand the true logic of architectural space composition. Therefore, virtual technology needs to provide a first-person perspective simulation roaming in a virtual space scene. Figure 2 shows whether different virtual technologies meet the corresponding technical implementation requirements.

Fig. 2 Virtual technology comparison conclusion
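The screening logic summarized in Fig. 2 can be sketched as a small decision matrix. The sketch below is illustrative only: the boolean values are one reading of the qualitative descriptions in Sections 2.1.1 to 2.1.5, the names `Technology`, `TECHNOLOGIES`, `select_candidates`, and `fully_compliant` are hypothetical, and the site requirement is omitted because the text describes it only qualitatively. It assumes that the form requirement (first-person roaming in a virtual scene) is the eliminating criterion, while the process requirement can be compensated for by teacher assistance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Technology:
    name: str
    independent_use: bool       # process requirement: usable without teacher assistance
    virtual_first_person: bool  # form requirement: first-person roaming in a virtual scene


# Values are an assumed reading of Sections 2.1.1 to 2.1.5, not measured data.
TECHNOLOGIES = [
    Technology("AR", independent_use=True, virtual_first_person=False),
    Technology("MR", independent_use=False, virtual_first_person=False),
    Technology("IVR", independent_use=False, virtual_first_person=True),
    Technology("DVR", independent_use=True, virtual_first_person=True),
    Technology("360-degree PT", independent_use=False, virtual_first_person=True),
]


def select_candidates(techs):
    """Keep technologies that satisfy the form requirement."""
    return [t.name for t in techs if t.virtual_first_person]


def fully_compliant(techs):
    """Keep technologies that satisfy both encoded requirements."""
    return [t.name for t in techs if t.independent_use and t.virtual_first_person]


print(select_candidates(TECHNOLOGIES))  # ['IVR', 'DVR', '360-degree PT']
print(fully_compliant(TECHNOLOGIES))    # ['DVR']
```

Under these assumptions, the first list matches the three technologies retained for the empirical study, and the second matches the observation that only DVR meets every requirement without teacher assistance.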

2.2 Adaptive evaluation system index selection

The application adaptability of virtual technology in education is mainly discussed through the evaluation and verification of its effectiveness. In teaching applications, the primary concern of evaluation is the specific learning outcomes (Calvert & Abadia, 2020; Albrecht et al., 2013; Ferrer-Torregrosa et al., 2016; Alfalah et al., 2019). In particular, the optimization of the learning process through virtual technology can be divided into three parts: knowledge transfer (Bhargava et al., 2018; Lucas, 2018), knowledge retention (Butt et al., 2018; Krokos et al., 2019; Meyer et al., 2019), and task engagement (Bhargava et al., 2018; Bharathi & Tucker, 2015; Pirker et al., 2017). Studies in this category mainly investigate the mastery of knowledge, for example by collecting data through pre- and post-learning assessments or by comparing answer performance between groups (Albrecht et al., 2013; Stromberga et al., 2021). The instructional effectiveness of virtual technology has been extensively validated by a large body of research.

Beyond direct learning effects, many scholars in educational psychology focus on the psychological state of learners in virtual learning, including engagement, learning interest (Snelson & Hsu, 2020), flow (Biasutti, 2011; Mirvis, 1991), and autonomy (Ryan et al., 2006). Studies in this category mainly collect data through professional psychological scales. During learning, learners' emotional states improve, for example through a stronger sense of presence, higher classroom participation, and easier emotional stimulation, which contributes to a positive feedback loop that promotes learning (Cheng & Tsai, 2019; Heidig et al., 2015; Pedram et al., 2020; Plass et al., 2014; Um et al., 2012). In the architecture discipline, attention to the effects of applying virtual technology falls into two parts: the teaching of knowledge acquisition and the improvement of spatial thinking ability. On the technical side, evaluation of effectiveness focuses on experience feedback from virtual scenes. Therefore, the main evaluation criteria for the suitability of typical virtual technologies in teaching historical buildings can be grouped into four categories. Table 1 lists the corresponding aspects for each of these four categories.

2.2.1 Architectural learning dimension: Acquisition of professional knowledge

Regarding professional knowledge, the focus is primarily on investigating the mastery and transfer of knowledge. The most commonly used method is to collect test score data to quantify and facilitate analysis and statistics. In addition to answer scores, learning efficiency can also be analyzed by taking into account the learning time as another dimension for comparison. These objective indicators provide intuitive and important feedback results for learning.
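One possible way to combine the two objective indicators is to express learning efficiency as test points earned per minute of learning. The sketch below is a hedged illustration: the formula, the function name `learning_efficiency`, and the sample values are assumptions made for this example and are not the measure reported in this study.

```python
def learning_efficiency(test_score: float, learning_minutes: float) -> float:
    """Return test points per minute of learning (hypothetical metric)."""
    if learning_minutes <= 0:
        raise ValueError("learning_minutes must be positive")
    return test_score / learning_minutes


# Made-up values for two hypothetical learners.
print(learning_efficiency(80, 20))  # 4.0
print(learning_efficiency(90, 30))  # 3.0
```

In this made-up example, the second learner scores higher but learns less efficiently, which is the kind of difference that a score-only comparison would hide.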

2.2.2 Architectural learning dimension: Spatial cognitive representation

Architectural educator Alberts believes that the learning process of architecture is not just about simple knowledge accumulation. The best learning method for architecture is personal hands-on experience and spatial cognition. Steven Holl, a representative figure of architectural phenomenology, emphasizes the direct feeling that architecture brings to its users and links this experience to the basic characteristics of architecture. He abstracts the basic elements of architecture and focuses on people's perception of elements such as materials, light, color, scale, details, and activity sequences. These elements are the surface characteristics of architecture, and through perceiving them, the experiencer forms a perceptual experience of the overall spatial scene. The degree of spatial cognition can therefore be reflected in the perception of these characteristics. Based on the practical learning and application of architecture, the specific types of spatial cognitive ability most relevant to architecture include spatial relations, dimension transformation, and scale perception.

Spatial cognitive ability is primarily concerned with one's ability to comprehend architectural features, spatial relations, dimensional transformation, and scale perception. Architectural features include building materials, lighting conditions, color perception, and intricate design details. Mastery of spatial relations is mainly reflected in one's ability to perceive spatial layout, sequence, and identify spatial elements. Dimensional transformation refers to the ability to convert three-dimensional space to a two-dimensional plane. Scale perception is the ability to judge the scale of a scene based on personal perception.

2.2.3 Architectural learning dimension: Learning state performance

In virtual learning, the learner's mental state is a crucial indicator examined by educational psychology. Five prominent psychological indicators have been selected for examination: flow, learning interest, learning enthusiasm, learning autonomy, and sense of achievement in learning. Flow is a highly focused and immersive mental state that emerges when challenges and abilities are balanced (Biasutti, 2011); it is characterized by complete immersion in a challenging yet enjoyable activity (Winn, 1993). Learning interest refers to the level of interest that students have in the current learning content. According to a comprehensive review of teaching with 360-degree panoramic videos, the sense of presence in immersive learning allows learners to feel as if they are actually present in the virtual environment; learners have shown a strong preference for the immersive experience of 360-degree panoramic videos, demonstrating higher levels of interest, engagement, and enjoyment in the learning process (Snelson & Hsu, 2020). Learning enthusiasm pertains to students' eagerness to continue learning. Learning autonomy describes the extent to which students control their own learning process and experience. Autonomy refers to the sense of control over one's actions and outcomes; it can influence intrinsic motivation and is associated with feelings of joy, preference, and the adjustment of states during the process. Its importance can be understood through self-determination theory, which emphasizes the degree of self-determination in human behavior (Ryan et al., 2006). The sense of achievement in learning is the feeling of accomplishment that students experience after completing learning tasks. These five indicators collectively reflect the psychological state of students throughout the virtual learning process, and the data will be collected using professional psychological scales.

2.2.4 Technology application dimension: Virtual experience effect

The virtual environment should provide users with a sense of presence. Presence is an important indicator in the field of psychology for studying virtual environments. It refers to the feeling of being immersed in an environment, where the individual experiences a sense of "being there" through this mediated environment (Ijsselsteijn et al., 2000). Presence is also referred to as "spatial presence" and is defined as a psychological state where virtual objects are perceived as real objects (Lee et al., 2004). Some researchers in the field of psychology have proposed a two-dimensional dynamic model related to presence, comprising the dimensions of experiential self-location and possible actions in the media environment (Wirth et al., 2007). The study of presence can help determine the level of authenticity and immersion in the virtual environment.

Table 1 Indicators of virtual technology adaptability evaluation system
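The indicator system described in Sections 2.2.1 to 2.2.4 can also be represented as a simple nested structure, which may be convenient when organizing questionnaire items or scores. The sketch below is illustrative: the key names and the helper `count_indicators` are hypothetical labels, while the indicator lists follow the text of those sections.

```python
# Two dimensions, four categories, and their indicators as described in
# Sections 2.2.1 to 2.2.4. Key names are hypothetical labels.
EVALUATION_SYSTEM = {
    "architectural_learning": {
        "professional_knowledge": ["test score", "learning time"],
        "spatial_cognition": [
            "architectural features",
            "spatial relations",
            "dimensional transformation",
            "scale perception",
        ],
        "learning_state": [
            "flow",
            "learning interest",
            "learning enthusiasm",
            "learning autonomy",
            "sense of achievement",
        ],
    },
    "technology_application": {
        "virtual_experience": ["presence"],
    },
}


def count_indicators(system):
    """Count the indicators across all dimensions and categories."""
    return sum(
        len(indicators)
        for categories in system.values()
        for indicators in categories.values()
    )


print(count_indicators(EVALUATION_SYSTEM))  # 12
```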

2.3 Design of virtual scene teaching mechanism

2.3.1 Learning case

Currently, research on learning about historical buildings through virtual technology mainly focuses on individual historical buildings, with little attention given to architectural settlement groups. Additionally, the knowledge content is mainly focused on basic professional knowledge, with insufficient research on spatial experience cognition. This study therefore investigates the ancient residential buildings of the CAI clan in Fujian, also known as the CAI clan ancient Cuo ["cuo" means house in the Minnan dialect]. The ancient house of the CAI clan is a complete residential building with a well-organized layout, a unified and harmonious architectural style, and distinctive details. Its architectural form has many typical features of folk dwellings in southern Fujian, making it a representative example. It is currently well preserved, so field investigation and recording are convenient. For these reasons, this study selects the ancient house of the CAI clan as a specific case of folk dwellings in southern Fujian and investigates professional knowledge and spatial cognition.

2.3.2 Learning content

This experiment aims to teach professional knowledge through explanations of knowledge points and to improve spatial cognition through immersive spatial experience. The panoramic video showcases typical outdoor scenes of the Cai clan ancient houses, which are neatly arranged and adjacent to one another. The IVR and DVR technologies use the same scene model, built in SketchUp, to restore the original style of the buildings as accurately as possible. Figure 3 shows a comparison between the virtual scene and the live video. The three technical groups use the same knowledge content, music, and dubbing.

Fig. 3 Comparison of the effects of virtual scene (left) and live video scene (right)

Figure 4 illustrates the selected knowledge points, which focus on four aspects: the overall introduction of the buildings, building space, building materials, and decorative details. The overall introduction covers the orientation of the buildings and the reasons for it. The building space section introduces the typical exterior space types, including flagstone and firebreak alleys. The building materials section describes in detail the typical characteristics of building materials in southern Fujian. The decorative details section explains the combination of carving and color and three representative decorative features: red brick parquet, dovetail ridge, and red brick relief. The spatial cognitive experience mainly concerns the overall relationships among the buildings of the Cai clan ancient house. Through roaming, participants experience the spatial sequence, layout relationships, and scale changes.

Fig. 4 Roaming route and knowledge diagram

2.3.3 Interaction software

The 360-degree PT content was produced mainly in Adobe Premiere Pro, a video editing application, which was used to edit the footage captured by the panoramic camera, optimize the visual effects, and add music and dubbing. Unity3D was selected as the interaction development tool for both IVR and DVR. The content and interaction of the two VR technologies are exactly the same.

2.4 Experimental questionnaire design

The questionnaire used in this research can be consulted in the appendix.

2.4.1 Spatial cognitive ability

The Likert scale method was used to subjectively score the architectural features, spatial relations, dimensional transformation, and scale perception. To improve the accuracy of the data, objective test questions were added to assess scale prediction and space identification. Scene recognition not only measures spatial cognition but also reflects the depth of students' learning of architecture. It is one of the test questions used to measure the learning effect.

2.4.2 Learning state

Questionnaire items were written for five representative psychological indicators, and responses were quantified on a seven-point Likert scale. The wording of the items drew on established psychological scales. The Flow Short Scale (FSS), proposed by Rheinberg, Vollmeyer, and Engeser (Rheinberg et al., 2003), was used to measure flow; its key dimensions are absorption by activity (ABA) and fluency of performance (FP). The description of flow used in this questionnaire was based on the concentration aspect of the ABA dimension.

2.4.3 Virtual experience effect

This part of the questionnaire consisted of two main components: subjective evaluation and presence experience. The subjective evaluation included four questions on favorability, recognition, the description of the space as spacious or narrow, and the description of elements as gorgeous or concise. In psychology, there are several established scales for assessing virtual simulation scenarios, such as the IPQ, MEC-SPQ, ITC-SOPI, and TPI. Some scholars have tested these scales empirically by comparing the virtual experience of different technologies. They found that the ITC-SOPI scale was not effective in distinguishing between different types of immersive technologies, while the MEC-SPQ was more sensitive than the TPI because it covers more dimensions. The MEC Spatial Presence Questionnaire (MEC-SPQ) was designed based on the MEC Two-level Model of Spatial Presence (Vorderer et al., 2003). The scale consists of multiple dimensions, including Spatial Presence, which comprises Self Location (SPSL) and Possible Actions (SPPA). These two dimensions can be treated as an independent scale, the Spatial Presence Experience Scale (SPES). Based on this research, the measurement of presence in this questionnaire follows the SPES within the MEC-SPQ.

2.4.4 Learning effect test

The learning effect test mainly assesses students' understanding of key concepts, such as roof type, orientation, actions for firebreak alleys, building materials, and flagstone functions. Figure 5 illustrates the key elements of architecture learning. The questions cover various aspects, including spatial identification problems, and assess students' knowledge through different question types, such as single choice, multiple choice, fill-in-the-blank, and picture recognition.

Fig. 5 Architecture learning factors

3 Experimental methods

3.1 Experimental design

This study collects objective quantitative data to analyze and compare the effectiveness of technology applications in teaching historical buildings. Specifically, it seeks to answer the following research question: Are there significant differences in the application effects of various virtual technology groups in the teaching of historical buildings?

The experiment employs virtual learning and a questionnaire survey as research methods. Objective quantitative data are mainly collected through the questionnaire, which includes two parts: an objective test and a subjective survey. The objective questions primarily assess knowledge content, and the resulting scores are combined with learning time. For the subjective questions, the subjects rate their perceptions on a scale of 1 to 7, using items adapted from established psychological scales in Likert format. This approach quantifies subjective feelings and facilitates a horizontal comparison of differences between groups.

The main purpose of the experiment is to compare different technology groups. Variables are controlled to keep the environment, content, time, and difficulty as consistent as possible, so that the study can focus on the between-group differences introduced by the technology variable. The content of the two virtual reality groups is identical, with differences only in the devices used. The content and interaction methods of the 360-degree PT, however, differ from those of the two virtual reality groups. The learning content in the 360-degree PT is presented through real-life videos, which include more detailed scene elements such as people walking, parked electric cars, surrounding trees, and daily objects. These details are omitted in the virtual reality scenes. In addition, the voice explanations in the 360-degree video have no subtitles, and the format lacks interactivity and autonomy. Therefore, in order to study the impact of technology on learning outcomes, the 360 group, which represents a relatively traditional and restrictive learning environment, is used as the control group. A horizontal comparison with the virtual reality groups gives a clearer understanding of the differences in learner performance and experience under different technology conditions.

3.2 Experimental equipment and participants

The experimental equipment used by the three technical groups differed. For the 360-degree PT group, the video was shot with an Insta360 EVO 180° folding panoramic camera with a stabilizer to ensure effective video shooting, and participants viewed it through an HTC VIVE, one of the most common VR head-mounted display (HMD) devices on the market, which offers good value for money and stable performance. The IVR group also used the HMD equipment. The DVR group used a desktop computer, including a keyboard, mouse, monitor, and host, with a Dell U2721DE screen for display.

Students selected for the experiment were limited to architecture and planning majors with a certain knowledge base of historical architecture. This facilitated experimental learning, allowed for a more accurate understanding of the professional terms in the questionnaire, and made it convenient to compare with traditional learning methods. Participants had not previously experienced the space scene or visited the ancient house of the Cai clan, nor had they studied folk dwellings in southern Fujian. The final sample included 60 architecture and planning students (33 women, 27 men) divided into three groups of 20 people each: the 360-degree PT group (360 group), the IVR technology experimental group (IVR group), and the DVR technology experimental group (DVR group). Table 2 shows the grade ratio and male-to-female ratio of each group, which were controlled to be the same. Each group consisted of 3 or 4 undergraduate students and 16 or 17 graduate students, with 9 males and 11 females. Participants were required to meet specific physiological conditions, including the absence of heart disease, dizziness, and other illnesses, as well as the absence of any discomfort reactions to 3D products. Additionally, participants were required to have naked-eye visual acuity, or visual acuity corrected with contact lenses, of at least 5.0, because eyeglass frames would hinder proper use of the head-mounted display (HMD) device or cause discomfort. Prior to the experiment, participants were informed about the potential risks involved and were required to sign an informed consent form.

Table 2 Participants information statistics

3.3 Experimental procedure

Figure 6 illustrates the specific process of the experiment, while Fig. 7 shows the experimental scene.

Fig. 6 Schematic diagram of experimental process

Fig. 7 Experimental scenes of different technical groups

3.3.1 Experimental process description and equipment debugging

The first step was to describe the experimental process to the participants, confirm that they understood the task, and have them sign an informed consent form. Next, the equipment was fitted and debugged to ensure that it operated normally and did not interfere with the experiment.

3.3.2 Concrete experiment

The learning program was opened, and the specific operation method was explained to the students to ensure that they understood how to use it. Formal learning began once the students were familiar with the operation method. The experimenter observed the learning progress of the participants and recorded the total duration of the entire learning process, as well as the participants' conditions.

3.3.3 Questionnaire survey

After completing the learning task, the participants removed the equipment and completed the questionnaire.

3.3.4 Data analysis

The results of the questionnaire were analyzed and summarized according to the evaluation system.

3.4 Experimental learning tasks

The learning process for the 360 group did not require any interaction. Students simply followed the video and listened to the explanations. The order of the knowledge points presented followed the sequence of "general introduction-orientation-material characteristics-flagstone-red brick relief-dovetail ridge-firebreak alleys-red brick parquet-construction characteristics". The entire duration of the experimental video was approximately 7 min.

Both VR groups require students to navigate according to the arrow-guided route on the ground, providing them with a general understanding of the overall spatial scene. The navigation paths in both technologies are consistent with those in the 360-degree video. During the navigation process, there will be voice-over explanations to introduce the overall features of the Cai clan ancient house. When students reach the endpoint of the navigation path, a task instruction will be triggered, guiding them to collect puzzle pieces scattered throughout the scene. There are a total of 8 puzzle pieces, each located in a position related to specific knowledge content. Students are required to carefully observe different parts of the building to discover the puzzle pieces. Each time a puzzle piece is clicked, a text and image description will appear, accompanied by voice-over explanations. Students can close the window by clicking "OK" after completing the learning process. The sequence in which students search for the puzzle pieces is random. Therefore, the learning speed and progress depend on individual learning capabilities, allowing students to adjust autonomously. If students wish to review the content, they can click on the same puzzle piece again. Once all 8 pieces are collected, a completion notification window will automatically appear, displaying the complete map of the Cai clan ancient house and explaining its cultural heritage value. This marks the completion of the entire learning process.

4 Data analysis

Statistical analysis in this experiment was conducted using SPSSAU (version 20.0.0) with a variety of statistical methods (Fig. 8). Initial analysis focused on the reliability and validity of the data. Reliability analysis was used to assess the credibility of the quantitative data, while validity analysis was used to evaluate whether the questionnaire items were appropriate. A Cronbach's α reliability coefficient greater than 0.8 indicated high reliability, a KMO value greater than 0.6 indicated good validity, and communalities greater than 0.4 for all research items indicated effective information extraction.
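As an illustration of the reliability threshold described above, the following minimal sketch computes Cronbach's α from raw item ratings. The study itself used SPSSAU; the function name `cronbach_alpha` and the response matrix are hypothetical and are not study data.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """item_scores: one list per respondent, each holding k item ratings."""
    k = len(item_scores[0])
    # Sample variance of each item across respondents
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each respondent's total score
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical 7-point Likert responses (5 respondents x 3 items), not study data
responses = [
    [5, 6, 5],
    [3, 3, 4],
    [6, 7, 6],
    [2, 3, 2],
    [4, 4, 5],
]

alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha = {alpha:.3f}")
print("high reliability" if alpha > 0.8 else "below the 0.8 threshold")
```

The same function can be applied separately to each group of items (for example, the items of one indicator) to check that every subscale clears the 0.8 threshold.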

Fig. 8 Flow chart of data analysis

Because the participants differed in gender and grade level, independent sample t-tests were conducted to examine whether these variables led to differences in the results. The gender variable was categorized as male and female, while the grade level variable was categorized as graduate and undergraduate. A p-value greater than 0.05 was taken to indicate that the data did not differ significantly between the categories.

Analysis of differences between technical groups primarily employed ANOVA. Data were first tested for normality and homogeneity of variance. In practice, the graphical method illustrated in Fig. 8 was used for normality testing, with data identified as normally distributed if they exhibited a symmetric bell shape and had absolute values of kurtosis less than 10 and absolute values of skewness less than 3. If the homogeneity of variance test was not significant (p > 0.05), conventional ANOVA was used. If the test was significant (p < 0.05), indicating unequal variances, Welch ANOVA was used.
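The decision procedure above can be sketched as follows. This is an illustration, not the SPSSAU implementation: the function names and the `ratings` data are hypothetical, skewness and kurtosis are computed from population moments, kurtosis is assumed to mean excess kurtosis, and the p-value of the homogeneity test is assumed to be supplied by a statistics package.

```python
from statistics import mean

def skewness(xs):
    """Third standardized moment (population form)."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m3 = mean([(x - m) ** 3 for x in xs])
    return m3 / m2 ** 1.5

def kurtosis(xs):
    """Excess kurtosis (population form); 0 for a normal distribution."""
    m = mean(xs)
    m2 = mean([(x - m) ** 2 for x in xs])
    m4 = mean([(x - m) ** 4 for x in xs])
    return m4 / m2 ** 2 - 3

def acceptably_normal(xs):
    # Numerical part of the rule in the text: |skewness| < 3 and |kurtosis| < 10
    return abs(skewness(xs)) < 3 and abs(kurtosis(xs)) < 10

def choose_anova(homogeneity_p, alpha=0.05):
    # Significant homogeneity test -> unequal variances -> Welch ANOVA.
    # The boundary case p == alpha is treated as non-significant (an assumption).
    return "Welch ANOVA" if homogeneity_p < alpha else "conventional ANOVA"

# Hypothetical 7-point ratings, not study data
ratings = [3, 4, 4, 5, 5, 5, 6, 6, 7]
print(skewness(ratings), kurtosis(ratings), acceptably_normal(ratings))
print(choose_anova(0.20))  # variances not significantly different
print(choose_anova(0.01))  # variances significantly different
```

The numerical check complements, and does not replace, the visual inspection of the bell shape described in the text.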

4.1 Architectural learning dimension: Acquisition of professional knowledge

The mastery effect of architectural professional knowledge was assessed through test scores, duration, and learning efficiency based on score/duration. The unit for measuring study duration is minutes. Initial analysis focused on the reliability and validity of the data, which were found to be excellent and suitable for further analysis. Independent sample t-tests of gender and grade variables revealed no significant differences, indicating that learning outcomes of architectural professional knowledge were not affected by gender or grade. Following normality and homogeneity of variance testing, ANOVA was used to directly compare differences. Results are presented in Table 3.

Table 3 Acquisition of professional knowledge—ANOVA

4.1.1 Learning score

Objective data revealed significant differences in learning scores among the technical groups. The IVR group had the highest score (71.53 ± 17.09), followed closely by the DVR group (69.58 ± 13.47), with the lowest score observed in the 360 group (57.35 ± 19.11). The 360 group content was relatively simplistic, with attention focused solely on the scene, potentially leading to a disregard for explanations. In contrast, the virtual scene of the IVR group included dubbing, text, and pictures, with each knowledge point presented as an independent module, resulting in clearer understanding. Participants from the IVR group were able to control their own learning time and speed, contributing to their superior learning outcomes.
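For readers who want to relate the reported group statistics to the ANOVA result, the sketch below reconstructs a one-way F statistic from summary values alone. It assumes that the ± values are standard deviations and that each group has 20 participants (as stated in Section 3.2); `anova_from_summary` is a hypothetical helper, and its output only approximates the values in Table 3.

```python
def anova_from_summary(means, sds, ns):
    """One-way ANOVA F statistic from group means, SDs, and sizes."""
    k = len(means)
    n_total = sum(ns)
    grand_mean = sum(n * m for n, m in zip(ns, means)) / n_total
    # Between-group and within-group sums of squares
    ss_between = sum(n * (m - grand_mean) ** 2 for n, m in zip(ns, means))
    ss_within = sum((n - 1) * sd ** 2 for n, sd in zip(ns, sds))
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Learning scores reported in the text, in the order IVR, DVR, 360
f_stat, df1, df2 = anova_from_summary(
    means=[71.53, 69.58, 57.35],
    sds=[17.09, 13.47, 19.11],  # assumed to be standard deviations
    ns=[20, 20, 20],
)
print(f"F({df1}, {df2}) = {f_stat:.2f}")
```

Under these assumptions the statistic comes out at roughly F(2, 57) ≈ 4.2, which is above the 5% critical value of about 3.16 and therefore in line with the significant difference reported here.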

4.1.2 Learning duration

Regarding learning time, the 360 group had a fixed duration, whereas the IVR group (10.85 ± 2.03) took significantly longer than the DVR group (7.78 ± 1.94); the ANOVA revealed a highly significant difference (p < 0.001). Although the IVR content is the same as that of the DVR group, IVR requires more learning time because operating the device is more complex.

4.1.3 Learning efficiency

Learning efficiency, defined as the ratio of learning score to duration, also exhibited significant differences (p = 0.012 < 0.05). As the IVR group required more time for learning, it demonstrated the least efficiency (6.82 ± 2.21), while the DVR group exhibited the highest efficiency (9.44 ± 3.04). Overall, the IVR group achieved better learning outcomes, but at a longer duration, while the DVR group exhibited the highest learning efficiency.
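Note that dividing the IVR group's mean score by its mean duration (71.53 / 10.85 ≈ 6.59) does not reproduce the reported efficiency of 6.82. This is expected if efficiency was computed per participant and then averaged, which is an assumption about the procedure. The sketch below, using hypothetical participants, shows why the two calculations differ.

```python
from statistics import mean

def efficiency(score, minutes):
    """Learning efficiency as defined in the text: score divided by duration."""
    return score / minutes

# Hypothetical (score, minutes) pairs, not study data
participants = [(80, 8.0), (60, 12.0), (70, 10.0)]

# Efficiency computed per participant, then averaged
mean_of_ratios = mean([efficiency(s, t) for s, t in participants])

# Mean score divided by mean duration
ratio_of_means = mean([s for s, _ in participants]) / mean([t for _, t in participants])

# Same comparison with the IVR group means reported in the text
ivr_ratio_of_means = 71.53 / 10.85

print(f"mean of ratios: {mean_of_ratios:.2f}")
print(f"ratio of means: {ratio_of_means:.2f}")
print(f"IVR ratio of means: {ivr_ratio_of_means:.2f} (reported mean efficiency: 6.82)")
```

Because the per-participant average also has a standard deviation, it is the form that can be entered into an ANOVA, which fits the way the efficiency values are reported.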

4.2 Architectural learning dimension: Spatial cognitive representation

The data in this section met the requirements for reliability and validity, with no significant difference observed in the grade variable. However, a significant difference was observed in the gender variable for one of the dimensional transformation indicators, namely the ability to "have a clear concept of spatial arrangement, and even draw a schematic plan" (p = 0.013 < 0.05). The score for males (4.85 ± 1.32) was significantly higher than that for females (3.91 ± 1.49). This difference may be due to gender differences in spatial perception, or to differences in confidence, as women may tend to rate themselves more conservatively. Furthermore, the data were normally distributed, and some indicators showed significant results in the homogeneity of variance test. Welch ANOVA was used for those indicators, while conventional ANOVA was used for the remainder (Table 4).
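The reported gender difference can be checked approximately from the summary statistics. The sketch below computes a pooled-variance independent-samples t statistic, assuming that the ± values are standard deviations and that the comparison used the full sample of 27 men and 33 women; `pooled_t` is a hypothetical helper rather than the SPSSAU routine.

```python
from math import sqrt

def pooled_t(mean1, sd1, n1, mean2, sd2, n2):
    """Independent-samples t statistic with pooled variance."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df

# Males: 4.85 +/- 1.32 (n = 27); females: 3.91 +/- 1.49 (n = 33)
t_stat, df = pooled_t(4.85, 1.32, 27, 3.91, 1.49, 33)
print(f"t({df}) = {t_stat:.2f}")
```

Under these assumptions the statistic is about t(58) ≈ 2.56, which exceeds the two-sided 5% critical value of about 2.00 and is consistent with the reported p = 0.013.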

Table 4 Spatial cognitive representation—ANOVA

4.2.1 Architectural feature perception

Regarding perception of building features, significant differences were observed in building materials and color changes. Building materials exhibited a p value of less than 0.01, indicating a highly significant difference. For color changes, the 360 group had the highest score (6.15 ± 0.88), followed by the IVR group (5.75 ± 1.29), and the lowest score was observed in the DVR group (5.10 ± 1.17), indicating that panoramic technology is better at showcasing color changes, while the DVR group's color change performance was less apparent.

4.2.2 Relationship of space, Transformation of dimensions, Change in scale

There was no significant difference observed in the ANOVA results for these three aspects, with the data being relatively similar. This may be attributed to the short experience time and the simple case space, resulting in no significant gap between the different technologies. Regarding objective tests, including scale estimation items and spatial identification choice items, chi-square analysis revealed no significant difference between the different techniques. The scale estimates for each group were very close to one another, but not very accurate. For spatial identification, the accuracy rate for the 360 group was 67.5%, noticeably lower than the 82.5% for the IVR group and the 80% for the DVR group, although the difference was not statistically significant. This suggests a possible disadvantage of panoramic technology in spatial cognition.
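The chi-square comparison of spatial identification accuracy can be illustrated as follows. The response counts are an assumption: the reported percentages (67.5%, 82.5%, 80%) are consistent with 40 scored responses per group (for example, two items for each of 20 participants), but the actual counts are not given in the text. The p-value shortcut `exp(-stat / 2)` is exact only for 2 degrees of freedom, which is the case for a 3 × 2 table.

```python
from math import exp

def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df

# Assumed counts (correct, incorrect) out of 40 responses per group
counts = [
    [27, 13],  # 360 group, 67.5% correct
    [33, 7],   # IVR group, 82.5% correct
    [32, 8],   # DVR group, 80% correct
]

stat, df = chi_square(counts)
p_value = exp(-stat / 2)  # chi-square survival function, valid for df == 2 only
print(f"chi2({df}) = {stat:.2f}, p = {p_value:.3f}")
```

With these assumed counts the test gives p ≈ 0.24, so the gap in accuracy rates would not reach significance, which matches the chi-square result reported in this section.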

4.3 Architectural learning dimension: Learning state performance

The data in this section met the requirements for reliability and validity, with no significant differences observed in the grade or gender variables. The data were generally normally distributed, and the homogeneity of variance was not significant. The results are presented in Table 5.

Table 5 Learning state performance—ANOVA.

The only indicator of learning status that exhibited significant differences among the different groups (p = 0.013 < 0.05) was learning autonomy. The disparity mainly stemmed from the difference between panoramic video and the two virtual scene groups, with a difference of more than 1.1, while the two VR groups were close, with a difference of only 0.15. The learning process for the 360 group was relatively passive: participants could only watch and listen to audio explanations in place, with no interaction and no control over their learning progress. Thus, panoramic video has limitations in terms of learning autonomy. Although the difference in learning achievement was not statistically significant (p = 0.11 > 0.05), the mean for the 360 group was noticeably lower than the means for the two VR groups, which aligns with the objective learning scores reported above.

4.4 Technology application dimension: Virtual experience effect

First, the reliability and validity of the data were analyzed, and all requirements were met. Subsequently, t-tests were conducted to examine gender and grade differences, with no significant differences observed. Normality and homogeneity of variance tests were then performed, and the data were found to be generally normally distributed. The homogeneity of variance test was significant for some indicators, so Welch ANOVA was used for that portion of the analysis.

4.4.1 Subjective spatial evaluation

Initially, the subjective spatial evaluation was analyzed, with no significant differences observed. This suggests that the spatial experience of each group was similar, and there were no noticeable differences. While this may indicate that the technical differences were not significant, it could also be attributed to the simplicity of the scene space itself.

4.4.2 Presence

The SPES was used to measure the sense of presence, and the item data were combined into two composite variables, Self Location and Possible Actions (Table 6). Depending on the homogeneity of variance test results, either conventional ANOVA or Welch ANOVA was conducted, and significant differences were found for both variables.

Table 6 Virtual experience effect—ANOVA

"Self Location" emphasizes the sense of being present in the scene, with the highest value observed in the IVR group (5.22 ± 0.81) and the lowest in the DVR group (4.22 ± 1.14). The 360 group fell between the two. A likely explanation is that the 360 and IVR groups both used HMD devices, whose immersive environment produces a stronger sense of presence.

"Possible Actions" refers to the degree of behavioral autonomy in a virtual scene. The results showed significant differences among different technology groups (p = 0.011 < 0.05), with the highest mean value observed in the IVR group (5.09 ± 0.89), followed by the DVR group (4.64 ± 1.06), and the 360 group (4.11 ± 1.00) having the lowest value. The passive scene movement of the 360 group only allowed for movement with the video camera, without active roaming. In contrast, the IVR group, which uses an immersive HMD device, had a stronger sense of control over actions in the virtual scene compared to the DVR and 360 groups.

5 Discussion

5.1 Technical adaptive discussion summary

This study uses residential building types in southern Fujian as a learning case and conducts a comparative study of the entire process of typical virtual technologies, from preliminary development requirements to final application effects. We can draw comparative conclusions on the suitability of these technologies:

  • IVR technology is the most suitable. It yielded the highest learning score, the highest learning autonomy, and the strongest sense of presence, but it is relatively time-consuming. If immersive effect is a priority, IVR technology is recommended.

  • 360-degree PT has a medium suitability. Its application effect is limited by content form, but it performs best in presenting architectural features. Therefore, if the focus is on architectural effect and showing more details, 360-degree PT is recommended.

  • DVR technology has a medium suitability. It yielded the highest learning efficiency, a balanced overall performance, and the most convenient implementation conditions. Therefore, if the goal is to have technology with wider application scenarios, DVR technology is recommended.

  • AR technology and MR technology have poor adaptability. These two technologies do not provide an immersive spatial experience and are thus unsuitable for the learning needs of historic buildings.

Comparisons among these technologies have also been made in other disciplines. In the field of architectural cultural heritage preservation, AR is more suitable for enhancing exhibition effects, VR is more suitable for virtual museums, and MR is more suitable for indoor and outdoor reconstruction applications (Bekele et al., 2018). In the fields of clinical physiology and anatomy, both VR and AR are feasible alternative methods, with no significant difference in their effect on performance (Moro et al., 2021a, 2021b). In stroke and asthma courses, AR, VR, and MR can all provide support for future teaching projects (Barteit et al., 2021).

These research findings indicate that different virtual reality technologies have different advantages and applicability in various disciplinary fields. While there may not be significant differences between different technologies in certain fields, in other fields, certain technologies may have better advantages. This study supplements the understanding of performance differences among different types of technologies in the application of teaching historical architecture. Therefore, when selecting virtual reality technologies, specific application needs and goals should be considered. Further research is needed to explore the application and effects of different virtual reality technologies in other disciplinary fields, as well as how to further enhance their performance and user experience.

We found that our results are consistent with previous research comparing the learning effects, engagement, presence, empathy, and other indicators of 360-degree panoramic technology and VR technology, which consistently showed better performance of VR technology (Calvert & Abadia, 2020). When comparing immersive VR and desktop systems, studies have consistently shown that immersive VR performs better in terms of intrinsic motivation, perceived enjoyment, and sense of presence, and is more popular among students (Klingenberg et al., 2020). However, our study revealed some novel findings, such as the positive impact of 360-degree panoramic technology on spatial cognitive abilities in architectural education, as well as the different potential of IVR and DVR in education. Moreover, our study also shed light on the influence of different learning media on historical architectural education, an aspect that has not been extensively explored in the literature.

Additionally, our comparative analysis has helped us identify potential areas for further research. For instance, while previous studies have already demonstrated the effectiveness of virtual technologies in promoting learning, the focus of research has mainly been on fields such as psychology and education. Educational studies primarily emphasize learning experiences and outcomes (Albrecht et al., 2013; Alfalah et al., 2019; Calvert & Abadia, 2020; Ferrer-Torregrosa et al., 2016), while educational psychology further examines the state manifestations of the learning process, such as learning interest, flow, and autonomy (Albrecht et al., 2013). However, further research is still needed to explore the impact of virtual technologies on the specific professional skills required in different disciplinary fields. For example, this study supplements the focus on tacit knowledge, such as spatial experience and cognitive abilities, within the field of architecture. Our research highlights the importance of autonomy in the effectiveness of these technologies, indicating that future studies should explore methods for fostering learner agency and self-directed learning within virtual environments. Overall, our comparative analysis provides a more nuanced understanding of the impact of these technologies on educational outcomes and identifies potential avenues for future research. By situating our research findings within the broader context of existing literature, we hope to make meaningful contributions to the field and enhance our understanding of the application of different virtual technologies in architectural education.

5.2 Technical differences in application effects

5.2.1 Architectural learning dimension: Acquisition of professional knowledge

The study found significant differences in objective learning data related to professional knowledge acquisition. The IVR group demonstrated the best academic performance, while the 360 group demonstrated the worst. The IVR group also took the longest time to learn, while the 360 group took the shortest time. As a result, the DVR group exhibited the highest learning efficiency, while the IVR group exhibited the lowest.

5.2.2 Architectural learning dimension: Spatial cognitive representation

The study revealed significant differences in spatial cognition ability, particularly in the perception of architectural features. The 360 group demonstrated significantly higher scores for building materials and color changes compared to the two VR groups. The results of the objective questions regarding scale estimation and spatial identification were similar across all groups. Furthermore, gender differences were observed in the dimension transformation index, with males scoring significantly higher than females.

5.2.3 Architectural learning dimension: Learning state performance

Among the learning state indicators, only learning autonomy showed a significant difference: the IVR and DVR groups scored significantly higher than the 360 group. The remaining indicators showed no significant differences.

5.2.4 Technology application dimension: Virtual experience effect

Regarding the impact of virtual experience, significant differences were observed primarily in the sense of presence. The IVR group reported the highest sense of presence, while the DVR group reported the lowest self-location perception. The 360 group had the lowest perception of possible actions.

This study aimed to investigate possible significant differences in the effects of different virtual technology groups in teaching historical buildings. The results are summarized in Fig. 9. Overall, IVR technology demonstrated the best application effect among the three technologies, performing well in terms of academic performance, autonomy, and sense of presence. However, it took the longest time to learn and was the least efficient. Panorama technology exhibited clear advantages in feature perception but underperformed in academic performance, autonomy, and other aspects. DVR technology had the highest learning efficiency and demonstrated balanced overall performance.

Fig. 9 Comparison of indicators of significant differences in application effectiveness

5.3 Suggestions for subsequent application

5.3.1 Technology path: Combine different virtual technologies to complement each other

Based on the impact analysis of application effects, it is concluded that high-simulation scene effects, interesting interaction design, easy-to-understand knowledge presentation, and easy-to-operate interactive equipment are key factors in virtual technology learning and application. Regardless of the chosen virtual technology, attention should be given to these aspects.

In practice, different technologies can be combined to broaden the options. For example, 360-degree PT can be combined with VR technology in two ways: virtual models can be placed in real photos or videos for interaction, or real photos or videos can be displayed when students view key spaces or architectural details. This approach lets students learn about real buildings through real scenes and learn interactively through virtual models, so that the advantages of different virtual technologies complement each other and improve learning outcomes.

5.3.2 Learning mechanism: Enrich interactive design and balance interest and teaching

As this experiment is a comparison of basic methods, the design of the learning method and path is relatively simple. Virtual technology, however, can provide creative and realistic interactive operations, such as model building, material changes, and historical scene reproduction, which can enhance the learning experience.

At the same time, the data analysis of IVR technology indicates that enhancing interaction can hinder the recognition of space, which is an essential aspect of spatial cognitive learning. Therefore, subsequent applications need to strike a balance between the appeal of interaction and the professionalism of teaching. Interaction design should enrich the forms of learning interaction without leading students to focus so much on those forms that they neglect the actual knowledge to be learned.

5.3.3 Teaching applications: Clarifying the application positioning of virtual technology

The empirical experiment verified the advantages of virtual technology in teaching historical architecture. Virtual technology can enhance students' understanding of historical architecture by making learning more interesting, visual, and autonomous, and it can compensate for the lack of intuitiveness and interactivity in traditional classroom teaching. In the experiment, students generally approved of and welcomed such virtual learning methods. As such, virtual technology can be an important supplement to traditional teaching methods. In practical applications, however, it is necessary to first clarify the intended positioning and learning objectives of virtual technology and to decide where the emphasis lies, for example on intuitive visual presentation, engaging dynamic interaction, or completeness of knowledge content. Only then can the appropriate virtual technology be selected.

6 Conclusions

This paper introduces an adaptive evaluation system tailored to the goals and needs of historical architecture teaching. We conduct multidimensional comparisons of typical virtual technologies and verify technical differences through empirical research. The research findings on application suitability provide valuable insights into the use of virtual technology in teaching historical architecture and offer practical guidance for selecting the most appropriate virtual technology for subsequent teaching practices. This approach contributes to a deeper understanding of the effectiveness of virtual technology in historical architecture education.

This study focuses on a comparative analysis of representative virtual technologies, and it has some limitations. Some indicators, particularly those for spatial cognition, did not show significant differences, which may be due to the selection of cases and research methods. Future research should aim to enhance the technical level, increase the sample size, cover a wider range of architectural cases, and conduct more comprehensive and systematic analysis. This will deepen our understanding of the relationship between virtual technology and architectural learning, and promote the development of historical architecture teaching methods.