1 Introduction

Science, art, and history museums make extensive use of virtual reality (VR) (Zhou et al. 2022). The user-centered design approach of human–computer interaction (HCI) emphasizes user participation (Case 2013), provides design frameworks (Quezada et al. 2021), and highlights the importance of user experience (UX) in HCI fields such as VR-based museum exhibitions (VRME). Because VR-based interfaces differ from conventional interaction interfaces, the UX design of VRME faces both opportunities and difficulties (Schofield et al. 2018), so proposing a UX model is crucial.

Researchers have used CiteSpace to examine VRME UX studies and create a visual map (Chen & Song 2019). The researchers found 161 studies that use the terms "virtual reality," "museum," and "experience." The keyword co-occurrence map, the cluster map of 2017 and later cluster maps (when this type of literature appeared more frequently) were created using CiteSpace (Q = 0.6833, S = 0.8894, indicating that the clustering structure is significant and persuasive) and drawn in the same map as displayed in Fig. 1.

Fig. 1

Literature keyword co-occurrence map and cluster map generated by CiteSpace

As seen in Fig. 1, many VRME UX researchers rely on empirical methods, which center on real users (Kabassi 2017) and thus aid in UX research.

The map illustrates that many academics have considered cultural heritage to be a type of VRME. The use of digital technology in cultural heritage, such as 3D scanning, photogrammetry, 3D modeling and rendering, is also evolving as a crucial method of preserving and promoting cultural heritage (Li et al. 2010). This allows for the use of VR to improve the exhibition experience. It changes the conventional exhibition method, which traditionally prohibits visitors from touching the exhibits, and allows them to closely interact with virtual cultural artifacts that are nearly identical to the real ones (Kyriakou and Hermon 2019) or immersively enter restored or destroyed historical remains without leaving their homes (Kersten et al. 2017; Pagano et al. 2020).

The map also shows that the word "reality" is commonly used, which reveals VRME’s UX orientation. Since people come from reality, the proportion of reality in virtual applications is a key metric (such as the virtual-reality continuum proposed by Milgram and Kishino (1994) and the "reality, virtuality, mediality" classification method proposed by Mann (2002)). VR’s UX aims to dissociate users from reality and persuade them to see themselves as part of the virtual environment (Flavián et al. 2019). A real-world individual entering VR is like a person going indoors after being outside in the rain: wet clothes take time to dry. A strong VR experience design can help participants immerse themselves more quickly and more consistently for a better experience, much as a fan or air conditioner can quickly dry wet clothes and create a more pleasant environment. A bad VR experience design or an unexpected reality input will break the user’s immersion and return them to reality (von Willich et al. 2019). This is equivalent to piercing the roof of the room: even if the hole is plugged immediately, people must still wait for their wet clothes to dry.

To conclude, it is vital to empirically examine the user experience elements that help users escape reality in VR-based cultural heritage exhibitions (VRCHEs). The user experience design should also address the challenges it encounters and propose possible solutions.

2 Related works

Figure 1’s keyword co-occurrence map indicates that there are few related works on VRCHE UX models. Of course, there are still some related studies, and in other virtual exhibition sectors such as mixed reality (MR), augmented reality (AR), or website technology, there are some works related to UX models that provide a reference for this research.

Baradaran et al. (2022) examined the benefits of VR game-based museum exhibitions using the spatial experience model (Baradaran et al. 2018) and qualitative analysis. The study compared three approaches (VR games, photographs, and videos) for the same exhibition theme and found that VR games enhanced audience engagement, emotion, and learning. The benefits were examined along the dimensions of the spatial experience model and other dimensions derived from qualitative analysis. However, this study limited itself to VR games as a virtual exhibition form; it did not cover other forms or provide design principles, a new model, or a framework. Kabassi (2017) studied the usability and functional evaluation of VR-based exhibitions. This user-centered empirical research reveals both similarities and differences in the evaluation indicators and demonstrates that audience groups vary in their perspectives on virtual exhibition experiences and their components. However, the study does not provide an experience model or a set of guiding indicators, nor does it use empirical research to validate the review findings. Kim (2018) reviewed the literature on virtual exhibition visitor communication, which can guide the design, systematization, and theorization of virtual exhibitions and ensure effective communication between visitors and virtual exhibitions. However, the researchers did not verify the model’s validity with an empirical study. Trunfio and Campana (2020) established a novel approach to evaluate MR’s effect on visitor experience. To validate the model, this research adapts an existing model for the museum MR experience, asks users to rate each variable based on the model for the MR museum application using the importance-performance analysis method, and then conducts quantitative analysis to verify the model.
The model has limitations because this study only tests the model with quantitative analysis and does not try to make it more comprehensive with qualitative techniques. Yi and Kim (2021) defined six types of museum cultural relic encounters. This study combines existing experience models, introduces new experience components, and creates a new experience model. Users who tested an MR museum exhibit application rated its design highly. Later user studies evaluated the new model, which depended more on literature reviews than on empirical research. The authors checked the application design’s logic, but not the model’s soundness. Wu et al. (2022) provided 11 hypotheses on customers’ behavioral intentions toward a digital clothing museum. Quantitative analysis of data from 265 clothing-related respondents revealed significant factors influencing user intentions to use a digital clothing museum. This study showed users’ expectations for virtual museum exhibits and their correlation, but did not use qualitative analysis to explore the link between these expectations. Roussou (2002) designed a virtual reality system for simulating and experiencing ancient Greek rituals and proposed a user-centered method for developing immersive and collaborative virtual environments for cultural heritage. However, this research did not consider different cultural contexts and scenarios, nor did it validate the system and method with empirical and longitudinal studies. Roussou et al. (2013) proposed a framework for evaluating the educational impact of virtual reality learning environments, but they did not test and support their framework and indicators with sufficient data and analysis. Harrington (2020) and (2023) (this researcher is the same person as Roussou mentioned earlier) conducted research on the correlations between user interface elements and emotions.
Both studies adopted a story-based approach to create and assess immersive, interactive, and multimodal virtual exhibits that aimed to foster users’ appreciation of natural beauty and science. However, these two studies have not been further extended to more types of VRCHE. Kennedy et al. (2021) employed a mixed method that combined eye-tracking, emotion measurement, a questionnaire survey, and interviews to measure users’ cognitive and affective outcomes after interacting with an augmented reality-based exhibit. However, they did not consider the diversity of user characteristics and backgrounds, such as prior knowledge, interest, or motivation, and how these might affect users’ learning. Liu and Lan (2021) applied a narrative-based research method that integrated story theory, sensory anthropology, and emotional geography to explore users’ emotional interaction and co-creation with heritage space and digital storytelling. However, they did not critique and analyze the existing quantitative evaluation methods that might have some validity or reliability issues, nor did they examine the dimensions and element relationships of the story co-creation model they proposed.

In addition, it is important to mention that Zou et al. (2021) undertook a literature review and suggested a "Rose model" (Fig. 3), claiming that a VRME has four levels where visitors create a variety of experience features using Norman’s (2004) mental processing method. This study is thorough compared to the related work above, and suggests possible links and causes among different experience components, as well as some conclusions for creating VRCHE. However, the model developed by this study has some limitations due to the research methods used and the lack of empirical validation.

Therefore, the research gaps are as follows: (1) Few studies exist on VRCHE models, frameworks, or assessment indicators. (2) The current model does not work for all kinds of VRCHEs. (3) The existing model is problematic in two aspects due to quantitative empirical research methodologies, namely, the summarized model dimensions lack comprehensiveness, and the model’s element relationships are not analyzed. (4) Few studies present recommendations for design strategies.

3 Research aims

The purpose of this study is to investigate the UX elements in VRCHE that facilitate visitors’ immersion and escapism. This study adopts a mixed-method approach based on grounded theory to address the research gaps identified in the literature review. Specifically, this study first classifies VRCHE exhibitions into four types according to their content and design features, and then conducts experiments with participants to collect both qualitative and quantitative data. Next, this study applies three-level coding and statistical analysis to the data, and uses theoretical coding to develop a comprehensive and detailed model of VRCHE UX that reveals the dimensions, elements, and relationships of the UX. Finally, this study provides design strategies for VRCHE developers and practitioners based on the model and the findings.

4 Experimental procedures based on grounded theory

Figure 2 depicts the experimental procedures. According to the fundamental tenet of grounded theory, research questions should emerge spontaneously throughout the experiment (Glaser and Strauss 1999).

Fig. 2

The experimental procedures based on grounded theory

4.1 Materials

The literature review used to construct the researchers’ “local concept” (according to Glaser and Strauss (1999), this can be viewed as the structural concept) for VRCHE presents the four categories of VRCHE. Zou et al. (2021) describe the four VRCHE categories as follows:

  • Artifact level: VRCHE’s artifact level focuses on tangible cultural heritage or the tangible aspects of intangible heritage. This type of exhibition process is structured, and the visitor gradually comes to understand the exhibited object based on its fixed appearance.

  • Behavioral level: In behavioral-level exhibitions, the visitor actively controls the exhibits and learns how to use them according to the structured process, in which the task design is frequently evident.

  • Spiritual level: The exhibition process of spiritual-level exhibitions is often unstructured, using tactics such as the reconstruction of historical situations or special effects to create an atmosphere. The visitor has no tasks but integrates exhibition pieces in accordance with cognitive approaches to interpret the objects.

  • Creation level: At the creation level, there is often a task design but no defined way to actualize it, providing visitors more freedom to explore.

Therefore, the four categories are task-based and structured, task-based and unstructured, nontask-based and structured, and nontask-based and unstructured exhibitions. A categorization technique such as this can differentiate common exhibition types by UX perspective and sensibly connect them with UX design. Figure 3 shows the four exhibition types in four quadrants.

Fig. 3

The four types of VRCHE generated from the rose model proposed by Zou et al.

Based on the same exhibition theme of "folk toys," the researchers selected the Huarongdao (a Chinese klotski puzzle whose name comes from the Three Kingdoms story), Kongming lock (a Chinese folk wooden puzzle with a tenon-and-mortise structure), tangram, pitch-pot (a throwing game played by ancient Chinese scholars and officials at feasts), and Tower of Hanoi. These toys may be transformed into four types of exhibitions, are useful for diffusion and presentation, and symbolize etiquette, craftsmanship, literature, creativity, and mathematics.

The researchers conducted two experiments. In Experiment A, participants passively experienced different types of VRCHE in set orders so that their UX elements could be explored. In Experiment B, participants determined the order of VRCHEs according to their own motivation so that visitors’ most essential UX expectations could be explored. The final UX model is meant to show how different UX aspects satisfy various VRCHE visitors’ expectations. Therefore, Experiment A’s conceptual category building references Experiment B’s results. All text in the materials was presented in Chinese because it is the participants’ native language. All test materials were produced using Unity (developed by Unity Technologies in the U.S., version 2019.1.9f1) and displayed with an HTC VIVE Pro Eye (manufactured by HTC Corporation in China).

4.1.1 Design of experiment A

Figure 4 depicts the four VRCHEs used in experiment A: the pitch-pot, Kongming lock, Huarongdao, and tangram. The experimental design guideline includes viewing, hearing, and kinesthetics in the whole experiment.

Fig. 4

Experimental material design

The pitch-pot, Kongming lock, Huarongdao, and tangram exhibitions are designed as structured nontask-based, structured task-based, unstructured nontask-based, and unstructured task-based VRCHEs, respectively. All of them include a model of the toy, a brief text introduction, a narrator reading the introduction, some instructions for the task and the toy, and looping background music. Visitors can wander around and pick up the model of the toy to observe it (pitch-pot), solve the puzzle (Kongming lock), trigger the animation (Huarongdao), or put models together to create patterns (tangram). Specifically, after the visitor’s first move in Huarongdao, he or she is sent onto a large Huarongdao chessboard, and the chess pieces morph into life-sized Three Kingdoms characters. The visitor stands stationary on a soldier square on the board. War-themed music and sound effects transform the setting into a battleground. When visitors press the move button, the Three Kingdoms characters move automatically, and visitors move with the chess piece that they stand on, following the Huarongdao rules until the puzzle is solved.
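The two-dimensional classification can be written down as a small lookup structure. The following Python sketch is only an illustration: the class and function names are hypothetical, and the pairing of each rose-model level with an Experiment A exhibit is inferred here from the shared task/structure attributes rather than stated as a table in the study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExhibitionType:
    level: str        # level name from the rose model
    task_based: bool  # does the exhibition give the visitor a task?
    structured: bool  # is the exhibition process fixed?
    exhibit: str      # Experiment A exhibit with the same two attributes


# Pairing of levels and exhibits is an inference from the text, not a quoted table.
TYPES = [
    ExhibitionType("artifact", task_based=False, structured=True, exhibit="pitch-pot"),
    ExhibitionType("behavioral", task_based=True, structured=True, exhibit="Kongming lock"),
    ExhibitionType("spiritual", task_based=False, structured=False, exhibit="Huarongdao"),
    ExhibitionType("creation", task_based=True, structured=False, exhibit="tangram"),
]


def find_type(task_based: bool, structured: bool) -> ExhibitionType:
    """Return the exhibition type that occupies the given quadrant."""
    for exhibition_type in TYPES:
        if (exhibition_type.task_based, exhibition_type.structured) == (task_based, structured):
            return exhibition_type
    raise KeyError((task_based, structured))


if __name__ == "__main__":
    for t in TYPES:
        print(t.level, "|", "task-based" if t.task_based else "nontask-based",
              "|", "structured" if t.structured else "unstructured", "|", t.exhibit)
```

Each combination of the two Boolean attributes appears exactly once, which mirrors the four quadrants of Fig. 3.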

4.1.2 Design of experiment B

Figure 4 illustrates experiment B’s Tower of Hanoi VRCHE. This experiment uses all four types of VRCHE to guarantee that the participants comprehend the Tower of Hanoi before establishing UX expectations. This experiment aims to build a communication context with which to elicit participants’ fundamental expectations.

To start, visitors see a museum scene with a model of the Tower of Hanoi, a text introduction on the left, and three buttons on the right, labeled "Play the Tower of Hanoi," "Explore the hidden meaning of the Tower of Hanoi (it helps the visitor understand the Tower of Hanoi’s mathematical elegance and logic)," and "Design a Tower of Hanoi." The initial scene introduces the Tower of Hanoi as a structured, nontask-based VRCHE; it is not an option but rather a key introduction to the exhibit. In the task-based structured VRCHE "Play the Tower of Hanoi," participants must move the Tower of Hanoi blocks to complete the puzzle. The unstructured, nontask-based VRCHE "Explore the hidden meaning of the Tower of Hanoi" recreates an ancient Indian temple with a 64-piece Tower of Hanoi and an Indian monk in the middle. Visitors are prompted to click the button to view the monk’s introduction to the Tower of Hanoi’s legend (derived from an ancient Indian narrative by the French mathematician) and to feel the wonder of the mathematical law and its value to the world. "Design a Tower of Hanoi" is an unstructured task-based VRCHE in which the visitor can change the style of each module and design a souvenir according to their own preferences.
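For readers unfamiliar with the puzzle, the mathematical law behind the legend can be shown with the standard recursive solution, which needs 2^n − 1 moves for n discs. This Python sketch is an illustration only and is not part of the experimental material; the function names and peg labels are hypothetical.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Return the list of (from_peg, to_peg) moves that solves n discs."""
    if n == 0:
        return []
    # Move n-1 discs out of the way, move the largest disc, then move the n-1 discs back on top.
    return (hanoi_moves(n - 1, source, spare, target)
            + [(source, target)]
            + hanoi_moves(n - 1, spare, target, source))


def min_moves(n):
    """Closed form for the minimum number of moves with n discs."""
    return 2 ** n - 1


if __name__ == "__main__":
    print(len(hanoi_moves(3)))  # 7
    print(min_moves(64))        # 18446744073709551615
```

The 64-piece tower in the temple scene would therefore require 2^64 − 1 moves.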

4.1.3 Test experiment

Before the experiment, participants perform a test (Fig. 4). They read the instructions for the controllers and practice the basic operations, such as position teleporting, clicking a button, tapping a button, and grabbing an object. They also practice the Think-Aloud approach (see 4.2 for details) under observation from the researcher while emulating the virtual agent in the test scenario.

4.2 Data collection: the think-aloud method, observations and interviews

This study gathered data using the Think-Aloud method, which provides rapid feedback and has minimum impact on an activity, and thus better meets the demands of HCI research (Ramm et al. 2018). Using this method, the thoughts that participants naturally generate during a task are collected. Participants’ action protocols during the experiment can also be used for later coding. This study constructed a semistructured interview to complement the data with the thoughts that participants did not speak aloud during the experiment, as well as to explore several Think-Aloud protocols that required further explanation from participants (Solomon 1995; van den Haak et al. 2003). Researchers collected data as follows: (1) Video data was collected throughout the entire experiment by means of a fixed camera (including audio). (2) Audio recording was used to capture relatively clear interview recording data. (3) Screen recording was used to capture think-aloud data as recorded by the headset microphone and operation video of the participants in the VR environment. (4) Memos were used throughout the experiment.

4.3 Pilot study: the emergence of research questions and construction of preliminary theory

4.3.1 Participants

Kabassi’s evaluation concludes that professional and regular users report different VRCHE UX characteristics (Kabassi 2017). The researcher therefore selected both VR-experienced and VR-inexperienced subjects for the study. The experimental strategy was to continuously recruit new volunteers until the researcher had enough data and was unable to develop new insights (Glaser and Strauss 1999). Four participants aged 22 to 25 were chosen: two were female and two were male. One had extensive VR experience (having developed VR apps), one had some experience, and two had no VR experience.

4.3.2 Procedure

The goals are to (1) specify research questions, (2) develop a preliminary hypothesis to improve formal research data collection, (3) identify experimental design flaws, and (4) remove any distinctive characteristics that could affect participants’ experiences.

The experiment used repeated measurements. Each participant participated in Experiment A, which had 24 exhibition sequences, and Experiment B, which had 6 up/down button orders. Repeated measures were used to overcome the order retention effect (Brooks 2012). The researcher set the order of the two trials on the experimental system’s configuration interface to ensure that each participant encountered them in a different order.
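The counts of 24 and 6 correspond to all orderings of four exhibits (4! = 24) and three buttons (3! = 6). The following Python sketch enumerates them; the round-robin assignment function is a hypothetical illustration, since the study only states that the researcher set the order manually in the configuration interface.

```python
from itertools import permutations

EXHIBITS_A = ["pitch-pot", "Kongming lock", "Huarongdao", "tangram"]
BUTTONS_B = ["Play", "Explore the hidden meaning", "Design"]

# Every possible presentation order for each experiment.
orders_a = list(permutations(EXHIBITS_A))  # 4! = 24 exhibition sequences
orders_b = list(permutations(BUTTONS_B))   # 3! = 6 button orders


def assign_order(participant_index, orders):
    """Hypothetical round-robin assignment so consecutive participants get different orders."""
    return orders[participant_index % len(orders)]


if __name__ == "__main__":
    print(len(orders_a), len(orders_b))
    print(assign_order(0, orders_a))
    print(assign_order(1, orders_a))
```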

As described in the “Materials” section of this study, experiment A included four exhibits: pitch-pot, Kongming lock, Huarongdao and tangram. These four exhibits were presented to the participants in a specific order in the experiment. After experiencing each exhibit, the participants could choose to end the current exhibit and enter a 15-s gap, during which they saw a nice room unrelated to the experiment in their headsets. Then they entered the next exhibit directly. In experiment B, the participants saw an initial scene introducing the Tower of Hanoi. Then they chose which exhibit to watch first according to their preference. After watching, they exited to the initial scene and chose the next exhibit to watch.

Participants were required to narrate their behaviors, thoughts, and feelings aloud while completing tasks to transmit their remarkable experience aspects or experience expectations. The experiment lasted one day.

4.3.3 Data processing and result analysis

The process for data processing was as follows: First, all audio data collected in the experiments were transcribed into text. Then, the researchers checked each transcript against the video and screen recordings so that the participants’ original words and the researcher’s interventions were reproduced as accurately as possible. Next, the researchers collected and recorded the action protocols observed in the video (including all noteworthy behaviors with specific meanings, such as laughing or being scared). The following step involved segmenting the data on the basis of the participants’ speech patterns, deleting useless data, and obtaining the original segment data (Solomon 1995). Finally, QDA Miner Lite (developed by Provalis Research in Canada, version 2.0.9) was used to perform open coding, in which the researchers summarized the segment protocols into codes that were as concise as possible (Table 2 in Appendix A lists some examples). The data were categorized into the following three conceptual categories: "experimental process problems," "experience elements," and "experience expectations." Codes that appeared in more than half of the participants’ data were then screened.
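The final screening step can be sketched as a simple count of cases per code. The Python example below is an assumption about how such a filter could be computed; the function name, the participant identifiers, and the sample codes are hypothetical and do not come from the study’s data.

```python
from collections import Counter


def screen_codes(codes_by_participant, min_share=0.5):
    """Keep codes that occur in more than `min_share` of the participants' data.

    Each participant counts at most once per code (a "case").
    """
    n_participants = len(codes_by_participant)
    cases = Counter(
        code
        for codes in codes_by_participant.values()
        for code in set(codes)
    )
    return {code: n for code, n in cases.items() if n / n_participants > min_share}


if __name__ == "__main__":
    # Hypothetical pilot data with four participants.
    pilot = {
        "P1": ["compares B with A", "enjoys feedback"],
        "P2": ["compares B with A", "enjoys feedback"],
        "P3": ["compares B with A"],
        "P4": ["feels dizzy"],
    }
    print(screen_codes(pilot))
```

With four participants, a code must appear in at least three participants’ data to pass the "more than half" threshold.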

The results are as follows: First, all participants compared experiment B to experiment A, showing that experiment B was affected by experiment A. Because the settings of the two experiments are different, experiment A’s effect on experiment B is not the same as experiment B’s effect on experiment A, resulting in an asymmetric order effect. The repeated measurement design of the pilot study is therefore flawed. Second, in experiment B, two participants indicated that they were familiar with the Tower of Hanoi; hence, their choice of next experience and their explanation for that choice were considerably altered, resulting in erroneous findings. Third, the research question emerges naturally as follows: What is the link between VRCHE’s UX elements? Finally, the following conclusions are made on the basis of UX elements and expectations: (1) Different participants’ UX expectations for the same experimental material vary. (2) Different participants’ UX elements for the same experimental material vary. (3) UX experiences are causally related.

4.4 Formal study: theoretical sampling

4.4.1 Experimental modification

Experiments A and B both used an independent sample group design. First, participants continued to be sought until experiment A reached theoretical saturation, and again until experiment B reached saturation. Then, by asking participants about their VR experience before the experiment and assigning them to experiment A or experiment B based on their responses, similar representativeness was achieved between the groups sampled in the two experiments. In addition, the researchers added the following question to the interview in experiment B: “How would you order the three Tower of Hanoi exhibitions and why?” Encouraging users to define an ideal product (Kristensson et al. 2008) gives participants (especially those familiar with the Tower of Hanoi) a greater scope for conversation. Finally, the researchers created the interview strategies based on preliminary theory. First, they asked open-ended questions or questions based on collected Think-Aloud data and then applied the laddering technique by asking participants "why" to elicit the core cause of an experience or the causal relationship among the aspects of an experience.

4.4.2 Participants

The study design and participant selection followed the pilot study. Forty-one volunteers aged 22 to 29 joined. Twenty-five participated in experiment A (14 males and 11 females), and 16 participated in experiment B (9 males and 7 females). The formal experiment and the pilot study used the same recruitment strategy: they recruited participants until new participants could not provide new conceptual categories. To ensure similar VR experience distribution patterns for experiment A and B, the experiments were conducted alternately, not sequentially. The experimenter decided which experiment to assign the participants to after learning their basic information. Figure 5 shows the percentage of VR users. Experiments A and B show similar VR experience distributions.

Fig. 5

Schematic diagram of the VR experience distribution for participants in experiment A and experiment B of the formal study

4.4.3 Procedure

The formal study expands on the pilot study’s preliminary theory and addresses three questions: (1) What are participants’ UX expectations? (2) Why do the UXs of different participants vary? (3) What is the link between the participants’ UX elements?

Figure 6 displays the final 3-day experiment. Except for dividing the participants into two groups for experiment A and B, the rest of the experimental procedure was the same as in the pilot study. Also, the formal experiment’s data collection and processing, excluding coding, were identical to the pilot study. As described in 4.3.3, all the data (video, audio, and observation records, etc.) in the formal experiment were processed into original segment data, preparing for the subsequent data analysis process using coding.

Fig. 6

Photographs from the formal study

4.5 Methodology

The coding approach utilizes QDA Miner and is based on Strauss’s grounded theory analysis method (Strauss 1987).

4.5.1 Open coding

In experiment A, the researchers maintained as much of the original data as possible while summarizing it openly and generating as many new codes as possible. In experiment B, each participant’s data were tagged as having one or more causes. In experiment A, 251 codes were found. In experiment B, 5 codes were found. Open coding examples are shown in Table 2 (Appendix A).

4.5.2 Axial coding

The first step of the axial coding was the clustering of the codes used in experiment B. According to the visitors’ UX expectations, the researchers clustered the three conceptual categories of “learning information about the exhibits,” “entertainment,” and “finishing the task” in experiment B and classified the related codes into these three conceptual categories. Table 3 in Appendix A displays the coding results and the percentage of cases.

The second step was the clustering of the codes used in experiment A on the basis of experiment B and known theories. Clustering improves a model’s explanatory power. In accordance with Zou et al.’s Rose model, the three conceptual categories obtained by clustering in experiment B were "learning," "emotion," and "ease of use," which supports this element of the Rose model. In experiment A, the researchers clustered codes into the categories of "emotion," "learning," "ease of use," and "others." Then, the researchers identified codes that occurred in only one case, checked the original data for similar codes, and deleted the one-case codes if none were found.

The third step involved clustering the new concept categories in experiment A. In experiment A, the researchers formed several new conceptual categories, including "self-evaluation," "comparative evaluation," "induction and summary," "learn more exhibits," and "time perception." Neff’s definition of self-evaluation combines the concepts of self-judgment and self-compassion (Neff 2003). Both have been shown to regulate emotion (Phillips 2021), so the researchers link them to emotion. In their research on VR-enhanced museum experiences, Baradaran et al. (2022) classified "time perception" as "enjoyment" and stressed its interplay with emotion. However, in the experiment, participants who mentioned time perception also mentioned information acquisition. One participant said, "The extended animation did not convey enough important information; therefore, I was bored." The emotional influence is thus indirect, and time perception is a type of central cognition (Spapé et al. 2022), so the researchers classify it as "recognitive" under "learning." Comparative evaluation and induction and summary both require analyzing cognitive material and expressing subjective viewpoints. According to Gutwil (2017), these are both metacognitive skills. According to the Rose model, "learn more exhibits" can be linked to "behavior inclination" even though it was not detected in experiment B.

The next step was reconstructing the concept categories obtained in experiment A, which was conducted in line with Gutwil’s approach regarding think-aloud data, the rescreening of data and the encoding of metacognition-related data (Gutwil 2017). Metacognition is a type of self-regulation; hence, it has three stages, namely, model comparison, model execution, and single-instance inferences, as included in Zimmerman’s self-regulation stage (Zimmerman 2000). The researchers deconstructed metacognition into the five concepts of "using common sense as a model," "taking knowledge learned during visits as a model," "taking the existing knowledge of cultural exhibitions as a model," "taking the existing knowledge of the game as a model," and "taking the existing knowledge of the exhibits as a model." Experiment A provided 121 codes, 5 first-level concept categories, 20 second-level concept categories, and 19 third-level concept categories (Table 4 in Appendix A).

The last step involved constructing relationships. The researchers related each subcategory on the basis of experiment A’s aims and the causal relationships in the original data. They saved the relationship network constructed among the concept categories in Neo4j. Table 5 shows examples of these relationships.
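One way to picture how a causal relationship network could be prepared for a graph database is sketched below in Python. This is an assumption, not the authors’ script: the node label `Category`, the relationship type `LEADS_TO`, the function name, and the sample relations are hypothetical. The sketch only builds parameterized Cypher `MERGE` statements as text and does not connect to a database.

```python
# Hypothetical cause -> effect pairs between concept categories.
RELATIONS = [
    ("interactive feedback", "self-evaluation"),
    ("self-evaluation", "emotion"),
    ("emotion", "behavioral intention"),
]

# MERGE creates a node or relationship only if it does not already exist,
# so repeated categories are stored once.
MERGE_QUERY = (
    "MERGE (a:Category {name: $cause}) "
    "MERGE (b:Category {name: $effect}) "
    "MERGE (a)-[:LEADS_TO]->(b)"
)


def build_statements(relations):
    """Return (query, parameters) pairs, one per causal relationship."""
    return [
        (MERGE_QUERY, {"cause": cause, "effect": effect})
        for cause, effect in relations
    ]


if __name__ == "__main__":
    for query, params in build_statements(RELATIONS):
        print(query, params)
```

Passing the category names as parameters rather than concatenating them into the query text avoids quoting problems in names that contain apostrophes.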

4.5.3 Selective coding

Reducing the three-level conceptual category to two levels simplified experiment A’s coding relationship network. The researchers counted and evaluated cases in each subcategory to determine how many participants had cited each. Table 1 shows 15 subcategories, including 14 with over 50% of cases and one for behavioral intention. Theoretical coding focuses on subcategories and their relationships. The researchers then queried the relationship network using Neo4j Cypher. They separated the relationship network into three steps between two main categories to facilitate observation and subsequent theoretical coding, as illustrated in Fig. 7.
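A bounded path search of this kind can be sketched in plain Python. The example is an interpretation of "three steps between two main categories" as directed paths of at most three relationships; the function name, the sample graph, and this reading are assumptions. In Cypher, a variable-length pattern such as `MATCH p = (a)-[*1..3]->(b)` expresses a similar bounded search, although the study does not report the actual query.

```python
def paths_up_to(graph, start, goal, max_steps=3):
    """Return all directed paths from start to goal that use at most max_steps edges.

    `graph` maps each node to the list of nodes it points to. Nodes are not
    revisited within a path.
    """
    found = []

    def walk(node, path):
        if node == goal and len(path) > 1:
            found.append(path)
            return
        if len(path) - 1 == max_steps:
            return  # step budget used up without reaching the goal
        for nxt in graph.get(node, []):
            if nxt not in path:
                walk(nxt, path + [nxt])

    walk(start, [start])
    return found


if __name__ == "__main__":
    # Hypothetical fragment of a relationship network.
    network = {
        "interactive feedback": ["self-evaluation"],
        "task design": ["self-evaluation"],
        "self-evaluation": ["emotion"],
        "emotion": ["behavioral intention"],
    }
    for path in paths_up_to(network, "interactive feedback", "behavioral intention"):
        print(" -> ".join(path))
```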

Table 1 The proportion of selected subcategories and number of cases (The “Cases” column indicates how many participants’ data contained the corresponding subcategory)
Fig. 7

Schematic diagram of the relationship network between the main concept categories

4.5.4 Code reliability test

Since one researcher performed all the coding, an independent researcher majoring in design performed a reliability test to avoid bias. The completeness and descriptiveness of subcategories affect the reliability of coding, as they affect relationship formation and selective coding. The independent researcher was given all 121 codes from experiment A and asked to cluster them using the Card Method. The independent researcher identified 10 categories, all of which can be found among the original subcategories and none of which offers a descriptive advantage over them. The researchers therefore believe that the original coding results are reliable to a certain extent and should not be changed.

5 Results & discussion: theoretical coding and descriptive statistics

Figure 8 summarizes the codes used and presents the VRCHE UX model, which shows how UX is constructed. Visitors bring real-world knowledge into the virtual environment and retrieve it through cognitive methods such as recall and association. Metacognition forms when this knowledge is combined with the knowledge perceived through the senses in the current exhibition. The metacognition process, together with the sensory perception and cognitive processes, stimulates visitors to form UX and behavioral intentions, including emotion, learning and ease of use. These experience characteristics have complex causal relationships, yet they all culminate in emotion and affect visitors’ behavioral intention. Sects. 5.1–5.4 explain these relationships in detail.

Fig. 8 VRCHE UX model

5.1 Emotion

Visitors enjoy positive self-evaluation. Self-evaluations are produced through task design and interactive feedback. Immediate interactive feedback, a simple but moderately challenging task design, an adjustable task flow, and realistic cues about how to complete the task can help visitors feel good about their experience and induce positive emotions (See Fig. 7A).

Visitors can experience positive emotions through basic recognition methods. Associations are important because they can evoke positive emotions and recollections. Visitors can also use them to judge how well they can perform a task by comparing it to known items. Memorizing exhibit features and comprehending their historical relevance helps visitors increase their understanding and learning pleasure. Some visitors evaluate an exhibit on the basis of how much they learned and how much time they spent interacting with it. An effective experience may produce positive emotion (See Fig. 7B).

Metacognition can stimulate positive emotions. VRCHEs that follow real-world physics (such as gravity or object collision) can make visitors happy. Permission to flout real-world norms, such as permitting visitors to drop exhibits on the floor or allowing virtual avatars to stand on booths, may also generate positive emotion. When visitors model their knowledge on cultural heritage exhibitions, they consider exhibits with high rendering authenticity to be precious cultural relics and come to feel that forms that can be closely interacted with in the virtual world are more interesting than those in traditional exhibitions, generating further positive emotions. Compared to video games, VRCHEs offer more immersive visual and audio experiences and stronger action engagements, which can also generate positive emotion (See Fig. 7C).

5.2 Learning

A user-friendly exhibition improves the learning experience. A controllable task process allows visitors to learn about the exhibits at their own pace. By enhancing the visitor’s cognitive experience, the scene’s rendering can help them appreciate the exhibits’ spiritual and historical importance. A reasonable, unrestricted, and controllable exhibit operation mode can encourage visitor contact with the exhibits and allow visitors to independently investigate how to use and learn from the exhibits (See Fig. 7D).

Metacognition is another important learning component. Through metacognition, a visitor memorizes exhibit content or draws connections between such content and their previous knowledge or their common sense. By contrasting the exhibits’ text introduction with the exhibits, the visitor defines the exhibition’s contents and summarizes the tasks, thus comprehending the exhibits’ features (See Fig. 7E).

5.3 Ease of use

Visitor metacognitive processes are responsible for the positive ease of use that was experienced in the trials. Comparing the VRCHE to video games, visitors can see how some exhibitions use high controllability, flexibility, suitable controller button design, and gamification task design (such as game levels and puzzle solving) to boost the system’s ease of use. The realistic virtual avatar and the way that operation feedback complies with real physical phenomena (such as the controllers vibrating when touching an object) further increase the visitor’s evaluation of the system’s usability when compared to their expectations. Some visitors summarize the previous exhibition’s operating parameters and task completion methodologies in preparation for the next (See Fig. 7F).

5.4 Behavioral intention

The researchers found little evidence that visitors formed any behavioral intention. Only two of the participants in experiment A wanted to immediately engage with the next exhibition. This intention was stimulated either by the visitor believing that the task could not be finished and experiencing unpleasant feelings, or by the visitor believing that the experience was highly engaging and wanting to experience more equally exciting exhibitions (See Fig. 7G).

5.5 Differences in UX elements among different exhibition types

Figure 9a shows the distribution of the number of cases for some subcategories across the four exhibition types.

Fig. 9 The distribution of subcategory numbers of cases in different VRCHE types

In an unstructured task-based VRCHE, visitor self-evaluation is more likely, and visitors can easily overlook interaction with the scene itself. This is because unstructured tasks can promote self-reflection among visitors but require more energy to determine how best to complete the task, making it easier to ignore other elements in the scene.

Visitors to structured nontask-based VRCHEs experienced fewer positive emotions or self-evaluations than other visitors, but their learning experience was richer. Because this kind of VRCHE is similar to traditional museum exhibits, visitors are not confronted with as much novelty. Because this type of exhibition is common, visitors can use their prior knowledge of cultural exhibitions in regular museums to quickly learn about the exhibits’ characteristics, thus retaining more mental energy with which to understand the exhibits through common sense.

In unstructured, nontask-based VRCHEs, visitors use cognitive methods, including association and scene construction, to take in the atmosphere and more scene resources, which then leads to the generation of more associations.

According to the distribution of the number of cases in task-based and nontask-based exhibitions across 15 subcategories (see Fig. 9b for example results), task-based exhibitions more readily promote visitor self-evaluation. Traditional cultural heritage exhibitions are often nontask-based. Therefore, in nontask-based VRCHEs, visitors rely only on their previous knowledge to undertake metacognitive processes.
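The per-type distributions discussed in this section amount to a cross-tabulation of cases by exhibition type and subcategory. The sketch below shows one way to compute such a table. The record layout, the participant identifiers, the data, and the function name `cases_by_type` are hypothetical; a "case" is assumed to mean one participant whose data contains the subcategory, as in Table 1.

```python
from collections import defaultdict

# Hypothetical records: (participant, exhibition type, subcategory).
# These are made-up values, not the study's data.
RECORDS = [
    ("P1", "unstructured task-based", "self-evaluation"),
    ("P1", "unstructured task-based", "self-evaluation"),  # repeat, same case
    ("P2", "unstructured task-based", "self-evaluation"),
    ("P1", "structured nontask-based", "learning"),
    ("P2", "structured nontask-based", "learning"),
    ("P3", "structured nontask-based", "self-evaluation"),
]


def cases_by_type(records):
    """Count distinct participants per (exhibition type, subcategory) pair."""
    seen = defaultdict(set)
    for participant, exhibition_type, subcategory in records:
        seen[(exhibition_type, subcategory)].add(participant)
    return {key: len(people) for key, people in seen.items()}


counts = cases_by_type(RECORDS)
```

Counting distinct participants rather than mentions keeps a talkative participant from inflating a subcategory's case number.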

6 Conclusion

This study verifies some research on VRCHE experience models to a certain degree. This study also categorizes VRCHE into four types resulting from a literature review. It then designs two trials for each category and proposes an experience model. A literature review and experimental data are used to explain how UX elements are developed and how they interact, and its conclusions can be applied to all major VRCHEs. Meanwhile, this study offers a unique perspective by completing a grounded theory practice through VRCHE model research. It ensures the validity of the model through the integration of empirical research and a literature review and uses the advantages of grounded theory to identify new UX elements and new causal relationships among elements that have not been summarized in other studies. Based on experimental data, this paper offers a thorough design strategy to provide insights for VRCHE designers and organizers.

6.1 Design strategy

Understanding visitor characteristics: As shown in the model in Fig. 8, the visitor is at the center of this model and interacts with both virtual and real elements. Therefore, before choosing a VRCHE type and developing content, the visitor attributes should be described. Visitor characteristics influence the UX path, or how users generate certain experiences. For example, in this study’s experiment, some video game players tend to view VRCHE as a kind of game. They will compare their experiences with games and expect more engaging scene content, action interactions, and audio-visual effects. The rest of the audience will compare their experience with museums, expecting a more organized exhibit display and simpler tasks because they view VRCHE as legitimate museum exhibits. The main traits of the exhibition audience will affect how the UX path is planned later.

Planning the UX path according to visitor characteristics: As Fig. 7 shows, all the elements ultimately point to “emotion” and its subcategories, indicating that “emotion” is the final outcome or purpose of all the elements. Therefore, the UX path should be planned based on the visitor and the specific exhibits, using this study’s findings as a reference. Each path should lead to a positive emotion. For example, suppose a project wants to construct a VRCHE about Chinese traditional paper-cutting art for children, and assume that most children have more game-related knowledge, less information about paper-cutting, and less willingness to respect social rules (this assumption only illustrates how to apply the strategy and may not reflect the real situation). In that case, the designer should gamify the exhibition design to encourage children’s metacognitive processes, which directly produces some positive emotional experiences. The designer should then improve system usability by designing gamified levels and increasing operational freedom, which enhances the metacognitive process and the usability experience and thus indirectly produces further positive emotional experiences. Conversely, under the same assumption, providing large amounts of textual information about the exhibits and recreating the real museum as accurately as possible would be an ineffective way to improve the emotional experience, whether directly or indirectly.

Selecting VRCHE type according to UX path: According to the data analysis in Sect. 5.5, different VRCHE types focus on forming different experience contents. The advantages of different VRCHE types should be fully utilized and multiple VRCHE types should be combined to avoid any drawbacks. For example, a motivating task achievement display or assessment part can be added to the task-based, unstructured VR culture show. This can stimulate the audience’s self-evaluation or self-reflection, making them feel that they have gained something from the exhibition. Moreover, it is possible to integrate various VRCHE types or present different exhibition types in stages. By contrasting presentations at different times, this can compensate for the experience limitations of some exhibition forms and give the audience a sense of surprise.

Designing the continuity of the virtual world and the real world: A VRCHE is driven by visitors’ reflections regarding reality and virtual reality. In this study, these reflections appear mainly as various metacognitive processes. When designing the continuity between the virtual and the real, it is possible to match the virtual world with the real world or to keep a slight mismatch with it. The audience may have a positive experience if the real world’s laws are followed or if the audience’s desire to conform to social norms is satisfied. However, there are cases when viewers have fascinating experiences with VR events that violate the real world’s laws. In addition, the analogy of "preheating" can be used to increase visitor immersion. Following the metaphor offered in the introduction, people always come from the real world, and it is impossible for them to immerse themselves in VR all day. By offering individuals an "umbrella" and letting them hold it before entering VR, they will "dry their clothes" in advance without affecting real-world actions. Then, the clothes will dry faster once they enter. This umbrella can take the form of either MR or AR.

In conclusion, the design strategy for a VRCHE involves investigating the visitor’s intentions toward reality (metacognition model) and creating a logical path (including the choice of exhibition types) designed to enable visitors to perceive a certain experience value from the continuity between reality and virtual reality.

6.2 Limitations

Due to the experimental design (in which only think-aloud transcripts and semistructured interviews were used to obtain participant ideas, while no structured method was used to ask specific questions), there were no detailed and reliable experimental data to support the formation of behavioral intention and its relationship with other experience factors. There may also be UX elements that this study did not uncover. The conclusions of this study are exclusively based on experiments and literature reviews and do not refute the conclusions of other researchers. In addition, this study grouped participants only by their VR experience and did not account for other demographic variables. (For instance, due to COVID-19, all participants were Chinese undergraduate or graduate students from a single university. They likely had comparable cultural contexts and limited perspectives.) Additionally, the lack of accurate data on the distribution of VR experience across a larger group may compromise the validity of the quantitative analysis. An important goal of future work should be increasing demographic variable diversity and sample representativeness.