1 Introduction

Virtual Reality (VR) technology emerged in the 1950s and its early rudimentary experiences have evolved into what is known today as Immersive Virtual Reality (iVR) [1]. Since then, iVR technology has undergone significant development, and is a relevant discipline for society [2]. Early VR technology was primarily restricted to military and research activities [3]. However, new uses for VR emerged over the years, such as education and training [4], as well as the dissemination of Cultural Heritage (CH). Combinations of CH with VR technology began to be explored around the 1990s [5]. Older VR projects, now known as low-immersive VR, the main feature of which is viewing on conventional screens or other 2D displays [4], differ greatly from modern iVR projects. The degree of interaction in their environments, or Virtual Worlds (VW), was minimal, and the environments were far from realistic, mainly due to the limitations imposed by the technology available at the time. Several reviews compile those low-immersive VR applications [6,7,8,9]. Those works include the reconstruction of ancient Roman Pompeii, Edo Castle in Tokyo, and later Rome Reborn [10], within which ancient Rome could be freely explored.

As the years passed, technical limitations were overcome, leading to the development of the Cave Automatic Virtual Environment (CAVE), an iVR system in which the user is surrounded by a large 3D viewing area [11]. Those systems marked the beginning of high-immersive VR, or iVR [4]. CAVE systems from that initial period of CH applications in VR appear in recent reviews [5, 9, 12]. Some examples from that period include the serious game “Gates of Horus” [13], the reconstruction of the 12th c. Muslim suburb of Sinhaya [14], and the virtual tour of the Mogao Grottoes in China [15].

However, the iVR technology that has recently become more popular is the new generation of Head-Mounted Displays (HMDs) [16]. These devices have become more affordable, especially since the launch of the first Oculus Rift development kit (dk1) [4]. The popularization of HMDs is having a great impact on iVR research, increasing the number of cases, as mentioned in several reviews [17,18,19]. Indeed, other technological advances are playing a crucial role in propelling iVR forward. Among these advances are the popularization of development engines specifically designed for iVR, such as Unreal Engine and Unity [20], the affordability of digital scanning software [21], and the availability of accessible scanning devices such as DSLR cameras and 360° cameras [22]. Even the increasingly functional technological innovations of smartphones are advantageous for eXtended reality (XR) technologies, given their broader range of functionalities, such as Augmented Reality (AR) [23], or even iVR [24]. This technological evolution has prompted a diversification of research, leading to other review papers on the implementation of CH with XR in various fields, such as Historic Building Information Modelling (HBIM) [25], Virtual Museums (VMs) [18], Virtual Humans (VHs) [26], and Digital Twins (DT) [27].

However, those latest reviews and others on the same topic, despite advancing and proposing new lines of research, present severe limitations when conceptualizing the state of the art of iVR and the virtual reconstruction of CH. To address the lack of reviews on this topic, a systematic review of 94 papers on the virtual reconstruction of CH in iVR is presented in this paper, taking into consideration the technological explosion of recent years. Aspects such as experience design, its application in various domains such as museums or education, the utilization of technologies such as photogrammetry, and the incorporation of VHs are all analysed. The analysis is structured chronologically, following the steps involved in creating this sort of iVR experience. An overview of the review structure, based on the review of Hovart et al. [28], is provided in Fig. 1. According to that paper, the design of iVR experiences can be divided into three phases: preparation, execution, and post-processing. At each stage of the design of an iVR experience of CH reconstruction, the most relevant items identified in the reviewed articles were analysed. Each item raised a series of questions that were explored and that structured this review. Firstly, the available CH and the characteristics of the 3D model were analysed in the preparation phase. As iVR is a tool that helps people appreciate CH [29], the type of experience that best fitted the CH in question and its characteristics were subsequently analysed in the execution phase. Finally, the evaluations of the reconstructions were analysed in the post-processing phase.

The structure of this paper is as follows. In Sect. 2, a review of related work is conducted, emphasizing the novelty of the study. In Sect. 3, the methodology, based on the PRISMA (Preferred Reporting Items for Systematic Reviews and Meta-Analyses) guidelines, is described. In Sect. 4, the taxonomy used in this review is detailed, covering CH and its characteristics, the experience, the iVR design, and the evaluation. In Sect. 5, the statistical analysis of the sample is broken down and the results obtained are presented in the following subsections: (1) demographics, time evolution, and type of publication; (2) reconstruction period, and heritage type; (3) reconstruction characteristics, and area of application; (4) development and design of the iVR experience; and (5) assessment of the experiences. In Sect. 6, the best practices identified in the review are summarized with a statistical analysis and future research lines are discussed through a qualitative analysis. Finally, the main conclusions of the review are explained in Sect. 7, answering the questions posed in Fig. 1.

Fig. 1 Summary of the main points covered in the review

2 Related work

As presented in the Introduction, numerous reviews have studied the status of CH and XR from multiple perspectives. XR technologies are powerful tools for heritage visualization [30], and their use is a growing trend [19]. However, there is currently no review in which the relationship between virtual reconstruction of CH and iVR is comprehensively addressed. Existing reviews have limitations when covering the relationship between XR technologies and the numerous applications of XR in CH, due to the breadth of both concepts. In the following section, some of the reviews conducted on the application of XR to CH will be referenced and explanations given as to why they do not cover the gap addressed in this study.

XR is a term that encompasses various technologies, such as VR or AR [18]. Consequently, many of the reviews of CH applications conducted from this perspective are overly broad, as the topic as a whole is addressed, rather than focusing on a specific technology. Some of these reviews are [20, 23, 31,32,33,34,35,36]. Moreover, reviews have been developed that, while not specifically addressing the virtual reconstruction of CH and iVR, delve into closely related topics. However, those reviews often have a relatively limited sample size, such as the 32-paper review of serious games in CH [37], or the 42-paper review of VR games in CH [19]. In addition, they are not focused on the topic of virtual reconstruction of CH. Some reviews do have a very extensive sample size, but the part of their analysis devoted to virtual reconstruction of CH in iVR covered only a limited number of cases, such as the 290-paper review on VR for CH [35], or the 146-paper review of technologies for the preservation of CH [38], whose analysis of AR/VR only covered 17 papers. Furthermore, reviews of XR in the field of CH tend to be focused on other areas, such as VMs [18, 39], VHs [17, 26], or DT [27], but not on virtual reconstruction. Finally, many of these reviews are not literature reviews [20, 31, 33, 39,40,41]. Therefore, there is also a gap in this type of review.

Although there has been extensive research on this topic in recent years [42], this investigation has not been specifically focused on the use of virtual reconstruction of CH in iVR. The scenario underscores the need for a literature review that addresses the relationship between those two topics, to elucidate the current state of the art, its characteristics, and future lines of research. This study addresses that gap by updating the existing bibliography on CH and XR, analyzing the relationship between virtual reconstruction of CH and iVR.

3 Methodology

The PRISMA methodology [43] was followed in this literature review. It enables the transparent explanation of the systematic review process, so that aspects such as the search strategy, eligibility criteria, and the process of selection and analysis can all be easily elucidated. The entire methodological process can be consulted in Annex I.

A single search was performed to find all the study cases. The search was TITLE-ABS-KEY ( ( (virtual AND reality OR vr ) OR ( ( virtual AND reality OR vr ) OR headset OR HMD OR head AND mounted AND display ) ) AND ( cultural AND heritage OR CH OR digital AND heritage ) OR ( reconstruction OR 3d AND model OR virtual AND tour ) ). Only articles published between 2013 and 2022 were analysed. The search was run in December 2022 on the Scopus database. In the search process, it was necessary to resort to specific keywords to locate all the cases, without limiting the search to the mandatory appearance of those keywords. Many papers with different purposes shared keywords, such as “VR” in articles that either reported the use of HMDs or made no mention of them, or “reconstruction” in 3D reconstruction and photogrammetry papers. Finally, some papers that appeared in the bibliography of the sample were added to the review, in an example of a snowball effect.

Figure 2 depicts the steps of the process following the PRISMA methodology, which left a total of 94 papers [16, 44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64,65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81,82,83,84,85,86,87,88,89,90,91,92,93,94,95,96,97,98,99,100,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,132,133,134,135,136].

Fig. 2 PRISMA literature analysis diagram of the review

The title and the abstract of each paper were read to select the ones to be included and to verify whether they fitted the requirements of the analysis. If the information was not clearly explained in the abstract, a first reading was performed to determine whether to include the paper in the review. Once selected, the papers were analysed on a spreadsheet that was also used for the statistical analyses. This information can be consulted in Annex II.

The purpose of the above criterion was to select papers that reported the reconstruction of tangible CH environments visualized through HMDs with iVR systems. If any parameter was not clear in the article, a more in-depth search was performed to locate it; if the data were not found, the article was excluded. The exclusion criteria were as follows:

  • The paper was a review, an editorial, or another type of publication that did not explain a case study. Although articles of this type were excluded, their references were reviewed for inclusion in the sample.

  • The paper did not describe a case of virtual reconstruction of CH. For this purpose, the experience cannot represent CH as it exists today. Existing heritage that was reformed at some point in the past was nevertheless acceptable, as its original appearance and its current appearance were not alike. In addition, the reconstruction hypothesis must be based on some kind of historical or archaeological documentation.

  • The heritage has no environment. Only CH with an observable or explorable environment was selected. Single objects or collections of objects were removed from the selection. In this way, experiences such as virtual museums were excluded, unless they offered access to complete environments.

  • Intangible heritage. The reconstructed heritage must be tangible. A virtual reconstruction of intangible heritage was only accepted if it also reconstructed the environment and that environment was tangible heritage.

  • The experience was not iVR with HMD. In this way all the projects that used CAVE, AR, Mixed Reality (MR) or non-immersive VR with a regular screen were excluded, in order to compare only the most common iVR systems. Also, if a conversion of the 3D model to iVR was planned in the study, but it had yet to be converted, then the paper was also excluded.
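The exclusion criteria listed above can be read as a simple screening filter. The following Python sketch is only an illustration of that reading, not the procedure actually used in the review (which was carried out manually and recorded in a spreadsheet). The record type `CandidatePaper`, its field names, and the reason strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CandidatePaper:
    """Hypothetical screening record; every field name is an assumption."""
    describes_case_study: bool       # False for reviews, editorials, etc.
    is_virtual_reconstruction: bool  # documented hypothesis of a past state of the CH
    has_environment: bool            # observable or explorable environment
    tangible_environment: bool       # the reconstructed environment is tangible heritage
    uses_hmd_ivr: bool               # iVR already implemented and viewed through an HMD


def exclusion_reasons(paper: CandidatePaper) -> List[str]:
    """Return every exclusion criterion that the paper triggers."""
    reasons = []
    if not paper.describes_case_study:
        reasons.append("not a case study")
    if not paper.is_virtual_reconstruction:
        reasons.append("not a virtual reconstruction of CH")
    if not paper.has_environment:
        reasons.append("heritage has no environment")
    if not paper.tangible_environment:
        reasons.append("no tangible heritage environment")
    if not paper.uses_hmd_ivr:
        reasons.append("not iVR with HMD")
    return reasons


def is_included(paper: CandidatePaper) -> bool:
    """A paper stays in the sample only if no criterion excludes it."""
    return not exclusion_reasons(paper)
```

Returning the list of reasons, rather than a single boolean, mirrors the PRISMA practice of reporting why records were excluded.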

Many parameters were considered in the course of this review, due to the depth of the analysis. So as not to extend the paper unnecessarily, the parameters of each article are listed in spreadsheet form in Annex II, as previously mentioned.

4 Taxonomy

In this section, the taxonomy used to analyse iVR experiences and their characteristics is introduced. Figure 3 provides a summary of the taxonomy, which is presented in chronological order, reflecting the stages involved in creating a virtual reconstruction of CH in iVR, starting from the preparation and ending with the post-processing stage. The following sub-sections therefore describe the taxonomy in accordance with the order proposed in Fig. 3: (1) heritage characteristics; (2) reconstruction characteristics; (3) experience design; (4) iVR design; and (5) evaluation.

Fig. 3 Taxonomies used in the review and chronologically ordered according to the phases of creating a virtual reconstruction of CH for iVR

4.1 Heritage characteristics

When creating a virtual reconstruction of CH, the initial step is to analyse the available heritage [41]. This step is critical, because the CH is the foundation of the iVR experience [29]. Two taxonomic classifications are proposed in this review with which to analyse the heritage characteristics: (1) the Reconstruction Procedure (RP), which indicates the type of procedure used to reconstruct the CH, determined by its preservation status; and (2) the heritage type, taking into consideration its original usage.

Four distinct procedures were proposed in relation to RP. All four procedures varied according to the reconstruction techniques and the current state of the heritage. Each technique is classified as either a 3D-modelling technique or as a digitization (such as photogrammetry or 360° photography). A distinction regarding the state of the heritage was drawn between complete disappearance or scarce remaining traces, and partial preservation of the original appearance or having undergone restoration. The classification used in this paper was based on de Francesco’s work [137], in so far as the same techniques were used for the reconstruction, though its complexity was broader, as the current state of the heritage was also compared with Münster’s classification [138]. In that taxonomy, reconstruction and digitization of heritage differ, depending on whether human interpretation is required to undertake the reconstruction.

Taking this into account, the following RP taxonomy has been created:

  • Virtual reConstruction by 3D modelling (reCon3D): 3D modelling is used as the main tool in a heritage that has disappeared or of which few ruins remain. For instance, the virtual reconstruction of the 14th c. Spanish city of Briviesca [128].

  • Virtual reConstruction by digitalization (reConD): Photogrammetry, 360° photography or other digitization techniques are used as the main tool in a heritage that has disappeared or of which few ruins remain. For example, the virtual reconstruction of the 4th c. BC Roman Theatre of Miletus, in Turkey [54].

  • Virtual reForm by 3D modelling (reForm3D): 3D modelling is used as the main tool in heritage that has undergone few changes or a restoration. For instance, the virtual reconstruction of Ioannina Open Market in the 20th c., in Greece [55].

  • Virtual reForm by digitalization (reFormD): Photogrammetry, 360° photography or other digitization techniques have been used as the main tool in a heritage that has undergone few changes or a restoration. For example, the virtual reconstruction of the Spanish city of Burgos in the 20th c. [60].
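The four RP categories above combine two binary choices: the main technique and the state of the heritage. As a minimal sketch of that structure, the following Python lookup maps each combination to its label. The enum and function names are illustrative assumptions; only the four labels come from the taxonomy.

```python
from enum import Enum


class Technique(Enum):
    MODELLING_3D = "3D modelling"
    DIGITIZATION = "photogrammetry, 360° photography or similar"


class HeritageState(Enum):
    LOST = "disappeared or few ruins remain"
    PRESERVED = "few changes or restored"


# (state, main technique) -> RP label, as defined in the list above.
RP_TABLE = {
    (HeritageState.LOST, Technique.MODELLING_3D): "reCon3D",
    (HeritageState.LOST, Technique.DIGITIZATION): "reConD",
    (HeritageState.PRESERVED, Technique.MODELLING_3D): "reForm3D",
    (HeritageState.PRESERVED, Technique.DIGITIZATION): "reFormD",
}


def reconstruction_procedure(state: HeritageState, technique: Technique) -> str:
    """Look up the RP category for a given heritage state and main technique."""
    return RP_TABLE[(state, technique)]
```

For instance, a site of which few ruins remain, rebuilt mainly through 3D modelling, maps to `reCon3D`.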

The following taxonomy was used to classify the heritage types. It was based on the types of heritage most frequently found in the review and the classification was based on the original usage of the heritage:

  • Civil heritage: This category includes heritage designed for civil use, such as theatres, forums, or residences: for instance, the Forum of Augustus in Italy, 1st c. BC [123].

  • Urban heritage: This classification includes entire population centres or parts of them. There may be heritage from the other categories within these centres of population, but if the element to be reconstructed is the centre or a part of it, it will be classified in this way. For example, the town of Segeberg in the 17th c. in Germany [91].

  • Sacred heritage: This category groups sacred or cult heritage, such as churches or mosques, but also Roman temples or other places of worship. For instance, the Temple dedicated to Hera at Paestum in Italy, in the 5th c. BC [74].

  • Industrial heritage: This category includes heritage or elements with industrial use. It includes heritage from before the last industrial revolution, such as neolithic or Roman furnaces. For example, the Power Plant of Piešťany located in 20th c. Slovakia [86].

  • Other: This category groups different minority types of heritage located in the sample, such as military heritage. For instance, the Sarajevo war tunnel constructed in Bosnia and Herzegovina in the 20th c. [69].

4.2 Reconstruction characteristics

In this section, the taxonomy of the virtual reconstruction and its characteristics are described. A reconstruction is created on the basis of the heritage characteristics [29]. First, as it is an iVR experience, restrictions on user movements within the environment must be considered. Secondly, as it is a virtual reconstruction, both the detail of the 3D model [139] and the documentation to create a 3D model can vary [41]. Considering the above, three metrics were used to compare the reconstruction characteristics: the Level of Size (LoS) of the environment, the Level of Hypothesis (LoH) of the reconstruction, and the Level of Detail (LoD) of the 3D model. All metrics were rated on a scale of 1 to 5. The parameters were used to analyse the sample in the same way as Koszewski [140] and Münster [141], but with the addition of LoS, as it is an important factor when analyzing environments.

The LoS metric is used to measure the size of the environment. The 360° environments were separated from free roaming iVR environments for accurate evaluation of the size. The 360° environments were marked as such before the metric was applied, and together with the free roaming iVR environments were rated on a scale of 1 to 5, based on their size. Figure 4 provides a visual comparison of the different environment sizes in relation to the size of a person. Based on those sizes, the following scores for LoS were assigned:

  • 360°: is a rendered 360° environment. For instance, the virtual reconstruction of the “Villa con ingreso a protiro” in Italy in the 2nd c. AD [126].

  • 1: a small, closed environment. For example, a room, such as the Ducal Chapel of San Ludovico in Parma, Italy in the 19th c. [108].

  • 2: a large, closed environment or a closed environment with several spaces, such as a house with several rooms. For instance, the theatre “Corral de comedies de l’Olivera” in 17th c. Spain [99].

  • 3: a small open environment. A single open and delimited space of small size. For example, the Roman theatre of Pausilypon, Italy, built in the 1st c. BC [97].

  • 4: a large open environment. A medium-sized open space with different bounded areas. For instance, the Kampung Hulu Mosque with its garden in 18th c. Malaysia [98].

  • 5: a very large open environment. A large-sized open space with different bounded areas. For instance, an entire city, such as the 17th c. German city of Stade [112].

Fig. 4 Examples of the different sizes of environments according to Level of Size (LoS) scores in relation to the size of a person

The LoD is a computer science concept that defines the degree of abstraction of a real object versus its virtual representation [142]. As the level of abstraction decreases, the modelled object appears more realistic. In this review, the LoD score was based on both geometrical and radiometrical fidelity, two parameters considered in accordance with the Münster classification of LoD [141]. The final LoD score of the model was obtained by averaging these two features. On the one hand, geometrical fidelity refers to the detail on the surface of the 3D model. On the other hand, radiometrical fidelity refers to the detail in the reproduction of its visual properties. Lighting parameters were not considered, as lighting is a factor external to the 3D model. The degree of angulation of the faces of the 3D model was used to measure geometric fidelity. A lower angle implies greater detail, as it provides more points for creating geometric details [143]. Two factors were considered for the measurement of radiometric fidelity: the resolution of the textures, and the quantity of textures per material. Higher texture resolutions imply a higher LoD; likewise, a greater quantity of textures per material suggests more complex Physically Based Rendering (PBR) materials [144], which implies a higher LoD. In that type of measurement, a more realistic model is therefore considered to have a higher LoD. It is unsuitable for stylized models where realism is not an objective. However, this is not a problem in the context of CH, where virtual reconstructions typically strive for realism [145]. More details on this method can be found in Annex III. The score had to be estimated, because the 3D models cited in the papers were inaccessible. The scores ranged from 1 to 5, at intervals of half a point, where higher scores indicated higher realism.
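The averaging step described above can be illustrated with a short Python sketch. It assumes that geometrical and radiometrical fidelity are each rated as whole numbers from 1 to 5, which is not stated explicitly here (the detailed method is in Annex III); under that assumption the mean automatically falls on the half-point scale. The function name is hypothetical.

```python
def lod_score(geometric: int, radiometric: int) -> float:
    """Average two fidelity sub-scores into a LoD score.

    Assumption: each sub-score is an integer from 1 to 5, so the mean
    is always a multiple of 0.5 between 1 and 5.
    """
    for name, value in (("geometric", geometric), ("radiometric", radiometric)):
        if value not in range(1, 6):
            raise ValueError(f"{name} fidelity must be an integer from 1 to 5")
    return (geometric + radiometric) / 2
```

Under these assumptions, sub-scores of 3 and 4 would give a LoD of 3.5.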

Figure 5 displays three examples of LoD assignments to CH models, previously created by the research group. The first image shows the virtual reconstruction of the city of Palacios de la Sierra, Spain, in the 11th c. Its score, 2 out of 5, was due to its simple shapes and textures, indicating a low LoD. The second image displays the virtual reconstruction of the city of Leon, Spain, in the 1st c. BC. The model has superior geometrical fidelity, including details such as doors and chimneys, and its radiometrical fidelity allows different materials and other details to be appreciated. This model received a score of 3.5 out of 5, indicating a medium LoD. More details about the development of this model are given in another paper [146]. Finally, the third image shows the virtual reconstruction of the city of Vitoria, Spain, in the 12th c. In this case, both geometrical and radiometrical fidelity are hyper-realistic, resulting in a score of 5 out of 5, indicating a high LoD.

Fig. 5 Examples of different reconstructions according to Level of Detail (LoD) scores. The first image shows the virtual reconstruction of the city of Burgos, the second of the city of Leon, and the third of the city of Vitoria, Spain

The LoH, or Level of Information in other papers [2], measures the quality of the historical sources used in the reconstruction. It is a commonly used classification in the field of virtual reconstruction [90]. In this paper, the LoH score was classified on a scale of 1 to 5, along with other features. The following aspects were considered to assign the score. The information sources were reduced to 3 inputs, to simplify the process and to generate a LoH that could in all cases be comparable: archaeological remains, documentation, and memory. Greater weight was given to primary sources, such as archaeological remains, as opposed to secondary sources, such as graphic and written documentation. Memory was given the lowest score, as it is not an immutable record. The same procedure that Münster [141] and Hauck [147] followed to assign the scores was replicated. The scores varied depending on whether multiple documentary sources were combined, though always with a preference for archaeological remains over documentation, and documentation over memory. On that basis, the following scores for LoH were assigned:

  • 1: The source of information was mainly based on the memories of people who knew the heritage.

  • 2: The source of information was mostly written or graphic documentation.

  • 3: The source of information was mainly written or graphic documentation with the support of the memories of those who knew the heritage.

  • 4: The source of information was mostly drawn from archaeological remains.

  • 5: The source of information was mainly archaeological remains with the support of written or graphic documentation.
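The five LoH scores above depend on the main source of information and, in two cases, on a supporting source. A minimal Python sketch of that mapping follows. The source labels and the function name are illustrative assumptions, and combinations not listed in the scale (for example, archaeological remains supported only by memory) are deliberately left undefined rather than guessed.

```python
from typing import Optional

# (main source, supporting source) -> LoH score, following the list above.
LOH_SCORES = {
    ("memory", None): 1,
    ("documentation", None): 2,
    ("documentation", "memory"): 3,
    ("remains", None): 4,
    ("remains", "documentation"): 5,
}


def loh_score(main_source: str, support: Optional[str] = None) -> Optional[int]:
    """Return the LoH score, or None for combinations the scale does not list."""
    return LOH_SCORES.get((main_source, support))
```

The ordering of the table reflects the stated preference for archaeological remains over documentation, and for documentation over memory.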

4.3 Experience design

Having obtained the 3D model of the virtual CH reconstruction, an iVR experience can be designed to showcase the reconstructed heritage. This taxonomy can be divided into the following parts: one dedicated to the design of the experience while considering its area of application, and another dedicated to the type of iVR experience. Three taxonomic categories were used in this section: one for the area of application, another for the Degrees of Freedom (DoF) of the experience, and a third for the type of iVR experience.

Münster’s classification [138] was used as a basis to classify the area of application. However, a new category frequently observed in the analysis of the sample was also added, namely Musealization, as used by Marto [32] and by Bekele [20] (who calls it Exhibition in his review), which resulted in the following classification:

  • Preservation: Although specified in a broader context in the paper of Münster, preservation in the context of this study is associated with iVR experiences that are usually limited to university research with no precisely defined target audience.

  • Musealization: iVR experiences designed to be used in exhibitions or museums.

  • Education: iVR experiences with educational uses or evaluated with students to test their educational potential.

  • Research: works with the purpose of research within the discipline. They evaluate the iVR experience with experts in the discipline to improve some features.

iVR experiences can be categorized based on the DoF of the user’s interaction with the VW [148]. There are two types: experiences with 3 Degrees of Freedom (3DoF) and experiences with 6 Degrees of Freedom (6DoF). These degrees of freedom determine the user’s freedom of movement in the VW.

  • 3DoF: the user can interact with the environment by rotating the view, but the user cannot move around the VW. The 3 degrees correspond to rotation about the 3 spatial axes.

  • 6DoF: not only can the user rotate the view, but the user can also move around the VW in all directions. This type of movement usually generates a greater sense of immersion. The 6 degrees correspond to rotation about and displacement along the 3 spatial axes.
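The difference between the two modes can be sketched in Python as follows: a 3DoF experience updates only the three rotation components of the viewpoint, whereas a 6DoF experience also updates the three translation components. The `Pose` type, its field names, and the `apply_tracking` function are hypothetical and do not correspond to any particular engine or HMD API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Pose:
    """Viewpoint in the VW: three rotations (degrees) and three translations (metres)."""
    yaw: float = 0.0
    pitch: float = 0.0
    roll: float = 0.0
    x: float = 0.0
    y: float = 0.0
    z: float = 0.0


def apply_tracking(current: Pose, tracked: Pose, dof: int) -> Pose:
    """Update the viewpoint from tracked data according to the degrees of freedom."""
    if dof == 3:
        # Rotation only: the position in the VW stays where it was.
        return replace(current, yaw=tracked.yaw, pitch=tracked.pitch, roll=tracked.roll)
    if dof == 6:
        # Rotation and displacement are both taken from tracking.
        return tracked
    raise ValueError("dof must be 3 or 6")
```

With the same tracked input, a 3DoF viewpoint turns but stays in place, while a 6DoF viewpoint both turns and moves.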

Among the characteristics of an iVR experience are full user immersion in the VW [149] and interaction with it in a natural way [150]. Hence, Checa’s [4] classification was used, based on the range of user interactivity that defines the different types of iVR experiences. The taxonomy is as follows:

  • Passive: very limited user interactivity and movement, such as 360° environments.

  • Explorative: free exploration of the virtual environment, although no direct interaction.

  • Explorative interaction: the user can explore and interact freely with the virtual environment.

  • Interactive experience: user interaction with the environment, but no free movement within it.
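Read together, the four types above differ in two respects: whether the user can move freely and whether the user can interact directly with the environment. The Python sketch below encodes that reading. Treating both properties as simple booleans is a simplification introduced here (passive experiences, for instance, still allow very limited interactivity), and the function name is hypothetical.

```python
def experience_type(free_movement: bool, interaction: bool) -> str:
    """Classify an iVR experience by free movement and direct interaction."""
    if free_movement and interaction:
        return "Explorative interaction"
    if free_movement:
        return "Explorative"
    if interaction:
        return "Interactive experience"
    return "Passive"
```

A 360° tour without interactive elements would therefore fall under `Passive`.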

4.4 Immersive Virtual Reality design

Once the area of application and type of iVR experience are determined, other design decisions can be considered. The most important design decisions observed in the sample were as follows: (1) how interaction takes place in iVR; (2) how the experience is ended; (3) the HMD through which it is visualized; (4) the inclusion of characters; (5) the utilization of sound resources; and (6) the use of a user interface. The following taxonomies were therefore established.

The taxonomy relating to interactivity was partially adapted from Checa [128] and Boletsis [151], though two new types of interaction that are more commonly used in 3DoF experiences were added. The following classification was therefore used:

  • Head movement: the user can only rotate the point of view through head movement.

  • Point and click: movement through the environment is based on a teleportation system through previously defined selectable points.

  • Gamepad locomotion: Movement through the environment is accomplished by controlling a gamepad or keyboard.

  • Room scale: Movement through the environment is only accomplished by tracking the user’s real movement.

  • Teleport locomotion: Movement through the environment is accomplished by combining the tracking of the user’s real movement and a free teleportation system.

The following aspects were taken into account to classify each HMD: (1) whether the HMD is capable of reproducing only 3DoF or also 6DoF experiences; (2) whether it is limited to functioning only when connected to a computer (desktop device) or whether it can run autonomously (standalone device); and (3) whether the tracking system of the HMD is external or internal and how many degrees it covers. The taxonomy was therefore as follows:

  • 3DoF desktop: a 3DoF HMD that only works when connected to a PC, such as Oculus Rift dk1.

  • 3DoF standalone: a 3DoF HMD that can work autonomously, such as a cardboard viewer.

  • 6DoF 180° desktop external tracking: a 6DoF HMD with external tracking that can capture the user at an angle of approximately 180° and that needs a PC, such as Oculus Rift CV1.

  • 6DoF 360° desktop external tracking: a 6DoF HMD with external tracking that can capture the user from all angles and that needs to run on a PC, such as HTC Vive.

  • 6DoF desktop internal tracking: a 6DoF HMD with internal tracking that needs to run on a PC, such as Oculus Rift S.

  • 6DoF standalone internal tracking: a 6DoF HMD with internal tracking that works independently, such as Meta Quest 2 (formerly Oculus Quest 2).
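The three aspects considered for each HMD (degrees of freedom, desktop or standalone operation, and tracking system) can be combined into the six categories above. The following Python sketch shows one way to express that combination. The `HMD` record, its field names, and the decision to raise an error for combinations outside the taxonomy are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class HMD:
    """Hypothetical description of a headset; field names are assumptions."""
    dof: int                            # 3 or 6
    standalone: bool                    # False means it needs a PC (desktop)
    tracking: Optional[str] = None      # "external" or "internal" (6DoF only)
    coverage_deg: Optional[int] = None  # 180 or 360 (external tracking only)


def hmd_category(hmd: HMD) -> str:
    """Map a headset description to one of the six categories listed above."""
    if hmd.dof == 3:
        return "3DoF standalone" if hmd.standalone else "3DoF desktop"
    if hmd.dof == 6:
        if hmd.tracking == "internal":
            mode = "standalone" if hmd.standalone else "desktop"
            return f"6DoF {mode} internal tracking"
        if (hmd.tracking == "external" and not hmd.standalone
                and hmd.coverage_deg in (180, 360)):
            return f"6DoF {hmd.coverage_deg}° desktop external tracking"
    raise ValueError("combination not covered by the taxonomy")
```

For example, `hmd_category(HMD(dof=6, standalone=False, tracking="external", coverage_deg=360))` yields the category given above for HTC Vive.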

The following taxonomy was also created in relation to the way that the experience was ended, which covered all possible ways of ending the experience in the sample.

  • Free: the user can end the iVR experience at any moment.

  • Time: there is a time limit to enjoy the experience; once the time is up, the experience ends.

  • Exploration: it is necessary to explore the entire environment or view all the 360° points to finish the experience.

  • Tasks: it is necessary to finish one or more tasks to end the experience.

The inclusion of characters was categorized according to the following taxonomy, taking into consideration their method of creation. Rather than considering avatars, representing users, in the taxonomy, only virtual agents, whether interactive or non-interactive, were used [26].

  • Digital: The characters were digitally created, with techniques such as 3D modelling.

  • Recorded: The character or characters were real individuals recorded and integrated into the virtual environment.

Regarding the utilization of sound resources, the following taxonomy was formulated, considering the type of sound, its relationship with the virtual world, and its interaction with the user.

  • Ambient: diegetic sound effects, directly related to the virtual reconstruction and passive in nature. No user action is required for their playback, e.g., sounds of nature.

  • Music: extradiegetic sound effects, not directly related to the reconstruction and passive in nature. No user action is necessary for playback, e.g., background music.

  • Sound effects: actively triggered; user action is required for playback, such as grabbing an object.

  • Narration: this category encompasses all voice sounds, whether conversations or extradiegetic narrations, and whether or not they require user action.

Finally, the taxonomy related to the user interface is described, taking into consideration its relationship with both the environment and the user. The taxonomy is outlined as follows:

  • Panel: a diegetic and passive user interface, situated within the virtual reconstruction and operating independently of user input. For instance, a text panel within the environment.

  • Point of Interest (PoI): a diegetic and active user interface. Located within the reconstruction, it necessitates user interaction for display. For example, a panel activated by proximity or by pressing a button.

  • Menu: an extradiegetic user interface. It exists outside the virtual reconstruction, and accessing it requires the use of a pause function or a user-specific action.
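
The sound and user-interface taxonomies above rest on the same two axes: diegetic versus extradiegetic, and passive versus user-triggered. A minimal Python sketch of the resulting decision rules is given below. The function and parameter names are hypothetical, and the order of the checks (voice first, then user action) is an assumption drawn from the category descriptions.

```python
def classify_sound(is_voice: bool, user_triggered: bool, diegetic: bool) -> str:
    """Map a sound resource to one of the four sound categories."""
    if is_voice:
        # Narration covers all voice sounds, whether or not they need user action.
        return "Narration"
    if user_triggered:
        return "Sound effects"
    # Passive, non-voice sounds are split by their relation to the virtual world.
    return "Ambient" if diegetic else "Music"


def classify_ui(diegetic: bool, user_triggered: bool) -> str:
    """Map a user-interface element to Panel, Point of Interest (PoI), or Menu."""
    if not diegetic:
        return "Menu"
    return "Point of Interest (PoI)" if user_triggered else "Panel"


print(classify_sound(is_voice=False, user_triggered=False, diegetic=True))
print(classify_ui(diegetic=True, user_triggered=True))
```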

4.5 Evaluation

Developers may administer a survey to users, to determine whether the objectives of an experience have been achieved. There are many types of evaluative surveys and tests, varying in terms of their typology, purpose, and timing. Three taxonomic categories were used in this section: one for the purpose of the evaluation, another for the type of the evaluation, and a third for the type of questions that were used.

The purpose of the evaluation has been classified with reference to Chong’s taxonomy [35] in the context of VR for CH practices. Additionally, a category based on Chang’s study [152] has been included, emphasizing the effect of engagement in iVR. The following taxonomy outlines the purposes of the evaluations that were analysed:

  • Usability and user experience: its purpose is to assess aspects of usability or user experience, such as satisfaction or presence.

  • Technology or System application: the motivation behind the evaluation is to compare or to assess the effectiveness of a technological solution.

  • Education/Engagement: the purpose is focused on evaluating acquired knowledge or interest in knowledge presented in the reconstruction.

In this review, a classification of evaluation types, in accordance with their phases and implementation times, based on the taxonomies of Martinez [153] and Tsita [154], is presented. The following categories describe the types of evaluations found in the sample:

  • Post-evaluation: the test was administered after completion of the experience.

  • Pre/post-evaluation: one test before starting the experience and another at the end of the experience.

  • During/Post-evaluation: the test was administered at the end of the experience, coupled with data collection during the experience.

  • Pre/During/Post-evaluation: one test was administered before the experience, another at the end of the experience, and data were collected during the experience.

  • No evaluation: there was no test.
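
For illustration, the evaluation types can be derived from three flags that indicate whether data were gathered before, during, or after the experience. The sketch below is an assumption about how such a coding could be automated and is not the procedure used in the review; `evaluation_type` is a hypothetical name.

```python
# Keys are (pre, during, post) flags; values are the category labels listed above.
EVALUATION_LABELS = {
    (False, False, True): "Post-evaluation",
    (True, False, True): "Pre/post-evaluation",
    (False, True, True): "During/Post-evaluation",
    (True, True, True): "Pre/During/Post-evaluation",
    (False, False, False): "No evaluation",
}


def evaluation_type(pre: bool, during: bool, post: bool) -> str:
    """Return the evaluation category for a given combination of phases."""
    try:
        return EVALUATION_LABELS[(pre, during, post)]
    except KeyError:
        # Combinations without a final test are not part of the taxonomy.
        raise ValueError("combination not covered by the taxonomy") from None


print(evaluation_type(pre=True, during=False, post=True))
```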

Finally, the following taxonomy was developed to classify the types of questions most commonly used in evaluation questionnaires based on the type of response:

  • Likert: Responses can only be given to this type of question in the form of numbered options on a scale. It is not necessary to adhere precisely to a Likert scale as long as the options are graduated on a scale.

  • Options: This type of question only accepts predefined responses, such as yes or no.

  • Open: The user can respond freely with their own comments.

5 Survey

In this section, the sample analysis and results are presented in the same order as they appear in the previous sections, as shown in Fig. 3, with an added sub-section on Data distribution. Unspecified data have been excluded from the figures to reduce unnecessary clutter. The values may vary slightly between graphics, due to unspecified data and some papers that treated more than one reconstruction or iVR experience.

The internal organization of each subsection will follow the same structure to facilitate the reading of the Survey section. Firstly, the order of variable analysis will be explained, following the sequence presented in Fig. 3. Secondly, each analysis will be conducted in the following internal order: (1) Presentation of the variables and the method of analysis, (2) explanation of the summary figure if available, (3) presentation of the data, and (4) discussion of results. Finally, a summary of the subsection results will be provided at the end.

The Survey section is divided into the following sub-sections: (1) Data distribution; (2) Heritage characteristics; (3) Reconstruction characteristics and experience design; (4) Design of immersive Virtual Reality; and (5) Evaluation.

5.1 Data distribution

Initially, the data pertaining to the articles in the sample will be examined. To do so, an analysis of the year and type of publication will be conducted. Subsequently, the nationalities of the authors and the localization of the reconstructions will be analysed. Finally, the authors’ field of expertise and the indexing of the article in Scopus will be scrutinized through a qualitative analysis.

Statistical data on the date and type of publication were analysed. The sample contained 94 publications: journal articles (46%), conference papers (51%), and book chapters (3%). Figure 6 shows the temporal evolution of the references separated by type of publication. The X-axis shows the year of publication. The Y-axis shows the number of references per year separated into journal articles, conference papers, and book chapters. The number of references progressively increased from 2013, reaching a peak in 2018, the year with the highest production, which is consistent with the results of other reviews [18, 19, 34, 38]. Currently, the discipline is in a state of consolidation and publications are stabilizing. Furthermore, the number of journal publications has increased over recent years, which is an indication of maturity. Nevertheless, there were more conference papers and book chapters than journal publications. These data coincided with some [19, 34], though not all of the reviews on the topic [18, 38]. However, the difference between conference papers and journal articles is small. A deeper analysis of the publications reveals that DAACH - Digital Applications in Archaeology and Cultural Heritage (n = 7), Applied Sciences (n = 4), JCH - Journal of Cultural Heritage (n = 3), and Archeologia e Calcolatori (n = 3) were the preferred journals. In addition, XR Salento (n = 7) and CIPA - Symposium on Great Learning and Digital Emotion (n = 4) were the preferred conferences.

Fig. 6 Number of references and types of publication by year of publication (2013–2022)

The nationality of the first author and the location of the reconstructions were also analysed. Scientific production has mainly been concentrated in Europe (80%), followed by Asia (9%), America (5%), Oceania (5%) and Africa (1%). Moreover, at the continental level, reconstructions were located as follows: Europe (81%), Asia (12%), America (4%), Oceania (2%) and Africa (1%). Most publications came from the following countries: Italy (n = 24), Spain (n = 17), United Kingdom (n = 9), and Germany (n = 6). As a result, most of the references referred to the study of reconstructions within either Italy (n = 31) or Spain (n = 14), particularly those that used Virtual reForms procedures, due in all likelihood to the abundance of Roman remains in these countries. As will be seen in the next section, reconstructions from the period of Roman antiquity are abundant. These results are not unexpected considering that Europe is where most of the assets on the UNESCO World Heritage List are located [30]. Moreover, these results are consistent with the observation of Mendoza [38] that the first authors of papers on similar topics were invariably based in either Italy or Spain. In any case, Asia stands out in a framework of highly centralized production. It is the second largest region, both in terms of production and of the location of the reconstructions, almost doubling the production of the third largest. It should be noted that Asia and Europe are the only regions in which the share of reconstructions exceeds the share of production. Those results are aligned with the findings of Lucchi [27], which identify Asia as a secondary hub of production in the field of DT and CH.
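
The comparison between production and location can be checked with a short calculation. The sketch below uses only the continental percentages reported in the preceding paragraph; the variable names are hypothetical, and the differences are in percentage points.

```python
# Share of first authors (production) and of reconstructed sites, in percent
production = {"Europe": 80, "Asia": 9, "America": 5, "Oceania": 5, "Africa": 1}
reconstructions = {"Europe": 81, "Asia": 12, "America": 4, "Oceania": 2, "Africa": 1}

# Positive values mean that more reconstructions are located in the region
# than its share of production would suggest.
surplus = {region: reconstructions[region] - production[region] for region in production}
regions_with_surplus = [region for region, diff in surplus.items() if diff > 0]

print(surplus)               # {'Europe': 1, 'Asia': 3, 'America': -1, 'Oceania': -3, 'Africa': 0}
print(regions_with_surplus)  # ['Europe', 'Asia']
```

Asia shows the largest difference (3 percentage points), followed by Europe (1 point), while America and Oceania host fewer reconstructions than their production share.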

Finally, the indexing of the articles, based on Scopus keywords, and the field of expertise of the authors were analysed. Some articles in the sample lacked a Digital Object Identifier (DOI), so they were excluded from this analysis. Annex II provides a list of articles without a DOI. To scrutinize article indexing, a bibliometric correlation matrix was generated using the Bibliometric software [155]. Clustering and network analysis methods were employed to establish initial correlations. The result is shown in Fig. 7, which depicts correlations between the Scopus keywords. These keywords were generated using an algorithm that detects repeated words and phrases in the titles of the articles. While bibliometric analyses are rooted in data, their visual projection leans towards a qualitative analysis, as the interpretation of the graph takes precedence over the raw data. As depicted in Fig. 7, five clusters were identified, with two markedly larger than the others. These two clusters are denoted in dark green (“Virtual Reality”) and orange (“three-dimensional computer graphics”).

Fig. 7 Matrix of bibliometric data from the selected papers with the Scopus keywords

Interpreting these data reveals a meaningful disparity in the weight of keywords related to technology and VR compared to those associated with CH. The two principal clusters are defined by the keywords “Virtual Reality” and “three-dimensional computer graphics,” representing the primary technologies addressed in the analysed articles, one focusing on the creation of 3D models and the other on their visualization in iVR. Conversely, an abundance of keywords related to CH, such as “museums,” reflects a growing trend, as highlighted in the introduction. However, these CH-related keywords are subordinate to technological ones, occupying a secondary position in terms of their importance.

Finally, after reviewing the fields of knowledge of the authors, it was observed that most of the articles have a main author from a technical department (n = 62), with a secondary author from a history or an archaeology department (n = 25). This observation highlights the interdisciplinary nature of this field, as noted by both Münster [155] and The London Charter [29], in so far as it bridges technical and humanities departments. Nonetheless, as in the preceding analysis, it is evident that in this discipline, the technical department carries greater weight than the historical department.

In summary, the following conclusions can be drawn. The discipline is in a state of consolidation, having reached its peak production in 2018. These articles are distributed between conference papers and journal articles, with an increasing trend toward publication in journals. Geographically, Europe stands out as the continent with the highest production and the most reconstructions located, with Italy and Spain being prominent within the continent. On the other hand, Asia occupies second position both in terms of production and localization. Lastly, the discipline is clearly interdisciplinary, combining technical and historical disciplines, with a pronounced emphasis on technical aspects.

5.2 Heritage characteristics

The first step in creating a virtual reconstruction of CH is to study the available heritage [29]. The heritage type and its historical period were considered in the analysis of the heritage, its characteristics, and its RP. The first analysis was focused on the relationship between the RP and the period to be reconstructed, the second on the relationship between the heritage type and the period to be reconstructed.

Figure 8 shows the distribution of the sample according to the period of the reconstruction and its RP. The percentage usage of the RPs was as follows: reCon3D (48%), reConD (9%), reForm3D (22%), reFormD (20%). The Y-axis shows two variables: on the one hand, the percentage usage of the RP in each period, graded on the left Y-axis with stacked columns; on the other hand, the total number of cases in each period, plotted with the black line and graded on the right Y-axis. The horizontal X-axis shows the period of the reconstruction, grouping the centuries into 8 periods: (1) Neolithic - Bronze Age, from ∼7000 BC to 18th c. BC; (2) Iron Age, from 10th c. BC to 5th c. BC; (3) Roman Republic, from 4th c. BC to 1st c. BC; (4) Roman Empire, from 1st c. BC to 3rd c. AD; (5) High Medieval Period, from 4th to 10th c.; (6) Late Medieval Period, from 11th to 15th c.; (7) Modern Age, from 16th to 18th c.; and (8) Contemporary Age, from 19th to 20th c. The reconstructions were arranged into historical periods to facilitate the reading of Fig. 8, as patterns can be appreciated when differentiating according to the RP. Except for the first three periods (not very representative due to their low number of cases), the use of reCon3D was inversely related to the number of reconstructions, since other RPs were used to a greater extent during the periods with more cases. This is an important point, because reCon3D is the majority RP, representing 48% of the cases. Moreover, reCon3D is also the only RP in which no archaeological remains have to be used, whereas reConD and reFormD, both based on digitization, usually employ some remains for digitization, and reForm3D also needs some remains to carry out the reconstruction. Regarding the absolute values, it may be observed that the production peaks correspond to the periods of the Roman Empire and the Contemporary Age, with cases becoming increasingly frequent towards the present. These periods coincide with those with the lowest relative usage of reCon3D.

Fig. 8 Distribution of heritage reconstructions by period and by Reconstruction Procedure (RP). Black line shows total number of cases per period

Figure 9 displays the same data as Fig. 8, but it separates the sample by heritage type instead of by RP. In this case, the most frequent types of heritage were Civil Heritage, representing 33% of the cases, followed by Urban Heritage (29%), and Sacred Heritage (21%). This can be seen in Fig. 9, where Civil Heritage is concentrated in the period of the Roman Empire and the Modern and Contemporary Age, which are the periods with the highest frequency of cases. Industrial Heritage also shows significant importance in those last two periods. Furthermore, Urban Heritage and Sacred Heritage behave similarly to reCon3D and are frequent in all periods, but especially in those with a lower number of cases, such as Neolithic and Late Medieval periods.

Fig. 9 Distribution of reconstruction cases by period reconstructed and by heritage type. Black line shows total number of cases per period

These results suggest that the heritage characteristics have an influence on the development of virtual reconstructions for iVR. The most frequent periods for reconstruction have been the Roman Empire and the Contemporary Age, which is likely to be due to two factors. Firstly, there are more archaeological remains from these than from other periods, which can be inferred from the RP. RPs requiring archaeological remains are concentrated in these two periods. CH is one of the fields with the greatest weight in digitalization [156]. According to other reviews on similar topics, there are more documentation projects than reconstruction projects [34], and photogrammetry is the preferred data-acquisition system in CH reconstruction [38]. These results are not surprising given the aforementioned factors. Besides, the support from existing heritage makes it easier to create a virtual reconstruction [138]. The second reason for the results, as seen in the previous section, is that Italy and Spain are the countries with the most weight in the discipline [38], which might explain the large number of Roman remains and Civil Heritage in the sample that correspond to those historical periods.

5.3 Reconstruction characteristics and experience design

The next step in creating an iVR experience with reconstructed heritage is to design an experience that highlights the value of the available heritage [29]. In this sub-section, the reconstruction characteristics and experience design will be analysed together, as they are closely related. First, the most frequent areas of application and their characteristics will be analysed. Secondly, the DoF and the RP will be discussed and, finally, the relationship between the area of application and the type of iVR and DoF will be examined.

The areas of application for the experiences are as follows: Preservation (54%), Musealization (30%), Education (12%), and Research (4%). Due to the limited data available for Research and its heterogeneous nature, this area of application has not been included in the results to avoid introducing noise. The low presence of Research in the sample coincides with the observations of Mendoza [38] that AR/VR applications for CH are focused on non-expert users.

The relationship between the reconstruction characteristics and the area of application has been considered. Figure 10 displays the three reconstruction characteristics (LoS, LoD, and LoH) on the Y-axis, each divided according to the areas of application. The X-axis shows the average score for each reconstruction characteristic according to its area of application. Since the use of 360° environments was not scored in the LoS, it will be discussed outside Fig. 10. The proportions of 360° environments were as follows: Preservation (45%), Musealization (41%), and Education (7%). First, Preservation was the category with the lowest overall scores and the most 360° environments. One likely explanation is that these applications are not intended to be tested with end-users and have not left the academic environment. In Musealization, both its relatively low LoS and its high percentage, 41%, of 360° environments stand out. It also had the highest LoD and LoH, a crucial and significant aspect for realism in 3D models [157]. Education was the opposite, with the highest LoS and the lowest number of 360° environments, but also the lowest LoD, and its LoH was not prominent.
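
One way to judge whether an area of application holds more 360° environments than its size would suggest is to divide its share of 360° environments by its share of the sample. The sketch below is an illustration and not part of the original analysis. It uses the area shares and the 360° proportions reported in this sub-section; the representation ratio is a measure introduced here, and Research is omitted because its 360° share is not reported.

```python
# Percentages reported in this sub-section
sample_share = {"Preservation": 54, "Musealization": 30, "Education": 12}
env360_share = {"Preservation": 45, "Musealization": 41, "Education": 7}

# A ratio above 1 means the area holds more 360° environments than its
# share of the sample would suggest; below 1 means fewer.
representation = {
    area: round(env360_share[area] / sample_share[area], 2) for area in sample_share
}

print(representation)  # {'Preservation': 0.83, 'Musealization': 1.37, 'Education': 0.58}
```

Under these assumptions, Musealization is the only area with a ratio above 1.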

Fig. 10 Average LoD, LoH, and LoS scores for each area of application

The DoF were taken into consideration, to analyse the suitability of these areas of application for an iVR experience. They were furthermore compared with the RP, taking into account the LoS, LoD, and LoH results, considering that the heritage characteristics affected its virtual reconstruction [29]. Analyzing the proportion of reconstruction cases according to DoF, it was observed that 3DoF (32%) was used less than 6DoF (68%), which was the most common type of experience. Figure 11 shows the percentage of RP and DoF usage by application area on the Y-axis with stacked columns, each column divided into two areas by use of DoF. On the X-axis, each of the areas of application can be seen. The most widely used RP in Preservation was reCon3D, although the use of other RPs was significant, as the proportion was more balanced than in the other areas. Regarding DoF, there was a clear predominance of 6DoF. The most widely used RP in Musealization was reForm3D, an area that stood out because of its low use of reCon3D and high use of the other RP, reFormD, both based on digitization. Regarding the use of DoF, 6DoF was still dominant, but this area showed a greater presence of 3DoF. Finally, the most widely used RP in Education was reCon3D. Similarly, 6DoF was more frequently used than 3DoF.

Based on the above data, the following conclusions can be drawn. Preservation is an area of application where 6DoF predominates, and there is no preferred RP. These data coincide with the general statistics, probably because it is the most general area and was not designed for end-users. Moreover, Musealization has a greater preference for RPs based on digitization, and the use of reCon3D was minor, despite it being the most widely used RP. Additionally, 3DoF can be highlighted within this area. Although its usage was not in excess of 50%, its use was proportionally higher than in the rest of the areas, which is significant given the low use of 3DoF in the sample (32%). Furthermore, Musealization has a proportionally higher number of 360° environments than other areas of application, having almost half of them (45%) despite being a medium-sized area of application (27%). Finally, in Education, reCon3D is the most widely used RP, and 6DoF predominates.

Fig. 11 Percentage usage of RP and DoF by area of application

The preference for the use of RP by area of application may be better understood through a comparison of the reconstruction characteristics of the RP by area of application and through their LoS, LoD, and LoH scores. Figure 12 displays the three reconstruction characteristics (LoS, LoD, and LoH) on the Y-axis, each divided according to the RP. The X-axis shows the average score of the reconstructions according to their RP. Regarding the number of 360° environments, the proportions were as follows: reCon3D (48%), reConD (3%), reForm3D (24%), and reFormD (24%). First, reCon3D showed the highest LoS and a balanced LoH, but the worst LoD. On the other hand, reConD had the second lowest LoS, a good LoD, and the worst LoH. In turn, reForm3D had the second highest LoS, the best LoD, and an elevated LoH. Finally, reFormD had the worst LoS, but a balanced LoD, and the highest LoH.

In the light of those results, the highest LoS and a balanced LoD and LoH can be associated with the RP techniques based on 3D modelling (reCon3D and reForm3D). Conversely, the RP techniques based on digitization (reConD and reFormD) stand out for having the lowest LoS, but a balanced LoH and LoD. Furthermore, these RP techniques accumulate most of the 360° environments. The balance between the RP techniques based on 3D modelling and digitization can be explained by the state of heritage. The RP techniques based on reConstruction (reCon3D and reConD) have worse LoH and LoD. Using those techniques, having a lower LoH is normal since the use of 3D modelling requires filling empty spaces in the LoH [138]. Those based on reForm (reForm3D and reFormD) have a high LoH and better LoD, which assists heritage conservation.

In conclusion, the state of heritage affects the reconstruction characteristics. When fewer remains are preserved, reConstructions are used and LoD and LoH decline, while LoS increases, offering greater flexibility to work directly in a digital 3D environment. Conversely, a greater preservation of CH facilitates reForms and the utilization of digitization techniques, with the resulting improvements to LoD and LoH, due in all likelihood to the ease of generating realistic and accurate models through photogrammetry. These outcomes align with the observations in Figs. 10 and 11, where Musealization primarily employs digitization-based techniques, leading to a lower LoS and a greater emphasis on 360° environments and 3DoF experiences, but a superior LoH and LoD. Conversely, Education focuses on reCon3D, resulting in higher LoS and numerous 6DoF experiences, but yielding poorer outcomes in LoH and LoD. Preservation, on the other hand, showed a less pronounced trend, due to the diversity of available reconstruction techniques and the absence of a clear objective.

Fig. 12 Average LoS, LoD, and LoH scores for each RP

The last analysis of this sub-section served to establish the most suitable types of iVR and DoF for each area of application. The number of iVR types and their relations with the application areas were considered in the analysis. The percentage usage of the iVR experience types was: Passive (18%), Explorative (51%), Explorative interaction (24%), and Interactive experience (7%). Figure 13 shows the percentage usage of each type of iVR experience by area of application. Each area of application has two columns, the left with the results for 3DoF experiences, and the right with the results for 6DoF. On the Y-axis, the percentage usage of each type of iVR experience is shown, and on the X-axis, the application areas are listed. 3DoF experiences are in the minority, but in Musealization experiences, there is a balance with 6DoF, as shown in Fig. 11. 3DoF is mostly used in Passive experiences, which are exclusive to this DoF. Explorative experiences are also used in this DoF, but mainly for Preservation. Finally, Explorative interaction experiences are used in Musealization and Education, with a greater impact in the latter area, where there is a balance between the 3 types of 3DoF experiences: Explorative, Explorative interaction, and Passive. 6DoF experiences were the most frequently used in all areas except for Musealization. In this DoF, the most frequently used type of experience was Explorative. Explorative interaction and Interactive Experience have some weight in all areas, especially in Education, where the use of Explorative experiences is low. Passive and Explorative types of iVR were the most common according to their DoF, both types of experiences being the least interactive. These results can be contrasted with reviews of other iVR areas where interactive experiences are more prevalent [4].

Fig. 13 Percentage usage of different types of iVR by application area and by DoF

Regarding the area of application, Preservation is more conducive to 6DoF Explorative experiences. With 3DoF, a clear preference was observed for Passive experiences. Nevertheless, the results were balanced and in line with the global results, in all likelihood because Preservation was the most common area of application and had no specific end-user specialization. An equilibrium between 3DoF and 6DoF experiences was evident for Musealization, which was noteworthy given the low usage of 3DoF in the sample. In 3DoF experiences, Passive types were preferred, while in 6DoF, Explorative types were favoured. Both DoFs demonstrated the use of Explorative interaction. Finally, Education showed a strong preference for 6DoF and the most interactive experience types. Even in 3DoF, the use of Passive was minimal. Explorative Interaction was the most frequently used experience type in 6DoF. These results were consistent with the research of Bekele [20], which emphasized that CH educational experiences with XR required interaction. Comparison with reviews of other areas yielded similar results to Checa’s review [4] of iVR in education and training.

Summarizing this sub-section, the following conclusions can be drawn. The characteristics of heritage dictate its future area of application, with preferred usage of certain RP types. On the one hand, preferential usage of RP types based on 3D modelling was noted in Preservation, but especially in Education experiences that had high LoS, due to the creative freedom provided by 3D modelling. Interactive iVR types were therefore more frequent in Education [4] where 6DoF predominated and Explorative interaction was the most common experience. In addition, this greater interaction could be used as a pedagogical element in this type of experience [20]. However, they had a low LoH and LoD, which is natural as there were fewer archaeological resources [138], and a high LoD in an educational resource could distract students [158]. On the other hand, RP types based on digitization had a preferred use in Musealization. Their low LoS is aligned with their greater predominance for 3DoF experiences and iVR types with lower levels of interaction. This reduced user freedom may be beneficial in the context of group visits [20]. Moreover, their high LoD and LoH make them suitable for this area, as the high LoD makes them adequate for end-users due to their visual impact [157], and the LoH makes them useful in Musealization because of its higher historical accuracy. Finally, Preservation had a less marked profile and a more equitable distribution of RP types, with a predominance of 6DoF and experiences of all types, although Explorative in 6DoF and Passive in 3DoF were more common. As Mendoza noted [38], this less marked profile was probably due to a lack of specialization among the end-users of iVR experiences.

5.4 Design of immersive Virtual Reality

In this sub-section, the design of the iVR experience will be analysed. Given the significant impact of DoF on the choice of iVR experience type, it is also used to separate the sample. Firstly, the type of interaction will be analysed, followed by the choice of HMD and its relationship with the types of interaction. Thirdly, the way to end the experience will be verified. Subsequently, the use of characters, audio, and interface in the experiences will be analysed. In fifth place, the game engines of each experience are listed and, finally, an explanation will be given of technological developments within this field.

The percentage usage of each type of interaction in the types of iVR experience was analysed, to investigate the relationship between the type of interaction and the type of iVR experience. The percentage usage of each type of interaction was as follows: Head Movement (20%), Point and Click (16%), Gamepad locomotion (17%), Room Scale (10%), and Teleport locomotion (37%). Figure 14 displays the percentage usage of each type of interaction on the Y-axis, and the types of iVR experience on the X-axis. Each type of iVR experience is divided into two columns, with the left column referring to the 3DoF results, and the right column to the 6DoF results. The Passive experiences were exclusively 3DoF, while the Interactive experiences were exclusively 6DoF. As shown in Fig. 14, two types of interaction were used in 3DoF: Point and Click for Explorative and Explorative Interaction experiences, and Head Movement for Passive experiences. The 6DoF results were more varied. The explorative 6DoF experiences used Teleport locomotion, Gamepad locomotion, and Room Scale, with a greater preference for Teleport locomotion, the most widely used type of interaction overall. The explorative interaction experiences only used Teleport and Gamepad locomotion, but with a greater use of Teleport locomotion. Finally, Interactive experiences only used Teleport locomotion and Room scale, but in this type of iVR, the most widely used interaction system was Room scale.

Fig. 14 Percentage usage of interaction type by type of iVR and DoF

These results may be due to the following reasons. The exclusive usage of both Point and Click and Head Movement, the two simplest types of interaction, is standard in 3DoF. Head Movement is the most restrictive, so it is exclusively used in the most limited ‘Passive’ types of experience. Regarding the results of 6DoF, it can be seen that Teleport Locomotion is the most widely used type of interaction. This result is expected, as it has been one of the most common types of locomotion in iVR over recent years, as Prithul pointed out in his teleportation review [159]. Among all the 6DoF iVR experiences, Explorative experiences have the most varied interaction types, which may be because they were the most common in the earlier years of the study, when taxonomies and locomotion systems were more general and less specific [160]. In Explorative interaction, there is an increase in the use of Teleport locomotion, probably because these experiences are more modern and there is a need to teleport over long distances [159]. Regarding Interactive experiences, there is a greater predominance of Room scale and the disappearance of Gamepad locomotion. This is an expected result, as this type of experience is characterized by a lot of interaction within a small space [4]. Therefore, interaction with either a gamepad or a teleportation system makes less sense within small spaces [159].

The next point to be studied is HMD usage. To that end, the types of HMDs used during the years under study (2013–2022) were analysed. The percentage usage of HMDs was as follows: 3DoF desktop (6%), 3DoF standalone (23%), 6DoF 180° external tracking (21%), 6DoF 360° external tracking (37%), 6DoF internal tracking desktop (12%), and 6DoF internal tracking standalone models (1%). Figure 15 displays the number of cases for each type of HMD on the Y-axis, while the X-axis shows the year of publication of the references. On the one hand, the usage of 3DoF standalone and desktop HMDs is similar throughout all the years, showing no significant evolution. On the other hand, in the first years of analysis (2013–2017), 6DoF 180° external tracking desktop HMDs were widely used, but from 2017 onwards they began to be replaced by 6DoF 360° external tracking models. Those models are the most widely used HMDs, but in recent years, they have been losing ground in favour of 6DoF internal tracking desktop and standalone models.

Fig. 15. Development of HMD usage (2013–2022)

3DoF HMDs have evolved the least over the 10 years of the study. This finding is reflected in Fig. 14, which shows the limited variability of 3DoF experiences, due perhaps to the more limited nature of those experiences. Regarding 6DoF experiences, the following conclusions can be drawn. There have been three stages in the use of 6DoF HMDs: a first stage dominated by 6DoF 180° external tracking desktop HMDs, a second stage with 6DoF 360° external tracking desktop HMDs (a development that incorporates more complex tracking), and a third stage, that we are currently going through, with more weight placed on 6DoF internal tracking desktop and standalone models. This implies an expected future trend where 6DoF internal tracking desktop and standalone models will replace 6DoF 360° external tracking desktop models as the most widely used HMDs.

The next point is the relation between the chosen HMD and the type of interaction. To that end, the types of HMDs are classified by the type of interaction. Figure 16 presents the percentage of cases of each type of HMD on the Y-axis, and the types of interaction on the X-axis. The types of interaction are separated by their DoF, so Head Movement and Point and Click are exclusive to 3DoF, and Teleport locomotion, Gamepad locomotion, and Room scale are exclusive to 6DoF. On the one hand, it can be observed that the 3DoF standalone is the most popular HMD in 3DoF experiences, followed by 6DoF 360° external tracking desktop HMDs. Furthermore, the latter HMD was only used for experiences with a Head Movement system. On the other hand, 6DoF experiences showed more differences. In terms of HMD types, the most widely used HMDs overall were also the most widely used in 6DoF experiences, in the following order: 6DoF 360° external tracking desktop HMDs, 6DoF 180° external tracking desktop HMDs, and finally 6DoF internal tracking desktop and standalone HMDs. The few cases of 3DoF HMDs for 6DoF experiences were due to outdated experiences and adapted visors. Moving on to the types of interaction, Gamepad locomotion is used with the widest variety of HMDs, very probably because it was the most common type of interaction in the first years of the analysis. Room scale follows the overall statistics, with a predominance of 6DoF 360° external tracking desktop models, followed by those with 180° tracking. Lastly, Teleport locomotion was mainly used with 6DoF 360° external tracking desktop models and 6DoF internal tracking desktop and standalone models, with a smaller number of 6DoF 180° external tracking desktop models.

Fig. 16. Usage rates of different types of interaction in relation to type of iVR and DoF

The following conclusions can be drawn from the type of interaction and the HMDs. The most widely used type of HMD in 3DoF is the 3DoF standalone, followed by 6DoF 360° external tracking desktop HMDs for Head Movement experiences, which is the most common type of movement in 3DoF. The use of the latter HMD is probably due to its popularity, so it is not surprising that a few experiences use it in 3DoF. In addition, some equipment, such as eye-tracking sensors, is recommended for that type of Head Movement. With regard to 6DoF, it may first of all be noted that there is no consensus regarding the use of Gamepad locomotion, where all types of HMD are used almost interchangeably, probably due to the age of the experiences. Secondly, Room scale is the type of interaction that follows the general statistics, with a majority use of 6DoF 360° external tracking, although it is the least common type of interaction. Finally, Teleport locomotion is the most widely used type of interaction, making use of the most modern HMDs. This type of interaction is also exclusive to the newer 6DoF internal tracking desktop and standalone HMDs. The use of the most modern HMDs with Teleport locomotion confirms this type of interaction as the most popular locomotion system for 6DoF experiences, echoing the results of Prithul’s review on teleportation in iVR [159].

The following analysis was conducted to investigate the different ways of ending the iVR experiences and the type of iVR experience in use. The percentage usage of each type of ending was as follows: Free (62%), Time (8%), Exploration (10%), and Task-related (20%). Figure 17 displays the percentage usage of each type of ending on the Y-axis, while the X-axis shows the different types of iVR in two columns, displaying the information on the 3DoF and the 6DoF experiences on the left and on the right, respectively. The passive experiences and the interactive experiences were exclusively 3DoF and 6DoF, respectively. No preferred form of ending was evident for Passive 3DoF experiences, with Exploration, Time and Free endings being equally divided. As regards the Explorative experiences, both 3DoF and 6DoF showed a predominantly Free form of ending. However, for Explorative interaction, a greater difference between DoFs can be observed, where Exploration was the more common ending for 3DoF, while Tasks were more frequent for 6DoF, albeit with a still significant presence of Free conclusions in both. Finally, Interactive experiences with 6DoF often had Task-related endings, and less commonly endings with Exploration.

From these observations, certain conclusions can be drawn on the endings of the iVR experiences. While no clear preference could be established for Passive experiences, Task-related endings were excluded, due to the low level of interaction in those mainly 3DoF iVR experiences. Regardless of the DoF, Explorative experiences tended to have Free form endings. Given the limited interactivity of these experiences, it is to be expected, as users are afforded the freedom to explore their environment as they please. As for Explorative interaction, the prevalence of Exploration endings for 3DoF and Task-related endings for 6DoF, can be attributed to the higher degree of interactivity involved in these iVR experiences where both types of endings require active user participation. Finally, the predominance of Task-related endings for Interactive experiences was consistent with the high degree of interactivity present in that type of iVR experience.

Fig. 17. Percentage usage of each type of ending by type of iVR and DoF

The following analysis focuses on the use of characters, audio, and interface in iVR experiences. In that regard, the percentage usage of each element in the different types of iVR experiences is considered. Given that many articles lack clear specifications on these concepts, the results should be interpreted as indicative, and the analysis is aimed at providing a general overview. For that reason, unspecified data are included and treated as an indication of non-inclusion. Figure 18 depicts the percentage usage on the Y-axis, with each type of iVR experience represented on the X-axis. To present the data collectively, the percentage of character usage in each iVR experience is displayed from left to right, followed by the percentage of audio usage, and finally, the interface usage. Fig. 18 shows whether these elements have been used in each iVR experience, without specifying the percentage usage within each category identified in the taxonomy. These data will be discussed in the text, so that Fig. 18 remains clear. Whenever a specific data point is not specified in the article, it is labelled “No inclusion” in Fig. 18.

Firstly, common patterns will be described, followed by the individual data for each of the three items. As a general pattern, the inclusion of those elements was observed to increase with the complexity of the iVR experience. There were three exceptions to this pattern, with characters and audio being more common in Passive experiences, and interface inclusion being slightly less common in Interactive experiences. In all, 33% of experiences had characters, of which 28% were Digital and 8% were Recorded. Recorded characters were exclusively used in 3DoF experiences, and predominantly in Passive ones, which explains the higher usage of characters for Passive experiences. Audio formed part of 70% of the reconstructions, of which 36% was Ambient, 20% Narration, 10% Sound effects, and 4% Music, the least common category. Within iVR experiences, there were no clear patterns regarding audio usage, except for Music, which was exclusive to Explorative and Explorative interaction iVR experiences. As with characters, the extensive use of audio in Passive experiences was attributed to its significant use in 3DoF experiences, where Passive was the type of iVR with the highest audio usage. Finally, interfaces were employed in 59% of experiences, where 24% was PoI, 18% Panel, and 17% Menu. Those percentages remained consistent across types of iVR experiences with varying DoF.

Fig. 18. Percentage usage of characters, audio, and interface by type of iVR experience

The following conclusions were drawn on the basis of the above data. Generally, more complex experiences make greater use of characters, audio, and interface. This conclusion is not surprising, as higher complexity in the iVR experience requires more elements for interaction and feedback. An exception to that trend was observed in Passive experiences, where the use of characters (exclusively Recorded) and audio was higher. This may be attributed to the non-interactive nature of Passive experiences, as both audio and Recorded characters were pre-recorded resources. Nevertheless, those results should be interpreted as indicative, due to the limited specificity provided in the articles.

Finally, the last analysis of the sub-section was on the type of game engine in the iVR experiences. To that end, the usage of each game engine, the LoS and LoD that it achieved, and the specialization of the development teams were considered. The most widely used development engines were Unity (n = 55) and Unreal Engine (n = 25). It is no surprise that Unity is the preferred game engine, as it is one of the most widely used in this field [20], as well as within other areas of iVR applications [4]. Nevertheless, the use of Unreal Engine is relatively widespread compared to other areas of iVR applications where it is used far less [4]. This may be due to the programming simplicity of this game engine, which makes it suitable for multidisciplinary teams [38]. If game engine usage is separated by the field of study of the first author, Unreal Engine has greater usage among history and archaeology specialists. Additionally, an analysis of the reconstruction characteristics revealed that Unreal Engine obtained a better score than Unity in both LoS (3.57 vs. 3.03) and LoD (3.88 vs. 3.02). These data coincided with the conclusions of other studies that underlined the hyper-realistic results of Unreal [16, 20].

In summary, conclusions regarding the design of 3DoF and 6DoF experiences will be obtained. An analysis of the technological evolution of the topic over the period of analysis (2013–2022) is presented in this sub-section to support the formulation of conclusions. This evolution is depicted in Fig. 19. The Y-axis displays the number of cases per variable, which, for readability in Fig. 19, has been divided into three areas, analysing the evolution of the type of iVR experience, HMD type, and type of interaction. The X-axis shows the temporal evolution, divided into three triennia to facilitate the analysis and the conclusions: Triennium 1 (2013–2016), Triennium 2 (2017–2019), and Triennium 3 (2020–2022). Although Triennium 1 spans a period of 4 years, it effectively covers three years of analysis, as there were no data available for 2014. These three evolutions were chosen as they were the most technologically relevant among those analysed, clearly demonstrating an evolution. Not all taxonomies within each of these areas are presented; instead, the most relevant ones are included to clarify their evolution. A legend at the end of Triennium 3 displays the name of each variable alongside its particular line. Figure 19 is then discussed and analysed in conjunction with the summary of the design analyses of 3DoF and 6DoF iVR experiences.

Fig. 19. Technological evolution of the type of iVR experiences, HMD type and type of interaction
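The temporal binning used for Fig. 19 can be sketched as follows. The period boundaries are those given in the text; the helper names `triennium_of` and `count_by_triennium` and the sample records are hypothetical and serve only to illustrate the grouping.

```python
from collections import Counter

# Period boundaries as stated in the text. Triennium 1 spans 2013-2016
# because no cases were available for 2014.
TRIENNIA = {
    "Triennium 1": (2013, 2016),
    "Triennium 2": (2017, 2019),
    "Triennium 3": (2020, 2022),
}


def triennium_of(year):
    """Map a publication year to the name of its triennium."""
    for name, (start, end) in TRIENNIA.items():
        if start <= year <= end:
            return name
    raise ValueError(f"year {year} is outside the period of analysis")


def count_by_triennium(records):
    """Count cases per (triennium, variable) from (year, variable) pairs."""
    return Counter((triennium_of(year), variable) for year, variable in records)


# Hypothetical records, not taken from the review.
sample = [
    (2013, "Gamepad locomotion"),
    (2018, "Teleport locomotion"),
    (2021, "Teleport locomotion"),
    (2022, "Teleport locomotion"),
]
counts = count_by_triennium(sample)
print(counts[("Triennium 3", "Teleport locomotion")])
```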

The following conclusions can be drawn from the design of 3DoF experiences. The most frequently used types of movement were Head movement for Passive experiences and Point and Click for Explorative and Explorative interaction experiences. These were the simplest types of the entire taxonomy, and 3DoF experiences used those two forms of interaction exclusively. As depicted in Fig. 19, Passive experiences are in a state of consolidation, showing minimal evolution in Triennium 3. Head movement shows a similar trend of stabilization. Point and click shows a relatively higher increase in some cases, indicating a slightly higher potential for evolution. Standalone 3DoF HMDs are the most widely used, although there are several cases of 6DoF 360° external tracking desktop HMDs for Passive experiences, possibly due to the eye-tracking capabilities of those HMDs. Similarly, Fig. 19 illustrates that the evolution of 3DoF standalone is mild and is aligned with the trends observed for Head movement and Passive experiences. Free was the most common type of ending, although Exploration had a majority presence in Explorative interaction experiences, due to their greater interaction. There was no clear trend in Passive experiences, with Free, Exploration, and Time being used. Furthermore, this DoF stands out for the use of Recorded characters and the extensive use of audio in Passive experiences. Those elements, while non-interactive, bring life to this type of reconstruction. The findings highlight 3DoF experiences as the simplest, with the most straightforward types of interaction and the most affordable and least functional HMDs. As with other studies, the findings indicated that, if well-designed, fewer stimuli lead to greater engagement in 3DoF experiences [161, 162]. It also explains the greater use of these experiences in Musealization, as they are simpler experiences, comfortable to use in groups [20], and with lower-cost HMDs that can be marketed to individual users.
The slower evolution of the number of cases related to this technology in the last triennium under analysis can in all probability be attributed to its greater technological simplicity, resulting in comparatively lower research interest.

Furthermore, the following conclusions were drawn from the design of the 6DoF experiences. Explorative experiences and Explorative interaction represent the most common types of iVR experience in this DoF. Gamepad locomotion was rarely used and was mostly found in older experiences, while Teleport locomotion was the most common type of movement [159]. This is clear in Fig. 19, where it can be observed that as the use of Gamepad locomotion decreases, the use of Teleport locomotion doubles between triennia. Interactive experiences were an exception, in so far as Room scale became more important, due to the lack of need for movement [4]. However, Interactive experiences were less common. In terms of HMD preference, there was a shift from 6DoF 180° external tracking desktop HMDs to 6DoF 360° external tracking desktop HMDs, which are currently the most widely used, and to 6DoF internal tracking desktop and standalone HMDs. It is an understandable change, as the 360° models offer more tracking area and internal trackers give greater user comfort, due to the lack of external trackers. The same thing can be observed in Fig. 19, where it is evident that in Triennium 1, the most widely used HMD is the 6DoF 180° external tracking desktop HMD. It continues to grow in Triennium 2 but is surpassed by the 6DoF 360° external tracking desktop HMD. Finally, in Triennium 3, although there are still more instances of 6DoF 360° external tracking desktop HMDs, the trend declines, and the 6DoF internal tracking desktop HMD emerges. The most modern HMDs are primarily used with Teleport locomotion, with 6DoF 360° external tracking desktop HMDs being the preferred choice for Interactive experiences. That choice is due to the varied range of motion sensor positions that enhance tracking. The way in which the experiences ended was greatly dependent on the type of iVR experience.
The Free ending was the preference for Explorative experiences, while Task-related endings were preferred for Explorative interaction and Interactive experiences, due to the higher interactivity of these types of iVR experiences. Furthermore, as the complexity of the iVR experience increases, it is more common to encounter the inclusion of characters, audio, and interface. Unity was used more frequently as a development engine, but Unreal stood out, because of its ease of use in multidisciplinary teams [38] and its high LoS and LoD [16]. Those characteristics make it particularly suitable for Education and Preservation applications. The greater complexity of 6DoF experiences implies greater interactivity and variability, which is beneficial for learning [20], but also for research, as experiences with many variables may be explored. These reasons explain the greater usage of 6DoF in Education and Preservation. The trend toward increased interaction can also be observed in Fig. 19. While Explorative experiences decline in the last triennium, there is a rise in other more complex experiences such as Interactive experiences and Explorative interaction.

5.5 Evaluation

Once an iVR experience is completed, a decision on whether to assess the experience in the post-processing phase can be taken [19]. This section analyses the quantity, the purpose and the quality of the evaluations. First, the quantity of evaluations was considered, based on year of publication, then the purpose of the evaluation was analysed and, finally, the type and the characteristics of the evaluation, highlighting some examples of more complex and higher-quality evaluations.

The first analysis considered year of publication and number of evaluations. The total percentage of papers that included evaluations was 42%. In Fig. 20, the Y axis shows all the studies divided between those that did and did not describe evaluations. On the X axis, the publication years are displayed. In the early years (up until 2017), it was relatively common to conduct evaluations or tests, but due to the small number of cases, it is not a very representative part of the sample. In the following years, it can be observed that the number of evaluations increased year by year until 2022, a year in which 50% of the papers reported evaluations.

Fig. 20. Papers with and without evaluation of the iVR experience by year of publication (2013–2022)

This analysis showed that evaluations are scarce, as less than half of the reconstructions were evaluated. Despite these results, outcome evaluations are a growing trend. In recent years the percentage of cases evaluated has approached 50%, suggesting that evaluations will be increasingly common.

The following analysis focuses on the purpose of the evaluations that were conducted, taking into account the area of application, the evaluation purpose, and the type of questions. The most common evaluation purpose was Usability and user experience (69%), followed by Education/Engagement (20%) and Technology or System Application (11%). The most frequent type of question was Likert (48%), followed by Open-ended (30%) and Options (10%). The percentage usage of the evaluation purposes and types of questions is shown on the Y-axis of Fig. 21. The areas of application are divided into two columns on the X-axis, with the data for the evaluation purpose in the left column and the type of question in the right column. In general, it can be observed that Preservation has the most extreme percentages of usage both for the evaluation purpose and for the type of question. Musealization is somewhat more balanced, while Education shows greater diversity. In Preservation, Usability and user experience was used almost exclusively as the evaluation purpose. The types of questions are divided between Likert and Open, with slightly more emphasis on Likert. In Musealization, Usability and user experience remains the most common purpose, but Technology or System Application and Education/Engagement are added, each with a 20% usage rate. The type of question is somewhat more balanced, with a slightly higher usage of Open questions. Finally, the most balanced area of application is Education, with Usability and user experience having the same weight as Education/Engagement, each with a usage rate of approximately 40%. Additionally, Technology or System Application rose to 20%. Concerning the type of question, Options-based questions gain more prominence, causing Open questions to lose some of their presence.

Fig. 21. Percentage usage of evaluation purpose and type of questions by area of application

From these data, the following results can be obtained. The most frequent evaluation purpose is Usability and user experience, followed by Education/Engagement and Technology or System Application. However, this distribution is not uniform and varies depending on the area of application. Preservation, not targeting a specific audience, is almost exclusively focused on Usability and user experience. On the other hand, areas of application with an orientation toward a specific audience require a greater diversity of evaluation purposes and types of questions, with Education being the primary exponent. Naturally, Education also has Education/Engagement as its main evaluation purpose, which is predictable given the nature of the area of application. These results are anticipated, as specialization in the area of application typically leads to a corresponding specialization in evaluation purpose and type of question, diverging from general trends. Additionally, these findings are aligned with the results of Checa’s review of educational applications for iVR [4], where the evaluation purpose is divided between Education/Engagement and Usability and user experience. Regarding the type of question, Likert is the most commonly used, followed by Open and Options. However, as with the evaluation purpose, the usage of type of questions varies in accordance with the area of application. In this case, the only area that shows a significant difference is Education, where Options are much more used. This difference could be attributed to the fact that knowledge-oriented evaluation questions are often assessed using test-type questions, falling under the Options type of question.

The type of assessment according to the divisions between phases, the comparison of results with a control group, the sample size, and other evaluation techniques were all considered to analyse the quality of the evaluations. The first indicator of evaluation quality was the number of evaluation phases, with Post being the least favourable and Pre/During/Post being the most robust. Additional indicators were the use of a control group in the evaluation and the sample size, where a larger sample was considered better [154]. Furthermore, some evaluations employed more sophisticated methods such as eye-tracking, which serves as an additional quality indicator [163]. Not all of the advanced evaluation methods identified in the sample are specified in the article, due to a lack of discernible patterns in their usage. The comprehensive list is available for review in Annex II.

Accordingly, the graph in Fig. 22 shows the percentage usage of different types of assessment and the use of control groups on the Y-axis. Each area of application is represented on the X-axis. The column of each area refers to the type of assessment, and the separate areas reflect their use of control groups. The areas indicate the use of control groups across all the evaluations in each area of application. However, this percentage is not correlated with the type of evaluation in Fig. 22. Preservation is the area with the least evaluation and where control groups are less used. Musealization ranks second with a higher presence of evaluation and the same quantity of control groups, but with simple evaluations. Finally, Education is the area where the most evaluations take place, with more complex evaluations and a greater presence of control groups.

Fig. 22. Percentage usage of control groups and each type of evaluation by area of application

Regarding the sample size, the overall mean was 113 individuals, a high average which was influenced by specific evaluations with very large groups. The overall median, however, was 37 individuals, which is a more representative measure. The sample size followed the same quality trend observed for the type of evaluation, with the following means (M) and medians (Md): Preservation M = 50, Md = 23; Musealization M = 120, Md = 37; and Education M = 209, Md = 72. Thus, Preservation had the smallest samples, while Musealization and especially Education had the largest ones. This finding is aligned with those of other studies indicating that evaluations conducted in museums and similar environments often involve large samples [163]. Education, on the other hand, is an area that typically stands out for the robustness of its evaluations [4]. Lastly, the use of advanced evaluation and data analysis methods, such as eye tracking, Electroencephalogram (EEG), or Analysis of Variance (ANOVA), was considered. These techniques were employed in 9% of Preservation evaluations, 10% of Musealization evaluations, and 17% of Education evaluations. Once again, the pattern was repeated, although the percentage was relatively low across all areas of application.
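The gap between mean and median described above is the usual effect of a few very large samples. A minimal sketch with hypothetical sample sizes (not the values of the reviewed studies) shows how a single large evaluation pulls the mean upwards while the median barely moves:

```python
from statistics import mean, median

# Hypothetical sample sizes of seven evaluations; the last one is an outlier.
# These numbers are illustrative and are NOT taken from the reviewed papers.
sample_sizes = [15, 22, 30, 41, 48, 60, 500]

m = mean(sample_sizes)      # sensitive to the single very large group
md = median(sample_sizes)   # middle value, robust to the outlier

print(f"M = {m:.1f}, Md = {md}")

# Without the outlier, the mean drops close to the median.
m_trimmed = mean(sample_sizes[:-1])
print(f"M without the largest evaluation = {m_trimmed:.1f}")
```

For skewed distributions of this kind, reporting both statistics, as the text does, gives a fairer picture than the mean alone.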

In summary, the following conclusions can be drawn. Preservation focused its assessment on Usability and user experience. In addition, it is the area with the lowest percentage of evaluations and the lowest evaluation quality, with small sample sizes and only a few complex evaluations. The lack of evaluations within this area makes sense, given the characteristics of the experiences, which were not tested with end-users. Musealization, while continuing to focus on Usability analysis, incorporated additional purposes. This area of application was ranked second, but it had very few complex evaluations. Moreover, the size of its samples was similar to the overall mean and median, indicating that the most common practice in this topic was to evaluate with groups of around 40 people. The orientation of this area towards end-users facilitates large evaluations [163], although the difficulty of conducting complex evaluations with very large groups may explain their low quantity and quality [20]. Finally, Education can be highlighted under evaluation purposes as having a high interest in the Education/Engagement purpose. This area had the highest percentage of evaluations, more complex evaluations, greater use of control groups, larger sample sizes and increased use of complex evaluation techniques. This is hardly surprising in an area where robust evaluation is one of its pillars [4]. These data coincided with other studies that indicated that the evaluation of CH experiences was poor, and with simple evaluations [154]. Nevertheless, Education showed more robust evaluations, and the number of evaluations of iVR experiences is rising in view of the valuable lessons that can be learnt from the results. Some examples of particularly robust evaluations will be provided, to conclude the findings of this subsection. The following examples stand out for their type of evaluation, sample size and use of a complex methodology.
The first example involved the virtual reconstruction of Moscow’s Red Square (Russia) in the 20th c. [104], utilizing a Pre/Post-evaluation method with a sample of 60 individuals, incorporating a control group, and employing Polychoric PCA as the analytical method. The second example was the virtual reconstruction of the Neolithic site of Çatalhöyük in Turkey around the 70th c. BC [101]. In this study, a Pre/During/Post-evaluation method was applied to a sample of 84 individuals, with the use of control groups and employing factor analysis and ANOVA as statistical methods. Lastly, the virtual reconstruction of La Draga settlement (Spain) around the 50th c. BC was chosen [133]. In that study, an assessment was conducted with a sample of 262 individuals, collecting data on their performance in the iVR experience, on which basis user behavior was clustered. Additionally, a post-evaluation was performed with a correlation analysis on a sample of 42 users.

6 Best design practices and future lines

In this section, the most common best design practices found in the review and their application in the development of virtual reconstructions of CH in iVR will first be explained. Secondly, the future lines of research of the topic will be outlined. To do so, two different types of analyses were conducted. Firstly, for the identification of best practices, a correlation analysis was performed to identify the most important variables to consider. Secondly, for the definition of future lines, a qualitative analysis supported by the results of the review was conducted.

6.1 Best design practices

A correlation analysis was conducted to identify best design practices. In this sub-section, the most relevant results of that analysis will be detailed, linking the variables to the outcomes of the study. The explanation of best design practices has been organized following the same order as the Survey section: (1) data distribution; (2) heritage characteristics; (3) reconstruction characteristics and experience design; (4) design of iVR; and, (5) evaluation. Not all variables will be addressed, due to the nature of this analysis, and some sections will be omitted.

Figure 23 presents the correlation matrix computed in this study. It shows several of the variables analysed during this investigation along with their correlation coefficients. Some variables were not included in the analysis, due to the noise they generated in the matrix, caused by the large amount of unspecified data, among other reasons. The correlation coefficients range from −1 to 1, with the extremes representing perfect correlations. The orange colours represent negative coefficients, while the green ones represent positive coefficients. The more extreme the value, the more saturated the colour. Not all of the variables examined in this study follow a linear progression. For instance, “reconstruction country” cannot be numerically compared with “game engine.” Whenever possible, the data were coded from lower to higher, assigning lower values to variables with lower technological complexity. In the case of LoS, LoD, and LoH, values were assigned based on their scores. The complete coding of the data can be reviewed in Annex II. For that reason, it is stated for each correlation whether its sign could be read as positive or negative, or whether it was indiscernible because the data were entirely qualitative. Given the large number of correlations, only those values below −0.30 or above 0.30 will be discussed. The complete analysis, including its p-values, can also be reviewed in Annex II. It is essential to note that the correlation between values implies no causality and that these data will be discussed considering previous results.

Fig. 23 Correlation matrix of the most relevant variables

Firstly, continent of publication has a correlation of −0.68 with reconstruction continent and −0.31 with country of publication. Similarly, country of publication has a correlation of −0.38 with reconstruction continent and 0.71 with reconstruction country, and reconstruction country has a correlation of −0.36 with reconstruction continent. In this case, the sign of the correlations cannot be interpreted, due to the qualitative nature of the data. The high correlation coefficients between these variables confirm something that could already be observed in the previous analysis: there is a strong relationship between the place where a reconstruction is developed and the place where the reconstructed heritage is located. Whilst it may be true that Europe and Asia, for example, hosted more reconstructions than they produced, that percentage was not excessively high.

The heritage characteristics also yielded noteworthy correlation coefficients with geographic variables. LoS has a correlation of −0.35 with reconstruction continent and 0.31 with reconstruction country, while LoH has a correlation of −0.40 with reconstruction country. As with the previous correlations, the sign cannot be interpreted, due to the qualitative nature of the data. The following conclusion can be drawn from those correlations and the previous results. As discussed in this study, some reconstructions stand out because of their low LoS but high LoH, usually related to RPs based on reForm. Many of those reconstructions are of Roman heritage located in Italy and Spain, which could explain the correlation.

Finally, the variables related to iVR design stood out because of their high correlation coefficients. The values presented below were linearly encoded, so the sign of each correlation can be interpreted. Firstly, DoF showed positive correlations with LoS (0.64), type of iVR (0.54), HMD type (0.43), and type of interaction (0.48). Those correlations are aligned with the results of the article. On the one hand, the lowest DoF (3DoF) is associated with the lowest LoS, many cases being 360° environments; with more limited types of iVR experience, such as Passive; with HMDs with fewer degrees of freedom; and with simpler types of interaction, such as Head movement. On the other hand, the highest DoF (6DoF) is associated with a higher LoS, with freedom of movement; with more complex types of iVR experience, such as Explorative interaction; with modern HMDs; and with types of interaction that offer more freedom of movement, such as Teleport locomotion. Additionally, type of iVR has a positive correlation with type of interaction (0.42). This also corresponds to the study’s results, with more complex types of iVR typically having more freedom in types of interaction, such as Teleport locomotion. Lastly, HMD type has positive correlations with year of publication (0.31), type of interaction (0.47), and game engine (0.31). In line with one of the conclusions on technological evolution, this indicates that, as the years passed, HMDs evolved to offer more functionalities. Thus, advanced HMDs with more functionalities tended to use Teleport locomotion as a type of interaction. Similarly, more complex HMDs, such as 6DoF 360° desktop external tracking, require powerful game engines such as Unity and Unreal Engine.

6.2 Future lines

Considering the reviewed papers and the results of their analysis, the following future lines of the discipline have been identified:

  • Advances towards increasingly better and affordable scanning and photogrammetry techniques will make reconstructions based on digitalization techniques more common.

  • 3DoF experiences have clear preferences and few variables, and have therefore reached a high level of maturity. This trend is expected to continue, resulting in fewer cases in research and a focus on experiences that are entirely geared towards end-users. Further specialization is likely in Musealization, where 3DoF characteristics are especially suitable for large groups. Regarding 3DoF, the only current trend that may change is the use of Point and click, which will increase in favour of slightly more interactive experiences.

  • 6DoF experiences will become more common, due to their potential in university research. This may mean that Explorative interaction will become more relevant within this DoF. The increase in interactivity will lead to more Task-based conclusions and the complete disappearance of Gamepad locomotion. Furthermore, the usage of characters, sound, or interface is expected to increase, as they are more necessary in more complex experiences. It is possible that this trend will evolve towards a more complex use of sensory feedback, such as biofeedback. With regard to HMD usage, the 6DoF 360° external tracking desktop HMDs will be replaced by the 6DoF internal tracking desktop and standalone HMDs.

  • 6DoF experiences will become more important in the future, as 6DoF HMDs become more functional and affordable.

  • The number of Education experiences will increase, as they have the greatest potential for university research, due to their interactivity and the availability of better evaluations.

  • The use of simple game engines will increase in this field, meeting the needs of multidisciplinary teams with less knowledge in computer science.

  • Evaluations will increase in quantity and quality. 3DoF will evaluate more aspects of usability, design, and device selection with end-users. In contrast, 6DoF will focus on educational aspects or the usability of iVR itself to make this technology more accessible to end-users.

A two-colour scale, grey and orange, is proposed for the qualitative assessment that concludes this section. Figure 24 presents a summary of the most common and promising solutions, divided between 3DoF and 6DoF experiences. Each DoF is divided into five sectors, which correspond to the most important categories analysed in each section of the review: (1) reconstruction procedure; (2) area of application; (3) type of iVR experience; (4) Head-Mounted Display; and (5) type of interaction. The size of the circles represents the current number of cases. The orange circles denote the solutions with the greatest growth potential in each sector. Those solutions were selected by experts involved in the development of virtual reconstructions of CH in iVR, based on the results of the analyses presented in the article. This assessment system has been used in other reviews related to technological solutions and CH, such as the reviews by Lucchi on CH and photovoltaic systems [164, 165].

Fig. 24 Summary of the most common and promising characteristics in 3DoF and 6DoF experiences

7 Conclusions

The use of iVR in the virtual reconstruction of CH is an expanding field with educational potential that deserves further attention. This technology has significant room for improvement that needs further research, particularly into 6DoF experiences. Based on the analysis of this sample, the following paragraphs respond to the questions posed in Fig. 1 and add conclusions on the Methodology and Data distribution sections. Some questions have been grouped to facilitate their response. In each paragraph, the title of the question is highlighted in bold, followed by the response.

To answer “Conclusions on methodology. Not asked in Fig. 1”. Standardizing the taxonomy for iVR and virtual reconstruction of heritage is crucial. During the search for papers, many were found that could not be fitted into the chosen taxonomy. Numerous papers used “virtual reconstruction” to describe techniques such as photogrammetry, or “iVR” to describe computer game experiences. As a proposed solution, the taxonomy outlined in the Methodology and Taxonomy section should be adopted.

To answer “Conclusions on data distribution. Not asked in Fig. 1”. The discipline is in a state of consolidation, having reached its peak production in 2018. These articles are distributed between conference papers and journal articles, with an increasing trend toward publication in journals. Geographically, Europe stands out as the continent with the highest production and the largest number of reconstructions located within it, with Italy and Spain being prominent within the continent. Asia occupies second position in terms of both production and location. Lastly, the discipline is clearly interdisciplinary, combining technical and historical disciplines, with a pronounced emphasis on technical aspects.

To answer “Do nationality, period, preservation, and type of heritage influence the reconstruction and its future use? Are there more viable characteristics for a reconstruction?” Nationality, period, RP, and heritage type have a notable influence on the subsequent development of an iVR experience. The most common and constant cases are those of reconstructions of Urban cores and Sacred heritage, with processes based on reConstruction rather than reForm. However, the production peaks consist of papers on Roman history and on Contemporary-Age, Civil, and Industrial heritage. These peaks are probably due to the use of digitization in these reconstructions, which makes the process more affordable and gives it greater historical accuracy. Additionally, reconstructions are typically located in the same country where they were developed.

To answer “What are the characteristics of a reconstruction in iVR? Do the characteristics of the reconstruction influence the design of the experience and its area of application?” During the analysis, it has been observed that LoS, LoD, and LoH are highly effective values for assessing the features of a virtual reconstruction. The RP of a reconstruction influences its characteristics. The use of reForm was associated with a low LoS and many instances of 360° environments, but those experiences stand out because of their LoH. In contrast, cases of reConstruction have a very high LoS but a lower LoH. Those characteristics are relevant when creating the iVR experience. A low LoS combined with a high LoH is recommended for 3DoF Musealization experiences, as a simpler, more compact experience is more usable for large groups and a high LoH provides historical accuracy. On the other hand, LoS proves to be relevant for 6DoF Education, as it facilitates the inclusion of interactive elements and the exploration of large environments.

To answer “What are the most popular areas of application and types of iVR? What characteristics should reconstructions have in each area of application? What is the most recommendable type of iVR for each area of application?” Preservation is the most common area of application, although its characteristics hardly stand out, due to its lack of a specific target audience in the context of this study. These characteristics are relatively low, as the experiences are not intended for end-users. It is an area where primarily Explorative and 6DoF iVR experiences are developed, which are the most common type of iVR experience and the most prevalent DoF. Musealization is the area with the second highest number of cases. It stands out for featuring reForm reconstructions with a low LoS and a high LoH, along with a significant number of 360° environments. Explorative iVR experiences are predominantly developed in this area, as they are the most common type of iVR experience. However, there is also a notable concentration of Passive experiences within the same area. Similarly, 6DoF experiences are the most popular, though nearly half of all 3DoF experiences are found within this area. Lastly, Education is the least frequent of the three areas. It features reconstructions with a very high LoS. As a result, most experiences within this area are Explorative interaction, with 6DoF being more prominent than 3DoF.

To answer “What is the most suitable design for each type of iVR experience? What are the most widely used HMDs and game engines for each type of iVR experience?” 3DoF experiences usually have the following design characteristics. They are iVR experiences with a strong emphasis on Passive experiences, primarily controlled through Head movement. Point and click interaction is also used in experiences with more interactivity, such as Explorative ones. These experiences often have either Free or Exploration endings, after the entire content has been viewed. They can be complemented with the use of Recorded characters and audio. The most widely used HMD is the 3DoF standalone. The affordability of the procedure and HMDs, along with the simplicity of these experiences, makes them highly recommended for groups. Additionally, thanks to RPs based on digitalization, these experiences tend to have a high LoH, making them particularly suitable for Musealization. There is therefore widespread consensus on the path to be followed for the development of this type of experience. 6DoF experiences usually have the following design characteristics. They are mostly Explorative and Explorative interaction experiences, predominantly controlled through Teleport locomotion, the most common type of interaction. However, in Interactive experiences, Room scale is the most frequently used, as it requires less movement. The endings of these experiences are usually Free, although in cases with more interaction, the endings tend to be Task-related. Currently, the most widely used HMDs are 6DoF 360° external tracking desktop HMDs. However, it is expected that in the future there will be a shift towards 6DoF internal tracking desktop and standalone HMDs. As these experiences gain complexity and interaction, it is more common to find elements such as characters, audio, and interface.
Their higher level of interaction means that these experiences are more suitable for Education and Preservation experiences conducted in university settings. However, the increased complexity and higher development costs make them less accessible to users, both individually and in groups. Nonetheless, their versatility and potential make them particularly noteworthy in the fields of Education and Preservation. Unlike 3DoF experiences, there is less consensus regarding the use of tools in 6DoF experiences, indicating ongoing developments and research prospects.

Simple game engines play an important role in this discipline, particularly for teams with limited programming expertise. Moreover, these sorts of game engines can achieve high levels of LoS and LoD. Game engines are more important in 6DoF experiences, as it is not always necessary to use those engines for the development of 3DoF experiences.

To answer “How has technology evolved in this field?” On the one hand, 3DoF experiences have shown little evolution. They gained importance in the second triennium, but the associated technologies barely gained significance in the last triennium, in which Point and click was the only type of interaction showing an upward trend. On the other hand, 6DoF experiences demonstrated significant evolution. Despite Explorative being the most common type of iVR experience, a clear trend was observed in the last analysed triennium, with a decline in its importance in favour of Interactive experiences and, especially, Explorative interaction. The use of HMDs has also undergone significant transformations. 6DoF 180° desktop external tracking was the most used HMD in the first triennium, and 6DoF 360° desktop external tracking dominated in the second. Although the latter was still prominent in the third triennium, it was almost equalled by 6DoF desktop internal tracking, which is likely to become the most common HMD in the coming years. Regarding the type of interaction, there has been a shift, with Gamepad locomotion being the most common in the first triennium but being significantly surpassed by Teleport locomotion from the second triennium onwards. Overall, even in 3DoF experiences, the technological evolution in this field is progressing towards more interactive experiences with user-friendly technologies.

To answer “How much is evaluated in this topic? What areas of application have more and better quality evaluations?”. Improved evaluation techniques are necessary to advance the discipline. Currently, fewer than half of the papers include an evaluation, and only a minority of those use a control group. Preservation was the area of application with the fewest evaluations and the highest percentage of low-quality assessments. It also had the smallest sample sizes and the fewest complex evaluations. The evaluation purpose in this domain was almost exclusively Usability and User Experience. These results could be attributed to the fact that Preservation lacks a defined target audience. Musealization ranked second in the number of evaluations, but their quality was only slightly better than in Preservation, with limited use of control groups and complex evaluations. However, its sample sizes were larger, averaging around 40 participants per study, and its evaluation purposes were more diverse, including Education/Engagement and Technology or System Application. The variability in purposes and the larger sample sizes may be due to the context of these studies, which involve iVR experiences for larger groups; that context, in turn, complicates the execution of more complex evaluations. Finally, Education is the area of application with the highest number of evaluations, greater quality, larger sample sizes, and more diversity in evaluation purposes, prominently featuring Education/Engagement. This reflects the common observation that Education stands out as an area with highly robust assessments.