1 Introduction

Since the turn of the century, technology has progressively transformed education, and it will undoubtedly become the main driver of educational practices in 10 to 15 years (2021). Notably, augmented reality (AR) and virtual reality (VR) technologies have recently been receiving much attention due to their advanced levels of immersion and interaction on cognitive, behavioral and emotional levels (Gattullo et al. 2022; Coban et al. 2022; Rauschnabel et al. 2022). Thanks to recent advancements in hardware and software, VR goggles are now affordable for both institutions and individuals. Nevertheless, the adoption of VR goggles, particularly in engineering education, faces a stiff challenge due to a lack of relevant digital content and, more broadly, of reliable guidelines for the design, development and implementation of such content (Lanzo et al. 2020; Udeozor et al. 2021b; Soliman et al. 2021; Kumar et al. 2021; Huang and Roscoe 2021; Wang et al. 2022; Gattullo et al. 2022).

Computational fluid dynamics (CFD) simulations are physics-based numerical solvers widely utilized in engineering design and analysis. They help accurately predict fluid flow with relevant physical and chemical phenomena. Both low and high-order cognitive skills with analytical approaches are required to solve engineering problems with CFD simulations. Performing a complete CFD simulation project requires engineering, computer science and mathematics skills; CFD is therefore a multidisciplinary tool requiring a certain level of expertise. CFD simulations have been increasingly utilized in VR applications to reduce the entry barrier for non-experts (Li et al. 2017b; Baheri Islami et al. 2021; Solmaz and Van Gerven 2022a). Several proofs-of-concept with high-fidelity engineering solutions have been demonstrated in immersive technologies for audiences ranging from medical experts (Shi et al. 2020) to farmers (Lee et al. 2022). Apart from technical aspects of integrity, human factors have recently been taken into account to provide better user experiences with CFD content. In general, studies have assessed the usability and user experience of VR prototypes to validate their design guidelines and concepts (Lin et al. 2019; Yan et al. 2020; Kim et al. 2021; Christmann et al. 2022; Solmaz et al. n.d.). However, some vital questions remain unanswered on the wider scale implementation, long-lasting utility and sustainability of these digital products: What are the driving factors behind students’ intention to use them? What can prevent students from using them? How can we assess students in these environments? To our knowledge, no study has directly attempted to answer these questions.

Identification of the underlying human factors in the acceptance of new technology has been a hot topic (Tamilmani et al. 2021). Theoretical frameworks on technology acceptance help researchers hypothesize about and investigate technology use across various content and contexts. The Technology Acceptance Model (TAM) (Davis 1989) and the Unified Theory of Acceptance and Use of Technology (UTAUT) (Venkatesh et al. 2003) are prominent conceptual frameworks to investigate intended technology use. Meanwhile, several immersive virtual environments with CFD simulations have come out in recent years, such as blood flow in cerebral aneurysms (Baheri Islami et al. 2021), farmer education on indoor air quality (Lee et al. 2022), fire emergency evacuation scenarios (Lorusso et al. 2022), communicating the design of modular buildings (Gan et al. 2022), medical surgery planning (Deng et al. 2021), as well as automated intelligent systems in industry 4.0 (Silvestri 2021). Unlike in other domains, individuals’ intention to use such applications has been overlooked, most probably due to the complexity of the topic.

1.1 Study objectives

In a previous study, we developed the Virtual Garage concept, which is a holistic immersive virtual reality experience to educate students through a real-life engineering problem solved with CFD simulation data (Solmaz et al. n.d.). We performed a user study to evaluate usability, user experience, workload and simulator sickness while validating applied pedagogical and technological design guidelines. This enabled us to reach an optimized version of the Virtual Garage, in which students can adequately perform tasks and learn without being interrupted by custom design aspects.

As a follow-up to the previous investigation, this study focuses on behavioral intention, task performance and learning in the Virtual Garage. More insight into underlying factors behind students’ use of the Virtual Garage may impact the development of long-lasting and sustainable educational experiences with VR technology. Our theoretical framework stems from UTAUT2—the extended version of UTAUT—to investigate technology acceptance in the Virtual Garage. In addition, learning gain and task performance are also evaluated to elaborate further on the utility of the Virtual Garage in the student assessment. The main objectives of our work are:

  • To examine the effect of the UTAUT2 constructs on students’ behavioral intention to use the Virtual Garage application,

  • To examine users’ perceptions and attitudes before and after the use of the Virtual Garage,

  • To examine knowledge gain and in-game task performance to form a summative assessment methodology.

2 Research model

2.1 UTAUT2: an extended framework

Technology is ever-evolving, and so is individuals’ acceptance of technology. New devices and services are continually released to the market, aiming at fast and long-term user adoption. Despite this, it is not trivial to anticipate whether users will use and adapt to new technologies. Hence, both academia and industry have developed custom methodologies and performed user studies to uncover significant factors behind individuals’ technology acceptance.

TAM was one of the very first models to comprehend users’ technology acceptance based on perceived ease of use, perceived usefulness, attitude and behavioral intention (Davis 1989). However, the model falls short of truly explaining technology acceptance due to obscured human factors (Tamilmani et al. 2021). Advances in user technology have forced the theoretical frameworks to evolve and consider more inclusive and malleable models. One example of this is the UTAUT model, which integrates eight of the most substantial acceptance models to assess technology acceptance in workplaces (Venkatesh et al. 2003). With technology becoming a part of daily life, UTAUT was also extended to UTAUT2 by considering the use of technology in a consumer context (Venkatesh et al. 2012). A growing body of literature has utilized UTAUT2 to hypothesize and validate conceptual models on individuals’ technology acceptance (Tamilmani et al. 2021). The baseline version of UTAUT2 incorporates performance expectancy (PE), effort expectancy (EE), social influence (SI), facilitating conditions (FC), hedonic motivation (HM), price value (PV), and habit (H) as the factors determining technology acceptance (Venkatesh et al. 2012). UTAUT2 has evidenced its utility to predict technology acceptance: the model explained some 74% of the variance in the behavioral intention to use new technology (Venkatesh et al. 2012).

2.2 Conceptual model and hypotheses

The ultimate focus of our study is to measure students’ behavioral intention to use the Virtual Garage in educational practices. In particular, the Virtual Garage is an immersive learning application blending technology as VR, content as CFD and pedagogy as instructional psychology. Its main goal is to facilitate an immersive learning environment to reduce the entry barrier for complex engineering concepts and multiphysics CFD simulations, while enabling higher cognitive, physical and emotional interferences. It appears that the UTAUT2 model is a proven framework to evaluate consumers’ behavioral intention concerning fundamental aspects of human factors. In addition, the model has the flexibility to consider external factors to hypothesize technology acceptance concepts in any context. Thus, the model can provide a strong foundation to formulate our hypothesis in the evaluation of the behavioral intention to use the Virtual Garage concept. The conceptualization of the UTAUT model for this particular application was undertaken with regard to the four sets of recommendations given in a recent literature review (Tamilmani et al. 2021).

First and foremost, we carried out a literature review to identify similar concepts and opt for suitable constructs to explore and examine the behavioral intention of users in the Virtual Garage. Eventually, we structured a conceptual model by incorporating recommendations from the design guideline to build a model better suited to the research context (Tamilmani et al. 2021). Figure 1 illustrates our conceptual model composed of original and extended constructs.

Fig. 1 Our conceptual model to understand users’ behavioral intention in the Virtual Garage

We only retained performance expectancy, effort expectancy and behavioral intention from the baseline model. Performance expectancy is defined as the degree of benefits to users in performing certain activities. Effort expectancy is the degree of ease of use. Behavioral intention describes the intentions of an individual to use new technology. Neither social influence nor facilitating conditions were kept in our model since VR technology is not yet widely adopted by institutions or individuals for daily use. Similarly, price value, habit and use behavior were excluded from the model because it is not yet a common personal preference to own VR goggles. Even though VR has recently become commercially affordable, it is not yet as pervasive as other portable user devices.

Hedonic motivation indicates the fun, joy and pleasure that users have during the use of the technology. In our model, we extended hedonic motivation to additionally measure interest, thereby together forming intrinsic value (Wigfield and Eccles 2000). Literature has shown the importance of interest in similar concepts (Khechine et al. 2020) where users might be potentially interested in either technology or content.

Furthermore, similar to the price that users pay to afford technology, time and effort are imperative resources consumed by users when learning something new. In the literature, this construct is called learning value. Literature has shown the importance of learning value in technology acceptance in similar concepts (Ain et al. 2016; Dajani and Abu Hegleh 2019). It refers to the perceived benefit and effectiveness of the digital environments in enhancing educational outcomes. It encompasses not only the acquisition of knowledge but also the ability to apply this knowledge practically, thereby exceeding the investment of time and effort. Learning value can allow users to not only expand their knowledge base but also to exercise control over their learning trajectory, making the learning process more tailored and engaging. Thus, we incorporated the learning value as a construct in the conceptual model to measure its interrelation with behavioral intention.

Users seek content that satisfies their individual preferences. Content is one of the major aspects of individual technology acceptance and use (Zhou 2013a; Kunz and Santomier 2019; Chen et al. 2020b, 2021; Chin et al. 2020). Content value is characterized by the relevance, utility, and engagement of the content provided. This includes not only the attractiveness of the content in terms of its presentation and design but also its significance and applicability in enhancing relevant skills. Users can intend to use a technology or service in order to access the content they are interested in. The content value has become an intriguing parameter in the educational use of VR. Only a limited amount of content for limited contexts has so far been made available for students (Lanzo et al. 2020). Therefore, the content value was included in the conceptual framework to better assess the value of educational content.

As mentioned previously, habit was excluded since users do not own such VR goggles individually. Yet, the literature indicates that personal innovativeness may be a triggering factor in technology acceptance. Given the popularity of VR nowadays, innovation-driven individuals may tend to embrace the use of new educational technologies (Farooq et al. 2017; Handoko 2019). Researchers have also claimed a positive influence of personal innovativeness on technology acceptance. Accordingly, personal innovativeness was incorporated into the conceptual model as a potential predictor of behavioral intention.

In studies utilizing new educational technologies such as smartphones and VR, self-efficacy has been found to be an important predictor of technology acceptance (Almaiah et al. 2019; Hatchard et al. 2019; Chao 2019; Plotzky et al. 2021). The construct is mainly formed by the competency and belief needed to tackle a new technology and its operations. Since VR technology has not yet spread widely among individuals, some users may avoid using it due to a lack of competence and belief. Hence, self-efficacy was added to the conceptual model.

The final construct utilized in the extended model is the flow theory. The Virtual Garage concept is an immersive virtual reality learning tool blending technology, content and learning theory to leverage physical, cognitive, emotional and behavioral aspects of immersion. Aside from the flow theory, similar constructs have been investigated by researchers such as attention focus, perceived control, perceived enjoyment, concentration, immersion, interaction and imagination (Zhou 2011, 2013b; Alwahaishi and Snásel 2013; Shin 2019; Ma et al. 2021; Bölen et al. 2021; Barrett et al. 2021). On the whole, the flow theory is a global construct arguably composed of all these constructs and might have a substantial effect on behavioral intention in VR experiences. Our conceptual model was, therefore, expanded with the flow theory, including interaction and immersion as parts of the main construct.

Some other constructs that have emerged in the literature, such as simulator sickness, were not adopted in our conceptual model due to their evident effects on behavioral intention. Details are conveyed in the Supplementary Information.

This study examines the behavioral intention of students to use the Virtual Garage application. We propose the following hypotheses:

  • H1: Performance expectancy (PE) is a significant positive predictor of behavioral intention.

  • H2: Effort expectancy (EE) is a significant positive predictor of behavioral intention.

  • H3: Intrinsic value (IV) is a significant positive predictor of behavioral intention.

  • H4: Learning value (LV) is a significant positive predictor of behavioral intention.

  • H5: Self-efficacy (SE) is a significant positive predictor of behavioral intention.

  • H6: Content value (CV) is a significant positive predictor of behavioral intention.

  • H7: Personal innovativeness (PI) is a significant positive predictor of behavioral intention.

  • H8: Flow theory (FT) is a significant positive predictor of behavioral intention.

Our study moved a step further to examine the moderation effects of interest and experience on behavioral intention:

  • H9: Experience and interest in CFD, VR and gaming moderate the relationship between independent constructs and behavioral intention.

2.3 Perception and user assessment

Aside from the technology acceptance with the UTAUT2 model, pair-wise comparison of some constructs can uncover interesting changes in individuals’ perceptions and attitudes towards the technology (Joa and Magsamen-Conrad 2021). Therefore, we decided to evaluate five of the nine constructs in the conceptual model, namely those that users can readily rate without having used the Virtual Garage: performance expectancy, effort expectancy, intrinsic value, self-efficacy and immersion. We collected users’ feedback on these constructs before and after the VR experience.

Furthermore, the increasing popularity of VR in education has convinced some researchers to assess knowledge gain and evaluate learning outcomes. Recent evidence reveals that VR can enhance learning, sometimes even better than traditional practices (Lanzo et al. 2020; Klingenberg et al. 2020; Coban et al. 2022). Even though in the previous study we conveyed some insights on the assessment of learning in the Virtual Garage (Solmaz et al. n.d.), we did not provide a structured methodology on how to perform the assessment. Therefore, in this study, as the initial attempt to assess students in the Virtual Garage concept, a summative evaluation approach was developed blending learning gain and in-game task performance.

3 Materials and methods

3.1 VR setup

The Virtual Garage concept is a holistic immersive virtual reality learning experience to educate students through a real-life engineering problem solved with CFD simulation data. Technological and pedagogical design guidelines were purposefully applied to benefit from evidence-based workflows. The Virtual Garage is made of two modules: Module#1 and Module#2. The former aims at teaching the VR controllers and the theory of the content with learning tasks. The latter is the assessment where users solve the problem by using CFD simulations. Each module is designed to take 20 min on average, with a 5 min break between modules during which the VR goggles are taken off to mitigate the risk of simulator sickness. Details on the Virtual Garage concept are available in a previous study (Solmaz et al. n.d.). The Unity game engine was utilized with additional packages to develop the VR software. Simulations were run on a workstation with COMSOL v5.6 and integrated by using an extract-based data processing approach (Solmaz and Van Gerven 2022b). Meta Quest 2 VR goggles were employed to deploy the Virtual Garage.

In this study, we aimed at exploring students’ behavioral intention to use the Virtual Garage application together with perception and user assessment. In this version of the Virtual Garage, as shown in Fig. 2, students were expected to learn the fundamentals of the water chlorination process, and subsequently design and analyze a water treatment basin using available CFD simulations. They were asked to decide on a set of parameters to obtain an optimal design that satisfies the design constraints. More information about CFD simulations, verification & validation and learning objectives of the CASE#5 study can be found in the Supplementary Information. A walkthrough video and VR software of CASE#5 are freely available on YouTube and GitHub, respectively (see Supplementary Information for weblinks).

Fig. 2 The Virtual Garage CASE#5: Design a water treatment basin

3.2 Questionnaire design

Empirical data were gathered before, during and after the VR experiments. The pre-tests covered three separate sections: (1) the socio-demographic background of the respondents including interest and experience; (2) the pre-knowledge test; and (3) a subset of the UTAUT2 constructs to capture users’ perception before the experience. In addition, users' activities in VR were logged to assess their performance in terms of time, achievements and decisions. Finally, the UTAUT2 questionnaire and the post-knowledge test were filled in by users.

Face validity checks were carried out by researchers in the field to ensure that the questionnaire measures the intended knowledge. Subsequently, pilot studies were conducted with postgraduate chemical engineering students to check for misconceptions in the formulation of the questions. Changes were made based on the feedback and corrections received. The following sections detail the final version of the items in the questionnaire. We applied a quantitative methodology to collect, interpret and assess the measured metrics to achieve the abovementioned research objectives.

3.2.1 UTAUT2 constructs and items

Items from the baseline UTAUT2 models and available literature were utilized to ensure content validity, thereby compiling a questionnaire to adequately measure intended constructs. The research model contains multiple items for each construct. Items were slightly reworded to make a better fit for the research context. Each item was measured with a seven-point Likert scale, ranging from strongly disagree (1) to strongly agree (7). Items of performance expectancy, effort expectancy and behavioral intention were adopted from Venkatesh et al. (2012). For other constructs, we employed available items in the literature to gather reliable data. Table 9 demonstrates the full list of adopted items utilized in the present study.

3.2.2 Summative assessments

Knowledge gain and in-game task performance constituted the summative assessments to provide an overall evaluation of students’ success. A knowledge test with eight multiple-choice questions was designed to measure knowledge gain based on pre and post-tests. It consists of items measuring domain knowledge gained through the learning modules and assessing lower and higher cognitive skills in Bloom’s taxonomy (Anderson et al. 2001). Three choices were provided in each question with one correct answer and two credible distractors. The test was validated by peer chemical engineers. The test is available in the Supplementary Information. Likewise, in-game data were automatically logged and utilized to quantify users’ in-game performance. Total time spent to complete tasks, achievements and decisions were collected.

3.3 Participants and recruitment

In order to determine the required sample size, we followed the advice given by PLS-SEM experts (Hair 2017). According to Cohen’s sample size table, a sample size of 54 achieves a statistical power of 80% for detecting R² values of at least 0.25 with a 5% probability of error. Therefore, we aimed at having at least 55 participants to make our analysis statistically sound.

Before recruiting participants and conducting the test, ethical approval (G-2021–4281-R3(AMD)) was received from the ethics committee at KU Leuven to comply with standards for the processing of personal data in academic contexts. The testing was carried out with 57 participants from KU Leuven (Belgium), the von Karman Institute for Fluid Dynamics (Belgium) and Kompetenzzentrum Virtual Engineering Rhein-Neckar Hochschule Mannheim (Germany). A convenience sampling method was used to reach the target cohort of students from relevant disciplines such as chemical, mechanical and civil engineering.

3.4 Data collection and procedure

Experiments were performed in May and June 2022. National and regional safety measures with regard to COVID-19 were applied during the experiments such as proper ventilation of the testing environment, disinfection of equipment and social distancing. As three VR goggles were available, up to three participants could take part in the testing simultaneously. No incentives were provided. Only volunteers took part in the testing.

Testing started with brief instructions on research, ethics, the Virtual Garage application and intended learning objectives. Participants signed a consent form giving further details on their voluntary participation, experimental procedure, data collection, potential risks and discomforts, anonymity and confidentiality. Pre- and post-test data were collected on paper. The collected data were digitized by the researcher. Before the VR experience, users filled in the sociodemographic test and pre-knowledge test in addition to a portion of UTAUT2 to measure the perception of different constructs. In-game data were logged in the VR goggles under the anonymous user names entered by users. After the VR experience, users filled in the post-knowledge test and UTAUT2. Each testing session took 90 min on average.

3.5 Data analysis

Before running the data analysis, the collected data were screened to detect missing and duplicated data. One participant was identified as an outlier by the interquartile rule for outliers. We kept this set of data in the collection since it may be a legitimate part of the population. Also, A/B testing did not significantly change the obtained results. The following sections provide details on the UTAUT2 analysis and the statistical methods used to analyze the perception and user assessment.
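The interquartile rule mentioned above can be sketched in a few lines. This is a minimal illustration, assuming the common multiplier of 1.5 and the "inclusive" quartile definition; the source does not state which variable was screened, so the `times` values below are hypothetical and not study data.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values lying outside [Q1 - k*IQR, Q3 + k*IQR]."""
    # Assumption: quartiles follow the "inclusive" definition; other
    # definitions shift the fences slightly.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Hypothetical completion times in minutes (illustrative only).
times = [18, 19, 20, 20, 21, 21, 22, 23, 24, 45]
outliers = iqr_outliers(times)
print(outliers)
```

A flagged value is only a candidate for removal; as in the study, it can be retained when it plausibly belongs to the population.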

3.5.1 PLS-SEM for UTAUT2

Partial Least Squares Structural Equation Modeling (PLS-SEM) is the multivariate analysis method most commonly used to analyze UTAUT2 results. In this study, we, therefore, performed the data analysis according to the methodology of Hair et al. (Hair 2017). The UTAUT2 model was analyzed using a two-step approach, which is the most commonly applied methodology in the literature. The first step is the measurement model, which confirms the relationships among items in constructs and establishes reliability and validity through convergent validity, internal consistency reliability and discriminant validity. The second step is the structural model, which creates composite models and estimates complex cause-effect relationships in path models with constructs via coefficients of determination, predictive relevance, size and significance of path coefficients, and effect sizes. We employed SmartPLS software to perform PLS-SEM for the conceptual model.

3.5.1.1 Measurement model

The measurement model assesses whether the relationships among items in constructs are reliable and valid. Once the measurement model is approved, we can proceed with the structural model to predict and analyze technology acceptance.

First, convergent validity is measured to establish a positive correlation between items of the same construct. The items should either converge or share a high proportion of variance. The convergent validity is assessed through outer loadings, indicator reliability and average variance extracted (AVE). To establish convergent validity, outer loadings should be equal to or above 0.708. Indicator reliability is the square of the outer loading and should be higher than 0.5. The AVE value should be at or above 0.50 to verify that the construct explains at least half of the variance of its indicators. The guideline demonstrates several examples for the case in which items do not satisfy the above criteria for convergent validity.
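The convergent validity criteria above can be illustrated with a short sketch. It assumes standardized outer loadings, in which case the AVE is the mean of the squared loadings; the function name and the example loadings are hypothetical and do not come from the study.

```python
def convergent_validity(loadings, loading_min=0.708, ave_min=0.50):
    """Check outer loadings, indicator reliability and AVE for one construct."""
    # Indicator reliability is the squared standardized outer loading.
    indicator_reliability = [l ** 2 for l in loadings]
    # AVE is the mean of the squared loadings of the construct's items.
    ave = sum(indicator_reliability) / len(loadings)
    return {
        "indicator_reliability": indicator_reliability,
        "ave": ave,
        "loadings_ok": all(l >= loading_min for l in loadings),
        "ave_ok": ave >= ave_min,
    }

# Hypothetical loadings for a three-item construct (illustrative only).
result = convergent_validity([0.82, 0.76, 0.71])
print(result)
```

Note that the loading threshold of 0.708 corresponds to an indicator reliability of about 0.50, which is why the two criteria are usually stated together.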

Another metric is the internal consistency to examine inter-correlations between different items to provide a consistent measure for the constructs. Cronbach’s alpha and composite reliability are approaches to explain the characteristics of internal consistency reliability. Both should be higher than 0.7 whereas above 0.6 is still qualified for exploratory purposes. While Cronbach’s alpha may result in relatively low reliability values, composite reliability can overpredict the reliability. Therefore, it is reasonable to consider and report both criteria; representing the lower level with Cronbach’s alpha and the upper level with composite reliability.
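Both internal consistency measures can be computed by hand, as sketched below. This is a minimal illustration under two assumptions: Cronbach's alpha is computed from raw item scores with sample variances, and composite reliability is computed from standardized outer loadings. The function names and the example loadings are hypothetical.

```python
import statistics

def cronbach_alpha(items):
    """items: one list of respondent scores per item, all of equal length."""
    k = len(items)
    sum_item_var = sum(statistics.variance(item) for item in items)
    # Total score per respondent across all items of the construct.
    totals = [sum(scores) for scores in zip(*items)]
    return k / (k - 1) * (1 - sum_item_var / statistics.variance(totals))

def composite_reliability(loadings):
    """Composite reliability from standardized outer loadings."""
    squared_sum = sum(loadings) ** 2
    # Error variance of each indicator is 1 minus its squared loading.
    error_var = sum(1 - l ** 2 for l in loadings)
    return squared_sum / (squared_sum + error_var)

# Hypothetical loadings (illustrative only).
cr = composite_reliability([0.82, 0.76, 0.71])
print(round(cr, 3))
```

Reporting both values follows the reasoning above: alpha acts as a conservative lower bound and composite reliability as a more liberal upper bound.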

Discriminant validity is the final metric assessing whether a construct significantly differs from other measured constructs. It is imperative to maintain discriminant validity so that each construct captures a distinct characteristic rather than overlapping with other constructs. Even though cross-loadings and the Fornell-Larcker criterion have long been applied by researchers for discriminant validity assessment, recent literature does not find these assessments reliable. Instead, the heterotrait-monotrait ratio (HTMT) of the correlations was proposed to carry out a reliable assessment. It estimates the strength of the correlation between two constructs. Running the bootstrapping procedure to derive a distribution of the HTMT statistic, a reliable statistical comparison can be performed to assess discriminant validity. The upper bound of the confidence interval of the HTMT statistic should be lower than 1.
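A point estimate of the HTMT for a pair of constructs can be sketched as follows. The sketch assumes the original formulation with signed correlations (the mean between-construct item correlation divided by the geometric mean of the average within-construct item correlations) and at least two items per construct. The bootstrap confidence interval is omitted, and the Likert responses are hypothetical.

```python
import math
from itertools import combinations, product

def pearson(x, y):
    """Pearson correlation of two equally long score lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def mean(values):
    return sum(values) / len(values)

def htmt(items_a, items_b):
    """HTMT of two constructs, each given as a list of item score lists."""
    # Heterotrait-heteromethod: correlations between items of different constructs.
    hetero = mean([pearson(a, b) for a, b in product(items_a, items_b)])
    # Monotrait-heteromethod: correlations among items of the same construct.
    mono_a = mean([pearson(p, q) for p, q in combinations(items_a, 2)])
    mono_b = mean([pearson(p, q) for p, q in combinations(items_b, 2)])
    return hetero / math.sqrt(mono_a * mono_b)

# Hypothetical 7-point Likert responses of six respondents (illustrative only).
pe_items = [[6, 5, 7, 4, 6, 3], [6, 4, 7, 5, 6, 3], [5, 5, 6, 4, 7, 2]]
ee_items = [[4, 6, 5, 3, 7, 5], [5, 6, 4, 3, 6, 5]]
print(htmt(pe_items, ee_items))
```

In practice the statistic and its bootstrap interval are reported by the PLS-SEM software; the sketch only shows what the ratio measures.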

3.5.1.2 Structural model

Having the measurement model approved, the structural model can be processed to create composite models and estimate complex cause-effect relationships in path models with constructs, thereby evaluating our conceptual models and hypotheses. PLS-SEM was performed to evaluate the structural model and determine path coefficients together with statistical significances, interrelationships and predictions. We used bootstrapping to assess the significance of path coefficients with 5000 bootstrap samples.

First, collinearity is checked to verify that constructs do not have critical levels of collinearity among each other. Each construct’s tolerance should be above 0.2 or, equivalently, its variance inflation factor (VIF) should be below 5 to rule out critical collinearity and thus avoid biased estimates among constructs.
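Tolerance and VIF are two views of the same quantity, which the following sketch makes explicit. It assumes that `r_squared` is the R² obtained when one predictor construct is regressed on all other predictor constructs; the function name and the example values are hypothetical.

```python
def collinearity(r_squared, tol_min=0.2, vif_max=5.0):
    """Tolerance and VIF of a predictor, given its R² on the other predictors."""
    tolerance = 1.0 - r_squared
    vif = 1.0 / tolerance  # VIF is the reciprocal of tolerance
    critical = tolerance <= tol_min or vif >= vif_max
    return tolerance, vif, critical

# Hypothetical R² values (illustrative only).
print(collinearity(0.60))  # moderate overlap with the other predictors
print(collinearity(0.85))  # strong overlap, flagged as critical
```

Because VIF = 1 / tolerance, the thresholds 0.2 and 5 describe the same cut-off.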

Having ensured that critical collinearity does not exist, we can run the PLS-SEM algorithm to calculate the structural model and path coefficients to test hypotheses. The path coefficients vary between −1 and +1 for strong negative and positive relationships in the model, respectively. Path coefficients above 0.2 are in general assumed to indicate a significant relationship, whereas those below 0.1 are considered insignificant. The relationships become weaker when the path coefficient is close to 0. Furthermore, to detect statistical significance, the bootstrapping procedure is undertaken by calculating t-values, p values and bootstrap confidence intervals for path coefficients based on the bootstrap standard errors. These three metrics should be utilized to interpret the statistical significance of the path coefficients. Critical values for the metrics are given together with the results.
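The bootstrapping logic can be illustrated on a deliberately simplified case. The sketch below is not the PLS-SEM algorithm: it assumes a single predictor, for which the standardized path coefficient equals the Pearson correlation, and it uses a percentile confidence interval. The construct scores, the seed and the function names are hypothetical.

```python
import math
import random

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    # Clamp to [-1, 1] to absorb floating-point rounding.
    return max(-1.0, min(1.0, sxy / math.sqrt(sxx * syy)))

def bootstrap_path(x, y, n_boot=5000, seed=1):
    """Estimate, bootstrap t-value and 95% percentile CI of a one-predictor path."""
    rng = random.Random(seed)
    n = len(x)
    estimate = pearson(x, y)
    boots = []
    while len(boots) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample cases with replacement
        bx, by = [x[i] for i in idx], [y[i] for i in idx]
        if len(set(bx)) < 2 or len(set(by)) < 2:
            continue  # skip degenerate resamples with zero variance
        boots.append(pearson(bx, by))
    boots.sort()
    mean_b = sum(boots) / n_boot
    se = math.sqrt(sum((b - mean_b) ** 2 for b in boots) / (n_boot - 1))
    ci = (boots[int(0.025 * n_boot)], boots[int(0.975 * n_boot) - 1])
    return estimate, estimate / se, ci

# Hypothetical construct scores of twelve respondents (illustrative only).
pe = [5.0, 6.0, 4.5, 6.5, 3.0, 5.5, 7.0, 4.0, 6.0, 5.0, 3.5, 6.5]
bi = [5.5, 6.0, 4.0, 6.0, 3.5, 5.0, 6.5, 4.5, 6.5, 4.5, 3.0, 7.0]
estimate, t_value, ci = bootstrap_path(pe, bi)
print(estimate, t_value, ci)
```

A path is typically regarded as significant when the t-value exceeds the critical value for the chosen significance level and the confidence interval excludes zero.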

The coefficient of determination (R²) is utilized to evaluate the structural model by quantifying how well the predicted values of a specific endogenous construct, such as behavioral intention in our model, match its actual values. The value of R² varies between 0 and 1. The higher the value, the greater the predictive accuracy. R² values of 0.75, 0.50 and 0.25 for endogenous latent variables can be respectively described as substantial, moderate, or weak. The contribution of each exogenous construct to R² is determined via the effect size f². It is calculated by omitting a specified exogenous construct from the model and recalculating the R² value without the omitted construct. The critical values for f² are 0.02, 0.15 and 0.35, respectively, representing small, medium, and large effects of the exogenous latent variable. No effect is assumed below 0.02.
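The effect size calculation described above can be written out explicitly. The sketch assumes the usual definition f² = (R² included − R² excluded) / (1 − R² included); the function names and the example R² values are hypothetical.

```python
def f_squared(r2_included, r2_excluded):
    """Effect size of one exogenous construct on an endogenous construct."""
    return (r2_included - r2_excluded) / (1.0 - r2_included)

def effect_label(f2):
    """Map f² to the small / medium / large thresholds quoted in the text."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "none"

# Hypothetical R² with and without one construct (illustrative only).
f2 = f_squared(r2_included=0.60, r2_excluded=0.52)
print(round(f2, 3), effect_label(f2))
```

The denominator scales the change in R² by the variance left unexplained by the full model, so the same drop in R² counts for more in a model that already explains a lot.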

3.5.2 Statistical analysis

Both descriptive and inferential statistics were considered to explore the significance and predict relations among datasets. We employed the R programming language for statistical calculations. The descriptive analysis was run to determine the mean (M), median (MD) and standard deviation (SD), thus summarizing the characteristics of the datasets. Inferential statistics were used for the comparisons and predictions. The normality of the distributions was checked with the Shapiro–Wilk test. Pre- and post-tests of the perception and knowledge gain were compared using a dependent test for paired samples. For the normally distributed datasets, we utilized the paired t-test while the Wilcoxon signed-rank test was performed for the non-normal distributions. Correlations were computed to increase our understanding of relations in the assessed metrics. Pearson's and Spearman's correlations were carried out for parametric and non-parametric analysis, respectively. Statistical significance was set at p < 0.05, which is indicated by a bold asterisk (*).
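The two paired tests can be sketched by computing their test statistics by hand. The study itself used R; the Python sketch below is only an illustration with hypothetical pre- and post-test scores, and it stops at the test statistics because p values require distribution functions that are not in the standard library. The Wilcoxon sketch drops zero differences and assigns average ranks to ties, which is one common convention.

```python
import math
import statistics

def paired_t(pre, post):
    """Paired t statistic and degrees of freedom for post minus pre."""
    diffs = [b - a for a, b in zip(pre, post)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1

def wilcoxon_w(pre, post):
    """Wilcoxon signed-rank statistic W = min(W+, W-)."""
    diffs = [b - a for a, b in zip(pre, post) if b != a]  # drop zero differences
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * len(ordered)
    i = 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and abs(ordered[j + 1]) == abs(ordered[i]):
            j += 1
        for k in range(i, j + 1):
            ranks[k] = (i + j) / 2 + 1  # average rank within a tie group
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, ordered) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, ordered) if d < 0)
    return min(w_plus, w_minus)

# Hypothetical knowledge test scores out of 8 (illustrative only).
pre = [3, 4, 5, 2, 6]
post = [5, 6, 6, 4, 7]
t_stat, df = paired_t(pre, post)
w_stat = wilcoxon_w(pre, post)
print(round(t_stat, 3), df, w_stat)
```

Which of the two statistics is reported depends on the outcome of the normality check described above.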

4 Results and discussion

4.1 Participant profile

57 participants successfully completed the testing. Table 1 illustrates the socio-demographic profile of the participants. The largest age group was 24 to 26 years old (31.6% of participants). 54.4% of participants were conducting doctoral research, whereas 15.8% and 29.8% were enrolled in bachelor's and master's programs, respectively. Chemical engineering constituted the main training discipline for 42% of the participants, followed by mechanical engineering (17.5%).

Table 1 Participant profile. Sociodemographic information of participants (n = 57)

Participants also rated their interest, experience and habits regarding VR, CFD and gaming. Details are available in the Supplementary Information. In brief, approximately 89% of participants were interested in VR, 72% in CFD and 63% in digital games. 61% and 82% of the participants were novices to CFD and VR, respectively. Regarding habits, 28% of the participants reported playing games often or more. Interestingly, some 30% of participants found desktop settings for visualizing 3D data difficult at least often.

Overall, the socio-demographic analysis indicated that the sampled participants can be a representative portion of the population, arguably a heterogeneous distribution barring gender. Also, interest, experience and habit illustrated the diversity of the population, in which participants are in general novices to VR and CFD. One of the ultimate goals of the Virtual Garage concept is to enable an immersive visualization environment to help students build spatial reasoning skills. When participants were asked about the difficulties of visualizing 3D data in desktop settings, a considerable share (about 30%) reported difficulties at least often, thus supporting the relevance of the Virtual Garage’s objectives.

4.2 Conceptual model: behavioral intention with UTAUT2

Details on the data analysis of the UTAUT2 model were elaborated in the section Materials and methods. Therefore, in this section, we only present, interpret and discuss the obtained results. First, we examined the measurement model to verify and validate reliable relationships among items in constructs. Then, the structural model was analyzed to create composite models and estimate complex cause-effect relationships in path models with constructs, thereby evaluating our conceptual models and hypotheses.

4.2.1 Measurement model

Table 2 summarizes the assessment of the measurement model. Results showed that the measurement model established a reliable and valid methodology to calculate each construct. Measurements complied with all critical thresholds, showing satisfactory convergent validity, internal consistency reliability and discriminant validity. Detailed HTMT confidence intervals are available in the Supplementary Information.

Table 2 Summary of the measurement model

Neither items nor constructs exceeded critical values, thus confirming the applicability of the structural model. Results indicated the importance of adapting constructs from the available literature so that each construct is measured with relevant items. As anticipated, the behavioral intention, effort expectancy and performance expectancy constructs from the baseline UTAUT2 model yielded higher overall scores in the measurement model validation than the other constructs. Several external constructs were adopted from the literature to build a conceptual model that fits the research context. Learning value, personal innovativeness and self-efficacy showed good results. Likewise, although relatively lower than the other constructs, content value, intrinsic value and flow theory also scored sufficiently to validate the entire measurement model. It is noteworthy that flow theory takes a malleable form in the literature and has been structured in various ways to assess flow in technology acceptance. In our study, the flow theory construct comprises attention focus, interaction and immersion, using items from similar but distinct constructs. The measurement evaluation showed that the resulting flow theory construct is valid and reliable.

Descriptive statistics of the constructs are presented in Table 3 together with percent differences against the highest score. All constructs showed a mean score higher than 5.6, meaning that the results lie between slightly agree and agree, mostly closer to agree. The intrinsic value and personal innovativeness constructs showed the highest scores, which may be due to the voluntary recruitment of participants. In contrast, the learning value showed the lowest score, which may be attributed to most participants being novices to the technical content, CFD and VR. Participants scored relatively high on effort expectancy, indicating that the Virtual Garage is easy to operate and that users can readily become skillful in VR. Flow theory was also rated high, meaning that participants remained positive about the attention focus, immersion and interaction triggered in the Virtual Garage concept. The content value, self-efficacy, behavioral intention and performance expectancy also resulted in relatively high scores, confirming the importance of content, individuals' belief in their capacity to operate VR, enthusiasm towards adopting the Virtual Garage concept and the usefulness of the Virtual Garage concept, respectively. All in all, it appears that students rate the intrinsic, engaging and enjoyable aspects of the Virtual Garage most favorably. This concurs well with earlier findings (Udeozor et al. 2021b).

Table 3 Descriptive statistics of each construct with mean (M) values (5 = slightly agree, 6 = agree and 7 = strongly agree), standard deviations (SD), and relative differences with the highest scored construct

4.2.2 Structural model

The structural model was first evaluated without moderating factors. To assess the significance of interest and experience, moderator effects were analyzed separately. The following sections first present and discuss the results of the conceptual model excluding moderators. Then, we elaborate on how moderators affected the interrelations among constructs.

4.2.2.1 Without moderators

Before assessing the structural model, we examined collinearity to mitigate bias between constructs. The variance inflation factor (VIF) was lower than 5 for all items, indicating that collinearity was not a concern. The VIF values are available in the Supplementary Information.
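For readers unfamiliar with the collinearity check, the following sketch shows how a VIF relates to tolerance. It assumes the usual definition VIF = 1 / (1 - R2_j), where R2_j is obtained by regressing one predictor on the remaining predictors. The function names and the example value are hypothetical. The threshold of 5 is the one stated in the text.

```python
def vif_from_r2(r2_j: float) -> float:
    """Variance inflation factor from the R2 of one predictor regressed on the others."""
    if not 0.0 <= r2_j < 1.0:
        raise ValueError("r2_j must lie in [0, 1)")
    return 1.0 / (1.0 - r2_j)


def tolerance(vif: float) -> float:
    """Tolerance is the reciprocal of the VIF."""
    return 1.0 / vif


def collinearity_flag(vif: float, threshold: float = 5.0) -> bool:
    """True when the VIF reaches the critical threshold used in the text."""
    return vif >= threshold


# Hypothetical example: half of a predictor's variance is explained by the others.
example_vif = vif_from_r2(0.5)               # 2.0
example_tolerance = tolerance(example_vif)   # 0.5
example_flag = collinearity_flag(example_vif)  # False
```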

Figure 3 shows the path coefficients of the conceptual model computed by PLS-SEM. The analysis revealed that UTAUT2 can explain the factors influencing students’ behavioral intention to use the Virtual Garage. Table 4 summarizes the path coefficients with relevant metrics to interpret the statistical significance.

Fig. 3 Conceptual model with path coefficients. Bold continuous lines indicate important path coefficients (> 0.1), whereas the level of statistical significance is marked by bold asterisks: *p < 0.05, **p < 0.01. Dashed lines indicate no influence of constructs on the BI (≤ 0.1)

Table 4 Constructs with path coefficients and statistical significance

The learning value was the strongest predictor of students’ behavioral intention, indicating that users who find that the Virtual Garage concept has value for their learning are more likely to intend to use it. Similarly, the content value showed a significant effect on behavioral intention, meaning that users who find the content in the Virtual Garage concept attractive, useful and understandable are likely to intend to use it. In the previous section, results showed that students overall gave higher ratings to constructs related to engagement, motivation and ease of use. In contrast to this finding, the PLS-SEM analysis showed that the significant predictors of students’ intention to use the Virtual Garage were the learning value and content value. Due care must therefore be exercised when interpreting mean construct scores and PLS-SEM results together. The relationship between liking and behavioral intention to use has shown debatable outcomes in different contexts (Veksler and Eden 2017; Li et al. 2017a; Sung et al. 2022). Our findings showed no noteworthy relationship between liking and behavioral intention to use the Virtual Garage. Recent research argued that the same can also apply in the context of learning in immersive technologies: “liking is not learning” (Makransky et al. 2019). It should be noted that overall learning in immersive technologies can be as effective as traditional practices (Chen et al. 2020a; Södervik et al. 2021).

Furthermore, intrinsic value and personal innovativeness had positive effects on behavioral intention, although neither reached statistical significance. These were the two constructs that participants rated highest. Hence, the high ratings of intrinsic value and personal innovativeness were reflected positively, but not significantly, in the behavioral intention. The learning value and content value are the main driving constructs to convince students to use the Virtual Garage. The results suggest that intrinsic value and personal innovativeness may also contribute to behavioral intention when reaching out to more diverse populations. Performance expectancy, effort expectancy, self-efficacy and flow theory showed neither notable effects nor statistical significance on the behavioral intention.

Moreover, the coefficient of determination (R2) was calculated by PLS-SEM to evaluate the predictive capacity of the conceptual model on behavioral intention. Results showed that R2 = 0.74, which approaches the 0.75 threshold for substantial predictive power on behavioral intention (Hair 2017). Table 5 shows the effect size f2 for each omitted exogenous construct. The learning value had a larger-than-medium effect on behavioral intention. A similar trend emerged for the content value, with a small-to-medium effect on behavioral intention. Small effects of intrinsic value and personal innovativeness on behavioral intention were also observed. The effect sizes of all constructs were in line with the path coefficients and statistical significances, thus validating the impact of constructs on behavioral intention.

Table 5 The critical values for f2 are 0.02 (small effect), 0.15 (medium effect) and 0.35 (large effect)

4.2.2.2 With moderators

Our analysis of the structural model was extended to examine the effects of moderating factors on behavioral intention. Each moderator was inspected separately to determine its effect on the relations between all exogenous constructs and the behavioral intention. Although both positive and negative moderating effects were observed on behavioral intention, the results were statistically insignificant. Nevertheless, we identified some interesting patterns worth reporting, as summarized in Table 6.

Table 6 Moderating effects of interest, experience and habit

Effort expectancy under the moderation of the gaming habit positively affected behavioral intention. Interestingly, a similar pattern was also detected under the moderation of interest in gaming. The participants who were experienced and interested in digital games found the Virtual Garage easy to use, and this was positively reflected in behavioral intention. It would appear that users’ past experience and interest in gaming can lower the entry threshold in terms of usability, since such users may already be familiar with relevant components from gaming.

Furthermore, the performance expectancy positively affected the behavioral intention under the moderation of the interest in VR. Unlike interest, experience in VR did not show any effect on behavioral intention. Each VR application is composed of custom design elements that diversify the content and overall experience, as well as the level of immersion, interaction and presence. Recent research claimed that the experience with other VR applications may not be a predictor of users' performance concerning simulator sickness (Palmisano and Constable 2022). Similarly, in contrast to interest, past experience was not a triggering factor in behavioral intention.

Under the moderation of the interest in CFD, self-efficacy showed a positive effect on behavioral intention. Since these participants are interested in the learning content, this may positively alter their attitude and motivation to cope with any issues that might prevent them from using the Virtual Garage. Intriguingly, personal innovativeness also became a positive predictor of behavioral intention under the moderation of the interest in CFD. The Virtual Garage is the first of its kind, blending CFD simulations with VR as a popular emerging technology. It is very likely that users interested in new technologies and CFD simulations may appreciate the Virtual Garage. It could thus be reasonably assumed that developing interest either in content or in technology prior to the VR experience can convince users to accept new learning mediums.

Moreover, the moderation of performance expectancy and self-efficacy by experience in CFD indicated a positive effect on behavioral intention. In contrast to experience in VR, being experienced in the content played a crucial role in behavioral intention. CFD-experienced users demonstrated greater confidence in their performance and in their ability to use the Virtual Garage. This suggests that experience and interest in CFD can have a wider positive effect on behavioral intention than experience and interest in VR. Broadly speaking, the content overtakes the technology when it comes to the moderation of experience and interest. At first sight, it is surprising that experience in CFD moderated a negative relationship between learning value and behavioral intention: experienced users remained skeptical about the learning value of the technology. However, this becomes understandable considering that experienced users may seek advanced features to interact with CFD data, such as those provided in CFD software for expert users. Instead of supporting a whole CFD project, the Virtual Garage aims to provide a learning medium where CFD data can be utilized as learning content whilst offering a first contact for non-CFD expert users. CFD simulations have a complex custom workflow that is generally undertaken in an exploratory, iterative fashion. Hence, experienced users may find the VR experience limiting, resulting in a negative tendency toward the behavioral intention.

Finally, we assessed the moderating effect of having difficulties visualizing 3D data in desktop settings on the relations between constructs and behavioral intention. Results revealed that self-efficacy negatively affected behavioral intention under this moderation. In other words, participants having difficulties visualizing 3D data in desktop settings reported lower self-efficacy, and this negatively affected their behavioral intention to use the Virtual Garage. This result provides valuable evidence for the importance of users' competence and self-belief in technology. Users already having difficulties with desktop settings showed a similar tendency to lack the confidence and competence to operate the VR experience regardless of the content. One of the main objectives of the Virtual Garage is to help non-expert users develop spatial reasoning skills to properly interpret CFD simulation data thanks to spatial immersion and advanced interaction in VR. However, either being or feeling incompetent to operate VR experiences might hamper the learning activity and may render the Virtual Garage ineffective in supporting cognition for this portion of users. Therefore, it seems likely that exposing users to a generic VR practice experience prior to the Virtual Garage may increase their competency and belief in VR technology while reducing the potential for cognitive overload.

4.2.3 Perception and attitude

Table 7 illustrates differences between pre- and post-test scores of the measured performance expectancy, effort expectancy, self-efficacy, intrinsic value and immersion constructs. According to the statistical analysis, effort expectancy, self-efficacy and immersion significantly increased after the use of the Virtual Garage, whereas performance expectancy and intrinsic value showed no significant changes. Users demonstrated higher perceived ease of use (effort expectancy), competency and belief to operate VR (self-efficacy), and immersion after being exposed to the Virtual Garage. Results indicated the importance of prior experience in technology and content. While prior experience can encourage users to properly operate VR applications, it can also help them truly apprehend the immersion, confirming previous findings (Joa and Magsamen-Conrad 2021). It is noteworthy that several technology acceptance studies in the literature solely measure perception without any active interaction of the users with the actual digital content. Although the goal of such studies was in general only to measure technology acceptance based on the initial perception, this approach may result in weak and misleading arguments, thus obscuring the underlying factors.

Table 7 Change in perception and attitude

A correlation analysis was performed to further explore how the post-test scores of the five constructs were related to individuals’ interests and experiences as utilized in the moderation analysis. Since the dataset is not normally distributed, we used Spearman’s correlation. We only report detected correlations. Interest in VR positively correlated with performance expectancy (p = 0.015*, r = 0.319) and effort expectancy (p = 0.019*, r = 0.310), whereas experience in VR only positively correlated with performance expectancy (p = 0.038*, r = 0.274). Results indicated that participants interested and experienced in VR showed a positive attitude towards performing better in the Virtual Garage. This substantiates the positive moderating effect of the interest in VR between performance expectancy and behavioral intention as previously discussed. In addition, interest in gaming positively correlated with performance expectancy (p = 0.036*, r = 0.277), effort expectancy (p = 0.020*, r = 0.306) and self-efficacy (p = 0.012*, r = 0.330). Users interested in gaming reported better performance, ease of use, competence and self-belief to operate the Virtual Garage. Likewise, interest in CFD positively correlated with intrinsic value (p = 0.015*, r = 0.319). As expected, intrinsic motivation was observed in the CFD-interested portion of the participants. It would seem that users’ intrinsic motivation and interest can be triggered by digital content they are interested in.

4.3 Learning and task performance

In this research, we also examined the effect of the Virtual Garage on knowledge gain and task performance. We devised a custom summative assessment as preliminary work towards integrating an assessment methodology in the Virtual Garage.

4.3.1 Knowledge test

The knowledge test was composed of 12 multiple-choice questions in which each correct answer scored 1, with 12 as the maximum total. Descriptive statistics showed that the mean score increased from pre-test (M = 6.63, SD = 1.94) to post-test (M = 8, SD = 1.71), a difference of 17.1% relative to the post-test mean. Statistical significance was assessed with a pair-wise analysis comparing pre- and post-test scores. The Shapiro–Wilk normality test showed that a non-parametric analysis should be applied to analyze the change between pre- (W = 0.965, p = 0.105) and post-tests (W = 0.911, p = 0.0005). Since the dataset is not normally distributed, we used the Wilcoxon signed-rank test. The test showed a statistically significant increase between pre- and post-tests (V = 174, p = 1.875e-05**), indicating that students performed significantly better in the post-test. No significant correlation was detected between learning and interest or experience. We were not able to compare the Virtual Garage against available educational practices since such techniques do not exist.
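The arithmetic and the test selection in this paragraph can be retraced with a short sketch. The helper names are hypothetical. The decision rule mirrors the procedure described in the paper (normality checked on the pre- and post-test scores), although checking the normality of the paired differences is a common alternative. With the reported means, the 17.1% figure is consistent with a difference expressed relative to the post-test mean. Relative to the pre-test mean, the same difference would be about 20.7%.

```python
def percent_difference(new: float, old: float, reference: float) -> float:
    """Difference between two means, as a percentage of a chosen reference value."""
    return 100.0 * (new - old) / reference


def choose_paired_test(p_normality_pre: float, p_normality_post: float,
                       alpha: float = 0.05) -> str:
    """Pick the paired test from Shapiro-Wilk p values (rule assumed from the text)."""
    if p_normality_pre >= alpha and p_normality_post >= alpha:
        return "paired t-test"
    return "Wilcoxon signed-rank test"


# Means and Shapiro-Wilk p values as reported in the paragraph above.
pre_mean, post_mean = 6.63, 8.0
relative_to_post = percent_difference(post_mean, pre_mean, post_mean)  # about 17.1
relative_to_pre = percent_difference(post_mean, pre_mean, pre_mean)    # about 20.7
selected_test = choose_paired_test(0.105, 0.0005)  # "Wilcoxon signed-rank test"
```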

4.3.2 In-game performance

Users' actions in the Virtual Garage were collected to form an assessment methodology including in-game metrics. Exposure time and decisions were logged. The Virtual Garage has two consecutive modules of 20 min each with a 5 min break in between to prevent long exposure time and to reduce simulator sickness. Users on average spent 19.5 min (MD = 18.9, SD = 4.7) and 13.8 min (MD = 15.5, SD = 5) in Module#1 and Module#2, respectively. Time was not included in the assessment since it does not measure an individual’s success in the Virtual Garage concept.

In-game performance was assessed based on the final decisions made by users in Module#2. At the end of Module#2, users filled in a decision-making board to provide a final design that satisfies the given constraints. Table 8 summarizes overall scores and percentage distributions in the population. According to Spearman's rank correlation, we detected a positive correlation between the post-knowledge test and in-game performance (p = 0.022*, r = 0.301). No correlation was found between in-game performance and experience, interest or perception, thus matching earlier findings (Udeozor et al. 2021a).

Table 8 Scores of post-knowledge test, in-game performance and summative assessment

4.3.3 Summative assessment

We developed a summative assessment methodology in which the final score is made up of 50% learning gain and 50% task performance. More on the summative assessment can be found in the Supplementary Information. Table 8 presents all scores involved in the summative assessment. No significant correlation was found between the summative assessment score and the UTAUT2 constructs, or with experience and interest.
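The equal weighting can be written out as a small sketch. The exact scoring scheme is documented in the Supplementary Information and is not reproduced here, so the normalization below is an assumption: both components are scaled to their maximum before weighting. The 12-point knowledge maximum comes from the knowledge test, while the task maximum of 10 and the example scores are purely hypothetical.

```python
def summative_score(knowledge: float, task: float,
                    knowledge_max: float = 12.0,  # 12 questions in the knowledge test
                    task_max: float = 10.0,       # hypothetical maximum, not from the study
                    knowledge_weight: float = 0.5) -> float:
    """Weighted summative score on a 0-100 scale (normalization is assumed)."""
    knowledge_share = knowledge / knowledge_max
    task_share = task / task_max
    combined = knowledge_weight * knowledge_share + (1.0 - knowledge_weight) * task_share
    return 100.0 * combined


# Hypothetical student: 8 of 12 on the knowledge test, 7 of 10 on the in-game task.
example_score = summative_score(8, 7)  # about 68.3
```

Keeping the weight as a parameter would let lecturers adjust the balance between knowledge and task performance for a given course.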

We believe that the assessment methodology should be customized based on the utility of the Virtual Garage concept. On one hand, if the practitioners solely look for an edutainment tool, they can either use an online questionnaire or carry out an interview session to get some insights and reflections on students’ experiences instead of learning outcomes. On the other hand, as utilized in our work, an assessment methodology can be developed and implemented in the relevant courses and exercises.

An assessment rubric with intended learning outcomes may help lecturers come up with a fair evaluation scheme. We advise lecturers to progressively implement similar applications in educational practices since several unknown aspects of assessment might appear along the journey. For example, in-game assessment tools can further be implemented in learning modules to provide a basis for self-evaluation and reflection, for instance through a learning analytics dashboard.

4.4 Practical implications, limitations and future directions

Although this study has provided valuable insights into the use of the Virtual Garage in an educational context, it is imperative to translate these findings into concrete, actionable recommendations for practitioners and researchers. Below, we outline specific practical implications derived from our study, together with its limitations and directions for future research.

4.4.1 Reflection on the importance of UTAUT2 constructs

The PLS-SEM analysis revealed that the learning value and content value were the main predictors influencing students' behavioral intention to use the Virtual Garage. Despite their prominence in acceptability models, performance expectancy and effort expectancy did not show significant effects on behavioral intention in this study. One plausible explanation is the potential redundancy between the constructs of learning value and content value and those of performance expectancy and effort expectancy. Since learning and content value directly align with the core purpose of the Virtual Garage, they might overshadow the more general constructs of performance expectancy and effort expectancy. The specific context of VR in education, focusing on the immersive and content-rich experience, may have amplified the importance of learning value and content value over the traditionally significant variables. This observation may suggest a need to reconsider the weight and relevance of different constructs in technology acceptance models, especially in niche and specialized applications like VR in education. On the one hand, for future VR applications in education, it may be beneficial to prioritize enhancing learning and content value over the general ease of use or perceived utility, as these seem to be more influential in shaping students' behavioral intentions. On the other hand, it would also be beneficial to report separately on the results with the traditional UTAUT2 constructs to objectively compare behavioral intention against the available literature.

4.4.2 Implications for practitioners and developers

In light of our study's findings, there are significant implications for both lecturers and developers in the realm of virtual reality (VR) applications in education. Lecturers are encouraged to integrate the concepts of learning value and content value into their course designs, selecting VR applications that resonate with the educational needs and interests of their students. This approach not only aligns educational content with student preferences but also leverages intrinsic motivation and personal innovativeness, fostering a learning environment that is both engaging and interactive. For developers, the focus should be on creating VR applications that enhance learning value and content relevance. This entails designing content that is immersive, yet firmly anchored in educational objectives and applicable to real-world scenarios. Developers should also prioritize intuitive user interfaces and user experiences, catering to users with varied levels of experience and innovativeness in VR. Overall, our study underscores the necessity of a user-centered approach in both the educational and technological aspects of VR application development, emphasizing the importance of aligning these applications with the behavioral intentions and perceptions of students. This holistic approach promises to significantly enrich the educational landscape through the thoughtful integration of VR technology.

Our study highlights the versatile applications of the Virtual Garage in diverse educational settings, particularly in engineering education. For practitioners, this implies the potential for integrating VR technologies like the Virtual Garage to enrich the learning experience. Such integration should be customized to align with specific course objectives and student needs, thereby maximizing the educational impact. On the research front, there is a need for further exploration into the deployment of VR technologies across various educational contexts. This exploration should focus on assessing the impact of VR on learning outcomes, taking into account a variety of student demographics and academic disciplines. By doing so, research can provide a broader understanding of VR's effectiveness in different learning environments.

4.4.3 Limitations and future study recommendations

Instead of performing the experiments in a classroom environment, we recruited participants through a convenience sampling method. Therefore, repeating the experiments within a larger and heterogeneous population from university courses may provide further validation of our findings. Also, the results were mostly based on graduate students, as 84% of the participants were enrolled in a master's program or higher. To truly understand the behavioral intention of younger groups, such as undergraduate students, it would be valuable to perform a follow-up study. Another limitation of our study was the imbalanced gender ratio, as our population was dominated by male participants (73.1%). This prevented us from disclosing any gender-driven behaviors. Lastly, we are aware that a comparison of the Virtual Garage against available educational practices would be expected. However, a traditional version of the Virtual Garage is non-existent. Hence, we did not attempt to perform a comparison study. Instead, as previously proposed (Solmaz et al. n.d.), it would bring more added value to compare the effect of immersion and other relevant factors such as usability and learning between variants of CFD scenes in VR.

VR technology is evolving rapidly, becoming more affordable, accessible and diverse thanks to advances in hardware and software. We therefore suggest some directions for future work in view of the current state of the technology. First, more research could be concentrated on the practical implementation of VR in a current educational context. Best practices could be properly assessed and reported. Not only students but also lecturers and all other stakeholders should be part of the implementation and assessment. Secondly, the metaverse is being transformed from hype to reality. Collaborative learning environments, where peer students and lecturers can be simultaneously present, would be an intriguing direction. Lastly, the added value of VR in remote education would be worth studying to identify the underlying factors behind both students’ and decision-makers' behavioral intentions.

5 Conclusion

This study explores the underlying factors behind students’ behavioral intention to use an immersive virtual reality learning environment with CFD simulations, namely the Virtual Garage concept. Learning value and content value are the driving factors in students’ behavioral intention. In other words, it appears that students will continue using similar applications only if they offer learning value with content that matters for their education. Intrinsic value and personal innovativeness also show a positive, though statistically insignificant, impact on behavioral intention. We provide further evidence of the importance of perception. Participants’ perception of effort expectancy, self-efficacy and immersion significantly and positively increased after using the Virtual Garage. In addition, considerable progress in the evaluation of learning and performance is also demonstrated. Our results provide encouraging findings for practitioners and decision-makers in the design, development and implementation of similar digital educational tools.