1 Introduction

Seafarers’ competence has long been regarded as a crucial determinant of maritime safety (Kongsvik et al., 2020; A. Wahl et al., 2020). Competency-based training (CBT) was introduced by the International Maritime Organization (IMO) through the International Convention on Standards of Training, Certification and Watchkeeping for Seafarers (STCW) 1995, as amended, to increase seafarers’ competence and address human-error-related incidents in the maritime domain (Emad & Roth, 2008). The essence of CBT is to operationalize “training and assessment” activities in the workplace or in a job-like environment (Fletcher & Buckley, 1991), thereby making simulators a significant medium of CBT in maritime education and training (MET) (Martes, 2020). Maritime simulators have been proposed as a solution for bridging the “experiential learning gap” of entry-level seafarers (A. Wahl et al., 2020), in addition to being a key element for efficient training in high-risk domains (Moroney & Lilienthal, 2009), for emergency training (Billard et al., 2020), and for enhancing behavioural and performance outcomes (Röttger & Krey, 2021). Consequently, simulator training for critical navigation components in maritime operations, such as the automatic radar plotting aid (ARPA) and radio detection and ranging (RADAR), has been made mandatory by the STCW regulations in Section B-I/12 (Guidance regarding Use of Simulators) (IMO, 2010). Since then, simulator training has become a standard aid for the training of seafarers in maritime institutes, where it is utilized to bridge the gap between theory and practice by providing an opportunity to experience the real maritime work environment in a virtual medium (Hontvedt & Arnseth, 2013).

Over the years, maritime simulators have evolved across various modalities, differing in functionality, scale, and purpose. At the same time, the continuously changing training needs of seafarers, driven by the evolving technical and operational demands of the maritime industry, make it impractical to rely on a single all-in-one simulator for training. For example, a full-mission bridge simulator may be suitable for replicating basic to complex navigation scenarios, whereas desktop-based simulators may be considered more suitable for procedural training or equipment familiarization (Kim et al., 2021). Similarly, cloud-based simulators may suit remote training accessible at any time and location, whereas Virtual Reality (VR) simulators may be used to provide highly immersive training in a 3D environment (Mallam et al., 2019). Moreover, the multifaceted demands arising from factors such as training duration, instructors’ competence, and evaluation methods (Nazir et al., 2019) generate diverse training needs requiring a comprehensive institutional strategy. Thus, the availability of various simulator modalities, each with distinct characteristics, combined with numerous emerging factors to consider, creates a decision-making challenge for maritime instructors when selecting the appropriate simulator to meet specific training needs. Kim et al. (2021) assessed four different modalities of maritime simulators (full-mission, cloud-based, desktop-based, and virtual reality (VR) simulators) using a qualitative approach. Such an approach highlights the pros and cons of the different modalities but provides neither a structured framework for decision-making nor in-depth insight into the factors that dictate the simulator selection process. Therefore, the research question of this study is formulated as: “What factors influence the selection of simulator modalities for maritime training, and how can their importance rankings be used to evaluate simulators?”

This study proposes a multi-criteria decision-making (MCDM) framework for evaluating four modalities of maritime simulators (full-mission, desktop-based, cloud-based, and VR simulators) against 13 relevant factors (or sub-criteria). First, the underlying factors affecting the selection of maritime simulators are extracted from the published literature and grouped under three higher-level criteria: technical, instructional, and organizational. Then, these criteria and their corresponding sub-criteria are ranked by their weights, derived from a survey of experts using the Bayesian best-worst method (BWM). Finally, the four simulator modalities are evaluated using the Preference Ranking Organization Method for Enrichment Evaluation (PROMETHEE). A sensitivity analysis is conducted to explore and discuss the influence of varying weights of the 13 sub-criteria on the preferential ranking of simulator modalities.

The remainder of this study is organized as follows: Section 2 describes the methodology, delineating how two MCDM methods, the Bayesian best-worst method (BBWM) and PROMETHEE, are used in conjunction; Section 3 presents the results of the analysis; Section 4 discusses the results along with their practical implications; and Section 5 concludes with future research directions.

2 Methodology

MCDM methods have traditionally been used in classic decision-making or assessment contexts, such as equipment selection in the process industries (Standing et al., 2001; Tabucanon et al., 1994), performance-based ranking of universities in the education domain (H.-Y. Wu et al., 2012), and assessing the effects of multiple criteria on cloud technology adoption in the healthcare domain (Sharma & Sehrawat, 2020). A variety of MCDM methods are in use, including the analytic hierarchy process (AHP), analytic network process (ANP), technique for order preference by similarity to ideal solution (TOPSIS), multi-criteria optimization and compromise solution (VIKOR), decision making trial and evaluation laboratory (DEMATEL), simple additive weighting (SAW), PROMETHEE, and elimination and choice translating reality (ELECTRE), along with their variants (Zavadskas et al., 2014). Combining multiple methods is also common in the literature (Dağdeviren, 2008; Kheybari et al., 2021; Nabeeh et al., 2019).

In this study, a combination of MCDM methods, Bayesian BWM-PROMETHEE, is used first to rank the factors and criteria influencing the selection of maritime simulators, and then to evaluate four simulator modalities against those factors. A systematic literature review was conducted to identify the relevant selection factors. Figure 1 presents the methodological workflow of this study.

Fig. 1 Workflow of research methodology

2.1 Literature review

The proposed MCDM framework for evaluating maritime simulators, along with their associated selection criteria, requires data along two specific dimensions: (1) the criteria and/or sub-criteria affecting the selection of simulators, and (2) the types of maritime simulator modalities to be evaluated. First, a systematic approach was followed to identify the criteria and sub-criteria from the scientific literature. The literature search was performed using the following Boolean search string:

(“maritime” OR “shipping”) AND (“seafarer* training” OR “maritime education and training” OR “MET” OR “training” OR “education”) AND (“simulator*”)

The search, conducted in two databases (Scopus and Web of Science (WOS)), returned a total of 168 documents after excluding duplicates and including only peer-reviewed articles written in English. After an initial screening of abstracts, 69 articles were selected for full-text review. Table 1 provides a summary of the literature search process.

Table 1 Summary of literature search

2.1.1 Criteria and sub-criteria affecting the selection of simulators

The 69 selected articles were reviewed systematically using an Excel file to identify items relevant to maritime simulator training. The review followed an iterative process involving the authors, which led to clustering of the items under 13 thematic sub-criteria: (C1) fidelity, (C2) immersivity, (C3) possibility of remote training, (C4) possibility of team training, (C5) ease of training, (C6) ease of assessment, (C7) pedagogic value, (C8) appropriate methods for training, (C9) diversity of training scenario, (C10) training efficiency, (C11) regulatory compliance, (C12) cost of simulators, and (C13) capacity of institutions. These sub-criteria were then categorized under three higher-level criteria: technical (C1–C5), instructional (C6–C10), and organizational (C11–C13). Table 2 presents the items identified from the published literature along with the thematic sub-criteria and criteria.

Table 2 Summary of literature review with identified criteria

2.1.2 Identifying maritime simulator modalities

Maritime simulators can be classified in several ways, considering the differences in their capabilities and training objectives. For example, DNV GL (2021) classified all types of maritime simulators into four categories: class A, class B, class C, and class S for full-mission, multi-task, limited-task, and special-task simulators, respectively. This categorization treats each maritime operation, such as ship navigation, engine room operation, or cargo handling, as having separate training objectives. However, maritime training institutes may use one simulator for multiple training needs, such as a full-mission bridge simulator for both navigation and communication training, which makes it difficult to categorize simulators solely by task-relevance. Therefore, this study analyses simulators categorized according to their physical modalities, reliance on the internet, and hardware usage, as proposed by Kim et al. (2021). A description of the four selected simulator modalities is presented in Table 3.

Table 3 Available simulator modalities for maritime training (adapted from Kim et al., 2021)

Following the identification of the relevant criteria, sub-criteria, and simulator modalities, MCDM methods are operationalized in the subsequent sections.

2.2 Bayesian BWM for criteria ranking

A structured BWM survey was designed for data collection, targeting practitioners and experts in MET, such as maritime instructors (MI), maritime education researchers (MR), and heads of departments (HOD) with relevant backgrounds from different maritime institutions. Potential respondents, including members of the International Association of Maritime Universities (IAMU), were contacted through email and other professional channels. The survey link was open for participation for one month, from 1 June 2022 until 1 July 2022. A total of 41 responses were received, of which 8 were removed for inconsistent answers and a further 8 were set aside due to the respondents’ lack of direct experience with maritime simulators. Consequently, 25 respondents with hands-on teaching experience with simulators (mean = 9.87 years, SD = 7.21) were retained for the final analysis. Table 4 provides an overview of the respondents.

Table 4 Overview of survey respondents

Since the inception of BWM (Rezaei, 2015) and its extension into a Bayesian probabilistic group decision-making method (Mohammadi & Rezaei, 2020), it has been used in many decision-making studies across various domains, e.g., in aviation for evaluating the green performance of airports (Kumar et al., 2020) and in healthcare for selecting waste disposal locations (Torkayesh et al., 2021). In this study, the Bayesian BWM is employed to rank the three criteria and their corresponding 13 sub-criteria that influence the selection of maritime simulators. A stepwise approach was followed to estimate the local weights of the criteria and sub-criteria, which are then used to calculate the global (or overall) weights of all sub-criteria.

  • Step 1: Identification of different criteria and sub-criteria affecting simulator selection

The first step of the Bayesian BWM is to identify the criteria for evaluation. In Section 2.1, 13 sub-criteria concerning simulator training in the maritime domain were identified under three criteria, as presented in the proposed MCDM framework in Fig. 2.

Fig. 2 Proposed MCDM framework for the evaluation of simulator modalities (image source: author and Kongsberg Digital)

  • Step 2: Identifying the most important (MI) and the least important (LI) criterion and sub-criterion

Through the survey, the experts determined the most important (MI) and the least important (LI) among the three criteria and, in turn, among their corresponding sub-criteria. The MI and LI selections identified from the survey can be found in the supplementary data.

  • Step 3: Comparing the most important (MI) with other criteria (j)

Experts were asked to rate the importance of their selected MI criterion relative to each of the other criteria on a scale of 1 to 7 (“1” meaning equally important and “7” meaning very strongly more important). Thus, the most important-to-others (MO) vector is formed as:

$$MO=\left({x}_{MI1},{x}_{MI2},{x}_{MI3},\dots ,{x}_{MIn}\right)$$
(1)

Here, $x_{MIj}$ denotes the preference of the most important (MI) criterion over criterion $j$, where $x_{MI\,MI}=1$.

  • Step 4: Comparing the other criteria (j) to the least important (LI)

Subsequently, the experts were asked to rate the importance of each of the other criteria relative to their selected least important (LI) criterion on a scale of 1 to 7 (“1” meaning equally important and “7” meaning very strongly more important). Thus, the others-to-the-least-important (OL) vector is formed as:

$$OL=\left({x}_{1LI},{x}_{2LI},{x}_{3LI},\dots ,{x}_{nLI}\right)$$
(2)

Here, $x_{jLI}$ denotes the preference of criterion $j$ over the least important (LI) criterion, where $x_{LI\,LI}=1$.

  • Step 5: Estimating the overall weight

In the Bayesian BWM, the weights of the criteria are estimated using the MO and OL vectors as inputs. Following Mohammadi and Rezaei (2020), the probability mass function (PMF) of the OL vector can be expressed as a multinomial distribution as follows:

$$P\left( OL\mid w\right)=\frac{\left({\sum}_{j=1}^n{x}_{jLI}\right)!}{\prod_{j=1}^n{x}_{jLI}!}\prod\limits_{j=1}^n{w}_j^{{x}_{jLI}},$$
(3)

considering $w$ as the probability distribution of the weights. While the OL vector represents the preference of the other criteria over the LI criterion, the MO vector represents the preference of the MI criterion over the others; hence,

$$MO\sim multinomial\left(\frac{1}{w}\right)$$
(4)

The weight vector can therefore be estimated through the Dirichlet distribution shown below, since MCDM weights are non-negative and sum to one:

$$Dir\left(w\mid\alpha \right)=\frac{1}{B\left(\alpha \right)}{\prod}_{j=1}^n{w}_j^{\alpha_j-1};\;\textrm{here}\ \alpha \in {\mathbb{R}}^n$$
(5)

The aggregated weight ($w^{agg}$) and the individual expert weights ($w^{1:K}$) corresponding to their inputs can be calculated using the most important-to-others vectors ($MO^{1:K}$) and the others-to-least-important vectors ($OL^{1:K}$) of all experts $k=1,2,\dots,K$. The joint probability distribution can therefore be expressed as:

$$P\left({w}^{agg},{w}^{1:K}\mid{MO}^{1:K},{OL}^{1:K}\right)$$
(6)

The individual expert weights ($w^k$) should lie within the bounds given by the aggregated weight ($w^{agg}$), as below:

$${w}^k\mid{w}^{agg}\sim Dir\left(\gamma \times {w}^{agg}\right),\ \forall k=1,2,\dots,K;$$
(7)

where the concentration parameter γ follows a gamma(0.01, 0.01) prior distribution.
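To make the estimation step concrete, the sketch below implements a single-expert version of the model in Eqs. (3)–(5) in Python with PyMC; the MO and OL judgments are hypothetical, and the hierarchical aggregation over K experts (Eqs. (6)–(7)) is omitted for brevity. The actual analysis in this study used the MATLAB implementation of Mohammadi and Rezaei (2020), so this is an illustration rather than a reproduction.

```python
# Minimal single-expert sketch of Bayesian BWM (Eqs. 3-5), assuming PyMC >= 5.
# The three criteria and the judgment vectors below are hypothetical.
import numpy as np
import pymc as pm

MO = np.array([1, 2, 5])  # MI-to-others judgments (MI compared with itself = 1)
OL = np.array([5, 3, 1])  # others-to-LI judgments (LI compared with itself = 1)
n = len(MO)

with pm.Model():
    # Criteria weights: non-negative and summing to one (Dirichlet support, Eq. 5)
    w = pm.Dirichlet("w", a=np.ones(n))
    # MO ~ multinomial(1/w), with reciprocals normalized to a probability vector (Eq. 4)
    inv_w = (1.0 / w) / pm.math.sum(1.0 / w)
    pm.Multinomial("MO_obs", n=int(MO.sum()), p=inv_w, observed=MO)
    # OL ~ multinomial(w) (Eq. 3)
    pm.Multinomial("OL_obs", n=int(OL.sum()), p=w, observed=OL)
    trace = pm.sample(2000, tune=1000, chains=2, random_seed=7)

# Posterior mean of w approximates the criterion weights
print(trace.posterior["w"].mean(dim=("chain", "draw")).values)
```

A credal ranking such as the one in Fig. 3 can then be read off the posterior by computing, for each pair of criteria, the fraction of samples in which one weight exceeds the other.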

2.3 PROMETHEE for evaluating the simulator modalities

Developed by Brans and Vincke (1985), PROMETHEE has become an established MCDM method for ranking and evaluating alternatives. It has been used both stand-alone and in combination with other MCDM methods for the evaluation of a finite number of alternatives (Albadvi et al., 2007), for example, in the selection of manufacturing systems (Anand & Kodali, 2008) and in sustainable energy planning (Pohekar & Ramachandran, 2004).

In this study, the weight of each criterion is estimated using the Bayesian BWM. These criteria weights are then used as inputs for evaluating the four simulator modalities as alternatives using PROMETHEE. PROMETHEE defines, for each criterion, a preference function P that maps the difference between two alternatives a and b to a degree of preference ranging from 0 to 1:

$${\displaystyle \begin{array}{c}{P}_{j\left(a,b\right)}={G}_j\left[{f}_j(a)-{f}_j(b)\right],\\ {}0\le {P}_{j\left(a,b\right)}\le 1,\end{array}}$$
(8)

In Eq. (8), $P_{j}(a,b)$ is the preference function associated with criterion $f_j$, where $G_j$ represents a non-decreasing function of the deviation between $f_j(a)$ and $f_j(b)$.

The PROMETHEE calculations use the following functions:

$$\pi \left(a,b\right)=\frac{\sum_{j=1}^n{\omega}_j{P}_j\left(a,b\right)}{\sum_{j=1}^n{\omega}_j},$$
(9)
$${\phi}^{+}(a)=\sum_{x\in A}\pi \left(a,x\right),$$
(10)
$${\phi}^{-}(a)=\sum_{x\in A}\pi \left(x,a\right),$$
(11)
$$\phi (a)={\phi}^{+}(a)-{\phi}^{-}(a).$$
(12)

Here, π(a, b) denotes the overall preference index of alternative a over b, where both belong to the set of alternatives A. The leaving flow ϕ+(a) measures how a dominates all the other alternatives of A (the outranking character of a); a higher ϕ+(a) indicates a better position of alternative a. The entering flow ϕ−(a) measures how a is dominated by all the other alternatives of A (the outranked character of a); a lower ϕ−(a) indicates a better position of a. Finally, a higher net flow ϕ(a) represents a better overall position of alternative a.
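As an illustration of Eqs. (8)–(12), the following Python sketch computes the preference indices and flows for a toy decision matrix using the “usual” (step) preference function, i.e., P = 1 whenever the score difference is positive and 0 otherwise; the scores and weights are hypothetical, and the Visual PROMETHEE software used in this study also supports other preference function shapes.

```python
# Toy PROMETHEE computation (Eqs. 8-12); scores and weights are hypothetical.
import numpy as np

scores = np.array([   # rows: four alternatives; columns: three criteria
    [0.9, 0.4, 0.8],
    [0.6, 0.7, 0.5],
    [0.3, 0.9, 0.6],
    [0.5, 0.5, 0.4],
])
weights = np.array([0.5, 0.3, 0.2])  # e.g., global BWM weights, summing to one
m = scores.shape[0]

def pi(a, b):
    """Weighted preference index of alternative a over b (Eq. 9)."""
    pref = (scores[a] - scores[b] > 0).astype(float)  # "usual" function (Eq. 8)
    return np.dot(weights, pref) / weights.sum()

phi_plus = np.array([sum(pi(a, x) for x in range(m) if x != a) for a in range(m)])   # Eq. 10
phi_minus = np.array([sum(pi(x, a) for x in range(m) if x != a) for a in range(m)])  # Eq. 11
phi_net = phi_plus - phi_minus                                                       # Eq. 12

print(np.argsort(-phi_net))  # PROMETHEE II complete ranking, best first
```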

Consequently, the sub-tools of PROMETHEE (PROMETHEE I, PROMETHEE II, and the PROMETHEE rainbow) are utilized for, respectively, partial ranking, complete ranking, and visually representing all criteria in their order of importance for each simulator modality (i.e., alternative).

2.3.1 PROMETHEE I: for partial ranking

PROMETHEE I provides a partial ranking of alternatives: for two alternatives a and b, it establishes preference (aPb), indifference (aIb), or incomparability (aRb) using the following conditions:

aPb (alternative a is preferred over b) if:

$${\displaystyle \begin{array}{c}{\phi}^{+}(a)>{\phi}^{+}(b)\ and\ {\phi}^{-}(a)<{\phi}^{-}(b); or\\ {}{\phi}^{+}(a)>{\phi}^{+}(b)\ and\ {\phi}^{-}(a)={\phi}^{-}(b); or\\ {}{\phi}^{+}(a)={\phi}^{+}(b)\ and\ {\phi}^{-}(a)<{\phi}^{-}(b).\end{array}}$$
(13)

aIb (indifference between alternatives a and b) if:

$${\phi}^{+}(a)={\phi}^{+}(b)\ and\ {\phi}^{-}(a)={\phi}^{-}(b).$$
(14)

aRb (alternatives a and b are incomparable) if:

$${\displaystyle \begin{array}{c}{\phi}^{+}(a)>{\phi}^{+}(b)\ and\ {\phi}^{-}(a)>{\phi}^{-}(b); or\\ {}{\phi}^{+}(a)<{\phi}^{+}(b)\ and\ {\phi}^{-}(a)<{\phi}^{-}(b).\end{array}}$$
(15)

In aPb, the outranking and the outranked flows are consistent: in every case, the greater strength (leaving flow) of a is accompanied by a lesser weakness (entering flow) of a relative to b. The preference of a over b can therefore be considered sure.

In aRb, however, the outranking and the outranked flows are inconsistent in the power-weakness analysis, meaning that a true preference of one alternative over the other cannot be determined. In this situation, the decision-maker holds the responsibility of making a choice.
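Continuing the earlier sketch, the conditions of Eqs. (13)–(15) translate directly into code; the flow values in the example call are hypothetical.

```python
# PROMETHEE I relation between alternatives a and b (Eqs. 13-15),
# given their leaving (phi+) and entering (phi-) flows.
def promethee_i_relation(plus_a, minus_a, plus_b, minus_b):
    dp = plus_a - plus_b    # difference in leaving (outranking) flow
    dm = minus_a - minus_b  # difference in entering (outranked) flow
    if (dp > 0 and dm <= 0) or (dp == 0 and dm < 0):
        return "aPb"  # a preferred over b (Eq. 13)
    if dp == 0 and dm == 0:
        return "aIb"  # indifference (Eq. 14)
    if (dp > 0 and dm > 0) or (dp < 0 and dm < 0):
        return "aRb"  # incomparable: the two flows disagree (Eq. 15)
    return "bPa"      # otherwise b is preferred over a

print(promethee_i_relation(1.8, 0.4, 1.2, 0.9))  # -> aPb
```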

2.3.2 PROMETHEE II: complete ranking

PROMETHEE II provides a solution to the lack of a definitive ranking in PROMETHEE I. It considers the net outranking flow ϕ(a) = ϕ+(a) − ϕ−(a), where the higher the net flow, the better the alternative.

In the scientific literature, PROMETHEE I and PROMETHEE II are used in conjunction for complex decision-making scenarios, since PROMETHEE I ensures that indifferent and incomparable alternatives are made explicit, information that may be lost in the complete ranking of PROMETHEE II (Brans & De Smet, 2016).

2.3.3 PROMETHEE rainbow

The PROMETHEE rainbow is used to visualize a disaggregated view of the complete ranking derived from PROMETHEE II. It details the net outranking flow ϕ(a) = ϕ+(a) − ϕ−(a), displaying both the most-significant and the less-significant criteria for each simulator modality in their order of importance.

The model, along with the mathematical formulations described above, is processed through the Visual PROMETHEE Academic Edition software (version 1.4.0.0). The adopted Bayesian BWM-PROMETHEE approach provides a clear and concise graphical representation that simplifies the decision-making process, enhancing its transparency and enabling stakeholders to comprehend and interpret the decision outcomes with ease.

3 Results

3.1 Weights of the criteria and their ranking

The criteria and sub-criteria weights were estimated using the Bayesian BWM syntax in MATLAB (Mohammadi & Rezaei, 2020). First, the local weights derived from MATLAB were used to construct a visual credal ranking depicting a probabilistic comparison of the criteria (see Fig. 3). For example, it can be inferred with 100% confidence that both the instructional (INST) and technical (TECH) criteria of simulators are more important than the organizational criteria (ORG), while the confidence level decreases to 66% when the instructional criteria are compared with the technical criteria, and so on (see Fig. 3a). At the sub-criteria level, fidelity (FID) is the most important among the technical sub-criteria (see Fig. 3b). Similarly, pedagogic value (PED) is the most important among the instructional sub-criteria (Fig. 3c) and regulatory compliance (REG) among the organizational ones (see Fig. 3d). Table 5 reports the global weight of each sub-criterion, calculated by multiplying its local weight by the local weight of its respective criterion. For instance, the global weight of fidelity (FID) is 0.0884, derived by multiplying the local weight of the technical (TECH) criterion (0.3737) by the local weight of fidelity (0.2365).
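The global-weight calculation is a simple product of local weights, as the short sketch below illustrates; all values other than the reported TECH and FID weights are hypothetical placeholders.

```python
# Global weight = local criterion weight x local sub-criterion weight.
# Only TECH (0.3737) and FID (0.2365) are values reported in the text;
# the remaining entries are hypothetical.
criteria = {"TECH": 0.3737, "INST": 0.40, "ORG": 0.23}
sub_local = {
    "TECH": {"FID": 0.2365, "IMM": 0.20},
    "INST": {"PED": 0.25},
    "ORG": {"REG": 0.45},
}
global_w = {s: round(criteria[c] * lw, 4)
            for c, subs in sub_local.items() for s, lw in subs.items()}
print(global_w["FID"])  # 0.0884, matching the reported global weight of fidelity
```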

Fig. 3 Credal ranking of criteria (a) and sub-criteria (b–d). Criteria level: technical (TECH), instructional (INST), and organizational (ORG). Sub-criteria level: fidelity (FID), immersivity (IMM), possibility of remote training (RMT), possibility of team training (TMT), ease of training (ESY), ease of assessment (ASM), pedagogic value (PED), appropriate methods of training (MTH), diversity of training scenario (DSC), training efficiency (TRE), regulatory compliance (REG), cost of simulators (COS), and capacity of institutions (CAP)

Table 5 Aggregated weight of criteria and sub-criteria

The global weights of the sub-criteria (see Fig. 4) reveal that regulatory compliance (REG), pedagogic value (PED), and training efficiency (TRE) are the top three factors for evaluating simulator modalities. On the other hand, possibility of remote training (RMT), ease of assessment (ASM), and cost of simulators (COS) are the three least important factors, while the others—fidelity (FID), possibility of team training (TMT), appropriate methods of training (MTH), immersivity (IMM), the capacity of institutions (CAP), diversity of training scenario (DSC), and ease of training (ESY)—appear in decreasing order of importance.

Fig. 4 Global ranking according to the weights of sub-criteria. Criteria level: technical (TECH), instructional (INST), and organizational (ORG). Sub-criteria level: fidelity (FID), immersivity (IMM), possibility of remote training (RMT), possibility of team training (TMT), ease of training (ESY), ease of assessment (ASM), pedagogic value (PED), appropriate methods of training (MTH), diversity of training scenario (DSC), training efficiency (TRE), regulatory compliance (REG), cost of simulators (COS), and capacity of institutions (CAP)

3.2 Evaluation of simulator modalities

PROMETHEE I and PROMETHEE II were employed to evaluate the four simulator modalities using the global weights of the 13 sub-criteria derived from the Bayesian BWM. The Phi (+) scores for the outranking flow and the Phi (−) scores for the outranked flow, as described in the methodology section, were estimated using the Visual PROMETHEE software (see Table 6).

Table 6 Phi scores for the simulator alternatives

The distinct Phi scores of the alternatives demonstrate that no two alternatives are indifferent (see Eq. 14), making a ranking of the alternatives possible. PROMETHEE I was used for the partial ranking and PROMETHEE II for the complete ranking. The two columns in Fig. 5a represent the outranking flow and the outranked flow for the partial ranking, while the net flow is represented by a single column in Fig. 5b for the complete ranking. The horizontal lines in the flow columns mark the positions of the different simulators in the estimated ranking. Reading from the top, the full-mission and VR simulator modalities rank highest, followed by the cloud-based and desktop-based simulators (see Fig. 5).

Fig. 5 PROMETHEE ranking of simulator modalities

As the horizontal lines in Fig. 5a neither overlap nor cross, the simulators are neither indifferent nor incomparable, signifying their distinct characteristics and the comparability of the alternatives in the PROMETHEE ranking.

The PROMETHEE rainbow provides a graphical representation of the comprehensive evaluation of the simulator modalities, ranking them according to preference. This evaluation considers both the criteria of utmost significance and those of relatively lesser significance, arranged in descending order of importance for each modality. In Fig. 6, the simulator modalities are ranked from left to right, while the criteria are ranked from top to bottom. For example, for the full-mission simulators in the leftmost column, the upper portion of the PROMETHEE rainbow highlights the most significant criteria, such as regulatory compliance, pedagogic value, and training efficiency, while the lower portion shows the less significant criteria, such as institutional capacity, simulator cost, and remote training capability (see Fig. 6).

Fig. 6 PROMETHEE rainbow

4 Discussion

The results offer an evaluation of maritime simulators based on the ranked importance of various selection factors. Full-mission simulators appear to be the most preferred alternative among the experts, followed by VR, cloud-based, and desktop-based modalities.

At the criteria level, the preferences emphasize the instructional features of simulators, while at the sub-criteria level they focus on aspects such as regulatory compliance, pedagogic value, training efficiency, fidelity, and team training. Contemporary research in maritime simulator training echoes these findings, emphasizing the pedagogical utility (Sellberg, 2018), fidelity (de Oliveira et al., 2022), and team training (Kandemir et al., 2018) capabilities of simulators. This illustrates a trend towards prioritizing educational effectiveness and adherence to industry standards when selecting and implementing maritime simulator training. On the other hand, remote training, assessment convenience, and cost are perceived as less significant factors in the context of simulator training at maritime institutions. This is likely because maritime institutions currently do not prioritize remote training and predominantly rely on traditional assessment methods. Additionally, simulator costs are not viewed as a major concern as long as the other essential selection criteria are satisfied.

A clearer picture emerges when the importance-based criteria ranking is examined with respect to each simulator modality. The results reveal that remote training capability (C3), cost of simulators (C12), and institutional capacity (C13) are the least significant criteria for full-mission simulators, while the same factors become important considerations for cloud-based simulators. In contrast, VR simulators have the potential to deliver higher pedagogic value (C7), training efficiency (C10), fidelity (C1), team training capability (C4), appropriate training methods (C8), immersivity (C2), diverse training scenarios (C9), and remote training capability (C3). Cloud-based simulators, however, are perceived as less suitable on the same criteria, except for their remote training capability (C3). Desktop-based simulators, on the other hand, are viewed as suitable when organizational criteria (cost, institutional capacity, and regulatory compliance) and easier assessment procedures are prioritized, yet they are perceived as less beneficial in terms of the technical (fidelity, team training, etc.) and most of the instructional (pedagogic value, training efficiency, etc.) criteria, as shown in Fig. 6 (PROMETHEE rainbow).

A sensitivity analysis representing the effect of varying weights of each sub-criterion on the priority ranking of simulator modalities is presented in Fig. 7. For example, the initial priority ranking stays the same when fidelity (C1) or immersivity (C2) alone is given 100% weight. However, when focusing exclusively on remote training (C3), cloud-based simulators rank highest, followed by VR, desktop-based, and full-mission simulators (see Fig. 7 (C3)). Similarly, when ease of training (C5) is emphasized, cloud-based simulators are preferred over desktop-based and VR simulators. In contrast, when ease of assessment (C6) is prioritized, desktop-based simulators rank above VR and cloud-based alternatives, while full-mission simulators remain the top choice in both scenarios. Consistent with this trend, cloud-based simulators are least preferred when pedagogic value (C7) is prioritized but become the most preferred option when institutional capacity (C13) is the highest priority (see Fig. 7 (C13)).
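Mechanically, this kind of one-factor-at-a-time analysis amounts to re-running PROMETHEE with a degenerate weight vector, as the sketch below illustrates; the score matrix and criterion labels are hypothetical, not the study's data.

```python
# Sensitivity sketch: give one sub-criterion 100% weight and re-rank.
# The modality scores below are hypothetical placeholders.
import numpy as np

names = ["full-mission", "desktop-based", "cloud-based", "VR"]
scores = np.array([        # columns: e.g., C1 (fidelity), C3 (remote), C7 (pedagogy)
    [0.9, 0.1, 0.9],
    [0.4, 0.2, 0.3],
    [0.3, 0.9, 0.2],
    [0.7, 0.6, 0.8],
])

def net_flows(scores, weights):
    m = len(scores)
    def pi(a, b):  # weighted "usual" preference function (Eqs. 8-9)
        return np.dot(weights, (scores[a] - scores[b] > 0).astype(float)) / weights.sum()
    plus = np.array([sum(pi(a, x) for x in range(m) if x != a) for a in range(m)])
    minus = np.array([sum(pi(x, a) for x in range(m) if x != a) for a in range(m)])
    return plus - minus

for k, crit in enumerate(["C1", "C3", "C7"]):
    w = np.zeros(scores.shape[1])
    w[k] = 1.0  # 100% weight on a single sub-criterion
    order = np.argsort(-net_flows(scores, w))
    print(crit, "->", [names[i] for i in order])
```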

Fig. 7 Sensitivity analysis to varying weights of sub-criteria

The study highlights the perceptions of experts and practitioners of maritime education and training (MET) regarding the evaluation of differing simulator modalities, which carries significant current and future implications.

4.1 Implications in maritime training

Instilling the required level of competence in future seafarers, commensurate with the increasing complexity of the real work environment, is necessary to ensure workplace safety. Maritime institutes often find it challenging to select appropriate tools for the competence development process, especially due to a lack of understanding and available information. Therefore, an evaluation of simulator modalities, together with a ranking of the relevant criteria, is particularly useful for maritime instructors, providing them with greater insight into the capabilities of the available simulator modalities. This study provides a systematic framework for existing practices in how instructors optimize limited training resources to address specific training needs. For example, full-mission simulators are used as highly efficient team-training solutions, while VR simulators are suited to immersive remote training. In contrast, cloud-based simulators serve as cost-effective, user-friendly remote solutions, and desktop-based simulators are suitable for budget-conscious situations where remote access is not a priority.

The in-depth analysis reveals a perceived lack of pedagogic value in cloud-based and desktop-based simulators, despite its recognized importance in educational technologies (Anderson & Dron, 2011; Fowler, 2015). Moreover, experts perceive performance assessment challenges in VR and cloud-based simulators, which, coupled with emerging technologies for seafarer assessment (e.g., eye tracking, accelerometers, heart-rate monitors) (Kim et al., 2021; Mallam et al., 2019), opens new avenues for future research on performance assessment within maritime simulator training.

It is widely recognized that instructors’ knowledge of and familiarity with technology-based tools is essential for the success of technology-based teaching and learning (Ghavifekr & Rosdy, 2015). Trainee familiarity with these tools is similarly important, as it increases engagement and enhances learning outcomes (Chiu, 2021). Therefore, it would be beneficial for instructors and trainees to have a comprehensive understanding of the available simulator modalities, including their strengths and weaknesses. The mapping of the different simulator modalities to the associated criteria and sub-criteria presented in this study would enable more informed decision-making and is expected to facilitate enhanced learning outcomes.

4.2 Industrial and policy implications

The fusion of technology, pedagogy, and content is essential for effectively integrating new technologies into educational contexts (Mishra & Koehler, 2006). Key technological features, such as fidelity, immersivity, and usability, are determined by manufacturers, who also enable remote and team training capabilities. Thus, it is vital for manufacturers to address both technological and instructional aspects of simulators to enhance training outcomes. Additionally, the results of this study highlight that regulatory compliance, pedagogic value, and training efficiency serve as distinguishing factors between traditional full-mission simulators and emerging, less-expensive cloud-based alternatives (see Fig. 6). This presents an opportunity for technology developers to enhance regulatory compliance and improve instructional capabilities of low-cost simulators.

In practice, the adoption of learning technology (e.g., simulators) often lacks a comprehensive long-term strategy, as the procurement process tends to focus on immediate needs rather than future goals (Ringstaff & Kelley, 2002). Such a shortsighted approach can lead to investment decisions that overlook broader educational goals, such as the 13 sub-criteria identified for maritime simulator selection. The selection process for simulator providers usually involves pre-bidding and subsequent post-bidding stages, during which the bidders must meet a set of criteria to advance to the next phase. This evaluation process does not emphasize any single factor but assesses a combination of factors that contribute to the final decision; here, the proposed MCDM approach and the identified sub-criteria could provide a comprehensive and systematic framework for evaluating potential simulator providers. The sensitivity analysis could be particularly useful in contexts where it is necessary to determine how assigning higher importance to a specific sub-criterion might affect the overall simulator selection process.

The results also suggest that VR simulators are perceived as more costly than cloud-based or desktop-based simulators. The reason could be the general perception of the high cost of VR scenario development, customization, and the procurement of professional-grade VR hardware (Su et al., 2020). However, VR training could be a more cost-effective option in the long term than other high-fidelity simulator alternatives, as evidenced in other domains such as healthcare and engineering training (Joshi et al., 2021; Perrenot et al., 2012). In addition, VR simulators would likely be used in situations where the other important criteria (e.g., pedagogic value, training efficiency, fidelity, team training, appropriate methods of training, immersivity, diversity of training scenarios, and remote training) outweigh cost considerations.

5 Conclusion and future directions

This study evaluates the state-of-the-art maritime simulator modalities based on 13 relevant factors. The proposed MCDM framework provides the opportunity to evaluate four available simulator modalities (i.e., full-mission, desktop-based, cloud-based and VR simulators) based on factor importance rankings. A sensitivity analysis revealed that the priority ranking of simulator modality selection could be influenced by the varying weights of the 13 factors. The findings of this study could be beneficial for both the academic and the industrial stakeholders aiming to provide quality education for maritime trainees.

Future research involving criterion-specific analysis of simulators could facilitate the development of hybrid simulator training modules and curricula. Such modules could employ a weighted combination of different simulator technologies for a specific training scenario, with separate simulator modalities complementing each other to address highly contextual training needs; for example, determining the most efficient combination of full-mission and VR simulators for training in a fire emergency scenario.

The study’s framework and criteria weights were developed through a two-step process: criteria were first derived from the literature and then weighted through in-depth expert assessment. Future studies should assess the MCDM framework’s performance in specific organizational settings, while accounting for evolving simulation technologies and incorporating emerging criteria (e.g., a manufacturer’s timely service provisions, ease of data extraction, etc.). It could also be valuable to explore the integration of other approaches, such as the Delphi method, with MCDM methods; this combination could facilitate expert consensus on both the criteria and the alternatives, leading to more robust and well-informed decision-making.