1 Introduction

Non-technical skills (NTSs) are composed of both interpersonal and cognitive elements. These include situational awareness, teamwork, decision making, leadership, management and communication skills (Flin et al. 2003a, 2009). In a crisis situation, good NTSs allow a deck officer to quickly recognise that a problem exists and then harness the resources that are at their disposal to safely and efficiently bring the situation back under control. The ability to identify a deck officer’s level of NTSs could enable maritime educators to assess the effectiveness of their Human Element, Leadership and Management (HELM) training programmes. As a result, the aim of this paper is twofold:

  • To develop a methodology to enable educators to quantitatively assess the impact of Maritime and Coastguard Agency (MCA)-approved HELM training on deck officers’ NTSs with a view to identifying further training requirements.

  • To determine whether the HELM training provided to develop the NTSs of trainee deck officers is fit for purpose.

2 Background

A review of the maritime accident databases of the UK, the USA, Norway and Canada conducted by Barnett et al. (2006) found that human error is the main contributing factor in the majority of maritime accidents. The study illustrated that major maritime accidents are not caused by technical problems but rather by the failure of the crew to respond to situations appropriately. The main findings of the study were that (Barnett et al. 2006; Habberley et al. 2001):

  • Whilst the total number of accidents is declining, human error continues to be the dominant factor in 80 to 85 % of maritime accidents.

  • Human fatigue and task omission are closely related to failures in situation awareness.

    Human error cannot possibly be eliminated altogether, but measures can be taken to reduce it. Analysis in a number of industrial sectors has indicated that up to 80 % of accidents can be attributed to human factors (Gregory and Shanahan 2010; Flin et al. 2008). The Maritime Transportation Research Board, in the USA, also estimated that 80 % of accidents are due to human error (Goulielmos et al. 2012). Many court decisions about maritime accidents have also come to the same conclusion: “80% of all maritime accidents are due to human error” (Havold 2000). The Registro Italiano Navale (1996) suggests that the human factor is the root cause of between 60 and 80 % of serious maritime casualties. These casualties are often preceded by human error in the form of misconduct, omission, lack of competence, under-estimation of hazardous situations, lack of preparedness, lack of communication and responsibility (Goulielmos et al. 2012).

    The concept of non-technical skills originated in the aviation industry when the National Transportation Safety Board of the USA investigated a number of airline accidents in the 1960s and 1970s (Smith 2001; Kern 2001). The first generation of the crew resource management (CRM) courses that were developed to address the causes of these accidents were delivered in 1981. These courses have evolved over time and have now reached their fifth generation (Smith 2001; Kern 2001). The fifth generation of CRM courses accepts that human error is inevitable. With this being the case, CRM courses are now seen as an inclusive part of the range of countermeasures that make up three lines of defence (Kern 2001):

  • The avoidance of error

  • The trapping of incipient errors before they are committed

  • Mitigating the consequences of those errors that occur and are not trapped

    Various other sectors of a safety critical nature, such as nuclear power (Crichton and Flin 2004), anaesthesia (Fletcher et al. 2003a) and surgery (Yule et al. 2006), have followed in the footsteps of the aviation industry. Each of these sectors has made their own efforts towards developing models of those non-technical skills that are deemed relevant within each of their respective domains. Within the shipping industry, the need for the training and assessment of non-technical skills was acknowledged in 2012 when the HELM training was made compulsory.

    The road to this point began in 2010 when a comprehensive ongoing review of the 1978 Standards of Training, Certification and Watchkeeping for Seafarers (STCW) Convention culminated in a Conference of Parties to the STCW Convention being held in Manila. This conference adopted a significant number of amendments to the STCW Convention and the STCW Code. Amongst these amendments was the requirement to introduce mandatory training in resource management, leadership and teamwork at operational level and leadership and managerial skills at management level. These respective packages of training were to be referred to as HELM(O), the operational level course aimed at trainee officer of the watch-level students, and HELM(M), the management level course aimed at qualified officers studying for a chief mate’s licence.

3 Methodology

To achieve the aims of this paper, a three-phase approach was adopted (Fig. 1). In phase 1, a taxonomy for deck officers’ NTSs is established, behavioural markers are identified and the relative importance of each attribute is calculated using the analytic hierarchy process (AHP). In phase 2, a set of scenarios is identified for assessing deck officers’ NTSs in a simulated ship’s bridge environment. Students who have completed the MCA-approved Chief Mate programme are then randomly selected and observed performing their roles as part of a bridge team during these scenarios, and data on their NTS-related performance is collected. Finally, in phase 3, the collected data is fed into the evidential reasoning (ER) algorithm to produce utility values that enable the effectiveness of the HELM training the students have received to be evaluated.

Fig. 1 A three-phase approach to assessing the effectiveness of MCA-approved HELM training (source: own)

3.1 Phase 1, step 1: establish a taxonomy for deck officers’ NTSs

Based on a review of relevant literature (Flin et al. 2003a; International Association of Maritime Universities 2010; MNTB 2012) and input from experienced deck officers, a taxonomy for non-technical skills was developed. Twelve senior officers were asked to provide their input. To qualify for inclusion amongst these 12, the individuals had to have more than 10 years of experience at sea, hold an unlimited Masters licence and currently be studying towards postgraduate qualification. The relevant literature established the boundaries within which the discussion was to take place and the responses of the senior deck officers as they were interviewed were then collated to construct a taxonomy for the issue. The taxonomy that was developed is shown in Table 1.

Table 1 Taxonomy for non-technical skills amongst deck officers

3.2 Phase 1, step 2: identify behavioural markers

Behavioural markers are used for the assessment of those undergoing training in a simulated environment. Such systems were first developed in the aviation industry (Helmreich et al. 1999). Later on, other safety critical sectors such as medicine and nuclear power developed their own behavioural marker systems. Any system of assessment that is based on a behavioural marker framework must, as far as possible, be designed to ensure that it is capable of capturing the fullest context of the environment in which the assessment is taking place (Gatfield 2008).

To determine which skills and behaviours are important for deck officers, interviews were conducted with the same 12 experienced senior deck officers, based in the UK and Sweden, who took part in step 1 and who met the inclusion criteria described there (more than 10 years of experience at sea, an unlimited Masters licence and current study towards a postgraduate qualification). The purpose of this was to determine, from the perspective of experienced senior deck officers, the most significant elements of deck officers’ non-technical skills. Based on the output of these interviews, a series of behavioural markers for assessing the teamwork, leadership, situational awareness and decision making of an officer in a ship bridge simulator were identified. These markers are presented in Tables 2, 3, 4 and 5. There are five levels of performance against each of these behavioural markers, ranging from very good practice to very poor practice. By using these behavioural markers, an assessor is able to rate a student’s performance in a simulated ship’s bridge environment.

Table 2 Behavioural markers of teamwork sub-criteria
Table 3 Behavioural markers of the leadership sub-criteria
Table 4 Behavioural markers of the situational awareness elements
Table 5 Behavioural markers of the decision making elements

3.2.1 Teamwork

The need for people to work together as a team and to achieve objectives which contribute to the overall aims of their organisation has become increasingly important as organisations have grown in size and become more complex (West 2012; Cohen and Bailey 1997). Organisations with team-based structures can respond quickly and effectively in the modern fast-changing environment (Cohen and Bailey 1997). Teamwork is important in many workplace settings but is especially important in higher-risk industries such as aviation, nuclear power, healthcare and maritime.

To be able to achieve its goals, a team must function effectively from the moment it is established. In the case of a ship’s bridge team, team members must have a common understanding of how they will be expected to work together to manoeuvre their ship (Civil Aviation Authority (CAA) 2016). The effective operation of such a team is highly dependent on the team’s ability to perform a range of skills. These include, but are not limited to, communication, coordination, cooperation and control (Stanton 1996). The selected teamwork sub-criteria along with their associated behavioural markers are shown in Table 2.

3.2.2 Leadership and management

Fiedler (1995) defines a team leader as “a person who is appointed, elected, or informally chosen to direct and co-ordinate the work of others in a group”. Leadership is about encouraging team members to work together, assigning them tasks and assessing their performance, developing the knowledge base of the team as a whole, improving team members’ skills and abilities, continuously motivating team members, planning and organising the execution of tasks and establishing a positive team atmosphere (Salas et al. 2004). The selected leadership and management sub-criteria along with their associated behavioural markers are shown in Table 3.

3.2.3 Situational awareness

Endsley (1995) defines situational awareness as “the perception of the elements in the environment within a volume of time and space, the comprehension of their meaning and the projection of their status in the near future”. It has been widely established and accepted that failures in situational awareness are a contributory factor to many accidents and incidents in safety critical industries (Grech et al. 2002). However, situational awareness only began to receive attention in the late 1980s when related research started in the aviation industry (Salmon et al. 2009). The selected situational awareness sub-criteria along with their associated behavioural markers are shown in Table 4.

3.2.4 Decision making

Good decision making is an essential skill for the completion of operations in any high-risk organisation. In aviation, decision making is defined as “The process of reaching a judgement or choosing an option” (Flin et al. 2003b). Although this definition is labelled as aeronautical decision making, it may also be taken as a universal definition for all high-risk industries. Like an aeroplane pilot, a ship’s master also has to make different types of decisions for different situations. The selected decision making sub-criteria along with their associated behavioural markers are shown in Table 5.

3.3 Phase 1, step 3: calculate the relative importance of each attribute using AHP

The behavioural markers identified from interviewing 12 experienced senior deck officers, based in the UK and Sweden, allowed a non-technical skills assessment framework for deck officers to be established (Fig. 2). The next stage of the assessment process is to assign a weight to each criterion in the framework by applying a mathematical decision making method known as analytic hierarchy process (AHP).

Fig. 2 Assessment framework for deck officers’ non-technical skills (source: own)

3.3.1 Analytic hierarchy process

The AHP was pioneered by Saaty (1980). It is a popular method that is widely used in decision making tasks. Although it has been used in military decision making, it is not restricted to military problems (Coyle 2004). It is a multi-criteria decision making (MCDM) method that helps the decision maker deliver an informed decision when dealing with a complex situation (Ishizaka and Labib 2009). To make a comparison between the available alternatives, numbers (i.e. intensity of importance) are assigned by experts. The intensity of importance indicates how many times more important one element is over another element. The scale for comparison is based on Saaty (1990, 2008) and is shown in Table 6. When considering a number of attributes for evaluation, the main objective of the technique is to provide judgements on the relative importance of each of these attributes to each other. Once this is accomplished, it is then necessary to ensure that the judgements are quantified to an extent that permits their quantitative interpretation (Pillay and Wang 2003).

Table 6 Pairwise comparison scale

Riahi et al. (2012) state that the quantified judgements between pairs of attributes (\( A_i \) and \( A_j \)) can be represented by an n-by-n matrix D. The entries \( a_{ij} \) are defined by the following entry rules:

  1. Rule 1:

    If \( a_{ij} = \alpha \), then \( a_{ji} = 1/\alpha \), \( \alpha \ne 0 \).

  2. Rule 2:

    If \( A_i \) is judged to be of equal relative importance as \( A_j \), then \( a_{ij} = a_{ji} = 1 \).

$$ D=\left[\begin{array}{cccc} 1 & a_{12} & \dots & a_{1n} \\ 1/a_{12} & 1 & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1/a_{1n} & 1/a_{2n} & \dots & 1 \end{array}\right] $$
(1)

In this formula, \( i, j = 1, 2, 3, \dots, n \) and each \( a_{ij} \) is the relative importance of attribute \( A_i \) to attribute \( A_j \).
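As an illustrative sketch only, the two entry rules can be applied in Python to build a full comparison matrix from the judgements above the diagonal. The function name `build_comparison_matrix` and the example judgements are hypothetical and are not taken from the study; pairs that are not supplied are assumed here to be equally important.

```python
from fractions import Fraction


def build_comparison_matrix(n, upper_judgements):
    """Build an n-by-n reciprocal matrix from judgements a_ij given for i < j.

    upper_judgements maps 0-based index pairs (i, j), with i < j, to a_ij.
    Pairs that are not supplied are treated as equally important (Rule 2).
    """
    matrix = [[Fraction(1) for _ in range(n)] for _ in range(n)]
    for (i, j), value in upper_judgements.items():
        if not 0 <= i < j < n:
            raise ValueError("expected index pairs with 0 <= i < j < n")
        a_ij = Fraction(value)
        if a_ij <= 0:
            raise ValueError("judgements must be positive")
        matrix[i][j] = a_ij        # Rule 1: a_ij = alpha
        matrix[j][i] = 1 / a_ij    # Rule 1: a_ji = 1 / alpha
    return matrix


# Hypothetical judgements for three attributes (illustrative values only)
D = build_comparison_matrix(3, {(0, 1): 2, (0, 2): 4, (1, 2): 2})
for row in D:
    print([str(entry) for entry in row])
```

Exact fractions are used so that the reciprocal property \( a_{ij} a_{ji} = 1 \) holds without rounding error.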

Having recorded the quantified judgements on each pair (\( A_i \), \( A_j \)) as the numerical entry \( a_{ij} \) in the matrix D, what remains is to assign to the n attributes (\( A_1, A_2, \dots, A_n \)) a set of numerical weights (\( w_1, w_2, \dots, w_n \)) that reflects the recorded judgements. The weights can generally be calculated using the following equation, where \( a_{ij} \) represents the entry in row i and column j of a comparison matrix of order n:

$$ w_k=\frac{1}{n}\sum_{j=1}^{n}\frac{a_{kj}}{\sum_{i=1}^{n}a_{ij}}\kern0.5em \left(k=1,2,3,\dots, n\right) $$
(2)
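The following is a minimal sketch of Eq. (2) in plain Python: each column of the comparison matrix is normalised by its sum and the normalised entries are then averaged across each row. The function name `ahp_weights` and the matrix values are assumptions made for illustration and do not come from the expert data.

```python
def ahp_weights(matrix):
    """Approximate AHP weights following Eq. (2)."""
    n = len(matrix)
    # Sum of each column j over all rows i
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    # Average of the column-normalised entries in each row k
    return [
        sum(matrix[k][j] / column_sums[j] for j in range(n)) / n
        for k in range(n)
    ]


# Hypothetical, perfectly consistent 3-by-3 comparison matrix
D = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(D)
print([round(w, 4) for w in weights])
```

Because this example matrix is perfectly consistent, the weights are proportional to any one of its columns (4 : 2 : 1) and sum to one.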

The weight vector of the comparison matrix provides the priority order, but it cannot confirm the consistency of the pairwise judgements. The AHP provides a measure of the consistency of the pairwise comparisons by computing a consistency ratio (CR) (Aull-Hyde et al. 2006). The CR is devised in such a way that a value of less than 0.10 indicates that the pairwise judgements are consistent. A decision maker should review the pairwise judgements if the resultant value is more than 0.10. The CR value is calculated according to the following equations, where CI is the consistency index, RI is the average random index (Table 7), n is the matrix order and \( \lambda_{\max} \) is the maximum eigenvalue of the n-by-n comparison matrix D:

Table 7 Value of RI versus matrix order (Endsley 1995)
$$ CR=\frac{CI}{RI} $$
(3)
$$ CI=\frac{\lambda_{\max }-n}{n-1} $$
(4)
$$ {\lambda}_{\max }=\frac{{\displaystyle {\sum}_{j=1}^n}\left[\left({\displaystyle {\sum}_{k=1}^n}{w}_k{a}_{jk}\right)/{w}_j\right]}{n} $$
(5)
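A short sketch of Eqs. (3) to (5) is given below. Table 7 is not reproduced in this text, so the `RANDOM_INDEX` values are the figures commonly attributed to Saaty and should be treated as an assumption; the function names and both example matrices are likewise hypothetical.

```python
# Random index values commonly attributed to Saaty (assumed; Table 7 is not
# reproduced in this text, so check these against the table before use).
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def ahp_weights(matrix):
    """Weights from Eq. (2): normalise columns, then average rows."""
    n = len(matrix)
    column_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[k][j] / column_sums[j] for j in range(n)) / n
            for k in range(n)]


def lambda_max(matrix, weights):
    """Estimate of the maximum eigenvalue following Eq. (5)."""
    n = len(matrix)
    return sum(
        sum(weights[k] * matrix[j][k] for k in range(n)) / weights[j]
        for j in range(n)
    ) / n


def consistency_ratio(matrix):
    """CR = CI / RI, Eqs. (3) and (4); matrices of order 1 or 2 return 0."""
    n = len(matrix)
    if n < 3:
        return 0.0
    weights = ahp_weights(matrix)
    ci = (lambda_max(matrix, weights) - n) / (n - 1)
    return ci / RANDOM_INDEX[n]


# Hypothetical matrices: one perfectly consistent, one deliberately circular
consistent = [[1.0, 2.0, 4.0], [0.5, 1.0, 2.0], [0.25, 0.5, 1.0]]
circular = [[1.0, 3.0, 1 / 3], [1 / 3, 1.0, 3.0], [3.0, 1 / 3, 1.0]]
cr_consistent = consistency_ratio(consistent)
cr_circular = consistency_ratio(circular)
print(round(cr_consistent, 4), round(cr_circular, 4))
```

The circular matrix encodes the contradictory preferences A over B, B over C and C over A, so its CR falls well above the 0.10 threshold and such a set of judgements would be sent back for review.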

3.3.2 Geometric mean method

Expert judgements were aggregated using the geometric mean method. The formula below [28] demonstrates this method. In this formula, \( e_{kij} \) is the kth expert’s judgement on the pair of attributes \( A_i \) and \( A_j \), and the product is taken over the judgements of all k experts:

$$ \mathrm{Geometric\ mean}_{ij}={\left[{e}_{1ij}\cdot {e}_{2ij}\cdot {e}_{3ij}\cdots {e}_{kij}\right]}^{\frac{1}{k}} $$
(6)
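As a hedged illustration of Eq. (6), the sketch below aggregates the comparison matrices of several experts entry by entry. The function name `aggregate_judgements` and the two expert matrices are hypothetical and are not taken from the study data.

```python
import math


def aggregate_judgements(expert_matrices):
    """Entry-wise geometric mean of k expert comparison matrices, Eq. (6)."""
    k = len(expert_matrices)
    n = len(expert_matrices[0])
    return [
        [math.prod(m[i][j] for m in expert_matrices) ** (1.0 / k)
         for j in range(n)]
        for i in range(n)
    ]


# Two hypothetical experts comparing two attributes
expert_1 = [[1.0, 2.0], [0.5, 1.0]]
expert_2 = [[1.0, 8.0], [0.125, 1.0]]
aggregated = aggregate_judgements([expert_1, expert_2])
print(aggregated)
```

One reason the geometric mean suits this task is that it preserves the reciprocal rule: if every expert matrix satisfies \( a_{ji} = 1/a_{ij} \), so does the aggregated matrix.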

3.4 Phase 2, step 4: identify a set of scenarios for the assessment of deck officers’ NTSs

It was necessary to produce scenarios that would give participants the opportunity to demonstrate the recognised elements (teamwork, situational awareness, leadership and decision making) of their NTSs in a ship’s bridge simulator. With this as the aim, the following three scenarios were generated:

Scenario 1

The team’s vessel (own vessel) is alongside a jetty in Southampton. Sufficient time will be given prior to the commencement of the exercise for those involved to be able to check all the equipment and familiarise themselves with the pre-prepared passage plan. The bridge equipment needs to be tested and the relevant checklists need to be completed prior to the exercise commencing. The team will pilot their own vessel and maintain all the records as required. Each team will independently manoeuvre their own vessel without the use of tug boats. However, the bow thruster can be used if the team feels that it is necessary. During the exercise, the team’s own vessel will encounter a number of other vessels that are either inbound to or outbound from Southampton. There will be a grounded vessel in the vicinity of the Nab Tower with a salvage operation underway that requests a wide berth. Whilst passing the Fawley Terminal, gyro number 1 will begin to drift at a rate of 1° per s. Based on the position of the vessels at the time of passing, there will also be the possibility of interaction with a large inbound containership.

Scenario 2

This exercise is set in the approaches to the Bosphorus, Turkey. Sufficient time will be given prior to the commencement of the exercise for those involved to be able to check all the equipment and familiarise themselves with the pre-prepared passage plan. The Master begins the exercise in the debriefing room and is available to be called to the bridge as required. The bridge equipment needs to be tested, and the relevant checklists need to be completed prior to the exercise commencing. The vessel is to proceed to an anchorage to begin bunkering operations. There will be a number of vessels in the area concerned. These will range from vessels at anchor, vessels approaching the team’s vessel from different directions, overtaking and numerous ferries crossing. A strong tide is present that makes it more difficult to steer. As time progresses, the vessel will go on to proceed through the Bosphorus towards the Black Sea. At this stage, there will be a number of south-bound vessels, strong cross currents and ferry operations. All of these will require strict adherence to the passage plan as well as manoeuvring to avoid collision with other ships.

Scenario 3

This exercise starts at the handover of a watch. Sufficient time will be given prior to the commencement of the exercise for those involved to be able to check all the equipment and familiarise themselves with the pre-prepared passage plan. The bridge equipment needs to be tested and the relevant checklists need to be completed prior to the exercise commencing. The Third Officer (3/O) commences the exercise as the Officer of the Watch (OOW). The instructor will act as the lookout and is available on a walkie-talkie/telephone. An individual from each of the three bridges will be playing the role of the Second Officer and will proceed to the bridge to take over the watch. The handover will take place using the appropriate procedure and checklists. When the watch has been handed over, the relieved OOW will return to the debrief room and will take up the role of Chief Mate (CM). Bridge 1 will be situated in such a way that in the initial 20 min of running the exercise, it will be just within visual range of a target showing a strobe light (normally fitted on a life raft). This target will have a very poor radar return and will only be detected if the radar controls are set appropriately. Another target to the north (a vessel in distress) will provide a weak radar return but will not be within visual range. Bridges 2 and 3 will be within very high frequency (VHF) range of bridge 1. The scenario is set in Global Maritime Distress and Safety System (GMDSS) sea area A2 (medium frequency range). Each bridge will have a Master and CM available to assist the OOW if requested. The exercise will progress in the anticipation that the OOW on bridge 1 will identify the life raft, summon the Master and instigate a search and rescue plan. In the event that the OOW does not take the appropriate action, the virtual lookout (the instructor) will call the bridge and report the sighting. The exercise will then be conducted in line with the delegates’ responses. One of the bridges will be tasked as the On Scene Commander (OSC), and there will be minimal intervention from the Maritime Rescue Coordination Centre (MRCC). There will be other vessels in the area. These consist of the following:

  • A warship to the west that is in medium frequency (MF) range and has an operational helicopter.

  • A fishing vessel to the north that will offer assistance and has the benefit of a low freeboard. Her position is such that the Emergency Position Indicating Radio Beacon (EPIRB) position of the casualty can be reached if she is utilised immediately.

3.5 Phase 2, step 5: gather data on the NTS performance of officers in the scenarios

Participants were randomly selected from a cohort of students that had completed the CM training programme. They were then divided into two groups: the first had not yet received non-technical skills training, whereas the second had received it through the MCA-approved HELM course. Within these groups, participants were further divided into sub-groups that made up the bridge teams. Based on the bridge simulator scenarios detailed in step 4, the qualitative characteristics (teamwork, leadership and management, situational awareness, decision making) of each participant were assessed using the identified behavioural markers (Tables 2, 3, 4 and 5). Data was gathered on their performance in each of the scenarios and then aggregated using the ER algorithm.

3.6 Phase 3, step 6: feed the gathered data into the ER algorithm

The theory of evidence was first generated by Dempster (1967) and further developed by Shafer (1976). After the further development, the theory came to be referred to as the Dempster-Shafer (D-S) theory of evidence. The D-S theory was originally used as an approximation tool for information aggregation in expert systems. Subsequently, it also came to be used in decision making under uncertainty. Through the ongoing evolution of the theory of evidence the ER algorithm was developed. The ER algorithm can be explained as follows (Riahi et al. 2012):

Let R represent a set with five linguistic terms (very poor, poor, average, good and very good). The associated belief degrees (β) can be synthesised by two subsets \( R_1 \) and \( R_2 \) from two different assessments. Then, for example, R, \( R_1 \) and \( R_2 \) can separately be expressed by

$$ \begin{array}{c}\hfill R=\left\{{\beta}^1\ \mathrm{very}\ \mathrm{poor},{\beta}^2\ \mathrm{poor},{\beta}^3\ \mathrm{average},{\beta}^4\ \mathrm{good},{\beta}^5\ \mathrm{very}\ \mathrm{good}\right\}\hfill \\ {}\hfill {R}_1=\left\{{\beta}_1^1\ \mathrm{very}\ \mathrm{poor},{\beta}_1^2\ \mathrm{poor},{\beta}_1^3\ \mathrm{average},{\beta}_1^4\ \mathrm{good},{\beta}_1^5\ \mathrm{very}\ \mathrm{good}\right\}\hfill \\ {}\hfill {R}_2=\left\{{\beta}_2^1\ \mathrm{very}\ \mathrm{poor},{\beta}_2^2\ \mathrm{poor},{\beta}_2^3\ \mathrm{average},{\beta}_2^4\ \mathrm{good},{\beta}_2^5\ \mathrm{very}\ \mathrm{good}\right\}\hfill \end{array} $$
(7)

Suppose that the normalised relative weights of the two assessments in the evaluation process are given as \( w_1 \) and \( w_2 \) (\( w_1 + w_2 = 1 \)); these can be estimated by using the AHP. Suppose that \( {M}_1^m \) and \( {M}_2^m \) (m = 1, 2, 3, 4 or 5) are the individual degrees to which the subsets \( R_1 \) and \( R_2 \) support the hypothesis that the evaluation is confirmed to the five linguistic terms. Then, \( {M}_1^m \) and \( {M}_2^m \) are obtained as

$$ \begin{array}{c}\hfill {M}_1^m={w}_1{\beta}_1^m\hfill \\ {}\hfill {M}_2^m={w}_2{\beta}_2^m\hfill \end{array} $$
(8)

Suppose that \( H_1 \) and \( H_2 \) are the individual remaining belief values unassigned for \( {M}_1^m \) and \( {M}_2^m \) (m = 1, 2, 3, 4 or 5). Then, \( H_1 \) and \( H_2 \) are expressed as

$$ \begin{array}{c}\hfill {H}_1={\overline{H}}_1+{\tilde{H}}_1\hfill \\ {}\hfill {H}_2={\overline{H}}_2+{\tilde{H}}_2\hfill \end{array} $$
(9)

where \( {\overline{H}}_n \) (n = 1 or 2) represents the degree to which the other assessors can play a role in the assessment and \( {\tilde{H}}_n \) (n = 1 or 2) is caused by the possible incompleteness in the subsets \( R_1 \) and \( R_2 \). \( {\overline{H}}_n \) and \( {\tilde{H}}_n \) are described as

$$ \begin{array}{c}\hfill {\overline{H}}_1=1-{w}_1={w}_2\hfill \\ {}\hfill {\overline{H}}_2=1-{w}_2={w}_1\hfill \\ {}\hfill {\tilde{H}}_1={w}_1\left(1-{\displaystyle \sum_{m=1}^5}{\beta}_1^m\right)\hfill \\ {}\hfill {\tilde{H}}_2={w}_2\left(1-{\displaystyle \sum_{m=1}^5}{\beta}_2^m\right)\hfill \end{array} $$
(10)

Suppose that \( {\beta}^{m^{\prime }} \) (m = 1, 2, 3, 4 or 5) represents the non-normalised degree to which the reliability evaluation is confirmed to each of the five linguistic terms as a result of the synthesis of the judgements produced by assessors 1 and 2. Suppose that \( {H}_U^{\prime } \) represents the non-normalised remaining belief unassigned after the commitment of belief to the five linguistic terms because of the synthesis of the judgements produced by assessors 1 and 2, made up of the components \( {{\overline{H}}^{\prime}}_U \) and \( {{\tilde{H}}^{\prime}}_U \). The evidential reasoning algorithm is stated as

$$ \begin{array}{c}\hfill {\beta}^{m^{\prime }}=K\left({M}_1^m{M}_2^m+{M}_1^m{H}_2+{M}_2^m{H}_1\right)\hfill \\ {}\hfill {{\overline{H}}^{\prime}}_U=K\left({\overline{H}}_1{\overline{H}}_2\right)\hfill \\ {}\hfill {{\tilde{H}}^{\prime}}_U=K\left({\tilde{H}}_1{\tilde{H}}_2+{\tilde{H}}_1{\overline{H}}_2+{\tilde{H}}_2{\overline{H}}_1\right)\hfill \\ {}\hfill K = {\left(1-{\displaystyle \sum_{T=1}^5}{\displaystyle \sum_{\begin{array}{c}\hfill R=1\hfill \\ {}\hfill R\ne T\hfill \end{array}}^5}{M}_1^T{M}_2^R\right)}^{-1}\hfill \end{array} $$
(11)

After aggregation, the combined degrees of belief are generated by assigning values of \( {{\overline{H}}^{\prime}}_U \) back to the five linguistic terms using the following normalisation process, in which \( H_U \) is the unassigned degree of belief representing the extent of incompleteness in the overall assessment:

$$ \begin{array}{c}\hfill {\beta}^m=\frac{\beta^{m\hbox{'}}}{1-{{\overline{H}}^{\prime}}_{\mathrm{U}}}\kern2em \left(m=1,\kern0.5em 2,\kern0.5em 3,\kern0.5em 4\ \mathrm{or}\kern0.5em 5\right)\hfill \\ {}\hfill {H}_{\mathrm{U}}=\frac{{{\tilde{H}}^{\prime}}_{\mathrm{U}}}{1-{{\overline{H}}^{\prime}}_{\mathrm{U}}}\hfill \end{array} $$
(12)

The above gives the process of combining two subsets. If three subsets are required to be combined, the result obtained from the combination of any two subsets can be further synthesised with the third subset using the above algorithm. In a similar way, the judgements of multiple assessors of lower-level criteria in the chain system (i.e. components or subsystems) can be combined.
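The two-subset combination in Eqs. (8) to (12) can be sketched in Python as follows. This is an illustrative reading of the equations rather than the implementation used in the study: the function name `er_combine`, the belief degrees and the equal weights are all hypothetical, and only the two-assessment case is covered.

```python
def er_combine(beta1, beta2, w1, w2):
    """Combine two belief distributions over the same grades, Eqs. (8)-(12).

    beta1, beta2: belief degrees per linguistic term (each may sum to < 1).
    w1, w2: normalised weights of the two assessments (w1 + w2 = 1).
    Returns the combined belief degrees and the unassigned belief H_U.
    """
    if len(beta1) != len(beta2):
        raise ValueError("both assessments must use the same grades")
    # Eq. (8): basic probability masses
    m1 = [w1 * b for b in beta1]
    m2 = [w2 * b for b in beta2]
    # Eqs. (9) and (10): remaining belief split into its two components
    h1_bar, h2_bar = 1.0 - w1, 1.0 - w2
    h1_tilde = w1 * (1.0 - sum(beta1))
    h2_tilde = w2 * (1.0 - sum(beta2))
    h1, h2 = h1_bar + h1_tilde, h2_bar + h2_tilde
    # Eq. (11): normalising factor K removes the conflicting mass
    conflict = sum(m1[t] * m2[r]
                   for t in range(len(m1))
                   for r in range(len(m2)) if r != t)
    k = 1.0 / (1.0 - conflict)
    beta_raw = [k * (a * b + a * h2 + b * h1) for a, b in zip(m1, m2)]
    h_bar_u = k * h1_bar * h2_bar
    h_tilde_u = k * (h1_tilde * h2_tilde + h1_tilde * h2_bar
                     + h2_tilde * h1_bar)
    # Eq. (12): reassign the weight-related remainder to the grades
    beta = [b / (1.0 - h_bar_u) for b in beta_raw]
    h_u = h_tilde_u / (1.0 - h_bar_u)
    return beta, h_u


# Hypothetical assessments over (very poor, poor, average, good, very good);
# the first is incomplete because its belief degrees sum to 0.9.
combined, unassigned = er_combine(
    [0.0, 0.0, 0.2, 0.7, 0.0], [0.0, 0.1, 0.3, 0.6, 0.0], 0.5, 0.5
)
print([round(b, 4) for b in combined], round(unassigned, 4))
```

A useful sanity check on any such implementation is that the combined belief degrees and the unassigned belief together sum to one, and that two identical, complete assessments return the same distribution unchanged.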

3.7 Phase 3, step 7: produce utility values

The main aim of using a utility approach is to obtain a single crisp number for the final output of each assessment so that the results can be ranked. Let the utility of an evaluation grade \( H_n \) be denoted by \( u(H_n) \), with \( u(H_{n+1}) > u(H_n) \) if \( H_{n+1} \) is preferred to \( H_n \); \( u(H_n) \) can be estimated using the decision maker’s preferences. If no preference information is available, it can be assumed that the utilities of the evaluation grades are equidistantly distributed in a normalised utility space, in which case they are calculated as follows (Salmon et al. 2009):

$$ u\left({H}_n\right)=\frac{V_n-{V}_{\min }}{V_{\max }-{V}_{\min }} $$
(13)

where \( V_n \) is the ranking value of the linguistic term \( H_n \) that has been considered, \( V_{\max} \) is the ranking value of the most-preferred linguistic term \( H_N \) and \( V_{\min} \) is the ranking value of the least-preferred linguistic term \( H_1 \).

The utility of the top level or general criterion S(E) is denoted by u(S(E)). If \( \beta_H \ne 0 \) (i.e. the assessment is incomplete, \( {\beta}_H=1-{\sum}_{n=1}^N{\beta}_n \)), there is a belief interval \( [\beta_n, (\beta_n + \beta_H)] \), which provides the likelihood that S(E) is assessed to \( H_n \). Without loss of generality, suppose that the utility of the least-preferred linguistic term, which is the lowest, is denoted by \( u(H_1) \) and that of the most-preferred linguistic term, which is the highest, is denoted by \( u(H_N) \). Then, the minimum, maximum and average utilities are defined as follows (Riahi et al. 2012):

$$ \begin{array}{c} {u}_{\min}\left(S(E)\right)=\sum_{n=2}^{N}{\beta}_n u\left({H}_n\right)+\left({\beta}_1+{\beta}_H\right)u\left({H}_1\right) \\ {u}_{\max}\left(S(E)\right)=\sum_{n=1}^{N-1}{\beta}_n u\left({H}_n\right)+\left({\beta}_N+{\beta}_H\right)u\left({H}_N\right) \\ {u}_{\mathrm{average}}\left(S(E)\right)=\frac{u_{\min}\left(S(E)\right)+{u}_{\max}\left(S(E)\right)}{2} \end{array} $$
(14)

If all the assessments are complete, then \( \beta_H = 0 \) and the maximum, minimum and average utilities of S(E) will be the same. Therefore, u(S(E)) can be calculated as

$$ u\left(S(E)\right)={\displaystyle \sum_{n=1}^N{\beta}_nu\left({H}_n\right)} $$
(15)

It is perhaps worth mentioning that the above utilities are used only for characterising an assessment and not for criteria aggregation (Riahi et al. 2012).
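To make the utility step concrete, the sketch below evaluates Eqs. (13) to (15) for five grades. It assumes ranking values of 1 to N for the grades ordered from least to most preferred; the function names and the belief degrees are hypothetical and are not results from the study.

```python
def grade_utilities(n_grades):
    """Equidistant utilities from Eq. (13), assuming ranking values 1..N."""
    v_min, v_max = 1, n_grades
    return [(v - v_min) / (v_max - v_min) for v in range(1, n_grades + 1)]


def utility_bounds(beta, utilities):
    """Minimum, maximum and average utility following Eq. (14).

    beta is ordered from the least-preferred to the most-preferred grade.
    Any unassigned belief is given to the worst grade for the minimum and
    to the best grade for the maximum.
    """
    beta_h = 1.0 - sum(beta)
    u_min = (sum(b * u for b, u in zip(beta[1:], utilities[1:]))
             + (beta[0] + beta_h) * utilities[0])
    u_max = (sum(b * u for b, u in zip(beta[:-1], utilities[:-1]))
             + (beta[-1] + beta_h) * utilities[-1])
    return u_min, u_max, (u_min + u_max) / 2.0


utilities = grade_utilities(5)

# Hypothetical incomplete assessment: belief degrees sum to 0.9
u_min, u_max, u_avg = utility_bounds([0.0, 0.0, 0.2, 0.7, 0.0], utilities)

# Hypothetical complete assessment: the three values coincide, as in Eq. (15)
c_min, c_max, c_avg = utility_bounds([0.0, 0.0, 0.2, 0.8, 0.0], utilities)
print(round(u_min, 3), round(u_max, 3), round(u_avg, 3), round(c_avg, 3))
```

With the incomplete assessment, the gap between the minimum and maximum utility equals the unassigned belief, since the best and worst grades have utilities of one and zero respectively.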

4 Results

A three-phase approach was adopted to address the issue of whether the existing MCA-approved HELM training is fit for purpose. Each phase delivered its own set of results that were fed into the subsequent phase. The results generated in each of these phases were as follows:

4.1 Phase 1: analytic hierarchy process

Data was collected from 12 experienced management level seafarers. To qualify for inclusion amongst these 12, the individuals had to have more than 10 years of experience at sea, hold an unlimited Masters licence and currently be studying towards a postgraduate qualification. The results obtained from four of these experts were considered to be inconsistent (CR was greater than 0.1), so they were not included in the subsequent calculations. The consistent results from the remaining eight experts were fully utilised in the AHP. The resultant weights of the criteria and sub-criteria can be seen in Table 8.

Table 8 Weights of the criteria and sub-criteria

4.2 Phase 2: scenarios

Based on the scenarios that were developed and the established behavioural marker assessment framework, observations were made of the performance of those involved in each scenario. After conducting extensive simulator observations, enough data was collected to be fed into the ER algorithm.

4.3 Phase 3: evidential reasoning

By applying the evidential reasoning algorithm and the utility approach, the data collected from observations made during the scenarios was used to compare the average performance of groups with the HELM training against the average performance of groups without it. To do this, utility values were calculated for each group and the mean utility value of the groups with HELM training was compared against that of the groups without. Table 9 shows the utility values of the groups with the HELM training and those without. Table 10 shows the rankwise sequencing of all the groups. Table 11 compares the average utility value of those groups with HELM training against those groups without.

Table 9 Utility values of the groups with HELM training and those without
Table 10 Rankwise sequencing of all the groups
Table 11 A comparison of the average utility value of the groups with HELM training against those without

Table 10 shows that there is no relationship between the position of a group within the order of ranking and whether that group has received HELM training. This is best exemplified by group number 6, a group which has not received HELM training, being ranked as the group with the greatest utility value.

Table 11 shows that the average utility value of groups with the HELM training is only 0.8 % greater than the average utility value of groups without the HELM training. It was evident during the observations that the students with the HELM training did not apply the non-technical skills that were taught during the course. Generally, students were found to be especially weak in situational awareness and decision making. Lack of anticipation resulted in poor decisions. In some instances, task delegation was not clear and this resulted in task omissions.

5 Conclusions

This paper had two aims. The first was to develop a methodology to enable educators to quantitatively assess the impact of MCA-approved HELM training on deck officers’ NTSs with a view to identifying further training requirements. To achieve this aim, a three-phase approach was adopted in which a taxonomy for deck officers’ NTSs was first established and each attribute was then allocated a weight through the AHP. Subsequently, a set of scenarios was identified for the assessment of deck officers’ NTSs in a simulated ship’s bridge environment. Finally, the data collected through these scenarios were fed into the ER algorithm and the utility values were produced. In this regard, a methodology has been devised and tested to demonstrate its ability to assess the NTSs of individuals operating as part of a bridge team in a simulated bridge environment. As a result, the first aim can be considered to have been met.

The second aim of this paper was to establish whether the HELM training provided to deck officers is fit for purpose. The results derived from the methodology established in this paper show that the average utility value of the groups with HELM training is higher than that of the groups without. However, the difference is only 0.8 %, a margin so small that it would be misleading to state that HELM training clearly benefits all those who participate in it. This suggests that the current HELM training course is an ineffective method for improving deck officers’ NTSs, although a larger data set would be required to reach a conclusive result.

The conclusion that the present HELM course is ineffective is based on simulator scenarios performed by CM students only. It is possible that this may not be a representative population to draw such a conclusion from.