1 Introduction

Safety of life at sea is one of the most important concerns for stakeholders within the maritime domain. Ship owners and operators, ship designers and builders, classification societies, and port state and flag state regulators all have an interest in safety. There currently exist a wide variety of measures to ensure safety within the maritime domain. In efforts to further enhance safety, stakeholders often turn to measuring risk (an inverse of safety) as a means of identifying further opportunities for improvement to avoid rare events or even common accidents. Without getting into the semantic differences about the definitions of risk just yet, it should be noted that there are a wide variety of approaches to conducting risk assessment within the maritime domain (for a review of some, see Guedes Soares and Teixeira 2001). Covello and Merkhofer (1993) describe risk analysis as consisting of three stages (hazard identification, risk assessment, and risk evaluation), which provide information for consideration in risk management. There is a strong history of risk analysis within the maritime domain, including the Formal Safety Assessment process (International Maritime Organization 2002), which was introduced by the International Maritime Organization to inform its standards development activities (see e.g., Ventikos and Psaraftis 2004; Ellis et al. 2008; Vanem et al. 2008; Hu et al. 2007; Wang and Foinikis 2001). This study focuses on building a model suitable for empirical analysis and hypothesis testing.

1.1 Causal chain framework

The use of a causal chain framework is prominent in the maritime domain (e.g., Antão and Guedes Soares 2008; Harrald et al. 1998; Baisuck and Wallace 1979) as well as in many other domains (e.g., see Bick et al. 1979; Fischhoff et al. 1981 for an example of causal chain use in highway transportation). Even though the use of a causal chain framework is extremely instructive in understanding system risks, some difficulties have been associated with that particular form of risk analysis. For example, Wagenaar and Reason (1990) suggested a concern that focus should be applied to matters of higher order than paths of accidental events, Wagner (1999) observed that causality may not be appropriate for second-order or non-linear interactions among variables, Rasmussen et al. (1990) noted certain procedural difficulties for capturing human error, and Russell (1919) noted certain linguistic obstacles involving the notion of causality. Despite these points for consideration, there remains considerable historical support for using the causal chain approach (see e.g., Wold 1965, 1954; Strotz and Wold 1960; Strotz 1960; Wold 1960 which focus on econometric models). While all methodologies have pros and cons, as well as supporters and detractors, it is not the intent of the study to suggest that any one method is superior to another. Rather, this study is intended to illustrate a theory-building frame as an alternative means of risk analysis.

1.2 Other frameworks for conducting risk analyses

The study of risk includes the examination of risk assessment, risk analysis, risk-informed decision making, risk management, risk perception and tolerance, risk communication, and all of the other interesting explorations which treat the concept of risk as an end or a means in advancing our understanding of the world. Regardless of how we define this area of inquiry, the modern study of risk is in its relative infancy. Often, as disciplines or areas of inquiry develop and mature, and when new phenomena are discovered less frequently, researchers turn their attention to better understanding the known findings. This sort of lifecycle could be viewed as typical; the field of social psychology is one example where such development has occurred. Spencer et al. (2005) noted that, during a similar inflection point in the development of the social psychology field, researchers began to overemphasize one particular model and methodology based upon the seminal work of Baron and Kenny (1986). While it is probably premature to suggest that one model or methodology has emerged as dominant in the study of risk, and it is definitely premature to consider the study of risk a well-formed discipline, the use of accident causal chains as a framework for risk study has become widely accepted in the maritime domain (see e.g., Grabowski et al. 2007a, b; van Dorp et al. 2001). While the use of accident causal chains in the maritime domain appears to emanate from other disciplines, Baisuck and Wallace (1979) proposed a causal chain framework as a means of developing public maritime policy strategies without conducting time-consuming descriptive analyses. Interestingly, while Spencer et al. (2005) call for establishing causal chain analysis as a needed companion to the existing mediational analyses, this paper puts forth a theoretical moderation-of-process model built in the tradition of the social sciences (Dubin 1978) to complement the causal chain models already existing within the maritime domain. To put this taxonomy of models in context, Fig. 1 provides a description of recommended experimental designs adapted from Spencer et al. (2005).

Fig. 1 Recommended experimental designs based upon ease of measurement and ease of manipulation of the system under study

This taxonomy shows various combinations of relative ease of system manipulation (in columns) and relative data availability (in rows) and suggests various types of experimental design. Causal chain analyses are conceptually intuitive and informative; the prevalence of causal chain analyses in the maritime domain is explained, in part, by this figure. From a practical perspective, data for analyses are typically unavailable, but once the analysis has been completed (often using data from expert judgment), adjustments to the system are relatively easy to make and the results relatively easy to observe. Alternatively, in cases where data are available and the system is relatively easy to manipulate, the moderation-of-process design is appropriate. The remainder of this paper is devoted to developing an analytical framework for a moderation design using the eight-step theory-building methodology proposed by Dubin, as described in Fig. 2. This theory-building process has been deployed widely. Examples include scenario planning (Chermack 2005), entrepreneurship (Ardichvili et al. 2003), and software design (Sjoberg et al. 2008).

Fig. 2 Dubin's eight-step theory-building research methodology

The purpose of this paper will be to put forth a theory-building process as a means of assigning relative risk to vessels, and ultimately as a means of strategically allocating resources for ship inspections and other safety risks to the highest risk vessels along the lines proposed by Degré (2007) and others. The model developed was empirically tested using data from the US commercial small passenger vessel fleet. This fleet is made up of approximately 6,000 vessels and is considered to be a sufficiently large, unique population to demonstrate the efficacy of the process and model.

2 Analytic framework

In the social science tradition of theory building (Dubin 1978), this study developed a model, or theory, to identify relevant risk factors in order to indicate high-risk states that may necessitate additional action to reduce risk and thereby improve or maintain safety. Ideally, theory-building principles are universal and transcend disparate paradigms of thought and research (Gioia and Pitre 1990), but Dubin's theory-building methodology clearly falls within the functionalist paradigm (Holton and Lowe 2007) and, as such, treats the nature of a phenomenon as basically objective, awaiting impartial exploration and discovery. This paradigm supports a deductive approach to theory building that specifies hypotheses to be tested in the world using statistical analyses. While the functionalist approach to risk analysis is common in the maritime domain, Dubin's theory-building process is not prevalent and presents an opportunity for advancement.

There are many methods for risk assessment. Ayyub (2001) provides a taxonomy of nearly 100 methods organized by type as well as by release assessment, exposure assessment, consequence assessment, and risk estimation. While the first three types are similar to Merkhofer's (1986) risk chain, the risk estimation category provides considerable potential utility for this study as it does not segment into particular elements of the risk triplet (Kaplan 1997), but rather provides a means of capturing all facets of risk. In particular, the risk estimation taxonomy contains relative risk models, of which the model proposed in this paper will be yet another.

Using an expert elicitation and aggregation process, risk factors were identified. Then, the interactions between and among these unit concepts/constructs were established. System boundaries and states were described. Finally, through another expert elicitation process using a paired comparison method, empirical indicators were created for each of the risk factors to define the system and allow empirical evaluation of the model.

2.1 Risk factors

Risk factors serve as the building-block unit concepts/constructs for the development of this risk analysis model. Ultimately, the selection of which risk factors to consider is arbitrary and at the discretion of the researcher. The identification of risk factors, or units, is often a reflection of the current “disciplinary” state. As Dubin (1978, p. 80) noted:

“It might even be suggested that the stage of development of a discipline is revealed by the emphasis on the ways in which new units are advanced in the field. If extension is the method primarily employed, then the discipline is probably relatively new, and analytical attention is still directed at filling out the collection of analytic units employed. If the discipline is well established, then analytic attention may turn to filling in the analytic units employed, in which case subdivision will be employed.”

Whereas many of the models being developed for considering risk present new risk factors for consideration (see e.g., Trucco et al. 2008), this model does not introduce new risk factors, but attempts to further refine or define some of the most prominent ones. Considering the field of risk analysis, this refinement of existing risk factors is somewhat out-of-step with the disciplinary trajectory previously described. Thus, while the utility of this model will necessarily be restricted by the choice and definition of its risk factors, the conceptual frame of theory building presented will provide broad value in specifying a context for consideration, discussion, and development of future models.

2.1.1 Using risk factors as a leading indicator of accidents

Recently, much work has been focused on developing leading indicators for the maritime domain (e.g., Grabowski et al. 2007a, b). Direct leading indicators, those that are observable and can be identified as a direct cause of an incident, can be used directly in a causal chain format. However, direct leading indicators are extremely difficult to ascertain. Therefore, indirect indicators (proxies or surrogates that are associated with an incident) are identified to predict accidents. The key to identifying these indirect indicators is to establish a strong linkage between the risk factor and the resultant undesirable event. Risk factors can be viewed as the leading indicators of risk. This concept of risk factors has a long history of use in the medical fields (for two recent examples, see Reunes et al. 2011; Grandjean et al. 2011). While we will ultimately be interested in exploring the various relationships between the risk factors, we will first identify those risk factors that link specific correlates to accident outcomes.

2.1.2 Using an abstraction hierarchy to identify risk factors

Even though risk can be problematic as a leading indicator because probability is conditional and exposure increases with time, and because it presses the limits of human cognition (e.g., Kahneman et al. 1982), there may be methods for identifying risk factors through means other than temporal relationships or direct estimation that yield more favorable results. Dubin provides a method for identifying new units through inventing and introducing intervening variables into the model. For example, using the risk lexicon proposed by the National Research Council (2008), risk is defined as the potential for unwanted, adverse consequences. This notion of risk clearly has a probability component and a consequence component that are combined through some mathematical operation to quantify the risk. In the typical causal chain framework, leading indicators are identified temporally by disaggregating the causal chain into its component parts, each preceding the other like antecedent links in a chain. However, as the example of the definition of risk shows, this temporal decomposition is not possible. What is the intermediate condition between probability or consequence and risk? To avoid this situation, rather than horizontal temporal (or even spatial) decomposition, vertical abstraction decomposition is considered. Rasmussen (1986) introduced the notion of the abstraction hierarchy such that means–ends, parts–whole abstractions across five levels (i.e., functional purpose, abstract function, generalized functions, physical functions, and physical forms) are employed to provide system representations and clear procedures that improve safety. In this context, risk as a function of probability and consequence is an abstraction at the other end of the abstraction hierarchy from more readily identifiable physical forms and functions that are less cognitively complex and more likely to be interpreted consistently.
Therefore, risk factors are developed by pushing down the abstraction hierarchy from the risk abstraction to more tangible objects that indicate risk or some component of risk.

2.1.3 Expert elicitation of risk factors

In this study, the focus was on developing a means of assessing the risks of small passenger vessels. For the purposes of this study, small passenger vessels are defined as vessels that carry more than six passengers (including at least one for hire) and measure less than 100 gross tons (U.S. Code of Federal Regulations 2008). The study was limited to small passenger vessels because it was deemed that this vessel population was sufficiently similar in nature to provide a meaningful comparison. Including additional vessel types such as tankers, barges, cargo ships, or even large passenger vessels would likely have produced less meaningful results because the population would not have been homogeneous.

A homogeneous group of five experts with over 60 years of collective experience in port state and flag state regulatory development and implementation was assembled. The experts were trained in the elicitation process and were instructed to develop a list of their top ten attributes for judging the risk of small passenger vessels. Using the nominal group technique (van de Ven and Delbecq 1971), the experts came up with a list of 28 unique risk factors (see Appendix). These risk factors ranged from operating route (the only one that all of the experts suggested) to vessel age (which the majority of experts recommended) to market competition (one of several risk factors that only one expert suggested). In retrospect, it might have been advantageous to assemble a diverse, heterogeneous group of problem solvers, in that the portfolio of solutions would have been more diverse and would likely have contained an even better solution than that of the expert group (Hong and Page 2004; Page 2007). Nonetheless, the results were sufficient to continue the development of this model.

Next, the experts participated in a Delphi technique (Linstone and Turoff 1975) to establish a list of the most informative risk factors. A consensus about the top risk factors was achieved after three Delphi rounds of expert input, consolidation, and clustering (see Table 1).

Table 1 Expert consensus list of most informative small passenger vessel risk factors

To provide a level of abstraction similar to the other risk factors, crew competency, operator experience and quality, casualty (accident) history, and discrepancy (regulatory violation) history were merged into a single composite risk factor called operator characteristics, much like the safety culture factors in Håvold (2010a, b). Thus, four primary risk factors resulted: vessel characteristics which are similar to the ship classifier model inputs in Balmat et al. (2009), route characteristics, operator characteristics, and passenger loading. Recalling the abstraction hierarchy and definition of risk, these four primary risk factors are defined less abstractly than the component parts of risk (i.e., probability and consequence) (Montewka et al. 2010) and, thus, may be more actionable.

2.1.4 Risk factors defined

Table 2 presents these units in more detail. One may notice that the abstraction diminishes even more as we begin to further define each risk factor. The abstraction will all but disappear later when measurements are applied and empirical indicators are developed.

Table 2 Detailed summary of small passenger vessel risk factors

Some of the units are composed of attributes, but all of the unit composites are variable in nature such that they are present in some degree, are real in that some form of empirical indicator will be available, are sophisticated in that they are able to be defined, and are member in that the unit of analysis will be the individual vessel rather than the collective fleet. An enumerative unit is one that possesses a particular property characteristic regardless of the condition or state of the vessel. An associative unit is one that has specific property characteristics in a particular vessel condition. There are also relational units (described by relationships among or between property characteristics) and statistical units (which summarize the distribution of the properties), but these types of units are not present within this model. The four risk factors are then combined to form the summative unit, defined as relative risk. By their nature, summative units are global units that stand for an entire complex compilation of characteristics. According to Dubin (1978), such a unit has the “characteristic of meaning a great deal, much of which is ill-defined and unspecified.” Despite this, summative units can serve to characterize a bundle of properties at one time, but they should primarily be used for educational purposes. In this study, the educational purpose will be policy development and analysis. The summative nature of relative risk can be observed in Fig. 3. The four risk factors combine to produce the summative relative risk. Typically, risk is considered a multiplicative combination of probability and consequences, but in this case, as will be seen later, it is an additive combination, which may be partially explained by the double counting that exists within the overlapping nature of the specific risk factors. It should be noted that conclusions should not be drawn at this point.
Even though the units are scaled and relational, the entire empirical relationship will need to be considered such that errors are not introduced. Once the units have been established, the next step is to identify the interactions among the units.

Fig. 3 Risk factors combine to form summative relative risk

2.2 Interactions among and between units

In this section, the interaction among and between units will be explored. In an actual setting, the mathematical relationship between risk factors would be considered carefully. However, for the sake of demonstrating a proof of concept for using Dubin's theory-building process in the maritime domain, it has been arbitrarily assumed that each of the four enumerative or associative risk indicators combine to form the summative relative risk index as described in Fig. 3. Ultimately, this assumed relationship would be tested and evaluated in the theory testing steps of Dubin's process (see Fig. 2).
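The assumed additive combination can be illustrated with a minimal sketch. The class name, field names, and scores below are hypothetical and chosen only for illustration; the sketch assumes each risk factor has already been scaled to a common range and that the four factors are summed without weights, which is the arbitrary proof-of-concept assumption stated above rather than a tested relationship.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VesselRiskFactors:
    """Hypothetical per-vessel scores, assumed to share a common 0-1 scale."""
    vessel: float
    route: float
    operator: float
    passenger_loading: float


def relative_risk_index(factors: VesselRiskFactors) -> float:
    """Unweighted additive combination (an assumption, not an estimated model)."""
    return (factors.vessel + factors.route
            + factors.operator + factors.passenger_loading)


# Illustrative, made-up vessels at opposite ends of the scale.
older_open_route = VesselRiskFactors(0.8, 0.9, 0.6, 0.7)
newer_limited_route = VesselRiskFactors(0.2, 0.1, 0.3, 0.2)

print(relative_risk_index(older_open_route))
print(relative_risk_index(newer_limited_route))
```

Because the combination is additive, a high score on one factor can be offset by low scores on the others; a multiplicative form would behave differently, which is one reason the assumed relationship would need to be tested in the later steps of Dubin's process.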

2.2.1 Mediators, moderators, independent, overlapping, and proxy risk factors

Up until this point, we have implicitly and arbitrarily assumed that there is independence between each risk factor and the others. While this may have some intuitive appeal and may make the model more tractable, it is important to examine the risk factor relationships. By synthesizing across several disciplines, Kraemer et al. (2001) developed a contemporary framework for classifying risk factor relationships based upon correlation, temporal precedence, and dominance—which relates closely to Dubin's three forms of interaction (categorical, sequential, and determinant). Based upon the particular combination of correlation, temporal precedence, and dominance, risk factors can be determined to be mediators, moderators, and independent, overlapping, and proxy risk factors. Table 3 shows the hypothesized relationships between the units.

Table 3 Hypothesized interactions among risk factors A and B to affect the outcome of lagged accidents

Based upon the Kraemer taxonomy and the hypothesized relationships, it may be suggested that the passenger loading risk factor and the vessel characteristics risk factor are independent. An argument could be made that passenger loading might be influenced by hull material due to the limits of various construction materials, in that larger vessels tend to be constructed of steel and, as a result of their size, facilitate higher passenger loadings. However, since this study focuses on small passenger vessels, this effect is not considered. In future empirical studies used to develop an actual risk rating scheme, this should be considered and the boundaries of the model (see section 2.3) adjusted accordingly. Likewise, the operator characteristics risk factor would be considered independent of all other risk factors because the dominance is likely to be misclassified through expert judgment (as opposed to empirical evaluation). That particular combination of correlation, temporal precedence, and dominance is theoretically impossible. The route characteristics risk factor would be considered a proxy for the vessel characteristics risk factor, likely due to the potential impact of past regulatory interventions. Passenger loading and route characteristics would be considered overlapping, again likely due to regulatory interventions like the issuance of tiered operating certificates that allow an inverse relationship between passenger loads and route exposure. This may indicate that these two risk factors derive from the same underlying construct.

Ultimately, these will need to be thoroughly tested using the empirical data available before the model could be deployed, but for the purposes of this proof of concept study, this cursory evaluation will be sufficient.
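A classification of this kind can be written as a small decision function. The sketch below is one possible encoding under stated assumptions: the enum names, the function name, and in particular the decision rules are a simplified, assumed reading of how correlation, temporal precedence, and dominance map to the five relationship types, and they should be verified against Kraemer et al. (2001) and Table 3 before any use. Combinations not covered by the assumed rules are returned as unclassified.

```python
from enum import Enum


class Precedence(Enum):
    NONE = "no temporal precedence"
    A_FIRST = "A precedes B"


class Dominance(Enum):
    CODOMINANT = "codominant"
    A_DOMINATES = "A dominates B"
    B_DOMINATES = "B dominates A"


def classify_pair(correlated: bool, precedence: Precedence,
                  dominance: Dominance) -> str:
    """Assumed, simplified decision rules; not a transcription of the source."""
    if precedence is Precedence.NONE:
        if dominance is Dominance.CODOMINANT:
            return "overlapping" if correlated else "independent"
        if correlated and dominance is Dominance.A_DOMINATES:
            return "proxy (B is a proxy for A)"
        if correlated and dominance is Dominance.B_DOMINATES:
            return "proxy (A is a proxy for B)"
    else:  # A precedes B
        if not correlated and dominance is Dominance.CODOMINANT:
            return "moderator (A moderates B)"
        if correlated and dominance is Dominance.A_DOMINATES:
            return "proxy (B is a proxy for A)"
        if correlated:
            return "mediator (B mediates A)"
    # Any remaining combination is treated here as not classifiable.
    return "unclassified"


print(classify_pair(True, Precedence.NONE, Dominance.CODOMINANT))
```

Returning an explicit unclassified label, rather than forcing every pair into a category, keeps combinations that cannot be interpreted visible for later empirical review.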

2.2.2 Human and organizational factors as catalyst

Operator characteristics may be viewed as a catalyst—such that the presence of this risk factor is necessary in order for two or more risk factors to combine in an adverse way, but that this combination does not impact the catalyst itself. While this model has not been developed in such a manner, there is considerable evidence to suggest that human and organizational factors play a significant role in maritime accidents (upwards of 80% of accidents are, in part, attributable to operator characteristics) (see e.g., Gemelos and Ventikos 2008; Bea 2002; Psaraftis et al. 1998). In the scheme presented above, if the operator characteristics risk factor were found to be catalytic, it would present as a mediator or moderator. Later, the exact nature of this relationship will be empirically explored. Figure 4 demonstrates one way (namely as a moderator) in which operator characteristics could act as a catalyst.

Fig. 4 Moderator effects of human and organizational factors on risk

This model suggests that as relative risk increases, the potential for and actuality of casualties increase, where casualty is a new dependent unit. A marine casualty is generally defined as an accident involving any vessel on the navigable waters of the USA, including accidental groundings, collisions, damage to the vessel such as fires or explosions, and loss of life or serious injury. As the catalyst operator characteristics improves, it serves to moderate, or diminish, the effects (i.e., decrease the slope of the relationship) of relative risk on casualties.
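The slope-reducing effect of a moderator is commonly expressed with an interaction term. The following sketch assumes a linear form in which casualties depend on relative risk, operator quality, and their product; the function names and all coefficient values are hypothetical and are not estimates from this study. Operator quality is assumed to be scaled from 0 (poor) to 1 (good).

```python
def expected_casualties(relative_risk: float, operator_quality: float,
                        b0: float = 0.0, b_risk: float = 1.0,
                        b_operator: float = 0.0,
                        b_interaction: float = -0.8) -> float:
    """Linear model with an interaction term (illustrative coefficients only)."""
    return (b0
            + b_risk * relative_risk
            + b_operator * operator_quality
            + b_interaction * relative_risk * operator_quality)


def risk_slope(operator_quality: float, b_risk: float = 1.0,
               b_interaction: float = -0.8) -> float:
    """Slope of casualties with respect to relative risk at a given quality."""
    return b_risk + b_interaction * operator_quality


# With a negative interaction coefficient, better operators flatten the slope.
for quality in (0.0, 0.5, 1.0):
    print(quality, risk_slope(quality))
```

In this form, a negative interaction coefficient is what makes operator characteristics act as a moderator: the same increase in relative risk produces a smaller increase in expected casualties as operator quality improves.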

2.2.3 Policy as catalyst

In the preceding portion of this paper, we have treated the theoretical model strictly as a descriptive model. In order to make the model a bit more useful, we will now introduce a new unit—that of policy. This unit is associative, measured as nominal, and can be identified as a subjective, indirect, descriptive input indicator. Policy could be described in many different forms—e.g., the introduction of “new” inspection regimes (as suggested by Degré 2007, Rousos and Ventikos 2008, and Conachey et al. 2008), providing enhanced response capabilities, or through additional yet to be defined non-regulatory measures. Figure 5 provides an illustration of that catalytic moderator relationship. Enhanced policy should reduce or mitigate the adverse relationship between the various risk factors or relative risk index and casualties or accident rates.

Fig. 5 Potential impact of policy on risk factors: relationship to accident rates

2.3 Model boundaries

Boundaries are important to the specification of any theoretical model. For the purposes of this study, the boundaries of the model have been limited to the regulated US small passenger vessel fleet which can be observed in Fig. 6. This “closed” system, where there is no exchange between the system and its environment, is selected with the intent of reducing risk and improving safety through policy development.

Fig. 6 Model boundaries

There are currently about 6,000 registered small passenger vessels with the requisite certificates to operate that fall within the boundaries of the model. Using interior and exterior boundary-determining criteria, it was determined that the set of units was sufficient. For example, by sub-setting the property space as in Fig. 6, affirmative criteria can be established to distinguish a unit or law of interaction from other possible types that would be excluded. While the units may seem to be generalizable to a larger population, in this model, these units have been restricted to include only those types within the boundaries. As the domain covered by this model is constricted, the number of boundary-determining criteria must increase; there is an inverse relationship between the two. For example, if we were to narrow the domain of this model from the entire small passenger vessel industry segment to, say, a specific port, region, class of vessel, or sub-sector (e.g., only ferry vessels), additional boundaries would need to be defined. It should be noted that the greater the number of boundary-determining criteria and the smaller the domain covered by the model, the more homogeneous the vessels within the model will become. Likewise, as the domain is “relaxed,” the model will be exposed to greater heterogeneity of vessels, operators, routes, etc. Additionally, as the model is extended and generalized, the boundaries must be reexamined to ensure they remain relevant. This model was empirically tested using data from a particular 18-month period, and if it were to be used now or for a different type of vessel or domain, the model would require validation and verification. However, this is not the intent of this paper; instead, this paper is aimed at evaluating a particular model-building framework.

2.4 System states

A system state is defined by three features:

  • Characteristic values for units

  • Characteristic values are determinant

  • Portfolio of unit values is intransient

Using these three features, we can precisely define a system (i.e., a vessel) state as a whole by its essential distinctive features (i.e., risk factors). The system state of a particular vessel will be defined primarily by its summary relative risk rating, as determined by the risk factors for vessel characteristics, route characteristics, operator characteristics, and passenger loading. There are many ways in which we might define system state. For simplicity, we will briefly examine two such possibilities here.

2.4.1 System composed of two states

First, a binary system of “high” and “low” risk system states could be developed by selecting some arbitrary risk threshold value. This type of system has the advantage of being easy to implement with only two states, but suffers from a lack of precision and likely a sensitivity to small changes in risk factors. It would be easy to see how the “old” vessel on an “open” route with a “sketchy” operator with a “troubled” history carrying “many” passengers would present a high-risk state and that a “new” vessel on a “limited” route with a “quality” operator with an “unblemished” history carrying “few” passengers would present a low-risk state. However, when we move from the extremes closer to the threshold, it will become increasingly difficult to distinguish the difference between the system states. For example, suppose we had one vessel just to the “north” of the threshold and one just to the “south” of that threshold. Small changes in those risk factors or even the threshold will have a significant impact upon the system states.
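The sensitivity near the threshold can be shown with a minimal sketch. The function name, the threshold, and the scores are hypothetical and serve only to illustrate the point made above.

```python
def two_state(relative_risk: float, threshold: float) -> str:
    """Classify a vessel as high or low risk against a single threshold."""
    return "high" if relative_risk >= threshold else "low"


threshold = 2.50          # arbitrary, illustrative threshold
just_north = 2.51         # vessel slightly above the threshold
just_south = 2.49         # vessel slightly below the threshold

print(two_state(just_north, threshold), two_state(just_south, threshold))

# A small shift in the threshold reclassifies the first vessel.
print(two_state(just_north, threshold + 0.02))
```

Two vessels that differ by only 0.02 in relative risk land in different states, and a shift of the same size in the threshold reclassifies one of them, even though nothing about the vessel itself has changed.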

2.4.2 System with intermediate transition state

Alternatively, rather than having a fine threshold line, a broader transition zone could be established. If, in addition to the high- and low-risk states, a transition state were developed such that it presented a broad zone between the high- and low-risk states, this would alleviate the sensitivity disadvantage of the binary state system above. In the model with three system states (“transition” in addition to high and low risk), arbitrary or empirically established thresholds could again be devised such that one policy could be applied to the high-risk vessels and a separate policy deployed on the low-risk vessels without one impacting the other. One could imagine equal zones, a 10%/80%/10% zone breakdown, or something based upon a normal distribution, which might be expected to arise as a result of the central limit theorem.
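A 10%/80%/10% breakdown can be sketched by taking the cut points from the fleet's own score distribution. The function names and the fleet scores below are hypothetical; the sketch assumes the lowest and highest deciles define the low- and high-risk states and that everything in between is the transition state.

```python
import statistics


def decile_cutoffs(scores):
    """Return the 10th and 90th percentile cut points of the fleet scores."""
    cuts = statistics.quantiles(scores, n=10)  # nine cut points between deciles
    return cuts[0], cuts[-1]


def three_state(score, low_cut, high_cut):
    """Classify a score as low, transition, or high risk."""
    if score <= low_cut:
        return "low"
    if score >= high_cut:
        return "high"
    return "transition"


# Made-up fleet of 100 relative risk scores, for illustration only.
fleet_scores = [float(s) for s in range(1, 101)]
low_cut, high_cut = decile_cutoffs(fleet_scores)

for score in (5.0, 50.0, 95.0):
    print(score, three_state(score, low_cut, high_cut))
```

With a wide transition zone, a small change in a score or in either cut point moves a vessel between an outer state and the transition state, never directly from low to high, which is how the three-state design reduces the sensitivity of the binary system.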

2.4.3 Frozen and variable states

Frozen states of units are not expected to change with time. The risk factors of route characteristics and passenger loading are clearly frozen. Additionally, even though age will obviously increase with time, the vessel characteristics will likely not shift significantly over time and can be considered frozen as well. On the other hand, variable or fluid states of units have the potential to change through time. Nagel (1961) called these types of states “state coordinates” because they determined the system state. In this model, the risk factor for operator characteristics could be considered a variable or state coordinate. This would align with the catalyst notion of human and organizational factors previously discussed.

Alternatively, the composition of risk factors could determine system state. If vessel state could be categorized as a function of frozen states alone, the case could be made that there is no need to identify the critical threshold as it will be immediately defined. However, it does not seem reasonable that these system states could be predicted by frozen states alone; rather, they are more plausibly a function of both frozen states and state coordinates. A sensitivity analysis to determine the elasticity of state transition based upon each of the risk factors will be conducted later in this paper as a part of the identification of empirical indicators.

2.4.4 System state defined

Dubin identified three criteria necessary for system states:

  • Inclusiveness—all units within the system have a value or a distinctive range of values in a particular state.

  • Determinant—individual units are measurable and distinctive for a particular state of the system.

  • Persistent—each state should have some life span or time.

In its simplest form, a state model of the vessel system that satisfies the above criteria is shown in Fig. 7. This is similar to the two-state system described previously in this section, but without the risk abstraction. It defines two observable states: normal operations and pre-casualty.

Fig. 7 Two-state system

A vessel would primarily reside in the state of normal operations (whatever that might be, and it could potentially include unusual operations). On rare occasions, the vessel would shift to a pre-casualty (or casualty-imminent) state as a result of the presence of the necessary combination of risk factors. Typically, the undesired pre-casualty state will be short-lived, as the vessel will be driven to return to its preferred equilibrium of normal operations. If, however, a vessel remains too long in the pre-casualty state, a casualty is likely to occur. Once a casualty occurs, the vessel would permeate the boundaries of this model. Such a state model would be useful for identifying which risk factors may help predict when this transition is likely to happen. While additional states could be added (e.g., birth, death, transition, etc.), these will not be considered in this model. Rather, when a vessel moves to one of these states, it will permeate the boundaries and will then be considered a part of another system that will require additional definition. It might be an interesting exercise in forensics to examine the transition from different states into the death state (whether prematurely or as expected), but that will be left for another time. It would also seem natural to apply Markov chain principles and state diagram techniques to glean information about transition probabilities from particular states to death, etc.
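To make the Markov chain suggestion concrete, the following Python sketch treats normal operations and pre-casualty as transient states and adds a casualty state that, once entered, is never left (mirroring the vessel leaving the model). The transition probabilities are invented purely for illustration and are not estimated from any data in this study.

```python
# Hypothetical states and per-period transition probabilities (each row sums to 1).
STATES = ["normal", "pre_casualty", "casualty"]
P = [
    [0.98, 0.02, 0.00],  # from normal operations
    [0.85, 0.10, 0.05],  # from pre-casualty: usually returns to normal
    [0.00, 0.00, 1.00],  # casualty is absorbing (vessel has left the model)
]


def step(dist, matrix):
    """Advance a state distribution by one period."""
    n = len(matrix)
    return [sum(dist[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def distribution_after(periods, start=(1.0, 0.0, 0.0), matrix=P):
    """Return the state distribution after a number of periods."""
    dist = list(start)
    for _ in range(periods):
        dist = step(dist, matrix)
    return dist


# Probability that a vessel starting in normal operations has suffered
# a casualty within 12 periods, under the assumed probabilities.
casualty_within_12 = distribution_after(12)[STATES.index("casualty")]
```

In practice, the transition probabilities would have to be estimated from observed state histories, which is beyond what the two-state model here defines.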

Now that the model has been well-defined using Dubin's theory-building framework, propositions can be developed.

2.5 Propositions

The most basic, if not trivial, proposition would be one that describes how relative risk is directly related to the risk factors. For example, if risk factors for vessel characteristics, route characteristics, operator characteristics, and/or passenger loading increase, the relative risk increases.

  • P1a: Relative risk is directly related to vessel characteristics

  • P1b: Relative risk is directly related to route characteristics

  • P1c: Relative risk is directly related to operator characteristics

  • P1d: Relative risk is directly related to passenger loading

These could be considered descriptive or even predictive propositions.

Likewise, another set of slightly less trivial propositions holds that when the risk factors increase, the vessel is more likely to experience a casualty.

  • P2a: Vessel characteristics are directly related to casualty

  • P2b: Route characteristics are directly related to casualty

  • P2c: Operator characteristics are directly related to casualty

  • P2d: Passenger loading characteristics are directly related to casualty

There are many potential versions of these sorts of secondary propositions. In Fig. 8, we see four distinct, but related categories of propositions.

Fig. 8 Typology of basic propositions

Along the reverse diagonal of the typology above (i.e., the upper right and lower left boxes), there is a direct relationship between the independent and dependent units; that is, they “move” in the same direction. On the diagonal (i.e., upper left and lower right boxes), there would be an inverse relationship. Given four risk factors and four potential ways to state this particular proposition, there are 16 different versions of this one proposition.

A third type of proposition could be a state proposition. For example, if the system persists in the “risk” state for a prolonged period of time, in addition to observing an increased number of casualties, system permeability will result and the vessel will leave the system and consequently enter another state—that is to say, death. This was discussed briefly in the multi-state system previously.

  • P3: Prolonged high relative risk is directly related to death.

Similarly, a fourth proposition could be put forward: equilibrium is established in the “non-risk” state. If the third proposition is not fulfilled, a vessel will typically pass through the risk state and return to the non-risk state, such that the vast majority of its time is spent in the non-risk state. This return to equilibrium may be a self-correcting mechanism whether or not a casualty occurs.

  • P4: Moderated relative risk is the equilibrium state to which vessels return.

Finally, strategic propositions can be developed such that they define limiting or critical values for one or more of the units. These critical points could be maxima or minima, inflection points, or even trigger points where state shifts might occur. If we were to plot the relative risk or even specific risk factors as the independent variable and create a cumulative distribution curve, we could potentially classify our propositions such that they would inform our theory testing.

Figure 9 shows possible examples of the strategic propositions for the cumulative relative risk curve for a fleet of vessels. Proposition(s) P1 generally describe the overarching relationship and, thus, are descriptive but not strategic in nature. Proposition(s) P2 may be used to define the central inflection point. Proposition(s) P3 may be used to define the range where system permeability may occur, and proposition(s) P4 may be used to define the secondary inflection point at which vessels return to a non-risk state rather than being subjected to the state shift and system departure.

Fig. 9 Examples of strategic propositions

3 Empirical indicators

Now that the theoretical model has been sufficiently established (as described by the first five steps in Fig. 2), attention can be turned toward testing of the theory (i.e., the remaining three steps described in Fig. 2). Using the model as a framework, empirical indicators for each unit must be developed in order for hypotheses to be formed and ultimately tested. An empirical indicator is the means by which a researcher measures the value of a particular unit. The next sections describe the process by which empirical indicators were developed and refined through statistical evaluation.

3.1 Development of empirical indicators

Returning to the results of the expert elicitation process described in section 2.1.3, several sub-factors were identified in the process of categorizing the four major risk factors of vessel characteristics, route characteristics, operator characteristics, and passenger loading. While not explicitly a part of the model developed from the risk factors, these sub-factors were ultimately used in the development of the empirical indicators. Prior to assigning measurement scales to the sub-factors, the experts were asked to assign relative significance between the initial risk factors (see Table 1) using a modified version of the analytic hierarchy process (Saaty 1982). The experts made paired comparisons of each combination of risk factors (15 comparisons in total) and assigned relative weightings between each pair. Results were compiled using an estimated eigenvector approach, and the five experts' relative weightings were aggregated using an arithmetic mean (Clemen and Winkler 1999; Seaver 1978). Prior to deploying this model, other forms of aggregation should also be considered (Shugan and Mitra 2009). Table 4 describes the results of the paired comparison approach to assigning relative weights to the risk factors.

Table 4 Experts’ aggregate relative weights for initial risk factors

The comparisons were examined and a consistency ratio was established. The consistency ratio is a consistency index, which counts the proportion of incoherent cyclic triads that are identified, divided by a random index, i.e., the value that might be expected from a randomized response (Saaty 1980). In this case, the consistency ratio was 0.07. The threshold for concern is a consistency ratio greater than 0.1. Therefore, it was determined that the responses were sufficiently consistent. Combining the relative weights for these initial risk factors into rounded weights for the four risk factors of the model yields the following: vessel characteristics account for 20% of relative risk, route characteristics for 30%, operator characteristics for 20%, and passenger loading for 30%. These weights are indicated in the left column of the relative risk index below (Table 5).
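For readers unfamiliar with the mechanics, the following Python sketch shows one common way to estimate weights and a consistency ratio from a paired comparison matrix. It uses the standard eigenvalue-based consistency index from the analytic hierarchy process, CI = (λmax − n)/(n − 1), and a column-normalization estimate of the eigenvector; both are assumptions for illustration and may differ from the modified process the experts actually followed. The 3×3 example matrix is invented.

```python
# Saaty's random index values for comparison matrices of order 1 to 6.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24}


def ahp_weights(matrix):
    """Estimate the principal eigenvector: normalize columns, then average rows."""
    n = len(matrix)
    col_sums = [sum(matrix[i][j] for i in range(n)) for j in range(n)]
    return [sum(matrix[i][j] / col_sums[j] for j in range(n)) / n for i in range(n)]


def consistency_ratio(matrix, weights):
    """Consistency index divided by the random index (needs order 3 to 6 here)."""
    n = len(matrix)
    # Estimate lambda_max as the mean of (A w)_i / w_i.
    aw = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(aw[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1)
    return consistency_index / RANDOM_INDEX[n]


def aggregate_experts(weight_vectors):
    """Arithmetic mean of several experts' weight vectors."""
    k = len(weight_vectors)
    return [sum(w[i] for w in weight_vectors) / k for i in range(len(weight_vectors[0]))]


# Invented, perfectly consistent comparisons implied by weights 0.5, 0.3, 0.2.
true_w = [0.5, 0.3, 0.2]
A = [[wi / wj for wj in true_w] for wi in true_w]
w = ahp_weights(A)
cr = consistency_ratio(A, w)  # essentially zero for a perfectly consistent matrix
```

A ratio at or below 0.1, such as the 0.07 reported here, is conventionally read as acceptable consistency.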

The experts then developed scales for each of the four major risk factors based upon the most prominent sub-factors originally identified. Each sub-factor scale was developed again using a modified analytic hierarchy process. Each risk factor has an associated empirical scale that is based upon these sub-factors as presented in Table 5.

Table 5 Empirical indicator measurement scales for measuring relative risk
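As a rough sketch of how such an index combines its parts, the Python snippet below computes a relative risk score as a weighted sum of the four risk factor ratings, using the rounded expert weights reported above. The 0 to 100 rating scale and the example ratings are hypothetical; the actual sub-factor scales are those given in Table 5.

```python
# Rounded expert weights for the four risk factors, as reported in the text.
EXPERT_WEIGHTS = {
    "vessel": 0.20,
    "route": 0.30,
    "operator": 0.20,
    "passenger_loading": 0.30,
}


def relative_risk_score(ratings, weights=EXPERT_WEIGHTS):
    """Weighted sum of factor ratings (ratings assumed to be on a 0-100 scale)."""
    if set(ratings) != set(weights):
        raise ValueError("ratings must cover exactly the weighted risk factors")
    return sum(weights[factor] * ratings[factor] for factor in weights)


# Hypothetical ratings for a single vessel.
example = {"vessel": 40, "route": 70, "operator": 50, "passenger_loading": 60}
score = relative_risk_score(example)
```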

3.2 Statistical evaluation of empirical indicators

Next, a statistical analysis was performed upon these empirical indicators. Using the objective sub-factor scales (which accounted for about 65% of the risk index found in Table 5), a proxy scale was developed; Fig. 10 shows some of the actual work used to develop the proxy scales. The proxy scale becomes a relational–statistical indicator of the direct measures for the sample population. The test will be homologous with unit class in order for the homology to be preserved. Using 18 months of data from the Coast Guard's Marine Safety Management System database, proxy relative risk scores were assigned to the population of small passenger vessels (i.e., those within the boundaries of the model). To demonstrate the overall explanatory power of the model, the proxy relative risk scores for the 18-month period and the marine casualty occurrences for the corresponding vessels during a lagged 12-month period were plotted on a cumulative distribution curve (see Fig. 11). By selecting a specific relative risk score, one can determine both the percentage of vessels above that score and the percentage of casualties attributed to vessels scoring above that score.

Fig. 10 Actual work used in calculating risk proxy scales and scores

Fig. 11 Relationship between identified relative risk and marine casualties and marine investigations

Figure 11 thus illustrates the relationship between vessels' relative risk scores and their corresponding casualties. For example, consider the vessels that have proxy scores above 59 (the portion of the curves to the right of the dashed vertical line): 5% of vessels score above 59, and 33% of marine casualties are from vessels scoring above 59. Similarly, 50% of marine casualties are attributed to 10% of the vessels, i.e., those scoring above 55. This Pareto relationship is beneficial for resource allocation decisions, which will be discussed in more detail later.
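The reading of the cumulative curves described above can be sketched in Python as follows. The function name `shares_above` and the ten-vessel fleet are invented for illustration and do not reproduce the Coast Guard data.

```python
def shares_above(vessels, threshold):
    """Return (share of vessels, share of casualties) for scores above a threshold.

    `vessels` is a list of (score, casualty_count) pairs.
    """
    total_vessels = len(vessels)
    total_casualties = sum(count for _, count in vessels)
    above = [(score, count) for score, count in vessels if score > threshold]
    vessel_share = len(above) / total_vessels if total_vessels else 0.0
    casualty_share = (
        sum(count for _, count in above) / total_casualties
        if total_casualties
        else 0.0
    )
    return vessel_share, casualty_share


# Made-up fleet: (proxy relative risk score, casualties in the lagged period).
fleet = [(30, 0), (40, 1), (45, 0), (50, 1), (52, 0),
         (55, 0), (57, 1), (58, 0), (60, 2), (65, 1)]
vessel_share, casualty_share = shares_above(fleet, threshold=59)
```

Sweeping the threshold across the range of scores traces out the two cumulative curves, and a large gap between them at high scores is the Pareto pattern noted in the text.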

3.3 Sensitivity analysis of empirical indicators for model

While the previous results would indicate that relative risk is an indicator of marine casualties, it is important to determine just how sensitive those results would be to shifts in the arbitrarily assigned weights in the proxy model and also how well the entire model (described in Table 5) would stand up in practice. The former was examined empirically by creating a family of alternative models and examining how changes to risk factor weights and risk factor rating scales would impact the overall results, i.e., how often marine casualties could be attributed to the relatively highest-risk vessels. To accomplish the latter, the relative risk model was packaged in a small passenger vessel risk rating tool and provided to Coast Guard marine safety offices in two US ports for demonstration testing and evaluation on their local small passenger vessel populations.

As noted, in addition to testing each risk factor as an empirical indicator (and it should be noted that collectively, the risk factors are more informative), a sensitivity analysis was performed on the proxy model by creating a family of models that were slightly altered from the original. As a cautionary note about relative empirical indicators, validity does not focus on the converged empirical indicators, but on the individual units themselves. In this model, the individual empirical indicators did prove to be homologous with the relative indicator. The seven-model family consisted of the following perturbations of the original “expert” model:

  • Model 1: Passenger loading scale inverted based upon marine casualty histories

  • Model 2: Egalitarian weighting (i.e., 25%) provided to each risk factor

  • Model 3: Passenger loading weight reduced by 10%

  • Model 4: Route characteristics weight reduced by 10%

  • Model 5: Passenger loading weight increased by 10%

  • Model 6: Route characteristics weight increased by 10%

  • Model 7: Operator characteristics weight increased by 10%

Whenever a weight was increased or decreased (i.e., models 3 through 7), the offsetting change was evenly distributed across the remaining risk factors so that the total relative risk weight remained 100%. Figure 12 illustrates the relative performance of the family of models in comparison to the original expert model.
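The reweighting rule can be sketched in Python as below. One assumption should be flagged: the sketch interprets a change of “10%” as 10 percentage points of the total weight, which the text does not state explicitly. The function name `perturb_weights` is likewise illustrative.

```python
# Rounded expert weights in percentage points, as reported earlier in the text.
EXPERT_MODEL = {"vessel": 20.0, "route": 30.0, "operator": 20.0, "passenger_loading": 30.0}


def perturb_weights(weights, factor, delta):
    """Shift one factor's weight by `delta` points and spread the offset evenly.

    Assumes `delta` is in percentage points, so the total stays at 100.
    """
    others = [name for name in weights if name != factor]
    offset = delta / len(others)
    new_weights = {name: weights[name] - offset for name in others}
    new_weights[factor] = weights[factor] + delta
    return new_weights


# Models 3 through 7 under the percentage-point assumption.
model_3 = perturb_weights(EXPERT_MODEL, "passenger_loading", -10.0)
model_4 = perturb_weights(EXPERT_MODEL, "route", -10.0)
model_5 = perturb_weights(EXPERT_MODEL, "passenger_loading", +10.0)
model_6 = perturb_weights(EXPERT_MODEL, "route", +10.0)
model_7 = perturb_weights(EXPERT_MODEL, "operator", +10.0)
```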

Fig. 12 Sensitivity analysis of family of models

For each model, the figure shows what relative risk score is represented by the 95th percentile of vessels (horizontal axis), how many marine casualty cases are captured by vessels scoring above that 95th percentile score (vertical axis), and also the proportion of high-risk vessels the derivative model had in common with the original expert model (as indicated by the size of the bubble). For example, if the bubble for a particular model is almost as large as the bubble for the original expert model (e.g., model 1), then roughly the same vessels were captured by both models; whereas, if the bubble is considerably smaller than that of the original expert model (e.g., model 6), then there is a significant difference in the sample of vessels obtained.
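The three quantities described for each model can be sketched in Python as follows. The function names, the dictionary-based inputs, and the use of casualty share rather than a raw count are assumptions for this illustration; the sketch also assumes a non-empty fleet.

```python
def top_share(scores, share=0.05):
    """Return the IDs of the highest-scoring `share` of vessels (at least one)."""
    k = max(1, int(len(scores) * share))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def compare_models(expert_scores, alt_scores, casualties, share=0.05):
    """Summarize an alternative model against the expert model.

    Returns the cut-off score of the alternative model's top group, the share
    of casualties that group captures, and its overlap with the expert model's
    top group.
    """
    expert_top = top_share(expert_scores, share)
    alt_top = top_share(alt_scores, share)
    total = sum(casualties.values())
    captured = sum(casualties.get(vessel, 0) for vessel in alt_top)
    return {
        "cutoff_score": min(alt_scores[vessel] for vessel in alt_top),
        "casualty_share": captured / total if total else 0.0,
        "overlap": len(expert_top & alt_top) / len(expert_top),
    }


# Made-up fleet of 20 vessels; the alternative model reorders two of them.
expert = {f"v{i}": float(i) for i in range(20)}
alternative = dict(expert, v17=18.5, v18=17.0)
casualty_counts = {"v19": 2, "v17": 1, "v3": 1}
summary = compare_models(expert, alternative, casualty_counts, share=0.10)
```

An overlap close to 1 corresponds to a bubble nearly as large as that of the expert model, while a small overlap indicates that the alternative weighting selects a noticeably different set of vessels.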

It should also be noted that, across the family of models, the lower the 95th percentile relative risk score, the greater the attribution to marine casualties. This finding illustrates the relative positioning of the model curves and may be a factor in model selection, but since this study is devoted to the proof of concept for using Dubin's theory-building framework, the original expert model will be the focus of the remainder of the paper.

3.4 Beta testing of small passenger vessel risk rating tool

As mentioned previously, the original expert model was formed into a relative risk rating tool (using the results from Table 5). This small passenger vessel risk rating tool was then pilot tested at two US ports—Providence, Rhode Island, and Seattle, Washington.

In the Providence test, 18 small passenger vessels were scored using the relative risk rating tool. Those vessels already considered to pose an increased risk scored higher than the others. The port officials indicated that the results were not surprising and that the highest-scoring vessels were already receiving elevated concern and attention. They suggested modifications to the tool, including that passenger loading could be a multiplier rather than an additive risk factor. In the Seattle test, 30 vessels (15% of the small passenger vessel fleet) were evaluated using the relative risk rating tool. Port officials selected vessels suspected to be from all parts of the spectrum of relative risk and found that the top and bottom thirds received scores that were as they would have expected. It should be noted that these were small stratified samples (approximately 10% of the population) and were not used as part of the empirical testing of the model. Instead, this beta testing served as a form of eliciting face validity, and the specific input from this beta testing was not incorporated into the eventual model.

At the same time, the small passenger vessel risk rating tool was shared with various parties within the small passenger industry segment and it was met with a range of responses from absolute support for the model and its development to an almost visceral denouncement of the model suggesting that its use would be tantamount to illegal profiling.

In general, the beta tests confirmed the results that were analytically obtained using the empirical indicator testing. The unit of relative risk employed in the model developed in this study as measured by the small passenger vessel relative risk rating tool results in an effective means of “measuring” the perceived risk posed by specific small passenger vessels.

4 Discussion

The process for developing theory and models can often be ill-defined or haphazard. One of the purposes of this article is to demonstrate how Dubin's theory-building and testing process (described in Fig. 2) can be successfully applied to risk modeling in the maritime domain. This research process has several implications.

First, the moderation-of-process design used in this study should prove to be an effective method and will serve as a complement to the more prominent causal chain designs already widely used in risk analysis. Additionally, the process used in this study has been sufficiently illustrated and demonstrated such that practitioners could apply this process within their domain to further their efforts to develop risk assessment and management systems. In fact, under the Paris Memorandum of Understanding and similar Port State control regimes, more general risk-factor models have been in use for decades. What distinguishes this model is that it is narrower in its focus; it focuses on a specific fleet and type of vessels within a particular region. Thus, while there are some significant similarities in the risk factors (e.g., vessel characteristics and operator characteristics), the model presented here is unique due to its focus (including the route characteristics and passenger load risk factors).

Second, the model developed in the study can be used within the maritime domain with a little additional validation and verification, such as evaluation using Patterson's (1986) eight criteria for theory building. While identifying relative risk for individual small passenger vessels may be beneficial, it is envisioned that a more effective use of this model would be analysis at the fleet level. By evaluating individual small passenger vessel risk and then rolling that up to a set of summary findings and trends, gaps in the overarching safety regimes could be identified. Earlier in this article, policy was introduced as a potential catalyst, or moderator, for improving safety through risk reduction. Figure 13 provides a template to enact a segment-wide approach to improving safety.

Fig. 13 A risk management framework for the small passenger vessel industry segment

The framework is broken down into the typical three components of risk management: characterization, assessment, and management. The model developed in this study and presented in this article would be used to “quantify” relative risk in the risk characterization phase. The risk characterization would then be used as a screening device to identify a subset of the highest-risk vessels (which, as demonstrated by this model, are those responsible for a disproportionate number of marine casualties). Using a threshold value for relative risk, specific vessels within the fleets at each port would be identified for additional scrutiny in the risk assessment phase. Alternatively, vessels that are relatively low risk may be eligible for reduced scrutiny. The risk assessment might use something like the ten-step risk analysis process presented in the Passenger Vessel Association's Risk Guide (Passenger Vessel Association 2000). The process in the PVA Risk Guide estimates probability and consequence in order to quantify risk for specific aspects of each vessel's operations. It also goes on to identify the most cost-effective risk reduction solutions using a cost-benefit analysis. Vessel-specific risk reduction and management may then be effected immediately. Additionally, by compiling risk reduction measures, regional and national risk reduction strategies can be developed as a part of more global risk management processes.

Third, this risk management framework can be evaluated using existing data for ports that have currently or previously employed a small passenger risk rating tool to allocate resources. The data could be broken down into a split-half time series analysis using a quasi-experimental pre- and post-test design. This study extension is currently underway and will determine the efficacy of the risk management framework just described. The principal hypothesis would be that the relationship between small passenger vessel risk and marine casualties will be weaker for risk-informed policy regimes than for those that do not consider risk.

Finally, as discussed in Holton and Lowe (2007), theory-building scholars should conduct more of this type of research to advance the discipline of risk analysis and to make theory building more accessible to practitioners by using an intra-disciplinary approach. Dubin's method provides a coherent process for theory building. Other methods such as case study (e.g., Dooley 2002; Yin 1994) or grounded theory building (e.g., Egan 2002; Glaser 1992) are equally well-developed and could provide additional insight into the study of the discipline of risk analysis and also the maritime domain. This article represents an advancement in that direction, yet much work remains.