Integrating Organizational and Management Variables in the Analysis of Safety and Risk

For decades, despite research and description of modern large-scale technologies as “socio-technical systems”, there has been little headway made in integrating research on both the socio and technical aspects of these systems. Social scientists and engineers continue to have contrasting and often non-intersecting approaches to the analysis of organizational factors and the physical aspects of technologies. This essay argues that an important part of this problem has been the ambiguous and underspecified character of the social science research concepts applied to the analysis of organization and management factors. It suggests an important opportunity to more closely integrate social science research into the understanding of hazardous technologies as socio-technical systems through a strategy of clarifying concepts and definitions (such as “safety”) that allow transforming qualitative organizational and managerial “factors” into variables to create metrics useful in the evaluation of safety management systems. It argues also that practitioners have an important role to play in this process. A final argument addresses the contribution that safety metrics could make to the development of higher resolution safety management across a wider spectrum of scales and time-frames than those currently considered by managers and designers of socio-technical systems.

technologies [15]; the understanding of management challenges posed by complex technologies [5,16,22]; organizational and managerial dimensions of high reliability in the operation of hazardous technologies [1,11,19,20,23]; and, more recently, the analysis of catastrophic accidents involving complex technical systems [13,16-18,26]. Current accident analyses almost always identify organizational and managerial factors as root if not proximate causes of these accidents.
Yet for all of this development, the research that has explored the organizational and managerial side of technical systems remains in the main un-integrated into the perspectives taken by engineers and many managers in their technical designs and organizational practices. Instead of thinking of both organizational and technical dimensions together as part of socio-technical systems, many designers and managers continue to think of humans and organizations as simple extensions of machines or as sources of error in the proper operation of technical systems. Here, for example, is one recent description of human factors engineering offered by an engineer at a safety meeting:
"If you open the plates of a circuit breaker, you will eventually have an arc. You don't want the electrons to arc, but no engineer would say that the electrons that formed the arc were lazy or complacent: if you don't want the arc, you engineer the system around the constraint. Human factors engineering operates according to the same principle; identify the constraints in the interactions between the employees and the workspaces, tools, and technology, and engineer around it."
Meant to be an argument against attributing accidents simply to operator failures, this statement at the same time reveals a narrower engineering perspective. We know from socio-technical systems research that human and organizational factors can be a support to design and not only a constraint. For example, engineers may make design errors or offer incomplete designs that humans can identify, and organizations can correct. We know also that human behavior, despite its aggregate regularities, is less predictable and has more variance in particular cases than the physical laws and principles within which engineers design. Given that technologies are socio-technical systems, we should expect engineers to incorporate human and organizational factors more deeply into designs and not simply design "around them".
However, integrating organizational variables into technical design processes poses many challenges.

Challenges to Reconcile Them
It is not by oversight that organizational and managerial variables are often neglected in engineering or risk research. A range of large divides exists between organizational and management research and the performance and risk variables typically attended to by commercial organizations and the regulatory agencies that oversee them. Considering these divides will provide clues in developing strategies to achieve both a research and practical integration of social and technical variables in the understanding and practice of safety management.

Technical and Methodological Differences
Concepts and definitions of physical or mechanical variables are largely agreed-upon and formally expressed through stipulated meanings in artificial language such as physical descriptions or mathematical models and formulas. Most are measured along interval scales.
Social and organizational factors such as leadership, authority, centralization, decision-making, motivation, mindfulness, stress, culture and even "safety" itself are grounded in concepts expressed in natural language with all of its ambiguities and imprecision [7,8]. These concepts are then difficult to translate into measurable "variables". These organizational and managerial "variables" are often defined as nominal categories (e.g. "high reliability" organizations) or described as opposites in binary pairs (e.g. flexibility/rigidity or centralization/decentralization), not as continuous scales of measurement [24]. These are "factors" but not really variables.
Further, much safety and accident research is in the form of case studies, which are difficult to compare and aggregate because of their elements of uniqueness. Often the management or organizational failures are described in non-standardized terms that do not allow comparative measurement. It is also difficult to learn about the impact of organizational and managerial factors across cases because, without interval measures, we cannot construct regression models to determine their separate contributions to given outputs.
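To illustrate what interval-level measurement would make possible, the following is a minimal sketch and not a method drawn from the research cited here. It assumes two hypothetical organizational variables already expressed on interval scales (the names `training_hours` and `overdue_actions`, the outcome `incident_rate`, and all values are invented for illustration) and fits an ordinary least-squares model to separate their contributions.

```python
def solve_linear(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear; effects cannot be separated")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(predictors, outcome):
    """Return [intercept, b1, b2, ...] minimizing squared error,
    using the normal equations (X^T X) b = X^T y."""
    rows = [[1.0, *values] for values in zip(*predictors)]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, outcome)) for i in range(k)]
    return solve_linear(xtx, xty)


# Synthetic, hypothetical data: NOT measurements from any organization.
training_hours = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
overdue_actions = [3.0, 1.0, 4.0, 1.0, 5.0, 2.0]
# The outcome is constructed from known coefficients so the fit can be checked.
incident_rate = [
    10.0 - 1.0 * t + 0.5 * o for t, o in zip(training_hours, overdue_actions)
]

intercept, b_training, b_overdue = fit_ols(
    [training_hours, overdue_actions], incident_rate
)
print(round(intercept, 3), round(b_training, 3), round(b_overdue, 3))
```

Because the synthetic outcome is built from known coefficients, the fit should recover them. Real organizational data would add noise and measurement error, and nominal or binary factors could not be entered this way without first being given a defensible scale.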

Practical Challenges
Because organization and management concepts are likely to be categorical and not easily expressible in ordinal or interval measures, it is difficult to connect analyses of them with physical and mechanical variables in order to model how the two jointly affect the safety or performance of an organization. Also, many of the social sciences that analyze organizational factors are, unlike engineering, not "design sciences" with research directed toward formal design principles and cumulative findings to guide action and application.

Political Challenges
Finally, there are political problems with employing organizational and managerial factors in an integrated analysis of safety. Often these factors have implications that raise the political temperature surrounding their development and use. Business organizations may resist leadership, decision-making or culture analyses because of their potential implications for assessments of managerial competence or effectiveness. Regulatory organizations may avoid using organizational and managerial findings because of their vulnerability to political or legal challenges if they base regulations and enforcements on what will be challenged as ambiguous or subjective measures and assessments.
How, given the diverse analytic domains of physical models versus organizational factors, do we find a way to combine them in an additive way to improve our understanding, management and regulation of safety and risk in complex technical systems? Important risks and opportunities call for closer integration between the two research approaches, but we are currently far from this objective, given the mutual ignorance, indifference, or even hostility between researchers in these two domains. The recent stress on safety management systems (SMSs) by industry groups and regulators has created growing demand for careful analysis of the implementation of these systems and the measurement of their impact on rates of incidents and accidents. How can we address these opportunities?

The Need for Clarifying Key Concepts
Among the key organizational concepts that lack clarity is the concept of safety itself, and the relationship between safety and risk. For many designers, managers and regulators, it is all too often assumed that "safety" is synonymous with the mitigation of risk. "How much safety are we willing to pay for?" is often a question about "Which specific risks are we willing to address?" But a report on aviation safety by a group of representatives from 18 national aviation regulatory agencies concluded the following:
"Safety is more than the absence of risk; it requires specific systemic enablers of safety to be maintained at all times to cope with the known risks, [and] to be well prepared to cope with those risks that are not yet known" [21].
Safety is about assurance; risk is about loss. Safety is in many respects a perceptual property, "defined and measured more by its absence than its presence" [18]. It is hard to establish definitively that things are "safe", but much easier to recognize specific conditions of "unsafety" retrospectively in the face of accidents. Risk is a calculated property. Several failures or incidents can occur without invalidating a risk estimate (two 100-year storms in consecutive years for example), but a single failure can disconfirm the assumption of safety. This distinction also applies to a difference between safety management and risk management. Risk management is managing to probability estimations which apply to events over a large run-of-operations or number of years. Safety management is managing down to the level of precluding a single event in a single operation at any time [3].
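The point about 100-year storms can be made concrete with a short calculation. The sketch below assumes, purely for illustration, that a "100-year storm" has a constant 1-in-100 chance in any year and that years are independent; neither assumption comes from the text. Under those assumptions, two such storms in a given pair of years is rare (probability 0.0001), yet across a century the chance of at least one is about 0.63 and of at least two is about 0.26, so repeated events do not by themselves invalidate the estimate.

```python
from math import comb


def prob_at_least(k, years, annual_p):
    """Binomial probability of at least k events in `years` independent years,
    each with the same annual probability `annual_p`."""
    return sum(
        comb(years, j) * annual_p**j * (1.0 - annual_p) ** (years - j)
        for j in range(k, years + 1)
    )


# Assumed annual probability for a "100-year" event (illustrative only).
ANNUAL_P = 0.01

# Both of two specific consecutive years see the event.
both_in_two_given_years = ANNUAL_P**2

# At least one, and at least two, events somewhere in a 100-year run.
at_least_one_in_century = prob_at_least(1, 100, ANNUAL_P)
at_least_two_in_century = prob_at_least(2, 100, ANNUAL_P)

print(f"two given consecutive years: {both_in_two_given_years:.4f}")
print(f"at least one in 100 years:   {at_least_one_in_century:.3f}")
print(f"at least two in 100 years:   {at_least_two_in_century:.3f}")
```

A risk estimate of this kind describes a long run of operations. It says nothing about whether the next single operation will be free of failure, which is the level at which safety management works.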
Karl Weick's definition of safety as "the continuous production of dynamic nonevents" [28] offers more promise. Here "safety" defines positive actions: identifying potential sources and consequences of accidents (including incomplete or unforgiving technical designs), acting to prevent them, constantly monitoring for precursor conditions that add risk or uncertainty, and training and planning for the containment of consequences of accidents if they do happen. In short, it defines safety management. As part of this definition, it is important to understand the distinction between safety as "dynamic non-events" and non-events in systems without careful management that simply have so far "failed to fail". Unfortunately, there is at present significant confusion about this conceptual difference. How can we distinguish non-events that are simply "failing to fail" from those dynamic non-events that reflect effective safety management, without having to wait for an accident? The answer partly lies with understanding and measuring the implementation process of safety management systems.

Some Propositions about the Implementation of Safety Management Systems
There is an important difference between implementing the structural features of an SMS in an organization (safety officers; safety plans; formal meetings; safety budgets; formal accountability and reporting relationships) and
• achieving a widely distributed acceptance of safety management as an integral part of actual jobs in the organization,
• a collectively shared set of assumptions and values concerning safety (a "safety culture") and
• commitment to safety as part of the individual identity of personnel in an organization.
Without wide and deep employee engagement, an SMS will simply be an administrative artifact without a strong connection to actual behaviors that link to safety-promoting performance and safer outcomes. Further, it takes time, persistent effort, adaptive behavior, continuous monitoring (with metrics) and correction to implement and maintain an effective SMS. These propositions lead us back then to the earlier essential question about determining the effectiveness of a safety management system without having to wait for an accident. One answer is to develop metrics to detect the full implementation and integrity in operation of an SMS:
• metrics for organizational and managerial conditions and practices, both positive and negative, that give information about the condition of safety management itself [10,21,25] and
• metrics identifying and addressing precursor management to add granularity to safety performance assessments apart from accidents.

A Strategy for SMS Metrics Development
Retrospective measures already exist for incidents and accidents, many required by law and regulation. The strategy of SMS metrics is to provide precursor indicators so that the integrity of an SMS can be assessed before an accident occurs. The precursor strategy is well illustrated by research on "High Reliability Organizations" (HROs) such as selected nuclear power plants, air traffic control organizations, high voltage electrical grids that were known for effective safety management [11,12,19,20]. This HRO research led to the recognition that a key to high reliability is not a rigid invariance in operations and technical and organizational conditions, but rather the management of fluctuations in task performance and conditions which keeps them within acceptable bandwidths and outside of dangerous or unstudied conditions [23]. Supporting this narrow bandwidth management is the careful identification, analysis, and exclusion of precursor conditions that could lead to accidents or failures. HROs begin with those core events and accidents they wish never to experience and then analyze outward to conditions both physical and organizational that could, along given chains of causation, lead ultimately to these accidents or to significantly increased probabilities of them. This "precursor zone" typically grows outward to include additional precursor conditions based on more careful analysis and experience. These precursors are leading indicators, for these organizations, of potential failures and are given attention and addressed by supervisors and managers. Precursors are in effect "weak signals" to which "receptors" throughout many levels of the organization are attuned and sensitive. In its effectiveness, a process of precursor management with metrics can impart a special kind of "precursor resilience" to organizations [20]. 
With an effective safety management system, these organizations can move back quickly from the approach to precursor zones and still maintain the robustness of safe performance and reliable outputs.
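The idea of keeping fluctuations within an acceptable bandwidth, with a precursor zone that signals the approach to its edges, can be sketched as a simple classification. This is a hypothetical illustration only: the `Bandwidth` type, the `classify` function, the three labels and the numeric limits are all assumptions introduced here, not definitions from the HRO research cited above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bandwidth:
    """Acceptable operating range for one monitored condition (hypothetical)."""
    low: float
    high: float
    margin: float  # width of the precursor zone just inside each limit


def classify(value, band):
    """Label a reading as 'normal', 'precursor' (near a limit, still inside
    the bandwidth) or 'outside' (beyond the acceptable bandwidth)."""
    if value < band.low or value > band.high:
        return "outside"
    if value < band.low + band.margin or value > band.high - band.margin:
        return "precursor"
    return "normal"


# Arbitrary illustrative limits on a 0-100 scale.
band = Bandwidth(low=0.0, high=100.0, margin=10.0)
readings = [50.0, 88.0, 95.0, 101.0]
labels = [classify(r, band) for r in readings]
print(list(zip(readings, labels)))
```

In this sketch a "precursor" label is the weak signal: the condition is still acceptable, but it calls for attention before the limit is reached. In practice the zone would be defined per condition from causal analysis and experience and could widen over time, as the text describes.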
Metrics should reflect models of causation pertaining to safety. It should be clear why they are important as metrics. This is promoted by the leading indicator strategy and its underlying analysis. The identification of precursors through their potential connection to accidents provides validity to them as metrics.
Single, high-value metrics offer perverse incentives to "manage to the metric" or to distort the measurement process itself. Or as one manager once conceded, "organizations will do what you inspect but not necessarily what you expect!" More metrics with more data if possible should then be developed to cover each element of a safety management system to be assessed and improve the overall reliability of the process.
Finally, safety management metrics should be widely accepted in an organization as important tools for learning, not as instruments of control and punishment. To promote their acceptance, they should be the product of a joint development process which includes regulators, organizational researchers and participants at a variety of levels and across departments and units. Individuals at the level of task performance often have tacit knowledge and practical insights about conditions that support or detract from their safe performance and measurements, both direct and indirect, that can reveal these conditions. The metrics that are developed should make sense to all participants.

Achieving Higher Resolution Safety Management
The integration of SMS metrics with physical and engineering analyses can lead to a more powerful socio-technical understanding of complex systems, their operation and their risks. But coupling this understanding to safety requires also that we increase the scale, scope and time frame of safety management itself. Here are some examples.

Shifts in Scale: Micro-analysis
Many precursors to system failures can be found in conditions that surround the performance of specialized tasks. Human factors research addresses some of these, including attention load, noise levels, and ergonomic requirements that induce fatigue or injury. More recently cognitive work analysis research has focused on micro-level task psychology, sub-cultures and roles associated with successful task performance relative to particular technologies or missions [14,27]. For example, robotic surgery has led to changes in the roles of surgeons and support groups and requires personal resilience among surgeons to deal with unexpected issues as well as new methods for surgical training [2].
A similar micro-analysis has also been applied to understanding the role of "reliability professionals" prominent in the operation of HROs [19,20]. These are individuals who have special perspectives on safety and reliability, cognitively and normatively. They mix formal deductive knowledge and experiential knowledge in their understanding of the systems they operate and manage. Their view of the "system" is larger than their formal roles and job descriptions, and frequently centers on real-time activities. They internalize norms and invest their identity in the reliable and safe operation of their systems. In this they are "professionals" on behalf of reliability and safety, but not defined by particular degrees or certifications.
This degree of granularity allows the identification of SMS implementation down to the level asserted as important in the first proposition: to be successful it must include achieving a widely distributed acceptance of safety management and safety culture as an integral part of actual jobs down to the level of specific tasks. Micro-level analyses can lead to metrics that can be indicators of this degree of implementation. Note that the shift to this micro level also means an analysis of actions and behaviors over short-time intervals, in the real-time operation of a technical system.

Shifts in Scale: Macro-analysis
At the other end of high resolution is the ability to analyze actions and behaviors over larger scales and scope and with effects over considerably longer time intervals.
Here the analysis and measurement would move beyond a single organization and its SMS to cover network safety and reliability [4]. This leads to a consideration of safety management in relation to interconnected risks among infrastructures [20] and across sectors.
Transmission planning for large utility grids, for example, is a process that can cut across large populations and across nations. Generally, it has to look ahead over a 5-10-year time frame to anticipate electricity demand patterns and new generation technologies as well as to encompass the time it takes to translate plans into actual construction of new transmission lines and capacity. But as one grid management analyst noted: "What goes on in planning eventually ends up in operations." That is, activity and management on this time frame will eventually impose itself on day-to-day real-time grid operations.

Elongated Time Frames
Many interconnected risks span an international and even a global scale and an intergenerational time-frame. Problems such as global climate change and sea-level rise are slow-motion issues which convey inter-generational risk. These safety management problems will need to be addressed across many different sectors on a global scale over the next 20-50 years.
Similarly, the long-term effects of nuclear waste disposition and storage are safety management challenges, but they require planning and possibly ongoing safety management attention over decades, if not centuries. We currently pay attention to planning for reliability of infrastructures, but we will have to pay more attention, with metrics, to reliable planning itself as a management process. Larger scales and longer time frames also require that safety management be supported by social policy and regulation.
Analyses of safety management across these scales and time frames can lead to a higher resolution additive understanding of organizational and managerial factors in safety and reliability, running from macro to micro levels of analysis over long- and short-term time frames. Then we can analyze the safety interconnections between the levels and time scales: how what happens or does not happen at one level of planning and management scale can affect the safety of operating conditions at another, and how culture, roles and psychology surrounding individuals in their specific tasks can affect their performance and how this performance in turn impacts system safety well beyond that task. The following figure (see Fig. 1) is one integrated illustration of the scale and scope of organizational and managerial attention in relation to the time frame needed for action to promote safety.
Fig. 1 A higher resolution safety management framework

Conclusion
This paper began with an expression of disappointment over the lack of progress in integrating organizational and management variables with physical models into our understanding of technologies as socio-technical systems.
It concludes with the recognition that it will take a large and persistent R&D effort to achieve the integration of organizational and managerial variables as safety management metrics into the physical analysis of technical systems. But an integrated understanding of socio-technical systems, across scales, scopes and time, could significantly add to our understanding of how to manage and ultimately design them for increased safety.