A wide spectrum of services provided by intelligent critical infrastructures (e.g. Smart Grids) depends heavily on Cyber-Physical Systems (CPS) that are able to monitor, share and manage information. At the same time, the growing number of cyber attacks and security breaches is part of a rapidly expanding cyber threat, which in many cases takes the form of cyber terrorism.
Cyber-physical security can be analysed from the classical crisis management point of view. In fact, most incident management processes in the cyber domain follow the ITIL model depicted in Fig. 2. It focuses on incident detection, diagnosis (e.g. identification of the exploits that the attacker used), repair (e.g. elimination of the software vulnerability that the attacker exploited), recovery and restoration (e.g. to normal business operation status).
However, this type of model may not properly show the iterative nature of the continuous improvements that are usually implemented after the crisis as part of the lessons learnt. Therefore, a model of the cyber security life-cycle should define how to prevent, detect, respond to and recover from a cyber crisis, and finally how to avoid its reoccurrence. Thus, we can define the Cyber Attack Timeline, illustrated in Fig. 3, which consists of the following three phases:
- A Pre-Crisis (Steady State) phase, in which the organization aims at providing all services as usual while increasing its preparedness for a critical event. For this phase it is important to have a risk management process that allows the organization to anticipate risks and respond proactively.
- A Crisis phase, in which a threat has to be contained and the system recovered. It is an emergency case in which it is necessary to change the approach so that threats can be quickly removed and their effects mitigated.
- A Post-Crisis phase, during which the lessons learned from the Crisis phase are fed back into the whole process in order to reduce the impact of future crises.
In this section, we further elaborate on different aspects of CPS cyber security that fall within the crisis management phases, namely prevention, detection, containment, and post-incident activities.
4.1 Pre-crisis Phase
4.1.1 Prevention and Proactive Response
Cyber security prevention is an important aspect of cyber-physical systems because of their impact on critical infrastructures. It requires resources to be allocated; however, this is preferable to an often costly recovery (or, in the worst case, no recovery at all). Although the value and importance of prevention are well acknowledged in the communities, prevention is still in many cases perceived as a product that can be purchased and deployed in an organisation. In fact, prevention is a long-lasting and continuous process that reaches far beyond technical problems and embraces organisational, regulatory, and human aspects.
In particular, cyber attack prevention requires well-established roles within the organisation that are responsible for containing a cyber attack and its causes. This implies that an organisation should define a detailed cyber incident response plan that describes how an incident should be reported, investigated and responded to (Fig. 4). Moreover, when a cyber incident involves personal information, various data privacy and security laws apply, and these may differ from country to country.
As mentioned in [21], it is very important for Critical Infrastructure operators to identify the risks posed by communication networks and by dependencies on third-party systems. This is even more important from a wider perspective, because such risk anticipation can prevent cascading failures that cause catastrophic system damage.
The risk management cycle is a comprehensive process (Fig. 2) that requires organizations to:
- frame the risk (i.e., establish the context for risk-based decisions),
- assess the risk,
- respond to the risk once determined,
- monitor the risk.
Usually this requires effective communication and an iterative feedback loop that facilitates continuous improvement of the risk-related activities.
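The four steps of the cycle and its feedback loop can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration only: the `Risk` class, the likelihood-times-impact score, the acceptance threshold and the rule that each response halves the likelihood are hypothetical and are not taken from any of the cited frameworks.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    """Hypothetical risk record; likelihood and impact are on a 0..1 scale."""
    name: str
    likelihood: float
    impact: float
    mitigated: bool = False

    @property
    def score(self) -> float:
        # Assumed scoring rule: likelihood multiplied by impact.
        return self.likelihood * self.impact


def frame(threshold: float) -> dict:
    """Frame: establish the context for risk-based decisions."""
    return {"threshold": threshold}


def assess(risks: list) -> list:
    """Assess: rank the risks from highest to lowest score."""
    return sorted(risks, key=lambda r: r.score, reverse=True)


def respond(risks: list, context: dict) -> None:
    """Respond: apply a control to every risk above the threshold."""
    for risk in risks:
        if risk.score > context["threshold"]:
            risk.likelihood *= 0.5  # assumed effect of a security control
            risk.mitigated = True


def monitor(risks: list, context: dict) -> list:
    """Monitor: report the risks that still exceed the threshold."""
    return [r for r in risks if r.score > context["threshold"]]


def risk_cycle(risks: list, threshold: float, max_iterations: int = 10) -> int:
    """Repeat assess, respond and monitor until no risk exceeds the threshold.

    Returns the number of iterations that were needed.
    """
    context = frame(threshold)
    for iteration in range(1, max_iterations + 1):
        ranked = assess(risks)
        respond(ranked, context)
        if not monitor(ranked, context):
            return iteration
    return max_iterations
```

In this sketch the output of the monitoring step decides whether another pass is needed, which is one simple way to picture the iterative feedback loop described above.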
As suggested by ENISA [22], a good practice for well-suited prevention mechanisms is to subscribe to relevant information sources that give an up-to-date overview of current cyber threats and reported incidents. ENISA also stresses the importance of information sharing.
A more local (service-based) approach to risk modelling has been proposed by OWASP [23]. The approach follows the idea of decomposing a complex system into smaller components (see Fig. 5, Threat Risk Modelling proposed by OWASP). It is important to stress that all key players (e.g. security officers, employees) need to understand the security objectives. Therefore, the complex system is usually broken down into objectives such as reputation, availability, financial, etc. Other security objectives may be imposed by law (e.g. financial or privacy laws) or by adopted standards (e.g. ISO).
The key element of this risk assessment methodology is the identification of possible threats. Microsoft has suggested two different approaches to identifying those threats. One is a threat graph (see Fig. 6), as shown in Fig. 2, and the other is a structured list.
4.1.2 Threat Detection
The capability to detect cyber threats early is a very important element of good cyber crisis preparedness. Probably the most classic way to categorise cyber attack detection techniques is to assign them to one of the following groups: signature-based, anomaly-based or hybrid (Fig. 7).
Each of these classes of algorithms has its own advantages and drawbacks, and takes a different approach to identifying attacks. Some of the methods also differ in how data are aggregated (e.g. host-based or network-based) and how traffic properties are described (e.g. packet-based analysis or aggregated connection flows). All the above-mentioned aspects are discussed in the following subsections.
The signature-based category of cyber attack detection typically includes Intrusion Detection and Prevention Systems (IDS and IPS), which use a predefined set of patterns (or rules) in order to identify an attack. The patterns (or rules) are typically matched against the content of a packet (e.g. TCP/UDP packet header or payload). Commonly, IDS and IPS are designed to increase the security level of a computer network through detection (in the case of IDS), or detection and blocking (in the case of IPS), of network attacks.
Commonly, the attack patterns for IPS and IDS software are provided by experts from the cyber security community. Typically, for deterministic attacks it is fairly easy to develop patterns that clearly identify a given attack. This is often the case when a given piece of malicious software (e.g. a worm) always uses the same protocol to communicate through the network with a command and control centre or with other instances of that software. However, developing new signatures becomes more complicated when it comes to polymorphic worms or viruses. Such software commonly modifies and obfuscates its code (without changing the internal algorithms) in order to be less predictable and harder to detect.
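The matching step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two signatures, their names and the `handle_packet` helper are hypothetical, and real IDS/IPS products use far richer rule languages than plain regular expressions over the payload.

```python
import re
from typing import List, NamedTuple


class Signature(NamedTuple):
    """A named pattern matched against raw packet payload bytes."""
    name: str
    pattern: re.Pattern


# Hypothetical signatures, invented for this example only.
SIGNATURES = [
    Signature("example-c2-beacon", re.compile(rb"BEACON id=\d+")),
    Signature("example-path-traversal", re.compile(rb"\.\./\.\./")),
]


def match_signatures(payload: bytes) -> List[str]:
    """Return the names of all signatures found in the payload."""
    return [sig.name for sig in SIGNATURES if sig.pattern.search(payload)]


def handle_packet(payload: bytes, prevent: bool) -> str:
    """Decide what to do with a packet.

    With prevent=False the function behaves like an IDS (alert only);
    with prevent=True it behaves like an IPS (block on a match).
    """
    if not match_signatures(payload):
        return "pass"
    return "block" if prevent else "alert"
```

A payload such as `b"BEACON id=42"` matches the first signature, whereas a trivially altered variant such as `b"BEAC0N id=42"` does not. This is the weakness that polymorphic malware exploits: the behaviour stays the same while the bytes that the signature relies on change.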
4.2 Crisis Phase
In this phase risk management is no longer the priority; precedence is given to incident management in order to resolve the crisis and mitigate threats by adopting proper countermeasures. However, it is worth mentioning that the emergency and contingency procedures applied during the Crisis phase are developed during the Pre-Crisis phase. In other words, during the Crisis phase it is important not only to have an overall situational awareness picture, but also to have a strategy to recover from the crisis in the most efficient way possible.
There are different models for cyber incident handling. For instance, ENISA defines (see Fig. 8) a formal procedure that starts with incident reporting, goes through analysis and recovery, and concludes with post-analysis followed by proposals for improvement. This model of cyber crisis response is widely adopted by Computer Emergency Response Teams (CERTs). According to the definition provided by ENICS [24], CERTs are the key institutions that are obliged to receive, inform about and respond to cyber security incidents. At the same time, they act as educational entities in order to raise cyber-related awareness and provide a primary security service for government and citizens.
Every country that is connected to the Internet should have the capability to respond to cyber-related security incidents. Nevertheless, not every country has such capabilities. One of the earliest CERT teams focused on critical infrastructures was the US ICS-CERT (Industrial Control Systems Cyber Emergency Response Team), which was established in 2009 [25]. This institution aims at reducing the impact of cyber attacks. To achieve this goal, ICS-CERT takes preventative actions such as vulnerability monitoring and reporting (each year ICS-CERT releases a report in order to spread information about security incidents).
However, before the actual incident handling takes place, the incident is usually verified and pre-classified in order to assess its significance, its severity and the time constraints for resolving it. This activity is called triage and refers to a situation in which resources are limited and the decision maker has to prioritise actions according to the severity of the particular cases.
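Triage can be pictured as a priority queue over verified incident reports. The following Python sketch is a simplified illustration, not a prescribed procedure: the `Incident` fields, the 1 to 5 severity scale and the ordering rule (higher severity first, then the tighter deadline) are assumptions introduced here.

```python
import heapq
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Incident:
    """Hypothetical incident report used only for this example."""
    name: str
    severity: int          # assumed scale: 1 (low) to 5 (critical)
    deadline_hours: float  # assumed time available to resolve the incident
    verified: bool = True


def triage(reports: Iterable[Incident]) -> Iterator[Incident]:
    """Yield verified incidents in the order they should be handled.

    Unverified reports are dropped. The remaining incidents are ordered by
    severity (highest first) and then by deadline (shortest first). The
    arrival index breaks ties so Incident objects are never compared.
    """
    heap = []
    for arrival, incident in enumerate(reports):
        if not incident.verified:
            continue
        key = (-incident.severity, incident.deadline_hours, arrival)
        heapq.heappush(heap, (key, incident))
    while heap:
        yield heapq.heappop(heap)[1]
```

For example, given a low-severity phishing report, two critical incidents with different deadlines and one unverified report, `triage` returns the critical incident with the shorter deadline first and never returns the unverified one.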
An important aspect that is not directly reflected by the incident handling model is the fact that CERTs also collaborate with other Computer Emergency Response Teams that are part of international or private sector institutions. This cooperation allows the CERTs to share information about control systems-related security incidents and mitigation measures.
4.3 Post-crisis Phase
The post-crisis phase is the phase in which the threat has been eliminated and the system has been repaired, thus allowing the provided services to be restored and usual business activities to resume.
As recent cyber incidents show, it is important for Critical Infrastructure operators to have employees who are educated and skilled in cyber security. The post-crisis phase is the time for an organisation to draw conclusions from the crisis and to use this as an opportunity to increase the number of cyber security professionals at various levels of skill and competence, as well as to upgrade the competence of the existing staff.
In fact, learning from previous experiences is a continuous process for the organisation. According to the terminology adopted in [26], this problem can be decomposed into:
Obviously, in order to address all of the above-mentioned aspects, it is necessary to have resources that allow relevant data to be gathered and analysed. In many cases, dedicated tools that provide the end user with such functionality are used. In the post-crisis phase it is particularly necessary to collect the lessons learnt and analyse the overall crisis scenario from a wider perspective in order to identify the root cause of the crisis and any procedural pitfalls. In particular, a new risk analysis must be performed in order to evaluate whether the previously defined security controls are still effective and to estimate whether risk levels have changed.