The goal of this work is to present an architecture that overcomes the limitations of the current literature on security frameworks and distributed platforms.
The proposed architecture is logically composed of four macro-blocks. Each macro-block is characterized by a set of components that implement functionalities specific to the macro-block they belong to. The first macro-block is implemented locally in the Local Agents of digital services. It comprises all the parts that add security capabilities to the local services for monitoring, inspection, and enforcement purposes. The second macro-block is the Security Manager. It constitutes the centralized part of the framework and includes all the components that collect and process data from local services, implementing mitigation and reaction strategies. The third macro-block is the Identity Management. It permeates the large majority of the components of the framework, being present in both the local and centralized parts. The Identity Management is mainly responsible for the coordination of digital identities and access policies, and performs identity protection and access control functions. The fourth macro-block is the User Interface. It is also implemented locally, and regulates the human–machine interaction for a tailored presentation of analytics to different kinds of users and for the definition of control and management policies to react to security issues.
Overall, this reference architecture follows the typical structure of Security Information and Event Management (SIEM) systems. However, some relevant extensions are necessary to effectively tackle the technical and procedural challenges brought by the dynamic composition of digital services. In this respect, interactions among the macro-blocks described above are performed through standardized security Application Programming Interfaces (APIs), exposed by any digital service, be it a (cloud) application, a virtualization infrastructure, a serverless function, an IoT device, etc. They are implemented at both the control and data planes. In the control plane, APIs deliver control and management data used to discover security capabilities and to enable, disable, and configure the security functions in the Local Agents. In the data plane, APIs allow the local security functions to report the collected events, data, and measurements to the Security Manager for the application of advanced security services. All these components are described in detail in the subsections that follow.
Local agents are in charge of collecting service descriptors, events, data, and logs (collectively referred to as the security context). The purpose is to expose some internal information of each service, so as to allow the detection of multi-vector threats and to improve the trust in service operation. Multiple agents should be present to collectively cover at least the following scopes:
inspection: collection of data, events, measurements from heterogeneous sources (application logs, system calls, network traffic) that can be used to detect attacks and identify new threats;
data tracking: tracking of data belonging to users through metadata, with explicit identification of personal and sensitive information that may raise privacy issues;
configuration analysis: reporting of incorrect, faulty, or weak settings, such as lack of encryption, weak or blank passwords, unnecessary network sockets in listen state, outdated or buggy software versions, etc.;
certification: attestation of the origin and integrity of the software component, identity of the vendor/seller, etc.
An exporter function is responsible for authorizing access by any remote party, according to the settings of the owner, and for configuring the reporting behavior, e.g., by changing the frequency and/or verbosity of context information. An enforcer function applies enforcement policies: packet classification and filtering, removal of private and/or sensitive data, configuration changes. Enforcement also covers data protection, by ensuring that data are accessed, shared, and exported according to the owner's policies in terms of data minimization, purpose limitation, integrity, and confidentiality.
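To make the exporter role more concrete, the following Python sketch (with hypothetical class and field names, not taken from any existing implementation) shows how a local agent could gate access to its security context and let an authorized party tune the reporting behavior:

```python
import time
from dataclasses import dataclass, field


@dataclass
class ReportingConfig:
    """Tunable reporting behavior of a local agent."""
    interval_s: int = 60          # how often context records are pushed
    verbosity: str = "summary"    # "summary" or "full"


@dataclass
class Exporter:
    """Hypothetical exporter: authorizes remote parties and ships the context."""
    owner_policy: set = field(default_factory=lambda: {"security-manager"})
    config: ReportingConfig = field(default_factory=ReportingConfig)

    def authorize(self, party_id: str) -> bool:
        # Only parties allowed by the owner's settings may access the context.
        return party_id in self.owner_policy

    def reconfigure(self, party_id: str, **settings) -> None:
        # Change frequency and/or verbosity, but only for authorized parties.
        if not self.authorize(party_id):
            raise PermissionError(f"{party_id} may not reconfigure this agent")
        for key, value in settings.items():
            setattr(self.config, key, value)

    def report(self) -> dict:
        # One security-context record; the fields are purely illustrative.
        record = {"timestamp": time.time(), "service": "web-frontend",
                  "events": ["login_failure", "login_failure"]}
        if self.config.verbosity == "full":
            record["raw_logs"] = ["..."]   # detailed payload only on request
        return record
```

For instance, the Security Manager could call reconfigure("security-manager", interval_s=10, verbosity="full") to temporarily raise the level of detail during an investigation.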
Despite the large number of tools already available for monitoring and inspection, their usage in a multi-tenancy context is not straightforward. As a matter of fact, they must give visibility over local resources to external entities, and it is challenging to restrict their scope to a subset of resources in case of multi-tenancy. It is also important to ensure that only authorized entities have access to these components, to avoid turning them into an additional threat.
A very important requirement is that local agents be lightweight, so as not to require additional resource allocation, with a small footprint on service execution. They also have to be efficient without increasing the attack surface. Security functions in local agents are controlled by a local management and control component, which is responsible for managing the software of the functions, reporting information on their correct utilization, monitoring their internal structure, and generating report messages. It can also inspect traffic for security purposes and, in any case, it provides descriptive information about the security functions.
The implementation of the security agents should be tailored to specific services, given the large heterogeneity of digital resources: applications, devices, functions, SaaS, or even more complex resources like a cloud infrastructure or an NFV framework. In the latter case, two implementation scenarios can be realized. In the first scenario, NFV is viewed as a digital service itself, providing connectivity and networking functions on demand. Security agents can be used to monitor both the VNFs and the virtualization infrastructure. Here, a management and orchestration functional block (e.g., the NFV-MANO) is needed to manage and orchestrate the VNFs, but only partially, since some security agents may reside in the infrastructure and are therefore not manageable by an orchestrator. In the second scenario, single VNFs represent digital services orchestrated by the NFV-MANO, which can then be used to automatically deploy and manage security agents within each VNF. These two examples give an idea of the different possibilities to implement local agents in a virtualized infrastructure, and of how the NFV-MANO can be employed to manage and orchestrate the VNFs.
Remote collection of logs is already a well-established practice, with many frameworks available for this purpose (Scribe, Flume, Heka, Logstash, Chukwa, fluentd, NSQ, and Kafka). From a research perspective, the real challenge is programmability, i.e., the capability of the framework to dynamically adapt its operation to continuously evolving attack patterns, defining and updating monitoring, inspection, and enforcement tasks accordingly. It goes beyond plain configurability at run time (e.g., adjusting the verbosity of logs, the frequency of sampling, and other tunable parameters), since programmability also includes the definition of new tasks, by injecting lightweight yet secure code on-the-fly, without the need for full or partial re-design of the whole system or of some of its components. For example, it could enable tailored analysis of network packet bodies locally, without developing new full-fledged inspection modules. The target is more flexible operation than today, allowing lightweight processing during normal operation, while moving to deeper inspection (and larger overhead) at the early stage of any suspicious anomaly, or upon triggers from cyber-threat intelligence. Task offloading to local services helps balance the trade-off between processing and network overhead in an effective way, tailoring the broad range of local capabilities to the specific nature of the digital service.
Fortunately, the modern technologies selected for this task are not resource-hungry, so resource allocation is not a problem, as explained in a preliminary study on this topic. At the same time, this kind of flexibility allows a more efficient allocation of resources, by dynamically adapting the processing load to the evolving context. Such an approach is very useful whenever detection is based on techniques (like ML or Artificial Intelligence) that largely rely on the extraction and analysis of features that cannot be known in advance, since attacks evolve and new threats emerge; it thus effectively addresses the need to track the continuous evolution of attack patterns and to investigate or react to zero-day attacks. Indeed, in the latter case, static configuration options might not be enough to extract and evaluate unexpected features in real time. Summarizing, programmability is implemented in the control plane of each local agent and develops along two main directions:
The operational parameters (log files, configurations, current status of the system, filtering events, etc.) are modified at run-time, according to pre-defined templates, patterns, and options.
Security programs can be on-boarded without re-designing, re-deploying, or even re-starting local agents. In this case, the same framework is also responsible for verifying the authorization, integrity, and safety of any piece of code that is injected into remote objects.
Programming models should target lightweight tasks, so as not to overwhelm resource-constrained devices, and execution in safe sandboxes, to limit the damage from compromised code. A promising technology for this purpose is the extended Berkeley Packet Filter (eBPF), which currently provides inspection capabilities for both network packets and system calls.
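As a minimal illustration of this kind of run-time injection, the following sketch uses the bcc toolkit (assuming it is installed and the agent has the necessary privileges) to load a small eBPF program that counts execve() system calls per process, a task that could be pushed to a local agent on-the-fly without redeploying the service:

```python
import time
from bcc import BPF

# Small, sandboxed eBPF program: count execve() calls per process.
PROGRAM = r"""
BPF_HASH(counts, u32, u64);

int trace_execve(void *ctx) {
    u32 pid = bpf_get_current_pid_tgid() >> 32;
    counts.increment(pid);
    return 0;
}
"""

bpf = BPF(text=PROGRAM)
bpf.attach_kprobe(event=bpf.get_syscall_fnname("execve"), fn_name="trace_execve")

# Periodically export the counters as part of the security context.
while True:
    time.sleep(10)
    snapshot = {key.value: count.value for key, count in bpf["counts"].items()}
    print(snapshot)   # in a real agent this would be handed to the exporter
```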
The Security Manager is the most valuable and innovative component of the proposed architecture. It is responsible for collecting and sharing the security context among multiple detection and identification algorithms, according to the overall objectives and behavior described by high-level user policies. As shown in Fig. 3, multiple logical components are required to implement the Security Manager.
The first task of the Context Broker is to manage the heterogeneity of sources and protocols, which is reflected in different data and control interfaces. The Context Broker hides this heterogeneity and exposes a common data model to the other components for discovering, configuring, and accessing the security context available from the execution environment (namely, the different digital resources).
The Context Broker also has capabilities of data abstraction, fusion, and querying. The flexibility in programming the execution environment is expected to lead to a large heterogeneity in the kind and verbosity of the data collected. For example, some virtual functions may report detailed packet statistics (i.e., those at the external boundary of the service), whereas other functions might only report application logs. In addition, the frequency and granularity of reporting may differ for each service. The definition of a security context model is therefore necessary for security services to know what could be retrieved (i.e., capabilities) and what is currently available, how often, and with what granularity (i.e., configuration).
Data aggregation and fusion capabilities help distill refined information from the large set of events and data collected by the local agents. A common abstraction should be used to expose such capabilities in a consistent way, by organizing and aggregating data coming from local agents into features. A feature identifies what kind of data has to be extracted from the whole dataset that local agents can generate; it is a kind of data "subsampling". Possible examples of data representing features are: sections of logs, specific fields of network packets, performance metrics, Operating System indicators, events from applications, protocols, traffic statistics, etc. The choice of the extracted features depends on the threat under analysis and is a critical issue for the correct identification of current and future threats, but it is helpful for two reasons. First, resources are saved locally, in line with the programmability requirement, because features are usually a small subset of all the data that local agents can provide. Second, a feature is the same whatever the number and type of agents and the service implemented; so, whatever the agents/services added on-the-fly, the detection and analysis procedures are not modified.
The correct identification of the most appropriate features is very challenging, because it depends on the service topology, the agents mapped onto it, the type of attack to be detected, and how the attack detection is carried out. The better the suitability of the extracted features, the more effective the security service in its detection and analysis operations (security services are described in detail in Sect. 4.2.3).
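A minimal sketch of this data "subsampling", with purely illustrative feature names, could look as follows:

```python
from typing import Callable

# Each feature maps a heterogeneous agent record onto the small, stable slice
# of data that security services actually consume.
FEATURES: dict[str, Callable[[dict], object]] = {
    "failed_logins": lambda rec: rec.get("events", []).count("login_failure"),
    "syn_packets":   lambda rec: rec.get("tcp_flags", {}).get("SYN", 0),
    "cpu_load":      lambda rec: rec.get("metrics", {}).get("cpu", 0.0),
}


def extract_features(record: dict) -> dict:
    """Reduce one agent record to the feature vector used for detection."""
    return {name: fn(record) for name, fn in FEATURES.items()}
```

Whatever agents or services are added on-the-fly, detection keeps operating on the same feature names; only the mapping from raw records to features needs to be extended.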
Correlation of data in the time and space dimensions naturally leads to concurrent requests for the same kind of information for different time instants and functions. In this respect, searching, exploring, and analyzing data in graph databases should be considered as implementation requirements. Indeed, unlike tabular databases, graph databases support fast traversal and improve look-up performance and data-fusion capabilities. Finally, the last implementation requirement is the ability to perform quick look-ups and queries, also including some form of data fusion. This would allow clients to define the structure of the data required, with exactly that structure being returned by the server, thus preventing excessively large amounts of data from being returned. This aspect could be very useful during investigation, when the ability to understand the evolving situation and to identify the attack requires retrieving and correlating data beyond typical query patterns.
Another feature of the Context Broker is data storage. Given the very different semantics of the context data, the obvious choice is non-relational databases (NoSQL). This makes it possible to define different records for different sources, but it also poses the challenge of identifying a limited set of formats, otherwise part of the data might not be usable by some security services. The validity and volume of data affect the size of the database and the need for scalability. Local installations are suitable when data are kept for days or months, but cloud storage services may be necessary for longer persistence or larger systems. On the other hand, remote cloud storage is not suitable for real-time or even batch analysis. Another design issue is the possibility to scale out horizontally and/or the native support for parallel processing and big-data analytics, if the data volume becomes large.
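As an illustration, assuming a document-oriented backend such as MongoDB (collection and field names are hypothetical), the Context Broker could persist and retrieve heterogeneous records roughly as follows:

```python
from datetime import datetime, timedelta
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
context = client["security_context"]["records"]

# Records from different sources may carry different fields: no fixed schema.
context.insert_one({"source": "vnf-firewall", "ts": datetime.utcnow(),
                    "tcp_flags": {"SYN": 1520}})
context.insert_one({"source": "web-frontend", "ts": datetime.utcnow(),
                    "events": ["login_failure"]})

# Selective query: only recent records of one source, only the needed fields.
recent = context.find(
    {"source": "vnf-firewall",
     "ts": {"$gte": datetime.utcnow() - timedelta(hours=1)}},
    {"tcp_flags": 1, "ts": 1, "_id": 0},
)
for record in recent:
    print(record)
```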
The first task of the Context Programmer is to manage the programmability of local agents which, as detailed in Sect. 4.1, is the capability to shape the depth of inspection according to the current need, in both the spatial and temporal dimensions, so as to effectively balance granularity of information with overhead. This is a novelty with respect to existing frameworks, which have an intrinsic rigidity in their analysis and detection procedures, often based on, and dependent on, the data sets generated by each agent.
Programmability also includes the capability to offload lightweight aggregation and processing tasks to each local environment, hence reducing bandwidth requirements and latency. This changes the reporting behavior by tuning parameters that are characteristic of each application (logs, events), of network traffic, of system calls (e.g., disk read/write, memory allocation/deallocation), of remote procedure calls toward remote applications (e.g., remote databases), etc. The Context Programmer is the logical element that offers a homogeneous control interface for configuring and programming the different data sources, by implementing the specific protocols (control channel). The Context Programmer also has a context discovery layer. Context discovery must manage an evolving topology by discovering new components that join or leave the service and that cannot be deployed and managed freely, since the related resources belong to SPs that are very often external to the framework. Since different actors are usually involved in the same service chain, access to the context is subject to identity management and access control. By selectively querying all the components involved in the chain, this layer builds the logical topology of the overall service, including the security properties and capabilities of each node.
The Context Programmer can also push pre-defined programs from a programs library. The programs library is a collection of software that can be injected into the programmable hooks present in the execution environment. Different languages can be used by different hooks, e.g., ELF binaries, Java bytecode, Python scripts, or P4/eBPF programs. Such programs are written and compiled offline and then inserted in the library through the Security Dashboard. They also include metadata for identification and description, so that they can be easily referenced by the Security Controller.
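A sketch of how an entry of the programs library could be described, with hypothetical metadata fields, is the following:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LibraryProgram:
    """Metadata attached to an injectable program in the library."""
    program_id: str    # identifier used by the Security Controller
    description: str   # human-readable purpose
    language: str      # e.g. "eBPF", "P4", "Java bytecode", "Python"
    target_hook: str   # where it can be injected (network, syscalls, ...)
    sha256: str        # integrity reference checked before injection
    payload: bytes     # the program itself, compiled/packaged offline


example = LibraryProgram(
    program_id="ebpf-syn-counter",
    description="Per-flow SYN counter for volumetric DoS analysis",
    language="eBPF",
    target_hook="network",
    sha256="<digest computed offline>",
    payload=b"...",
)
```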
From a security perspective, it is important to formally verify the programs' safety and trustworthiness. This is implicitly guaranteed whenever the code is executed within an execution sandbox; in the case of general-purpose languages, instead, the correctness and safety of the source code might be verified by static source-code analysis tools.
One of the main advantages of collecting heterogeneous security information in a centralized repository is the possibility to carry out analysis and correlation well beyond the typically limited scope of existing security functions (Denial of Service (DoS) detection, IPS, IDS, antivirus, etc.), and in a far more efficient way, i.e., without replicating monitoring and inspection operations. This is the main task of the security services, which process data by exploiting possible correlations between apparently independent events that may come from the same multi-vector attack. Their main features are detection and assessment, based on specific security policies that can allow or deny a service depending on policy-dependent requirements. They are also conceived for log analysis; for example, depending on the monitored activities reported by logs coming from different digital services, they can detect traffic anomalies and flag them as suspicious activity. Security services are placed in the centralized part of the framework; they compare data coming from the Context Broker with pre-defined security and control policies, and take automatic actions accordingly. In turn, the Context Broker exposes to them a common security context, abstracted from data coming from heterogeneous sources and protocols with different data and control interfaces.
Security services should run dynamically, possibly being combined together to carry out more complex analysis and assessment tasks. The security service components must be created ad hoc, so that they are well-defined and have compatible APIs. The ambitious goal is to guarantee full interaction among them, through common and standardized API semantics. Accordingly, an entity responsible for managing and orchestrating the execution of security services is needed in the Security Manager. This management entity is also responsible for the right choice of the applications based on their interface compatibility, so that the exchange of data and control information is guaranteed.
Security services can also run in a virtualized environment, in containers or VMs, with a dynamic allocation of resources for scalability and optimization purposes, and without keeping a tight bond between the running software and the underlying hardware environment. So, there is virtually no limit to the number and types of security services that can be implemented: verification of trust properties, intrusion detection, DoS detection, remote attestation, etc. This is the same principle at the basis of the NFV architectural framework as described in , so it is not a novelty by itself. The real novelty resides in the application context of this part of the framework, which is totally different from the NFV counterpart for two reasons. First, security services are not network functions and do not provide a network service. Second, the Context Broker abstracts control and information data at a high abstraction layer, which can be seen as transparent towards the underlying network layer at which packets are processed (please refer to Sect. 4.2.2 for details on this aspect).
Beyond the mere re-implementation of legacy appliances for performance and efficiency matters, the specific research challenge is how to implement a new set of security services that detect anomalies and threats effectively and proactively. From this point of view, a possible and interesting approach is the adoption of ML algorithms. As is known, they have the capability to extract various patterns, which can be seen as sequences of subsampled data, that identify legitimate or malicious activities, based on the fact that the behavior of a traffic pattern under attack differs from that of a normal traffic pattern [17, 44]. ML algorithms make it possible to learn the patterns that characterize the normal behavior of a feature, so as to recognize differences that can be identified as possible threats and attacks, and all this independently of the configuration of the local agents. This aspect is very useful in this scenario, since local agents are almost always implemented externally to the framework.
The features making up the context can be used to train the ML algorithms that, in turn, will detect attacks and anomalies by discovering differences between the patterns learnt for normal traffic and the patterns analyzed at run time. The strength of this approach is that ML algorithms are able to model the pattern behavior without rigid and pre-defined rules, which are instead created in the training phase by the algorithms themselves. The main difficulty of this kind of approach is that new threats, or even variants of existing ones, can affect features different from those chosen to detect traffic anomalies; so, as remarked in Sect. 4.2.2, it is of great importance to correctly choose the set of features used to feed the ML algorithms and to instruct the local agents. In this context, an analysis of the correlations among features can be of great help, since the relationships between different pattern behaviors help improve the effectiveness of the detection process. In fact, if the data extracted for different features are correlated, the behavior of one feature influences the others, allowing ML algorithms to detect new threats more effectively as soon as they alter the normal behavior of a feature. Capturing correlations among features to feed ML algorithms is actually a challenge, given the wide variety of data coming from local agents that can be handled by ML algorithms and used to build the context.
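A minimal sketch of such a correlation analysis, computed over time series of the features introduced above (synthetic data, for illustration only):

```python
import numpy as np

# Rows: observation windows; columns: features
# (e.g., failed_logins, syn_packets, cpu_load).
rng = np.random.default_rng(0)
history = rng.normal(size=(500, 3))
history[:, 1] = 0.8 * history[:, 0] + 0.2 * history[:, 1]  # induce a correlation

corr = np.corrcoef(history, rowvar=False)
print(corr)  # strongly correlated feature pairs are good joint inputs for detection
```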
A broader classification of security services includes the features of attack detection, threat identification, data tracking, trust and risk assessment.
Attack Detection: It is the capability to monitor the system behavior in order to recognize activity patterns that can be associated with known threats and attacks. Rule-based detection algorithms show their limits in the time needed to define new rules and to push updates to every installation. Similarly, the creation of legitimate usage profiles is a complex and cumbersome task, which must be tailored to each environment and use case. The challenge here lies in adding more intelligence to process the security context and to correlate even apparently uncorrelated heterogeneous events and data (network traffic, log files, user behavior) on different systems. This adds more flexibility to the detection process, freeing the algorithms from rigid and pre-defined rules and increasing their robustness in the detection of novel attacks, especially zero-day ones. Accordingly, the detection of such attacks is a peculiarity of this specific part of the framework. The effectiveness of this capability strongly depends on the choice and/or development of the specific algorithms to be run as security services. This is left to the implementation choices, and the algorithms can be developed ad hoc for this purpose or chosen among those that handle the detection of zero-day attacks. What is important to remark here is that this architectural solution makes it possible to overcome the heterogeneity of the external infrastructures, each with its own capability of detecting zero-day attacks. In this respect, the previously cited ML methods, including (but not limited to) K-Nearest Neighbors, Naive Bayes, Graph Kernel, and Support Vector Machine, can help in this direction [6, 30, 46].
Threat Identification: It aims at identifying anomalies and suspicious activities that deviate from the average system behavior, and tries to define new patterns for unknown attacks, all in an automated way. Although very detailed classifications and taxonomies of both attack and defense methodologies have already been identified, attacks continuously transform to circumvent detection rules in security appliances. Again, ML methods promise significant advances in this field, especially when combined with multilevel correlation analysis among the attributes of correct and malicious data [15, 29]. A possible approach in the adoption of ML algorithms is so-called supervised learning: here, the ML algorithm is trained on possible malicious patterns that deviate from normal traffic, so as to be able to recognize each of them in the detection phase. Given the impracticability of elaborating detection rules for unknown threats, an alternative and ambitious approach is unsupervised learning, whose goal is to autonomously identify anomalies, i.e., non-conforming patterns compared to a well-defined notion of normal behavior. This would also satisfy the automation requirement of the framework. The most critical point in such an approach lies in the selection of the most suitable data set used to train the ML algorithm: this data set must be composed of traffic that is not affected by anomalies of any kind. After the training phase, the ML algorithm should be able to identify unknown anomalies during the detection phase. A sketch of this unsupervised approach is given right after this classification.
Data Tracking: It represents the capability to follow the position and transfer of private and sensitive data along the business chain, to check compliance with the user's privacy policies, and to alert or remove data in case of violations. Data privacy solutions for the cloud entail the introduction of specific middleware to control and manage access to data. This works when data are shared among a pre-defined set of applications that run in a homogeneous environment, but it is more challenging to achieve in heterogeneous, dynamic, and composite systems. The recent introduction of the General Data Protection Regulation (GDPR) in Europe has boosted the interest in data privacy and sovereignty. The typical approach is limited to the procedural level, while technical enforcement solutions are still missing. The proposal in this direction lies in the adoption of security APIs in each digital service, which will make it possible to query about the presence and usage of private and sensitive data; in addition, any access to data should trigger a notification and the verification of user policies. In this way, beyond the enforcement of data access, records will be kept about the transfer of data to other services, enabling later verification of persistence and requests for removal. Here, the main challenge is the identification of new ways to trade data. Blockchain technologies might provide interesting solutions, since the problem is not far from Digital Rights Management (DRM), which is already present in recent research roadmaps.
Trust and Risk Assessment: It represents the capability to assess the reliability of the different actors and services involved in the business chain, by evaluating the appropriateness of the security properties (presence of vendor/software certification, presence of private/sensitive data, configuration settings, etc.) with respect to the user's policies, and by evaluating the risk related to security breaches. When heterogeneous services are automatically selected from different domains to be chained together, their security properties should be formally verified to satisfy the high-level trust policies (trusted vendors/countries, minimal encryption requirements, trust chains, security mechanisms, etc.) of users, who should always be aware of the weaknesses of a service and able to decide whether they are acceptable or not. Trustworthiness involves the two dimensions of identity (service owner/provider) and integrity (software). Assuming the lack of a common authentication framework worldwide, the challenge here is to build reputation models based on recursive trust relationships, similar to what is already used in e-mail systems (e.g., PGP).
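As anticipated in the Threat Identification item above, a sketch of the unsupervised approach, assuming scikit-learn and an anomaly-free training set of feature vectors (synthetic here), could be the following:

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
normal_traffic = rng.normal(size=(1000, 3))                   # anomaly-free training set
live_traffic = np.vstack([rng.normal(size=(50, 3)),
                          rng.normal(loc=6.0, size=(5, 3))])  # a few injected anomalies

detector = IsolationForest(contamination="auto", random_state=0)
detector.fit(normal_traffic)             # training phase: learn normal behavior only
labels = detector.predict(live_traffic)  # detection phase: -1 marks non-conforming patterns
print(np.where(labels == -1)[0])
```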
From an architectural perspective, each security service is only required to implement the interfaces towards the Context Broker and the Security Controller. For existing tools, this could be achieved by developing plug-ins or adapters. The interface to the Context Broker is used to retrieve relevant information, including both real-time and historical data. This interface allows selective queries to return aggregated data, with respect to multiple services and time periods. The interface to the Security Controller is used to notify security events, like threats and attacks, that may trigger some form of reaction. The description of the event may include an estimate of the accuracy of the detection, so as to trigger the collection of more detailed information; alternatively, this information could be retrieved by evaluating specific conditions on the current security context.
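The two interfaces could be captured by an abstract base class such as the following sketch, where the broker and controller objects and their query/notify methods are hypothetical placeholders:

```python
from abc import ABC, abstractmethod


class SecurityService(ABC):
    """Minimal contract a security service implements towards the framework."""

    def __init__(self, context_broker, security_controller):
        self.broker = context_broker            # interface towards the Context Broker
        self.controller = security_controller   # interface towards the Security Controller

    @abstractmethod
    def analyze(self) -> None:
        """Pull context, run detection, notify events."""


class DosDetector(SecurityService):
    def analyze(self) -> None:
        # Selective query: aggregated traffic statistics over the last hour.
        stats = self.broker.query(features=["syn_packets"], window="1h")
        if max(stats["syn_packets"], default=0) > 10_000:
            # Notify an event, including an estimate of the detection accuracy.
            self.controller.notify(event="dos_suspected", confidence=0.7,
                                   context={"feature": "syn_packets"})
```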
The combined analysis of the security context can greatly enhance the detection capabilities, especially in case of large multi-vector attacks. The challenge is clearly to merge knowledge without exposing sensitive information to external domains. In this respect, the notion of local processing and distributed security analysis as proposed here may provide an effective solution for multi-layer detection mechanisms. The combination of heterogeneous monitoring data will open the opportunity for novel detection capabilities. For example, the analysis of application logs that indicate multiple login failures may help detect attack patterns in the encrypted network traffic. From a practical perspective, however, the real range of security services will be limited by the possibility to find an acceptable trade-off between the complexity of implementing local inspection and the communication overhead.
The Security Controller represents the most valuable part of the architecture, conceived to automate as much as possible the behavior of the whole framework. It sits between the high-level policies and the context, and orchestrates the security functionalities, in line with what has already been devised in on-going initiatives. The role of the Security Controller is thus to mediate between network applications and the underlying data plane.
It can work in three alternative ways:
fully automated: the framework reacts to specific conditions based on pre-defined rules, without any intervention from humans. This is only possible for well-known threats. For example, a packet filter may be installed when the traffic streams grow beyond a given threshold. Another example is the request to isolate or remove a service upon indication of intrusion.
semi-automated: in case of unknown or complex attacks, pre-defined policies might not be able to cover all possible situations or variants, so the system may only partially respond automatically and wait for further inputs from humans. This may be the case of anomalous (yet not overwhelming) flows of packets that are temporarily blocked while waiting for additional actions from the security provider.
supervised: the system is able to react autonomously, but the likelihood or impact of possible errors suggests requiring confirmation from humans. In the same example as in the previous point, the security provider is asked for permission to block the traffic, so as to avoid disrupting any critical activity.
Automatic reaction shortens response times and unburdens humans from mechanical and repetitive tasks. However, full awareness and the need for post-mortem analysis recommend keeping track of and reporting any action to the dashboard, at least to give visibility of the occurrence of attacks.
We can give a concrete example of how the Security Controller is expected to behave in the case of a DoS attack. Detection of volumetric DoS is typically based on analytics over the network traffic. Since deep inspection of the traffic leads to high computational loads and latency, an initialization policy only requires statistics about the aggregate network traffic that enters the service, which may be collected through standard measurements reported by the kernel. The same policy also initializes an algorithm for network analytics and sets the alert thresholds. Upon detection of an anomaly in the traffic profile, an event is triggered and the Security Controller invokes the corresponding DoS policy. The policy now requires finer-grained statistics, so the Security Controller selects a packet filtering tool (e.g., eBPF) for packet classification, installs it, and configures it. The policy also requires the detection algorithms to work with the broader context information available. As soon as the analysis comes to a new detection, it triggers a new alert, this time including the relevant context (i.e., identification of suspicious flows, origins, etc.). Before deciding how to react, the mitigation policy may evaluate some conditions to check whether the suspicious flow comes from an expected user of the service, whether it has been previously put in a blocklist or in an allowlist, and whether it is acceptable based on previously recorded time series. The actions to be implemented (e.g., dropping all packets, dropping selected packets, redirecting suspicious flows towards external DoS mitigation hardware/software, stopping the service, moving part of the service or the whole service to a different infrastructure) are then notified to the Security Controller, which again translates them into a set of commands for the external service orchestrator and/or configurations and programs to be installed in the execution environment. Notifications about the detected attack and the implemented actions are also sent to the Security Dashboard.
High-Level Security Policies
Policies define the behavior of the system. Conceptually, policies do not implement inspection, detection or enforcement tasks, so they do not correspond to any existing security function (IDS/IPS, antivirus, Virtual Private Networks). Instead, they represent an additional upper layer for control of security services. Policies are therefore used to automate the response to expected events, avoiding whenever possible repetitive, manual, and error-prone operations done by humans.
The simplest way to define behavioral policies is the Event-Condition-Action (ECA) pattern, which covers a broad range of interesting cases. The definition of an ECA policy requires at least three elements (a minimal sketch is given after the list):
an Event that defines when the policy is evaluated; the event may be triggered by the data plane (i.e., detection algorithms), the management plane (i.e., manual indications from the dashboard, notifications from the service orchestrator), or the control plane (i.e., a timer);
a Condition that selects one among the possible execution paths; the condition typically considers context information such as data source, date/time, user, past events, etc.;
a list of Actions that respond to, mitigate, or prevent attacks. Actions might not be limited to simple commands, but can implement complex logic, also including some form of processing on the run-time context (e.g., to derive a firewall configuration for the running instance). They can be described by imperative languages, in the form of scripts or programs.
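A minimal sketch of an ECA policy for the volumetric DoS case discussed above, with illustrative structure and field names, is the following:

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class EcaPolicy:
    event: str                              # when the policy is evaluated
    condition: Callable[[dict], bool]       # selects the execution path
    actions: list[Callable[[dict], None]]   # respond, mitigate, or re-configure


def not_allowlisted(ctx: dict) -> bool:
    return ctx.get("src_ip") not in ctx.get("allowlist", set())


def install_per_flow_stats(ctx: dict) -> None:
    print("re-programming local agents for per-flow statistics")


def drop_suspicious_flows(ctx: dict) -> None:
    print(f"dropping flows from {ctx.get('src_ip')}")


dos_policy = EcaPolicy(
    event="traffic_volume_anomaly",
    condition=not_allowlisted,
    actions=[install_per_flow_stats, drop_suspicious_flows],
)


def evaluate(policy: EcaPolicy, event: str, ctx: dict) -> None:
    if event == policy.event and policy.condition(ctx):
        for action in policy.actions:
            action(ctx)
```

In practice, the actions would not print messages but would be translated by the Security Controller into commands for the orchestrator or programs to be installed in the execution environment.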
The range of possible operations performed by policies includes enforcement actions, but also the re-configuration and re-programming of the monitoring/inspection components in the execution environment. Enforcement and mitigation actions are mostly expected when the attack and/or threat and their sources are clearly identified and can be fought. Re-configuration, instead, is necessary when there are only generic indications, and more detailed analysis could be useful to better focus the response. A typical example is a volumetric DoS attack. To keep the processing and communication load minimal, the monitoring process may only compute rough network usage statistics every few minutes. This is enough to detect anomalies in the volume of traffic, but does not give precise indications about the source and identification of the malicious flows to stop. Re-configuring the local probes to compute per-flow statistics or more sophisticated analysis helps implement traffic scrubbing.
From a research perspective, the ambition is the definition of high-level policies in terms of objectives and intents, which could be defined even by non-technical users. The adoption of advanced reasoning models, possibly based on some form of artificial intelligence, is clearly a very promising yet challenging approach to automating the system behavior. This would open the opportunity to dynamically adapt the response to new threat vectors. In this respect, the historical analysis and correlation of events and conditions with the effects of the corresponding actions, taken by existing policies or by humans, would provide useful hints to assess the effectiveness of the latter, so as to identify and improve the best control strategies.
Identity Management and Access Control
The security context retrieved by the Context Broker contains a lot of information about service usage patterns, users, exchanged data, and so on. Access to these data should therefore be limited to authorized roles and algorithms. In addition, the configuration of the remote agents must remain a prerogative of the Security Controller and of trusted policies, so it is important to track the issuer of such commands. The Context Broker is therefore expected to enforce the access policies established by the Identity management module (Idm). In line with the reference scientific literature on this topic, identity management and access control can be flexibly managed through the Attribute-Based Access Control (ABAC) logic.
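A simple sketch of an ABAC check that the Context Broker could enforce before serving a request (the attribute names are invented for illustration):

```python
def abac_permit(subject_attrs: dict, resource_attrs: dict, action: str) -> bool:
    """Grant access only when the subject's attributes satisfy the policy."""
    if action == "read":
        return (subject_attrs.get("role") == "security-service"
                and subject_attrs.get("tenant") == resource_attrs.get("tenant"))
    if action == "configure":
        # Re-configuration of remote agents stays a prerogative of the controller.
        return subject_attrs.get("role") == "security-controller"
    return False


# A detection algorithm of the same tenant may read the context...
assert abac_permit({"role": "security-service", "tenant": "acme"},
                   {"tenant": "acme"}, "read")
# ...but it may not push new configurations to the local agents.
assert not abac_permit({"role": "security-service", "tenant": "acme"},
                       {"tenant": "acme"}, "configure")
```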
Public Key Infrastructure
The overall security architecture is rooted in a Public Key Infrastructure (PKI), embracing a Local Certification Authority and a list of authentic users. From the cryptographic perspective, the Local Certification Authority and the users are in possession of a public-private key pair. The private key is kept secret by each entity. The public key, instead, can be shared within the whole architecture by means of an X.509v3 certificate signed by the Local Certification Authority. In order to offer a good level of flexibility, the proposed architecture also envisages the possibility to integrate Local Agents and users belonging to heterogeneous domains/platforms (multi-domain approach). In that case, the Identity management block depicted in Fig. 3 can integrate multiple PKIs, each one managed by its own Local Certification Authority.
The architecture proposed in this paper decouples authentication from authorization functionalities. In this respect, a key role is played by the Idm component. Specifically, the Idm component contains a database that maps the identity of Users, Local Agents, and any other component belonging to the Security Manager to a specific list of attributes. On the one hand, it is able to authenticate Users, Local Agents, and any other component belonging to the Security Manager within the system. On the other hand, it is able to provide them with the right set of attributes that, according to the ABAC logic, will be used to protect resources or grant access to protected resources during the authorization phase.
Authorization procedures and policy enforcement are managed through the Distributed Multi-Authority Ciphertext-Policy Attribute-Based Encryption (DMA-CP-ABE) algorithm, as suggested in . After a successful authentication process managed by the Idm component, the ABAC/Attribute-Based Encryption (ABE) component delivers attributes to users, Local Agents, and any other component belonging to the Security Manager through a trusted file structure (like, for instance, an extended version of a JSON Web Token). These attributes are encoded as a list of cryptographic material. The Security Manager drives the generation of the policies that control access to resources at both the Exporter and Enforcer components of the architecture. Protection against pollution attacks is implemented to prevent attackers from combining access rights from different platforms to satisfy a complex policy. Policies for time-limited authorization and revocation of access rights are also implemented to increase the security level. Once authenticated, users can use the attributes in their possession to access resources and services available within the architecture. Depending on the access policy, they must demonstrate possession of the right set of attributes by performing specific cryptographic operations.
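As an illustration of the attribute delivery step only, the following sketch uses the PyJWT library with invented attribute names; the actual scheme relies on DMA-CP-ABE cryptographic material rather than a plain signed token, so this is just an approximation of the trusted file structure mentioned above:

```python
import datetime
import jwt  # PyJWT

IDM_SECRET = "replace-with-idm-key"   # placeholder signing key


def issue_attribute_token(subject: str, attributes: list[str]) -> str:
    claims = {
        "sub": subject,
        "attrs": attributes,   # ABAC attributes used in the authorization phase
        "exp": datetime.datetime.utcnow() + datetime.timedelta(hours=1),  # time-limited
    }
    return jwt.encode(claims, IDM_SECRET, algorithm="HS256")


def verify_attribute_token(token: str) -> dict:
    # Expired or tampered tokens raise an exception, enforcing time-limited rights.
    return jwt.decode(token, IDM_SECRET, algorithms=["HS256"])


token = issue_attribute_token("local-agent-42", ["tenant:acme", "role:exporter"])
print(verify_attribute_token(token)["attrs"])
```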
The Security Dashboard is the main management tool used to build situational awareness, to perform reaction and investigation operations, and to share cyber-threat intelligence.
Upon analysis, detection, and assessment, users must be made aware of the current situation. Bare technical information (e.g., the available algorithms for encryption or integrity, the software version) will be totally useless for most users. The real added value here is to deliver tailored informative content at the different levels of the company structure, to bring awareness to humans and to ensure a better understanding of the current situation. For example, loss of or uncertainty about the position of private data triggers a warning about the potential violation of a specific regulation to the legal staff. Any loss of integrity, data, or availability can be reported to the management staff in terms of potential impact on the overall company business (halted production, loss of customers, bad reputation). Risk assessment at the management layer also requires automatically feeding existing tools, reducing the reliance on labor-intensive and potentially error-prone analysis by experts.
Reaction and Investigation
The user interface can be used to select specific analysis and detection algorithms, to visualize anomalies and security events and pinpoint them in the service topology, to set run-time security policies, and to perform manual reaction. With respect to the last two options, it has to be pointed out that security policies are the best way to respond to well-known threats, for which there are already established practices and consolidated methodologies for mitigation or protection. However, the identification of new threats and the elaboration of novel countermeasures require direct, step-by-step control over the ongoing system behavior. The dashboard interacts with the orchestration system to give the security provider full control over the graph in case of need.
Effective reaction to and mitigation of attacks largely depend on their timely detection and on a deep understanding of their causes and implications. The accuracy of detection and analysis algorithms is of paramount importance, but the greatest benefit comes from collaboration at the national and international levels, so that appropriate countermeasures and remediations can be undertaken in advance. Again, automation is the main challenge, to overcome the intrinsic slowness of current manual processes. From a technical perspective, the main aspect is the automatic generation of incident reports in standard formats (e.g., STIX), their collection in common repositories, and the generation of cyber-threat intelligence with attack patterns and threat descriptions [39, 41].
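For example, a flow flagged by a detection service could be shared as a STIX 2.1 indicator using the stix2 Python library (all values below are fabricated placeholders):

```python
from stix2 import Bundle, Indicator

indicator = Indicator(
    name="Suspected volumetric DoS source",
    description="Flow flagged by the anomaly-detection security service",
    pattern="[ipv4-addr:value = '203.0.113.42']",
    pattern_type="stix",
)

bundle = Bundle(objects=[indicator])
print(bundle.serialize(pretty=True))   # ready to be pushed to a shared repository
```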
From the description of the framework carried out in this section, it is clear that its main goal lies in the quick identification of the compromised parts of the service chain and in taking the related remediation and mitigation actions. This process is automatic, i.e., there is no need for the service providers to declare the weak points of the chain. The inter-working among external local agents, the Context Broker, and the security services allows the quick identification of services, or parts of them, that are malicious or under attack. Unfortunately, adversarial or dishonest participants in the chain are very difficult to detect. An effective identification of such actors strongly depends on the trust mechanisms that can be implemented in the platform. This is actually an open issue, which will be further discussed in Sect. 6.