Agent-Based Evidence Collection in Cloud Computing
Today, cloud services are offered all over the world, with varying security requirements depending on their business use. Compliance of these cloud services with predefined security policies has to be proven, which is no easy task in a cloud infrastructure because of its immense complexity. This paper proposes an architecture that uses software agents as its core components to collect evidence across the different layers of cloud infrastructures (Cloud Management System, hypervisor, VM, etc.) and builds a chain of evidence to prove compliance with predefined security policies.
Keywords: Cloud computing · Evidence · Persistence · Accountability · Audit
1 Introduction
This work addresses the problem of collecting, processing and persisting evidence from different sources inside a highly dynamic environment, and of automating this process based on a policy describing the contract between a customer entity and its Cloud Service Provider (CSP). The automation aspect is addressed by a software agent (in the following: agent) and is based on the previous work “Supporting Cloud Accountability by Collecting Evidence Using Audit Agents” by Christoph Reich and Thomas Rübsamen. For example, a CSP collects information about policy violations and stores it using the service of a second CSP. The potentially confidential captured evidence needs to be persisted in an integrity-verifiable way and protected from unauthorized access. This project's expected contributions are additional control mechanisms that provide the transparency desired by cloud service customers and strengthen the trust in CSPs and their contractual awareness.
The proposed agent-based architecture, which we describe in the following, collects evidence to allow the detection of policy violations and generates policy violation reports, while at the same time protecting sensitive information and respecting customer privacy. Using an agent framework that supports strong and weak agent migration was necessary for distributing and delegating tasks on demand, adjusted to their different destination environments. The data to collect depends on the assured policy contract between a cloud customer and a cloud provider, which can be statutorily regulated, defined by the service provider, or created from user-specific criteria. Through periodic audits, the implemented agents are able to provide the requested claims of evidence by persisting recognized policy violations.
Since a cloud service potentially contains service interdependencies with external service providers, the number of evidence sources to be covered by a trusted service increases as well. A chain of accountability can be formed by implementing these centrally coordinated trusted (accounting) services along the supply chain. Pinpointing the exact location of an occurred policy violation shows how trust in a CSP is strengthened by services implementing evidence reveal and notify mechanisms and thereby supporting accountability (e.g., members of the architecture proposed here). In a multi-CSP scenario with service coherences, inter-CSP collaboration is still a fundamental requirement. Currently, there is no standardized way for a cloud service customer to check on his own whether or not he is affected by a policy violation that occurred along the supply chain. The meanwhile established usage of web objects (for example, the Amazon Simple Storage Service, S3) extends the chain of accountability with dynamic interaction, which is (usually) transparent to the customer. Potential evidence sources like these on-demand connections and their potentially scalable content must be observed from the moment an active interaction with the service occurs. Therefore, the possibility of interacting with a CSP's trusted service provides the transparency needed in a complex environment like the cloud.
This paper is structured as follows: Sect. 2 discusses related work. In Sect. 3, the evidence collection and persistence architecture, including the used technologies and agent coherences, is described. The actual collection of evidence and the different collection agent types are explained in Sect. 4, followed by the persistence mechanism in Sect. 5. Sect. 6 describes the migration of agents in a scenario where multiple cloud providers are involved. After that, Sect. 7 discusses an example of how service coherences affect policy violation evaluation. We conclude this paper in Sect. 8, where our view on future work is noted as well.
2 Related Work
Reich and Rübsamen emphasized the need for policy violation audits and proposed how evidence collection has to be mapped to accountability in their previous work. They propose an Audit Agent System, which was the groundwork for the architecture presented in this paper. This work does not focus on the mentioned audit aspects but on the storing, presenting and processing of evidence.
The idea of using Digital Evidence Bags (DEB) plays a key role as a solid evidence persistence structure in this paper. Based on the work by Turner, Schatz and Clark propose an extension for connecting evidence-composing and -referencing DEBs using a Sealed Digital Evidence Bag. This mechanism is a possible extension to this architecture's persistence mechanism.
The usage of software agents was also proposed in the context of Security Audit as a Service (SAaaS). A related presentation layer and distributed sources of evidence are discussed in that work, but it specializes in security policies and the guardedness of a source located at the input layer. Also, a ‘security business flow’ modeler generating the policies to observe differentiates SAaaS from the approach proposed in this paper.
Of course, there are many tools offering an agent-based solution for monitoring network devices by collecting and analyzing a wide range of current system properties. Industry standards like Nagios also provide an agentless monitoring solution, which is less capable but goes easy on resources. There are also Software as a Service (SaaS) monitoring solutions like New Relic, which provide agents collecting data depending on different scopes and devices. Besides the traditional monitoring of (network, e.g., cloud) resources, it supports real-time analytics and performance monitoring, which can be integrated into the application development process. Therefore, it is not surprising to be confronted with this system when using a Platform as a Service (PaaS) like Cloud Control or AppFog, which simplify the monitoring of scalable applications. Transmitting the current values to the cloud brings advantages like rapid data analysis using resources on demand. The architecture proposed in this paper also supports monitoring functions but differs by extending this aspect, focusing on active intervention like interacting with third-party tools and their provided APIs.
3 Architecture Overview
The architecture is based on the Java Agent DEvelopment Framework (JADE). JADE complies with the Foundation for Intelligent Physical Agents (FIPA) specifications [11, 12], which define the internal behavior on action selection and execution as well as external agent interaction. The external agent interaction refers to the interaction context and the message creation, which draws on the FIPA ACL (Agent Communication Language). Other specified parts of JADE are system and platform services, which can be used for agent service registration or agent migration.
Being acquainted with the used agent framework eases the understanding of this architecture. A high-level overview of the architecture is depicted in Fig. 2, revealing how distributed agents communicate within it and how the different architectural components interact. All parts of this architecture are positioned inside one JADE platform containing an evidence interpretation as well as a persistence agent, both interacting with the distributed collection agents. Note that additional services provided out of the box by JADE (such as the centralized service registry and multi-platform interaction) are not pictured in Fig. 2.
To provide a chain of evidence, every trusted service of the supply chain contains a controlled agent which is responsible for evidence collection. To determine a policy violation, a variety of sources like the cloud management system, network packet data flows and storage units are browsed for conspicuous patterns. Also, external programs can be triggered to analyze their output.
To ensure data and therefore evidence integrity, every evidence collection run is persisted using a Digital Evidence Bag, which has been adapted for this approach (see Sect. 5). The DEB can contain raw data and diverse metadata, depending on the customer contract or the kind of policy violation that occurred. This architecture provides an interface for delegating and coordinating tasks by interacting with the distributed Evidence Collection Agents.
The Input Layer contains different agents responsible for the collection of evidence in their different scopes and locations. The currently covered evidence sources and their techniques are specified in Sect. 4. Each detected violation is recorded, and the whole evidence, including its meta and additional control data, is sent to the processing side. To guarantee information integrity, evidence is provided with a signature, which must remain consistent during the evidence report process. This means evidence integrity is ensured from the violation detection until the conversion from raw data to the output message has taken place. At this point, a participating actor or an external system could be notified automatically. This automatic, transparency-strengthening process could therefore strengthen the trust in a CSP providing composite services.
The Processing Layer is responsible for integrity observance and for tracking evidence request access, which is expressed inside the DEB's Tag Continuity Blocks (TCB), implemented as an H2 database. The Evidence Interpret And Persist Agent listens for successfully finalized or failed evidence collection records created by the Input Layer, which triggers the persisting of new insights. Also, the Evidence DB communicates with the Evidence Presentation Agent, which interacts directly with the Delegation/Control Agent. Its only purpose is the expressive presentation of requested evidence in due consideration of the requesting actor. It is also conceivable to provide a customer-exclusive evidence collection system. In this scenario, checking the compliance of policies (from the CSP side) is possible using a separate Cloud Service Provider Agent with the required access rights, respectively through interaction with the customer's exclusive Delegation/Control Agent.
The Presentation Layer is the only point of direct contact with a human actor (e.g., a customer or a trusted third-party auditor) inside this architecture. The user interaction handling and request transformation is handed over to the Delegation/Control Agent. Besides requesting metadata about currently active Evidence Collection Agents, a customer entity is able to check the current status of the contractual compliance with its CSP. The explicit delegation of an evidence collection run is also possible by communicating with this agent. Conveniently, the JADE library can be adapted to JSP-based systems. Therefore, the orchestration of agent actions (which of course must follow agreed-upon policies) could be added, for instance, to a customer's private web interface.
Protection goals inside this forensic mechanism are the integrity and confidentiality of collected data, which have to be guaranteed until the evidence collection has finished and been persisted. Furthermore, collected data should only be requestable by authorized auditors or other authorized entities such as cloud regulators.
4 Evidence Collection
Cloud Management System Agent:
This agent is able to interact directly with the central component of any cloud infrastructure. To detect policy violations, APIs provided by the Cloud Management System (CMS) are used to gather the needed information. In the case of OpenNebula, the process of evidence collection could be the request for the current storage, network or virtualization orchestration and, of course, the analysis of log files, where events originating from cloud operations are recorded. For example, consider an internal project working with business secrets, which is placed on a separate hypervisor. Because of its critical data, this system would demand, besides other transparency-increasing measures, the delegation of a CMS Agent. This agent would be responsible for gathering lifecycle information, tracking occurred snapshots (which are relevant considering the aspect of needed confidentiality) and workload information of every (reachable) node.
Application Specific Agent:
These agent types collect a specific kind of evidence defined by the policy. For example, a policy could state: “It is not allowed to store email addresses inside a VM”. To detect evidence of non-compliance, patterns matching email addresses need to be searched for and recorded. Therefore, the agent triggers an external program searching the VM's hard disk for the given pattern. This can be done using the Cornell Spider tool, which generates a log file containing all file paths that possibly match the given pattern, including additional metadata. Of course, the occurrence of false positives cannot be automatically excluded from the output.
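As a minimal sketch of such a pattern scan (the class name and the simplified regular expression are our own illustration, not the Cornell Spider implementation, which is more sophisticated and also records file paths and metadata), the detection step could look like this:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Illustrative sketch of the pattern search an Application Specific
 *  Agent could delegate to an external scanner. */
public class EmailScanner {
    // Simplified email pattern; real scanners use far stricter rules,
    // so false positives (and negatives) remain possible.
    private static final Pattern EMAIL =
        Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    /** Returns every substring of the input matching the email pattern;
     *  each hit is a candidate piece of non-compliance evidence. */
    public static List<String> findEmails(String content) {
        List<String> hits = new ArrayList<>();
        Matcher m = EMAIL.matcher(content);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }
}
```

In the architecture, such hits would not be reported directly but packed into an evidence record together with their origin (file path, detection time) for the Processing Layer.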
Storage System Agent:
This agent communicates directly with various storage management APIs. Besides performance monitoring, this agent is also able to determine the exact location (e.g., the datacenter) of a service. This feature can be used to verify compliance with policies that include a geographical aspect.
NetFlow Agent:
This agent's task is the investigation of different network-enabled devices inside a CSP's network. Some policies will prohibit network communication with certain addresses and/or address ranges for a specific network device. By analyzing NetFlow logs, the policy-violating communication can be tracked and used as evidence (e.g., communication endpoints, time and duration).
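The core of this check can be sketched as follows; the flow record layout and the prefix-based address match are our own simplifying assumptions, not a real NetFlow parser:

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of flagging flows to prohibited addresses. */
public class FlowPolicyCheck {
    /** One heavily simplified flow record: source and destination address. */
    public static class Flow {
        public final String srcIp;
        public final String dstIp;
        public Flow(String srcIp, String dstIp) {
            this.srcIp = srcIp;
            this.dstIp = dstIp;
        }
    }

    /** Returns all flows whose destination lies inside a prohibited
     *  address range (given here as dotted prefixes, e.g. "203.0.113.");
     *  each hit becomes a candidate evidence record. */
    public static List<Flow> violations(List<Flow> flows, List<String> blockedPrefixes) {
        List<Flow> hits = new ArrayList<>();
        for (Flow f : flows) {
            for (String prefix : blockedPrefixes) {
                if (f.dstIp.startsWith(prefix)) {
                    hits.add(f);
                    break; // one matching prefix is enough
                }
            }
        }
        return hits;
    }
}
```

A production check would match proper CIDR ranges and also evaluate timestamps and durations, as named in the text.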
Sect. 6 describes how to use this agent type across different CSPs when their XaaS offerings form a supply chain.
After the evidence collection run has finished, the executing Evidence Collection Agent generates an evidence record as the basis for the corresponding DEB. In some cases the evidence is a complete file, which must be sent to the Processing Layer for the purpose of preserving evidence. Working with large log files containing evidence can become a performance problem, because each file containing a violation must be transmitted to the Processing Layer. In that case, the evidence file must be transmitted inside a signed blob and persisted at the Processing Layer (ideally encrypted as well), guaranteeing tamper-evident properties.
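The tamper-evidence idea for transmitted blobs can be illustrated with a digest check (class and method names are our own; the architecture additionally signs the blob, as described above):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

/** Sketch: the collecting side computes a digest of the evidence blob
 *  before transmission; the Processing Layer recomputes it on arrival
 *  to detect any modification in transit. */
public class EvidenceBlob {
    /** Returns the Base64-encoded SHA-256 digest of the raw evidence bytes. */
    public static String digest(byte[] blob) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return Base64.getEncoder().encodeToString(md.digest(blob));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    /** Recomputes the digest at the receiving side and compares it. */
    public static boolean unchanged(byte[] blob, String expectedDigest) {
        return digest(blob).equals(expectedDigest);
    }
}
```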
5 Evidence Persistence
A DEB contains a Tag which has a 1:N relation to an Index, which in turn is related 1:1 to a Bag.
The Tag table contains information about each evidence collection run as a whole, because of its relation to the particular Bag and the overlying Index tables. At this layer, metadata about the evidence record, information about the delegating agent and a reference to the last subject that accessed this evidence collection run are stored.
To keep track of actions performed on a specific Tag or a Tag-underlying layer, the TCB entry references a row inside the Request table, so that every single corresponding evidence access can be detected.
The 1:N relation between the Tag and the Index is established using an intermediate table.
Also, since it contains a signature over every underlying relation altogether, the Tag table is a robust core element of this evidence persistence mechanism. Inside the evidence database, the signatures are stored as Base64 varchar (String) values and restored as byte arrays for data integrity verification using the java.security API.
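This store-as-Base64, restore-as-bytes round trip can be sketched with the java.security API as follows (class and method names are illustrative; the paper's agents use DSA keys, as noted in the next paragraphs):

```java
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;
import java.util.Base64;

/** Sketch of the described signature handling: sign with a DSA key,
 *  store the signature as a Base64 string (the varchar column) and
 *  restore it to a byte array for verification. */
public class TagSignature {
    /** Generates a fresh DSA key pair such as an agent would hold. */
    public static KeyPair newDsaKeyPair() {
        try {
            KeyPairGenerator kpg = KeyPairGenerator.getInstance("DSA");
            kpg.initialize(2048);
            return kpg.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Signs the data and returns the signature as a Base64 string. */
    public static String signBase64(PrivateKey key, byte[] data) {
        try {
            Signature sig = Signature.getInstance("SHA256withDSA");
            sig.initSign(key);
            sig.update(data);
            return Base64.getEncoder().encodeToString(sig.sign());
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Restores the Base64 signature to a byte array and verifies it. */
    public static boolean verify(PublicKey key, byte[] data, String base64Sig) {
        try {
            Signature sig = Signature.getInstance("SHA256withDSA");
            sig.initVerify(key);
            sig.update(data);
            return sig.verify(Base64.getDecoder().decode(base64Sig));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }
}
```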
Besides the progress start (at the Evidence Collection Agent's side) and the policy scheme, the Index table contains a bagFormat attribute to categorize the occurred evidence. As mentioned before, every evidence collection run that yielded no policy violation must be persisted, too. This can be done at this point. The bagFormat is a first indicator of the significance of persisted evidence. It can be ‘structured text’, ‘raw binary data’, ‘archive’, ‘no policy violation’ or any other suitable categorization.
By referencing the Bag table by its corresponding evidenceUnitID, a requesting subject is able to reach a stored policy violation. If the evidence is associated with a file, the file, including its signature, is persisted in the Processing Layer's storage but is not stored inside the database, to avoid performance loss in the signing and verification mechanism. However, the path to this evidence file is stored inside the Bag table, besides the evidenceDetectionTime, the original evidence metadata and an additional hash. Using the Java Security API, every Tag is signed with the Evidence Interpret and Persist Agent's DSA key, while the actual Bag holds the Evidence Collection Agent's signature, which was created during the collection process.
6 Agent Migration
The policy also describes the reaction to an occurred event (e.g., the notification of a subject about analyzed evidence scoring a specific result). A-PPL is currently work in progress. It extends PPL, which in turn extends the eXtensible Access Control Markup Language (XACML), which is why this part is currently emulated inside this architecture using an XACML parser deciding whether a function will be executed or not (e.g., the migration function).
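Conceptually, the emulated gate boils down to a Permit/Deny decision consulted before an agent action runs. The following is a deliberately simplified stand-in (rule representation and all names are our own illustration, not the XACML data model):

```java
import java.util.Map;
import java.util.Set;

/** Highly simplified stand-in for the XACML-based gate described above:
 *  before an agent action (e.g. migration) runs, a decision is fetched. */
public class PolicyGate {
    /** Maps an action name to the set of subjects permitted to trigger it. */
    private final Map<String, Set<String>> permitRules;

    public PolicyGate(Map<String, Set<String>> permitRules) {
        this.permitRules = permitRules;
    }

    /** XACML-style outcome: "Permit" when a matching rule exists,
     *  otherwise "Deny" (deny by default). */
    public String decide(String subject, String action) {
        Set<String> allowed = permitRules.get(action);
        return (allowed != null && allowed.contains(subject)) ? "Permit" : "Deny";
    }
}
```

A real XACML evaluation additionally knows obligations and the outcomes Indeterminate and NotApplicable; the sketch only conveys the gating idea.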
Because of the A-PPL Engine's SaaS aspects, the service will probably run inside a different ISP's virtual machine, which possibly will be stored inside a different ISP's infrastructure. To provide high availability, the data is probably replicated to another datacenter, which possibly is located in another country and is therefore subject to another legal system. Hence, the supposed benefit of transparent data location in the cloud must be interpreted as an additional risk for a customer.
Depending on the imported rules, the Delegation/Control Agent distributes/migrates the requested Evidence Collection Agents to their destination JADE platform. The different newly spawned Evidence Collection Agents initially perform a service registration after being notified of the corresponding Processing Layer to communicate with. To avoid unnecessary data traffic, the evidence is always persisted inside the corresponding Evidence Store (weak migration).
The possibilities of network analytics are given in any case if the NetFlow Agent is positioned ISP-locally. To implement the NetFlow Agent in a CSP-comprehensive way, every CSP must either offer a standardized API to retrieve the necessary connection statistics or run a (continuous) NetFlow Agent of their own to communicate with. Potentially, this API will implement the Cloud Trust Protocol (CTP). To collect all available NetFlow evidence, the different CSPs must report their supporting services to the requesting system (e.g., this architecture) or, respectively, provide a way to discover them. As long as there is no standardized API, the option remains to run a NetFlow Agent at every service-providing CSP (see Sect. 7).
There is also the possibility of temporary migration, where an agent is transferred to another platform and migrated back to the ordering agent's platform (strong migration). In this case, not only the agent's executable is transmitted between the platforms but also its current objects containing new insights. Of course, distributed agents remove themselves from their corresponding platforms in the case of a non-continuous evidence collection run. To collect evidence, the corresponding agent platform must be started with the rights required to access the resource (in most cases this is root).
7 Complex Service Provision Scenarios
Figure 5 depicts a scenario where CSP A offers a (potentially public) Service A but uses Service B provided by CSP B (transparent to customers). In this example, a network communication with a country outside the EU takes place at the service provided by CSP B.
The agreed-upon policy stipulates that all processed data must be held inside the European Union, but because of the supply chain, CSP interoperability is needed; more precisely, a trusted service is needed.
Both CSP A and CSP B host services inside their own datacenters located inside the EU. All necessary evidence connections are provided by this evidence collection architecture, trusting a central Locality Compliance Agent which is aware of all service relationships used by a potential customer.
If CSP A respects the policy by hosting a service inside the EU but in turn uses a service provided by CSP B, CSP A alone is not able to guarantee compliance with this policy.
Compound evidence comes about by requesting evidence reports from the affected (distributed) platforms' Delegation/Control Agents. Depending on the data transmitted from Service A to Service B, potentially valuable information could be transmitted to a ‘forbidden’ location, which is why the Evidence Collection Agent placed at CSP B checks for network communication policy violations and creates an evidence record inside its platform's Evidence DB. Once the Locality Compliance Agent receives all necessary evidence reports from the participating CSPs' Presentation Layers, the occurrence of interdependent policy violations is checked against the given EU data policy. Every new insight about interdependent policy violations is forwarded to the A-PPL Engine, which possibly will consider further steps (i.e., notifying stakeholders). Passing the new insights along the chain of accountability is relevant for a full conclusion but must at the same time go easy on network resources if this evidence architecture is applied in large datacenters. Therefore, for example, a simple ‘OK=EU’ or ‘NOK=USA’ message on demand is sufficient for a pooled decision filling the dashboard of a customer's private web interface, using the boolean product of all replies (or groups of replies, respectively).
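The pooled decision at the end of this paragraph can be stated as a one-liner; the ‘OK=…’/‘NOK=…’ message format is taken from the example above, the class and method names are our own:

```java
import java.util.List;

/** Sketch of the pooled locality verdict: each participating CSP replies
 *  with e.g. "OK=EU" or "NOK=USA"; the overall result is the boolean
 *  product (logical AND) of all replies. */
public class LocalityDecision {
    /** Returns true only if every reply signals compliance ("OK=..."). */
    public static boolean compliant(List<String> replies) {
        return replies.stream().allMatch(r -> r.startsWith("OK="));
    }
}
```

The same function can be applied per group of replies first, matching the “group of replies” aggregation mentioned in the text.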
8 Conclusion and Future Work
This paper underlines the importance of structured on-demand evidence collection running on nearly every device, supporting transparency and therefore trust. The presented architecture enables distributed collection and persistence of evidence following imported rules. Because JADE platforms can be distributed, this construct will work in large CSP datacenters without degrading average performance.
To provide quick evidence processing, there must be a mechanism that excludes already known policy violations on the Evidence Collection Agent's side. The mentioned Sealed Digital Evidence Bag is a potential extension of the evidence persistence mechanism. Also, the collected evidence must be presented in a convincing and distinct way at the user interface. Remaining challenges include, among other things, the performance evaluation of the defined architecture and scaling tests enabling the deployment of this architecture for highly dynamic services.
This research is closely related to the A4Cloud Project.
- 1. Reich, C., Rübsamen, T.: Supporting cloud accountability by collecting evidence using audit agents. In: 2013 IEEE International Conference on Cloud Computing Technology and Science (2013)
- 3. Turner, P.: Unification of Digital Evidence from Disparate Sources (Digital Evidence Bags). QinetiQ
- 4. Schatz, B., Clark, A.: An open architecture for digital evidence integration. In: AusCERT Asia Pacific Information Technology Security Conference
- 5. Validating Cloud Infrastructure Changes by Cloud Audits
- 6. Nagios. www.nagios.com
- 7. New Relic. http://www.newrelic.com/
- 8. Cloud Control. www.cloudcontrol.com
- 9. AppFog. www.appfog.com
- 10. Telecom Italia: Java Agent DEvelopment Framework (JADE). http://jade.tilab.com
- 11. Poslad, S.: Specifying protocols for multi-agent systems interaction. ACM Trans. Auton. Adapt. Syst. (TAAS) 2(4), 1–24 (2007)
- 12. Foundation for Intelligent Physical Agents. http://www.fipa.org/
- 13. Reddy, P.I.P., Damodaram, D.A.: Implementation of Agent Based Dynamic Distributed Service
- 14. Nikraz, M., Caire, G., Bahri, P.A.: A Methodology for the Analysis and Design of Multi-agent Systems using JADE. Telecom Italia Lab
- 15. Jansen, W., Grance, T.: Guidelines on security and privacy in public cloud computing. National Institute of Standards and Technology, U.S. Department of Commerce (2011)
- 16. H2 Database Engine. http://www.h2database.com
- 17. OpenNebula. http://opennebula.org/
- 18. Tchamdjou, M.Y.D.E.: Agenten zur Erkennung von sensiblen Daten und deren Schutz. HFU, Technical report
- 19. Accountability for the Cloud. http://www.a4cloud.eu/
- 20. XACML - eXtensible Access Control Markup Language. www.oasis-open.org/
- 21. Benghabrit, W., Grall, H., Royer, J.-C., Sellami, M., Azraoui, M., Elkhiyaoui, K., Önen, M., Santana De Oliveira, A., Bernsmed, K.: A cloud accountability policy representation framework. In: CLOSER - 4th International Conference on Cloud Computing and Services Science, Barcelona, Spain (2014). http://hal.inria.fr/hal-00941872
- 22. Bradshaw, S., Cunningham, A., Luciano, L.D.C., Hon, W.K., Hörnle, J., Reed, C., Walden, I.: In: Millard, C. (ed.) Cloud Computing Law. Oxford University Press, Oxford (2013)
- 23. Cloud Trust Protocol. https://cloudsecurityalliance.org/research/ctp/