Pre-Deployment Security Assessment for Cloud Services through Semantic Reasoning

Abstract. Over the past ten years, the adoption of cloud services has grown rapidly, leading to the introduction of automated deployment tools to address the scale and complexity of the infrastructure companies and users deploy. Without the aid of automation, ensuring the security of an ever-increasing number of deployments becomes more and more challenging. To the best of our knowledge, no formal automated technique currently exists to verify cloud deployments during the design phase. In this case study, we show that Description Logic modeling and inference capabilities can be used to improve the safety of cloud configurations. We focus on the Amazon Web Services (AWS) proprietary declarative language, CloudFormation, and develop a tool to encode template files into logic. We query the resulting models with properties related to security posture and report on our findings. By extending the models with dataflow-specific knowledge, we use more comprehensive semantic reasoning to further support security reviews. When applying the developed toolchain to publicly available deployment files, we find numerous violations of widely-recognized security best practices, which suggests that streamlining the methodologies developed for this case study would be beneficial.


Introduction
The term Infrastructure as Code (IaC) refers to the practice of configuring, provisioning, and updating system resources from source code files, which are compiled into atomic instructions and then executed to deploy the desired architecture [29]. The advantage of handling code, instead of manually provisioning resources, lies in the capability to use version control systems, orchestration frameworks, and automated testing tools as part of the deployment process. In addition to instructions relevant for resource creation, dependencies, and updates, IaC configuration files contain information about settings, dataflow, and access control. In a time when cloud companies provide customers with simple-to-launch, albeit extremely powerful, infrastructure, it is crucial to automatically and provably verify the security of such systems.
In this study, we investigate IaC deployment frameworks and how these can be formally modeled and reasoned upon. We explore the usage of description logics (DLs) as a conceptual-modeling formalism that is expressive, decidable, and supports both verification and refutation, depending on the property being checked. We formalize additional security knowledge that allows for checking system-level semantic properties; i.e., properties that consider the nature of the cloud environment and more complex reachability over an inferred graph representation of the infrastructure.
Throughout the study, we make four novel contributions: (i) the formalization and logical encoding of AWS CloudFormation (Section 3); (ii) a technique to express security properties (Section 4); (iii) the experimental evaluation of encoding and query times, accounting for the most common security issues that we found over publicly available IaC templates (Section 5); and (iv) an extension that enables semantic dataflow reasoning (Section 6). Our tool is implemented in Scala and available online [14]. We include preliminaries in Section 2; discuss related work in Section 7; and conclude in Section 8.

Preliminaries
Description Logics. DLs are a family of logics well suited to model relationships between entities. They provide the logical foundation of the well-known Web Ontology Language [23,20,32], for which extensive tool support exists (e.g., the Protégé editor and off-the-shelf reasoners such as FaCT, HermiT, and Pellet [30,39,18,36]). We introduce the description logic ALC [34,1,24], Attributive Language with Complements, and three additional features that are relevant for our study. ALC formulae are built from symbols from the alphabets $N_C$, of atomic concept names; $N_R$, of role names; and $N_I$, of individual names. These are the DL equivalents of FOL unary predicates, binary predicates, and constants, respectively. ALC concept expressions are built according to the grammar:

$$C, D ::= A \mid \top \mid \bot \mid \neg C \mid C \sqcap D \mid C \sqcup D \mid \forall r.C \mid \exists r.C$$

where $A$ is an atomic concept from the set $N_C$; $C, D$ are possibly complex concepts; and $r$ is a role from the alphabet $N_R$. Terminological knowledge is represented via general concept inclusion axioms $C \sqsubseteq D$. As an example, in the remainder of this paper we will refer to two standard axioms that enforce the domain and range of binary relations: $\mathrm{dom}(r, C) \equiv \exists r.\top \sqsubseteq C$ and $\mathrm{ran}(r, C) \equiv \exists r^{-}.\top \sqsubseteq C$. Assertional knowledge is represented via concept assertions $C(a)$ and role assertions $r(a, b)$. In this paper, we will use three additional operators: inverse roles, functionality constraints, and complex role inclusions. The first, denoted $r^{-}$, encodes the converse of the binary relationship $r$. The second enforces binary relationships to be functional. The third, written $r \circ s \sqsubseteq t$, establishes that the chaining of the two relationships $r$ and $s$ implies the relationship $t$, and can be used to implement transitivity (when $r = s = t$). A model of a DL knowledge base is an interpretation $\mathcal{I}$, over a domain $\Delta$, that satisfies all the axioms and assertions contained in and implied by the knowledge base. For the purpose of our application, we leverage two classical inference problems: satisfiability and instance retrieval, whose full definitions are found in standard textbooks [2,3].
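To make the constructors above concrete, the following minimal sketch evaluates ALC concept expressions over one small, hand-written finite interpretation and computes a role chain. It is an illustration only: all concept, role, and individual names are hypothetical, and evaluating a single interpretation is not DL reasoning, which considers all models of a knowledge base.

```python
# One finite interpretation: a domain, concept extensions, role extensions.
# Every name here (Bucket, loggingTo, b1, ...) is a hypothetical example.
DOMAIN = {"b1", "b2", "q1"}
CONCEPTS = {"Bucket": {"b1", "b2"}, "Encrypted": {"b2"}}
ROLES = {"loggingTo": {("b1", "b2")}, "notifies": {("b2", "q1")}}


def ext(concept):
    """Extension of a concept given as a string (atomic) or a nested tuple."""
    if isinstance(concept, str):                 # atomic concept A
        return set(CONCEPTS.get(concept, set()))
    op = concept[0]
    if op == "top":
        return set(DOMAIN)
    if op == "bot":
        return set()
    if op == "not":                              # complement
        return DOMAIN - ext(concept[1])
    if op == "and":                              # conjunction
        return ext(concept[1]) & ext(concept[2])
    if op == "or":                               # disjunction
        return ext(concept[1]) | ext(concept[2])
    if op in ("some", "all"):
        role = ROLES.get(concept[1], set())
        filler = ext(concept[2])
        if op == "some":                         # exists r.C
            return {x for x in DOMAIN if any((x, y) in role for y in filler)}
        # forall r.C: every r-successor of x lies in C (vacuously true if none)
        return {x for x in DOMAIN if all(y in filler for (a, y) in role if a == x)}
    raise ValueError(f"unknown constructor: {op}")


def chain(r, s):
    """Pairs related by r followed by s, i.e. the left side of r o s <= t."""
    return {(a, d) for (a, b) in ROLES[r] for (c, d) in ROLES[s] if b == c}


unencrypted_buckets = ext(("and", "Bucket", ("not", "Encrypted")))
logs_to_encrypted = ext(("some", "loggingTo", "Encrypted"))
```

Note that the universal restriction holds vacuously for individuals without successors, which is one reason why existential and universal restrictions are often combined in practice.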
AWS CloudFormation. AWS CloudFormation (cfn) provides users with a declarative programming language and a framework to provision and manage over 500 resources spread across 70 services [15]. Services are products such as storage, databases, and processors, and their interface is implemented through resources, which are the actual modules that users declare and deploy. Their declaration is done by writing one or more so-called CloudFormation Templates (JSON-formatted configuration files). Within a template, users configure settings and communication of the desired resource instances. As an example, let us consider one of the most widely known storage products within AWS: the Simple Storage Service, S3 (also illustrated in Listings 1.1 and 1.2). The CloudFormation interface for S3 consists of two resources: S3::Bucket and S3::BucketPolicy. A Bucket is a single unit of storage whose properties include encryption, replication, and logging settings, which can be viewed as the bucket's own configuration parameters. They could also be references to other resources that are connected to the current one, e.g., the unique ID of another bucket where logs are stored. A BucketPolicy is a resource that links an access control policy to a bucket. All the properties that can be instantiated and the structure of resource-types such as S3::Bucket and S3::BucketPolicy are given in the CloudFormation Resource Specification [15]. The resource specification is a collection of files that prescribe resource properties and their allowed values. Provided that a configuration file is valid with respect to the specifications, an IaC deployment environment compiles it into instructions that are then executed to provision the requested resources in the correct dependency order and with the desired settings.

Formalization and Encoding of IaC Deployments
While setting up this case study, we found it convenient to come up with a formalization of both IaC resource specifications and IaC configuration files, to use as an intermediate representation during the encoding process. This was also needed since we could not find suitable research in the area (although some preliminary research on IaC formalization does exist, e.g., the PhD thesis in [12]). As mentioned in Section 2, users consult the resource specifications to find out what fields and values are allowed when declaring a resource. Intuitively, these provide a sort of type-system, or JSON schema, against which configuration files must validate. Configuration files contain the resource declarations of the instances that the user wishes to deploy. Let us illustrate this with some examples. Listing 1.1 shows a snippet of the S3::Bucket resource-type specification. In addition to the main resource type, the specification includes definitions for its subproperties, their types, and whether these are required. Although the example only shows string properties, in general, allowed property values range over objects, arrays, and primitive types such as integers, doubles, longs, strings, and booleans. Listing 1.2, on the other hand, shows a common usage scenario of the S3 storage service, where a bucket with basic configuration is used to store the desired data.

    "ConfigS3Bucket": {
      "Type": "AWS::S3::Bucket",
      "Properties": {
        "BucketName": "ConfigStore",
        "LoggingConfiguration": {
          "DestinationBucketName": "ConfigStore",
          "LogFilePrefix": "config-bucket-logs/"}}}

Listing 1.2. S3::Bucket instance declaration

The instance has logical ID ConfigS3Bucket, is of type S3::Bucket, and specifies two top-level properties, BucketName and LoggingConfiguration. It is easy to see that this instance declaration validates against the resource specification of Listing 1.1.
This snippet is taken from one of the benchmark deployments evaluated in Section 5 (StackSet 15) and, incidentally, it violates a security best practice: "no bucket should store its own logs." Such a formalization has been instrumental in capturing infrastructure configurations, resource settings, and inter-connections, and in precisely and automatically encoding them into DL.
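The idea that a resource specification acts as a type-system for instance declarations can be sketched as follows. This is a minimal illustration, not the tool's implementation: the `SPEC` dictionary is a hypothetical, heavily simplified stand-in for the real resource specification (only string and nested-object properties are covered), and `validate` is a helper introduced here for the example.

```python
# Hypothetical, simplified specification: property name -> type and requiredness.
SPEC = {
    "AWS::S3::Bucket": {
        "BucketName": {"type": "String", "required": False},
        "LoggingConfiguration": {"type": "LoggingConfiguration", "required": False},
    },
    "LoggingConfiguration": {
        "DestinationBucketName": {"type": "String", "required": False},
        "LogFilePrefix": {"type": "String", "required": False},
    },
}

# The instance declaration of Listing 1.2.
TEMPLATE = {
    "ConfigS3Bucket": {
        "Type": "AWS::S3::Bucket",
        "Properties": {
            "BucketName": "ConfigStore",
            "LoggingConfiguration": {
                "DestinationBucketName": "ConfigStore",
                "LogFilePrefix": "config-bucket-logs/",
            },
        },
    }
}


def validate(type_name, props, path=""):
    """Return a list of error messages; an empty list means the properties validate."""
    errors = []
    schema = SPEC[type_name]
    for name, value in props.items():
        here = f"{path}{name}"
        if name not in schema:
            errors.append(f"{here}: unknown property")
        elif schema[name]["type"] == "String":
            if not isinstance(value, str):
                errors.append(f"{here}: expected a string")
        elif not isinstance(value, dict):
            errors.append(f"{here}: expected an object")
        else:
            # Nested object: validate against the referenced property type.
            errors.extend(validate(schema[name]["type"], value, here + "."))
    for name, rule in schema.items():
        if rule["required"] and name not in props:
            errors.append(f"{path}{name}: missing required property")
    return errors


decl = TEMPLATE["ConfigS3Bucket"]
errors = validate(decl["Type"], decl["Properties"])
```

Under these assumptions the declaration validates (`errors` is empty), even though it violates the best practice mentioned above; schema validity and security posture are separate questions.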
Encoding. We translate IaC specifications into DL terminological knowledge, and IaC configurations into assertional knowledge. The conceptual modeling features needed to model the former include axioms to define domain and range of properties, requiredness, and functionality. These give us enough expressivity to infer qualities of nodes that are underspecified, such as those that are referenced by a template but not declared in it (e.g., already deployed and running elsewhere), whose configuration is unknown. To give readers an intuition of the encoding procedure, let us look at the equation below, which contains some of the axioms and assertions generated by the translation of the code in Listings 1.1 and 1.2.
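As a rough intuition for the assertional side of the translation, the sketch below turns the declaration of Listing 1.2 into concept assertions C(a) and role assertions r(a, b), introducing a fresh node for each nested object. The tuple representation, the `_:n0` naming of fresh nodes, and both helper functions are assumptions made for this example; the actual tool produces OWL models, and the check at the end is a direct lookup standing in for a DL query.

```python
import itertools

# The instance declaration of Listing 1.2.
TEMPLATE = {
    "ConfigS3Bucket": {
        "Type": "AWS::S3::Bucket",
        "Properties": {
            "BucketName": "ConfigStore",
            "LoggingConfiguration": {
                "DestinationBucketName": "ConfigStore",
                "LogFilePrefix": "config-bucket-logs/",
            },
        },
    }
}


def encode(template):
    """Return (concept assertions, role assertions) as sets of tuples."""
    concepts, roles = set(), set()
    fresh = itertools.count()

    def walk(subject, props):
        for name, value in props.items():
            if isinstance(value, dict):
                node = f"_:n{next(fresh)}"       # fresh node for a nested object
                roles.add((name, subject, node))
                walk(node, value)
            else:
                roles.add((name, subject, value))

    for logical_id, decl in template.items():
        concepts.add((decl["Type"], logical_id))  # C(a)
        walk(logical_id, decl.get("Properties", {}))
    return concepts, roles


def buckets_logging_to_themselves(concepts, roles):
    """Buckets whose log destination name equals their own bucket name."""
    def values(role, subject):
        return {o for (r, s, o) in roles if r == role and s == subject}

    hits = set()
    for ctype, bucket in concepts:
        if ctype != "AWS::S3::Bucket":
            continue
        names = values("BucketName", bucket)
        for cfg in values("LoggingConfiguration", bucket):
            if names & values("DestinationBucketName", cfg):
                hits.add(bucket)
    return hits


concepts, roles = encode(TEMPLATE)
violations = buckets_logging_to_themselves(concepts, roles)
```

On this input the lookup flags ConfigS3Bucket, matching the "no bucket should store its own logs" violation noted for this snippet.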

Security Properties Specification
We group properties into three categories that reflect their high-level meaning: security issues, mitigations, and global protections to security concerns. We view these in analogy to must and may specifications, which one would use to express that an issue may be present (vs. must be absent) or that a protection must be in place (vs. may be missing). Each property type is matched to a corresponding query structure, which aids the translation of security requirements into formal specifications and implements different fail/pass logics. Queries are written as description logic expressions whose outcome can be one of UNSAT, SAT with no instance found (SAT/0), and SAT with instances (SAT/+). These are achieved by running a satisfiability check, possibly followed by an instance retrieval call.
Mitigations are configurations of single resources that reduce the likelihood of a security event. In order to pass, these checks must be verified. Examples are: M1 "All buckets must keep logs," M2 "Only buckets that host websites can have a public preset ACL," and M3 "Data stores must have backup or versioning enabled."
Security Issues are configurations that potentially increase exposure to security concerns. In order to pass, these checks must be falsified. Examples are: I1 "There may be a bucket that is not encrypted," I2 "Encrypted bucket that sends events to a not-encrypted queue," and I3 "There may be a networking component that opens all ports to all."
Global Protections are more general mitigations, applied on single resources or as configuration patterns, whose presence and proper configuration ensures protection over the system as a whole. Examples are: P1 "There is an alarm configured to perform an action when triggered," and P2 "There is a configuration recorder logging changes to the infrastructure." We refer the reader to the repository in [14] for the properties specification files.

Application to Existing Infrastructure
We now discuss the application of our approach to real-world IaC deployments. We analyze AWS CloudFormation specification and configuration files, showing that the approach is practical, scalable, and identifies potential security issues.
Operation of the Tool. We develop a tool that performs three main tasks. First, the encoding of the cfn resource specifications into formal models (Resource Terminologies). Second, the encoding of the actual cfn configuration files, also called StackSet, into formal models (Infrastructure Model). Third, inference and query answering for a set of predefined queries. We use the OWL API [22] for the encoding phase, and JFact [39] as the inference engine.
Experimental Setup. We run our tool on 15 CloudFormation StackSets openly available on GitHub. Regarding metrics, we define the infrastructure size as the numbers of both declared resources ($N$) and their types ($N_{RT}$). The latter determines which resource terminologies are imported into the final encoded model and thus influences its size, measured in number of logical axioms ($N_\alpha$).
The smallest StackSet has 6 resources and 6 resource types; the largest has 508 resources and 21 resource types. We implement 50 properties from the ScoutSuite collection [35] that are applicable at design time and, thus, over IaC deployment files. Of the 50 properties, 29 are mitigations, 18 are security issues, and 3 are global protections. We conduct our evaluation on an Intel Core i5 with 16 GB RAM; we perform warm-up runs and clear the heap before each measurement. This tuning helps to minimize the impact of just-in-time compilation and to reduce the likelihood of garbage collection during the measured benchmark runs.

Results Evaluation
The average compilation time of the entire cfn resource specifications (542 files) was 940ms. Table 1 reports the results of our experimental evaluation. StackSets are sorted by number of resources. For each, we measure the time taken by the StackSet encoding (ENC), inference (INF), and query answering task (grouped by outcome: UNSAT, SAT with no instances, and SAT with instances). As we can see from the table, the encoding time increases with the infrastructure's size, producing larger models that require longer inference times. Average query answering times increase accordingly. UNSAT queries have shorter average answering times than those evaluating to SAT/0 or SAT/+ (UNSAT proofs are found before a SAT outcome can be deduced). In addition, once a query is proved SAT, we invoke an instance retrieval procedure to determine whether satisfying instances are present or not. The specific infrastructure configuration and its size are the main factors influencing query answering times.
Considering that the average template has about 50-100 resources, and templates having 100-500 resources are rare, the results suggest that our approach scales to real-world IaC templates. For example, StackSet 04 has 132 resources, is encoded in 363ms, classified in 2.1s, and has a max average per-query time of 162ms. Assuming a pool of 100 checks to be run, the automated modeling and verification of such an infrastructure would take, in the worst case, around 18s.
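The worst-case figure can be reproduced with a short calculation. The sketch assumes that the total is simply encoding time plus inference time plus the maximum average per-query time multiplied by the number of checks, run sequentially; this decomposition is our reading of the estimate, and the function name is introduced here for illustration.

```python
def worst_case_seconds(encode_ms, inference_ms, max_query_ms, n_checks):
    """Sequential worst case: encode once, classify once, then run every check."""
    total_ms = encode_ms + inference_ms + max_query_ms * n_checks
    return total_ms / 1000


# StackSet 04: 363ms encoding, 2.1s inference, 162ms per query, 100 checks.
estimate = worst_case_seconds(363, 2100, 162, 100)
```

The query phase dominates the total (16.2s of roughly 18.7s), so the estimate grows almost linearly with the number of checks in the pool.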

Found Security Issues
Across all 15 deployments, we run 15 × 50 = 750 checks: 608 pass and 142 fail. Of the 142 failing checks, 73 do not return any instance and 69 return one or more instances (i.e., they fail with a SAT/+ outcome). Such a difference is due to the nature of the single check and its definition of failure. A global protection check fails when no instance implementing the protection is found; a security issue check fails whenever the issue is possible (SAT/0 or SAT/+); and a mitigation check fails when no instance is found. We consider SAT/+ findings particularly important, as they witness not only a potential security issue but also an actual misconfiguration. In particular, the 69 SAT/+-failing checks fail on 239 resource instances, with the most frequent issues being:

Missing or misconfigured encryption: 131
Missing or misconfigured logging: 46
Missing or misconfigured versioning/backup/replication: 44
Missing User password reset requirement: 12
Misconfigured authorization: 3
Misconfigured networking configuration: 3

The 73 findings returning no instances fall into two groups: the absence of any monitoring or alarming system is very frequent, as is the dependency on external resources whose security posture cannot be assessed.

Absent global monitoring/alarming/logging protection: 41
Usage of external resources with unknown configuration: 32
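As a quick consistency check, the reported counts can be tallied against the totals stated in the text. The numbers are taken directly from this section; only the variable names are introduced here.

```python
# Totals reported in the text.
total_checks = 15 * 50
passed, failed = 608, 142
failed_without_instances, failed_with_instances = 73, 69

# Resource instances behind the SAT/+ failures, grouped by issue.
instances_by_issue = {
    "encryption": 131,
    "logging": 46,
    "versioning/backup/replication": 44,
    "user password reset requirement": 12,
    "authorization": 3,
    "networking": 3,
}

# Failing checks that return no instances, grouped by cause.
no_instance_findings = {
    "absent global protection": 41,
    "external resources with unknown configuration": 32,
}

flagged_instances = sum(instances_by_issue.values())
encryption_share = instances_by_issue["encryption"] / flagged_instances
```

The breakdowns add up to the stated totals of 239 flagged instances and 73 findings without instances, and encryption alone accounts for more than half of the flagged instances.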

Semantic Reasoning about Dataflows
To conclude our study, we manually craft two proof-of-concept models of terms related to cloud security (ontologies). We use these to extend the formalization of the CloudFormation IaC specification that was automatically generated by our tool. Such domain-specific ontologies formalize several common cloud terms, such as account, deployment, authenticated and unauthenticated users; generic dataflow terms, such as storage, process, nodes, and flows of different kinds; and service-specific dataflow terms. By adding these on top of the underlying IaC formal specification, we can reason about the higher-level business logic and reachability of the infrastructure, and we can abstract it and visualize it in a more convenient way. This is where the full inference power of description logics comes into play. Such inference power would be hard to achieve with an alternative encoding (e.g., using a modal logic). Let us illustrate how this technique is applied to system-level analyses of interest for a security review: dataflow and trust boundary analyses. A trust boundary is a portion of a system whose components trust each other and where data can securely flow. Multiple trust boundaries may exist within one system. Dataflows that travel across boundaries may introduce security issues and should be carefully reviewed. In Fig. 1, we see an example of such a situation, where the infrastructure is deployed across two accounts, prod and test, sharing resources AccessLog and AccessTopic. In our encoding, we use DL inclusion axioms to rewrite properties that (when chained) imply the existence of a more general relation and to infer additional characteristics of nodes. For example, in the following list axioms 2-7 formalize the relationships of "logging to" and "sending notifications to" a resource, which imply the existence of a transitive dataflow between nodes; and axioms 8-9 allow us to infer that the node devs@mail is an external node.
$\exists \mathit{Protocol}.\{\text{``email''}\} \sqsubseteq \forall \mathit{Endpoint}.\mathit{EmailAddress}$ (7)
$\mathit{EmailAddress} \sqsubseteq \mathit{ExternalNode}$ (8)

This encoding enables us to compute a succinct dataflow diagram from the reasoned IaC configuration (see Fig. 2), and to formally verify properties that usually require a manual analysis of the infrastructure and its underlying graph representation. E.g., the question, "can data flow from the customer-data bucket to the outside?" can now be formalized as a DL formula and, using a reasoning engine, the existence of a dataflow that starts on the customer-data bucket and reaches the devs@mail node can now be inferred. We note that, due to the structure of the TopicSubscription resource, this dataflow could not have been detected with simple reachability analysis on a graph built without the aid of semantic reasoning. Moreover, the dataflow diagram highlights another potential source of information leakage: testers being exposed to customer access information. This needs to be mitigated by enforcing the proper trust boundaries, in particular, by adding a dedicated access log storage for the customer-data bucket in the prod account.
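The difference between plain reachability and reasoning with chained relations can be illustrated with a small forward-chaining sketch. Everything here is an assumption made for illustration: the facts, the role names (loggingTo, notifies, topic, endpoint, protocol), the DevSubscription node, and the rules stand in for the ontology axioms, and the hand-rolled fixpoint loop stands in for a DL reasoner. The key rule treats a subscription that points at both a topic and an endpoint as a dataflow from the topic to the endpoint.

```python
# Hypothetical role assertions (role, subject, object) for a Fig. 1-like setup.
FACTS = {
    ("loggingTo", "customer-data", "AccessLog"),
    ("notifies", "customer-data", "AccessTopic"),
    ("topic", "DevSubscription", "AccessTopic"),
    ("endpoint", "DevSubscription", "devs@mail"),
    ("protocol", "DevSubscription", "email"),
}


def naive_reachable(facts, start):
    """Nodes reachable by following raw property edges, without any reasoning."""
    seen, frontier = set(), [start]
    while frontier:
        node = frontier.pop()
        for (_, s, o) in facts:
            if s == node and o not in seen:
                seen.add(o)
                frontier.append(o)
    return seen


def infer_dataflow(facts):
    """Apply the assumed inclusion rules until no new dataflow pair appears."""
    # loggingTo <= dataflow and notifies <= dataflow
    flows = {(s, o) for (r, s, o) in facts if r in ("loggingTo", "notifies")}
    # inverse(topic) o endpoint <= dataflow
    topics = {(s, o) for (r, s, o) in facts if r == "topic"}
    endpoints = {(s, o) for (r, s, o) in facts if r == "endpoint"}
    flows |= {(t, e) for (sub, t) in topics for (sub2, e) in endpoints if sub == sub2}
    # dataflow o dataflow <= dataflow (transitivity), computed as a fixpoint
    while True:
        new = {(a, d) for (a, b) in flows for (c, d) in flows if b == c} - flows
        if not new:
            return flows
        flows |= new


def external_nodes(facts):
    """Endpoints of email subscriptions, mirroring axioms (7) and (8)."""
    email_subs = {s for (r, s, o) in facts if r == "protocol" and o == "email"}
    return {o for (r, s, o) in facts if r == "endpoint" and s in email_subs}


flows = infer_dataflow(FACTS)
leaks = {(a, b) for (a, b) in flows if a == "customer-data" and b in external_nodes(FACTS)}
```

In this toy setup, raw reachability from customer-data stops at AccessLog and AccessTopic, because the subscription points towards the topic rather than away from it, whereas the chained rule yields the flow from customer-data to devs@mail.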

Related Work
To the best of our knowledge, the problem of formally verifying the design of a cloud infrastructure in its entirety has not been addressed before. Formal reasoning techniques have been successfully applied to different aspects of the cloud, e.g. networks and access policies [16,7,4,5]. Non-formal tools exist that recommend and run checks against already deployed resources [35,13], or scan IaC templates [11,10,38] for syntactical patterns violating security best practices. These checks overlap considerably and can be expressed in our framework as well.
The disadvantages of such tools are that checks are local to single components, can be performed only post-deployment, and need complex configurations, access permissions, or even manual interaction. The CFn-Linter [10] has a rule-based component that users can extend with custom syntax checks, but none of the rules currently available focus on security. The CFn-nag linting tool [11] checks compliance with best practices only locally to single resources; e.g., it cannot detect issues such as "there is an events queue, receiving from a bucket with critical functionality, that may not be encrypted" or "there might be a user that is shared by multiple policies" (which would go against the least privilege principle), nor can it include in its analysis external resources that are referenced by the template being linted.
Regarding our choice of logic, large-scale configuration problems have been tackled with description logic before [26,27]. Simpler first-order logic formulas with operators to represent object-oriented interface relationships could be used to model IaC specifications. However, such an encoding would only partially solve our problem, which is more complex because our overall goal is to do formal semantic analyses (e.g., dataflow and threat modeling). Semantic-based approaches, even DL-based ones, are being used to combine conceptual modeling of security engineers' expertise with the provable and explainable inference capabilities of logics. As an example, we refer the reader to the OWASP "Ontology-driven Threat Modeling" project [31] that aims at the formalization of security-related knowledge in the context of different types of computer systems by means of description logic ontologies. In contrast to logic programming languages, such as Datalog, DLs inherently support functionality axioms and the existence of anonymous individuals within a domain that is assumed to be open. These are supported out-of-the-box without the need for an additional, more complex, axiomatization or encoding. In particular, we took advantage of the open-world assumption of DLs to implement, in our properties encoding, verification and falsification. Another alternative to DLs as a modeling language would be to use 3-valued models with labels on states and transitions and apply model checking [8,9]. However, expressive branching-time logics [25,33] have not been studied in the context of 3-valued models and we are also not aware of tool support at the level available for DLs (cf. [17,21]).

Conclusion and Future Work
Throughout this case study, we investigated the usage of description-logic-based semantic reasoning to evaluate the security of cloud infrastructure pre-deployment. We encoded Amazon Web Services' Infrastructure as Code specifications and configurations into description logic models and verified the presence and absence of potential security issues. We showed how this approach enables deeper system-level analyses such as dataflow analysis. All results can be generalized to other existing IaC tools. While working on this project, we interacted with developers on two occasions. First, for the benchmark templates used in our experimental evaluation, we contacted the owners, told them about the misconfigurations, and discussed potential security implications. Second, within AWS, security engineers use a technique based on this paper for security reviews of AWS products before they are launched, helping developers fix real issues pre-deployment. In the process, we received valuable feedback that we used for improving precision and reducing the number of false-positive results. We plan to continue researching an even better-fitting description logic formalism, query language, three-valued semantics, and decision procedures for verification and falsification of properties relevant to security analyses, such as dataflows, trust boundaries, and threat modeling.