Keywords

1 Introduction

The increasing number of attack vectors and the growing complexity of attacks on computer networks force operators to continuously assess and improve their networks, services, and configurations. Analyzing, testing, and deploying new security features and configuration improvements is time-consuming, challenging, and error-prone. The same holds for general software upgrades or network infrastructure changes. Performing this on an operational network is often not suitable, as service continuity has to be ensured and outages cannot be tolerated.

Therefore, reproducing a network environment into a self-contained test environment is beneficial as the operational network is not influenced. Testing and analyzing different options to improve the network and its security can be evaluated and tested sufficiently before deployment. With large and complex network environments, reproducing such network environments cannot be done manually as information about the environment and its elements may be unknown, incomplete, or not available in a formal description. Hence, an automated process to reproduce network environments in a physical or virtualized testbed is required.

In this work, we present INSALATA, the IT NetworkS AnaLysis And deploymenT Application. INSALATA enables network operators and researchers to automate reproduction of arbitrary network environments in physical or virtualized testbeds. To represent network environments, we provide a network model comprising layer-2 network segments, IP networks, connectivity information (routing and firewalling), network nodes (routers, hosts), network services (DNS, DHCP), and host information (network interfaces, memory, disks, operating system). INSALATA can analyze network environments to obtain a formal description of the network to track the state continuously or in discrete intervals. Here, INSALATA uses information fusing to provide a comprehensive view on the network by aggregating information from multiple collector modules. Using descriptions decouples analysis and deployment and enables re-using, archiving, and distributing these descriptions. INSALATA can instantiate descriptions on physical or virtualized testbeds employing a PDDL planner to structure the setup process and resolve inter-dependencies between setup steps. To minimize setup steps and reuse existing testbed setups, we support incremental setups by determining the delta between current and target testbed state.

The key contributions of our work are (a) INSALATA, a fully automated, modular, and extensible framework to reproduce network environments on testbeds, (b) the open source implementation of INSALATA available on GitHub, and (c) a case study showing INSALATA’s applicability to real world scenarios using exemplary module implementations.

The remainder of this paper is structured as follows: First, we describe our goals and requirements for INSALATA in Sect. 2. Afterwards, we analyze if related work can fulfill these in Sect. 3. In Sect. 4, we present INSALATA’s design and introduce its components. Next, we present the main components of INSALATA in detail, in particular the underlying information model in Sect. 5, the Collector Component in Sect. 6, and the Deployment Component in Sect. 7. In Sect. 8, we summarize important implementation details. In Sect. 9, we present a case study to show the applicability of our proposed system. Finally, we give a conclusion and present future work in Sect. 10.

2 Goals and Requirements

The overall goal is to reproduce arbitrary network environments into physical or virtualized testbeds. Therefore, we need (a) a suitable information model reflecting required information, (b) an automated information acquisition process, and (c) an automated deployment process.

(a) Information Model: The information model abstracts from the network environment. The goal is to reflect network environments up to application layer of the TCP/IP reference model. The information model has to be extensible to allow to add use case specific services and additional information elements. Therefore, we require the following information to be present: (a) basic network nodes, like hosts and routers, (b) networks on layer 2 and 3, including appropriate addressing schemes, (c) connectivity information like routing and firewalling, (d) basic network services like DNS and DHCP, and (e) host information, including network interfaces, disks, memory, CPUs, or operating system.

(b) Information Acquisition: The goal is to provide information acquisition that supports different types of information collection techniques, supports continuous monitoring of the network environment, and is fully automated. We identified that the following information collection techniques, differing in terms of intrusiveness and quality of information they provide, have to be supported:

Manually specified information is not intrusive, but rarely up-to-date. Including such information is required if other techniques are not applicable.

Passive scanning has no direct impact on networks, but collected information is limited and access to all network segments is required.

Active scanning creates load in a network and on components, but provides more detailed information about entities and services in the network.

Network management protocols need to be available on investigated nodes, but reduce system’s load and information requests are standardized.

Direct access to components, e.g. with SSH, delivers rich information, but requires appropriate access, to invoke applications, and interpret the output.

Agent-based information collection collects information just-in-time, but agents need to be deployed and run on the components.

(c) Deployment Process: The deployment process has to be incremental, so that the delta between the current and the target state is computed during setup and only required configuration steps are executed. Additionally, the deployment process has to be modular and extensible in order to cope with use case specific requirements. Therefore, the setup process has to be divided into small, self-contained steps. To be able to use the deployment process on different testbed architectures, the process itself needs to be independent from the underlying architecture as much as possible.

3 Related Work

To the best of our knowledge, no application to reproduce network environments exists. Therefore, we examine the two main components of INSALATA, namely information acquisition and testbed setup for deployment, separately. Next, we investigate network description languages as we need a suitable network model for INSALATA. Finally, we discuss network management protocols and frameworks to investigate appropriate implementation mechanisms to deploy the description of the network environment on the testbed.

Data Acquisition Applications are needed to obtain information about the network environment. Here, continuous monitoring is required and information from different sources needs to be fused. Additionally, tracking changes is a requirement. IO-Framework [6, 19] does not support continuous monitoring and only supports intrusive collection methods. The common Network Information Service (cNIS) [1] utilizes static information and higher level services (SSH or SNMP) but does not include less invasive information collection techniques. MonALISA [7, 11] and PerfSONAR [29] are not capable of continuously monitoring the network and detect changes. OpenVAS [23] is used to identify vulnerabilities within an infrastructure but has limited scanning capabilities. Single purpose tools like Nmap [20], Traceroute [2], or xprobe2 [35] can be used to collect single aspects of the network environment but do not provide a holistic view. Dedicated network management protocols, like SNMP [8, 21] or Netconf [27] can only be used to retrieve dedicated information from single network components, but do also not provide a complete view on the network.

Testbed Management Frameworks are used to orchestrate and control testbeds. All presented frameworks do not provide incremental setups but rebuild the designated network from scratch leading to higher effort within the setup process and manually configured changes get lost. Additionally, testbed orchestration and experiment execution are often tightly coupled. vBET [18] and LaasNetExp [24] are both closed source, preventing to extend those frameworks. VNEXT [25] and NEPTUNE [5] do not provide the automated setup of basic network services, like DNS or DHCP. Emulab [34] or DETER [4] tightly couple the infrastructure setup and the experiment execution. This requires to rebuild the network after each experiment.

Network Description Languages and Ontologies can be used as information model. The related work within this field lacks in providing the information elements needed for a proper mirroring of network environments, especially in terms of reflecting the connectivity due to routing and the usage of firewalls. IF-MAP [3, 30, 31] is not capable of reflecting interfaces or routing information. The target-centric ontology for intrusion detection [32] does not provide a sufficient addressing scheme for elements nor reflect routing or firewalls. The Network Markup Language (NML) [14, 15] provides a schema for exchanging network descriptions on a generic level, but does not provide concepts like network routing. The Infrastructure and Network Description Language (INDL) [13, 17, 28] extends NML, but is not capable of reflecting routes or firewall rules. Tcl-based format in Emulab and ns-2 [34] is not capable of modeling networks explicitly resulting in verbose definitions for large networks.

Network Management Protocols and Frameworks can be used to setup and configure the descriptions in testbeds. To do so, the virtualized testbed has to be setup, e.g. router and hosts as virtual machines, and those components need to be configured appropriately afterwards, e.g. using adequate routing tables. Dedicated network management protocols, like SNMP [8, 21] or Netconf [27] can be used to configure components and request dedicated information in a standardized way. Both protocols do not provide built-in mechanisms to manage larger infrastructures as a whole. Ansible [9] is a push-based framework to configure larger infrastructures using so-called playbooks. Those playbooks need to be written or adapted for each configuration. Ansible can not be used directly for our approach, but is suitable as an important building block. Puppet [26] is a pull-based framework for infrastructure configurations. As the testbed is reconfigured in irregular intervals, a pull-based mechanism is not suitable.

4 Approach and System Design

INSALATA consists of the Collector and the Deployment Component as its two main components:

The Collector Component is responsible for collecting and fusing information of the network environment and the current state of the testbed into a descriptions (see Sect. 6).

The Deployment Component manages configuration changes and the automated setup process on the testbed (see Sect. 7).

Both components utilize the same information model to structure the information about the network environment (see Sect. 5) and are managed and orchestrated by a central controller, the Management Unit. The system architecture of INSALATA showing these basic components and their interaction is depicted in Fig. 1.

Fig. 1.
figure 1

System overview of INSALATA showing basic components

The Collector acquires information about the network environment and is used to generate a description for the testbed. It maintains the current state of the monitored network. Here, the collection process can be done continuously whereas configuration changes can be tracked and stored with a timestamp in a database. This approach allows to rebuild a network at each point in time.

The Deployment Component utilizes a description that is either obtained by the Collector or provided by the user. Based on this, the Deployment Component configures the testbed to reproduce a network environment. To ease writing descriptions, a Preprocessor is utilized, replacing missing but calculable values in the description and checking the it for its validity.

5 Description of the Information Model

To be able to reflect a network environment in a testbed, a formal description of this network is required. This description needs to contain information elements discussed in Sect. 2. Each information element needs to be leviable from the network environment in an automated manner and has a unique identifier. The proposed information model is shown in Fig. 2. An information element is represented as box, the identifier of each element is underlined and additional attributes describing the information element are listed. For FirewalRules and Routes a combination of attributes is used as identifier. Relations between information elements are denoted as arrows in between and additionally denote their cardinality.

Fig. 2.
figure 2

Information model of INSALATA

The main information element is the Network Component representing a node in the network environment, e.g. hosts, routers or switches, and are further specified by the Template attribute. A Network Component is equipped with certain Disks and Interfaces. Interfaces are needed to interconnect Network Components in different kinds of networks, e.g. Layer 2 Networks or Layer 3 Networks.

Depending on its functionality, a Network Component can maintain Routes or Firewall Rules. Those express the connectivity between Network Components. The latter can be represented as raw dumps (Firewall Raw) or in a simplified format (Firewall Rule) to ease transformation between different firewall applications as proposed in [10]. As a simplification is not free of information loss, the raw information is stored additionally.

To support large-scale test environments consisting of multiple servers, a Network Element is associated with a Location specifying the testbed server the Network Component is emulated on. In case of a description reflecting the network environment, the Location is set to physical.

Another important concept of the information model is the Service element. This information element is used to reflect basic network services, like DNS or DHCP. A Service can be further specified by adding a Product and a Version to allow a high accuracy. Additionally needed services can be added to the information model using inheritance, allowing use case specific applications.

6 Information Collector Component

The Collector Component is capable of managing multiple Environments describing multiple networks to be monitored. The overview of INSALATA’s information Collector Component is depicted in Fig. 3.

Fig. 3.
figure 3

Information Collector Component of INSALATA

For each Environment, multiple Collector Modules, employing a particular information collection technique as described in Sect. 2, can be configured to collect the required information. A Collector Module obtains information about at least one information element and possible relations between information elements. Therefore, modules have to obtain the unique identifier of an information element. The collected information elements are handed over to the Environment fusing all obtained information into a comprehensive graph describing the network. Here, we assume that a module delivers no false, but potentially incomplete information. This modular approach has the advantage that different, specialized information collection techniques can be combined resulting in a more detailed and more precise view on the network environment.

To fuse information, the Collector utilizes the identifier of each object and the type of the information element. Objects with the same identifier and of the same type are treated as the same object. Each Collector Module passes discovered objects to the Collector. Attributes and relations are fused together in case multiple modules report the same objects.

Another challenge is to manage existing information elements in the model becoming obsolete. Therefore, a deletion scheme needs to be implemented within an Environment. Each information element in an Environment is equipped with timers for each Collector module. Each time, an information element is delivered by a module, the module specific timer is updated. If all timers in the list expire, the information element and its relations are deleted from the Environment. In addition, each module can actively set its own timer to zero, if it is capable to determine the non-existence of an element.

To be able to recreate an environment at any (observed) point in time, we track the network environment’s state over time. Whenever an information element or relation is modified, i.e., is added, deleted, or updated, we save this delta as an event to the database. Such events contain the type of change, the information element and its properties. Within an Environment, only the current state of the network description is maintained.

7 Infrastructure Deployment Component

The Deployment Component executes the following steps: (a) determine required configuration steps using the current testbed state and the given description, (b) determine a correct execution plan how to achieve the given description, and (c) deploy the changes on the testbed following the computed execution plan. The overview of the execution flow of INSALATA’s Deployment Component is depicted in Fig. 4.

Fig. 4.
figure 4

Execution flow of INSALATA’s Deployment Component

First, the Deployment Component has to determine what needs to be changed. Therefore, we need the current state of the testbed in the form a description as Description \(D_2\) that can be determined using the Collector Component. Additionally, we need the target state in the form a description as Description \(D_1\) provided from the Collector or the user. The Deployment Component uses a Change Detection Module to detect added, removed, and updated information elements in the delta between these states.

To determine how to change the testbed, we use automated planning and scheduling from the domain of artificial intelligence [16]. A planning problem is described using a dedicated planning language such as the Planning Domain Definition Language (PDDL). PDDL separates the planning problem into a domain description, describing the problem domain, and a problem description, describing an instance of the problem [12]. A PDDL domain description describes the objects’ types, predicates (i.e., properties), and actions. Each action has a definition of objects it is applicable to, preconditions that have to be fulfilled, and an effect altering the predicates of objects. With INSALATA, we provide a domain description for our information model and steps as PDDL actions necessary to setup a testbed. A detailed overview on the steps we defined and their inter-dependencies can be found in the domain description provided with the implementation. The PDDL problem description describes all objects, their type and their initial state. The problem specifies the goal state of all objects by giving their desired predicates [22] inside a goal section. While the domain file is static, the problem file depends on the current and target state and is computed each time a description is deployed on the testbed.

The computed changes, the current testbed state, and the target state are given to the Problem Parser Module computing a PDDL problem description. This description is given to the Planner Module, an automated planning and scheduling solver computing an execution plan containing the correct order of actions that will bring the testbed from the current into the target state. The execution plan is passed to the Builder to execute the steps on the designated testbed. Implementations of these steps are provided by architecture-specific Builder Modules since the implementation of such steps is different for different testbed architectures and objects. The Builder uses meta-data associated with the Builder Modules and the objects to identify the correct implementation. This allows us to dynamically add new implementations, e.g. for new operating systems or services, without changing the Deployment Component.

8 Implementation

INSALATA is written in Python 3 and is available in INSALATA’s GitHub repositoryFootnote 1. A detailed code documentation is available onFootnote 2.

The Management Unit can be controlled using a XML-RPC client communicating with INSALATA. The presented information model is reflected using object-oriented Python classes.

Besides the Collector Component itself, we provide the following Collector Modules: (1) a XEN module using to collect information from environments using XEN virtualization, (2) an XML module for manually provided information, (3) a Tcpdump module for passive network scanning using Tcpdump, (4) an Nmap module for active network scanning using Nmap, (5) an SNMP module to retrieve information from nodes using SNMP, (6) an SSH module to retrieve information from nodes via SSH, and (7) a Zabbix module for agent-based information collection with the Zabbix network monitoring system. Some modules have limited functionality, meaning that not all information possible to collect is implemented, e.g. the SSH module does not collect firewall rules. New collector modules can be added to INSALATA to cover the desired scanning environment.

We integrate fast-forwardFootnote 3 as a planner into INSALATA’s Deployment Component. The domain file describing the setup process of a testbed we provide, is given in PDDL. The required problem files are generated for each setup individually in an automated manner. We provide a framework allowing to add new Builder Modules in an easy way. Here, we utilize Python annotations to determine the most suitable Builder Module for a given step within the determined plan and the configured object. Besides provided Builder Modules to setup the basic topology on XEN (hosts, routers, layer 2 networks), we utilize Ansible for additional configurations (routing, firewalling, DNS, and DHCP).

9 Case Study: Chair’s Teaching Infrastructure – iLab

To show INSALATA’s applicability in practical scenarios, we assesed INSALATA in a case study. For this case study, we use a setup adapted from the lab course iLabFootnote 4 offered by the TUM’s Chair of Network Architectures and Services. The iLab is a course to teach student’s practical skills in administering network setups and configurations for different scenarios using real hardware. A typical setup students have to work with during an iLab course is shown in Fig. 5. This setup consists of two Cisco and a Linux router, and four host residing in different private networks. In our case study, our goal is to reproduce this network environment in a network testbed using XEN virtualization.

Fig. 5.
figure 5

Typical infrastructure setup within the iLab course

We provide the IP addresses of all routers to the INSALATA system as a starting point using manual input in form of XML files. All routers and hosts are configured to allow INSALATA to access the systems using SSH and SNMP.

In the first phase, we obtain the required description of the network environment using our Collector Component. To expand our infrastructure information using our passive Tcpdump Collector Module, we generate traffic on the involved hosts. Using the SNMP and SSH Collector Modules, missing interfaces, MAC addresses, and routing information is obtained from routers and hosts. Using theses Modules, we are able to reflect the network environment shown in Fig. 5 as description.

In the second phase, we use our Deployment Component to reproduce the obtained description into our virtualized testbed using XEN with the xapi toolstack. INSALATA computes an execution plan to setup the testbed from scratch in 0.252 s. The resulting execution plan consists of 92 steps, including setting up virtual machines and networks, configuring interfaces and deploying routes. Each step is executed in sequence and no parallelization is done. The total time required to setup the testbed and configure it is 42 min 32 s. Figure 6 visualizes the setup and configuration process in regard to its execution time.

The most time-consuming tasks during the setup process are the creation of new virtual machines, which happens at the beginning. The reason for this is that here new virtual machines have to be cloned from the respective template, including copying hard disk images and required reboot operations. Configuration steps like creation of virtual networks and interfaces are done nearly instantaneously. After the setup of our description, we validated the correctness of our setup using manual inspection, Ping and Traceroute.

Fig. 6.
figure 6

Time distribution of setup steps in the iLab case study

10 Conclusions and Future Work

In this work we present INSALATA, a system capable of reproducing network environments in network testbeds. INSALATA enables network operators and researchers to test and analyze new security features and general configuration changes in a separated test environment before deployment in operational networks. To be able to formalize network environments, INSALATA utilizes an information model particularly crafted for representing network topologies, entities, and services in descriptions. To obtain the required information from network environments, we support a modular Collector Component automatically assessing networks and fusing information from different Collector Modules. The Deployment Component provides automated planning and scheduling to instantiate descriptions onto a physical or virtualized testbeds. Within our case study, we show the applicability of our approach reproducing a real world network environment with several routers and hosts onto a virtualized testbed.

To further improve INSALATA, we will continue our work in this field and on INSALATA and highly appreciate feedback, improvements, and extensions from the community. To extended INSALATA’s capabilities, additional Collector and Builder Modules can help to obtain additional properties from the network environment, such as user information or generic service configurations from hosts. Existing Collector Modules can be extended to obtain more information using existing assessment techniques, such as firewall information using SSH. To reproduce network environments more realistically, Builder Modules to support Microsoft Windows and additional network services, like mail or ftp are beneficial. One of our main goals is to make the deployment process more efficient by parallelizing the execution of the setup plan. Since INSALATA only provides mechanisms to setup and orchestrate a testbed, we aim to integrate our experiment execution framework GPLMT [33] into the INSALATA system.