
1 The Resource Continuum

The advent of the Internet of Things (IoT) era makes it possible to interconnect millions of devices, which collect an enormous quantity of data that is then processed somewhere in the Cloud for different purposes [9]. According to Gartner [6], in 2016 there were 6.4 billion edge devices around the world, and more recently IDC [8] predicted up to 60 billion connected entities by 2025, generating about 80 ZB of raw data. This enormous quantity of data highlights the limits of the Cloud approach: network saturation, the unsustainability of ever-larger datacenters in terms of size and energy consumption, and the unreliability and latency of internet connections, which make it unsuitable for real-time or emergency use-cases. In this context, new post-Cloud trends aim to shift part of the processing close to the data, exploiting the increased capabilities of edge devices, which unfortunately are still limited by battery energy budgets and thermal issues. The most promising approach is called Fog computing [4, 11] and, in its latest definition, it includes a layer between Cloud and edge that provides computational and storage services, also covering aspects related to networking and management. This approach follows a resource continuum concept, since the system seamlessly connects devices from the Cloud to the edge, as shown in Fig. 1.

Fig. 1 The resource continuum reference architecture

1.1 A Comprehensive Architecture

In order to understand the underlying problems, the first step is to define and model the infrastructure. In this regard, the infrastructure can be organized following a multi-tier approach [11], as highlighted in Fig. 1: the computing nodes expose different capabilities (i.e., processing, storage, and network connectivity) across the various levels. The resources are then modeled in a bi-dimensional space. The vertical dimension spans the different paradigms: moving from lower to higher levels gains performance and energy budget, but also increases cost and latency; conversely, moving from Cloud to edge leads to more pervasiveness and lower power consumption. Along the horizontal dimension, instead, it is possible to scale and balance the load among sibling nodes, or to implement fault-tolerance policies by (a) switching between different providers at the Cloud level, (b) composing dynamic worker groups out of Fog devices, (c) arranging sensors and Edge devices together.
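To make the bi-dimensional model concrete, the following minimal C++ sketch encodes it as a data structure; all names are illustrative and are not part of the BarMan code base:

```cpp
#include <string>
#include <vector>

// Vertical dimension: the tier a node belongs to.
enum class Level { Cloud, Fog, Edge };

// Per-node capabilities exposed to the resource manager.
struct Node {
    std::string id;
    Level level;          // inter-level heterogeneity
    unsigned cpu_cores;   // intra-node resources (CPUs, GPUs, ...)
    unsigned gpu_units;   // may differ even within the same tier
    double   net_mbps;    // network connectivity
};

// Horizontal dimension: sibling nodes at the same level, usable
// for scaling, load balancing, or fault-tolerance policies.
struct SiblingGroup {
    Level level;
    std::vector<Node> members;
};
```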

As a consequence, such an architecture is characterized by a high level of computing resource heterogeneity, which can be divided into three categories:

  • inter-level: very different capabilities depending on the level;

  • intra-level: different resources at the same level;

  • intra-node: heterogeneous resources available on the same device/node (e.g., general-purpose single- or multi-core CPUs, many-core GPUs, HW accelerators).

Dealing with such a dynamic, modular, and heterogeneous system requires addressing several research challenges, which are the key objectives of our work: (a) the possibility to control and manage the entire system; (b) the development of a unified programming model and a transparent task distribution to fully exploit the connected devices; (c) the need for a run-time resource manager (RM) with proper task mapping and resource allocation strategies; (d) the availability of real-world use-case applications as well as an accessible physical hardware test-bed to perform evaluations.

2 The BarMan Framework

The BarMan framework in Fig. 2 integrates different frameworks into a single suite and extends their capabilities to enable the run-time managed execution of multi-tasking applications on a heterogeneous and distributed (embedded) computing system. In the following paragraphs, we highlight the main features of each module: the suite comprises the BarbequeRTRM resource manager for embedded, mobile, and HPC systems, the libmango programming model for modular applications, and the BeeR framework to distribute them among the devices. Finally, as shown in Fig. 2, these modules can interface with various platforms and use-cases.

2.1 The BarbequeRTRM Resource Manager

The Barbeque Run-Time Resource Manager (BarbequeRTRM) [3] is a modular, open-source, and extensible run-time RM able to manage the allocation of computing resources to concurrent applications, taking into account both the applications’ QoS requirements and the dynamic resource availability. Already supporting several types of platforms (e.g., embedded, HPC, mobile), it has been extended to support multi-device cooperation.

Its main feature is the so-called Adaptive Execution Model (AEM), an Android-style execution flow through which each application and the resource manager interact. This allows applications to be controlled and reconfigured by the manager in order to respect performance/power constraints. The actual management is performed by allocation policies that can be easily plugged in to deal with specific optimization metrics (e.g., latency, bandwidth, power consumption).
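A minimal sketch of this execution flow is shown below; the callback names are loosely modeled on the AEM concept, but this is an illustrative skeleton, not the exact RTLib API. The manager drives a configure/run/monitor cycle, and the application adapts whenever its resource assignment changes:

```cpp
#include <cstdio>

// Simplified Adaptive-Execution-Model skeleton: the resource manager
// drives the cycle Configure -> Run -> Monitor until completion.
class ManagedApp {
public:
    virtual ~ManagedApp() = default;
    virtual void onSetup() {}                  // one-time initialization
    virtual void onConfigure(int awm_id) = 0;  // adapt to assigned resources
    virtual bool onRun() = 0;                  // one cycle; false = done
    virtual void onMonitor() {}                // report QoS to the manager
};

class VideoFilter : public ManagedApp {
    int frames_left = 100;
public:
    void onConfigure(int awm_id) override {
        // e.g., pick thread count or quality preset for this working mode
        std::printf("reconfigured to working mode %d\n", awm_id);
    }
    bool onRun() override { return --frames_left > 0; }  // process one frame
    void onMonitor() override { /* e.g., report measured fps */ }
};
```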

Fig. 2 Overview of the BarMan framework, applications and integrated devices

2.2 libmango: A Task-Based Programming Model

The libmango Programming Library [1] was initially developed within the MANGO European Project [5] to program multi-tasking applications on HPC platforms. With it, applications can be decomposed into small chunks of code (called tasks or kernels) that are offloaded to and run on different nodes. Our solution outperforms state-of-the-art solutions [1] and, differently from libraries like OpenCL, only requires the developer to provide a task-graph of the application (i.e., a Directed Acyclic Graph describing tasks, memory buffers, and their inter-dependencies), without implementing the task-to-device mapping logic, which is delegated to the management software for run-time optimization.
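The following sketch illustrates what such a task-graph declaration can look like; the identifiers are hypothetical and do not reproduce the actual libmango API. The developer only declares kernels, buffers, and their dependencies, and the mapping is left to the run-time manager:

```cpp
#include <cstddef>
#include <vector>

// Illustrative DAG description: tasks (kernels), memory buffers,
// and their inter-dependencies expressed via buffer ids.
struct Buffer { int id; std::size_t bytes; };
struct Kernel { int id; std::vector<int> in, out; };  // buffer ids

struct TaskGraph {
    std::vector<Kernel> kernels;
    std::vector<Buffer> buffers;
};

int main() {
    TaskGraph tg;
    tg.buffers = { {0, 1 << 20}, {1, 1 << 20} };
    // decode -> detect: kernel 1 consumes what kernel 0 produces.
    tg.kernels = { {0, {}, {0}},     // decode: writes buffer 0
                   {1, {0}, {1}} };  // detect: reads 0, writes 1
    // The graph would then be handed to the management software,
    // which decides the task-to-device mapping at run-time.
}
```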

Fig. 3 BarMan framework’s modules deployment

2.3 Transparent Task Distribution: BeeR

The programming library has been extended to support dynamic content offloading through the open-source BeeR. It is composed of two components: a client library, which provides the API extending libmango, and a daemon server, deployed on the devices, which is in charge of handling the execution of incoming tasks as well as providing minimal support to retrieve the device status. Figure 3 shows the interaction between BeeR and the other BarMan modules. In the typical flow, when the user requests an application, the BarbequeRTRM provides an appropriate mapping plan based on the selected optimization metrics and the current state of the entire system (load, energy budget, availability, etc.). Then, the BeeR client sets up the communication with the remote daemon instances and defines the set of assigned resources for each task. On the remote side, the BeeR daemon performs the resource and buffer reservation and manages the task execution.
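As an illustration of this flow, the sketch below shows the kind of information a client-to-daemon submission could carry; this is an assumed wire format for exposition, not the actual BeeR protocol:

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical task-offloading request: the client ships the mapping
// decided by the resource manager; the daemon reserves the listed
// resources and buffers, then runs the task.
struct BufferReservation { int buffer_id; std::uint64_t bytes; };

struct TaskSubmission {
    int         task_id;
    std::string kernel_binary;               // path or blob reference
    int         assigned_cpu_cores;          // from the mapping plan
    std::vector<BufferReservation> buffers;  // reserved before execution
};
```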

2.4 Use-Case Evaluation Scenarios

One of the main demands in the research community is the need for real-world use-case applications and an accessible cluster to perform experiments. Thus, in order to prove and evaluate the functionality of the entire framework, we developed two application scenarios, tested on a real self-built Fog cluster: the SmokyGrill. The latter is composed of different interconnected embedded boards (i.e., Jetson TX2, Odroid H2, Freescale) that aim to replicate a typical Fog setup with both low-end and high-end devices.

Video Surveillance The first application is related to the video surveillance scenario, and its goal is to classify and track moving objects in a specific area. It is inspired by the simulated use-case by Gupta et al. [7], which we implemented with some modifications and optimizations using the libmango API. For this application, we developed the LAtency Versus Accuracy (LAVA) allocation policy, which optimizes three metrics (detection accuracy, detection latency, and tracking latency) by minimizing the execution and communication (wireless or wired) latencies of its tasks. The evaluation consists of (a) a kernel execution characterization; (b) a measurement of network and framework overheads; (c) a policy execution analysis over a set of scenarios covering different device availabilities and configurations. Among the other results, which can be found in [12], the most promising outcome shows the benefit of distributing the execution over the available devices: compared with the original monolithic version, the execution time improves by up to \(66\%\), depending on the scenario considered (Fig. 4).
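The core idea behind such a policy can be sketched as a simple greedy scheme; this is a simplified illustration under the assumption that per-device execution and communication latencies are known from profiling, not the actual LAVA implementation:

```cpp
#include <cstddef>
#include <limits>
#include <vector>

// exec_ms[t][d]: profiled execution latency of task t on device d;
// comm_ms[d]:    communication latency towards device d.
// Greedily map each task onto the device minimizing their sum.
std::vector<int> map_tasks(const std::vector<std::vector<double>>& exec_ms,
                           const std::vector<double>& comm_ms) {
    std::vector<int> mapping(exec_ms.size(), -1);
    for (std::size_t t = 0; t < exec_ms.size(); ++t) {
        double best = std::numeric_limits<double>::max();
        for (std::size_t d = 0; d < comm_ms.size(); ++d) {
            double cost = exec_ms[t][d] + comm_ms[d];
            if (cost < best) { best = cost; mapping[t] = static_cast<int>(d); }
        }
    }
    return mapping;
}
```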

Fig. 4 Comparison of execution time

Large-scale Emergency System In this second use-case, we apply our architectural model to a large-scale emergency scenario [13]. Current emergency systems are outdated and cannot satisfy the time-sensitive need for trustworthy emergency services when natural disasters happen. To overcome these limitations, we propose a semantic-based, trustworthy, information-centric Fog system that provides emergency services and is based on three components: (a) edge devices, to collect and pre-process information at the Edge level; (b) the information-centric (IC) Fog network, to exploit the Fog paradigm by computing semantic information for secure emergency analysis and management; (c) a Cloud emergency center (CEC).

The objectives of this application are: (1) proposing a novel emergency communication network to aggregate and analyze emergencies with semantic information in the network layer; (2) designing a semantic-based trustworthy routing scheme, able to filter out untrusted content and thereby improve the quality of emergency services; (3) evaluating the system through a real testbed and a modified CloudSim simulation for large-scale scenarios, in order to provide valuable data for deploying the system on existing IIoVT devices.
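The filtering idea behind objective (2) can be illustrated with a minimal sketch; the real scheme is semantic-based and considerably more elaborate, so the structure below is purely hypothetical:

```cpp
#include <string>
#include <vector>

// Illustrative content filtering step of a trustworthy routing scheme:
// forward only emergency messages whose trust exceeds a threshold.
struct EmergencyMsg {
    std::string semantic_tag;  // e.g., "earthquake/structural-damage"
    double      trust_score;   // e.g., derived from source reputation
};

std::vector<EmergencyMsg> filter_trusted(const std::vector<EmergencyMsg>& in,
                                         double threshold) {
    std::vector<EmergencyMsg> out;
    for (const auto& m : in)
        if (m.trust_score >= threshold) out.push_back(m);
    return out;
}
```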

After retrieving data from the testbed, we fed the simulation software to analyze how the system handles a 3-hour earthquake disaster, varying the number of emergencies and the availability of the layers. In this regard, we evaluate four scenarios: (a) without Fog networks and Cloud (WTFC), (b) with only Fog networks (WF), (c) with working Fog networks and Cloud (WFC); (d) a traditional system without Fog networks (WTF). As shown in Fig. 5, using the resource continuum (i.e., the WFC scenario), the failure rate decreases by between \(37\%\) and \(87\%\) with respect to the other system configurations.

Fig. 5 Failure rate versus number of emergencies

3 Exploiting Mobile Devices

Integrating mobile devices into the resource continuum opens wide opportunities in terms of scalability and pervasiveness. However, since they are battery-supplied, efficient and fine-grained power management is crucial to meet users’ expectations and maintain device availability.

Moreover, nearby devices can be exploited to decrease power consumption by delegating part of the computation, which however raises privacy and over-usage issues [10]. In this regard, an integrated resource manager can ensure isolation and energy budgeting for the different running applications and incoming tasks. We therefore developed an Android version of the BarbequeRTRM, which supports mobile applications in terms of QoS and resource demands.

3.1 Run-Time Adaptive Application Execution

Figure 6 shows the API design, which bridges the native layer of Android OS, where the BarbequeRTRM runs, and the Java application framework layer through a custom service-based wrapper. In this way, applications can be integrated with the BarbequeRTRM AEM, enabling the possibility to (1) set the application’s performance/power-saving goal (i.e., the throughput) or (2) require an explicit constraint on the resource allocation. Given this information, the RM can enforce power/resource management through the underlying Linux frameworks (e.g., cpufreq, cgroups) to set CPU operating points and to reserve CPU time quotas or cores, based on the specific optimization policy plugged in.

One of our experimental evaluations is devoted to providing a proof-of-concept of the prototype. We profile the execution of a mobile benchmark suite, gathering performance (fps) and power consumption measures, in order to define a set of pre-defined resource assignment configurations called Application Working Modes (AWMs); each AWM contains information about the amount of CPU, the frequency setting, the average system power consumption, and the estimated performance. A simple policy then selects different AWMs, within a constrained range, based on the maximum throughput required by the user. Figure 7 shows an evaluation scenario with the Image Effect benchmark, where the desired throughput (dotted red line) is reached and the power consumption is reduced by successive AWM changes made by the policy.
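The selection step can be sketched as follows; this is an illustrative scheme assuming the AWM fields described above, not the actual policy code. Among the AWMs meeting the throughput goal, it picks the one with the lowest average power; if none meets the goal, it falls back to the fastest one:

```cpp
#include <cstddef>
#include <vector>

// Profiled Application Working Mode: a resource assignment plus the
// measured power and estimated performance it yields.
struct AWM {
    int    cpu_quota_pct;  // CPU time quota
    int    cpu_freq_mhz;   // operating point
    double avg_power_w;    // average system power consumption
    double est_fps;        // performance estimation
};

// Index of the lowest-power AWM meeting the fps goal, or the
// fastest one if none does. Assumes a non-empty vector.
int select_awm(const std::vector<AWM>& awms, double fps_goal) {
    int best = 0;
    bool found = false;
    for (std::size_t i = 0; i < awms.size(); ++i) {
        const AWM& a = awms[i];
        if (a.est_fps >= fps_goal &&
            (!found || a.avg_power_w < awms[best].avg_power_w)) {
            best = static_cast<int>(i);
            found = true;
        } else if (!found && a.est_fps > awms[best].est_fps) {
            best = static_cast<int>(i);  // fallback: track the fastest
        }
    }
    return best;
}
```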

Fig. 6 Overview of the Android BarbequeRTRM software stack

Fig. 7 Benchmark’s performance and system’s power consumption

4 Conclusions and Future Directions

Besides the different results highlighted in this brief research summary, there are still open issues to be addressed. In particular, regarding the task distribution problem, the main challenges are related to security and privacy, which require enforcing hard isolation and lightweight cryptography techniques.

Moreover, from a resource management standpoint, (1) machine learning techniques can be used to predict incoming applications and the performance goals required by each specific application or process, thus improving the current mobile policy; (2) the LAVA allocation policy will also be extended with energy-related aspects.

Finally, other use-case applications can be integrated, like the automotive one developed during the M2DC EU project [2]; the two presented use-cases can also be deployed in real large-scale experiments to gather useful measurements and insights for improving the simulation models and the management strategies.