Scalable and Reliable Multi-dimensional Sensor Data Aggregation in Data Streaming Architectures

Ever-increasing amounts of data and requirements to process them in real time lead to more and more analytics platforms and software systems designed according to the concept of stream processing. A common area of application is processing continuous data streams from sensors, for example, IoT devices or performance monitoring tools. In addition to analyzing pure sensor data, analyses of data for entire groups of sensors often need to be performed. Therefore, data streams of the individual sensors have to be continuously aggregated to a data stream for a group. Motivated by a real-world application scenario of analyzing power consumption in Industry 4.0 environments, we propose that such a stream aggregation approach has to allow for aggregating sensors in hierarchical groups, support multiple such hierarchies in parallel, provide reconfiguration at runtime, and preserve the scalability and reliability qualities of stream processing techniques. We propose a stream processing architecture fulfilling these requirements, which can be integrated into existing big data architectures. As all state-of-the-art stream processing frameworks have to handle a trade-off between latency, resource-efficiency, and correctness, our proposed architecture can be configured for low latency and resource-efficient computation or for always ensuring correct results. To assist adopters in choosing appropriate configuration options, we provide an experimental comparison. We present a pilot implementation of our proposed architecture and show how it is used in industry. Furthermore, in experimental evaluations we show that our solution scales linearly with the amount of sensors and provides adequate reliability in the presence of faults.


Introduction
Stream processing [1,2] has evolved as a paradigm to process and analyze continuous streams of data, for example, coming from IoT sensors. The rapid development of stream processing engines [3] over the last years has paved the way for applications that process data exclusively  1 Software Engineering Group, Kiel University, 24098 Kiel, Germany online, i.e., as soon as it is recorded. Whereas a couple of years ago Lambda architectures were the de facto standard for analytics platforms, currently more and more platforms follow the Kappa architecture pattern, where data is exclusively processed online [4]. Further, entire software system architectures [5] follow patterns such as asynchronously communicating microservices [6] and event sourcing [7], which require data to be available as continuous streams instead of actively polled from databases.
When considering continuous streams of measurement data, for example, from physical IoT sensors or software performance metrics, often an aggregation of multiple such streams is required. Whereas in the traditional approach of first writing all measurements to a (relational) database and then querying this database, this is a well-known task, performing such an aggregation on continuous data streams raises challenges. This is in particular true when requirements for scalability and reliability have to be considered.
In this paper, we contribute to the seminal work on stream processing by presenting and evaluating dataflow architectures to aggregate data of multiple streams in real time. This paper is an extended version of a workshop paper presented at the 2019 IEEE International Conference on Big Data [8]. We extend it by proposing two architecture variations ("Dataflow Architecture Variations"), providing now support for out-of-order record, although at the cost of increased latency and resource demand. Further, we experimentally compare our architecture variations in terms of their performance ("Experimental Comparison of Architectures") to assist adopters in choosing an appropriate option.
The remaining paper is structured as follows: "Motivating Example" motivates the demand for our aggregation approach by an example for IoT sensor data streams. "Requirements for Stream Aggregation" derives essential requirements for an aggregation approach. "The Dual Streaming Model" provides a brief summary of the fundamental model used by our stream processing architecture for aggregating data streams, which is presented in "Basic Dataflow Architecture." "Pilot Implementation for IoT Sensor Data" shows how we implement this architecture in an industrial IoT monitoring platform. "Experimental Scalability and Reliability Evaluation" evaluates our proposed architecture in terms of scalability and reliability. "Dataflow Architecture Variations" presents two architecture variations of our basic architecture from "Basic Dataflow Architecture," supporting out-of-order records. "Experimental Comparison of Architectures" experimentally compares all three architecture variations in terms of their performance. "Industrial Case Study" describes how the proposed stream processing architecture is used in an industrial setting to aggregate power consumption data. Finally, "Related Work" discusses related work and "Conclusions" concludes this paper.

Motivating Example
Operators of industrial production environments have high interest in getting detailed insights into the resource usage of machines and production processes. Those insights may reveal optimization potential, provide reporting for stakeholders, and can be used for predictive maintenance. Today's industrial production environments operate a variety of network-accessible measuring instruments. Such instruments (sensors) continuously measure, inter alia, the resource usage of individual machines, for example, their electrical power consumption. They may publish these metrics via a messaging system, allowing real-time analytics systems to collect, process, store, and visualize those data.
In particular, but not exclusively in the case of electrical power consumption, it is not only producing machinery that uses resources but also other company areas such as IT infrastructure, employee offices, or building technology. This leads to the situation that the amount of resource consuming devices is often immense, which makes it difficult to assess. Therefore, metrics for groups of machines in addition to consumption metrics of the individual machines are often required. Consequently, the data streams of the individual sensors have to be continuously aggregated. Referring to Fig. 1, which exemplary shows a production environment comprising various machines and devices, operators may require to answer questions in real time such as the following: What is the resource usage of a certain machine type (e.g., the total power consumption of all turning shops)? What is the resource usage by business unit (e.g., the total power consumption of producing machinery)? What is resource usage by physically collocated machines (e.g., the total power consumption per building or shop floor)? What is the overall company-wide resource usage? Further, for machines featuring multiple independent power supplies (e.g., for redundancy), already obtaining consumption data for single machines requires to aggregate data streams of their individual power supplies.

Requirements for Stream Aggregation
Even though we motivated the need for real-time aggregation of data streams by resource data recorded by IoT devices, similar kinds of data aggregations are required by several other types of continuous sensor data streams. Continuing our motivating example, we derive the following requirements for a data stream aggregation architecture: Multi-layer Aggregation Measurements of sensors are aggregated to groups of sensors. These groups can again be aggregated to larger groups and so forth. In the previous example, these groups could first be groups of individual machines of the same type, then groups of machines fulfilling the same function, then all machines used by the same production step and, finally, all machines in the entire production environment.

Multi-hierarchy Aggregation
In addition to a single hierarchy as described above, it is likely that there is a demand for supporting multiple such hierarchies. Referring to the previous example, besides a hierarchy based on the purpose of machines, one may also need a hierarchy which represents the physical location. For example, in a first step machines in the same shop floor are grouped and then all shop floors in a certain building are grouped. Fig. 1 Schematic illustration of a manufacturing company operating two buildings and a wide range of power-consuming machinery and infrastructure. Operators may be, for instance, interested in the total power consumption of all turning shops (red), directly required for production processes (blue), of a certain building (green), or of the entire company (yellow) Hierarchy Reconfiguration at Runtime Stream processing applications are often characterized by demands for high availability. Therefore, we aim for an approach that allows to modify or extend the previously described hierarchies at runtime to prevent reconfigurations causing downtimes.

Preserving Scalability and Reliability
Stream processing is usually used for large volume of data where the load has to be handled by multiple CPU cores or even multiple computing nodes. Therefore, an approach that performs aggregations on these streams has to preserve scalability and reliability properties to not become the overall architecture's bottleneck.

The Dual Streaming Model
The dual streaming model [9] is the foundation for the stream processing architectures described in this paper. It is a model to define the semantics of a stream processing architecture. It adopts the notion of data streams and streaming operators from other established stream processing models [10][11][12].
A data stream is an append-only sequence of immutable records, where records are key-value pairs augmented by a timestamp. Key-value pairs allow for data parallelization as records with different keys can be processed in parallel. This is the fundamental idea of building highly scalable stream processing applications.
Streaming operators are functions applied to each record of an input stream, whose results are appended to an output stream. The number of output records may be zero, one, or more than one depending on the type of operator. Usually, operators are distinguished as stateless or stateful. Stateless operators produce an output solely based on the currently processed input record, whereas stateful operators may also take previous input records and computations into account. Typical examples for stateless operations are filter, map, flatMap, or merge, whereas aggregate or join are common examples for stateful operations. Successively connecting operator output streams with other operators allows to define complex stream processing topology architectures, for example, to build big data analytics applications.
The dual streaming model extends these models by considering the result of streaming operators as successive updates to a table. These updates may be materialized into a versioned table or represented as a stream of insert, update, and delete events, inducing a duality of streams and tables. Sax et al. [9] present a reference implementation of the dual streaming model called Kafka Streams, a stream processing framework build on top of the distributed messaging system Apache Kafka [13].

Basic Dataflow Architecture
In this section, we present a basic dataflow architecture for hierarchically aggregating streams of data. We apply the dual streaming model to model the topology of consecutive operations on the streaming data, which are required for aggregating sensor data. This model forces an unidirectional and side effect-free description of the data flow and, thus, allows the scalability and reliability facilitated by the model to be exploited. The dataflow architecture described in this way can be implemented as an encapsulated component, e.g., as a microservice, which can be integrated into existing software systems. Since in some cases the dual streaming model abstracts too many details to comprehensibly explain our dataflow architecture, we use some architectural elements which are only present in their reference implementation, but not in the model. Figure 2 visualizes our proposed architecture. The individual processing steps, realized as streaming operators, are described in the following.
1. Data Sources Our proposed architecture requires two data sources. The first one is an input stream of measurements, keyed by a sensor identifier. This is the sensor data stream

Merging Measurement Streams
The first operator merges the input stream with a stream of already calculated aggregation results. For the creation of this aggregated stream, see step 7. Note that the aggregated stream may require an additional converting step. In the following, we make no distinction between sensor measurements and aggregation results and call these values simply measurements. Table The  measurement stream and the groups table are joined using an inner join operation. This leads to a new update of the resulting table whenever either a measurement arrives or the groups a sensor belongs to changes. The result of this join operation is a tuple consisting of the measurement and the set of all groups this measurement has effect on.

Duplicating Join Results
In the next step, the measurements are duplicated in a way that for each group a new record is forwarded. This record has the following form: The key is a pair of sensor identifier and the corresponding group this record is created for. The value is the measurement. This operation is stateful as it always stores the last set of sensor groups and compares them with the new one. If a sensor was part of a group in a previous record but not in the currently processed one, a special record is forwarded with a "tombstone" value. This value serves for informing the following topology operators that the corresponding sensor is not longer part of the corresponding group. Table The

Pilot Implementation for IoT Sensor Data
In this section, we return to our motivating example from "Motivating Example" and show in a pilot implementation how our proposed architecture can be used to aggregate IoT sensor data. The Titan Control Center [14] is a microservicebased application for analyzing the energy consumption in manufacturing enterprises. It integrates energy consumption data of different data sources (e.g., machine-level data, building technology, or external software systems) and aggregates, analyzes, and visualizes them in near real time (see Fig. 3). We extended the Control Center's architecture by an additional microservice, which implements the topology described above and replaces a former, less scalable and reliable data aggregation. This microservice subscribes to a stream providing energy consumption data of individual machines and to a stream forwarding changes to the sensor hierarchy (provided by the Configuration microservice of the Control Center). It aggregates the data of all sensors in a group by summing them up and publishes every result to a dedicated topic allowing other services to subscribe to this aggregated data. Other microservices use these data, for example, to calculate power consumption statistics, to produce forecasts, or to visualize them. Specifying the proposed architecture with the dual streaming model allows for a straightforward implementation in Kafka Streams. Nevertheless, as the dual streaming model is more abstract than the Kafka Streams API, we had to introduce some additions: In some places, streams had to be explicitly converted into tables via reduce operations. The data type of aggregated records does not match the data type for sensor measurements, thus, an additional map operation before merging with the sensor data stream was necessary. To avoid emitting subtract events from the table of aggregations, these events have to be explicitly filtered out. The operator duplicating records for each parent had to be implemented as a custom flatTransform operator, as Kafka Streams currently does not support flatMap operations on tables. Using Kafka Streams promises a high degree of scalability and reliability, as deploying multiple instances of this microservice causes Kafka Streams to balance the load among all instance based on the topology and the number of configured Kafka topic partitions.

Experimental Scalability and Reliability Evaluation
We experimentally evaluate the scalability and the reliability of the proposed stream processing architecture. For these evaluations, we simulate different numbers of power metering sensors and aggregate their data streams to predefined groups using the Titan Control Center (see above). Each simulated sensor emits one measurement per second. We record both the number of sensor measurements per sensor and the number of computed aggregation results per second as well as the average event-time latency per second. The event-time latency is the time passed between sensor record generation and obtaining the result of an aggregation [3]. The experimental setup is deployed in a Kubernetes cluster of 4 nodes, each equipped with 384 GB RAM and 2 × 16 CPU cores providing 64 threads (overall 256 parallel threads).

Evaluation of Scalability
A software system is considered scalable if it is able to continue processing an increasing workload with additional resources provided [15]. In order to assess the scalability of our proposed approach, we therefore evaluate how an increasing amount of input data can be handled by an increasing number of aggregation instances. For this purpose, we identify the number of instances required to aggregate the data streams of a given workload. We group 8 simulated sensors into one group and group 8 of such groups again into one larger group. We evaluate 4 workloads, where we simulate 2, 3, 4, or 5 nested groups, resulting in a total We consider a deployment (i.e., a certain set of deployed instances) as being sufficient to handle the given workload if all simulated data can be processed such that no data records are piling up between their generation and the aggregation. To obtain this information, we measure the event-time latency between generation and aggregation. If this latency remains largely constant, we conclude that records are aggregated at approximately the same speed as they are generated. We apply linear regression to compute a trend line and consider the processing to be largely constant if the trend line's slope is less than 100 ms. 1 For each evaluated workload, we determine the minimum number of instances, which is required to handle that workload, i.e., shows an event-time latency trend line with a slope of less than 100 ms. This evaluation is repeated 10 times. Figure 4 shows the median over all repetitions of the required number of instances per workload. We observed that the required number of instances scales linearly with the amount of data to be aggregated.

Evaluation of Reliability
In order to assess the reliability of our proposed approach, we evaluate how the architecture behaves when components fail. We generate the workload described in the scalability evaluation, which simulates 4 nested groups of sensors, and aggregate the data with 24 instances of the aggregation component. After 10 min of processing, we inject a failure during operation by shutting down 18 instances and starting them again after 5 more minutes. We measure both the number of generated messages and the amount of aggregation results during the entire evaluation. Since both values fluctuate strongly, we additionally calculate a moving average over a 60-s window.
The average number of processed records per second over time is presented in Fig. 5. It can be seen that the amount of performed aggregations decreases sharply during the simulated failure. This is reasonable as there are not enough resources available anymore to process all data. However, if the stopped instances are replaced by new instances, the amount of processed data increases again. Furthermore, as we set the number of processing instances twice the number actually necessary (cf. scalability evaluation), the piled up data is also processed.

Dataflow Architecture Variations
While the basic architecture described in "Basic Dataflow Architecture" provides correct results when all measurements from the input stream arrive in the order of their timestamps, out-of-order records would produce wrong aggregation results. According to the dual streaming model, a late-arriving record would cause a recomputation of all consecutive steps. While this is in principle desired, the corresponding lookup in the last values table only yields the most recent value causing wrong results. A simple, yet in many cases sufficient solution is to simply reject late-arriving records. A more complex, but also correct solution is to explicitly consider measurement times during the aggregation process. In the following, we present two architecture variations and compare them with our basic architecture in Table 1.

Tumbling Window-Based Dataflow Architecture
In order to support out-of-order records, we extend our previously described architecture by a windowing operator located after the duplicate step and before the creation of the last values table (see Fig. 6). This operator splits the absolute timeline into tumbling windows, that is, time windows of constant but configurable size [9], and associates each record with the time window it was recorded in. The windowing operator transforms each input record by extending its key by the corresponding window, where a window is represented by a pair of start and end timestamp.
For the consecutive last value table, this means the table does not only store the last value per sensor-grouppair anymore, but instead stores this value for each time window. However, to not let the table grow indefinitely, a decision has to be made for how long data of passed time windows have to be retained. This can be configured via a "grace period," which effectively specifies for how long out-of-order records are considered by the architecture since records, which arrive after the grace period of their corresponding window has expired, are discarded. The following grouping and aggregation is then not exclusively made by the key's "group" part, but by its "group" and "window" attribute. Afterwards, a result is only emitted when the window ended. In contrast to our basic architecture, a result for a group is thus no longer published to the aggregations stream with every update of a sensor, but only at the end of each window. This is achieved by using a suppression function [16] called before publishing to the aggregation result streams. Suppressing record forwarding is not part of the dual streaming model but is implemented as an operator in Kafka Streams.
With this extended architecture, out-of-order records are associated with their corresponding time window. Thus, they neither have to be discarded nor their consideration has influence on the aggregation results for other windows. In comparison with our basic architecture of "Basic Dataflow Architecture," the memory usage of the last value table in this architecture is larger by a constant factor, which corresponds to the number of simultaneously opened windows. The number of simultaneously open windows at any time corresponds to the number of windows whose grace period has not expired until that time and, thus, only depends on the choice of the grace period (see Table 1).
The major drawback of this architecture is that an emit rate has to be chosen in advance. Since for computing an aggregation result for a group of sensors only measurements within the respective time window are considered, this architecture requires each sensor to provide at least one measurement per window. The frequency sensors delivering data with must therefore be known in advance (or at least an upper estimate), so that the window size is  Fig. 6 Extended dataflow architecture, which allows considering the recording time of measurements to calculate correct results for out-oforder records. The notation of architectural elements corresponds to the one in Fig. 2. The added data type window represents a time window consisting of an inclusive start timestamp and an exclusive end timestamp chosen sufficiently large. Furthermore, this frequency automatically corresponds to the frequency of publishing aggregation results. However, these two frequencies may conflict if individual sensors generate measurements with different frequencies but the frequency of generating aggregation results should not be bounded by the lowest of these frequencies.

Hopping Window-Based Dataflow Architecture
Our second architecture variation is a modification of the tumbling window-based architecture depicted in Fig. 6.
In order to support out-of-order records as well as to allow configuring the aggregation emit rate independently from the lowest measurement frequency, it modifies the windowing operator in a sense that it creates hopping windows [9] instead of tumbling windows. That is, the windowing operator does not split the timeline in successive windows, but instead in windows, which overlap by a configurable duration. Thus, a measurement does not belong to one window anymore but to multiple windows, causing the windowing operator to not emit one windowed record per input anymore but instead to produce multiple windowed records. The semantics of the following operators do not change. However, duplicating a record for different windows causes the following grouping and aggregate operations to be performed for each open window. The final suppress operation, however, causes an aggregation result to be emitted only when the window closes. Thus, on average only one of the aggregation results is emitted per measurement.
The hopping window-based architecture thus allows to choose the window size according to the maximal expected duration between two measurements of the same sensor. On the other hand, it allows to specify an additional emit rate, which can be significantly smaller and configures the interval with which aggregation results are emitted.
Consequently, if no new measurements for a sensor were received since emitting the last aggregation result, this architecture allows to rely on previous measurements, which were recorded before the last emission of an aggregation result (but are still within the window size). Unlike to our basic architecture but identically to the tumbling windowbased architecture, the hopping window-based architecture only emits results at certain points in time (i.e., whenever a window closes). In contrast to the tumbling window-based architecture, however, the emit rate can be set arbitrary small without compromising correctness. However, the greater the difference between window size and emit rate is, the more records will be generated. The windowing operator duplicates each measurement as often as there are overlapping windows. According to this factor, records are created that need to be processed by the consecutive streaming operators and the last value table has to store accordingly more values. "Experimental Comparison of Architectures" presents our experimental comparison of how the choice of the window size influences the required computing resources.

Experimental Comparison of Architectures
In this section, we experimentally evaluate and compare the performance of the individual architectures presented in "Basic Dataflow Architecture" and "Dataflow Architecture Variations." For this purpose, we reuse the experiment setup from "Experimental Scalability and Reliability Evaluation" and simulate 512 sensors organized in a pre-defined hierarchy with 3 nested levels of groups. Again, each sensor emits one measurement per second and we use the Titan Control Center to aggregate sensor measurements according to the hierarchy. The Kafka Streams application executing the stream processing architecture is configured with a commit interval of 1 s meaning that Kafka Streams stores and forwards intermediate results in the stateful operations every second. We evaluate different numbers of Kafka partitions and set the number of Kafka Streams threads to 4 times the number of partitions ensuring that each Kafka Streams task can be executed in a separate thread. For the two window-based architectures, we set the emit rate to 1 s and the grace period to 10 s. For the hopping windowbased architecture, an additional window size has to be configured, which we set in individual experiments to 10 s as well as to 60 s. Each configuration is executed for 10 min, where we discard all monitoring results for the first minute and we repeat each execution 10 times. The experimental setup is deployed on the same Kubernetes cluster as described in "Experimental Scalability and Reliability Evaluation." However, in contrast to the previously described evaluations of scalability and reliability, we only deploy one aggregation instance but configure it to use multiple threads.
For each evaluated architecture, we continuously monitor the required computing resources as well as the average aggregation latency and the consumer lag. The aggregation latency is the time passed between receiving a sensor measurement and emitting the corresponding aggregation result. The consumer lag describes the difference between the number of stored records in Kafka and the number of already consumed records by the aggregation. For determining the required computing resources, we measure the memory usage of the aggregation as well as Kafka Streams' disk usage since Kafka Streams stores the operator's state on the local disk but also uses in-memory caching. Furthermore, we determine the required number of threads by finding a configuration which does not lead to an increasing consumer lag over time. Figure 7 shows the average value of all monitored properties. We provided sufficient memory on the experiment nodes, which explains the low usage of the local disk in comparison with main memory. It is remarkable that the resource consumption is almost the same for our basic architecture and the tumbling window-based one. This leads us to conclude, that for a better correctness no more hardware resources have to be provided. Also the aggregation latency scarcely differs. However, if larger aggregation window sizes are required, as enabled by the hopping window-based architecture, these can only be achieved with significantly increased resources. For example, an aggregation window that is 10 times larger than the emit rate requires 42% more memory compared with a window size that equals the emit rate. An aggregation window of 1 min for a 1-s emit rate requires even 69% more memory. Kafka Streams allows to store the state of operators at the local disk, which is typically cheap compared with main memory. Whereas the increasing memory usage is therefore probably not an issue for many use cases, the increasing amount of generated data requires accordingly more threads for processing them. An aggregation window of 60 s requires, for example, 4 times the amount of threads than an aggregation window of 10 s. We normalize the consumer lag by dividing the measured lag by the amount of parallel open windows, which corresponds to the number of updates triggered per measurement (see Table 1). Compared with the number of measurements generated per second, the consumer lag stays on a fairly low level for all architectures. From the basic to the tumbling window-based and from the tumbling window-based to the hopping window-based architecture with a 10-s window, the aggregation latency only increases slightly. In further experiments, we observed that with a shorter commit interval the latency can be reduced at the cost of increased resource consumption. In order to reduce the amount of intermediate data, our Kafka Streams implementation uses a suppress operator, which ensures that aggregation results are only emitted after the aggregation time window passes. Unfortunately, with this configuration, late-arriving records (i.e., those arriving during the grace period) cause significantly delayed aggregation updates, which probably explains the large delay for the architecture with a hopping window of 60 s.

Industrial Case Study
Our implementation is used in production with two manufacturing enterprises to apply Industrial DevOps [17]. In these enterprises, the aggregation results are used, for example, to gain insights into how much energy is used by certain types of machines (e.g., the overall air conditioning), how big the difference is between measured companywide energy consumption and the sum of all known consumers, or how much an individual machine contributes to the overall consumption of all machines of that type. Figure 8 shows a screenshot of the Titan Control Center's comparison view. It provides a continuously updating view of the total electrical power consumption of our partner Kieler Nachrichten Druckzentrum, a newspaper printing company, in comparison with the consumption of its major consumers.

Related Work
Analyzing sensor data, often from IoT device, is a frequent use case for stream processing [18]. In such analytics systems, aggregating individual measurements to reduce the overall amount of data is a common task [19].
The primary type of data aggregation studied in literature is aggregating data streams along the dimension of time. In this case, records within the same time window having the same key are aggregated to a new record. Depending on the actual requirements, records can be aggregated within fixed windows, sliding windows, or session-based windows [12]. In addition to research on the efficient computation of window aggregations [20][21][22], there are also publications proposing software architectures for a temporal aggregation platform, for example by Twitter [23]. Temporal aggregations are compatible with our approach, which means that temporal aggregated streams of sensor data can be further aggregated to groups and streams for sensor groups can be further aggregated over time.
Joins in stream processing (analogous to those in SQL) [24] allow to connect different or identical streams. As the join operation is a bivariate function and thus aggregating multiple streams would require a chain of join operations, a corresponding pipeline can only statically be created and not dynamically adjusted at runtime. This contrasts our approach, which allows reconfigurations at runtime by changing the sensor groups table or an underlying stream.
As most approaches applying stream processing, our presented approach is primarily based on data parallelism [2], meaning that (sub)topologies exist in multiple instances, each processing a portion of the data. Pipe-and-Filter frameworks such as TeeTime [25] employ task parallelism, where the individual filters (operators) are executed in parallel. Since in this way all data pass each filter, the identified Note that for the tumbling window-based architecture, e equals w. Whereas our basic architecture does not support out-of-order records and the tumbling window-based architecture requires the measurement frequency of sensors to be known beforehand, the hopping window-based architecture is able to handle both. However, in contrast to the former two, it generates significantly more intermediate records and thus uses more memory in the last value table. requirements can be realized in a single filter. In contrast to our solution, however, scalability would be significantly compromised.

Conclusions
Software systems, which analyze or react on sensor data streams, often have to process not only raw measurements but also aggregated data for groups of sensors. In this paper, we presented a stream processing dataflow architecture for continuously aggregating sensor data. It supports the aggregation of hierarchical groups, multiple such groups in parallel, allows for reconfigurations at runtime and can be integrated (e.g., as a microservice) into existing big data architectures. We presented three variations of this dataflow architecture having different characteristics regarding correctness and performance. Our experimental comparison assists adopters in choosing an appropriate variation. We provide an implementation of this architecture for power consumption data and show how our implementation can be integrated into an analytics platform used in industry. Furthermore, in an experimental evaluation, we showed that our proposed architecture scales linearly with the amount of sensors and tolerates faults during operation. A replication package and experimental results provided as supplemental material allow to repeat and extend our work [26]. For future work, we plan to optimize the hopping window-based architecture in terms of its resource usage, while still being able to handle out-of-order records and tolerate varying measurement frequencies. Furthermore, future work may explore how our architecture can be implemented with other stream processing engines, for example, by considering recent trends towards uniform stream query languages [27]. As we were not able to discover any scalability limitations in our conducted experimental evaluation, we also plan to conduct experiments with even larger amounts of sensors.
Funding This research is funded by the German Federal Ministry of Education and Research (BMBF) as part of the Titan project (https:// www.industrial-devops.org, grant no. 01IS17084B). Open Access funding is enabled and organized by Projekt DEAL.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http:// creativecommonshorg/licenses/by/4.0/.