In this section, we show how the measures proposed in Section 5 can be implemented in a software architecture that adopts the microservice architecture pattern, big data stream processing techniques, and fog computing. In our Titan project on Industrial DevOps (Hasselbring et al. 2019), we develop methods and techniques for integrating Industrial Internet of Things big data. A major emphasis of the project is to make produced data available to various stakeholders in order to facilitate a continuous improvement process. The Titan Control Center is our open source pilot application for integrating, analyzing, and visualizing industrial big data from various sources within industrial production (Henning and Hasselbring 2021).
The architecture of the Titan Control Center follows the microservice pattern (Newman 2015). It consists of loosely coupled components (microservices) that can be developed, deployed, and scaled independently of each other (Hasselbring and Steinacker 2017). Our architecture features different microservices for different types of data analysis. Individual microservices do not share any state, run in isolated containers (Bernstein 2014), and communicate only via the network. This allows each microservice to use an individual technology stack, for example, to choose the programming language or database system that fits the service’s requirements best. In a previous publication (Henning et al. 2019), we show how these architecture decisions facilitate scalability, extensibility, and fault tolerance of the Titan Control Center.
Figure 4 shows the Titan Control Center architecture. It contains the microservices Aggregation, History, Statistics, Anomaly Detection, Forecasting, and Sensor Management. In addition to these microservices, our architecture comprises components for data integration, data visualization, and data exchange.
The Titan Control Center is deployed following the concepts of edge and fog computing (Garcia Lopez et al. 2015; Bonomi et al. 2012). Particularly suited for Internet of Things (IoT) data streams, edge and fog computing preprocess data at the edges of the network (i.e., physically close to the IoT devices), whereas complex data analytics is performed in the cloud (Pfandzelter and Bermbach 2019). In order to facilitate scalability and fault tolerance, the Titan Control Center microservices for data analysis and storage are deployed in a cloud environment. This can be a public, private, or hybrid cloud, which allows elastically increasing and decreasing computing resources. Software components for integrating power consumption data into the Titan Control Center, on the other hand, are deployed within the production environment. This includes querying or subscribing to electricity meters, format and unit conversions, and filtering, as well as aggregations to reduce the number of data points. We employ our Titan Flow Engine (Hasselbring et al. 2019) for this purpose. It allows graphical modeling of data flows in industrial production according to flow-based programming (Morrison 2010). With the Titan Flow Engine, individual processing steps are implemented in so-called bricks, which are connected to flows via a graphical user interface. This enables production operators to reconfigure power consumption data flows, for example, to integrate new electricity meters, without requiring advanced programming skills.
All communication among microservices, as well as between the data integration components and the microservices, takes place asynchronously via a messaging system. We use Apache Kafka (Kreps et al. 2011) in our pilot implementation. Moreover, the Titan Control Center features two single-page applications that visualize analyzed data and allow for configuring the analyses.
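To illustrate this asynchronous communication style, the following minimal Java sketch publishes a power consumption measurement to Kafka. The broker address, topic name, key choice, and JSON record layout are illustrative assumptions, not the actual Titan conventions.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class MeasurementPublisher {

    public static void main(String[] args) {
        // Producer configuration; broker address and topic name are assumptions.
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Key: sensor identifier; value: measurement encoded as JSON (simplified).
            String sensorId = "printing-machine-1";
            String measurement = "{\"timestamp\": 1617184800000, \"valueInWh\": 42.5}";

            // send() is asynchronous: the caller does not block, which decouples
            // the data integration component from the consuming microservices.
            producer.send(new ProducerRecord<>("titan-power-measurements", sensorId, measurement));
        }
    }
}
```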
In the following, we present how each measure proposed in Section 5 can be implemented using the Titan Control Center.
Near real-time data processing
Power consumption data is processed in near real time at all architectural levels of the Titan Control Center. This starts with the ingestion of monitoring data and immediate filtering, conversion, and aggregation operations in the Titan Flow Engine at the edge. The final integration step is sending the monitoring data to the messaging system. Following the publish–subscribe pattern, microservices subscribe to this data stream and are notified as soon as new data arrive. In the same way, individual microservices communicate with each other asynchronously. Apache Kafka, the selected messaging system, is proven to provide high throughput and low latency (Goodhope et al. 2012). Within microservices, we process data using stream processing techniques (Cugola and Margara 2012). This implies that microservices continuously calculate and publish new results as new data arrive. For implementing stream processing in most of the microservices, we use Kafka Streams (Sax et al. 2018). As all computations are performed in near real time, the visualizations can also be updated continuously. Hence, the visualization applications (see Section 6.7) periodically request new data from the individual services.
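The following Kafka Streams sketch illustrates this processing style: records are consumed from a measurement topic, filtered and converted as they arrive, and immediately republished. Topic names, the value format, and the concrete transformation are assumptions for illustration, not the actual Titan topology.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class NearRealTimeProcessor {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "titan-example-processor");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Each record is processed as soon as it arrives: filter out implausible
        // readings and convert watt hours to kilowatt hours before republishing.
        KStream<String, Double> measurements = builder.stream("titan-power-measurements");
        measurements
            .filter((sensorId, valueInWh) -> valueInWh != null && valueInWh >= 0)
            .mapValues(valueInWh -> valueInWh / 1000.0)
            .to("titan-power-measurements-kwh");

        new KafkaStreams(builder.build(), props).start();
    }
}
```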
Multi-level monitoring
The Aggregation microservice (Henning and Hasselbring 2019) of the Titan Control Center computes the power consumption for groups of machines by aggregating the power consumption of the individual subconsumers. This microservice subscribes to the stream of power consumption measurements coming from sensors, aggregates these measurements continuously according to the configured groups, and publishes the aggregation results via the messaging system as if they were real sensor measurements. In contrast to plain sensor measurements, however, these data are enriched with summary statistics of the aggregation.
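A strongly simplified Kafka Streams sketch of this idea is shown below: measurements are re-keyed by the group their sensor belongs to and continuously aggregated per group. The fixed sensor-to-group mapping, the topic names, and the plain sum are illustrative simplifications; the actual Aggregation service obtains group structures from the Sensor Management service, combines the latest values of the subconsumers, and enriches the results with summary statistics.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.KStream;

public class GroupAggregationSketch {

    // In the real service, group assignments come from the Sensor Management
    // service; a fixed mapping is used here for illustration only.
    private static final Map<String, String> SENSOR_TO_GROUP = Map.of(
        "printing-machine-1", "printing",
        "printing-machine-2", "printing",
        "hvac-1", "building");

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "titan-aggregation-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Double> measurements = builder.stream("titan-power-measurements");

        measurements
            // Re-key each measurement by the group its sensor belongs to.
            .selectKey((sensorId, value) -> SENSOR_TO_GROUP.getOrDefault(sensorId, "unassigned"))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
            // For brevity, this simply sums all incoming values per group; the actual
            // service combines the latest value of each subconsumer instead.
            .reduce(Double::sum)
            .toStream()
            // Publish group results so that they can be consumed like sensor measurements.
            .to("titan-aggregated-power-measurements");

        new KafkaStreams(builder.build(), props).start();
    }
}
```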
As proposed in Section 5.2, the Aggregation microservice supports aggregating sensor data in arbitrarily nested groups and in multiple such nested group structures in parallel. In one of our studied enterprises, we integrate power consumption data from different kinds of sensors, which provide data at different frequencies. An important requirement for the Aggregation service was therefore to support different sampling frequencies. Furthermore, besides the focus on scalability throughout the entire Control Center architecture, an important requirement for this microservice is to reliably handle downtimes as well as out-of-order or late-arriving measurements. It therefore allows configuring the required trade-off between correctness, aggregation latency, and performance (Henning and Hasselbring 2020).
The Sensor Management microservice of the Titan Control Center allows assigning names to sensors and arranging these sensors in nested groups. For this purpose, the Titan Control Center’s visualization component provides a corresponding user interface. The Sensor Management service stores these configurations in a MongoDB (MongoDB 2019) database. It publishes changes of group configurations via the messaging system such that the Aggregation service (and potentially other services) is notified about these reconfigurations. The Aggregation service is designed in a way that, when receiving reconfigurations, it immediately starts aggregating measurements according to the new group structure. Further, as aggregations are performed on measurement time rather than processing time, it supports reprocessing historical data.
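As a rough illustration of this interplay, the following sketch persists a hypothetical group configuration in MongoDB and publishes the change to the messaging system. The database and collection names, the topic, and the document layout are assumptions made for this example.

```java
import java.util.Properties;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.bson.Document;

public class SensorManagementSketch {

    public static void main(String[] args) {
        // A new group configuration, here encoded as JSON (simplified).
        String groupConfig =
            "{\"group\": \"printing\", \"sensors\": [\"printing-machine-1\", \"printing-machine-2\"]}";

        // Persist the configuration in MongoDB.
        try (MongoClient mongo = MongoClients.create("mongodb://mongo:27017")) {
            MongoCollection<Document> groups = mongo.getDatabase("titan").getCollection("sensor-groups");
            groups.insertOne(Document.parse(groupConfig));
        }

        // Publish the configuration change so that the Aggregation service
        // (and potentially other services) can react to it immediately.
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("titan-sensor-configuration", "printing", groupConfig));
        }
    }
}
```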
Temporal aggregation
Both types of temporal aggregation discussed in Section 5.3 are supported by the Titan Control Center. As the two types serve different purposes, they are implemented in individual microservices. Both services subscribe to input streams, which provide monitored power consumption from sensors as well as aggregated power consumption for groups of machines.
Aggregating Tumbling Windows
The History microservice receives incoming power consumption measurements and continuously aggregates all data items within consecutive, non-overlapping, fixed-sized windows. The results of these aggregations are stored in an Apache Cassandra (Lakshman and Malik 2010) database and also published for other services. The History service supports aggregations for multiple different window sizes in parallel, which allows generating time series with different resolutions. To prevent the amount of stored data from becoming too large, time series of different resolutions are assigned different times to live. Thus, the Titan Control Center allows, for example, storing raw measurements captured with high frequency for only one day, but aggregated values in minute resolution for years. Window sizes and times to live can be configured individually according to the requirements for trackability and the available storage infrastructure.
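A minimal Kafka Streams sketch of such a tumbling-window aggregation is shown below, assuming a one-minute window and illustrative topic names; the persistence to Cassandra and the parallel aggregation of multiple window sizes are omitted.

```java
import java.time.Duration;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.TimeWindows;

public class TumblingWindowSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "titan-history-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        builder.<String, Double>stream("titan-power-measurements")
            .groupByKey()
            // Consecutive, non-overlapping windows of one minute; the real service
            // runs several such aggregations with different window sizes in parallel.
            .windowedBy(TimeWindows.ofSizeWithNoGrace(Duration.ofMinutes(1)))
            .reduce(Double::sum)
            .toStream()
            // Publish per-window sums; encoding the window start in the key keeps
            // the window boundaries visible downstream.
            .map((windowedKey, sum) -> KeyValue.pair(
                windowedKey.key() + "@" + windowedKey.window().startTime(), sum))
            .to("titan-power-minutely");

        new KafkaStreams(builder.build(), props).start();
    }
}
```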
Aggregating Temporal Attributes
The Statistics microservice aggregates power consumption measurements by a temporal attribute (e.g., day of week) to determine an average course of power consumption, for example, per week or per day. These statistics are continuously recomputed, stored in a Cassandra database, and published for other services whenever new input data arrive. In our studied pilot cases, we found that, in particular, the average consumption over the day, the week, and the entire year allows detecting patterns in the consumption. Furthermore, aggregating temporal attributes such as the month of the year over one year allows observing how monthly peak loads evolve over time.
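The following sketch illustrates this kind of aggregation by computing a running average per sensor and day of week with Kafka Streams. The value format, the topic names, and the string-encoded intermediate state (used here to avoid a custom serde) are simplifications for illustration, not the actual Statistics implementation.

```java
import java.time.DayOfWeek;
import java.time.Instant;
import java.time.ZoneOffset;
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Grouped;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class DayOfWeekStatisticsSketch {

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "titan-statistics-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        builder.<String, String>stream("titan-power-measurements")
            // Value format assumed as "<epoch millis>;<consumption>" for illustration.
            .selectKey((sensorId, value) -> {
                long timestamp = Long.parseLong(value.split(";")[0]);
                DayOfWeek day = Instant.ofEpochMilli(timestamp).atZone(ZoneOffset.UTC).getDayOfWeek();
                return sensorId + "-" + day; // e.g. "printing-machine-1-MONDAY"
            })
            .mapValues(value -> Double.parseDouble(value.split(";")[1]))
            .groupByKey(Grouped.with(Serdes.String(), Serdes.Double()))
            // Keep a running count and sum per (sensor, day of week), encoded as a
            // string to avoid a custom serde in this sketch.
            .aggregate(
                () -> "0;0.0",
                (key, consumption, agg) -> {
                    String[] parts = agg.split(";");
                    long count = Long.parseLong(parts[0]) + 1;
                    double sum = Double.parseDouble(parts[1]) + consumption;
                    return count + ";" + sum;
                },
                Materialized.with(Serdes.String(), Serdes.String()))
            .toStream()
            // Publish the current average consumption per sensor and day of week.
            .mapValues(agg -> Double.parseDouble(agg.split(";")[1]) / Long.parseLong(agg.split(";")[0]))
            .to("titan-power-statistics-day-of-week", Produced.with(Serdes.String(), Serdes.Double()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```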
Correlation
The Titan Control Center provides different features for correlating power consumption data. One of these features is the graphical correlation of the power consumption of different machines or machine groups. Our visualization component (see Section 6.7) provides a tool that allows a user to compare the power consumption of multiple consumers in time series plots (see Fig. 5). It displays multiple time series plots below each other, each containing multiple time series. The user can zoom into the plots and shift the displayed time interval. All charts are synchronized in the time domain, thus zooming or shifting one plot also affects the others (Johanson et al. 2016). This tool allows operators to analyze interesting points in time (such as outages or load peaks) in more detail.
Together with the newspaper printing company, we implemented a first proof of concept for correlating real-time production data with power consumption data. We correlated the printing machines’ power consumption with their printing speed. For this purpose, we integrated the production management system using the Titan Flow Engine and visualized both types of data in our visualization component. Even though we were able to show the feasibility of such a real-time correlation, we identified that for in-depth analyses, power consumption data with higher accuracy is required. Similarly, we prototypically correlated the power consumption of air conditioning systems with weather data. We identified a high impact of the outside temperature on the power consumed for cooling and, thus, use weather data as a feature for our forecasting implementations (see Section 6.6).
Anomaly detection
The Titan Control Center envisages individual microservices for independent anomaly detection tasks and, hence, allows choosing an appropriate technique for each task. This includes individual techniques for different production environments and even for different machines.
With our pilot implementation, we already provide an Anomaly Detection microservice, which detects anomalies based on summary statistics of the previous power consumption. These statistics (e.g., per hour of the week) are continuously recomputed by the Statistics microservice (see Section 6.3) for each machine and machine group and published via the Control Center’s messaging system. Our Anomaly Detection microservice subscribes to this statistics data stream and joins it with the stream of measurements (from real machines or aggregated groups of machines). Ultimately, this means each incoming measurement is compared to the most recent summary statistics of the corresponding point in time and machine. If the measured power consumption deviates too much from the average consumption of the respective hour and weekday, it is considered an anomaly. More precisely, for a measurement x and summary statistics providing the arithmetic mean μ and standard deviation σ, the service computes the absolute distance from the arithmetic mean \(d = \lvert x - \mu \rvert \) and tests whether \(d < k\sigma \), where k is the configurable number of standard deviations; measurements failing this test are flagged as anomalies. All detected anomalies are again published to a dedicated data stream via the messaging system, allowing other microservices to access detected anomalies. Moreover, the microservice stores all detected anomalies in a Cassandra database.
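The following Kafka Streams sketch illustrates this detection logic by joining the measurement stream with a table of summary statistics and applying the k·σ test. For brevity, statistics are keyed by sensor only rather than by the corresponding point in time (e.g., hour of the week), and the topic names and value encoding are assumptions.

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Produced;

public class AnomalyDetectionSketch {

    private static final double K = 3.0; // configurable number of standard deviations

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "titan-anomaly-detection-sketch");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();

        // Measurements keyed by sensor; statistics keyed by sensor as well, with
        // values encoded as "<mean>;<standard deviation>" for this sketch.
        KStream<String, Double> measurements =
            builder.stream("titan-power-measurements", Consumed.with(Serdes.String(), Serdes.Double()));
        KTable<String, String> statistics =
            builder.table("titan-power-statistics", Consumed.with(Serdes.String(), Serdes.String()));

        measurements
            // Join each measurement with the most recent summary statistics.
            .join(statistics, (measurement, stats) -> {
                String[] parts = stats.split(";");
                double mean = Double.parseDouble(parts[0]);
                double stdDev = Double.parseDouble(parts[1]);
                double distance = Math.abs(measurement - mean);
                // A measurement failing the d < k*sigma test is an anomaly.
                return distance >= K * stdDev ? measurement : null;
            })
            .filter((sensorId, anomaly) -> anomaly != null)
            .to("titan-power-anomalies", Produced.with(Serdes.String(), Serdes.Double()));

        new KafkaStreams(builder.build(), props).start();
    }
}
```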
The currently implemented method for detecting anomalies is rather simple. It does not require complex model training or manual modeling, but is not able to consider trends, seasonality over larger time periods, or external variables. We are working on extending our pilot implementation, in order to join the measurement stream with the data stream published by the forecasting service (see Section 6.6). This implementation will consider measurements as anomalies if they deviate too much from the prediction, which is a common approach for anomaly detection.
Forecasting
Similar to anomaly detection, we envisage individual Forecasting microservices for different types of forecasts, for example, for different power consumers. Forecasting benefits notably from the microservice pattern since the technologies used for forecasting often differ from the ones used for implementing web systems. The Titan Control Center supports arbitrary Forecasting microservices, each using its own technology stack. The only requirement for a Forecasting service is that it is able to communicate with other services via the messaging system.
Our pilot implementation already features a microservice that performs forecasts using an artificial neural network with TensorFlow (Abadi et al. 2016). This neural network is trained offline using historical data and mounted into the microservice at start-up. During operation, the Forecasting microservice subscribes to the stream of measurements (again monitored or aggregated) and feeds each incoming measurement into the neural network. The forecast results are stored in an OpenTSDB (The OpenTSDB Authors 2018) time series database and published to a dedicated stream via the messaging system.
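As a rough sketch of such a service, the following Java code loads a TensorFlow SavedModel at start-up, feeds each incoming measurement into it, and publishes the forecast. The model path, the tensor names ("input" and "output"), the single-value input, and the topic names are assumptions; writing the results to OpenTSDB is omitted.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.tensorflow.SavedModelBundle;
import org.tensorflow.Tensor;

public class ForecastingSketch {

    public static void main(String[] args) {
        // The trained network is mounted into the container and loaded at start-up.
        SavedModelBundle model = SavedModelBundle.load("/models/power-forecast", "serve");

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "kafka:9092");
        consumerProps.put("group.id", "titan-forecasting-sketch");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.DoubleDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "kafka:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.DoubleSerializer");

        try (KafkaConsumer<String, Double> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, Double> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("titan-power-measurements"));
            while (true) {
                for (ConsumerRecord<String, Double> record : consumer.poll(Duration.ofSeconds(1))) {
                    // Feed the measurement into the neural network; the tensor names
                    // depend on how the model was exported and are assumptions here.
                    try (Tensor<?> input = Tensor.create(new float[][] {{record.value().floatValue()}});
                         Tensor<?> output = model.session().runner()
                             .feed("input", input).fetch("output").run().get(0)) {
                        float[][] forecast = new float[1][1];
                        output.copyTo(forecast);
                        // Publish the forecast to a dedicated stream.
                        producer.send(new ProducerRecord<>("titan-power-forecasts",
                            record.key(), (double) forecast[0][0]));
                    }
                }
            }
        }
    }
}
```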
In a first proof of concept, we built and trained such neural networks together with the newspaper printing company. We selected a set of machines in the company with different power consumption patterns and trained individual networks per machine. These neural networks use not only the historical power consumption of their respective machines as input, but also the power consumption of other machines as well as environmental data, such as the outside temperature. We deploy individual instances of our Forecasting microservice for each neural network, allowing for individual forecasts for each machine.
Visualization
As suggested in Section 5.7, the Titan Control Center features web applications for visualizing power consumption data. Since visualization serves as a measure to integrate the results of other measures, we also regard the visualization software components as an integration of the individual analysis microservices. The Titan Control Center provides two single-page applications for visualization: a graphical user interface, tailored to the specific functions of the Titan Control Center, and a dashboard for simple, but highly adjustable data visualizations. In the following, we describe both applications and their corresponding use cases.
Control Center
The Titan Control Center user interface serves to provide consistent access to all functionality of the Titan Control Center. This includes visualizing the analysis results of microservices, but also control functions for configuring microservices. The user interface is implemented with Vue.js (You 2019) and D3 (Bostock et al. 2011).
Figure 6 shows a screenshot of the Titan Control Center’s summary view. It consists of several components, which collect and show the individual analysis results for the entire production. A time series chart displays the power consumption over time. This chart is interactive, allowing users to zoom and shift the displayed time interval. Colored arrows indicate how the power consumption evolved within the last hour, the last 24 hours, and the last 7 days. A histogram shows a frequency distribution of metered values, which serves to detect potential for load peak reduction. A pie chart breaks down the total power consumption into subconsumers. Line charts display the average course of power consumption over the week or the day, as provided by the Statistics microservice (see Section 6.3). The visualizations are periodically updated with new data. This causes, for example, the time series diagram to shift forward continuously and the arrows to change color and direction.
Apart from this summary view, our pilot implementation also provides the described types of visualization for individual machines and groups of machines. Starting from an overview of the total power consumption, a user can thus navigate through the hierarchy of all consumers. Furthermore, the single-page application allows users to graphically correlate data (see Section 6.4) and to configure machines and machine groups maintained by the Sensor Management service. Visualizations of forecasts and detected anomalies are currently under development.
Dashboard
The second application is a pure visualization dashboard implemented with Grafana (Grafana Labs 2020) (see Fig. 7). It provides a set of common visualizations such as line charts, bar charts, and gauges. As presented in Fig. 7, we mainly display time series as bar or line charts. The dashboard is highly adjustable, meaning that users can add, modify, and rearrange chart components. Such adjustments can be performed graphically and only require using the provided interfaces. Thus, in particular IT-savvy production operators can customize dashboards. Moreover, they can create their own dashboards and share them among users. In this way, individual dashboards can be created, for example, for management and for production operators.
In contrast to the Control Center, this dashboard does not provide any control functions (e.g., for sensor configuration) and no complex interactive visualizations (e.g., the comparison tool). Thus, it only serves as an extension to the Control Center, allowing for visual analysis and reporting. In particular, this dashboard covers use cases where power consumption data should be integrated into existing dashboards (as is the case in one studied enterprise) or where dashboards should be customized by production operators.
Alerting
Alerting in the Titan Control Center is implemented using the Titan Flow Engine in the integration component. All messages that are published to the messaging system can again be consumed by the Titan Flow Engine and processed in flows. This way, production operators can create and adjust alerting flows directly within the production environment. Our pilot implementation already provides a flow that sends an email whenever an anomaly in power consumption is reported. In dedicated bricks, the operator can filter the types of anomalies for which an alert should be generated and configure how the email should be sent (e.g., message and receiver). The flow engine allows modeling flows that perform arbitrary actions in the production environment when alerts are received. This includes communicating with machines, for example, to show alerts on machine monitors.
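In the Titan Control Center, this logic is modeled graphically as bricks in the Titan Flow Engine; the following plain Java sketch merely illustrates the equivalent behavior of such an email-alerting flow. The SMTP host, the email addresses, and the anomaly topic name are placeholder assumptions.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import javax.mail.Message;
import javax.mail.Session;
import javax.mail.Transport;
import javax.mail.internet.InternetAddress;
import javax.mail.internet.MimeMessage;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class EmailAlertingSketch {

    public static void main(String[] args) throws Exception {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "kafka:9092");
        consumerProps.put("group.id", "titan-alerting-sketch");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.DoubleDeserializer");

        Properties mailProps = new Properties();
        mailProps.put("mail.smtp.host", "smtp.example.com");
        Session mailSession = Session.getInstance(mailProps);

        try (KafkaConsumer<String, Double> consumer = new KafkaConsumer<>(consumerProps)) {
            consumer.subscribe(List.of("titan-power-anomalies"));
            while (true) {
                for (ConsumerRecord<String, Double> anomaly : consumer.poll(Duration.ofSeconds(1))) {
                    // Filtering by anomaly type would happen here, analogous to the
                    // corresponding filter brick in the flow engine.
                    MimeMessage message = new MimeMessage(mailSession);
                    message.setRecipients(Message.RecipientType.TO,
                        InternetAddress.parse("operator@example.com"));
                    message.setSubject("Power consumption anomaly: " + anomaly.key());
                    message.setText("Anomalous consumption of " + anomaly.value()
                        + " detected for " + anomaly.key());
                    Transport.send(message);
                }
            }
        }
    }
}
```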