In order to fulfill the previously identified requirements, the following architecture of the software ecosystem is proposed.
Figure 2 presents an architectural overview of the software ecosystem (in blue) and how it is operated on the automation hardware component. The dashed lines show the communication with other iSSN hardware components (e.g., by IEC 61850 or Modbus to the transformer’s tap-changer), local measurements at the bus bars, and connections to the PLC-based field communication. Furthermore, a communication uplink to the DSO’s SCADA system must be provided.
Concerning software modules, the proposed ecosystem has to provide means of local data storage, an application management system, and a communication middleware that connects these modules with the applications intended to be operated within the software ecosystem. The automation hardware component needs to be an industry-grade computer with local storage and extended I/O interfaces to support, e.g., IEC 61850, IEC 60870-5-104, Modbus, RS485, and other industry-related local and remote communication systems.
Software modules and processes
The individual modules of the substation’s software ecosystem are described in the following. These building blocks operate locally on the industry-grade PC and interact by exchanging messages through the Gridlink middleware.
The central component of the use case needs to be a building block that allows communication between several attached applications (cf. R1). The use case requirements were transformed into a specification for a suitable communication infrastructure in an iSSN. The implementation of this specification is Gridlink, a decentralized, distributed message bus developed in Java and based on vert.x and Hazelcast. Gridlink was briefly mentioned in previous publications but has not yet been introduced in detail. It is the successor of an earlier platform, improving the coupling of modules with new functions such as a service registry and enabling provisioning features. Various modules, each providing certain features and some of them introduced later in this paper, dynamically form a cluster of known instances during execution. By default, modules running in the same subnet form a Hazelcast cluster and discover each other via multicast messaging. The modularity of the solution allows modules to join or leave at any time without affecting other modules and their execution. Furthermore, owing to its decentralized architecture without any single point of failure, Gridlink is resilient and scalable. Hence, the failure of any module neither prevents the remaining modules from continuing their execution nor from communicating with each other. With this paper, the use of Gridlink shifts from previous experience gained in laboratory projects to use cases in real-world environments.
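As an illustration of this cluster formation, the following minimal sketch shows how a Hazelcast instance with multicast discovery can be started in Java; it uses the public Hazelcast 5.x API, and the cluster name is purely illustrative rather than an actual Gridlink default.

import com.hazelcast.config.Config;
import com.hazelcast.config.JoinConfig;
import com.hazelcast.core.Hazelcast;
import com.hazelcast.core.HazelcastInstance;

public class ClusterBootstrap {
    public static void main(String[] args) {
        Config config = new Config();
        config.setClusterName("gridlink-demo");   // illustrative name, not a Gridlink default

        // Enable multicast discovery so that modules in the same subnet
        // find each other without any central coordinator.
        JoinConfig join = config.getNetworkConfig().getJoin();
        join.getMulticastConfig().setEnabled(true);
        join.getTcpIpConfig().setEnabled(false);

        HazelcastInstance instance = Hazelcast.newHazelcastInstance(config);
        System.out.println("Cluster members: " + instance.getCluster().getMembers());
    }
}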
Modules A module is a Java program that (1) implements a dedicated functionality, (2) takes over one or more roles/topics, (3) is addressable and reachable over the event bus via one (or more) role/topic address(es), and (4) provides functions to other modules. In terms of message transmission between modules, every module can serve as a data source by issuing messages to other modules, specifying the destination’s address, and every module can serve as a data sink by registering a message handler for these messages. While (1) is fulfilled by all modules, (2)-(4) are not necessarily, as modules serving only as a data source do not need to take over roles/topics and are thus not addressable. Each module has access to a distributed list of all modules that are currently attached to the Gridlink and active, the Gridlink Registry. It includes all roles/topics a module is registered to and a list of requests the respective role/topic is able to handle. Modules requiring a communication channel to peripheral components outside the Gridlink system are called gateway modules. Currently implemented use cases require such modules for REST calls, for XMPP protocol handling, and for receiving measurement values via an IEC 60870-5-104 translator module (cf. R2).
Direct sending and publish/subscribe mechanism Gridlink supports three types of message exchange: one of them is used to implement a publish/subscribe mechanism, the other two directly address modules’ roles. Of the latter, one is able to register a handler for receiving and handling replies to the issued message, the other is not. These methods are named publish, sendWithTimeout and send and are referred to by these names in the following. While publish addresses all modules implementing a specified topic, the other two options address exactly one module, chosen in a round-robin fashion if two or more modules implement the same destination role.
Gridlink proxy The normal process of sending a message over the Gridlink and receiving it at the other module sometimes needs to be amended. A Gridlink proxy interrupts this normal execution to run user-defined code that intercepts or modifies the messages to be sent, or to execute additional tasks when messages are transmitted over the component.
Examples, as shown in Fig. 3, include the logging or encryption of messages. Further examples concerning the interception of messages include the filtering of messages, e.g., to simulate missing values. More than one proxy can be configured to run in a serial fashion. Whether a proxy or interceptor is in use stays transparent to the sender and receiver modules. It must be possible to add and remove proxies and interceptors dynamically during a module’s execution, as described later in Sect. 3.2.2.
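As a minimal sketch of such a proxy, the following logging proxy passes every message through unchanged while writing it to a log; the interface GridlinkProxy and its method names are assumptions made for illustration and do not reproduce the actual Gridlink proxy API.

// Hypothetical proxy interface; the actual Gridlink API may differ.
interface GridlinkProxy {
    Object onSend(String destination, Object message);   // invoked before a message leaves the module
    Object onReceive(String source, Object message);      // invoked before a message reaches the handler
}

// Example proxy that logs every message passing through the module.
class LoggingProxy implements GridlinkProxy {
    @Override
    public Object onSend(String destination, Object message) {
        System.out.println("OUT -> " + destination + ": " + message);
        return message;                                    // pass the message on unchanged
    }

    @Override
    public Object onReceive(String source, Object message) {
        System.out.println("IN  <- " + source + ": " + message);
        return message;
    }
}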
Introductory use case example This simple use case consists of three modules: (i) a data generator module that periodically produces (random) measurement values, (ii) a storage module for permanent persistence of measurement values, and (iii) a monitoring module that is able to display the most recent value or a time span of recent values.
Listing 1 The storage module registers itself for handling requests on the role storage. Modules interested in measurement values are not necessarily limited to this single module; therefore, an event handler is additionally registered for the topic measurements, which is used by all modules potentially interested in such events and to which the events are published.
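The original listing is not reproduced here; instead, the following sketch shows the equivalent registrations expressed directly on the vert.x event bus (Vert.x 4 API), on which Gridlink is built. The typed Gridlink request and event classes are replaced by plain JsonObject payloads, and only the role storage and the topic measurements are taken from the example.

import io.vertx.core.Vertx;
import io.vertx.core.eventbus.EventBus;
import io.vertx.core.json.JsonObject;

public class StorageModule {
    public static void main(String[] args) {
        EventBus eb = Vertx.vertx().eventBus();

        // Handle requests addressed directly to the role "storage".
        eb.<JsonObject>consumer("storage", msg -> {
            JsonObject request = msg.body();
            // ... create data points, add entries, answer queries ...
        });

        // Handle events published to the topic "measurements"; every module
        // registered on this topic receives the published events.
        eb.<JsonObject>consumer("measurements", msg -> {
            JsonObject measurement = msg.body();
            // ... persist the received measurement value ...
        });
    }
}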
Listing 2 The data generator issues a CreateDataPointRequest to the role storage with a new DataPoint as payload, having the name “meterA.u1” and the tag “voltage”. While in real applications it would be appropriate to receive a SuccessReply on successful creation of the data point at the storage module, or an ErrorReply otherwise, the example here serves as a demonstration of sending messages without expecting any reply.
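Analogously to the previous sketch, this fire-and-forget call can be expressed on the vert.x event bus as follows; the payload fields mirror the DataPoint described above, while the JSON layout itself is illustrative.

import io.vertx.core.Vertx;
import io.vertx.core.json.JsonObject;

public class DataGeneratorCreate {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // Create the data point at the storage module without expecting a reply.
        JsonObject createRequest = new JsonObject()
                .put("type", "CreateDataPointRequest")
                .put("name", "meterA.u1")
                .put("tag", "voltage");
        vertx.eventBus().send("storage", createRequest);   // direct send, exactly one receiver
    }
}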
Listing 3 The data generator periodically issues MeasurementEvents to all modules registered to the topic measurements, having the measurement as payload.
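A corresponding sketch of the periodic publishing, again on the vert.x event bus; the measurement values and the 1 s period are illustrative.

import io.vertx.core.Vertx;
import io.vertx.core.json.JsonObject;

public class DataGeneratorPublish {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // Publish a (random) measurement to the topic "measurements" every second;
        // all modules registered on this topic receive the event.
        vertx.setPeriodic(1000, timerId -> {
            JsonObject event = new JsonObject()
                    .put("type", "MeasurementEvent")
                    .put("dataPoint", "meterA.u1")
                    .put("timestamp", System.currentTimeMillis())
                    .put("value", 230.0 + Math.random());
            vertx.eventBus().publish("measurements", event);
        });
    }
}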
Listing 4 The monitoring module, which intends to show the most recent value of the data point, periodically issues GetMostRecentEntryRequests to the storage module, with the required data point as payload. Naturally, this requires a reply containing the requested value; hence sendWithTimeout is used, specifying a code block to be executed asynchronously after the intended reply has been received. Using Java 8 lambda syntax, it is specified that the method doSomething is executed after a GetMostRecentEntryReply has been received. If the specified timeout of 5 seconds expires without a reply, the code block is unregistered and, as no specific error handler was specified, the default error handler is called to react.
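This request/reply interaction can be sketched with the event bus request call and a delivery timeout, which mirrors the behavior of sendWithTimeout; the message types and the helper doSomething follow the description above, while the JSON layout is illustrative.

import io.vertx.core.Vertx;
import io.vertx.core.eventbus.DeliveryOptions;
import io.vertx.core.json.JsonObject;

public class MonitoringModule {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        JsonObject request = new JsonObject()
                .put("type", "GetMostRecentEntryRequest")
                .put("dataPoint", "meterA.u1");

        // Request/reply with a 5 second timeout: the lambda is executed
        // asynchronously once the reply has arrived.
        vertx.eventBus().request("storage", request,
                new DeliveryOptions().setSendTimeout(5000), ar -> {
            if (ar.succeeded()) {
                doSomething((JsonObject) ar.result().body());
            } else {
                // timeout or failure: corresponds to Gridlink's default error handler
                System.err.println("No reply within 5 s: " + ar.cause().getMessage());
            }
        });
    }

    static void doSomething(JsonObject reply) {
        System.out.println("Most recent entry: " + reply);
    }
}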
Application provisioning stands for dealing with remote install, update, upgrade and configure operations to enhance or modify the functionality of an iSSN (cf. R3–R6). Within Gridlink, a designated core module, the AppManager, is responsible for receiving provisioning requests (e.g., over REST or XMPP, usually triggered by a user via a web client) and for handling them. Gridlink modules that are started, configured, updated and stopped by the AppManager are called managed modules. As the AppManager itself is a normal Gridlink (gateway) module, it is able to communicate with any other module. Each module implicitly implements a shutdown role, on which the AppManager is able to issue remote shutdown requests. Provisioning tasks are transparent to managed modules. The requirement of not having any single point of failure remains fulfilled despite the use of the AppManager: its failure only disables the remote provisioning features and does not interfere with the operation of other modules.
Module installation The AppManager receives a request to install a module via its REST or XMPP connection (cf. R3). It downloads a ZIP archive file containing the software to install from an App Store and unpacks it to a destination directory accompanied by its default configuration file.
Module start The installed software artifact is started by the AppManager by creating a new process (cf. R3). Usually, a module should be started implicitly after a successful installation. However, there are use cases where this is not desired, for example the bulk installation of many modules that have dependencies on each other (cf. R4). Note that by dependency we do not mean that a module’s execution depends on the other module; this would restrict the resilience requirement (cf. R1). However, a module that needs to persist data, for example, will state that it requires a storage module to be available. In that case it is desirable that the module is not started together with its installation, but explicitly at a later time by using the start command.
Module stop A running module can be stopped by the AppManager on receiving a request from the periphery, by issuing a Gridlink ShutdownRequest to the module’s implicit shutdown role. The module is informed that its shutdown was initiated and can react accordingly, or it can decide that it is currently not safe to shut down. The addressed module therefore issues a corresponding Gridlink reply including its decision. A negative decision is communicated to the operator, who can decide to insist on stopping the module immediately. In that case, the AppManager, which maintains a list of the modules it has started, can force the module to stop, e.g., by killing the process using its PID.
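A minimal sketch of this shutdown handshake, once more expressed directly on the vert.x event bus; the address name, the reply fields and the check for critical tasks are illustrative and do not reproduce the actual Gridlink ShutdownRequest types.

import io.vertx.core.Vertx;
import io.vertx.core.json.JsonObject;

public class ManagedModule {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // Implicit shutdown role of this module (address name is illustrative).
        vertx.eventBus().<JsonObject>consumer("myModule.shutdown", msg -> {
            boolean safeToStop = !criticalTaskRunning();
            // Reply with the module's decision; a negative decision is forwarded
            // to the operator, who may still force the stop via the AppManager.
            msg.reply(new JsonObject().put("accepted", safeToStop));
            if (safeToStop) {
                vertx.close();                      // shut down gracefully
            }
        });
    }

    static boolean criticalTaskRunning() {
        return false;                               // placeholder for application-specific checks
    }
}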
Module deinstallation Concluding a module’s life cycle, a deinstallation command initiates the removal of a stopped module’s data from the directory by the AppManager.
Module update A module update replaces existing running software with another version and is therefore a combination of the described tasks stop, deinstallation, installation and start (cf. R5). The AppManager moves the new artifact to its destination directory and issues a ShutdownRequest to the running module, which can persist its current state to a file. The AppManager moves this state file to the directory of the new module version and starts it. On start-up the module receives the old state and can continue its work accordingly.
Module configuration The configuration of modules can be altered during their execution (cf. R6). To that end, the AppManager replaces the configuration file in the module’s directory with the received new version. The running module is informed of the configuration change and may react accordingly; by design, a restart of the module is not required.
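The notification mechanism is not detailed above; one possible, purely illustrative way for a module to notice the replaced file is to watch its configuration directory with java.nio, as in the following sketch (directory and file names are assumptions).

import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardWatchEventKinds;
import java.nio.file.WatchEvent;
import java.nio.file.WatchKey;
import java.nio.file.WatchService;

public class ConfigWatcher {
    public static void main(String[] args) throws Exception {
        Path configDir = Paths.get("conf");         // directory holding the module's configuration file
        WatchService watcher = FileSystems.getDefault().newWatchService();
        configDir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE,
                                    StandardWatchEventKinds.ENTRY_MODIFY);

        while (true) {
            WatchKey key = watcher.take();          // blocks until the AppManager replaces the file
            for (WatchEvent<?> event : key.pollEvents()) {
                if ("module.conf".equals(event.context().toString())) {
                    // re-read the configuration and reconfigure the running module
                    System.out.println("Configuration changed, reloading ...");
                }
            }
            key.reset();
        }
    }
}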
We introduced the Gridlink proxy functionality earlier in this paper and claimed that it should be possible to add and remove proxies dynamically during a module’s execution. This dynamic behavior is implemented by means of the module configuration feature described above. During the run-time of the module, the iSSN administrator can externally define which proxies should be in service. The replacement of proxies happens without requiring a restart and is transparent to the executing module.
Module information Information about the running modules, such as the currently used configuration and their state, can be requested and transmitted over the gateway functionality of the AppManager (cf. R7). The described functionality follows a pull approach; to inform peripheral components about a configuration change, a push approach may be more suitable.
Planned next steps While it is easy to communicate with modules running on different machines in the same subnet by using Gridlink communication (cf. R1), the AppManager’s tasks become more complicated: coping with file operations as well as starting and killing modules requires involving an SSH client. In our use cases it is currently not required that modules run on a different machine. However, we are working on a multi-host concept. In further steps, the AppManager may be able to decide where to install a module, e.g., based on the machines’ load.
Storacle is a Java-based embedded data store for time series of measurement data and for meta data of the data points generating them. Characteristics vital for Smart Grid infrastructures, such as high volumes of data with small and immutable records, frequent readouts, and statistical indicators, are optimally supported (cf. R8). In previous work we compared Storacle with state-of-the-art off-the-shelf NoSQL and SQL databases using relevant benchmarks, taking into account the limited storage size and processing resources that may be present on machines in a substation. A format evaluation suggested the use of the Protocol Buffers format as a basis, as it leads to data sizes and retrieval times superior to other potential databases in this use case. We further examined Cube, RRD4J, Cassandra, InfluxDB, neo4j and OpenTSDB and described why these existing time-series database systems are not recommended for this use case.
The architecture of Storacle, shown in Fig. 4, is based on a three-tier approach. In a first step, data to be added is kept in RAM. Subsequently, data is persisted periodically to the hard disk; however, it remains in RAM for some time in order to provide fast responses to queries and to reduce costly read-ins from disk. Optionally, data is periodically copied to a remote location, a “cloud” or a remote file server, which is assumed to have no data size limitations and which may provide replication. After a defined threshold time has elapsed, when historical data is no longer relevant for local applications and these applications no longer require access to it, the data can be removed from the local disk once it has been transferred successfully to the third layer for permanent storage. Transferring and removing old data from the local disk conserves disk space. Depending on the number of data points to be persisted, the frequency at which they produce data, and the available disk size, Storacle can persist long periods of time without using such a third layer. Benchmarks showed that a time-series record requires about 18 bytes of storage; one data point producing data at a frequency of 1 Hz therefore requires about 541 MB for one year of data (\(\sim \)31.5 million entries). Reading data back from the cloud is currently not supported, as it is not necessary for our use cases.
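The quoted footprint can be reproduced with a short calculation, assuming the stated 18 bytes per record and one record per second:

public class StorageFootprint {
    public static void main(String[] args) {
        long bytesPerRecord = 18;                   // measured average record size
        long entriesPerYear = 365L * 24 * 60 * 60;  // one entry per second -> 31,536,000 entries
        long bytesPerYear = bytesPerRecord * entriesPerYear;

        System.out.printf("Entries per year: %,d%n", entriesPerYear);
        System.out.printf("Storage per year: %.0f MB%n", bytesPerYear / (1024.0 * 1024.0));
        // prints roughly 31.5 million entries and about 541 MB per data point and year
    }
}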
The data to be handled consists of the actual time-series lists (e.g., periodic measurements of smart meters or any other sensors) and meta data of the source data point, including a description, a location, other meta data, and statistical values based on the time-series data, e.g., for the detection of abnormal values. Meta data can further be extended by adding tags that describe a data point and is persisted to the local storage periodically. After insertion, a measurement record stays immutable, while meta data tags may be altered at any time.
Statistics Statistical information is available, e.g., for measurement values, the frequency of updates, and the deltas between a measurement’s time stamp and its receiving time at the storage module. However, it is limited to information that can be calculated without having all (previous) values available (on-line algorithms). For each data point, the number of values, the minimum and maximum value, the mean, the variance and the standard deviation are available and updated with each received entry. A histogram classifies received values into bins, e.g., of size 2.5 V for voltage measurement values. Additionally, a 95 and a 99 % confidence interval for the mean value are available for each data point.
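A sketch of such an on-line update is given below, using Welford’s algorithm for mean and variance; the bin size and the z-factors for the confidence intervals are illustrative, and the actual Storacle implementation may differ.

import java.util.HashMap;
import java.util.Map;

/** On-line statistics for one data point: no previous values need to be kept. */
public class OnlineStats {
    private long count;
    private double min = Double.POSITIVE_INFINITY;
    private double max = Double.NEGATIVE_INFINITY;
    private double mean;
    private double m2;                               // sum of squared deviations (Welford)
    private final double binSize;                    // e.g., 2.5 V for voltage values
    private final Map<Long, Long> histogram = new HashMap<>();

    public OnlineStats(double binSize) {
        this.binSize = binSize;
    }

    public void add(double value) {
        count++;
        min = Math.min(min, value);
        max = Math.max(max, value);
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean);
        histogram.merge((long) Math.floor(value / binSize), 1L, Long::sum);
    }

    public double variance() { return count > 1 ? m2 / (count - 1) : 0.0; }
    public double stdDev()   { return Math.sqrt(variance()); }

    /** Half-width of the confidence interval for the mean; z = 1.96 for 95 %, 2.576 for 99 %. */
    public double confidenceHalfWidth(double z) {
        return count > 1 ? z * stdDev() / Math.sqrt(count) : 0.0;
    }
}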
The storage module is a Gridlink module responsible for storing measurement values, grid topology data and meta data. It is a vital module in an intelligent substation system, as other application modules require access to historical measurements such as voltages or switch positions and other data (cf. R8). Requests issued to this module either add data to the storage or retrieve persisted data for use by the issuing module. The storage module is primarily implemented using Storacle for the time-series measurement values and the meta data. Data that Storacle is not capable of persisting, such as topology files, is either handled by the storage module itself or delegated to other database solutions.
All request types the storage module is capable of handling are listed next. Most of these request types have a corresponding reply type. Some requests, naturally, reply only by returning a SuccessReply or ErrorReply (e.g., on UpdateTagsRequest) or are defined not to reply at all (e.g., on AddEntryRequest).
CreateDataPointRequest to create the data point,
ExistsDataPointRequest to detect whether the data point (already) exists,
GetDataPointsRequest to retrieve all data points. The request may optionally contain a list of tags, such that the storage returns only those data points that contain any or all of the tags.
GetTagsRequest to retrieve the tags of the data point,
UpdateTagsRequest to update the tags of the data point,
AddEntryRequest to add an entry of the data point to the storage,
GetMostRecentEntryRequest to retrieve the most recent entry of the data point,
GetEntriesOfTimeSpanRequest to retrieve all entries within a time span of the data point,
GetStatisticsRequest to retrieve the statistics of the data point,
GetTopologyRequest to retrieve the grid topology,
SaveDataPointsCSVRequest to export the values of data points within a time span to a CSV file.
Observing storage’s events Events of data points are published to designated topics, regardless of whether anyone is interested in these events. An explicit (and traditional) registration for a data point’s events is therefore not necessary; to be informed of such events, it suffices to register for the designated data point’s observer topic. The storage module issues events whenever a data point is created, a measurement is added or tags have been altered. This functionality can be used to implement a push mechanism, e.g., to update a view once a new value is received. Some types of events are additionally published to a global observer topic, allowing events of general interest to be observed without knowing a data point’s name in advance.
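A sketch of observing such events on the vert.x event bus is shown below; the topic naming scheme (storage.observer.<data point>) is illustrative and not necessarily the one used by the storage module.

import io.vertx.core.Vertx;
import io.vertx.core.json.JsonObject;

public class ViewUpdater {
    public static void main(String[] args) {
        Vertx vertx = Vertx.vertx();

        // Observer topic of a single data point (naming scheme is illustrative).
        vertx.eventBus().<JsonObject>consumer("storage.observer.meterA.u1", msg -> {
            // push mechanism: refresh the view as soon as a new value arrives
            System.out.println("Data point event: " + msg.body());
        });

        // Global observer topic for events of general interest, e.g., the
        // creation of previously unknown data points.
        vertx.eventBus().<JsonObject>consumer("storage.observer", msg -> {
            System.out.println("Global storage event: " + msg.body());
        });
    }
}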
Grid representation module
The grid representation module (GRM) collects grid data, preprocesses it, and provides current information about the distribution grid and its elements to other modules in a well-structured form. The grid topology is provided as an XML-based Common Information Model (CIM) and imported by the GRM, which builds up a topological model using the information about nodes, lines, transformers, connections, etc. For each device, a table entry is created that includes device-specific information and a unique identifier.
Typically, an adjacency matrix is used to store the information whether two elements within a network are connected. In the grid representation module, the matrix can be interpreted as an extended adjacency matrix because it contains considerably more information about the topology and the relations between network elements (e.g., whether they are directly or indirectly connected, the path length in terms of hops between two elements, and whether an element can be reached by changing switch positions). By using the row and column indices as unique identifiers of the elements, the connection information can be read very fast. The matrix is built initially when the GRM is started and the topology information is available. To this end, well-defined starting points of the grid are identified (e.g., the MV/LV connection point). Starting at these elements, a recursive algorithm iterates through the elements of the network until it reaches predefined final nodes (e.g., energy consumers). Based on the traversed path within the network topology, the connection information is stored. Positive and negative connection lengths indicate parent-child relations in terms of the network topology. In case of topology changes (e.g., a change of a switch position), the information in the GRM is updated. Thus, the GRM can be seen as an up-to-date representation of the grid. Detailed information about the connection information and an example are provided in previous work.
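A strongly simplified sketch of building such a hop-count matrix is given below; it uses a breadth-first traversal over an undirected topology and omits switch states as well as the sign convention for parent-child relations, which are specific to the GRM.

import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

/** Simplified extended adjacency matrix: entry [i][j] holds the hop distance
 *  from element i to element j (0 = same element, -1 = not reachable). */
public class HopMatrix {
    public static int[][] build(List<int[]> edges, int elementCount) {
        // plain adjacency list derived from the topology's line/connection entries
        List<List<Integer>> adj = new ArrayList<>();
        for (int i = 0; i < elementCount; i++) {
            adj.add(new ArrayList<>());
        }
        for (int[] e : edges) {
            adj.get(e[0]).add(e[1]);
            adj.get(e[1]).add(e[0]);
        }

        int[][] hops = new int[elementCount][elementCount];
        for (int start = 0; start < elementCount; start++) {
            Arrays.fill(hops[start], -1);
            hops[start][start] = 0;
            Deque<Integer> queue = new ArrayDeque<>();
            queue.add(start);
            while (!queue.isEmpty()) {               // breadth-first traversal
                int current = queue.poll();
                for (int next : adj.get(current)) {
                    if (hops[start][next] == -1) {
                        hops[start][next] = hops[start][current] + 1;
                        queue.add(next);
                    }
                }
            }
        }
        return hops;
    }

    public static void main(String[] args) {
        // tiny example: 0 = MV/LV connection point, 1-3 = grid nodes, 4 = consumer
        List<int[]> edges = List.of(new int[]{0, 1}, new int[]{1, 2},
                                    new int[]{1, 3}, new int[]{3, 4});
        System.out.println(Arrays.deepToString(build(edges, 5)));
    }
}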
Another function of the GRM is to model grid monitoring devices, in particular their position within the topology, and to provide their corresponding power profiles. To this end, the monitoring devices are identified and their power profiles are requested from the storage module by the GRM. In general, grid monitoring devices sense active and reactive power flow for each phase of a line; the measured power is thus the sum of all sub-branches and all directly connected nodes. The GRM analyzes the determined positions of the monitoring devices and, in case a parent-child relation to a sub-branch exists, calculates so-called residual profiles. To provide useful data, gaps within the profiles are identified and seamless time series are created.
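A minimal sketch of the residual-profile calculation (parent profile minus the sum of the profiles measured in its sub-branches); the method name and data layout are illustrative, and the profiles are assumed to be aligned and gap-free as described above.

import java.util.List;

public class ResidualProfile {
    /**
     * The parent monitoring device measures the sum of all sub-branches and directly
     * connected nodes; subtracting the child profiles per time step therefore yields
     * the residual profile of the otherwise unmeasured part of the branch.
     */
    public static double[] compute(double[] parentProfile, List<double[]> childProfiles) {
        double[] residual = parentProfile.clone();
        for (double[] child : childProfiles) {
            for (int t = 0; t < residual.length; t++) {
                residual[t] -= child[t];
            }
        }
        return residual;
    }
}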
Building representation module
The functionality of the Building Representation Module (BRM) is similar to that of the GRM: it collects and preprocesses data and provides it to other modules and applications. In almost the same manner as the GRM, the BRM requests profiles from the storage module, in this case profiles of buildings instead of monitoring devices within the distribution grid. Again, gaps within the profiles are identified and seamless time series are created. As a result, suitable data can be provided to other modules of the application frame. At the moment, the representation of buildings by their power profiles and the data preparation are the only implemented functions of the BRM. Further functions, such as in-building actuator modeling and functions related to energy market participation, are envisioned.