Performance Tests of Smart City IoT Data Repositories for Universal Linear Infrastructure Data and Graph Databases
The infrastructure managed by Smart City systems consists mainly of linear structures, such as pipelines, electric power lines, and railway lines. In this paper, we propose transcribing linear infrastructure into a graph representation and storing the required metadata in a graph database, merged with the traditional relational database used for device management. Such an approach enables fast acquisition of data about the infrastructure’s properties and allows information about the structure of the managed infrastructure to be merged with information about device properties and statistics. We develop a graph generator that builds a virtual network based on representative examples of linear infrastructure and incorporates data from existing metering systems into the generated structure, which automates deployment and makes it possible to evaluate the performance of a Smart City monitoring and management system. We present the results of performance tests that show the creation time of graph database structures in Neo4j and the difference in performance between the Microsoft SQL Server database and the Neo4j database in a few Smart City use cases. The results show that using the graph database to execute queries related to linear infrastructure decreases the response time by up to 9 times.
Keywords: Smart City, Smart devices, Graph databases
The idea of the Smart City has become firmly rooted in public consciousness worldwide, and in recent years many cities have begun to incorporate more and more smart technologies into their infrastructure [1, 2]. In the Smart City IoT architecture, which deals with data from smart sensor devices such as gas, heat, or electric energy meters, one important piece of information is the layout of the existing infrastructure in which the smart devices are deployed. Without information about the actual infrastructure, a Smart City system is far less useful: it delivers only raw measurement data from smart devices, without any additional context.
Smart City IoT systems collect data from smart devices, e.g., smart media meters, actuators, and controllers. In this paper, we focus on smart media meters that measure utility consumption and can be placed on clients’ devices such as heaters, water pipes, or gas pipes. Such devices can also be placed on infrastructure components that are outside the clients’ apartments but are crucial to utility delivery, such as ground pipes, manholes, and tanks. These devices can give the utility supplier crucial information not only about utility consumption but also about the state of the whole infrastructure and possible emergencies, provided that we know the whole infrastructure and how it is connected. Most of the infrastructures serviced by the Smart City IoT systems we focus on are so-called man-made linear infrastructures, referred to in this paper as LI. As defined in construction law, an LI is a construction object whose characteristic parameter is its length, e.g., railways, waterworks, canals, gas pipelines, heat pipelines, other pipelines, electric power lines, cable ducting, or roads. A good representation of such infrastructure is very helpful in locating, e.g., leaks in these systems; leaks in gas and fuel lines are especially dangerous and need to be localized quickly. As inconsistencies can arise from many different sources, such as a malfunctioning sensor or miscalculation of equipment, it is important to supply effective algorithms that identify real leaks without false positive alerts. As all the above-mentioned infrastructures are defined as LI, their characteristics can easily be translated into graphs.
The use of graphs and graph theory to represent LI is not a new idea; it has been used to represent LI and detect faults in numerous publications on gas [5, 6], district heating [7, 8], water, and landscape planning. An interesting idea was presented by the authors of a US patent that describes a method and system to generate a network graph representation of a physically connected network.
Smart City systems must merge existing data about the state, readings, and parameters of smart meters with information about the structure of the infrastructure, such as the topology of the network. Knowledge of the exact infrastructure and the connections between LI nodes opens many new possibilities, such as monitoring parameters along a specific path between the media source and the receiving clients. Knowledge of the whole infrastructure, combined with online access to the state of all smart devices at every point of the LI, gives a good level of control and helps to pinpoint possible malfunctions of infrastructure components. An important problem in developing a management or decision support system for linear infrastructure is the lack of measurement data from representative systems. We use data from a measurement system that monitors large areas using smart devices and collects a large amount of data about smart device measurements, properties, monitored equipment, locations, etc. However, the data lack information about how the locations are connected and via which medium. To create a pilot proof-of-concept system, we created an LI generator that produces the infrastructure in the form of a multi-connection graph. The virtual infrastructure is then populated with real data from the Network Management System (NMS) database and can be used as a base for developing the applications that potential clients require.
The Linear Infrastructure Representation
All of the above-mentioned infrastructures can be naturally presented as a directed or undirected multigraph in which edges have their own identity. The main problem is how to represent the LI in the graph model so that it covers the various transportation methods that can exist in Smart City infrastructure. Each possible architecture has its own characteristics: waterworks are generally connected by a single pipe transporting fresh water (with the sewage system considered a separate architecture); electric power uses cables; and district heating uses two separate pipes, one transporting hot water that powers the system (power) and a second carrying cool water back to the heating plant (return). Our first idea was to model the real infrastructure directly, with connections represented as graph edges with certain properties. However, after a deeper analysis, we decided to create transport nodes so that several devices can be mounted on a transporting object (e.g., several sensors inside a pipe). Our graph structure can be described as a hierarchical graph divided into layers; the full idea is presented in , where we focus on a district heating system. A similar solution was presented in ; however, our solution puts much more focus on the description of monitored objects (meters) and monitoring devices (devices).
Infrastructure layer: represents the real connections between physical objects (e.g., heating plant, client location), points (“crossroads”), and transport medium (transport nodes representing, e.g., pipes).
Meter layer: represents the objects on which sensors are mounted (e.g., tanks, heaters, etc.).
Device layer: represents physical measuring platforms that can include one or several sensors.
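The three layers can be illustrated with a small in-memory sketch. This is not the authors' implementation: the class names, the edge labels (`FLOWS_TO`, `HAS_METER`, `MONITORED_BY`), and the sample node names are hypothetical and chosen only to show how transport nodes let meters and devices attach to a pipe as well as to a client location.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    layer: str   # "infrastructure", "meter", or "device"
    kind: str    # e.g. "plant", "transport", "client", "heater"
    props: dict = field(default_factory=dict)


@dataclass
class LayeredGraph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # (src_id, dst_id, label)

    def add_node(self, node_id, layer, kind, **props):
        self.nodes[node_id] = Node(node_id, layer, kind, props)

    def add_edge(self, src, dst, label):
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist before linking")
        self.edges.append((src, dst, label))

    def attached(self, node_id, label):
        """Return ids of nodes reached from node_id by edges with this label."""
        return [d for s, d, l in self.edges if s == node_id and l == label]


g = LayeredGraph()

# Infrastructure layer: physical objects and a transport node for the pipe.
g.add_node("plant", "infrastructure", "plant")
g.add_node("pipe_power", "infrastructure", "transport", medium="hot water")
g.add_node("client", "infrastructure", "client")
g.add_edge("plant", "pipe_power", "FLOWS_TO")
g.add_edge("pipe_power", "client", "FLOWS_TO")

# Meter layer: monitored objects, attached to a client or to the pipe itself.
g.add_node("heater_1", "meter", "heater")
g.add_node("pipe_probe", "meter", "pipe section")
g.add_edge("client", "heater_1", "HAS_METER")
g.add_edge("pipe_power", "pipe_probe", "HAS_METER")

# Device layer: a measuring platform that may host several sensors.
g.add_node("dev_1", "device", "platform", sensors=2)
g.add_edge("heater_1", "dev_1", "MONITORED_BY")
```

Because the pipe is a node rather than an edge, `g.attached("pipe_power", "HAS_METER")` can return monitored points on the pipe in the same way as for any other infrastructure object.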
The graph representation of LI presented in  requires us to store graph data in some repository. There are several methods of storing graph data in a relational database such as Microsoft SQL Server; however, this causes some problems. First of all, we need separate tables for nodes and edges, with a series of additional tables storing object data. An example of such a simplified structure, which we created, is presented in Fig. 4. A straightforward query for a path from one node to another is quite easy in this structure; however, obtaining, for example, the shortest path requires implementing additional algorithms, and the response times will be longer than in graph databases, which are designed to store such data and already implement algorithms for optimizing graph search. We surveyed the available graph database systems and decided to use a graph database environment.
Based on graph theory—it consists of nodes, edges, and properties.
Properties can be ascribed to both nodes and edges.
The graph databases store the relations between records directly (no need to join multiple tables as in RDB).
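To make the relational alternative concrete, the sketch below stores nodes and edges in two tables and retrieves a path with a recursive query. It uses Python's built-in `sqlite3` module purely as a stand-in for a relational engine; the table layout, column names, and sample rows are assumptions and do not reproduce the schema of Fig. 4 or Microsoft SQL Server syntax.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE nodes (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE edges (src INTEGER REFERENCES nodes(id),
                    dst INTEGER REFERENCES nodes(id));
INSERT INTO nodes VALUES
    (1, 'plant'), (2, 'pipe_a'), (3, 'crossroad'), (4, 'pipe_b'), (5, 'client');
INSERT INTO edges VALUES (1, 2), (2, 3), (3, 4), (4, 5);
""")

# Every hop is one more join against the edge table. The hop limit is a
# guard against cycles; it is an arbitrary value chosen for this sketch.
PATH_QUERY = """
WITH RECURSIVE walk(node, path, hops) AS (
    SELECT ?, CAST(? AS TEXT), 0
    UNION ALL
    SELECT e.dst, walk.path || '>' || e.dst, walk.hops + 1
    FROM walk JOIN edges e ON e.src = walk.node
    WHERE walk.hops < 50
)
SELECT path, hops FROM walk WHERE node = ? ORDER BY hops LIMIT 1;
"""


def find_path(source, target):
    """Return the node ids on a fewest-hops path, or None if unreachable."""
    row = conn.execute(PATH_QUERY, (source, source, target)).fetchone()
    if row is None:
        return None
    return [int(part) for part in row[0].split(">")]
```

The query enumerates walks and only then picks the one with the fewest hops, which illustrates why path queries over node and edge tables tend to become expensive as the network grows.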
Maximal number of generated source nodes.
Maximal number of inputs for a node.
Maximal number of outputs for a node.
Number of generated nodes.
Maximal number of observed meters for a node.
Maximal number of DEVICES per METER.
Maximal number of DEVICE levels.
Number of devices on every level.
Probability of getting an additional level for DEVICE (0–1).
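A simplified generator driven by parameters like those listed above might look as follows. It is only a sketch under stated assumptions: the function name, argument names, and node naming scheme are hypothetical, the random choices are not the authors' algorithm, and the device-level parameters (number of levels, devices per level, probability of an additional level) are left out for brevity.

```python
import random


def generate_infrastructure(n_nodes, max_sources, max_inputs, max_outputs,
                            max_meters, max_devices, seed=0):
    """Build a random layered network as plain Python structures."""
    rng = random.Random(seed)  # seeded so that runs are repeatable

    sources = [f"src{i}" for i in range(rng.randint(1, max_sources))]
    nodes = list(sources)
    out_degree = {s: 0 for s in sources}
    edges = []

    for i in range(n_nodes):
        node = f"n{i}"
        # Only earlier nodes with spare output capacity can feed a new node,
        # so the resulting graph is acyclic.
        candidates = [p for p in nodes if out_degree[p] < max_outputs]
        if not candidates:
            break
        n_inputs = rng.randint(1, min(max_inputs, len(candidates)))
        for parent in rng.sample(candidates, n_inputs):
            edges.append((parent, node))
            out_degree[parent] += 1
        nodes.append(node)
        out_degree[node] = 0

    meters, devices = {}, {}
    for node in nodes:
        meters[node] = [f"{node}.m{j}" for j in range(rng.randint(0, max_meters))]
        for meter in meters[node]:
            devices[meter] = [f"{meter}.d{k}"
                              for k in range(rng.randint(1, max_devices))]

    return {"sources": sources, "nodes": nodes, "edges": edges,
            "meters": meters, "devices": devices}
```

Calling, for example, `generate_infrastructure(20, 2, 2, 3, 2, 2, seed=1)` yields a network in which every generated node has between one and two inputs and no node has more than three outputs.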
Integration with representative data from NMS database
As of now, we generate the graph data; in a representative environment, however, the system will obtain data about locations and connections from external sources (e.g., a database or a CSV file). The application is ready to receive such data, and all mechanisms for creating graph nodes and connections are implemented. To implement the loading process itself, however, we need the source data. As with the Extraction, Transformation and Load (ETL) process in data warehouses, loading must be implemented separately for each data source: for each client the data will look different and will be stored in different formats and repositories, so this part will be completed during the pilot implementation of the system. Data import is already implemented for the NMS database: the application fully supports ETL of NMS data and is already used to populate the generated infrastructure (Fig. 3).
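As an illustration of what a per-source loader could look like, the sketch below reads connections from a CSV file into node and edge collections. The column names (`source`, `target`, `medium`) and the sample rows are assumptions; a real client export would need its own mapping, as noted above.

```python
import csv
import io

# Hypothetical export: one row per connection between two locations.
SAMPLE_CSV = """source,target,medium
plant,pipe_1,hot water
pipe_1,junction_1,hot water
junction_1,pipe_2,hot water
pipe_2,client_17,hot water
"""


def load_connections(csv_file):
    """Extract nodes and labelled edges from an open CSV file object."""
    nodes = set()
    edges = []
    for row in csv.DictReader(csv_file):
        src, dst = row["source"].strip(), row["target"].strip()
        if not src or not dst:
            continue  # skip incomplete rows instead of creating dangling edges
        nodes.update((src, dst))
        edges.append((src, dst, row["medium"].strip()))
    return nodes, edges


nodes, edges = load_connections(io.StringIO(SAMPLE_CSV))
```

The returned collections can then be handed to whatever routine creates the graph nodes and relationships in the target repository.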
Graph Generator Tests
We performed tests of the graph generator in which we imported data from the NMS into the Neo4j structure. The tests are thoroughly described in ; the main conclusion was that the building time grows exponentially with the number of middle nodes. However, because building is a one-time process, a building time of around 4–5 h is expected and acceptable.
The performance tests presented in this paper measure the ability to retrieve the path from one node to another. This is quite important in Smart City systems, which need to query the state of devices on a path from a source to a client: checking the exact delivery path of gas, energy, or water is needed, e.g., to find alternative connection possibilities or to check measurement values when investigating a possible leak. As the current installation of the NMS database is not able to process such queries, we introduced, for comparison, a Microsoft SQL Server database with the schema presented in Fig. 4; it stores the same data (generated + extracted from NMS) as the Neo4j structure presented in Fig. 3.
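The kind of query being tested can be sketched in a few lines: find a path from a source to a client, then list the devices mounted along it. The breadth-first search below is a generic textbook technique, not the algorithm used internally by Neo4j or by the tested SQL queries, and the sample network and device names are hypothetical.

```python
from collections import deque


def shortest_path(adjacency, source, target):
    """Breadth-first search; returns a fewest-hops path or None."""
    previous = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:   # walk back to the source
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in adjacency.get(node, ()):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


def devices_on_path(path, devices_at):
    """List (node, device) pairs for every device mounted along the path."""
    return [(node, dev) for node in path for dev in devices_at.get(node, [])]


adjacency = {
    "plant": ["pipe_a", "pipe_b"],
    "pipe_a": ["junction"],
    "pipe_b": ["bypass"],
    "bypass": ["junction"],
    "junction": ["pipe_c"],
    "pipe_c": ["client"],
}
devices_at = {"pipe_a": ["dev_1"], "bypass": ["dev_9"], "pipe_c": ["dev_2", "dev_3"]}

path = shortest_path(adjacency, "plant", "client")
on_path = devices_on_path(path, devices_at)
```

Here the route through `pipe_a` is chosen over the longer one through `bypass`, so `dev_9` does not appear among the devices to check.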
In both figures, we can see a considerable overhead on the first query, which is probably overhead of Neo4j and Microsoft SQL Server themselves; it can be neglected, as it is not present in the subsequent queries. The clear difference in response time, which grows with the number of nodes, together with the already implemented algorithms that optimize graph searches, shows that Neo4j is the better choice for Smart City systems.
In this paper, we propose representing linear infrastructure (e.g., railways, waterworks, canals, gas pipelines, heat pipelines, electric power lines, cable ducting, or roads) as a graph. The schema presented in  and explained in section “The Linear Infrastructure Representation” enables us to directly represent the real connections between LI objects and connect them with the devices, meters, and sensors present in the system.
The graph generator creates a structure similar to physical infrastructure and combines virtual nodes with real data about real objects transferred from the NMS database. The tests show the response times for queries about certain paths in a graph structure and compare the results for the test Microsoft SQL Server database and the Neo4j graph database. The test clearly shows the superiority of the Neo4j database, thus confirming our choice of technology.
The bottleneck of this solution is the rather long database creation time for Neo4j; however, as in classical databases and data warehouses, this is a one-time process and subsequent updates are much faster, so the long building time is an acceptable drawback in Smart City systems.
This research was funded by Polish National Center for Research and Development Grant number POIR.04.01.04-00-0005/17.
- 2. Giffinger R, Fertner C, Kramar H, Kalasek R, Pichler-Milanovic N, Meijers E. Smart cities. In: Ranking of European Medium-Sized Cities. Centre of Regional Science, Vienna UT; Vienna, Austria: 2007. Final Report.
- 8. Blommaert M, Salenbien R, Baelmans M. An adjoint approach to thermal network topology optimization. In: Computational methods and simulations, IHTC-16; 2018. pp. 2081–9. https://doi.org/10.1615/IHTC16.cms.024074.
- 11. https://patents.google.com/patent/US9473368B1/en. Accessed 20 Jan 2019.
- 12. Gorawski M, Grochla K. Graph representation of linear infrastructure in Smart City IoT Systems. In: ICUMT 2019, to be published.
- 13. https://neo4j.com/. Accessed 20 Jan 2019.
- 14. https://db-engines.com/en/ranking/graph+dbms. Accessed 20 Jan 2019.
- 15. https://www.w3.org/TR/sparql11-overview/. Accessed 20 Jan 2019.
- 16. Cypher Query Language. https://neo4j.com/developer/cypher-query-language/. Accessed 20 Jan 2019.
Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.