Abstract
This paper examines the use of Elasticsearch for data warehousing and analysis of geo-referenced sensor data. Elasticsearch has several advantages over its direct competitors. For example, it can handle time series, spatial data, and objects. Moreover, it is natively integrated with the data shippers Beats and Logstash and with the visualisation tool Kibana. This paper proposes a method to implement and query multidimensional models in Elasticsearch. No prior work has evaluated Elasticsearch for data warehouses and analytical queries, especially for environmental sensor data. This paper therefore also presents extensive experiments to evaluate its querying performance. The proposed approach is applied to the analysis of sensor data in the context of CEBA, an environmental cloud solution developed to collect, store, and analyse environmental data.
Data availability
The data that support the study are available in Google Drive at https://drive.google.com/drive/folders/1ATdzq_p-jwrhPLyWkrfE_O8nCE8LD6s4?usp=sharing.
References
ConnecSenS P. 2015–2020. http://www.lpc-clermont.in2p3.fr/spip.php?article583. Retrieved June 2021.
Terray LA-J. From sensor to cloud: an IoT network of radon outdoor probes to monitor active volcanoes. Sensors. 2020; pp. 2755 (Multidisciplinary Digital Publishing Institute).
Bajer M. Building an IoT data hub with Elasticsearch, Logstash and Kibana. In: 5th international conference on future internet of things and cloud workshops (FiCloudW). IEEE. 2017. pp. 63–8.
Inmon WH. Building the data warehouse. New York: Wiley; 2005.
Jarke MA. Fundamentals of data warehouses. New York: Springer; 2002.
Pinet FA. Precise design of environmental data warehouses. Oper Res. 2010; vol. 10. pp. 349–369.
Bicevska ZA. Towards NoSQL-based data warehouse solutions. Procedia Comput Sci. 2017; vol. 104. pp. 104–111.
Lenzerini M. Data integration: a theoretical perspective. In: Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART symposium on principles of database systems. 2002. pp. 233–46.
Sabtu AA. The challenges of extract, transform and loading (etl) system implementation for near real-time environment. In: 2017 international conference on research and innovation in information systems (ICRIIS). IEEE. 2017.
Pilato D. How to fetch data from multiple index using join like sql. Retrieved from Elasticsearch. 2017. https://discuss.elastic.co/t/how-to-fetch-data-from-multiple-index-using-join-like-sql/106131. Retrieved June 2021.
Bansal SK. Integrating big data: A semantic extract-transform-load framework. In: Computer. IEEE. 2015. pp. 42–50.
Elasticsearch. (2020). ELK. https://www.elastic.co/elastic-stack. Retrieved June 2021.
Guo DA. State-of-the-art geospatial information processing in NoSQL databases. ISPRS Int J Geo-inf. 2020; pp. 331 (Multidisciplinary Digital Publishing Institute).
Dubey SA. Data visualization on GitHub repository parameters using Elastic search and Kibana. In: 2018 2nd international conference on trends in electronics and informatics (ICOEI). IEEE. 2018. pp. 554–8.
Nipun Garg SM. Spatial databases spatial data warehouses. Retrieved from pdfs.semanticscholar.org. 2011. https://pdfs.semanticscholar.org/684a/4a2c41360e5965281ee09cabbb621f4400cb.pdf. Retrieved June 2021.
Matei AA-M. OLAP for multidimensional semantic web databases. Enabl Real Time Bus Intell. 2014;81–96.
Wrembel R. Data warehouses and OLAP: concepts, architectures and solutions. IGI Global. 2006.
Albrecht AA. Managing ETL processes. NTII. 2008;8:12–5.
CEBA project. 2020–2025. https://mesocentre.uca.fr/projets-associes/ceba. Retrieved June 2021.
Werneck GL. Georeferenced data in epidemiologic research. Ciência Saúde Coletiva. 2008;13:1753–66.
Alam MM. A survey on spatio-temporal data analytics systems. 2021. arXiv:2103.09883.
Hintze PA. Geographically referenced data for social science. RatSWD_WP_. 2009.
Lee J-GA. Geospatial big data: challenges and opportunities. Big Data Res. 2015;2:74–81.
Kulsawasd Jitkajornwanich NP. A survey on spatial, temporal, and spatio-temporal database research and an original example of relevant applications using SQL ecosystem and deep learning. J Inf Telecommun. 2020;4(4):524–59.
Elasticsearch. Scalability and resilience: clusters, nodes, and shards. Retrieved from Elasticsearch. 2021. https://www.elastic.co/guide/en/elasticsearch/reference/current/scalability.html. Retrieved June 2021.
Tewtia HK. COVID-19 insightful data visualization and forecasting using elasticsearch. In: Computational intelligence methods in COVID-19: surveillance, prevention, prediction and diagnosis. Springer. 2021. pp. 191–205.
CEBA. CAHIER DES CHARGES BASE DE DONNEES. 2018. http://doc.ceba.uca.fr. Retrieved June 2021.
Elasticsearch. Creating a visualization. 2021. https://www.elastic.co/guide/en/kibana/6.8/createvis.html. Retrieved June 2021.
Bedard YA. Fundamentals of spatial data warehousing for geographic knowledge discovery. Geogr Data Min Knowl Discov. 2001;2:53–73.
Barnsteiner F. Elasticsearch as a time series data store. 2015. https://www.elastic.co/blog/elasticsearch-as-a-time-series-data-store. Retrieved June 2021.
Ngo TT-A. An analytical tool for georeferenced sensor data based on ELK stack. In: Proceedings of the 7th international conference on geographical information systems theory, applications and management (GISTAM 2021). SCITEPRESS—Science and Technology Publications, Lda. 2021. pp. 82–89.
Kramer M. GeoRocket: a scalable and cloud-based data store for big geospatial files. SoftwareX. Elsevier. 2020. p. 100409.
Bartlett R. Local geographic information storing and querying using elasticsearch. In: Proceedings of the 13th workshop on geographic information retrieval. pp. 1–4. 2019.
Quoc HN. An elastic and scalable spatiotemporal query processing for linked sensor data. In: Proceedings of the 11th international conference on semantic systems. pp. 17–24. 2015.
Dobson SA. A reference architecture and model for sensor data warehousing. IEEE Sens J. 2018;18:7659–70 (IEEE).
PostGIS. Chapter 15. PostGIS Special Functions Index. 2022. https://postgis.net/docs/PostGIS_Special_Functions_Index.html. Retrieved 4 2022.
Agarwal SA. Performance analysis of MongoDB versus PostGIS/PostGreSQL databases for line intersection and point containment spatial queries. Spat Inf Res. 2016;24:671–7.
Bartoszewski DA. The comparison of processing efficiency of spatial data for PostGIS and MongoDB databases. In: International conference: beyond databases, architectures and structures. Springer. 2019. pp. 291–302.
Bimonte SA. When spatial analysis meets OLAP: multidimensional model and operators. Int J Data Warehous Min. 2010;6:33–60.
Boulil KA. A UML & spatial OCL based approach for handling quality issues in SOLAP systems. In I. (1) (ed.). pp. 99–104. 2012.
Boulil KA. Spatial OLAP integrity constraints: from UML-based specification to automatic implementation: application to energetic data in agriculture. J Decis Syst. 2014;23:460–80.
Boulil KA-P. Guaranteeing the quality of multidimensional analysis in data warehouses of simulation results: application to pesticide transfer data produced by the MACRO model. Ecol Inform. 2013;16:41–52.
Miralles AA. EIS pesticide: an information system for data and knowledge capitalization and analysis. In: Euraqua-peer scientific conference. 2011.
Liang SA-Y. OGC SensorThings API part 1: sensing, version 1.0. Open geospatial consortium. 2016.
ISO 19156:2011. From International Organization for Standardization, ISO 19156:2011, geographic information—observation & measurement. 2011. https://www.iso.org/standard/32574.html. Retrieved Mar 2022.
ISO 19115-1:2014. From geographic information—metadata—part 1: fundamentals. 2014. https://www.iso.org/standard/53798.html. Retrieved Mar 2022.
Geonetwork. 2022. https://geonetwork-opensource.org/. Retrieved Mar 2022.
Acknowledgements
This research was financed by the French government IDEX-ISITE initiative 16-IDEX-0001 (CAP 20-25) and the PhD was funded by the European Regional Development Fund (FEDER).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
This article is part of the topical collection “Geographical Information Systems Theory, Applications and Management” guest edited by Lemonia Ragia, Cédric Grueau and Robert Laurini.
Appendices
Appendix A
Appendix B
The dataset can be visualised in many forms, e.g. bar charts and line graphs. This appendix explains how to produce a visualisation in Kibana (Elasticsearch, Creating a Visualization, 2021).
- Navigate to the visualisation page by clicking Visualise in the left panel of the Kibana home page.
- Select a visualisation type, e.g. line, area, or maps.
- Select the index of the dataset to visualise.
A metric and bucket aggregation query panel will be displayed by default as in Fig. 24.
In the visualisation controller, the metrics (Fig. 25) define what is plotted on the Y axis and the buckets (Fig. 26) define how the data are grouped along the X axis. Consider visualisation 3 (see Fig. 22) as a use case that monitors air humidity measurements by device and hour: the Y axis shows the air humidity value received from each sensor, and the device name and hour are displayed along the X axis.
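The metric and bucket settings described above correspond to an Elasticsearch aggregation request. The following Python sketch shows what such a request could look like for the air humidity example. It is an illustration under stated assumptions, not the query Kibana generates for the paper's index: the field names `device_name`, `timestamp`, and `air_humidity` are hypothetical, the average is assumed as the metric, and `calendar_interval` applies to Elasticsearch 7.x and later (earlier releases use `interval`).

```python
import json

# Hypothetical field names; the actual index mapping is not shown in this appendix.
DEVICE_FIELD = "device_name"
TIME_FIELD = "timestamp"
HUMIDITY_FIELD = "air_humidity"


def build_humidity_query():
    """Build an aggregation body: buckets by device and hour, average humidity as metric."""
    return {
        "size": 0,  # return aggregation results only, not the matching documents
        "aggs": {
            "by_device": {  # bucket: one group per device (X axis, first level)
                "terms": {"field": DEVICE_FIELD},
                "aggs": {
                    "by_hour": {  # bucket: one group per hour (X axis, second level)
                        # Elasticsearch 7.x+; older releases use "interval" instead
                        "date_histogram": {
                            "field": TIME_FIELD,
                            "calendar_interval": "hour",
                        },
                        "aggs": {
                            # metric: value plotted on the Y axis (average is an assumption)
                            "avg_humidity": {"avg": {"field": HUMIDITY_FIELD}}
                        },
                    }
                },
            }
        },
    }


def flatten(response):
    """Turn the nested bucket response into (device, hour, value) rows for plotting."""
    rows = []
    for device in response["aggregations"]["by_device"]["buckets"]:
        for hour in device["by_hour"]["buckets"]:
            rows.append(
                (device["key"], hour["key_as_string"], hour["avg_humidity"]["value"])
            )
    return rows


if __name__ == "__main__":
    # Print the request body; it could be sent to the index's _search endpoint.
    print(json.dumps(build_humidity_query(), indent=2))
```

The nesting mirrors the visualisation controller: each bucket aggregation adds one grouping level on the X axis, and the innermost metric aggregation supplies the Y value for each group.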
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Ngo, T.T.T., Sarramia, D., Kang, MA. et al. A New Approach Based on ELK Stack for the Analysis and Visualisation of Geo-referenced Sensor Data. SN COMPUT. SCI. 4, 241 (2023). https://doi.org/10.1007/s42979-022-01628-6