Abstract
This paper presents approaches to preparing different types of data for loading into Elasticsearch, a document-oriented NoSQL database. Besides storing data, Elasticsearch can be used together with Kibana, a data visualization utility that serves as a powerful tool for data analysis. Preprocessing is essential because well-prepared data not only increases the accuracy of the analysis but also expands its capabilities. For broader coverage, the approaches are described using real cases solved by analysts. The paper presents methodological and practical ways to solve these problems, both by transforming the data and adding new fields and by defining correct mappings for Elasticsearch indexes. To demonstrate the approaches clearly, they are applied to two datasets: bibliographic information on papers and information on the funding of scientific and technical projects. The demonstration shows the difference between the initial and the enriched data, along with charts built from the enriched data, which enable advanced data analysis.
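As a rough illustration of the two kinds of preparation the abstract names (adding derived fields and defining an explicit index mapping), the following minimal Python sketch may help. It is not the authors' implementation: the field names (`title`, `authors`, `year`, `author_count`), the semicolon-separated author string, and the `enrich` helper are all assumptions made for this example. The mapping is only built as a plain dictionary in the shape Elasticsearch accepts as an index-creation body; no client call is made.

```python
from copy import deepcopy

# Hypothetical explicit mapping for a bibliographic index.
# Field names are assumptions for illustration; "text", "keyword" and
# "integer" are standard Elasticsearch field types.
PAPERS_MAPPING = {
    "mappings": {
        "properties": {
            "title": {"type": "text"},            # full-text searchable
            "authors": {"type": "keyword"},       # exact values, usable in aggregations
            "year": {"type": "integer"},          # numeric, usable in range filters
            "author_count": {"type": "integer"},  # derived field added during enrichment
        }
    }
}


def enrich(record):
    """Return an enriched copy of a raw record (hypothetical helper).

    Splits a semicolon-separated author string into a list, adds a
    derived author_count field, and casts year to an integer.
    """
    doc = deepcopy(record)
    raw_authors = doc.get("authors", "")
    if isinstance(raw_authors, str):
        doc["authors"] = [a.strip() for a in raw_authors.split(";") if a.strip()]
    doc["author_count"] = len(doc["authors"])
    doc["year"] = int(doc["year"])
    return doc


# Made-up input record, not taken from the paper's datasets.
raw = {"title": "Example paper", "authors": "Ivanov I.; Petrov P.", "year": "2023"}
enriched = enrich(raw)
print(enriched)
```

Storing authors as a list of `keyword` values, rather than one analyzed string, is what would let a Kibana chart count papers per author; the derived `author_count` field plays a similar role for numeric charts.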
Acknowledgements
The study was supported by the Russian Science Foundation grant No. 23-75-30012, https://rscf.ru/project/23-75-30012/.
Copyright information
© 2024 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this paper
Cite this paper
Ulizko, M.S., Tukumbetova, R.R., Artamonov, A.A., Antonov, E.V., Ionkina, K.V. (2024). Data Preparation for Advanced Data Analysis on Elastic Stack. In: Samsonovich, A.V., Liu, T. (eds) Biologically Inspired Cognitive Architectures 2023. BICA 2023. Studies in Computational Intelligence, vol 1130. Springer, Cham. https://doi.org/10.1007/978-3-031-50381-8_96
DOI: https://doi.org/10.1007/978-3-031-50381-8_96
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-50380-1
Online ISBN: 978-3-031-50381-8
eBook Packages: Intelligent Technologies and Robotics (R0)