Development of a method for processing log files using clustering

Mahmudova, Shafagat

doi:10.1007/s00500-022-07740-2

Application of Soft Computing
Published: 24 December 2022

Volume 27, pages 1617–1628, (2023)

Soft Computing

Shafagat Mahmudova ORCID: orcid.org/0000-0003-1817-0756¹

Abstract

A log file is a record of the events that occur on a website or server. Because log files can grow very large, outdated content is often overwritten periodically, or the log is split into a collection of files whose names include, for example, a date. In the event of technical problems, site inaccessibility, virus infection, hacker attacks or Distributed Denial of Service (DDoS) attacks, the resource administrator can use the information in the log to find the cause, which makes it easier and faster to eliminate unwanted incidents. The paper analyzes the definition, types, location, use and examples of log files. Data are transferred to a MySQL database using the Squid.db database. Since the Squid proxy server is a caching proxy server, it stores resources, so a repeated request is served quickly. The data are then clustered using a table compiled from the databases transferred to MySQL via the Squid proxy. Unnecessary entries are deleted from the table, which significantly speeds up data processing. The study highlights clustering, analyzes metrics, and determines the degree of closeness of clusters, and of objects within clusters, in Euclidean space. Applying the clustering method to this problem is fast and simple. Experiments are conducted, and the results are satisfactory.
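The pipeline summarized in the abstract (filter the log entries, cluster them, then measure closeness in Euclidean space) can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract names neither the clustering algorithm nor the table schema, so k-means is used here only as a stand-in, the two features (elapsed time and response size) are assumptions, and "unnecessary entries" is interpreted simply as lines that cannot be parsed. The sample lines are invented and follow Squid's native access.log layout; the function names parse_line, k_means and cluster_log are hypothetical.

```python
import math
import random

# Invented sample lines in Squid's native access.log layout:
# time elapsed client code/status bytes method URL ident hierarchy type
SAMPLE_LOG = [
    "1671870000.101     12 10.0.0.5 TCP_HIT/200 500 GET http://example.com/a.css - NONE/- text/css",
    "1671870001.202     15 10.0.0.6 TCP_HIT/200 640 GET http://example.com/b.js - NONE/- text/javascript",
    "1671870002.303     20 10.0.0.5 TCP_HIT/200 800 GET http://example.com/c.png - NONE/- image/png",
    "1671870003.404    900 10.0.0.7 TCP_MISS/200 52000 GET http://example.org/v1 - DIRECT/192.0.2.10 video/mp4",
    "1671870004.505    950 10.0.0.8 TCP_MISS/200 50000 GET http://example.org/v2 - DIRECT/192.0.2.10 video/mp4",
    "1671870005.606   1000 10.0.0.7 TCP_MISS/200 51000 GET http://example.org/v3 - DIRECT/192.0.2.10 video/mp4",
    "truncated line",  # malformed entry, dropped before clustering
]


def parse_line(line):
    """Return (elapsed_ms, size_bytes) for one log line, or None if malformed."""
    fields = line.split()
    if len(fields) < 7:
        return None
    try:
        return (float(fields[1]), float(fields[4]))
    except ValueError:
        return None


def euclidean(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def assign(points, centers):
    """Place each point in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: euclidean(p, centers[i]))
        clusters[nearest].append(p)
    return clusters


def k_means(points, k, iterations=20, seed=0):
    """Plain k-means (an assumed stand-in for the paper's clustering step)."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iterations):
        clusters = assign(points, centers)
        # Keep the old center if a cluster ends up empty.
        updated = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if updated == centers:
            break
        centers = updated
    return centers, assign(points, centers)


def cluster_log(lines, k):
    """Drop unparsable entries, then cluster the remaining feature vectors."""
    points = [p for p in map(parse_line, lines) if p is not None]
    return k_means(points, k)


if __name__ == "__main__":
    centers, clusters = cluster_log(SAMPLE_LOG, k=2)
    print("distance between cluster centers:", euclidean(centers[0], centers[1]))
    for center, members in zip(centers, clusters):
        for point in members:
            print(point, "distance to its center:", euclidean(point, center))
```

The two features are on very different scales, so in practice they would likely be normalized before distances are computed; the sketch omits this to stay short.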

Data availability

Enquiries about data availability should be directed to the authors.

Acknowledgements

The author thanks the editors and anonymous reviewers for their helpful comments and suggestions that have led to this improved version of the paper.

Funding

The author has not disclosed any funding.

Author information

Authors and Affiliations

Institute of Information Technology of ANAS, Baku, Azerbaijan
Shafagat Mahmudova

Corresponding author

Correspondence to Shafagat Mahmudova.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by the author.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mahmudova, S. Development of a method for processing log files using clustering. Soft Comput 27, 1617–1628 (2023). https://doi.org/10.1007/s00500-022-07740-2

Accepted: 11 December 2022
Published: 24 December 2022
Issue Date: February 2023
DOI: https://doi.org/10.1007/s00500-022-07740-2
