Abstract
A log file is a record of the events that occur on a website or server. Because log files can grow very large, outdated entries are often overwritten at regular intervals, or a series of files is kept, each named with its date, for example. When technical problems arise, such as site inaccessibility, virus infection, hacker attacks, or Distributed Denial of Service (DDoS) attacks, the resource administrator can use the logged information to find the cause, which makes it easier and faster to eliminate unwanted incidents. The paper analyzes the definition, types, location, use, and examples of log files. Log data are transferred to a MySQL database using the Squid.db database. Because Squid is a caching proxy server, it stores requested resources and serves them quickly on subsequent requests. The data are clustered using a table compiled from the databases transferred to MySQL via the Squid proxy; unnecessary entries are first deleted from the table, which significantly speeds up data processing. Applying the clustering method to this problem is fast and simple. The study describes the clustering, analyzes the metrics, and determines the proximity of clusters, and of objects within clusters, in Euclidean space. Experiments are conducted, and the results are satisfactory.
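The pipeline summarized above (parse proxy log entries, drop unusable ones, then compare objects and clusters by Euclidean distance) can be illustrated with a minimal Python sketch. It is not the paper's implementation. It assumes Squid's default native access.log field order, and the function names, the sample log lines, and the choice of (elapsed time, bytes) as features are all hypothetical.

```python
import math

# Hypothetical sample in Squid's default native access.log layout (assumed):
# timestamp elapsed_ms client action/status bytes method URL ident hierarchy type
SAMPLE_LOG = """\
1286536309.450 120 192.168.0.10 TCP_MISS/200 1500 GET http://example.com/a - DIRECT/93.184.216.34 text/html
1286536310.120 3 192.168.0.11 TCP_HIT/200 1400 GET http://example.com/a - NONE/- text/html
1286536311.900 950 192.168.0.10 TCP_MISS/200 82000 GET http://example.com/big.iso - DIRECT/93.184.216.34 application/octet-stream
"""


def parse_squid_line(line):
    """Split one native-format log line into named fields, or return None."""
    parts = line.split()
    if len(parts) < 10 or "/" not in parts[3]:
        return None  # malformed or truncated entry: treat as unnecessary
    action, status = parts[3].split("/", 1)
    try:
        return {
            "timestamp": float(parts[0]),
            "elapsed_ms": int(parts[1]),
            "client": parts[2],
            "action": action,
            "status": int(status),
            "bytes": int(parts[4]),
            "method": parts[5],
            "url": parts[6],
        }
    except ValueError:
        return None  # a numeric field could not be converted


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def nearest_centroid(point, centroids):
    """Index of the centroid closest to `point` in Euclidean space."""
    return min(range(len(centroids)), key=lambda i: euclidean(point, centroids[i]))


if __name__ == "__main__":
    records = [r for r in map(parse_squid_line, SAMPLE_LOG.splitlines()) if r]
    # Assumed feature vector: (elapsed time in ms, response size in bytes).
    features = [(r["elapsed_ms"], r["bytes"]) for r in records]
    centers = [centroid(features[:2]), centroid(features[2:])]
    for r, f in zip(records, features):
        print(r["url"], "-> cluster", nearest_centroid(f, centers))
```

In practice the features would normally be scaled before distances are computed, since raw byte counts otherwise dominate elapsed times; the sketch omits this for brevity.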
Data availability
Enquiries about data availability should be directed to the authors.
Acknowledgements
The author thanks the editors and anonymous reviewers for their helpful comments and suggestions that have led to this improved version of the paper.
Funding
The author has not disclosed any funding.
Ethics declarations
Conflict of interest
The author declares no conflict of interest.
Ethical approval
This article does not contain any studies with human participants or animals performed by the author.
About this article
Cite this article
Mahmudova, S. Development of a method for processing log files using clustering. Soft Comput 27, 1617–1628 (2023). https://doi.org/10.1007/s00500-022-07740-2
DOI: https://doi.org/10.1007/s00500-022-07740-2