Development of a method for processing log files using clustering

  • Application of Soft Computing
Soft Computing


A log file is a record of the events that occur on a website or server. Log files can grow very large, so they are either overwritten periodically, replacing outdated content, or kept as collections of files whose names include, for example, a date. In the event of technical problems, site inaccessibility, virus infection, hacker attacks, or Distributed Denial of Service (DDoS) attacks, the resource administrator can use the information in the log to find the cause, which makes it easier and faster to eliminate unwanted incidents. The paper analyzes the definition, types, location, use, and examples of log files. Log data are transferred to a MySQL database using the Squid.db database. Because Squid is a caching proxy server, it stores resources, so repeated requests for them are served quickly. Clustering is performed on a table compiled from the data transferred to MySQL via the Squid proxy. Unnecessary entries are deleted from this table, which significantly speeds up data processing. The study examines clustering, analyzes metrics, and determines the proximity of clusters, and of objects within clusters, in Euclidean space. Applied to this problem, the clustering method is fast and simple. Experiments are conducted, and the results are satisfactory.


Data availability

Enquiries about data availability should be directed to the authors.




The author thanks the editors and anonymous reviewers for their helpful comments and suggestions that have led to this improved version of the paper.


The authors have not disclosed any funding.

Author information


Corresponding author

Correspondence to Shafagat Mahmudova.

Ethics declarations

Conflict of interest

The author declares no conflict of interest.

Ethical approval

This article does not contain any studies with human participants or animals performed by the author.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Mahmudova, S. Development of a method for processing log files using clustering. Soft Comput 27, 1617–1628 (2023).
