Baler: deterministic, lossless log message clustering tool

  • Narate Taerat
  • Jim Brandt
  • Ann Gentile
  • Matthew Wong
  • Chokchai Leangsuksun
Special Issue Paper


The rate of failures in HPC systems continues to increase as the number of components comprising the systems increases. System logs are a valuable source of information for analyzing system failures and their root causes. However, system log files are usually too large and complex to analyze manually. Existing log clustering tools seek to help analysts explore these logs, but they fail to satisfy our needs with respect to scalability, usability, and quality of results. We have therefore developed a log clustering tool to better address these needs. In this paper we present our novel approach and initial experimental results.


Text mining, Text clustering, Log file analysis, System log analysis



Copyright information

© Springer-Verlag 2011

Authors and Affiliations

  • Narate Taerat (1), email author
  • Jim Brandt (2)
  • Ann Gentile (2)
  • Matthew Wong (2)
  • Chokchai Leangsuksun (1)

  1. Louisiana Tech University, Ruston, USA
  2. Sandia National Laboratory in California, Livermore, USA
